Environmental data poses specific data management challenges for researchers. Networked sensors collect voluminous data that requires systematic planning of workflow, storage and dissemination. Prof. NING Zhi (ENVR) illustrated good data practices with environmental big data. He gave a well-structured, informative talk on research data management on March 10. This blog post highlights a few important points from his talk.
Prof. Ning’s research collects air quality data through networks of sensors. The data help researchers address issues in public health, urban design, transportation and energy consumption. Compared with traditional air quality stations, the sensors cost less, are smaller, and are easier to deploy. They can be used in both localized and mobile settings. For example, the Smart Campus Air Network monitors environmental quality in real time, covering both outdoor and indoor locations on the HKUST campus, while the mobile air sensor network in Shanghai involved 200 taxis operating in the city, generating a huge amount of data.
Because sensor networks can collect a vast amount of data over a long period of time, in many locations and different settings, researchers need good data management strategies.
Data Management Practices
Prof. Ning explained data management as actions + objectives: the actions of ingesting, storing, organizing and maintaining data, for the purpose of ensuring the data is accurate, available and accessible. This interestingly echoes the FAIR principles in research data management.
He used a framework with six key parts to describe his data management practice in detail:
- Architectural Design
- Workflow and Relation
- Data Repository
- Data Integration
- Data Quality Check
- Data Dissemination
Among the many good practices, a few are particularly worth highlighting here because they can benefit researchers in all fields:
- Describe your data in a way that other people who are unfamiliar with your data or your research can still find, evaluate, understand and reuse your data
- Include metadata that describe the project (e.g. people, date, funders) and the data (e.g. creators, rights, versions, format, headers)
- Document your methods and all the data decisions (e.g. meaning of data, definitions of fields)
- Use reports consistently and persistently to encapsulate the use of data in experiments. Below is a template; it provides a good reference for all researchers:
3 Tiers of Data
- raw data
- filtered data
- quality controlled data
When you prepare data for archiving or sharing, you should consider which tier to use.
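As a rough illustration of how the three tiers relate, the sketch below derives each tier from the one before it. The talk summary does not spell out the filtering or quality control rules, so everything here is an assumption made for the example: the `Reading` record, the PM2.5 field, the plausibility bounds and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    """One sensor record (hypothetical structure, for illustration only)."""
    sensor_id: str
    timestamp: str          # ISO 8601 string
    pm25: Optional[float]   # None when the sensor reported no value


# Assumed plausibility bounds for the quality control step (not from the talk).
PM25_MIN = 0.0
PM25_MAX = 1000.0


def to_filtered(raw: List[Reading]) -> List[Reading]:
    """Tier 2 (assumed rule): drop records that carry no measurement."""
    return [r for r in raw if r.pm25 is not None]


def to_quality_controlled(filtered: List[Reading]) -> List[Reading]:
    """Tier 3 (assumed rule): keep only values inside the plausible range."""
    return [r for r in filtered if PM25_MIN <= r.pm25 <= PM25_MAX]


# Tier 1: raw data exactly as collected (made-up sample values).
raw = [
    Reading("s1", "2021-03-10T08:00:00", 12.5),
    Reading("s1", "2021-03-10T08:01:00", None),   # missing value
    Reading("s2", "2021-03-10T08:00:00", -4.0),   # physically implausible
    Reading("s2", "2021-03-10T08:01:00", 35.0),
]

filtered = to_filtered(raw)
quality_controlled = to_quality_controlled(filtered)

print(len(raw), len(filtered), len(quality_controlled))  # 4 3 2
```

Keeping the raw tier untouched and deriving the other tiers with functions means the later tiers can be regenerated if the rules change.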
Five Key Messages
Prof. Ning summarized his talk in 5 key messages:
- Data is more than numbers; it comes with a set of properties and description
- Data has different tiers, from raw to shareable. Quality control is critical
- Always put data and description in one repository
- Use a consistent format to save data. Using a database can help you build and maintain a structure
- Consider ethics and intellectual property issues when sharing
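One possible way to act on the messages about keeping data and description together and using a database is sketched below with Python's built-in `sqlite3` module. This is not the setup described in the talk; the table names, column names, metadata keys and sample values are hypothetical and chosen only for the example.

```python
import json
import sqlite3


def create_repository(conn: sqlite3.Connection) -> None:
    """Create one table for the description and one for the data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dataset_metadata ("
        " key TEXT PRIMARY KEY,"
        " value TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " sensor_id TEXT NOT NULL,"
        " timestamp TEXT NOT NULL,"
        " pm25_ug_m3 REAL NOT NULL,"
        " tier TEXT NOT NULL"
        "   CHECK (tier IN ('raw', 'filtered', 'quality_controlled')),"
        " PRIMARY KEY (sensor_id, timestamp, tier))"
    )


def describe(conn: sqlite3.Connection, metadata: dict) -> None:
    """Store each metadata entry as a JSON-encoded value."""
    conn.executemany(
        "INSERT OR REPLACE INTO dataset_metadata (key, value) VALUES (?, ?)",
        [(key, json.dumps(value)) for key, value in metadata.items()],
    )


def load_description(conn: sqlite3.Connection) -> dict:
    """Read the metadata back as a plain dictionary."""
    rows = conn.execute("SELECT key, value FROM dataset_metadata")
    return {key: json.loads(value) for key, value in rows}


# An in-memory database keeps the example self-contained;
# a real project would pass a file path instead.
conn = sqlite3.connect(":memory:")
create_repository(conn)
describe(conn, {
    "title": "Example air quality dataset",   # hypothetical values
    "creators": ["A. Researcher"],
    "version": "0.1",
    "fields": {"pm25_ug_m3": "PM2.5 in micrograms per cubic metre"},
})
conn.execute(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    ("s1", "2021-03-10T08:00:00", 12.5, "raw"),
)
conn.commit()
```

The `CHECK` constraint on `tier` and the fixed column types are what enforce a consistent structure: a row with an unknown tier is rejected by the database rather than silently stored.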
— By Gabi Wong, Library
last modified March 24, 2021