Google Dataset Search is a new search engine that lets you search for datasets hosted in thousands of repositories across the Web. It looks for metadata tags on publisher sites, digital libraries, dataset providers' sites, and authors' personal webpages, and returns a list of results that best match the datasets you need for your research.
If, on the other hand, you want to share your datasets and make them publicly accessible, you can follow Google's guidelines for dataset providers, which are based on an open standard for tagging and describing datasets. The guidelines cover salient information about a dataset: who created it, when it was published, how the data was collected, what the terms of use are, and so on. The overall aim is to improve dataset discovery through a common standard that helps Google and other search engines better understand what each dataset contains.
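To make the idea of tagging concrete, here is a minimal sketch in Python of how such a description might be generated. It assumes the metadata is expressed with the schema.org `Dataset` type and embedded in a web page as JSON-LD, which is one format the guidelines accept. The function names, the sample dataset, and the example.org URLs are hypothetical and only for illustration; check the guidelines themselves for the full list of required and recommended properties.

```python
import json
from datetime import date


def build_dataset_record(name, description, url, creator_name,
                         date_published, license_url,
                         measurement_technique=None):
    """Return a schema.org Dataset description as a plain dict.

    The properties mirror the information mentioned above: who created
    the dataset, when it was published, how the data was collected,
    and the terms for using it.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Person", "name": creator_name},
        "datePublished": date_published.isoformat(),
        "license": license_url,
    }
    if measurement_technique:
        # Optional: a short note on how the data was collected.
        record["measurementTechnique"] = measurement_technique
    return record


def to_script_tag(record):
    """Wrap the record in a JSON-LD script tag for a dataset landing page."""
    body = json.dumps(record, indent=2)
    # Escape "</" so the JSON cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical sample values, for illustration only.
example = build_dataset_record(
    name="Roadside air quality readings",
    description="Hourly readings from a hypothetical sensor network.",
    url="https://example.org/datasets/air-quality",
    creator_name="A. Researcher",
    date_published=date(2020, 3, 1),
    license_url="https://creativecommons.org/licenses/by/4.0/",
    measurement_technique="Networked roadside sensors",
)
tag = to_script_tag(example)
```

Placing the resulting tag in the HTML of the page that describes the dataset is what allows a crawler to read the metadata.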
Here are some examples of what can qualify as a dataset as suggested by Google:
A table or a CSV file with some data
An organized collection of tables
A file in a proprietary format that contains data
A collection of files that together constitute some meaningful dataset
A structured object with data in some other format that you might want to load into a special tool for processing
Images capturing data
Files relating to machine learning, such as trained parameters or neural network structure definitions
Anything that looks like a dataset to you
For example, if you want to obtain COVID-19 data, you might search for that topic in Dataset Search.
You will see data from more than 100 different data sources. The results can be filtered by when they were last updated (past month, year, or three years), download format (table, text, image, or others), usage rights (commercial or non-commercial use), topic (subject discipline), and whether the dataset is free to access.
Once you have read the description and decided that the dataset is useful, you can click the blue button to go to the external site that hosts it.
- By Lewis Li, Library
Environmental data poses specific data management challenges for researchers. Networked sensors collect voluminous data that requires systematic planning of workflow, storage, and dissemination. Prof. NING Zhi (ENVR) illustrated good data practices with environmental big data.
A smart city uses innovation and technology to address urban challenges, improve the effectiveness of public services, and make the city more liveable and sustainable. Open data is an essential foundation for achieving these goals.
As a researcher, you may sometimes need to share your data to meet publishers’ or funders’ requirements, but preparing the data can be tedious and time-consuming. This is where DataSpace@HKUST can help.
In an online seminar, Prof. Cameron Campbell used datasets from three projects to illustrate research data management practices. His advice is not only applicable to historical data but also valuable to researchers in many disciplines.