Finding and Re-using Data

Two people looking at a computer screen.

Re-using data, or secondary analysis, is good for research: it is economical and saves resources in a research landscape where funding is increasingly difficult to obtain. Using existing datasets means that less money and time are spent collecting data.

A huge wealth of data, from small-scale to large-scale studies, is available to researchers for further and original analysis, and for teaching. Many research funders require applicants to demonstrate that they have considered re-using existing data sources when writing data management plans as part of their funding applications. Re-using data helps to avoid data duplication and the problem of over-researching particular groups, which is an important ethical consideration for researchers.

The information on this page will help you to find and re-use research datasets in your own research projects. It provides guidance on where you can access published datasets and how to use and cite them. 

Finding sources of data

There isn’t currently a single search engine that can be used to search all published datasets. In addition to search engines, you’ll need to consult either generalist repositories or relevant individual repositories for your discipline.

DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data, and promotes data sharing and citation. Using DataCite you can search across multiple repositories to find datasets related to your research area.
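As an illustration, a DataCite search can also be run programmatically. The sketch below is a minimal example, not an official client: it assumes the public DataCite REST API endpoint at https://api.datacite.org/dois and its `query`, `resource-type-id` and `page[size]` parameters, and it assumes the JSON response lists records under `data`, each with an `id` (the DOI) and `attributes` such as `titles` and `publicationYear`. The function names and the search terms are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public endpoint of the DataCite REST API.
DATACITE_DOIS_ENDPOINT = "https://api.datacite.org/dois"


def build_dataset_search_url(keywords, page_size=10):
    """Build a search URL limited to records of type 'dataset'."""
    params = {
        "query": keywords,
        "resource-type-id": "dataset",  # restrict results to datasets
        "page[size]": page_size,        # number of records per page
    }
    return f"{DATACITE_DOIS_ENDPOINT}?{urlencode(params)}"


def summarise_records(response_json):
    """Extract DOI, first title and year from a parsed JSON response.

    Assumes the response structure described in the text above.
    """
    summaries = []
    for record in response_json.get("data", []):
        attributes = record.get("attributes", {})
        titles = attributes.get("titles") or [{}]
        summaries.append({
            "doi": record.get("id"),
            "title": titles[0].get("title"),
            "year": attributes.get("publicationYear"),
        })
    return summaries


if __name__ == "__main__":
    # Hypothetical search terms; paste the printed URL into a browser
    # or fetch it with any HTTP client to see the results.
    print(build_dataset_search_url("urban air quality"))
```

The URL builder is kept separate from the response parsing so that the search can be tried in a browser first, before any code is written to process the results.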

Google Dataset Search helps researchers locate online data across a variety of different sources (e.g. a publisher's site, a digital repository, or a personal web page). Any web page that uses specific structured metadata to describe datasets, such as schema.org, will be findable by Dataset Search.

OpenAIRE provides access to the datasets of European Commission funded projects.

A first port of call for locating repositories holding data relevant to your research area is re3data (Registry of Research Data Repositories). This is a global registry listing over 2000 repositories which you can browse by subject, content type or country.

Another useful resource is a list of repositories and databases for open data provided by The Open Access Directory. The list is organised by discipline (mainly the sciences but also archaeology, linguistics and social sciences).

There are also discipline-specific data services provided by some of the UK Research Councils.

The UK Data Service hosts key national and international social science datasets in its repository. Data from projects funded by the Economic and Social Research Council (ESRC) can be found here and you can search the data by theme or type.

The Natural Environment Research Council (NERC) has an Environmental Data Service. There are five data centres which hold a wide range of data from environmental scientists working in the UK and around the world.

There are a number of data repositories that hold data from many different disciplines. You can search these and filter results to find relevant datasets.

Zenodo

Developed by CERN, Zenodo is a multidisciplinary data, software and publication repository. It is suitable for all types of research data and is free to use.

Figshare

Interdisciplinary open access repository containing a wide range of research outputs including datasets. Datasets are indexed in Google Dataset Search.

Harvard Dataverse

A repository hosting research data, code and related material from all disciplines worldwide. It includes the world's largest collection of social science research data.

Dryad

A repository mainly holding data that underlie scientific and medical literature, particularly data for which no specialised repository exists. All material in Dryad is associated with a scholarly publication.

Mendeley Data

Generalist data repository.

Re-using data

Image of a tree stump with two signs attached displaying the words 'reuse' and 'recycle'.

If you want to re-use data there are a number of things you need to consider to ensure you are using it in a legal and responsible way.

  • Identify who the rights holder of the data is. What licensing terms and conditions do they apply to use of the data? Are you able to re-use the data in the way you need to?
  • The most common licences applied to data held in repositories are Creative Commons licences. You can explore the range of licences on the Creative Commons website.
  • If the rights holder hasn’t specified conditions for re-use, you will need to contact them for permission to use the data. If there isn’t an explicit licence or any terms of use, you should assume that the creator has reserved all rights to the data.
  • Are the metadata and documentation provided sufficient for understanding and reusing the data? Have any guidance or tools been created alongside the dataset to help you explore and manipulate the data?
  • Are you able to archive the data used in your analysis? If you haven’t altered the data, you do not need to archive it, but you should ensure you cite the data. If you have combined the data with another dataset, then you will need to check the licence applied to the dataset you have used to see whether or not you can share or archive the data. You should fulfil all relevant licence terms of the dataset you have used when you license the data.

How to cite data correctly

Research data are legitimate, citable products of research which researchers have invested considerable time in creating. They need to be treated like any other type of publication and cited correctly.  

If you are using secondary data in your research, you should cite it in the same way as you would other publications, both in your text and in your list of references.

There are numerous elements that can make up a data citation. The most important are: creator, title, date, location, version, and publisher.

If the dataset has a persistent identifier such as a DOI (digital object identifier) or handle, this should be included in the citation. If neither of these is available, then include the URL.

A basic citation could be structured like this:

Creator (Publication Year) Title. Version (if provided). Publisher. Identifier
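If you manage many datasets, the basic structure above can be assembled consistently with a small helper. The sketch below is illustrative only: the function name and the example values (creator, title, publisher and DOI) are hypothetical, and it assumes that the version is written as "Version N" and that a bare DOI is shown as a https://doi.org/ link. Handles and URLs are left exactly as supplied.

```python
def format_data_citation(creator, year, title, publisher, identifier, version=None):
    """Format a citation as:
    Creator (Publication Year) Title. Version (if provided). Publisher. Identifier
    """
    # Assumption: a bare DOI (starting with "10.") is shown as a resolvable link.
    if identifier.startswith("10."):
        identifier = f"https://doi.org/{identifier}"

    parts = [f"{creator} ({year}) {title}."]
    if version:
        # The version element is only included when one is provided.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(identifier)
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical dataset details, for illustration only.
    print(format_data_citation(
        creator="Smith, J.",
        year=2021,
        title="Example Survey of Commuting Habits",
        publisher="Example Repository",
        identifier="10.1234/example",
        version="2",
    ))
```

With the hypothetical values shown, this prints: Smith, J. (2021) Example Survey of Commuting Habits. Version 2. Example Repository. https://doi.org/10.1234/example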

The format of the citation will depend on the publication style you opt for. The Digital Curation Centre’s guidance ‘How to Cite Datasets and Link to Publications’ provides examples of how a data citation may appear in a variety of styles.

 

Further help and information

If you would like advice on how to find and re-use data, please contact the Scholarly Communications Team at wire@wlv.ac.uk.

Images used on this page

Photo by Desola Lanre-Ologun on Unsplash

Photo by Ralph (Ravi) Kayden on Unsplash