Good Data Curation: The FAIR principles

A recent workshop held at the University of Melbourne brought up the topic of good data curation practices, something that I personally can’t stress enough. At the time of writing, I am completing an introductory survival guide to ChIP-seq analysis, and one of the first topics I cover is the importance of collecting your experimental metadata. In fact, I dedicated an entire entry to the topic of metadata.

While I gave a few practical reasons for keeping track of your metadata, and what to do if you can’t find everything you need, Dr. Wilkinson and colleagues published an article, The FAIR Guiding Principles for scientific data management and stewardship, which seemed pertinent and worth bringing up here. The point of the article also fits nicely with the need to obtain good-quality and, more importantly, relevant control data.

To summarize the article, we need a common set of rules to follow when generating high-throughput data (or data of any kind, for that matter) and publishing it in publicly available repositories. The argument is made that without good record keeping, deposited data may as well be tossed in the trash: if you can’t describe how the data were generated, you can’t use them or publish results based on them, because the analysis could be invalidated.

I’m a little surprised that excellence in curation isn’t already standard practice, but it was made more or less clear that the repositories are willing to sacrifice good curation for participation. In other words, if researchers are allowed to describe only the bare minimum about an experiment, more of them will deposit their data. What alarms me is the apparent general sentiment that publicly or privately funded research can be handled this way without repercussions from the funding bodies. After all, the output from a lab reflects on the decision to fund that lab.

So, to help remedy this problem and spread the word on better data curation practices, let’s have a look at the FAIR principles put forward by Dr. Wilkinson (1).

The FAIR Guiding Principles

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier

F2. data are described with rich metadata (defined by R1 below)

F3. metadata clearly and explicitly include the identifier of the data it describes

F4. (meta)data are registered or indexed in a searchable resource

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

A1.1 the protocol is open, free, and universally implementable

A1.2 the protocol allows for an authentication and authorization procedure, where necessary

A2. metadata are accessible, even when the data are no longer available

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (meta)data use vocabularies that follow FAIR principles

I3. (meta)data include qualified references to other (meta)data

To be Reusable:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes

R1.1. (meta)data are released with a clear and accessible data usage license

R1.2. (meta)data are associated with detailed provenance

R1.3. (meta)data meet domain-relevant community standards
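
As a rough illustration of how a few of these principles might be checked before depositing data, here is a minimal Python sketch. The record class, its field names, the list of required attributes, and the placeholder identifier are all assumptions made for illustration; none of them come from the FAIR article or from any repository's submission format. It only covers F1, F3, R1, R1.1 and R1.2, since the other principles depend on the repository and the protocols it uses.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""       # F1: globally unique, persistent ID (e.g. a DOI)
    data_identifier: str = ""  # F3: ID of the data this metadata describes
    license: str = ""          # R1.1: data usage license
    provenance: str = ""       # R1.2: how the data were generated
    attributes: dict = field(default_factory=dict)  # R1: descriptive attributes


def missing_fair_fields(record, required_attributes=("organism", "assay", "control")):
    """Return a list of gaps found; an empty list means none were found."""
    gaps = []
    if not record.identifier:
        gaps.append("F1: no persistent identifier")
    if not record.data_identifier:
        gaps.append("F3: metadata does not name the data it describes")
    if not record.license:
        gaps.append("R1.1: no usage license")
    if not record.provenance:
        gaps.append("R1.2: no provenance recorded")
    # The required attributes are an assumed, ChIP-seq-flavoured example.
    for name in required_attributes:
        if not record.attributes.get(name):
            gaps.append(f"R1: missing attribute '{name}'")
    return gaps


# Example: a record with no provenance and no description of its control.
record = DatasetRecord(
    identifier="doi:10.0000/example",       # placeholder, not a real DOI
    data_identifier="doi:10.0000/example",  # placeholder, not a real DOI
    license="CC-BY-4.0",
    attributes={"organism": "Homo sapiens", "assay": "ChIP-seq"},
)
gaps = missing_fair_fields(record)
for gap in gaps:
    print(gap)
```

A check like this can't tell you whether the metadata is accurate, only whether it is present, but it would at least flag a record that describes the bare minimum.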

 

To find the original article and more reading on the topic of research funding, follow the links below.


References and more reading

  1. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018, doi: 10.1038/sdata.2016.18 (2016).
  2. http://www.thenewatlantis.com/publications/the-sources-and-uses-of-us-science-funding
  3. http://undsci.berkeley.edu/article/who_pays

 

 

 

 

 
