A recent workshop held at the University of Melbourne brought up the topic of good data curation practices, something I personally can’t stress enough. At the time of writing, I am completing an introductory survival guide to ChIP-seq analysis, and one of the first topics I cover is the importance of collecting your experimental metadata. In fact, I dedicated an entire entry to the topic.
In that entry I gave a few practical reasons for keeping track of your metadata, along with what to do if you can’t find everything you need. Meanwhile, Dr. Wilkinson and colleagues published an article, The FAIR Guiding Principles for scientific data management and stewardship, which seems pertinent and worth bringing up here. Its message also fits nicely with the need to obtain good quality and, more importantly, relevant control data.
To summarize the article: we need a common set of rules to operate by when generating high-throughput data (or any kind of data, for that matter) and publishing it in publicly available repositories. The authors argue that without good record keeping, deposited data may as well be tossed in the trash. If you can’t describe how the data were generated, you can’t use them or publish results based on them, because the analysis could be invalidated.
I’m a little surprised that excellence in curation isn’t already standard practice, but it was made more or less clear that repositories are willing to sacrifice good curation for participation. In other words, if researchers are only asked to describe the bare minimum about an experiment, more of them will deposit their data. What alarms me is the apparent general sentiment that publicly or privately funded research can be handled this way without repercussions from the funding bodies. After all, a lab’s output reflects on the decision to fund that lab.
So, to remedy this problem and help spread the word on better data curation practices, let’s have a look at the FAIR principles put forward by Dr. Wilkinson and colleagues (1).
The FAIR Guiding Principles
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
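To make a few of these principles concrete for a ChIP-seq experiment, here is a minimal sketch of a metadata record and a check for missing fields. It assumes a hypothetical flat record schema: the field names (`identifier`, `data_identifier`, `license`, `provenance`), the placeholder identifiers, and the `missing_fair_fields` helper are all illustrative and are not part of the FAIR paper or any repository’s API. It only covers the principles that map onto a simple "is this field present?" test (F1, F3, R1.1, R1.2). The others, such as using a shared vocabulary, can’t be checked this mechanically.

```python
# Hypothetical mapping from a few FAIR principles to record fields.
# FAIR does not prescribe a schema; these field names are illustrative only.
REQUIRED_FIELDS = {
    "identifier": "F1: globally unique, persistent identifier",
    "data_identifier": "F3: metadata names the identifier of the data it describes",
    "license": "R1.1: clear, accessible data usage license",
    "provenance": "R1.2: detailed provenance",
}


def missing_fair_fields(record: dict) -> list[str]:
    """Return the principles whose (hypothetical) field is absent or empty."""
    gaps = []
    for key, principle in REQUIRED_FIELDS.items():
        value = record.get(key)
        # Treat None and empty strings/lists/dicts as "not recorded".
        if value is None or (isinstance(value, (str, list, dict)) and not value):
            gaps.append(principle)
    return gaps


# A hypothetical ChIP-seq metadata record; identifiers are placeholders.
# Note that no usage license has been recorded.
record = {
    "identifier": "example:chipseq-meta-001",
    "data_identifier": "example:chipseq-reads-001",
    "provenance": {
        "antibody": "anti-H3K4me3 (placeholder)",
        "cell_line": "placeholder cell line",
        "input_control": "example:chipseq-input-001",
    },
}

for gap in missing_fair_fields(record):
    print("Missing:", gap)
```

Run as-is, the loop reports only the missing license (R1.1). A flat dictionary keeps the sketch short. Real repositories define their own richer, community-specific schemas, which is what R1.3 asks you to follow.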
To find the original article and more reading on the topic, see the references below.
References and more reading
- Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016). doi: 10.1038/sdata.2016.18