The same principles of data science that drive the discovery of new drug targets could also lead to a better understanding of exposures to chemicals in the environment, according to Stephan Schurer, Ph.D., from the University of Miami.
These principles, which state that digital resources must be findable, accessible, interoperable, and reusable (FAIR), are critical in any kind of data-driven research. Schurer presented his work Aug. 12 in the NIEHS Keystone Science Lecture Seminar Series.
Schurer’s experiences with making data FAIR include leading three national research consortia: the Library of Integrated Network-based Cellular Signatures (LINCS), Big Data to Knowledge (BD2K), and Illuminating the Druggable Genome (IDG).
“It is important to put the appropriate standards in place so you can use and trust the large amount of data generated by these programs,” Schurer said.
FAIRness matters
Over the last five years, the FAIR principles have gained traction. Institutions that hold research knowledge, such as academic journals and funding organizations, have mandated that data be made publicly available.
But available is not the same as useful. The concept of FAIR emerged from discussions about the need for data to be handled in a way that empowers scientists to answer previously unanswerable research questions.
Allison Harrill, Ph.D., a geneticist in the National Toxicology Program (NTP) Biomolecular Screening Branch, said she is thrilled to see that the FAIR guidelines are getting so much attention. She relies on human data from outside groups for her translational research.
“One challenge we have found is that researchers will often release parts of a dataset, like sequencing data, as required by a journal. But they will withhold other parts that are critical for interpretation, such as which files correspond to cases and controls,” Harrill said. “That makes released data a bit useless for any follow-on analyses.”
NIEHS is an early adopter of the FAIR model at the National Institutes of Health (NIH), said David Fargo, Ph.D., director of Environmental Science Cyberinfrastructure (DESC) at NIEHS. These principles are an integral part of the institute’s Informatics and Information Technology Strategic Roadmap, which addresses some of the challenges and opportunities inherent to making environmental science data FAIR.
Applying big data
NIEHS Health Scientist Administrators Michelle Heacock, Ph.D., and Chris Duncan, Ph.D., first met Schurer a couple of summers ago at a toxicology meeting. They were impressed by his efforts to integrate biomedical data across conditions, institutions, and consortia.
“We brought him here to offer a different perspective,” said Heacock, who co-hosted the lecture with Duncan. “We don’t talk a lot about drug discovery here, but that doesn’t mean that we can’t take pieces of what he has learned in his work and apply it to what we are doing here at NIEHS.”
Schurer provided the audience of scientists and staff from NIEHS, NTP, and the U.S. Environmental Protection Agency with a dizzying overview of the data standards, organization, processes, and software tools that he and his research group have developed to support FAIR and open data.
Two recent projects illustrate how FAIR data can empower research. In one project, Schurer’s group drew upon genomic data from LINCS and other databases to predict effective drug combinations for glioblastoma, the deadliest brain cancer.
In another project, he is developing ways to prioritize certain understudied drug targets that may provide new avenues for cancer treatment.
“There could be a large number of drug targets out there that could be effective, but are just being ignored,” he said. “Using big data could help.”
Citation: Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. 2016. The FAIR guiding principles for scientific data management and stewardship. Sci Data 3:160018.
(Marla Broadfoot, Ph.D., is a contract writer for the NIEHS Office of Communications and Public Liaison.)