NIEHS contributes to big NIH investment in biomedical research data
By Joe Balintfy
As part of wide-ranging grants announced Oct. 10 by NIH, NIEHS is helping develop new strategies to analyze and make good use of the explosion in complex biomedical data sets, often referred to as Big Data. NIEHS involvement in the NIH Big Data to Knowledge (BD2K) investment spans all areas of emphasis, including training, research centers, and data discovery.
"NIEHS has had a role in developing BD2K initiatives through our participation on numerous workgroups over the past 18 months," said Allen Dearry, Ph.D., director of the NIEHS Office of Scientific Information Management and NIEHS representative to the BD2K Executive Committee.
Big data centers
"We are managing the career development portion of this nearly $32 million investment," said Gwen Collman, Ph.D., director of the NIEHS Division of Extramural Research and Training. "We are also participating in the grants establishing Centers of Excellence for Big Data Computing."
The 11 centers will develop innovative approaches, methods, software, tools, and other resources to enhance access to data and the ability to make new discoveries using it, according to an NIH press release. Development efforts will focus on specific research questions, yet the centers’ output is expected to be more generally relevant to aspects of big data science, such as data integration and use, analysis of genomic data, and managing data from electronic health records.
Data science training and workforce development
The Training and Career Development in Biomedical Big Data awards support education and training of researchers who will specialize in data science fields, as well as those whose work may require expertise in the use or generation of large amounts of data and data resources.
"Some training projects include brain imaging studies, identifying healthy behaviors that may reduce risk of heart disease and diabetes, and harnessing the largely untapped potential of social media data to capture social and cultural processes with potential health impact," said NIEHS Program Director Carol Shreffler, Ph.D.
Teams at large research universities and academic medical centers may have bioinformatics and data support, but individual scientists in the biomedical research community may not, and many have not been trained to access and analyze large data sets.
NIH also launched an effort to pilot a Data Discovery Index (DDI), which will improve the discoverability and accessibility of biomedical big data and advance standards for citing it. "Currently, there is no easy query or search infrastructure that can help identify the presence and availability of relevant data sets," said Becky Boyles, a member of the NIH team overseeing this project and an NIEHS data scientist.
Boyles explained that data, when they are available at all, are scattered across a growing number and variety of repositories and websites. "The intent of establishing a DDI is to help researchers find, reuse, and cite relevant publicly available datasets related to their scientific question of interest," she said.
Big data is a big deal
Studies generating billions of data points continue to proliferate. For example, environmental studies gather data on multiple exposures and health outcomes, epidemiological studies examine thousands of participants for health and disease patterns, large disease-oriented efforts seek the genomic underpinnings of illnesses, and other projects look to identify all functional elements in the human genome.
"Mammoth data sets are emerging at an accelerated pace in today’s biomedical research," said NIH Director Francis S. Collins, M.D., Ph.D. "The potential of these data, when used effectively, is quite astounding."
(Joe Balintfy is a public affairs specialist in the NIEHS Office of Communications and Public Liaison.)