A major challenge for the biomedical community is to effectively process and integrate an increasing flood of data. For example, scientists often must grapple with large study populations, varied environmental exposure mixtures, and numerous geographic and socioeconomic factors, all of which add to the complexity.
To help ensure that researchers can use that information and spur greater discovery, the National Institutes of Health (NIH) Office of Data Science Strategy created the Data and Technology Advancement (DATA) program. Through that initiative, data scholars work at NIH institutes and centers for one to two years, tackling high-priority projects.
Lara Clark, Ph.D., recently joined NIEHS under the umbrella of a DATA project titled “Harnessing Geospatial Data for Environmental Public Health Protection,” bringing knowledge from her background in civil and environmental engineering. The term geospatial refers to data that is linked to a specific place.
Diverse populations, diverse exposures
The project's goal is to merge large datasets involving diverse populations and environmental exposures so that the scientific community can take a deeper dive into that information and conduct research that ultimately improves global public health.
“We have to try to bring together different types of geospatial data on a range of topics, from air pollution to green space to extreme temperature, all from different sources and on different spatial and temporal scales,” noted Clark.
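As a rough illustration of what reconciling different temporal scales can involve, the sketch below averages daily air pollution readings into monthly values so they can sit alongside a dataset that is only reported monthly. The function name, variable names, and all numbers are made up for illustration; this is not the NIEHS workflow, only a minimal example under those assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(daily_records):
    """Average (date, value) readings into {(year, month): mean value}."""
    buckets = defaultdict(list)
    for day, value in daily_records:
        buckets[(day.year, day.month)].append(value)
    return {month: mean(values) for month, values in buckets.items()}


# Hypothetical daily air pollution readings (illustrative values only).
daily_pm25 = [
    (date(2020, 1, 1), 8.0),
    (date(2020, 1, 2), 12.0),
    (date(2020, 2, 1), 10.0),
]

# Hypothetical green space index, already reported once per month.
monthly_green = {(2020, 1): 0.30, (2020, 2): 0.35}

pm25_by_month = monthly_means(daily_pm25)

# Keep only the months present in both sources, on the shared monthly scale.
combined = {
    month: {"pm25": pm25_by_month[month], "green": monthly_green[month]}
    for month in sorted(pm25_by_month.keys() & monthly_green.keys())
}
```

Averaging to the coarser scale is only one possible choice; depending on the research question, a maximum, a count of days above a threshold, or another summary may be more appropriate.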
Charles Schmitt, Ph.D., director of the NIEHS Office of Data Science, served on Clark’s hiring committee and will be her administrative supervisor. “We had a number of good applicants,” he said. “We were fortunate to get Lara for two years.”
Gene-environment interactions
Clark’s first order of business is to work on the NIEHS Personalized Environment and Genes Study (PEGS), a long-term project that includes detailed health, exposure, and genetic information from a diverse cohort of 20,000 North Carolinians. The study aims to boost knowledge about how gene-environment interactions influence health and to eventually provide participants with personalized risk assessments.
Researchers sequenced the whole genomes of nearly 5,000 individuals from that cohort, and Clark will focus on that group. She will help to link de-identified patient location information with other available geospatial data to enhance exposure analysis for this and other studies.
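One simple way to picture this kind of linkage is to snap each location to a coarse grid cell and look up an exposure value recorded for that cell. The sketch below assumes a hypothetical 0.5-degree grid, made-up participant IDs and coordinates, and an invented exposure table; it does not describe how PEGS data are actually stored or linked.

```python
import math

CELL_SIZE_DEG = 0.5  # hypothetical grid resolution, in degrees


def grid_cell(lat, lon, size=CELL_SIZE_DEG):
    """Map a latitude/longitude pair to the index of the grid cell containing it."""
    return (math.floor(lat / size), math.floor(lon / size))


def link_exposure(locations, exposure_by_cell):
    """Return {record_id: exposure value, or None when the cell has no data}."""
    return {
        record_id: exposure_by_cell.get(grid_cell(lat, lon))
        for record_id, (lat, lon) in locations.items()
    }


# Hypothetical de-identified records: an ID and an approximate location only.
locations = {
    "P001": (35.9, -78.9),
    "P002": (35.2, -80.8),
    "P003": (36.6, -77.1),
}

# Hypothetical gridded exposure values keyed by cell index (made-up numbers).
exposure_by_cell = {
    (71, -158): 9.1,
    (70, -162): 11.4,
}

linked = link_exposure(locations, exposure_by_cell)
```

Records whose cell has no exposure value come back as `None` rather than being dropped, so gaps in coverage stay visible to whoever analyzes the linked data.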
“Lara is looking at how to build tools on top of what we have done to allow other studies to access the platforms and data that we have developed,” said Schmitt. “We want researchers everywhere to be able to scale up what we have built through PEGS. It is a kind of test bed for what we can do for the world in other areas.”
Building tools to strengthen research
One major task is to advance data interoperability so that scientists can answer current research questions as well as those that arise in the future, according to Clark.
“It is an ongoing challenge for researchers to make sense of what data are available and to integrate such information in ways that are useful,” she noted. “My goals are to streamline those efforts and to bring clarity to complex research questions.”
“We want to build tools and online data platforms that can be sustained throughout the years,” Schmitt added.
Global scientific discovery
Clark will also seek to make non-NIH data available and useful to NIH scientists and grantees. For example, federal agencies, universities, and other countries collect valuable information about environmental exposures that is tied to location and time, which researchers call geospatial-temporal data.
“There is a growing need in population-based environmental health science to incorporate data collected and maintained in geospatial-temporal frameworks,” said David Fargo, Ph.D. He directs the NIEHS Office of Environmental Science Cyberinfrastructure (OESC) and served on Clark’s hiring committee. “Lara is going to try to make very heterogeneous data interpretable and usable by public health experts.”
“At NIEHS, there is a lot of expertise on both the data science and the environmental health sides,” said Clark. “The good news is that with a lot of people working on these problems, there is an increasing number of tools and resources at our disposal.”
(John Yewell is a contract writer for the NIEHS Office of Communications and Public Liaison.)