Specialized training will help environmental health scientists take advantage of the promise of big data, according to an interdisciplinary group of researchers who gathered Aug. 15-16 at NIEHS.
Experts in data science, epidemiology, genetics, biostatistics, and other fields exchanged insights about training students, postdoctoral researchers, and junior faculty to be the next generation of environmental health scientists. Discussion also addressed how to increase data science skills, at all educational levels, among biomedical scientists who work on environmental health studies.

Organizers in the NIEHS Division of Extramural Research and Training (DERT) sought to develop strategic recommendations for data science training and NIEHS priorities, said organizer Carol Shreffler, Ph.D., the DERT program director for training and career development.
NIEHS and National Toxicology Program Director Linda Birnbaum, Ph.D., challenged participants in her welcoming remarks. “Work with us to develop an overall strategy to build [a] data science-competent environmental health science workforce,” she said, pointing out that the 2018-2023 NIEHS Strategic Plan (see related story) names data science as a key goal in each of its focus areas.
Participants responded enthusiastically. “Everyone is grappling with the issue of how best to incorporate more data science training into biomedical programs,” reported Jenny Collins of DERT, a member of the organizing committee.
The challenge of data
As vast amounts of data are generated by rapidly developing technology, the relatively new field of data science is meeting the challenges of sharing, accessing, analyzing, and interpreting big data.

For example, Marie Lynn Miranda, Ph.D., from Rice University, demonstrated how big data can reframe research questions. She reported that racial isolation is geographically associated with fundamental causes of racial disparities in health. “This shifts the conversation from race, which is nonmodifiable, to the experience of minorities in segregated communities, which is modifiable,” she said.
The National Institutes of Health (NIH) Big Data to Knowledge (BD2K) Initiative made early efforts to address the skills gap in biomedical data science through investments in training and education.
“Methods work really well until you encounter real data,” said BD2K grantee John Quackenbush, Ph.D., from the Harvard T.H. Chan School of Public Health. He explained how epidemiology and laboratory projects help students understand the challenges they will face.
The challenge of education
Speakers discussed challenges such as course development, recruitment of students and lab members, funding strategies, and attracting and retaining data science students in environmental health sciences, or EHS.
“I’ve never seen a generation of students that cares more about making a difference in the world,” said Miranda.
“EHS is perceived as being more meaningful and interesting, but students are worried about salary levels,” said Jim Gauderman, Ph.D., from the University of Southern California. His experience suggests that early successes are important. “Published papers, software development, etc., help trainees feel both value and success,” he said.

Partnership is key
“EHS researchers draw on data from genetic sequences to geographic data to survey data,” said attendee Lisa Federer, from the National Library of Medicine (NLM). “Data science involves many different types of expertise and knowledge, and there’s not just one training model.” Her suggestion that data science training will require engaging with researchers who have not typically worked with NIH was echoed by others.
Society of Toxicology (SOT) Vice-President Ronald Hines, Ph.D., from the U.S. Environmental Protection Agency, said that SOT is reaching out to other societies for symposia collaborations. Similarly, Miranda advised NIEHS to exchange ideas with leaders of professional groups in data science, computer science, electrical engineering, and applied mathematics.

The data challenges or hackathons such groups offer could be more widely used in EHS, according to Charles Schmitt, Ph.D., a contractor in the NIEHS Office of Data Science. “In natural language processing, there are conferences that have challenges every year, aimed at advancing the state of the art,” he said. “Results from prior years make a great teaching resource.”
“Partnerships are key,” emphasized grantee Cheryl Walker, Ph.D., from the Baylor College of Medicine, who moderated the meeting’s final session.
“Some of the most powerful and innovative data science research comes out of interdisciplinary teams,” agreed Federer, adding that data science is a major focus of the new NLM strategic plan and that NLM hopes to collaborate with colleagues across NIH.