Environmental Factor

Your Online Source for NIEHS News

December 2024


Data science infrastructure fuels biomedical research

Susan Gregurick, Ph.D., director of the NIH Office of Data Science Strategy, described the many resources available to NIEHS researchers.

The National Institutes of Health (NIH) has created a wealth of data management, sharing, and artificial intelligence resources for researchers at NIEHS and other institutes.

As Susan Gregurick, Ph.D., Associate Director for Data Science and Director of the NIH Office of Data Science Strategy (ODSS), explained in an Oct. 29 lecture at NIEHS, these resources are necessary for the big data era in scientific research to reach its full potential.

“Immense amounts of data are generated in our biomedical and behavioral research,” said Gregurick. “This data really fuels our scientific endeavors, everything from epigenetic to community-level population studies. But we need an infrastructure to run that on.”

Gregurick’s visit coincided with a meeting of the NIH Scientific Data Council, which she co-chairs and which includes NIEHS Director Rick Woychik, Ph.D. (Image courtesy of Steve McCaw / NIEHS)

Under Gregurick’s leadership, the ODSS leads the implementation of the NIH Strategic Plan for Data Science through scientific, technical, and operational collaboration with the institutes, centers, and offices that make up NIH. She was instrumental in the creation of the ODSS in 2018 and served as a senior advisor to the office until she was named to her current position.

Overall, she said ODSS has partnered with NIEHS on 16 projects with a total investment of $3.3 million.

Vast cloud infrastructure

NIEHS researchers and staff heard about how NIH is dedicated to harnessing the potential of data science to elevate the impact and efficiency of biomedical research. (Image courtesy of Steve McCaw / NIEHS)

NIH shares over 363 petabytes of data across its clouds, according to Gregurick. Its Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative supports more than 2,500 research programs through partnerships with Google, Amazon Web Services, and Microsoft Azure.

“That is the largest amount of biomedical research data available to the research community,” Gregurick said. “We are part of the national AI research resource — that's a partnership led by the National Science Foundation [NSF], but all of the government participates. We are the leaders in the national AI research resource to secure clinical data. So that's our space in the AI world.”

NIH Cloud Lab is a no-cost, 90-day program that enables NIH staff, affiliated researchers, and students to try these services in a secure, NIH-approved environment. The lab is now also part of the National Artificial Intelligence Research Resource Classroom. Researchers can acquire new skills, test new tools before buying them, develop innovative solutions, and explore how to use generative AI for research.

Partnerships to manage and share data

ODSS has also supported several activities to bolster data management and sharing across NIH and beyond. Examples include the following.

  • The annual Data Works! Prize, offered in partnership with the Federation of American Societies for Experimental Biology (FASEB), funds innovative data reuse projects. In addition, ODSS is advancing collaborative activities with ELIXIR, an intergovernmental organization that brings together life science resources from across Europe, including databases, software tools, training materials, cloud storage, and supercomputers.
  • ODSS has partnered with the National Library of Medicine to support the Data Curation Network (DCN), a membership organization of institutional and nonprofit data repositories, which trains researchers and librarians to create data management and sharing plans. In addition, ODSS has established NIH as a member of the DataCite consortium to meet the critical need to create and manage digital object identifiers (DOIs) for data generated from NIH-funded and NIH-conducted research.
  • ODSS coordinates the NIH-NSF Smart Health Initiative, which was created to develop transformative high-risk, high-reward advances in multidisciplinary research to address pressing questions in the biomedical and public health communities and to support interdisciplinary teams. In partnership with NIEHS, the initiative funded two projects this year.

Workforce development

“We do need to think about the folks who are going to come after us, because leaving a legacy is really leaving people in place to lead,” said Gregurick.

Gregurick highlighted several programs that ODSS created and supports, aimed at building a broad and diverse community of data scientists for work both now and in the future.

The NIH Data and Technology Advancement (DATA) National Services Scholar Program has trained more than 31 scholars at 14 institutes and offices to pursue data science projects at the interface between basic and clinical research. Gregurick recognized the first DATA Scholar at NIEHS, Lara Clark, Ph.D., who is developing tools and resources that support use of geospatial data in environmental health research.

Through the DataPath Program, NIH is fostering a pipeline of skilled professionals who can contribute to data-driven research and operations at NIH. The program recruits postbaccalaureate and post-master’s professionals who are talented in data science for two-year fellowships.

The NIH Coursera Learning Program provides free unlimited access to more than 12,000 courses, certifications, and specializations from top universities and industry leaders. Courses include those in computer science, machine learning, AI, data science and analytics, public health, and health care innovation.

“We hope that we can continue to strengthen our biomedical community to really enhance our data science and to enhance the people who are working in data science,” Gregurick said. “Because when you have multiple people coming from different backgrounds and different viewpoints working on something, the end result is so much better.”

(Elizabeth Witherspoon, Ph.D., is a contract technical writer in the NIEHS Office of Communications and Public Liaison.)

