The National Institutes of Health (NIH) has a new Data Management and Sharing Policy that requires all NIH-funded researchers to create, implement, and report an NIH Data Management and Sharing (DMS) Plan. The new policy, which went into effect on Jan. 25, 2023, seeks to ensure that taxpayer-funded research is made publicly available. The DMS Plan features six elements that will help move government research data toward FAIR (Findable, Accessible, Interoperable, and Reusable) principles.
To aid researchers in the environmental health sciences with the new mandate, the NIEHS Environmental Health Language Collaborative (EHLC) hosted three half-day workshops titled “Sharing Your Environmental Health Sciences Data: Metadata, Standards, and Tools.”
“The purpose of the workshops was to raise awareness of and encourage use of metadata, standards, and tools that researchers can use to comply with the new 2023 policy,” said Stephanie Holmgren, program manager in the NIEHS Office of Data Science and chair of the workshop program planning committee. “The workshops were intended for anyone who will be developing or implementing the NIH data management and sharing plan or who is interested in learning ways to effectively manage, share, and reuse environmental health sciences data.”
In keeping with EHLC’s mission, the focus of the workshop series was to help researchers in the environmental health sciences comply with elements one (data descriptions) and three (standards) of the six elements required in a DMS Plan. Presentations addressed tools for metadata annotation, and they included information on how to find and use ontologies and controlled vocabularies relevant to environmental health sciences research.
Detailed data descriptions
The intent of the first element in the DMS Plan is to apply detailed descriptions and naming conventions that aid in finding data. In particular, the inclusion of metadata details is expected. Metadata are data that provide additional information intended to make scientific data interpretable and reusable.
For example, all data will be stored in repositories, and standardized descriptions will help with annotations, search capabilities, and information extraction. With consistent terminology, study data can be reused and made interoperable.
“Metadata are extremely important because they help to communicate how the data was generated and what the data is about,” Holmgren said. “Then, not only can another human understand that data, but machines can understand that data, too. With machine power, we can do a lot more with the data in terms of inferencing across or seeing patterns in the data, thereby allowing researchers to tackle more complex questions.”
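As a rough illustration of what a machine-readable data description might look like, the sketch below builds a small metadata record and checks it for missing descriptive fields. It is only a sketch under stated assumptions: the field names, the `REQUIRED_FIELDS` list, the `missing_fields` helper, and the `EXAMPLE:0001` term identifier are all hypothetical and do not come from the NIH policy, the workshops, or any real metadata standard or ontology.

```python
import json

# Hypothetical required fields; not taken from the NIH policy or any real standard.
REQUIRED_FIELDS = ("title", "description", "data_type", "collection_method", "keywords")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative record; every value, including the term identifier, is made up.
record = {
    "title": "Example air pollution exposure study",
    "description": "Personal exposure measurements collected during a pilot study.",
    "data_type": "tabular",
    "collection_method": "wearable sensor",
    "keywords": [
        # A label paired with a controlled-vocabulary identifier (placeholder here).
        {"label": "air pollution", "term_id": "EXAMPLE:0001"},
    ],
}

# A record that still needs most of its description filled in.
incomplete = {"title": "Untitled dataset", "description": ""}

print(missing_fields(record))      # fields still missing from the complete record
print(missing_fields(incomplete))  # fields still missing from the incomplete record
print(json.dumps(record, indent=2, sort_keys=True))  # serialized form a repository could index
```

Pairing each keyword label with an identifier from a shared vocabulary, rather than relying on free text alone, is the kind of consistent terminology that lets software match datasets across studies. A real plan would draw its fields and terms from a community-endorsed standard rather than from an ad hoc list like this one.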
Working toward consensus
In addition to clear descriptions, community development of, consensus on, and adoption of standards for how environmental health sciences data should be structured are also necessary.
“In order to make the data FAIR, you really need to have good metadata, and that good metadata happens when you have community endorsed metadata standards,” said Mark Musen, M.D., Ph.D., from Stanford University, during a workshop presentation.
The series highlighted foundational standards relevant to the environmental health sciences and detailed best practices for what to do if standards do not exist for specific aspects of a research study.
“We recognize every study is unique and will have its own aspects to it, but there is still some fundamental common denominator to the nature of a reproductive toxicology study or an environmental epidemiology study,” Holmgren said.
Data management improves sharing
Implementation of the DMS Plan includes data sharing.
“It is not enough anymore to say, ‘Data will be shared upon request.’ There is now the requirement to actually submit the data into a repository,” Holmgren said.
Repository sharing best practices include the following.
- Detailing where people can find the dataset.
- Ensuring that any human data have privacy protections and that access to personally identifiable information is restricted.
- Detailing the specific protocol for accessing and using the dataset.
- Noting links to datasets clearly.
- Maintaining data and access over time.
“At the heart of the new policy, it is really about trying to convey the value of why data sharing is important,” said Holmgren. “If you are going to properly data share, you must effectively manage your data. The two are intricately linked because data management happens throughout the research lifecycle. The DMS Plan aids researchers in thinking about data management and data sharing issues up front before they conduct a study.”
Other NIEHS members of the program planning committee included Chris Duncan, Ph.D., Jennifer Fostel, Ph.D., Richard Kwok, Ph.D., Anna Maria Masci, Ph.D., Charles Schmitt, Ph.D., and Vickie Walker.
(Jennifer Harker, Ph.D., is a technical writer-editor in the NIEHS Office of Communications and Public Liaison.)