Department of Environmental Health and Engineering

Cracking the Code of Everyday Chemical Exposure with AI

By using machine learning to predict how molecules behave during analysis, Johns Hopkins scientists say researchers can finally match thousands of unidentified compounds to known chemicals—advancing efforts to understand how the environment influences health over a lifetime.

By Danielle Underferth
Photography by Getty Images

Mapping the human exposome, all the chemicals people are exposed to during their lifetime, depends on identifying thousands of unknown molecules found in the human body. As scientists move forward with a large-scale Human Exposome Project, advances in artificial intelligence are helping them predict how these molecules behave during chemical analysis, improving their ability to match unidentified compounds to known ones. By combining AI-driven techniques with mass spectrometry, researchers are uncovering hidden links between environmental exposures and human health with greater accuracy than ever before.

In a new paper, Johns Hopkins researchers reviewed the current scientific literature on retention time prediction, focusing on how AI and machine learning are being used to make these predictions more accurate. They examined a range of methods, data sources, and tools to see what works best, where the gaps are, and how these predictions could help exposome researchers working with large datasets. The paper was published online on November 11 in the journal Frontiers in Public Health.

In this Q&A, Fenna Sillé, PhD, MS, assistant professor in Environmental Health and Engineering and lead author of the paper, shares what they discovered.

What question were you trying to answer by conducting this review?

When scientists analyze blood or urine to check what chemicals a person has been exposed to, their instruments pick up thousands of chemical “signals.” The challenge is that most of these signals come from chemicals we can detect but cannot yet identify. This makes it difficult for exposome researchers to understand how all the environmental exposures we experience over a lifetime influence our health.

Our paper explores one way to tackle this problem: using computer models and artificial intelligence to predict a value called retention time—basically, how long a molecule takes to move through a device used to separate chemical mixtures. This process, called chromatography, helps break a chemical sample into its individual components so each one can be studied. By predicting retention time, we can match signals to the different chemicals more accurately, even when we don’t have a complete chemical reference spectrum to compare against.  

What did you learn?

Our review shows that retention time isn’t just a minor side detail. It’s actually a powerful way to improve chemical identification. By combining retention time with other information, like molecular weight or fragment patterns, researchers can be much more confident about the identity of the compounds they are detecting. We also summarized how new AI tools can predict retention time across many different instruments, making studies more consistent and comparable between labs.
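
As a rough illustration of the idea described above, the sketch below filters candidate compounds first by mass and then by how closely a model-predicted retention time matches the observed one. It is a minimal sketch under stated assumptions, not the method from the review: the candidate names, masses, retention times, and tolerance values are all hypothetical, and the predicted retention times are assumed to come from some upstream model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mass: float          # monoisotopic mass in Da (hypothetical value)
    predicted_rt: float  # model-predicted retention time in minutes (hypothetical value)


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


def rank_candidates(observed_mass, observed_rt, candidates,
                    ppm_tol=5.0, rt_tol=0.5):
    """Keep candidates within both tolerances, best retention time match first.

    ppm_tol and rt_tol are illustrative defaults, not recommended settings.
    """
    kept = [
        c for c in candidates
        if ppm_error(observed_mass, c.mass) <= ppm_tol
        and abs(observed_rt - c.predicted_rt) <= rt_tol
    ]
    return sorted(kept, key=lambda c: abs(observed_rt - c.predicted_rt))


# Hypothetical candidate list: A, B, and D are indistinguishable by mass alone.
CANDIDATES = [
    Candidate("candidate_A", 200.1000, 6.2),
    Candidate("candidate_B", 200.1000, 9.8),  # same mass as A, elutes much later
    Candidate("candidate_C", 200.1050, 6.4),  # similar retention time, wrong mass
    Candidate("candidate_D", 200.1001, 6.5),
]

if __name__ == "__main__":
    for c in rank_candidates(200.1002, 6.4, CANDIDATES):
        print(c.name, c.mass, c.predicted_rt)
```

In this toy example, mass alone leaves three candidates; adding the retention time window removes candidate_B and ranks candidate_D ahead of candidate_A.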

What was the key takeaway?

If we want to truly map all the environmental chemicals we are exposed to, we need better tools to identify them. What used to be a bottleneck of tens of thousands of “unknown unknowns” in a metabolomic profile can be eliminated by adding retention time predictions, especially those powered by AI. (A metabolomic profile is a measurement of metabolites, the small molecules that are made when the body breaks down food, drugs, chemicals, or its own tissue.) With best practices in place, which we highlight in the review, retention time will no longer be an underused feature, but rather a valuable element of the metabolomics identification toolkit. It’s a practical step toward making mass spectrometry data more reliable and useful for exposome analysis.

The key takeaway is that machine learning-predicted retention time is now a viable, validated tool for metabolomics, greatly improving how we interpret complex, untargeted chemical data, and its adoption is accelerating.

Why is this important?

Without knowing what chemicals are in our bodies or environment, it’s hard to prevent diseases caused by those exposures. Realization of the Human Exposome Project depends on reliably mapping thousands of environmental and endogenous chemicals in human biospecimens. By improving how we identify unknown chemicals, researchers can better track pollution, monitor emerging contaminants, and identify internal exposure biomarkers in human biospecimens and environmental samples, which helps to eventually inform public health decisions. This kind of approach could also help build personalized exposure profiles, thereby showing how each person’s environment shapes their health over time.

Did anything surprise you about the results?

We were surprised by how much improvement you can get just by adding predicted retention time. In some cases, it performs almost as well as having full chemical spectra. Another surprise was that on one end, there are very mature tools that are basically ready to be plugged into exposome workflows, while on the other end, everyday practice hasn’t yet caught up. It showed us that the main barrier to advancement in the field is no longer the technology itself, but getting robust, user-friendly versions of these AI tools into the hands of routine metabolomics and exposomics labs.

What happens next?

Next, we need to put these approaches into practice. That means getting more labs to adopt AI tools for predicting retention time and building them into everyday metabolomics workflows. We also need consistent retention index systems so results can be compared across studies. Finally, we need better infrastructure—like shared, easy-to-use databases that keep metabolomics data FAIR (findable, accessible, interoperable, and reusable). Together, these steps move us closer to a true Human Exposome Project, where mass spectrometry and metabolomics help drive new discoveries and turn chemical data into useful insights for improving human and environmental health.
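
To illustrate why a shared retention index system makes results comparable, the sketch below converts a raw retention time into an index by linear interpolation between reference standards run on the same instrument. This is one common way such indices are built, shown here as an assumption rather than the specific system the review recommends; the standards, index values, and retention times are hypothetical.

```python
from bisect import bisect_right


def retention_index(rt, standards):
    """Interpolate a retention index from bracketing reference standards.

    standards: list of (retention_time, index_value) pairs sorted by
    retention time, measured on the same instrument as rt.
    """
    if len(standards) < 2:
        raise ValueError("need at least two reference standards")
    times = [t for t, _ in standards]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the calibrated range")
    # Index of the first standard eluting after rt, clamped to the last standard.
    hi = min(bisect_right(times, rt), len(times) - 1)
    lo = hi - 1
    t_lo, i_lo = standards[lo]
    t_hi, i_hi = standards[hi]
    return i_lo + (i_hi - i_lo) * (rt - t_lo) / (t_hi - t_lo)


# Hypothetical calibration runs: the same standards elute later in lab B.
LAB_A = [(2.0, 100), (4.0, 200), (8.0, 300)]
LAB_B = [(3.0, 100), (6.0, 200), (12.0, 300)]

if __name__ == "__main__":
    # The same compound elutes at 3.0 min in lab A and 4.5 min in lab B.
    print(retention_index(3.0, LAB_A))
    print(retention_index(4.5, LAB_B))
```

In this toy setup the raw retention times differ between the two labs, but both map to the same index, which is the property that lets studies be compared.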

“AI redefines mass spectrometry chemicals identification: retention time prediction in metabolomics and for the Human Exposome Project” was coauthored by Fenna Sillé, Thomas Hartung, Thomas Luechtefeld, and Carsten Prasse.

This work was supported in part by a pilot project grant from the Department of Environmental Health and Engineering at the Johns Hopkins Bloomberg School of Public Health, made possible through a gift from Yu Wu and Chaomei Chen. Support was also provided through the Network for EXposomics in the United States (NEXUS), a Center for Exposome Research Coordination, which is supported by the National Institute of Environmental Health Sciences under grant number U24ES036819.