MIT's Connor Coley develops AI models that incorporate chemical principles to accelerate drug discovery. His research bridges chemical engineering and computer science, creating computational tools like ShEPhERD and FlowER that evaluate potential drug compounds and predict the products of chemical reactions while respecting physical constraints such as conservation of mass.
The vast chemical space presents an overwhelming challenge for drug discovery. Scientists estimate that between 10^20 and 10^60 compounds may hold potential as small-molecule drugs. Evaluating even a fraction of these possibilities experimentally would take lifetimes. This fundamental limitation has driven researchers to develop computational approaches, with artificial intelligence emerging as a powerful tool to navigate this chemical universe.
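To make the scale concrete, here is a back-of-the-envelope sketch in Python. The throughput figure (one million compounds tested per second) is a deliberately generous, hypothetical assumption rather than a number from Coley's work, and the function name is illustrative only.

```python
# Rough estimate of how long exhaustive experimental screening would take.
# ASSUMPTION: the throughput below is hypothetical and far beyond any real lab.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds


def years_to_screen(num_compounds: float, compounds_per_second: float) -> float:
    """Return the years needed to test every compound once at a fixed rate."""
    return num_compounds / compounds_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    hypothetical_rate = 1e6  # compounds per second (assumed, not a real figure)
    for size in (1e20, 1e60):  # the two ends of the range quoted above
        years = years_to_screen(size, hypothetical_rate)
        print(f"{size:.0e} compounds -> about {years:.2e} years")
```

Even at that assumed rate, the low end of the range works out to roughly three million years, which is why researchers turn to computational models to prioritize what to test.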
At the forefront of this field is MIT Associate Professor Connor Coley, whose research straddles the line between chemical engineering and computer science. Coley is the Class of 1957 Career Development Associate Professor, with appointments in the Department of Chemical Engineering, the Department of Electrical Engineering and Computer Science, and the MIT Schwarzman College of Computing. He has dedicated his career to developing computational models that can analyze vast numbers of chemical compounds, design new molecules, and predict reaction pathways.
"MIT is a very special place in terms of the resources and the fluidity across departments," Coley explains. "MIT seemed to be doing a really good job supporting the intersection of AI and science, and it was a vibrant ecosystem to stay in."

Coley's journey to this interdisciplinary field began with a family steeped in science. His father is a radiologist, his mother earned a degree in molecular biophysics and biochemistry before attending MIT Sloan, and his grandmother was a math professor. As a high school student in Dublin, Ohio, Coley participated in Science Olympiad competitions and graduated at 16. He chose chemical engineering at Caltech for its blend of science and math, while also developing computer science skills through work in a structural biology lab.
During his PhD at MIT, advised by professors Klavs Jensen and William Green, Coley began exploring how machine learning could optimize chemical reactions. His work, part of the DARPA-funded Make-It program, focused on combining machine learning with cheminformatics to plan reaction pathways for drug molecules and design hardware for automated synthesis.
"That was my real entry point into thinking about cheminformatics, thinking about machine learning, and thinking about how we can use models to understand how different chemicals can be made and what reactions are possible," Coley recalls.
After a postdoc at the Broad Institute, where he gained experience in chemical biology and drug discovery, Coley established his MIT lab with a clear mission: deploy AI not only to synthesize existing therapeutic compounds but also to design new molecules with desirable properties and novel synthetic routes.
One of Coley's most significant contributions is ShEPhERD, a model trained to evaluate potential drug molecules based on their three-dimensional interactions with target proteins. Traditional approaches often focus on molecular structure in isolation, but ShEPhERD considers how molecules will actually bind to proteins, a crucial factor in drug efficacy and specificity.
"We're trying to give more of a medicinal chemistry intuition to the generative model, so the model is aware of the right criteria and considerations," Coley explains. This approach has proven valuable enough that pharmaceutical companies now use ShEPhERD in their drug discovery pipelines.
More recently, Coley's lab developed FlowER, a generative AI model that predicts reaction products from different chemical inputs. What sets FlowER apart is its incorporation of fundamental physical principles, particularly the law of conservation of mass. The researchers compelled the model to consider not just the final products but also the feasibility of intermediate steps in the reaction pathway.
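The idea of a mass-conservation constraint can be illustrated with a minimal Python sketch that checks whether a proposed reaction keeps the same atoms on both sides. This is not FlowER's implementation, which the article does not describe in detail. The function names are hypothetical, and the parser assumes simple molecular formulas without parentheses or charges.

```python
import re
from collections import Counter

# Matches an element symbol followed by an optional count, e.g. "C2" or "O".
# ASSUMPTION: formulas are flat (no parentheses, charges, or isotopes).
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count the atoms of each element in a simple molecular formula."""
    counts = Counter()
    for element, digits in _TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts


def total_atoms(formulas) -> Counter:
    """Sum the atom counts over a list of molecular formulas."""
    total = Counter()
    for formula in formulas:
        total.update(atom_counts(formula))
    return total


def conserves_atoms(reactants, products) -> bool:
    """Return True if both sides of the reaction contain the same atoms."""
    return total_atoms(reactants) == total_atoms(products)


if __name__ == "__main__":
    # Esterification: acetic acid + ethanol -> ethyl acetate + water.
    reactants = ["C2H4O2", "C2H6O"]
    print(conserves_atoms(reactants, ["C4H8O2", "H2O"]))  # balanced
    print(conserves_atoms(reactants, ["C4H8O2"]))         # water is missing
```

A check like this can only reject predictions after the fact. According to the article, FlowER builds the constraint into the model itself, so every predicted step has to account for all the atoms it started with.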

"Thinking about those intermediate steps, the mechanisms involved, and how the reaction evolves is something that chemists do very naturally. It's how chemistry is taught, but it's not something that models inherently think about," Coley notes. "We've spent a lot of time thinking about how to make sure that our machine-learning models are grounded in an understanding of reaction mechanisms, in the same way an expert chemist would be."
This emphasis on incorporating chemical principles into AI models represents a significant shift from purely data-driven approaches. By building in domain knowledge, Coley's team creates models that are not only accurate but also interpretable and aligned with chemical intuition. This approach addresses a persistent challenge in AI for science: ensuring that models respect the underlying principles of the domain they're operating in.
The impact of this work extends beyond drug discovery. Coley's general approach could be applied to any application of organic molecules, including materials science, agrochemicals, and specialty chemicals. The ability to predict reaction outcomes and design molecules with specific properties could revolutionize how chemists approach synthetic challenges.
Students in Coley's lab work on diverse research threads related to chemical optimization, including computer-aided structure elucidation, laboratory automation, and optimal experimental design. Through these varied projects, the lab aims to advance the frontier of AI in chemistry while maintaining practical applications.

As AI continues to transform scientific research, Coley's work exemplifies how domain knowledge and machine learning can complement each other. His models demonstrate that the most powerful AI applications in chemistry aren't those that simply replace human expertise but those that enhance it by processing vast chemical spaces while respecting fundamental principles.
The future of AI in chemistry, as envisioned by Coley, involves increasingly sophisticated models that can reason about chemical systems at multiple scales—from molecular interactions to reaction networks. This evolution will require continued collaboration between chemists and computer scientists, each bringing their own expertise to solve problems that neither could tackle alone.
For drug discovery specifically, these computational tools promise to accelerate the identification of therapeutic candidates while reducing the reliance on expensive and time-consuming experimental screening. As the models become more sophisticated, they may even suggest entirely new chemical approaches to treating diseases, expanding the possibilities beyond what traditional methods might uncover.
Coley's research represents a critical step toward realizing the potential of AI in chemistry—not as a replacement for chemical intuition, but as an amplifier of human creativity and scientific understanding.

