An international collaboration used machine learning to create an open-access virtual catalogue of some 300,000 organic compounds, opening doors for a range of future applications from materials to drugs. The project, called Kraken, is a joint effort by Alán Aspuru-Guzik’s Matter Lab at the University of Toronto, the Sigman Research Group at the University of Utah, Technische Universität Berlin, the Karlsruhe Institute of Technology, the Vector Institute for Artificial Intelligence, the Center for Computer Assisted Synthesis at the University of Notre Dame, IBM Research and AstraZeneca.
Typically, it’s a painstakingly long journey to discover, develop and understand new catalysts and chemical reactions, explains co-lead author Gabriel dos Passos Gomes. “We had been thinking a lot about how to automate the synthesis of organic molecules in a modular manner,” says Aspuru-Guzik.
The team focused on transition-metal catalysis reactions. “These are some of the tools that allow molecular scientists to precisely develop materials and drugs, from the plastics in your smartphone to the probes that allowed for humanity to achieve the COVID-19 vaccines at an unforeseen pace,” says dos Passos Gomes, now a Banting Postdoctoral Fellow at the University of Toronto.
Transition-metal catalysis reactions require a suitable combination of metal and ligand — the ion or molecule that binds with metal to form a coordination complex. Selection of appropriate ligands is traditionally done by lab-based trial and error, a process they wanted to accelerate.
During their journey of discovery, Aspuru-Guzik’s lab realized they were not the only ones pursuing this goal. Upon learning that Tobias Gensch at Technische Universität Berlin was pursuing the same idea, “we had a couple of tense days in both labs when we were deciding what to do,” says Aspuru-Guzik. Upon reflection, the teams chose collaboration over competition, joining forces and welcoming additional collaborators too.
“We were interested in being able to look at more than just one individual reaction at a time,” says Gensch. A marriage of computation and combinatorial chemistry made an ideal tool for that.
Kraken focuses on organophosphorus ligands, some of the most prevalent ligands in homogeneous catalysis. Some 330,000 compounds were available in its catalogue at launch, and over time the team may scale up to include millions more.
Kraken stands for Kolossal viRtual dAtabase for moleKular dEscriptors of orgaNophosphorus ligands, a name chosen for the cachet of the mythical monster that grabbed ships in its octopus-like arms, analogous to how ligands grab metals. Kraken is part of a larger global initiative, run through the University of Toronto’s Acceleration Consortium, that connects academia, government and industry to pursue AI-driven materials discovery. Kraken can be freely accessed online, and a preprint describing the tool is available on ChemRxiv.
McGill University organic chemist Nicolas Moitessier, though not involved in Kraken’s development, is already making use of it. What Kraken brings to the field, he emphasizes, is a huge set of ligands together with the descriptors needed to run machine learning, a term Moitessier prefers over ‘artificial intelligence.’ To him, AI connotes a robot that can think by itself, and that is not the case here. “We’re just training mathematical functions to make predictions. It’s like predicting the weather,” he says, and predicting the outcome of matching metal and ligand, like weather forecasts, “can be right or wrong.” But, as models improve, he adds, they are more often right than wrong.
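For readers curious what “training mathematical functions to make predictions” can look like in its simplest form, here is a minimal sketch in Python. It is not Kraken’s method or code: the single descriptor, the outcome values and the function names `fit_line` and `predict` are all invented for illustration, and real models use many descriptors and far more data.

```python
# Toy illustration only: every number below is invented, not taken from Kraken.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical descriptor values for four ligands and a made-up outcome for each.
descriptor = [1.0, 2.0, 3.0, 4.0]
outcome = [2.1, 3.9, 6.2, 7.8]

model = fit_line(descriptor, outcome)

# Forecast the outcome for a fifth, untested ligand.
# Like a weather forecast, this can be right or wrong.
print(predict(model, 5.0))
```

The “training” step is just choosing the slope and intercept that best fit the known examples; the “prediction” step applies that fitted function to a ligand nobody has tested yet.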
What Kraken provides, says Moitessier, is a large set of molecules that are not necessarily commercially available. “If you had to use normal computation to test or compute all of these properties for every single one of the compounds, it would have taken decades,” says Moitessier. So Kraken, he says, is “cool science.”