The Dawn of AI in Drug Discovery


Despite the buzz around artificial intelligence (AI), most industry insiders know that the use of machine learning (ML) in drug discovery is nothing new. For more than a decade, researchers have used computational techniques for many purposes, such as finding hits, modeling drug-protein interactions, and predicting reaction rates.

What is new is the hype. As AI has taken off in other industries, countless start-ups have emerged promising to transform drug discovery and design with AI-based technologies for things like virtual screening, physics-based biological activity assessment, and drug crystal-structure prediction.

Investors have made huge bets that these start-ups will succeed. Investment reached $13.8 billion in 2020, and more than one-third of large-pharma executives report using AI technologies.

While a few “AI-native” candidates are in clinical trials, around 90% of such candidates remain in discovery or preclinical development, so it will take years to see if the bets pay off.

Artificial Expectations

Along with big investments come high expectations: drug the undruggable, drastically shorten timelines, virtually eliminate wet-lab work. Insider Intelligence projects that discovery costs could be reduced by as much as 70% with AI.

Unfortunately, it’s just not that easy. The complexity of human biology precludes AI from becoming a magic bullet. On top of this, data must be plentiful and clean enough to use. Models must be reliable. Prospective compounds need to be synthesizable. And drugs have to pass real-life safety and efficacy tests.

While this harsh reality hasn’t slowed investment, it has led to fewer companies receiving funding, to devaluations, and to the discontinuation of some loftier programs, like IBM’s Watson AI for drug discovery.

This raises the question: Is AI for drug discovery more hype than hope? Absolutely not. Do we need to adjust our expectations and position for success? Absolutely, yes.

But how?

Three Keys to Implementing AI in Drug Discovery

Implementing AI in drug discovery requires three things: reasonable expectations, clean data, and collaboration. Let’s take a closer look.

  1. Reasonable Expectations

    AI can be a valuable part of a company’s larger drug discovery program. But, for now, it’s best thought of as one option in a box of tools. Clarifying when, why, and how AI is used is crucial, albeit challenging.

    Interestingly, investment has largely gone to companies developing small molecules, which lend themselves to AI because they’re relatively simple compared to biologics, and because there are decades of data upon which to build models. [1,2] The ease of applying AI also varies greatly across discovery, with models for early screening and physical-property prediction seemingly easier to implement than those for target prediction and toxicity assessment. [3,4]

    While the potential impact of AI is incredible, we should remember that good things take time. Pharmaceutical Technology recently asked its readers to project how long it might take for AI to reach its peak in drug discovery, and by far, the most common answer was “more than 9 years.”

  2. Clean Data

    “The main challenge to creating accurate and applicable AI models is that the available experimental data is heterogenous, noisy, and sparse, so appropriate data curation and data collection is of the utmost importance.”

    This quote from a 2021 Expert Opinion on Drug Discovery article speaks wonderfully to the importance of collecting clean data. While it refers to ADMET and activity prediction models, the assertion also holds true in general. AI requires good data, and lots of it.

    But good data are hard to come by. Publicly available data can be inadequate, forcing companies to rely on their own experimental data and domain knowledge. Unfortunately, many companies struggle to capture, federate, mine, and prepare their data, perhaps due to skyrocketing data volumes, outdated software, incompatible lab systems, or disconnected research teams. Success with AI will likely elude these companies until they implement technology and workflow processes that let them:

    - Facilitate error-free data capture without relying on manual processing
    - Handle the volume and variety of data produced by different teams and partners
    - Ensure data integrity and standardize data for model readiness
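
As a rough illustration of the last point, the sketch below standardizes a handful of assay records before they reach a model: it converts potency values to a single unit, sets aside rows that cannot be used, and merges replicate measurements. The record fields (`compound_id`, `value`, `unit`), the unit table, and the function name `standardize` are hypothetical, chosen only for this example; it is a minimal sketch under those assumptions, not a description of any particular vendor's pipeline.

```python
import math
from statistics import median

# Hypothetical conversion factors to nanomolar (nM); illustrative only.
UNIT_TO_NM = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


def standardize(records):
    """Convert IC50 values to pIC50, reject unusable rows, merge replicates.

    Returns (clean, rejected):
      clean    maps a normalized compound ID to its median pIC50
      rejected is a list of (record, reason) pairs kept for auditing
    """
    by_compound = {}
    rejected = []
    for rec in records:
        unit = rec.get("unit")
        value = rec.get("value")
        if unit not in UNIT_TO_NM:
            rejected.append((rec, "unknown unit"))
            continue
        if (not isinstance(value, (int, float))
                or math.isnan(value) or value <= 0):
            rejected.append((rec, "missing or non-positive value"))
            continue
        ic50_nm = value * UNIT_TO_NM[unit]
        # pIC50 = -log10(IC50 in mol/L), and 1 nM = 1e-9 mol/L
        pic50 = 9.0 - math.log10(ic50_nm)
        # Normalize IDs so "cpd-1" and "CPD-1 " count as the same compound
        key = rec["compound_id"].strip().upper()
        by_compound.setdefault(key, []).append(pic50)
    clean = {cid: median(vals) for cid, vals in by_compound.items()}
    return clean, rejected


# Made-up example records, not real assay data
records = [
    {"compound_id": "cpd-1", "value": 10, "unit": "nM"},
    {"compound_id": "CPD-1 ", "value": 0.01, "unit": "uM"},
    {"compound_id": "cpd-2", "value": 1, "unit": "uM"},
    {"compound_id": "cpd-3", "value": None, "unit": "nM"},
    {"compound_id": "cpd-4", "value": 5, "unit": "mg"},
]
clean, rejected = standardize(records)
print(clean)
print([reason for _, reason in rejected])
```

Keeping the rejected rows alongside the reason, rather than silently dropping them, is one way to preserve an audit trail; real pipelines would also need to handle censored values, assay metadata, and structure normalization, which this sketch leaves out.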

  3. Collaboration

    Companies hoping to leverage AI need a full view of all their data, not just bits and pieces. This demands a research infrastructure that lets computational and experimental teams collaborate, uniting workflows and sharing data across domains and locations. Careful process and methodology standardization is also needed to ensure that results obtained with the help of AI are repeatable.

    Beyond collaboration within organizations, industry players are also collaborating with one another to help AI reach its full potential, which makes security and confidentiality key concerns. For example, many large pharmas have partnered with start-ups to help drive their AI efforts. Collaborative initiatives, such as the MELLODDY Project, have formed to help companies leverage pooled data to improve AI models. And vendors like Dotmatics are building AI models using customers’ collective experimental data.

Get AI-Ready with Dotmatics

Dotmatics Platform and its Small Molecule Drug Discovery Solution facilitate easy capture of clean data and enable the integration of AI into more extensive drug discovery workflows. To learn how we can help you get AI-ready, contact us today.


  1. Buvailo, A. Will Biologics Surpass Small Molecules In The Pharmaceutical Race? 2022.

  2. Kirkpatrick, P. Artificial intelligence makes a splash in small-molecule drug discovery. Biopharma Dealmakers. 2022.

  3. Lowe, D. AI and Drug Discovery: Attacking the Right Problems. Science. 2021.

  4. Huang, D. Z.; Baber, J. C.; Bahmanyar, S. S. The challenges of generalizability in artificial intelligence for ADME/Tox endpoint and activity prediction. Expert Opinion on Drug Discovery. 2021, 16(9), 1045-1056.
