AI in Drug Discovery: From Hype to Clinical Reality (2026 Deep Dive)
AI in drug discovery is no longer hype—it’s delivering real clinical progress. From AlphaFold to AI-designed drugs entering trials, this deep dive explores what’s actually working, where AI fails, and what the future of biopharma really looks like.
Executive summary
AI in drug discovery has moved beyond glossy decks and conference optimism. In the past five years, the field has gained real momentum from better structure prediction, multimodal learning over omics and phenomics, faster design–make–test cycles, and the integration of AI with automated wet labs. The clearest public milestone so far is Insilico Medicine’s rentosertib for idiopathic pulmonary fibrosis, where an AI-discovered target and AI-designed small molecule progressed into a randomised phase 2a study with safety data and early signs of efficacy. At the same time, regulators have stopped treating AI as a side topic: the US FDA now reports a sharp rise in AI-related submissions, and the EMA has already published lifecycle-wide reflection principles for AI in medicines. In other words, this is no longer a fringe experiment. It is becoming part of mainstream biopharma infrastructure, albeit unevenly and a bit messily. (Zhang et al., 2025; Zitnik, 2025; FDA, 2025; EMA, 2024).
Still, the right reading is not “AI has solved drug discovery”. The better reading is narrower and more rigorous: AI is proving most useful where it can increase the odds of a good scientific decision, such as target prioritisation, structure-based modelling, ADMET prediction, mechanism-of-action inference, molecular generation, and trial enrichment. It is much less impressive when the data are noisy, the assay biology is weak, or the benchmark is too cosy to reflect real medicinal chemistry. Several recent perspectives explicitly argue that data quality, prospective benchmarking, and fit-for-purpose validation matter more than algorithm fashion. Frankly, that is the part the hype machine does not love hearing. (Durant et al., 2024; Wognum et al., 2024; Catacutan et al., 2024).
So the big analytical conclusion is this: AI is already shortening some parts of early discovery and improving some parts of biological interpretation, but it has not abolished the hard physics, toxicology, manufacturing, and clinical uncertainty that dominate late-stage attrition. The winners over the next few years are likely to be companies and research groups that combine multimodal data, strong experimental loops, regulatory credibility, and a very non-magical tolerance for failure. (FDA, 2023; Zhang et al., 2025; Recursion, 2026).
Why this matters now
The strongest “why now?” driver is structural biology. AlphaFold changed protein structure prediction at scale, and AlphaFold-related infrastructure now makes more than 200 million predicted protein structures openly available. In 2024, AlphaFold 3 extended this idea beyond proteins to complexes involving DNA, RNA, ligands, ions and modified residues, with Isomorphic Labs and Google DeepMind reporting at least a 50% improvement over existing methods for protein interactions with other molecule types. That matters because drug discovery is, in large part, a problem of molecular recognition. If you can model the shape and interaction context faster, you can ask better medicinal-chemistry questions earlier. (Google DeepMind, 2026; AlphaFold DB, 2026; Isomorphic Labs, 2024; Abramson et al., 2024).
A second driver is multimodal data. Modern AI workflows increasingly combine genomics, transcriptomics, proteomics, phenomics, imaging, literature mining and patient records rather than relying on one neat assay. Recursion says its platform aggregates more than 50 petabytes across phenomics, transcriptomics, proteomics, ADME and de-identified patient data, while BenevolentAI emphasises a disease knowledge graph built from multiple orthogonal modalities. These architectures are not just bigger databases; they are attempts to make biological context more computable. That shift is important because many past small-molecule programmes failed not from bad chemistry alone, but from weak target rationale and poor translatability. (Recursion, 2026; BenevolentAI, 2022).
A third driver is closed-loop experimentation. The current frontier is not “AI proposes molecule, humans celebrate”, but rather AI proposes, robots make, assays test, models update, and the loop repeats. Absci describes six-week AI–wet-lab cycles for biologics optimisation; Generate:Biomedicines frames its platform as a generate–build–measure–learn loop; Recursion does something analogous at large scale with automated phenomics. This is slightly less cinematic than the popular narrative, but more useful. AI gets materially better when it is fed back with high-quality experimental truth. (Absci, 2026; Generate:Biomedicines, 2026; Recursion, 2026).
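The loop itself is simple to state. Here is a minimal sketch, assuming a toy one-dimensional “design space” and a hidden `assay()` function standing in for wet-lab truth; nothing below models any real platform's API.

```python
# Toy design-make-test-learn loop. The scalar x stands in for a design
# choice; the hidden assay() function stands in for experimental truth.
import random

random.seed(42)

def assay(x):
    # Hidden ground truth the "model" must learn: a noisy potency curve.
    return -(x - 0.6) ** 2 + random.gauss(0, 0.01)

def propose(history, n=8):
    # Crude "model": sample new designs near the best result seen so far.
    if not history:
        return [random.uniform(0, 1) for _ in range(n)]
    best_x, _ = max(history, key=lambda h: h[1])
    return [min(1.0, max(0.0, best_x + random.gauss(0, 0.1))) for _ in range(n)]

history = []
for cycle in range(5):                        # five design-make-test cycles
    batch = propose(history)                  # AI proposes
    results = [(x, assay(x)) for x in batch]  # robots make, assays test
    history.extend(results)                   # models update

best_x, best_y = max(history, key=lambda h: h[1])
print(f"best design after 5 cycles: x={best_x:.2f}, score={best_y:.3f}")
```

The point of the sketch is the feedback structure, not the optimiser: each cycle's experimental results, good or bad, become training signal for the next round of proposals.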
The published literature is also getting more concrete. A 2024 Nature Computational Science paper by van Tilborg and Grisoni showed that active deep learning can achieve up to a sixfold improvement in hit discovery in low-data settings relative to traditional screening methods, at least under the conditions they tested. Another 2024 paper by Yu and colleagues built a deep-learning pipeline around temporal mitochondrial phenotypes, using 570,096 single-cell images across 1,068 FDA-approved drugs to infer mechanisms of action, and then experimentally validated cyclooxygenase-2 inhibition for epicatechin. These are not “general AGI cures disease” stories. They are narrower, but they are the kind of narrow progress that actually accumulates. (van Tilborg & Grisoni, 2024; Yu et al., 2024).
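The active-learning idea behind results like van Tilborg and Grisoni's can be sketched in a few lines. Everything here is invented for illustration: the 8-dimensional random “fingerprint”, the hidden hit rule, and a nearest-neighbour surrogate in place of a real model; the paper's sixfold figure is not reproduced.

```python
# Sketch of active learning for hit discovery in a fixed virtual library.
import random

random.seed(7)

library = [[random.random() for _ in range(8)] for _ in range(300)]

def is_hit(fp):
    # Hidden assay truth: "actives" share high values in the first features.
    return sum(fp[:4]) > 2.9

def dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def surrogate_score(fp, labelled):
    hits = [t for t in labelled if t[1] == 1]
    if hits:
        # Exploit: prefer compounds near the most similar known hit.
        return -min(dist(fp, h[0]) for h in hits)
    # Explore: prefer compounds far from every assayed inactive.
    return min(dist(fp, t[0]) for t in labelled)

def campaign(active, rounds=6, batch=10):
    pool = list(range(len(library)))
    labelled, found = [], 0
    for _ in range(rounds):
        if active and labelled:
            pool.sort(key=lambda i: surrogate_score(library[i], labelled),
                      reverse=True)
            picks = [pool.pop(0) for _ in range(batch)]
        else:
            picks = [pool.pop(random.randrange(len(pool))) for _ in range(batch)]
        for i in picks:
            y = 1 if is_hit(library[i]) else 0
            labelled.append((library[i], y))
            found += y
    return found

random_hits = campaign(active=False)
active_hits = campaign(active=True)
print("hits with random screening:", random_hits)
print("hits with active learning: ", active_hits)
```

The structural difference from plain screening is only the `sort` step: each round, the next batch is chosen by the current model rather than at random, which is exactly where low-data gains come from when the surrogate is any good.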
What AI actually does in drug discovery
Classical machine learning is still the quiet workhorse. It underpins QSAR-style property prediction, ADMET modelling, drug–target interaction scoring, repurposing screens, and synthesis or route-planning support. The FDA’s own discussion paper notes that early discovery uses already include target identification, selection and prioritisation, compound screening, prediction of chemical properties and bioactivity, and the anticipation of efficacy and adverse events from specificity and affinity patterns. So yes, foundation models get the headlines, but a lot of practical value is still coming from more ordinary supervised models trained on massive assay collections. (FDA, 2023; Zhang et al., 2025).
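A deliberately tiny QSAR-style sketch makes the “quiet workhorse” concrete: fit a straight line from one made-up molecular descriptor to a made-up measured property using ordinary least squares, written out from first principles. Real ADMET models use thousands of descriptors and far stronger learners; only the shape of the workflow is the point.

```python
# (descriptor value, measured property) pairs -- synthetic training data
data = [(0.5, 2.1), (1.0, 1.8), (1.5, 1.3), (2.0, 1.1),
        (2.5, 0.7), (3.0, 0.4), (3.5, 0.2)]

n = len(data)
sx = sum(x for x, _ in data)
sy = sum(y for _, y in data)
sxx = sum(x * x for x, _ in data)
sxy = sum(x * y for x, y in data)

# Closed-form least-squares estimates for slope and intercept.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def predict(x):
    # Predicted property value for an unseen descriptor value.
    return slope * x + intercept

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The negative slope the fit recovers is the whole trade the model has learned; everything in industrial QSAR is a richer version of this fit-then-predict loop.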
Deep learning broadened this by learning richer molecular and biological representations. Graph neural networks are now central for property prediction, drug–target interaction modelling and structure-aware scoring; the 2025 Nature Medicine review by Zhang and colleagues explicitly points to GNN-based methods and multimodal fusion as major pieces of the modern stack. Open tools mirror that shift: Chemprop is a message-passing neural-network framework for molecular property prediction, while DeepChem positions itself as an open-source library with a particular focus on molecular ML and drug discovery. In practice, these frameworks help teams move from hand-crafted descriptors towards learned chemical representations that can generalise better—though not always as well as the benchmark curve first suggests. (Zhang et al., 2025; Chemprop, 2026; DeepChem, 2022).
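To make the message-passing idea concrete, here is a hand-rolled single round on a toy three-atom graph (an ethanol-like C–C–O fragment). The node features and the sum-then-combine update are invented for illustration and bear no relation to Chemprop's actual architecture.

```python
# One round of message passing on a toy molecular graph.
# Node features: [atomic_number, degree]; update: add the neighbour sum.
atoms = {0: [6, 1], 1: [6, 2], 2: [8, 1]}   # C, C, O with their degrees
bonds = {0: [1], 1: [0, 2], 2: [1]}         # adjacency list

def message_pass(features, adjacency):
    updated = {}
    for node, feat in features.items():
        # Aggregate neighbour features (elementwise sum)...
        agg = [0] * len(feat)
        for nb in adjacency[node]:
            for k, v in enumerate(features[nb]):
                agg[k] += v
        # ...then combine with the node's own features.
        updated[node] = [f + a for f, a in zip(feat, agg)]
    return updated

h1 = message_pass(atoms, bonds)
print(h1)  # each atom now "knows" about its direct neighbours
```

Stacking more rounds lets information travel further across the bond graph, which is the sense in which learned representations replace hand-crafted descriptors.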
Generative models are the flashier end of the story. For small molecules, the leading families include language-model approaches over SMILES, graph generators, autoregressive transformers, diffusion-like systems, and constrained generators that optimise for multiple properties at once. A 2024 review in Current Opinion in Biotechnology summarised how generative AI is reshaping small-molecule design, and the same 2025 Nature Medicine review situates generative AI and LLMs as central to the new wave of molecular design. On the protein side, biologics companies are doing something conceptually similar: Generate:Biomedicines says its platform learns generalisable biological rules from protein sequence and structure data to generate novel therapeutic proteins, and Absci is using generative AI plus a synthetic-biology data engine to design antibodies. (Kanakala et al., 2024; Zhang et al., 2025; Generate:Biomedicines, 2026; Absci, 2026).
Reinforcement learning is not the whole field, but it fills an important niche where the task is sequential and multi-objective: optimise potency without wrecking solubility, improve selectivity without making synthesis impossible, or search pathways in retrosynthesis. The 2024 review on deep reinforcement learning in chemistry highlights molecule generation, geometry optimisation and retrosynthetic pathway search as major application domains. In the discovery workflow, RL is at its best when paired with realistic constraints and active learning; otherwise, it can “win” a toy objective and produce molecules no chemist would seriously pursue. (Kulshrestha et al., 2024; van Tilborg & Grisoni, 2024; Zhang et al., 2025).
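The multi-objective reward shaping that RL-based generators depend on can be sketched with toy surrogates. The three property functions and their weights below are made up, and the "agent" is just a naive hill-climber chasing the combined reward.

```python
# Weighted multi-objective reward over a one-dimensional "molecule" x.
import random

random.seed(3)

def potency(x):     return 1 - abs(x - 0.8)   # peaks at x = 0.8
def solubility(x):  return 1 - x              # conflicts with potency
def synth_ease(x):  return 1 - abs(x - 0.5)   # peaks mid-range

def reward(x, w=(0.5, 0.3, 0.2)):
    return w[0] * potency(x) + w[1] * solubility(x) + w[2] * synth_ease(x)

x = 0.1                                 # starting "molecule"
for step in range(200):                 # naive local search on the reward
    cand = min(1.0, max(0.0, x + random.gauss(0, 0.05)))
    if reward(cand) > reward(x):
        x = cand

print(f"optimised x = {x:.2f}, reward = {reward(x):.3f}")
```

With these toy weights the combined reward is flat across the whole interval from 0.5 to 0.8, a small illustration of why multi-objective “optima” are often families of compromises rather than a single winner, and of why a reward that is easy to “win” says little about chemical sensibility.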
An underrated category is AI for mechanism-of-action and phenotypic interpretation. This sits between biology and chemistry and often matters more than molecule generation itself. Yu et al. (2024) showed how time-resolved mitochondrial imaging can infer MOA at scale. Recursion’s phenomics-rich approach is built on the same broad idea: let cellular morphology and perturbation data reveal hidden biological relationships that humans would not spot reliably by eye. If that sounds less glamorous than a molecule-generating chatbot, well, that’s because it is. But it may be closer to where the durable value is. (Yu et al., 2024; Recursion, 2026).
Open scientific infrastructure remains crucial too. AlphaFold DB provides large-scale predicted structures; ChEMBL curates bioactive molecules and target-linked bioactivity data; the Open Targets Platform supports systematic target identification and prioritisation; DeepChem and Chemprop lower the barrier to prototyping and benchmarking molecular models. These tools do not replace proprietary data moats, but they do make AI drug discovery less of a closed club than it used to be. For researchers in smaller labs or emerging markets, that matters quite a lot. (AlphaFold DB, 2026; ChEMBL, 2026; Open Targets, 2026; DeepChem, 2022; Chemprop, 2026).
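The Open Targets Platform exposes a public GraphQL API, which is part of why the barrier to entry has dropped. The sketch below only builds a request payload and sends nothing; the endpoint URL is real as of writing, but the field names should be checked against the live schema, and the Ensembl ID is just an example (EGFR).

```python
# Construct (but do not send) an Open Targets GraphQL request payload.
import json

ENDPOINT = "https://api.platform.opentargets.org/api/v4/graphql"

query = """
query targetAssociations($ensemblId: String!) {
  target(ensemblId: $ensemblId) {
    approvedSymbol
    associatedDiseases {
      rows { disease { name } score }
    }
  }
}
"""

payload = json.dumps({
    "query": query,
    "variables": {"ensemblId": "ENSG00000146648"},  # EGFR, as an example
})

# To actually run it, POST the payload to ENDPOINT with a JSON content type,
# e.g. with requests or urllib; the response ranks diseases by association score.
print(payload[:72], "...")
```

A target short-list assembled this way is a starting point, not an answer: the association scores aggregate heterogeneous evidence and still need biological judgment behind them.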
Fictional vignette. A small translational lab in Dhaka builds a target short-list using Open Targets and public transcriptomics, trains a Chemprop model for permeability and hERG risk, then realises in week three that the cell assay has drifted so badly that the “top” compounds were being rewarded for plate position effects. Nothing catastrophic happened; the model was simply learning the wrong thing. It is a very believable AI-drug-discovery story because the hard part, again and again, is not only modelling. It is experimental truth.
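The sanity check that vignette calls for is cheap. Here is a sketch on synthetic data: simulate one 96-well plate whose only “signal” is a deliberate edge-of-plate artifact, then confirm that a plain position-versus-score correlation flags it before any model training starts.

```python
# Detect a plate-position artifact with a simple correlation check.
import random

random.seed(11)

# One 96-well plate: 8 rows x 12 columns.
wells = [(row, col) for row in range(8) for col in range(12)]

def measured_signal(row, col):
    # Synthetic assay: pure noise plus an edge artifact -- no biology at all.
    edge_artifact = 0.3 if row in (0, 7) or col in (0, 11) else 0.0
    return random.gauss(1.0, 0.1) + edge_artifact

scores = [measured_signal(row, col) for row, col in wells]
is_edge = [1.0 if row in (0, 7) or col in (0, 11) else 0.0 for row, col in wells]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(scores, is_edge)
print(f"edge-vs-score correlation: {r:.2f}")  # large => position, not biology
```

In a real pipeline the same idea generalises: regress raw scores against plate, row, column and batch identifiers, and treat any strong fit as a red flag before believing the model's “top” compounds.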
Platform landscape
I selected the platforms below using three criteria: public primary documentation, evidence of actual pipeline use or partnerships, and diversity of technical approach across small molecules, biologics, phenomics and structure-first design.
| Company | Core tech | Stage of use | Notable achievements | Limitations | Primary sources |
|---|---|---|---|---|---|
| Insilico Medicine | PandaOmics target discovery, Chemistry42 generative chemistry, robotics / Pharma.AI | Target ID → design → preclinical → clinical | Rentosertib in IPF reached randomised phase 2a; company states this is the first proof-of-concept clinical validation for an AI-discovered target and AI-designed drug in this setting | Public clinical validation is still concentrated in a flagship programme; larger confirmatory trials are needed | Insilico rentosertib release; ClinicalTrials.gov; Nature Medicine commentary. |
| Recursion | Phenomics + transcriptomics + ML + automated labs + high-performance compute | Discovery → preclinical → clinical | Multiple clinical-stage assets; REC-994 phase 2 met its primary safety endpoint and showed promising efficacy trends; company reports first clinical validation of its full-stack OS in FAP | Several readouts are still signal-finding and not powered for decisive statistical proof | Recursion platform page; REC-994 phase 2 release; 2026 business update. |
| Isomorphic Labs | Structure-first, multimodal unified drug-design engine building beyond AlphaFold 3 | Target / design → partnered preclinical; internal pipeline not yet publicly clinical | Co-developed AlphaFold 3; pharma collaborations with Lilly, Novartis and Johnson & Johnson; internal oncology and immunology pipeline | Public evidence is strongest for platform capability and partnerships, not yet for disclosed clinical candidates | AlphaFold 3 article; tech page; partnership pages. |
| BenevolentAI | Biomedical knowledge graph + multimodal target discovery + molecular design | Target ID → design → clinical / repurposing | Baricitinib repurposing for COVID-19 is the best-known public case; BEN-2293 phase 2a in atopic dermatitis met safety endpoint; UC target-to-candidate story advanced rapidly | Mixed efficacy outcomes in in-house assets; company claims are stronger than peer-reviewed public detail for some programmes | Baricitinib publication page; BEN-2293 phase 2a release; ulcerative colitis release. |
| Absci | Generative AI for biologics + synthetic biology data engine + iterative wet-lab loops | Biologic design → preclinical → early clinical | ABS-201 entered phase 1/2a in December 2025; company reports roughly 24 months from concept to clinical entry for this lead | Very early human-stage evidence; platform claims remain largely company-generated | Absci home and ABS-201 case-study pages. |
| Owkin | Multimodal patient-data-first target discovery, subgrouping and clinical optimisation | Target ID → biomarker discovery → clinical optimisation → early pipeline | TargetMATCH links targets to patient subgroups; company also runs a growing oncology pipeline and AI-driven trial optimisation products | More visible today in target discovery and development optimisation than in public de novo molecule outcomes | TargetMATCH page; pipeline page; development approach page. |
| XtalPi | Physics-informed AI + quantum/force-field methods + robotics | Design / optimisation → partner programmes → clinic via collaborators | With Signet, designed a preclinical candidate in just over six months and advanced target-to-IND in just over three years; broader collaborations include Pfizer XFEP deployment | Clinical proof is mostly through partners, so attribution to platform versus partner biology is harder to separate | XtalPi–Signet page; XtalPi–Pfizer page. |
| Generate:Biomedicines | Generative biology for de novo proteins, continuous generate–build–measure–learn loop | Protein design → preclinical / collaborations | Strong technical positioning in protein generation and validation loops across modalities | Publicly disclosed clinical-stage evidence remains thinner than for some small-molecule peers | Generate platform and pipeline pages. |
A practical reading of this table is that “AI drug discovery” is no longer one market. It has split into at least four strategic camps: structure-first engines, phenomics-first engines, knowledge-graph/patient-data engines, and generative biologics engines. That split matters because it changes what success should look like. A structure company should not be judged exactly like a phenomics company, and a target/subgroup company should not be judged exactly like a de novo biologics shop. Sometimes investors blur these categories; scientists probably should not. (Isomorphic Labs, 2024; Recursion, 2026; Owkin, 2026; Absci, 2026).
Case studies and a realistic timeline
The headline case is Insilico’s rentosertib in idiopathic pulmonary fibrosis. Public company materials and Nature commentary describe a workflow where multi-omics target discovery identified TNIK, generative chemistry produced the inhibitor, and the programme moved from target discovery to phase 1 in under 30 months. In the phase 2a study publicised in 2025, the highest-dose group showed a mean forced vital capacity gain of +98.4 mL versus a mean decline of −20.3 mL in placebo, with the company and the accompanying Nature Medicine commentary both emphasising safety and signs of efficacy rather than definitive proof. This is exactly the sort of case that shifts a field’s tone: not “AI won”, but “AI can clearly get at least some programmes this far”. (Insilico Medicine, 2025; Zitnik, 2025; Kingwell, 2024).
Public case vignette. Imagine being one of the IPF clinicians on that 2025 study: not looking for a grand theory of AI, just scanning lung-function trends and tolerability tables to decide whether the signal deserves a bigger trial. That is how technology reputations really change in medicine. Not with a slogan. With a clinician staring at data and saying, “alright, this might be worth the next study.” The rentosertib story feels important precisely because it crossed that threshold. (Insilico Medicine, 2025; Xu et al., 2025 commentary discussed by Zitnik, 2025).
The second public case is BenevolentAI’s baricitinib repurposing programme for COVID-19. Here, AI did not invent a brand-new molecule; it helped find a new indication quickly by traversing a biomedical knowledge graph with ML-augmented literature extraction and human-guided querying. BenevolentAI reports that this workflow nominated baricitinib as both an antiviral and anti-inflammatory candidate. The broader clinical validation then came through trials such as ACTT-2 and regulatory events that culminated in FDA approval for hospitalised adults with COVID-19 in 2022, with WHO also recommending baricitinib in severe or critical COVID-19. This case is analytically important because it shows where AI can move very fast: not in de novo toxicology, but in repurposing when safety is already partly characterised. (Stebbing et al./BenevolentAI publication page, 2021; FDA, 2022; WHO, 2022).
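A miniature knowledge-graph traversal captures the spirit of that repurposing workflow. The graph below, its edge labels and the depth-limited walk are all invented for illustration; they do not reproduce BenevolentAI's graph, though the AAK1/endocytosis rationale loosely echoes the published baricitinib story.

```python
# Tiny illustrative knowledge graph: (source, relation) -> list of targets.
edges = {
    ("baricitinib", "inhibits"): ["JAK1", "JAK2", "AAK1"],
    ("AAK1", "regulates"): ["clathrin-mediated endocytosis"],
    ("clathrin-mediated endocytosis", "required_for"): ["viral cell entry"],
    ("JAK1", "drives"): ["inflammatory signalling"],
}

def paths_from(node, depth=3):
    # Depth-limited walk collecting chains of node, relation, node, ...
    if depth == 0:
        return [[node]]
    chains = [[node]]
    for (src, rel), targets in edges.items():
        if src == node:
            for t in targets:
                for tail in paths_from(t, depth - 1):
                    chains.append([node, rel] + tail)
    return chains

# Keep only multi-hop chains: these are the "mechanistic hypotheses".
mechanistic_paths = [p for p in paths_from("baricitinib") if len(p) > 3]
for p in mechanistic_paths:
    print(" -> ".join(p))
```

Even this toy version surfaces the dual rationale the text describes: one chain ends at viral cell entry (antiviral) and another at inflammatory signalling (anti-inflammatory). The hard part in practice is edge quality and scoring, not the traversal.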
The third case is Recursion’s clinical pipeline. REC-994 in cerebral cavernous malformations met its primary endpoint of safety and tolerability in phase 2 and produced promising MRI and functional trends, but Recursion also clearly states the study was a signal-finding trial not powered for statistical significance. That honesty matters. Separately, the company’s 2025 TUPELO readout for REC-4881 in familial adenomatous polyposis reported rapid and durable reductions in polyp burden in a majority of evaluable patients, which Recursion later described as the first clinical validation of its full-stack AI operating system in FAP. These data are still early, but they are a useful corrective to caricatures: some AI-native companies are not just discovery engines for licensing deals; they are becoming drug developers with the same evidentiary burdens as everyone else. (Recursion, 2025; Recursion, 2026).
For biologics, the evidence base is earlier but worth watching. Absci says its lead antibody candidate ABS-201 entered phase 1/2a in December 2025 after a roughly 24-month platform-driven development window from concept to clinical-trial pipeline. Generate:Biomedicines remains more preclinical in public disclosures, but its generate–build–measure–learn framing is representative of where AI biologics platforms are heading: from sequence generation to iterative experimental validation rather than sequence generation alone. (Absci, 2026; Generate:Biomedicines, 2026).
The timing question deserves realism. The best-publicised AI-assisted projects suggest that target prioritisation and candidate nomination can sometimes be compressed materially—months instead of years in favourable cases—but IND-enabling studies, manufacturing, safety packages, and clinical trials still dominate the calendar. The FDA itself describes drug development as an iterative continuum rather than a neat linear sequence. So the right timeline is not “AI makes it instant”. It is “AI may compress early learning loops, while the later phases still move at the speed of biology, regulation and patient safety”. (FDA, 2023; Savage, 2021 feature; Insilico Medicine, 2025).
Any such timeline should be read as illustrative rather than universal. It synthesises public AI-to-clinic case histories and regulator descriptions of the workflow, and it deliberately assumes that clinical development remains the long pole. In some celebrated AI-native programmes, the discovery segment has been faster than this. The rest of the path, though, still takes patience. (FDA, 2023; Insilico Medicine, 2025; Zitnik, 2025).
Regulation, ethics, and failure modes
Regulation is catching up, but not all at once. The US FDA says CDER has seen a significant increase in submissions with AI components and notes experience with more than 500 submissions containing AI components between 2016 and 2023. Its 2025 draft guidance sets out a risk-based credibility assessment framework for AI used to generate information supporting regulatory decision-making on safety, effectiveness or quality. But the FDA also makes an important boundary clear: pure discovery-stage use is largely outside the scope of that specific guidance. That means early discovery can still move fast, yet the evidence that eventually reaches regulators must be credible, documented and fit for context of use. (FDA, 2025; FDA, 2023).
Europe is taking a lifecycle view. The EMA reflection paper explicitly states that its principles are relevant from drug discovery to the post-authorisation setting. This is subtle but important. Instead of regulating “AI” as one monolith, the agency is treating AI as something that can enter different stages of a medicine’s lifecycle with different risk profiles, data issues and validation needs. That seems sensible, because the same model class can be low-stakes in target prioritisation and high-stakes in dose or efficacy claims. (EMA, 2024).
Ethically, the WHO position remains a useful anchor. Its guidance on AI for health highlights human rights, transparency, accountability, bias, safety and cybersecurity, inclusiveness, and sustainability. In drug discovery, these issues show up less as bedside autonomy questions and more as dataset representativeness, hidden population bias, provenance of training data, labour concentration inside proprietary platforms, and the risk that models optimised on Western-heavy biomedical data produce weak or unfair generalisation elsewhere. This is not an abstract concern for low- and middle-income settings; it is a practical one. (WHO, 2021; WHO, 2021 news release).
The most common failure mode is poor data rather than poor code. Durant and colleagues argue that ML progress in small-molecule discovery will be driven more by better data and benchmarking than by ever-fancier model architectures. Wognum and co-authors go further, calling for blind, prospective benchmarks as the gold standard for unbiased evaluation. If a model is trained and tested on overly related assays, chemotypes or publication artefacts, it may look brilliant and then collapse in live projects. This is one reason why public claims from platform companies need to be read with some caution, even when I find them interesting. (Durant et al., 2024; Wognum et al., 2024).
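The benchmarking trap is easy to demonstrate synthetically. In the sketch below, a trivial nearest-neighbour “memoriser” looks perfect under a random split because near-duplicate analogues leak across train and test, and then fails completely when a whole chemical series is held out. Series centres, labels and noise are all invented.

```python
# Random split vs series-held-out split for a 1-NN "model" on synthetic data.
import random

random.seed(5)

def make_series(center, n=10):
    # A "series": near-duplicate analogues of one scaffold (bounded noise).
    label = 1 if sum(center) > 2.0 else 0
    return [([c + random.uniform(-0.05, 0.05) for c in center], label)
            for _ in range(n)]

series_a = make_series([0.9, 0.9, 0.0, 0.0])   # sum 1.8 -> label 0
series_b = make_series([0.0, 0.0, 0.9, 0.9])   # sum 1.8 -> label 0
series_d = make_series([0.9, 0.9, 0.9, 0.9])   # sum 3.6 -> label 1
series_c = make_series([0.8, 0.8, 0.3, 0.3])   # sum 2.2 -> label 1

def one_nn_accuracy(train, test):
    def dist(a, b): return sum((u - v) ** 2 for u, v in zip(a, b))
    hits = sum(int(min(train, key=lambda t: dist(x, t[0]))[1] == y)
               for x, y in test)
    return hits / len(test)

# Random split: analogues of every series leak into both halves.
flat = series_a + series_b + series_c + series_d
random.shuffle(flat)
leaky = one_nn_accuracy(flat[:30], flat[30:])

# Series-held-out split: series_c is never seen during training.
strict = one_nn_accuracy(series_a + series_b + series_d, series_c)

print(f"random-split accuracy:    {leaky:.2f}")
print(f"series-held-out accuracy: {strict:.2f}")
```

The memoriser scores perfectly on the leaky split and zero on the unseen series, which is the caricatured version of why blind, prospective benchmarks are worth the inconvenience.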
A related failure mode is biological non-translation. A model can predict binding, or even potency, while the target itself proves non-causal, redundant, unsafe, or therapeutically irrelevant in humans. Catacutan and colleagues stress that ML is spreading across hit discovery, mechanism elucidation and chemical optimisation, but the tacit assumption that better computational ranking automatically means better clinical success is still too strong. That assumption is not harmless; it can redirect scarce wet-lab effort into elegant dead ends. (Catacutan et al., 2024; Zhang et al., 2025).
Structure prediction has its own limits. AlphaFold 3 is a major advance, but independent assessments have reported that while overall GPCR backbone predictions improve, meaningful discrepancies can persist in ligand-binding poses, especially for some ions, peptides and proteins. So structure AI is extremely useful, but not infallible enough to replace medicinal chemistry judgment, docking controls, biophysics, or crystallography/cryo-EM where decisions are costly. This is the sort of nuance that gets lost in popular summaries. (Xu & He, 2024/2025 assessments; Isomorphic Labs, 2024).
Future outlook
The next few years will probably belong to multi-agent, multimodal, experimentally grounded systems rather than to any single model family. The 2025 Nature Medicine review already frames the field as moving across the entire workflow from target identification to clinical trial design. Open infrastructure is expanding too: AlphaFold DB continues to grow, and Open Targets, ChEMBL, DeepChem and Chemprop lower the cost of entry for serious modelling. That combination should make AI drug discovery more distributed, more competitive, and, hopefully, more reproducible. (Zhang et al., 2025; AlphaFold DB, 2026; ChEMBL, 2026; DeepChem, 2022; Chemprop, 2026).
I also expect the centre of gravity to broaden beyond classic small molecules. Biologics, multispecific antibodies, molecular glues, targeted degraders and designed protein–protein interactions are all natural fits for generative or structure-aware systems, provided the experimental loops are strong enough. Companies like Generate:Biomedicines, Absci and Isomorphic Labs are already pushing in that direction, while XtalPi shows how physics-informed AI may remain relevant where binding energetics and selectivity really matter. The field will look less like “one chatbot, one drug” and more like a stack of specialised models, data engines and lab automation. (Generate:Biomedicines, 2026; Absci, 2026; Isomorphic Labs, 2026; XtalPi, 2025).
Yet the best near-term forecast is still moderate rather than breathless. Publicly documented AI-native programmes are now reaching phase 1 and phase 2 proof-of-concept, but the number of late-stage, unambiguously AI-originated approvals remains limited in public evidence. The likely future is not that medicinal chemists, pharmacologists and clinicians disappear. It is that the best of them spend less time on low-value search and more time on designing decisive experiments. That would already be a huge win, and maybe a more honest one. (Zitnik, 2025; FDA, 2025; Recursion, 2026).
Open questions and limitations
Some of the most recent company-specific claims, especially around timelines and “industry first” language, come from official company releases rather than fully peer-reviewed papers. I used them because this piece deliberately surveys tools, platforms and notable companies, but they should be read as high-signal primary claims, not as neutral truth by default.
For a few recent peer-reviewed papers behind paywalls, I relied on publisher previews, abstracts, or associated official release material rather than full-text methodological deep dives. That is good enough for a rigorous blog-style analysis, but not enough for a systematic review or investment memo.