2025 ARCHIVES

Cambridge Healthtech Institute's Inaugural

Machine Learning in Early Discovery

Exploiting the Power of ML in Early-Stage Biotherapeutic R&D

January 14 - 15, 2025 (All Times PST)

The early stages of drug discovery are being transformed by the application of machine learning (ML) to identify novel targets, predict drug-target interactions, and guide experimental design. CHI's Machine Learning in Early Discovery conference will explore cutting-edge approaches for leveraging ML in target identification, including analysis of immune receptors, integration of multiomics data, drug repurposing, gene expression analysis, and network analysis. The meeting will also delve into the latest advances in high-throughput screening and automation, with a focus on generating high-quality training data for ML models. Attendees will learn about strategies for augmenting experimental data with synthetic data, coupling sequence information with NGS, and overcoming limitations in throughput and data generation.

Tuesday, January 14

7:30 am Registration and Morning Coffee

8:20 am

Organizer's Remarks

Christina Lingham, Executive Director, Conferences and Fellow, Cambridge Healthtech Institute

Kent Simmons, Senior Conference Director, Cambridge Healthtech Institute

8:25 am

Plenary Keynote Introduction

Victor Greiff, PhD, Associate Professor, University of Oslo and Director of Computational Immunology, IMPRINT

8:30 am

The State of the Art for Antibody Structure Prediction

Victor Greiff, PhD, Associate Professor, University of Oslo and Director of Computational Immunology, IMPRINT

Antibody structure prediction is pivotal for understanding antibody function and for enabling in silico antibody design. This lecture will outline current key advances as well as unresolved challenges in antibody structure prediction.

9:00 am

Design of New Protein Functions Using Deep Learning

David A. Baker, PhD, Henrietta & Aubrey David Endowed Professor, Biochemistry, University of Washington

Proteins are biology's workhorses. Our goal is to create new proteins that address current-day problems not faced during evolution. Rather than modify naturally occurring proteins, we design new ones from scratch to optimally solve the problem at hand. Increasingly, we develop and use deep learning methods to generate protein sequence, structure, and function. We then characterize these designed molecules experimentally. In this talk, I will describe several recent projects.

10:00 am

Chairperson's Remarks

Brian Pierce, PhD, Associate Professor, Cell Biology & Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland

10:05 am

KEYNOTE PRESENTATION: De novo Antibody Design with RFantibody

Nathaniel Bennett, PhD, Co-Founder, Xaira Therapeutics

Despite the central role that antibodies play in modern medicine, there is currently no way to rationally design novel antibodies to bind a specific epitope on a target. I will discuss the development and experimental validation of RFantibody, a deep-learning pipeline capable of designing de novo antibodies that bind to user-specified epitopes.

10:35 am

Drug Target Prediction through Deep Learning Functional Representation of Gene Signatures

Hao Chen, PhD, Assistant Professor, University of Illinois Chicago

The L1000 program systematically generated 1.3 million gene expression signatures in human cell lines with diverse genomic and pharmacological perturbations. Similarity between L1000 gene signatures offers an unbiased, data-driven mechanism to identify compound-target pairs. Current methods rely on matching gene identities when comparing gene signatures and fail to utilize preexisting gene function knowledge. We developed FRoGS, an approach that represents gene signatures by projecting them onto their biological functions instead of their identities, which results in more effective compound-target predictions. FRoGS can help uncover new relationships among gene signatures acquired by large-scale omics studies on compounds, cell types, disease models, and patient cohorts.
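The contrast between identity-based and function-based comparison in this abstract can be illustrated with a toy sketch. This is not the FRoGS implementation: the gene-to-function vectors, the two function axes, and the helper names below are all hypothetical, chosen only to show why two signatures with no shared genes can still look similar once genes are represented by what they do.

```python
import math

# Hypothetical toy embedding: gene -> [cell-cycle, inflammation] weights.
# These numbers are made up for illustration; they are not FRoGS vectors.
GENE_FUNCTION = {
    "CDK1":  [1.0, 0.0],
    "CCNB1": [0.9, 0.1],
    "IL6":   [0.0, 1.0],
    "TNF":   [0.1, 0.9],
}

def identity_similarity(sig_a, sig_b):
    """Jaccard overlap of gene names (identity-based matching)."""
    a, b = set(sig_a), set(sig_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def embed(signature):
    """Average the function vectors of the genes in a signature."""
    vecs = [GENE_FUNCTION[g] for g in signature]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def functional_similarity(sig_a, sig_b):
    """Compare signatures in function space rather than by gene name."""
    return cosine(embed(sig_a), embed(sig_b))

compound_sig = ["CDK1"]          # signature from a hypothetical compound
target_sig = ["CCNB1"]           # signature from a hypothetical target knockdown
unrelated_sig = ["IL6", "TNF"]   # signature from an unrelated perturbation

print(identity_similarity(compound_sig, target_sig))      # no shared gene names
print(functional_similarity(compound_sig, target_sig))    # close in function space
print(functional_similarity(compound_sig, unrelated_sig)) # far in function space
```

In this toy setting, the compound and target signatures share no genes, so identity matching scores them as unrelated, while the function-space comparison places them close together and keeps the inflammation signature distant.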

11:05 am Grand Opening Coffee Break in the Exhibit Hall with Poster Viewing

11:20 am

Multi-Dimensional Functional Interrogation and Network Modeling of Tissue Tregs Reveals Novel Therapeutics for Autoimmunity and Inflammation

Ian Taylor, Director, Computational Biology, TRexBio

Our discovery biology platform provides a deep understanding of the regulatory circuits underlying human tissue Treg behavior. A suite of disease-relevant phenotypic assays facilitates multi-dimensional functional interrogation of key genes in human Tregs. These complex in silico tools combined with translational in vitro assays identify novel regulatory nodes and form the foundation of our growing pipeline of a new class of tissue Treg-focused therapeutics for immune-mediated diseases.

11:50 am

Spatial Proteomics and Virtual Cell Models

Emma Lundberg, PhD, Associate Professor, Bioengineering and Pathology, Stanford University

This research integrates bioimaging, proteomics, and AI to study human cell biology, exploring protein distribution in time and space and investigating how variations in localization affect cell function and disease. Our goal is to create a spatiotemporal proteome model of a human cell. Using ML, we interpret spatial data from image collections and combine it with other data types to build whole-proteome, multi-scale cell models with the potential to enhance drug discovery processes.

12:20 pm

Redefining Therapeutic Possibilities: Aureka’s Integrated AI and High-Throughput Platform for Antibody Innovation

Alon Wellner, VP Biology, Antibody Engineering, Aureka Biotechnologies

Aureka Biotechnologies is advancing antibody development through an integrated platform that combines high-throughput biology with AI-driven multi-objective optimization. This unique approach enables the design of therapeutics with complex, challenging-to-engineer characteristics, addressing diverse therapeutic needs with precision. This presentation will explore how Aureka’s platform is redefining possibilities in antibody engineering and driving the next generation of innovative therapeutics.

12:35 pm Enjoy Lunch on Your Own

1:30 pm Refreshment Break in the Exhibit Hall with Poster Viewing

2:00 pm

Chairperson's Remarks

Ian Taylor, Director, Computational Biology, TRexBio

2:05 pm

AI Empowered Antibody Discovery

Matthew Massett, PhD, Senior ML Scientist, Sanofi

Monoclonal antibodies are important biologics, but developing them is expensive and difficult because they must be both specific and manufacturable at commercial scale. We present two transformer-based antibody language models trained on large amounts of antibody data. These models can be used to inform and de-risk early antibody discovery.

2:35 pm

Method Development and Application of Machine Learning to Engineer and Rapidly Reduce the Immunogenicity of Bacterial Proteases that Degrade Pathogenic Immunoglobulins

Jung-Eun (June) Shin, PhD, Machine Learning Scientist, Seismic Therapeutic

We develop and apply machine learning models to optimize in parallel multiple drug-like properties of the bacterial enzyme IdeS, to design a therapeutic for chronic autoantibody-mediated diseases, while minimizing its immunogenicity and other liabilities. The success of this approach is demonstrated via in vivo and in vitro assays, and we illustrate its generalizability by engineering non-immunogenic bacterial cysteine proteases with a variety of immunoglobulin isotype specificities.

3:05 pm

Use Case of CMC Development Digital Continuum

Dana I. Filoti, PhD, Associate Director of Scientific Architecture, Development Sciences Data and Digital Strategy, AbbVie

Analytical and formulation data are foundational to CMC development. In this presentation, we discuss AbbVie’s CMC digital journey towards connecting analytical and formulation data to their respective study and sample IDs, to build a data-driven CMC organization where structured data breaks silos, empowers storytelling visualizations that enable comparison across projects, reduces manual transcription to other systems and colleagues, and allows for trending and other business metrics.

3:35 pm Refreshment Break in the Exhibit Hall with Poster Viewing

4:15 pm Interactive Breakout Discussions

TABLE 1: The Transition of Experimentalists into a Computational Paradigm in Pharmaceutical R&D

Qing Chai, PhD, Research Advisor, Biotechnology Discovery Research, Eli Lilly and Company

Addressing skills gaps
Benchmarking progress compared with traditional structures
Best practices for collaboration between experimentalists and data scientists
Examples of successful transitions
Implementing models/tools and workflows based on AI/ML approaches

5:30 pm

RESP AI Model Accelerates Identification of Tight-Binding Antibodies

Wei Wang, PhD, Professor, Chemistry and Biochemistry, University of California San Diego

We present RESP2, an enhanced version of our RESP pipeline, designed for the discovery of antibodies against diverse antigens with simultaneously optimized developability properties. Using the RBD of the COVID-19 spike protein as a case study, we discovered a highly human antibody with broad binding to different variants, demonstrating the power of this pipeline for antibody discovery against a challenging target.

6:00 pm

Assessing AlphaFold for Modeling Antibody and T Cell Receptor Recognition: Insights and Optimization Strategies

Brian Pierce, PhD, Associate Professor, Cell Biology & Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland

Accurate modeling of immune recognition remains a major challenge in computational biology. To provide insights into the performance of deep learning for modeling immune recognition, we performed detailed benchmarking of multiple AlphaFold2 and AlphaFold3 approaches for modeling antibody and T cell receptor complexes. This revealed approaches with higher performance, overall limitations in success, as well as the utility of confidence scores in the selection of accurate models.

6:30 pm Networking Reception in the Exhibit Hall with Poster Viewing

7:30 pm Close of Day

Wednesday, January 15

7:45 am Registration and Morning Coffee

8:30 am

Chairperson's Remarks

Rebecca Croasdale-Wood, PhD, Senior Director, Augmented Biologics Discovery & Design, Biologics Engineering, Oncology, AstraZeneca

8:40 am

ML-Powered "Lab-in-the-Loop" Approach for Therapeutic Antibody Discovery and Optimization

Vladimir Gligorijević, PhD, Senior Director, AI/ML Prescient Design, Genentech

In this talk, I will review our latest machine-learning approaches for antibody design and multi-property optimization that we use in our "Lab-in-the-Loop" (LitL) system. I will demonstrate how we use our LitL system to overcome some of the critical antibody design challenges and accelerate drug discovery programs.

9:10 am Session Break

9:15 am

Chairperson’s Remarks

Wing Ki Wong, PhD, Senior Scientist, Pharmaceutical Research and Development, Large Molecule Research, Roche Diagnostics GmbH

9:20 am

Data- and Model-Guided Antibody and TCR Optimization

Arvind Sivasubramanian, PhD, Director, Computational Biology & Platform Technologies, Adimab LLC

We have developed an NGS-guided workflow for multi-dimensional optimization of biologics such as antibodies and TCRs. The workflow involves the rational design of targeted CDR combinatorial diversity guided by Deep Mutational Scanning, followed by yeast library selections. We will present antibody and TCR optimization case studies demonstrating substantial gains in affinity that are competitive with standard approaches, along with good developability and minimal variant mutational load.

9:50 am

Repertoire Expansion for Antibody Discovery and Optimization Using Machine Learning Approaches

Wing Ki Wong, PhD, Senior Scientist, Pharmaceutical Research and Development, Large Molecule Research, Roche Diagnostics GmbH

In antibody discovery, repertoire sequencing reveals a broad antigen-specific sequence space and informs us about the potential mutational space. By integrating this information and advances in experimental techniques with different machine learning approaches, we are now able to efficiently explore and extend the sequence space to identify alternative binders and improve existing binders.

10:20 am

Using Machine Learning and Molecular Mimicry of Complex Biologies to Design New GLP-1 Agonists

Marcin Paduch, PhD, Vice President, Head of Platform Biology, Metaphore Biotechnologies

Targeting GPCRs with agonists is a significant challenge in biologics design. Our approach employs molecular mimicry to optimize both developability and functional outcomes, including biased agonism. Leveraging machine learning algorithms backed by extensive empirical data acquisition and protein engineering, we thoroughly explore the pharmacophore space within live cells. This enables us to create designer molecules that address complex biological processes, ultimately leading to the development of novel GLP1R modulators.

10:50 am Bagel Booth Crawl with Coffee in the Exhibit Hall with Poster Viewing (Sponsorship Opportunity Available)

11:15 am

Chairperson's Remarks

Alissa Hummer, DPhil, Postdoctoral Fellow, Stanford University

11:20 am

Benchmarking and Integrating ML/AI Advancements in Biologics Discovery and Optimisation for Pharma

Rebecca Croasdale-Wood, PhD, Senior Director, Augmented Biologics Discovery & Design, Biologics Engineering, Oncology, AstraZeneca