Released May 27, 2026 — Alex Rives, BioHub

ESMFold2: The Bitter Lesson is Coming for Proteins

An open scientific engine to power prediction, design, and discovery across protein biology. State-of-the-art performance on protein interactions, especially antibodies, a critical modality for therapeutics.

Overview

Beyond Inductive Bias: The Scale Hypothesis for Proteins

Datasets vs. inductive bias, world models, and programmable biology. ESMFold2 demonstrates that vanilla BERT-like transformers trained on diverse data can beat specialized models on the hardest protein problems.

2.8B sequences in ESMC world model
6.8B proteins in predicted atlas
1.1B predicted structures released

World Model for Proteins

ESMC learns abstract patterns from 2.8 billion sequences via unsupervised training. The abstraction is semantic and compositional, and it supports generalization: the model predicts real-world biology it was never trained on.

State of the Art on Antibodies

AlphaFold depends on MSAs, so it struggles on rapidly mutating antibodies, where informative alignments are scarce. ESMFold2 achieves state-of-the-art performance on antibody interactions, a critical modality for therapeutic development.

Inference-Time Scaling

ESMFold2 shows evidence that inference-time compute scaling works across five targets in cancer and immunology. More compute at inference yields better predictions, opening a new axis for improvement.
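
The source does not say how the extra inference compute is spent. One common pattern is best-of-N sampling: draw several candidate predictions and keep the one the model scores highest. The sketch below illustrates only that pattern. `sample_structure` is a hypothetical stand-in that returns a random confidence, not an ESMFold2 API.

```python
import random

def sample_structure(sequence, rng):
    """Hypothetical sampler: returns (candidate, confidence).

    A real system would run the structure model here; this stand-in
    just draws a random confidence so the selection logic can run.
    """
    confidence = rng.random()
    candidate = {"sequence": sequence, "sample_id": rng.randrange(10**6)}
    return candidate, confidence

def best_of_n(sequence, n, seed=0):
    """Draw n candidates and keep the one with the highest confidence."""
    rng = random.Random(seed)
    samples = [sample_structure(sequence, rng) for _ in range(n)]
    return max(samples, key=lambda pair: pair[1])

# With a fixed seed, a larger budget can only match or improve the best score,
# because the first candidates drawn are the same in both runs.
for budget in (1, 4, 16):
    _, score = best_of_n("EVQLVESGGGLVQPGG", budget)
    print(budget, round(score, 3))
```

Whether the selected candidate is actually better depends on how well the confidence score tracks real accuracy, which this toy does not model.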

Open Under MIT License

Both ESMC (the world model) and ESMFold2 (the structure prediction head) are released under the MIT license, enabling broad adoption and community-driven innovation in protein biology.

"ESM takes a different approach: learn the relationship between different proteins by unsupervised training on as much diversity as you can find, and then correlate that back to structures known from the Protein Data Bank and other sources. In other words, a World Model."

— Alex Rives, Head of Science at BioHub

How It Works

Scale-Pilled Before It Was Cool

The ESM team doubled down on the scale hypothesis after AlphaFold2, betting that diverse data and compute would outperform specialized inductive biases. Here's how the architecture breaks down.

Unsupervised Pretraining on 2.8B Sequences

ESMC is a BERT-like transformer trained with a masked language modeling objective across 2.8 billion protein sequences drawn from across the tree of life. The model learns rich representations of protein structure and function without any structural supervision.
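
To make the masked language modeling objective concrete, here is a minimal sketch in plain Python. It hides a fraction of residues and scores predictions by the negative log-likelihood of the true residues at the hidden positions. The 15% mask rate follows the original BERT convention and is an assumption here, as is the `<mask>` token name; the source does not state ESMC's actual settings.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"  # assumed token name, for illustration only

def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Return (masked tokens, {position: original residue})."""
    rng = rng or random.Random()
    tokens = list(seq)
    n_mask = max(1, round(len(tokens) * mask_rate))
    targets = {}
    for pos in rng.sample(range(len(tokens)), n_mask):
        targets[pos] = tokens[pos]
        tokens[pos] = MASK
    return tokens, targets

def mlm_loss(predicted, targets):
    """Mean negative log-likelihood of the true residue at each masked position.

    predicted maps position -> {residue: probability}.
    """
    total = sum(math.log(predicted[pos][aa]) for pos, aa in targets.items())
    return -total / len(targets)

tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ", rng=random.Random(0))

# An untrained model that guesses uniformly scores log(20) per masked residue.
uniform = {pos: {aa: 1 / 20 for aa in AMINO_ACIDS} for pos in targets}
print(mlm_loss(uniform, targets))
```

Training pushes this loss below the uniform baseline, which is only possible if the model picks up on the context around each hidden residue.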

Structure Prediction Head — ESMFold2

ESMFold2 attaches a structure prediction head to the frozen ESMC representations. It maps the learned sequence embeddings directly to 3D coordinates, using a diffusion-based decoder that iteratively refines the structure.
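
The sketch below shows the general shape of iterative refinement from noise, in the spirit of diffusion sampling. It is a toy under stated assumptions, not the ESMFold2 decoder: `toy_denoiser` is a hypothetical stand-in for a learned network, the noise schedule is linear, and the "embeddings" are made-up vectors whose first three entries are treated as the clean coordinates.

```python
import random

def toy_denoiser(noisy_coords, embeddings, t):
    """Hypothetical stand-in for a learned network conditioned on frozen
    embeddings. It 'predicts' clean coordinates by reading them directly
    from the first three entries of each embedding."""
    return [list(e[:3]) for e in embeddings]

def refine(embeddings, steps=10, rng=None):
    """Start from Gaussian noise and step toward the predicted structure."""
    rng = rng or random.Random()
    coords = [[rng.gauss(0, 1) for _ in range(3)] for _ in embeddings]
    for step in range(steps):
        t = 1.0 - step / steps             # current noise level, 1 down to 1/steps
        t_next = 1.0 - (step + 1) / steps  # noise level after this step
        pred = toy_denoiser(coords, embeddings, t)
        # Assuming x_t = (1 - t) * clean + t * noise, keeping the fraction
        # t_next / t of the current point lands exactly on noise level t_next.
        w = t_next / t
        coords = [
            [w * c + (1 - w) * p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(coords, pred)
        ]
    return coords

embeddings = [[1.0, 2.0, 3.0, 0.5], [0.0, -1.0, 4.0, 0.1]]
print(refine(embeddings, rng=random.Random(0)))
```

In a real decoder the denoiser's prediction changes from step to step as the input gets cleaner, which is where the iterative refinement does its work.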

Cryo-EM Data Integration

Building on cryo-EM data, ESMFold2 achieves superior performance on protein-protein interactions. The model leverages experimentally determined density maps to improve complex structure prediction, particularly for challenging therapeutic targets.

Mechanistic Interpretability via SAEs

Sparse Autoencoders extract hierarchical semantic features from the world model — from local amino acid biochemistry to global fold identifiers and conceptual motifs like DNA-binding, membrane integration, and disordered regions.
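
For readers new to the technique, a sparse autoencoder in its common form encodes an activation vector into a wider, mostly-zero feature vector, decodes it back, and is trained on reconstruction error plus an L1 sparsity penalty. The sketch below shows that forward pass and loss in plain Python. The weights are hand-picked for illustration and the penalty coefficient is an assumption; none of this reflects the actual SAEs trained on ESMC.

```python
def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode with ReLU(W_enc x + b_enc), then decode with W_dec f + b_dec."""
    features = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_enc, b_enc)
    ]
    recon = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(W_dec, b_dec)
    ]
    return features, recon

def sae_loss(x, features, recon, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    return mse + l1_coeff * sum(abs(f) for f in features)

# Toy setup: a 2-dimensional activation expanded into 4 features,
# one per signed direction.
W_enc = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b_enc = [0.0, 0.0, 0.0, 0.0]
W_dec = [[1, -1, 0, 0], [0, 0, 1, -1]]
b_dec = [0.0, 0.0]

x = [0.5, -2.0]
features, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
print(features)  # only two of the four features are active
print(recon)
print(sae_loss(x, features, recon))
```

Because each input activates only a few features, individual features can be inspected one at a time, which is what makes this family of models useful for interpretability.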

Deep Dive

A Cell is a Computer

Genes are programs for building proteins. The cell nucleus is a storage controller, the ribosome is a JIT compiler and runtime, and proteins are processes that interact in signalling pathways to produce phenotypes.
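
The "genes are programs" analogy can be taken almost literally: translation reads a coding sequence three letters at a time and emits one amino acid per codon until it hits a stop codon. A minimal sketch follows, using a deliberately partial slice of the standard genetic code. The function name and the `X` placeholder for codons missing from this small table are illustrative choices.

```python
# A small subset of the standard genetic code (DNA codons); "*" marks stop codons.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start codon
    "GCT": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "GGT": "G",  # glycine
    "TAA": "*", "TAG": "*", "TGA": "*",
}

def translate(dna):
    """Read codons left to right and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")  # X = not in this partial table
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

print(translate("ATGGCTAAATGGTAA"))  # MAKW
```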

Hierarchical Features of Protein Structure

SAE features capture protein structure at every level: very local (1–3 residues), short-range (~5–10 residues), medium-range (~10–30 residues), and long-range (whole-protein domains). This mirrors the natural hierarchy of protein folding.
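
One simple way to sort features into these levels is by the span of residues on which a feature fires. The sketch below does that. The source gives approximate ranges, so the exact cutoffs (3, 10, and 30 residues) and the function name are assumptions made for illustration.

```python
def range_class(active_positions):
    """Bucket a feature by the span of residue positions where it is active.

    Cutoffs are assumed; the source gives only approximate ranges.
    """
    span = max(active_positions) - min(active_positions) + 1
    if span <= 3:
        return "very local"
    if span <= 10:
        return "short-range"
    if span <= 30:
        return "medium-range"
    return "long-range"

print(range_class([4, 5]))                # very local
print(range_class(list(range(10, 18))))   # short-range
print(range_class([3, 25]))               # medium-range
print(range_class([1, 120]))              # long-range
```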

Conceptual Motifs Discovered by SAEs

The model learns not just structural features but conceptual ones: DNA-binding across diverse folds, membrane integration regardless of protein class, ~686 features for disordered regions, and disulfide bond identification.

ESMC: The World Model

ESMC is a world model trained on 2.8 billion sequences. Once you have a world model, you can attach heads for downstream tasks: predict properties, decompose functional features, or search representations for proteins that meet design criteria.
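
As one illustration of searching representations, the sketch below ranks a small library of proteins by cosine similarity between embedding vectors. The two-dimensional vectors and protein names are made up. In practice the embeddings would come from the world model, and the source does not specify which similarity measure is used.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, library, k=2):
    """Return the names of the k library entries most similar to the query."""
    ranked = sorted(
        library,
        key=lambda name: cosine(query, library[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings, for illustration only.
library = {
    "protA": [1.0, 0.0],
    "protB": [0.0, 1.0],
    "protC": [0.9, 0.1],
}
print(nearest([1.0, 0.05], library))  # ['protA', 'protC']
```

The same frozen embeddings could instead feed a small trained head, such as a linear layer predicting a property, without retraining the underlying model.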

Wet-Lab Validation

The model's predictions for harder molecules were validated in the wet lab. The world model generates protein sequences and scores predicted properties such as binding affinity, connecting computation with experimental biology.

Atlas of 6.8 Billion Proteins

The ESMFold2 release includes an atlas of 6.8 billion proteins with 1.1 billion predicted structures. This resource lets researchers explore protein space at an unprecedented scale.

Programmable Biology

As we learn to compose SAE features into novel protein designs, we move further towards programmable biology. The cell as a computer, proteins as software, and design as programming.

FAQ

The fastest answers to the questions people ask first

Start here if you want the creator, the architecture, the training data, or the hardware requirements without reading the whole paper first.

Who created ESMFold2?

ESMFold2 comes from Alex Rives and the ESM team at BioHub. The model builds on years of work scaling protein language models, from ESM-1 through ESM2 and ESM3, culminating in the open release of ESMC and ESMFold2.

How is ESMFold2 different from AlphaFold?

AlphaFold2 uses multiple sequence alignments (MSAs) as a key inductive bias, which is clever but limiting for domains like antibodies that lack informative MSAs. ESMFold2 skips MSAs entirely, using a BERT-like transformer trained on vast protein diversity to build a world model that generalizes better.

What is the Bitter Lesson in this context?

The Bitter Lesson, coined by Rich Sutton, states that general methods that leverage computation scale better than human-engineered inductive biases. ESMFold2 embodies this by replacing MSA-based reasoning with large-scale unsupervised pretraining and inference-time computation.

What is the training data?

ESMC was trained on 2.8 billion protein sequences drawn from across the tree of life. The structure prediction head was trained on the Protein Data Bank (PDB), augmented with cryo-EM data and with AlphaFold2-predicted structures for distillation.

Is ESMFold2 open source?

Yes. Both ESMC (the world model) and ESMFold2 (the structure prediction head) are released under the MIT license. Model weights, inference code, and evaluation benchmarks are publicly available.

What hardware is needed to run it?

ESMFold2 can run on a single GPU. The models are designed to be accessible to academic and industrial researchers alike, with inference feasible on desktop-class hardware.

What are SAEs and why do they matter?

Sparse Autoencoders (SAEs) are mechanistic interpretability tools that extract interpretable features from neural network representations. In ESMFold2, SAEs reveal hierarchical features from local biochemistry to global fold concepts, enabling novel protein design and discovery.