An open scientific engine to power prediction, design, and discovery across protein biology. State-of-the-art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
Datasets vs. inductive bias, world models, and programmable biology. ESMFold2 demonstrates that vanilla BERT-like transformers trained on diverse data can beat specialized models on the hardest protein problems.
ESMC learns abstract patterns from 2.8 billion sequences via unsupervised training. The abstraction is semantic, compositional, and supports generalization — predicting real-world biology it was never trained on.
Without multiple sequence alignments (MSAs), AlphaFold struggles on rapidly mutating antibodies. ESMFold2 achieves state-of-the-art performance on antibody interactions, a critical modality for therapeutic development.
Inference-time compute scaling is shown to work across five targets in cancer and immunology: more compute at inference yields better predictions, opening a new axis for improvement.
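One common way to spend extra compute at inference is best-of-N sampling: draw several stochastic predictions and keep the one with the highest confidence score. The source does not say which strategy ESMFold2 uses, so the sketch below is an assumption throughout. `sample_confidence` is a hypothetical stand-in for a real model call, and the score is just a seeded random number.

```python
import random


def sample_confidence(rng):
    """Hypothetical stand-in for one stochastic prediction's confidence in [0, 1)."""
    return rng.random()


def best_of_n(n_samples, seed=0):
    """Draw n_samples predictions and keep the highest confidence seen."""
    rng = random.Random(seed)
    return max(sample_confidence(rng) for _ in range(n_samples))


if __name__ == "__main__":
    # With a fixed seed, a larger budget reuses the earlier samples and adds
    # more, so the best score can only stay the same or improve.
    for budget in (1, 4, 16, 64):
        print(budget, round(best_of_n(budget), 3))
```

In this toy the improvement comes purely from taking a maximum over more draws; a real system would also need a confidence score that correlates with actual accuracy.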
Both ESMC (the world model) and ESMFold2 (the structure prediction head) are released under the MIT license, enabling broad adoption and community-driven innovation in protein biology.
"ESM takes a different approach: learn the relationship between different proteins by unsupervised training on as much diversity as you can find, and then correlate that back to structures known from the Protein Data Bank and other sources. In other words, a World Model."
The ESM team doubled down on the scale hypothesis after AlphaFold2, betting that diverse data and compute would outperform specialized inductive biases. Here's how the architecture breaks down.
ESMC is a BERT-like transformer trained with a masked language modeling objective across 2.8 billion protein sequences drawn from across the tree of life. The model learns rich representations of protein structure and function without any structural supervision.
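The masked language modeling objective can be sketched in a few lines: hide a fraction of residues, ask the model for a distribution over amino acids at each hidden position, and score it by the negative log-likelihood of the true residue. This is an illustrative sketch, not ESMC's training code. The 15% mask rate is borrowed from BERT as an assumption, and `uniform_predictor` is a hypothetical untrained baseline standing in for the transformer.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"


def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Return (masked_tokens, targets) for one non-empty protein sequence.

    targets maps each masked position to the residue that was hidden there.
    The 0.15 default is an assumption, not a documented ESMC setting.
    """
    rng = rng or random.Random(0)
    tokens = list(seq)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {i: tokens[i] for i in positions}
    for i in positions:
        tokens[i] = MASK
    return tokens, targets


def masked_lm_loss(predict, tokens, targets):
    """Mean negative log-likelihood of the true residues at masked positions.

    predict(tokens, position) must return a dict mapping residue -> probability.
    """
    total = 0.0
    for position, residue in targets.items():
        total -= math.log(predict(tokens, position)[residue])
    return total / len(targets)


def uniform_predictor(tokens, position):
    """Untrained baseline: every residue is equally likely."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    tokens, targets = mask_sequence("MKTAYIAKQRQISFVKSHFSRQ")
    print("".join(t if t != MASK else "_" for t in tokens))
    print(masked_lm_loss(uniform_predictor, tokens, targets))
```

The uniform baseline scores ln(20) per masked residue; training pushes the loss below that by learning which residues are plausible in context.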
ESMFold2 attaches a structure prediction head to the frozen ESMC representations. It maps the learned sequence embeddings directly to 3D coordinates, using a diffusion-based decoder that iteratively refines the structure.
Building on Cryo-EM data, ESMFold2 achieves superior performance on protein-protein interactions. The model leverages experimentally determined density maps to improve complex structure prediction, particularly for challenging therapeutic targets.
Sparse Autoencoders extract hierarchical semantic features from the world model — from local amino acid biochemistry to global fold identifiers and conceptual motifs like DNA-binding, membrane integration, and disordered regions.
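As a rough illustration of what a sparse autoencoder does to one embedding vector, the sketch below encodes it into a wider feature space, keeps only the few strongest activations, and decodes back. The source does not describe the SAE variant used, so the top-k sparsity rule, the tied decoder weights, the random initialization, and all dimensions here are assumptions chosen for brevity.

```python
import random


class SparseAutoencoder:
    """Toy top-k sparse autoencoder with tied weights (illustrative only)."""

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        # One encoder row per feature; the decoder reuses these rows (tied weights).
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(d_model)]
            for _ in range(d_features)
        ]
        self.bias = [0.0] * d_features

    def encode(self, x, k):
        """ReLU pre-activations, then zero everything except the k largest."""
        acts = [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.weights, self.bias)
        ]
        order = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)
        keep = set(order[:k])
        return [a if i in keep else 0.0 for i, a in enumerate(acts)]

    def decode(self, features):
        """Reconstruct the embedding as a weighted sum of feature directions."""
        d_model = len(self.weights[0])
        return [
            sum(f * row[i] for f, row in zip(features, self.weights))
            for i in range(d_model)
        ]


if __name__ == "__main__":
    sae = SparseAutoencoder(d_model=8, d_features=32)
    embedding = [0.5, -1.0, 0.3, 0.0, 1.2, -0.7, 0.9, 0.1]
    features = sae.encode(embedding, k=4)
    print([i for i, f in enumerate(features) if f > 0.0])  # active feature ids
```

Because only a handful of features fire per input, each one can be inspected and labeled individually, which is what makes this kind of decomposition useful for interpretability.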
Genes are programs for building proteins. The cell nucleus is a storage controller, the ribosome is a JIT-compiler and runtime, and proteins are processes that interact in signalling pathways to produce phenotypes.
SAE features capture protein structure at every level: very local (1–3 residues), short-range (~5–10 residues), medium-range (~10–30 residues), and long-range (whole-protein domains). This mirrors the natural hierarchy of protein folding.
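To make the hierarchy concrete, a feature can be binned by how many residues its activation spans. This is a minimal sketch: the cutoffs below round the approximate ranges in the text into contiguous bins, and both `active_span` and the 0.5 activation threshold are hypothetical choices, not values from the source.

```python
def active_span(activations, threshold=0.5):
    """Residues between the first and last position where the feature fires."""
    active = [i for i, a in enumerate(activations) if a > threshold]
    return active[-1] - active[0] + 1 if active else 0


def range_class(span):
    """Bin a span (in residues) into the four levels; cutoffs are assumed."""
    if span == 0:
        return "inactive"
    if span <= 3:
        return "very local"
    if span <= 10:
        return "short-range"
    if span <= 30:
        return "medium-range"
    return "long-range"


if __name__ == "__main__":
    per_residue = [0.0, 0.9, 0.8, 0.0, 0.0]  # toy per-residue activations
    print(range_class(active_span(per_residue)))
```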
The model learns not just structural features but conceptual ones: DNA-binding across diverse folds, membrane integration regardless of protein class, ~686 features for disordered regions, and disulfide bond identification.
ESMC is a world model trained on 2.8 billion sequences. Once you have a world model, you can attach heads for downstream tasks: predict properties, decompose functional features, or search representations for proteins that meet design criteria.
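Searching representations can be as simple as ranking stored embeddings by cosine similarity to a query embedding. The sketch below assumes each protein has already been reduced to one fixed-length vector; the protein names and three-dimensional vectors are made up for illustration, and real embeddings would come from the model and be far wider.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, library, top_k=2):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = sorted(
        ((cosine(query, emb), name) for name, emb in library.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical embeddings keyed by placeholder names.
    library = {
        "protein_a": [1.0, 0.0, 0.0],
        "protein_b": [0.9, 0.1, 0.0],
        "protein_c": [0.0, 1.0, 0.0],
    }
    for name, score in search([1.0, 0.0, 0.0], library):
        print(name, round(score, 3))
```

A linear scan like this is fine for small libraries; at atlas scale an approximate nearest-neighbor index would be the more realistic choice.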
Harder molecules predicted by the model were validated in the wet lab. The world model generates protein sequences and measures predicted properties like binding affinity, bridging computation with experimental biology.
ESMFold2 releases an atlas of 6.8 billion proteins with 1.1 billion predicted structures. This massive resource enables researchers to explore protein space at an unprecedented scale.
As we learn to compose SAE features into novel protein designs, we move further towards programmable biology. The cell as a computer, proteins as software, and design as programming.
Start here if you want the creator, the architecture, the training data, or the hardware requirements without reading the whole paper first.