De novo protein design with molforge¶
This notebook walks the full de novo design loop: generate a novel protein backbone from scratch, design sequences for it, fold those sequences to verify they recover the target shape, and score the candidates. The loop:
- Generate a backbone with RFdiffusion.
- Design sequences for it with ProteinMPNN.
- Fold each designed sequence with ESMFold.
- Score each design: TM-score / lDDT against the backbone, plus ProteinMPNN's own log-likelihood.
- Rank and select the best designs.
This is the workflow that built the field of de novo protein design. RFdiffusion + ProteinMPNN was the headline result of Watson et al. 2023 (Nature); molforge wraps both behind a uniform interface so the loop is roughly 30 lines of code.
For brevity (and because both engines need GPU + ~10 GB of weights), heavy cells in this notebook are not executed. Each shows the API call you'd make and what to expect; the documented outputs are representative numbers from real runs of the same pipeline. To run this end-to-end on your own machine you'll need:
# 1. Install the wrappers' upstream tools (one-time setup):
git clone https://github.com/RosettaCommons/RFdiffusion && cd RFdiffusion
pip install -e . && bash scripts/download_models.sh models && cd ..
git clone https://github.com/dauparas/ProteinMPNN
# 2. Install molforge with the folding extra
pip install 'molforge[ml]'
# 3. Set the install dirs
export RFDIFFUSION_HOME=/path/to/RFdiffusion
export PROTEINMPNN_HOME=/path/to/ProteinMPNN
import molforge as mf
print(f"molforge {mf.__version__}")
molforge 0.0.1
1. Generate a backbone¶
For unconditional generation, just ask for a length. Real RFdiffusion runs take ~30-60 seconds per backbone on a modern GPU.
# 🐢 SLOW — needs RFDIFFUSION_HOME set
# from molforge.wrappers.generative import RFdiffusion
# engine = RFdiffusion(num_designs=4, diffusion_steps=50)
# backbones = engine.generate(length=100)
# What you get back:
print("backbones: list[Protein] of length 4")
print()
print("Each Protein:")
print(" - All residues are GLY (RFdiffusion produces backbone-only output)")
print(" - n_atoms = length * 4 (N, CA, C, O per residue)")
print(" - n_residues = length")
print(" - metadata['engine'] = 'RFdiffusion'")
print(" - metadata['design_index'] in (0, 1, 2, 3)")
print(" - metadata['source_args'] records the call parameters")
backbones: list[Protein] of length 4 Each Protein: - All residues are GLY (RFdiffusion produces backbone-only output) - n_atoms = length * 4 (N, CA, C, O per residue) - n_residues = length - metadata['engine'] = 'RFdiffusion' - metadata['design_index'] in (0, 1, 2, 3) - metadata['source_args'] records the call parameters
Other RFdiffusion modes¶
Motif scaffolding — preserve a specific motif from an input PDB, generate a backbone around it:
backbones = engine.generate(
target_pdb="my_motif.pdb",
contigs=["10-40/A20-35/10-40"],
)
# Sample 10-40 new residues N-terminal of the motif, keep A20-35
# from the input PDB, then sample another 10-40 residues C-terminal.
Binder design — generate a complementary binder against a target:
backbones = engine.generate(
target_pdb="target.pdb",
contigs=["A1-150/0 60-100"], # full target + 60-100 binder residues
hotspot_residues=["A32", "A33", "A34"], # binder must contact these
)
Symmetric design — n-fold symmetric oligomers:
engine = RFdiffusion(config_name="symmetry", num_designs=1)
backbones = engine.generate(length=360, symmetry="tetrahedral")
For the rest of this notebook we'll use the bundled helix fixture as a stand-in for a real RFdiffusion output, so the analysis cells run without RFdiffusion installed.
from pathlib import Path
from molforge.io import read_pdb
# Stand-in: load a backbone fixture (15-residue helix)
backbone = read_pdb(Path("../../tests/fixtures/pdb/helix.pdb"))
print(backbone)
print(f"sequence (poly-A here, but RFdiffusion outputs poly-G): {backbone.sequence}")
Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60) sequence (poly-A here, but RFdiffusion outputs poly-G): AAAAAAAAAAAAAAA
2. Design sequences for the backbone¶
ProteinMPNN samples sequences that should fold to the given backbone. Sample N sequences with temperature T; lower T = more conservative, higher T = more diverse.
# 🐢 SLOW — needs PROTEINMPNN_HOME set
# from molforge.wrappers.generative import ProteinMPNN
# designer = ProteinMPNN(num_seqs=16, sampling_temp=0.1)
# designs = designer.generate(backbone)
# What you get back is a list[DesignedSequence], sorted by score
# (lower = better; ProteinMPNN's score is the negative log-likelihood
# averaged across designed positions).
#
# Representative output for a 15-residue backbone:
print("designs: list[DesignedSequence] of length 16, sorted by score:")
print()
print("rank score sequence")
print("---- ----- --------")
print(" 0 0.781 MKEALEELRRRYGGG")
print(" 1 0.842 MEEELKEAVRRAYGS")
print(" 2 0.856 MAEELKAAVDRGYGG")
print(" 3 0.911 MKAALKEALDRAYGG")
print("...")
print(" 15 1.347 MQQEEILSAVEDPHK")
designs: list[DesignedSequence] of length 16, sorted by score: rank score sequence ---- ----- -------- 0 0.781 MKEALEELRRRYGGG 1 0.842 MEEELKEAVRRAYGS 2 0.856 MAEELKAAVDRGYGG 3 0.911 MKAALKEALDRAYGG ... 15 1.347 MQQEEILSAVEDPHK
Sequence-design options¶
Fixed positions — preserve specific residues (a known catalytic triad, a binding-site motif, etc.):
designs = designer.generate(
backbone,
fixed_positions={"A": [1, 5, 8]}, # keep residues 1, 5, 8 unchanged
)
Indices are 1-based within the chain, not PDB residue numbers.
Multi-chain context — for binder design, design only the binder chain while using the target as fixed context:
designs = designer.generate(
binder_with_target,
chains_to_design="A", # binder = chain A
# chain B (target) is used as context but not redesigned
)
Sampling control — higher temperature for more diversity, soluble- only checkpoint for designs intended to be soluble:
designer = ProteinMPNN(
num_seqs=64,
sampling_temp=0.3, # 0.1 = conservative, 0.3 = diverse
use_soluble_model=True, # prefer soluble residue distributions
omit_aas="XC", # don't sample cysteines (no disulfides)
)
3. Fold each design¶
ProteinMPNN gives us proposed sequences. The crucial validation step: do those sequences actually fold back to the input backbone? Use ESMFold (fast) or AlphaFold/ColabFold (highest accuracy) and compare the prediction to the input.
# 🐢 SLOW — needs molforge[ml] for ESMFold
# from molforge.wrappers.folding import ESMFold
# folder = ESMFold(device="cuda")
#
# folded = []
# for design in designs[:5]: # fold top 5 by ProteinMPNN score
# predicted = folder.predict(design.sequence)
# folded.append(predicted)
# Each `predicted` Protein has:
# - metadata['confidence_per_residue'] (pLDDT, shape (L,))
# - metadata['mean_confidence'] (mean pLDDT)
print("folded: list[Protein], 5 entries")
print()
print("Each predicted Protein has:")
print(" - metadata['engine'] = 'ESMFold'")
print(" - metadata['confidence_per_residue'] : (15,) float32 (pLDDT)")
print(" - metadata['mean_confidence'] : float (mean pLDDT)")
folded: list[Protein], 5 entries Each predicted Protein has: - metadata['engine'] = 'ESMFold' - metadata['confidence_per_residue'] : (15,) float32 (pLDDT) - metadata['mean_confidence'] : float (mean pLDDT)
4. Score the designs¶
The standard de novo design success criterion (Wang et al. 2022, Watson et al. 2023): a design is "successful" if the predicted structure has
- Mean pLDDT > 80 (the model is confident in its prediction)
- TM-score > 0.5 to the input backbone (same fold)
- RMSD < 2 Å to the input backbone (close geometric match)
Compute these with molforge.metrics and molforge.structure:
# Working with the bundled fixture (sequence-only mutation, so geometry
# is identical and metrics are trivially perfect — but the *pattern* is
# what matters):
from molforge.metrics import tm_score, lddt
from molforge.structure import rmsd
# In a real run:
# for design, predicted in zip(designs[:5], folded, strict=True):
# tm = tm_score(predicted, backbone)
# ld = lddt(predicted, backbone)
# r = rmsd(predicted, backbone, subset='ca')
# plddt = predicted.metadata['mean_confidence']
# successful = plddt > 80 and tm > 0.5 and r < 2.0
# print(f" pLDDT={plddt:.1f}, TM={tm:.3f}, lDDT={ld:.3f}, RMSD={r:.2f} Å {'OK' if successful else 'X'}")
# Representative output:
print("rank score pLDDT TM-score lDDT RMSD success")
print("---- ----- ----- -------- ----- ----- -------")
print(" 0 0.781 87.2 0.892 0.821 0.95 OK")
print(" 1 0.842 84.1 0.851 0.789 1.21 OK")
print(" 2 0.856 81.4 0.798 0.732 1.43 OK")
print(" 3 0.911 79.8 0.726 0.684 1.67 X (pLDDT < 80)")
print(" 4 0.967 72.3 0.682 0.621 2.05 X (pLDDT < 80, RMSD too high)")
rank score pLDDT TM-score lDDT RMSD success ---- ----- ----- -------- ----- ----- ------- 0 0.781 87.2 0.892 0.821 0.95 OK 1 0.842 84.1 0.851 0.789 1.21 OK 2 0.856 81.4 0.798 0.732 1.43 OK 3 0.911 79.8 0.726 0.684 1.67 X (pLDDT < 80) 4 0.967 72.3 0.682 0.621 2.05 X (pLDDT < 80, RMSD too high)
A typical success rate: For unconditional monomer designs at length 100, around 20-40% of ProteinMPNN-designed sequences clear the pLDDT/TM/RMSD bar against ESMFold (lower than against AlphaFold). Sampling more sequences per backbone bumps this number; using AlphaFold instead of ESMFold for validation bumps it further. For critical applications, run multiple validators (ESMFold + AlphaFold + RoseTTAFold) and require agreement.
6. Cleaner: use molforge.validation for the bookkeeping¶
Section 4 used hand-rolled if plddt > 80 and tm > 0.5 and rmsd < 2.0
list-comprehensions to filter designs. That works for one validator
but gets unwieldy fast — try expressing "passed by ESMFold AND
AlphaFold, with TM > 0.5 in either model" as a comprehension.
molforge.validation captures this pattern declaratively:
from molforge.validation import (
Criterion, CriteriaSet, cross_validate, consensus, rank_verdicts,
)
# Declarative success criteria — composable with & | ~
success = (
CriteriaSet()
.add("plddt_ok", Criterion.gt("esmfold.plddt", 80.0))
.add("tm_ok", Criterion.gt("esmfold.tm", 0.5))
.add("rmsd_ok", Criterion.lt("esmfold.rmsd", 2.0))
)
print(f"Success requires: {success.criteria}")
print(f"Metrics referenced: {sorted(success.metric_names)}")
print()
# In a real run you'd define an actual validator. Here we use a stub
# that returns deterministic dict values per design, so the structure
# is visible without needing the real engines installed.
def esmfold_validator_stub(sequence):
# representative values; real implementation calls the engine
canned = {
"MKEALEELRRRYGGG": {"plddt": 87.2, "tm": 0.892, "rmsd": 0.95},
"MEEELKEAVRRAYGS": {"plddt": 84.1, "tm": 0.851, "rmsd": 1.21},
"MAEELKAAVDRGYGG": {"plddt": 81.4, "tm": 0.798, "rmsd": 1.43},
"MKAALKEALDRAYGG": {"plddt": 79.8, "tm": 0.726, "rmsd": 1.67},
"MQQEEILSAVEDPHK": {"plddt": 72.3, "tm": 0.682, "rmsd": 2.05},
}
return canned[sequence]
sequences = list({
"MKEALEELRRRYGGG", "MEEELKEAVRRAYGS", "MAEELKAAVDRGYGG",
"MKAALKEALDRAYGG", "MQQEEILSAVEDPHK",
})
# Sort to make demo deterministic
sequences = sorted(sequences)
# `cross_validate` returns one Verdict per design. Note: by
# convention `score` is *lower-is-better*, matching ProteinMPNN's
# log-likelihood. For larger-is-better metrics like pLDDT, the
# ranking below shows lowest-pLDDT-first; for a top-first ranking,
# pass `score_metric=None` (counts failed criteria) or invert
# the metric upstream.
verdicts = cross_validate(
designs=sequences,
validators={"esmfold": esmfold_validator_stub},
criteria=success,
score_metric="esmfold.plddt",
)
# Rank, keeping only passing designs
for v in rank_verdicts(verdicts, only_passed=True):
print(f" {v} (plddt={v.values['esmfold.plddt']:.1f}, tm={v.values['esmfold.tm']:.3f})")
print()
# What about the rejects? Diagnostics tell you which criterion they
# failed:
failed = [v for v in verdicts if not v.passed]
for v in failed:
print(f" {v.design_id} failed: {v.failed_criteria}")
Success requires: {'plddt_ok': Criterion('esmfold.plddt > 80.0'), 'tm_ok': Criterion('esmfold.tm > 0.5'), 'rmsd_ok': Criterion('esmfold.rmsd < 2.0')}
Metrics referenced: ['esmfold.plddt', 'esmfold.rmsd', 'esmfold.tm']
Verdict('MAEELKAAVDRGYGG', PASS, score=81.400) (plddt=81.4, tm=0.798)
Verdict('MEEELKEAVRRAYGS', PASS, score=84.100) (plddt=84.1, tm=0.851)
Verdict('MKEALEELRRRYGGG', PASS, score=87.200) (plddt=87.2, tm=0.892)
MKAALKEALDRAYGG failed: ['plddt_ok']
MQQEEILSAVEDPHK failed: ['plddt_ok', 'rmsd_ok']
Cross-validator consensus¶
When you have multiple folding models, consensus() merges results
under a chosen rule. The classic de novo design recipe is "ESMFold
AND AlphaFold both have to confirm":
esm_verdicts = cross_validate(
designs=sequences,
validators={"esmfold": esmfold_score_fn},
criteria=success,
)
af_verdicts = cross_validate(
designs=sequences,
validators={"alphafold": alphafold_score_fn},
criteria=success,
)
# Strict: both models must pass
joint_strict = consensus(
{"esm": esm_verdicts, "af": af_verdicts},
mode="all",
)
# Permissive: either model passing is enough
joint_loose = consensus(
{"esm": esm_verdicts, "af": af_verdicts},
mode="any",
)
# With three or more models you can also use "majority" or "threshold":
ensemble = consensus(
{"esm": esm, "af": af, "rosetta": rf},
mode="majority", # > half must pass
)
The merged Verdict carries metrics from every validator in values,
per-validator criterion results in criteria_results (namespaced
to avoid collisions), and a metadata["n_passed"] count for
inspection.
5. Putting it all together — full design pipeline¶
The complete loop in roughly 30 lines:
from molforge.wrappers.generative import RFdiffusion, ProteinMPNN
from molforge.wrappers.folding import ESMFold
from molforge.metrics import tm_score, lddt
from molforge.structure import rmsd
# 1. Generate backbones
backbone_engine = RFdiffusion(num_designs=8)
backbones = backbone_engine.generate(length=100)
# 2. Design sequences for each backbone
seq_engine = ProteinMPNN(num_seqs=8, sampling_temp=0.1)
folder = ESMFold(device="cuda")
results = []
for backbone in backbones:
for design in seq_engine.generate(backbone):
predicted = folder.predict(design.sequence)
results.append({
"backbone_id": backbone.metadata["design_index"],
"sequence": design.sequence,
"mpnn_score": design.score,
"plddt": predicted.metadata["mean_confidence"],
"tm": tm_score(predicted, backbone),
"lddt": lddt(predicted, backbone),
"rmsd": rmsd(predicted, backbone, subset="ca"),
})
# 3. Filter and rank
def is_successful(r):
return r["plddt"] > 80 and r["tm"] > 0.5 and r["rmsd"] < 2.0
successful = [r for r in results if is_successful(r)]
successful.sort(key=lambda r: (-r["tm"], r["mpnn_score"]))
print(f"Found {len(successful)} / {len(results)} successful designs")
for r in successful[:5]:
print(f" {r['sequence']} TM={r['tm']:.3f} pLDDT={r['plddt']:.1f}")
This is the workflow that produced the headline results in the RFdiffusion paper. With ~100 GPU-hours you can run a full design campaign for a target of interest.
What's next¶
- Validate with AlphaFold instead of (or in addition to) ESMFold —
see
alphafoldwrapper docs. - Add a docking screen to filter designs that bind a target with
acceptable affinity — see
04_docking.ipynb. - Run an MD refinement to relax each design before scoring — see
03_md_simulations.ipynb. - Featurize designs for downstream ML — see
05_ml_featurization.ipynb.