molforge.wrappers.generative¶
Generative-model wrappers: backbone generation and sequence design.
Concrete engines
- RFdiffusion: RoseTTAFold diffusion for de novo protein backbone generation (Watson et al. 2023). Supports unconditional monomer generation, motif scaffolding, and binder design.
- ProteinMPNN: message-passing neural network for protein sequence design (Dauparas et al. 2022). Given a backbone, it proposes sequences that should fold to it.
These engines complete the de novo design loop in molforge: combined with the folding wrappers (ESMFold, AlphaFold) and the analysis layer (structure, metrics), you can go from "I want a new protein for X" to "here are 50 candidate sequences ranked by predicted quality."
GenerativeEngine ¶
Bases: ABC
Abstract base for generative-design engines.
Subclasses live under molforge.wrappers.generative and must
implement generate. The contract is intentionally loose, because
different engine categories (backbone generators vs. sequence
designers) return different types, but every concrete engine
follows the same lazy-import / clean-error / uniform-metadata
pattern as the other molforge wrappers.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Human-readable engine name (set by subclasses). |
generate (abstractmethod) ¶
Run the engine and return a list of designs.
The return type depends on the engine:
- Backbone generators (RFdiffusion) return list[Protein].
- Sequence designers (ProteinMPNN) return list[DesignedSequence].
Concrete engines document their exact return type.
GenerativeEngineNotInstalledError ¶
Bases: ImportError
Raised when a generative engine's heavy dependencies aren't installed.
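The lazy-import / clean-error pattern described above can be sketched roughly as follows. This is an illustrative stand-in, not molforge's implementation: the two classes below only mirror the documented names, while `ToyEngine`, its `_module` attribute, and `try_generate` are hypothetical.

```python
from __future__ import annotations

import importlib
from abc import ABC, abstractmethod


class GenerativeEngineNotInstalledError(ImportError):
    """Stand-in mirroring the documented exception (an ImportError subclass)."""


class GenerativeEngine(ABC):
    """Stand-in base class: a `name` attribute plus an abstract `generate`."""

    name: str = "generic"

    @abstractmethod
    def generate(self, *args, **kwargs) -> list:
        """Run the engine and return a list of designs."""


class ToyEngine(GenerativeEngine):
    """Hypothetical engine whose heavy dependency is deliberately absent."""

    name = "toy"
    _module = "a_heavy_dependency_that_is_not_installed"  # hypothetical name

    def generate(self, n: int = 1) -> list:
        # Import lazily, at call time, so importing this module stays cheap.
        try:
            importlib.import_module(self._module)
        except ImportError as exc:
            # Re-raise as the wrapper-level error with an actionable message.
            raise GenerativeEngineNotInstalledError(
                f"{self.name} needs '{self._module}', which is not installed"
            ) from exc
        return [f"design_{i}" for i in range(n)]


def try_generate(engine: GenerativeEngine) -> str:
    """Call the engine, degrading gracefully when it is not installed."""
    try:
        engine.generate()
    except GenerativeEngineNotInstalledError as exc:
        return f"skipped: {exc}"
    return "ok"
```

Because the exception derives from ImportError, callers that already handle ImportError keep working, while callers that care can catch the narrower type.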
ProteinMPNN ¶
ProteinMPNN(
*,
proteinmpnn_dir: str | PathLike[str] | None = None,
python_executable: str | None = None,
model_name: str = "v_48_020",
use_soluble_model: bool = False,
ca_only: bool = False,
num_seqs: int = 8,
sampling_temp: float = 0.1,
omit_aas: str = "X",
seed: int = 0,
)
Bases: GenerativeEngine
ProteinMPNN sequence-design engine.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| proteinmpnn_dir | str \| PathLike[str] \| None | Path to the cloned dauparas/ProteinMPNN repo. If […] | None |
| python_executable | str \| None | Python interpreter. Default […] | None |
| model_name | str | Which checkpoint to use. | 'v_48_020' |
| use_soluble_model | bool | Use the soluble-protein-only checkpoint. Better for designs intended to be soluble. | False |
| ca_only | bool | Use the CA-only checkpoint. Required if your input structure is CA-only. | False |
| num_seqs | int | How many sequences to sample per call. ProteinMPNN samples are independent, so drawing more improves the odds of a high-recovery sequence. | 8 |
| sampling_temp | float | Sampling temperature. The paper recommends 0.1 for high-fidelity recovery, 0.2-0.3 for diversity. | 0.1 |
| omit_aas | str | String of one-letter codes to omit from the generated sequences. Default […] | 'X' |
| seed | int | Random seed. | 0 |
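To build intuition for sampling_temp, here is a generic temperature-scaled softmax. It is a textbook illustration of how temperature sharpens or flattens a categorical distribution, not code taken from ProteinMPNN or molforge; the function name and the example logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate residues at one position.
logits = [2.0, 1.5, 0.5]
sharp = softmax_with_temperature(logits, 0.1)    # close to greedy selection
diverse = softmax_with_temperature(logits, 0.3)  # more mass on alternatives
```

Lower temperatures put nearly all probability on the top-scoring residue, which matches the recommendation of 0.1 for recovery and 0.2-0.3 for diversity.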
Example
    from molforge.wrappers.generative import ProteinMPNN
    from molforge.io import read_pdb

    backbone = read_pdb("backbone.pdb")
    engine = ProteinMPNN(num_seqs=8, sampling_temp=0.1)
    designs = engine.generate(backbone)

    # Each design is a DesignedSequence
    best = min(designs, key=lambda d: d.score)
    print(f"{best.sequence} (score {best.score:.3f})")
generate ¶
generate(
backbone: Protein | str | PathLike[str],
*,
chains_to_design: str | None = None,
fixed_positions: dict[str, list[int]] | None = None,
timeout: float | None = None,
) -> list[DesignedSequence]
Design sequences for backbone.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| backbone | Protein \| str \| PathLike[str] | A […] | required |
| chains_to_design | str \| None | Space-separated chain IDs to design (e.g. […]) | None |
| fixed_positions | dict[str, list[int]] \| None | Dict mapping chain ID to a list of 1-indexed residue positions to keep at their wild-type identity. Example: […] | None |
| timeout | float \| None | Optional subprocess timeout in seconds. | None |
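Since fixed_positions is 1-indexed while Python lists are 0-indexed, an off-by-one slip is easy to make. The helpers below are a hypothetical convenience (neither function is part of molforge) showing one way to prepare both arguments in the documented shapes.

```python
from __future__ import annotations


def to_fixed_positions(zero_based: dict[str, list[int]]) -> dict[str, list[int]]:
    """Convert 0-indexed residue indices per chain to sorted 1-indexed positions."""
    fixed: dict[str, list[int]] = {}
    for chain, indices in zero_based.items():
        if any(i < 0 for i in indices):
            raise ValueError(f"negative residue index on chain {chain!r}")
        # Shift by one, drop duplicates, and sort for a stable result.
        fixed[chain] = sorted({i + 1 for i in indices})
    return fixed


def chains_argument(chain_ids: list[str]) -> str:
    """Join chain IDs into the space-separated string chains_to_design expects."""
    return " ".join(chain_ids)


fixed = to_fixed_positions({"A": [0, 4, 4, 2]})
chains = chains_argument(["A", "B"])
```

The results could then be passed as `engine.generate(backbone, chains_to_design=chains, fixed_positions=fixed)`, following the signature above.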
Returns:
| Type | Description |
|---|---|
| list[DesignedSequence] | A list of […] score (lowest = best per ProteinMPNN's convention). |
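Because lower scores are better, ranking is an ascending sort. The sketch below uses a stand-in dataclass with only the two attributes the example above relies on (sequence and score); `top_unique` is a hypothetical helper, and the sequences and scores are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DesignedSequence:
    """Stand-in with only the attributes used in this sketch."""

    sequence: str
    score: float


def top_unique(designs: list[DesignedSequence], k: int) -> list[DesignedSequence]:
    """Return the k best-scoring designs, keeping one entry per distinct sequence."""
    seen: set[str] = set()
    ranked: list[DesignedSequence] = []
    # Ascending sort: the lowest score comes first.
    for design in sorted(designs, key=lambda d: d.score):
        if design.sequence not in seen:
            seen.add(design.sequence)
            ranked.append(design)
    return ranked[:k]


# Invented data for illustration only.
designs = [
    DesignedSequence("MKVL", 1.10),
    DesignedSequence("MKIL", 0.85),
    DesignedSequence("MKVL", 0.95),
    DesignedSequence("MRVL", 1.30),
]
best_two = top_unique(designs, 2)
```

Deduplicating matters at low sampling temperatures, where independent samples often collapse onto the same sequence.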
Raises:
| Type | Description |
|---|---|
| GenerativeEngineNotInstalledError | If ProteinMPNN isn't found. |
| RuntimeError | If the subprocess fails. |
RFdiffusion ¶
RFdiffusion(
*,
rfdiffusion_dir: str | PathLike[str] | None = None,
python_executable: str | None = None,
num_designs: int = 1,
diffusion_steps: int = 50,
device: str | None = None,
config_name: str = "base",
)
Bases: GenerativeEngine
RFdiffusion backbone generator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rfdiffusion_dir | str \| PathLike[str] \| None | Path to the cloned RosettaCommons/RFdiffusion repository. If […] | None |
| python_executable | str \| None | Python interpreter to invoke. Default […] | None |
| num_designs | int | How many backbones to generate per call. | 1 |
| diffusion_steps | int | Reverse-diffusion steps (default 50, RFdiffusion's standard). | 50 |
| device | str \| None | […] | None |
| config_name | str | Hydra config file name. Defaults to […] | 'base' |
Example
    >>> from molforge.wrappers.generative import RFdiffusion
    >>> engine = RFdiffusion(num_designs=4)
    >>> # Unconditional generation: 4 backbones of length 100
    >>> backbones = engine.generate(length=100)
    >>> len(backbones)
    4
    >>> # Motif scaffolding from a target PDB
    >>> backbones = engine.generate(
    ...     target_pdb="my_motif.pdb",
    ...     contigs=["10-40/A20-35/10-40"],
    ... )
Note
RFdiffusion outputs are backbone-only (N/CA/C/O, no side chains,
all residues labeled GLY). Run ProteinMPNN on each
backbone to design a sequence for it.
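The two-stage flow in the note (generate backbones, then design a sequence for each) can be sketched with duck-typed stand-ins. The `Fake*` classes below are hypothetical placeholders that return invented data so the sketch runs without either tool installed; only the `generate(length=...)` and `generate(backbone)` call shapes follow the signatures documented on this page.

```python
from dataclasses import dataclass


@dataclass
class FakeDesign:
    """Placeholder for a designed sequence: just a sequence and a score."""

    sequence: str
    score: float


class FakeBackboneEngine:
    """Placeholder for a backbone generator; returns labels, not structures."""

    def __init__(self, num_designs):
        self.num_designs = num_designs

    def generate(self, *, length):
        return [f"backbone_{i}_len{length}" for i in range(self.num_designs)]


class FakeSequenceEngine:
    """Placeholder for a sequence designer; scores are invented."""

    def generate(self, backbone):
        return [FakeDesign("GGGGG", 1.2), FakeDesign("AAAAA", 0.8)]


def design_loop(backbone_engine, sequence_engine, *, length):
    """Pair every generated backbone with its best (lowest-score) design."""
    results = []
    for backbone in backbone_engine.generate(length=length):
        designs = sequence_engine.generate(backbone)
        best = min(designs, key=lambda d: d.score)
        results.append((backbone, best))
    return results


pairs = design_loop(FakeBackboneEngine(2), FakeSequenceEngine(), length=100)
```

With the real wrappers, the same loop would presumably take `RFdiffusion(num_designs=...)` and `ProteinMPNN(...)` instances in place of the fakes.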
generate ¶
generate(
*,
length: int | None = None,
target_pdb: str | PathLike[str] | None = None,
contigs: Sequence[str] | None = None,
hotspot_residues: Sequence[str] | None = None,
symmetry: str | None = None,
extra_hydra_args: dict[str, str] | None = None,
timeout: float | None = None,
) -> list[Protein]
Generate num_designs backbones.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| length | int \| None | Number of residues for unconditional generation. Mutually exclusive with […] | None |
| target_pdb | str \| PathLike[str] \| None | Path to an input PDB for motif scaffolding or binder design. | None |
| contigs | Sequence[str] \| None | Hydra-style contig list, e.g. […] | None |
| hotspot_residues | Sequence[str] \| None | For binder design, residues on the target that the binder should contact (e.g. […]) | None |
| symmetry | str \| None | For symmetric design: […] | None |
| extra_hydra_args | dict[str, str] \| None | Additional […] | None |
| timeout | float \| None | Optional subprocess timeout in seconds. | None |
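To sanity-check a contig string before launching a long run, it can help to compute the range of total lengths it allows. The parser below is a simplified sketch based on the contig used in the example on this page ("10-40/A20-35/10-40"): it assumes a segment with a chain letter keeps that residue range from the target, and a bare numeric range is a newly generated stretch of variable length. It does not handle chain breaks or any other syntax RFdiffusion may accept, and `contig_length_bounds` is a hypothetical helper, not part of molforge.

```python
import re

# One segment: optional chain letter, a start number, and an optional "-end".
_SEGMENT = re.compile(r"^(?P<chain>[A-Za-z]?)(?P<start>\d+)(?:-(?P<end>\d+))?$")


def contig_length_bounds(contig):
    """Return (min_length, max_length) implied by a '/'-separated contig string."""
    low = high = 0
    for segment in contig.split("/"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unrecognised contig segment: {segment!r}")
        start = int(match["start"])
        end = int(match["end"]) if match["end"] is not None else start
        if end < start:
            raise ValueError(f"descending range in segment: {segment!r}")
        if match["chain"]:
            # Motif residues copied from the target: fixed count, inclusive range.
            count = end - start + 1
            low += count
            high += count
        else:
            # Newly generated stretch: anywhere from `start` to `end` residues.
            low += start
            high += end
    return low, high


bounds = contig_length_bounds("10-40/A20-35/10-40")
```

For the example contig, the motif A20-35 contributes 16 residues and each flank contributes 10 to 40, so the designs span 36 to 96 residues.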
Returns:
| Type | Description |
|---|---|
| list[Protein] | List of […] Each […] and […] |
Raises:
| Type | Description |
|---|---|
| GenerativeEngineNotInstalledError | If RFdiffusion can't be found. |
| ValueError | For incompatible argument combinations. |
| CalledProcessError | If RFdiffusion errors out. |
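Both engines run an external process, so callers may want to separate a non-zero exit from a timeout. The sketch below shows the standard-library behaviour that the CalledProcessError entry and the timeout parameter suggest. It is an assumption that the wrapper uses subprocess.run in this way, and `run_engine` and `describe_failure` are hypothetical names.

```python
import subprocess
import sys


def run_engine(cmd, timeout=None):
    """Run a command; raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(
        cmd, check=True, capture_output=True, text=True, timeout=timeout
    )


def describe_failure(cmd, timeout=None):
    """Classify the outcome of a run as 'ok', 'exit N', or 'timeout'."""
    try:
        run_engine(cmd, timeout)
    except subprocess.CalledProcessError as exc:
        return f"exit {exc.returncode}"
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok"


# Tiny Python one-liners stand in for the real engine command line.
failing = describe_failure([sys.executable, "-c", "import sys; sys.exit(3)"])
passing = describe_failure([sys.executable, "-c", "pass"])
```

Note that, per the tables on this page, ProteinMPNN.generate reports subprocess failures as RuntimeError, whereas RFdiffusion.generate surfaces CalledProcessError, so handlers may need to catch both.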