molforge.wrappers.generative

generative

Generative-model wrappers: backbone generation and sequence design.

Concrete engines
  • :class:RFdiffusion — RoseTTAFold diffusion for de novo protein backbone generation (Watson et al. 2023). Unconditional monomer generation, motif scaffolding, binder design.
  • :class:ProteinMPNN — message-passing neural network for protein sequence design (Dauparas et al. 2022). Given a backbone, propose sequences that should fold to it.
Shared
  • :class:GenerativeEngine — abstract base for the engine contract
  • :class:GenerativeEngineNotInstalledError — raised when the heavy dependencies aren't installed.

These engines complete the de novo design loop in molforge: combined with the folding wrappers (ESMFold, AlphaFold) and the analysis layer (structure, metrics), you can go from "I want a new protein for X" to "here are 50 candidate sequences ranked by predicted quality."

GenerativeEngine

Bases: ABC

Abstract base for generative-design engines.

Subclasses live under :mod:molforge.wrappers.generative and must implement :meth:generate. The contract is intentionally loose — different engine categories (backbone generators vs. sequence designers) return different types — but every concrete engine follows the same lazy-import / clean-error / uniform-metadata pattern as the other molforge wrappers.

Attributes:

Name Type Description
name str

Human-readable engine name (set by subclasses).

generate abstractmethod

generate(*args: object, **kwargs: object) -> list[object]

Run the engine and return a list of designs.

The return type depends on the engine:
  • Backbone generators (RFdiffusion) return list[Protein].
  • Sequence designers (ProteinMPNN) return list[DesignedSequence].

Concrete engines document their exact return type.
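
As a rough illustration of the contract described above, the sketch below defines a stand-in base class and a toy subclass. Both classes are assumptions written for this example: the stand-in only mirrors the documented `name` attribute and abstract `generate` method, and `PolyAlanineEngine` is hypothetical, not part of molforge.

```python
from abc import ABC, abstractmethod


class GenerativeEngine(ABC):
    """Stand-in for molforge's base class, reduced to the documented contract."""

    name: str = "generic"

    @abstractmethod
    def generate(self, *args: object, **kwargs: object) -> list[object]:
        """Run the engine and return a list of designs."""


class PolyAlanineEngine(GenerativeEngine):
    """Hypothetical toy sequence designer, used only for illustration."""

    name = "PolyAlanine"

    def generate(self, length: int, num_seqs: int = 1) -> list[object]:
        if length <= 0:
            raise ValueError("length must be positive")
        # Every "design" is just a poly-alanine string of the requested length.
        return ["A" * length for _ in range(num_seqs)]


engine = PolyAlanineEngine()
designs = engine.generate(length=5, num_seqs=2)
```

Because `generate` is abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement it can be constructed.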

GenerativeEngineNotInstalledError

Bases: ImportError

Raised when a generative engine's heavy dependencies aren't installed.
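
Since the exception derives from ImportError, existing `except ImportError` handlers also catch it. The sketch below shows one way such a check could look. The stand-in exception class and the `require_module` helper are assumptions for illustration; how molforge actually detects a missing engine is not documented in this section.

```python
import importlib.util


class GenerativeEngineNotInstalledError(ImportError):
    """Stand-in mirroring the documented exception (a subclass of ImportError)."""


def require_module(module_name: str, engine_name: str) -> None:
    """Raise a clear error if a top-level optional dependency cannot be found."""
    # find_spec returns None for a top-level module that is not installed.
    if importlib.util.find_spec(module_name) is None:
        raise GenerativeEngineNotInstalledError(
            f"{engine_name} requires '{module_name}', which is not installed."
        )


try:
    require_module("surely_missing_dependency_xyz", "DemoEngine")
except ImportError as exc:  # also catches the more specific subclass
    message = str(exc)
```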

ProteinMPNN

ProteinMPNN(
    *,
    proteinmpnn_dir: str | PathLike[str] | None = None,
    python_executable: str | None = None,
    model_name: str = "v_48_020",
    use_soluble_model: bool = False,
    ca_only: bool = False,
    num_seqs: int = 8,
    sampling_temp: float = 0.1,
    omit_aas: str = "X",
    seed: int = 0,
)

Bases: GenerativeEngine

ProteinMPNN sequence-design engine.

Parameters:

Name Type Description Default
proteinmpnn_dir str | PathLike[str] | None

Path to the cloned dauparas/ProteinMPNN repo. If None, reads PROTEINMPNN_HOME from the environment.

None
python_executable str | None

Python interpreter used to run ProteinMPNN. Defaults to sys.executable.

None
model_name str

Which checkpoint to use. v_48_020 is the standard default (48 edges, 0.20 Å backbone noise during training). Use a lower-noise checkpoint (v_48_002) for very high-quality backbones and a higher-noise one (v_48_030) for noisier inputs.

'v_48_020'
use_soluble_model bool

Use the soluble-protein-only checkpoint. Better for designs intended to be soluble.

False
ca_only bool

Use the CA-only checkpoint. Required if your input structure is CA-only.

False
num_seqs int

How many sequences to sample per call. ProteinMPNN samples are independent, so sampling more sequences improves the odds that at least one has high sequence recovery.

8
sampling_temp float

Sampling temperature. The paper recommends 0.1 for high-fidelity recovery, 0.2-0.3 for diversity.

0.1
omit_aas str

String of one-letter codes to omit from the generated sequences. Default "X" (don't sample the unknown token); add "C" to avoid cysteines, etc.

'X'
seed int

Random seed. A value of 0 picks a fresh random seed instead of a fixed one.

0
Example

from molforge.wrappers.generative import ProteinMPNN
from molforge.io import read_pdb

backbone = read_pdb("backbone.pdb")
engine = ProteinMPNN(num_seqs=8, sampling_temp=0.1)
designs = engine.generate(backbone)

# Each design is a DesignedSequence
best = min(designs, key=lambda d: d.score)
print(f"{best.sequence} (score {best.score:.3f})")
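
To build intuition for sampling_temp, the sketch below applies a temperature-scaled softmax to made-up scores for three residue types. It assumes the usual formulation in which logits are divided by the temperature before normalizing. The logits and the helper name are illustrative, not values or functions from ProteinMPNN or molforge.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.5, 0.5]  # made-up scores for three residue types
sharp = softmax_with_temperature(logits, 0.1)    # concentrates on the top choice
diverse = softmax_with_temperature(logits, 0.3)  # spreads more mass to alternatives
```

At the lower temperature almost all probability lands on the highest-scoring residue, which is why low values favor fidelity and higher values favor diversity.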

generate

generate(
    backbone: Protein | str | PathLike[str],
    *,
    chains_to_design: str | None = None,
    fixed_positions: dict[str, list[int]] | None = None,
    timeout: float | None = None,
) -> list[DesignedSequence]

Design sequences for backbone.

Parameters:

Name Type Description Default
backbone Protein | str | PathLike[str]

A :class:molforge.core.Protein or a path to a PDB file. The structure must have backbone atoms (N/CA/C/O); side chains are ignored.

required
chains_to_design str | None

Space-separated chain IDs to design (e.g. "A" or "A B"). If None, all chains are designed. Any other chains present serve as fixed context.

None
fixed_positions dict[str, list[int]] | None

Dict mapping chain ID to a list of 1-indexed residue positions to keep at their wild-type identity. Example: {"A": [10, 11, 12]}. Indices count from the first residue of the chain, not the PDB residue number.

None
timeout float | None

Optional subprocess timeout in seconds.

None
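
Because fixed_positions counts from the first residue of each chain rather than using PDB residue numbers, a conversion step is often needed. The helper below is a hypothetical sketch, not part of molforge. It assumes you already have the chain's PDB residue numbers in file order and that insertion codes can be ignored.

```python
def pdb_numbers_to_chain_positions(
    chain_residue_numbers: list[int],
    keep_pdb_numbers: list[int],
) -> list[int]:
    """Map PDB residue numbers to 1-indexed positions within a chain."""
    position_of = {
        number: index
        for index, number in enumerate(chain_residue_numbers, start=1)
    }
    missing = [n for n in keep_pdb_numbers if n not in position_of]
    if missing:
        raise KeyError(f"residue numbers not found in chain: {missing}")
    return sorted(position_of[n] for n in keep_pdb_numbers)


# Chain A starts at PDB residue 25 and has a gap after residue 27.
chain_a_numbers = [25, 26, 27, 30, 31, 32]
fixed_positions = {
    "A": pdb_numbers_to_chain_positions(chain_a_numbers, [26, 30, 31])
}
```

Here PDB residues 26, 30, and 31 become chain positions 2, 4, and 5, which is the form the fixed_positions argument expects.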

Returns:

Type Description
list[DesignedSequence]

A list of :class:DesignedSequence instances, sorted by score (lowest = best per ProteinMPNN's convention).
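
When several calls are pooled, the same sequence can appear more than once. The sketch below keeps the best-scoring unique sequences under the lowest-is-best convention. The `DesignedSequence` dataclass here is a stand-in with only the `sequence` and `score` fields used in the example above, and `top_unique` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignedSequence:
    """Stand-in carrying only the two fields used in this sketch."""

    sequence: str
    score: float


def top_unique(designs: list[DesignedSequence], k: int) -> list[DesignedSequence]:
    """Return up to k designs with distinct sequences, lowest score first."""
    if k <= 0:
        return []
    seen: set[str] = set()
    picked: list[DesignedSequence] = []
    for design in sorted(designs, key=lambda d: d.score):
        if design.sequence in seen:
            continue  # a better-scoring copy was already kept
        seen.add(design.sequence)
        picked.append(design)
        if len(picked) == k:
            break
    return picked


pooled = [
    DesignedSequence("MKTAY", 1.10),
    DesignedSequence("MKSAY", 0.95),
    DesignedSequence("MKTAY", 1.05),
    DesignedSequence("MRTAY", 1.30),
]
best_two = top_unique(pooled, 2)
```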

Raises:

Type Description
GenerativeEngineNotInstalledError

If ProteinMPNN isn't found.

RuntimeError

If the subprocess fails.

RFdiffusion

RFdiffusion(
    *,
    rfdiffusion_dir: str | PathLike[str] | None = None,
    python_executable: str | None = None,
    num_designs: int = 1,
    diffusion_steps: int = 50,
    device: str | None = None,
    config_name: str = "base",
)

Bases: GenerativeEngine

RFdiffusion backbone generator.

Parameters:

Name Type Description Default
rfdiffusion_dir str | PathLike[str] | None

Path to the cloned RosettaCommons/RFdiffusion repository. If None, reads RFDIFFUSION_HOME from the environment.

None
python_executable str | None

Python interpreter to invoke. Default sys.executable. Override if RFdiffusion's dependencies live in a different env.

None
num_designs int

How many backbones to generate per call.

1
diffusion_steps int

Reverse-diffusion steps (default 50, RFdiffusion's standard).

50
device str | None

"cuda", "cpu", or None to use the default.

None
config_name str

Hydra config file name. Defaults to base; use symmetry for symmetric designs.

'base'
Example

from molforge.wrappers.generative import RFdiffusion

engine = RFdiffusion(num_designs=4)

# Unconditional generation: 4 backbones of length 100
backbones = engine.generate(length=100)
len(backbones)  # 4

# Motif scaffolding from a target PDB
backbones = engine.generate(
    target_pdb="my_motif.pdb",
    contigs=["10-40/A20-35/10-40"],
)

Note

RFdiffusion outputs are backbone-only (N/CA/C/O, no side chains, all residues labeled GLY). Run :class:ProteinMPNN on each backbone to get a designable sequence.
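
The two-stage flow in the note can be sketched as a small loop that feeds each generated backbone to a sequence designer and ranks the pooled results. So that the sketch runs without the real tools, it uses fake engines: `FakeBackboneEngine`, `FakeSequenceEngine`, `FakeDesign`, and `design_candidates` are all hypothetical. With molforge installed, the real RFdiffusion and ProteinMPNN engines would presumably take their place. Ranking by design score alone is a simplification; a fuller loop would also refold and score candidates.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeDesign:
    """Stand-in for DesignedSequence with the two fields used here."""

    sequence: str
    score: float


class FakeBackboneEngine:
    """Stand-in for a backbone generator: returns placeholder labels."""

    def generate(self, *, length: int) -> list[str]:
        return [f"backbone_{i}_len{length}" for i in range(2)]


class FakeSequenceEngine:
    """Stand-in for a sequence designer: two scored designs per backbone."""

    def generate(self, backbone: Any) -> list[FakeDesign]:
        offset = 0.1 if backbone.startswith("backbone_1") else 0.0
        return [FakeDesign("MKT", 1.2 + offset), FakeDesign("MAS", 0.9 + offset)]


def design_candidates(
    backbone_engine: Any,
    sequence_engine: Any,
    *,
    length: int,
    top_k: int = 50,
) -> list[tuple[int, Any]]:
    """Pair each design with its backbone index and keep the top_k by score."""
    candidates = []
    for index, backbone in enumerate(backbone_engine.generate(length=length)):
        for design in sequence_engine.generate(backbone):
            candidates.append((index, design))
    candidates.sort(key=lambda pair: pair[1].score)  # lowest score first
    return candidates[:top_k]


ranked = design_candidates(
    FakeBackboneEngine(), FakeSequenceEngine(), length=100, top_k=3
)
```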

generate

generate(
    *,
    length: int | None = None,
    target_pdb: str | PathLike[str] | None = None,
    contigs: Sequence[str] | None = None,
    hotspot_residues: Sequence[str] | None = None,
    symmetry: str | None = None,
    extra_hydra_args: dict[str, str] | None = None,
    timeout: float | None = None,
) -> list[Protein]

Generate num_designs backbones.

Parameters:

Name Type Description Default
length int | None

Number of residues for unconditional generation. Mutually exclusive with contigs.

None
target_pdb str | PathLike[str] | None

Path to an input PDB for motif scaffolding or binder design.

None
contigs Sequence[str] | None

Hydra-style contig list, e.g. ["10-40/A20-35/10-40"] for motif scaffolding. See the RFdiffusion README for the full grammar.

None
hotspot_residues Sequence[str] | None

For binder design, residues on the target that the binder should contact (e.g. ["A32", "A33", "A34"]).

None
symmetry str | None

For symmetric design — "cyclic", "dihedral", "tetrahedral", etc. Requires config_name="symmetry" on the engine.

None
extra_hydra_args dict[str, str] | None

Additional key=value overrides passed directly to Hydra. Use this for any setting not exposed above.

None
timeout float | None

Optional subprocess timeout in seconds.

None
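
To make the contig example concrete, the sketch below computes the shortest and longest total length a simple contig string allows. It handles only two segment forms: a free range such as 10-40 and a chain-prefixed motif range such as A20-35. The full grammar in the RFdiffusion README has more options, and `contig_length_bounds` is a hypothetical helper, not a molforge or RFdiffusion function.

```python
import re

# Optional single-letter chain ID followed by an inclusive "start-end" range.
_SEGMENT = re.compile(r"^(?P<chain>[A-Za-z]?)(?P<start>\d+)-(?P<end>\d+)$")


def contig_length_bounds(contig: str) -> tuple[int, int]:
    """Return (min, max) total residues for a simple contig string."""
    low = high = 0
    for segment in contig.split("/"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported contig segment: {segment!r}")
        start, end = int(match["start"]), int(match["end"])
        if end < start:
            raise ValueError(f"range end before start in segment: {segment!r}")
        if match["chain"]:
            # Motif copied from the target: fixed length, inclusive range.
            span = end - start + 1
            low += span
            high += span
        else:
            # Free segment: the generated length falls between start and end.
            low += start
            high += end
    return low, high


bounds = contig_length_bounds("10-40/A20-35/10-40")
```

For the documented example, the motif A20-35 contributes 16 residues and each flanking segment contributes 10 to 40, so the design is between 36 and 96 residues long.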

Returns:

Type Description
list[Protein]

List of :class:Protein instances, one per design. Each Protein has metadata["engine"] = "RFdiffusion" and metadata["source_args"] recording the call.

Raises:

Type Description
GenerativeEngineNotInstalledError

If RFdiffusion can't be found.

ValueError

For incompatible argument combinations.

CalledProcessError

If RFdiffusion errors out.
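
The argument constraints documented above can be checked before launching a long run. The function below is a hypothetical pre-flight check, not molforge's own validation. It covers the two documented rules (length and contigs are mutually exclusive; symmetry requires config_name="symmetry") and adds a positivity check on length, which is an assumption.

```python
from collections.abc import Sequence
from typing import Optional


def check_generate_args(
    *,
    length: Optional[int] = None,
    contigs: Optional[Sequence[str]] = None,
    symmetry: Optional[str] = None,
    config_name: str = "base",
    target_pdb: Optional[str] = None,
) -> None:
    """Raise ValueError for argument combinations the docs describe as invalid."""
    # Documented: length is mutually exclusive with contigs.
    if length is not None and contigs is not None:
        raise ValueError("length and contigs are mutually exclusive")
    # Assumption: a requested length must be a positive residue count.
    if length is not None and length <= 0:
        raise ValueError("length must be positive")
    # Documented: symmetric design requires the symmetry config on the engine.
    if symmetry is not None and config_name != "symmetry":
        raise ValueError('symmetry requires config_name="symmetry"')


check_generate_args(length=100)
check_generate_args(contigs=["10-40/A20-35/10-40"], target_pdb="my_motif.pdb")
```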