molforge.wrappers.folding

folding

Folding-engine wrappers.

Concrete engines
  • ESMFold: implemented (single-sequence transformer; fast)
  • AlphaFold: implemented (MSA-based via ColabFold)
  • Boltz: implemented (Boltz-1 / Boltz-2 via subprocess)
  • RoseTTAFold: implemented (RoseTTAFold All-Atom; subprocess)
Shared
  • FoldingEngine: abstract base for the engine contract
  • FoldingEngineNotInstalledError: raised when heavy dependencies (torch, transformers, colabfold, the boltz CLI, the RFAA repo + databases, ...) aren't installed.

All engines write per-residue confidence to protein.metadata["confidence_per_residue"] so downstream code can read confidence uniformly regardless of which engine produced the structure.
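As a rough illustration of what this uniform contract allows, the sketch below flags low-confidence residues from a metadata mapping without knowing which engine filled it in. The low_confidence_residues helper and the cutoff of 70 are assumptions made for illustration, not part of molforge; a plain dict stands in for protein.metadata.

```python
from collections.abc import Mapping


def low_confidence_residues(
    metadata: Mapping[str, object], cutoff: float = 70.0
) -> list[int]:
    """Return 0-based indices of residues whose confidence is below `cutoff`."""
    scores = metadata.get("confidence_per_residue")
    if scores is None:
        # Be defensive: treat a missing key as "nothing to flag".
        return []
    return [i for i, score in enumerate(scores) if score < cutoff]


# Stand-in for protein.metadata from any engine (values are made up).
example_metadata = {
    "engine": "ESMFold",
    "confidence_per_residue": [91.2, 88.0, 64.5, 47.3, 79.9],
}
flagged = low_confidence_residues(example_metadata)
```

Because the helper only touches the shared key, the same call should work for a structure from any of the four engines.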

FoldingEngine

Bases: ABC

Abstract base for sequence-to-structure prediction engines.

Subclasses must implement predict. The default implementation of predict_many is a simple serial loop; engines that support batching (most do) should override it for efficiency.

Attributes:

Name Type Description
name str

Human-readable engine name (set by subclasses).

predict abstractmethod

predict(sequence: str, **kwargs: object) -> Protein

Predict a single structure from a sequence.

Parameters:

Name Type Description Default
sequence str

One-letter amino-acid sequence. Whitespace is stripped; non-letter characters raise ValueError.

required
**kwargs object

Engine-specific options.

{}

Returns:

Type Description
Protein

A molforge.core.Protein whose metadata includes at minimum engine (the engine name) and, where the engine produces one, confidence_per_residue and mean_confidence.
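The input rule for sequence (whitespace stripped, non-letter characters rejected) can be sketched as a small normalizer. This is a minimal sketch, not molforge's actual validation code: the normalize_sequence name, the upper-casing, the ASCII-only check, and the empty-input check are all assumptions.

```python
def normalize_sequence(sequence: str) -> str:
    """Strip all whitespace and reject anything that is not an ASCII letter."""
    cleaned = "".join(sequence.split())  # drops spaces, tabs, and newlines
    if not cleaned:
        raise ValueError("sequence is empty after stripping whitespace")
    bad = sorted({ch for ch in cleaned if not (ch.isascii() and ch.isalpha())})
    if bad:
        raise ValueError(f"non-letter characters in sequence: {bad}")
    # Upper-casing is an assumption; the contract above only mentions stripping.
    return cleaned.upper()
```

Validating before dispatching to an engine keeps a malformed sequence from reaching an expensive model load or subprocess call.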

predict_many

predict_many(
    sequences: Sequence[str], **kwargs: object
) -> list[Protein]

Predict structures for a batch of sequences.

The default implementation is a serial loop. Engines with batch APIs (almost all of them) should override this.
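The shape of this contract (one abstract predict, one default serial predict_many) can be sketched with stand-in classes. Everything below is illustrative: the Protein dataclass is a placeholder for molforge.core.Protein, and EchoEngine is a hypothetical toy subclass that produces no coordinates.

```python
from abc import ABC, abstractmethod
from collections.abc import Sequence
from dataclasses import dataclass, field


@dataclass
class Protein:
    """Placeholder for molforge.core.Protein (illustration only)."""

    sequence: str
    metadata: dict = field(default_factory=dict)


class FoldingEngine(ABC):
    name: str = "base"

    @abstractmethod
    def predict(self, sequence: str, **kwargs: object) -> Protein:
        """Predict a single structure; subclasses must implement this."""

    def predict_many(
        self, sequences: Sequence[str], **kwargs: object
    ) -> list[Protein]:
        # Default behaviour: a serial loop over predict().
        return [self.predict(seq, **kwargs) for seq in sequences]


class EchoEngine(FoldingEngine):
    """Toy engine: records the engine name and nothing else."""

    name = "Echo"

    def predict(self, sequence: str, **kwargs: object) -> Protein:
        return Protein(sequence, {"engine": self.name})


results = EchoEngine().predict_many(["MKT", "VRQ"])
```

An engine with a real batch API would override predict_many to submit all sequences in one call rather than inherit the loop.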

FoldingEngineNotInstalledError

Bases: ImportError

Raised when a folding engine's heavy dependencies aren't installed.

The message points at the relevant pip install extras so users can fix it without grepping the docs.
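One way such an error can be produced is a lazy-import guard, sketched below under stated assumptions: the class here is a local stand-in that mirrors the documented base (ImportError), require_module is a hypothetical helper, and the extras name in the message is made up.

```python
import importlib


class FoldingEngineNotInstalledError(ImportError):
    """Local stand-in; the documented class also derives from ImportError."""


def require_module(module_name: str, extra: str):
    """Import `module_name`, or raise with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The extras name below is hypothetical, for illustration only.
        raise FoldingEngineNotInstalledError(
            f"{module_name!r} is not installed; "
            f"try: pip install 'molforge[{extra}]'"
        ) from exc


try:
    require_module("surely_not_a_real_module_xyz", extra="esmfold")
except ImportError as err:  # the subclass is caught by its base class
    message = str(err)
```

Since the error derives from ImportError, existing `except ImportError` handlers catch it without modification.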

AlphaFold

AlphaFold(
    *,
    mode: Literal["local", "server"] = "local",
    num_models: int = 5,
    num_recycles: int = 3,
    msa_mode: str = "mmseqs2_uniref_env",
    device: str | None = None,
    model_type: str = "AlphaFold2-ptm",
)

Bases: FoldingEngine

Wrapper around AlphaFold via ColabFold's Python API.

Parameters:

Name Type Description Default
mode Literal['local', 'server']

"local" to call ColabFold's local Python API (requires the package and weights), or "server" for remote prediction (not yet implemented).

'local'
num_models int

How many of the 5 AlphaFold models to run. Default 5 (full ensemble). Set to 1 for faster preview predictions; a single model typically recovers most of the ensemble's accuracy.

5
num_recycles int

AlphaFold recycling iterations. Default 3 matches the original paper. More = slower but slightly better; useful for low-confidence regions.

3
msa_mode str

ColabFold MSA pipeline. "mmseqs2_uniref_env" (default) is the full-quality search. "single_sequence" skips MSA entirely (very fast but lower accuracy — about on par with ESMFold).

'mmseqs2_uniref_env'
device str | None

"cuda", "cpu", or None to auto-detect.

None
model_type str

"AlphaFold2-ptm" (default, with pTM head) or "AlphaFold2" (original).

'AlphaFold2-ptm'
Example

>>> from molforge.wrappers.folding import AlphaFold
>>> engine = AlphaFold(num_models=1, num_recycles=3)  # fastest preview
>>> protein = engine.predict("MKTVRQERLKSIVRILERSK")
>>> protein.metadata["mean_confidence"]
87.2

predict

predict(sequence: str, **kwargs: object) -> Protein

Fold a single sequence into a Protein.

Parameters:

Name Type Description Default
sequence str

One-letter amino-acid sequence.

required
**kwargs object

Reserved for future per-call options.

{}

Returns:

Type Description
Protein

A Protein with:
  • metadata["engine"] = "AlphaFold"
  • metadata["model_type"]: which AlphaFold model was used
  • metadata["confidence_per_residue"]: (L,) float32 pLDDT
  • metadata["mean_confidence"]: float mean pLDDT
  • metadata["confidence_per_atom"]: (N_atoms,) float32 pLDDT (copy of B-factor column)
  • metadata["ptm"] (if model_type="AlphaFold2-ptm"): predicted TM score
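To make the pLDDT fields above easier to interpret, the sketch below bins per-residue scores into the bands commonly used by the AlphaFold Protein Structure Database (above 90, 70 to 90, 50 to 70, below 50) and recomputes the mean. The helper names, the band labels, and the handling of exact boundary values are assumptions; a plain list stands in for the float32 array.

```python
from collections import Counter


def plddt_band(score: float) -> str:
    """Map a pLDDT value (0-100) to a coarse confidence band."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def summarize_plddt(confidence_per_residue) -> tuple[float, dict[str, int]]:
    """Return (mean pLDDT, residue count per band)."""
    scores = [float(s) for s in confidence_per_residue]
    if not scores:
        raise ValueError("no per-residue scores to summarize")
    bands = Counter(plddt_band(s) for s in scores)
    return sum(scores) / len(scores), dict(bands)


# Made-up scores standing in for metadata["confidence_per_residue"].
mean_plddt, band_counts = summarize_plddt([95.0, 85.0, 65.0, 45.0, 75.0])
```

A band histogram is often more informative than the mean alone, since a high mean can hide a disordered loop or terminus.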

Boltz

Boltz(
    *,
    model_version: Literal["boltz1", "boltz2"] = "boltz2",
    use_msa_server: bool = True,
    recycling_steps: int | None = None,
    diffusion_samples: int | None = None,
    sampling_steps: int | None = None,
    device: str | None = None,
    executable: str | None = None,
    cache_dir: str | None = None,
)

Bases: FoldingEngine

Wrapper around the Boltz biomolecular prediction model.

Parameters:

Name Type Description Default
model_version Literal['boltz1', 'boltz2']

"boltz1" or "boltz2". The CLI defaults to boltz2 when available. Set explicitly for reproducibility.

'boltz2'
use_msa_server bool

If True (default), Boltz hits the MMseqs2 MSA server for protein chains. Set False for fast single-sequence inference (lower accuracy, no internet required after weight download).

True
recycling_steps int | None

How many trunk-recycling rounds Boltz runs. Default None lets Boltz choose its own (3 for boltz1, 10 for boltz2 currently).

None
diffusion_samples int | None

Number of diffusion samples drawn per prediction. Default None uses Boltz's default (1). Higher = more thorough sampling, slower.

None
sampling_steps int | None

Number of diffusion sampling steps. Default None uses Boltz's own (200 for boltz1, 30 for boltz2).

None
device str | None

Which device to use. Default None lets Boltz auto-detect (CUDA → CPU fallback). Pass "cpu" to force CPU even when a GPU is present.

None
executable str | None

Path to the boltz CLI binary. None (default) means look it up on $PATH. Override only for testing or non-standard installs.

None
cache_dir str | None

Where Boltz looks for / downloads its weights. None (default) uses Boltz's own default (~/.boltz).

None
Example

>>> from molforge.wrappers.folding import Boltz
>>> engine = Boltz(model_version="boltz2", use_msa_server=True)
>>> protein = engine.predict("MKTVRQERLKSIVRILERSK")
>>> protein.metadata["mean_confidence"]
87.3
>>> protein.metadata["ptm"]
0.84
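The executable lookup described above (an explicit path if given, otherwise a search of $PATH) might look roughly like the following. This is a sketch, not the wrapper's code: resolve_executable is a hypothetical helper, and the error class is a local stand-in for FoldingEngineNotInstalledError.

```python
import shutil
from typing import Optional


class FoldingEngineNotInstalledError(ImportError):
    """Local stand-in for the documented error type."""


def resolve_executable(
    executable: Optional[str] = None, default_name: str = "boltz"
) -> str:
    """Return the path to a runnable CLI binary or raise."""
    candidate = executable if executable is not None else default_name
    # shutil.which searches $PATH for bare names and checks explicit
    # paths for existence and the executable bit.
    found = shutil.which(candidate)
    if found is None:
        raise FoldingEngineNotInstalledError(
            f"could not find an executable for {candidate!r}"
        )
    return found
```

Resolving the binary before building the command line lets a missing install fail fast with a clear error instead of surfacing as a subprocess failure.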

predict

predict(sequence: str, **kwargs: object) -> Protein

Fold a single sequence into a Protein via the boltz CLI.

Parameters:

Name Type Description Default
sequence str

One-letter amino-acid sequence.

required
**kwargs object

Reserved for future per-call options.

{}

Returns:

Type Description
Protein

A Protein with:
  • metadata["engine"] = "Boltz"
  • metadata["model_version"]: "boltz1" or "boltz2"
  • metadata["source_sequence"]: the input sequence
  • metadata["confidence_per_residue"]: (L,) float32 pLDDT
  • metadata["mean_confidence"]: scalar float pLDDT
  • metadata["ptm"]: predicted TM-score
  • metadata["iptm"]: interface pTM (only meaningful for complexes; usually 0 for single-chain inputs)
  • metadata["confidence_score"]: Boltz's composite confidence (0.8 * pLDDT + 0.2 * iPTM for boltz2)

Raises:

Type Description
FoldingEngineNotInstalledError

If the boltz CLI isn't on $PATH (or at the configured executable).

RuntimeError

If the CLI runs but produces no output, or its output can't be parsed.
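The composite confidence_score listed under Returns is a weighted sum, which can be reproduced by hand. The sketch below assumes both inputs are on a 0 to 1 scale; if pLDDT is stored on a 0 to 100 scale (as the mean_confidence example suggests), it would need dividing by 100 first. The function name is hypothetical.

```python
def boltz_confidence_score(plddt: float, iptm: float) -> float:
    """Compute 0.8 * pLDDT + 0.2 * iPTM, with both inputs in [0, 1]."""
    for label, value in (("plddt", plddt), ("iptm", iptm)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{label} must be in [0, 1], got {value}")
    return 0.8 * plddt + 0.2 * iptm


# Made-up inputs: a confident fold with a mediocre interface score.
score = boltz_confidence_score(plddt=0.9, iptm=0.5)
```

With a 0.2 weight on iPTM, a single-chain prediction whose iPTM is near 0 is capped at roughly 0.8 under this formula, which is worth keeping in mind when comparing monomer and complex scores.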

ESMFold

ESMFold(
    *,
    model_name: str = "facebook/esmfold_v1",
    device: str | None = None,
    chunk_size: int | None = None,
    dtype: str = "float32",
)

Bases: FoldingEngine

Wrapper around Meta AI's ESMFold (single-sequence transformer folder).

Parameters:

Name Type Description Default
model_name str

HuggingFace model identifier. Defaults to "facebook/esmfold_v1", the public ESMFold v1 checkpoint.

'facebook/esmfold_v1'
device str | None

Where to run inference. "cuda", "cpu", "mps", or None to auto-detect (CUDA if available, else CPU).

None
chunk_size int | None

Axial-attention chunk size (lower = less memory but slower). None for no chunking. 64 is a reasonable default for sequences > 700 aa on a 24 GB GPU.

None
dtype str

"float32" (default) or "float16" for faster GPU inference at the cost of marginal accuracy.

'float32'
Example

>>> from molforge.wrappers.folding import ESMFold
>>> engine = ESMFold(device="cuda")
>>> protein = engine.predict("MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVS")
>>> protein.metadata["mean_confidence"]
82.4
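The chunk_size guidance above (no chunking by default, 64 for sequences longer than about 700 residues on a 24 GB GPU) can be turned into a small selection rule. This is only a heuristic sketch: suggest_chunk_size is a hypothetical helper, and the right threshold depends on the GPU actually in use.

```python
from typing import Optional


def suggest_chunk_size(
    sequence_length: int, long_threshold: int = 700
) -> Optional[int]:
    """Return 64 for long sequences, otherwise None (no chunking)."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    # Chunking trades speed for memory, so only enable it when needed.
    return 64 if sequence_length > long_threshold else None
```

The result could then be passed as the chunk_size constructor argument, for example ESMFold(chunk_size=suggest_chunk_size(len(sequence))).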

predict

predict(sequence: str, **kwargs: object) -> Protein

Fold a single sequence into a Protein.

Parameters:

Name Type Description Default
sequence str

One-letter amino-acid sequence.

required
**kwargs object

Reserved for future per-call options; currently unused.

{}

Returns:

Type Description
Protein

A Protein with one chain ("A"), the predicted structure, and:
  • metadata["engine"] = "ESMFold"
  • metadata["confidence_per_residue"]: (L,) float32 pLDDT
  • metadata["mean_confidence"]: float mean pLDDT
  • metadata["confidence_per_atom"]: (N_atoms,) float32 pLDDT (copy of B-factor column for convenience)

RoseTTAFold

RoseTTAFold(
    *,
    repo_dir: str | None = None,
    python_executable: str | None = None,
    max_cycle: int | None = None,
    job_name: str = "molforge_prediction",
    extra_overrides: list[str] | None = None,
)

Bases: FoldingEngine

Wrapper around RoseTTAFold All-Atom (RFAA) for single-chain protein folding.

Parameters:

Name Type Description Default
repo_dir str | None

Path to the cloned RoseTTAFold-All-Atom repo. If None (default), the wrapper looks at the RFAA_HOME environment variable. If neither is set, predict raises FoldingEngineNotInstalledError.

None
python_executable str | None

Path to the Python interpreter of the environment in which RFAA is installed. Default None uses sys.executable (the same Python molforge is running in). Override when RFAA lives in a different conda env.

None
max_cycle int | None

Hydra override for loader_params.MAXCYCLE. The RFAA README recommends 10 for hard cases (default is 4). None keeps the model default.

None
job_name str

Name used for output files. Defaults to "molforge_prediction".

'molforge_prediction'
extra_overrides list[str] | None

Additional Hydra-style overrides (e.g. ["recycling_steps=8"]) passed verbatim to the CLI. Use this for any RFAA config knob the wrapper doesn't expose explicitly.

None
Example

>>> from molforge.wrappers.folding import RoseTTAFold
>>> engine = RoseTTAFold(repo_dir="/opt/RoseTTAFold-All-Atom",
...                      max_cycle=10)
>>> protein = engine.predict("MKTVRQERLKSIVRILERSK")
>>> protein.metadata["mean_confidence"]
82.4
>>> protein.metadata["pae_inter"]  # RFAA's headline confidence
4.8
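How max_cycle and extra_overrides could combine into one Hydra override list is sketched below. Only the loader_params.MAXCYCLE key comes from the parameter description above; the build_overrides helper is hypothetical, and job_name is left out because its mapping to an RFAA config key is not stated here.

```python
from typing import Optional


def build_overrides(
    max_cycle: Optional[int] = None,
    extra_overrides: Optional[list[str]] = None,
) -> list[str]:
    """Assemble Hydra-style key=value overrides in a stable order."""
    overrides: list[str] = []
    if max_cycle is not None:
        # None means "keep the model default", so emit nothing.
        overrides.append(f"loader_params.MAXCYCLE={max_cycle}")
    # Extra overrides are passed through verbatim, after the built-in ones.
    overrides.extend(extra_overrides or [])
    return overrides


args = build_overrides(max_cycle=10, extra_overrides=["some.key=value"])
```

Placing the verbatim extras last means a caller-supplied override for the same key would appear later in the list; whether that takes precedence depends on how the CLI resolves repeated overrides.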

predict

predict(sequence: str, **kwargs: object) -> Protein

Fold a single sequence into a Protein via RFAA.

Parameters:

Name Type Description Default
sequence str

One-letter amino-acid sequence.

required
**kwargs object

Reserved for future per-call options.

{}

Returns:

Type Description
Protein

A Protein with:
  • metadata["engine"] = "RoseTTAFold"
  • metadata["source_sequence"]: the input sequence
  • metadata["confidence_per_residue"]: (L,) float32 pLDDT
  • metadata["confidence_per_atom"]: (N_atoms,) float32 pLDDT
  • metadata["mean_confidence"]: scalar float pLDDT (0–100)
  • metadata["pae"]: (L, L) predicted aligned error (only populated when the aux file is readable)
  • metadata["pae_inter"]: scalar mean inter-frame PAE (RFAA's headline interface-quality metric; < 10 typically indicates high quality)
  • metadata["mean_pae"]: scalar mean PAE over the matrix
  • metadata["pae_prot"]: scalar mean PAE over protein-only residues

Raises:

Type Description
FoldingEngineNotInstalledError

If repo_dir isn't set (via constructor or RFAA_HOME) or doesn't exist.

RuntimeError

If the CLI fails or produces no output.
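The repo_dir resolution order implied by the first error above (constructor argument, then RFAA_HOME, then failure) can be sketched as follows. This is an assumption-laden illustration: resolve_rfaa_repo is a hypothetical helper, the error class is a local stand-in, and the env parameter exists only to make the function easy to test.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


class FoldingEngineNotInstalledError(ImportError):
    """Local stand-in for the documented error type."""


def resolve_rfaa_repo(
    repo_dir: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the RFAA checkout: explicit argument first, then RFAA_HOME."""
    env = os.environ if env is None else env
    chosen = repo_dir or env.get("RFAA_HOME")
    if not chosen:
        raise FoldingEngineNotInstalledError(
            "RFAA repo not configured: pass repo_dir or set RFAA_HOME"
        )
    path = Path(chosen).expanduser()
    if not path.is_dir():
        raise FoldingEngineNotInstalledError(f"RFAA repo not found at {path}")
    return path
```

Callers that want to fall back to another engine can catch FoldingEngineNotInstalledError separately from RuntimeError, which signals that RFAA was found but the run itself failed.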