molforge.wrappers.folding

folding

Folding-engine wrappers.

Concrete engines

`ESMFold`: implemented (single-sequence transformer; fast)
`AlphaFold`: implemented (MSA-based via ColabFold)
`Boltz`: implemented (Boltz-1 / Boltz-2 via subprocess)
`RoseTTAFold`: implemented (RoseTTAFold All-Atom; subprocess)

Shared

`FoldingEngine`: abstract base for the engine contract
`FoldingEngineNotInstalledError`: raised when heavy dependencies (torch, transformers, colabfold, the boltz CLI, the RFAA repo and databases, ...) aren't installed.

All engines write per-residue confidence to protein.metadata["confidence_per_residue"] so downstream code can read confidence uniformly regardless of which engine produced the structure.
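Because every engine fills the same keys, downstream code can stay engine-agnostic. The sketch below assumes only that `metadata` behaves like a dict and that `confidence_per_residue`, when present, holds pLDDT values on a 0 to 100 scale; `summarize_confidence` and its default threshold of 70 are hypothetical illustrations, not part of `molforge`.

```python
from collections.abc import Mapping
from typing import Any


def summarize_confidence(
    metadata: Mapping[str, Any], threshold: float = 70.0
) -> dict[str, Any]:
    """Summarize per-residue confidence identically for every engine.

    Hypothetical helper: assumes `confidence_per_residue`, when present,
    holds pLDDT values on a 0-100 scale.
    """
    scores = metadata.get("confidence_per_residue")
    if scores is None:
        # An engine that produces no confidence leaves the key unset.
        return {"engine": metadata.get("engine"), "mean": None, "low_confidence": []}
    values = [float(x) for x in scores]  # accepts lists and NumPy arrays alike
    mean = sum(values) / len(values) if values else None
    # 1-based residue positions whose pLDDT falls below the threshold.
    low = [i + 1 for i, v in enumerate(values) if v < threshold]
    return {"engine": metadata.get("engine"), "mean": mean, "low_confidence": low}
```

Called as `summarize_confidence(protein.metadata)`, the same line works whether `protein` came from ESMFold, AlphaFold, Boltz, or RoseTTAFold.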

FoldingEngine

Bases: ABC

Abstract base for sequence-to-structure prediction engines.

Subclasses must implement `predict`. The default implementation of `predict_many` is a simple loop; engines that support batching (most do) should override it for efficiency.

Attributes:

Name	Type	Description
`name`	`str`	Human-readable engine name (set by subclasses).
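The contract above can be illustrated with a self-contained toy. Because `molforge` may not be importable here, `EngineContract` and `ToyProtein` are hypothetical stand-ins that mirror the documented shape of `FoldingEngine` and `Protein`; a real engine would subclass `molforge.wrappers.folding.FoldingEngine` instead.

```python
from abc import ABC, abstractmethod
from collections.abc import Sequence
from dataclasses import dataclass, field


@dataclass
class ToyProtein:
    """Stand-in for molforge.core.Protein (hypothetical, for illustration)."""

    sequence: str
    metadata: dict[str, object] = field(default_factory=dict)


class EngineContract(ABC):
    """Stand-in mirroring the documented FoldingEngine contract."""

    name: str = "base"

    @abstractmethod
    def predict(self, sequence: str, **kwargs: object) -> ToyProtein: ...

    def predict_many(self, sequences: Sequence[str], **kwargs: object) -> list[ToyProtein]:
        # Default behaviour described above: a plain serial loop over predict().
        return [self.predict(s, **kwargs) for s in sequences]


class ConstantEngine(EngineContract):
    """Toy engine that 'predicts' a flat confidence of 50 for every residue."""

    name = "Constant"

    def predict(self, sequence: str, **kwargs: object) -> ToyProtein:
        scores = [50.0] * len(sequence)
        return ToyProtein(
            sequence=sequence,
            metadata={
                "engine": self.name,  # the one key every engine must set
                "confidence_per_residue": scores,
                "mean_confidence": 50.0,
            },
        )
```

Only `predict` is implemented; `predict_many` falls back to the inherited loop, which is exactly the behaviour batching engines are expected to override.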

predict `abstractmethod`

predict(sequence: str, **kwargs: object) -> Protein

Predict a single structure from a sequence.

Parameters:

Name	Type	Description	Default
`sequence`	`str`	One-letter amino-acid sequence. Whitespace is stripped; non-letter characters raise `ValueError`.	required
`**kwargs`	`object`	Engine-specific options.	`{}`
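The input rule for `sequence` can be sketched as follows. `normalize_sequence` is a hypothetical re-implementation, and it assumes that "whitespace is stripped" means *all* whitespace is removed (including internal line breaks from pasted FASTA), not only leading and trailing whitespace.

```python
import string

# ASCII letters only; accented or other Unicode letters are rejected.
_LETTERS = frozenset(string.ascii_letters)


def normalize_sequence(sequence: str) -> str:
    """Remove all whitespace, then reject any non-letter character.

    Hypothetical helper; molforge's own validation may differ in details.
    """
    cleaned = "".join(sequence.split())  # split() with no args drops every whitespace run
    bad = sorted({ch for ch in cleaned if ch not in _LETTERS})
    if bad:
        raise ValueError(f"non-letter characters in sequence: {bad!r}")
    return cleaned
```

A sequence such as `" MKT\nVRQ "` normalizes to `"MKTVRQ"`, while a stop-codon asterisk (`"MK*T"`) raises `ValueError`.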

Returns:

Type	Description
`Protein`	A `molforge.core.Protein` whose `metadata` includes at minimum `engine` (the engine name) and, where the engine produces a confidence estimate, `confidence_per_residue` and `mean_confidence`.

predict_many

predict_many(
    sequences: Sequence[str], **kwargs: object
) -> list[Protein]

Predict structures for a batch of sequences.

The default implementation is a serial loop. Engines with batch APIs (almost all of them) should override this.
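One common way for an override to batch efficiently, assuming the underlying model pads every sequence in a batch to the longest member, is to sort by length, batch, and then restore input order. Both helpers below are hypothetical and not part of `molforge`; an override would run the model once per index batch and pass the per-batch results to `restore_order`.

```python
from collections.abc import Sequence


def length_sorted_batches(sequences: Sequence[str], max_batch: int = 8) -> list[list[int]]:
    """Group sequence indices into batches of similar length.

    Returns index lists (not sequences) so results can be put back in order.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    # sorted() is stable, so equal-length sequences keep their input order.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    return [order[i : i + max_batch] for i in range(0, len(order), max_batch)]


def restore_order(batches: list[list[int]], results: list[list[object]]) -> list[object]:
    """Scatter per-batch results back to their original input positions."""
    total = sum(len(b) for b in batches)
    out: list[object] = [None] * total
    for idx_batch, res_batch in zip(batches, results):
        for i, r in zip(idx_batch, res_batch):
            out[i] = r
    return out
```

Sorting first keeps padding waste small; the price is that batches no longer match input order, which is why indices rather than sequences are carried through.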

FoldingEngineNotInstalledError

Bases: ImportError

Raised when a folding engine's heavy dependencies aren't installed.

The message names the relevant `pip install` extras so users can fix the problem without searching the docs.
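A minimal sketch of the pattern: check for a heavy dependency lazily and raise an `ImportError` subclass whose message carries an install hint. `EngineNotInstalled` stands in for `FoldingEngineNotInstalledError`, and the extras name passed to `require` is hypothetical; molforge's real extras may be named differently.

```python
import importlib.util


class EngineNotInstalled(ImportError):
    """Stand-in for FoldingEngineNotInstalledError (subclasses ImportError)."""


def require(module: str, extra: str) -> None:
    """Raise with an actionable install hint if top-level `module` is missing.

    `extra` is a hypothetical pip extras name used only to build the message.
    """
    if importlib.util.find_spec(module) is None:
        raise EngineNotInstalled(
            f"{module!r} is not installed; try: pip install 'molforge[{extra}]'"
        )
```

Subclassing `ImportError` means existing `except ImportError:` handlers keep working, while callers who care can catch the narrower type.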

AlphaFold

AlphaFold(
    *,
    mode: Literal["local", "server"] = "local",
    num_models: int = 5,
    num_recycles: int = 3,
    msa_mode: str = "mmseqs2_uniref_env",
    device: str | None = None,
    model_type: str = "AlphaFold2-ptm",
)

Bases: FoldingEngine

Wrapper around AlphaFold via ColabFold's Python API.

Parameters:

Name	Type	Description	Default
`mode`	`Literal['local', 'server']`	`"local"` to call ColabFold's local Python API (requires the package and weights), or `"server"` for remote prediction (not yet implemented).	`'local'`
`num_models`	`int`	How many of the five AlphaFold models to run. The default, 5, runs the full ensemble. Set to 1 for faster preview predictions; the AlphaFold paper showed that the best single model out of the five captures most of the accuracy.	`5`
`num_recycles`	`int`	AlphaFold recycling iterations. The default of 3 matches the original paper. More iterations are slower but slightly more accurate, and can help in low-confidence regions.	`3`
`msa_mode`	`str`	ColabFold MSA pipeline. `"mmseqs2_uniref_env"` (default) runs the full-quality search. `"single_sequence"` skips the MSA entirely: very fast, but less accurate (about on par with ESMFold).	`'mmseqs2_uniref_env'`
`device`	`str | None`	`"cuda"`, `"cpu"`, or `None` to auto-detect.	`None`
`model_type`	`str`	`"AlphaFold2-ptm"` (default, with pTM head) or `"AlphaFold2"` (original).	`'AlphaFold2-ptm'`
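To see why `num_models=1` is the usual preview setting, a rough count of trunk forward passes helps. The sketch assumes one initial pass plus `num_recycles` recycling passes per model and ignores MSA search, compilation, and relaxation time, which can dominate in practice; `trunk_passes` is a hypothetical helper.

```python
def trunk_passes(num_models: int = 5, num_recycles: int = 3) -> int:
    """Rough count of trunk forward passes for one AlphaFold prediction.

    Assumes one initial pass plus `num_recycles` recycles per model.
    """
    return num_models * (num_recycles + 1)


full = trunk_passes()                 # 5 * 4 = 20 passes
preview = trunk_passes(num_models=1)  # 1 * 4 = 4 passes
print(f"preview is ~{full / preview:.0f}x cheaper than the full ensemble")
```

Under these assumptions, dropping to one model cuts model compute about fivefold, while raising `num_recycles` scales cost linearly for every model that runs.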

Example

>>> from molforge.wrappers.folding import AlphaFold
>>> engine = AlphaFold(num_models=1, num_recycles=3)  # single model for a quick preview
>>> protein = engine.predict("MKTVRQERLKSIVRILERSK")
>>> protein.metadata["mean_confidence"]
87.2

predict

predict(sequence: str, **kwargs: object) -> Protein

Fold a single sequence into a `Protein`.

Parameters:

Name	Type	Description	Default
`sequence`	`str`	One-letter amino-acid sequence.	required
`**kwargs`	`object`	Reserved for future per-call options.	`{}`

Returns:

Type	Description
`Protein`	A `Protein` with:
	`metadata["engine"] = "AlphaFold"`
	`metadata["model_type"]`: which AlphaFold model was used
	`metadata["confidence_per_residue"]`: `(L,)` float32 pLDDT
	`metadata["mean_confidence"]`: float mean pLDDT
	`metadata["confidence_per_atom"]`: `(N_atoms,)` float32 pLDDT (copy of the B-factor column)
	`metadata["ptm"]` (only if `model_type="AlphaFold2-ptm"`): predicted TM-score
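pLDDT values are usually read in bands rather than as raw numbers. The sketch below uses the four bands shown by the AlphaFold Protein Structure Database (very high at 90 and above, confident from 70, low from 50, very low below 50); treating each boundary as inclusive on its lower edge is our assumption, and `plddt_band` / `band_counts` are hypothetical helpers.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def plddt_band(score: float) -> str:
    """Map a 0-100 pLDDT to an AlphaFold-DB-style confidence band."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def band_counts(per_residue: Iterable[float]) -> Counter[str]:
    """Count how many residues fall into each band."""
    return Counter(plddt_band(float(s)) for s in per_residue)
```

Pass `protein.metadata["confidence_per_residue"]` to `band_counts`. Since `ptm` exists only for `AlphaFold2-ptm`, read it with `protein.metadata.get("ptm")` rather than indexing.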

Boltz

Boltz(
    *,
    model_version: Literal["boltz1", "boltz2"] = "boltz2",
    use_msa_server: bool = True,
    recycling_steps: int | None = None,
    diffusion_samples: int | None = None,
    sampling_steps: int | None = None,
    device: str | None = None,
    executable: str | None = None,
    cache_dir: str | None = None,
)

Bases: FoldingEngine

Wrapper around the Boltz biomolecular prediction model.

Parameters:

Name	Type	Description	Default
`model_version`	`Literal['boltz1', 'boltz2']`	`"boltz1"` or `"boltz2"`. The CLI defaults to `boltz2` when available. Set explicitly for reproducibility.	`'boltz2'`
`use_msa_server`	`bool`	If `True` (default), Boltz hits the MMseqs2 MSA server for protein chains. Set `False` for fast single-sequence inference (lower accuracy, no internet required after weight download).	`True`
`recycling_steps`	`int | None`	How many trunk-recycling rounds Boltz runs. The default, `None`, lets Boltz choose (currently 3 for boltz1 and 10 for boltz2).	`None`
`diffusion_samples`	`int | None`	Number of diffusion samples drawn per prediction. The default, `None`, uses Boltz's default of 1. Higher values sample more thoroughly but run slower.	`None`
`sampling_steps`	`int | None`	Number of diffusion sampling steps. The default, `None`, uses Boltz's own default (200 for boltz1, 30 for boltz2).	`None`
`device`	`str | None`	Device to run on. The default, `None`, lets Boltz auto-detect (CUDA, falling back to CPU). Pass `"cpu"` to force CPU even when a GPU is present.	`None`
`executable`	`str | None`	Path to the `boltz` CLI binary. `None` (default) looks it up on `$PATH`. Override only for testing or non-standard installs.	`None`
`cache_dir`	`str | None`	Where Boltz looks for, or downloads, its weights. `None` (default) uses Boltz's own default (`~/.boltz`).	`None`
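Since the wrapper drives the `boltz` CLI, the constructor options presumably end up as `boltz predict` arguments. The sketch below shows one plausible mapping; `build_boltz_command` is hypothetical, and the flag names (`--model`, `--accelerator`, `--cache`, and so on) are our reading of the public boltz CLI, so confirm them with `boltz predict --help` for your installed version.

```python
from __future__ import annotations

from pathlib import Path


def build_boltz_command(
    input_path: str,
    out_dir: str,
    *,
    executable: str = "boltz",
    model_version: str | None = "boltz2",
    use_msa_server: bool = True,
    recycling_steps: int | None = None,
    diffusion_samples: int | None = None,
    sampling_steps: int | None = None,
    device: str | None = None,
    cache_dir: str | None = None,
) -> list[str]:
    """Sketch: map constructor-style options onto `boltz predict` flags.

    Flag names are assumptions; verify them against `boltz predict --help`.
    """
    cmd = [executable, "predict", input_path, "--out_dir", out_dir]
    if model_version is not None:
        cmd += ["--model", model_version]
    if use_msa_server:
        cmd.append("--use_msa_server")
    # Only forward knobs the caller set; None means "use Boltz's default".
    for flag, value in (
        ("--recycling_steps", recycling_steps),
        ("--diffusion_samples", diffusion_samples),
        ("--sampling_steps", sampling_steps),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    if device is not None:
        # Assumption: Boltz names its accelerators "gpu"/"cpu" rather than "cuda".
        cmd += ["--accelerator", "gpu" if device.startswith("cuda") else device]
    if cache_dir is not None:
        cmd += ["--cache", str(Path(cache_dir).expanduser())]
    return cmd
```

Leaving `None` options off the command line, rather than passing Boltz's current defaults explicitly, means the wrapper automatically tracks any future change in those defaults. The resulting list can be passed to `subprocess.run(cmd, check=True)`.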

Example

>>> from molforge.wrappers.folding import Boltz
>>> engine = Boltz(model_version="boltz2", use_msa_server=True)
>>> protein = engine.predict("MKTVRQERLKSIVRILERSK")
>>> protein.metadata["mean_confidence"]
87.3
>>> protein.metadata["ptm"]
0.84

predict

predict(sequence: str, **kwargs: object) -> Protein

Fold a single sequence into a `Protein` via the `boltz` CLI.

Parameters:

Name	Type	Description	Default
`sequence`	`str`	One-letter amino-acid sequence.	required
`**kwargs`	`object`	Reserved for future per-call options.	`{}`

Returns:

Type	Description
`Protein`	A `Protein` with:
	`metadata["engine"] = "Boltz"`
	`metadata["model_version"]`: `"boltz1"` or `"boltz2"`
	`metadata["source_sequence"]`: the input sequence
	`metadata["confidence_per_residue"]`: `(L,)` float32 pLDDT
	`metadata["mean_confidence"]`: scalar float mean pLDDT
	`metadata["ptm"]`: predicted TM-score
	`metadata["iptm"]`: interface pTM (only meaningful for complexes; usually 0 for single-chain inputs)
	`metadata["confidence_score"]`: Boltz's composite confidence (`0.8 * pLDDT + 0.2 * iPTM` for boltz2)
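The composite `confidence_score` formula above is simple arithmetic. The sketch implements it as documented, assuming both inputs are fractions in [0, 1]; a 0 to 100 `mean_confidence` would need dividing by 100 first. `boltz2_confidence_score` is a hypothetical helper, not a Boltz or molforge function.

```python
def boltz2_confidence_score(plddt: float, iptm: float) -> float:
    """Composite confidence as documented above: 0.8 * pLDDT + 0.2 * iPTM.

    Assumption: both inputs are fractions in [0, 1].
    """
    if not (0.0 <= plddt <= 1.0 and 0.0 <= iptm <= 1.0):
        raise ValueError("expected plddt and iptm in [0, 1]")
    return 0.8 * plddt + 0.2 * iptm
```

For example, `boltz2_confidence_score(protein.metadata["mean_confidence"] / 100, protein.metadata["iptm"])` recomputes the score from the other metadata keys, under the scale assumption above.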

Raises:

Type	Description
`FoldingEngineNotInstalledError`	If the `boltz` CLI isn't on `$PATH` (or at the configured `executable`).
`RuntimeError`	If the CLI runs but produces no output, or its output can't be parsed.
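The executable lookup described in the first row can be sketched with the standard library. `resolve_executable` is hypothetical; it raises `FileNotFoundError` where molforge raises `FoldingEngineNotInstalledError`.

```python
from __future__ import annotations

import os
import shutil


def resolve_executable(executable: str | None, default: str = "boltz") -> str:
    """Locate a CLI binary: search $PATH when `executable` is None.

    An explicit path must exist and be executable; shutil.which() checks
    such paths directly instead of searching $PATH.
    """
    candidate = executable or default
    found = shutil.which(candidate)
    if found is None:
        raise FileNotFoundError(f"{candidate!r} not found on $PATH or not executable")
    return os.path.abspath(found)
```

Resolving the binary up front lets the wrapper fail with an install error before any temporary input files are written.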

ESMFold

ESMFold(
    *,
    model_name: str = "facebook/esmfold_v1",
    device: str | None = None,
    chunk_size: int | None = None,
    dtype: str = "float32",
)

Bases: FoldingEngine

Wrapper around Meta AI's ESMFold (single-sequence transformer folder).

Parameters:

Name	Type	Description	Default
`model_name`	`str`	HuggingFace model identifier. Defaults to `"facebook/esmfold_v1"`, the public ESMFold v1 checkpoint.	`'facebook/esmfold_v1'`
`device`	`str | None`	Where to run inference: `"cuda"`, `"cpu"`, `"mps"`, or `None` to auto-detect (CUDA if available, else CPU).	`None`
`chunk_size`	`int | None`	Axial-attention chunk size; lower values use less memory but run slower. `None` disables chunking. `64` is a reasonable starting point for sequences longer than 700 residues on a 24 GB GPU.	`None`
`dtype`	`str`	`"float32"` (default) or `"float16"` for faster GPU inference at a marginal cost in accuracy.	`'float32'`
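The `device=None` behaviour ("CUDA if available, else CPU") can be sketched as below. `resolve_device` is hypothetical and only approximates the described rule: explicit values are validated and passed through, and PyTorch is imported lazily so the helper still works, falling back to CPU, when torch is absent.

```python
from __future__ import annotations


def resolve_device(device: str | None = None) -> str:
    """Pick an inference device the way the `device` parameter describes.

    Sketch only: ESMFold's real logic may differ (e.g. device indices).
    """
    allowed = {"cuda", "cpu", "mps"}
    if device is not None:
        if device not in allowed:
            raise ValueError(f"unsupported device {device!r}")
        return device
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch available: CPU is the only sensible fallback
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Note that auto-detection never picks `"mps"` in this sketch, matching the documented "CUDA if available, else CPU" rule; Apple-silicon users would pass `"mps"` explicitly.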

Example

>>> from molforge.wrappers.folding import ESMFold
>>> engine = ESMFold(device="cuda")
>>> protein = engine.predict("MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVS")
>>> protein.metadata["mean_confidence"]
82.4

predict

predict(sequence: str, **kwargs: object) -> Protein

Fold a single sequence into a `Protein`.

Parameters:

Name	Type	Description	Default
`sequence`	`str`	One-letter amino-acid sequence.	required
`**kwargs`	`object`	Reserved for future per-call options; currently unused.	`{}`

Returns:

Type	Description
`Protein`	A `Protein` with one chain (`"A"`) holding the predicted structure, and:
	`metadata["engine"] = "ESMFold"`
	`metadata["confidence_per_residue"]`: `(L,)` float32 pLDDT
	`metadata["mean_confidence"]`: float mean pLDDT
	`metadata["confidence_per_atom"]`: `(N_atoms,)` float32 pLDDT (copy of the B-factor column, for convenience)
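Because per-atom pLDDT mirrors the B-factor column, a per-residue value can be recovered from atom records. The sketch takes each residue's CA value (a common convention, assumed here rather than taken from molforge) and falls back to the residue's first atom when CA is missing; `per_residue_from_atoms` is hypothetical.

```python
def per_residue_from_atoms(
    residue_ids: list[int], atom_names: list[str], bfactors: list[float]
) -> list[float]:
    """Collapse per-atom pLDDT (B-factor column) to one value per residue.

    Uses the CA atom's value, or the first atom seen if CA is absent.
    Residues are returned in order of first appearance.
    """
    if not (len(residue_ids) == len(atom_names) == len(bfactors)):
        raise ValueError("input arrays must have equal length")
    first: dict[int, float] = {}  # dicts preserve insertion order
    ca: dict[int, float] = {}
    for rid, name, b in zip(residue_ids, atom_names, bfactors):
        first.setdefault(rid, float(b))
        if name.strip() == "CA":
            ca[rid] = float(b)
    return [ca.get(rid, val) for rid, val in first.items()]
```

Keying by residue ID rather than atom position keeps the result correct even when residues have different numbers of atoms.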

RoseTTAFold

RoseTTAFold(
    *,
    repo_dir: str | None = None,
    python_executable: str | None = None,
    max_cycle: int | None = None,
    job_name: str = "molforge_prediction",
    extra_overrides: list[str] | None = None,
)

Bases: FoldingEngine

Wrapper around RoseTTAFold All-Atom (RFAA) for single-chain protein folding.

Parameters:

Name	Type	Description	Default
`repo_dir`	`str | None`	Path to the cloned `RoseTTAFold-All-Atom` repo. If `None` (default), the wrapper falls back to the `RFAA_HOME` environment variable. If neither is set, `predict` raises `FoldingEngineNotInstalledError`.	`None`
`python_executable`	`str | None`	Path to the Python interpreter that has the RFAA environment activated. The default, `None`, uses `sys.executable` (the same Python `molforge` is running in). Override when RFAA lives in a different conda env.	`None`
`max_cycle`	`int | None`	Hydra override for `loader_params.MAXCYCLE`. The RFAA README recommends `10` for hard cases (the default is 4). `None` keeps the model default.	`None`
`job_name`	`str`	Name used for output files. Defaults to `"molforge_prediction"`.	`'molforge_prediction'`
`extra_overrides`	`list[str] | None`	Additional Hydra-style overrides (e.g. `["recycling_steps=8"]`) passed verbatim to the CLI. Use this for any RFAA config knob the wrapper doesn't expose explicitly.	`None`
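The repo lookup and override handling described above can be sketched as two small helpers. Both are hypothetical: they raise `FileNotFoundError` where molforge raises `FoldingEngineNotInstalledError`, and the `job_name=` override key is our assumption about RFAA's Hydra config. Per our reading of the RFAA README, overrides like these are appended after `python -m rf2aa.run_inference --config-name protein`.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_repo_dir(repo_dir: str | None = None) -> Path:
    """Constructor argument first, then $RFAA_HOME, as documented above."""
    raw = repo_dir or os.environ.get("RFAA_HOME")
    if not raw:
        raise FileNotFoundError("set repo_dir or the RFAA_HOME environment variable")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"RFAA repo not found at {path}")
    return path


def hydra_overrides(
    job_name: str = "molforge_prediction",
    max_cycle: int | None = None,
    extra: list[str] | None = None,
) -> list[str]:
    """Assemble Hydra `key=value` overrides for the RFAA CLI (sketch)."""
    overrides = [f"job_name={job_name}"]  # assumed config key
    if max_cycle is not None:
        overrides.append(f"loader_params.MAXCYCLE={max_cycle}")
    overrides.extend(extra or [])  # passed through verbatim, as documented
    return overrides
```

Appending `extra_overrides` last means that, if Hydra applies repeated keys in order, a caller's explicit override wins over the wrapper's own.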

Example

>>> from molforge.wrappers.folding import RoseTTAFold
>>> engine = RoseTTAFold(repo_dir="/opt/RoseTTAFold-All-Atom",
...                      max_cycle=10)
>>> protein = engine.predict("MKTVRQERLKSIVRILERSK")
>>> protein.metadata["mean_confidence"]
82.4
>>> protein.metadata["pae_inter"]  # RFAA's headline confidence
4.8

predict

predict(sequence: str, **kwargs: object) -> Protein

Fold a single sequence into a `Protein` via RFAA.

Parameters:

Name	Type	Description	Default
`sequence`	`str`	One-letter amino-acid sequence.	required
`**kwargs`	`object`	Reserved for future per-call options.	`{}`

Returns:

Type	Description
`Protein`	A `Protein` with:
	`metadata["engine"] = "RoseTTAFold"`
	`metadata["source_sequence"]`: the input sequence
	`metadata["confidence_per_residue"]`: `(L,)` float32 pLDDT
	`metadata["confidence_per_atom"]`: `(N_atoms,)` float32 pLDDT
	`metadata["mean_confidence"]`: scalar float pLDDT (0–100)
	`metadata["pae"]`: `(L, L)` predicted aligned error (populated only when the aux file is readable)
	`metadata["pae_inter"]`: scalar mean inter-frame PAE (RFAA's headline interface-quality metric; values below 10 typically indicate high quality)
	`metadata["mean_pae"]`: scalar mean PAE over the whole matrix
	`metadata["pae_prot"]`: scalar mean PAE over protein-only residues
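The scalar PAE summaries are reductions over the `(L, L)` matrix. The sketch shows the general idea with illustrative definitions: a whole-matrix mean and a mean over residue pairs on different chains. RFAA computes its own `pae_inter` and `pae_prot` internally and may define them differently (for example in how ligand atoms are handled); `pae_summaries` is hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence


def pae_summaries(
    pae: Sequence[Sequence[float]], chain_ids: Sequence[str]
) -> dict[str, float | None]:
    """Mean PAE over the full matrix and over inter-chain residue pairs."""
    n = len(chain_ids)
    if len(pae) != n or any(len(row) != n for row in pae):
        raise ValueError("pae must be an (L, L) matrix matching chain_ids")
    total = sum(sum(row) for row in pae)
    # Off-diagonal blocks: pairs (i, j) whose residues sit on different chains.
    inter = [pae[i][j] for i in range(n) for j in range(n) if chain_ids[i] != chain_ids[j]]
    return {
        "mean_pae": total / (n * n),
        # A single chain has no inter-chain pairs, so there is nothing to average.
        "inter_chain_pae": sum(inter) / len(inter) if inter else None,
    }
```

PAE is not symmetric in general, so both `(i, j)` and `(j, i)` are included rather than only the upper triangle.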

Raises:

Type	Description
`FoldingEngineNotInstalledError`	If `repo_dir` isn't set (via constructor or `RFAA_HOME`) or doesn't exist.
`RuntimeError`	If the CLI fails or produces no output.
