
File I/O

molforge.io provides one-line load/save for the formats structural biology actually uses, with extension-based dispatch so you don't have to remember per-format function names.

The three entry points

from molforge.io import load, save, fetch
  • load(path) — read a structure or sequence from disk. Dispatches on extension.
  • save(obj, path) — write a structure or sequence. Dispatches on extension.
  • fetch(pdb_id) — download a structure from the RCSB PDB into a Protein.
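
As a rough illustration of what "dispatches on extension" can mean, here is a minimal sketch of a suffix-to-reader registry. It is not molforge's implementation: `register_reader`, `load_sketch`, and the placeholder readers are hypothetical names invented for this example, and the readers return a tag string instead of parsing anything.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: lowercase file suffix -> reader function.
_READERS: Dict[str, Callable[[Path], str]] = {}


def register_reader(*suffixes: str):
    """Register one reader under one or more suffixes (aliases share a reader)."""
    def decorator(func: Callable[[Path], str]) -> Callable[[Path], str]:
        for suffix in suffixes:
            _READERS[suffix.lower()] = func
        return func
    return decorator


@register_reader(".pdb")
def _read_pdb(path: Path) -> str:
    return f"pdb:{path.name}"  # placeholder; a real reader would parse the file


@register_reader(".cif", ".mmcif")
def _read_cif(path: Path) -> str:
    return f"cif:{path.name}"  # placeholder


@register_reader(".fasta", ".fa")
def _read_fasta(path: Path) -> str:
    return f"fasta:{path.name}"  # placeholder


def load_sketch(path) -> str:
    """Pick a reader from the file suffix, ignoring case."""
    path = Path(path)
    suffix = path.suffix.lower()
    try:
        reader = _READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension: {suffix!r}") from None
    return reader(path)
```

With a registry like this, aliases such as .fa and .mmcif cost one extra entry each, and an unknown extension fails early with a clear error instead of being parsed with the wrong reader.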

Supported formats

Extension Load Save Notes
.pdb Full PDB parser including altlocs, multi-model.
.cif mmCIF; uses the biotite backend, available with the [io] extra.
.mmcif Alias for .cif.
.fasta One record per chain.
.fa Alias for .fasta.
.mol2 Small molecules, ligands.
.sdf Small molecules, ligands.
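
To make "one record per chain" concrete, the sketch below formats a mapping of chain IDs to sequences as FASTA text. The function name `chains_to_fasta` and the `>name_chain` header layout are assumptions made for illustration; the headers molforge actually writes may differ.

```python
from typing import Dict


def chains_to_fasta(name: str, chains: Dict[str, str], width: int = 60) -> str:
    """Format each chain as its own FASTA record, wrapping sequence lines."""
    records = []
    for chain_id, sequence in chains.items():
        header = f">{name}_{chain_id}"  # assumed header layout
        # Split the sequence into fixed-width lines.
        lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append("\n".join([header, *lines]))
    return "\n".join(records) + "\n"


print(chains_to_fasta("1abc", {"A": "MKTAYIAK", "B": "GSHMLE"}), end="")
```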

AlphaFold-aware loading

AlphaFold PDB output stores per-residue pLDDT in the B-factor column. load_alphafold recognizes this and stores the values in protein.metadata["confidence_per_residue"] instead of silently mixing them with structural B-factors:

from molforge.io import load_alphafold, is_alphafold_pdb

if is_alphafold_pdb("AF-Q9Y6K9-F1-model_v4.pdb"):
    protein = load_alphafold("AF-Q9Y6K9-F1-model_v4.pdb")
    confidence = protein.metadata["confidence_per_residue"]

load itself does not auto-detect AlphaFold output, because doing so would require sniffing file contents on every call. Use the explicit loader when you know you're working with AlphaFold predictions.
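
For readers curious where those values live in the file, here is a minimal sketch that pulls per-residue pLDDT out of raw PDB text without molforge. It assumes standard fixed-column ATOM records (temperature factor in columns 61-66) and reads one value per residue from the CA atom; `plddt_per_residue` is a hypothetical helper, not part of molforge.io.

```python
from typing import Dict, Tuple


def plddt_per_residue(pdb_text: str) -> Dict[Tuple[str, int], float]:
    """Map (chain ID, residue number) to the B-factor field of each CA atom."""
    scores: Dict[Tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        chain_id = line[21]              # chain ID, column 22
        res_seq = int(line[22:26])       # residue number, columns 23-26
        scores[(chain_id, res_seq)] = float(line[60:66])  # columns 61-66
    return scores
```

In an AlphaFold model the returned numbers are confidence scores on a 0 to 100 scale, whereas the same code run on an experimental structure returns ordinary B-factors. That ambiguity is why keeping the values under a separate metadata key is useful.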

Altloc handling

PDB altloc records (alternate conformations of the same atoms, most often side chains) are preserved by default. Pass altloc= to choose how to resolve them at load time:

from molforge.io import load_pdb

p = load_pdb("with_altlocs.pdb", altloc="first")      # default
p = load_pdb("with_altlocs.pdb", altloc="highest")    # by occupancy
p = load_pdb("with_altlocs.pdb", altloc="all")        # keep all rows
p = load_pdb("with_altlocs.pdb", altloc="A")          # by label
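
The four modes can be pictured with a small stand-alone sketch. It is an assumption about the semantics, not molforge's code: `AtomRow` and `resolve_altlocs` are hypothetical names, and selection is done per atom, whereas a real loader may choose one conformer per residue to keep the geometry consistent.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class AtomRow:
    chain: str
    res_seq: int
    name: str
    altloc: str       # "" when the atom has no alternate location
    occupancy: float


def resolve_altlocs(atoms: Iterable[AtomRow], mode: str = "first") -> List[AtomRow]:
    """Resolve alternate locations; mode is "first", "highest", "all", or a label."""
    atoms = list(atoms)
    if mode == "all":
        return atoms  # keep every row untouched

    # Group rows that describe the same atom, preserving file order.
    groups: Dict[Tuple[str, int, str], List[AtomRow]] = {}
    for atom in atoms:
        groups.setdefault((atom.chain, atom.res_seq, atom.name), []).append(atom)

    resolved: List[AtomRow] = []
    for rows in groups.values():
        if mode == "first":
            resolved.append(rows[0])
        elif mode == "highest":
            # max() returns the first row on ties, so file order breaks ties.
            resolved.append(max(rows, key=lambda row: row.occupancy))
        else:
            # Treat mode as an altloc label; atoms without altlocs always pass.
            resolved.extend(row for row in rows if row.altloc in ("", mode))
    return resolved
```

For example, given a CB atom with altloc A at occupancy 0.4 and altloc B at occupancy 0.6, "first" keeps A, "highest" keeps B, and "all" keeps both rows.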

See molforge.io for the full set of options and per-format hooks.
