File I/O
molforge.io provides one-line load/save for the formats structural
biology actually uses, with extension-based dispatch so you don't
have to remember per-format function names.
The three entry points
- `load(path)`: read a structure or sequence from disk. Dispatches on extension.
- `save(obj, path)`: write a structure or sequence to disk. Dispatches on extension.
- `fetch(pdb_id)`: download a structure from the RCSB PDB into a `Protein`.
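`fetch` downloads from the RCSB PDB. The sketch below shows one plausible way to build the download URL; it is not molforge's implementation. It assumes the public `https://files.rcsb.org/download/` endpoint and classic four-character PDB IDs. The names `rcsb_download_url` and `download_structure_text` are hypothetical, and since the format `fetch` actually requests is not stated here, the `"cif"` default is also an assumption.

```python
import re
import urllib.request

# Hypothetical helpers, not part of molforge.io.
# Classic PDB IDs: a digit 1-9 followed by three alphanumeric characters.
_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

# Public RCSB file-download endpoint (assumed to be what a fetch would hit).
_RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.{fmt}"


def rcsb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Return the RCSB download URL for a classic four-character PDB ID."""
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a classic four-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError(f"unsupported download format: {fmt!r}")
    return _RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper(), fmt=fmt)


def download_structure_text(pdb_id: str, fmt: str = "cif", timeout: float = 30.0) -> str:
    """Download the raw file contents as text (requires network access)."""
    with urllib.request.urlopen(rcsb_download_url(pdb_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `rcsb_download_url("1crn")` returns `"https://files.rcsb.org/download/1CRN.cif"`. Turning the downloaded text into a `Protein` is left to the library.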
Supported formats
| Extension | Load | Save | Notes |
|---|---|---|---|
| .pdb | ✓ | ✓ | Full PDB parser including altlocs, multi-model. |
| .cif | ✓ | ✓ | mmCIF; biotite backend with [io] extra. |
| .mmcif | ✓ | ✓ | Alias for .cif. |
| .fasta | ✓ | ✓ | One record per chain. |
| .fa | ✓ | ✓ | Alias for .fasta. |
| .mol2 | ✓ | ✗ | Small molecules, ligands. |
| .sdf | ✓ | ✗ | Small molecules, ligands. |
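Extension-based dispatch amounts to a lookup from file suffix to a canonical format, with aliases collapsed onto one entry. The sketch below encodes the supported-formats table as data. `resolve_format` and the case-insensitive suffix matching are assumptions for illustration, not molforge.io's actual internals.

```python
from pathlib import Path

# Hypothetical registry mirroring the supported-formats table.
# .mmcif and .fa are aliases that resolve to the same canonical format.
_EXT_TO_FORMAT = {
    ".pdb": "pdb",
    ".cif": "cif",
    ".mmcif": "cif",
    ".fasta": "fasta",
    ".fa": "fasta",
    ".mol2": "mol2",
    ".sdf": "sdf",
}

# Formats that can be written; mol2 and sdf are load-only.
_SAVEABLE = {"pdb", "cif", "fasta"}


def resolve_format(path: str, mode: str = "load") -> str:
    """Map a file path to a canonical format name, checking load/save support."""
    if mode not in ("load", "save"):
        raise ValueError(f"mode must be 'load' or 'save', got {mode!r}")
    suffix = Path(path).suffix.lower()  # assumption: extensions match case-insensitively
    fmt = _EXT_TO_FORMAT.get(suffix)
    if fmt is None:
        raise ValueError(f"unrecognized extension {suffix!r} for {path!r}")
    if mode == "save" and fmt not in _SAVEABLE:
        raise ValueError(f"{fmt} files can be loaded but not saved")
    return fmt
```

Keeping the aliases and the load/save matrix in one table means adding a format is a data change rather than a new function name. Compressed files such as `.pdb.gz` would need extra handling that this sketch leaves out.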
AlphaFold-aware loading
AlphaFold PDB output stores per-residue pLDDT in the B-factor column.
load_alphafold recognizes this and stores
the values in protein.metadata["confidence_per_residue"] instead of
silently mixing them with structural B-factors:
```python
from molforge.io import load_alphafold, is_alphafold_pdb

if is_alphafold_pdb("AF-Q9Y6K9-F1-model_v4.pdb"):
    protein = load_alphafold("AF-Q9Y6K9-F1-model_v4.pdb")
    # Per-residue pLDDT, kept apart from structural B-factors.
    confidence = protein.metadata["confidence_per_residue"]
```
`load` itself does not auto-detect AlphaFold output, because that would
require sniffing file contents on every call. Use the explicit
loader when you know you're working with AlphaFold predictions.
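To make the B-factor convention concrete, the sketch below pulls one value per residue out of fixed-width PDB ATOM records, where the B-factor occupies columns 61-66. It stands in for what a loader such as `load_alphafold` has to do and is not its implementation; the function name `per_residue_bfactor` and the `(chain, residue number, insertion code)` keys are hypothetical, and it reads only the first model.

```python
def per_residue_bfactor(lines):
    """Map (chain, resseq, icode) to the B-factor of each residue's first atom.

    In AlphaFold PDB output every atom of a residue carries the same value,
    the residue's pLDDT (0-100), so the first atom is representative.
    """
    values = {}
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if not line.startswith("ATOM  "):
            continue
        chain = line[21]              # column 22: chain ID
        resseq = int(line[22:26])     # columns 23-26: residue number
        icode = line[26].strip()      # column 27: insertion code, usually blank
        key = (chain, resseq, icode)
        if key not in values:
            values[key] = float(line[60:66])  # columns 61-66: B-factor (pLDDT here)
    return values
```

On an AlphaFold model these numbers are confidence scores; on a crystal structure the same column means atomic displacement. That ambiguity is why a separate, explicit loader is used.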
Altloc handling
PDB altloc records (alternate side-chain conformations) are resolved
at load time. Pass `altloc=` to choose the policy; the default,
`"first"`, keeps the first conformation, while `"all"` keeps every altloc row:
```python
from molforge.io import load_pdb

p = load_pdb("with_altlocs.pdb", altloc="first")    # default
p = load_pdb("with_altlocs.pdb", altloc="highest")  # by occupancy
p = load_pdb("with_altlocs.pdb", altloc="all")      # keep all rows
p = load_pdb("with_altlocs.pdb", altloc="A")        # by label
```
See molforge.io for the full set of
options and per-format hooks.
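The four `altloc=` policies can be modelled as filters over rows that share an atom site. The sketch below is a hypothetical illustration, not molforge.io's code: `AtomRow`, `resolve_altlocs`, grouping per atom (rather than per residue), keeping atoms without an altloc under a label policy, and breaking occupancy ties by file order are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomRow:
    """One ATOM record; a hypothetical stand-in for molforge's own types."""
    atom_id: tuple     # identifies the atom site, e.g. (chain, resseq, atom_name)
    altloc: str        # "" when the atom has no alternate location
    occupancy: float


def resolve_altlocs(rows, altloc="first"):
    """Filter rows by an altloc policy: "first", "highest", "all", or a label."""
    if altloc == "all":
        return list(rows)  # keep every conformation
    if altloc not in ("first", "highest") and len(altloc) != 1:
        raise ValueError(f"unknown altloc policy: {altloc!r}")

    # Group rows by atom site, preserving first-appearance order (dicts are ordered).
    groups = {}
    for row in rows:
        groups.setdefault(row.atom_id, []).append(row)

    kept = []
    for candidates in groups.values():
        if altloc == "first":
            kept.append(candidates[0])
        elif altloc == "highest":
            # max() returns the first of equal maxima, so ties keep file order.
            kept.append(max(candidates, key=lambda r: r.occupancy))
        else:
            # Label policy: keep the matching conformation plus atoms without altlocs.
            kept.extend(r for r in candidates if r.altloc in ("", altloc))
    return kept
```

Rejecting anything that is neither a policy name nor a single-character label keeps a typo such as `"hihgest"` from silently acting as a label filter.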
Reference
`molforge.io`: full API.