Walkthrough 3 — MD simulations¶
This walkthrough covers molforge.wrappers.md.OpenMM: the MD-engine
wrapper that exposes a uniform prepare → minimize → run flow over
OpenMM.
The pipeline:
- Load a protein structure.
- Prepare an OpenMM simulation with a chosen force field.
- Minimize the energy.
- Run production dynamics, recording snapshots.
- Analyze the resulting
Trajectorywith the rest of molforge.
The OpenMM calls themselves are slow (CPU minutes, GPU seconds) and
need the full OpenMM install. Cells that actually invoke OpenMM are
marked # 🐢 SLOW and shown without execution; the data structures
they produce are explained inline so this notebook is readable on
GitHub without OpenMM installed.
To run end-to-end:
pip install 'molforge[md]'
jupyter lab notebooks/walkthroughs/03_md_simulations.ipynb
On Windows, use conda instead:
conda install -c conda-forge openmm
import molforge as mf
import numpy as np
print(f"molforge {mf.__version__}")
molforge 0.0.1
1. Load the structure¶
For demonstration we use the bundled helix.pdb fixture. In practice
you'd feed in your real structure — a crystal structure, an ESMFold
prediction, or any Protein instance.
Important caveat: the helix fixture is just backbone (N/CA/C/O,
no side chains, no hydrogens), which means real OpenMM force fields
can't parameterize it. For a genuine MD run, your input structure
needs full atom sets. Use OpenMM's Modeller.addHydrogens() for
prep, or feed in a complete crystal structure to start.
from pathlib import Path
from molforge.io import read_pdb
protein = read_pdb(Path("../../tests/fixtures/pdb/helix.pdb"))
print(protein)
print(f"chain IDs: {[c.chain_id for c in protein.chains]}")
print(f"sequence: {protein.sequence}")
Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60) chain IDs: ['A'] sequence: AAAAAAAAAAAAAAA
2. Build the OpenMM engine¶
Engine construction is lazy — it doesn't load OpenMM until you
actually call prepare(). That means you can build and configure
the engine in scripts that don't end up running anything, and you
only pay the OpenMM import cost when you actually need it.
from molforge.wrappers.md import OpenMM
engine = OpenMM(
platform="CUDA", # or "CPU"; None = auto-detect fastest
precision="mixed", # mixed = ~2x faster than double on GPU
nonbonded_method="NoCutoff", # NoCutoff = vacuum / implicit solvent
constraints="HBonds", # constrains H bonds, enables 2 fs timestep
)
print(engine)
OpenMM()
3. Prepare the simulation¶
engine.prepare(protein, force_field=...) builds an OpenMM
Simulation object — system, topology, integrator, initial positions
and velocities — and returns a molforge Simulation dataclass that
wraps it.
molforge ships a curated registry of force-field names with their matching water-model XMLs. The names are:
| Name | Force field | Water model |
|---|---|---|
amber14-all |
AMBER ff14SB | TIP3P |
amber99sb |
AMBER ff99SB | TIP3P |
amber99sb-ildn |
AMBER ff99SB-ILDN | TIP3P |
charmm36 |
CHARMM36 | CHARMM-flavor water |
Any other XML name OpenMM recognizes is passed through.
# 🐢 SLOW — needs OpenMM installed and a force-field-compatible structure
# sim = engine.prepare(
# protein,
# force_field="amber14-all",
# temperature=300.0, # K — thermostat target
# timestep=0.002, # ps = 2 fs (the standard with HBonds constraints)
# )
# For illustration: what the returned `Simulation` object looks like.
print("Simulation(")
print(" topology=Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60),")
print(" coordinates=ndarray (60, 3) float32,")
print(" velocities=ndarray (60, 3) float32, # Maxwell-Boltzmann at 300 K")
print(" time=0.0,")
print(" force_field='amber14-all',")
print(" temperature=300.0,")
print(" timestep=0.002,")
print(" engine_handle=<openmm.app.Simulation>,")
print(" metadata={'platform': 'CUDA', 'nonbonded_method': 'NoCutoff',")
print(" 'constraints': 'HBonds'},")
print(")")
Simulation(
topology=Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60),
coordinates=ndarray (60, 3) float32,
velocities=ndarray (60, 3) float32, # Maxwell-Boltzmann at 300 K
time=0.0,
force_field='amber14-all',
temperature=300.0,
timestep=0.002,
engine_handle=<openmm.app.Simulation>,
metadata={'platform': 'CUDA', 'nonbonded_method': 'NoCutoff',
'constraints': 'HBonds'},
)
4. Energy-minimize¶
Minimization removes any clashes and steep gradients that would otherwise cause the first few MD steps to blow up. The defaults (1000 iterations, tolerance 10 kJ/mol/nm) match OpenMM's standard recommendations.
# 🐢 SLOW
# sim = engine.minimize(sim, max_iterations=1000, tolerance=10.0)
# sim.coordinates is now the minimized configuration
print("After minimize:")
print(" - sim.coordinates is updated in place to the minimized geometry")
print(" - sim.velocities is reset (zeroed); the next call re-thermalizes")
print(" - the engine_handle is the same OpenMM Simulation object")
After minimize: - sim.coordinates is updated in place to the minimized geometry - sim.velocities is reset (zeroed); the next call re-thermalizes - the engine_handle is the same OpenMM Simulation object
5. Run production dynamics¶
engine.run(sim, n_steps, save_every) integrates the dynamics and
records a frame every save_every steps. The result is a
Trajectory whose coordinates is a (n_frames, n_atoms, 3) array.
A useful rule of thumb: with the default 2 fs timestep,
- 10 000 steps = 20 ps (a quick smoke test)
- 500 000 steps = 1 ns (typical short equilibration)
- 50 000 000 steps = 100 ns (a real production run)
save_every controls how often you record. Recording every step
is wasteful — the molecule barely moves in 2 fs — so the typical
range is 100-1000 steps per frame.
# 🐢 SLOW
# traj = engine.run(
# sim,
# n_steps=50_000, # 100 ps with 2 fs timestep
# save_every=500, # record every 1 ps -> 101 frames
# )
print("Returns: Trajectory(")
print(" topology=Protein(...),")
print(" coordinates=ndarray (101, 60, 3) float32, # one frame per ps")
print(" times=ndarray (101,) float64, # in ps")
print(" energies=ndarray (101,) float64, # in kJ/mol")
print(" temperatures=None, # not recorded by default")
print(" metadata={'engine': 'OpenMM', 'force_field': 'amber14-all',")
print(" 'timestep_ps': 0.002, 'save_every': 500,")
print(" 'n_steps': 50000, 'temperature_K': 300.0},")
print(")")
Returns: Trajectory(
topology=Protein(...),
coordinates=ndarray (101, 60, 3) float32, # one frame per ps
times=ndarray (101,) float64, # in ps
energies=ndarray (101,) float64, # in kJ/mol
temperatures=None, # not recorded by default
metadata={'engine': 'OpenMM', 'force_field': 'amber14-all',
'timestep_ps': 0.002, 'save_every': 500,
'n_steps': 50000, 'temperature_K': 300.0},
)
6. Analyze the trajectory¶
The Trajectory dataclass is the bridge between MD output and the
rest of molforge. Three core operations:
- Iterate frames — yields
Proteinsnapshots (one per recorded frame). - Index a single frame —
traj.frame(i)returns aProteindeep- copy with the i-th frame's coordinates. - Inspect properties —
traj.coordinates,traj.times,traj.energiesare plain NumPy arrays.
This is the seam where MD output meets the rest of the library: any
analysis you'd do on a single Protein — RMSD, DSSP, contacts,
SASA, dihedrals — works on a trajectory frame.
Note: The trajectory below is synthetic — we generate frames by adding small Gaussian noise to the original coordinates. Real OpenMM trajectories would show smoother dynamics with the force field maintaining the helix's H-bond geometry. The point of this cell is to demonstrate the
TrajectoryAPI, not to show realistic dynamics.
# Constructing a synthetic Trajectory for demonstration without OpenMM.
# In real use, `traj` is the output of `engine.run(...)`.
from molforge.md import Trajectory
from copy import deepcopy
rng = np.random.default_rng(0)
n_frames, n_atoms = 5, protein.n_atoms
coords = np.tile(
protein.atom_array.coords[None, :, :],
(n_frames, 1, 1),
).astype(np.float32)
coords += rng.normal(scale=0.3, size=coords.shape).astype(np.float32)
traj = Trajectory(
topology=protein,
coordinates=coords,
times=np.linspace(0, 4.0, n_frames), # ps
energies=np.linspace(-2000, -1950, n_frames), # kJ/mol
metadata={"engine": "OpenMM (synthetic)", "force_field": "amber14-all"},
)
print(f"n_frames: {traj.n_frames}")
print(f"n_atoms: {traj.n_atoms}")
print(f"times: {traj.times}")
print(f"energies: {traj.energies}")
n_frames: 5 n_atoms: 60 times: [0. 1. 2. 3. 4.] energies: [-2000. -1987.5 -1975. -1962.5 -1950. ]
Iterate frames as Protein snapshots¶
# Each frame is a full Protein deep-copy. Mutating it doesn't affect the trajectory.
for i, snapshot in enumerate(traj):
print(f"frame {i}: {type(snapshot).__name__} with {snapshot.n_atoms} atoms")
if i >= 2:
print("...")
break
# Direct frame access (deep copy returned, original trajectory unchanged)
mid = traj.frame(2)
print(f"\nMiddle frame as Protein: {mid}")
frame 0: Protein with 60 atoms frame 1: Protein with 60 atoms frame 2: Protein with 60 atoms ... Middle frame as Protein: Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60)
Combine with other molforge subpackages¶
Because frames are Protein instances, every structural analysis you
already know works on MD output. Two common patterns:
Track RMSD against the initial frame — a basic stability check.
from molforge.structure import rmsd
start = traj.frame(0)
rmsd_trace = [rmsd(traj.frame(i), start, subset="ca") for i in range(traj.n_frames)]
print(f"CA-RMSD vs. frame 0 (Å):")
for i, r in enumerate(rmsd_trace):
print(f" frame {i} (t={traj.times[i]:.1f} ps): {r:.3f}")
CA-RMSD vs. frame 0 (Å): frame 0 (t=0.0 ps): 0.000 frame 1 (t=1.0 ps): 0.665 frame 2 (t=2.0 ps): 0.685 frame 3 (t=3.0 ps): 0.603 frame 4 (t=4.0 ps): 0.504
Track secondary-structure stability — does the helix stay folded?
from molforge.structure import dssp_3state
ss_trace = [dssp_3state(traj.frame(i)) for i in range(traj.n_frames)]
print(f"DSSP per frame:")
for i, ss in enumerate(ss_trace):
print(f" frame {i}: {ss}")
# A common quality check: fraction of helix-classified residues over time
print()
print(f"helix fraction per frame:")
for i, ss in enumerate(ss_trace):
f = ss.count('H') / len(ss)
print(f" frame {i}: {f:.0%}")
DSSP per frame: frame 0: CCHHHHHHHCCCCCC frame 1: CCCCHHHHCCCCCCC frame 2: CCCHHHHHHHCCCCC frame 3: CHHHHHHCCCCCCCC frame 4: CHHHHHHHHHHHCCC helix fraction per frame: frame 0: 47% frame 1: 27% frame 2: 47% frame 3: 40% frame 4: 73%
7. The full design pattern¶
Putting it together, an MD-based refinement after folding looks like:
from molforge.wrappers.folding import ESMFold
from molforge.wrappers.md import OpenMM
from molforge.structure import rmsd, dssp_3state
# Fold
predicted = ESMFold().predict("MKTVRQERLKSIVRILERSK")
# Set up MD
engine = OpenMM(platform="CUDA")
sim = engine.prepare(predicted, force_field="amber14-all")
sim = engine.minimize(sim)
traj = engine.run(sim, n_steps=500_000, save_every=1000) # 1 ns trajectory
# Analyze
final_struct = traj.frame(-1) # last snapshot
print(f"Drift from prediction: {rmsd(final_struct, predicted, subset='ca'):.2f} Å")
print(f"Predicted secondary str: {dssp_3state(predicted)}")
print(f"Equilibrated secondary: {dssp_3state(final_struct)}")
The folding wrapper, MD wrapper, and analysis utilities all speak the
same Protein data model. No conversion code, no file I/O between
stages, no glue.
What's next¶
- Walkthrough 4 — docking with AutoDock Vina.
- Walkthrough 5 — ML featurization (turn trajectory frames into GNN inputs).
- End-to-end example (
notebooks/examples/end_to_end_design.ipynb) — full design loop combining folding, mutation, and re-folding.