Walkthrough 3 — MD simulations¶

This walkthrough covers molforge.wrappers.md.OpenMM: the MD-engine wrapper that exposes a uniform prepare → minimize → run flow over OpenMM.

The pipeline:

Load a protein structure.
Prepare an OpenMM simulation with a chosen force field.
Minimize the energy.
Run production dynamics, recording snapshots.
Analyze the resulting Trajectory with the rest of molforge.

The OpenMM calls themselves are slow (CPU minutes, GPU seconds) and need the full OpenMM install. Cells that actually invoke OpenMM are marked # 🐢 SLOW and shown without execution; the data structures they produce are explained inline so this notebook is readable on GitHub without OpenMM installed.

To run end-to-end:

pip install 'molforge[md]'
jupyter lab notebooks/walkthroughs/03_md_simulations.ipynb

On Windows, use conda instead:

conda install -c conda-forge openmm

In [1]:

Copied!

import molforge as mf
import numpy as np
print(f"molforge {mf.__version__}")
import molforge as mf
import numpy as np
print(f"molforge {mf.__version__}")

molforge 0.0.1

1. Load the structure¶

For demonstration we use the bundled helix.pdb fixture. In practice you'd feed in your real structure — a crystal structure, an ESMFold prediction, or any Protein instance.

Important caveat: the helix fixture is just backbone (N/CA/C/O, no side chains, no hydrogens), which means real OpenMM force fields can't parameterize it. For a genuine MD run, your input structure needs full atom sets. Use OpenMM's Modeller.addHydrogens() for prep, or feed in a complete crystal structure to start.

In [1]:

Copied!





from pathlib import Path
from molforge.io import read_pdb

protein = read_pdb(Path("../../tests/fixtures/pdb/helix.pdb"))
print(protein)
print(f"chain IDs: {[c.chain_id for c in protein.chains]}")
print(f"sequence:  {protein.sequence}")
from pathlib import Path
from molforge.io import read_pdb

protein = read_pdb(Path("../../tests/fixtures/pdb/helix.pdb"))
print(protein)
print(f"chain IDs: {[c.chain_id for c in protein.chains]}")
print(f"sequence:  {protein.sequence}")

Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60)
chain IDs: ['A']
sequence:  AAAAAAAAAAAAAAA

2. Build the OpenMM engine¶

Engine construction is lazy — it doesn't load OpenMM until you actually call prepare(). That means you can build and configure the engine in scripts that don't end up running anything, and you only pay the OpenMM import cost when you actually need it.

In [1]:

Copied!





from molforge.wrappers.md import OpenMM

engine = OpenMM(
    platform="CUDA",       # or "CPU"; None = auto-detect fastest
    precision="mixed",     # mixed = ~2x faster than double on GPU
    nonbonded_method="NoCutoff",   # NoCutoff = vacuum / implicit solvent
    constraints="HBonds",  # constrains H bonds, enables 2 fs timestep
)
print(engine)
from molforge.wrappers.md import OpenMM

engine = OpenMM(
    platform="CUDA",       # or "CPU"; None = auto-detect fastest
    precision="mixed",     # mixed = ~2x faster than double on GPU
    nonbonded_method="NoCutoff",   # NoCutoff = vacuum / implicit solvent
    constraints="HBonds",  # constrains H bonds, enables 2 fs timestep
)
print(engine)

OpenMM()

3. Prepare the simulation¶

engine.prepare(protein, force_field=...) builds an OpenMM Simulation object — system, topology, integrator, initial positions and velocities — and returns a molforge Simulation dataclass that wraps it.

molforge ships a curated registry of force-field names with their matching water-model XMLs. The names are:

Name	Force field	Water model
`amber14-all`	AMBER ff14SB	TIP3P
`amber99sb`	AMBER ff99SB	TIP3P
`amber99sb-ildn`	AMBER ff99SB-ILDN	TIP3P
`charmm36`	CHARMM36	CHARMM-flavor water

Any other XML name OpenMM recognizes is passed through.

In [1]:

Copied!





# 🐢 SLOW — needs OpenMM installed and a force-field-compatible structure
# sim = engine.prepare(
#     protein,
#     force_field="amber14-all",
#     temperature=300.0,   # K — thermostat target
#     timestep=0.002,      # ps = 2 fs (the standard with HBonds constraints)
# )

# For illustration: what the returned `Simulation` object looks like.
print("Simulation(")
print("    topology=Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60),")
print("    coordinates=ndarray (60, 3) float32,")
print("    velocities=ndarray (60, 3) float32,    # Maxwell-Boltzmann at 300 K")
print("    time=0.0,")
print("    force_field='amber14-all',")
print("    temperature=300.0,")
print("    timestep=0.002,")
print("    engine_handle=<openmm.app.Simulation>,")
print("    metadata={'platform': 'CUDA', 'nonbonded_method': 'NoCutoff',")
print("              'constraints': 'HBonds'},")
print(")")
# 🐢 SLOW — needs OpenMM installed and a force-field-compatible structure
# sim = engine.prepare(
#     protein,
#     force_field="amber14-all",
#     temperature=300.0,   # K — thermostat target
#     timestep=0.002,      # ps = 2 fs (the standard with HBonds constraints)
# )

# For illustration: what the returned `Simulation` object looks like.
print("Simulation(")
print("    topology=Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60),")
print("    coordinates=ndarray (60, 3) float32,")
print("    velocities=ndarray (60, 3) float32,    # Maxwell-Boltzmann at 300 K")
print("    time=0.0,")
print("    force_field='amber14-all',")
print("    temperature=300.0,")
print("    timestep=0.002,")
print("    engine_handle=<openmm.app.Simulation>,")
print("    metadata={'platform': 'CUDA', 'nonbonded_method': 'NoCutoff',")
print("              'constraints': 'HBonds'},")
print(")")

Simulation(
    topology=Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60),
    coordinates=ndarray (60, 3) float32,
    velocities=ndarray (60, 3) float32,    # Maxwell-Boltzmann at 300 K
    time=0.0,
    force_field='amber14-all',
    temperature=300.0,
    timestep=0.002,
    engine_handle=<openmm.app.Simulation>,
    metadata={'platform': 'CUDA', 'nonbonded_method': 'NoCutoff',
              'constraints': 'HBonds'},
)

4. Energy-minimize¶

Minimization removes any clashes and steep gradients that would otherwise cause the first few MD steps to blow up. The defaults (1000 iterations, tolerance 10 kJ/mol/nm) match OpenMM's standard recommendations.

In [1]:

Copied!





# 🐢 SLOW
# sim = engine.minimize(sim, max_iterations=1000, tolerance=10.0)
# sim.coordinates is now the minimized configuration

print("After minimize:")
print("  - sim.coordinates is updated in place to the minimized geometry")
print("  - sim.velocities is reset (zeroed); the next call re-thermalizes")
print("  - the engine_handle is the same OpenMM Simulation object")
# 🐢 SLOW
# sim = engine.minimize(sim, max_iterations=1000, tolerance=10.0)
# sim.coordinates is now the minimized configuration

print("After minimize:")
print("  - sim.coordinates is updated in place to the minimized geometry")
print("  - sim.velocities is reset (zeroed); the next call re-thermalizes")
print("  - the engine_handle is the same OpenMM Simulation object")

After minimize:
  - sim.coordinates is updated in place to the minimized geometry
  - sim.velocities is reset (zeroed); the next call re-thermalizes
  - the engine_handle is the same OpenMM Simulation object

5. Run production dynamics¶

engine.run(sim, n_steps, save_every) integrates the dynamics and records a frame every save_every steps. The result is a Trajectory whose coordinates is a (n_frames, n_atoms, 3) array.

A useful rule of thumb: with the default 2 fs timestep,

10 000 steps = 20 ps (a quick smoke test)
500 000 steps = 1 ns (typical short equilibration)
50 000 000 steps = 100 ns (a real production run)

save_every controls how often you record. Recording every step is wasteful — the molecule barely moves in 2 fs — so the typical range is 100-1000 steps per frame.

In [1]:

Copied!





# 🐢 SLOW
# traj = engine.run(
#     sim,
#     n_steps=50_000,      # 100 ps with 2 fs timestep
#     save_every=500,      # record every 1 ps -> 101 frames
# )

print("Returns: Trajectory(")
print("    topology=Protein(...),")
print("    coordinates=ndarray (101, 60, 3) float32,   # one frame per ps")
print("    times=ndarray (101,) float64,                # in ps")
print("    energies=ndarray (101,) float64,             # in kJ/mol")
print("    temperatures=None,                            # not recorded by default")
print("    metadata={'engine': 'OpenMM', 'force_field': 'amber14-all',")
print("              'timestep_ps': 0.002, 'save_every': 500,")
print("              'n_steps': 50000, 'temperature_K': 300.0},")
print(")")
# 🐢 SLOW
# traj = engine.run(
#     sim,
#     n_steps=50_000,      # 100 ps with 2 fs timestep
#     save_every=500,      # record every 1 ps -> 101 frames
# )

print("Returns: Trajectory(")
print("    topology=Protein(...),")
print("    coordinates=ndarray (101, 60, 3) float32,   # one frame per ps")
print("    times=ndarray (101,) float64,                # in ps")
print("    energies=ndarray (101,) float64,             # in kJ/mol")
print("    temperatures=None,                            # not recorded by default")
print("    metadata={'engine': 'OpenMM', 'force_field': 'amber14-all',")
print("              'timestep_ps': 0.002, 'save_every': 500,")
print("              'n_steps': 50000, 'temperature_K': 300.0},")
print(")")

Returns: Trajectory(
    topology=Protein(...),
    coordinates=ndarray (101, 60, 3) float32,   # one frame per ps
    times=ndarray (101,) float64,                # in ps
    energies=ndarray (101,) float64,             # in kJ/mol
    temperatures=None,                            # not recorded by default
    metadata={'engine': 'OpenMM', 'force_field': 'amber14-all',
              'timestep_ps': 0.002, 'save_every': 500,
              'n_steps': 50000, 'temperature_K': 300.0},
)

6. Analyze the trajectory¶

The Trajectory dataclass is the bridge between MD output and the rest of molforge. Three core operations:

Iterate frames — yields Protein snapshots (one per recorded frame).
Index a single frame — traj.frame(i) returns a Protein deep- copy with the i-th frame's coordinates.
Inspect properties — traj.coordinates, traj.times, traj.energies are plain NumPy arrays.

This is the seam where MD output meets the rest of the library: any analysis you'd do on a single Protein — RMSD, DSSP, contacts, SASA, dihedrals — works on a trajectory frame.

Note: The trajectory below is synthetic — we generate frames by adding small Gaussian noise to the original coordinates. Real OpenMM trajectories would show smoother dynamics with the force field maintaining the helix's H-bond geometry. The point of this cell is to demonstrate the Trajectory API, not to show realistic dynamics.

In [1]:

Copied!





# Constructing a synthetic Trajectory for demonstration without OpenMM.
# In real use, `traj` is the output of `engine.run(...)`.
from molforge.md import Trajectory
from copy import deepcopy

rng = np.random.default_rng(0)
n_frames, n_atoms = 5, protein.n_atoms
coords = np.tile(
    protein.atom_array.coords[None, :, :],
    (n_frames, 1, 1),
).astype(np.float32)
coords += rng.normal(scale=0.3, size=coords.shape).astype(np.float32)

traj = Trajectory(
    topology=protein,
    coordinates=coords,
    times=np.linspace(0, 4.0, n_frames),     # ps
    energies=np.linspace(-2000, -1950, n_frames),  # kJ/mol
    metadata={"engine": "OpenMM (synthetic)", "force_field": "amber14-all"},
)
print(f"n_frames: {traj.n_frames}")
print(f"n_atoms:  {traj.n_atoms}")
print(f"times:    {traj.times}")
print(f"energies: {traj.energies}")
# Constructing a synthetic Trajectory for demonstration without OpenMM.
# In real use, `traj` is the output of `engine.run(...)`.
from molforge.md import Trajectory
from copy import deepcopy

rng = np.random.default_rng(0)
n_frames, n_atoms = 5, protein.n_atoms
coords = np.tile(
    protein.atom_array.coords[None, :, :],
    (n_frames, 1, 1),
).astype(np.float32)
coords += rng.normal(scale=0.3, size=coords.shape).astype(np.float32)

traj = Trajectory(
    topology=protein,
    coordinates=coords,
    times=np.linspace(0, 4.0, n_frames),     # ps
    energies=np.linspace(-2000, -1950, n_frames),  # kJ/mol
    metadata={"engine": "OpenMM (synthetic)", "force_field": "amber14-all"},
)
print(f"n_frames: {traj.n_frames}")
print(f"n_atoms:  {traj.n_atoms}")
print(f"times:    {traj.times}")
print(f"energies: {traj.energies}")

n_frames: 5
n_atoms:  60
times:    [0. 1. 2. 3. 4.]
energies: [-2000.   -1987.5  -1975.   -1962.5  -1950. ]

Iterate frames as Protein snapshots¶

In [1]:

Copied!





# Each frame is a full Protein deep-copy. Mutating it doesn't affect the trajectory.
for i, snapshot in enumerate(traj):
    print(f"frame {i}: {type(snapshot).__name__} with {snapshot.n_atoms} atoms")
    if i >= 2:
        print("...")
        break

# Direct frame access (deep copy returned, original trajectory unchanged)
mid = traj.frame(2)
print(f"\nMiddle frame as Protein: {mid}")
# Each frame is a full Protein deep-copy. Mutating it doesn't affect the trajectory.
for i, snapshot in enumerate(traj):
    print(f"frame {i}: {type(snapshot).__name__} with {snapshot.n_atoms} atoms")
    if i >= 2:
        print("...")
        break

# Direct frame access (deep copy returned, original trajectory unchanged)
mid = traj.frame(2)
print(f"\nMiddle frame as Protein: {mid}")

frame 0: Protein with 60 atoms
frame 1: Protein with 60 atoms
frame 2: Protein with 60 atoms
...

Middle frame as Protein: Protein(name='helix', n_chains=1, n_residues=15, n_atoms=60)

Combine with other molforge subpackages¶

Because frames are Protein instances, every structural analysis you already know works on MD output. Two common patterns:

Track RMSD against the initial frame — a basic stability check.

In [1]:

Copied!





from molforge.structure import rmsd

start = traj.frame(0)
rmsd_trace = [rmsd(traj.frame(i), start, subset="ca") for i in range(traj.n_frames)]
print(f"CA-RMSD vs. frame 0 (Å):")
for i, r in enumerate(rmsd_trace):
    print(f"  frame {i} (t={traj.times[i]:.1f} ps): {r:.3f}")
from molforge.structure import rmsd

start = traj.frame(0)
rmsd_trace = [rmsd(traj.frame(i), start, subset="ca") for i in range(traj.n_frames)]
print(f"CA-RMSD vs. frame 0 (Å):")
for i, r in enumerate(rmsd_trace):
    print(f"  frame {i} (t={traj.times[i]:.1f} ps): {r:.3f}")

CA-RMSD vs. frame 0 (Å):
  frame 0 (t=0.0 ps): 0.000
  frame 1 (t=1.0 ps): 0.665
  frame 2 (t=2.0 ps): 0.685
  frame 3 (t=3.0 ps): 0.603
  frame 4 (t=4.0 ps): 0.504

Track secondary-structure stability — does the helix stay folded?

In [1]:

Copied!





from molforge.structure import dssp_3state

ss_trace = [dssp_3state(traj.frame(i)) for i in range(traj.n_frames)]
print(f"DSSP per frame:")
for i, ss in enumerate(ss_trace):
    print(f"  frame {i}: {ss}")
# A common quality check: fraction of helix-classified residues over time
print()
print(f"helix fraction per frame:")
for i, ss in enumerate(ss_trace):
    f = ss.count('H') / len(ss)
    print(f"  frame {i}: {f:.0%}")
from molforge.structure import dssp_3state

ss_trace = [dssp_3state(traj.frame(i)) for i in range(traj.n_frames)]
print(f"DSSP per frame:")
for i, ss in enumerate(ss_trace):
    print(f"  frame {i}: {ss}")
# A common quality check: fraction of helix-classified residues over time
print()
print(f"helix fraction per frame:")
for i, ss in enumerate(ss_trace):
    f = ss.count('H') / len(ss)
    print(f"  frame {i}: {f:.0%}")

DSSP per frame:
  frame 0: CCHHHHHHHCCCCCC
  frame 1: CCCCHHHHCCCCCCC
  frame 2: CCCHHHHHHHCCCCC
  frame 3: CHHHHHHCCCCCCCC
  frame 4: CHHHHHHHHHHHCCC

helix fraction per frame:
  frame 0: 47%
  frame 1: 27%
  frame 2: 47%
  frame 3: 40%
  frame 4: 73%

7. The full design pattern¶

Putting it together, an MD-based refinement after folding looks like:

from molforge.wrappers.folding import ESMFold
from molforge.wrappers.md import OpenMM
from molforge.structure import rmsd, dssp_3state

# Fold
predicted = ESMFold().predict("MKTVRQERLKSIVRILERSK")

# Set up MD
engine = OpenMM(platform="CUDA")
sim    = engine.prepare(predicted, force_field="amber14-all")
sim    = engine.minimize(sim)
traj   = engine.run(sim, n_steps=500_000, save_every=1000)   # 1 ns trajectory

# Analyze
final_struct = traj.frame(-1)                                # last snapshot
print(f"Drift from prediction:    {rmsd(final_struct, predicted, subset='ca'):.2f} Å")
print(f"Predicted secondary str:  {dssp_3state(predicted)}")
print(f"Equilibrated secondary:   {dssp_3state(final_struct)}")

The folding wrapper, MD wrapper, and analysis utilities all speak the same Protein data model. No conversion code, no file I/O between stages, no glue.

What's next¶

Walkthrough 4 — docking with AutoDock Vina.
Walkthrough 5 — ML featurization (turn trajectory frames into GNN inputs).
End-to-end example (notebooks/examples/end_to_end_design.ipynb) — full design loop combining folding, mutation, and re-folding.