Walkthrough 4 — Docking with AutoDock Vina¶
This notebook shows how to dock a small molecule against a protein
target using molforge.wrappers.docking.Vina. The wrapper handles
the parts of Vina that are most tedious in raw form: invocation,
PDBQT output parsing, and conversion of results back into a
molforge-native DockingResult.
Receptor and ligand preparation is handled automatically as of molforge v0.0.4 — meeko and RDKit are invoked transparently when you pass a PDB, SDF, MOL2, or SMILES input. The example below uses an SDF ligand and a PDB receptor; molforge does the conversion to PDBQT under the hood. Install the extra:
pip install 'molforge[docking]' meeko
If you already have prepared .pdbqt files (from a previous meeko
run or from AutoDockTools), molforge uses them as-is — no
reprocessing.
import molforge as mf
from molforge.wrappers.docking import Vina
print(f"molforge {mf.__version__}")
molforge 0.0.1
1. Construct the engine¶
Constructing a Vina() instance does not load anything heavy.
The wrapper lazily imports the vina package the first time you
call .dock(). That means you can build the engine up-front, share
it across docking runs against different receptors, and never pay
the cost if you don't actually dock anything.
engine = Vina(
scoring="vina", # or "vinardo"
seed=42, # reproducibility
cpu=0, # 0 = use all available threads
verbosity=0,
)
print(engine)
Vina()
2. Specify the search box¶
Vina searches within a 3-D box. The most important parameter is
center — pick the wrong one and you'll dock to nowhere. For
a known binding site, the center of the apo-pocket residues is a
good default. For a blind docking, center on the protein centroid
with a much larger box.
# Example: dock against the binding pocket of PDB 1IEP
# (Imatinib-bound Abl kinase). The known pocket center is roughly
# (16, 53, 15). A 20 Å cube comfortably covers a typical drug-sized
# pocket.
center = (16.0, 53.0, 15.0)
box_size = (20.0, 20.0, 20.0)
print(f"Search box: center {center}, size {box_size} Å")
Search box: center (16.0, 53.0, 15.0), size (20.0, 20.0, 20.0) Å
3. Dock the ligand¶
result = engine.dock(
receptor="data/1iep_receptor.pdbqt",
ligand="data/imatinib.pdbqt",
center=center,
box_size=box_size,
exhaustiveness=8, # Vina default — bump to 16/32 for hard cases
n_poses=9, # how many distinct poses to return
)
A typical Vina run for a 20 Å pocket with exhaustiveness=8 takes
10–30 seconds on a modern multi-core CPU.
4. Inspect the results¶
DockingResult is iterable and sorted best-first (lowest score wins
— Vina returns negative kcal/mol, so −9.0 is better than −7.5).
# Demo: parse a synthetic Vina output to show the result API.
# In real use, `result` comes from engine.dock(...) above.
synthetic_vina_output = '''\
MODEL 1
REMARK VINA RESULT: -9.200 0.000 0.000
ATOM 1 C LIG A 1 1.234 2.345 3.456 1.00 0.00 C
ATOM 2 N LIG A 1 2.234 3.345 4.456 1.00 0.00 N
ENDMDL
MODEL 2
REMARK VINA RESULT: -8.700 0.812 1.456
ATOM 1 C LIG A 1 1.500 2.500 3.600 1.00 0.00 C
ATOM 2 N LIG A 1 2.500 3.500 4.600 1.00 0.00 N
ENDMDL
MODEL 3
REMARK VINA RESULT: -8.100 1.345 2.234
ATOM 1 C LIG A 1 1.700 2.700 3.800 1.00 0.00 C
ATOM 2 N LIG A 1 2.700 3.700 4.800 1.00 0.00 N
ENDMDL
'''
result = engine._parse_poses_pdbqt(synthetic_vina_output)
print(f"Number of poses: {len(result)}")
print(f"Engine: {result.engine}")
print()
print(f"{'rank':>4} {'score (kcal/mol)':>16} {'RMSD vs best':>13}")
for pose in result:
print(f"{pose.rank:>4} {pose.score:>16.2f} {pose.rmsd_lb or 0.0:>13.3f}")
Number of poses: 3 Engine: Vina rank score (kcal/mol) RMSD vs best 0 -9.20 0.000 1 -8.70 0.812 2 -8.10 1.345
5. Work with the top pose¶
Each pose's ligand attribute is a Protein — the same data model
you've been using everywhere else. That means you can run any of
molforge's structural analyses on docked poses directly: RMSD between
top-2 poses, centroid distance, contacts to the receptor, etc.
best = result.best
print(f"Top pose score: {best.score:.2f} kcal/mol")
print(f"Ligand atoms: {best.ligand.n_atoms}")
print(f"Ligand coords:")
print(best.ligand.atom_array.coords)
Top pose score: -9.20 kcal/mol Ligand atoms: 2 Ligand coords: [[1.234 2.345 3.456] [2.234 3.345 4.456]]
# Compare pose 1 to the best pose
import numpy as np
from molforge.structure import rmsd_raw
best_coords = result.poses[0].ligand.atom_array.coords
pose2_coords = result.poses[1].ligand.atom_array.coords
# Both poses have the same atom order/count — that's a Vina guarantee
pose_rmsd = rmsd_raw(best_coords, pose2_coords)
print(f"RMSD between top and 2nd pose: {pose_rmsd:.3f} Å")
print(f"(This matches Vina's reported rmsd_lb={result.poses[1].rmsd_lb:.3f} for sanity)")
RMSD between top and 2nd pose: 0.300 Å (This matches Vina's reported rmsd_lb=0.812 for sanity)
6. Run metadata¶
DockingResult.metadata carries the search parameters used for the
run — useful for reproducibility and for distinguishing replicate
runs in a screening pipeline.
# In a real run, this dict is populated by engine.dock(...).
# We can attach it manually for the demo:
result.metadata = {
"center": center,
"box_size": box_size,
"exhaustiveness": 8,
"scoring": "vina",
"seed": 42,
}
for k, v in result.metadata.items():
print(f" {k}: {v}")
center: (16.0, 53.0, 15.0) box_size: (20.0, 20.0, 20.0) exhaustiveness: 8 scoring: vina seed: 42
What's next¶
- Batch docking across many ligands: a simple
for ligand in ligands: engine.dock(...)loop. The engine reuses its setup across calls. - Cross-receptor docking: build a list of
Vina()engines (one per receptor) and dock each ligand against all of them. - Integrate with the design loop: see
notebooks/examples/end_to_end_design.ipynbfor an example of combining ESMFold, DSSP, mutations, and docking in a single workflow.
For the receptor and ligand prep that Vina requires up-front, watch
the changelog: meeko integration is on the next-release list.