Walkthrough 4 — Docking with AutoDock Vina¶

This notebook shows how to dock a small molecule against a protein target using molforge.wrappers.docking.Vina. The wrapper handles the parts of Vina that are most tedious in raw form: invocation, PDBQT output parsing, and conversion of results back into a molforge-native DockingResult.

Receptor and ligand preparation is handled automatically as of molforge v0.0.4 — meeko and RDKit are invoked transparently when you pass a PDB, SDF, MOL2, or SMILES input. The example below uses an SDF ligand and a PDB receptor; molforge does the conversion to PDBQT under the hood. Install the extra:

pip install 'molforge[docking]' meeko

If you already have prepared .pdbqt files (from a previous meeko run or from AutoDockTools), molforge uses them as-is — no reprocessing.

In [1]:

Copied!

import molforge as mf
from molforge.wrappers.docking import Vina
print(f"molforge {mf.__version__}")
import molforge as mf
from molforge.wrappers.docking import Vina
print(f"molforge {mf.__version__}")

molforge 0.0.1

1. Construct the engine¶

Constructing a Vina() instance does not load anything heavy. The wrapper lazily imports the vina package the first time you call .dock(). That means you can build the engine up-front, share it across docking runs against different receptors, and never pay the cost if you don't actually dock anything.

In [1]:

Copied!





engine = Vina(
    scoring="vina",     # or "vinardo"
    seed=42,            # reproducibility
    cpu=0,              # 0 = use all available threads
    verbosity=0,
)
print(engine)
engine = Vina(
    scoring="vina",     # or "vinardo"
    seed=42,            # reproducibility
    cpu=0,              # 0 = use all available threads
    verbosity=0,
)
print(engine)

Vina()

Vina searches within a 3-D box. The most important parameter is center — pick the wrong one and you'll dock to nowhere. For a known binding site, the center of the apo-pocket residues is a good default. For a blind docking, center on the protein centroid with a much larger box.

In [1]:

Copied!





# Example: dock against the binding pocket of PDB 1IEP
# (Imatinib-bound Abl kinase). The known pocket center is roughly
# (16, 53, 15). A 20 Å cube comfortably covers a typical drug-sized
# pocket.
center = (16.0, 53.0, 15.0)
box_size = (20.0, 20.0, 20.0)
print(f"Search box: center {center}, size {box_size} Å")
# Example: dock against the binding pocket of PDB 1IEP
# (Imatinib-bound Abl kinase). The known pocket center is roughly
# (16, 53, 15). A 20 Å cube comfortably covers a typical drug-sized
# pocket.
center = (16.0, 53.0, 15.0)
box_size = (20.0, 20.0, 20.0)
print(f"Search box: center {center}, size {box_size} Å")

Search box: center (16.0, 53.0, 15.0), size (20.0, 20.0, 20.0) Å

3. Dock the ligand¶

result = engine.dock(
    receptor="data/1iep_receptor.pdbqt",
    ligand="data/imatinib.pdbqt",
    center=center,
    box_size=box_size,
    exhaustiveness=8,    # Vina default — bump to 16/32 for hard cases
    n_poses=9,           # how many distinct poses to return
)

A typical Vina run for a 20 Å pocket with exhaustiveness=8 takes 10–30 seconds on a modern multi-core CPU.

4. Inspect the results¶

DockingResult is iterable and sorted best-first (lowest score wins — Vina returns negative kcal/mol, so −9.0 is better than −7.5).

In [1]:

Copied!





# Demo: parse a synthetic Vina output to show the result API.
# In real use, `result` comes from engine.dock(...) above.
synthetic_vina_output = '''\
MODEL 1
REMARK VINA RESULT:    -9.200    0.000    0.000
ATOM      1  C   LIG A   1       1.234   2.345   3.456  1.00  0.00           C
ATOM      2  N   LIG A   1       2.234   3.345   4.456  1.00  0.00           N
ENDMDL
MODEL 2
REMARK VINA RESULT:    -8.700    0.812    1.456
ATOM      1  C   LIG A   1       1.500   2.500   3.600  1.00  0.00           C
ATOM      2  N   LIG A   1       2.500   3.500   4.600  1.00  0.00           N
ENDMDL
MODEL 3
REMARK VINA RESULT:    -8.100    1.345    2.234
ATOM      1  C   LIG A   1       1.700   2.700   3.800  1.00  0.00           C
ATOM      2  N   LIG A   1       2.700   3.700   4.800  1.00  0.00           N
ENDMDL
'''
result = engine._parse_poses_pdbqt(synthetic_vina_output)

print(f"Number of poses: {len(result)}")
print(f"Engine:          {result.engine}")
print()
print(f"{'rank':>4}  {'score (kcal/mol)':>16}  {'RMSD vs best':>13}")
for pose in result:
    print(f"{pose.rank:>4}  {pose.score:>16.2f}  {pose.rmsd_lb or 0.0:>13.3f}")
# Demo: parse a synthetic Vina output to show the result API.
# In real use, `result` comes from engine.dock(...) above.
synthetic_vina_output = '''\
MODEL 1
REMARK VINA RESULT:    -9.200    0.000    0.000
ATOM      1  C   LIG A   1       1.234   2.345   3.456  1.00  0.00           C
ATOM      2  N   LIG A   1       2.234   3.345   4.456  1.00  0.00           N
ENDMDL
MODEL 2
REMARK VINA RESULT:    -8.700    0.812    1.456
ATOM      1  C   LIG A   1       1.500   2.500   3.600  1.00  0.00           C
ATOM      2  N   LIG A   1       2.500   3.500   4.600  1.00  0.00           N
ENDMDL
MODEL 3
REMARK VINA RESULT:    -8.100    1.345    2.234
ATOM      1  C   LIG A   1       1.700   2.700   3.800  1.00  0.00           C
ATOM      2  N   LIG A   1       2.700   3.700   4.800  1.00  0.00           N
ENDMDL
'''
result = engine._parse_poses_pdbqt(synthetic_vina_output)

print(f"Number of poses: {len(result)}")
print(f"Engine:          {result.engine}")
print()
print(f"{'rank':>4}  {'score (kcal/mol)':>16}  {'RMSD vs best':>13}")
for pose in result:
    print(f"{pose.rank:>4}  {pose.score:>16.2f}  {pose.rmsd_lb or 0.0:>13.3f}")

Number of poses: 3
Engine:          Vina

rank   score (kcal/mol)   RMSD vs best
   0             -9.20          0.000
   1             -8.70          0.812
   2             -8.10          1.345

5. Work with the top pose¶

Each pose's ligand attribute is a Protein — the same data model you've been using everywhere else. That means you can run any of molforge's structural analyses on docked poses directly: RMSD between top-2 poses, centroid distance, contacts to the receptor, etc.

In [1]:

Copied!





best = result.best
print(f"Top pose score: {best.score:.2f} kcal/mol")
print(f"Ligand atoms:   {best.ligand.n_atoms}")
print(f"Ligand coords:")
print(best.ligand.atom_array.coords)
best = result.best
print(f"Top pose score: {best.score:.2f} kcal/mol")
print(f"Ligand atoms:   {best.ligand.n_atoms}")
print(f"Ligand coords:")
print(best.ligand.atom_array.coords)

Top pose score: -9.20 kcal/mol
Ligand atoms:   2
Ligand coords:
[[1.234 2.345 3.456]
 [2.234 3.345 4.456]]

In [1]:

Copied!





# Compare pose 1 to the best pose
import numpy as np
from molforge.structure import rmsd_raw

best_coords = result.poses[0].ligand.atom_array.coords
pose2_coords = result.poses[1].ligand.atom_array.coords

# Both poses have the same atom order/count — that's a Vina guarantee
pose_rmsd = rmsd_raw(best_coords, pose2_coords)
print(f"RMSD between top and 2nd pose: {pose_rmsd:.3f} Å")
print(f"(This matches Vina's reported rmsd_lb={result.poses[1].rmsd_lb:.3f} for sanity)")
# Compare pose 1 to the best pose
import numpy as np
from molforge.structure import rmsd_raw

best_coords = result.poses[0].ligand.atom_array.coords
pose2_coords = result.poses[1].ligand.atom_array.coords

# Both poses have the same atom order/count — that's a Vina guarantee
pose_rmsd = rmsd_raw(best_coords, pose2_coords)
print(f"RMSD between top and 2nd pose: {pose_rmsd:.3f} Å")
print(f"(This matches Vina's reported rmsd_lb={result.poses[1].rmsd_lb:.3f} for sanity)")

RMSD between top and 2nd pose: 0.300 Å
(This matches Vina's reported rmsd_lb=0.812 for sanity)

6. Run metadata¶

DockingResult.metadata carries the search parameters used for the run — useful for reproducibility and for distinguishing replicate runs in a screening pipeline.

In [1]:

Copied!





# In a real run, this dict is populated by engine.dock(...).
# We can attach it manually for the demo:
result.metadata = {
    "center": center,
    "box_size": box_size,
    "exhaustiveness": 8,
    "scoring": "vina",
    "seed": 42,
}
for k, v in result.metadata.items():
    print(f"  {k}: {v}")
# In a real run, this dict is populated by engine.dock(...).
# We can attach it manually for the demo:
result.metadata = {
    "center": center,
    "box_size": box_size,
    "exhaustiveness": 8,
    "scoring": "vina",
    "seed": 42,
}
for k, v in result.metadata.items():
    print(f"  {k}: {v}")

  center: (16.0, 53.0, 15.0)
  box_size: (20.0, 20.0, 20.0)
  exhaustiveness: 8
  scoring: vina
  seed: 42

What's next¶

Batch docking across many ligands: a simple for ligand in ligands: engine.dock(...) loop. The engine reuses its setup across calls.
Cross-receptor docking: build a list of Vina() engines (one per receptor) and dock each ligand against all of them.
Integrate with the design loop: see notebooks/examples/end_to_end_design.ipynb for an example of combining ESMFold, DSSP, mutations, and docking in a single workflow.

For the receptor and ligand prep that Vina requires up-front, watch the changelog: meeko integration is on the next-release list.

Walkthrough 4 — Docking with AutoDock Vina¶

1. Construct the engine¶

2. Specify the search box¶

3. Dock the ligand¶

4. Inspect the results¶

5. Work with the top pose¶

6. Run metadata¶

What's next¶