Walkthrough 6: Authoring a plugin
molforge ships with several engine wrappers built in (ESMFold,
AlphaFold, Vina, OpenMM, RFdiffusion, ProteinMPNN). But the space of
useful tools is much larger than what one library can wrap. This
walkthrough covers how to make your own tool available through
molforge's interfaces — so a downstream user can write
get("engine", "my_thing")() and have it Just Work.
Three kinds of things can be plugged in:
- Engines: anything with a predict/generate/simulate method. Folding models, docking tools, design tools, custom MD engines.
- Parsers: readers for additional file formats not covered by molforge.io.
- Scorers: scoring functions that take a Protein (or Trajectory, or pair of structures) and return a number.
The infrastructure for all three lives in molforge.plugins.
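"Anything with a predict/generate/simulate method" is a structural requirement rather than an inheritance one. As a sketch of what that means in Python typing terms, the block below uses a `typing.Protocol`. The name `SupportsPredict` and both toy classes are hypothetical and are not part of molforge; the source does not say whether molforge defines such a type.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsPredict(Protocol):
    """Hypothetical structural type: any object exposing predict(sequence)."""

    def predict(self, sequence: str) -> object: ...


class EchoEngine:
    """Stand-in engine that satisfies the protocol without inheriting from it."""

    def predict(self, sequence: str) -> object:
        return sequence.upper()


class NotAnEngine:
    """Has no predict method, so it does not satisfy the protocol."""


# runtime_checkable protocols only check that the method exists,
# not that its signature or return type matches.
print(isinstance(EchoEngine(), SupportsPredict))   # True
print(isinstance(NotAnEngine(), SupportsPredict))  # False
```

A registry that accepts any callable does not need this check, but a static type checker can use a protocol like this to catch engines that are missing the expected method.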
How plugins are loaded
Discovery uses Python entry points (the same mechanism pytest
uses to find plugins). A third-party package declares an entry point
in its pyproject.toml:
[project.entry-points."molforge.plugins"]
my_tool = "my_pkg.molforge_integration:register"
The callable on the right-hand side (my_pkg.molforge_integration:register)
is a function that calls one or more of register_engine /
register_parser / register_scorer. molforge calls it once during
discover().
import molforge as mf
print(f"molforge {mf.__version__}")
from molforge.plugins import (
    available,
    clear,
    discover,
    get,
    register_engine,
    register_parser,
    register_scorer,
)
molforge 0.0.1
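To make the registration step concrete, here is a minimal sketch of a name-keyed registry. It is an assumption about the general shape of such a design, not molforge's actual implementation; the `sketch_*` function names and the `Counter` class are hypothetical and chosen so they cannot be confused with the real `molforge.plugins` functions.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a registry: one name -> object table per kind.
_REGISTRY: Dict[str, Dict[str, Callable[..., Any]]] = {
    "engine": {},
    "parser": {},
    "scorer": {},
}


def sketch_register(kind: str, name: str, obj: Callable[..., Any]) -> None:
    """Store obj under (kind, name); a later registration overwrites an earlier one."""
    _REGISTRY[kind][name] = obj


def sketch_get(kind: str, name: str) -> Callable[..., Any]:
    """Look up a registered object, with a readable error for unknown names."""
    try:
        return _REGISTRY[kind][name]
    except KeyError:
        known = sorted(_REGISTRY.get(kind, {}))
        raise KeyError(f"no {kind} named {name!r}; known: {known}") from None


def sketch_available(kind: str) -> List[str]:
    """Sorted names registered under kind."""
    return sorted(_REGISTRY[kind])


class Counter:
    """Toy engine with per-instance state."""

    def __init__(self) -> None:
        self.calls = 0

    def predict(self, sequence: str) -> int:
        self.calls += 1
        return len(sequence)


# Registering a zero-arg factory means each lookup can build a fresh engine.
sketch_register("engine", "counter", lambda: Counter())
a = sketch_get("engine", "counter")()
b = sketch_get("engine", "counter")()
a.predict("MKTV")
print(sketch_available("engine"), a.calls, b.calls)  # ['counter'] 1 0
```

Storing a factory rather than an instance keeps construction lazy and gives every caller its own state, which is why `a.calls` and `b.calls` differ.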
1. Register a custom engine
You don't strictly need to ship a separate package to use the registry — you can register from inline Python. This is useful for notebook-bound experiments. Here we'll register a toy "RandomFolder" engine that just predicts a random extended chain for any sequence.
import numpy as np

from molforge.core import AtomArray, Protein


class RandomFolder:
    """Toy folding engine: extended chain along x."""

    name = "RandomFolder"

    def __init__(self, seed: int = 0) -> None:
        self.rng = np.random.default_rng(seed)

    def predict(self, sequence: str) -> Protein:
        n = len(sequence)
        arr = AtomArray(n)
        arr.element[:] = "C"
        arr.atom_name[:] = "CA"
        # Three-letter residue names (placeholder mapping)
        arr.residue_name[:] = ["ALA"] * n
        arr.residue_id[:] = np.arange(1, n + 1)
        arr.chain_id[:] = "A"
        arr.coords[:, 0] = np.arange(n) * 3.8  # CA-CA spacing
        arr.coords[:, 1:] = self.rng.normal(scale=0.5, size=(n, 2))
        return Protein(arr)


# Register a factory: a zero-arg callable that returns a fresh engine
register_engine("random_folder", lambda: RandomFolder(seed=42))
print(f"Registered engines: {available('engine')}")
Registered engines: ['random_folder']
Now any code that knows the engine name can grab the factory and
use the engine — without needing to import RandomFolder directly.
This is what makes the registry useful: the consumer of an engine
doesn't need to know where it came from.
# Grab the factory by name
folder_factory = get("engine", "random_folder")
folder = folder_factory()
predicted = folder.predict("MKTV")
print(f"Predicted Protein: {predicted}")
print(f"Sequence: {predicted.sequence}")
print(f"CA coords[:3]:\n{predicted.atom_array.coords[:3]}")
Predicted Protein: Protein(name='', n_chains=1, n_residues=4, n_atoms=4)
Sequence: AAAA
CA coords[:3]:
[[ 0.          0.15235855 -0.51999205]
 [ 3.8         0.3752256   0.47028235]
 [ 7.6        -0.9755176  -0.6510897 ]]
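The `Sequence: AAAA` line follows from the placeholder in `predict`, which labels every residue ALA regardless of the input `"MKTV"`. A sketch of a real one-letter to three-letter mapping is shown below. The table itself is the standard amino acid code; the helper name `residue_names` and the `"UNK"` fallback are illustrative choices, and whether `AtomArray.residue_name` accepts the resulting list is an assumption based on the placeholder line.

```python
from typing import List

# Standard one-letter -> three-letter codes for the 20 canonical amino acids.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def residue_names(sequence: str, unknown: str = "UNK") -> List[str]:
    """Map each one-letter code to its three-letter name; unknown codes get a fallback."""
    return [ONE_TO_THREE.get(code, unknown) for code in sequence.upper()]


print(residue_names("MKTV"))  # ['MET', 'LYS', 'THR', 'VAL']
```

Under that assumption, replacing the placeholder with `arr.residue_name[:] = residue_names(sequence)` would make the predicted protein report the input sequence instead of a poly-alanine chain.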
2. Register a custom parser
Same pattern. The convention for parsers is to use the file extension as the registration key:
def parse_xyz(path: str) -> Protein:
    """Minimal parser for the XYZ format (a list of element + x/y/z lines)."""
    # Keep every line in place: the comment line may be blank, and dropping
    # blank lines would shift the atom records by one.
    with open(path) as fh:
        lines = fh.read().splitlines()
    n = int(lines[0])
    # Skip the comment line (line 1)
    atoms = lines[2 : 2 + n]
    arr = AtomArray(n)
    for i, atom_line in enumerate(atoms):
        element, x, y, z = atom_line.split()
        arr.element[i] = element
        arr.atom_name[i] = element
        arr.residue_name[i] = "UNK"
        arr.residue_id[i] = 1
        arr.chain_id[i] = "A"
        arr.coords[i] = [float(x), float(y), float(z)]
    return Protein(arr)


register_parser("xyz", parse_xyz)
print(f"Registered parsers: {available('parser')}")
Registered parsers: ['xyz']
# Use it
import tempfile, os
xyz = '''\
3
demo
C 0.0 0.0 0.0
N 1.5 0.0 0.0
O 3.0 0.0 0.0
'''
with tempfile.NamedTemporaryFile(suffix=".xyz", mode="w", delete=False) as fh:
    fh.write(xyz)
    fh_path = fh.name
try:
    parsed = get("parser", "xyz")(fh_path)
    print(f"Parsed: {parsed}")
    print(f"Elements: {parsed.atom_array.element.tolist()}")
finally:
    os.unlink(fh_path)
Parsed: Protein(name='', n_chains=1, n_residues=1, n_atoms=3)
Elements: ['C', 'N', 'O']
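Keying parsers by file extension lets calling code pick a parser from the path alone. The block below is a sketch of that dispatch, using a plain dict in place of the molforge registry. `PARSERS`, `extension_key`, and `load` are hypothetical names, and the toy parser returns a string rather than a Protein so the example runs without molforge.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical stand-in for the parser registry, keyed by extension.
PARSERS: Dict[str, Callable[[str], str]] = {
    "xyz": lambda path: f"parsed {path} as XYZ",
}


def extension_key(path: str) -> str:
    """Lowercased final extension without the dot, e.g. 'demo.XYZ' -> 'xyz'."""
    return Path(path).suffix.lstrip(".").lower()


def load(path: str) -> str:
    """Dispatch to the parser registered for the file's extension."""
    key = extension_key(path)
    if key not in PARSERS:
        raise ValueError(f"no parser registered for extension {key!r}")
    return PARSERS[key](path)


print(load("demo.XYZ"))  # parsed demo.XYZ as XYZ
```

Note that `Path.suffix` returns only the last extension, so a name like `model.pdb.gz` would map to `gz`; compound extensions would need their own handling.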
3. Register a custom scorer
A scorer is any callable that returns a float (or a dict, or whatever your downstream consumers expect). The contract is deliberately loose — molforge doesn't impose a particular signature.
def hydrophobic_fraction(protein: Protein) -> float:
    """Fraction of hydrophobic residues (very crude proxy)."""
    hydrophobic = set("AVILMFWY")
    seq = protein.sequence.replace("/", "")
    if not seq:
        return 0.0
    return sum(1 for c in seq if c in hydrophobic) / len(seq)


register_scorer("hydrophobic_fraction", hydrophobic_fraction)
print(f"Registered scorers: {available('scorer')}")

# Score the predicted structure from above
score_fn = get("scorer", "hydrophobic_fraction")
print(f"Score: {score_fn(predicted):.3f}")
Registered scorers: ['hydrophobic_fraction']
Score: 1.000
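Because the scorer contract is loose, the same registry could also hold scorers that take a pair of structures or return a dict. The sketch below illustrates both shapes on plain coordinate tuples so it runs without molforge. The names `coordinate_rmsd` and `rmsd_report` are hypothetical, and the RMSD here does no superposition, so it is only meaningful for structures that are already aligned.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def coordinate_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equal-length coordinate lists, with no superposition."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    if not a:
        return 0.0
    # Sum squared differences over every atom and every axis.
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def rmsd_report(a: Sequence[Coord], b: Sequence[Coord]) -> Dict[str, float]:
    """Dict-returning variant for consumers that want more than one number."""
    return {"rmsd": coordinate_rmsd(a, b), "n_atoms": float(len(a))}


ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
print(coordinate_rmsd(ref, shifted))  # 1.0
```

Either function could be passed to `register_scorer` as-is; the cost of the loose contract is that consumers must know each scorer's signature and return type by convention.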
4. Auto-discovery from entry points
The inline registration above is great for experiments, but for
distribution you'll want auto-discovery. Your third-party package's
pyproject.toml should declare its entry points:
[project]
name = "my_protein_tool"
[project.entry-points."molforge.plugins"]
my_tool = "my_protein_tool.plugin:register"
my_protein_tool/plugin.py:
from molforge.plugins import register_engine, register_scorer
from my_protein_tool.engines import MyEngine


def register() -> None:
    register_engine("my_engine", lambda: MyEngine())
    register_scorer("my_score", MyEngine.score)
Then in user code:
from molforge.plugins import discover, get
loaded = discover()
print(loaded) # ["my_tool"]
engine = get("engine", "my_engine")()
discover() is idempotent and tolerant: it walks all installed
entry points under the molforge.plugins group, calling each
registration function. If a plugin fails to load (missing dependency,
buggy import, registration function raises), it's logged-then-skipped
rather than aborting discovery — so one broken plugin can't take down
everything that depends on molforge.
# discover() returns the list of entry points that loaded successfully.
# With no third-party plugins installed in this environment, it just
# returns []. But you can call it safely:
loaded = discover()
print(f"discover() returned: {loaded}")
discover() returned: []
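The log-then-skip behaviour described for discover() can be sketched with the standard library's `importlib.metadata`. This is an illustration of the technique under stated assumptions, not molforge's source: `discover_sketch` and the logger name are hypothetical, and the group name used in the demo call is deliberately one that no package should declare.

```python
import logging
from importlib.metadata import entry_points
from typing import List

log = logging.getLogger("plugin_discovery_sketch")


def discover_sketch(group: str) -> List[str]:
    """Call every entry point in `group`; return the names that loaded cleanly."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a dict of lists.
    if hasattr(eps, "select"):
        candidates = eps.select(group=group)
    else:
        candidates = eps.get(group, [])
    loaded = []
    for ep in candidates:
        try:
            register = ep.load()  # import the module and resolve the attribute
            register()            # run the plugin's registration function
        except Exception:
            # Log and move on so one broken plugin cannot abort discovery.
            log.exception("plugin %r failed to load", ep.name)
            continue
        loaded.append(ep.name)
    return loaded


print(discover_sketch("molforge.plugins.demo_group_with_no_plugins"))  # []
```

This sketch does not track which entry points it has already called. Repeated calls would still be harmless if each registration simply overwrites the same key, which is one way the idempotence mentioned above could be achieved.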
5. When to use plugins (and when not to)
The registry is the right answer when:
- You're shipping a tool that wraps another tool, and you want it to be picked up automatically by anyone who has both installed. (Example: a docking tool that integrates with molforge but doesn't want to live inside the molforge repo.)
- You want to keep the dependency graph clean. Plugins let heavy or niche dependencies stay opt-in; only users who install the plugin pay the cost.
Use direct imports (no registry) when:
- The tool is for your own use only.
- You always know the engine name at write-time. engine = MyEngine() is simpler and more type-checkable than get("engine", "my_engine")().
A reasonable rule of thumb: registries are for late binding by name. If you're not binding by name, you don't need them.
What's next
- See examples/de_novo_design.ipynb for how the built-in engines compose into a full design pipeline.
- See the registry's source: src/molforge/plugins/registry.py is ~60 lines.
- For inspiration on what plugins might exist, look at the wrapper pattern in src/molforge/wrappers/. A custom engine would follow the same pattern: lazy imports, missing-dep error path, uniform output metadata.
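The three ingredients named for the wrapper pattern (lazy imports, a missing-dependency error path, uniform output metadata) can be sketched in a few lines. The block below is a generic illustration, not a copy of anything in src/molforge/wrappers/: `require`, `LazyEngine`, and the metadata keys are hypothetical, and the standard-library `json` module stands in for a heavy backend so the example runs anywhere.

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import module_name on demand, or raise an ImportError that says how to fix it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this engine. {install_hint}"
        ) from exc


class LazyEngine:
    """Hypothetical wrapper: the backend is imported on first use, not at import time."""

    def __init__(self, backend: str, install_hint: str) -> None:
        self.backend = backend
        self.install_hint = install_hint

    def predict(self, sequence: str) -> dict:
        backend = require(self.backend, self.install_hint)
        # Uniform output metadata: every engine reports the same keys.
        return {
            "engine": type(self).__name__,
            "backend": backend.__name__,
            "length": len(sequence),
        }


print(LazyEngine("json", "").predict("MKTV"))
# {'engine': 'LazyEngine', 'backend': 'json', 'length': 4}
```

Constructing `LazyEngine("some_missing_backend", "Try: pip install some-missing-backend")` succeeds, and the ImportError with the install hint is raised only when `predict` is called. That keeps plugin registration cheap even when the optional dependency is absent.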