Walkthrough 6: Authoring a plugin

molforge ships with several engine wrappers built in (ESMFold, AlphaFold, Vina, OpenMM, RFdiffusion, ProteinMPNN). But the space of useful tools is much larger than what one library can wrap. This walkthrough covers how to make your own tool available through molforge's interfaces — so a downstream user can write get("engine", "my_thing")() and have it Just Work.

Three kinds of things can be plugged in:

Engines — anything with a predict / generate / simulate method. Folding models, docking tools, design tools, custom MD engines.
Parsers — readers for additional file formats not covered by molforge.io.
Scorers — scoring functions that take a Protein (or Trajectory, or pair of structures) and return a number.

The infrastructure for all three lives in molforge.plugins.
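
To make the idea concrete, here is a minimal sketch of a registry that keeps one name-to-object mapping per kind. It is not molforge's implementation: MiniRegistry and its methods are hypothetical names chosen for illustration, and the real molforge.plugins functions may behave differently (for example, when a name is registered twice).

```python
from typing import Any, Dict, List


class MiniRegistry:
    """Illustrative stand-in, NOT molforge's registry: one dict per plugin kind."""

    KINDS = ("engine", "parser", "scorer")

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {kind: {} for kind in self.KINDS}

    def _bucket(self, kind: str) -> Dict[str, Any]:
        if kind not in self._items:
            raise ValueError(f"unknown kind {kind!r}; expected one of {self.KINDS}")
        return self._items[kind]

    def register(self, kind: str, name: str, obj: Any) -> None:
        # Assumption of this sketch: a later registration replaces an earlier one.
        self._bucket(kind)[name] = obj

    def get(self, kind: str, name: str) -> Any:
        bucket = self._bucket(kind)
        if name not in bucket:
            raise KeyError(f"no {kind} named {name!r}; available: {sorted(bucket)}")
        return bucket[name]

    def available(self, kind: str) -> List[str]:
        return sorted(self._bucket(kind))


registry = MiniRegistry()
registry.register("scorer", "length", len)          # any callable can be stored
registry.register("engine", "echo", lambda: str)    # engines are stored as factories
print(registry.available("scorer"))                 # ['length']
print(registry.get("scorer", "length")("MKTV"))     # 4
```

The point of the design is that the caller only needs the pair (kind, name); where the object was defined is irrelevant to the lookup.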

How plugins are loaded

Discovery uses Python entry points (the same mechanism pytest uses to find plugins). A third-party package declares an entry point in its pyproject.toml:

[project.entry-points."molforge.plugins"]
my_tool = "my_pkg.molforge_integration:register"

The callable on the right-hand side (my_pkg.molforge_integration:register) is a function that calls one or more of register_engine / register_parser / register_scorer. molforge calls it once during discover().
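
The "module:attribute" string can be explored with the standard library alone. The sketch below uses importlib.metadata to resolve such a string and to list whatever is installed under the molforge.plugins group. It uses json:dumps as a stand-in target so it runs without any plugin installed, and installed_entry_points is a hypothetical helper name. How molforge itself performs this step is not shown in this walkthrough.

```python
import json
from importlib.metadata import EntryPoint, entry_points

GROUP = "molforge.plugins"

# An entry point value has the form "package.module:attribute".
# json:dumps is a stand-in target, not a real plugin.
ep = EntryPoint(name="demo", value="json:dumps", group=GROUP)
register = ep.load()            # imports the module, then looks up the attribute
print(register is json.dumps)   # True


def installed_entry_points(group: str) -> list:
    """List entry points that installed distributions declare under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):           # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, []))      # Python 3.8 / 3.9 return a plain dict


for found in installed_entry_points(GROUP):
    print(found.name, "->", found.value)
```

With no plugin packages installed, the loop prints nothing.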

import molforge as mf
print(f"molforge {mf.__version__}")
from molforge.plugins import (
    available,
    clear,
    discover,
    get,
    register_engine,
    register_parser,
    register_scorer,
)

molforge 0.0.1

1. Register a custom engine

You don't strictly need to ship a separate package to use the registry — you can register from inline Python. This is useful for notebook-bound experiments. Here we'll register a toy "RandomFolder" engine that just predicts a random extended chain for any sequence.

import numpy as np
from molforge.core import AtomArray, Protein

class RandomFolder:
    """Toy folding engine — extended chain along x."""
    name = "RandomFolder"

    def __init__(self, seed: int = 0) -> None:
        self.rng = np.random.default_rng(seed)

    def predict(self, sequence: str) -> Protein:
        n = len(sequence)
        arr = AtomArray(n)
        arr.element[:] = "C"
        arr.atom_name[:] = "CA"
        # Placeholder: every residue is labelled ALA, whatever the input sequence
        arr.residue_name[:] = ["ALA"] * n
        arr.residue_id[:] = np.arange(1, n + 1)
        arr.chain_id[:] = "A"
        arr.coords[:, 0] = np.arange(n) * 3.8     # CA-CA spacing
        arr.coords[:, 1:] = self.rng.normal(scale=0.5, size=(n, 2))
        return Protein(arr)

# Register a factory — a zero-arg callable that returns a fresh engine
register_engine("random_folder", lambda: RandomFolder(seed=42))

print(f"Registered engines: {available('engine')}")

Registered engines: ['random_folder']
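
Why register a factory rather than an instance? One plausible reason, sketched below under the assumption that engines carry internal state, is that each caller then gets its own fresh object. SeededEngine is a hypothetical stand-in for RandomFolder that needs only the standard library.

```python
import random
from functools import partial


class SeededEngine:
    """Hypothetical engine with internal random state (stand-in for RandomFolder)."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def predict(self) -> float:
        return self.rng.random()


# A factory: calling it builds a new engine with its own, freshly seeded state.
# partial(SeededEngine, seed=42) behaves like lambda: SeededEngine(seed=42).
make_engine = partial(SeededEngine, seed=42)

first, second = make_engine(), make_engine()
print(first is second)                        # False: two independent objects
print(first.predict() == second.predict())    # True: same seed, same first draw

# Sharing one instance instead means every call advances the same random stream.
shared = SeededEngine(seed=42)
print(shared.predict() == shared.predict())   # False: the second call continues the stream
```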

Now any code that knows the engine name can grab the factory and use the engine — without needing to import RandomFolder directly. This is what makes the registry useful: the consumer of an engine doesn't need to know where it came from.

# Grab the factory by name
folder_factory = get("engine", "random_folder")
folder = folder_factory()
predicted = folder.predict("MKTV")
print(f"Predicted Protein: {predicted}")
print(f"Sequence:          {predicted.sequence}")
print(f"CA coords[:3]:\n{predicted.atom_array.coords[:3]}")

Predicted Protein: Protein(name='', n_chains=1, n_residues=4, n_atoms=4)
Sequence:          AAAA
CA coords[:3]:
[[ 0.          0.15235855 -0.51999205]
 [ 3.8         0.3752256   0.47028235]
 [ 7.6        -0.9755176  -0.6510897 ]]

2. Register a custom parser

Same pattern. The convention for parsers is to use the file extension as the registration key:

def parse_xyz(path: str) -> Protein:
    """Minimal parser for the XYZ format (a list of element + x/y/z lines)."""
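    # Keep blank lines: the XYZ comment line may be empty, and dropping it
    # would shift every atom index by one.
    with open(path) as fh:
        lines = fh.read().splitlines()
    n = int(lines[0])
    # Line 0 is the atom count and line 1 the comment, so atoms start at line 2
    atoms = lines[2 : 2 + n]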
    arr = AtomArray(n)
    for i, atom_line in enumerate(atoms):
        element, x, y, z = atom_line.split()
        arr.element[i] = element
        arr.atom_name[i] = element
        arr.residue_name[i] = "UNK"
        arr.residue_id[i] = 1
        arr.chain_id[i] = "A"
        arr.coords[i] = [float(x), float(y), float(z)]
    return Protein(arr)

register_parser("xyz", parse_xyz)
print(f"Registered parsers: {available('parser')}")

Registered parsers: ['xyz']

# Use it
import tempfile, os
xyz = '''\
3
demo
C  0.0  0.0  0.0
N  1.5  0.0  0.0
O  3.0  0.0  0.0
'''
with tempfile.NamedTemporaryFile(suffix=".xyz", mode="w", delete=False) as fh:
    fh.write(xyz)
    fh_path = fh.name

try:
    parsed = get("parser", "xyz")(fh_path)
    print(f"Parsed: {parsed}")
    print(f"Elements: {parsed.atom_array.element.tolist()}")
finally:
    os.unlink(fh_path)

Parsed: Protein(name='', n_chains=1, n_residues=1, n_atoms=3)
Elements: ['C', 'N', 'O']
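
Because parsers are keyed by file extension, a caller can pick the parser from the path itself. The following is a small sketch of that dispatch, not a molforge function: extension_key and parse_by_extension are hypothetical helpers, lower-casing the suffix is an assumption, and a plain dict stands in for the registry so the example runs on its own.

```python
from pathlib import Path
from typing import Any, Callable


def extension_key(path: str) -> str:
    """Return a registration key for a path: the last suffix, without the dot.

    Lower-casing is an assumption of this sketch, not documented molforge behaviour.
    """
    return Path(path).suffix.lstrip(".").lower()


def parse_by_extension(path: str, lookup: Callable[[str], Callable[[str], Any]]) -> Any:
    """Find a parser via `lookup(key)` and apply it to `path`."""
    return lookup(extension_key(path))(path)


# Stand-in lookup table so the example runs without molforge installed.
fake_parsers = {"xyz": lambda p: f"parsed {Path(p).name} as XYZ"}

print(extension_key("data/ligand.XYZ"))                                 # xyz
print(parse_by_extension("data/ligand.xyz", fake_parsers.__getitem__))  # parsed ligand.xyz as XYZ
```

With molforge installed, the lookup argument could be lambda ext: get("parser", ext), reusing the get call shown above.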

3. Register a custom scorer

A scorer is any callable that returns a float (or a dict, or whatever your downstream consumers expect). The contract is deliberately loose — molforge doesn't impose a particular signature.

def hydrophobic_fraction(protein: Protein) -> float:
    """Fraction of hydrophobic residues (very crude proxy)."""
    hydrophobic = set("AVILMFWY")
    seq = protein.sequence.replace("/", "")
    if not seq:
        return 0.0
    return sum(1 for c in seq if c in hydrophobic) / len(seq)

register_scorer("hydrophobic_fraction", hydrophobic_fraction)
print(f"Registered scorers: {available('scorer')}")

# Score the predicted structure from above
score_fn = get("scorer", "hydrophobic_fraction")
print(f"Score: {score_fn(predicted):.3f}")

Registered scorers: ['hydrophobic_fraction']
Score: 1.000
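
Since the contract is loose, a scorer can also return several values at once. The sketch below returns a dict and relies only on the input having a sequence attribute. HasSequence and composition_scores are hypothetical names, and HasSequence merely stands in for Protein so the example runs without molforge.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HasSequence:
    """Stand-in for any object exposing a one-letter `sequence` string."""
    sequence: str


def composition_scores(protein) -> Dict[str, float]:
    """Return several crude composition fractions in one call."""
    seq = protein.sequence.replace("/", "")   # same "/" stripping as hydrophobic_fraction
    if not seq:
        return {"hydrophobic": 0.0, "charged": 0.0}
    hydrophobic = set("AVILMFWY")   # same set as hydrophobic_fraction
    charged = set("DEKR")           # Asp, Glu, Lys, Arg
    return {
        "hydrophobic": sum(c in hydrophobic for c in seq) / len(seq),
        "charged": sum(c in charged for c in seq) / len(seq),
    }


print(composition_scores(HasSequence("MKTV")))
# {'hydrophobic': 0.5, 'charged': 0.25}
```

The trade-off of a loose contract is that consumers must know what each scorer returns; a float-only convention is easier to aggregate.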

4. Auto-discovery from entry points

The inline registration above is great for experiments, but for distribution you'll want auto-discovery. Your third-party package's pyproject.toml should declare its entry points:

[project]
name = "my_protein_tool"

[project.entry-points."molforge.plugins"]
my_tool = "my_protein_tool.plugin:register"

my_protein_tool/plugin.py:

from molforge.plugins import register_engine, register_scorer
from my_protein_tool.engines import MyEngine

def register() -> None:
    register_engine("my_engine", lambda: MyEngine())
    register_scorer("my_score", MyEngine.score)

Then in user code:

from molforge.plugins import discover, get

loaded = discover()
print(loaded)                     # ["my_tool"]
engine = get("engine", "my_engine")()

discover() is idempotent and tolerant: it walks all installed entry points under the molforge.plugins group, calling each registration function. If a plugin fails to load (missing dependency, buggy import, registration function raises), it's logged-then-skipped rather than aborting discovery — so one broken plugin can't take down everything that depends on molforge.
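
The log-then-skip behaviour can be pictured with a short loop. This is a sketch under stated assumptions rather than molforge's code: tolerant_discover is a hypothetical name, and (name, loader) pairs stand in for real entry points, where calling loader() plays the role of importing the plugin module and returning its registration function.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("discovery_sketch")

# loader() imitates entry_point.load(): it returns the plugin's register function.
Loader = Callable[[], Callable[[], None]]


def tolerant_discover(candidates: Iterable[Tuple[str, Loader]]) -> List[str]:
    loaded: List[str] = []
    for name, loader in candidates:
        try:
            register = loader()   # may raise ImportError (missing dependency)
            register()            # may raise anything (buggy plugin)
        except Exception as exc:
            logger.warning("skipping plugin %r: %s", name, exc)
            continue              # one failure does not stop the loop
        loaded.append(name)
    return loaded


def good_loader():
    return lambda: None


def missing_dep_loader():
    raise ImportError("No module named 'heavy_dep'")


def buggy_loader():
    def register():
        raise RuntimeError("registration failed")
    return register


print(tolerant_discover([
    ("good", good_loader),
    ("missing_dep", missing_dep_loader),
    ("buggy", buggy_loader),
]))   # ['good']
```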

# discover() returns the list of entry points that loaded successfully.
# With no third-party plugins installed in this environment, it just
# returns []. But you can call it safely:
loaded = discover()
print(f"discover() returned: {loaded}")

discover() returned: []

5. When to use plugins (and when not to)

The registry is the right answer when:

You're shipping a tool that wraps another tool, and you want it to be picked up automatically by anyone who has both installed. (Example: a docking tool that integrates with molforge but doesn't want to live inside the molforge repo.)
You want to keep the dependency graph clean. Plugins let heavy or niche dependencies stay opt-in; only users who install the plugin pay the cost.

Use direct imports (no registry) when:

The tool is for your own use only.
You always know the engine name at write-time. engine = MyEngine() is simpler and more typecheckable than get("engine", "my_engine")().

A reasonable rule of thumb: registries are for late binding by name. If you're not binding by name, you don't need them.
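
A typical case of late binding by name is a command-line tool whose engine is chosen by the user at run time. The sketch below illustrates this with argparse; FACTORIES is a stand-in dict of toy string functions, not molforge's registry, and run is a hypothetical entry function.

```python
import argparse

# Stand-in factories; in a real tool these would come from the plugin registry.
FACTORIES = {
    "upper": lambda: str.upper,
    "reverse": lambda: (lambda s: s[::-1]),
}


def run(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--engine", choices=sorted(FACTORIES), required=True)
    parser.add_argument("sequence")
    args = parser.parse_args(argv)
    engine = FACTORIES[args.engine]()   # late binding: the name is resolved at run time
    return engine(args.sequence)


print(run(["--engine", "upper", "mktv"]))     # MKTV
print(run(["--engine", "reverse", "mktv"]))   # vtkm
```

Here the code that calls the engine never names a concrete class, which is exactly the situation where a registry pays for itself.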

What's next

See examples/de_novo_design.ipynb for how the built-in engines compose into a full design pipeline.
See the registry's source: src/molforge/plugins/registry.py is ~60 lines.
For inspiration on what plugins might exist, look at the wrapper pattern in src/molforge/wrappers/. A custom engine would follow the same pattern: lazy imports, missing-dep error path, uniform output metadata.
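
As a rough illustration of those three traits (lazy imports, a missing-dependency error path, uniform output metadata), here is a sketch of what such a wrapper might look like. It is not taken from src/molforge/wrappers/: LazyEngine and MissingDependencyError are hypothetical names, and the standard-library json module stands in for a heavy optional backend.

```python
import importlib
from typing import Any, Dict, Optional


class MissingDependencyError(ImportError):
    """Raised when an optional backend is not installed (hypothetical name)."""


class LazyEngine:
    """Sketch of a wrapper that imports its backend only on first use."""

    name = "lazy_engine"

    def __init__(self, backend_module: str) -> None:
        self._backend_module = backend_module
        self._backend: Optional[Any] = None   # nothing imported yet

    def _load(self) -> Any:
        if self._backend is None:
            try:
                self._backend = importlib.import_module(self._backend_module)
            except ImportError as exc:
                raise MissingDependencyError(
                    f"{self.name} needs the optional package "
                    f"{self._backend_module!r}; install it to use this engine"
                ) from exc
        return self._backend

    def predict(self, sequence: str) -> Dict[str, Any]:
        backend = self._load()
        # Uniform metadata: every result reports which engine and backend produced it.
        return {"engine": self.name, "backend": backend.__name__, "n_residues": len(sequence)}


print(LazyEngine("json").predict("MKTV"))
# {'engine': 'lazy_engine', 'backend': 'json', 'n_residues': 4}

try:
    LazyEngine("not_an_installed_backend").predict("MKTV")
except MissingDependencyError as exc:
    print(f"caught: {exc}")
```

Constructing the engine never triggers the import, so registering such a factory stays cheap even when the backend is missing.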