molforge.ensembles¶
ensembles ¶
Ensemble analysis: weighted statistics over collections of structures.
An ensemble is what you have when one input gives you many outputs:
- A docking run returns many plausible poses.
- An MD trajectory returns many time-snapshots of one system.
- A generative model (RFdiffusion, ProteinMPNN sampling) returns many candidates.
The questions you typically ask of an ensemble are the same regardless of which kind it is:
- How should I weight the members? Boltzmann-weight by score, uniform-weight, or use external weights.
- How diverse is the ensemble? Pairwise-distance statistics tell you whether the sampling actually explored conformational space.
- Which members cluster together? Reduces N noisy samples to k representative groups.
- What does the population look like in space? Density maps, contact frequencies.
- Can I draw a single representative member? Medoid pick or weighted average.
This module focuses on pose ensembles in v1 (i.e. docking output —
the same ligand in N different binding-mode candidates against one
receptor) because that's the immediately useful case. The design
deliberately admits MD trajectories and structural ensembles as
future extensions: boltzmann_weights takes raw scores so it works
for any kind of score, and the clustering / density functions take
generic coordinate arrays alongside the convenience overloads that
take Pose objects.
What's here:
Weighting:
- :func:boltzmann_weights — softmax weights from scores.
- :func:resample — weighted bootstrap.
Geometry:
- :func:pairwise_rmsd — N×N RMSD matrix for ligand heavy atoms.
- :func:pose_diversity — summary statistics over pairwise RMSDs.
Clustering:
- :func:pose_clusters — hierarchical RMSD clustering with medoids.
Spatial:
- :func:binding_site_density — 3D histogram of ligand atom
positions, optionally Boltzmann-weighted.
Consensus:
- :func:consensus_pose — pick a representative (medoid or
weighted-mean) pose.
Limitations of v1:
- Pose RMSD is order-sensitive. Atom orderings are assumed to be consistent across poses (which is true for poses returned from a single docking run by Vina, DiffDock, etc.). For symmetric ligands the resulting RMSD is an upper bound on the true symmetry-aware RMSD; for the common case of asymmetric drug-like ligands this is fine. A symmetry-aware RMSD using RDKit graph matching would be a natural future addition.
- Receptor is fixed. Ensemble functions treat the receptor as a single conformation; poses differ only in ligand placement. Mixed-receptor ensembles (e.g. one ligand across an MD ensemble of receptor frames) need a separate, slightly different surface.
pose_clusters ¶
pose_clusters(
poses: Sequence[Pose],
*,
cutoff: float = 2.0,
heavy_atoms_only: bool = True,
) -> PoseClusteringResult
Hierarchical (average-linkage) clustering of poses by RMSD.
Two poses end up in the same cluster if their average linkage to
each other (= mean pairwise RMSD across the current cluster
membership during agglomeration) stays below cutoff.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
poses
|
Sequence[Pose]
|
Sequence of poses to cluster. Must share an atom
ordering — see :func: |
required |
cutoff
|
float
|
RMSD threshold in Å (default 2.0, a common docking- community value for "same binding mode"). Larger ⇒ fewer, looser clusters; smaller ⇒ more, tighter clusters. |
2.0
|
heavy_atoms_only
|
bool
|
see :func: |
True
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
PoseClusteringResult
|
class: |
PoseClusteringResult
|
class: |
|
PoseClusteringResult
|
RMSD matrix used internally. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Example
from molforge.ensembles import pose_clusters result = pose_clusters(docking.poses, cutoff=2.0) print(f"{result.n_clusters} distinct binding modes")
Get the medoid of the biggest cluster:¶
biggest = result.clusters[0] representative = docking.poses[biggest.medoid]
consensus_pose ¶
consensus_pose(
poses: Sequence[Pose],
*,
weights: NDArray[floating] | None = None,
method: str = "medoid",
heavy_atoms_only: bool = True,
) -> Pose
Pick or build a single pose representing the ensemble.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
poses
|
Sequence[Pose]
|
Sequence of poses, all with the same atom ordering. |
required |
weights
|
NDArray[floating] | None
|
|
None
|
method
|
str
|
One of:
|
'medoid'
|
heavy_atoms_only
|
bool
|
see :func: |
True
|
Returns:
| Type | Description |
|---|---|
Pose
|
A single :class: |
Pose
|
input poses, returned by reference. For |
Pose
|
new deep-copied :class: |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Example
from molforge.ensembles import boltzmann_weights, consensus_pose w = boltzmann_weights(docking.poses)
Pick the most "central" high-affinity pose.¶
representative = consensus_pose(docking.poses, weights=w)
binding_site_density ¶
binding_site_density(
poses: Sequence[Pose],
*,
spacing: float = 1.0,
padding: float = 4.0,
weights: NDArray[floating] | None = None,
heavy_atoms_only: bool = True,
origin: NDArray[floating] | None = None,
shape: tuple[int, int, int] | None = None,
) -> DensityGrid
Compute a 3D spatial density of ligand atom positions across the ensemble.
Each ligand atom in each pose contributes weights[i] (default
uniform = 1/n_poses) to the grid cell containing it. Atoms
landing outside the grid are silently discarded; the returned
total_weight tells you how much weight made it in.
By default the grid is auto-sized: it's the axis-aligned bounding
box of all heavy ligand atoms across all poses, padded by
padding Å on each side, with the requested spacing. For
differential / comparative analysis (where you want two ensembles
on the same grid), pass explicit origin and shape.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
poses
|
Sequence[Pose]
|
Sequence of poses. Ligand atoms across all poses determine the bounding box by default. |
required |
spacing
|
float
|
Cubic grid spacing in Å. Default |
1.0
|
padding
|
float
|
Bounding-box padding in Å on each side, applied only
when |
4.0
|
weights
|
NDArray[floating] | None
|
|
None
|
heavy_atoms_only
|
bool
|
see :func: |
True
|
origin
|
NDArray[floating] | None
|
|
None
|
shape
|
tuple[int, int, int] | None
|
|
None
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
DensityGrid
|
class: |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Example
from molforge.ensembles import boltzmann_weights, binding_site_density weights = boltzmann_weights(docking.poses) grid = binding_site_density(docking.poses, spacing=0.5, weights=weights) hot_spot = np.unravel_index(grid.density.argmax(), grid.shape) grid.coordinate_of(hot_spot) # the most-occupied point in Å array([12.3, 45.6, 78.9], dtype=float32)
pairwise_rmsd ¶
Compute the N×N pairwise heavy-atom RMSD matrix over a pose ensemble.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
poses
|
Sequence[Pose]
|
Sequence of :class: |
required |
heavy_atoms_only
|
bool
|
If |
True
|
Returns:
| Type | Description |
|---|---|
NDArray[float32]
|
A symmetric |
NDArray[float32]
|
|
NDArray[float32]
|
the input units (Å for biology). |
Raises:
| Type | Description |
|---|---|
ValueError
|
If poses don't share an atom ordering (different heavy-atom counts) or if any pose has no atoms. |
Example
from molforge.ensembles import pairwise_rmsd rmsd_matrix = pairwise_rmsd(result.poses) rmsd_matrix[0, 1] # RMSD between poses 0 and 1 2.34
pose_diversity ¶
Summary statistics over the pairwise RMSD distribution of an ensemble.
Use as a quick diagnostic for "did the docking actually explore?" A run where every pose is within 0.5 Å of every other pose probably converged on one binding mode; a run with mean pairwise RMSD > 3 Å found multiple modes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
poses
|
Sequence[Pose]
|
Sequence of poses, as for :func: |
required |
heavy_atoms_only
|
bool
|
see :func: |
True
|
Returns:
| Type | Description |
|---|---|
dict[str, float]
|
A dict with keys: |
dict[str, float]
|
|
dict[str, float]
|
|
dict[str, float]
|
|
dict[str, float]
|
|
dict[str, float]
|
|
dict[str, float]
|
|
dict[str, float]
|
For an ensemble of size 1, all distance statistics are |
Raises:
| Type | Description |
|---|---|
ValueError
|
If poses don't share an atom ordering or any pose has no atoms. |
boltzmann_weights ¶
boltzmann_weights(
scores: Sequence[float]
| NDArray[floating]
| Sequence[Pose],
*,
temperature: float = KT_298K_KCAL_PER_MOL,
lower_is_better: bool = True,
) -> NDArray[np.float64]
Compute Boltzmann weights from a vector of scores.
The returned weights satisfy sum(w) == 1.0 and are computed in
a numerically stable way (subtracting the min/max score before
exponentiating).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
scores
|
Sequence[float] | NDArray[floating] | Sequence[Pose]
|
Either a sequence of scalar scores, or a sequence of
:class: |
required |
temperature
|
float
|
The thermal energy |
KT_298K_KCAL_PER_MOL
|
lower_is_better
|
bool
|
If |
True
|
Returns:
| Type | Description |
|---|---|
NDArray[float64]
|
A |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Example
from molforge.ensembles import boltzmann_weights
Three docking scores in kcal/mol (Vina convention).¶
weights = boltzmann_weights([-9.5, -8.2, -7.1]) weights.sum() 1.0 weights[0] > weights[2] # best score gets highest weight True
resample ¶
resample(
poses: Sequence[Pose],
n_samples: int,
*,
weights: NDArray[floating] | None = None,
rng: Generator | None = None,
replace: bool = True,
) -> list[Pose]
Draw n_samples poses with replacement, optionally weighted.
Useful for downstream uncertainty estimation (bootstrap any pose- derived metric over the resampled population) or for converting a Boltzmann-weighted ensemble into an unweighted population suitable for tools that don't accept weights.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
poses
|
Sequence[Pose]
|
The source ensemble. Order is preserved across calls
with the same |
required |
n_samples
|
int
|
How many draws to return. Must be ≥ 1. |
required |
weights
|
NDArray[floating] | None
|
A |
None
|
rng
|
Generator | None
|
A NumPy |
None
|
replace
|
bool
|
Whether to sample with replacement. Default |
True
|
Returns:
| Type | Description |
|---|---|
list[Pose]
|
A list of |
list[Pose]
|
|
list[Pose]
|
sampling with replacement; objects are not copied (each |
list[Pose]
|
returned reference points at the original pose). |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |