molforge.validation
Cross-validation utilities for protein design.
This subpackage captures the pattern of scoring designs across one
or more validators and combining the results — the natural follow-on
to :mod:molforge.metrics for de novo design workflows.
Three core building blocks:
Criteria — declarative success conditions:
- :class:Criterion — atomic comparison (e.g. pLDDT > 80).
Compose with &, |, ~.
- :class:NamedCriterion — a criterion with a human-readable label.
- :class:CriteriaSet — a named collection of criteria evaluated
together (implicit AND across criteria; pass/fail captured per
criterion for diagnostics).
Verdicts — per-design results:
- :class:Verdict — the metric values, criterion results, and
overall pass/fail for one design.
- :func:rank_verdicts — sort verdicts by score, optionally
filtering to only-passed.
Orchestration:
- :func:cross_validate — run a list of designs through one or
more validators (any callable that returns a metric dict),
apply criteria, return verdicts.
- :func:consensus — merge verdict lists across validators
("ESMFold AND AlphaFold both confirm").
Example::

    from molforge.validation import (
        cross_validate, consensus, rank_verdicts,
        Criterion, CriteriaSet,
    )

    # `sequences` and `esmfold_score_fn` are supplied by the caller.
    success = (
        CriteriaSet()
        .add("plddt_ok", Criterion.gt("esmfold.plddt", 80))
        .add("tm_ok", Criterion.gt("esmfold.tm", 0.5))
        .add("rmsd_ok", Criterion.lt("esmfold.rmsd", 2.0))
    )
    verdicts = cross_validate(
        designs=sequences,
        validators={"esmfold": esmfold_score_fn},
        criteria=success,
        score_metric="esmfold.plddt",
    )
    for v in rank_verdicts(verdicts, only_passed=True):
        print(v.design_id, v.values["esmfold.plddt"])

CriteriaSet (dataclass)
A named collection of criteria evaluated together.
Each criterion is evaluated separately so per-criterion pass/fail
is available in the resulting :class:Verdict. The overall verdict
passes only if all named criteria pass (an implicit AND).
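The implicit AND with per-criterion diagnostics can be pictured with a minimal, self-contained sketch. It does not import molforge: `evaluate_named` and the lambda predicates are hypothetical stand-ins, assuming each criterion behaves like a callable over the metric dict.

```python
from typing import Any, Callable, Mapping

# Hypothetical stand-in: a criterion is any callable over the metric dict.
Predicate = Callable[[Mapping[str, Any]], bool]


def evaluate_named(
    criteria: Mapping[str, Predicate], values: Mapping[str, Any]
) -> tuple[dict[str, bool], bool]:
    """Evaluate each named criterion separately, then AND the results."""
    results = {name: bool(check(values)) for name, check in criteria.items()}
    return results, all(results.values())


criteria = {
    "plddt_ok": lambda v: v["esmfold.plddt"] > 80,
    "rmsd_ok": lambda v: v["esmfold.rmsd"] < 2.0,
}
results, passed = evaluate_named(
    criteria, {"esmfold.plddt": 91.2, "esmfold.rmsd": 2.7}
)
print(results)  # {'plddt_ok': True, 'rmsd_ok': False}
print(passed)   # False
```

Keeping the per-criterion dict alongside the overall flag is what lets a report say which condition failed rather than only that the design failed.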
Criterion
Criterion(
evaluate: Callable[[Mapping[str, Any]], bool],
describe: Callable[[], str],
metric_names: frozenset[str],
)
Declarative success criterion: a metric name + comparison + threshold.
Construct via the factory classmethods :meth:gt, :meth:ge,
:meth:lt, :meth:le, :meth:eq, :meth:ne rather than
instantiating directly — they make the intent obvious in the
calling code.
Compose with the standard logical operators::
    a & b   # both must pass
    a | b   # either must pass
    ~a      # invert (passes when ``a`` would fail)
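As a rough illustration of how such composition can work in Python, the sketch below overloads `__and__`, `__or__`, and `__invert__` on a small stand-in class. `SketchCriterion` and its `check` field are hypothetical and are not the molforge implementation; only the `gt`/`lt` factory names are borrowed from the text above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class SketchCriterion:
    """Hypothetical stand-in for Criterion; not the molforge class."""

    check: Callable[[Mapping[str, Any]], bool]

    @classmethod
    def gt(cls, metric: str, threshold: float) -> "SketchCriterion":
        return cls(lambda values: values[metric] > threshold)

    @classmethod
    def lt(cls, metric: str, threshold: float) -> "SketchCriterion":
        return cls(lambda values: values[metric] < threshold)

    def __and__(self, other: "SketchCriterion") -> "SketchCriterion":
        return SketchCriterion(lambda v: self.check(v) and other.check(v))

    def __or__(self, other: "SketchCriterion") -> "SketchCriterion":
        return SketchCriterion(lambda v: self.check(v) or other.check(v))

    def __invert__(self) -> "SketchCriterion":
        return SketchCriterion(lambda v: not self.check(v))


confident = SketchCriterion.gt("plddt", 80)
close = SketchCriterion.lt("rmsd", 2.0)
values = {"plddt": 85.0, "rmsd": 3.1}

print((confident & close).check(values))  # False: rmsd is too high
print((confident | close).check(values))  # True: plddt passes
print((~close).check(values))             # True: close fails, so ~close passes
```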
metric_names (property)
Names of all metrics this criterion references.
evaluate
Return True iff values satisfies this criterion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| values | Mapping[str, Any] | Dict mapping metric names to their measured values. Must contain every metric this criterion references (see :attr: …). | required |
Raises:
| Type | Description |
|---|---|
| KeyError | If a referenced metric is missing from … |
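Because a missing metric raises `KeyError`, a caller may prefer to check up front which metrics are absent. The sketch below assumes the set of required names is available as a `frozenset[str]` (as `metric_names` is typed above); `missing_metrics` is a hypothetical helper, not part of molforge.

```python
from typing import Any, Mapping


def missing_metrics(required: frozenset[str], values: Mapping[str, Any]) -> set[str]:
    """Return the required metric names that are absent from values."""
    return set(required) - set(values)


# Hypothetical data: the criterion needs two metrics, only one was measured.
required = frozenset({"esmfold.plddt", "esmfold.rmsd"})
measured = {"esmfold.plddt": 88.0}

absent = missing_metrics(required, measured)
if absent:
    print(f"skipping evaluation, missing: {sorted(absent)}")
```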
NamedCriterion (dataclass)
A criterion paired with a human-readable name.
Useful when you want diagnostics that say "design passed
fold_quality but failed solubility" rather than dumping the
full criterion expression.
Verdict (dataclass)
Verdict(
design_id: str,
values: dict[str, Any] = dict(),
criteria_results: dict[str, bool] = dict(),
passed: bool = False,
score: float = float("inf"),
metadata: dict[str, Any] = dict(),
)
The result of validating a single design.
Attributes:
| Name | Type | Description |
|---|---|---|
| design_id | str | An identifier for the design (sequence string, file path, index, etc.). Used as the join key in :func: … |
| values | dict[str, Any] | Dict of all metric values measured for this design. Keys are metric names; values are typically floats but can be anything criteria evaluate against. |
| criteria_results | dict[str, bool] | Dict mapping criterion name to its pass/fail result. |
| passed | bool | True iff every criterion passed (the overall verdict). |
| score | float | A single scalar used for ranking. Smaller-is-better by convention (matches ProteinMPNN and most folding-confidence scores when negated); if you have a larger-is-better metric, store … |
| metadata | dict[str, Any] | Engine-specific extras (validator name, runtime, etc.). |
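To make the smaller-is-better convention concrete, here is a minimal sketch using a stand-in dataclass that mirrors the documented fields. `SketchVerdict` and `verdict_from_plddt` are hypothetical, and negating pLDDT is an assumption about how a larger-is-better metric might be stored, suggested by the "when negated" remark above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchVerdict:
    """Hypothetical stand-in mirroring the documented Verdict fields."""

    design_id: str
    values: dict[str, Any] = field(default_factory=dict)
    criteria_results: dict[str, bool] = field(default_factory=dict)
    passed: bool = False
    score: float = float("inf")
    metadata: dict[str, Any] = field(default_factory=dict)


def verdict_from_plddt(
    design_id: str, plddt: float, cutoff: float = 80.0
) -> SketchVerdict:
    ok = plddt > cutoff
    return SketchVerdict(
        design_id=design_id,
        values={"esmfold.plddt": plddt},
        criteria_results={"plddt_ok": ok},
        passed=ok,
        score=-plddt,  # assumption: negate so a higher pLDDT ranks first
    )


verdicts = [verdict_from_plddt("a", 72.0), verdict_from_plddt("b", 91.0)]
best = min(verdicts, key=lambda v: v.score)
print(best.design_id)  # b
```

The default score of `float("inf")` means a verdict with no score set sorts last under this convention.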
consensus
consensus(
verdict_lists: Mapping[str, Iterable[Verdict]],
*,
mode: str = "all",
threshold: int | None = None,
) -> list[Verdict]
Combine verdicts from multiple validators into a single consensus list.
Each input list is the result of running the same set of designs
through one validator. Designs are joined by design_id, so
every validator must have produced a verdict for every design
(use :func:cross_validate with multiple validators, or run each
separately with consistent IDs).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| verdict_lists | Mapping[str, Iterable[Verdict]] | Dict mapping validator name to its list of Verdicts. The dict keys are folded into the merged … | required |
| mode | str | Consensus rule: … | 'all' |
| threshold | int \| None | Required when … | None |
Returns:
| Type | Description |
|---|---|
| list[Verdict] | A list of consensus :class: … with metrics from every validator merged into … |
Raises:
| Type | Description |
|---|---|
| ValueError | If validator verdict lists disagree on design IDs, or for invalid mode/threshold combinations. |
Example::

    esm_verdicts = cross_validate(seqs, validators={"esmfold": fn1}, criteria=c)
    af_verdicts = cross_validate(seqs, validators={"alphafold": fn2}, criteria=c)

    # Only accept designs both folding models agree are good
    joint = consensus(
        {"esm": esm_verdicts, "af": af_verdicts},
        mode="all",
    )

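The join-by-`design_id` behaviour can be sketched independently of molforge. The version below covers only an "all validators must pass" rule and raises `ValueError` when the ID sets differ; `SketchVerdict` and `consensus_all` are hypothetical names, and the way values are merged here is an assumption rather than the library's documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Mapping


@dataclass
class SketchVerdict:
    design_id: str
    values: dict[str, Any] = field(default_factory=dict)
    passed: bool = False


def consensus_all(
    verdict_lists: Mapping[str, Iterable[SketchVerdict]],
) -> list[SketchVerdict]:
    """Join on design_id; a design passes only if every validator passed it."""
    by_validator = {
        name: {v.design_id: v for v in verdicts}
        for name, verdicts in verdict_lists.items()
    }
    id_sets = [set(index) for index in by_validator.values()]
    if any(ids != id_sets[0] for ids in id_sets[1:]):
        raise ValueError("validators disagree on design IDs")

    merged = []
    first = next(iter(by_validator.values()), {})
    for design_id in first:  # keep the first validator's ordering
        per_validator = [index[design_id] for index in by_validator.values()]
        values: dict[str, Any] = {}
        for verdict in per_validator:
            values.update(verdict.values)
        merged.append(
            SketchVerdict(design_id, values, all(v.passed for v in per_validator))
        )
    return merged


esm = [
    SketchVerdict("d1", {"esmfold.plddt": 90}, True),
    SketchVerdict("d2", {"esmfold.plddt": 60}, False),
]
af = [
    SketchVerdict("d1", {"alphafold.plddt": 88}, True),
    SketchVerdict("d2", {"alphafold.plddt": 85}, True),
]
joint = consensus_all({"esm": esm, "af": af})
print([(v.design_id, v.passed) for v in joint])  # [('d1', True), ('d2', False)]
```

Because the metric keys are already qualified by validator name, merging the value dicts does not overwrite anything in this example.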
cross_validate
cross_validate(
designs: Iterable[Any],
*,
validators: Mapping[str, Validator],
criteria: CriteriaSet,
design_id: Callable[[Any], str] | None = None,
score_metric: str | None = None,
on_error: str = "raise",
) -> list[Verdict]
Run every design through every validator; collect verdicts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| designs | Iterable[Any] | Iterable of designs (typically a list of Protein / DesignedSequence / sequence strings). | required |
| validators | Mapping[str, Validator] | Dict mapping validator name to a callable that consumes a design and returns a dict of metric values. Validator names are prepended to each metric key so values from different validators don't collide in the verdict dict, e.g. … | required |
| criteria | CriteriaSet | The :class: … | required |
| design_id | Callable[[Any], str] \| None | Optional function to extract a string ID from each design. Defaults to … | None |
| score_metric | str \| None | Name of the metric (after qualification) to use as the sortable :attr: … | None |
| on_error | str | How to handle exceptions raised by a validator: … Changed in version 0.2: the default flipped from … | 'raise' |
Returns:
| Type | Description |
|---|---|
| list[Verdict] | One :class: … |
Example::

    criteria = (
        CriteriaSet()
        .add("fold_quality", Criterion.gt("esmfold.plddt", 80))
        .add("backbone_match", Criterion.gt("esmfold.tm", 0.5))
        .add("rmsd_ok", Criterion.lt("esmfold.rmsd", 2.0))
    )

    # esm_engine, tm_score, rmsd, and target_backbone are defined by the caller.
    def esmfold_validator(seq):
        predicted = esm_engine.predict(seq)
        # Unqualified keys; cross_validate prefixes them with "esmfold."
        return {
            "plddt": predicted.metadata["mean_confidence"],
            "tm": tm_score(predicted, target_backbone),
            "rmsd": rmsd(predicted, target_backbone, subset="ca"),
        }

    verdicts = cross_validate(
        designs=sequences,
        validators={"esmfold": esmfold_validator},
        criteria=criteria,
        score_metric="esmfold.plddt",
    )
    for v in rank_verdicts(verdicts, only_passed=True):
        print(v.design_id, v.values["esmfold.plddt"])

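The key qualification described under `validators` can be sketched as follows. This is an illustration only: `qualified_values` and the two toy validators are hypothetical, and the `"name.metric"` format is inferred from the `"esmfold.plddt"` keys used in the example above.

```python
from typing import Any, Callable, Mapping

Validator = Callable[[Any], Mapping[str, Any]]


def qualified_values(
    design: Any, validators: Mapping[str, Validator]
) -> dict[str, Any]:
    """Run each validator and prefix its metric keys with the validator name."""
    merged: dict[str, Any] = {}
    for name, validate in validators.items():
        for metric, value in validate(design).items():
            merged[f"{name}.{metric}"] = value
    return merged


# Toy validators standing in for real folding engines.
def length_validator(seq: str) -> dict[str, float]:
    return {"length": float(len(seq))}


def composition_validator(seq: str) -> dict[str, float]:
    return {"length": float(len(seq)), "gly_fraction": seq.count("G") / len(seq)}


values = qualified_values(
    "GAGA", {"len": length_validator, "comp": composition_validator}
)
print(values)
# {'len.length': 4.0, 'comp.length': 4.0, 'comp.gly_fraction': 0.5}
```

Both validators report a `length` metric, yet the prefixed keys keep them apart in the merged dict.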
rank_verdicts
rank_verdicts(
verdicts: Iterable[Verdict],
*,
only_passed: bool = False,
by: str | None = None,
) -> list[Verdict]
Sort verdicts for ranking.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| verdicts | Iterable[Verdict] | Verdicts to sort. | required |
| only_passed | bool | If True, drop verdicts that didn't pass before sorting. Useful for "show me the successful designs". | False |
| by | str \| None | Sort key. | None |
Returns:
| Type | Description |
|---|---|
| list[Verdict] | A new list, sorted. Original order is preserved within equal-score groups (stable sort). |
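The filter-then-stable-sort behaviour can be illustrated with Python's built-in `sorted`, which is guaranteed to be stable. `SketchVerdict` and `rank_sketch` are hypothetical stand-ins, and ascending order is an assumption based on the smaller-is-better score convention; the `by` parameter is not modelled here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SketchVerdict:
    design_id: str
    passed: bool
    score: float


def rank_sketch(
    verdicts: Iterable[SketchVerdict], *, only_passed: bool = False
) -> list[SketchVerdict]:
    """Optionally keep only passed verdicts, then sort ascending by score."""
    kept = [v for v in verdicts if v.passed or not only_passed]
    return sorted(kept, key=lambda v: v.score)  # sorted() is stable


verdicts = [
    SketchVerdict("a", True, -85.0),
    SketchVerdict("b", False, -95.0),
    SketchVerdict("c", True, -85.0),
    SketchVerdict("d", True, -91.0),
]
print([v.design_id for v in rank_sketch(verdicts, only_passed=True)])
# ['d', 'a', 'c']
```

Designs `a` and `c` share a score, so they keep their input order in the result.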