molforge.validation¶

Cross-validation utilities for protein design.

This subpackage captures the pattern of scoring designs across one or more validators and combining the results — the natural follow-on to :mod:molforge.metrics for de novo design workflows.

Three core building blocks:

Criteria: declarative success conditions.

- :class:`Criterion`: atomic comparison (e.g. pLDDT > 80). Compose with &, |, ~.
- :class:`NamedCriterion`: a criterion with a human-readable label.
- :class:`CriteriaSet`: a named collection of criteria evaluated together (implicit AND across criteria; pass/fail captured per criterion for diagnostics).

Verdicts: per-design results.

- :class:`Verdict`: the metric values, criterion results, and overall pass/fail for one design.
- :func:`rank_verdicts`: sort verdicts by score, optionally keeping only the designs that passed.

Orchestration:

- :func:`cross_validate`: run a list of designs through one or more validators (any callable that returns a metric dict), apply criteria, return verdicts.
- :func:`consensus`: merge verdict lists across validators ("ESMFold AND AlphaFold both confirm").

Example::

    from molforge.validation import (
        cross_validate, consensus, rank_verdicts,
        Criterion, CriteriaSet,
    )

    # Criteria reference fully qualified metric keys ("<validator>.<metric>").
    success = (
        CriteriaSet()
        .add("plddt_ok", Criterion.gt("esmfold.plddt", 80))
        .add("tm_ok",    Criterion.gt("esmfold.tm", 0.5))
        .add("rmsd_ok",  Criterion.lt("esmfold.rmsd", 2.0))
    )

    # `sequences` (an iterable of designs) and `esmfold_score_fn` (a callable
    # returning a metric dict) are placeholders defined elsewhere.
    verdicts = cross_validate(
        designs=sequences,
        validators={"esmfold": esmfold_score_fn},
        criteria=success,
        score_metric="esmfold.plddt",
    )

    for v in rank_verdicts(verdicts, only_passed=True):
        print(v.design_id, v.values["esmfold.plddt"])

CriteriaSet `dataclass` ¶

CriteriaSet(criteria: dict[str, Criterion] = dict())

A named collection of criteria evaluated together.

Each criterion is evaluated separately so per-criterion pass/fail is available in the resulting :class:Verdict. The overall verdict passes only if all named criteria pass (an implicit AND).

metric_names `property` ¶

metric_names: frozenset[str]

Union of metric names across all criteria.

add ¶

add(name: str, criterion: Criterion) -> CriteriaSet

Add a named criterion. Returns self for chaining.

evaluate ¶

evaluate(values: Mapping[str, Any]) -> dict[str, bool]

Evaluate every criterion against values.

Returns a dict mapping criterion name to its pass/fail result.

passes ¶

passes(values: Mapping[str, Any]) -> bool

True iff every criterion passes.
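
The relationship between evaluate and passes can be sketched in a few lines. This is a minimal illustration, not molforge's implementation: `SketchCriteriaSet` is a hypothetical stand-in, and plain callables take the place of :class:Criterion objects.

```python
from typing import Any, Callable, Dict, Mapping

Check = Callable[[Mapping[str, Any]], bool]


class SketchCriteriaSet:
    """Hypothetical stand-in for CriteriaSet, with plain callables as criteria."""

    def __init__(self) -> None:
        self.criteria: Dict[str, Check] = {}

    def add(self, name: str, check: Check) -> "SketchCriteriaSet":
        self.criteria[name] = check
        return self  # returning self is what allows .add(...).add(...) chains

    def evaluate(self, values: Mapping[str, Any]) -> Dict[str, bool]:
        # Every criterion is checked, even after one fails, so the result
        # records exactly which ones failed.
        return {name: check(values) for name, check in self.criteria.items()}

    def passes(self, values: Mapping[str, Any]) -> bool:
        # Implicit AND across all named criteria.
        return all(self.evaluate(values).values())


success = (
    SketchCriteriaSet()
    .add("plddt_ok", lambda v: v["esmfold.plddt"] > 80)
    .add("rmsd_ok", lambda v: v["esmfold.rmsd"] < 2.0)
)

measured = {"esmfold.plddt": 88.0, "esmfold.rmsd": 2.6}
print(success.evaluate(measured))
print(success.passes(measured))
```

Here the per-criterion dict shows that only `rmsd_ok` failed, while passes collapses the same information into a single boolean.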

Criterion ¶

Criterion(
    evaluate: Callable[[Mapping[str, Any]], bool],
    describe: Callable[[], str],
    metric_names: frozenset[str],
)

Declarative success criterion: a metric name + comparison + threshold.

Construct via the factory classmethods :meth:gt, :meth:ge, :meth:lt, :meth:le, :meth:eq, :meth:ne rather than instantiating directly — they make the intent obvious in the calling code.

Compose with the standard logical operators::

    a & b   # both must pass
    a | b   # either must pass
    ~a      # invert (passes when ``a`` would fail)
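
One way such composition could work is through Python's operator-overloading hooks. The sketch below is an assumption about the general technique, not the library's code: `SketchCriterion`, its `check` and `text` fields, and the `_atomic` helper are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping
import operator


@dataclass(frozen=True)
class SketchCriterion:
    """Hypothetical stand-in for Criterion; not molforge's actual code."""

    check: Callable[[Mapping[str, Any]], bool]
    text: str
    metric_names: frozenset

    @classmethod
    def _atomic(cls, metric, op, symbol, threshold):
        # An atomic comparison reads one metric and compares it to a threshold.
        return cls(
            check=lambda values: op(values[metric], threshold),
            text=f"{metric} {symbol} {threshold}",
            metric_names=frozenset({metric}),
        )

    @classmethod
    def gt(cls, metric, threshold):
        return cls._atomic(metric, operator.gt, ">", threshold)

    @classmethod
    def lt(cls, metric, threshold):
        return cls._atomic(metric, operator.lt, "<", threshold)

    def __and__(self, other):
        return SketchCriterion(
            lambda v: self.check(v) and other.check(v),
            f"({self.text} AND {other.text})",
            self.metric_names | other.metric_names,
        )

    def __or__(self, other):
        return SketchCriterion(
            lambda v: self.check(v) or other.check(v),
            f"({self.text} OR {other.text})",
            self.metric_names | other.metric_names,
        )

    def __invert__(self):
        return SketchCriterion(
            lambda v: not self.check(v),
            f"NOT {self.text}",
            self.metric_names,
        )


confident = SketchCriterion.gt("plddt", 80)
close = SketchCriterion.lt("rmsd", 2.0)
both = confident & close
either = confident | close
not_confident = ~confident

values = {"plddt": 85.0, "rmsd": 3.1}
print(both.check(values), either.check(values), not_confident.check(values))
print(both.text, sorted(both.metric_names))
```

A composed criterion carries the union of its operands' metric names, which matches the documented role of metric_names as the set of all metrics a criterion references.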

metric_names `property` ¶

metric_names: frozenset[str]

Names of all metrics this criterion references.

gt `classmethod` ¶

gt(metric: str, threshold: float) -> Criterion

metric > threshold.

ge `classmethod` ¶

ge(metric: str, threshold: float) -> Criterion

metric >= threshold.

lt `classmethod` ¶

lt(metric: str, threshold: float) -> Criterion

metric < threshold.

le `classmethod` ¶

le(metric: str, threshold: float) -> Criterion

metric <= threshold.

eq `classmethod` ¶

eq(metric: str, value: Any) -> Criterion

metric == value.

ne `classmethod` ¶

ne(metric: str, value: Any) -> Criterion

metric != value.

evaluate ¶

evaluate(values: Mapping[str, Any]) -> bool

Return True iff values satisfies this criterion.

Parameters:

Name	Type	Description	Default
`values`	`Mapping[str, Any]`	Dict mapping metric names to their measured values. Must contain every metric this criterion references (see :attr:`metric_names`).	required

Raises:

Type	Description
`KeyError`	If a referenced metric is missing from `values`. Missing metrics are treated as a programming error, not a failed criterion; if you want "missing = fail", filter upstream or use `None` explicitly (`None` always fails an atomic comparison).
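
The distinction between a missing metric and an explicit `None` can be illustrated with a small closure. `greater_than` is a hypothetical helper written to follow the contract described above; it is not part of molforge.

```python
from typing import Any, Mapping


def greater_than(metric: str, threshold: float):
    """Hypothetical atomic comparison following the documented contract."""

    def check(values: Mapping[str, Any]) -> bool:
        measured = values[metric]  # a missing key raises KeyError on purpose
        if measured is None:
            return False  # an explicit None is a failed comparison, not an error
        return measured > threshold

    return check


plddt_ok = greater_than("esmfold.plddt", 80)

print(plddt_ok({"esmfold.plddt": 91.2}))   # metric present and above threshold
print(plddt_ok({"esmfold.plddt": None}))   # explicit "no value": fails quietly

try:
    plddt_ok({"esmfold.tm": 0.7})          # metric absent: treated as a bug
except KeyError as exc:
    print("missing metric:", exc)
```

To get "missing = fail" behaviour under this contract, a caller would fill absent keys with `None` before evaluation rather than catching the `KeyError`.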

NamedCriterion `dataclass` ¶

NamedCriterion(
    name: str, criterion: Criterion, description: str = ""
)

A criterion paired with a human-readable name.

Useful when you want diagnostics that say "design passed fold_quality but failed solubility" rather than dumping the full criterion expression.

Verdict `dataclass` ¶

Verdict(
    design_id: str,
    values: dict[str, Any] = dict(),
    criteria_results: dict[str, bool] = dict(),
    passed: bool = False,
    score: float = float("inf"),
    metadata: dict[str, Any] = dict(),
)

The result of validating a single design.

Attributes:

Name	Type	Description
`design_id`	`str`	An identifier for the design (sequence string, file path, index, etc.). Used as the join key in :func:`consensus` across validators.
`values`	`dict[str, Any]`	Dict of all metric values measured for this design. Keys are metric names; values are typically floats but can be anything criteria evaluate against.
`criteria_results`	`dict[str, bool]`	Dict mapping criterion name to its pass/fail result.
`passed`	`bool`	True iff every criterion passed (the overall verdict).
`score`	`float`	A single scalar used for ranking. Smaller-is-better by convention (matches ProteinMPNN and most folding-confidence scores when negated); if you have a larger-is-better metric, store `-value` here.
`metadata`	`dict[str, Any]`	Engine-specific extras (validator name, runtime, etc.).

failed_criteria `property` ¶

failed_criteria: list[str]

Names of criteria that failed (empty if everything passed).

passed_criteria `property` ¶

passed_criteria: list[str]

Names of criteria that passed.
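
A minimal sketch of how the two properties could be derived from `criteria_results`. `SketchVerdict` is a hypothetical stand-in that mirrors the documented fields; it assumes the `dict()` defaults shown in the signature correspond to per-instance default factories, which the source does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchVerdict:
    """Hypothetical stand-in mirroring the documented Verdict fields."""

    design_id: str
    values: Dict[str, Any] = field(default_factory=dict)
    criteria_results: Dict[str, bool] = field(default_factory=dict)
    passed: bool = False
    score: float = float("inf")
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def failed_criteria(self) -> List[str]:
        return [name for name, ok in self.criteria_results.items() if not ok]

    @property
    def passed_criteria(self) -> List[str]:
        return [name for name, ok in self.criteria_results.items() if ok]


verdict = SketchVerdict(
    design_id="design_007",
    values={"esmfold.plddt": 84.0, "esmfold.rmsd": 2.4},
    criteria_results={"fold_quality": True, "rmsd_ok": False},
    passed=False,
    score=-84.0,  # larger-is-better pLDDT stored negated, per the convention above
)

print(f"{verdict.design_id} passed {verdict.passed_criteria} "
      f"but failed {verdict.failed_criteria}")
```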

consensus ¶

consensus(
    verdict_lists: Mapping[str, Iterable[Verdict]],
    *,
    mode: str = "all",
    threshold: int | None = None,
) -> list[Verdict]

Combine verdicts from multiple validators into a single consensus list.

Each input list is the result of running the same set of designs through one validator. Designs are joined by design_id, so every validator must have produced a verdict for every design (use :func:cross_validate with multiple validators, or run each separately with consistent IDs).

Parameters:

Name	Type	Description	Default
`verdict_lists`	`Mapping[str, Iterable[Verdict]]`	Dict mapping validator name to its list of Verdicts. The dict keys are folded into the merged `values` so per-validator metrics stay distinguishable.	required
`mode`	`str`	Consensus rule. `"all"` (default): every validator must mark the design as passed. `"any"`: at least one passing validator suffices. `"majority"`: more than half of the validators must pass. `"threshold"`: at least `threshold` validators must pass (requires `threshold`).	`'all'`
`threshold`	`int | None`	Required when `mode="threshold"`; the minimum number of validators that must mark the design as passed.	`None`

Returns:

Type	Description
`list[Verdict]`	A list of consensus :class:`Verdict` instances, one per design, with metrics from every validator merged into `values` and `passed` set per the chosen rule.

Raises:

Type	Description
`ValueError`	If validator verdict lists disagree on design IDs, or for invalid mode/threshold combos.

Example::

    # `seqs`, `fn1`, `fn2`, and `c` are placeholders: the designs, the two
    # validator callables, and a shared CriteriaSet.
    esm_verdicts = cross_validate(seqs, validators={"esmfold": fn1}, criteria=c)
    af_verdicts  = cross_validate(seqs, validators={"alphafold": fn2}, criteria=c)

    # Only accept designs both folding models agree are good
    joint = consensus(
        {"esm": esm_verdicts, "af": af_verdicts},
        mode="all",
    )
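
The four consensus rules reduce to counting how many validators passed a given design. The helper below is a hypothetical sketch of that counting logic under the rules listed in the parameter table; `consensus_passed` is not a molforge function.

```python
from typing import List, Optional


def consensus_passed(
    votes: List[bool], mode: str = "all", threshold: Optional[int] = None
) -> bool:
    """Hypothetical helper: combine per-validator pass flags for one design."""
    n_passed = sum(votes)
    if mode == "all":
        return n_passed == len(votes)
    if mode == "any":
        return n_passed >= 1
    if mode == "majority":
        return 2 * n_passed > len(votes)  # strictly more than half
    if mode == "threshold":
        if threshold is None:
            raise ValueError('mode="threshold" requires a threshold')
        return n_passed >= threshold
    raise ValueError(f"unknown mode: {mode!r}")


votes = [True, True, False, False]  # four validators, two of them passed

for mode in ("all", "any", "majority"):
    print(mode, consensus_passed(votes, mode))
print("threshold=2", consensus_passed(votes, "threshold", threshold=2))
```

With an even split, "majority" fails because exactly half is not more than half, whereas a threshold of 2 accepts the same design.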

cross_validate ¶

cross_validate(
    designs: Iterable[Any],
    *,
    validators: Mapping[str, Validator],
    criteria: CriteriaSet,
    design_id: Callable[[Any], str] | None = None,
    score_metric: str | None = None,
    on_error: str = "raise",
) -> list[Verdict]

Run every design through every validator; collect verdicts.

Parameters:

Name	Type	Description	Default
`designs`	`Iterable[Any]`	Iterable of designs (typically a list of Protein / DesignedSequence / sequence strings).	required
`validators`	`Mapping[str, Validator]`	Dict mapping validator name to a callable that consumes a design and returns a dict of metric values. Validator names are prepended to each metric key so values from different validators don't collide in the verdict dict — e.g. `validators={"esmfold": fn}` produces keys like `"esmfold.plddt"`.	required
`criteria`	`CriteriaSet`	The :class:`CriteriaSet` to apply. Should reference metric keys in their fully-qualified form (`"esmfold.plddt"` not just `"plddt"`).	required
`design_id`	`Callable[[Any], str] | None`	Optional function to extract a string ID from each design. Defaults to `str(design)` truncated to 60 chars.	`None`
`score_metric`	`str | None`	Name of the metric (after qualification) to use as the sortable :attr:`Verdict.score`. If `None`, the score is the count of failed criteria (so passed designs sort to the front).	`None`
`on_error`	`str`	How to handle exceptions raised by a validator. `"raise"` (default): propagate the exception immediately. This is the default because a validator that throws is almost always a bug (a misconfigured engine, a missing dependency, a bad input), and silently swallowing it produces a full list of `passed=False` verdicts that looks like a real result. Failing loud surfaces the problem at once. `"record"`: catch the exception, record it under `verdict.metadata["validator_errors"]`, mark that verdict `passed=False`, and carry on to the next design. Opt into this when you genuinely want a large batch to survive individual validator failures (e.g. an overnight screen where one bad design shouldn't abort the run). Changed in version 0.2: the default flipped from `"record"` to `"raise"`. Code that relied on the old fault-tolerant default must now pass `on_error="record"` explicitly.	`'raise'`

Returns:

Type	Description
`list[Verdict]`	One :class:`Verdict` per design, in input order.

Example::

    # `esm_engine`, `tm_score`, `rmsd`, `target_backbone`, and `sequences`
    # are assumed to be defined elsewhere.
    criteria = (
        CriteriaSet()
        .add("fold_quality", Criterion.gt("esmfold.plddt", 80))
        .add("backbone_match", Criterion.gt("esmfold.tm", 0.5))
        .add("rmsd_ok", Criterion.lt("esmfold.rmsd", 2.0))
    )

    def esmfold_validator(seq):
        # Return unqualified keys; cross_validate prepends "esmfold.".
        predicted = esm_engine.predict(seq)
        return {
            "plddt": predicted.metadata["mean_confidence"],
            "tm":    tm_score(predicted, target_backbone),
            "rmsd":  rmsd(predicted, target_backbone, subset="ca"),
        }

    verdicts = cross_validate(
        designs=sequences,
        validators={"esmfold": esmfold_validator},
        criteria=criteria,
        score_metric="esmfold.plddt",
    )

    for v in rank_verdicts(verdicts, only_passed=True):
        print(v.design_id, v.values["esmfold.plddt"])
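
The key qualification and the two `on_error` policies described in the parameter table can be sketched as a per-design step. This is an illustration under stated assumptions, not the library's code: `run_validators`, the toy validators, and the shape of the returned dict are all hypothetical.

```python
from typing import Any, Callable, Dict, Mapping


def run_validators(
    design: Any,
    validators: Mapping[str, Callable[[Any], Dict[str, Any]]],
    on_error: str = "raise",
) -> Dict[str, Any]:
    """Hypothetical per-design step: call each validator and qualify its keys."""
    values: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, validator in validators.items():
        try:
            metrics = validator(design)
        except Exception as exc:
            if on_error == "raise":
                raise  # fail loudly: a throwing validator is usually a bug
            errors[name] = repr(exc)  # "record": remember it and carry on
            continue
        for key, value in metrics.items():
            values[f"{name}.{key}"] = value  # "plddt" becomes "esmfold.plddt"
    return {"values": values, "validator_errors": errors}


def length_validator(seq: str) -> Dict[str, Any]:
    return {"length": len(seq)}


def broken_validator(seq: str) -> Dict[str, Any]:
    raise RuntimeError("engine not configured")


validators = {"toy": length_validator, "broken": broken_validator}

result = run_validators("MKTAYIAK", validators, on_error="record")
print(result["values"])
print(result["validator_errors"])
```

Calling the same function with `on_error="raise"` would propagate the `RuntimeError` from `broken_validator` instead of returning a partial result.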

rank_verdicts ¶

rank_verdicts(
    verdicts: Iterable[Verdict],
    *,
    only_passed: bool = False,
    by: str | None = None,
) -> list[Verdict]

Sort verdicts for ranking.

Parameters:

Name	Type	Description	Default
`verdicts`	`Iterable[Verdict]`	Verdicts to sort.	required
`only_passed`	`bool`	If True, drop verdicts that didn't pass before sorting. Useful for "show me the successful designs".	`False`
`by`	`str \| None`	Sort key. `None` (default) sorts by `verdict.score` ascending (lower = better). Otherwise sort by the named metric value in `verdict.values`, ascending.	`None`

Returns:

Type	Description
`list[Verdict]`	A new list, sorted. Original order is preserved within equal-score groups (stable sort).
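
The ascending, stable ordering can be reproduced with Python's built-in `sorted`, which is guaranteed to be stable. The sketch below uses a hypothetical, pared-down `Row` type and `rank` function rather than molforge's own; it also shows the convention of storing a larger-is-better metric negated.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """Hypothetical, pared-down verdict: just the fields that ranking needs."""

    design_id: str
    passed: bool
    score: float


def rank(rows: List[Row], only_passed: bool = False) -> List[Row]:
    kept = [r for r in rows if r.passed] if only_passed else list(rows)
    # sorted() is stable, so rows with equal scores keep their input order.
    return sorted(kept, key=lambda r: r.score)


# pLDDT is larger-is-better, so it is stored negated to sort best-first.
rows = [
    Row("a", True, -82.0),
    Row("b", False, -95.0),
    Row("c", True, -91.0),
    Row("d", True, -82.0),
]

print([r.design_id for r in rank(rows)])
print([r.design_id for r in rank(rows, only_passed=True)])
```

Designs "a" and "d" share a score, so they stay in input order; "b" has the best score but is dropped when only passed designs are kept.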
