molforge.structure

structure

Structural analysis: superposition, RMSD, contacts, geometry, DSSP, SASA, dihedrals.

Workhorses for analyzing the geometric properties of protein structures and comparing them.

Common entry points
  • :func:rmsd — RMSD between two structures (with optional superposition).
  • :func:superpose — Kabsch / Umeyama optimal rigid-body alignment.
  • :func:contact_map / :func:distance_map — residue-residue contact and distance matrices.
  • :func:residue_contacts — all-atom contacts as a sorted list.
  • :func:radius_of_gyration, :func:centroid, :func:center_of_mass — bulk geometric properties.
  • :func:translate, :func:rotate, :func:center_at_origin — in-place coordinate transforms.
  • :func:dssp / :func:dssp_3state — Kabsch-Sander secondary-structure assignment (8-state and 3-state).
  • :func:sasa / :func:sasa_per_residue / :func:total_sasa — solvent-accessible surface area (Shrake-Rupley).
  • :func:phi / :func:psi / :func:omega / :func:phi_psi_omega / :func:ramachandran / :func:dihedral — backbone dihedral angles.

SuperpositionResult dataclass

SuperpositionResult(
    rotation: NDArray[float64],
    translation: NDArray[float64],
    rmsd: float,
    n_atoms: int,
    mobile_aligned: NDArray[float32],
)

Result of a structural superposition.

Attributes:

Name Type Description
rotation NDArray[float64]

(3, 3) proper rotation matrix.

translation NDArray[float64]

(3,) translation vector.

rmsd float

Root-mean-square deviation of the superposed structures.

n_atoms int

Number of atoms used in the superposition.

mobile_aligned NDArray[float32]

(n_atoms, 3) mobile coords after superposition onto the reference.

contact_map

contact_map(
    protein: Protein,
    *,
    cutoff: float = 8.0,
    atom_choice: AtomChoice = "cb",
    exclude_neighbors: int = 0,
) -> NDArray[np.bool_]

Binary contact map at cutoff Å.

Parameters:

Name Type Description Default
protein Protein

structure to analyze.

required
cutoff float

distance below which residues are in contact (default 8.0 Å, the CASP standard for CB-CB).

8.0
atom_choice AtomChoice

which atom defines the residue position — defaults to "cb" (the field standard); use "ca" for Gly-heavy structures.

'cb'
exclude_neighbors int

Set the diagonal band of width 2*exclude_neighbors + 1 to False. Useful for ignoring trivial sequential contacts; pass 4 to remove the ±4 band (helix neighbors) for instance.

0

Returns:

Type Description
NDArray[bool_]

(n_res, n_res) boolean array; entry [i, j] is True if residues i and j are in contact.
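The cutoff and exclude_neighbors semantics described above can be sketched on plain NumPy arrays. This is an illustrative sketch, not the library's implementation: contact_map_sketch and its points argument (one representative coordinate per residue) are hypothetical names, and it reads the band formula literally, so even exclude_neighbors=0 clears the main diagonal. The library may treat 0 differently.

```python
import numpy as np

def contact_map_sketch(points, cutoff=8.0, exclude_neighbors=0):
    """Boolean contact map from one representative point per residue."""
    points = np.asarray(points, dtype=np.float64)
    # Pairwise Euclidean distances via broadcasting: (n, 1, 3) - (1, n, 3).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contacts = dist < cutoff
    # Clear the band |i - j| <= exclude_neighbors (width 2k + 1).
    idx = np.arange(len(points))
    band = np.abs(idx[:, None] - idx[None, :]) <= exclude_neighbors
    contacts[band] = False
    return contacts

# Five points on a line, 3.8 Å apart (roughly a CA-CA spacing).
points = np.column_stack([3.8 * np.arange(5), np.zeros(5), np.zeros(5)])
cmap = contact_map_sketch(points, cutoff=8.0, exclude_neighbors=1)
```

With exclude_neighbors=1, only the i, i+2 pairs (7.6 Å apart) remain in contact in this toy chain.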

distance_map

distance_map(
    protein: Protein, *, atom_choice: AtomChoice = "ca"
) -> NDArray[np.float32]

Compute a residue-by-residue distance map.

Parameters:

Name Type Description Default
protein Protein

structure to analyze.

required
atom_choice AtomChoice

per-residue representative point — "ca", "cb", or the residue centroid ("heavy" / "all").

'ca'

Returns:

Type Description
NDArray[float32]

(n_res, n_res) float32 array of pairwise Euclidean distances between the representative points.
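When the representative point is a residue centroid, the map is built in two steps: average the atoms of each residue, then take pairwise distances. The sketch below shows one way to do that with NumPy. The function names and the residue_index array (the residue number of each atom, starting at 0) are assumptions for illustration, and every residue is assumed to have at least one atom.

```python
import numpy as np

def residue_centroids(coords, residue_index, n_res):
    """Mean position of the atoms belonging to each residue."""
    sums = np.zeros((n_res, 3))
    np.add.at(sums, residue_index, coords)  # unbuffered per-residue sum
    counts = np.bincount(residue_index, minlength=n_res)
    return sums / counts[:, None]

def distance_map_sketch(points):
    """(n, n) float32 matrix of pairwise Euclidean distances."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1).astype(np.float32)

# Five atoms in three residues (toy coordinates in Å).
coords = np.array([
    [0.0, 0.0, 0.0], [2.0, 0.0, 0.0],   # residue 0
    [0.0, 3.0, 0.0], [2.0, 3.0, 0.0],   # residue 1
    [1.0, 7.5, 0.0],                    # residue 2
])
residue_index = np.array([0, 0, 1, 1, 2])
centroids = residue_centroids(coords, residue_index, n_res=3)
dmap = distance_map_sketch(centroids)
```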

residue_contacts

residue_contacts(
    protein: Protein,
    *,
    cutoff: float = 5.0,
    chain_a: str | None = None,
    chain_b: str | None = None,
) -> list[tuple[tuple[str, int], tuple[str, int], float]]

List inter-residue contacts at the all-atom level.

Unlike :func:contact_map, this enumerates contacts as triples of ((chain_a, resid_a), (chain_b, resid_b), distance) and uses the "any atom within cutoff" definition.

Parameters:

Name Type Description Default
protein Protein

structure to analyze.

required
cutoff float

distance threshold in Å (default 5.0).

5.0
chain_a str | None

If both chain_a and chain_b are given, only return contacts between those two chains (useful for interface analysis).

None
chain_b str | None

see chain_a.

None

Returns:

Type Description
list[tuple[tuple[str, int], tuple[str, int], float]]

Sorted list of contact tuples.
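The "any atom within cutoff" definition can be sketched with a brute-force loop over atom pairs. This is a hypothetical illustration only. The reference above does not say which distance is reported or how the list is sorted, so the sketch assumes the minimum interatomic distance per residue pair and a sort by residue key.

```python
import numpy as np

def residue_contacts_sketch(coords, residue_keys, cutoff=5.0):
    """residue_keys[i] is the (chain, resid) tuple of atom i."""
    coords = np.asarray(coords, dtype=np.float64)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    best = {}  # (key_a, key_b) -> minimum interatomic distance seen so far
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = residue_keys[i], residue_keys[j]
            if a == b or dist[i, j] > cutoff:
                continue  # same residue, or atoms too far apart
            pair = (a, b) if a <= b else (b, a)
            d = float(dist[i, j])
            if d < best.get(pair, np.inf):
                best[pair] = d
    # Assumption: sort by residue keys, then by distance.
    return sorted((a, b, d) for (a, b), d in best.items())

coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [4.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
residue_keys = [("A", 1), ("A", 1), ("A", 2), ("B", 5)]
contacts = residue_contacts_sketch(coords, residue_keys, cutoff=5.0)
```

The quadratic loop is fine for a sketch; a spatial index such as a k-d tree would be the usual choice for large structures.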

dihedral

dihedral(
    p1: NDArray[floating],
    p2: NDArray[floating],
    p3: NDArray[floating],
    p4: NDArray[floating],
) -> float

Compute the dihedral angle (in degrees) between four 3D points.

Parameters:

Name Type Description Default
p1 NDArray[floating]

(3,) Cartesian coordinates of the first point.

required
p2 NDArray[floating]

(3,) Cartesian coordinates of the second point.

required
p3 NDArray[floating]

(3,) Cartesian coordinates of the third point.

required
p4 NDArray[floating]

(3,) Cartesian coordinates of the fourth point.

required

Returns:

Type Description
float

Angle in degrees in [-180, 180]. Uses the standard atan2 formula, which avoids the numerical issues of acos for arguments near ±1 and naturally captures the sign.
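To make the atan2 versus acos remark concrete, here is a minimal NumPy sketch of both formulas. The function names are hypothetical and this is not necessarily how the library computes the angle. The acos variant returns only the magnitude, so two mirror-image geometries give the same value, while the atan2 variant distinguishes them by sign.

```python
import numpy as np

def dihedral_atan2(p1, p2, p3, p4):
    """Signed dihedral in degrees via atan2."""
    b1, b2, b3 = p2 - p1, p3 - p2, p4 - p3
    n1 = np.cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = np.cross(b2, b3)  # normal of the plane through p2, p3, p4
    x = np.dot(n1, n2)
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    return float(np.degrees(np.arctan2(y, x)))

def dihedral_acos(p1, p2, p3, p4):
    """Unsigned angle between the plane normals; the sign is lost."""
    b1, b2, b3 = p2 - p1, p3 - p2, p4 - p3
    n1 = np.cross(b1, b2)
    n2 = np.cross(b2, b3)
    cos = np.dot(n1, n2) / (np.linalg.norm(n1) * np.linalg.norm(n2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 0.0, 0.0])
p3 = np.array([0.0, 0.0, 1.0])
p4_a = np.array([0.0, 1.0, 1.0])
p4_b = np.array([0.0, -1.0, 1.0])  # mirror image of p4_a

signed_a = dihedral_atan2(p1, p2, p3, p4_a)
signed_b = dihedral_atan2(p1, p2, p3, p4_b)
unsigned_a = dihedral_acos(p1, p2, p3, p4_a)
unsigned_b = dihedral_acos(p1, p2, p3, p4_b)
```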

dihedrals_batch

dihedrals_batch(
    quartets: NDArray[floating],
) -> NDArray[np.float64]

Vectorized dihedral over an array of atom quartets.

Parameters:

Name Type Description Default
quartets NDArray[floating]

(N, 4, 3) array of four-atom quartets.

required

Returns:

Type Description
NDArray[float64]

(N,) float64 array of dihedral angles in degrees.
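A vectorized dihedral over an (N, 4, 3) array can be written without a Python loop by applying the same atan2 formula along the first axis. The sketch below is one possible NumPy formulation under that assumption; dihedrals_batch_sketch is a hypothetical name.

```python
import numpy as np

def dihedrals_batch_sketch(quartets):
    """Dihedral angles in degrees for an (N, 4, 3) array of point quartets."""
    q = np.asarray(quartets, dtype=np.float64)
    if q.ndim != 3 or q.shape[1:] != (4, 3):
        raise ValueError("expected an (N, 4, 3) array")
    b1 = q[:, 1] - q[:, 0]
    b2 = q[:, 2] - q[:, 1]
    b3 = q[:, 3] - q[:, 2]
    n1 = np.cross(b1, b2)
    n2 = np.cross(b2, b3)
    # Row-wise dot products.
    x = np.einsum("ij,ij->i", n1, n2)
    y = np.linalg.norm(b2, axis=1) * np.einsum("ij,ij->i", b1, n2)
    return np.degrees(np.arctan2(y, x))

quartets = np.array([
    [[1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1]],    # eclipsed (cis)
    [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]],
    [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, -1, 1]],
], dtype=np.float64)
angles = dihedrals_batch_sketch(quartets)
```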

omega

omega(protein: Protein) -> NDArray[np.float64]

ω (omega) angles per residue, degrees, NaN where undefined.

phi

phi(protein: Protein) -> NDArray[np.float64]

φ (phi) angles per residue, degrees, NaN where undefined.

phi_psi_omega

phi_psi_omega(
    protein: Protein,
) -> tuple[
    NDArray[np.float64],
    NDArray[np.float64],
    NDArray[np.float64],
]

Per-residue backbone dihedrals (φ, ψ, ω) in degrees.

Parameters:

Name Type Description Default
protein Protein

structure to analyze.

required

Returns:

Type Description
tuple[NDArray[float64], NDArray[float64], NDArray[float64]]

Three (n_residues,) float64 arrays of φ, ψ, ω values in degrees. Entries where the angle is undefined (chain termini, missing backbone atoms) are NaN.

psi

psi(protein: Protein) -> NDArray[np.float64]

ψ (psi) angles per residue, degrees, NaN where undefined.

ramachandran

ramachandran(protein: Protein) -> NDArray[np.float64]

Per-residue (φ, ψ) pairs for Ramachandran-plot construction.

Parameters:

Name Type Description Default
protein Protein

structure to analyze.

required

Returns:

Type Description
NDArray[float64]

(n_residues, 2) float64 array. Rows where either angle is undefined contain NaNs.
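Because undefined rows contain NaNs, they usually need to be dropped before plotting. The following sketch uses a hand-written array in the documented (n_residues, 2) shape rather than a real call; the values are made up for illustration.

```python
import numpy as np

# Stand-in for the documented return value: one (phi, psi) row per residue.
phi_psi = np.array([
    [np.nan, 150.0],   # first residue: phi undefined
    [-60.0, -45.0],
    [-120.0, 130.0],
    [-75.0, np.nan],   # last residue: psi undefined
])

# Keep only rows where both angles are defined.
defined = ~np.isnan(phi_psi).any(axis=1)
plottable = phi_psi[defined]
residue_indices = np.flatnonzero(defined)  # rows that survived the filter
```

Keeping residue_indices alongside plottable makes it possible to label points with their residue positions later.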

dssp_3state

dssp_3state(protein: Protein) -> str

Return the per-residue 3-state secondary-structure string.

Parameters:

Name Type Description Default
protein Protein

structure to analyze.

required

Returns:

Type Description
str

A string of "H" / "E" / "C" characters, one per residue.
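A 3-state string is commonly derived by collapsing the 8 DSSP classes. The mapping below (H, G, I to H; E, B to E; everything else to C) is one widely used convention and is an assumption here: tools differ on how G and B are treated, and this page does not state which reduction the library applies.

```python
# Assumed reduction table; any symbol not listed falls back to coil ("C").
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
}

def reduce_to_3state(ss8: str) -> str:
    """Collapse an 8-state secondary-structure string to H / E / C."""
    return "".join(EIGHT_TO_THREE.get(ch, "C") for ch in ss8)

ss3 = reduce_to_3state("HHGGITSEB-")
```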

bounding_box

bounding_box(
    protein: Protein,
) -> tuple[NDArray[np.float64], NDArray[np.float64]]

Axis-aligned bounding box of a structure.

Returns:

Type Description
tuple[NDArray[float64], NDArray[float64]]

(min_xyz, max_xyz), each a (3,) float64 array.

center_at_origin

center_at_origin(
    protein: Protein, *, mass_weighted: bool = False
) -> None

Translate the structure in place so that its centroid (or, with mass_weighted=True, its center of mass) is at the origin.

center_of_mass

center_of_mass(protein: Protein) -> NDArray[np.float64]

Mass-weighted center of mass. Equivalent to centroid(protein, mass_weighted=True).

centroid

centroid(
    protein: Protein, *, mass_weighted: bool = False
) -> NDArray[np.float64]

Geometric (or mass-weighted) centroid of a structure.

Parameters:

Name Type Description Default
protein Protein

input structure.

required
mass_weighted bool

If True, weight by atomic mass (i.e. compute the center of mass instead).

False

Returns:

Type Description
NDArray[float64]

(3,) float64 centroid coordinate.
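The difference between the geometric centroid and the center of mass is only the weighting. A minimal NumPy sketch, with a hypothetical function name and illustrative masses (roughly carbon and hydrogen):

```python
import numpy as np

def centroid_sketch(coords, masses=None):
    """Unweighted mean if masses is None, otherwise the mass-weighted mean."""
    coords = np.asarray(coords, dtype=np.float64)
    return np.average(coords, axis=0, weights=masses)

coords = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
masses = np.array([12.011, 1.008])  # illustrative atomic masses

geometric = centroid_sketch(coords)
center_of_mass = centroid_sketch(coords, masses)
```

The center of mass lands closer to the heavier atom, while the geometric centroid sits at the midpoint.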

radius_of_gyration

radius_of_gyration(
    protein: Protein, *, mass_weighted: bool = True
) -> float

Radius of gyration — RMS distance from atoms to the center of mass.

A standard compactness metric: smaller Rg means more globular.

Parameters:

Name Type Description Default
protein Protein

input structure.

required
mass_weighted bool

If True (default), use mass-weighted Rg.

True

Returns:

Type Description
float

Radius of gyration in angstroms.
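As a sketch of the definition above, the radius of gyration is the square root of the (optionally mass-weighted) mean squared distance from the center. The function below is a hypothetical NumPy version operating on raw coordinates, not the library's code.

```python
import numpy as np

def radius_of_gyration_sketch(coords, masses=None):
    """RMS distance of the points from their (weighted) center."""
    coords = np.asarray(coords, dtype=np.float64)
    if masses is None:
        masses = np.ones(len(coords))
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Four points at distance 1 from their common center.
corners = np.array([
    [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0], [0.0, -1.0, 0.0],
])
rg = radius_of_gyration_sketch(corners)
rg_scaled = radius_of_gyration_sketch(2.0 * corners)
```

Scaling all coordinates by a factor scales Rg by the same factor, which is why Rg is usually compared between structures of similar size.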

rotate

rotate(
    protein: Protein, rotation: NDArray[floating]
) -> None

Apply a 3x3 rotation in place around the origin.

For a rotation around the centroid, translate to origin first, rotate, then translate back. Use :func:center_at_origin as a helper.
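The translate, rotate, translate-back recipe can be sketched on a bare coordinate array. This example assumes row-vector coordinates of shape (n, 3), so a rotation matrix R is applied as coords @ R.T; rotate_about_centroid is a hypothetical helper, not part of the documented API.

```python
import numpy as np

def rotate_about_centroid(coords, rotation):
    """Rotate an (n, 3) float array in place about its own centroid."""
    center = coords.mean(axis=0)
    coords -= center                 # move the centroid to the origin
    coords[:] = coords @ rotation.T  # rotate about the origin
    coords += center                 # move back

theta = np.pi / 2  # 90 degrees about the z axis
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])

coords = np.array([[1.0, 1.0, 0.0], [3.0, 1.0, 0.0]])
center_before = coords.mean(axis=0)
rotate_about_centroid(coords, rot_z)
center_after = coords.mean(axis=0)
```

The centroid is unchanged by the operation, whereas rotating about the origin directly would also move it.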

translate

translate(
    protein: Protein, vector: NDArray[floating]
) -> None

Translate protein in place by vector.

Mutates the underlying AtomArray.coords directly — both hierarchical and linear views reflect the change immediately.

rmsd_per_residue

rmsd_per_residue(
    mobile: Protein,
    reference: Protein,
    *,
    subset: AtomSubset = "ca",
    align: bool = True,
) -> NDArray[np.float32]

Per-residue RMSD after (optionally) aligning the structures globally.

Useful for spotting which loops moved between two conformations or where a folding model disagrees with experiment.

Parameters:

Name Type Description Default
mobile Protein

First structure (the one that is moved to align with reference).

required
reference Protein

Second structure; must have the same residue count as mobile.

required
subset AtomSubset

Atom selector for both the global alignment and the per-residue comparison.

'ca'
align bool

Whether to superpose first.

True

Returns:

Type Description
NDArray[float32]

(n_residues,) float32 array of per-residue RMSDs.
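Once two coordinate sets are superposed, the per-residue values come from grouping squared deviations by residue. The sketch below assumes the arrays are already aligned and that residue_index gives each atom's residue number starting at 0; both names are illustrative rather than part of the documented API.

```python
import numpy as np

def per_residue_rmsd_sketch(mobile, reference, residue_index, n_res):
    """RMSD per residue for two already-superposed (n_atoms, 3) arrays."""
    sq_dev = ((mobile - reference) ** 2).sum(axis=1)
    total = np.bincount(residue_index, weights=sq_dev, minlength=n_res)
    counts = np.bincount(residue_index, minlength=n_res)
    return np.sqrt(total / counts).astype(np.float32)

reference = np.array([
    [0.0, 0.0, 0.0], [1.0, 0.0, 0.0],   # residue 0
    [5.0, 0.0, 0.0], [6.0, 0.0, 0.0],   # residue 1
])
mobile = reference.copy()
mobile[2:] += np.array([0.0, 0.0, 2.0])  # displace residue 1 by 2 Å
residue_index = np.array([0, 0, 1, 1])

per_res = per_residue_rmsd_sketch(mobile, reference, residue_index, n_res=2)
```

With a one-atom-per-residue subset such as "ca", each value reduces to the distance between the two matched atoms.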

rmsd_raw

rmsd_raw(
    a: NDArray[floating], b: NDArray[floating]
) -> float

RMSD between two equal-length coordinate sets, no alignment.

Parameters:

Name Type Description Default
a NDArray[floating]

First (n, 3) coordinate array.

required
b NDArray[floating]

Second (n, 3) coordinate array, same shape as a.

required

Returns:

Type Description
float

Root-mean-square deviation in the input units (Å for biology).

sasa_per_residue

sasa_per_residue(
    protein: Protein,
    *,
    probe_radius: float = 1.4,
    n_sphere_points: int = 100,
) -> NDArray[np.float64]

Per-residue SASA, summed across atoms in each residue.

Parameters:

Name Type Description Default
protein Protein

input structure.

required
probe_radius float

see :func:sasa.

1.4
n_sphere_points int

see :func:sasa.

100

Returns:

Type Description
NDArray[float64]

(n_residues,) float64 array, one SASA value per residue in array order.
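For readers unfamiliar with Shrake-Rupley, the following is a compact, unoptimized sketch of the idea: place test points on each atom's probe-expanded sphere, discard points that fall inside a neighbor's expanded sphere, and scale the surviving fraction by the sphere area. Per-residue values are then sums over atoms. All names, the golden-spiral point layout, and the 1.7 Å radii are assumptions for illustration; the library's radii and point placement are not documented here.

```python
import numpy as np

def sphere_points(n):
    """n roughly uniform points on the unit sphere (golden-spiral layout)."""
    k = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * k / n)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * k
    return np.column_stack([
        np.sin(polar) * np.cos(azimuth),
        np.sin(polar) * np.sin(azimuth),
        np.cos(polar),
    ])

def sasa_sketch(coords, radii, probe_radius=1.4, n_sphere_points=100):
    """Per-atom SASA in Å^2 by brute-force Shrake-Rupley."""
    coords = np.asarray(coords, dtype=np.float64)
    expanded = np.asarray(radii, dtype=np.float64) + probe_radius
    unit = sphere_points(n_sphere_points)
    areas = np.zeros(len(coords))
    for i in range(len(coords)):
        pts = coords[i] + expanded[i] * unit
        exposed = np.ones(n_sphere_points, dtype=bool)
        for j in range(len(coords)):
            if j == i:
                continue
            # A test point is buried if it lies inside atom j's expanded sphere.
            inside = ((pts - coords[j]) ** 2).sum(axis=1) < expanded[j] ** 2
            exposed &= ~inside
        areas[i] = 4.0 * np.pi * expanded[i] ** 2 * exposed.mean()
    return areas

def sum_per_residue(areas, residue_index, n_res):
    return np.bincount(residue_index, weights=areas, minlength=n_res)

# Three isolated atoms (100 Å apart): two in residue 0, one in residue 1.
far_coords = [[0.0, 0.0, 0.0], [100.0, 0.0, 0.0], [200.0, 0.0, 0.0]]
radii = [1.7, 1.7, 1.7]  # illustrative radius
isolated = sasa_sketch(far_coords, radii)
per_residue = sum_per_residue(isolated, np.array([0, 0, 1]), n_res=2)

# Two overlapping atoms bury part of each other's surface.
buried = sasa_sketch([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]], [1.7, 1.7])
full_sphere = 4.0 * np.pi * (1.7 + 1.4) ** 2
```

Increasing n_sphere_points improves the area estimate at the cost of run time, which is the trade-off the n_sphere_points parameter exposes.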

total_sasa

total_sasa(
    protein: Protein,
    *,
    probe_radius: float = 1.4,
    n_sphere_points: int = 100,
) -> float

Total solvent-accessible surface area (Ų).

kabsch_rmsd

kabsch_rmsd(
    mobile: NDArray[floating],
    reference: NDArray[floating],
    *,
    weights: NDArray[floating] | None = None,
) -> float

Return the minimum RMSD over all rigid-body alignments.

Convenience wrapper around :func:superpose for when you only want the RMSD value.

superpose

superpose(
    mobile: NDArray[floating],
    reference: NDArray[floating],
    *,
    weights: NDArray[floating] | None = None,
) -> SuperpositionResult

Superpose mobile onto reference by optimal rigid-body fit.

Implements the Kabsch / Umeyama algorithm via SVD of the weighted covariance matrix. The returned rotation is guaranteed to be a proper rotation (det = +1), not a reflection.

Parameters:

Name Type Description Default
mobile NDArray[floating]

(n, 3) coordinates to align.

required
reference NDArray[floating]

(n, 3) reference coordinates.

required
weights NDArray[floating] | None

Optional (n,) per-point weights (e.g. inverse-B-factor for X-ray structures, or 1/0 for masking atoms).

None

Returns:

Type Description
SuperpositionResult

A :class:SuperpositionResult with the rotation, translation, post-superposition RMSD, and aligned mobile coords.

Raises:

Type Description
ValueError

If shapes mismatch or fewer than 3 atoms are given (degenerate; rotation under-determined).
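The SVD-based fit described above can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions, not the library's code: kabsch_sketch is a hypothetical name, it returns a plain tuple instead of a SuperpositionResult, and it uses the convention aligned = mobile @ R.T + t. The determinant check is what keeps the result a proper rotation rather than a reflection.

```python
import numpy as np

def kabsch_sketch(mobile, reference, weights=None):
    """Return (rotation, translation, rmsd, aligned) for a rigid-body fit."""
    mobile = np.asarray(mobile, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if mobile.shape != reference.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("mobile and reference must both have shape (n, 3)")
    if len(mobile) < 3:
        raise ValueError("need at least 3 points to determine a rotation")
    w = np.ones(len(mobile)) if weights is None else np.asarray(weights, dtype=np.float64)
    w = w / w.sum()

    # Weighted centroids, then centered coordinates.
    mob_center = w @ mobile
    ref_center = w @ reference
    p = mobile - mob_center
    q = reference - ref_center

    # 3x3 weighted covariance matrix and its SVD.
    cov = (p * w[:, None]).T @ q
    u, _, vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that det(rotation) = +1.
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = ref_center - rotation @ mob_center

    aligned = mobile @ rotation.T + translation
    rmsd = float(np.sqrt((w * ((aligned - reference) ** 2).sum(axis=1)).sum()))
    return rotation, translation, rmsd, aligned

# Build a test case: rotate and shift a random point cloud, then recover the fit.
rng = np.random.default_rng(0)
reference = rng.normal(size=(10, 3))
angle = np.radians(30.0)
true_rot = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
mobile = reference @ true_rot.T + np.array([1.0, -2.0, 0.5])

rotation, translation, rmsd, aligned = kabsch_sketch(mobile, reference)
```

Since mobile was produced by applying true_rot to reference, the recovered rotation is its inverse and the post-fit RMSD is zero up to rounding.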