Exposure Dynamics Analysis

Understand how your polymer interacts with transiently exposed protein residues using dynamic SASA-based enrichment and chaperone event analysis. This module answers two complementary questions that static contact analysis cannot:

“Does my polymer preferentially bind certain amino acid classes at the moment they are surface-exposed, controlling for instantaneous surface availability?”
“Does the polymer act as a molecular chaperone — providing contact during transient exposure events that would otherwise be unassisted?”

Note

Exposure dynamics analysis builds on contacts analysis and uses SASA computed per-frame from MD trajectories, not from a static PDB. Run contacts analysis first, then enable exposure dynamics in your comparison configuration.

Warning

Exposure dynamics is currently labeled experimental in the presentation release. The command remains available and plots are still produced, but CLI output and figures now carry explicit experimental labeling because the metric definitions and interpretation may still change.

Scientific Motivation

Why static analysis is insufficient

Standard binding preference analysis (see Binding Preference Analysis) uses SASA from a single static PDB to define which residues are “surface accessible.” This works well for residues that are consistently exposed, but misses a crucial class of events: transient exposure.

A residue that is buried 90% of the time but exposes briefly may be the most biologically relevant interaction site — or it may represent a mechanically vulnerable region that the polymer can protect or destabilize. A static SASA filter either includes it (at a permissive threshold) or excludes it entirely (at a stringent threshold), conflating residues with fundamentally different dynamics.

Dynamic enrichment solves this by computing the enrichment frame by frame, conditioning on instantaneous surface availability. A residue that exposes in only 10% of frames contributes to the enrichment only in those 10% of frames.

Why chaperone fraction captures something enrichment cannot

Dynamic enrichment tells you whether the polymer preferentially contacts aromatic residues when they are exposed — but it does not tell you whether the polymer is present during exposure events or contacts residues after they are already exposed and re-expose repeatedly.

Chaperone fraction asks a different question: of all the exposure episodes a residue undergoes (contiguous stretches of frames where it is exposed), what fraction have polymer contact during the episode? A high chaperone fraction for a transient residue suggests the polymer is co-localizing during the exposure window, not merely making incidental contact at stable surface sites.

The Two Metrics

Dynamic enrichment

For each polymer type \(P\) and amino-acid group \(G\) (e.g., “aromatic”), the enrichment is computed frame by frame over the full trajectory:

\[ \begin{align}\begin{aligned}\text{For each frame } t\text{:}\\\quad n^G_{\text{exp}}(t) = \left|\{i \in G \mid \text{rel\_SASA}(t,i) > \theta\}\right|\\\quad n_{\text{exp}}(t) = \left|\{i \mid \text{rel\_SASA}(t,i) > \theta\}\right|\\\quad n^G_{\text{con}}(t) = \left|\{i \in G \mid \text{rel\_SASA}(t,i) > \theta \text{ AND } \text{contact}_P(t,i)\}\right|\\\quad \text{observed}(t) = \frac{n^G_{\text{con}}(t)}{n^G_{\text{exp}}(t)} \quad (\text{skip frame if } n^G_{\text{exp}}(t) = 0)\\\quad \text{expected}(t) = \frac{n^G_{\text{exp}}(t)}{n_{\text{exp}}(t)}\end{aligned}\end{align} \]

\[\text{Enrichment} = \frac{\overline{\text{observed}}}{\overline{\text{expected}}} - 1\]

where \(\theta\) is the exposure threshold (default 0.20), overbars denote the mean over frames where the group has at least one exposed residue, and \(\text{contact}_P(t,i)\) is True if any atom of polymer type \(P\) is within the contact cutoff of residue \(i\) at frame \(t\).

Plain-English interpretation: An enrichment of \(+6.94\) for aromatic residues means that, on average across all frames, polymer type \(P\) contacted aromatic residues 7.94× more often than their instantaneous share of the exposed surface would predict if contacts were distributed proportionally.

Enrichment	Meaning
> 0	Preferential contact — polymer contacts this group more than surface share predicts
= 0	Neutral — contact rate matches instantaneous surface availability
< 0	Avoidance — polymer contacts this group less than surface share predicts
= -1	Complete avoidance — no contacts with this group observed

Important

Enrichment is not a binding free energy. It is a conditional contact frequency ratio. A large positive value means preferential co-localization, not strong thermodynamic affinity. Do not use phrases like “binding energy” or “affinity” when reporting enrichment values.

Dynamic enrichment vs. static binding preference

Both metrics answer different questions. Choose the right one for your scientific question:

Aspect	Dynamic enrichment	Static binding preference
SASA baseline	Per-frame instantaneous exposure	Single static PDB
Residue eligibility	Changes every frame	Fixed at analysis setup
Normalization	Instantaneous exposed fraction	Time-averaged surface fraction
Best for	Residues with variable exposure; detecting transient-site preference	Stable surface residues; overall polymer chemistry characterization
Typical magnitude	Can be large (>5) for transiently exposed groups	Usually −1 to +2
Equilibration window	Included in frame count (see Limitations)	Not applicable

Both analyses are independently valid. Do not substitute one for the other when interpreting results.

Chaperone fraction

Residue stability classification

Before computing chaperone fractions, each residue is classified by its exposure fraction — the proportion of trajectory frames where its relative SASA exceeds the threshold \(\theta\):

Classification	Condition	Meaning
`stably_exposed`	exposure_fraction ≥ `transient_upper` (default 0.80)	Consistently on the surface
`transient`	`transient_lower` < exposure_fraction < `transient_upper`	Fluctuates between buried and exposed
`stably_buried`	exposure_fraction ≤ `transient_lower` (default 0.20)	Consistently buried

Only transient residues are used for condition-level chaperone fraction reporting, because chaperone events are only meaningful for residues that genuinely fluctuate.

Event detection

For each residue, the trajectory is segmented into exposure windows: maximal contiguous runs of frames where the residue is exposed (relative SASA > \(\theta\)). Windows shorter than min_event_length frames are discarded to remove single-frame SASA noise.

Each exposure window is then classified:

Chaperone event: at least one polymer contact occurs during the window
Unassisted event: no polymer contact occurs during the window

Per-residue chaperone fraction

\[\text{chaperone\_fraction} = \frac{n_{\text{chaperone events}}}{n_{\text{chaperone events}} + n_{\text{unassisted events}}}\]

A value of 1.0 means every exposure window for this residue had polymer contact; 0.0 means none did.

Condition-level chaperone fraction

The reported condition-level value is the mean chaperone fraction over all transient residues in a replicate:

\[\text{condition chaperone fraction} = \frac{1}{|R_{\text{transient}}|} \sum_{i \in R_{\text{transient}}} \text{chaperone\_fraction}_i\]

This value is computed per replicate; the mean ± SEM across replicates is reported in comparisons.

Note

What “transient” means in the JSON output: In exposure_dynamics.json, each residue has a stability field. Residues with "stability": "transient" are those with exposure fraction between transient_lower and transient_upper. The condition-level chaperone fraction is numpy.mean() over exactly these residues.

Known Assumptions and Limitations

Important

Read this section before reporting results. These are not caveats to hide — they are properties of the analysis that affect interpretation. Understanding them will help you explain your results to reviewers.

Exposure threshold is 0.20 by default

The exposure threshold \(\theta = 0.20\) means a residue is classified as exposed when its relative SASA (computed with MDTraj shrake_rupley, protein-only, chain A) exceeds 20% of the Tien et al. 2013 maximum ASA for that residue type.

At this threshold, a typical enzyme will have ~80–100 exposed residues per frame. Lowering the threshold to 0.10 includes more partially buried residues (larger denominator, smaller enrichment magnitudes); raising to 0.30 restricts to only highly exposed residues (smaller denominator, more variable enrichments).

The threshold is stored in sasa_metadata.json alongside each SASA cache. If you change the threshold, you must recompute:

polyzymd compare exposure -f comparison.yaml --recompute-sasa

Comparing enrichment values computed with different thresholds is not valid.

Contact frame indices are absolute (no offset)

Contact event start_frame values in contacts_rep*.json are absolute trajectory frame indices, numbered from 0 at the start of the full trajectory. They are not relative to the beginning of the contact analysis window or the end of equilibration.

When cross-referencing SASA data manually:

# CORRECT: use contact frame index directly as SASA row
sasa_at_contact = sasa_matrix[contact_event.start_frame]

# WRONG: do NOT add any offset
sasa_at_contact = sasa_matrix[contact_event.start_frame + equilibration_offset]  # Wrong!

This means frame 0 of sasa_trajectory.npz corresponds to absolute frame 0 of the MD trajectory (the first frame of equilibration, if equilibration frames were saved to the trajectory file).

Equilibration frames are included in SASA but not in contacts

SASA is computed over all saved trajectory frames, including any equilibration period at the start. Contact analysis may begin at a later frame (stored in contacts_rep*.json as start_frame). During the equilibration window, contact arrays are all-False — no contact events are recorded.

Effect on enrichment: Equilibration frames contribute to \(n_{\text{exp}}(t)\) and \(n^G_{\text{exp}}(t)\) in the denominator of the expected fraction, but never to \(n^G_{\text{con}}(t)\). This slightly deflates mean_observed relative to a post-equilibration-only analysis. In the reference dataset (38 equilibration frames out of 322 total, ~12%), the effect is:

Systematic: affects all conditions identically
Consistent: relative comparisons between conditions are unaffected
Disclosed: sasa_metadata.json records n_frames (total) and the contacts file records start_frame (equilibration end)

If you want post-equilibration-only enrichment, you can slice the SASA array manually — see Independent Verification below.

Contact cutoff is stored in the contacts cache

The enrichment and chaperone fraction are both computed from the same cached contact events. The cutoff used is recorded in each contacts_rep*.json under criteria_cutoff. If you re-run contact analysis with a different cutoff, you must also recompute exposure dynamics to maintain consistency.

Note

The default contact cutoff in analysis_settings.contacts.cutoff is 4.5 Å, but cached results from earlier runs may use 4.0 Å. Always check criteria_cutoff in the contacts JSON if you suspect a mismatch. See Troubleshooting for details.

No causation or free-energy claims

Dynamic enrichment is a conditional contact frequency ratio, not a binding free energy or interaction enthalpy.
Chaperone fraction is an event co-occurrence rate, not a mechanistic claim that the polymer caused or prevented folding.
Use language like “SBM co-localizes with aromatic residues during exposure events” rather than “SBM stabilizes aromatic residues.”

Quick Start

Enable exposure dynamics in comparison.yaml

YAML (Recommended)

# comparison.yaml
name: "Exposure Dynamics Study"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../no_polymer/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../sbma_100/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../egma_100/config.yaml"
    replicates: [1, 2, 3]

analysis_settings:
  contacts:
    name: "polymer_contacts"
    polymer_selection: "chainID C"
    protein_selection: "protein"
    cutoff: 4.5

  exposure:
    exposure_threshold: 0.20        # Relative SASA threshold for exposed/buried
    transient_lower: 0.20           # Below this → stably buried
    transient_upper: 0.80           # Above this → stably exposed
    min_event_length: 1             # Minimum frames for an exposure window
    protein_chain: "A"              # Chain letter for protein
    # polymer_resnames: ["SBM", "EGM"]  # Optional: subset of polymer types

Then run:

polyzymd compare exposure -f comparison.yaml

Python

from pathlib import Path
from polyzymd.analysis.sasa.trajectory import SASATrajectoryResult
from polyzymd.analysis.contacts.results import ContactResult
from polyzymd.analysis.exposure.enrichment import compute_chaperone_enrichment
from polyzymd.analysis.exposure.dynamics import analyze_exposure_dynamics
from polyzymd.analysis.exposure.config import ExposureConfig

# Load per-frame SASA (computed previously)
sasa_result = SASATrajectoryResult.load("analysis/rep1/sasa/")

# Load contact analysis result
contact_result = ContactResult.load("analysis/contacts/contacts_rep1.json")

# Configure exposure analysis
config = ExposureConfig(
    transient_lower=0.20,
    transient_upper=0.80,
    min_event_length=1,
    polymer_resnames=["SBM", "EGM"],  # or leave empty for all types
)

# Compute exposure dynamics (stability + chaperone events)
dynamics = analyze_exposure_dynamics(
    sasa_result=sasa_result,
    contact_result=contact_result,
    config=config,
    analysis_dir=Path("analysis/rep1/"),
)

print(f"Transient residues: {dynamics.n_transient()}")
print(f"Chaperone events:   {dynamics.total_chaperone_events()}")
print(f"Unassisted events:  {dynamics.total_unassisted_events()}")

# Condition-level chaperone fraction (mean over transient residues)
transient = dynamics.transient_residues()
mean_chap_frac = sum(r.chaperone_fraction for r in transient) / len(transient)
print(f"Mean chaperone fraction: {mean_chap_frac:.3f}")

# Compute dynamic enrichment
enrichment = compute_chaperone_enrichment(
    sasa_result=sasa_result,
    contact_result=contact_result,
    polymer_resnames=["SBM", "EGM"],
)

# Print enrichment table
print("\nDynamic Enrichment (residue-based):")
for entry in enrichment.entries:
    print(f"  {entry.polymer_type} → {entry.aa_group}: {entry.enrichment_residue:+.3f}")

Example output

Exposure Dynamics - Dynamic Enrichment by Amino Acid Class
------------------------------------------------------------------
Metric: mean_t(observed[t]) / mean_t(expected[t]) - 1
Exposure threshold: 0.20 (relative SASA, Tien et al. 2013)
Frames: 322 total (38 equilibration + 284 production)

  Polymer: SBM  (100% SBMA / 0% EGMA)
  AA Group             Enrichment    Observed   Expected    N frames exposed
  -----------------------------------------------------------------------
  aromatic             +6.942        0.651      0.083       322
  charged_negative     +3.053        0.321      0.079       322
  charged_positive     +1.738        0.206      0.076       322
  nonpolar             -0.376        0.049      0.078       322
  polar                +0.051        0.082      0.078       322

Chaperone Fraction - Transient Residues
------------------------------------------------------------------
Condition                     N transient    Chaperone fraction
100% SBMA / 0% EGMA           41             0.517 ± 0.023
75% SBMA / 25% EGMA           39             0.762 ± 0.018
50% SBMA / 50% EGMA           38             0.743 ± 0.021
25% SBMA / 75% EGMA           35             0.635 ± 0.019
0% SBMA / 100% EGMA           33             0.441 ± 0.025

Interpreting these results

Dynamic enrichment (SBM → aromatic = +6.94):

SBM contacts aromatic residues 7.94× more often than their instantaneous share of the exposed surface would predict
This is not an artifact of there being few aromatic residues: with \(\theta = 0.20\), aromatic residues are typically exposed in most frames (the Tien et al. max ASA for TRP/TYR/PHE is generous), and the result holds consistently across all 322 frames

Why dynamic enrichment can be much larger than static enrichment:

Both metrics compute the same ratio — observed contact share divided by expected contact share — but they define “expected” differently, and this difference matters when exposure is dynamic.

Static binding preference defines expected from a fixed snapshot: the fraction of all surface-exposed residues (from a single PDB) that belong to group \(G\). This denominator is a constant across all frames.

Dynamic enrichment conditions on the instantaneous exposed set at each frame, computes observed(t) and expected(t) independently per frame, and then takes the enrichment of the means:

Static:  enrichment = (contact_frames_G / contact_frames_all)
                      / (n_exposed_G_static / n_exposed_all_static)  - 1

Dynamic: enrichment = mean_t( observed(t) )
                      / mean_t( expected(t) )  - 1

  where  observed(t) = n_contacted_G(t) / n_exposed_G(t)
         expected(t) = n_exposed_G(t)   / n_exposed_all(t)

Concrete example (two representative frames):

The protein has ~20 aromatic residues. At threshold 0.20, typically 12 of them are exposed per frame, but the total exposed surface fluctuates with protein conformation. Suppose the polymer contacts 8 of the exposed aromatics in both frames:

	Frame A	Frame B	Mean across all frames
Exposed aromatics	12	12	—
Total exposed residues	70	110	—
`observed(t)` = 8/12	0.667	0.667	0.667
`expected(t)` = 12/70, 12/110	0.171	0.109	0.140

Dynamic enrichment = mean(observed) / mean(expected) − 1 = 0.667 / 0.140 − 1 = +3.76

Notice that mean(expected) = 0.140 is not the same as computing expected from the mean exposed counts: 12 / mean(70, 110) = 12/90 = 0.133. This is because mean_t(ratio) ≠ ratio_of_means when both numerator and denominator vary across frames. The dynamic expected is lower when the protein tends to expose aromatics preferentially in frames where the overall exposed set is also small (correlated fluctuations).

Now compare to static enrichment with aromatics comprising 14% of the fixed surface: if the observed contact share is ~0.30, static enrichment = 0.30 / 0.14 − 1 = +1.14. The dynamic value is larger because the per-frame contact rate (8 out of 12 exposed aromatics) is much higher than what the aggregated contact-frame count captures across the whole trajectory. Both metrics are valid — they answer different questions.

Why we report mean(observed)/mean(expected) rather than mean(observed/expected)

An alternative formula would compute the per-frame enrichment ratio and average those ratios:

\[\text{enrichment}_{\text{alt}} = \overline{\left(\frac{\text{observed}(t)}{\text{expected}(t)} - 1\right)}\]

In the example above this gives (0.667/0.171 − 1 + 0.667/0.109 − 1) / 2 = +4.01, slightly larger than the +3.76 from the ratio-of-means formula.

We chose not to use this formulation for one key reason: small-denominator instability. In frames where only 1–2 aromatic residues happen to be exposed out of 90 total, expected(t) is on the order of 0.01–0.02. A single contact with that one exposed residue sends observed(t) / expected(t) to 50–100, dominating the mean and producing enrichment values driven by rare conformational states rather than the polymer’s typical behavior.

The ratio-of-means formula (mean(observed) / mean(expected)) avoids this: each frame contributes equally to the numerator and denominator averages independently, so no single frame can distort the result. This is consistent with the LiveCoMS best practices for mean-based MD metrics (Grossfield et al., 2018): compute the average of the quantity of interest, then form the ratio — rather than averaging the ratio itself.

Chaperone fraction trend: The 75% SBMA / 25% EGMA condition shows a higher chaperone fraction (0.762) than pure SBMA (0.517). This does not imply 75% SBMA is “better” — it means exposure events in that condition are more frequently accompanied by polymer contact. Whether this reflects beneficial protection or increased polymer interference depends on the biological context and downstream functional assays.

Configuration Reference

ExposureAnalysisSettings

Field	Type	Default	Description
`exposure_threshold`	float	`0.20`	Relative SASA threshold for exposed/buried classification (0–1). Stored in `sasa_metadata.json`.
`transient_lower`	float	`0.20`	Exposure fraction below which a residue is classified `stably_buried`.
`transient_upper`	float	`0.80`	Exposure fraction above which a residue is classified `stably_exposed`.
`min_event_length`	int	`1`	Minimum consecutive exposed frames to count as an exposure window.
`protein_chain`	str	`"A"`	Chain letter for the protein (chain A convention).
`protein_selection`	str	`"protein"`	MDAnalysis selection for protein atoms.
`polymer_selection`	str	`"chainID C"`	MDAnalysis selection for polymer atoms.
`polymer_resnames`	list[str]	`null`	Subset of polymer monomer resnames to analyze. If null, all detected types are included.
`probe_radius_nm`	float	`0.14`	Probe radius for MDTraj `shrake_rupley`, in nm (1.4 Å water probe).
`n_sphere_points`	int	`960`	Number of sphere points for `shrake_rupley`. Higher = more accurate, slower.

ExposureComparisonSettings

No comparison-specific parameters beyond analysis settings defaults. Specify only analysis_settings.exposure in comparison.yaml.

Amino Acid Classification

Residues are grouped by chemical class for enrichment reporting. The same groups used in binding preference analysis apply here:

Group	Amino Acids	Chemical Property
`aromatic`	TRP, PHE, TYR	π-stacking, hydrophobic
`charged_positive`	LYS, ARG	Cationic at pH 7
`charged_negative`	ASP, GLU	Anionic at pH 7
`nonpolar`	ALA, VAL, LEU, ILE, MET, PRO, GLY	Hydrophobic/aliphatic
`polar`	SER, THR, ASN, GLN, HIS, CYS	H-bond donors/acceptors

Note

Histidine (HIS) is classified as polar because its protonation state is pH-dependent. If your system uses HID/HIE/HIP protonation state naming, these are recognized and classified correctly.

Output Files

Exposure dynamics results are written under each condition’s analysis/ directory:

project/
└── condition_name/
    └── analysis/
        ├── contacts/
        │   ├── contacts_rep1.json          ← Contact events (input to exposure analysis)
        │   ├── contacts_rep2.json
        │   └── contacts_rep3.json
        └── rep{N}/
            └── sasa/
            │   ├── sasa_trajectory.npz     ← Per-frame SASA (n_frames × n_residues)
            │   └── sasa_metadata.json      ← Threshold, resnames, aa_classes, n_frames
            └── exposure/
                └── exposure_dynamics.json  ← Per-residue stability + chaperone events

sasa_metadata.json fields

Field	Description
`exposure_threshold`	The \(\theta\) value used for this cache
`n_frames`	Total frames in SASA array (including equilibration)
`n_residues`	Number of protein residues
`resnames`	List of 3-letter residue names in chain order
`aa_classes`	List of AA class labels (same order as resnames)
`trajectory_path`	Source trajectory file path (provenance)

exposure_dynamics.json fields (per residue)

Field	Type	Description
`resid`	int	1-indexed residue ID
`resname`	str	3-letter amino acid code
`aa_class`	str	Amino acid class label
`exposure_fraction`	float	Fraction of frames where residue is exposed
`stability`	str	`"stably_exposed"`, `"transient"`, or `"stably_buried"`
`n_exposed_windows`	int	Number of contiguous exposed windows detected
`n_chaperone_events`	int	Windows with at least one polymer contact
`n_unassisted_events`	int	Windows without any polymer contact
`chaperone_fraction`	float	`n_chaperone / (n_chaperone + n_unassisted)`
`polymer_type_counts`	dict	`{resname: n_events}` breakdown by polymer type
`mean_chaperone_event_duration`	float	Mean length (frames) of chaperone events
`mean_unassisted_event_duration`	float	Mean length (frames) of unassisted events

Independent Verification

You can verify both metrics independently using only NumPy and the raw cached files, without running any polyzymd code. This allows you to audit the formula implementation directly.

Tip

Run the following script on your own data to reproduce the enrichment and chaperone fraction for a single condition and replicate. All intermediate values are printed so you can trace any discrepancy.

"""
Independent audit of dynamic enrichment and chaperone fraction.

Requires only: numpy, json, pathlib — no polyzymd imports.

Usage:
    python audit_exposure.py

Edit COND, REP, and BASE to match your data.
"""
import json
import numpy as np
from pathlib import Path

# ── Edit these paths ─────────────────────────────────────────────────────────
BASE = Path("/path/to/your/project")
COND = "condition_name"
REP  = 1
POLYMER_TYPE = "SBM"   # resname to audit
AA_GROUP     = "aromatic"  # group to audit
# ─────────────────────────────────────────────────────────────────────────────

analysis_dir = BASE / COND / "analysis"

# ── Load raw SASA ─────────────────────────────────────────────────────────────
sasa_dir = analysis_dir / f"rep{REP}" / "sasa"
npz       = np.load(sasa_dir / "sasa_trajectory.npz")
rel_sasa  = npz["relative_sasa_per_frame"]          # (n_frames, n_residues)
resids    = npz["resids"]                            # (n_residues,)
meta      = json.loads((sasa_dir / "sasa_metadata.json").read_text())
threshold = meta["exposure_threshold"]
aa_classes = meta["aa_classes"]
resnames   = meta["resnames"]
n_frames, n_residues = rel_sasa.shape

print(f"SASA: {n_frames} frames × {n_residues} residues, threshold={threshold}")

# ── Load contact events ───────────────────────────────────────────────────────
contacts_json = json.loads(
    (analysis_dir / "contacts" / f"contacts_rep{REP}.json").read_text()
)
# Frame indices are ABSOLUTE (0-indexed from trajectory frame 0)
# Do NOT add any equilibration offset when indexing into rel_sasa.
contact_matrix = np.zeros((n_frames, n_residues), dtype=bool)
resid_to_idx = {int(r): i for i, r in enumerate(resids)}

for rc in contacts_json["residue_contacts"]:
    res_idx = resid_to_idx.get(rc["protein_resid"])
    if res_idx is None:
        continue
    for sc in rc["segment_contacts"]:
        if sc["polymer_resname"] != POLYMER_TYPE:
            continue
        for ev in sc["events"]:
            start = ev["start_frame"]
            dur   = ev["duration_frames"]
            contact_matrix[start : start + dur, res_idx] = True

print(f"Contact matrix built: {contact_matrix.sum()} True entries for {POLYMER_TYPE}")

# ── Compute enrichment ────────────────────────────────────────────────────────
exposed_mask = rel_sasa > threshold                          # (n_frames, n_residues)
group_mask   = np.array([c == AA_GROUP for c in aa_classes]) # (n_residues,)

group_exposed  = exposed_mask[:, group_mask]                 # (n_frames, n_group)
group_contact  = contact_matrix[:, group_mask]               # (n_frames, n_group)
total_exposed  = exposed_mask.sum(axis=1).astype(float)      # (n_frames,)

n_group_exp = group_exposed.sum(axis=1).astype(float)        # (n_frames,)
n_group_con = (group_exposed & group_contact).sum(axis=1).astype(float)

valid = n_group_exp > 0

observed = np.where(valid, n_group_con / np.maximum(n_group_exp, 1e-12), 0.0)
expected = np.where(total_exposed > 0,
                    n_group_exp / np.maximum(total_exposed, 1e-12), 0.0)

mean_obs = observed[valid].mean() if valid.any() else 0.0
mean_exp = expected[valid].mean() if valid.any() else 0.0
enrichment = mean_obs / max(mean_exp, 1e-12) - 1.0

print(f"\nEnrichment ({POLYMER_TYPE} → {AA_GROUP}):")
print(f"  mean_observed = {mean_obs:.6f}")
print(f"  mean_expected = {mean_exp:.6f}")
print(f"  enrichment    = {enrichment:.4f}")
print(f"  frames with group exposed: {valid.sum()} / {n_frames}")

# ── Compute chaperone fraction for one residue ────────────────────────────────
# Pick first aromatic residue as example
aromatic_idxs = np.where(group_mask)[0]
if len(aromatic_idxs) == 0:
    print("No aromatic residues found.")
else:
    idx = aromatic_idxs[0]
    resid = int(resids[idx])
    exp_vec = exposed_mask[:, idx]
    con_vec = contact_matrix[:, idx]

    # Find contiguous exposed windows
    in_window = False
    windows = []
    for f in range(n_frames):
        if exp_vec[f] and not in_window:
            win_start = f
            in_window = True
        elif not exp_vec[f] and in_window:
            windows.append((win_start, f - 1))
            in_window = False
    if in_window:
        windows.append((win_start, n_frames - 1))

    chap = sum(1 for s, e in windows if con_vec[s:e+1].any())
    total_win = len(windows)
    frac = chap / total_win if total_win > 0 else 0.0

    print(f"\nChaperone fraction for residue {resid} ({resnames[idx]}, {aa_classes[idx]}):")
    print(f"  exposure windows: {total_win}")
    print(f"  chaperone events: {chap}")
    print(f"  chaperone_fraction = {frac:.3f}")

    # Compare with cached value
    dyn_path = analysis_dir / f"rep{REP}" / "exposure" / "exposure_dynamics.json"
    if dyn_path.exists():
        dyn = json.loads(dyn_path.read_text())
        cached = next((r for r in dyn["residues"] if r["resid"] == resid), None)
        if cached:
            print(f"  cached value       = {cached['chaperone_fraction']:.3f}")
            match = abs(frac - cached["chaperone_fraction"]) < 1e-3
            print(f"  match: {'YES ✓' if match else 'NO — investigate'}")

Troubleshooting

“My manually computed enrichment doesn’t match”

The two most common causes, in order of likelihood:

Wrong threshold. Check sasa_metadata.json → exposure_threshold. If your manual computation uses a different threshold, the exposed sets will differ. Use meta["exposure_threshold"] directly rather than hardcoding a value.
Frame offset applied incorrectly. Contact start_frame values are absolute trajectory frame indices. Do not add any equilibration offset when indexing the SASA array. See Contact frame indices are absolute.
Different polymer type. If polymer_resnames is set in your config, only the listed types are included. Verify the resname matches exactly (case-sensitive: "SBM" not "sbm").

“Chaperone fraction seems too high / too low”

Check n_transient_residues in the output. If very few residues are transient (< 10), the mean is noisy and condition comparisons may not be meaningful.
Verify transient_lower and transient_upper are appropriate for your system. A protein with slow dynamics may need a narrower transient window.
Check whether min_event_length is filtering too aggressively. At min_event_length=5, short exposure flickers are discarded; at =1, all windows are counted.

“Enrichment is very large (> 5)”

This is not necessarily an artifact. Large enrichment values occur when a group has a small instantaneous expected fraction but a high observed contact rate. Verify by checking:

n_frames_with_exposed in the result: if this is close to n_frames, the group is consistently exposed and the large enrichment is robust.
observed_contact_fraction: if this is, e.g., 0.65, then the polymer contacts 65% of exposed aromatic residues per frame — a real signal.
expected_contact_fraction_residue: if this is ~0.08, the group comprises only 8% of the exposed surface on average, making the ratio large.

Use the audit script to trace the per-frame values if needed.

“Results change when I rerun”

Exposure dynamics results are cached in exposure_dynamics.json and SASA in sasa_trajectory.npz. If input trajectories or contact results change, the cache is stale. Force recomputation with:

polyzymd compare exposure -f comparison.yaml --recompute-sasa --recompute-exposure

Or delete the cache directories manually:

rm -rf condition_name/analysis/rep*/sasa/
rm -rf condition_name/analysis/rep*/exposure/

Exposure Dynamics Analysis

Scientific Motivation

Why static analysis is insufficient

Why chaperone fraction captures something enrichment cannot

The Two Metrics

Dynamic enrichment

Dynamic enrichment vs. static binding preference

Chaperone fraction

Residue stability classification

Event detection

Per-residue chaperone fraction

Condition-level chaperone fraction

Known Assumptions and Limitations

Exposure threshold is 0.20 by default

Contact frame indices are absolute (no offset)

Equilibration frames are included in SASA but not in contacts

Contact cutoff is stored in the contacts cache

No causation or free-energy claims

Quick Start

Enable exposure dynamics in comparison.yaml

Example output

Interpreting these results

Configuration Reference

ExposureAnalysisSettings

ExposureComparisonSettings

Amino Acid Classification

Output Files

sasa_metadata.json fields

exposure_dynamics.json fields (per residue)

Independent Verification

Troubleshooting

“My manually computed enrichment doesn’t match”

“Chaperone fraction seems too high / too low”

“Enrichment is very large (> 5)”

“Results change when I rerun”

See Also