Exposure Dynamics Analysis
Understand how your polymer interacts with transiently exposed protein residues using dynamic SASA-based enrichment and chaperone event analysis. This module answers two complementary questions that static contact analysis cannot:
“Does my polymer preferentially bind certain amino acid classes at the moment they are surface-exposed, controlling for instantaneous surface availability?”
“Does the polymer act as a molecular chaperone — providing contact during transient exposure events that would otherwise be unassisted?”
Note
Exposure dynamics analysis builds on contacts analysis and uses SASA computed per-frame from MD trajectories, not from a static PDB. Run contacts analysis first, then enable exposure dynamics in your comparison configuration.
Warning
Exposure dynamics is currently labeled experimental in the presentation release. The command remains available and plots are still produced, but CLI output and figures now carry explicit experimental labeling because the metric definitions and interpretation may still change.
Scientific Motivation
Why static analysis is insufficient
Standard binding preference analysis (see Binding Preference Analysis) uses SASA from a single static PDB to define which residues are “surface accessible.” This works well for residues that are consistently exposed, but misses a crucial class of events: transient exposure.
A residue that is buried 90% of the time but exposes briefly may be the most biologically relevant interaction site — or it may represent a mechanically vulnerable region that the polymer can protect or destabilize. A static SASA filter either includes it (at a permissive threshold) or excludes it entirely (at a stringent threshold), conflating residues with fundamentally different dynamics.
Dynamic enrichment solves this by computing the enrichment frame by frame, conditioning on instantaneous surface availability. A residue that exposes in only 10% of frames contributes to the enrichment only in those 10% of frames.
Why chaperone fraction captures something enrichment cannot
Dynamic enrichment tells you whether the polymer preferentially contacts aromatic residues when they are exposed — but it does not tell you whether the polymer is present during exposure events or contacts residues after they are already exposed and re-expose repeatedly.
Chaperone fraction asks a different question: of all the exposure episodes a residue undergoes (contiguous stretches of frames where it is exposed), what fraction have polymer contact during the episode? A high chaperone fraction for a transient residue suggests the polymer is co-localizing during the exposure window, not merely making incidental contact at stable surface sites.
The Two Metrics
Dynamic enrichment
For each polymer type \(P\) and amino-acid group \(G\) (e.g., “aromatic”), the enrichment is computed frame by frame over the full trajectory:
where \(\theta\) is the exposure threshold (default 0.20), overbars denote the mean over frames where the group has at least one exposed residue, and \(\text{contact}_P(t,i)\) is True if any atom of polymer type \(P\) is within the contact cutoff of residue \(i\) at frame \(t\).
Plain-English interpretation: An enrichment of \(+6.94\) for aromatic residues means that, on average across all frames, polymer type \(P\) contacted aromatic residues 7.94× more often than their instantaneous share of the exposed surface would predict if contacts were distributed proportionally.
Enrichment |
Meaning |
|---|---|
> 0 |
Preferential contact — polymer contacts this group more than surface share predicts |
= 0 |
Neutral — contact rate matches instantaneous surface availability |
< 0 |
Avoidance — polymer contacts this group less than surface share predicts |
= -1 |
Complete avoidance — no contacts with this group observed |
Important
Enrichment is not a binding free energy. It is a conditional contact frequency ratio. A large positive value means preferential co-localization, not strong thermodynamic affinity. Do not use phrases like “binding energy” or “affinity” when reporting enrichment values.
Dynamic enrichment vs. static binding preference
Both metrics answer different questions. Choose the right one for your scientific question:
Aspect |
Dynamic enrichment |
Static binding preference |
|---|---|---|
SASA baseline |
Per-frame instantaneous exposure |
Single static PDB |
Residue eligibility |
Changes every frame |
Fixed at analysis setup |
Normalization |
Instantaneous exposed fraction |
Time-averaged surface fraction |
Best for |
Residues with variable exposure; detecting transient-site preference |
Stable surface residues; overall polymer chemistry characterization |
Typical magnitude |
Can be large (>5) for transiently exposed groups |
Usually −1 to +2 |
Equilibration window |
Included in frame count (see Limitations) |
Not applicable |
Both analyses are independently valid. Do not substitute one for the other when interpreting results.
Chaperone fraction
Residue stability classification
Before computing chaperone fractions, each residue is classified by its exposure fraction — the proportion of trajectory frames where its relative SASA exceeds the threshold \(\theta\):
Classification |
Condition |
Meaning |
|---|---|---|
|
exposure_fraction ≥ |
Consistently on the surface |
|
|
Fluctuates between buried and exposed |
|
exposure_fraction ≤ |
Consistently buried |
Only transient residues are used for condition-level chaperone fraction reporting, because chaperone events are only meaningful for residues that genuinely fluctuate.
Event detection
For each residue, the trajectory is segmented into exposure windows: maximal
contiguous runs of frames where the residue is exposed (relative SASA > \(\theta\)).
Windows shorter than min_event_length frames are discarded to remove
single-frame SASA noise.
Each exposure window is then classified:
Chaperone event: at least one polymer contact occurs during the window
Unassisted event: no polymer contact occurs during the window
Per-residue chaperone fraction
A value of 1.0 means every exposure window for this residue had polymer contact; 0.0 means none did.
Condition-level chaperone fraction
The reported condition-level value is the mean chaperone fraction over all transient residues in a replicate:
This value is computed per replicate; the mean ± SEM across replicates is reported in comparisons.
Note
What “transient” means in the JSON output: In exposure_dynamics.json,
each residue has a stability field. Residues with "stability": "transient"
are those with exposure fraction between transient_lower and
transient_upper. The condition-level chaperone fraction is numpy.mean()
over exactly these residues.
Known Assumptions and Limitations
Important
Read this section before reporting results. These are not caveats to hide — they are properties of the analysis that affect interpretation. Understanding them will help you explain your results to reviewers.
Exposure threshold is 0.20 by default
The exposure threshold \(\theta = 0.20\) means a residue is classified as
exposed when its relative SASA (computed with MDTraj shrake_rupley,
protein-only, chain A) exceeds 20% of the Tien et al. 2013 maximum ASA for
that residue type.
At this threshold, a typical enzyme will have ~80–100 exposed residues per frame. Lowering the threshold to 0.10 includes more partially buried residues (larger denominator, smaller enrichment magnitudes); raising to 0.30 restricts to only highly exposed residues (smaller denominator, more variable enrichments).
The threshold is stored in sasa_metadata.json alongside each SASA cache.
If you change the threshold, you must recompute:
polyzymd compare exposure -f comparison.yaml --recompute-sasa
Comparing enrichment values computed with different thresholds is not valid.
Contact frame indices are absolute (no offset)
Contact event start_frame values in contacts_rep*.json are absolute
trajectory frame indices, numbered from 0 at the start of the full
trajectory. They are not relative to the beginning of the contact analysis
window or the end of equilibration.
When cross-referencing SASA data manually:
# CORRECT: use contact frame index directly as SASA row
sasa_at_contact = sasa_matrix[contact_event.start_frame]
# WRONG: do NOT add any offset
sasa_at_contact = sasa_matrix[contact_event.start_frame + equilibration_offset] # Wrong!
This means frame 0 of sasa_trajectory.npz corresponds to absolute frame 0
of the MD trajectory (the first frame of equilibration, if equilibration
frames were saved to the trajectory file).
Equilibration frames are included in SASA but not in contacts
SASA is computed over all saved trajectory frames, including any
equilibration period at the start. Contact analysis may begin at a later frame
(stored in contacts_rep*.json as start_frame). During the equilibration
window, contact arrays are all-False — no contact events are recorded.
Effect on enrichment: Equilibration frames contribute to
\(n_{\text{exp}}(t)\) and \(n^G_{\text{exp}}(t)\) in the denominator of the
expected fraction, but never to \(n^G_{\text{con}}(t)\). This slightly deflates
mean_observed relative to a post-equilibration-only analysis. In the
reference dataset (38 equilibration frames out of 322 total, ~12%), the
effect is:
Systematic: affects all conditions identically
Consistent: relative comparisons between conditions are unaffected
Disclosed:
sasa_metadata.jsonrecordsn_frames(total) and the contacts file recordsstart_frame(equilibration end)
If you want post-equilibration-only enrichment, you can slice the SASA array manually — see Independent Verification below.
Contact cutoff is stored in the contacts cache
The enrichment and chaperone fraction are both computed from the same cached
contact events. The cutoff used is recorded in each contacts_rep*.json under
criteria_cutoff. If you re-run contact analysis with a different cutoff, you
must also recompute exposure dynamics to maintain consistency.
Note
The default contact cutoff in analysis_settings.contacts.cutoff is 4.5 Å,
but cached results from earlier runs may use 4.0 Å. Always check
criteria_cutoff in the contacts JSON if you suspect a mismatch. See Troubleshooting for details.
No causation or free-energy claims
Dynamic enrichment is a conditional contact frequency ratio, not a binding free energy or interaction enthalpy.
Chaperone fraction is an event co-occurrence rate, not a mechanistic claim that the polymer caused or prevented folding.
Use language like “SBM co-localizes with aromatic residues during exposure events” rather than “SBM stabilizes aromatic residues.”
Quick Start
Enable exposure dynamics in comparison.yaml
# comparison.yaml
name: "Exposure Dynamics Study"
control: "No Polymer"
conditions:
- label: "No Polymer"
config: "../no_polymer/config.yaml"
replicates: [1, 2, 3]
- label: "100% SBMA"
config: "../sbma_100/config.yaml"
replicates: [1, 2, 3]
- label: "100% EGMA"
config: "../egma_100/config.yaml"
replicates: [1, 2, 3]
analysis_settings:
contacts:
name: "polymer_contacts"
polymer_selection: "chainID C"
protein_selection: "protein"
cutoff: 4.5
exposure:
exposure_threshold: 0.20 # Relative SASA threshold for exposed/buried
transient_lower: 0.20 # Below this → stably buried
transient_upper: 0.80 # Above this → stably exposed
min_event_length: 1 # Minimum frames for an exposure window
protein_chain: "A" # Chain letter for protein
# polymer_resnames: ["SBM", "EGM"] # Optional: subset of polymer types
Then run:
polyzymd compare exposure -f comparison.yaml
from pathlib import Path
from polyzymd.analysis.sasa.trajectory import SASATrajectoryResult
from polyzymd.analysis.contacts.results import ContactResult
from polyzymd.analysis.exposure.enrichment import compute_chaperone_enrichment
from polyzymd.analysis.exposure.dynamics import analyze_exposure_dynamics
from polyzymd.analysis.exposure.config import ExposureConfig
# Load per-frame SASA (computed previously)
sasa_result = SASATrajectoryResult.load("analysis/rep1/sasa/")
# Load contact analysis result
contact_result = ContactResult.load("analysis/contacts/contacts_rep1.json")
# Configure exposure analysis
config = ExposureConfig(
transient_lower=0.20,
transient_upper=0.80,
min_event_length=1,
polymer_resnames=["SBM", "EGM"], # or leave empty for all types
)
# Compute exposure dynamics (stability + chaperone events)
dynamics = analyze_exposure_dynamics(
sasa_result=sasa_result,
contact_result=contact_result,
config=config,
analysis_dir=Path("analysis/rep1/"),
)
print(f"Transient residues: {dynamics.n_transient()}")
print(f"Chaperone events: {dynamics.total_chaperone_events()}")
print(f"Unassisted events: {dynamics.total_unassisted_events()}")
# Condition-level chaperone fraction (mean over transient residues)
transient = dynamics.transient_residues()
mean_chap_frac = sum(r.chaperone_fraction for r in transient) / len(transient)
print(f"Mean chaperone fraction: {mean_chap_frac:.3f}")
# Compute dynamic enrichment
enrichment = compute_chaperone_enrichment(
sasa_result=sasa_result,
contact_result=contact_result,
polymer_resnames=["SBM", "EGM"],
)
# Print enrichment table
print("\nDynamic Enrichment (residue-based):")
for entry in enrichment.entries:
print(f" {entry.polymer_type} → {entry.aa_group}: {entry.enrichment_residue:+.3f}")
Example output
Exposure Dynamics - Dynamic Enrichment by Amino Acid Class
------------------------------------------------------------------
Metric: mean_t(observed[t]) / mean_t(expected[t]) - 1
Exposure threshold: 0.20 (relative SASA, Tien et al. 2013)
Frames: 322 total (38 equilibration + 284 production)
Polymer: SBM (100% SBMA / 0% EGMA)
AA Group Enrichment Observed Expected N frames exposed
-----------------------------------------------------------------------
aromatic +6.942 0.651 0.083 322
charged_negative +3.053 0.321 0.079 322
charged_positive +1.738 0.206 0.076 322
nonpolar -0.376 0.049 0.078 322
polar +0.051 0.082 0.078 322
Chaperone Fraction - Transient Residues
------------------------------------------------------------------
Condition N transient Chaperone fraction
100% SBMA / 0% EGMA 41 0.517 ± 0.023
75% SBMA / 25% EGMA 39 0.762 ± 0.018
50% SBMA / 50% EGMA 38 0.743 ± 0.021
25% SBMA / 75% EGMA 35 0.635 ± 0.019
0% SBMA / 100% EGMA 33 0.441 ± 0.025
Interpreting these results
Dynamic enrichment (SBM → aromatic = +6.94):
SBM contacts aromatic residues 7.94× more often than their instantaneous share of the exposed surface would predict
This is not an artifact of there being few aromatic residues: with \(\theta = 0.20\), aromatic residues are typically exposed in most frames (the Tien et al. max ASA for TRP/TYR/PHE is generous), and the result holds consistently across all 322 frames
Why dynamic enrichment can be much larger than static enrichment:
Both metrics compute the same ratio — observed contact share divided by expected contact share — but they define “expected” differently, and this difference matters when exposure is dynamic.
Static binding preference defines expected from a fixed snapshot: the fraction of all surface-exposed residues (from a single PDB) that belong to group \(G\). This denominator is a constant across all frames.
Dynamic enrichment conditions on the instantaneous exposed set at each
frame, computes observed(t) and expected(t) independently per frame,
and then takes the enrichment of the means:
Static: enrichment = (contact_frames_G / contact_frames_all)
/ (n_exposed_G_static / n_exposed_all_static) - 1
Dynamic: enrichment = mean_t( observed(t) )
/ mean_t( expected(t) ) - 1
where observed(t) = n_contacted_G(t) / n_exposed_G(t)
expected(t) = n_exposed_G(t) / n_exposed_all(t)
Concrete example (two representative frames):
The protein has ~20 aromatic residues. At threshold 0.20, typically 12 of them are exposed per frame, but the total exposed surface fluctuates with protein conformation. Suppose the polymer contacts 8 of the exposed aromatics in both frames:
Frame A |
Frame B |
Mean across all frames |
|
|---|---|---|---|
Exposed aromatics |
12 |
12 |
— |
Total exposed residues |
70 |
110 |
— |
|
0.667 |
0.667 |
0.667 |
|
0.171 |
0.109 |
0.140 |
Dynamic enrichment = mean(observed) / mean(expected) − 1 = 0.667 / 0.140 − 1 = +3.76
Notice that mean(expected) = 0.140 is not the same as computing expected
from the mean exposed counts: 12 / mean(70, 110) = 12/90 = 0.133. This is
because mean_t(ratio) ≠ ratio_of_means when both numerator and denominator
vary across frames. The dynamic expected is lower when the protein tends to
expose aromatics preferentially in frames where the overall exposed set is
also small (correlated fluctuations).
Now compare to static enrichment with aromatics comprising 14% of the fixed surface: if the observed contact share is ~0.30, static enrichment = 0.30 / 0.14 − 1 = +1.14. The dynamic value is larger because the per-frame contact rate (8 out of 12 exposed aromatics) is much higher than what the aggregated contact-frame count captures across the whole trajectory. Both metrics are valid — they answer different questions.
Why we report mean(observed)/mean(expected) rather than mean(observed/expected)
An alternative formula would compute the per-frame enrichment ratio and average those ratios:
In the example above this gives (0.667/0.171 − 1 + 0.667/0.109 − 1) / 2 = +4.01, slightly larger than the +3.76 from the ratio-of-means formula.
We chose not to use this formulation for one key reason: small-denominator
instability. In frames where only 1–2 aromatic residues happen to be exposed
out of 90 total, expected(t) is on the order of 0.01–0.02. A single contact
with that one exposed residue sends observed(t) / expected(t) to 50–100,
dominating the mean and producing enrichment values driven by rare
conformational states rather than the polymer’s typical behavior.
The ratio-of-means formula (mean(observed) / mean(expected)) avoids this:
each frame contributes equally to the numerator and denominator averages
independently, so no single frame can distort the result. This is consistent
with the LiveCoMS best practices for mean-based MD metrics
(Grossfield et al., 2018): compute the average of the quantity of interest,
then form the ratio — rather than averaging the ratio itself.
Chaperone fraction trend: The 75% SBMA / 25% EGMA condition shows a higher chaperone fraction (0.762) than pure SBMA (0.517). This does not imply 75% SBMA is “better” — it means exposure events in that condition are more frequently accompanied by polymer contact. Whether this reflects beneficial protection or increased polymer interference depends on the biological context and downstream functional assays.
Configuration Reference
ExposureAnalysisSettings
Field |
Type |
Default |
Description |
|---|---|---|---|
|
float |
|
Relative SASA threshold for exposed/buried classification (0–1). Stored in |
|
float |
|
Exposure fraction below which a residue is classified |
|
float |
|
Exposure fraction above which a residue is classified |
|
int |
|
Minimum consecutive exposed frames to count as an exposure window. |
|
str |
|
Chain letter for the protein (chain A convention). |
|
str |
|
MDAnalysis selection for protein atoms. |
|
str |
|
MDAnalysis selection for polymer atoms. |
|
list[str] |
|
Subset of polymer monomer resnames to analyze. If null, all detected types are included. |
|
float |
|
Probe radius for MDTraj |
|
int |
|
Number of sphere points for |
ExposureComparisonSettings
No comparison-specific parameters beyond analysis settings defaults. Specify
only analysis_settings.exposure in comparison.yaml.
Amino Acid Classification
Residues are grouped by chemical class for enrichment reporting. The same groups used in binding preference analysis apply here:
Group |
Amino Acids |
Chemical Property |
|---|---|---|
|
TRP, PHE, TYR |
π-stacking, hydrophobic |
|
LYS, ARG |
Cationic at pH 7 |
|
ASP, GLU |
Anionic at pH 7 |
|
ALA, VAL, LEU, ILE, MET, PRO, GLY |
Hydrophobic/aliphatic |
|
SER, THR, ASN, GLN, HIS, CYS |
H-bond donors/acceptors |
Note
Histidine (HIS) is classified as polar because its protonation state is pH-dependent. If your system uses HID/HIE/HIP protonation state naming, these are recognized and classified correctly.
Output Files
Exposure dynamics results are written under each condition’s analysis/
directory:
project/
└── condition_name/
└── analysis/
├── contacts/
│ ├── contacts_rep1.json ← Contact events (input to exposure analysis)
│ ├── contacts_rep2.json
│ └── contacts_rep3.json
└── rep{N}/
└── sasa/
│ ├── sasa_trajectory.npz ← Per-frame SASA (n_frames × n_residues)
│ └── sasa_metadata.json ← Threshold, resnames, aa_classes, n_frames
└── exposure/
└── exposure_dynamics.json ← Per-residue stability + chaperone events
sasa_metadata.json fields
Field |
Description |
|---|---|
|
The \(\theta\) value used for this cache |
|
Total frames in SASA array (including equilibration) |
|
Number of protein residues |
|
List of 3-letter residue names in chain order |
|
List of AA class labels (same order as resnames) |
|
Source trajectory file path (provenance) |
exposure_dynamics.json fields (per residue)
Field |
Type |
Description |
|---|---|---|
|
int |
1-indexed residue ID |
|
str |
3-letter amino acid code |
|
str |
Amino acid class label |
|
float |
Fraction of frames where residue is exposed |
|
str |
|
|
int |
Number of contiguous exposed windows detected |
|
int |
Windows with at least one polymer contact |
|
int |
Windows without any polymer contact |
|
float |
|
|
dict |
|
|
float |
Mean length (frames) of chaperone events |
|
float |
Mean length (frames) of unassisted events |
Independent Verification
You can verify both metrics independently using only NumPy and the raw cached files, without running any polyzymd code. This allows you to audit the formula implementation directly.
Tip
Run the following script on your own data to reproduce the enrichment and chaperone fraction for a single condition and replicate. All intermediate values are printed so you can trace any discrepancy.
"""
Independent audit of dynamic enrichment and chaperone fraction.
Requires only: numpy, json, pathlib — no polyzymd imports.
Usage:
python audit_exposure.py
Edit COND, REP, and BASE to match your data.
"""
import json
import numpy as np
from pathlib import Path
# ── Edit these paths ─────────────────────────────────────────────────────────
BASE = Path("/path/to/your/project")
COND = "condition_name"
REP = 1
POLYMER_TYPE = "SBM" # resname to audit
AA_GROUP = "aromatic" # group to audit
# ─────────────────────────────────────────────────────────────────────────────
analysis_dir = BASE / COND / "analysis"
# ── Load raw SASA ─────────────────────────────────────────────────────────────
sasa_dir = analysis_dir / f"rep{REP}" / "sasa"
npz = np.load(sasa_dir / "sasa_trajectory.npz")
rel_sasa = npz["relative_sasa_per_frame"] # (n_frames, n_residues)
resids = npz["resids"] # (n_residues,)
meta = json.loads((sasa_dir / "sasa_metadata.json").read_text())
threshold = meta["exposure_threshold"]
aa_classes = meta["aa_classes"]
resnames = meta["resnames"]
n_frames, n_residues = rel_sasa.shape
print(f"SASA: {n_frames} frames × {n_residues} residues, threshold={threshold}")
# ── Load contact events ───────────────────────────────────────────────────────
contacts_json = json.loads(
(analysis_dir / "contacts" / f"contacts_rep{REP}.json").read_text()
)
# Frame indices are ABSOLUTE (0-indexed from trajectory frame 0)
# Do NOT add any equilibration offset when indexing into rel_sasa.
contact_matrix = np.zeros((n_frames, n_residues), dtype=bool)
resid_to_idx = {int(r): i for i, r in enumerate(resids)}
for rc in contacts_json["residue_contacts"]:
res_idx = resid_to_idx.get(rc["protein_resid"])
if res_idx is None:
continue
for sc in rc["segment_contacts"]:
if sc["polymer_resname"] != POLYMER_TYPE:
continue
for ev in sc["events"]:
start = ev["start_frame"]
dur = ev["duration_frames"]
contact_matrix[start : start + dur, res_idx] = True
print(f"Contact matrix built: {contact_matrix.sum()} True entries for {POLYMER_TYPE}")
# ── Compute enrichment ────────────────────────────────────────────────────────
exposed_mask = rel_sasa > threshold # (n_frames, n_residues)
group_mask = np.array([c == AA_GROUP for c in aa_classes]) # (n_residues,)
group_exposed = exposed_mask[:, group_mask] # (n_frames, n_group)
group_contact = contact_matrix[:, group_mask] # (n_frames, n_group)
total_exposed = exposed_mask.sum(axis=1).astype(float) # (n_frames,)
n_group_exp = group_exposed.sum(axis=1).astype(float) # (n_frames,)
n_group_con = (group_exposed & group_contact).sum(axis=1).astype(float)
valid = n_group_exp > 0
observed = np.where(valid, n_group_con / np.maximum(n_group_exp, 1e-12), 0.0)
expected = np.where(total_exposed > 0,
n_group_exp / np.maximum(total_exposed, 1e-12), 0.0)
mean_obs = observed[valid].mean() if valid.any() else 0.0
mean_exp = expected[valid].mean() if valid.any() else 0.0
enrichment = mean_obs / max(mean_exp, 1e-12) - 1.0
print(f"\nEnrichment ({POLYMER_TYPE} → {AA_GROUP}):")
print(f" mean_observed = {mean_obs:.6f}")
print(f" mean_expected = {mean_exp:.6f}")
print(f" enrichment = {enrichment:.4f}")
print(f" frames with group exposed: {valid.sum()} / {n_frames}")
# ── Compute chaperone fraction for one residue ────────────────────────────────
# Pick first aromatic residue as example
aromatic_idxs = np.where(group_mask)[0]
if len(aromatic_idxs) == 0:
print("No aromatic residues found.")
else:
idx = aromatic_idxs[0]
resid = int(resids[idx])
exp_vec = exposed_mask[:, idx]
con_vec = contact_matrix[:, idx]
# Find contiguous exposed windows
in_window = False
windows = []
for f in range(n_frames):
if exp_vec[f] and not in_window:
win_start = f
in_window = True
elif not exp_vec[f] and in_window:
windows.append((win_start, f - 1))
in_window = False
if in_window:
windows.append((win_start, n_frames - 1))
chap = sum(1 for s, e in windows if con_vec[s:e+1].any())
total_win = len(windows)
frac = chap / total_win if total_win > 0 else 0.0
print(f"\nChaperone fraction for residue {resid} ({resnames[idx]}, {aa_classes[idx]}):")
print(f" exposure windows: {total_win}")
print(f" chaperone events: {chap}")
print(f" chaperone_fraction = {frac:.3f}")
# Compare with cached value
dyn_path = analysis_dir / f"rep{REP}" / "exposure" / "exposure_dynamics.json"
if dyn_path.exists():
dyn = json.loads(dyn_path.read_text())
cached = next((r for r in dyn["residues"] if r["resid"] == resid), None)
if cached:
print(f" cached value = {cached['chaperone_fraction']:.3f}")
match = abs(frac - cached["chaperone_fraction"]) < 1e-3
print(f" match: {'YES ✓' if match else 'NO — investigate'}")
Troubleshooting
“My manually computed enrichment doesn’t match”
The two most common causes, in order of likelihood:
Wrong threshold. Check
sasa_metadata.json→exposure_threshold. If your manual computation uses a different threshold, the exposed sets will differ. Usemeta["exposure_threshold"]directly rather than hardcoding a value.Frame offset applied incorrectly. Contact
start_framevalues are absolute trajectory frame indices. Do not add any equilibration offset when indexing the SASA array. See Contact frame indices are absolute.Different polymer type. If
polymer_resnamesis set in your config, only the listed types are included. Verify the resname matches exactly (case-sensitive:"SBM"not"sbm").
“Chaperone fraction seems too high / too low”
Check
n_transient_residuesin the output. If very few residues are transient (< 10), the mean is noisy and condition comparisons may not be meaningful.Verify
transient_lowerandtransient_upperare appropriate for your system. A protein with slow dynamics may need a narrower transient window.Check whether
min_event_lengthis filtering too aggressively. Atmin_event_length=5, short exposure flickers are discarded; at=1, all windows are counted.
“Enrichment is very large (> 5)”
This is not necessarily an artifact. Large enrichment values occur when a group has a small instantaneous expected fraction but a high observed contact rate. Verify by checking:
n_frames_with_exposedin the result: if this is close ton_frames, the group is consistently exposed and the large enrichment is robust.observed_contact_fraction: if this is, e.g., 0.65, then the polymer contacts 65% of exposed aromatic residues per frame — a real signal.expected_contact_fraction_residue: if this is ~0.08, the group comprises only 8% of the exposed surface on average, making the ratio large.
Use the audit script to trace the per-frame values if needed.
“Results change when I rerun”
Exposure dynamics results are cached in exposure_dynamics.json and SASA
in sasa_trajectory.npz. If input trajectories or contact results change,
the cache is stale. Force recomputation with:
polyzymd compare exposure -f comparison.yaml --recompute-sasa --recompute-exposure
Or delete the cache directories manually:
rm -rf condition_name/analysis/rep*/sasa/
rm -rf condition_name/analysis/rep*/exposure/
See Also
Contacts Analysis Quick Start — contacts analysis required before exposure analysis
Binding Preference Analysis — static SASA-based enrichment
Statistics Best Practices — replicate aggregation, uncertainty
Comparing Conditions — multi-condition workflows
Equilibration — how equilibration frames affect trajectory-level analyses