Polymer Bridging Analysis (Experimental)
Warning
Polymer bridging is experimental. This plugin was contributed as a proof-of-concept extensibility exercise. Metric definitions, chemistry-aware profiling outputs, and interpretation guidance are all subject to change. CLI output and generated figures carry explicit experimental labels.
Quantify per-fragment, per-frame multisite attachment of individual polymer chains (oligomers) to the enzyme surface directly from trajectories.
This analysis answers: “When a single oligomer chain contacts the protein, how often does it contact more than one distinct protein residue — and which residue classes and monomer types are involved?”
Environment Setup
All commands below assume you have activated the PolyzyMD pixi environment:
```shell
pixi shell -e build
```
Alternatively, prefix each command with pixi run -e build.
Core Concepts
What Is an Observation?
The fundamental unit of data in this analysis is one observation: a single polymer fragment in a single trajectory frame that makes at least one contact with the protein (within the distance cutoff). Every observation records:
Which protein residues are contacted
The frame-wise C-alpha distances between those residues
The amino acid class of each contacted residue
The monomer identity of each contacting polymer residue
The ordered residue-name sequence (signature) of the polymer fragment
One replicate may produce thousands of observations. All statistics are computed over this observation population.
What Is “Multisite”?
An observation is classified as multisite when its eligible valency (defined below) exceeds 1. Which contacted residues count as “eligible” depends on the min_ca_distance_angstrom setting:
min_ca_distance_angstrom = 0 (default): Any observation contacting 2+ distinct residues counts as multisite, regardless of their spatial separation.
min_ca_distance_angstrom > 0 (e.g., 8.0 or 10.0): An observation is multisite only if at least two contacted residues have a frame-wise C-alpha separation >= the threshold. This filters out pairs of residues that are merely neighbors (for example, sequence-adjacent residues, whose C-alpha atoms sit roughly 3.8 A apart) and focuses on bridging across the protein surface.
Eligible valency is the number of contacted residues that participate in at least one such qualifying pair. It is computed per frame from actual atomic coordinates, so the same pair of residues may qualify in one frame and not in another if the protein is flexible.
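The pair rule can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's code. The function name eligible_valency and the dict-of-coordinates input are hypothetical, and the sketch assumes you already have the C-alpha positions of the residues one fragment contacts in one frame. With no qualifying pair, the function returns 0. This section does not say how the plugin labels such observations beyond "not multisite".

```python
import itertools
import math

Coord = tuple[float, float, float]


def eligible_valency(ca_coords: dict[str, Coord], min_ca_distance: float) -> int:
    """Number of contacted residues that belong to at least one qualifying pair.

    ca_coords maps each protein residue contacted by one fragment in one frame
    (illustrative labels such as "LYS40") to its C-alpha position in Angstroms.
    A pair qualifies when its C-alpha separation is >= min_ca_distance.
    """
    if min_ca_distance < 0:
        raise ValueError("min_ca_distance must be >= 0")
    eligible: set[str] = set()
    for (res_a, pos_a), (res_b, pos_b) in itertools.combinations(ca_coords.items(), 2):
        if math.dist(pos_a, pos_b) >= min_ca_distance:
            eligible.update((res_a, res_b))
    return len(eligible)


# Toy frame: two sequence neighbours 3.8 A apart plus a more distant residue.
frame = {
    "ALA10": (0.0, 0.0, 0.0),
    "GLY11": (3.8, 0.0, 0.0),
    "LYS40": (12.0, 0.0, 0.0),
}
for threshold in (0.0, 8.0, 10.0, 15.0):
    valency = eligible_valency(frame, threshold)
    label = "multisite" if valency > 1 else "not multisite"
    print(f"min_ca_distance={threshold:>4}: eligible valency {valency} ({label})")
```

At 8.0 A the valency is still 3. GLY11 is close to ALA10, but it forms an 8.2 A qualifying pair with LYS40, so it participates in at least one such pair.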
What Is “High Valency”?
An observation is high valency when its eligible valency is 3 or more — meaning the polymer simultaneously contacts at least three spatially separated protein residues in that frame.
Primary Metrics
| Metric | Field name | Description |
|---|---|---|
| Mean Contacts / Oligomer | mean_contacts_per_contacting_oligomer | Average number of distinct protein residues contacted per observation. |
| Multisite Fraction | multisite_fraction | Fraction of observations with eligible valency > 1. |
| High-Valency Fraction | | Fraction of observations with eligible valency >= 3. |
| Valency Distribution | | Probability of 1-site, 2-site, and 3+-site attachment across all observations. |
These four metrics are used in the default cross-condition comparison pipeline (t-tests, ANOVA, effect sizes, rankings).
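The four definitions can be computed from per-observation values as a sketch. The function primary_metrics is hypothetical. The sketch assumes you already have, for each observation, the number of distinct residues contacted and its eligible valency. It also assumes the valency distribution is binned on eligible valency, with 0 and 1 both counted as 1-site. That binning is an assumption, not documented plugin behavior.

```python
from statistics import mean


def primary_metrics(n_contacted: list[int], valencies: list[int]) -> dict:
    """Compute the four primary metrics over one replicate's observations.

    n_contacted[i]: distinct protein residues contacted in observation i.
    valencies[i]:   eligible valency of observation i.
    """
    if not valencies or len(n_contacted) != len(valencies):
        raise ValueError("need one (contacts, valency) pair per observation")
    n = len(valencies)
    bins = {"1": 0, "2": 0, "3+": 0}
    for v in valencies:
        if v <= 1:
            bins["1"] += 1  # assumption: valency 0 (no qualifying pair) is 1-site
        elif v == 2:
            bins["2"] += 1
        else:
            bins["3+"] += 1
    return {
        "mean_contacts": mean(n_contacted),
        "multisite_fraction": sum(v > 1 for v in valencies) / n,
        "high_valency_fraction": sum(v >= 3 for v in valencies) / n,
        "valency_distribution": {k: c / n for k, c in bins.items()},
    }


# Five toy observations; the third touches two residues that sit closer than
# min_ca_distance_angstrom, so its eligible valency is 0.
metrics = primary_metrics(n_contacted=[1, 2, 2, 3, 4], valencies=[1, 2, 0, 3, 4])
print(metrics)
```

The third observation shows why mean contacts and multisite fraction can diverge: it contacts two residues, but neither pair qualifies.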
Chemistry-Aware Outputs (Experimental)
Important
All chemistry-aware outputs described below are labeled
polymer_bridging_chemistry in the experimental feature system. They are
descriptive probabilities — observed frequencies over the observation
population — not normalized enrichments and not evidence of mechanism.
Probabilities reflect what was observed in the simulation. They do not control for surface accessibility, polymer composition, or reference expectations. Interpret them as a starting point for hypothesis generation, not as proof of preferential interaction.
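A simple way to put these probabilities in context is to divide each observed class probability by that class's share of the protein surface. A ratio near 1.0 means baseline proportionality. This is a user-side sketch, not a plugin output and not the Binding Preference method. The function name is hypothetical, the numbers are made up, and the surface fractions must come from elsewhere (for example, the Binding Preference analysis or your own SASA calculation).

```python
def observed_to_expected(observed: dict[str, float], surface: dict[str, float]) -> dict[str, float]:
    """Ratio of an observed class probability to that class's surface fraction.

    ~1.0 means baseline proportionality; values above or below 1 hint at,
    but do not prove, a preference or avoidance.
    """
    return {cls: observed.get(cls, 0.0) / frac for cls, frac in surface.items() if frac > 0}


# Made-up numbers for illustration only.
anchor_probs = {"aromatic": 0.30, "charged_positive": 0.45, "polar": 0.25}
surface_fraction = {"aromatic": 0.30, "charged_positive": 0.20, "polar": 0.35, "nonpolar": 0.15}
for cls, ratio in observed_to_expected(anchor_probs, surface_fraction).items():
    print(f"{cls:>16}: {ratio:.2f}")
```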
Chemistry-aware outputs are computed only from multivalent observations (eligible valency > 1). They are reported per-replicate and aggregated (mean +/- SEM) across replicates.
Protein Residue Classification
All protein residue outputs use the ProteinAAClassification scheme from
polyzymd.analyses.shared.groupings:
| Class | Amino Acids |
|---|---|
| aromatic | PHE, TRP, TYR, HIS |
| charged_positive | ARG, LYS |
| charged_negative | ASP, GLU |
| polar | ASN, CYS, GLN, SER, THR |
| nonpolar | ALA, GLY, ILE, LEU, MET, PRO, VAL |
| unknown | Non-standard residues |
Common protonation-state variants (HIE, HID, HIP, ASH, GLH, etc.) are automatically mapped to their parent residue.
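The lookup can be approximated in plain Python. This is a hypothetical stand-in, not ProteinAAClassification itself. It maps only the protonation variants named above, whereas the real scheme also covers the variants hidden behind "etc.".

```python
# Hypothetical stand-in for illustration; the real scheme is
# ProteinAAClassification in polyzymd.analyses.shared.groupings.
AA_CLASSES: dict[str, frozenset[str]] = {
    "aromatic": frozenset({"PHE", "TRP", "TYR", "HIS"}),
    "charged_positive": frozenset({"ARG", "LYS"}),
    "charged_negative": frozenset({"ASP", "GLU"}),
    "polar": frozenset({"ASN", "CYS", "GLN", "SER", "THR"}),
    "nonpolar": frozenset({"ALA", "GLY", "ILE", "LEU", "MET", "PRO", "VAL"}),
}

# Only the variants named in the text above; the real mapping covers more.
PARENT_RESIDUE = {"HIE": "HIS", "HID": "HIS", "HIP": "HIS", "ASH": "ASP", "GLH": "GLU"}


def classify_residue(resname: str) -> str:
    """Return the amino acid class for a residue name, or 'unknown'."""
    name = resname.strip().upper()
    parent = PARENT_RESIDUE.get(name, name)
    for aa_class, members in AA_CLASSES.items():
        if parent in members:
            return aa_class
    return "unknown"


for name in ("HIE", "GLH", "SER", "LEU", "SBM"):
    print(name, "->", classify_residue(name))
```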
Anchor and Peripheral Residues
In each multivalent observation, the plugin identifies an anchor — the protein residue with the closest atom-level distance to the polymer. All other eligible contacted residues are peripheral.
Anchor protein class probabilities: Frequency distribution of the amino acid class of the anchor residue across all multivalent observations.
Peripheral protein class probabilities: Frequency distribution of the amino acid classes of non-anchor eligible residues.
Multivalent protein class probabilities: Frequency distribution of the amino acid classes of all eligible residues in multivalent observations (anchor + peripheral combined).
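The three distributions can be sketched with collections.Counter. The sketch assumes each multivalent observation has already been reduced to its anchor class plus the classes of its eligible peripheral residues. It also assumes peripheral and combined frequencies are counted per residue, not per observation. Both assumptions and the function name are hypothetical. Note that an empty input yields empty dictionaries, which matches the {} outputs described under Troubleshooting.

```python
from collections import Counter


def _normalize(counts: Counter) -> dict[str, float]:
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def class_probabilities(observations):
    """observations: iterable of (anchor_class, [peripheral_class, ...]) tuples,
    one per multivalent observation, listing eligible residues only."""
    anchor, peripheral = Counter(), Counter()
    for anchor_class, peripheral_classes in observations:
        anchor[anchor_class] += 1
        peripheral.update(peripheral_classes)
    return {
        "anchor": _normalize(anchor),
        "peripheral": _normalize(peripheral),
        "multivalent": _normalize(anchor + peripheral),  # anchor + peripheral combined
    }


observations = [
    ("aromatic", ["charged_positive"]),
    ("aromatic", ["polar", "charged_positive"]),
    ("nonpolar", ["aromatic"]),
]
probs = class_probabilities(observations)
for name, dist in probs.items():
    print(name, dist)
```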
Polymer Monomer Probabilities
Polymer contact type probabilities: Frequency of each polymer monomer type (by residue name, e.g. SBM, EGM) among all polymer residues that make protein contacts in multivalent observations.
Polymer anchor type probabilities: Frequency of the polymer monomer type of the anchor (the polymer residue closest to the anchor protein residue) across multivalent observations.
Cross-Classification Matrices
Anchor-to-peripheral class matrix (anchor_to_peripheral_group_matrix): A row-normalized matrix where rows are the anchor protein class and columns are peripheral protein classes. Each row sums to 1.0. Answers: “Given that the anchor is aromatic, what protein classes are the peripheral contacts?”
Polymer-anchor to protein-anchor matrix (polymer_anchor_to_protein_anchor_matrix): A row-normalized matrix where rows are polymer monomer types and columns are protein anchor classes. Each row sums to 1.0. Answers: “Given that SBM is the polymer anchor monomer, which protein residue classes does it anchor to?”
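Both matrices are conditional frequencies: count (row, column) pairs, then divide each row by its own total. The sketch below shows this under the assumption that the anchor-to-peripheral matrix gets one pair per peripheral residue. The function name and the toy pairs are hypothetical.

```python
from collections import defaultdict


def row_normalized_matrix(pairs):
    """Build {row: {column: probability}} from (row, column) observations.

    Each row is divided by its own total, so every non-empty row sums to 1.0.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for row, column in pairs:
        counts[row][column] += 1
    matrix = {}
    for row, cols in counts.items():
        total = sum(cols.values())
        matrix[row] = {column: n / total for column, n in cols.items()}
    return matrix


# Anchor protein class -> each peripheral protein class in the same observation.
anchor_to_peripheral = row_normalized_matrix([
    ("aromatic", "charged_positive"),
    ("aromatic", "polar"),
    ("aromatic", "charged_positive"),
    ("nonpolar", "aromatic"),
])
# Polymer anchor monomer -> protein anchor class.
polymer_to_protein = row_normalized_matrix([
    ("SBM", "aromatic"), ("SBM", "charged_positive"), ("EGM", "aromatic"),
])
print(anchor_to_peripheral)
print(polymer_to_protein)
```

Because each row is normalized separately, a row built from very few observations (such as nonpolar above) can show 1.0 on the strength of a single event.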
Fragment Signature Probabilities
Each polymer fragment has an ordered signature: the sequence of residue names along the fragment (for example, the 5-mer EGM-EGM-SBM-EGM-EGM). The top-10
most frequent signatures across multivalent observations are reported as
probabilities. These may help flag polymer subsequences that recur in bridging
events, but because they are raw frequencies they are not corrected for how
often each subsequence occurs in the polymer itself.
Note
Fragment signatures depend on the topology’s residue ordering. In practice the fragment length equals the number of monomers in the polymer chain. The “5-mer” label is for illustration — the actual signature length is the full fragment.
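Signature counting amounts to joining each fragment's residue names in topology order and taking the most common strings. In this hypothetical sketch, the denominator is every multivalent observation, so the top-10 values need not sum to 1. Whether the plugin uses the same denominator is an assumption.

```python
from collections import Counter


def top_signatures(fragments, k=10):
    """Top-k ordered residue-name signatures with their probabilities.

    fragments: one list of polymer residue names (in topology order) per
    multivalent observation. Probabilities use all observations as the
    denominator, so the top-k values need not sum to 1.
    """
    signatures = Counter("-".join(resnames) for resnames in fragments)
    total = sum(signatures.values())
    return [(sig, n / total) for sig, n in signatures.most_common(k)]


fragments = [["EGM", "EGM", "SBM", "EGM", "EGM"]] * 3 + [["SBM"] * 5]
for signature, prob in top_signatures(fragments):
    print(f"{signature}: {prob:.2f}")
```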
Quick Start
Step 1: Add bridging settings to your comparison YAML
```yaml
# comparison.yaml
plugins:
  polymer_bridging:
    cutoff: 4.5
    min_ca_distance_angstrom: 8.0  # Require contacted residues to be >= 8 A apart
    protein_selection: "protein"
    polymer_selection: "chainID C"  # Must match chain convention
```
Setting min_ca_distance_angstrom: 0 disables the geometric filter entirely,
counting any 2+-residue observation as multisite.
Step 2: Run the analysis
```shell
polyzymd compare run polymer_bridging -f comparison.yaml
```
The plugin automatically filters out conditions that have no polymer atoms (e.g., a protein-only control).
Step 3: Inspect results
The CLI prints a comparison table covering three of the primary metrics (multisite fraction, average valency, and high-valency fraction). Example:
```text
WARNING: Experimental analysis
Definitions and interpretation may change after the presentation release.
Affected: Polymer bridging chemistry profiling

Polymer Bridging Comparison
================================================================================

Multisite Fraction
  100% SBMA  : 0.312 +/- 0.018 (n=3)
  SBMA-EGPMA : 0.487 +/- 0.025 (n=3)
  Pairwise: 100% SBMA -> SBMA-EGPMA p=0.004 ** d=1.82 +56.1% (more multisite)

Average Oligomer Valency
  100% SBMA  : 1.41 +/- 0.03 (n=3)
  SBMA-EGPMA : 1.72 +/- 0.05 (n=3)
  Pairwise: 100% SBMA -> SBMA-EGPMA p=0.008 ** d=1.54 +22.0% (more bridging)

High-Valency Oligomers
  100% SBMA  : 0.051 +/- 0.009 (n=3)
  SBMA-EGPMA : 0.128 +/- 0.014 (n=3)
  Pairwise: 100% SBMA -> SBMA-EGPMA p=0.012 * d=1.23 +150.8% (more high-valency)
```
Tip
Use --format json to export the full ComparisonResult for downstream
analysis, or --format markdown for integration into reports.
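The percentage on each Pairwise line appears consistent with a plain relative change of the condition means, (treatment - control) / control. That reading is an assumption, and the helper name is hypothetical. Recomputing from the rounded means in the table can differ in the last digit. For example, 0.128 / 0.051 gives about +151.0% rather than +150.8%, presumably because the CLI works with unrounded values.

```python
def percent_change(control: float, treatment: float) -> float:
    """Relative change of treatment vs. control, in percent."""
    if control == 0:
        raise ValueError("control mean must be non-zero")
    return (treatment - control) / control * 100.0


# Means copied from the example output above (control first).
print(f"Multisite fraction: {percent_change(0.312, 0.487):+.1f}%")
print(f"Average valency:    {percent_change(1.41, 1.72):+.1f}%")
```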
Step 4: Generate plots
```shell
polyzymd compare plot-all -f comparison.yaml
```
Configuration Reference
plugins.polymer_bridging
| Field | Type | Default | Description |
|---|---|---|---|
| cutoff | float | | Contact distance cutoff in Angstroms. Atom pairs within this distance are considered in contact. |
| min_ca_distance_angstrom | float | 0 | Minimum frame-wise C-alpha distance between contacted protein residues for an observation to count as multisite. Set > 0 to filter for geometrically significant bridging. Must be >= 0. |
| protein_selection | str | | MDAnalysis atom selection string for the protein. |
| polymer_selection | str | "chainID C" | MDAnalysis atom selection string for the polymer. Must match the chain convention (C = polymer). |
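The constraints in this table can be mirrored in a small dataclass, for example to validate a settings dict before launching a long run. BridgingSettings is hypothetical and is not the plugin's config class. Only min_ca_distance_angstrom = 0 and polymer_selection = "chainID C" are documented defaults. The other two defaults are the Quick Start values, and the cutoff > 0 check is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgingSettings:
    """Hypothetical mirror of plugins.polymer_bridging, for illustration only."""

    cutoff: float = 4.5                    # Quick Start value, not a documented default
    min_ca_distance_angstrom: float = 0.0  # documented default: geometric filter off
    protein_selection: str = "protein"     # Quick Start value
    polymer_selection: str = "chainID C"   # documented default (chain C = polymer)

    def __post_init__(self) -> None:
        if self.cutoff <= 0:
            raise ValueError("cutoff must be > 0 Angstrom")
        if self.min_ca_distance_angstrom < 0:
            raise ValueError("min_ca_distance_angstrom must be >= 0")


yaml_block = {"cutoff": 4.5, "min_ca_distance_angstrom": 8.0}
print(BridgingSettings(**yaml_block))
try:
    BridgingSettings(min_ca_distance_angstrom=-1.0)
except ValueError as err:
    print("rejected:", err)
```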
Plot Settings
All plots are enabled by default and can be individually toggled in
plot_settings.polymer_bridging:
| Setting | Type | Default | Description |
|---|---|---|---|
| | bool | | Bar chart of multisite fraction per condition. |
| | bool | | Bar chart of mean contacted residues per oligomer. |
| | bool | | Stacked bars showing 1 / 2 / 3+ valency distribution. |
| | bool | | Grouped bars of anchor protein residue class. |
| | bool | | Stacked bars of protein classes in multivalent events. |
| | bool | | Heatmap of anchor vs. peripheral protein class co-occurrence. |
| | bool | | Heatmap of polymer anchor monomer vs. protein anchor class. |
| | bool | | Top-10 fragment signatures by frequency. |
| | (float, float) | | Figure size for bar charts. |
| | (float, float) | | Figure size for stacked charts. |
| | (float, float) | | Figure size for heatmaps. |
Output Files
Comparison cache
Per-replicate and aggregated results use fingerprinted filenames that encode the analysis settings hash, ensuring that results computed with different settings do not collide in the cache.
```text
comparison_workspace/
├── analysis/
│   ├── <condition_A>/
│   │   └── polymer_bridging/
│   │       ├── run_1/
│   │       │   └── polymer_bridging_<settings_tag>.json
│   │       ├── run_2/
│   │       │   └── polymer_bridging_<settings_tag>.json
│   │       └── aggregated/
│   │           └── polymer_bridging_<rep_range>_<settings_tag>.json
│   └── <condition_B>/
│       └── polymer_bridging/
│           └── ...
└── comparison/
    └── polymer_bridging/
        └── result.json  # ComparisonResult
```
<settings_tag> is an 8-character hex fingerprint derived from the plugin
settings (cutoff, selections, etc.). <rep_range> encodes the replicate
range (e.g., r1-3 for replicates 1–3).
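The naming idea can be illustrated as follows: hash a canonical form of the settings, keep 8 hex characters, and append a replicate-range token. The hashing scheme, the set of fields hashed, and the helper names are all assumptions made for this example, so these tags will not reproduce the plugin's actual filenames. The sketch only shows why changing any setting yields a new, non-colliding name.

```python
import hashlib
import json


def settings_tag(settings: dict) -> str:
    """8-character hex fingerprint of a settings dict (illustrative scheme)."""
    # Sorting keys makes the tag independent of dict insertion order.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def rep_range(replicates: list[int]) -> str:
    """Encode a contiguous replicate range, e.g. [1, 2, 3] -> 'r1-3'."""
    return f"r{min(replicates)}-{max(replicates)}"


settings = {
    "cutoff": 4.5,
    "min_ca_distance_angstrom": 8.0,
    "protein_selection": "protein",
    "polymer_selection": "chainID C",
}
tag = settings_tag(settings)
print(f"run_1/polymer_bridging_{tag}.json")
print(f"aggregated/polymer_bridging_{rep_range([1, 2, 3])}_{tag}.json")
```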
Generated Figures
All figures are stamped with an EXPERIMENTAL tag and saved to the configured
output directory (default: figures/).
| Filename | Content |
|---|---|
| | Bar chart of multisite probability. |
| | Bar chart of average valency. |
| | Stacked bars: 1-site, 2-site, 3+ site. |
| | Grouped bars of anchor protein residue class. |
| | Stacked bars of all protein classes in multivalent events. |
| | Anchor class (row) vs. peripheral class (column). |
| | Polymer monomer type (row) vs. protein anchor class (column). |
| | Top-10 fragment signature probabilities. |
CLI Reference
```shell
polyzymd compare run polymer_bridging [OPTIONS]
```
| Option | Description |
|---|---|
| -f | Path to the comparison YAML file (e.g., comparison.yaml) |
| | Force recompute even if cached results exist |
| --format | Output format (default: …) |
| | Save output to file |
| | Suppress INFO messages |
| | Enable DEBUG logging |
| | Override equilibration time |
Alias: polyzymd compare run bridging (resolves to polymer_bridging).
Loading Results Programmatically
Result files use fingerprinted names (see Output Files).
Use glob to locate the file, then load it with the result classes:
```python
from pathlib import Path

from polyzymd.analyses.polymer_bridging import (
    PolymerBridgingAggregatedResult,
    PolymerBridgingReplicateResult,
)

# Load per-replicate result (fingerprinted filename)
run_dir = Path("analysis/<condition>/polymer_bridging/run_1")
rep_file = next(run_dir.glob("polymer_bridging_*.json"))
rep = PolymerBridgingReplicateResult.load(rep_file)
print(f"Multisite fraction: {rep.multisite_fraction:.3f}")
print(f"Anchor protein groups: {rep.anchor_protein_group_probabilities}")

# Load aggregated result (fingerprinted filename)
agg_dir = Path("analysis/<condition>/polymer_bridging/aggregated")
agg_file = next(agg_dir.glob("polymer_bridging_*.json"))
agg = PolymerBridgingAggregatedResult.load(agg_file)
print(f"Mean valency: {agg.mean_contacts_per_contacting_oligomer:.2f} "
      f"+/- {agg.mean_contacts_sem:.2f}")

# Inspect cross-classification matrices
for anchor_class, peripherals in agg.anchor_to_peripheral_group_matrix_mean.items():
    for peripheral_class, prob in peripherals.items():
        if prob > 0.05:
            print(f"  {anchor_class} -> {peripheral_class}: {prob:.2f}")
```
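Calling next() on a glob returns whichever match the filesystem lists first. If results from several settings (several tags) sit in the same directory, that choice is effectively arbitrary. One defensive option is to pick the most recently modified match and warn when there is more than one. The helper newest_result is hypothetical and uses only the standard library. Newest-by-mtime is a heuristic, not a guarantee that the file matches your current settings.

```python
from pathlib import Path


def newest_result(directory: Path, pattern: str = "polymer_bridging_*.json") -> Path:
    """Return the most recently modified fingerprinted result in directory."""
    candidates = sorted(directory.glob(pattern), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no file matching {pattern!r} in {directory}")
    if len(candidates) > 1:
        # Several settings tags coexist; say so instead of picking silently.
        print(f"warning: {len(candidates)} matches in {directory}, using {candidates[-1].name}")
    return candidates[-1]
```

Usage: rep = PolymerBridgingReplicateResult.load(newest_result(run_dir)).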
Interpretation Caveats
Warning
These caveats are essential for responsible use of this analysis. Read them before presenting or publishing polymer bridging results.
Descriptive, not mechanistic. All outputs are observed frequencies. A high anchor probability for aromatic residues does not prove that aromatic anchoring drives conjugate stability — it may reflect surface composition, polymer placement, or simulation artifacts.
Not normalized enrichment. Unlike binding preference analysis, polymer bridging probabilities are raw frequencies over the observation population, not enrichments normalized by surface availability. A protein with 30% aromatic surface and 30% aromatic anchors is not showing enrichment — it is showing baseline proportionality. Compare with surface composition before drawing conclusions.
Frame-wise, not event-wise. Each trajectory frame generates independent observations. A polymer that bridges two residues for 500 consecutive frames counts as 500 observations, not one sustained bridging event. Residence-time analysis is not yet implemented.
C-alpha distance is dynamic. When min_ca_distance_angstrom > 0, the threshold is evaluated per frame against actual C-alpha coordinates. Protein breathing motions mean that the same pair of residues may qualify in some frames and not others. This is physically correct but can make results sensitive to protein flexibility.
Anchor selection is heuristic. The anchor is the polymer-protein pair with the minimum atom-level distance. In cases with multiple equidistant contacts, the choice is arbitrary. This affects anchor-specific outputs but not the primary multisite/valency metrics.
Fragment signatures assume topology ordering. The ordered monomer sequence comes directly from the topology. If the topology’s residue ordering does not reflect the true polymer sequence, signatures will be misleading.
Conditions without polymer are filtered. The plugin automatically excludes conditions that lack polymer atoms (e.g., protein-only controls). This is correct behavior but means the control condition for pairwise statistics must itself contain polymer.
Proof-of-concept status. This plugin was contributed as an extensibility exercise. The analysis methodology has not been independently validated. Use it for internal hypothesis generation, not for publication-ready claims, until the methodology matures.
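To see how much the frame-wise count inflates sustained bridging, collapse consecutive multisite frames of one fragment into runs. The plugin does not do this, since residence-time analysis is not implemented. This sketch assumes you have extracted a per-frame multisite flag for a single fragment yourself, and the function name is hypothetical. It uses no gap tolerance, so a single non-multisite frame splits an event.

```python
def bridging_events(multisite_by_frame: list[bool]) -> list[tuple[int, int]]:
    """Collapse consecutive multisite frames for one fragment into (start, length) runs."""
    events = []
    start = None
    for i, flag in enumerate(multisite_by_frame):
        if flag and start is None:
            start = i  # a new run begins
        elif not flag and start is not None:
            events.append((start, i - start))  # the current run just ended
            start = None
    if start is not None:  # run still open at the last frame
        events.append((start, len(multisite_by_frame) - start))
    return events


frames = [True] * 500 + [False] * 10 + [True] * 3
events = bridging_events(frames)
print(sum(frames), "frame-wise observations")
print(len(events), "events:", events)
```

Here 503 multisite observations correspond to two bridging events, one of which lasts 500 frames.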
Relation to Other Analyses
| Analysis | What It Measures | Relation to Polymer Bridging |
|---|---|---|
| Contacts | Total contact counts and frequencies | Polymer bridging decomposes contacts per-chain, adding valency information. |
| Binding Preference | Enrichment by residue class | Provides surface-normalized context that bridging lacks. |
| Polymer Affinity | Total interaction strength (N x deltaG) | Complementary: affinity measures total adhesion; bridging measures spatial distribution of adhesion per chain. |
| RMSF | Structural flexibility | Complementary: does multisite bridging correlate with reduced flexibility? |
| Catalytic Triad | Active site geometry | Complementary: do bridging events coincide with triad perturbation? |
Troubleshooting
“No conditions passed polymer filtering”
All conditions were filtered out because the plugin could not detect polymer
atoms. Check that your polymer_selection matches your topology (default is
"chainID C", following the PolyzyMD chain convention).
Very low multisite fraction
If multisite fraction is near zero:
Check that min_ca_distance_angstrom is not too stringent. A value of 20+ A may filter out nearly all events for small proteins.
Verify that the polymer fragments contain more than one monomer.
Check that the contact cutoff is appropriate for your force field (4.5 A is standard for heavy-atom contacts).
All probabilities are empty ({})
Chemistry-aware outputs require multivalent observations. If no observation has
eligible valency > 1, all chemistry dictionaries will be empty. Lower
min_ca_distance_angstrom or verify that the polymer makes multi-residue
contacts.
Heatmaps show only one condition
The anchor-peripheral and polymer-anchor heatmaps currently display data for the first condition only (by label order). This is a known limitation of the current plotting code. To compare matrices across conditions, load the aggregated results programmatically (see above).
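Once two aggregated results are loaded, their matrices can be compared cell by cell. This hypothetical helper takes two nested {row: {column: probability}} mappings, such as the anchor_to_peripheral_group_matrix_mean attribute used above, and lists the largest changes. It treats missing cells as 0.0 and ignores the SEMs. A row absent from one condition means no observations for that anchor class, so large differences there deserve extra care.

```python
def matrix_differences(a, b, min_abs_diff=0.05):
    """List (row, column, b - a) entries whose absolute change is >= min_abs_diff.

    a, b: nested {row: {column: probability}} mappings for two conditions.
    Missing cells are treated as 0.0. Results are sorted by |change|, largest first.
    """
    diffs = []
    for row in sorted(set(a) | set(b)):
        row_a, row_b = a.get(row, {}), b.get(row, {})
        for column in sorted(set(row_a) | set(row_b)):
            delta = row_b.get(column, 0.0) - row_a.get(column, 0.0)
            if abs(delta) >= min_abs_diff:
                diffs.append((row, column, delta))
    return sorted(diffs, key=lambda item: abs(item[2]), reverse=True)


# Toy matrices standing in for two conditions.
condition_a = {"aromatic": {"polar": 0.5, "charged_positive": 0.5}}
condition_b = {
    "aromatic": {"polar": 0.2, "charged_positive": 0.7, "nonpolar": 0.1},
    "polar": {"aromatic": 1.0},
}
for row, column, delta in matrix_differences(condition_a, condition_b):
    print(f"{row:>16} -> {column:<16} {delta:+.2f}")
```

With real results, pass agg_a.anchor_to_peripheral_group_matrix_mean and agg_b.anchor_to_peripheral_group_matrix_mean.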
See Also
Contacts Analysis Quick Start — prerequisite contact computation
Binding Preference Analysis — surface-normalized enrichment (complementary)
Polymer Affinity Analysis — total interaction strength scoring
Statistics Best Practices — replicate planning
Comparing Conditions — multi-condition workflows
Extending the Analysis Framework — contribute a new plugin