Polymer Bridging Analysis (Experimental)

Warning

Polymer bridging is experimental. This plugin was contributed as a proof-of-concept extensibility exercise. Metric definitions, chemistry-aware profiling outputs, and interpretation guidance are all subject to change. CLI output and generated figures carry explicit experimental labels.

Quantify per-fragment, per-frame multisite attachment of individual polymer chains (oligomers) to the enzyme surface directly from trajectories.

This analysis answers: “When a single oligomer chain contacts the protein, how often does it contact more than one distinct protein residue — and which residue classes and monomer types are involved?”

Environment Setup

All commands below assume you have activated the PolyzyMD pixi environment:

pixi shell -e build

Alternatively, prefix each command with pixi run -e build.

Core Concepts

What Is an Observation?

The fundamental unit of data in this analysis is one observation: a single polymer fragment in a single trajectory frame that makes at least one contact with the protein (within the distance cutoff). Every observation records:

Which protein residues are contacted
The frame-wise C-alpha distances between those residues
The amino acid class of each contacted residue
The monomer identity of each contacting polymer residue
The ordered residue-name sequence (signature) of the polymer fragment

One replicate may produce thousands of observations. All statistics are computed over this observation population.

What Is “Multisite”?

An observation is classified as multisite when the polymer fragment contacts protein residues whose effective eligible valency exceeds 1. The meaning of “eligible” depends on the min_ca_distance_angstrom setting:

min_ca_distance_angstrom = 0 (default): Any observation contacting 2+ distinct residues counts as multisite, regardless of their spatial separation.
min_ca_distance_angstrom > 0 (e.g., 8.0 or 10.0): An observation is multisite only if at least two contacted residues have a frame-wise C-alpha separation >= the threshold. This excludes observations whose contacted residues all lie close together in space (for example, sequence neighbors), so the metric focuses on bridging between separated sites on the protein surface.

Eligible valency is the number of residues that participate in at least one such qualifying pair. It is computed per frame from actual atomic coordinates, so the same pair of residues may qualify in one frame and not another if the protein is flexible.
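The definition above can be sketched for a single observation as follows. This is a minimal illustration, not the plugin's implementation: the function name, the input layout (residue id mapped to a C-alpha position), and the fallback to a valency of 1 when no pair qualifies are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def eligible_valency(ca_coords, min_ca_distance=0.0):
    """Count contacted residues that take part in at least one qualifying pair.

    ca_coords maps a residue id to its C-alpha (x, y, z) position in one frame
    (hypothetical input layout, distances in Angstroms).
    """
    eligible = set()
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(ca_coords.items(), 2):
        # A pair qualifies when its C-alpha separation meets the threshold.
        if dist(xyz_a, xyz_b) >= min_ca_distance:
            eligible.update((res_a, res_b))
    if eligible:
        return len(eligible)
    # Assumption: an observation with no qualifying pair counts as 1-site.
    return min(len(ca_coords), 1)


# Three contacted residues: two sequence neighbors and one distant residue.
frame = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0), 57: (9.0, 0.0, 0.0)}
valency_no_filter = eligible_valency(frame, 0.0)
valency_8A = eligible_valency(frame, 8.0)
```

With the 8 A threshold only the pair (10, 57) qualifies, so residue 11 no longer contributes to the valency even though it is still contacted.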

What Is “High Valency”?

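An observation is high valency when its eligible valency is 3 or more, meaning the polymer simultaneously contacts at least three eligible protein residues in that frame. With the default min_ca_distance_angstrom = 0, any three distinct contacted residues qualify; with a positive threshold, each of the three must be at least that far from one or more of the other contacted residues.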

Primary Metrics

Metric	Field name	Description
Mean Contacts / Oligomer	`mean_contacts_per_contacting_oligomer`	Average number of distinct protein residues contacted per observation.
Multisite Fraction	`multisite_fraction`	Fraction of observations with eligible valency > 1.
High-Valency Fraction	`high_valency_fraction`	Fraction of observations with eligible valency >= 3.
Valency Distribution	`valency_probabilities`	Probability of 1-site, 2-site, and 3+-site attachment across all observations.

These four metrics are used in the default cross-condition comparison pipeline (t-tests, ANOVA, effect sizes, rankings).
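As a rough sketch of how the four metrics relate to the observation population, the snippet below computes them from a list of per-observation records. The tuple layout and the "1" / "2" / "3+" keys are assumptions for illustration; only the metric field names come from the table above.

```python
from collections import Counter


def primary_metrics(observations):
    """Compute the four primary metrics.

    observations is a list of (n_contacted_residues, eligible_valency)
    tuples, one per observation (hypothetical layout).
    """
    n = len(observations)
    if n == 0:
        return None
    # Pool every valency of 3 or more into a single "3+" bin.
    bins = Counter(min(valency, 3) for _, valency in observations)
    return {
        "mean_contacts_per_contacting_oligomer": sum(c for c, _ in observations) / n,
        "multisite_fraction": sum(v > 1 for _, v in observations) / n,
        "high_valency_fraction": sum(v >= 3 for _, v in observations) / n,
        "valency_probabilities": {
            "1": bins[1] / n,
            "2": bins[2] / n,
            "3+": bins[3] / n,
        },
    }


# The third observation contacts two residues that are too close to qualify.
metrics = primary_metrics([(1, 1), (2, 2), (2, 1), (4, 3)])
```

Note that mean contacts counts all distinct contacted residues, while the other three metrics use eligible valency, so the two can diverge when min_ca_distance_angstrom > 0.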

Chemistry-Aware Outputs (Experimental)

Important

All chemistry-aware outputs described below are labeled polymer_bridging_chemistry in the experimental feature system. They are descriptive probabilities — observed frequencies over the observation population — not normalized enrichments and not evidence of mechanism.

Probabilities reflect what was observed in the simulation. They do not control for surface accessibility, polymer composition, or reference expectations. Interpret them as a starting point for hypothesis generation, not as proof of preferential interaction.

Chemistry-aware outputs are computed only from multivalent observations, that is, the multisite observations with eligible valency > 1. They are reported per replicate and aggregated (mean +/- SEM) across replicates.
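For reference, a mean +/- SEM aggregation over replicate values can be sketched as below. The use of the sample standard deviation (n - 1 denominator) is an assumption; this page does not state which estimator the plugin uses, and the replicate values are made up.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate values."""
    if len(values) < 2:
        raise ValueError("SEM needs at least two replicates")
    # Assumption: sample standard deviation divided by sqrt(n).
    return mean(values), stdev(values) / sqrt(len(values))


# Illustrative per-replicate probabilities for one residue class.
replicate_values = [0.30, 0.31, 0.33]
avg, sem = mean_and_sem(replicate_values)
```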

Protein Residue Classification

All protein residue outputs use the ProteinAAClassification scheme from polyzymd.analyses.shared.groupings:

Class	Amino Acids
aromatic	PHE, TRP, TYR, HIS
charged_positive	ARG, LYS
charged_negative	ASP, GLU
polar	ASN, CYS, GLN, SER, THR
nonpolar	ALA, GLY, ILE, LEU, MET, PRO, VAL
unknown	Non-standard residues

Common protonation-state variants (HIE, HID, HIP, ASH, GLH, etc.) are automatically mapped to their parent residue.
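The lookup described by the table can be sketched as a pair of dictionaries. This is an illustrative stand-in, not the ProteinAAClassification code itself; the variant map only covers the variants named above, and the function name is hypothetical.

```python
# Class membership copied from the table above.
AA_CLASS = {
    **dict.fromkeys(["PHE", "TRP", "TYR", "HIS"], "aromatic"),
    **dict.fromkeys(["ARG", "LYS"], "charged_positive"),
    **dict.fromkeys(["ASP", "GLU"], "charged_negative"),
    **dict.fromkeys(["ASN", "CYS", "GLN", "SER", "THR"], "polar"),
    **dict.fromkeys(["ALA", "GLY", "ILE", "LEU", "MET", "PRO", "VAL"], "nonpolar"),
}

# Only the variants listed in the text; the real mapping may include more.
VARIANT_TO_PARENT = {
    "HIE": "HIS", "HID": "HIS", "HIP": "HIS",
    "ASH": "ASP", "GLH": "GLU",
}


def classify_residue(resname):
    """Map a residue name to its class, resolving protonation variants first."""
    name = resname.strip().upper()
    name = VARIANT_TO_PARENT.get(name, name)
    return AA_CLASS.get(name, "unknown")
```

Under this sketch, classify_residue("HIE") resolves to the class of HIS, and any name not in the table falls through to "unknown".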

Anchor and Peripheral Residues

In each multivalent observation, the plugin identifies an anchor — the protein residue with the closest atom-level distance to the polymer. All other eligible contacted residues are peripheral.

Anchor protein class probabilities: Frequency distribution of the amino acid class of the anchor residue across all multivalent observations.
Peripheral protein class probabilities: Frequency distribution of the amino acid classes of non-anchor eligible residues.
Multivalent protein class probabilities: Frequency distribution of the amino acid classes of all eligible residues in multivalent observations (anchor + peripheral combined).
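The three distributions above can be sketched from a list of multivalent observations. The input layout (eligible residue id mapped to its class and its minimum atom-level distance to the polymer) is an assumption for illustration, as are the function names.

```python
from collections import Counter


def normalize(counter):
    """Turn raw counts into frequencies that sum to 1."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()} if total else {}


def class_probabilities(observations):
    """Return (anchor, peripheral, combined) class frequency dictionaries.

    Each observation maps an eligible residue id to a tuple of
    (class_name, min_atom_distance) -- a hypothetical layout.
    """
    anchor, peripheral = Counter(), Counter()
    for contacts in observations:
        # The anchor is the residue closest to the polymer; ties go to the
        # first residue encountered, which is an arbitrary choice.
        anchor_id = min(contacts, key=lambda res: contacts[res][1])
        for res_id, (cls, _) in contacts.items():
            if res_id == anchor_id:
                anchor[cls] += 1
            else:
                peripheral[cls] += 1
    return normalize(anchor), normalize(peripheral), normalize(anchor + peripheral)


observations = [
    {12: ("aromatic", 2.9), 48: ("polar", 3.8)},
    {12: ("aromatic", 3.1), 90: ("charged_positive", 3.0), 91: ("polar", 4.2)},
]
anchor_p, peripheral_p, multivalent_p = class_probabilities(observations)
```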

Polymer Monomer Probabilities

Polymer contact type probabilities: Frequency of each polymer monomer type (by residue name, e.g. SBM, EGM) among all polymer residues that make protein contacts in multivalent observations.
Polymer anchor type probabilities: Frequency of the polymer monomer type of the anchor (the polymer residue closest to the anchor protein residue) across multivalent observations.

Cross-Classification Matrices

Anchor-to-peripheral class matrix (anchor_to_peripheral_group_matrix): A row-normalized matrix where rows are the anchor protein class and columns are peripheral protein classes. Each row sums to 1.0. Answers: “Given that the anchor is aromatic, what protein classes are the peripheral contacts?”
Polymer-anchor to protein-anchor matrix (polymer_anchor_to_protein_anchor_matrix): A row-normalized matrix where rows are polymer monomer types and columns are protein anchor classes. Each row sums to 1.0. Answers: “Given that SBM is the polymer anchor monomer, which protein residue classes does it anchor to?”
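Row normalization, as used by both matrices, can be sketched with nested dictionaries. The helper name and the example pairs are hypothetical; the point is only that each row is divided by its own total, so rows with few observations still sum to 1.0.

```python
from collections import defaultdict


def row_normalized_matrix(pairs):
    """Build a nested {row: {column: probability}} dict from (row, column) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for row, col in pairs:
        counts[row][col] += 1
    matrix = {}
    for row, cols in counts.items():
        total = sum(cols.values())
        # Each row is divided by its own total, so it sums to 1.0.
        matrix[row] = {col: n / total for col, n in cols.items()}
    return matrix


# (anchor class, peripheral class) pairs from illustrative observations.
pairs = [
    ("aromatic", "polar"),
    ("aromatic", "polar"),
    ("aromatic", "nonpolar"),
    ("charged_positive", "charged_negative"),
]
matrix = row_normalized_matrix(pairs)
```

Because of the per-row scaling, a row built from a single observation looks as certain as one built from thousands, so it is worth checking the underlying counts before comparing rows.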

Fragment Signature Probabilities

Each polymer fragment has an ordered signature: the sequence of residue names along the fragment (e.g., the 5-mer EGM-EGM-SBM-EGM-EGM). The top-10 most frequent signatures across multivalent observations are reported as probabilities. These may help identify which polymer sequences appear most often in bridging events; because the values are raw frequencies, they do not by themselves show over-representation relative to the polymer composition.

Note

Fragment signatures depend on the topology’s residue ordering. In practice the fragment length equals the number of monomers in the polymer chain. The “5-mer” label is for illustration — the actual signature length is the full fragment.
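A minimal sketch of the signature tally, assuming each multivalent observation supplies the fragment's residue names in topology order. The function name is hypothetical, and the probabilities here are taken over all signatures before truncating to the top entries, which is an assumption about the normalization.

```python
from collections import Counter


def top_signatures(fragments, top_n=10):
    """Return the most frequent fragment signatures as probabilities.

    fragments holds one list of residue names per multivalent observation.
    """
    counts = Counter("-".join(residue_names) for residue_names in fragments)
    total = sum(counts.values())
    # Assumption: normalize by all observations, then keep the top entries.
    return {sig: n / total for sig, n in counts.most_common(top_n)}


fragments = [["EGM", "EGM", "SBM", "EGM", "EGM"]] * 3 + [["SBM"] * 5]
signature_probabilities = top_signatures(fragments)
```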

Quick Start

Step 1: Add bridging settings to your comparison YAML

# comparison.yaml
plugins:
  polymer_bridging:
    cutoff: 4.5
    min_ca_distance_angstrom: 8.0   # Require contacted residues to be >= 8 A apart
    protein_selection: "protein"
    polymer_selection: "chainID C"  # Must match chain convention

Setting min_ca_distance_angstrom: 0 disables the geometric filter entirely, counting any 2+-residue observation as multisite.

Step 2: Run the analysis

polyzymd compare run polymer_bridging -f comparison.yaml

The plugin automatically filters out conditions that have no polymer atoms (e.g., a protein-only control).

Step 3: Inspect results

The CLI prints a comparison table covering the three scalar primary metrics (multisite fraction, mean contacts per oligomer, and high-valency fraction). Example:

WARNING: Experimental analysis
Definitions and interpretation may change after the presentation release.
Affected: Polymer bridging chemistry profiling

Polymer Bridging Comparison
================================================================================
Multisite Fraction
  100% SBMA    : 0.312 +/- 0.018  (n=3)
  SBMA-EGPMA   : 0.487 +/- 0.025  (n=3)
  Pairwise: 100% SBMA -> SBMA-EGPMA  p=0.004 ** d=1.82  +56.1%  (more multisite)

Average Oligomer Valency
  100% SBMA    : 1.41 +/- 0.03  (n=3)
  SBMA-EGPMA   : 1.72 +/- 0.05  (n=3)
  Pairwise: 100% SBMA -> SBMA-EGPMA  p=0.008 ** d=1.54  +22.0%  (more bridging)

High-Valency Oligomers
  100% SBMA    : 0.051 +/- 0.009  (n=3)
  SBMA-EGPMA   : 0.128 +/- 0.014  (n=3)
  Pairwise: 100% SBMA -> SBMA-EGPMA  p=0.012 *  d=1.23  +150.8%  (more high-valency)

Tip

Use --format json to export the full ComparisonResult for downstream analysis, or --format markdown for integration into reports.

Step 4: Generate plots

polyzymd compare plot-all -f comparison.yaml

Configuration Reference

`plugins.polymer_bridging`

Field	Type	Default	Description
`cutoff`	float	`4.5`	Contact distance cutoff in Angstroms. Atom pairs within this distance are considered in contact.
`min_ca_distance_angstrom`	float	`0.0`	Minimum frame-wise C-alpha distance between contacted protein residues for an observation to count as multisite. Set > 0 to filter for geometrically significant bridging. Must be >= 0.
`protein_selection`	str	`"protein"`	MDAnalysis atom selection string for the protein.
`polymer_selection`	str	`"chainID C"`	MDAnalysis atom selection string for the polymer. Must match the chain convention (C = polymer).
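The fields and defaults in the table can be mirrored in a small dataclass when preparing settings in a script. This is a hypothetical stand-in for illustration, not the plugin's own settings class; the check that cutoff is positive is an added assumption, while the >= 0 check on min_ca_distance_angstrom follows the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgingSettingsSketch:
    """Hypothetical mirror of the plugins.polymer_bridging fields."""

    cutoff: float = 4.5
    min_ca_distance_angstrom: float = 0.0
    protein_selection: str = "protein"
    polymer_selection: str = "chainID C"

    def __post_init__(self):
        # Assumption: a non-positive contact cutoff is never meaningful.
        if self.cutoff <= 0:
            raise ValueError("cutoff must be positive")
        # Documented constraint: the C-alpha threshold must be >= 0.
        if self.min_ca_distance_angstrom < 0:
            raise ValueError("min_ca_distance_angstrom must be >= 0")


defaults = BridgingSettingsSketch()
filtered = BridgingSettingsSketch(min_ca_distance_angstrom=8.0)
```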

Plot Settings

All plots are enabled by default and can be individually toggled in plot_settings.polymer_bridging:

Setting	Type	Default	Description
`generate_multisite_bars`	bool	`true`	Bar chart of multisite fraction per condition.
`generate_mean_contacts_bars`	bool	`true`	Bar chart of mean contacted residues per oligomer.
`generate_valency_stack`	bool	`true`	Stacked bars showing 1 / 2 / 3+ valency distribution.
`generate_anchor_group_bars`	bool	`true`	Grouped bars of anchor protein residue class.
`generate_protein_group_stack`	bool	`true`	Stacked bars of protein classes in multivalent events.
`generate_anchor_peripheral_heatmap`	bool	`true`	Heatmap of anchor vs. peripheral protein class co-occurrence.
`generate_polymer_anchor_heatmap`	bool	`true`	Heatmap of polymer anchor monomer vs. protein anchor class.
`generate_fragment_signature_bars`	bool	`true`	Top-10 fragment signatures by frequency.
`figsize_bars`	(float, float)	`(9, 6)`	Figure size for bar charts.
`figsize_stack`	(float, float)	`(11, 6)`	Figure size for stacked charts.
`figsize_heatmap`	(float, float)	`(9, 7)`	Figure size for heatmaps.

Output Files

Comparison cache

Per-replicate and aggregated results use fingerprinted filenames that encode the analysis settings hash, ensuring that results computed with different settings do not collide in the cache.

comparison_workspace/
├── analysis/
│   ├── <condition_A>/
│   │   └── polymer_bridging/
│   │       ├── run_1/
│   │       │   └── polymer_bridging_<settings_tag>.json
│   │       ├── run_2/
│   │       │   └── polymer_bridging_<settings_tag>.json
│   │       └── aggregated/
│   │           └── polymer_bridging_<rep_range>_<settings_tag>.json
│   └── <condition_B>/
│       └── polymer_bridging/
│           └── ...
└── comparison/
    └── polymer_bridging/
        └── result.json                  # ComparisonResult

<settings_tag> is an 8-character hex fingerprint derived from the plugin settings (cutoff, selections, etc.). <rep_range> encodes the replicate range (e.g., r1-3 for replicates 1–3).
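When several settings have been cached side by side, it can help to split a filename into its parts before loading. The pattern below is a sketch based only on the naming scheme described here; it assumes lowercase hex digits and a replicate range of the form rN-M, and the example filenames are invented.

```python
import re

# Optional replicate range, then an 8-character hex settings tag.
CACHE_NAME = re.compile(
    r"^polymer_bridging_(?:(?P<rep_range>r\d+-\d+)_)?"
    r"(?P<settings_tag>[0-9a-f]{8})\.json$"
)


def parse_cache_name(filename):
    """Return the rep_range and settings_tag parts, or None if unrecognized."""
    match = CACHE_NAME.match(filename)
    return match.groupdict() if match else None


per_replicate = parse_cache_name("polymer_bridging_1a2b3c4d.json")
aggregated = parse_cache_name("polymer_bridging_r1-3_1a2b3c4d.json")
```

Grouping files by settings_tag makes it easier to confirm that per-replicate and aggregated results were produced with the same settings.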

Generated Figures

All figures are stamped with an EXPERIMENTAL tag and saved to the configured output directory (default: figures/).

Filename	Content
`polymer_bridging_multisite_fraction.*`	Bar chart of multisite probability.
`polymer_bridging_mean_contacts.*`	Bar chart of mean contacted residues per oligomer.
`polymer_bridging_valency_distribution.*`	Stacked bars: 1-site, 2-site, 3+ site.
`polymer_bridging_anchor_groups.*`	Grouped bars of anchor protein residue class.
`polymer_bridging_protein_group_distribution.*`	Stacked bars of all protein classes in multivalent events.
`polymer_bridging_anchor_peripheral_heatmap.*`	Anchor class (row) vs. peripheral class (column).
`polymer_bridging_polymer_anchor_heatmap.*`	Polymer monomer type (row) vs. protein anchor class (column).
`polymer_bridging_fragment_signatures.*`	Top-10 fragment signature probabilities.

CLI Reference

polyzymd compare run polymer_bridging [OPTIONS]

Option	Description
`-f, --file PATH`	Path to `comparison.yaml` (default: `comparison.yaml`)
`--recompute`	Force recompute even if cached results exist
`--format [table|markdown|json]`	Output format (default: `table`)
`-o, --output PATH`	Save output to file
`-q, --quiet`	Suppress INFO messages
`--debug`	Enable DEBUG logging
`--eq-time TEXT`	Override equilibration time

Alias: polyzymd compare run bridging (resolves to polymer_bridging).

Loading Results Programmatically

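Result files use fingerprinted names (see Output Files), so locate them with a glob pattern and load them with the result classes. The paths below are relative to the comparison_workspace/ directory. If results for several settings are cached in the same directory, the pattern matches more than one file, so check which one you load: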

from pathlib import Path
from polyzymd.analyses.polymer_bridging import (
    PolymerBridgingAggregatedResult,
    PolymerBridgingReplicateResult,
)

# Load per-replicate result (fingerprinted filename)
run_dir = Path("analysis/<condition>/polymer_bridging/run_1")
rep_file = next(run_dir.glob("polymer_bridging_*.json"))
rep = PolymerBridgingReplicateResult.load(rep_file)
print(f"Multisite fraction: {rep.multisite_fraction:.3f}")
print(f"Anchor protein groups: {rep.anchor_protein_group_probabilities}")

# Load aggregated result (fingerprinted filename)
agg_dir = Path("analysis/<condition>/polymer_bridging/aggregated")
agg_file = next(agg_dir.glob("polymer_bridging_*.json"))
agg = PolymerBridgingAggregatedResult.load(agg_file)
print(f"Mean valency: {agg.mean_contacts_per_contacting_oligomer:.2f} "
      f"+/- {agg.mean_contacts_sem:.2f}")

# Inspect cross-classification matrices
for anchor_class, peripherals in agg.anchor_to_peripheral_group_matrix_mean.items():
    for peripheral_class, prob in peripherals.items():
        if prob > 0.05:
            print(f"  {anchor_class} -> {peripheral_class}: {prob:.2f}")

Interpretation Caveats

Warning

These caveats are essential for responsible use of this analysis. Read them before presenting or publishing polymer bridging results.

Descriptive, not mechanistic. All outputs are observed frequencies. A high anchor probability for aromatic residues does not prove that aromatic anchoring drives conjugate stability — it may reflect surface composition, polymer placement, or simulation artifacts.
Not normalized enrichment. Unlike binding preference analysis, polymer bridging probabilities are raw frequencies over the observation population, not enrichments normalized by surface availability. A protein with 30% aromatic surface and 30% aromatic anchors is not showing enrichment — it is showing baseline proportionality. Compare with surface composition before drawing conclusions.
Frame-wise, not event-wise. Observations are counted separately in every trajectory frame. A polymer that bridges two residues for 500 consecutive frames counts as 500 observations, not one sustained bridging event, and observations from consecutive frames are typically correlated. Residence-time analysis is not yet implemented.
C-alpha distance is dynamic. When min_ca_distance_angstrom > 0, the threshold is evaluated per frame against actual C-alpha coordinates. Protein breathing motions mean that the same pair of residues may qualify in some frames and not others. This is physically correct but can make results sensitive to protein flexibility.
Anchor selection is heuristic. The protein anchor and the polymer anchor are the two residues of the polymer-protein pair with the minimum atom-level distance. When several contacts are equidistant, the choice is arbitrary. This affects anchor-specific outputs but not the primary multisite/valency metrics.
Fragment signatures assume topology ordering. The ordered monomer sequence comes directly from the topology. If the topology’s residue ordering does not reflect the true polymer sequence, signatures will be misleading.
Conditions without polymer are filtered. The plugin automatically excludes conditions that lack polymer atoms (e.g., protein-only controls). This is correct behavior but means the control condition for pairwise statistics must itself contain polymer.
Proof-of-concept status. This plugin was contributed as an extensibility exercise. The analysis methodology has not been independently validated. Use it for internal hypothesis generation, not for publication-ready claims, until the methodology matures.
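Two of the caveats above can be made concrete with short post-processing sketches. Neither is part of the plugin: the surface fraction must come from a separate source, the per-frame flags are a hypothetical input for a single fragment, and both function names are invented for this example.

```python
from itertools import groupby


def enrichment_ratio(observed_fraction, surface_fraction):
    """Ratio of an observed frequency to its surface availability.

    A value of 1.0 means baseline proportionality, not enrichment.
    """
    if surface_fraction <= 0:
        raise ValueError("surface_fraction must be positive")
    return observed_fraction / surface_fraction


def count_sustained_events(multisite_flags):
    """Count runs of consecutive multisite frames for one fragment."""
    # groupby yields one group per run of identical consecutive values.
    return sum(1 for is_multisite, _ in groupby(multisite_flags) if is_multisite)


# 30% aromatic anchors on a surface that is 30% aromatic.
baseline = enrichment_ratio(0.30, 0.30)

# 500 consecutive bridging frames, a gap, then 2 more frames.
flags = [False] + [True] * 500 + [False] + [True] * 2
n_events = count_sustained_events(flags)
n_observations = sum(flags)
```

In the second example the same trajectory yields 502 frame-wise observations but only two sustained events, which is the distinction the frame-wise caveat describes.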

Relation to Other Analyses

Analysis	What It Measures	Relation to Polymer Bridging
Contacts	Total contact counts and frequencies	Polymer bridging decomposes contacts per-chain, adding valency information.
Binding Preference	Enrichment by residue class	Provides surface-normalized context that bridging lacks.
Polymer Affinity	Total interaction strength (N x deltaG)	Complementary: affinity measures total adhesion; bridging measures spatial distribution of adhesion per chain.
RMSF	Structural flexibility	Complementary: does multisite bridging correlate with reduced flexibility?
Catalytic Triad	Active site geometry	Complementary: do bridging events coincide with triad perturbation?

Troubleshooting

“No conditions passed polymer filtering”

All conditions were filtered out because the plugin could not detect polymer atoms. Check that your polymer_selection matches your topology (default is "chainID C", following the PolyzyMD chain convention).

Very low multisite fraction

If multisite fraction is near zero:

Check that min_ca_distance_angstrom is not too stringent. A value of 20+ A may filter out nearly all events for small proteins.
Verify that the polymer fragments contain more than one monomer.
Check that the contact cutoff is appropriate for your force field (4.5 A is standard for heavy-atom contacts).

All probabilities are empty (`{}`)

Chemistry-aware outputs require multivalent observations. If no observation has eligible valency > 1, all chemistry dictionaries will be empty. Lower min_ca_distance_angstrom or verify that the polymer makes multi-residue contacts.

Heatmaps show only one condition

The anchor-peripheral and polymer-anchor heatmaps currently display data for the first condition only (by label order). This is a known limitation of the current plotting code. To compare matrices across conditions, load the aggregated results programmatically (see Loading Results Programmatically).
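Once the aggregated matrices for two conditions are loaded, an element-wise difference is one simple way to compare them. The sketch below works on plain nested dictionaries shaped like anchor_to_peripheral_group_matrix_mean; the helper name is hypothetical, the example values are invented, and treating a missing cell as 0.0 is an assumption.

```python
def matrix_difference(matrix_a, matrix_b):
    """Return matrix_b - matrix_a for nested {row: {column: probability}} dicts."""
    diff = {}
    for row in sorted(set(matrix_a) | set(matrix_b)):
        row_a = matrix_a.get(row, {})
        row_b = matrix_b.get(row, {})
        # Assumption: a cell absent from one condition counts as 0.0.
        diff[row] = {
            col: row_b.get(col, 0.0) - row_a.get(col, 0.0)
            for col in sorted(set(row_a) | set(row_b))
        }
    return diff


condition_a = {"aromatic": {"polar": 0.6, "nonpolar": 0.4}}
condition_b = {"aromatic": {"polar": 0.5, "charged_positive": 0.5}}
delta = matrix_difference(condition_a, condition_b)
```

Positive entries mark cells that are more frequent in the second condition. The differences remain descriptive and carry the same caveats as the underlying probabilities.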