# Interpreting Polymer Bridging Results

```{warning}
Polymer bridging is **experimental**. This page documents how to read and
reason about the outputs. Definitions and interpretation guidance may change
as the methodology matures.
```

This page explains the concepts behind polymer bridging analysis — what the
numbers mean, what they do not mean, and how to think about them in the context
of enzyme-polymer conjugate design.

## What Problem Does This Solve?

Standard contact analysis tells you *how many* polymer-protein contacts exist,
but not *how those contacts are distributed among polymer chains*. Consider two
scenarios that produce identical total contact counts:

- **Scenario A:** 10 polymer chains each contact 1 protein residue → 10
  contacts, all single-site.
- **Scenario B:** 1 polymer chain contacts 10 protein residues → 10 contacts,
  one chain bridging across the surface.

Despite the identical totals, these are fundamentally different binding
modes. Polymer bridging analysis distinguishes them by analyzing contacts at
the **per-fragment, per-frame** level.
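As a minimal sketch of the distinction, the two scenarios can be written as lists of `(fragment, residue)` pairs and grouped by fragment. The data and the helper names (`residues_per_fragment`, `summarize`) are hypothetical and are not part of the analysis API.

```python
from collections import defaultdict

# Hypothetical contact lists: (fragment_id, protein_residue) pairs in one frame.
scenario_a = [(f"frag_{i}", f"res_{i}") for i in range(10)]  # 10 chains, 1 residue each
scenario_b = [("frag_0", f"res_{i}") for i in range(10)]     # 1 chain, 10 residues


def residues_per_fragment(contacts):
    """Group contacted residues by the fragment that touches them."""
    grouped = defaultdict(set)
    for fragment, residue in contacts:
        grouped[fragment].add(residue)
    return dict(grouped)


def summarize(contacts):
    """Return (total contacts, number of fragments touching 2+ residues)."""
    grouped = residues_per_fragment(contacts)
    total = sum(len(residues) for residues in grouped.values())
    bridging = sum(1 for residues in grouped.values() if len(residues) >= 2)
    return total, bridging


summary_a = summarize(scenario_a)
summary_b = summarize(scenario_b)
print(summary_a)  # (10, 0)
print(summary_b)  # (10, 1)
```

The total is the same in both cases; only the per-fragment grouping exposes the bridging chain.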

## The Observation Model

The analysis produces a flat table of observations, conceptually:

| Frame | Fragment | Protein residues contacted | Eligible valency | Anchor | Peripheral |
|-------|----------|---------------------------|------------------|--------|------------|
| 100   | frag_0   | {Lys42, Phe88}           | 2                | Phe88  | Lys42      |
| 100   | frag_1   | {Glu15}                  | 1                | —      | —          |
| 101   | frag_0   | {Lys42, Phe88, Asp92}    | 3                | Phe88  | Lys42, Asp92 |
| 101   | frag_1   | {}                       | 0 (no contact)   | —      | —          |

The frame 101, frag_1 row is shown only for contrast: with no contact, it
produces no observation. Frame 101, frag_0 produces one observation with
eligible valency 3 (high-valency). All statistics are computed over the set of
all observations, which here means the three contacting rows.
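The table can be mirrored in a few lines of Python. This is an illustrative sketch only: the `Observation` class and its field names are hypothetical, valency is taken as the plain residue count (no distance filter), and the multisite fraction is assumed to be multisite observations divided by contacting observations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One contacting (frame, fragment) pair; field names are illustrative."""
    frame: int
    fragment: str
    residues: frozenset

    @property
    def valency(self) -> int:
        return len(self.residues)


# Raw per-frame contacts mirroring the table above.
raw = {
    (100, "frag_0"): {"Lys42", "Phe88"},
    (100, "frag_1"): {"Glu15"},
    (101, "frag_0"): {"Lys42", "Phe88", "Asp92"},
    (101, "frag_1"): set(),  # no contact, so no observation is emitted
}

observations = [
    Observation(frame, fragment, frozenset(residues))
    for (frame, fragment), residues in raw.items()
    if residues
]

multisite = [obs for obs in observations if obs.valency >= 2]
multisite_fraction = len(multisite) / len(observations)
print(len(observations), len(multisite), round(multisite_fraction, 3))  # 3 2 0.667
```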

### Why Per-Frame, Not Per-Event?

A bridging "event" in physical terms is a sustained period where a polymer
fragment bridges two or more residues. The current implementation does not
track event persistence — each frame is independent. This means a 500-frame
bridging event counts 500 times, biasing statistics toward long-lived bridges.

This is a known limitation. It means:

- **Multisite fraction** is biased toward long-lived bridging (a feature if you
  care about cumulative adhesion, a bug if you care about event frequency).
- **Anchor/peripheral probabilities** over-represent stable configurations.
- Transient, single-frame bridging events are diluted rather than suppressed.

For most conjugate design questions — *"how much of the interaction is
multisite?"* — the frame-weighted view is informative. For event-counting
questions — *"how often does bridging initiate?"* — you would need
residence-time analysis (not yet implemented).
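The difference between the two views can be illustrated with a toy series of per-frame bridging flags for a single fragment. The data are invented, and `event_lengths` is a hypothetical helper that treats each run of consecutive bridging frames as one event; it is not the residence-time analysis mentioned above.

```python
from itertools import groupby

# Hypothetical per-frame flags for one fragment: True where it bridges 2+ residues.
# One long event (500 frames), a gap, then five single-frame events.
bridging = [True] * 500 + [False] * 10
for _ in range(5):
    bridging += [True, False]


def event_lengths(flags):
    """Lengths of consecutive runs of True, i.e. contiguous bridging events."""
    return [sum(1 for _ in run) for is_bridging, run in groupby(flags) if is_bridging]


lengths = event_lengths(bridging)
frame_count = sum(lengths)   # what frame-weighted statistics see
event_count = len(lengths)   # what an event-based analysis would see
long_event_share = max(lengths) / frame_count

print(frame_count, event_count, round(long_event_share, 3))  # 505 6 0.99
```

One of six events contributes about 99% of the bridging frames, which is the frame-weighting bias described above.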

## Understanding the C-Alpha Distance Filter

### Why Filter at All?

Without filtering, a polymer contacting residues Lys42 and Arg43 (which are
sequential neighbors, C-alpha distance ~3.8 Å) is classified as multisite.
This is technically correct but not physically interesting — the polymer is
merely spanning a single local patch.

Setting `min_ca_distance_angstrom` to, e.g., 8.0 Å requires that at least one
pair of contacted residues be separated by ≥ 8 Å in **that specific frame**.
This focuses the analysis on cases where the polymer genuinely bridges across
the protein surface.
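The "at least one pair" rule can be sketched as follows. Only the parameter name `min_ca_distance_angstrom` comes from this page; the function `passes_ca_filter`, its signature, and the coordinates are assumptions made for illustration, and the real implementation may differ.

```python
import math
from itertools import combinations


def passes_ca_filter(ca_coords, contacted, min_ca_distance_angstrom):
    """True if any pair of contacted residues is at least the threshold apart.

    ca_coords: residue name -> (x, y, z) C-alpha position in this frame (angstrom).
    contacted: residues touched by one fragment in this frame.
    """
    return any(
        math.dist(ca_coords[a], ca_coords[b]) >= min_ca_distance_angstrom
        for a, b in combinations(sorted(contacted), 2)
    )


# Hypothetical single-frame coordinates.
frame_ca = {
    "Lys42": (0.0, 0.0, 0.0),
    "Arg43": (3.8, 0.0, 0.0),   # sequential neighbor of Lys42
    "Phe88": (0.0, 12.0, 0.0),
}

neighbors_only = passes_ca_filter(frame_ca, {"Lys42", "Arg43"}, 8.0)
with_distant = passes_ca_filter(frame_ca, {"Lys42", "Arg43", "Phe88"}, 8.0)
print(neighbors_only, with_distant)  # False True
```

With a threshold of `0.0`, any fragment touching two or more residues passes, and a single-residue contact never passes because it has no pairs to test.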

### Why Dynamic (Per-Frame)?

The filter uses frame-wise C-alpha coordinates, not static crystal-structure
distances. This matters because:

1. **Loop motions** can bring distant residues closer or push nearby residues
   apart.
2. **Domain motions** can dramatically change inter-residue distances.
3. A static threshold based on the crystal structure would be wrong when the
   protein flexes.

The downside is that the effective stringency varies with protein dynamics. In
a highly flexible region, the same pair may alternate between qualifying and
not qualifying from frame to frame, so the apparent multisite fraction can
differ from what a static filter would report: lower if the pair qualifies in
the reference structure, higher if it does not.

### Choosing a Threshold

| Threshold | Effect | Use When |
|-----------|--------|----------|
| `0.0` | No filtering; any 2+ residues = multisite | Exploratory; counting all multi-residue contacts |
| `5.0–8.0 Å` | Moderate; filters sequential neighbors | Default starting point for most proteins |
| `10.0–15.0 Å` | Stringent; requires substantial spatial separation | Looking specifically for long-range bridging |
| `> 20.0 Å` | Very stringent; may produce zero multisite observations | Only for large proteins with clear domain separation |

There is no single "correct" value. Report the threshold used and consider
running the analysis at 2–3 thresholds to assess sensitivity.
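A sensitivity check of this kind might look like the sketch below. It assumes you have already extracted, for each contacting observation, the largest C-alpha separation among its contacted residues; the list `max_ca_separation` and the function `multisite_fraction` are hypothetical stand-ins, not outputs or functions of the tool.

```python
# Hypothetical data: largest C-alpha separation (angstrom) among the residues
# contacted in each observation; None marks a single-residue contact.
max_ca_separation = [None, 3.8, 6.1, 9.4, None, 12.7, 3.9, 17.2, None, 7.5]


def multisite_fraction(separations, threshold):
    """Fraction of contacting observations with a residue pair >= threshold apart."""
    qualifying = sum(1 for d in separations if d is not None and d >= threshold)
    return qualifying / len(separations)


scan = {t: multisite_fraction(max_ca_separation, t) for t in (0.0, 8.0, 12.0)}
for threshold, fraction in scan.items():
    print(f"threshold {threshold:>4.1f} angstrom: multisite fraction = {fraction:.2f}")
```

In this toy data set the fraction drops from 0.70 to 0.30 to 0.20 as the threshold rises, which is the kind of dependence worth reporting alongside the chosen value.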

## Reading the Chemistry-Aware Outputs

### What the Probabilities Are

All chemistry outputs are **descriptive frequencies** over the multivalent
observation population. "Anchor protein class probability: aromatic = 0.45"
means that 45% of multivalent observations had an aromatic residue as the
anchor.

### What They Are Not

- **Not enrichment.** A probability of 0.45 for aromatics does not mean
  aromatics are preferentially involved in bridging. If 45% of the
  surface-exposed protein residues are aromatic, 0.45 is exactly the null
  expectation. Compare with surface composition (from contacts or binding
  preference analysis) before interpreting.

- **Not mechanism.** Observing that SBM monomers frequently anchor to aromatic
  residues does not prove a causal cation-pi or hydrophobic interaction. It
  may reflect spatial proximity in the initial placement, force-field bias,
  or sampling limitations.

- **Not weighted by sample size.** If condition A has 10 multivalent
  observations and condition B has 1000, both produce probability
  distributions that sum to 1. The statistical weight of the two conditions
  differs enormously, but this is not visible in the probabilities alone.
  Check the `contacting_observations` and `multisite_observations` counts.
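The first and third points above can be made concrete with a small calculation. All numbers are invented, and the plain ratio of observed frequency to surface fraction is only an illustration of the null expectation, not the method used by the binding preference analysis.

```python
from collections import Counter

# Hypothetical anchor classes, one per multivalent observation.
anchor_classes = ["aromatic"] * 9 + ["polar"] * 6 + ["charged_negative"] * 5
# Hypothetical surface composition (fraction of exposed residues per class).
surface_fraction = {"aromatic": 0.45, "polar": 0.15, "charged_negative": 0.40}

counts = Counter(anchor_classes)
n_obs = sum(counts.values())
probability = {cls: n / n_obs for cls, n in counts.items()}
# Ratio of observed frequency to surface fraction; 1.0 matches the null expectation.
ratio = {cls: probability[cls] / surface_fraction[cls] for cls in probability}

for cls in probability:
    print(f"{cls:<17} p={probability[cls]:.2f}  p/surface={ratio[cls]:.2f}  (n={counts[cls]})")
```

Here aromatics have the highest probability (0.45) but a ratio of 1.0, while polar residues have a lower probability (0.30) and a ratio of 2.0. With only 20 observations in total, neither number carries much statistical weight.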

### Reading the Heatmaps

The anchor-to-peripheral matrix answers a conditional question: *"Given that
the anchor is class X, what classes are the peripheral contacts?"*

Each row is independently normalized to sum to 1.0. This means:

- You **can** compare cells within a row (e.g., "when the anchor is aromatic,
  peripheral contacts are 40% polar and 30% charged_negative").
- You **cannot** directly compare cells across rows unless the row totals
  (number of observations per anchor class) are similar.
- Empty rows indicate that no multivalent observation had that anchor class.

The same logic applies to the polymer-anchor-to-protein-anchor matrix.
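A short sketch of row normalization shows why cross-row comparisons need the row totals. The counts and the `row_normalize` helper are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical raw counts: anchor class (row) -> peripheral class (column).
counts = {
    "aromatic": {"polar": 40, "charged_negative": 30, "aromatic": 30},
    "polar": {"polar": 1, "charged_negative": 1, "aromatic": 0},
    "charged_positive": {"polar": 0, "charged_negative": 0, "aromatic": 0},
}


def row_normalize(matrix):
    """Divide each row by its own total; rows with no observations stay empty."""
    normalized = {}
    for anchor, row in matrix.items():
        total = sum(row.values())
        normalized[anchor] = {k: v / total for k, v in row.items()} if total else {}
    return normalized


row_totals = {anchor: sum(row.values()) for anchor, row in counts.items()}
matrix = row_normalize(counts)

print(matrix["aromatic"]["polar"], row_totals["aromatic"])  # 0.4 100
print(matrix["polar"]["polar"], row_totals["polar"])        # 0.5 2
print(matrix["charged_positive"])                           # {}
```

The polar row shows a larger cell value (0.5) than the aromatic row (0.4), yet it rests on two observations rather than one hundred.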

## Limitations and Scientific Caveats

### What This Analysis Cannot Tell You

1. **Whether bridging is good or bad for conjugate function.** Multisite
   attachment could stabilize the enzyme (scaffold effect) or could lock
   the protein into an unfavorable conformation. The analysis describes
   the phenomenon; functional consequences require additional evidence
   (RMSF, triad geometry, activity assays).

2. **Whether condition A has "more" bridging than condition B in an absolute
   sense.** The multisite *fraction* can increase either because multisite
   events increase or because single-site events decrease. Both shift the
   fraction. Check the raw observation counts alongside the fractions.

3. **Equilibrium properties.** MD simulations of enzyme-polymer conjugates
   are rarely at equilibrium. Bridging statistics reflect the sampled
   trajectory, not a converged ensemble. Multiple independent replicates
   with different initial conditions help, but do not guarantee convergence.
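Point 2 is easy to verify with arithmetic. In the sketch below the observation counts are invented and the fraction is assumed to be multisite observations divided by all contacting observations; two very different changes produce the same fraction.

```python
def multisite_fraction(multisite, single_site):
    """Multisite observations as a share of all contacting observations."""
    return multisite / (multisite + single_site)


baseline = multisite_fraction(multisite=300, single_site=700)
more_multisite = multisite_fraction(multisite=700, single_site=700)      # bridging increased
fewer_single_site = multisite_fraction(multisite=300, single_site=300)   # single-site contacts lost

print(baseline, more_multisite, fewer_single_site)  # 0.3 0.5 0.5
```

Only the raw counts reveal that the second case gained bridging observations while the third lost single-site contacts.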

### Known Methodological Limitations

- **No residence-time decomposition.** Cannot distinguish persistent bridging
  from rapid attachment/detachment.
- **Anchor selection is heuristic.** Based on minimum atom distance, which
  may not correspond to the energetically dominant contact.
- **No excluded-volume correction.** Larger polymer fragments have more
  opportunities for multisite contact purely due to chain length, independent
  of any interaction specificity.
- **Single-condition heatmaps.** The current plotting code shows
  cross-classification heatmaps for only the first condition. Programmatic
  loading is needed for cross-condition matrix comparison.

## Worked Example: Interpreting a Result

Suppose you have two conditions:

| Metric | 100% SBMA | SBMA-EGPMA 5% |
|--------|-----------|---------------|
| Multisite fraction | 0.31 +/- 0.02 | 0.49 +/- 0.03 |
| Mean contacts / oligomer | 1.41 +/- 0.03 | 1.72 +/- 0.05 |
| High-valency fraction | 0.05 +/- 0.01 | 0.13 +/- 0.01 |
| Anchor: aromatic | 0.38 | 0.44 |
| Anchor: charged_negative | 0.22 | 0.18 |

**What you can say:**

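- The mixed copolymer shows more multisite attachment (0.49 vs. 0.31), a gap
  that is large relative to the reported uncertainties. Call it significant
  only if a replicate-level test supports that (e.g., p < 0.01 with a large
  effect size) and the difference holds across replicates.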
- The valency distribution shifts toward higher valency in the copolymer.
- Aromatic residues are the most common anchor class in both conditions,
  but this may simply reflect surface composition.

**What you should not say (without further analysis):**

- "The EGPMA comonomer *causes* more bridging." (Correlation; the copolymer
  differs in multiple ways.)
- "Aromatic residues *drive* bridging via pi-stacking." (No evidence of
  mechanism from frequencies alone.)
- "The copolymer provides better enzyme stabilization through bridging."
  (Bridging is not inherently stabilizing.)

## See Also

- {doc}`../how_to/analysis_polymer_bridging` — configuration, CLI, and output reference
- {doc}`../how_to/analysis_binding_preference` — surface-normalized enrichment for
  comparison
- {doc}`analysis_statistics_best_practices` — replicate planning and
  statistical interpretation