Polymer Affinity Score Analysis
Quantify the total polymer-protein interaction strength by combining per-contact free energies with contact counts across all residue groups.
This analysis answers the experimentalist’s question: “Which polymer composition has the strongest overall adhesion to the enzyme surface?”
Note
The polymer affinity score is a comparative scoring metric for ranking polymer
compositions. It is derived entirely from cached binding preference data - no
additional trajectory access is needed. Run contacts analysis with
compute_binding_preference: true first.
Warning
Polymer affinity is currently labeled experimental in the presentation release. The CLI command and plots remain available, but PolyzyMD now adds explicit experimental warnings because the score definition and interpretation may still change.
Scientific Motivation
Beyond Per-Contact Free Energy
Binding free energy analysis reports the thermodynamic preference of each polymer type for each residue group (e.g., “EGMA prefers aromatics by -0.52 kT”). However, this per-contact value misses the polyvalent binding effect: many contacts happen simultaneously.
A polymer with a modest per-contact affinity but many simultaneous contacts may have a stronger total interaction than one with strong per-contact affinity but few contacts. The polymer affinity score captures this by summing:
where:
N = mean number of simultaneous contacts with a residue group
ΔG\(_{\mathrm{sel}}\) = per-contact selectivity free energy from Boltzmann inversion
The Formula
Per (polymer_type, protein_group):
Per polymer type:
Per condition (headline score):
How N_contacts Is Computed
N_contacts is the mean number of exposed residues in a protein group that are simultaneously in contact with a given polymer type, averaged across all trajectory frames after equilibration. It is not a total count of contact events over the simulation, and it is not a binary flag for whether any contact occurred.
From Raw Frames to N
The computation proceeds in four stages:
Stage 1 — Per residue, per frame (binary). For each protein residue and each trajectory frame, a contact is recorded if any heavy atom of a polymer residue of a given type is within the cutoff distance (default 4.5 A) of any heavy atom of the protein residue. Multiple polymer segments of the same type contacting the same protein residue in the same frame count as one contact (unique frame counting via set operations).
Stage 2 — Per residue, per polymer type (fraction). For each exposed protein residue, the number of frames with at least one contact from polymer type P is divided by the total number of analyzed frames. This gives a per-residue contact fraction in [0, 1].
Stage 3 — Per (polymer_type, protein_group) (mean fraction). The per-residue contact fractions are averaged across all surface-exposed residues in the protein group:
This is mathematically equivalent to the arithmetic mean of the per-residue fractions, computed as a single division for efficiency. The value answers: “on average, what fraction of frames is each exposed residue in this group contacted by this polymer type?”
Stage 4 — N_contacts. Multiplying by the group size recovers the mean number of simultaneously contacted residues per frame:
The algebra cancels n_exposed_in_group from numerator and denominator,
revealing that N is simply the total contact-frames divided by the number of
frames — i.e., the average number of contacted residues per frame.
Worked Example
Consider a polymer type SBM contacting the “aromatic” protein group (5 exposed residues) over 1000 analyzed frames:
Residue |
Frames contacted |
Per-residue fraction |
|---|---|---|
Trp23 |
800 |
0.80 |
Phe45 |
200 |
0.20 |
Tyr67 |
150 |
0.15 |
Phe89 |
50 |
0.05 |
Trp112 |
0 |
0.00 |
total_contact_frames = 800 + 200 + 150 + 50 + 0 = 1200mean_contact_fraction = 1200 / (1000 × 5) = 0.24N = 0.24 × 5 = 1.2
Interpretation: on average, 1.2 aromatic residues are simultaneously in contact with SBM in any given frame. Some frames have 0 contacts, some have 3 or 4 — but the time-averaged expectation is 1.2 simultaneous contacts.
If this group’s ΔG\(_{\mathrm{sel}}\) is -0.22 kT, the group contribution to the affinity score
is 1.2 × (-0.22) = -0.26 kT.
Why This Quantity (and Not Alternatives)
Alternative definition of N |
Why it would be wrong |
|---|---|
Total contact count over the simulation |
Conflates simulation length with interaction strength. A 1 μs trajectory would score 10× higher than a 100 ns trajectory for identical binding. |
Fraction of frames with any contact in the group |
Loses polyvalency information. Cannot distinguish 1 residue contacted vs 15 residues contacted simultaneously — the entire point of this analysis. |
Number of unique residues ever contacted |
Loses temporal information. A residue contacted once in 10,000 frames contributes equally to one contacted every frame. |
Peak (maximum) simultaneous contacts |
Sensitive to outlier frames. Not representative of the sustained interaction. |
Note
What N does not capture. Two polymers can have identical N = 10 but differ in binding mode: one contacts the same 10 residues every frame (stable binding), while the other contacts different residues each frame (transient surface scanning). For ranking total adhesion strength, this distinction does not matter — both have 10 simultaneous contacts contributing to the score. To distinguish binding modes, use residence time analysis.
Cross-Replicate Aggregation
When multiple replicates are available, mean_contact_fraction is
arithmetically averaged across replicates with standard error of the mean
(SEM). n_exposed_in_group is determined from the SASA calculation on the
crystal structure PDB and is therefore identical across replicates — it is
a property of the protein, not the trajectory. The resulting N inherits the
cross-replicate mean and uncertainty from mean_contact_fraction alone.
Sign Convention
Score |
Meaning |
|---|---|
< 0 |
Net attractive polymer-protein interaction |
= 0 |
Neutral (contacts match random expectation) |
> 0 |
Net repulsive (polymer avoids protein surface) |
More negative = stronger overall polymer-protein adhesion.
Important
Independence assumption disclaimer. The affinity score sums individual contact free energies assuming thermodynamic independence of contacts. Conformational coupling, surface exclusion effects, and reference state shifts violate this assumption. The absolute values are therefore not rigorous thermodynamic ΔG. However, the relative differences between polymer compositions are meaningful for ranking purposes, as systematic biases cancel when comparing compositions analyzed under identical conditions.
Interpretation for Experimentalists
“Among compositions with similar RMSF (structural stability), the one with the most negative affinity score has the strongest polymer-protein adhesion — good for thermal resistance, but check that it doesn’t block the active site.”
The affinity score should always be interpreted alongside:
RMSF — does strong adhesion stabilize or rigidify the enzyme?
Triad geometry — does polymer contact disrupt the catalytic triad?
Substrate distance — does the polymer occlude the active site?
Quick Start
Step 1: Ensure binding preference data exists
Polymer affinity analysis reads cached binding preference files. These are
produced automatically by polyzymd compare contacts:
# comparison.yaml
analysis_settings:
contacts:
compute_binding_preference: true
surface_exposure_threshold: 0.2
enzyme_pdb_for_sasa: "structures/enzyme.pdb"
include_default_aa_groups: true
# Generate binding preference cache (if not already done)
polyzymd compare contacts -f comparison.yaml
Step 2: Add polymer affinity settings (optional)
The polymer affinity score works with zero configuration — defaults are suitable for most use cases. To customize:
# comparison.yaml
analysis_settings:
polymer_affinity:
surface_exposure_threshold: 0.2
comparison_settings:
polymer_affinity:
fdr_alpha: 0.05
Step 3: Run polymer affinity analysis
polyzymd compare polymer-affinity -f comparison.yaml
Example Output
Polymer Affinity Score: LipA 363K Comparison
================================================================================
Formula : Score = N_contacts × ΔG_sel, where ΔG_sel = -ln(contact_share / expected_share)
Units : kT (dimensionless, in units of k_bT)
Sign : Negative = net attractive interaction
Note : Assumes thermodynamic independence of contacts (see docs for caveats)
Per-Condition Summary (sign: more negative = stronger adhesion)
--------------------------------------------------------------------------------
100% SBMA (T = 363.0 K, n = 5)
Polymer AA Group N_contacts ΔG_sel/kT Score/kT
----------------------------------------------------------------
SBM aromatic 12.3 -0.19 -2.34
SBM charged_negative 8.7 +0.08 +0.70
SBM charged_positive 6.1 -0.01 -0.06
SBM nonpolar 15.2 +0.04 +0.61
SBM polar 9.4 -0.00 -0.00
----------------------------------------------------------------
SBM TOTAL -1.09
SBMA-EGPMA 5% (T = 363.0 K, n = 5)
Polymer AA Group N_contacts ΔG_sel/kT Score/kT
----------------------------------------------------------------
SBM aromatic 14.1 -0.22 -3.10
SBM nonpolar 17.3 +0.02 +0.35
EGP aromatic 1.2 -0.45 -0.54
...
----------------------------------------------------------------
ALL TOTAL -4.21
Pairwise Score Differences (Score_B - Score_A)
--------------------------------------------------------------------------------
100% SBMA → SBMA-EGPMA 5% : ΔScore = -3.12 kT (p = 0.003 **)
Interpreting the Numbers
From the example:
SBMA-EGPMA 5% total score: -4.21 kT — stronger overall adhesion than 100% SBMA (-1.09 kT). The 5% EGPMA comonomer increases total polymer-protein binding through aromatic contacts.
SBM aromatic N_contacts=14.1, ΔG\(_{\mathrm{sel}}\)=-0.22 kT — each of ~14 simultaneous aromatic contacts contributes -0.22 kT, giving a group score of -3.10 kT.
Pairwise ΔScore = -3.12 kT, p = 0.003 — the difference is statistically significant.
Tip
Scores are in kT units. At 363 K, 1 kT = 0.72 kcal/mol = 3.02 kJ/mol. A total score difference of |ΔScore| > 2 kT is generally considered meaningful.
Configuration Reference
analysis_settings.polymer_affinity
Field |
Type |
Default |
Description |
|---|---|---|---|
|
float |
|
Minimum relative SASA for exposed residues |
|
str |
|
MDAnalysis polymer selection |
|
str |
|
MDAnalysis protein selection |
|
float |
|
Contact distance cutoff (Angstroms) |
|
bool |
|
Must be true for affinity analysis |
comparison_settings.polymer_affinity
Field |
Type |
Default |
Description |
|---|---|---|---|
|
float |
|
Benjamini-Hochberg FDR threshold |
CLI Reference
polyzymd compare polymer-affinity [OPTIONS]
Option |
Description |
|---|---|
|
Path to |
|
Force recompute |
|
Output format (default: |
|
Save output to file |
|
Suppress INFO messages |
|
Enable DEBUG logging |
|
Override equilibration time |
Usage Examples
# Default: console table
polyzymd compare polymer-affinity -f comparison.yaml
# Save markdown report
polyzymd compare polymer-affinity --format markdown -o affinity_report.md
# JSON for downstream processing
polyzymd compare polymer-affinity --format json -o affinity_result.json
Per-Replicate Computation
When per-replicate binding preference files (binding_preference_rep{N}.json)
are available, the polymer affinity score computes exact per-replicate scores:
Mean and SEM are computed across replicates. Pairwise t-tests use these per-replicate score distributions.
When per-replicate files are unavailable, the comparator falls back to analytical error propagation:
Multi-Temperature Studies
Like binding free energy analysis, the polymer affinity score uses kT units that already account for temperature. However, the underlying contact distributions may differ between temperatures, so pairwise statistics are suppressed for cross-temperature pairs.
Per-condition scores are always shown — they are valid within each temperature.
Relation to Other Analyses
Analysis |
What It Measures |
Relation to Affinity Score |
|---|---|---|
Contacts |
Which residues are touched |
Provides the raw contact data |
Binding Preference |
Enrichment per residue group |
Provides enrichment → ΔG_sel |
Binding Free Energy |
Per-contact ΔG_sel |
Component of the score (without N) |
Polymer Affinity Score |
Total interaction strength |
N × ΔG_sel summed over all groups |
RMSF |
Structural flexibility |
Complementary: stability metric |
Triad |
Active site geometry |
Complementary: function metric |
Output File Location
Results are saved to the comparison’s results/ directory:
comparison_output/
└── results/
└── polymer_affinity_score_YYYYMMDD_HHMMSS.json
Reload for downstream processing:
from polyzymd.compare.results.polymer_affinity import PolymerAffinityScoreResult
result = PolymerAffinityScoreResult.load("polymer_affinity_score_20260228_120000.json")
# Access per-condition scores
for summary in result.conditions:
print(f"\n{summary.label}: total_score = {summary.total_score:+.2f} kT")
for entry in summary.entries:
print(
f" {entry.polymer_type} × {entry.protein_group}: "
f"N={entry.n_contacts:.1f}, ΔG_sel={entry.delta_G:+.3f}, "
f"Score={entry.score:+.3f} kT"
)
# Access pairwise comparisons
for pair in result.pairwise_comparisons:
if pair.p_value is not None:
sig = "**" if pair.p_value < 0.01 else "*" if pair.p_value < 0.05 else ""
print(
f"{pair.condition_a} → {pair.condition_b}: "
f"ΔScore = {pair.delta_score:+.2f} kT, p = {pair.p_value:.4f} {sig}"
)
Troubleshooting
“No binding preference data found”
Run contacts analysis with binding preference enabled first:
polyzymd compare contacts -f comparison.yaml
Ensure compute_binding_preference: true is set in your contacts analysis
settings.
Very large scores (|S| > 50 kT)
Large scores usually indicate many contacts with strong enrichment/depletion.
Check whether your polymer selection is too broad (including solvent) or your
contact cutoff is too large. Also verify that n_exposed_in_group values are
reasonable — very large groups amplify the score.
Score of exactly 0.0
This occurs when either mean_contact_fraction = 0 (no contacts) or
enrichment = 0 (contacts exactly match random expectation). Check the
binding preference data for the affected condition.
Different scores from BFE analysis
The affinity score and BFE analysis use the same ΔG\(_{\mathrm{sel}}\) formula but different aggregation: BFE reports per-contact ΔG\(_{\mathrm{sel}}\), while the affinity score multiplies by N_contacts. The per-contact ΔG\(_{\mathrm{sel}}\) values should match exactly; differences indicate a cache or configuration mismatch.
See Also
Binding Free Energy Analysis — per-contact ΔG_sel (the component metric)
Binding Preference Analysis — the prerequisite enrichment analysis
Contacts Analysis Quick Start — basic contact computation
Statistics Best Practices — replicate planning
Comparing Conditions — multi-condition workflows
Extending Comparators — add custom comparators