Contacts Analysis Cookbook
Worked examples for answering common scientific questions about polymer-protein interactions using PolyzyMD’s contacts analysis module.
Note
Prerequisites: Complete a contacts analysis first using the Contacts Quick Start, then return here for advanced analysis techniques.
Overview
This cookbook answers questions like:
Recipe |
Scientific Question |
|---|---|
Do zwitterionic polymers preferentially contact charged residues? |
|
Which protein regions does my polymer bind? |
|
Do zwitterionic polymers form more persistent contacts? |
|
Does polymer bind near the active site? |
|
Complex multi-criteria selections |
|
Which AA classes does my polymer prefer? (Enrichment analysis) |
YAML Configuration Reference
PolyzyMD supports contacts analysis via YAML configuration files. Choose the approach that fits your workflow:
Configure contacts as part of a simulation’s analysis suite. Place analysis.yaml
in the same directory as your config.yaml.
# analysis.yaml
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "chainID C" # MDAnalysis selection for polymer
protein_selection: "protein" # MDAnalysis selection for protein
cutoff: 4.5 # Contact distance in Angstroms
polymer_types: ["SBM", "EGM"] # Optional: filter by monomer type
grouping: "aa_class" # aa_class | secondary_structure | none
compute_residence_times: true # Enable residence time statistics
Run with:
polyzymd analyze init # Create template (if needed)
polyzymd analyze run # Run all enabled analyses
Best for: Single-condition analysis, reproducible configuration, CI/CD pipelines.
Configure contacts analysis for statistical comparison across multiple simulation conditions (e.g., different polymer ratios).
# comparison.yaml
name: "Polymer Stabilization Study"
control: "No Polymer"
conditions:
- label: "No Polymer"
config: "../no_polymer/config.yaml"
replicates: [1, 2, 3]
- label: "100% SBMA"
config: "../sbma_100/config.yaml"
replicates: [1, 2, 3]
contacts:
name: "polymer_contacts"
polymer_selection: "resname SBM EGM" # 3-char residue names
protein_selection: "protein"
cutoff: 4.5
contact_criteria: "heavy_atom" # distance | heavy_atom | any_atom
fdr_alpha: 0.05 # For statistical corrections
min_effect_size: 0.5 # Cohen's d threshold
top_residues: 10 # Top residues in output table
Run with:
polyzymd compare contacts -f comparison.yaml
Best for: Multi-condition experiments, statistical comparisons, publication figures.
Tip
YAML vs Python: Use YAML for reproducible, shareable configurations. Use
Python when you need advanced features like CompositeSelector, custom contact
criteria classes, or programmatic post-processing.
Recipe 1: Do zwitterionic polymers preferentially contact charged residues?
Scientific Question
You’ve synthesized copolymers with zwitterionic (SBMA) and PEG-like (EGMA) monomers. You want to know if the zwitterionic component shows preferential binding to charged protein residues compared to nonpolar residues.
Approach
Use comparison.yaml to compare polymer types across conditions:
# comparison.yaml
name: "Polymer Type Preference Study"
control: "EGMA Only"
conditions:
- label: "SBMA Only"
config: "../sbma_100/config.yaml"
replicates: [1, 2, 3]
- label: "EGMA Only"
config: "../egma_100/config.yaml"
replicates: [1, 2, 3]
contacts:
name: "polymer_aa_preferences"
polymer_selection: "resname SBM EGM"
protein_selection: "protein"
cutoff: 4.5
contact_criteria: "heavy_atom"
fdr_alpha: 0.05
top_residues: 15
polyzymd compare contacts -f comparison.yaml --format markdown
The output shows per-residue contact fractions with statistical significance markers. Look for charged residues (ASP, GLU, LYS, ARG) in the top contacts.
Limitation: YAML comparison gives per-residue stats but doesn’t directly produce the polymer×AA-class interaction matrix. For that breakdown, use Python.
Run separate contact analyses for each polymer type:
# Analyze SBMA contacts only
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
--polymer-selection "segid C and resname SBM" \
-o analysis/contacts_sbma/
# Analyze EGMA contacts only
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
--polymer-selection "segid C and resname EGM" \
-o analysis/contacts_egma/
Then compare the JSON outputs manually or load both in Python for analysis.
Use interaction_matrix() to get a complete polymer×AA-class breakdown:
from polyzymd.analysis.contacts.results import ContactResult
from polyzymd.analysis.common.groupings import CustomGrouping
# Load a contacts result (analyzed with all polymer types)
result = ContactResult.load("analysis/contacts/contacts_rep1.json")
# Define custom polymer groupings
polymer_grouping = CustomGrouping.from_groups({
"zwitterionic": ["SBM", "SBMA", "MPC"],
"peg_like": ["EGM", "EGMA", "OEGMA"],
})
# Get interaction matrix with custom grouping
matrix = result.interaction_matrix(
metric="contact_fraction",
polymer_grouping=polymer_grouping,
)
# Compare zwitterionic vs PEG-like contacts with different AA classes
print("Contact fractions by polymer type and AA class:")
print("-" * 50)
for polymer_group in ["zwitterionic", "peg_like"]:
print(f"\n{polymer_group}:")
for aa_class in ["charged_positive", "charged_negative", "aromatic", "nonpolar"]:
if aa_class in matrix[polymer_group]:
frac = matrix[polymer_group][aa_class]
print(f" {aa_class}: {frac:.1%}")
Example output:
Contact fractions by polymer type and AA class:
--------------------------------------------------
zwitterionic:
charged_positive: 47.9%
charged_negative: 52.3%
aromatic: 45.4%
nonpolar: 27.2%
peg_like:
charged_positive: 31.5%
charged_negative: 38.1%
aromatic: 37.0%
nonpolar: 33.0%
Interpreting Results
In this example:
Zwitterionic polymers show higher contact fractions with charged residues (47.9% and 52.3%) compared to nonpolar residues (27.2%)
PEG-like polymers show more uniform contact across AA classes
The zwitterionic preference for charged residues is consistent with electrostatic complementarity
Tip
For statistical comparison across multiple replicates/conditions, use
polyzymd compare contacts with custom polymer selections.
Recipe 2: Which protein regions does my polymer bind?
Scientific Question
You want a breakdown of polymer contacts by amino acid type to understand the binding surface characteristics.
Approach
Configure analysis.yaml with AA class grouping:
# analysis.yaml
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "chainID C"
protein_selection: "protein"
cutoff: 4.5
grouping: "aa_class" # Groups results by amino acid class
compute_residence_times: true
polyzymd analyze run
The JSON output includes per-residue data with AA classifications. To get the coverage breakdown, load the result in Python or use CLI post-processing.
Run contacts analysis with default grouping:
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns
The console output shows overall coverage. For AA-class breakdown, load the JSON result in Python.
Use coverage_by_group() for a complete breakdown:
from polyzymd.analysis.contacts.results import ContactResult
result = ContactResult.load("analysis/contacts/contacts_rep1.json")
# Get coverage (fraction of residues contacted) by AA class
coverage = result.coverage_by_group()
print("Polymer coverage by amino acid class:")
print("-" * 40)
for group, frac in sorted(coverage.items(), key=lambda x: -x[1]):
# Visual bar
bar = "#" * int(frac * 20)
print(f"{group:20s} {frac:6.1%} {bar}")
Example output:
Polymer coverage by amino acid class:
----------------------------------------
charged_negative 100.0% ####################
aromatic 100.0% ####################
charged_positive 100.0% ####################
polar 93.5% ##################
nonpolar 86.2% #################
Understanding the Default AA Classification
PolyzyMD uses ProteinAAClassification which groups amino acids as:
Group |
Residues |
|---|---|
|
TRP, PHE, TYR (optionally HIS) |
|
LYS, ARG |
|
ASP, GLU |
|
SER, THR, ASN, GLN, HIS, CYS |
|
ALA, VAL, LEU, ILE, MET, PRO, GLY |
Using Custom Classifications
Define your own residue groupings (Python only):
from polyzymd.analysis.common.groupings import CustomGrouping
# Custom grouping for hydrophobicity
hydrophobicity_grouping = CustomGrouping.from_groups({
"hydrophobic": ["ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"],
"hydrophilic": ["SER", "THR", "ASN", "GLN", "LYS", "ARG", "ASP", "GLU", "HIS"],
"amphipathic": ["TYR", "CYS", "GLY"],
})
# Use with interaction_matrix
matrix = result.interaction_matrix(
metric="contact_fraction",
protein_grouping=hydrophobicity_grouping,
)
Recipe 3: Comparing residence times by polymer type
Scientific Question
Do zwitterionic polymers form more persistent (longer-lasting) contacts with the protein compared to PEG-like polymers?
Approach
Enable residence time computation in analysis.yaml:
# analysis.yaml
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "chainID C"
protein_selection: "protein"
cutoff: 4.5
compute_residence_times: true # Enable residence time statistics
polyzymd analyze run
The JSON output includes residence time data per polymer type. Load in Python to compare across polymer types.
Use the --residence-times flag:
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns --residence-times
Output:
Contact Analysis Complete
Contacted residues: 138/181
Coverage: 76.2%
Mean contact fraction: 18.3%
Residence time by polymer type:
EGM: mean=9.12 frames, max=1406 frames (6066 events)
SBM: mean=8.93 frames, max=842 frames (5401 events)
Use residence_time_summary() for detailed statistics:
from polyzymd.analysis.contacts.results import ContactResult
result = ContactResult.load("analysis/contacts/contacts_rep1.json")
# Get residence time statistics by polymer type
rt_summary = result.residence_time_summary()
print("Residence time by polymer type:")
print("-" * 50)
for ptype, stats in rt_summary.items():
mean_frames = stats["mean_frames"]
max_frames = stats["max_frames"]
n_events = stats["n_events"]
print(f"{ptype}:")
print(f" Mean: {mean_frames:.1f} frames")
print(f" Max: {max_frames} frames")
print(f" Events: {n_events}")
Example output:
Residence time by polymer type:
--------------------------------------------------
SBM:
Mean: 4.6 frames
Max: 181 frames
Events: 2847
EGM:
Mean: 4.2 frames
Max: 272 frames
Events: 3102
Converting to Physical Time
Multiply by your trajectory’s output interval:
# If trajectory saves every 100 ps
timestep_ps = 100 # ps per frame
for ptype, stats in rt_summary.items():
mean_ns = stats["mean_frames"] * timestep_ps / 1000
max_ns = stats["max_frames"] * timestep_ps / 1000
print(f"{ptype}: mean={mean_ns:.2f} ns, max={max_ns:.1f} ns")
Residence Time by AA Class
Get residence times broken down by both polymer type AND amino acid class:
matrix = result.interaction_matrix(metric="residence_time")
print("Mean residence time (frames) by polymer type and AA class:")
for ptype in matrix.keys():
print(f"\n{ptype}:")
for aa_class, mean_frames in matrix[ptype].items():
print(f" {aa_class}: {mean_frames:.1f} frames")
Recipe 4: Active site contacts vs surface contacts
Scientific Question
Does your polymer bind preferentially near the enzyme’s active site, or only on the exposed surface? This is important for understanding whether polymer conjugation might affect catalytic function.
Approach
Run two analyses with different protein_selection values. Create two
analysis.yaml files or use CLI overrides:
Option 1: Separate analysis.yaml files
# analysis_active_site.yaml
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "chainID C"
protein_selection: "protein and (resid 75-80 or resid 130-140 or resid 153-160)"
cutoff: 4.5
# analysis_surface.yaml
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "chainID C"
protein_selection: "protein and not (resid 75-80 or resid 130-140 or resid 153-160)"
cutoff: 4.5
polyzymd analyze run -c analysis_active_site.yaml -o analysis/contacts_active_site/
polyzymd analyze run -c analysis_surface.yaml -o analysis/contacts_surface/
Option 2: Use CLI overrides (no extra YAML files)
See the CLI tab for this approach.
Run region-specific analyses using --protein-selection:
# Active site region (example: LipA catalytic triad vicinity)
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
--protein-selection "protein and (resid 75-80 or resid 130-140 or resid 153-160)" \
-o analysis/contacts_active_site/
# Surface residues (everything else)
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
--protein-selection "protein and not (resid 75-80 or resid 130-140 or resid 153-160)" \
-o analysis/contacts_surface/
Then compare JSON outputs in Python or manually.
Use ProteinResiduesNearReference for proximity-based selection:
from polyzymd.analysis.common.selectors import ProteinResiduesNearReference
# Select protein residues within 8 A of the catalytic triad
active_site_selector = ProteinResiduesNearReference(
reference_selection="resid 77 133 156 and name CA",
cutoff=8.0,
)
# Use in analysis (programmatic API)
from polyzymd.analysis.contacts import ContactAnalyzer
analyzer = ContactAnalyzer(
universe=universe,
protein_selector=active_site_selector,
# ... other parameters
)
Comparing Active Site vs Surface Contact Fractions
After running both analyses, compare the results:
from polyzymd.analysis.contacts.results import ContactResult
active_site = ContactResult.load("analysis/contacts_active_site/contacts_rep1.json")
surface = ContactResult.load("analysis/contacts_surface/contacts_rep1.json")
print(f"Active site mean contact: {active_site.mean_contact_fraction():.1%}")
print(f"Surface mean contact: {surface.mean_contact_fraction():.1%}")
# If active site contact is much lower, polymer avoids the catalytic region
ratio = surface.mean_contact_fraction() / active_site.mean_contact_fraction()
if ratio > 2:
print("Polymer preferentially binds surface (good for catalysis)")
elif ratio < 0.5:
print("Polymer preferentially binds active site (may affect catalysis)")
else:
print("Polymer binds uniformly across protein")
Recipe 5: Complex queries with CompositeSelector
Note
Python-only feature. CompositeSelector enables programmatic AND/OR
combination of selectors, which is not available via YAML configuration.
YAML supports MDAnalysis selection strings but not selector composition.
Scientific Question
Find contacts where BOTH conditions are met:
Polymer is zwitterionic (SBMA/MPC)
Protein residue is aromatic AND near the active site
This requires combining multiple selection criteria.
Python Approach: CompositeSelector
from polyzymd.analysis.common.selectors import (
PolymerResiduesByType,
ProteinResiduesByGroup,
ProteinResiduesNearReference,
CompositeSelector,
)
from polyzymd.analysis.common.groupings import ProteinAAClassification
# 1. Polymer selector: zwitterionic monomers only
polymer_selector = PolymerResiduesByType(
residue_names=["SBM", "SBMA", "MPC"]
)
# 2. Protein selector: aromatic residues
aromatic_selector = ProteinResiduesByGroup(
grouping=ProteinAAClassification(),
groups=["aromatic"],
)
# 3. Protein selector: near active site
near_active_site = ProteinResiduesNearReference(
reference_selection="resid 77 133 156",
cutoff=10.0,
)
# 4. Combine protein selectors with AND logic
aromatic_near_active_site = CompositeSelector(
selectors=[aromatic_selector, near_active_site],
mode="intersection", # AND logic
)
# 5. Use in analysis
from polyzymd.analysis.contacts import ContactAnalyzer
analyzer = ContactAnalyzer(
universe=universe,
polymer_selector=polymer_selector,
protein_selector=aromatic_near_active_site,
cutoff=4.5,
)
result = analyzer.run()
CompositeSelector Modes
Mode |
Behavior |
|---|---|
|
OR logic - residue matches if it passes ANY selector |
|
AND logic - residue matches if it passes ALL selectors |
Example: Exclude Specific Regions
Select all protein residues EXCEPT those near the active site:
For exclusion-based selections, use MDAnalysis not syntax directly:
# analysis.yaml
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "chainID C"
# Select surface residues by excluding active site vicinity
protein_selection: "protein and not (byres (resid 77 133 156) around 8.0)"
cutoff: 4.5
polyzymd analyze run
Note
The byres ... around X syntax selects complete residues within X Å of the
reference atoms. The not inverts this to get surface-only residues.
# Surface residues only (exclude active site vicinity)
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
--protein-selection "protein and not (byres (resid 77 133 156) around 8.0)"
from polyzymd.analysis.common.selectors import ProteinResiduesNearReference
# Select residues NOT near the active site (surface only)
surface_residues = ProteinResiduesNearReference(
reference_selection="resid 77 133 156",
cutoff=8.0,
exclude=True, # Invert: select residues OUTSIDE the cutoff
)
# Use in analysis
from polyzymd.analysis.contacts import ContactAnalyzer
analyzer = ContactAnalyzer(
universe=universe,
protein_selector=surface_residues,
cutoff=4.5,
)
result = analyzer.run()
The exclude=True parameter inverts the selection, giving you residues
that are outside the cutoff distance from the reference atoms.
Recipe 6: Binding Preference Enrichment Analysis
Scientific Question
“Which amino acid classes does my polymer preferentially bind, normalized by surface availability?”
Raw contact counts can be misleading because protein surfaces have unequal distributions of amino acid types. Binding preference analysis normalizes by surface exposure to reveal true polymer-residue preferences.
See also
For a comprehensive guide to binding preference analysis, including formulas and interpretation, see Binding Preference Analysis.
Quick Comparison Setup
Enable binding preference in your comparison.yaml:
# comparison.yaml
name: "Polymer AA Preference Study"
structures:
enzyme_pdb: "structures/enzyme.pdb"
conditions:
- label: "100% SBMA"
config: "../sbma_100/config.yaml"
replicates: [1, 2, 3]
- label: "100% EGMA"
config: "../egma_100/config.yaml"
replicates: [1, 2, 3]
analysis_settings:
contacts:
polymer_selection: "resname SBM EGM"
cutoff: 4.5
# Enable binding preference
compute_binding_preference: true
surface_exposure_threshold: 0.2
enzyme_pdb_for_sasa: "structures/enzyme.pdb"
include_default_aa_groups: true
# Optional: custom groups
protein_groups:
catalytic_triad: [77, 133, 156]
polyzymd compare contacts -f comparison.yaml
The output includes an enrichment table showing which AA classes each polymer type prefers (+) or avoids (-).
from polyzymd.analysis.contacts import ContactResult
from polyzymd.analysis.contacts.binding_preference import compute_binding_preference
from polyzymd.analysis.contacts.surface_exposure import SurfaceExposureFilter
# Load contacts and compute surface exposure
contacts = ContactResult.load("analysis/contacts/contacts_rep1.json")
exposure = SurfaceExposureFilter(threshold=0.2).calculate("enzyme.pdb")
# Define protein groups (resids from your structure)
protein_groups = {
"aromatic": {12, 45, 67, 89, 102},
"charged_positive": {23, 34, 56, 78},
"charged_negative": {15, 28, 91},
"nonpolar": {5, 10, 20, 30, 40, 50, 60, 70},
"polar": {8, 18, 25, 35, 42, 55, 65},
}
# Compute enrichment
result = compute_binding_preference(contacts, exposure, protein_groups)
# Display results
print("Enrichment (>1 = preference, <1 = avoidance):")
for poly, groups in result.enrichment_matrix().items():
print(f"\n{poly}:")
for group, enrich in sorted(groups.items(), key=lambda x: -x[1]):
marker = "+" if enrich > 1 else "-" if enrich < 1 else "="
print(f" {group:20s} {enrich:5.2f} {marker}")
Interpreting Enrichment Values
Enrichment |
Interpretation |
|---|---|
> 1.5 |
Strong preference — polymer seeks out this AA class |
1.0 - 1.5 |
Slight preference — above random chance |
≈ 1.0 |
Neutral — matches surface availability |
0.5 - 1.0 |
Slight avoidance — below random chance |
< 0.5 |
Strong avoidance — polymer actively avoids |
Example Findings
From the enrichment data:
EGM (EGMA):
aromatic 1.90 + # Strong preference for π-stacking
nonpolar 1.31 + # Hydrophobic interactions
polar 0.71 - # Avoids polar residues
charged_negative 0.45 - # Strong avoidance of ASP/GLU
SBM (SBMA):
aromatic 1.29 + # Moderate aromatic preference
charged_positive 1.02 + # Slight preference for LYS/ARG
nonpolar 0.95 - # Near-neutral
polar 1.00 = # Neutral
Interpretation:
EGMA is hydrophobic and prefers aromatic/nonpolar surfaces
SBMA is zwitterionic and shows more balanced binding
Both polymers avoid negatively charged residues
API Reference Summary
ContactResult Methods
Method |
Returns |
Use Case |
|---|---|---|
|
|
Compare polymer types x AA classes |
|
|
Which AA classes are contacted |
|
|
Residence time by polymer |
|
|
Overall contact fraction |
|
|
Fraction of protein residues contacted |
Selector Classes
Selector |
Purpose |
|---|---|
|
Filter polymer by monomer type (resname) |
|
Filter protein by AA classification |
|
Proximity to reference atoms |
|
Combine selectors with AND/OR logic |
|
Arbitrary MDAnalysis selection string |
Grouping Classes
Grouping |
Purpose |
|---|---|
|
Standard AA groups (aromatic, charged, polar, nonpolar) |
|
Define your own classification |
Imports
# Results
from polyzymd.analysis.contacts.results import ContactResult
# Selectors
from polyzymd.analysis.common.selectors import (
PolymerResiduesByType,
ProteinResiduesByGroup,
ProteinResiduesNearReference,
CompositeSelector,
MDAnalysisSelector,
)
# Groupings
from polyzymd.analysis.common.groupings import (
ProteinAAClassification,
CustomGrouping,
)
See Also
Contacts Quick Start - basic usage and CLI reference
Binding Preference Analysis - enrichment formulas and interpretation
Comparing Conditions - statistical comparison across simulations
RMSF Quick Start - complementary flexibility analysis