Polymer-Protein Contacts Analysis: Quick Start
Analyze polymer-protein contact frequencies and coverage in your MD simulations.
Note
This command analyzes contacts between polymer chains (Chain C) and protein residues (Chain A), following PolyzyMD’s chain convention. It computes per-residue contact fractions with proper statistical treatment.
TL;DR
# Single replicate analysis
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns
# Custom cutoff distance (default: 4.0 Å)
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns --cutoff 5.0
# Force recompute (ignore cache)
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns --recompute
Prerequisites
Before running contacts analysis, you need:
Completed production simulation - with polymer chains present
A config.yaml file - pointing to your simulation output
Trajectory files - in the scratch directory specified in your config
solvated_system.pdb - topology with correct chain assignments
PolyzyMD Chain Convention
The contacts analyzer uses PolyzyMD’s standard chain assignment:
Chain |
Contents |
|---|---|
A |
Protein/Enzyme |
B |
Substrate/Ligand |
C |
Polymers |
D+ |
Solvent (water, ions, co-solvents) |
By default, the command analyzes contacts between segid C (polymers) and
protein (chain A).
Basic Usage
Single Replicate Analysis
Create an analysis.yaml file alongside your config.yaml:
# analysis.yaml
replicates: [1]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
cutoff: 4.0
Then run all enabled analyses:
polyzymd analyze run
cd /path/to/your/project
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns
from pathlib import Path
from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer
# Load simulation config
config = SimulationConfig.from_yaml("config.yaml")
# Create analyzer with parameters
analyzer = ContactAnalyzer(
config=config,
replicate=1,
equilibration_time="10ns",
cutoff=4.0,
polymer_selection="segid C",
protein_selection="protein",
)
# Run analysis
result = analyzer.analyze()
# Access results
print(f"Coverage: {result.coverage:.1%}")
print(f"Mean contact fraction: {result.mean_contact_fraction:.1%}")
print(f"Contacted residues: {result.n_contacted}/{result.n_residues}")
Output:
Loading configuration from: config.yaml
Contact Analysis: MyEnzyme_Polymer_Study
Replicates: 1
Equilibration: 10ns
Cutoff: 4.0 Å
Polymer selection: segid C
Protein selection: protein
Grouping: residue
Processing replicate 1... done (134/181 residues contacted, 74.0% coverage, 18.0% mean contact)
Contact Analysis Complete
Contacted residues: 134/181
Coverage: 74.0%
Mean contact fraction: 18.0%
Results saved: /path/to/project/analysis/contacts/contacts_rep1.json
Key Metrics
Coverage: Fraction of protein residues that had at least one contact with polymer during the trajectory
Mean contact fraction: Average fraction of frames where each residue was in contact with polymer (averaged across all residues)
Contacted residues: Count of residues with any polymer contact
Using analysis.yaml
For reproducible, version-controlled analysis configuration, use analysis.yaml
instead of CLI flags. Place this file alongside your config.yaml.
Multi-Replicate Analysis with Full Options
Create an analysis.yaml file with all configuration options:
# analysis.yaml
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "chainID C" # MDAnalysis selection for polymer
protein_selection: "protein" # MDAnalysis selection for protein
cutoff: 4.5 # Contact distance in Angstroms
polymer_types: ["SBM", "EGM"] # Optional: filter by monomer type
grouping: "aa_class" # aa_class | secondary_structure | none
compute_residence_times: true # Enable residence time statistics
Then run all enabled analyses:
# Initialize a template (if starting fresh)
polyzymd analyze init
# Run all analyses defined in analysis.yaml
polyzymd analyze run
# Force recompute
polyzymd analyze run --recompute
# Run contacts analysis across multiple replicates with all options
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
--cutoff 4.5 \
--polymer-selection "chainID C" \
--protein-selection "protein" \
--residence-times
# Force recompute
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
--cutoff 4.5 --residence-times --recompute
from pathlib import Path
from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer
from polyzymd.analysis.contacts.aggregation import aggregate_contact_results
config = SimulationConfig.from_yaml("config.yaml")
# Analyze multiple replicates
results = []
for rep in [1, 2, 3]:
analyzer = ContactAnalyzer(
config=config,
replicate=rep,
equilibration_time="10ns",
cutoff=4.5,
polymer_selection="chainID C",
protein_selection="protein",
compute_residence_times=True,
)
results.append(analyzer.analyze())
# Aggregate results across replicates
aggregated = aggregate_contact_results(results)
print(f"Contact fraction: {aggregated.mean_contact_fraction:.1%} ± {aggregated.std_contact_fraction:.1%}")
Configuration Options
Field |
Type |
Default |
Description |
|---|---|---|---|
|
bool |
|
Whether to run contact analysis |
|
str |
|
MDAnalysis selection for polymer atoms |
|
str |
|
MDAnalysis selection for protein atoms |
|
float |
|
Distance cutoff in Angstroms |
|
list |
|
Filter by polymer residue names |
|
str |
|
How to group protein residues |
|
bool |
|
Compute residence time statistics |
Tip
When to use analysis.yaml vs CLI: Use analysis.yaml for standard,
reproducible workflows. Use CLI flags (polyzymd analyze contacts ...) for
one-off analyses or when exploring different parameters.
Residence Time Analysis
Residence time measures how long polymer segments remain in contact with protein residues. This is crucial for understanding binding kinetics and comparing different polymer types.
Enable Residence Time Statistics
# analysis.yaml
replicates: [1]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
compute_residence_times: true # Enable residence time statistics
polyzymd analyze run
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns --residence-times
from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer
config = SimulationConfig.from_yaml("config.yaml")
analyzer = ContactAnalyzer(
config=config,
replicate=1,
equilibration_time="10ns",
compute_residence_times=True, # Enable residence time statistics
)
result = analyzer.analyze()
# Access residence time data
for polymer_type, stats in result.residence_times.items():
print(f"{polymer_type}: mean={stats.mean:.2f} frames, max={stats.max} frames ({stats.n_events} events)")
Output:
Contact Analysis Complete
Contacted residues: 138/181
Coverage: 76.2%
Mean contact fraction: 18.3%
Residence time by polymer type:
EGM: mean=9.12 frames, max=1406 frames (6066 events)
SBM: mean=8.93 frames, max=842 frames (5401 events)
Multi-Replicate Aggregation
When analyzing multiple replicates, residence times are aggregated with proper statistical uncertainty:
# analysis.yaml
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
compute_residence_times: true
polyzymd analyze run
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns --residence-times
from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer
from polyzymd.analysis.contacts.aggregation import aggregate_contact_results
config = SimulationConfig.from_yaml("config.yaml")
results = []
for rep in [1, 2, 3]:
analyzer = ContactAnalyzer(
config=config,
replicate=rep,
equilibration_time="10ns",
compute_residence_times=True,
)
results.append(analyzer.analyze())
# Aggregate with proper uncertainty quantification
aggregated = aggregate_contact_results(results)
print(f"Contact fraction: {aggregated.mean_contact_fraction:.1%} ± {aggregated.std_contact_fraction:.1%}")
for polymer_type, stats in aggregated.residence_times.items():
print(f"{polymer_type}: {stats.mean:.2f} ± {stats.std:.2f} frames")
Output:
Aggregated Contact Analysis Complete
Contact fraction: 19.8% ± 1.9%
Residence time by polymer type:
EGM: 8.14 ± 0.56 frames
SBM: 9.60 ± 0.53 frames
Interpreting Residence Times
Mean residence time: Average duration of individual contact events (in frames)
Max residence time: Longest continuous contact observed
Total events: Number of separate contact events for this polymer type
Longer residence times suggest stronger or more stable polymer-protein interactions. Comparing residence times across polymer types reveals differences in binding behavior - for example, zwitterionic polymers (SBMA) often show different residence times than PEG-like polymers (EGMA).
Converting to Physical Time
Residence times are reported in frames by default. To convert to picoseconds, multiply by your trajectory’s timestep:
# If your trajectory saves every 100 ps:
residence_time_ps = mean_frames * 100 # ps
residence_time_ns = mean_frames * 0.1 # ns
The JSON output also includes mean_ps and max_ps fields computed using the
timestep from your configuration.
Command Options
Option |
Default |
Description |
|---|---|---|
|
Required |
Path to config.yaml |
|
|
Replicate specification: |
|
|
Equilibration time to skip |
|
|
Contact distance cutoff in Angstroms |
|
|
MDAnalysis selection for polymers |
|
|
MDAnalysis selection for protein |
|
False |
Compute and display residence time statistics by polymer type |
|
False |
Force recompute even if cached |
|
Auto |
Custom output directory |
Output Files
Results are saved as JSON in the project’s analysis directory:
project/
└── analysis/
└── contacts/
└── contacts_rep1.json
The JSON file contains per-residue contact data including:
Contact events (start frame, duration)
Contact fractions
Amino acid classifications
Polymer chain interactions
Custom Selections
Analyze specific polymer types
# analysis.yaml - Only SBMA monomers
replicates: [1]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "segid C and resname SBM" # Only SBMA
protein_selection: "protein"
polyzymd analyze run
To analyze EGMA instead, change the selection:
contacts:
polymer_selection: "segid C and resname EGM" # Only EGMA
# Only SBMA monomers
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns \
--polymer-selection "segid C and resname SBM"
# Only EGMA monomers
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns \
--polymer-selection "segid C and resname EGM"
from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer
config = SimulationConfig.from_yaml("config.yaml")
# Analyze only SBMA monomers
sbma_analyzer = ContactAnalyzer(
config=config,
replicate=1,
equilibration_time="10ns",
polymer_selection="segid C and resname SBM",
)
sbma_result = sbma_analyzer.analyze()
# Analyze only EGMA monomers
egma_analyzer = ContactAnalyzer(
config=config,
replicate=1,
equilibration_time="10ns",
polymer_selection="segid C and resname EGM",
)
egma_result = egma_analyzer.analyze()
print(f"SBMA coverage: {sbma_result.coverage:.1%}")
print(f"EGMA coverage: {egma_result.coverage:.1%}")
Analyze specific protein regions
# analysis.yaml - Only aromatic residues
replicates: [1]
defaults:
equilibration_time: "10ns"
contacts:
enabled: true
polymer_selection: "segid C"
protein_selection: "protein and (resname TRP PHE TYR)" # Aromatics only
polyzymd analyze run
For active site analysis:
contacts:
protein_selection: "protein and (resid 75-80 or resid 130-140)" # Active site
# Only aromatic residues
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns \
--protein-selection "protein and (resname TRP PHE TYR)"
# Active site region
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns \
--protein-selection "protein and (resid 75-80 or resid 130-140)"
from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer
config = SimulationConfig.from_yaml("config.yaml")
# Analyze contacts with aromatic residues only
aromatic_analyzer = ContactAnalyzer(
config=config,
replicate=1,
equilibration_time="10ns",
protein_selection="protein and (resname TRP PHE TYR)",
)
aromatic_result = aromatic_analyzer.analyze()
# Analyze contacts with active site region
active_site_analyzer = ContactAnalyzer(
config=config,
replicate=1,
equilibration_time="10ns",
protein_selection="protein and (resid 75-80 or resid 130-140)",
)
active_site_result = active_site_analyzer.analyze()
print(f"Aromatic contact fraction: {aromatic_result.mean_contact_fraction:.1%}")
print(f"Active site contact fraction: {active_site_result.mean_contact_fraction:.1%}")
Answering Scientific Questions
The contacts analysis module is designed to answer questions like:
Question |
Approach |
|---|---|
Do zwitterionic polymers preferentially contact aromatic residues? |
|
Which protein surface regions does my polymer bind? |
|
Does SBMA have longer residence times than EGMA? |
|
Quick Example: Interaction Matrix
The interaction_matrix() method computes contact metrics for each combination
of polymer type and protein amino acid class:
from polyzymd.analysis.contacts.results import ContactResult
result = ContactResult.load("analysis/contacts/contacts_rep1.json")
# Get contact fraction by (polymer_type, protein_AA_class)
matrix = result.interaction_matrix(metric="contact_fraction")
# Compare polymer types contacting aromatic residues
print(f"SBMA-aromatic: {matrix['SBM']['aromatic']:.1%}")
print(f"EGMA-aromatic: {matrix['EGM']['aromatic']:.1%}")
Example output:
SBMA-aromatic: 45.4%
EGMA-aromatic: 37.0%
Quick Example: Coverage by AA Group
See what fraction of each amino acid class your polymer contacts:
coverage = result.coverage_by_group()
for group, frac in sorted(coverage.items(), key=lambda x: -x[1]):
print(f"{group}: {frac:.1%}")
Example output:
charged_negative: 100.0%
aromatic: 100.0%
charged_positive: 100.0%
polar: 93.5%
nonpolar: 86.2%
Tip
For complete worked examples including custom polymer groupings, residence time comparisons, and complex queries, see the Contacts Analysis Cookbook.
Technical Details
Contact Definition
A contact is defined when any heavy atom of a polymer residue is within the
cutoff distance of any heavy atom of a protein residue. This uses MDAnalysis’s
capped_distance function with KDTree-based neighbor searching for O(N)
performance.
Contact Events
Contacts are stored as compressed events (start_frame, duration) rather than
frame-by-frame booleans. This provides efficient storage and enables residence
time analysis.
Statistical Treatment
Per-residue contact fractions include proper uncertainty quantification:
Statistical inefficiency (g) is computed per-residue following Chodera et al. (2007)
Effective sample sizes account for temporal autocorrelation
When aggregating across replicates, uncertainties are propagated correctly
See also
For a detailed explanation of autocorrelation, statistical inefficiency, and the LiveCoMS methodology, see the Statistics Best Practices Guide.
Troubleshooting
“solvated_system.pdb not found”
The contacts analyzer requires solvated_system.pdb in the run directory
(scratch). This file is created during polyzymd build and contains the
correct chain assignments. Do not use production_N_topology.pdb as it may
have lost chain information.
“No polymer atoms selected”
Check that your polymer selection is correct. Use MDAnalysis syntax:
segid C- all atoms in segment C (default polymer chain)resname SBM EGM- atoms with these residue namesVerify chain assignment with:
polyzymd validate -c config.yaml
Slow performance
For very large systems or long trajectories:
Use
--eq-timeto skip equilibration framesResults are cached - subsequent runs load from JSON
Consider using frame striding in custom scripts
See Also
Binding Preference Analysis - enrichment-based AA class preferences
Contacts Analysis Cookbook - worked examples for scientific questions
RMSF Analysis Quick Start - complementary stability analysis
Catalytic Triad Analysis - active site geometry
Comparing Conditions - statistical comparison across polymers