Polymer-Protein Contacts Analysis: Quick Start

Analyze polymer-protein contact frequencies and coverage in your MD simulations.

Note

This command analyzes contacts between polymer chains (Chain C) and protein residues (Chain A), following PolyzyMD’s chain convention. It computes per-residue contact fractions with proper statistical treatment.

TL;DR

# Single replicate analysis
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns

# Custom cutoff distance (default: 4.0 Å)
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns --cutoff 5.0

# Force recompute (ignore cache)
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns --recompute

Prerequisites

Before running contacts analysis, you need:

Completed production simulation - with polymer chains present
A config.yaml file - pointing to your simulation output
Trajectory files - in the scratch directory specified in your config
solvated_system.pdb - topology with correct chain assignments

PolyzyMD Chain Convention

The contacts analyzer uses PolyzyMD’s standard chain assignment:

Chain	Contents
A	Protein/Enzyme
B	Substrate/Ligand
C	Polymers
D+	Solvent (water, ions, co-solvents)

By default, the command analyzes contacts between segid C (polymers) and protein (chain A).

Basic Usage

Single Replicate Analysis

YAML (Recommended)

Create an analysis.yaml file alongside your config.yaml:

# analysis.yaml
replicates: [1]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  cutoff: 4.0

Then run all enabled analyses:

polyzymd analyze run

CLI

cd /path/to/your/project
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns

Python

from pathlib import Path
from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer

# Load simulation config
config = SimulationConfig.from_yaml("config.yaml")

# Create analyzer with parameters
analyzer = ContactAnalyzer(
    config=config,
    replicate=1,
    equilibration_time="10ns",
    cutoff=4.0,
    polymer_selection="segid C",
    protein_selection="protein",
)

# Run analysis
result = analyzer.analyze()

# Access results
print(f"Coverage: {result.coverage:.1%}")
print(f"Mean contact fraction: {result.mean_contact_fraction:.1%}")
print(f"Contacted residues: {result.n_contacted}/{result.n_residues}")

Output:

Loading configuration from: config.yaml
Contact Analysis: MyEnzyme_Polymer_Study
  Replicates: 1
  Equilibration: 10ns
  Cutoff: 4.0 Å
  Polymer selection: segid C
  Protein selection: protein
  Grouping: residue
  Processing replicate 1... done (134/181 residues contacted, 74.0% coverage, 18.0% mean contact)

Contact Analysis Complete
  Contacted residues: 134/181
  Coverage: 74.0%
  Mean contact fraction: 18.0%
  Results saved: /path/to/project/analysis/contacts/contacts_rep1.json

Key Metrics

Coverage: Fraction of protein residues that had at least one contact with polymer during the trajectory
Mean contact fraction: Average fraction of frames where each residue was in contact with polymer (averaged across all residues)
Contacted residues: Count of residues with any polymer contact

Using analysis.yaml

For reproducible, version-controlled analysis configuration, use analysis.yaml instead of CLI flags. Place this file alongside your config.yaml.

Multi-Replicate Analysis with Full Options

YAML (Recommended)

Create an analysis.yaml file with all configuration options:

# analysis.yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "chainID C"      # MDAnalysis selection for polymer
  protein_selection: "protein"        # MDAnalysis selection for protein
  cutoff: 4.5                         # Contact distance in Angstroms
  polymer_types: ["SBM", "EGM"]       # Optional: filter by monomer type
  grouping: "aa_class"                # aa_class | secondary_structure | none
  compute_residence_times: true       # Enable residence time statistics

Then run all enabled analyses:

# Initialize a template (if starting fresh)
polyzymd analyze init

# Run all analyses defined in analysis.yaml
polyzymd analyze run

# Force recompute
polyzymd analyze run --recompute

CLI

# Run contacts analysis across multiple replicates with all options
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
    --cutoff 4.5 \
    --polymer-selection "chainID C" \
    --protein-selection "protein" \
    --residence-times

# Force recompute
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
    --cutoff 4.5 --residence-times --recompute

Python

from pathlib import Path
from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer
from polyzymd.analysis.contacts.aggregation import aggregate_contact_results

config = SimulationConfig.from_yaml("config.yaml")

# Analyze multiple replicates
results = []
for rep in [1, 2, 3]:
    analyzer = ContactAnalyzer(
        config=config,
        replicate=rep,
        equilibration_time="10ns",
        cutoff=4.5,
        polymer_selection="chainID C",
        protein_selection="protein",
        compute_residence_times=True,
    )
    results.append(analyzer.analyze())

# Aggregate results across replicates
aggregated = aggregate_contact_results(results)

print(f"Contact fraction: {aggregated.mean_contact_fraction:.1%} ± {aggregated.std_contact_fraction:.1%}")

Configuration Options

Field	Type	Default	Description
`enabled`	bool	`false`	Whether to run contact analysis
`polymer_selection`	str	`"chainID C"`	MDAnalysis selection for polymer atoms
`protein_selection`	str	`"protein"`	MDAnalysis selection for protein atoms
`cutoff`	float	`4.5`	Distance cutoff in Angstroms
`polymer_types`	list	`null`	Filter by polymer residue names
`grouping`	str	`"aa_class"`	How to group protein residues
`compute_residence_times`	bool	`true`	Compute residence time statistics

Tip

When to use analysis.yaml vs CLI: Use analysis.yaml for standard, reproducible workflows. Use CLI flags (polyzymd analyze contacts ...) for one-off analyses or when exploring different parameters.

Residence Time Analysis

Residence time measures how long polymer segments remain in contact with protein residues. This is crucial for understanding binding kinetics and comparing different polymer types.

Enable Residence Time Statistics

YAML (Recommended)

# analysis.yaml
replicates: [1]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  compute_residence_times: true  # Enable residence time statistics

polyzymd analyze run

CLI

polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns --residence-times

Python

from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer

config = SimulationConfig.from_yaml("config.yaml")

analyzer = ContactAnalyzer(
    config=config,
    replicate=1,
    equilibration_time="10ns",
    compute_residence_times=True,  # Enable residence time statistics
)

result = analyzer.analyze()

# Access residence time data
for polymer_type, stats in result.residence_times.items():
    print(f"{polymer_type}: mean={stats.mean:.2f} frames, max={stats.max} frames ({stats.n_events} events)")

Output:

Contact Analysis Complete
  Contacted residues: 138/181
  Coverage: 76.2%
  Mean contact fraction: 18.3%
  Residence time by polymer type:
    EGM: mean=9.12 frames, max=1406 frames (6066 events)
    SBM: mean=8.93 frames, max=842 frames (5401 events)

Multi-Replicate Aggregation

When analyzing multiple replicates, residence times are aggregated with proper statistical uncertainty:

YAML (Recommended)

# analysis.yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  compute_residence_times: true

polyzymd analyze run

CLI

polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns --residence-times

Python

from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer
from polyzymd.analysis.contacts.aggregation import aggregate_contact_results

config = SimulationConfig.from_yaml("config.yaml")

results = []
for rep in [1, 2, 3]:
    analyzer = ContactAnalyzer(
        config=config,
        replicate=rep,
        equilibration_time="10ns",
        compute_residence_times=True,
    )
    results.append(analyzer.analyze())

# Aggregate with proper uncertainty quantification
aggregated = aggregate_contact_results(results)

print(f"Contact fraction: {aggregated.mean_contact_fraction:.1%} ± {aggregated.std_contact_fraction:.1%}")
for polymer_type, stats in aggregated.residence_times.items():
    print(f"{polymer_type}: {stats.mean:.2f} ± {stats.std:.2f} frames")

Output:

Aggregated Contact Analysis Complete
  Contact fraction: 19.8% ± 1.9%
  Residence time by polymer type:
    EGM: 8.14 ± 0.56 frames
    SBM: 9.60 ± 0.53 frames

Interpreting Residence Times

Mean residence time: Average duration of individual contact events (in frames)
Max residence time: Longest continuous contact observed
Total events: Number of separate contact events for this polymer type

Longer residence times suggest stronger or more stable polymer-protein interactions. Comparing residence times across polymer types reveals differences in binding behavior - for example, zwitterionic polymers (SBMA) often show different residence times than PEG-like polymers (EGMA).

Converting to Physical Time

Residence times are reported in frames by default. To convert to picoseconds, multiply by your trajectory’s timestep:

# If your trajectory saves every 100 ps:
residence_time_ps = mean_frames * 100  # ps
residence_time_ns = mean_frames * 0.1  # ns

The JSON output also includes mean_ps and max_ps fields computed using the timestep from your configuration.

Command Options

Option	Default	Description
`-c, --config`	Required	Path to config.yaml
`-r, --replicates`	`"1"`	Replicate specification: `'1-5'`, `'1,3,5'`, or `'1'`
`--eq-time`	`"0ns"`	Equilibration time to skip
`--cutoff`	`4.0`	Contact distance cutoff in Angstroms
`--polymer-selection`	`"segid C"`	MDAnalysis selection for polymers
`--protein-selection`	`"protein"`	MDAnalysis selection for protein
`--residence-times`	False	Compute and display residence time statistics by polymer type
`--recompute`	False	Force recompute even if cached
`-o, --output-dir`	Auto	Custom output directory

Output Files

Results are saved as JSON in the project’s analysis directory:

project/
└── analysis/
    └── contacts/
        └── contacts_rep1.json

The JSON file contains per-residue contact data including:

Contact events (start frame, duration)
Contact fractions
Amino acid classifications
Polymer chain interactions

Custom Selections

Analyze specific polymer types

YAML (Recommended)

# analysis.yaml - Only SBMA monomers
replicates: [1]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "segid C and resname SBM"  # Only SBMA
  protein_selection: "protein"

polyzymd analyze run

To analyze EGMA instead, change the selection:

contacts:
  polymer_selection: "segid C and resname EGM"  # Only EGMA

CLI

# Only SBMA monomers
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns \
    --polymer-selection "segid C and resname SBM"

# Only EGMA monomers  
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns \
    --polymer-selection "segid C and resname EGM"

Python

from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer

config = SimulationConfig.from_yaml("config.yaml")

# Analyze only SBMA monomers
sbma_analyzer = ContactAnalyzer(
    config=config,
    replicate=1,
    equilibration_time="10ns",
    polymer_selection="segid C and resname SBM",
)
sbma_result = sbma_analyzer.analyze()

# Analyze only EGMA monomers
egma_analyzer = ContactAnalyzer(
    config=config,
    replicate=1,
    equilibration_time="10ns",
    polymer_selection="segid C and resname EGM",
)
egma_result = egma_analyzer.analyze()

print(f"SBMA coverage: {sbma_result.coverage:.1%}")
print(f"EGMA coverage: {egma_result.coverage:.1%}")

Analyze specific protein regions

YAML (Recommended)

# analysis.yaml - Only aromatic residues
replicates: [1]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "segid C"
  protein_selection: "protein and (resname TRP PHE TYR)"  # Aromatics only

polyzymd analyze run

For active site analysis:

contacts:
  protein_selection: "protein and (resid 75-80 or resid 130-140)"  # Active site

CLI

# Only aromatic residues
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns \
    --protein-selection "protein and (resname TRP PHE TYR)"

# Active site region
polyzymd analyze contacts -c config.yaml -r 1 --eq-time 10ns \
    --protein-selection "protein and (resid 75-80 or resid 130-140)"

Python

from polyzymd.config import SimulationConfig
from polyzymd.analysis.contacts import ContactAnalyzer

config = SimulationConfig.from_yaml("config.yaml")

# Analyze contacts with aromatic residues only
aromatic_analyzer = ContactAnalyzer(
    config=config,
    replicate=1,
    equilibration_time="10ns",
    protein_selection="protein and (resname TRP PHE TYR)",
)
aromatic_result = aromatic_analyzer.analyze()

# Analyze contacts with active site region
active_site_analyzer = ContactAnalyzer(
    config=config,
    replicate=1,
    equilibration_time="10ns",
    protein_selection="protein and (resid 75-80 or resid 130-140)",
)
active_site_result = active_site_analyzer.analyze()

print(f"Aromatic contact fraction: {aromatic_result.mean_contact_fraction:.1%}")
print(f"Active site contact fraction: {active_site_result.mean_contact_fraction:.1%}")

Answering Scientific Questions

The contacts analysis module is designed to answer questions like:

Question	Approach
Do zwitterionic polymers preferentially contact aromatic residues?	`interaction_matrix()` with custom polymer grouping
Which protein surface regions does my polymer bind?	`coverage_by_group()`
Does SBMA have longer residence times than EGMA?	`residence_time_summary()`

Quick Example: Interaction Matrix

The interaction_matrix() method computes contact metrics for each combination of polymer type and protein amino acid class:

from polyzymd.analysis.contacts.results import ContactResult

result = ContactResult.load("analysis/contacts/contacts_rep1.json")

# Get contact fraction by (polymer_type, protein_AA_class)
matrix = result.interaction_matrix(metric="contact_fraction")

# Compare polymer types contacting aromatic residues
print(f"SBMA-aromatic: {matrix['SBM']['aromatic']:.1%}")
print(f"EGMA-aromatic: {matrix['EGM']['aromatic']:.1%}")

Example output:

SBMA-aromatic: 45.4%
EGMA-aromatic: 37.0%

Quick Example: Coverage by AA Group

See what fraction of each amino acid class your polymer contacts:

coverage = result.coverage_by_group()
for group, frac in sorted(coverage.items(), key=lambda x: -x[1]):
    print(f"{group}: {frac:.1%}")

Example output:

charged_negative: 100.0%
aromatic: 100.0%
charged_positive: 100.0%
polar: 93.5%
nonpolar: 86.2%

Tip

For complete worked examples including custom polymer groupings, residence time comparisons, and complex queries, see the Contacts Analysis Cookbook.

Technical Details

Contact Definition

A contact is defined when any heavy atom of a polymer residue is within the cutoff distance of any heavy atom of a protein residue. This uses MDAnalysis’s capped_distance function with KDTree-based neighbor searching for O(N) performance.

Contact Events

Contacts are stored as compressed events (start_frame, duration) rather than frame-by-frame booleans. This provides efficient storage and enables residence time analysis.

Statistical Treatment

Per-residue contact fractions include proper uncertainty quantification:

Statistical inefficiency (g) is computed per-residue following Chodera et al. (2007)
Effective sample sizes account for temporal autocorrelation
When aggregating across replicates, uncertainties are propagated correctly

Troubleshooting

“solvated_system.pdb not found”

The contacts analyzer requires solvated_system.pdb in the run directory (scratch). This file is created during polyzymd build and contains the correct chain assignments. Do not use production_N_topology.pdb as it may have lost chain information.

“No polymer atoms selected”

Check that your polymer selection is correct. Use MDAnalysis syntax:

segid C - all atoms in segment C (default polymer chain)
resname SBM EGM - atoms with these residue names
Verify chain assignment with: polyzymd validate -c config.yaml

Slow performance

For very large systems or long trajectories:

Use --eq-time to skip equilibration frames
Results are cached - subsequent runs load from JSON
Consider using frame striding in custom scripts

Polymer-Protein Contacts Analysis: Quick Start

TL;DR

Prerequisites

PolyzyMD Chain Convention

Basic Usage

Single Replicate Analysis

Key Metrics

Using analysis.yaml

Multi-Replicate Analysis with Full Options

Configuration Options

Residence Time Analysis

Enable Residence Time Statistics

Multi-Replicate Aggregation

Interpreting Residence Times

Converting to Physical Time

Command Options

Output Files

Custom Selections

Analyze specific polymer types

Analyze specific protein regions

Answering Scientific Questions

Quick Example: Interaction Matrix

Quick Example: Coverage by AA Group

Technical Details

Contact Definition

Contact Events

Statistical Treatment

Troubleshooting

“solvated_system.pdb not found”

“No polymer atoms selected”

Slow performance

See Also