Contacts Analysis Cookbook

Worked examples for answering common scientific questions about polymer-protein interactions using PolyzyMD’s contacts analysis module.

Note

Prerequisites: Complete a contacts analysis first using the Contacts Quick Start, then return here for advanced analysis techniques.

Overview

This cookbook answers questions like:

Recipe	Scientific Question
Recipe 1	Do zwitterionic polymers preferentially contact charged residues?
Recipe 2	Which protein regions does my polymer bind?
Recipe 3	Do zwitterionic polymers form more persistent contacts?
Recipe 4	Does polymer bind near the active site?
Recipe 5	Complex multi-criteria selections
Recipe 6	Which AA classes does my polymer prefer? (Enrichment analysis)

YAML Configuration Reference

PolyzyMD supports contacts analysis via YAML configuration files. Choose the approach that fits your workflow:

analysis.yaml (per-simulation)

Configure contacts as part of a simulation’s analysis suite. Place analysis.yaml in the same directory as your config.yaml.

# analysis.yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "chainID C"      # MDAnalysis selection for polymer
  protein_selection: "protein"        # MDAnalysis selection for protein
  cutoff: 4.5                         # Contact distance in Angstroms
  polymer_types: ["SBM", "EGM"]       # Optional: filter by monomer type
  grouping: "aa_class"                # aa_class | secondary_structure | none
  compute_residence_times: true       # Enable residence time statistics

Run with:

polyzymd analyze init      # Create template (if needed)
polyzymd analyze run       # Run all enabled analyses

Best for: Single-condition analysis, reproducible configuration, CI/CD pipelines.

comparison.yaml (multi-condition)

Configure contacts analysis for statistical comparison across multiple simulation conditions (e.g., different polymer ratios).

# comparison.yaml
name: "Polymer Stabilization Study"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../no_polymer/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../sbma_100/config.yaml"
    replicates: [1, 2, 3]

contacts:
  name: "polymer_contacts"
  polymer_selection: "resname SBM EGM"  # 3-char residue names
  protein_selection: "protein"
  cutoff: 4.5
  contact_criteria: "heavy_atom"        # distance | heavy_atom | any_atom
  fdr_alpha: 0.05                       # For statistical corrections
  min_effect_size: 0.5                  # Cohen's d threshold
  top_residues: 10                      # Top residues in output table

Run with:

polyzymd compare contacts -f comparison.yaml

Best for: Multi-condition experiments, statistical comparisons, publication figures.

Tip

YAML vs Python: Use YAML for reproducible, shareable configurations. Use Python when you need advanced features like CompositeSelector, custom contact criteria classes, or programmatic post-processing.

Recipe 1: Do zwitterionic polymers preferentially contact charged residues?

Scientific Question

You’ve synthesized copolymers with zwitterionic (SBMA) and PEG-like (EGMA) monomers. You want to know if the zwitterionic component shows preferential binding to charged protein residues compared to nonpolar residues.

Approach

YAML

Use comparison.yaml to compare polymer types across conditions:

# comparison.yaml
name: "Polymer Type Preference Study"
control: "EGMA Only"

conditions:
  - label: "SBMA Only"
    config: "../sbma_100/config.yaml"
    replicates: [1, 2, 3]

  - label: "EGMA Only"
    config: "../egma_100/config.yaml"
    replicates: [1, 2, 3]

contacts:
  name: "polymer_aa_preferences"
  polymer_selection: "resname SBM EGM"
  protein_selection: "protein"
  cutoff: 4.5
  contact_criteria: "heavy_atom"
  fdr_alpha: 0.05
  top_residues: 15

polyzymd compare contacts -f comparison.yaml --format markdown

The output shows per-residue contact fractions with statistical significance markers. Look for charged residues (ASP, GLU, LYS, ARG) in the top contacts.

Limitation: YAML comparison gives per-residue stats but doesn’t directly produce the polymer×AA-class interaction matrix. For that breakdown, use Python.

CLI

Run separate contact analyses for each polymer type:

# Analyze SBMA contacts only
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
    --polymer-selection "segid C and resname SBM" \
    -o analysis/contacts_sbma/

# Analyze EGMA contacts only
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
    --polymer-selection "segid C and resname EGM" \
    -o analysis/contacts_egma/

Then compare the JSON outputs manually or load both in Python for analysis.

Python

Use interaction_matrix() to get a complete polymer×AA-class breakdown:

from polyzymd.analysis.contacts.results import ContactResult
from polyzymd.analysis.common.groupings import CustomGrouping

# Load a contacts result (analyzed with all polymer types)
result = ContactResult.load("analysis/contacts/contacts_rep1.json")

# Define custom polymer groupings
polymer_grouping = CustomGrouping.from_groups({
    "zwitterionic": ["SBM", "SBMA", "MPC"],
    "peg_like": ["EGM", "EGMA", "OEGMA"],
})

# Get interaction matrix with custom grouping
matrix = result.interaction_matrix(
    metric="contact_fraction",
    polymer_grouping=polymer_grouping,
)

# Compare zwitterionic vs PEG-like contacts with different AA classes
print("Contact fractions by polymer type and AA class:")
print("-" * 50)
for polymer_group in ["zwitterionic", "peg_like"]:
    print(f"\n{polymer_group}:")
    for aa_class in ["charged_positive", "charged_negative", "aromatic", "nonpolar"]:
        if aa_class in matrix[polymer_group]:
            frac = matrix[polymer_group][aa_class]
            print(f"  {aa_class}: {frac:.1%}")

Example output:

Contact fractions by polymer type and AA class:
--------------------------------------------------

zwitterionic:
  charged_positive: 47.9%
  charged_negative: 52.3%
  aromatic: 45.4%
  nonpolar: 27.2%

peg_like:
  charged_positive: 31.5%
  charged_negative: 38.1%
  aromatic: 37.0%
  nonpolar: 33.0%

Interpreting Results

In this example:

Zwitterionic polymers show higher contact fractions with charged residues (47.9% and 52.3%) compared to nonpolar residues (27.2%)
PEG-like polymers show more uniform contact across AA classes
The zwitterionic preference for charged residues is consistent with electrostatic complementarity

Tip

For statistical comparison across multiple replicates/conditions, use polyzymd compare contacts with custom polymer selections.

Recipe 2: Which protein regions does my polymer bind?

Scientific Question

You want a breakdown of polymer contacts by amino acid type to understand the binding surface characteristics.

Approach

YAML

Configure analysis.yaml with AA class grouping:

# analysis.yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "chainID C"
  protein_selection: "protein"
  cutoff: 4.5
  grouping: "aa_class"              # Groups results by amino acid class
  compute_residence_times: true

polyzymd analyze run

The JSON output includes per-residue data with AA classifications. To get the coverage breakdown, load the result in Python or use CLI post-processing.

CLI

Run contacts analysis with default grouping:

polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns

The console output shows overall coverage. For AA-class breakdown, load the JSON result in Python.

Python

Use coverage_by_group() for a complete breakdown:

from polyzymd.analysis.contacts.results import ContactResult

result = ContactResult.load("analysis/contacts/contacts_rep1.json")

# Get coverage (fraction of residues contacted) by AA class
coverage = result.coverage_by_group()

print("Polymer coverage by amino acid class:")
print("-" * 40)
for group, frac in sorted(coverage.items(), key=lambda x: -x[1]):
    # Visual bar
    bar = "#" * int(frac * 20)
    print(f"{group:20s} {frac:6.1%} {bar}")

Example output:

Polymer coverage by amino acid class:
----------------------------------------
charged_negative     100.0% ####################
aromatic             100.0% ####################
charged_positive     100.0% ####################
polar                 93.5% ##################
nonpolar              86.2% #################

Understanding the Default AA Classification

PolyzyMD uses ProteinAAClassification which groups amino acids as:

Group	Residues
`aromatic`	TRP, PHE, TYR (optionally HIS)
`charged_positive`	LYS, ARG
`charged_negative`	ASP, GLU
`polar`	SER, THR, ASN, GLN, HIS, CYS
`nonpolar`	ALA, VAL, LEU, ILE, MET, PRO, GLY

Using Custom Classifications

Define your own residue groupings (Python only):

from polyzymd.analysis.common.groupings import CustomGrouping

# Custom grouping for hydrophobicity
hydrophobicity_grouping = CustomGrouping.from_groups({
    "hydrophobic": ["ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"],
    "hydrophilic": ["SER", "THR", "ASN", "GLN", "LYS", "ARG", "ASP", "GLU", "HIS"],
    "amphipathic": ["TYR", "CYS", "GLY"],
})

# Use with interaction_matrix
matrix = result.interaction_matrix(
    metric="contact_fraction",
    protein_grouping=hydrophobicity_grouping,
)

Recipe 3: Comparing residence times by polymer type

Scientific Question

Do zwitterionic polymers form more persistent (longer-lasting) contacts with the protein compared to PEG-like polymers?

Approach

YAML

Enable residence time computation in analysis.yaml:

# analysis.yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "chainID C"
  protein_selection: "protein"
  cutoff: 4.5
  compute_residence_times: true     # Enable residence time statistics

polyzymd analyze run

The JSON output includes residence time data per polymer type. Load in Python to compare across polymer types.

CLI

Use the --residence-times flag:

polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns --residence-times

Output:

Contact Analysis Complete
  Contacted residues: 138/181
  Coverage: 76.2%
  Mean contact fraction: 18.3%
  Residence time by polymer type:
    EGM: mean=9.12 frames, max=1406 frames (6066 events)
    SBM: mean=8.93 frames, max=842 frames (5401 events)

Python

Use residence_time_summary() for detailed statistics:

from polyzymd.analysis.contacts.results import ContactResult

result = ContactResult.load("analysis/contacts/contacts_rep1.json")

# Get residence time statistics by polymer type
rt_summary = result.residence_time_summary()

print("Residence time by polymer type:")
print("-" * 50)
for ptype, stats in rt_summary.items():
    mean_frames = stats["mean_frames"]
    max_frames = stats["max_frames"]
    n_events = stats["n_events"]
    print(f"{ptype}:")
    print(f"  Mean: {mean_frames:.1f} frames")
    print(f"  Max:  {max_frames} frames")
    print(f"  Events: {n_events}")

Example output:

Residence time by polymer type:
--------------------------------------------------
SBM:
  Mean: 4.6 frames
  Max:  181 frames
  Events: 2847
EGM:
  Mean: 4.2 frames
  Max:  272 frames
  Events: 3102

Converting to Physical Time

Multiply by your trajectory’s output interval:

# If trajectory saves every 100 ps
timestep_ps = 100  # ps per frame

for ptype, stats in rt_summary.items():
    mean_ns = stats["mean_frames"] * timestep_ps / 1000
    max_ns = stats["max_frames"] * timestep_ps / 1000
    print(f"{ptype}: mean={mean_ns:.2f} ns, max={max_ns:.1f} ns")

Residence Time by AA Class

Get residence times broken down by both polymer type AND amino acid class:

matrix = result.interaction_matrix(metric="residence_time")

print("Mean residence time (frames) by polymer type and AA class:")
for ptype in matrix.keys():
    print(f"\n{ptype}:")
    for aa_class, mean_frames in matrix[ptype].items():
        print(f"  {aa_class}: {mean_frames:.1f} frames")

Recipe 4: Active site contacts vs surface contacts

Scientific Question

Does your polymer bind preferentially near the enzyme’s active site, or only on the exposed surface? This is important for understanding whether polymer conjugation might affect catalytic function.

Approach

YAML

Run two analyses with different protein_selection values. Create two analysis.yaml files or use CLI overrides:

Option 1: Separate analysis.yaml files

# analysis_active_site.yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "chainID C"
  protein_selection: "protein and (resid 75-80 or resid 130-140 or resid 153-160)"
  cutoff: 4.5

# analysis_surface.yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "chainID C"
  protein_selection: "protein and not (resid 75-80 or resid 130-140 or resid 153-160)"
  cutoff: 4.5

polyzymd analyze run -c analysis_active_site.yaml -o analysis/contacts_active_site/
polyzymd analyze run -c analysis_surface.yaml -o analysis/contacts_surface/

Option 2: Use CLI overrides (no extra YAML files)

See the CLI tab for this approach.

CLI

Run region-specific analyses using --protein-selection:

# Active site region (example: LipA catalytic triad vicinity)
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
    --protein-selection "protein and (resid 75-80 or resid 130-140 or resid 153-160)" \
    -o analysis/contacts_active_site/

# Surface residues (everything else)
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
    --protein-selection "protein and not (resid 75-80 or resid 130-140 or resid 153-160)" \
    -o analysis/contacts_surface/

Then compare JSON outputs in Python or manually.

Python

Use ProteinResiduesNearReference for proximity-based selection:

from polyzymd.analysis.common.selectors import ProteinResiduesNearReference

# Select protein residues within 8 A of the catalytic triad
active_site_selector = ProteinResiduesNearReference(
    reference_selection="resid 77 133 156 and name CA",
    cutoff=8.0,
)

# Use in analysis (programmatic API)
from polyzymd.analysis.contacts import ContactAnalyzer

analyzer = ContactAnalyzer(
    universe=universe,
    protein_selector=active_site_selector,
    # ... other parameters
)

Comparing Active Site vs Surface Contact Fractions

After running both analyses, compare the results:

from polyzymd.analysis.contacts.results import ContactResult

active_site = ContactResult.load("analysis/contacts_active_site/contacts_rep1.json")
surface = ContactResult.load("analysis/contacts_surface/contacts_rep1.json")

print(f"Active site mean contact: {active_site.mean_contact_fraction():.1%}")
print(f"Surface mean contact:     {surface.mean_contact_fraction():.1%}")

# If active site contact is much lower, polymer avoids the catalytic region
ratio = surface.mean_contact_fraction() / active_site.mean_contact_fraction()
if ratio > 2:
    print("Polymer preferentially binds surface (good for catalysis)")
elif ratio < 0.5:
    print("Polymer preferentially binds active site (may affect catalysis)")
else:
    print("Polymer binds uniformly across protein")

Recipe 5: Complex queries with CompositeSelector

Note

Python-only feature. CompositeSelector enables programmatic AND/OR combination of selectors, which is not available via YAML configuration. YAML supports MDAnalysis selection strings but not selector composition.

Scientific Question

Find contacts where BOTH conditions are met:

Polymer is zwitterionic (SBMA/MPC)
Protein residue is aromatic AND near the active site

This requires combining multiple selection criteria.

Python Approach: CompositeSelector

from polyzymd.analysis.common.selectors import (
    PolymerResiduesByType,
    ProteinResiduesByGroup,
    ProteinResiduesNearReference,
    CompositeSelector,
)
from polyzymd.analysis.common.groupings import ProteinAAClassification

# 1. Polymer selector: zwitterionic monomers only
polymer_selector = PolymerResiduesByType(
    residue_names=["SBM", "SBMA", "MPC"]
)

# 2. Protein selector: aromatic residues
aromatic_selector = ProteinResiduesByGroup(
    grouping=ProteinAAClassification(),
    groups=["aromatic"],
)

# 3. Protein selector: near active site
near_active_site = ProteinResiduesNearReference(
    reference_selection="resid 77 133 156",
    cutoff=10.0,
)

# 4. Combine protein selectors with AND logic
aromatic_near_active_site = CompositeSelector(
    selectors=[aromatic_selector, near_active_site],
    mode="intersection",  # AND logic
)

# 5. Use in analysis
from polyzymd.analysis.contacts import ContactAnalyzer

analyzer = ContactAnalyzer(
    universe=universe,
    polymer_selector=polymer_selector,
    protein_selector=aromatic_near_active_site,
    cutoff=4.5,
)
result = analyzer.run()

CompositeSelector Modes

Mode	Behavior
`"union"`	OR logic - residue matches if it passes ANY selector
`"intersection"`	AND logic - residue matches if it passes ALL selectors

Example: Exclude Specific Regions

Select all protein residues EXCEPT those near the active site:

YAML (Recommended)

For exclusion-based selections, use MDAnalysis not syntax directly:

# analysis.yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

contacts:
  enabled: true
  polymer_selection: "chainID C"
  # Select surface residues by excluding active site vicinity
  protein_selection: "protein and not (byres (resid 77 133 156) around 8.0)"
  cutoff: 4.5

polyzymd analyze run

Note

The byres ... around X syntax selects complete residues within X Å of the reference atoms. The not inverts this to get surface-only residues.

CLI

# Surface residues only (exclude active site vicinity)
polyzymd analyze contacts -c config.yaml -r 1-3 --eq-time 10ns \
    --protein-selection "protein and not (byres (resid 77 133 156) around 8.0)"

Python

from polyzymd.analysis.common.selectors import ProteinResiduesNearReference

# Select residues NOT near the active site (surface only)
surface_residues = ProteinResiduesNearReference(
    reference_selection="resid 77 133 156",
    cutoff=8.0,
    exclude=True,  # Invert: select residues OUTSIDE the cutoff
)

# Use in analysis
from polyzymd.analysis.contacts import ContactAnalyzer

analyzer = ContactAnalyzer(
    universe=universe,
    protein_selector=surface_residues,
    cutoff=4.5,
)
result = analyzer.run()

The exclude=True parameter inverts the selection, giving you residues that are outside the cutoff distance from the reference atoms.

Recipe 6: Binding Preference Enrichment Analysis

Scientific Question

“Which amino acid classes does my polymer preferentially bind, normalized by surface availability?”

Raw contact counts can be misleading because protein surfaces have unequal distributions of amino acid types. Binding preference analysis normalizes by surface exposure to reveal true polymer-residue preferences.

Quick Comparison Setup

YAML (Recommended)

Enable binding preference in your comparison.yaml:

# comparison.yaml
name: "Polymer AA Preference Study"

structures:
  enzyme_pdb: "structures/enzyme.pdb"

conditions:
  - label: "100% SBMA"
    config: "../sbma_100/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../egma_100/config.yaml"
    replicates: [1, 2, 3]

analysis_settings:
  contacts:
    polymer_selection: "resname SBM EGM"
    cutoff: 4.5
    
    # Enable binding preference
    compute_binding_preference: true
    surface_exposure_threshold: 0.2
    enzyme_pdb_for_sasa: "structures/enzyme.pdb"
    include_default_aa_groups: true
    
    # Optional: custom groups
    protein_groups:
      catalytic_triad: [77, 133, 156]

polyzymd compare contacts -f comparison.yaml

The output includes an enrichment table showing which AA classes each polymer type prefers (+) or avoids (-).

Python

from polyzymd.analysis.contacts import ContactResult
from polyzymd.analysis.contacts.binding_preference import compute_binding_preference
from polyzymd.analysis.contacts.surface_exposure import SurfaceExposureFilter

# Load contacts and compute surface exposure
contacts = ContactResult.load("analysis/contacts/contacts_rep1.json")
exposure = SurfaceExposureFilter(threshold=0.2).calculate("enzyme.pdb")

# Define protein groups (resids from your structure)
protein_groups = {
    "aromatic": {12, 45, 67, 89, 102},
    "charged_positive": {23, 34, 56, 78},
    "charged_negative": {15, 28, 91},
    "nonpolar": {5, 10, 20, 30, 40, 50, 60, 70},
    "polar": {8, 18, 25, 35, 42, 55, 65},
}

# Compute enrichment
result = compute_binding_preference(contacts, exposure, protein_groups)

# Display results
print("Enrichment (>1 = preference, <1 = avoidance):")
for poly, groups in result.enrichment_matrix().items():
    print(f"\n{poly}:")
    for group, enrich in sorted(groups.items(), key=lambda x: -x[1]):
        marker = "+" if enrich > 1 else "-" if enrich < 1 else "="
        print(f"  {group:20s} {enrich:5.2f} {marker}")

Interpreting Enrichment Values

Enrichment	Interpretation
> 1.5	Strong preference — polymer seeks out this AA class
1.0 - 1.5	Slight preference — above random chance
≈ 1.0	Neutral — matches surface availability
0.5 - 1.0	Slight avoidance — below random chance
< 0.5	Strong avoidance — polymer actively avoids

Example Findings

From the enrichment data:

EGM (EGMA):
  aromatic             1.90 +    # Strong preference for π-stacking
  nonpolar             1.31 +    # Hydrophobic interactions
  polar                0.71 -    # Avoids polar residues
  charged_negative     0.45 -    # Strong avoidance of ASP/GLU

SBM (SBMA):
  aromatic             1.29 +    # Moderate aromatic preference
  charged_positive     1.02 +    # Slight preference for LYS/ARG
  nonpolar             0.95 -    # Near-neutral
  polar                1.00 =    # Neutral

Interpretation:

EGMA is hydrophobic and prefers aromatic/nonpolar surfaces
SBMA is zwitterionic and shows more balanced binding
Both polymers avoid negatively charged residues

API Reference Summary

ContactResult Methods

Method	Returns	Use Case
`interaction_matrix(metric)`	`dict[polymer][aa_group]`	Compare polymer types x AA classes
`coverage_by_group()`	`dict[aa_group] -> float`	Which AA classes are contacted
`residence_time_summary()`	`dict[polymer_type] -> stats`	Residence time by polymer
`mean_contact_fraction()`	`float`	Overall contact fraction
`coverage()`	`float`	Fraction of protein residues contacted

Selector Classes

Selector	Purpose
`PolymerResiduesByType`	Filter polymer by monomer type (resname)
`ProteinResiduesByGroup`	Filter protein by AA classification
`ProteinResiduesNearReference`	Proximity to reference atoms
`CompositeSelector`	Combine selectors with AND/OR logic
`MDAnalysisSelector`	Arbitrary MDAnalysis selection string

Grouping Classes

Grouping	Purpose
`ProteinAAClassification`	Standard AA groups (aromatic, charged, polar, nonpolar)
`CustomGrouping.from_groups()`	Define your own classification

Imports

# Results
from polyzymd.analysis.contacts.results import ContactResult

# Selectors
from polyzymd.analysis.common.selectors import (
    PolymerResiduesByType,
    ProteinResiduesByGroup,
    ProteinResiduesNearReference,
    CompositeSelector,
    MDAnalysisSelector,
)

# Groupings
from polyzymd.analysis.common.groupings import (
    ProteinAAClassification,
    CustomGrouping,
)

Contacts Analysis Cookbook

Overview

YAML Configuration Reference

Recipe 1: Do zwitterionic polymers preferentially contact charged residues?

Scientific Question

Approach

Interpreting Results

Recipe 2: Which protein regions does my polymer bind?

Scientific Question

Approach

Understanding the Default AA Classification

Using Custom Classifications

Recipe 3: Comparing residence times by polymer type

Scientific Question

Approach

Converting to Physical Time

Residence Time by AA Class

Recipe 4: Active site contacts vs surface contacts

Scientific Question

Approach

Comparing Active Site vs Surface Contact Fractions

Recipe 5: Complex queries with CompositeSelector

Scientific Question

Python Approach: CompositeSelector

CompositeSelector Modes

Example: Exclude Specific Regions

Recipe 6: Binding Preference Enrichment Analysis

Scientific Question

Quick Comparison Setup

Interpreting Enrichment Values

Example Findings

API Reference Summary

ContactResult Methods

Selector Classes

Grouping Classes

Imports

See Also