# Contacts Analysis Cookbook Worked examples for answering common scientific questions about polymer-protein interactions using PolyzyMD's contacts analysis module. ```{note} **Prerequisites:** Complete a contacts analysis first using the [Contacts Quick Start](analysis_contacts_quickstart.md), then return here for advanced analysis techniques. ``` :::{admonition} Environment Setup :class: tip All commands below assume you have activated the PolyzyMD pixi environment: ```bash pixi shell -e build ``` Alternatively, prefix each command with `pixi run -e build`. ::: ## Overview This cookbook answers questions like: | Recipe | Scientific Question | |--------|---------------------| | [Recipe 1](#recipe-1-do-zwitterionic-polymers-preferentially-contact-charged-residues) | Do zwitterionic polymers preferentially contact charged residues? | | [Recipe 2](#recipe-2-which-protein-regions-does-my-polymer-bind) | Which protein regions does my polymer bind? | | [Recipe 3](#recipe-3-comparing-residence-times-by-polymer-type) | Do zwitterionic polymers form more persistent contacts? | | [Recipe 4](#recipe-4-active-site-contacts-vs-surface-contacts) | Does polymer bind near the active site? | | [Recipe 5](#recipe-5-complex-queries-with-compositeselector) | Complex multi-criteria selections | | [Recipe 6](#recipe-6-binding-preference-enrichment-analysis) | Which AA classes does my polymer prefer? (Enrichment analysis) | --- ## YAML Configuration Reference PolyzyMD supports contacts analysis via `comparison.yaml` configuration files: Configure contacts analysis for statistical comparison across multiple simulation conditions (e.g., different polymer ratios). ```yaml # comparison.yaml name: "Polymer Stabilization Study" control: "No Polymer" conditions: - label: "No Polymer" config: "../no_polymer/config.yaml" replicates: [1, 2, 3] - label: "100% SBMA" config: "../sbma_100/config.yaml" replicates: [1, 2, 3] plugins: contacts: name: "polymer_contacts" polymer_selection: "resname SBM EGM" # 3-char residue names protein_selection: "protein" cutoff: 4.5 contact_criteria: "heavy_atom" # distance | heavy_atom | any_atom fdr_alpha: 0.05 # For statistical corrections min_effect_size: 0.5 # Cohen's d threshold top_residues: 10 # Top residues in output table ``` **Run with:** ```bash polyzymd compare run contacts -f comparison.yaml ``` **Best for:** Multi-condition experiments, statistical comparisons, publication figures. ### Statistical Settings for Cross-Condition Comparison The `comparison.yaml` example above includes three statistical settings that control how cross-condition results are reported: - **`fdr_alpha`** (default 0.05) — Benjamini-Hochberg FDR correction applied to pairwise t-test p-values. All pairs × both metrics (coverage and contact_fraction) form one hypothesis family; ANOVA p-values form a separate small family. Formatters show both raw and BH-adjusted p-values, with significance markers (`*`, `**`, `***`) based on adjusted values. - **`min_effect_size`** (default 0.5) — Minimum Cohen's d for practical significance. Pairs that meet or exceed this threshold are highlighted with "†" in output. All pairs are still shown regardless of this setting. - **`top_residues`** (default 10) — Number of top contacted residues shown per condition, ranked by aggregated `contact_fraction_mean`. Affects both saved JSON and CLI output. For a full description of these settings, see the [Comparison Reference](../reference/analysis_comparison_reference.md#per-plugin-statistical-settings). ```{tip} **YAML vs Python:** Use YAML for reproducible, shareable configurations. Use Python when you need advanced features like `CompositeSelector`, custom contact criteria classes, or programmatic post-processing. ``` --- ## Recipe 1: Do zwitterionic polymers preferentially contact charged residues? ### Scientific Question You've synthesized copolymers with zwitterionic (SBMA) and PEG-like (EGMA) monomers. You want to know if the zwitterionic component shows preferential binding to charged protein residues compared to nonpolar residues. ### Approach `````{tab-set} ````{tab-item} YAML Use `comparison.yaml` to compare polymer types across conditions: ```yaml # comparison.yaml name: "Polymer Type Preference Study" control: "EGMA Only" conditions: - label: "SBMA Only" config: "../sbma_100/config.yaml" replicates: [1, 2, 3] - label: "EGMA Only" config: "../egma_100/config.yaml" replicates: [1, 2, 3] plugins: contacts: name: "polymer_aa_preferences" polymer_selection: "resname SBM EGM" protein_selection: "protein" cutoff: 4.5 contact_criteria: "heavy_atom" fdr_alpha: 0.05 top_residues: 15 ``` ```bash polyzymd compare run contacts -f comparison.yaml --format markdown ``` The output shows per-residue contact fractions with statistical significance markers. Look for charged residues (ASP, GLU, LYS, ARG) in the top contacts. **Limitation:** YAML comparison gives per-residue stats but doesn't directly produce the polymer×AA-class interaction matrix. For that breakdown, use Python. ```` ````{tab-item} CLI Run separate contact comparisons for each polymer type by updating `plugins.contacts.polymer_selection` in `comparison.yaml` between runs: ```bash # Analyze SBMA contacts only polyzymd compare run contacts -f comparison.yaml # Update polymer_selection to "segid C and resname EGM", then rerun polyzymd compare run contacts -f comparison.yaml ``` Then compare the JSON outputs manually or load both in Python for analysis. ```` ````{tab-item} Python Use `interaction_matrix()` to get a complete polymer×AA-class breakdown: ```python from polyzymd.analyses.contacts._results import ContactResult from polyzymd.analyses.shared.groupings import CustomGrouping # Load a contacts result (analyzed with all polymer types) result = ContactResult.load("analysis/contacts/contacts_rep1.json") # Define custom polymer groupings polymer_grouping = CustomGrouping.from_groups({ "zwitterionic": ["SBM", "SBMA", "MPC"], "peg_like": ["EGM", "EGMA", "OEGMA"], }) # Get interaction matrix with custom grouping matrix = result.interaction_matrix( metric="contact_fraction", polymer_grouping=polymer_grouping, ) # Compare zwitterionic vs PEG-like contacts with different AA classes print("Contact fractions by polymer type and AA class:") print("-" * 50) for polymer_group in ["zwitterionic", "peg_like"]: print(f"\n{polymer_group}:") for aa_class in ["charged_positive", "charged_negative", "aromatic", "nonpolar"]: if aa_class in matrix[polymer_group]: frac = matrix[polymer_group][aa_class] print(f" {aa_class}: {frac:.1%}") ``` **Example output:** ``` Contact fractions by polymer type and AA class: -------------------------------------------------- zwitterionic: charged_positive: 47.9% charged_negative: 52.3% aromatic: 45.4% nonpolar: 27.2% peg_like: charged_positive: 31.5% charged_negative: 38.1% aromatic: 37.0% nonpolar: 33.0% ``` ```` ````` ### Interpreting Results In this example: - **Zwitterionic polymers** show higher contact fractions with charged residues (47.9% and 52.3%) compared to nonpolar residues (27.2%) - **PEG-like polymers** show more uniform contact across AA classes - The zwitterionic preference for charged residues is consistent with electrostatic complementarity ```{tip} For statistical comparison across multiple replicates/conditions, use `polyzymd compare run contacts` with custom polymer selections. ``` --- ## Recipe 2: Which protein regions does my polymer bind? ### Scientific Question You want a breakdown of polymer contacts by amino acid type to understand the binding surface characteristics. ### Approach `````{tab-set} ````{tab-item} CLI Run contacts comparison using settings from `comparison.yaml`: ```bash polyzymd compare run contacts -f comparison.yaml ``` The console output shows overall coverage. For AA-class breakdown, load the JSON result in Python. ```` ````{tab-item} Python Use `coverage_by_group()` for a complete breakdown: ```python from polyzymd.analyses.contacts._results import ContactResult result = ContactResult.load("analysis/contacts/contacts_rep1.json") # Get coverage (fraction of residues contacted) by AA class coverage = result.coverage_by_group() print("Polymer coverage by amino acid class:") print("-" * 40) for group, frac in sorted(coverage.items(), key=lambda x: -x[1]): # Visual bar bar = "#" * int(frac * 20) print(f"{group:20s} {frac:6.1%} {bar}") ``` **Example output:** ``` Polymer coverage by amino acid class: ---------------------------------------- charged_negative 100.0% #################### aromatic 100.0% #################### charged_positive 100.0% #################### polar 93.5% ################## nonpolar 86.2% ################# ``` ```` ````` ### Understanding the Default AA Classification PolyzyMD uses `ProteinAAClassification` which groups amino acids as: | Group | Residues | |-------|----------| | `aromatic` | TRP, PHE, TYR (optionally HIS) | | `charged_positive` | LYS, ARG | | `charged_negative` | ASP, GLU | | `polar` | SER, THR, ASN, GLN, HIS, CYS | | `nonpolar` | ALA, VAL, LEU, ILE, MET, PRO, GLY | ### Using Custom Classifications Define your own residue groupings (Python only): ```python from polyzymd.analyses.shared.groupings import CustomGrouping # Custom grouping for hydrophobicity hydrophobicity_grouping = CustomGrouping.from_groups({ "hydrophobic": ["ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"], "hydrophilic": ["SER", "THR", "ASN", "GLN", "LYS", "ARG", "ASP", "GLU", "HIS"], "amphipathic": ["TYR", "CYS", "GLY"], }) # Use with interaction_matrix matrix = result.interaction_matrix( metric="contact_fraction", protein_grouping=hydrophobicity_grouping, ) ``` --- ## Recipe 3: Comparing residence times by polymer type ### Scientific Question Do zwitterionic polymers form more persistent (longer-lasting) contacts with the protein compared to PEG-like polymers? ### Approach `````{tab-set} ````{tab-item} CLI Enable `compute_residence_times: true` in `comparison.yaml`, then run: ```bash polyzymd compare run contacts -f comparison.yaml ``` **Output:** ``` Contact Analysis Complete Contacted residues: 138/181 Coverage: 76.2% Mean contact fraction: 18.3% Residence time by polymer type: EGM: mean=9.12 frames, max=1406 frames (6066 events) SBM: mean=8.93 frames, max=842 frames (5401 events) ``` ```` ````{tab-item} Python Use `residence_time_summary()` for detailed statistics: ```python from polyzymd.analyses.contacts._results import ContactResult result = ContactResult.load("analysis/contacts/contacts_rep1.json") # Get residence time statistics by polymer type rt_summary = result.residence_time_summary() print("Residence time by polymer type:") print("-" * 50) for ptype, stats in rt_summary.items(): mean_frames = stats["mean_frames"] max_frames = stats["max_frames"] n_events = stats["n_events"] print(f"{ptype}:") print(f" Mean: {mean_frames:.1f} frames") print(f" Max: {max_frames} frames") print(f" Events: {n_events}") ``` **Example output:** ``` Residence time by polymer type: -------------------------------------------------- SBM: Mean: 4.6 frames Max: 181 frames Events: 2847 EGM: Mean: 4.2 frames Max: 272 frames Events: 3102 ``` ```` ````` ### Converting to Physical Time Multiply by your trajectory's output interval: ```python # If trajectory saves every 100 ps timestep_ps = 100 # ps per frame for ptype, stats in rt_summary.items(): mean_ns = stats["mean_frames"] * timestep_ps / 1000 max_ns = stats["max_frames"] * timestep_ps / 1000 print(f"{ptype}: mean={mean_ns:.2f} ns, max={max_ns:.1f} ns") ``` ### Residence Time by AA Class Get residence times broken down by both polymer type AND amino acid class: ```python matrix = result.interaction_matrix(metric="residence_time") print("Mean residence time (frames) by polymer type and AA class:") for ptype in matrix.keys(): print(f"\n{ptype}:") for aa_class, mean_frames in matrix[ptype].items(): print(f" {aa_class}: {mean_frames:.1f} frames") ``` --- ## Recipe 4: Active site contacts vs surface contacts ### Scientific Question Does your polymer bind preferentially near the enzyme's active site, or only on the exposed surface? This is important for understanding whether polymer conjugation might affect catalytic function. ### Approach `````{tab-set} ````{tab-item} YAML Run two analyses with different `protein_selection` values. Create two `comparison.yaml` variants or edit `comparison.yaml` between runs: **Option 1: Separate comparison.yaml files** ```yaml # comparison_active_site.yaml (excerpt) plugins: contacts: polymer_selection: "chainID C" protein_selection: "protein and (resid 75-80 or resid 130-140 or resid 153-160)" cutoff: 4.5 ``` ```yaml # comparison_surface.yaml (excerpt) plugins: contacts: polymer_selection: "chainID C" protein_selection: "protein and not (resid 75-80 or resid 130-140 or resid 153-160)" cutoff: 4.5 ``` ```bash polyzymd compare run contacts -f comparison_active_site.yaml polyzymd compare run contacts -f comparison_surface.yaml ``` **Option 2: Use CLI overrides (no extra YAML files)** See the CLI tab for this approach. ```` ````{tab-item} CLI Run region-specific analyses by updating `plugins.contacts.protein_selection` in `comparison.yaml` and rerunning: ```bash # Active site region (example: LipA catalytic triad vicinity) polyzymd compare run contacts -f comparison.yaml # Surface residues: update protein_selection, then rerun polyzymd compare run contacts -f comparison.yaml ``` Then compare JSON outputs in Python or manually. ```` ````{tab-item} Python Use `ProteinResiduesNearReference` for proximity-based selection: ```python from polyzymd.analyses.shared.selectors import ProteinResiduesNearReference # Select protein residues within 8 A of the catalytic triad active_site_selector = ProteinResiduesNearReference( reference_selection="resid 77 133 156 and name CA", cutoff=8.0, ) # Use in analysis (programmatic API) from polyzymd.analyses.contacts import ParallelContactAnalyzer analyzer = ParallelContactAnalyzer( universe=universe, protein_selector=active_site_selector, # ... other parameters ) ``` ```` ````` ### Comparing Active Site vs Surface Contact Fractions After running both analyses, compare the results: ```python from polyzymd.analyses.contacts._results import ContactResult active_site = ContactResult.load("analysis/contacts_active_site/contacts_rep1.json") surface = ContactResult.load("analysis/contacts_surface/contacts_rep1.json") print(f"Active site mean contact: {active_site.mean_contact_fraction():.1%}") print(f"Surface mean contact: {surface.mean_contact_fraction():.1%}") # If active site contact is much lower, polymer avoids the catalytic region ratio = surface.mean_contact_fraction() / active_site.mean_contact_fraction() if ratio > 2: print("Polymer preferentially binds surface (good for catalysis)") elif ratio < 0.5: print("Polymer preferentially binds active site (may affect catalysis)") else: print("Polymer binds uniformly across protein") ``` --- ## Recipe 5: Complex queries with CompositeSelector ```{note} **Python-only feature.** `CompositeSelector` enables programmatic AND/OR combination of selectors, which is not available via YAML configuration. YAML supports MDAnalysis selection strings but not selector composition. ``` ### Scientific Question Find contacts where BOTH conditions are met: - Polymer is zwitterionic (SBMA/MPC) - Protein residue is aromatic AND near the active site This requires combining multiple selection criteria. ### Python Approach: CompositeSelector ```python from polyzymd.analyses.shared.selectors import ( PolymerResiduesByType, ProteinResiduesByGroup, ProteinResiduesNearReference, CompositeSelector, ) from polyzymd.analyses.shared.groupings import ProteinAAClassification # 1. Polymer selector: zwitterionic monomers only polymer_selector = PolymerResiduesByType( residue_names=["SBM", "SBMA", "MPC"] ) # 2. Protein selector: aromatic residues aromatic_selector = ProteinResiduesByGroup( grouping=ProteinAAClassification(), groups=["aromatic"], ) # 3. Protein selector: near active site near_active_site = ProteinResiduesNearReference( reference_selection="resid 77 133 156", cutoff=10.0, ) # 4. Combine protein selectors with AND logic aromatic_near_active_site = CompositeSelector( selectors=[aromatic_selector, near_active_site], mode="intersection", # AND logic ) # 5. Use in analysis from polyzymd.analyses.contacts import ParallelContactAnalyzer analyzer = ParallelContactAnalyzer( universe=universe, polymer_selector=polymer_selector, protein_selector=aromatic_near_active_site, cutoff=4.5, ) result = analyzer.run() ``` ### CompositeSelector Modes | Mode | Behavior | |------|----------| | `"union"` | OR logic - residue matches if it passes ANY selector | | `"intersection"` | AND logic - residue matches if it passes ALL selectors | ### Example: Exclude Specific Regions Select all protein residues EXCEPT those near the active site: `````{tab-set} ````{tab-item} CLI ```bash # Surface residues only (exclude active site vicinity) polyzymd compare run contacts -f comparison.yaml ``` ```` ````{tab-item} Python ```python from polyzymd.analyses.shared.selectors import ProteinResiduesNearReference # Select residues NOT near the active site (surface only) surface_residues = ProteinResiduesNearReference( reference_selection="resid 77 133 156", cutoff=8.0, exclude=True, # Invert: select residues OUTSIDE the cutoff ) # Use in analysis from polyzymd.analyses.contacts import ParallelContactAnalyzer analyzer = ParallelContactAnalyzer( universe=universe, protein_selector=surface_residues, cutoff=4.5, ) result = analyzer.run() ``` The `exclude=True` parameter inverts the selection, giving you residues that are **outside** the cutoff distance from the reference atoms. ```` ````` --- ## Recipe 6: Binding Preference Enrichment Analysis ### Scientific Question *"Which amino acid classes does my polymer preferentially bind, normalized by surface availability?"* Raw contact counts can be misleading because protein surfaces have unequal distributions of amino acid types. **Binding preference analysis** normalizes by surface exposure to reveal true polymer-residue preferences. ```{seealso} For a comprehensive guide to binding preference analysis, including formulas and interpretation, see [Binding Preference Analysis](analysis_binding_preference.md). ``` ### Quick Comparison Setup `````{tab-set} ````{tab-item} YAML (Recommended) Enable binding preference in your `comparison.yaml`: ```yaml # comparison.yaml name: "Polymer AA Preference Study" structures: enzyme_pdb: "structures/enzyme.pdb" conditions: - label: "100% SBMA" config: "../sbma_100/config.yaml" replicates: [1, 2, 3] - label: "100% EGMA" config: "../egma_100/config.yaml" replicates: [1, 2, 3] plugins: contacts: polymer_selection: "resname SBM EGM" cutoff: 4.5 # Enable binding preference compute_binding_preference: true surface_exposure_threshold: 0.2 enzyme_pdb_for_sasa: "structures/enzyme.pdb" include_default_aa_groups: true # Optional: custom groups protein_groups: catalytic_triad: [77, 133, 156] ``` ```bash polyzymd compare run contacts -f comparison.yaml ``` The output includes an enrichment table showing which AA classes each polymer type prefers (+) or avoids (-). ```` ````{tab-item} Python ```python from polyzymd.analyses.contacts._results import ContactResult from polyzymd.analyses.shared.binding_preference import compute_binding_preference from polyzymd.analyses.shared.surface_exposure import SurfaceExposureFilter # Load contacts and compute surface exposure contacts = ContactResult.load("analysis/contacts/contacts_rep1.json") exposure = SurfaceExposureFilter(threshold=0.2).calculate("enzyme.pdb") # Define protein groups (resids from your structure) protein_groups = { "aromatic": {12, 45, 67, 89, 102}, "charged_positive": {23, 34, 56, 78}, "charged_negative": {15, 28, 91}, "nonpolar": {5, 10, 20, 30, 40, 50, 60, 70}, "polar": {8, 18, 25, 35, 42, 55, 65}, } # Compute enrichment result = compute_binding_preference(contacts, exposure, protein_groups) # Display results print("Enrichment (>1 = preference, <1 = avoidance):") for poly, groups in result.enrichment_matrix().items(): print(f"\n{poly}:") for group, enrich in sorted(groups.items(), key=lambda x: -x[1]): marker = "+" if enrich > 1 else "-" if enrich < 1 else "=" print(f" {group:20s} {enrich:5.2f} {marker}") ``` ```` ````` ### Interpreting Enrichment Values | Enrichment | Interpretation | |------------|----------------| | **> 1.5** | Strong preference — polymer seeks out this AA class | | **1.0 - 1.5** | Slight preference — above random chance | | **≈ 1.0** | Neutral — matches surface availability | | **0.5 - 1.0** | Slight avoidance — below random chance | | **< 0.5** | Strong avoidance — polymer actively avoids | ### Example Findings From the enrichment data: ``` EGM (EGMA): aromatic 1.90 + # Strong preference for π-stacking nonpolar 1.31 + # Hydrophobic interactions polar 0.71 - # Avoids polar residues charged_negative 0.45 - # Strong avoidance of ASP/GLU SBM (SBMA): aromatic 1.29 + # Moderate aromatic preference charged_positive 1.02 + # Slight preference for LYS/ARG nonpolar 0.95 - # Near-neutral polar 1.00 = # Neutral ``` **Interpretation:** - EGMA is hydrophobic and prefers aromatic/nonpolar surfaces - SBMA is zwitterionic and shows more balanced binding - Both polymers avoid negatively charged residues --- ## API Reference Summary ### ContactResult Methods | Method | Returns | Use Case | |--------|---------|----------| | `interaction_matrix(metric)` | `dict[polymer][aa_group]` | Compare polymer types x AA classes | | `coverage_by_group()` | `dict[aa_group] -> float` | Which AA classes are contacted | | `residence_time_summary()` | `dict[polymer_type] -> stats` | Residence time by polymer | | `mean_contact_fraction()` | `float` | Overall contact fraction | | `coverage()` | `float` | Fraction of protein residues contacted | ### Selector Classes | Selector | Purpose | |----------|---------| | `PolymerResiduesByType` | Filter polymer by monomer type (resname) | | `ProteinResiduesByGroup` | Filter protein by AA classification | | `ProteinResiduesNearReference` | Proximity to reference atoms | | `CompositeSelector` | Combine selectors with AND/OR logic | | `MDAnalysisSelector` | Arbitrary MDAnalysis selection string | ### Grouping Classes | Grouping | Purpose | |----------|---------| | `ProteinAAClassification` | Standard AA groups (aromatic, charged, polar, nonpolar) | | `CustomGrouping.from_groups()` | Define your own classification | ### Imports ```python # Results from polyzymd.analyses.contacts._results import ContactResult # Selectors from polyzymd.analyses.shared.selectors import ( PolymerResiduesByType, ProteinResiduesByGroup, ProteinResiduesNearReference, CompositeSelector, MDAnalysisSelector, ) # Groupings from polyzymd.analyses.shared.groupings import ( ProteinAAClassification, CustomGrouping, ) ``` --- ## See Also - [Contacts Quick Start](analysis_contacts_quickstart.md) - basic usage and CLI reference - [Binding Preference Analysis](analysis_binding_preference.md) - enrichment formulas and interpretation - [Comparing Conditions](analysis_compare_conditions.md) - statistical comparison across simulations - [RMSF Quick Start](analysis_rmsf_quickstart.md) - complementary flexibility analysis