How to Compare Simulation Conditions

Use this guide when you already have completed PolyzyMD simulations and want to compare conditions with the polyzymd compare workflow.

You will:

  • define a comparison.yaml

  • validate it

  • run stable comparisons

  • generate figures with polyzymd compare plot-all

Important

For the v1.2.0 presentation release, the stable comparison stack is RMSF, contacts, distances, catalytic triad, and secondary structure. Binding preference, exposure dynamics, binding free energy, and polymer affinity remain available, but PolyzyMD labels them as experimental.

Note

If you have not yet run per-condition analyses, start with Tutorial: Analyze a Study from Finished Simulations.

Before You Start

Make sure each condition already has:

  • a config.yaml

  • finished trajectories

  • per-condition analysis results from polyzymd analyze run

For metric-specific setup, use the individual analysis guides before comparing conditions.

Step 1: Create a Comparison Workspace

polyzymd compare init -n polymer_stability_study
cd polymer_stability_study

This creates:

polymer_stability_study/
├── comparison.yaml
├── figures/
├── results/
└── structures/

Step 2: Define a Minimal comparison.yaml

Start with one stable analysis. RMSF is a good first comparison because it has no extra structural dependencies.

name: "polymer_stability_study"
description: "Effect of polymer composition on enzyme flexibility"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

analysis_settings:
  rmsf:
    selection: "protein and name CA"

comparison_settings:
  rmsf: {}

You can add other stable analyses later by extending analysis_settings and comparison_settings.

Step 3: Validate the Config

polyzymd compare validate

You should see a passing summary with the study name, condition count, and the configured analysis sections.

Step 4: Run Comparisons

Run One Stable Comparison

polyzymd compare run rmsf

This loads the existing per-condition results, performs the statistical comparison, prints a formatted summary, and writes JSON to results/.

Run All Enabled Comparisons

Once you have multiple stable analyses configured, use:

polyzymd compare run-all

If you want figures immediately afterward:

polyzymd compare run-all --plot

Step 5: Generate Figures

For a final smoke test of the comparison workspace:

polyzymd compare plot-all --list-available
polyzymd compare plot-all

--list-available is useful because it shows which plot types are enabled and which ones are marked as experimental.

Step 6: Check the Outputs

After a successful run, expect files like these:

results/
├── rmsf_comparison_polymer_stability_study.json
├── contacts_comparison_polymer_stability_study.json
├── distances_comparison_polymer_stability_study.json
└── triad_comparison_polymer_stability_study.json

figures/
├── rmsf_comparison.png
├── rmsf_profile.png
├── triad_kde_panel.png
└── ...

If your smoke test is polyzymd compare plot-all, success means:

  • the command completes without error

  • stable plots render normally

  • experimental plots, if enabled, render with explicit experimental labeling

Adding More Stable Analyses

Common next additions to comparison.yaml are:

  • contacts for polymer coverage and contact fraction

  • distances for custom atom-pair distances

  • catalytic_triad for active-site geometry

For end-to-end examples of these workflows, see:

Experimental Workflows

Experimental workflows remain available, but they are not the default path for the presentation release. Use them only when you explicitly want those metrics:

Troubleshooting

config path not found

Paths in comparison.yaml are resolved relative to the location of comparison.yaml, not your current shell directory.
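
The resolution rule can be sketched in a few lines. This is an illustration only, not PolyzyMD's actual code: `resolve_condition_config` is a hypothetical helper and the `/data/...` location is an invented example path.

```python
from pathlib import Path


def resolve_condition_config(comparison_yaml: str, config_entry: str) -> Path:
    """Resolve a condition's config path against comparison.yaml's directory."""
    base_dir = Path(comparison_yaml).parent
    return (base_dir / config_entry).resolve()


# Hypothetical workspace location; the config entry is the one used in Step 2.
resolved = resolve_condition_config(
    "/data/polymer_stability_study/comparison.yaml",
    "../noPoly_enzyme_DMSO/config.yaml",
)
print(resolved)
```

Because the base directory comes from the comparison.yaml path, the result does not depend on the directory the command is launched from.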

No analyses are enabled

You need at least one section under analysis_settings and a matching entry under comparison_settings.
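
A quick pre-flight check for this rule might look like the sketch below. It assumes comparison.yaml has already been parsed into a plain dict (for example with a YAML library); `unmatched_sections` is a hypothetical helper, not a PolyzyMD function.

```python
def unmatched_sections(config: dict) -> list[str]:
    """Return analysis names that lack a comparison_settings entry."""
    analysis = config.get("analysis_settings") or {}
    comparison = config.get("comparison_settings") or {}
    return sorted(name for name in analysis if name not in comparison)


# Parsed form of a config where catalytic_triad was added to
# analysis_settings but not to comparison_settings.
config = {
    "analysis_settings": {
        "rmsf": {"selection": "protein and name CA"},
        "catalytic_triad": {"threshold": 3.5},
    },
    "comparison_settings": {"rmsf": {}},
}

print(unmatched_sections(config))
```

An empty list means every configured analysis has its matching comparison entry.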

plot-all runs but expected figures are missing

Check that the corresponding comparison JSON files already exist in results/ and use polyzymd compare plot-all --list-available to verify the enabled plot types.

See Also

Example RMSF comparison output:

RMSF Comparison: polymer_stability_study
============================================================
Equilibration: 10ns
Selection: protein and name CA
Control: No Polymer

Condition Summary (ranked by RMSF, lowest first)
------------------------------------------------------------
Rank  Condition            Mean RMSF    SEM        N
------------------------------------------------------------
1     100% SBMA               0.551 A    0.0344  2
2     100% EGMA               0.597 A    0.0725  3
3     No Polymer              0.715 A    0.0203  3   *
4     50/50 Mix               0.728 A    0.0336  3
------------------------------------------------------------
* = control condition

Pairwise Comparisons
--------------------------------------------------------------------------------
Comparison                     % Change   p-value      Cohen's d  Effect
--------------------------------------------------------------------------------
100% SBMA vs No Polymer        -22.9%     0.0211*      4.06       large
100% EGMA vs No Polymer        -16.4%     0.1944       1.27       large
50/50 Mix vs No Polymer        +1.9%      0.7445       -0.29      small
--------------------------------------------------------------------------------
* p < 0.05

Output Fields Explained

Field       Meaning
---------   ----------------------------------------------
Mean RMSF   Average RMSF across all replicates (Angstroms)
SEM         Standard error of the mean
N           Number of replicates
% Change    Relative to control (negative = more stable)
p-value     Two-sample t-test p-value
Cohen’s d   Effect size (positive = control higher)
Effect      Effect size interpretation
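
As a rough illustration of the first three fields, the sketch below recomputes them from the control's replicate values listed in the saved-result JSON in this guide. It assumes the SEM is the sample standard deviation (n - 1 denominator) divided by sqrt(N). That assumption is consistent with the printed values but is not confirmed by the source.

```python
import math
import statistics

# Replicate values for the "No Polymer" control, as listed in the
# saved-result JSON example in this guide.
replicate_values = [0.755, 0.693, 0.696]

n = len(replicate_values)
mean_rmsf = statistics.mean(replicate_values)
# Assumed definition: SEM = sample standard deviation / sqrt(N)
sem_rmsf = statistics.stdev(replicate_values) / math.sqrt(n)

print(f"Mean RMSF: {mean_rmsf:.3f} A, SEM: {sem_rmsf:.4f}, N: {n}")
```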

Ranking

Conditions are ranked by mean RMSF:

  • Rank 1 = Lowest RMSF = Most stable

  • Lower RMSF indicates less flexibility/movement

Statistical Analysis

T-Tests

Each condition is compared to the control (or all pairs if no control) using an independent two-sample t-test:

  • p < 0.05: Statistically significant difference (marked with *)

  • p ≥ 0.05: Not statistically significant

Tip

With small sample sizes (N=3), you can only detect large effects. Non-significant p-values don’t mean “no effect” – just insufficient evidence. See Best Practices Guide for interpretation guidelines.
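
To make the test concrete, here is a minimal sketch of the t statistic computed from the summary values printed earlier (mean, SEM, N). It assumes the equal-variance (Student's) form of the test; the source only says "independent two-sample t-test", so treat the pooling as an assumption. `pooled_t_statistic` is a hypothetical helper.

```python
import math


def pooled_t_statistic(mean_a, sem_a, n_a, mean_b, sem_b, n_b):
    """Student's two-sample t statistic from per-condition summary values."""
    # Recover sample variances from the SEM: sd = sem * sqrt(n)
    var_a = (sem_a * math.sqrt(n_a)) ** 2
    var_b = (sem_b * math.sqrt(n_b)) ** 2
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, dof


# "No Polymer" (control) vs "100% SBMA", values from the summary table.
t_stat, dof = pooled_t_statistic(0.715, 0.0203, 3, 0.551, 0.0344, 2)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The p-value then comes from a t distribution with that many degrees of freedom, which needs a statistics library such as SciPy rather than the standard library.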

Effect Size (Cohen’s d)

Cohen’s d quantifies the magnitude of the difference:

Cohen’s d    Interpretation
---------    --------------
< 0.2        Negligible
0.2 - 0.5    Small
0.5 - 0.8    Medium
> 0.8        Large

Why report effect size? A large effect (d > 0.8) suggests a meaningful difference even if p > 0.05 due to small sample size.
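
The sketch below shows one common way to compute Cohen's d, using a pooled standard deviation recovered from the printed mean, SEM, and N. The pooled form is an assumption about PolyzyMD's calculation; it gives values close to the example output. Both function names are hypothetical.

```python
import math


def cohens_d(mean_control, sem_control, n_control, mean_other, sem_other, n_other):
    """Cohen's d with a pooled standard deviation (positive = control higher)."""
    var_c = (sem_control * math.sqrt(n_control)) ** 2
    var_o = (sem_other * math.sqrt(n_other)) ** 2
    pooled_sd = math.sqrt(
        ((n_control - 1) * var_c + (n_other - 1) * var_o)
        / (n_control + n_other - 2)
    )
    return (mean_control - mean_other) / pooled_sd


def effect_label(d):
    """Map the magnitude of d onto the interpretation bands listed above."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# "No Polymer" (control) vs "100% SBMA", values from the summary table.
d_sbma = cohens_d(0.715, 0.0203, 3, 0.551, 0.0344, 2)
print(f"d = {d_sbma:.2f} ({effect_label(d_sbma)})")
```

The label is applied to the magnitude of d, which is why a value such as -0.29 is still reported as "small" in the example output.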

ANOVA

For 3+ conditions, one-way ANOVA tests whether any condition differs:

One-way ANOVA
----------------------------------------
F-statistic: 3.151
p-value:     0.0955
Significant: No (alpha=0.05)

ANOVA p < 0.05 indicates at least one condition differs from the others. Use pairwise comparisons to identify which ones.
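
For intuition, the F-statistic can be approximated from the per-condition summary values alone. This is a sketch of the textbook one-way ANOVA, not PolyzyMD's implementation; it uses the rounded means and SEMs from the RMSF summary table, so the result only approximately matches the printed F.

```python
import math

# (mean, SEM, N) per condition, taken from the RMSF summary table.
conditions = {
    "100% SBMA": (0.551, 0.0344, 2),
    "100% EGMA": (0.597, 0.0725, 3),
    "No Polymer": (0.715, 0.0203, 3),
    "50/50 Mix": (0.728, 0.0336, 3),
}

total_n = sum(n for _, _, n in conditions.values())
grand_mean = sum(mean * n for mean, _, n in conditions.values()) / total_n

# Between-group and within-group sums of squares.
ss_between = sum(
    n * (mean - grand_mean) ** 2 for mean, _, n in conditions.values()
)
ss_within = sum(
    (n - 1) * (sem * math.sqrt(n)) ** 2 for _, sem, n in conditions.values()
)

df_between = len(conditions) - 1
df_within = total_n - len(conditions)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```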

Output Formats

Table (Default)

Console-friendly ASCII table:

polyzymd compare rmsf --format table

Markdown

Publication-ready tables for documentation:

polyzymd compare rmsf --format markdown -o report.md

JSON

Machine-readable for further analysis:

polyzymd compare rmsf --format json -o results.json

Working Example

Comparing Polymer Stabilization

Setup for a study comparing enzyme stability with different polymer coatings:

comparison.yaml:

name: "SBMA_EGMA_stabilization"
description: "Does SBMA stabilize LipA better than EGMA?"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2]

  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "50/50 Mix"
    config: "../SBMA_EGMA_50_50_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

analysis_settings:
  rmsf:
    selection: "protein and name CA"

comparison_settings:
  rmsf: {}

Run:

polyzymd compare rmsf

From Python:

from polyzymd.compare import ComparisonConfig, RMSFComparator

# Load comparison configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get RMSF settings from analysis_settings
rmsf_settings = config.analysis_settings.get("rmsf")

# Run RMSF comparison
comparator = RMSFComparator(
    config=config,
    rmsf_settings=rmsf_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Print summary table
print(f"RMSF Comparison: {result.name}")
print(f"Control: {result.control_label}")
print()

# Ranked conditions
print("Conditions (ranked by RMSF, lowest first):")
for i, label in enumerate(result.ranking, 1):
    cond = result.get_condition(label)
    marker = " *" if label == result.control_label else ""
    print(f"  {i}. {label}: {cond.mean_rmsf:.3f} ± {cond.sem_rmsf:.3f} Å{marker}")

print()

# Pairwise comparisons
print("Pairwise comparisons vs control:")
for comp in result.pairwise_comparisons:
    sig = "*" if comp.significant else ""
    print(f"  {comp.condition_b} vs {comp.condition_a}: "
          f"{comp.percent_change:+.1f}%, p={comp.p_value:.3f}{sig}, d={comp.cohens_d:.2f}")

# Save results
result.save("results/rmsf_comparison.json")

Key findings from output:

Condition   Mean RMSF   vs Control   Significant?
---------   ---------   ----------   -------------
100% SBMA   0.551 A     -22.9%       Yes (p=0.021)
100% EGMA   0.597 A     -16.4%       No (p=0.194)
50/50 Mix   0.728 A     +1.9%        No (p=0.745)

Interpretation:

  • 100% SBMA significantly stabilizes the enzyme (22.9% lower RMSF, p < 0.05)

  • 100% EGMA shows a large effect (d=1.27) but isn’t statistically significant

  • 50/50 Mix shows no benefit over the control

Saved Results

Results are automatically saved to results/:

my_study/
├── comparison.yaml
├── results/
│   └── rmsf_comparison_my_study.json  # Full results
└── figures/

Result JSON Structure

{
    "metric": "rmsf",
    "name": "my_study",
    "control_label": "No Polymer",
    "conditions": [
        {
            "label": "No Polymer",
            "mean_rmsf": 0.715,
            "sem_rmsf": 0.020,
            "n_replicates": 3,
            "replicate_values": [0.755, 0.693, 0.696]
        },
        ...
    ],
    "pairwise_comparisons": [
        {
            "condition_a": "No Polymer",
            "condition_b": "100% SBMA",
            "percent_change": -22.9,
            "p_value": 0.0211,
            "cohens_d": 4.06,
            "significant": true
        },
        ...
    ],
    "ranking": ["100% SBMA", "100% EGMA", "No Polymer", "50/50 Mix"],
    ...
}
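
Because the file is plain JSON, it can be post-processed with the standard library alone. The sketch below embeds a trimmed-down document that uses only the fields shown above. The second comparison entry is filled in from the example output, and in practice you would `json.load` the file from results/ instead.

```python
import json

# Trimmed-down result document using only fields shown in the structure above.
saved_result = """
{
    "metric": "rmsf",
    "control_label": "No Polymer",
    "pairwise_comparisons": [
        {"condition_a": "No Polymer", "condition_b": "100% SBMA",
         "percent_change": -22.9, "p_value": 0.0211,
         "cohens_d": 4.06, "significant": true},
        {"condition_a": "No Polymer", "condition_b": "100% EGMA",
         "percent_change": -16.4, "p_value": 0.1944,
         "cohens_d": 1.27, "significant": false}
    ],
    "ranking": ["100% SBMA", "100% EGMA", "No Polymer", "50/50 Mix"]
}
"""

result = json.loads(saved_result)
significant = [
    comp["condition_b"]
    for comp in result["pairwise_comparisons"]
    if comp["significant"]
]
print(f"Lowest RMSF: {result['ranking'][0]}")
print(f"Significant vs {result['control_label']}: {significant}")
```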

Viewing Saved Results

# Display a saved result
polyzymd compare show results/rmsf_comparison_my_study.json

# Different format
polyzymd compare show results/rmsf_comparison_my_study.json --format markdown

CLI Reference

polyzymd compare init

Create a new comparison project:

polyzymd compare init NAME [OPTIONS]

Arguments:
  NAME                   Project name (creates directory)

Options:
  --eq-time TEXT         Default equilibration time [default: 10ns]
  -o, --output-dir PATH  Parent directory [default: current]

polyzymd compare rmsf

Run RMSF comparison:

polyzymd compare rmsf [OPTIONS]

Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --selection TEXT                Override atom selection
  --recompute                     Force recompute RMSF
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging

polyzymd compare validate

Check configuration:

polyzymd compare validate [OPTIONS]

Options:
  -f, --file PATH    Config file [default: comparison.yaml]

polyzymd compare show

Display saved results:

polyzymd compare show RESULT_FILE [OPTIONS]

Arguments:
  RESULT_FILE                     Path to saved JSON

Options:
  --format [table|markdown|json]  Output format [default: table]

polyzymd compare plot

Generate publication-ready plots from saved results:

polyzymd compare plot RESULT_FILE [OPTIONS]

Arguments:
  RESULT_FILE                     Path to saved comparison JSON

Options:
  -o, --output-dir PATH           Output directory [default: figures/]
  --format [png|pdf|svg]          Image format [default: png]
  --dpi INTEGER                   Resolution for PNG [default: 150]
  --summary / --no-summary        Generate summary panel [default: yes]
  --show / --no-show              Display interactively [default: no]

Generating Plots

The polyzymd compare plot command creates publication-ready figures from comparison results.

Quick Start

# Generate all plots
polyzymd compare plot results/rmsf_comparison_my_study.json

# High resolution for publication
polyzymd compare plot results/rmsf_comparison_my_study.json --dpi 300

# PDF format (vector graphics)
polyzymd compare plot results/rmsf_comparison_my_study.json --format pdf

# Preview interactively
polyzymd compare plot results/rmsf_comparison_my_study.json --show

From Python:

from pathlib import Path
from polyzymd.compare.config import ComparisonConfig
from polyzymd.compare.plotter import ComparisonPlotter

# Load comparison config
config = ComparisonConfig.from_yaml("comparison.yaml")

# Generate all plots via the plotter registry
plotter = ComparisonPlotter(config)
paths = plotter.plot_all()
print(f"Generated {len(paths)} plots")

Generated Plots

The plotter registry automatically generates all applicable plots for each analysis type configured in comparison.yaml. For RMSF analysis, this includes:

Plot                  Description
-------------------   -------------------------------------------------------
rmsf_comparison.png   Bar chart of mean RMSF by condition with SEM error bars
rmsf_profile.png      Per-residue RMSF line plot with optional SS annotation

Other analysis types (contacts, distances, secondary structure, etc.) generate their own plot sets. See the plot_settings section in comparison.yaml for per-analysis customization options.

Example Output

After running:

polyzymd compare plot-all

You get all configured plots saved to the output_dir specified in plot_settings (default: figures/).

Python API

Programmatic Comparison

from polyzymd.compare import ComparisonConfig, RMSFComparator

# Load configuration (must have analysis_settings.rmsf section)
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get RMSF settings
rmsf_settings = config.analysis_settings.get("rmsf")

# Run comparison
comparator = RMSFComparator(
    config=config,
    rmsf_settings=rmsf_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Most stable: {result.ranking[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.mean_rmsf:.3f} +/- {cond.sem_rmsf:.3f} A")

# Statistical comparisons
for comp in result.pairwise_comparisons:
    if comp.significant:
        print(f"{comp.condition_b} vs {comp.condition_a}: "
              f"p={comp.p_value:.4f}, d={comp.cohens_d:.2f}")

Formatting Results

from polyzymd.compare import format_markdown, format_console_table

# Get markdown output
md_text = format_markdown(result)
with open("report.md", "w") as f:
    f.write(md_text)

# Console output
print(format_console_table(result))

Loading Saved Results

from polyzymd.compare import ComparisonResult

# Load from JSON
result = ComparisonResult.load("results/rmsf_comparison_my_study.json")

# Access data
control = result.get_condition("No Polymer")
print(f"Control RMSF: {control.mean_rmsf:.3f} A")

# Get specific comparison
comp = result.get_comparison("100% SBMA")
print(f"SBMA vs control: {comp.percent_change:+.1f}%, p={comp.p_value:.4f}")

Plotting with Python

from pathlib import Path
from polyzymd.compare.config import ComparisonConfig
from polyzymd.compare.plotter import ComparisonPlotter

# Load config and generate all plots
config = ComparisonConfig.from_yaml("comparison.yaml")
plotter = ComparisonPlotter(config)
paths = plotter.plot_all()

for p in paths:
    print(f"  {p}")

Plot Customization

Plot appearance is controlled via the plot_settings section of comparison.yaml. You can set global options (DPI, format, color palette) and per-analysis overrides (figure sizes, which plot types to generate).

plot_settings:
  output_dir: "figures/"
  format: "png"
  dpi: 300
  style: "publication"
  color_palette: "tab10"

  # Per-analysis overrides
  rmsf:
    show_error: true
    highlight_residues: [77, 133, 156]
    figsize_profile: [14, 4]
    figsize_comparison: [8, 6]

See the theme: block in comparison.yaml for fine-grained control over font sizes, bar styling, line widths, spine visibility, and legend placement.

Troubleshooting

“Config not found for ‘Condition’”

Cause: Path in config: is incorrect

Fix: Check that the path exists relative to comparison.yaml:

ls ../projects/my_condition/config.yaml

“Need at least 2 conditions”

Cause: Only one condition defined in comparison.yaml

Fix: Add at least one more condition to compare

High p-value Despite Large Effect

Cause: Small sample size (N=2-3)

This is expected. With few replicates, only very large effects reach significance. A large Cohen’s d suggests that a difference may exist, but the estimate is itself uncertain at small N; additional replicates narrow that uncertainty and show whether the effect holds up.

Results Don’t Match Manual Calculation

Cause: Different equilibration time or selection string

Fix: Ensure --eq-time and --selection match what you used for individual RMSF calculations. Check defaults: in comparison.yaml.

Comparing Catalytic Triad Geometry

In addition to RMSF (global flexibility), you can compare catalytic triad integrity across conditions. This is useful for enzymes where active site geometry is crucial for catalytic function.

What is Simultaneous Contact Fraction?

The catalytic triad comparison analyzes the simultaneous contact fraction – the percentage of simulation frames where ALL distance pairs in your triad are below the contact threshold at the same time. Higher values indicate better triad integrity.

For example, a Ser-His-Asp catalytic triad:

  • 95% simultaneous contact = triad is intact most of the time

  • 50% simultaneous contact = triad frequently disrupted

  • 10% simultaneous contact = triad rarely intact
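
The definition can be illustrated with a small, self-contained sketch. The per-frame distances are invented for illustration, and `simultaneous_contact_fraction` is a hypothetical helper rather than a PolyzyMD function.

```python
# Hypothetical per-frame distances in Angstroms for the two triad pairs.
distances = {
    "Asp133-His156": [2.8, 3.1, 3.9, 2.9, 3.0],
    "His156-Ser77": [2.7, 3.6, 2.8, 2.9, 3.4],
}
threshold = 3.5  # contact cutoff in Angstroms


def simultaneous_contact_fraction(distances, threshold):
    """Fraction of frames in which every pair is below the threshold."""
    frames = list(zip(*distances.values()))
    in_contact = [all(d < threshold for d in frame) for frame in frames]
    return sum(in_contact) / len(in_contact)


fraction = simultaneous_contact_fraction(distances, threshold)
print(f"Simultaneous contact: {fraction * 100:.0f}%")
```

In this toy data each pair is individually in contact in four of five frames, but both are in contact together in only three, so the simultaneous fraction is lower than either per-pair fraction.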

Adding Catalytic Triad to comparison.yaml

Add a catalytic_triad section to your analysis_settings:

name: "polymer_stability_study"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

# Define your enzyme's catalytic triad in analysis_settings
analysis_settings:
  catalytic_triad:
    name: "LipA_Ser-His-Asp"
    description: "Lipase A catalytic triad"
    threshold: 3.5  # Angstroms (H-bond cutoff)
    pairs:
      - label: "Asp133-His156"
        selection_a: "midpoint(protein and resid 133 and name OD1 OD2)"
        selection_b: "protein and resid 156 and name ND1"
      - label: "His156-Ser77"
        selection_a: "protein and resid 156 and name NE2"
        selection_b: "protein and resid 77 and name OG"

# Must have corresponding entry in comparison_settings
comparison_settings:
  catalytic_triad: {}

Running Triad Comparison

# From your comparison project directory
polyzymd compare triad

# With options
polyzymd compare triad --eq-time 10ns --format markdown
polyzymd compare triad -o triad_report.md

From Python:

from polyzymd.compare import ComparisonConfig, TriadComparator

# Load configuration (must have analysis_settings.catalytic_triad section)
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get triad settings from analysis_settings
triad_settings = config.analysis_settings.get("catalytic_triad")

# Run triad comparison
comparator = TriadComparator(
    config=config,
    triad_settings=triad_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Print results
print(f"Best triad integrity: {result.ranking[0]}")
for cond in result.conditions:
    contact_pct = cond.mean_simultaneous_contact * 100
    sem_pct = cond.sem_simultaneous_contact * 100
    print(f"{cond.label}: {contact_pct:.1f}% ± {sem_pct:.1f}%")

# Save to JSON
result.save("results/triad_comparison.json")

Example Output

Catalytic Triad Comparison: polymer_stability_study
======================================================================
Triad: LipA_Ser-His-Asp
Description: Lipase A catalytic triad
Pairs: Asp133-His156, His156-Ser77
Contact threshold: 3.5 A
Equilibration: 10ns
Control: No Polymer

Condition Summary (ranked by simultaneous contact, highest first)
----------------------------------------------------------------------
Rank  Condition            Contact %    SEM        N
----------------------------------------------------------------------
1     100% SBMA               87.3%      2.15%  3
2     No Polymer              72.1%      3.42%  3   *
3     50/50 Mix               68.5%      4.21%  3
----------------------------------------------------------------------
* = control condition

Pairwise Comparisons
-------------------------------------------------------------------------------------
Comparison                     % Change   p-value      Cohen's d  Effect
-------------------------------------------------------------------------------------
100% SBMA vs No Polymer        +21.1%     0.0156*      2.89       large
50/50 Mix vs No Polymer        -5.0%      0.5234       -0.52      medium
-------------------------------------------------------------------------------------
* p < 0.05
Positive % change = improved triad contact

Interpretation
----------------------------------------------------------------------
Best triad integrity: 100% SBMA (87.3% simultaneous contact)
  -> 21.1% higher than control (No Polymer)
  -> Statistically significant (p=0.0156, d=2.89 [large])

Per-Pair Distance Table

The output also includes a per-pair distance summary showing how each individual H-bond distance compares across conditions:

Per-Pair Distances (Mean ± SEM across replicates)
------------------------------------------------------------------------------------------
Condition            Asp133-His156    His156-Ser77
------------------------------------------------------------------------------------------
100% SBMA            2.81±0.08        2.74±0.05
No Polymer           3.12±0.11        2.89±0.09
50/50 Mix            3.28±0.15        2.95±0.12
------------------------------------------------------------------------------------------

Interpreting Results

Contact %   Interpretation
---------   ----------------------------
> 90%       Excellent triad integrity
70-90%      Good triad integrity
50-70%      Moderate disruption
< 50%       Significant triad disruption

Key metrics:

  • % Change: Positive = more triad contact = better

  • p-value: < 0.05 indicates statistically significant difference

  • Cohen’s d: Effect size magnitude
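
Note that % Change is a relative change, not a difference in percentage points. A minimal sketch using the contact percentages from the example output (`percent_change` is a hypothetical helper):

```python
def percent_change(condition_value, control_value):
    """Relative change of a condition versus the control, in percent."""
    return (condition_value - control_value) / control_value * 100


# Contact percentages from the example output above.
control = 72.1
for label, value in [("100% SBMA", 87.3), ("50/50 Mix", 68.5)]:
    print(f"{label} vs No Polymer: {percent_change(value, control):+.1f}%")
```

So 87.3% versus 72.1% is a gain of 15.2 percentage points, but it is reported as roughly +21% relative to the control.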

CLI Reference for Triad

polyzymd compare triad [OPTIONS]

Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --recompute                     Force recompute triad analysis
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging

Python API for Triad Comparison

from polyzymd.compare import ComparisonConfig, TriadComparator, format_triad_result

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get triad settings from analysis_settings
triad_settings = config.analysis_settings.get("catalytic_triad")

# Run triad comparison
comparator = TriadComparator(
    config=config,
    triad_settings=triad_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Best triad integrity: {result.ranking[0]}")
for cond in result.conditions:
    contact_pct = cond.mean_simultaneous_contact * 100
    print(f"{cond.label}: {contact_pct:.1f}% ± {cond.sem_simultaneous_contact*100:.1f}%")

# Format output
print(format_triad_result(result, format="markdown"))

# Save result
result.save("results/triad_comparison.json")

Loading Saved Triad Results

from polyzymd.compare import TriadComparisonResult

# Load from JSON
result = TriadComparisonResult.load("results/triad_comparison_my_study.json")

# Access condition data
control = result.get_condition("No Polymer")
print(f"Control contact: {control.mean_simultaneous_contact * 100:.1f}%")

# Get pairwise comparison
comp = result.get_comparison("100% SBMA")
if comp and comp.significant:
    print(f"SBMA significantly improves triad contact (p={comp.p_value:.4f})")

Comparing Polymer-Protein Contacts

Compare polymer-protein contact statistics across conditions to understand how different polymer compositions affect protein-polymer interactions.

Note

New to contacts analysis? Start with the Contacts Quick Start to run individual analyses, then return here to compare conditions.

Key Metrics

The contacts comparison analyzes two aggregate metrics:

Metric                  Description                                      Higher means…
---------------------   ----------------------------------------------   --------------------------------
Coverage                % of protein residues contacted by polymer       More extensive binding
Mean Contact Fraction   Average % of frames each residue is in contact   Stronger/more persistent binding

Additionally, residence times by polymer type show how long each polymer type (e.g., SBMA vs EGMA) maintains contacts, revealing selectivity differences.
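
The two aggregate metrics can be sketched on a toy contact map. The data are invented, and two details are assumptions not stated in this guide: a residue counts toward coverage if it is contacted in at least one frame, and the mean contact fraction is averaged over all residues.

```python
# Hypothetical contact map: one row per frame, one column per protein
# residue; True means the residue touches polymer in that frame.
contacts = [
    [True, False, False, True],
    [True, False, False, False],
    [True, True, False, False],
    [False, True, False, False],
]

n_frames = len(contacts)
n_residues = len(contacts[0])

# Per-residue contact fraction: share of frames in which it is in contact.
per_residue = [
    sum(frame[i] for frame in contacts) / n_frames for i in range(n_residues)
]

# Assumption: a residue counts toward coverage if contacted in any frame.
coverage = sum(fraction > 0 for fraction in per_residue) / n_residues
# Assumption: the mean is taken over all residues, contacted or not.
mean_contact_fraction = sum(per_residue) / n_residues

print(f"Coverage: {coverage * 100:.0f}%")
print(f"Mean contact fraction: {mean_contact_fraction * 100:.1f}%")
```

The two numbers answer different questions: coverage says how much of the protein is ever touched, while the mean contact fraction says how persistently it is touched.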

Running Contacts Comparison

# Basic comparison
polyzymd compare contacts

# With custom equilibration time
polyzymd compare contacts --eq-time 10ns

# Override polymer selection (only SBMA monomers)
polyzymd compare contacts --polymer-selection "resname SBM"

# Different output formats
polyzymd compare contacts --format markdown -o contacts_report.md
polyzymd compare contacts --format json

From Python:

from polyzymd.compare import ComparisonConfig, ContactsComparator

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get contacts settings from analysis_settings and comparison_settings
analysis_settings = config.analysis_settings.get("contacts")
comparison_settings = config.comparison_settings.get("contacts")

# Run comparison
comparator = ContactsComparator(
    config=config,
    analysis_settings=analysis_settings,
    comparison_settings=comparison_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Highest coverage: {result.ranking_by_coverage[0]}")
print(f"Highest contact: {result.ranking_by_contact_fraction[0]}")

for cond in result.conditions:
    print(f"{cond.label}: {cond.coverage_mean*100:.1f}% coverage, "
          f"{cond.contact_fraction_mean*100:.1f}% contact")

# Save result
result.save("results/contacts_comparison.json")

Optional: Contacts Configuration in comparison.yaml

Add a contacts section to analysis_settings and comparison_settings:

name: "polymer_stability_study"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

# Configure contacts analysis in analysis_settings
analysis_settings:
  contacts:
    polymer_selection: "resname SBM EGM"  # MDAnalysis selection
    protein_selection: "protein"
    cutoff: 4.5                            # Angstroms
    contact_criteria: "heavy_atom"

# Comparison-specific parameters in comparison_settings
comparison_settings:
  contacts:
    fdr_alpha: 0.05                        # Benjamini-Hochberg FDR
    min_effect_size: 0.5                   # Cohen's d threshold
    top_residues: 10                       # Top residues to show

If no contacts section is provided, defaults are used.

Handling Conditions Without Polymer

Conditions without polymer atoms (e.g., “No Polymer” controls) are automatically excluded from contacts analysis since there’s nothing to measure. The command will warn you:

Note: 1 condition(s) auto-excluded (no polymer atoms): No Polymer

This is expected behavior. Comparisons will be made between polymer-containing conditions only.

Example Output

Polymer-Protein Contacts Comparison: polymer_ratio_study
================================================================================
Analysis: polymer_protein_contacts
Polymer selection: resname SBM EGM
Contact cutoff: 4.5 A
Contact criteria: heavy_atom
Equilibration: 10ns
Auto-excluded (no polymer): No Polymer

Condition Summary - Coverage (ranked, highest first)
--------------------------------------------------------------------------------
Rank  Condition                 Coverage     SEM        N   
--------------------------------------------------------------------------------
1     100% EGMA                     88.4%       0.55%  3    
2     25% SBMA / 75% EGMA           86.9%       1.44%  3    
3     50% SBMA / 50% EGMA           82.7%       0.66%  3    
4     75% SBMA / 25% EGMA           82.7%       1.57%  3    
5     100% SBMA                     74.9%       0.28%  2    
--------------------------------------------------------------------------------

Condition Summary - Mean Contact Fraction (ranked, highest first)
--------------------------------------------------------------------------------
Rank  Condition                 Contact %    SEM        N   
--------------------------------------------------------------------------------
1     75% SBMA / 25% EGMA           30.2%       0.50%  3    
2     25% SBMA / 75% EGMA           29.1%       5.35%  3    
3     100% EGMA                     25.3%       2.64%  3    
4     100% SBMA                     22.9%       1.47%  2    
5     50% SBMA / 50% EGMA           22.9%       2.18%  3    
--------------------------------------------------------------------------------

Residence Time by Polymer Type (frames)
--------------------------------------------------------------------------------
Condition                          EGM          SBM
--------------------------------------------------------------------------------
100% SBMA                           --  10.0±0.2 
75% SBMA / 25% EGMA         7.8±0.2    9.3±0.0 
50% SBMA / 50% EGMA         7.3±0.3    8.6±0.5 
25% SBMA / 75% EGMA         7.1±0.7   10.5±0.4 
100% EGMA                   7.1±0.4            --
--------------------------------------------------------------------------------

Aggregate Comparisons
-----------------------------------------------------------------------------------------------
Comparison                     Metric          % Change   p-value      Cohen d    Effect      
-----------------------------------------------------------------------------------------------
100% EGMA vs 100% SBMA         coverage        +18.1%     0.0004*      -16.64     large       
100% EGMA vs 100% SBMA         mean contact f  +10.7%     0.5445       -0.62      medium      
...
-----------------------------------------------------------------------------------------------
* p < 0.05; positive % change = more contact in treatment

One-way ANOVA
------------------------------------------------------------
Metric                    F-stat       p-value      Significant 
------------------------------------------------------------
coverage                  18.323       0.0002       Yes*        
mean contact fraction     1.200        0.3748       No          
------------------------------------------------------------

Interpreting Results

Coverage rankings:

  • Higher coverage = polymer interacts with more of the protein surface

  • 100% EGMA shows highest coverage (88.4%) - broader but possibly weaker binding

Contact fraction rankings:

  • Higher mean contact = more persistent interactions per residue

  • 75% SBMA / 25% EGMA shows highest contact fraction (30.2%), suggesting more persistent binding

Residence time by polymer type:

  • SBMA (SBM) tends to have longer residence times than EGMA (EGM)

  • This suggests SBMA forms more persistent interactions

  • Useful for understanding polymer selectivity
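
To make the "frames" unit concrete, here is a minimal sketch of one common way to define residence time: the mean length of uninterrupted contact runs in a per-frame contact trace. The function name and the trace are hypothetical, and PolyzyMD's own definition may differ in detail.

```python
from itertools import groupby

def mean_residence_time(in_contact):
    """Mean length, in frames, of uninterrupted contact runs.

    `in_contact` is a per-frame sequence of booleans for one
    polymer-residue pair (hypothetical input format).
    """
    runs = [sum(1 for _ in group) for is_on, group in groupby(in_contact) if is_on]
    return sum(runs) / len(runs) if runs else 0.0

# Hypothetical trace with contact runs of 3, 1, and 2 frames
trace = [True, True, True, False, False, True, False, True, True, False]
print(mean_residence_time(trace))  # 2.0
```

Multiply by the trajectory frame spacing to convert frames into time.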

Statistical tests:

  • ANOVA tests whether any condition differs overall

  • Pairwise comparisons are corrected for multiple testing with Benjamini-Hochberg FDR

  • Cohen’s d quantifies effect magnitude independent of sample size
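
The Benjamini-Hochberg step can be sketched in plain Python. This is an illustrative implementation of the standard step-up procedure, not PolyzyMD's own code; `alpha` plays the role of the `--fdr-alpha` option, and the last two p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p <= (k / m) * alpha
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# First two p-values come from the table above; the last two are hypothetical
p_values = [0.0004, 0.5445, 0.03, 0.20]
print(benjamini_hochberg(p_values))  # [True, False, False, False]
```

Note that 0.03 would pass an uncorrected 0.05 cutoff but does not survive the correction when four tests are run together.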

CLI Reference for Contacts

polyzymd compare contacts [OPTIONS]

Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --polymer-selection TEXT        Override polymer selection (MDAnalysis syntax)
  --cutoff FLOAT                  Override contact cutoff (Angstroms)
  --fdr-alpha FLOAT               FDR alpha for multiple testing correction
  --recompute                     Force recompute contacts analysis
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging

Python API for Contacts Comparison

from polyzymd.compare import (
    ComparisonConfig,
    ContactsComparator,
    format_contacts_result,
)

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get contacts settings from analysis_settings and comparison_settings
analysis_settings = config.analysis_settings.get("contacts")
comparison_settings = config.comparison_settings.get("contacts")

# Run comparison
comparator = ContactsComparator(
    config=config,
    analysis_settings=analysis_settings,
    comparison_settings=comparison_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Highest coverage: {result.ranking_by_coverage[0]}")
print(f"Highest contact: {result.ranking_by_contact_fraction[0]}")

for cond in result.conditions:
    print(f"{cond.label}: {cond.coverage_mean*100:.1f}% coverage, "
          f"{cond.contact_fraction_mean*100:.1f}% contact")
    
    # Residence time by polymer type
    for poly_type, (mean, sem) in cond.residence_time_by_polymer_type.items():
        print(f"  {poly_type}: {mean:.1f} ± {sem:.1f} frames")

# Format output
print(format_contacts_result(result, format="markdown"))

# Save result
result.save("results/contacts_comparison.json")

Loading Saved Contacts Results

from polyzymd.compare import ContactsComparisonResult

# Load from JSON
result = ContactsComparisonResult.load("results/contacts_comparison_my_study.json")

# Access condition data
for cond in result.conditions:
    print(f"{cond.label}: coverage={cond.coverage_mean*100:.1f}%")

# Get aggregate comparisons
for comp in result.aggregate_comparisons:
    if comp.significant:
        print(f"{comp.condition_a} vs {comp.condition_b} ({comp.metric}): "
              f"p={comp.p_value:.4f}")

Contacts vs RMSF: Complementary Analyses

| Analysis | Question Answered |
|----------|-------------------|
| RMSF | Does polymer stabilize the enzyme (reduce flexibility)? |
| Contacts | Where and how strongly does polymer bind? |
| Combined | Do contact hotspots correlate with stabilization? |

For mechanistic insights correlating contacts with flexibility changes, see polyzymd compare report (coming soon).

Comparing Distances Across Conditions

Compare inter-atomic distances across conditions with statistical analysis. This is useful for tracking specific interactions (e.g., substrate proximity, hydrogen bond distances) that may change with different polymer environments.

Note

New to distance analysis? Start with the Distance Analysis Quick Start to understand distance pair definitions and selection syntax, then return here to compare conditions.

Key Metrics

The distances comparison provides dual-metric ranking:

| Metric | Description | Ranking |
|--------|-------------|---------|
| Mean Distance | Average distance across trajectory (primary) | Lowest first (closer = better) |
| Fraction Below Threshold | % of frames below contact threshold (secondary) | Highest first (more contact = better) |

This dual approach captures both the typical distance AND the frequency of close contacts.
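
A small sketch shows why both numbers matter. The helper and the two distance series below are hypothetical, and the strict `<` comparison is an assumption about how the threshold is applied.

```python
from statistics import mean

def summarize_distances(distances, threshold=None):
    """Return (mean distance, fraction of frames below threshold or None)."""
    mean_distance = mean(distances)
    if threshold is None:
        return mean_distance, None
    fraction_below = sum(d < threshold for d in distances) / len(distances)
    return mean_distance, fraction_below

# Two hypothetical series (Angstroms) with the same mean but different behavior
steady = [6.0, 6.0, 6.0, 6.0]
bursty = [3.0, 9.0, 3.0, 9.0]

print(summarize_distances(steady, threshold=3.5))  # (6.0, 0.0)
print(summarize_distances(bursty, threshold=3.5))  # (6.0, 0.5)
```

Ranking on mean distance alone would treat these two series as identical; the fraction below threshold separates them.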

Running Distances Comparison

# Basic comparison
polyzymd compare run distances -f comparison.yaml

# With custom equilibration time
polyzymd compare run distances -f comparison.yaml --eq-time 10ns

# Different output formats
polyzymd compare run distances -f comparison.yaml --format markdown
polyzymd compare run distances -f comparison.yaml --format json -o distances_report.json

The same comparison from Python:

from polyzymd.compare import ComparisonConfig
from polyzymd.compare.comparators import DistancesComparator

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get distances settings from analysis_settings
distances_settings = config.analysis_settings.get("distances")

# Run comparison
comparator = DistancesComparator(
    config=config,
    analysis_settings=distances_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Closest mean distance: {result.ranking[0]}")
if result.ranking_by_fraction:
    print(f"Highest contact fraction: {result.ranking_by_fraction[0]}")

for cond in result.conditions:
    print(f"{cond.label}: {cond.overall_mean_distance:.2f} A")
    if cond.overall_fraction_below is not None:
        print(f"  Contact fraction: {cond.overall_fraction_below*100:.1f}%")

# Save result
result.save("results/distances_comparison.json")

Adding Distances to comparison.yaml

Add a distances section to your analysis_settings:

name: "substrate_proximity_study"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

# Define distance pairs in analysis_settings
analysis_settings:
  distances:
    threshold: 3.5  # Global default threshold (Angstroms, optional)
    pairs:
      - label: "Catalytic H-bond"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 133 and name NE2"
        threshold: 3.5  # Per-pair threshold (overrides global)
      - label: "Lid Domain Opening"
        selection_a: "com(protein and resid 141-148)"
        selection_b: "com(protein and resid 281-289)"
        threshold: 15.0  # Different threshold for this pair

# Must have corresponding entry in comparison_settings
comparison_settings:
  distances: {}

Per-Pair Thresholds

Different distance pairs often have different biologically relevant thresholds:

| Type of Distance | Typical Threshold |
|------------------|-------------------|
| Hydrogen bond | 3.0 - 3.5 A |
| Salt bridge | 4.0 - 4.5 A |
| Aromatic stacking | 4.0 - 5.0 A |
| Domain separation | 10 - 20 A |
| Lid opening | 15 - 25 A |

Threshold resolution order:

  1. Per-pair threshold in the pair definition (highest priority)

  2. Global threshold in the distances section (fallback)

  3. No threshold (fraction below not computed)

Example with mixed thresholds:

analysis_settings:
  distances:
    threshold: 4.0  # Default for pairs without explicit threshold
    pairs:
      - label: "Ser77-His133 H-bond"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 133 and name NE2"
        threshold: 3.5  # H-bond cutoff

      - label: "Asp156-His133 H-bond"
        selection_a: "midpoint(protein and resid 156 and name OD1 OD2)"
        selection_b: "protein and resid 133 and name ND1"
        # Uses global threshold: 4.0 A

      - label: "Lid-to-Core Distance"
        selection_a: "com(protein and resid 141-148)"
        selection_b: "com(protein and resid 281-289)"
        threshold: 15.0  # Large-scale motion threshold
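
The resolution order can be expressed as a short function. This is a sketch over plain dictionaries that mirror the YAML above; `resolve_threshold` is a hypothetical helper, not a PolyzyMD function.

```python
def resolve_threshold(pair, global_threshold=None):
    """Per-pair threshold first, then the global one, else None (no fraction computed)."""
    per_pair = pair.get("threshold")
    if per_pair is not None:
        return per_pair
    return global_threshold

# Plain-dict stand-in for the `distances` section of comparison.yaml
distances_cfg = {
    "threshold": 4.0,
    "pairs": [
        {"label": "Ser77-His133 H-bond", "threshold": 3.5},
        {"label": "Asp156-His133 H-bond"},
        {"label": "Lid-to-Core Distance", "threshold": 15.0},
    ],
}

resolved = {
    pair["label"]: resolve_threshold(pair, distances_cfg.get("threshold"))
    for pair in distances_cfg["pairs"]
}
print(resolved)
```

Here the Asp156-His133 pair falls back to the global 4.0 A, while the other two pairs keep their own values.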

Threshold cache invalidation: When you change a threshold value, PolyzyMD automatically detects the mismatch and recomputes contact fractions from the stored per-replicate distance data. This avoids expensive trajectory reprocessing; only the statistical aggregation is recalculated.
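
The idea behind this behavior can be sketched as follows. The cache layout and function name are assumptions made for illustration; they do not describe PolyzyMD's actual file format or internals.

```python
def fractions_for_threshold(cache, threshold):
    """Reuse cached fractions when the threshold matches; otherwise recompute
    them from the stored per-replicate distances without touching trajectories."""
    if cache.get("threshold") == threshold and "fractions" in cache:
        return cache["fractions"]
    fractions = [
        sum(d < threshold for d in replicate) / len(replicate)
        for replicate in cache["distances"]
    ]
    # Record which threshold the cached fractions belong to
    cache["threshold"] = threshold
    cache["fractions"] = fractions
    return fractions

# Hypothetical stored distances (Angstroms) for two replicates
cache = {"distances": [[3.0, 3.4, 4.2, 5.0], [3.2, 3.6, 3.9, 4.8]]}
first = fractions_for_threshold(cache, 3.5)   # computed from stored distances
second = fractions_for_threshold(cache, 4.0)  # threshold changed, so recomputed
print(first, second)
```

The stored distances are the expensive part; the fractions are cheap to derive again for any threshold.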

Example Output

Distance Comparison: substrate_proximity_study
================================================================================
Pairs analyzed: 2
Pair labels: Ser77-Substrate, His156-Substrate
Contact threshold: 3.5 A
Equilibration: 10ns
Control: No Polymer

Condition Summary (ranked by mean distance, lowest first)
--------------------------------------------------------------------------------
Rank  Condition                 Mean Dist    SEM        % Below    N   
--------------------------------------------------------------------------------
1     100% SBMA                 7.46 A       0.421         0.4%   3    
2     100% EGMA                 7.66 A       0.270         1.6%   3    
3     No Polymer                8.02 A       0.315         0.0%   3   *
--------------------------------------------------------------------------------
* = control condition

Secondary Ranking (by % below threshold, highest first)
------------------------------------------------------------
1     100% EGMA                 1.6% (SEM: 0.31%)
2     100% SBMA                 0.4% (SEM: 0.31%)
3     No Polymer                0.0% (SEM: 0.00%)

Per-Pair Distances (Mean +/- SEM across replicates)
------------------------------------------------------------------------------------------
Condition                 Ser77-Substrate His156-Substrate
------------------------------------------------------------------------------------------
100% SBMA                 7.46+/-0.42     6.12+/-0.25    
100% EGMA                 7.66+/-0.27     6.45+/-0.18    
No Polymer                8.02+/-0.31     6.89+/-0.22    
------------------------------------------------------------------------------------------

Pairwise Comparisons (Distance Metric)
------------------------------------------------------------------------------------------
Comparison                     % Change   p-value      Cohen d    Effect       Direction 
------------------------------------------------------------------------------------------
100% SBMA vs No Polymer        -7.0%      0.0451*      0.87       large        closer    
100% EGMA vs No Polymer        -4.4%      0.1234       0.70       medium       closer    
------------------------------------------------------------------------------------------
* p < 0.05
Negative % change = lower distance (closer)

Pairwise Comparisons (Fraction Below Threshold)
------------------------------------------------------------------------------------------
Comparison                     % Change   p-value      Cohen d    Effect       Direction   
------------------------------------------------------------------------------------------
100% SBMA vs No Polymer        +0.4%      0.2161       -1.20      large        more_contact
100% EGMA vs No Polymer        +1.6%      0.0065*      -4.25      large        more_contact
------------------------------------------------------------------------------------------
* p < 0.05
Positive % change = more frames below threshold (more contact)

One-way ANOVA
--------------------------------------------------
Distance metric:
  F-statistic: 3.901
  p-value:     0.0512
  Significant: No (alpha=0.05)
Fraction metric:
  F-statistic: 5.880
  p-value:     0.0171
  Significant: Yes (alpha=0.05)

Interpretation
--------------------------------------------------------------------------------
Closest mean distance: 100% SBMA (7.46 A)
  -> 7.0% closer than control (No Polymer)
  -> Statistically significant (p=0.0451, d=0.87 [large])

Highest contact fraction: 100% EGMA (1.6% below threshold)

Analysis completed: 2026-02-16 21:03:40
PolyzyMD version: 1.0.0

Interpreting Results

Primary metric (Mean Distance):

  • Lower mean distance = the selected atoms sit closer together on average

  • Ranking from lowest to highest (Rank 1 = closest)

  • Negative % change vs control = closer than control (an improvement when proximity is the goal)

Secondary metric (Fraction Below Threshold):

  • Only computed if threshold is specified in config

  • Higher fraction = more frames with close contact

  • Ranking from highest to lowest (Rank 1 = most contact)

  • Positive % change vs control = more frames in contact than control (an improvement when contact is the goal)

Why dual metrics?

  • Mean distance captures typical behavior

  • Fraction below threshold captures extreme events (e.g., catalytic encounters)

  • A condition might have similar mean distance but more frequent close approaches

Statistical Analysis

Both metrics undergo independent statistical testing:

| Test | Applied To | Interpretation |
|------|------------|----------------|
| t-test | Each condition vs control | p < 0.05 = significant difference |
| Cohen’s d | Each comparison | Effect magnitude (regardless of p-value) |
| ANOVA | All conditions | Any condition differs? (3+ conditions) |
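
For intuition about the ANOVA row, the F-statistic is the ratio of between-condition to within-condition variance. The sketch below computes it by hand from hypothetical per-replicate mean distances; it omits the p-value, which requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
from statistics import mean

def one_way_anova_f(groups):
    """F-statistic for a one-way ANOVA over a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical per-replicate mean distances (Angstroms), three conditions
groups = [
    [7.0, 7.5, 8.0],
    [7.5, 8.0, 8.5],
    [9.0, 9.5, 10.0],
]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.3f}")
```

A large F means the spread between conditions dominates the replicate-to-replicate spread within each condition.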

Effect size interpretation:

| Cohen’s d (absolute value) | Interpretation |
|----------------------------|----------------|
| < 0.2 | Negligible |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| > 0.8 | Large |
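
A minimal sketch of how an effect size and its label could be computed, assuming the pooled-standard-deviation form of Cohen's d. The sign convention (first group minus second) and the handling of values exactly on a boundary are assumptions and may not match PolyzyMD's output.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation (one common convention)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

def effect_label(d):
    """Map |d| onto the labels used in the table above."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Hypothetical per-replicate mean distances (Angstroms)
treated = [7.0, 7.5, 8.0]
control = [7.5, 8.0, 8.5]
d = cohens_d(treated, control)
print(f"d = {d:.2f} ({effect_label(d)})")
```

Because the label depends only on the magnitude, a strongly negative d is still reported as a large effect.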

CLI Reference for Distances

polyzymd compare run distances [OPTIONS]

Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --recompute                     Force recompute distance analysis
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging

Python API for Distances Comparison

from polyzymd.compare import ComparisonConfig
from polyzymd.compare.comparators import DistancesComparator
from polyzymd.compare.distances_formatters import format_distances_result

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get distances settings
distances_settings = config.analysis_settings.get("distances")

# Run comparison
comparator = DistancesComparator(
    config=config,
    analysis_settings=distances_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access primary ranking (by mean distance)
print(f"Closest: {result.ranking[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.overall_mean_distance:.2f} ± {cond.overall_sem_distance:.3f} A")

# Access secondary ranking (by fraction below threshold)
if result.ranking_by_fraction:
    print(f"\nHighest contact: {result.ranking_by_fraction[0]}")
    for cond in result.conditions:
        if cond.overall_fraction_below is not None:
            print(f"{cond.label}: {cond.overall_fraction_below*100:.1f}%")

# Access per-pair details
for cond in result.conditions:
    print(f"\n{cond.label}:")
    for pair in cond.pair_summaries:
        print(f"  {pair.label}: {pair.mean_distance:.2f} ± {pair.sem_distance:.2f} A")

# Format output
print(format_distances_result(result, format="markdown"))

# Save result
result.save("results/distances_comparison.json")

Loading Saved Distance Results

from polyzymd.compare.results import DistanceComparisonResult

# Load from JSON
result = DistanceComparisonResult.load("results/distances_comparison_my_study.json")

# Access condition data
control = result.get_condition("No Polymer")
print(f"Control mean distance: {control.overall_mean_distance:.2f} A")

# Get pairwise comparison
comp = result.get_comparison("100% SBMA")
if comp and comp.distance_significant:
    print(f"SBMA significantly closer (p={comp.distance_p_value:.4f})")

Use Cases for Distance Comparison

| Use Case | Configuration |
|----------|---------------|
| Substrate binding | Distance from catalytic residues to substrate atoms |
| Active site geometry | Similar to triad, but for non-catalytic interactions |
| Polymer-residue proximity | Distance from polymer termini to specific residues |
| Conformational changes | Distance between domains or loops |

Distances vs Catalytic Triad Comparison

| Feature | compare run distances | compare run triad |
|---------|-----------------------|-------------------|
| Metric | Mean distance + fraction | Simultaneous contact fraction |
| Pairs | Any atom pairs | Pre-defined triad geometry |
| Ranking | Dual (distance + fraction) | Single (simultaneous contact) |
| Use case | General distance tracking | Catalytic geometry integrity |

Tip

Use distances for monitoring specific interactions with dual-metric analysis. Use triad when all pairs must be in contact simultaneously (catalytic triad geometry).
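
The difference between per-pair and simultaneous contact can be sketched with a few hypothetical frames. Both helper functions are illustrative only and are not part of the PolyzyMD API.

```python
def per_pair_fractions(frames, thresholds):
    """Fraction of frames in which each pair, taken alone, is below its threshold."""
    n = len(frames)
    return [
        sum(frame[i] < t for frame in frames) / n
        for i, t in enumerate(thresholds)
    ]

def simultaneous_fraction(frames, thresholds):
    """Fraction of frames in which every pair is below its threshold at once."""
    hits = sum(
        all(d < t for d, t in zip(frame, thresholds)) for frame in frames
    )
    return hits / len(frames)

# Hypothetical frames: each tuple holds the distances (Angstroms) of two pairs
frames = [(3.0, 3.2), (3.1, 4.5), (4.0, 3.3), (4.2, 4.4)]
thresholds = (3.5, 3.5)

print(per_pair_fractions(frames, thresholds))     # [0.5, 0.5]
print(simultaneous_fraction(frames, thresholds))  # 0.25
```

Each pair is in contact half the time, yet both are in contact together in only a quarter of the frames, which is the quantity a triad-style analysis reports.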

See Also