How to Compare Simulation Conditions
Use this guide when you already have completed PolyzyMD simulations and want to
compare conditions with the polyzymd compare workflow.
You will:
define a comparison.yaml
validate it
run stable comparisons
generate figures with polyzymd compare plot-all
Important
For the v1.2.0 presentation release, the stable comparison stack is RMSF,
contacts, distances, catalytic triad, and secondary structure. Binding
preference, exposure dynamics, binding free energy, and polymer affinity remain
available, but PolyzyMD labels them as experimental.
Note
If you have not yet run per-condition analyses, start with Tutorial: Analyze a Study from Finished Simulations.
Before You Start
Make sure each condition already has:
a config.yaml
finished trajectories
per-condition analysis results from polyzymd analyze run
For metric-specific setup, use the individual analysis guides before comparing conditions.
Step 1: Create a Comparison Workspace
polyzymd compare init -n polymer_stability_study
cd polymer_stability_study
This creates:
polymer_stability_study/
├── comparison.yaml
├── figures/
├── results/
└── structures/
Step 2: Define a Minimal comparison.yaml
Start with one stable analysis. RMSF is a good first comparison because it has no extra structural dependencies.
name: "polymer_stability_study"
description: "Effect of polymer composition on enzyme flexibility"
control: "No Polymer"
conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
analysis_settings:
  rmsf:
    selection: "protein and name CA"
comparison_settings:
  rmsf: {}
You can add other stable analyses later by extending analysis_settings and
comparison_settings.
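Before running the validator, it can help to see the pairing rule spelled out. The sketch below is not part of PolyzyMD: check_comparison is a hypothetical helper that takes the file contents as a plain dict (for example, as loaded by a YAML parser) and reports analyses that lack a matching comparison_settings entry, a control label that is not among the conditions, and a study with fewer than two conditions.

```python
def check_comparison(cfg: dict) -> list[str]:
    """Return human-readable problems found in a comparison config dict.

    Hypothetical helper for illustration; polyzymd compare validate is the
    authoritative check.
    """
    problems = []
    labels = [c.get("label") for c in cfg.get("conditions", [])]
    if len(labels) < 2:
        problems.append("need at least 2 conditions")
    control = cfg.get("control")
    if control is not None and control not in labels:
        problems.append(f"control {control!r} is not a condition label")
    analyses = set(cfg.get("analysis_settings") or {})
    comparisons = set(cfg.get("comparison_settings") or {})
    if not analyses:
        problems.append("no analyses are enabled")
    # Every analysis needs a matching comparison entry, and vice versa.
    for name in sorted(analyses - comparisons):
        problems.append(f"{name}: missing comparison_settings entry")
    for name in sorted(comparisons - analyses):
        problems.append(f"{name}: missing analysis_settings entry")
    return problems


example = {
    "name": "polymer_stability_study",
    "control": "No Polymer",
    "conditions": [
        {"label": "No Polymer", "replicates": [1, 2, 3]},
        {"label": "100% SBMA", "replicates": [1, 2, 3]},
    ],
    "analysis_settings": {"rmsf": {"selection": "protein and name CA"}},
    "comparison_settings": {"rmsf": {}},
}
print(check_comparison(example))  # [] means no problems were found
```

An empty list means the structure looks consistent; the real validator may apply additional checks that this sketch does not attempt.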
Step 3: Validate the Config
polyzymd compare validate
You should see a passing summary with the study name, condition count, and the configured analysis sections.
Step 4: Run Comparisons
Run One Stable Comparison
polyzymd compare run rmsf
This loads the existing per-condition results, performs the statistical
comparison, prints a formatted summary, and writes JSON to results/.
Run All Enabled Comparisons
Once you have multiple stable analyses configured, use:
polyzymd compare run-all
If you want figures immediately afterward:
polyzymd compare run-all --plot
Step 5: Generate Figures
For a final smoke test of the comparison workspace:
polyzymd compare plot-all --list-available
polyzymd compare plot-all
--list-available is useful because it shows which plot types are enabled and
which ones are marked as experimental.
Step 6: Check the Outputs
After a successful run, expect files like these:
results/
├── rmsf_comparison_polymer_stability_study.json
├── contacts_comparison_polymer_stability_study.json
├── distances_comparison_polymer_stability_study.json
└── triad_comparison_polymer_stability_study.json
figures/
├── rmsf_comparison.png
├── rmsf_profile.png
├── triad_kde_panel.png
└── ...
If your smoke test is polyzymd compare plot-all, success means:
the command completes without error
stable plots render normally
experimental plots, if enabled, render with explicit experimental labeling
Adding More Stable Analyses
Common next additions to comparison.yaml are:
contacts for polymer coverage and contact fraction
distances for custom atom-pair distances
catalytic_triad for active-site geometry
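For end-to-end examples of these workflows, see the sections on comparing catalytic triad geometry, polymer-protein contacts, and distances later in this guide.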
Experimental Workflows
Experimental workflows remain available, but they are not the default path for the presentation release. Use them only when you explicitly want those metrics: binding preference, exposure dynamics, binding free energy, or polymer affinity.
Troubleshooting
config path not found
Paths in comparison.yaml are resolved relative to the location of
comparison.yaml, not your current shell directory.
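To make the resolution rule concrete, here is a minimal sketch using only the standard library. resolve_config is a hypothetical helper and the /work/... paths are made up for illustration; the point is that the condition's config entry is joined to the folder containing comparison.yaml, not to the shell's working directory.

```python
import posixpath


def resolve_config(comparison_yaml: str, config_entry: str) -> str:
    """Resolve a condition's config path against the folder holding comparison.yaml."""
    base_dir = posixpath.dirname(comparison_yaml)
    return posixpath.normpath(posixpath.join(base_dir, config_entry))


# Hypothetical locations, for illustration only.
resolved = resolve_config(
    "/work/polymer_stability_study/comparison.yaml",
    "../noPoly_enzyme_DMSO/config.yaml",
)
print(resolved)  # /work/noPoly_enzyme_DMSO/config.yaml
```

If the printed path does not exist on disk, fix the config entry rather than changing directories.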
No analyses are enabled
You need at least one section under analysis_settings and a matching entry
under comparison_settings.
plot-all runs but expected figures are missing
Check that the corresponding comparison JSON files already exist in results/
and use polyzymd compare plot-all --list-available to verify the enabled plot
types.
RMSF Comparison: polymer_stability_study
============================================================
Equilibration: 10ns
Selection: protein and name CA
Control: No Polymer
Condition Summary (ranked by RMSF, lowest first)
------------------------------------------------------------
Rank  Condition      Mean RMSF    SEM      N
------------------------------------------------------------
1     100% SBMA      0.551 A      0.0344   2
2     100% EGMA      0.597 A      0.0725   3
3     No Polymer     0.715 A      0.0203   3 *
4     50/50 Mix      0.728 A      0.0336   3
------------------------------------------------------------
* = control condition
Pairwise Comparisons
--------------------------------------------------------------------------------
Comparison                   % Change   p-value    Cohen's d   Effect
--------------------------------------------------------------------------------
100% SBMA vs No Polymer      -22.9%     0.0211*    4.06        large
100% EGMA vs No Polymer      -16.4%     0.1944     1.27        large
50/50 Mix vs No Polymer      +1.9%      0.7445     -0.29       small
--------------------------------------------------------------------------------
* p < 0.05
Output Fields Explained
| Field | Meaning |
|---|---|
| Mean RMSF | Average RMSF across all replicates (Angstroms) |
| SEM | Standard error of the mean |
| N | Number of replicates |
| % Change | Relative to control (negative = more stable) |
| p-value | Two-sample t-test p-value |
| Cohen’s d | Effect size (positive = control higher) |
| Effect | Effect size interpretation |
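The first few fields can be recomputed by hand from replicate values. The sketch below is not PolyzyMD code: it assumes SEM is the sample standard deviation divided by the square root of N, and that % Change is measured relative to the control mean. Under those assumptions, the control replicate values shown in the saved JSON example (0.755, 0.693, 0.696) give the documented mean and SEM.

```python
import math
import statistics


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(N))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def percent_change(treatment_mean, control_mean):
    """Relative change of a treatment mean with respect to the control mean."""
    return 100.0 * (treatment_mean - control_mean) / control_mean


control_mean, control_sem = mean_and_sem([0.755, 0.693, 0.696])
print(f"{control_mean:.3f} +/- {control_sem:.3f} A")   # 0.715 +/- 0.020 A
print(f"{percent_change(0.551, 0.715):+.1f}%")          # -22.9%
```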
Ranking
Conditions are ranked by mean RMSF:
Rank 1 = Lowest RMSF = Most stable
Lower RMSF indicates less flexibility/movement
Statistical Analysis
T-Tests
Each condition is compared to the control (or all pairs if no control) using an independent two-sample t-test:
p < 0.05: Statistically significant difference (marked with *)
p ≥ 0.05: Not statistically significant
Tip
With small sample sizes (N=3), you can only detect large effects. Non-significant p-values don’t mean “no effect” – just insufficient evidence. See Best Practices Guide for interpretation guidelines.
Effect Size (Cohen’s d)
Cohen’s d quantifies the magnitude of the difference:
| Cohen’s d (magnitude) | Interpretation |
|---|---|
| < 0.2 | Negligible |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| > 0.8 | Large |
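Why report effect size? A large effect (magnitude of d above 0.8) can point to a meaningful difference even when p ≥ 0.05 because of a small sample size. With only 2-3 replicates the estimate of d is itself imprecise, so treat it as suggestive rather than conclusive.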
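As a rough illustration of how an effect size and its label could be derived, the sketch below computes Cohen's d with a pooled sample standard deviation and maps its magnitude onto the table's categories. The pooled-SD formula and the handling of exact boundary values (0.2, 0.5, 0.8) are assumptions, not confirmed PolyzyMD behavior. The sign follows the convention stated earlier (positive = control higher), and the treatment values are made up.

```python
import math
import statistics


def cohens_d(control, treatment):
    """Cohen's d (control minus treatment) using a pooled sample SD."""
    n1, n2 = len(control), len(treatment)
    pooled_var = (
        (n1 - 1) * statistics.variance(control)
        + (n2 - 1) * statistics.variance(treatment)
    ) / (n1 + n2 - 2)
    return (statistics.mean(control) - statistics.mean(treatment)) / math.sqrt(pooled_var)


def effect_label(d):
    """Map the magnitude of d onto the interpretation table (boundaries assumed)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


control = [0.755, 0.693, 0.696]   # control replicates from the JSON example
treatment = [0.60, 0.58, 0.62]    # made-up treatment replicates
d = cohens_d(control, treatment)
print(f"d = {d:.2f} ({effect_label(d)})")
```

Because the label depends only on the magnitude, a negative d such as -0.29 is still reported as "small".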
ANOVA
For 3+ conditions, one-way ANOVA tests whether any condition differs:
One-way ANOVA
----------------------------------------
F-statistic: 3.151
p-value: 0.0955
Significant: No (alpha=0.05)
ANOVA p < 0.05 indicates at least one condition differs from the others. Use pairwise comparisons to identify which ones.
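For readers who want to see where the F-statistic comes from, here is a minimal standard-library sketch of the textbook one-way ANOVA calculation. It is not PolyzyMD's implementation and the numbers are made up. Turning F into a p-value needs the F distribution, which the standard library lacks; scipy.stats.f_oneway performs both steps.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for lists of replicate values."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: scatter of replicates around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up replicate values for three conditions.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")  # F(2, 6) = 13.000
```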
Output Formats
Table (Default)
Console-friendly ASCII table:
polyzymd compare rmsf --format table
Markdown
Publication-ready tables for documentation:
polyzymd compare rmsf --format markdown -o report.md
JSON
Machine-readable for further analysis:
polyzymd compare rmsf --format json -o results.json
Working Example
Comparing Polymer Stabilization
Setup for a study comparing enzyme stability with different polymer coatings:
comparison.yaml:
name: "SBMA_EGMA_stabilization"
description: "Does SBMA stabilize LipA better than EGMA?"
control: "No Polymer"
conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2]
  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "50/50 SBMA:EGMA"
    config: "../SBMA_EGMA_50_50_DMSO/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
analysis_settings:
  rmsf:
    selection: "protein and name CA"
comparison_settings:
  rmsf: {}
Run:
polyzymd compare rmsf
from polyzymd.compare import ComparisonConfig, RMSFComparator
# Load comparison configuration
config = ComparisonConfig.from_yaml("comparison.yaml")
# Get RMSF settings from analysis_settings
rmsf_settings = config.analysis_settings.get("rmsf")
# Run RMSF comparison
comparator = RMSFComparator(
    config=config,
    rmsf_settings=rmsf_settings,
    equilibration="10ns",
)
result = comparator.compare()
# Print summary table
print(f"RMSF Comparison: {result.name}")
print(f"Control: {result.control_label}")
print()
# Ranked conditions
print("Conditions (ranked by RMSF, lowest first):")
for i, label in enumerate(result.ranking, 1):
    cond = result.get_condition(label)
    marker = " *" if label == result.control_label else ""
    print(f" {i}. {label}: {cond.mean_rmsf:.3f} ± {cond.sem_rmsf:.3f} Å{marker}")
print()
# Pairwise comparisons
print("Pairwise comparisons vs control:")
for comp in result.pairwise_comparisons:
    sig = "*" if comp.significant else ""
    print(f" {comp.condition_b} vs {comp.condition_a}: "
          f"{comp.percent_change:+.1f}%, p={comp.p_value:.3f}{sig}, d={comp.cohens_d:.2f}")
# Save results
result.save("results/rmsf_comparison.json")
Key findings from output:
| Condition | Mean RMSF | vs Control | Significant? |
|---|---|---|---|
| 100% SBMA | 0.551 A | -22.9% | Yes (p=0.021) |
| 100% EGMA | 0.597 A | -16.4% | No (p=0.194) |
| 50/50 Mix | 0.728 A | +1.9% | No (p=0.745) |
Interpretation:
100% SBMA significantly stabilizes the enzyme (22.9% lower RMSF, p < 0.05)
100% EGMA shows a large effect (d=1.27) but isn’t statistically significant
50/50 Mix shows no benefit over the control
Saved Results
Results are automatically saved to results/:
my_study/
├── comparison.yaml
├── results/
│ └── rmsf_comparison_my_study.json # Full results
└── figures/
Result JSON Structure
{
  "metric": "rmsf",
  "name": "my_study",
  "control_label": "No Polymer",
  "conditions": [
    {
      "label": "No Polymer",
      "mean_rmsf": 0.715,
      "sem_rmsf": 0.020,
      "n_replicates": 3,
      "replicate_values": [0.755, 0.693, 0.696]
    },
    ...
  ],
  "pairwise_comparisons": [
    {
      "condition_a": "No Polymer",
      "condition_b": "100% SBMA",
      "percent_change": -22.9,
      "p_value": 0.0211,
      "cohens_d": 4.06,
      "significant": true
    },
    ...
  ],
  "ranking": ["100% SBMA", "100% EGMA", "No Polymer", "50/50 Mix"],
  ...
}
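Because the saved file is plain JSON, it can also be inspected without PolyzyMD. The sketch below uses only the standard library and an inline string that mirrors the fields shown above (with the elided parts left out); significant_pairs is a hypothetical helper. In practice you would read the text from a file such as results/rmsf_comparison_my_study.json.

```python
import json

# Inline stand-in for a saved result file, limited to fields shown above.
saved = json.loads("""
{
  "metric": "rmsf",
  "control_label": "No Polymer",
  "pairwise_comparisons": [
    {"condition_a": "No Polymer", "condition_b": "100% SBMA",
     "percent_change": -22.9, "p_value": 0.0211,
     "cohens_d": 4.06, "significant": true}
  ],
  "ranking": ["100% SBMA", "100% EGMA", "No Polymer", "50/50 Mix"]
}
""")


def significant_pairs(result: dict) -> list[str]:
    """Summarize the pairwise comparisons flagged as significant."""
    return [
        f"{c['condition_b']} vs {c['condition_a']}: "
        f"{c['percent_change']:+.1f}% (p={c['p_value']:.4f})"
        for c in result["pairwise_comparisons"]
        if c["significant"]
    ]


print(significant_pairs(saved))
# ['100% SBMA vs No Polymer: -22.9% (p=0.0211)']
```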
Viewing Saved Results
# Display a saved result
polyzymd compare show results/rmsf_comparison_my_study.json
# Different format
polyzymd compare show results/rmsf_comparison_my_study.json --format markdown
CLI Reference
polyzymd compare init
Create a new comparison project:
polyzymd compare init NAME [OPTIONS]
Arguments:
  NAME                   Project name (creates directory)
Options:
  --eq-time TEXT         Default equilibration time [default: 10ns]
  -o, --output-dir PATH  Parent directory [default: current]
polyzymd compare rmsf
Run RMSF comparison:
polyzymd compare rmsf [OPTIONS]
Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --selection TEXT                Override atom selection
  --recompute                     Force recompute RMSF
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging
polyzymd compare validate
Check configuration:
polyzymd compare validate [OPTIONS]
Options:
  -f, --file PATH  Config file [default: comparison.yaml]
polyzymd compare show
Display saved results:
polyzymd compare show RESULT_FILE [OPTIONS]
Arguments:
  RESULT_FILE                     Path to saved JSON
Options:
  --format [table|markdown|json]  Output format [default: table]
polyzymd compare plot
Generate publication-ready plots from saved results:
polyzymd compare plot RESULT_FILE [OPTIONS]
Arguments:
  RESULT_FILE               Path to saved comparison JSON
Options:
  -o, --output-dir PATH     Output directory [default: figures/]
  --format [png|pdf|svg]    Image format [default: png]
  --dpi INTEGER             Resolution for PNG [default: 150]
  --summary / --no-summary  Generate summary panel [default: yes]
  --show / --no-show        Display interactively [default: no]
Generating Plots
The polyzymd compare plot command creates publication-ready figures from
comparison results.
Quick Start
# Generate all plots
polyzymd compare plot results/rmsf_comparison_my_study.json
# High resolution for publication
polyzymd compare plot results/rmsf_comparison_my_study.json --dpi 300
# PDF format (vector graphics)
polyzymd compare plot results/rmsf_comparison_my_study.json --format pdf
# Preview interactively
polyzymd compare plot results/rmsf_comparison_my_study.json --show
from pathlib import Path
from polyzymd.compare.config import ComparisonConfig
from polyzymd.compare.plotter import ComparisonPlotter
# Load comparison config
config = ComparisonConfig.from_yaml("comparison.yaml")
# Generate all plots via the plotter registry
plotter = ComparisonPlotter(config)
paths = plotter.plot_all()
print(f"Generated {len(paths)} plots")
Generated Plots
The plotter registry automatically generates all applicable plots for each
analysis type configured in comparison.yaml. For RMSF analysis, this includes:
| Plot | Description |
|---|---|
| | Bar chart of mean RMSF by condition with SEM error bars |
| | Per-residue RMSF line plot with optional SS annotation |
Other analysis types (contacts, distances, secondary structure, etc.) generate
their own plot sets. See the plot_settings section in comparison.yaml for
per-analysis customization options.
Example Output
After running:
polyzymd compare plot-all
You get all configured plots saved to the output_dir specified in
plot_settings (default: figures/).
Python API
Programmatic Comparison
from polyzymd.compare import ComparisonConfig, RMSFComparator
# Load configuration (must have analysis_settings.rmsf section)
config = ComparisonConfig.from_yaml("comparison.yaml")
# Get RMSF settings
rmsf_settings = config.analysis_settings.get("rmsf")
# Run comparison
comparator = RMSFComparator(
    config=config,
    rmsf_settings=rmsf_settings,
    equilibration="10ns",
)
result = comparator.compare()
# Access results
print(f"Most stable: {result.ranking[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.mean_rmsf:.3f} +/- {cond.sem_rmsf:.3f} A")
# Statistical comparisons
for comp in result.pairwise_comparisons:
    if comp.significant:
        print(f"{comp.condition_b} vs {comp.condition_a}: "
              f"p={comp.p_value:.4f}, d={comp.cohens_d:.2f}")
Formatting Results
from polyzymd.compare import format_markdown, format_console_table
# Get markdown output
md_text = format_markdown(result)
with open("report.md", "w") as f:
    f.write(md_text)
# Console output
print(format_console_table(result))
Loading Saved Results
from polyzymd.compare import ComparisonResult
# Load from JSON
result = ComparisonResult.load("results/rmsf_comparison_my_study.json")
# Access data
control = result.get_condition("No Polymer")
print(f"Control RMSF: {control.mean_rmsf:.3f} A")
# Get specific comparison
comp = result.get_comparison("100% SBMA")
print(f"SBMA vs control: {comp.percent_change:+.1f}%, p={comp.p_value:.4f}")
Plotting with Python
from pathlib import Path
from polyzymd.compare.config import ComparisonConfig
from polyzymd.compare.plotter import ComparisonPlotter
# Load config and generate all plots
config = ComparisonConfig.from_yaml("comparison.yaml")
plotter = ComparisonPlotter(config)
paths = plotter.plot_all()
for p in paths:
    print(f" {p}")
Plot Customization
Plot appearance is controlled via the plot_settings section of
comparison.yaml. You can set global options (DPI, format, color palette)
and per-analysis overrides (figure sizes, which plot types to generate).
plot_settings:
  output_dir: "figures/"
  format: "png"
  dpi: 300
  style: "publication"
  color_palette: "tab10"
  # Per-analysis overrides
  rmsf:
    show_error: true
    highlight_residues: [77, 133, 156]
    figsize_profile: [14, 4]
    figsize_comparison: [8, 6]
See the theme: block in comparison.yaml for fine-grained control over
font sizes, bar styling, line widths, spine visibility, and legend placement.
Troubleshooting
“Config not found for ‘Condition’”
Cause: Path in config: is incorrect
Fix: Check that the path exists relative to comparison.yaml:
ls ../projects/my_condition/config.yaml
“Need at least 2 conditions”
Cause: Only one condition defined in comparison.yaml
Fix: Add at least one more condition to compare
High p-value Despite Large Effect
Cause: Small sample size (N=2-3)
This is expected. With few replicates, you can only detect very large effects. A large Cohen’s d hints that a real difference may exist, but the estimate is imprecise at this sample size; run more replicates to test the difference with adequate statistical power.
Results Don’t Match Manual Calculation
Cause: Different equilibration time or selection string
Fix: Ensure --eq-time and --selection match what you used for
individual RMSF calculations. Check defaults: in comparison.yaml.
Comparing Catalytic Triad Geometry
In addition to RMSF (global flexibility), you can compare catalytic triad integrity across conditions. This is useful for enzymes where active site geometry is crucial for catalytic function.
What is Simultaneous Contact Fraction?
The catalytic triad comparison analyzes the simultaneous contact fraction – the percentage of simulation frames where ALL distance pairs in your triad are below the contact threshold at the same time. Higher values indicate better triad integrity.
For example, a Ser-His-Asp catalytic triad:
95% simultaneous contact = triad is intact most of the time
50% simultaneous contact = triad frequently disrupted
10% simultaneous contact = triad rarely intact
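The definition above can be sketched in a few lines. This is an illustration rather than PolyzyMD's implementation: simultaneous_contact_fraction is a hypothetical helper, the per-frame distances are made up, and "below the threshold" is taken as a strict inequality.

```python
def simultaneous_contact_fraction(pair_distances, threshold=3.5):
    """Fraction of frames in which every pair distance is below the threshold."""
    series = list(pair_distances.values())
    n_frames = len(series[0])
    if any(len(s) != n_frames for s in series):
        raise ValueError("all pairs need the same number of frames")
    # A frame counts only if ALL pairs are in contact at the same time.
    intact = sum(all(s[i] < threshold for s in series) for i in range(n_frames))
    return intact / n_frames


# Made-up per-frame distances (Angstroms) for the two pairs of a triad.
distances = {
    "Asp133-His156": [2.8, 3.1, 3.9, 2.9, 3.0],
    "His156-Ser77": [2.7, 3.6, 2.8, 2.9, 3.4],
}
fraction = simultaneous_contact_fraction(distances)
print(f"{fraction * 100:.0f}% simultaneous contact")  # 60% simultaneous contact
```

In this example each pair alone is in contact in 80% of frames, yet the triad is intact in only 60%, which is why the simultaneous metric is stricter than any single-pair fraction.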
Adding Catalytic Triad to comparison.yaml
Add a catalytic_triad section to your analysis_settings:
name: "polymer_stability_study"
control: "No Polymer"
conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
# Define your enzyme's catalytic triad in analysis_settings
analysis_settings:
  catalytic_triad:
    name: "LipA_Ser-His-Asp"
    description: "Lipase A catalytic triad"
    threshold: 3.5  # Angstroms (H-bond cutoff)
    pairs:
      - label: "Asp133-His156"
        selection_a: "midpoint(protein and resid 133 and name OD1 OD2)"
        selection_b: "protein and resid 156 and name ND1"
      - label: "His156-Ser77"
        selection_a: "protein and resid 156 and name NE2"
        selection_b: "protein and resid 77 and name OG"
# Must have corresponding entry in comparison_settings
comparison_settings:
  catalytic_triad: {}
Running Triad Comparison
# From your comparison project directory
polyzymd compare triad
# With options
polyzymd compare triad --eq-time 10ns --format markdown
polyzymd compare triad -o triad_report.md
from polyzymd.compare import ComparisonConfig, TriadComparator
# Load configuration (must have analysis_settings.catalytic_triad section)
config = ComparisonConfig.from_yaml("comparison.yaml")
# Get triad settings from analysis_settings
triad_settings = config.analysis_settings.get("catalytic_triad")
# Run triad comparison
comparator = TriadComparator(
    config=config,
    triad_settings=triad_settings,
    equilibration="10ns",
)
result = comparator.compare()
# Print results
print(f"Best triad integrity: {result.ranking[0]}")
for cond in result.conditions:
    contact_pct = cond.mean_simultaneous_contact * 100
    sem_pct = cond.sem_simultaneous_contact * 100
    print(f"{cond.label}: {contact_pct:.1f}% ± {sem_pct:.1f}%")
# Save to JSON
result.save("results/triad_comparison.json")
Example Output
Catalytic Triad Comparison: polymer_stability_study
======================================================================
Triad: LipA_Ser-His-Asp
Description: Lipase A catalytic triad
Pairs: Asp133-His156, His156-Ser77
Contact threshold: 3.5 A
Equilibration: 10ns
Control: No Polymer
Condition Summary (ranked by simultaneous contact, highest first)
----------------------------------------------------------------------
Rank  Condition      Contact %    SEM      N
----------------------------------------------------------------------
1     100% SBMA      87.3%        2.15%    3
2     No Polymer     72.1%        3.42%    3 *
3     50/50 Mix      68.5%        4.21%    3
----------------------------------------------------------------------
* = control condition
Pairwise Comparisons
-------------------------------------------------------------------------------------
Comparison                   % Change   p-value    Cohen's d   Effect
-------------------------------------------------------------------------------------
100% SBMA vs No Polymer      +21.1%     0.0156*    2.89        large
50/50 Mix vs No Polymer      -5.0%      0.5234     -0.52       medium
-------------------------------------------------------------------------------------
* p < 0.05
Positive % change = improved triad contact
Interpretation
----------------------------------------------------------------------
Best triad integrity: 100% SBMA (87.3% simultaneous contact)
-> 21.1% higher than control (No Polymer)
-> Statistically significant (p=0.0156, d=2.89 [large])
Per-Pair Distance Table
The output also includes a per-pair distance summary showing how each individual H-bond distance compares across conditions:
Per-Pair Distances (Mean ± SEM across replicates)
------------------------------------------------------------------------------------------
Condition      Asp133-His156    His156-Ser77
------------------------------------------------------------------------------------------
100% SBMA      2.81±0.08        2.74±0.05
No Polymer     3.12±0.11        2.89±0.09
50/50 Mix      3.28±0.15        2.95±0.12
------------------------------------------------------------------------------------------
Interpreting Results
| Contact % | Interpretation |
|---|---|
| > 90% | Excellent triad integrity |
| 70-90% | Good triad integrity |
| 50-70% | Moderate disruption |
| < 50% | Significant triad disruption |
Key metrics:
% Change: Positive = more triad contact = better
p-value: < 0.05 indicates statistically significant difference
Cohen’s d: Effect size magnitude
CLI Reference for Triad
polyzymd compare triad [OPTIONS]
Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --recompute                     Force recompute triad analysis
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging
Python API for Triad Comparison
from polyzymd.compare import ComparisonConfig, TriadComparator, format_triad_result
# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")
# Get triad settings from analysis_settings
triad_settings = config.analysis_settings.get("catalytic_triad")
# Run triad comparison
comparator = TriadComparator(
    config=config,
    triad_settings=triad_settings,
    equilibration="10ns",
)
result = comparator.compare()
# Access results
print(f"Best triad integrity: {result.ranking[0]}")
for cond in result.conditions:
    contact_pct = cond.mean_simultaneous_contact * 100
    print(f"{cond.label}: {contact_pct:.1f}% ± {cond.sem_simultaneous_contact*100:.1f}%")
# Format output
print(format_triad_result(result, format="markdown"))
# Save result
result.save("results/triad_comparison.json")
Loading Saved Triad Results
from polyzymd.compare import TriadComparisonResult
# Load from JSON
result = TriadComparisonResult.load("results/triad_comparison_my_study.json")
# Access condition data
control = result.get_condition("No Polymer")
print(f"Control contact: {control.mean_simultaneous_contact * 100:.1f}%")
# Get pairwise comparison
comp = result.get_comparison("100% SBMA")
if comp and comp.significant:
    print(f"SBMA significantly improves triad contact (p={comp.p_value:.4f})")
Comparing Polymer-Protein Contacts
Compare polymer-protein contact statistics across conditions to understand how different polymer compositions affect protein-polymer interactions.
Note
New to contacts analysis? Start with the Contacts Quick Start to run individual analyses, then return here to compare conditions.
Key Metrics
The contacts comparison analyzes two aggregate metrics:
| Metric | Description | Higher means… |
|---|---|---|
| Coverage | % of protein residues contacted by polymer | More extensive binding |
| Mean Contact Fraction | Average % of frames each residue is in contact | Stronger/more persistent binding |
Additionally, residence times by polymer type show how long each polymer type (e.g., SBMA vs EGMA) maintains contacts, revealing selectivity differences.
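To show how the two aggregate metrics differ, here is a small sketch over a made-up boolean contact matrix. It is not PolyzyMD's code. It assumes a residue counts toward coverage if it is contacted in at least one frame, and that the mean contact fraction averages over all residues, including those never contacted; the actual definitions may differ in these details.

```python
def contact_metrics(contact_frames):
    """Return (coverage, mean_contact_fraction).

    contact_frames[f][r] is True when residue r touches polymer in frame f.
    """
    n_frames = len(contact_frames)
    n_residues = len(contact_frames[0])
    # Fraction of frames in which each residue is in contact.
    per_residue = [
        sum(frame[r] for frame in contact_frames) / n_frames
        for r in range(n_residues)
    ]
    coverage = sum(frac > 0 for frac in per_residue) / n_residues
    mean_contact_fraction = sum(per_residue) / n_residues
    return coverage, mean_contact_fraction


# Made-up data: 4 frames x 4 residues.
frames = [
    [True, False, False, True],
    [True, False, False, False],
    [True, True, False, False],
    [False, True, False, False],
]
coverage, mean_fraction = contact_metrics(frames)
print(f"coverage={coverage:.2f}, mean contact fraction={mean_fraction:.3f}")
# coverage=0.75, mean contact fraction=0.375
```

Three of four residues are touched at some point (high coverage), but contacts are short-lived on average (lower mean contact fraction), which is the distinction the table draws.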
Running Contacts Comparison
# Basic comparison
polyzymd compare contacts
# With custom equilibration time
polyzymd compare contacts --eq-time 10ns
# Override polymer selection (only SBMA monomers)
polyzymd compare contacts --polymer-selection "resname SBM"
# Different output formats
polyzymd compare contacts --format markdown -o contacts_report.md
polyzymd compare contacts --format json
from polyzymd.compare import ComparisonConfig, ContactsComparator
# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")
# Get contacts settings from analysis_settings and comparison_settings
analysis_settings = config.analysis_settings.get("contacts")
comparison_settings = config.comparison_settings.get("contacts")
# Run comparison
comparator = ContactsComparator(
    config=config,
    analysis_settings=analysis_settings,
    comparison_settings=comparison_settings,
    equilibration="10ns",
)
result = comparator.compare()
# Access results
print(f"Highest coverage: {result.ranking_by_coverage[0]}")
print(f"Highest contact: {result.ranking_by_contact_fraction[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.coverage_mean*100:.1f}% coverage, "
          f"{cond.contact_fraction_mean*100:.1f}% contact")
# Save result
result.save("results/contacts_comparison.json")
Optional: Contacts Configuration in comparison.yaml
Add a contacts section to analysis_settings and comparison_settings:
name: "polymer_stability_study"
control: "No Polymer"
conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
# Configure contacts analysis in analysis_settings
analysis_settings:
  contacts:
    polymer_selection: "resname SBM EGM"  # MDAnalysis selection
    protein_selection: "protein"
    cutoff: 4.5  # Angstroms
    contact_criteria: "heavy_atom"
# Comparison-specific parameters in comparison_settings
comparison_settings:
  contacts:
    fdr_alpha: 0.05  # Benjamini-Hochberg FDR
    min_effect_size: 0.5  # Cohen's d threshold
    top_residues: 10  # Top residues to show
If no contacts section is provided, defaults are used.
Handling Conditions Without Polymer
Conditions without polymer atoms (e.g., “No Polymer” controls) are automatically excluded from contacts analysis since there’s nothing to measure. The command will warn you:
Note: 1 condition(s) auto-excluded (no polymer atoms): No Polymer
This is expected behavior. Comparisons will be made between polymer-containing conditions only.
Example Output
Polymer-Protein Contacts Comparison: polymer_ratio_study
================================================================================
Analysis: polymer_protein_contacts
Polymer selection: resname SBM EGM
Contact cutoff: 4.5 A
Contact criteria: heavy_atom
Equilibration: 10ns
Auto-excluded (no polymer): No Polymer
Condition Summary - Coverage (ranked, highest first)
--------------------------------------------------------------------------------
Rank  Condition              Coverage   SEM      N
--------------------------------------------------------------------------------
1     100% EGMA              88.4%      0.55%    3
2     25% SBMA / 75% EGMA    86.9%      1.44%    3
3     50% SBMA / 50% EGMA    82.7%      0.66%    3
4     75% SBMA / 25% EGMA    82.7%      1.57%    3
5     100% SBMA              74.9%      0.28%    2
--------------------------------------------------------------------------------
Condition Summary - Mean Contact Fraction (ranked, highest first)
--------------------------------------------------------------------------------
Rank  Condition              Contact %  SEM      N
--------------------------------------------------------------------------------
1     75% SBMA / 25% EGMA    30.2%      0.50%    3
2     25% SBMA / 75% EGMA    29.1%      5.35%    3
3     100% EGMA              25.3%      2.64%    3
4     100% SBMA              22.9%      1.47%    2
5     50% SBMA / 50% EGMA    22.9%      2.18%    3
--------------------------------------------------------------------------------
Residence Time by Polymer Type (frames)
--------------------------------------------------------------------------------
Condition              EGM        SBM
--------------------------------------------------------------------------------
100% SBMA              --         10.0±0.2
75% SBMA / 25% EGMA    7.8±0.2    9.3±0.0
50% SBMA / 50% EGMA    7.3±0.3    8.6±0.5
25% SBMA / 75% EGMA    7.1±0.7    10.5±0.4
100% EGMA              7.1±0.4    --
--------------------------------------------------------------------------------
Aggregate Comparisons
-----------------------------------------------------------------------------------------------
Comparison                 Metric           % Change   p-value    Cohen d    Effect
-----------------------------------------------------------------------------------------------
100% EGMA vs 100% SBMA     coverage         +18.1%     0.0004*    -16.64     large
100% EGMA vs 100% SBMA     mean contact f   +10.7%     0.5445     -0.62      medium
...
-----------------------------------------------------------------------------------------------
* p < 0.05; positive % change = more contact in treatment
One-way ANOVA
------------------------------------------------------------
Metric                   F-stat     p-value    Significant
------------------------------------------------------------
coverage                 18.323     0.0002     Yes*
mean contact fraction    1.200      0.3748     No
------------------------------------------------------------
Interpreting Results
Coverage rankings:
Higher coverage = polymer interacts with more of the protein surface
100% EGMA shows highest coverage (88.4%) - broader but possibly weaker binding
Contact fraction rankings:
Higher mean contact = more persistent interactions per residue
75% SBMA / 25% EGMA shows highest contact fraction (30.2%) - more stable binding
Residence time by polymer type:
SBMA (SBM) tends to have longer residence times than EGMA (EGM)
This suggests SBMA forms more persistent interactions
Useful for understanding polymer selectivity
Statistical tests:
ANOVA tests whether any condition differs overall
Pairwise comparisons with Benjamini-Hochberg FDR correction
Cohen’s d quantifies effect magnitude independent of sample size
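For reference, the Benjamini-Hochberg step-up procedure mentioned above can be sketched in a few lines of standard-library Python. This is the textbook procedure with made-up p-values, not PolyzyMD's own implementation; alpha plays the role of the fdr_alpha setting.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one bool per p-value: True where the test is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
# [True, False, False, False]
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042]))
# [True, True, True, True, True]
```

The second call shows the step-up behavior: 0.039 and 0.041 miss their own rank thresholds, but they are still rejected because the largest p-value (0.042) passes its threshold of 0.05.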
CLI Reference for Contacts
polyzymd compare contacts [OPTIONS]
Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --polymer-selection TEXT        Override polymer selection (MDAnalysis syntax)
  --cutoff FLOAT                  Override contact cutoff (Angstroms)
  --fdr-alpha FLOAT               FDR alpha for multiple testing correction
  --recompute                     Force recompute contacts analysis
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging
Python API for Contacts Comparison
from polyzymd.compare import (
    ComparisonConfig,
    ContactsComparator,
    format_contacts_result,
)
# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")
# Get contacts settings from analysis_settings and comparison_settings
analysis_settings = config.analysis_settings.get("contacts")
comparison_settings = config.comparison_settings.get("contacts")
# Run comparison
comparator = ContactsComparator(
    config=config,
    analysis_settings=analysis_settings,
    comparison_settings=comparison_settings,
    equilibration="10ns",
)
result = comparator.compare()
# Access results
print(f"Highest coverage: {result.ranking_by_coverage[0]}")
print(f"Highest contact: {result.ranking_by_contact_fraction[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.coverage_mean*100:.1f}% coverage, "
          f"{cond.contact_fraction_mean*100:.1f}% contact")
    # Residence time by polymer type
    for poly_type, (mean, sem) in cond.residence_time_by_polymer_type.items():
        print(f" {poly_type}: {mean:.1f} ± {sem:.1f} frames")
# Format output
print(format_contacts_result(result, format="markdown"))
# Save result
result.save("results/contacts_comparison.json")
Loading Saved Contacts Results
from polyzymd.compare import ContactsComparisonResult
# Load from JSON
result = ContactsComparisonResult.load("results/contacts_comparison_my_study.json")
# Access condition data
for cond in result.conditions:
    print(f"{cond.label}: coverage={cond.coverage_mean*100:.1f}%")
# Get aggregate comparisons
for comp in result.aggregate_comparisons:
    if comp.significant:
        print(f"{comp.condition_a} vs {comp.condition_b} ({comp.metric}): "
              f"p={comp.p_value:.4f}")
Contacts vs RMSF: Complementary Analyses
| Analysis | Question Answered |
|---|---|
| RMSF | Does polymer stabilize the enzyme (reduce flexibility)? |
| Contacts | Where and how strongly does polymer bind? |
| Combined | Do contact hotspots correlate with stabilization? |
For mechanistic insights correlating contacts with flexibility changes, see
polyzymd compare report (coming soon).
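Until a dedicated report is available, the "Combined" question can be explored by hand. The sketch below computes a Pearson correlation between per-residue contact fraction and the per-residue change in RMSF. All values are hypothetical, and the variable names are illustrative, not fields of any PolyzyMD result object.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-residue data for five residues
contact_fraction = [0.05, 0.10, 0.40, 0.55, 0.80]  # polymer contact fraction
delta_rmsf = [0.02, -0.01, -0.15, -0.20, -0.35]    # polymer minus control, Angstroms

r = pearson_r(contact_fraction, delta_rmsf)
print(f"Pearson r = {r:.3f}")
```

A strongly negative r in such a comparison would suggest that residues with more polymer contact also lose more flexibility. Correlation across residues does not by itself establish a mechanism.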
Comparing Distances Across Conditions
Compare inter-atomic distances across conditions with statistical analysis. This is useful for tracking specific interactions (e.g., substrate proximity, hydrogen bond distances) that may change with different polymer environments.
Note
New to distance analysis? Start with the Distance Analysis Quick Start to understand distance pair definitions and selection syntax, then return here to compare conditions.
Key Metrics
The distances comparison provides dual-metric ranking:
| Metric | Description | Ranking |
|---|---|---|
| Mean Distance | Average distance across trajectory (primary) | Lowest first (closer = better) |
| Fraction Below Threshold | % of frames below contact threshold (secondary) | Highest first (more contact = better) |
This dual approach captures both the typical distance and the frequency of close contacts.
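The two rankings can disagree. As a minimal sketch, the snippet below sorts three conditions both ways using the values from the Example Output section of this guide. The dictionary layout is an assumption made for illustration and is not the structure of PolyzyMD's result objects.

```python
# Values taken from the example output in this guide; the dict layout is hypothetical.
conditions = [
    {"label": "No Polymer", "mean_distance": 8.02, "fraction_below": 0.000},
    {"label": "100% SBMA", "mean_distance": 7.46, "fraction_below": 0.004},
    {"label": "100% EGMA", "mean_distance": 7.66, "fraction_below": 0.016},
]

# Primary ranking: lowest mean distance first.
ranking_by_distance = [
    c["label"] for c in sorted(conditions, key=lambda c: c["mean_distance"])
]

# Secondary ranking: highest fraction below threshold first.
ranking_by_fraction = [
    c["label"]
    for c in sorted(conditions, key=lambda c: c["fraction_below"], reverse=True)
]

print("By mean distance:", ranking_by_distance)
print("By fraction below threshold:", ranking_by_fraction)
```

Here 100% SBMA ranks first by mean distance while 100% EGMA ranks first by fraction below threshold, so it is worth reading both rankings before drawing a conclusion.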
Running Distances Comparison
# Basic comparison
polyzymd compare run distances -f comparison.yaml
# With custom equilibration time
polyzymd compare run distances -f comparison.yaml --eq-time 10ns
# Different output formats
polyzymd compare run distances -f comparison.yaml --format markdown
polyzymd compare run distances -f comparison.yaml --format json -o distances_report.json
The same comparison from Python:
from polyzymd.compare import ComparisonConfig
from polyzymd.compare.comparators import DistancesComparator
# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")
# Get distances settings from analysis_settings
distances_settings = config.analysis_settings.get("distances")
# Run comparison
comparator = DistancesComparator(
    config=config,
    analysis_settings=distances_settings,
    equilibration="10ns",
)
result = comparator.compare()
# Access results
print(f"Closest mean distance: {result.ranking[0]}")
if result.ranking_by_fraction:
    print(f"Highest contact fraction: {result.ranking_by_fraction[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.overall_mean_distance:.2f} A")
    if cond.overall_fraction_below is not None:
        print(f"  Contact fraction: {cond.overall_fraction_below*100:.1f}%")
# Save result
result.save("results/distances_comparison.json")
Adding Distances to comparison.yaml
Add a distances section to your analysis_settings:
name: "substrate_proximity_study"
control: "No Polymer"
conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
# Define distance pairs in analysis_settings
analysis_settings:
  distances:
    threshold: 3.5  # Global default threshold (Angstroms, optional)
    pairs:
      - label: "Catalytic H-bond"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 133 and name NE2"
        threshold: 3.5  # Per-pair threshold (overrides global)
      - label: "Lid Domain Opening"
        selection_a: "com(protein and resid 141-148)"
        selection_b: "com(protein and resid 281-289)"
        threshold: 15.0  # Different threshold for this pair
# Must have corresponding entry in comparison_settings
comparison_settings:
  distances: {}
Per-Pair Thresholds
Different distance pairs often have different biologically relevant thresholds:
| Type of Distance | Typical Threshold (Angstroms) |
|---|---|
| Hydrogen bond | 3.0 - 3.5 |
| Salt bridge | 4.0 - 4.5 |
| Aromatic stacking | 4.0 - 5.0 |
| Domain separation | 10 - 20 |
| Lid opening | 15 - 25 |
Threshold resolution order:
Per-pair threshold in the pair definition (highest priority)
Global threshold in the distances section (fallback)
No threshold (fraction below threshold is not computed)
Example with mixed thresholds:
analysis_settings:
  distances:
    threshold: 4.0  # Default for pairs without explicit threshold
    pairs:
      - label: "Ser77-His133 H-bond"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 133 and name NE2"
        threshold: 3.5  # H-bond cutoff
      - label: "Asp156-His133 H-bond"
        selection_a: "midpoint(protein and resid 156 and name OD1 OD2)"
        selection_b: "protein and resid 133 and name ND1"
        # Uses global threshold: 4.0 A
      - label: "Lid-to-Core Distance"
        selection_a: "com(protein and resid 141-148)"
        selection_b: "com(protein and resid 281-289)"
        threshold: 15.0  # Large-scale motion threshold
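The resolution order can be sketched as a small lookup. This is an illustration of the documented precedence, not PolyzyMD's implementation. `resolve_threshold` is a hypothetical helper, and the settings dict mirrors the mixed-threshold example with selections omitted.

```python
from typing import Optional


def resolve_threshold(pair: dict, distances_settings: dict) -> Optional[float]:
    """Hypothetical helper: per-pair value, then global value, then None."""
    if pair.get("threshold") is not None:
        return float(pair["threshold"])  # per-pair value has highest priority
    if distances_settings.get("threshold") is not None:
        return float(distances_settings["threshold"])  # global fallback
    return None  # no threshold: fraction below threshold is not computed


# Mirrors the mixed-threshold example (selections omitted for brevity)
settings = {
    "threshold": 4.0,
    "pairs": [
        {"label": "Ser77-His133 H-bond", "threshold": 3.5},
        {"label": "Asp156-His133 H-bond"},
        {"label": "Lid-to-Core Distance", "threshold": 15.0},
    ],
}

resolved = {p["label"]: resolve_threshold(p, settings) for p in settings["pairs"]}
for label, value in resolved.items():
    print(f"{label}: {value}")
```

Only the Asp156-His133 pair falls back to the global value. If the global `threshold` key were removed as well, that pair would resolve to `None`.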
Threshold cache invalidation: When you change a threshold value, PolyzyMD detects the mismatch and recomputes the fraction below threshold from the stored per-replicate distance data. Trajectories are not reprocessed; only the statistical aggregation is recalculated.
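To see why a threshold change is cheap, consider the minimal sketch below. It assumes the per-frame distances for each replicate are already stored and recomputes only the fraction and its SEM when the requested threshold differs from the cached one. The cache layout, the function names, and the strict "below" comparison are assumptions for illustration, not PolyzyMD internals.

```python
import math
from statistics import mean, stdev


def fraction_below(distances, threshold):
    """Fraction of frames with distance strictly below the threshold (assumed strict)."""
    return sum(d < threshold for d in distances) / len(distances)


def aggregate_fraction(replicates, threshold):
    """Mean and SEM of the per-replicate fraction below threshold."""
    fractions = [fraction_below(rep, threshold) for rep in replicates]
    sem = stdev(fractions) / math.sqrt(len(fractions)) if len(fractions) > 1 else 0.0
    return mean(fractions), sem


def cached_fraction(cache, threshold):
    """Reuse the stored result unless the requested threshold differs."""
    if cache.get("threshold") != threshold or "fraction" not in cache:
        # Only the aggregation is redone; the stored distances are reused as-is.
        cache["fraction"] = aggregate_fraction(cache["replicates"], threshold)
        cache["threshold"] = threshold
    return cache["fraction"]


# Hypothetical stored per-replicate distances (Angstroms)
cache = {
    "threshold": None,
    "replicates": [
        [3.2, 3.6, 4.1, 3.4],
        [3.8, 3.3, 3.9, 4.4],
        [3.1, 3.7, 3.6, 3.0],
    ],
}

mean_35, sem_35 = cached_fraction(cache, 3.5)
mean_40, sem_40 = cached_fraction(cache, 4.0)  # threshold changed: recomputed
print(f"threshold 3.5: {mean_35:.3f} +/- {sem_35:.3f}")
print(f"threshold 4.0: {mean_40:.3f} +/- {sem_40:.3f}")
```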
Example Output
Distance Comparison: substrate_proximity_study
================================================================================
Pairs analyzed: 2
Pair labels: Ser77-Substrate, His156-Substrate
Contact threshold: 3.5 A
Equilibration: 10ns
Control: No Polymer
Condition Summary (ranked by mean distance, lowest first)
--------------------------------------------------------------------------------
Rank Condition Mean Dist SEM % Below N
--------------------------------------------------------------------------------
1 100% SBMA 7.46 A 0.421 0.4% 3
2 100% EGMA 7.66 A 0.270 1.6% 3
3 No Polymer 8.02 A 0.315 0.0% 3 *
--------------------------------------------------------------------------------
* = control condition
Secondary Ranking (by % below threshold, highest first)
------------------------------------------------------------
1 100% EGMA 1.6% (SEM: 0.31%)
2 100% SBMA 0.4% (SEM: 0.31%)
3 No Polymer 0.0% (SEM: 0.00%)
Per-Pair Distances (Mean +/- SEM across replicates)
------------------------------------------------------------------------------------------
Condition Ser77-Substrate His156-Substrate
------------------------------------------------------------------------------------------
100% SBMA 7.46+/-0.42 6.12+/-0.25
100% EGMA 7.66+/-0.27 6.45+/-0.18
No Polymer 8.02+/-0.31 6.89+/-0.22
------------------------------------------------------------------------------------------
Pairwise Comparisons (Distance Metric)
------------------------------------------------------------------------------------------
Comparison % Change p-value Cohen d Effect Direction
------------------------------------------------------------------------------------------
100% SBMA vs No Polymer -7.0% 0.0451* 0.87 large closer
100% EGMA vs No Polymer -4.4% 0.1234 0.70 medium closer
------------------------------------------------------------------------------------------
* p < 0.05
Negative % change = lower distance (closer)
Pairwise Comparisons (Fraction Below Threshold)
------------------------------------------------------------------------------------------
Comparison % Change p-value Cohen d Effect Direction
------------------------------------------------------------------------------------------
100% SBMA vs No Polymer +0.4% 0.2161 -1.20 large more_contact
100% EGMA vs No Polymer +1.6% 0.0065* -4.25 large more_contact
------------------------------------------------------------------------------------------
* p < 0.05
Positive % change = more frames below threshold (more contact)
One-way ANOVA
--------------------------------------------------
Distance metric:
F-statistic: 3.901
p-value: 0.0512
Significant: No (alpha=0.05)
Fraction metric:
F-statistic: 5.880
p-value: 0.0171
Significant: Yes (alpha=0.05)
Interpretation
--------------------------------------------------------------------------------
Closest mean distance: 100% SBMA (7.46 A)
-> 7.0% closer than control (No Polymer)
-> Statistically significant (p=0.0451, d=0.87 [large])
Highest contact fraction: 100% EGMA (1.6% below threshold)
Analysis completed: 2026-02-16 21:03:40
PolyzyMD version: 1.0.0
Interpreting Results
Primary metric (Mean Distance):
Lower mean distance = the selected atoms are closer on average
Ranking from lowest to highest (Rank 1 = closest)
Negative % change vs control = improvement (closer)
Secondary metric (Fraction Below Threshold):
Only computed if threshold is specified in the config
Higher fraction = more frames with close contact
Ranking from highest to lowest (Rank 1 = most contact)
Positive % change vs control = improvement (more contact)
Why dual metrics?
Mean distance captures typical behavior
Fraction below threshold captures extreme events (e.g., catalytic encounters)
A condition might have similar mean distance but more frequent close approaches
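A toy example makes the last point concrete. The two hypothetical distance series below have the same mean but differ in how often they dip below a 3.5 Angstrom threshold. The data and names are invented for illustration.

```python
from statistics import mean

THRESHOLD = 3.5  # Angstroms, hypothetical contact threshold

# Two hypothetical per-frame distance series (Angstroms) with equal means
steady = [6.0] * 10
bursty = [3.0, 3.0, 6.0, 6.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0]


def fraction_below(distances, threshold):
    """Fraction of frames strictly below the threshold."""
    return sum(d < threshold for d in distances) / len(distances)


for name, series in [("steady", steady), ("bursty", bursty)]:
    print(
        f"{name}: mean={mean(series):.2f} A, "
        f"below {THRESHOLD} A: {fraction_below(series, THRESHOLD) * 100:.0f}%"
    )
```

Ranking by mean distance alone would treat these two series as equivalent. The fraction metric separates them.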
Statistical Analysis
Both metrics undergo independent statistical testing:
| Test | Applied To | Interpretation |
|---|---|---|
| t-test | Each condition vs control | p < 0.05 = significant difference |
| Cohen’s d | Each comparison | Effect magnitude (regardless of p-value) |
| ANOVA | All conditions | Any condition differs? (3+ conditions) |
Effect size interpretation:
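| Absolute value of Cohen’s d | Interpretation |
|---|---|
| < 0.2 | Negligible |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| > 0.8 | Large |

The sign of d only indicates direction, which is why a value such as -4.25 in the example output is still labeled large.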
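As a rough sketch of how these labels arise, the snippet below computes Cohen's d with a pooled standard deviation and maps its magnitude onto the table above. The pooled-SD formula, the handling of values that fall exactly on a boundary, and the replicate values are assumptions. PolyzyMD's own calculation may differ in detail.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def effect_label(d):
    """Map |d| onto the interpretation table; boundary handling is an assumption."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size <= 0.8:
        return "medium"
    return "large"


# Hypothetical per-replicate mean distances (Angstroms)
polymer = [7.1, 7.5, 7.8]
control = [7.6, 8.0, 8.4]

d = cohens_d(polymer, control)
print(f"d = {d:.2f} ({effect_label(d)})")
```

With only three replicates per condition, d is a noisy estimate, so a large effect size paired with a non-significant p-value is not unusual.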
CLI Reference for Distances
polyzymd compare run distances [OPTIONS]
Options:
-f, --file PATH Config file [default: comparison.yaml]
--eq-time TEXT Override equilibration time
--recompute Force recompute distance analysis
--format [table|markdown|json] Output format [default: table]
-o, --output PATH Save formatted output to file
-q, --quiet Suppress INFO messages
--debug Enable DEBUG logging
Python API for Distances Comparison
from polyzymd.compare import ComparisonConfig
from polyzymd.compare.comparators import DistancesComparator
from polyzymd.compare.distances_formatters import format_distances_result
# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")
# Get distances settings
distances_settings = config.analysis_settings.get("distances")
# Run comparison
comparator = DistancesComparator(
    config=config,
    analysis_settings=distances_settings,
    equilibration="10ns",
)
result = comparator.compare()
# Access primary ranking (by mean distance)
print(f"Closest: {result.ranking[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.overall_mean_distance:.2f} ± {cond.overall_sem_distance:.3f} A")
# Access secondary ranking (by fraction below threshold)
if result.ranking_by_fraction:
    print(f"\nHighest contact: {result.ranking_by_fraction[0]}")
    for cond in result.conditions:
        if cond.overall_fraction_below is not None:
            print(f"{cond.label}: {cond.overall_fraction_below*100:.1f}%")
# Access per-pair details
for cond in result.conditions:
    print(f"\n{cond.label}:")
    for pair in cond.pair_summaries:
        print(f"  {pair.label}: {pair.mean_distance:.2f} ± {pair.sem_distance:.2f} A")
# Format output
print(format_distances_result(result, format="markdown"))
# Save result
result.save("results/distances_comparison.json")
Loading Saved Distance Results
from polyzymd.compare.results import DistanceComparisonResult
# Load from JSON
result = DistanceComparisonResult.load("results/distances_comparison_my_study.json")
# Access condition data
control = result.get_condition("No Polymer")
print(f"Control mean distance: {control.overall_mean_distance:.2f} A")
# Get pairwise comparison
comp = result.get_comparison("100% SBMA")
if comp and comp.distance_significant:
    print(f"SBMA significantly closer (p={comp.distance_p_value:.4f})")
Use Cases for Distance Comparison
| Use Case | Configuration |
|---|---|
| Substrate binding | Distance from catalytic residues to substrate atoms |
| Active site geometry | Similar to triad, but for non-catalytic interactions |
| Polymer-residue proximity | Distance from polymer termini to specific residues |
| Conformational changes | Distance between domains or loops |
Distances vs Catalytic Triad Comparison
| Feature | distances | triad |
|---|---|---|
| Metric | Mean distance + fraction | Simultaneous contact fraction |
| Pairs | Any atom pairs | Pre-defined triad geometry |
| Ranking | Dual (distance + fraction) | Single (simultaneous contact) |
| Use case | General distance tracking | Catalytic geometry integrity |
Tip
Use distances for monitoring specific interactions with dual-metric analysis. Use triad when all pairs must be in contact simultaneously (catalytic triad geometry).
See Also
RMSF Quick Start – Run individual RMSF analysis
Contacts Quick Start – Run individual contacts analysis
Distance Analysis Quick Start – Run individual distance analysis
Catalytic Triad Analysis – Run individual triad analysis
Statistical Best Practices – Understanding statistics
Reference Selection – Alignment options