How to Compare Simulation Conditions

Use this guide when you already have completed PolyzyMD simulations and want to run the current polyzymd compare workflow.

You will:

  • create a comparison workspace

  • configure one or more analysis plugins under plugins:

  • run polyzymd compare run or polyzymd compare run-all

  • generate figures with polyzymd compare plot-all

Important

For the v1.3.0 release, the stable comparison stack is RMSD, Rg, RMSF, contacts, distances, catalytic triad, secondary structure, and SASA. Binding preference, exposure dynamics, binding free energy, and polymer affinity remain available, but PolyzyMD labels them as experimental.

Note

If you have not yet run a full analysis/comparison workflow, start with Tutorial: Analyze a Study from Finished Simulations.

Environment Setup

All commands below assume you have activated the PolyzyMD pixi environment:

pixi shell -e build

Alternatively, prefix each command with pixi run -e build.

Before You Start

Make sure each condition already has:

  • a simulation config.yaml

  • finished trajectories for the replicates you want to compare

  • any shared inputs needed by the plugin you plan to run

The comparison pipeline can reuse cached analysis data when it exists, but it can also compute missing per-condition results during polyzymd compare run.
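
The first item in the checklist is easy to verify before you start. The sketch below is a minimal pre-flight check, not part of PolyzyMD: `missing_configs` is a hypothetical helper, the directory names are illustrative, and it only looks for config.yaml because the trajectory layout depends on how your simulations were run.

```python
import tempfile
from pathlib import Path


def missing_configs(condition_dirs):
    """Return the condition directories that lack a config.yaml file."""
    return [d for d in map(Path, condition_dirs) if not (d / "config.yaml").is_file()]


# Demonstration on a throwaway directory tree (names are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    ready = root / "noPoly_enzyme_DMSO"
    not_ready = root / "SBMA_100_enzyme_DMSO"
    ready.mkdir()
    not_ready.mkdir()
    (ready / "config.yaml").write_text("# simulation config placeholder\n")

    # Collect the names of directories that still need a config.yaml.
    problems = [p.name for p in missing_configs([ready, not_ready])]
    print(problems)
```

In a real study, pass your own condition directories instead of the temporary ones.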

Step 1: Create a Comparison Workspace

polyzymd compare init -n polymer_stability_study
cd polymer_stability_study

This creates:

polymer_stability_study/
├── comparison.yaml
├── comparison/
├── figures/
└── structures/

  • comparison.yaml defines the conditions and enabled plugins

  • comparison/ stores cached comparison JSON, one subdirectory per analysis

  • figures/ stores generated plots

  • structures/ holds shared reference files such as an enzyme PDB for SASA

Step 2: Define a Minimal comparison.yaml

Start with one stable analysis. RMSF is a good first comparison because it has few extra inputs.

name: "polymer_stability_study"
description: "Effect of polymer composition on enzyme flexibility"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  rmsf:
    selection: "protein and name CA"

To enable more analyses, add more sections under plugins:

plugins:
  rmsf:
    selection: "protein and name CA"

  contacts:
    polymer_selection: "chainID C"
    protein_selection: "protein"
    cutoff: 4.5

  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"

  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"

  rmsd:
    runs:
      - label: "Protein Backbone"
        selection: "protein and name CA"
        alignment_selection: "protein and name CA"
        reference_mode: "centroid"
      - label: "Active Site"
        selection: "protein and (resid 77 or resid 133 or resid 156) and name CA"
        alignment_selection: "protein and name CA"
        reference_mode: "centroid"

  rg:
    runs:
      - label: "Whole Protein"
        selection: "protein"
      - label: "Protein Backbone"
        selection: "protein and name CA"
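
Some mistakes in comparison.yaml can be caught before you run polyzymd compare validate. The sketch below works on an already-parsed mapping (for example, the output of a YAML loader). It is not PolyzyMD's validator: `check_conditions` is a hypothetical helper, and the rule that control must name one of the condition labels is an assumption based on the example above.

```python
from collections import Counter


def check_conditions(config):
    """Return a list of human-readable problems found in a parsed comparison config."""
    problems = []
    labels = [c.get("label") for c in config.get("conditions", [])]

    # Condition labels identify conditions, so repeated labels are ambiguous.
    duplicates = [label for label, n in Counter(labels).items() if n > 1]
    if duplicates:
        problems.append(f"duplicate condition labels: {duplicates}")

    # Assumption: the control must be one of the condition labels.
    control = config.get("control")
    if control is not None and control not in labels:
        problems.append(f"control {control!r} does not match any condition label")

    # At least one plugin section is needed for any analysis to run.
    if not config.get("plugins"):
        problems.append("no sections configured under plugins:")
    return problems


# A mapping that mirrors the minimal comparison.yaml from this step.
example = {
    "name": "polymer_stability_study",
    "control": "No Polymer",
    "conditions": [
        {"label": "No Polymer", "replicates": [1, 2, 3]},
        {"label": "100% SBMA", "replicates": [1, 2, 3]},
        {"label": "100% EGMA", "replicates": [1, 2, 3]},
    ],
    "plugins": {"rmsf": {"selection": "protein and name CA"}},
}
print(check_conditions(example))
```

A check like this complements polyzymd compare validate rather than replacing it.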

Statistical settings for pairwise comparisons

Plugins that perform cross-condition statistical tests support per-plugin settings in the plugins: block. For example, contacts supports fdr_alpha, min_effect_size, and top_residues; binding free energy and polymer affinity support fdr_alpha. See the Comparison Reference for the full settings table. For post-hoc method details (BH t-tests, Tukey HSD, Cohen’s d, and significance markers), see the Post-Hoc Testing Reference.
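
To make these settings concrete, the sketch below shows what a Benjamini-Hochberg (BH) adjustment and Cohen's d compute. It is a textbook illustration in plain Python, not PolyzyMD's implementation. The function names and input numbers are made up, and treating fdr_alpha as the threshold applied to BH-adjusted p-values is an assumption.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up inputs: one metric per replicate for two conditions, and four raw p-values.
d = cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
q_values = bh_adjust([0.01, 0.04, 0.03, 0.20])
fdr_alpha = 0.05  # assumed meaning: threshold on the adjusted p-values
significant = [q <= fdr_alpha for q in q_values]
print(d, q_values, significant)
```

With only three replicates per condition, effect sizes such as Cohen's d are often easier to interpret than p-values alone.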

Step 3: Validate the Config

polyzymd compare validate

You should see a passing summary with the study name, condition count, and the enabled plugin sections.

Step 4: Run One Comparison

polyzymd compare run rmsf

This command:

  • resolves plugins.rmsf from comparison.yaml

  • computes or reloads per-condition RMSF data

  • performs the cross-condition comparison

  • writes the canonical cache file to comparison/rmsf/result.json

  • prints a formatted summary to the terminal

Running on an HPC cluster?

For expensive analyses (SASA, contacts, hydrogen bonds) or large studies with many conditions and replicates, use polyzymd compare submit to dispatch analysis as SLURM jobs instead of running interactively:

polyzymd compare submit sasa --partition <part> --mem 8G --time 02:00:00
polyzymd compare status sasa       # monitor progress
polyzymd compare finalize sasa     # (if needed) re-run compare + plot

Each replicate runs as an independent job, with automatic dependency wiring for aggregation and finalization. See How To: Submit Analysis Jobs to a SLURM Cluster for the full workflow, including dry-run previews and job arrays.

To also save the formatted report to a file, pass --format and -o to polyzymd compare run:

polyzymd compare run rmsf --format markdown -o reports/rmsf.md

Step 5: Run All Enabled Comparisons

Once you have multiple plugin sections configured, run them together:

polyzymd compare run-all

Or run them and generate plots in one pass:

polyzymd compare run-all --plot

Step 6: Generate Figures

For a plotting smoke test:

polyzymd compare plot-all --list-available
polyzymd compare plot-all

--list-available shows which plot types are available for the currently enabled plugins and which of them are experimental.
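
If you repeat Steps 3 to 6 often, you can chain the commands from a small driver script. The sketch below is one way to do that, assuming the polyzymd CLI exits with a non-zero status on failure (the usual convention, but not stated in this guide). `run_pipeline` and the stand-in runner are hypothetical names; the stand-in lets the sketch execute without PolyzyMD installed.

```python
import subprocess

# The commands from Steps 3, 5, and 6, in order.
PIPELINE = [
    ["polyzymd", "compare", "validate"],
    ["polyzymd", "compare", "run-all"],
    ["polyzymd", "compare", "plot-all"],
]


def run_pipeline(commands, runner=subprocess.run, cwd=None):
    """Run commands in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for argv in commands:
        result = runner(argv, cwd=cwd)
        if result.returncode != 0:
            break
        completed.append(argv)
    return completed


class FakeResult:
    """Minimal stand-in for subprocess.CompletedProcess."""

    def __init__(self, returncode):
        self.returncode = returncode


def fake_runner(argv, cwd=None):
    # Pretend that run-all fails, to show that plot-all is then skipped.
    return FakeResult(1 if "run-all" in argv else 0)


done = run_pipeline(PIPELINE, runner=fake_runner)
print(done)
```

To run the real commands, call run_pipeline(PIPELINE, cwd="polymer_stability_study") and leave the runner argument at its default.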

Step 7: Check the Outputs

After a successful run, expect files like these:

polymer_stability_study/
├── comparison.yaml
├── comparison/
│   ├── rmsf/
│   │   └── result.json
│   ├── contacts/
│   │   └── result.json
│   ├── distances/
│   │   └── result.json
│   └── catalytic_triad/
│       └── result.json
└── figures/
    ├── rmsf/
    │   ├── rmsf_comparison.png
    │   └── rmsf_profile.png
    └── ...

If your smoke test is polyzymd compare plot-all, success means:

  • the command completes without error

  • stable plots render normally

  • experimental plots, if enabled, render with explicit experimental labeling
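
You can also check the cache layout shown above from a script. The sketch below reports, for each subdirectory of comparison/, whether result.json exists and parses as JSON. `cached_results` is a hypothetical helper, and it makes no assumption about what the JSON contains.

```python
import json
import tempfile
from pathlib import Path


def cached_results(workspace):
    """Map analysis name -> True if comparison/<analysis>/result.json parses as JSON."""
    status = {}
    for analysis_dir in sorted((Path(workspace) / "comparison").iterdir()):
        if not analysis_dir.is_dir():
            continue
        try:
            json.loads((analysis_dir / "result.json").read_text())
            status[analysis_dir.name] = True
        except (OSError, json.JSONDecodeError):
            # Missing, unreadable, or malformed cache file.
            status[analysis_dir.name] = False
    return status


# Demonstration on a throwaway workspace: rmsf has a cache file, contacts does not.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "comparison" / "rmsf").mkdir(parents=True)
    (workspace / "comparison" / "contacts").mkdir(parents=True)
    (workspace / "comparison" / "rmsf" / "result.json").write_text("{}")

    status = cached_results(workspace)
    print(status)
```

An analysis reported as False has not produced a usable cache file yet, so run or re-run it with polyzymd compare run.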

Programmatic Use

If you need to run the comparison pipeline from Python, use the plugin orchestrator directly:

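from pathlib import Path

from polyzymd.analyses.discovery import get_analysis
from polyzymd.analyses.orchestrator import run_comparison
from polyzymd.config.comparison import ComparisonConfig

# Load the study definition (conditions, defaults, plugin settings).
config = ComparisonConfig.from_yaml(Path("comparison.yaml"))

# Look up the plugin registered as "rmsf" and instantiate it.
analysis = get_analysis("rmsf")()

# Run the comparison pipeline for this analysis.
pipeline_result = run_comparison(
    analysis,
    config,
    equilibration="10ns",
)

# The returned mapping holds the comparison result and its output path.
result = pipeline_result["comparison"]
print(result.ranking)
print(pipeline_result["comparison_path"])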

Adding More Stable Analyses

Common next additions to comparison.yaml are:

  • rmsd for RMSD timeseries and structural stability comparison

  • rg for Radius of Gyration and structural compactness comparison

  • contacts for polymer coverage and contact fraction

  • distances for custom atom-pair distances

  • catalytic_triad for active-site geometry

  • secondary_structure for helix/strand persistence and content

For end-to-end examples, see:

Experimental Workflows

Experimental workflows remain available, but they are not the default path for the presentation release.

Troubleshooting

config path not found

Paths in comparison.yaml are resolved relative to the location of comparison.yaml, not your current shell directory.
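
The rule can be illustrated with a few lines of Python. This is only a sketch of the path arithmetic, not PolyzyMD's own code: `resolve_from_yaml` is a hypothetical helper and the /data prefix is an illustrative location.

```python
import posixpath


def resolve_from_yaml(comparison_yaml, relative_path):
    """Resolve a path from comparison.yaml against the directory holding that file."""
    base = posixpath.dirname(comparison_yaml)
    # normpath collapses ".." segments textually; it does not follow symlinks.
    return posixpath.normpath(posixpath.join(base, relative_path))


resolved = resolve_from_yaml(
    "/data/polymer_stability_study/comparison.yaml",
    "../noPoly_enzyme_DMSO/config.yaml",
)
print(resolved)  # /data/noPoly_enzyme_DMSO/config.yaml
```

The result does not depend on the directory you run the command from, so a "../" prefix always means the parent of the workspace directory.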

No analyses are enabled

You need at least one configured section under plugins:.

plot-all runs but expected figures are missing

Check that the corresponding comparison JSON files already exist under comparison/<analysis>/result.json and use polyzymd compare plot-all --list-available to verify the enabled plot types.

polyzymd compare run fails for an experimental metric

Run the prerequisite analysis first. For example, binding_free_energy and polymer_affinity depend on cached contact-derived data, so you usually run:

polyzymd compare run contacts
polyzymd compare run binding_free_energy
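
If you script several polyzymd compare run calls, you can derive a safe order from a dependency table. The sketch below uses graphlib from the Python standard library (Python 3.9 or later). The DEPENDS_ON table is hand-written from the sentence above, not read from PolyzyMD, and `run_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hand-written from this guide: each key needs the analyses in its set to run first.
DEPENDS_ON = {
    "binding_free_energy": {"contacts"},
    "polymer_affinity": {"contacts"},
}


def run_order(requested):
    """Return the requested analyses plus their direct prerequisites, prerequisites first."""
    graph = {name: DEPENDS_ON.get(name, set()) for name in requested}
    # static_order() yields every node only after all of its prerequisites.
    return list(TopologicalSorter(graph).static_order())


order = run_order(["binding_free_energy", "rmsf"])
for name in order:
    print("polyzymd compare run", name)
```

The relative order of independent analyses (here rmsf and contacts) is arbitrary; only the prerequisite ordering is guaranteed.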

See Also