How to Compare Simulation Conditions

Use this guide when you already have completed PolyzyMD simulations and want to run the current polyzymd compare workflow.

You will:

  • create a comparison workspace

  • configure one or more analysis plugins under plugins:

  • run polyzymd compare run or polyzymd compare run-all

  • generate figures with polyzymd compare plot-all

Important

For the v1.3.0 release, the stable comparison stack is RMSD, Rg, RMSF, contacts, distances, catalytic triad, secondary structure, SASA, and hydrogen bonds.

Note

If you have not yet run a full analysis/comparison workflow, start with Tutorial: Analyze a Study from Finished Simulations.

Environment Setup

All commands below assume you have activated the PolyzyMD pixi environment:

pixi shell -e build

Alternatively, prefix each command with pixi run -e build.

Resource requirements

Validation, status, and help commands are lightweight. By contrast, polyzymd compare run, polyzymd compare run-all, and plotting over large cached results may load trajectories and can require substantial RAM, CPU/GPU time, and scratch I/O. On shared HPC systems, run these commands inside an allocated job or interactive compute session, not on a login node. If a command is killed or runs out of memory, request more resources or use polyzymd compare submit.
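A wrapper script around the heavier commands can guard against accidental login-node runs. The sketch below assumes a SLURM cluster, where the scheduler sets the SLURM_JOB_ID environment variable inside batch and interactive allocations. The helper name in_slurm_allocation is illustrative and is not part of PolyzyMD.

```python
import os
from typing import Mapping, Optional


def in_slurm_allocation(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when SLURM_JOB_ID is set, i.e. we are inside a SLURM job."""
    env = os.environ if env is None else env
    return bool(env.get("SLURM_JOB_ID"))


if in_slurm_allocation():
    print("Inside a SLURM allocation; heavy compare commands are reasonable here.")
else:
    print("No SLURM job detected; consider an interactive session or compare submit.")
```

Other schedulers expose different variables, so the check would need adapting outside SLURM.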

Before You Start

Make sure each condition already has:

  • a simulation config.yaml

  • finished trajectories for the replicates you want to compare

  • any shared inputs needed by the plugin you plan to run

The comparison pipeline can reuse cached analysis data when it exists, but it can also compute missing per-condition results during polyzymd compare run.

Step 1: Create a Comparison Workspace

polyzymd compare init -n polymer_stability_study
cd polymer_stability_study

This creates:

polymer_stability_study/
├── comparison.yaml
├── comparison/
├── figures/
└── structures/

  • comparison.yaml defines the conditions and enabled plugins

  • comparison/ stores cached comparison JSON, one subdirectory per analysis

  • figures/ stores generated plots

  • structures/ holds shared reference files such as an enzyme PDB for SASA
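To confirm that a workspace has the layout described above before editing it, a small standard-library check may be enough. This is a sketch only; the function name missing_workspace_entries is hypothetical and the check looks at nothing beyond the four entries listed in this step.

```python
from pathlib import Path
from typing import List

# Entries listed in the layout above: one file and three directories.
EXPECTED_FILES = ["comparison.yaml"]
EXPECTED_DIRS = ["comparison", "figures", "structures"]


def missing_workspace_entries(workspace: Path) -> List[str]:
    """Return the expected entries that are absent from the workspace."""
    missing = [name for name in EXPECTED_FILES if not (workspace / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (workspace / name).is_dir()]
    return missing


# An empty list means all four entries are present.
print(missing_workspace_entries(Path("polymer_stability_study")))
```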

Step 2: Define a Minimal comparison.yaml

Start with one stable analysis. RMSF is a good first comparison because it has few extra inputs.

name: "polymer_stability_study"
description: "Effect of polymer composition on enzyme flexibility"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  rmsf:
    selection: "protein and name CA"

To enable more analyses, add more sections under plugins:

plugins:
  rmsf:
    selection: "protein and name CA"

  contacts:
    polymer_selection: "chainid C"
    protein_selection: "chainid A"
    cutoff: 4.5

  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"

  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"

  rmsd:
    runs:
      - label: "Protein Backbone"
        selection: "protein and name CA"
        alignment_selection: "protein and name CA"
        reference_mode: "centroid"
      - label: "Active Site"
        selection: "protein and (resid 77 or resid 133 or resid 156) and name CA"
        alignment_selection: "protein and name CA"
        reference_mode: "centroid"

  rg:
    runs:
      - label: "Whole Protein"
        selection: "protein"
      - label: "Protein Backbone"
        selection: "protein and name CA"

Statistical settings for pairwise comparisons

Plugins that perform cross-condition statistical tests support per-plugin settings in the plugins: block. For example, contacts supports fdr_alpha, min_effect_size, and top_residues. See the Comparison Reference for the full settings table. For post-hoc method details (BH t-tests, Tukey HSD, Cohen’s d, and significance markers), see the Post-Hoc Testing Reference.
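For intuition about two of the quantities named above, the sketch below computes Cohen's d with a pooled standard deviation and Benjamini-Hochberg adjusted p-values using textbook definitions. It is not taken from PolyzyMD, and the way fdr_alpha is applied to the adjusted values here is an assumption. All numbers are made up for illustration.

```python
import math
import statistics
from typing import List, Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d with a pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-replicate values for two conditions (not real data).
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]
print(cohens_d(treated, control))  # 1.0

# Hypothetical raw p-values from four pairwise tests.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.5])
fdr_alpha = 0.05  # assumed to be a threshold on the adjusted p-values
print([p <= fdr_alpha for p in adjusted])  # [True, False, False, False]
```

Note that cohens_d divides by zero when both groups have zero variance, so a production version would need to handle that case.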

Step 3: Validate the Config

polyzymd compare validate

You should see a passing summary with the study name, condition count, and the enabled plugin sections.
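If you generate comparison.yaml from a script, a rough pre-check on the in-memory data can catch obvious slips before you call the CLI. The sketch below does not reproduce what polyzymd compare validate does internally. It checks only things suggested by this guide, and the rule that control must match a condition label is an assumption drawn from the example config. The function name precheck_comparison is hypothetical.

```python
from typing import Any, Dict, List


def precheck_comparison(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a comparison config dict."""
    problems: List[str] = []
    conditions = cfg.get("conditions") or []
    labels = [c.get("label") for c in conditions]

    if not conditions:
        problems.append("no conditions defined")
    # Assumption: the control name is expected to match one condition label.
    control = cfg.get("control")
    if control is not None and control not in labels:
        problems.append(f"control {control!r} is not a condition label")
    for cond in conditions:
        if not cond.get("replicates"):
            problems.append(f"condition {cond.get('label')!r} lists no replicates")
    if not cfg.get("plugins"):
        problems.append("no sections under plugins:")
    return problems


cfg = {
    "name": "polymer_stability_study",
    "control": "No Polymer",
    "conditions": [
        {"label": "No Polymer", "config": "../noPoly_enzyme_DMSO/config.yaml", "replicates": [1, 2, 3]},
        {"label": "100% SBMA", "config": "../SBMA_100_enzyme_DMSO/config.yaml", "replicates": [1, 2, 3]},
    ],
    "plugins": {"rmsf": {"selection": "protein and name CA"}},
}
print(precheck_comparison(cfg))  # []
```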

Step 4: Run One Comparison

polyzymd compare run rmsf

This command:

  • resolves plugins.rmsf from comparison.yaml

  • computes or reloads per-condition RMSF data

  • performs the cross-condition comparison

  • writes the canonical cache file to comparison/rmsf/result.json

  • prints a formatted summary to the terminal

Running on an HPC cluster?

For expensive analyses (SASA, contacts, hydrogen bonds) or large studies with many conditions and replicates, use polyzymd compare submit to dispatch analysis as SLURM jobs instead of running interactively:

polyzymd compare submit sasa --partition <part> --mem 8G --time 02:00:00
polyzymd compare status sasa       # monitor progress
polyzymd compare finalize sasa     # (if needed) re-run compare + plot

Each replicate runs as an independent job, with automatic dependency wiring for aggregation and finalization. See How To: Submit Analysis Jobs to a SLURM Cluster for the full workflow, including dry-run previews and job arrays.

You can save the formatted report separately with -o:

polyzymd compare run rmsf --format markdown -o reports/rmsf.md
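When scripting report generation, it can help to build the command in Python and create the report directory first. The sketch below uses only the flags shown above (--format and -o). Whether the CLI creates a missing output directory itself is not confirmed here, so the helper creates it as a precaution. The name compare_run_command is hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def compare_run_command(
    analysis: str, fmt: Optional[str] = None, output: Optional[Path] = None
) -> List[str]:
    """Build the argv for `polyzymd compare run`, using only flags shown above."""
    cmd = ["polyzymd", "compare", "run", analysis]
    if fmt is not None:
        cmd += ["--format", fmt]
    if output is not None:
        # Create the report directory up front in case the CLI does not.
        output.parent.mkdir(parents=True, exist_ok=True)
        cmd += ["-o", str(output)]
    return cmd


# To execute, pass the list to subprocess from inside the workspace:
# import subprocess
# subprocess.run(compare_run_command("rmsf", "markdown", Path("reports/rmsf.md")), check=True)
```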

Step 5: Run All Enabled Comparisons

Once you have multiple plugin sections configured, run them together:

polyzymd compare run-all

Or run them and generate plots in one pass:

polyzymd compare run-all --plot

Step 6: Generate Figures

For a plotting smoke test:

polyzymd compare plot-all --list-available
polyzymd compare plot-all

--list-available is useful because it shows which plot types are available for the currently enabled plugins and which are experimental.

Step 7: Check the Outputs

After a successful run, expect files like these:

polymer_stability_study/
├── comparison.yaml
├── analysis/
│   ├── no_polymer/
│   │   └── rmsf/
│   │       ├── run_1/
│   │       │   └── result.json
│   │       ├── run_2/
│   │       │   └── result.json
│   │       └── aggregated/
│   │           └── result.json
│   └── 100_sbma/
│       └── rmsf/
│           └── ...
├── comparison/
│   ├── rmsf/
│   │   └── result.json
│   ├── contacts/
│   │   └── result.json
│   ├── distances/
│   │   └── result.json
│   └── catalytic_triad/
│       └── result.json
└── figures/
    ├── rmsf/
    │   ├── rmsf_comparison.png
    │   └── rmsf_profile.png
    └── ...
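The per-condition directory names in the tree above (no_polymer, 100_sbma) look like lowercased condition labels with spaces and punctuation collapsed into underscores. The sketch below reproduces those two names. The rule is inferred from this example only, so treat it as an assumption and not as PolyzyMD's documented naming scheme; label_to_dirname is a hypothetical helper.

```python
import re


def label_to_dirname(label: str) -> str:
    """Lowercase a label and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


for label in ["No Polymer", "100% SBMA"]:
    print(label, "->", label_to_dirname(label))
```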

If your smoke test is polyzymd compare plot-all, success means:

  • the command completes without error

  • stable plots render normally

  • experimental plots, if enabled, render with explicit experimental labeling

Programmatic Use

If you need to run the comparison pipeline from Python, use the plugin orchestrator directly:

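from pathlib import Path

from polyzymd.analyses.discovery import get_analysis
from polyzymd.analyses.orchestrator import run_comparison
from polyzymd.config.comparison import ComparisonConfig

# Load the study definition from the workspace.
config = ComparisonConfig.from_yaml(Path("comparison.yaml"))
# Look up the RMSF analysis by name and instantiate it.
analysis = get_analysis("rmsf")()

# Compute or reload per-condition results, then compare across conditions.
pipeline_result = run_comparison(
    analysis,
    config,
    equilibration="10ns",
)

# The comparison result and the path of the written cache file.
result = pipeline_result["comparison"]
print(result.ranking)
print(pipeline_result["comparison_path"])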

Adding More Stable Analyses

Common next additions to comparison.yaml are:

  • rmsd for RMSD timeseries and structural stability comparison

  • rg for Radius of Gyration and structural compactness comparison

  • contacts for polymer coverage and contact fraction

  • distances for custom atom-pair distances

  • catalytic_triad for active-site geometry

  • secondary_structure for helix/strand persistence and content

  • hydrogen_bonds for hydrogen-bond occupancy and lifetime summaries

For end-to-end examples, see:

Archived experimental analyses are not active v1.3 plugins. See Experimental analyses for historical access details.

Troubleshooting

config path not found

Paths in comparison.yaml are resolved relative to the location of comparison.yaml, not your current shell directory.
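The resolution rule can be illustrated with pathlib. This sketch is not PolyzyMD's own code; it shows where a relative config entry ends up when it is anchored at the directory containing comparison.yaml. Returning absolute entries unchanged is an assumption, and the example paths are hypothetical.

```python
from pathlib import Path


def resolve_condition_config(comparison_yaml: Path, config_entry: str) -> Path:
    """Resolve a condition's config path against the comparison.yaml directory."""
    entry = Path(config_entry)
    if entry.is_absolute():
        # Assumption: absolute entries are used as given.
        return entry
    return (comparison_yaml.parent / entry).resolve()


# Hypothetical workspace location; the shell's current directory plays no role.
yaml_path = Path("/data/studies/polymer_stability_study/comparison.yaml")
print(resolve_condition_config(yaml_path, "../noPoly_enzyme_DMSO/config.yaml"))
```

Printing the resolved path for each condition is a quick way to see which file a failing entry actually points to.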

No analyses are enabled

You need at least one configured section under plugins:.

plot-all runs but expected figures are missing

Check that the corresponding comparison JSON files already exist at comparison/<analysis>/result.json, then run polyzymd compare plot-all --list-available to verify the enabled plot types.
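The first check can be scripted. The sketch below lists which analyses lack a cached comparison/<analysis>/result.json. It relies only on the file layout described in this guide, and the helper name analyses_missing_results is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def analyses_missing_results(workspace: Path, analyses: Iterable[str]) -> List[str]:
    """Return analyses without comparison/<analysis>/result.json in the workspace."""
    return [
        name
        for name in analyses
        if not (workspace / "comparison" / name / "result.json").is_file()
    ]


# Run from the workspace root; the names mirror the sections under plugins:.
print(analyses_missing_results(Path("."), ["rmsf", "contacts", "distances"]))
```

Any analysis the function returns needs polyzymd compare run <analysis> (or run-all) before its figures can be produced.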

See Also