# How to Compare Simulation Conditions

Use this guide when you already have completed PolyzyMD simulations and want to run the current `polyzymd compare` workflow.

You will:

- create a comparison workspace
- configure one or more analysis plugins under `plugins:`
- run `polyzymd compare run` or `polyzymd compare run-all`
- generate figures with `polyzymd compare plot-all`

```{important}
For the `v1.3.0` release, the stable comparison stack is RMSD, Rg, RMSF, contacts, distances, catalytic triad, secondary structure, SASA, and hydrogen bonds.
```

```{note}
If you have not yet run a full analysis/comparison workflow, start with [Tutorial: Analyze a Study from Finished Simulations](../tutorials/analysis_complete_workflow.md).
```

:::{admonition} Environment Setup
:class: tip

All commands below assume you have activated the PolyzyMD pixi environment:

```bash
pixi shell -e build
```

Alternatively, prefix each command with `pixi run -e build`.
:::

:::{admonition} Resource requirements
:class: important

Validation, status, and help commands are lightweight. `polyzymd compare run`, `run-all`, and plotting over large cached results may load trajectories and can require substantial RAM, CPU/GPU time, and scratch I/O.

On shared HPC systems, run these commands inside an allocated job or interactive compute session, not on a login node. If a command is killed or runs out of memory, request more resources or use `polyzymd compare submit`.
:::

## Before You Start

Make sure each condition already has:

- a simulation `config.yaml`
- finished trajectories for the replicates you want to compare
- any shared inputs needed by the plugin you plan to run

The comparison pipeline can reuse cached analysis data when it exists, but it can also compute missing per-condition results during `polyzymd compare run`.
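If you want a quick pre-flight check for the first requirement, the sketch below lists any simulation directory that lacks a `config.yaml`. It is a hypothetical helper, not part of PolyzyMD: `find_missing_configs` is an illustrative name, the directory names are taken from the example study used later in this guide, and it does not check trajectories or plugin-specific inputs.

```python
from pathlib import Path


def find_missing_configs(condition_dirs: list[Path]) -> list[Path]:
    """Return the expected config.yaml paths that do not exist.

    Hypothetical pre-flight helper (not a PolyzyMD API). It only checks
    for config.yaml; trajectories and plugin inputs are not inspected.
    """
    missing = []
    for directory in condition_dirs:
        config_path = directory / "config.yaml"
        if not config_path.is_file():
            missing.append(config_path)
    return missing


if __name__ == "__main__":
    # Simulation directories from the example study; adjust to your layout.
    dirs = [
        Path("../noPoly_enzyme_DMSO"),
        Path("../SBMA_100_enzyme_DMSO"),
        Path("../EGMA_100_enzyme_DMSO"),
    ]
    for path in find_missing_configs(dirs):
        print(f"missing: {path}")
```

An empty result only means the config files are in place; `polyzymd compare validate` remains the authoritative check once the workspace exists.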
## Step 1: Create a Comparison Workspace

```bash
polyzymd compare init -n polymer_stability_study
cd polymer_stability_study
```

This creates:

```text
polymer_stability_study/
├── comparison.yaml
├── comparison/
├── figures/
└── structures/
```

- `comparison.yaml` defines the conditions and enabled plugins
- `comparison/` stores cached comparison JSON, one subdirectory per analysis
- `figures/` stores generated plots
- `structures/` holds shared reference files such as an enzyme PDB for SASA

## Step 2: Define a Minimal `comparison.yaml`

Start with one stable analysis. RMSF is a good first comparison because it has few extra inputs.

```yaml
name: "polymer_stability_study"
description: "Effect of polymer composition on enzyme flexibility"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  rmsf:
    selection: "protein and name CA"
```

To enable more analyses, add more sections under `plugins:`:

```yaml
plugins:
  rmsf:
    selection: "protein and name CA"

  contacts:
    polymer_selection: "chainid C"
    protein_selection: "chainid A"
    cutoff: 4.5

  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"

  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"

  rmsd:
    runs:
      - label: "Protein Backbone"
        selection: "protein and name CA"
        alignment_selection: "protein and name CA"
        reference_mode: "centroid"
      - label: "Active Site"
        selection: "protein and (resid 77 or resid 133 or resid 156) and name CA"
        alignment_selection: "protein and name CA"
        reference_mode: "centroid"

  rg:
    runs:
      - label: "Whole Protein"
        selection: "protein"
      - label: "Protein Backbone"
        selection: "protein and name CA"
```

:::{admonition} Statistical settings for pairwise comparisons
:class: tip

Plugins that perform cross-condition statistical tests support per-plugin settings in the `plugins:` block. For example, contacts supports `fdr_alpha`, `min_effect_size`, and `top_residues`. See the [Comparison Reference](../reference/analysis_comparison_reference.md#per-plugin-statistical-settings) for the full settings table.

For post-hoc method details (BH t-tests, Tukey HSD, Cohen's d, and significance markers), see the [Post-Hoc Testing Reference](../reference/posthoc_testing.md).
:::

## Step 3: Validate the Config

```bash
polyzymd compare validate
```

You should see a passing summary with the study name, condition count, and the enabled plugin sections.

## Step 4: Run One Comparison

```bash
polyzymd compare run rmsf
```

This command:

- resolves `plugins.rmsf` from `comparison.yaml`
- computes or reloads per-condition RMSF data
- performs the cross-condition comparison
- writes the canonical cache file to `comparison/rmsf/result.json`
- prints a formatted summary to the terminal

:::{admonition} Running on an HPC cluster?
:class: tip

For expensive analyses (SASA, contacts, hydrogen bonds) or large studies with many conditions and replicates, use `polyzymd compare submit` to dispatch analysis as SLURM jobs instead of running interactively:

```bash
polyzymd compare submit sasa --partition --mem 8G --time 02:00:00
polyzymd compare status sasa     # monitor progress
polyzymd compare finalize sasa   # (if needed) re-run compare + plot
```

Each replicate runs as an independent job, with automatic dependency wiring for aggregation and finalization. See {doc}`hpc_execution` for the full workflow, including dry-run previews and job arrays.
:::

You can save the formatted report separately with `-o`:

```bash
polyzymd compare run rmsf --format markdown -o reports/rmsf.md
```

## Step 5: Run All Enabled Comparisons

Once you have multiple plugin sections configured, run them together:

```bash
polyzymd compare run-all
```

Or run them and generate plots in one pass:

```bash
polyzymd compare run-all --plot
```

## Step 6: Generate Figures

For a plotting smoke test:

```bash
polyzymd compare plot-all --list-available
polyzymd compare plot-all
```

`--list-available` is useful because it shows which plot types are available for the currently enabled plugins and which are experimental.

## Step 7: Check the Outputs

After a successful run, expect files like these:

```text
polymer_stability_study/
├── comparison.yaml
├── analysis/
│   ├── no_polymer/
│   │   └── rmsf/
│   │       ├── run_1/
│   │       │   └── result.json
│   │       ├── run_2/
│   │       │   └── result.json
│   │       └── aggregated/
│   │           └── result.json
│   └── 100_sbma/
│       └── rmsf/
│           └── ...
├── comparison/
│   ├── rmsf/
│   │   └── result.json
│   ├── contacts/
│   │   └── result.json
│   ├── distances/
│   │   └── result.json
│   └── catalytic_triad/
│       └── result.json
└── figures/
    ├── rmsf/
    │   ├── rmsf_comparison.png
    │   └── rmsf_profile.png
    └── ...
```

If your smoke test is `polyzymd compare plot-all`, success means:

- the command completes without error
- stable plots render normally
- experimental plots, if enabled, render with explicit experimental labeling

## Programmatic Use

If you need to run the comparison pipeline from Python, use the plugin orchestrator directly:

```python
from pathlib import Path

from polyzymd.analyses.discovery import get_analysis
from polyzymd.analyses.orchestrator import run_comparison
from polyzymd.config.comparison import ComparisonConfig

config = ComparisonConfig.from_yaml(Path("comparison.yaml"))
analysis = get_analysis("rmsf")()

pipeline_result = run_comparison(
    analysis,
    config,
    equilibration="10ns",
)

result = pipeline_result["comparison"]
print(result.ranking)
print(pipeline_result["comparison_path"])
```

## Adding More Stable Analyses

Common next additions to `comparison.yaml` are:

- `rmsd` for RMSD timeseries and structural stability comparison
- `rg` for Radius of Gyration and structural compactness comparison
- `contacts` for polymer coverage and contact fraction
- `distances` for custom atom-pair distances
- `catalytic_triad` for active-site geometry
- `secondary_structure` for helix/strand persistence and content
- `hydrogen_bonds` for hydrogen-bond occupancy and lifetime summaries

For end-to-end examples, see:

- [Run RMSD Analysis](analysis_rmsd_quickstart.md)
- [Run Rg Analysis](analysis_rg_quickstart.md)
- [Run RMSF Analysis](analysis_rmsf_quickstart.md)
- [Run Contacts Analysis](analysis_contacts_quickstart.md)
- [Run Distance Analysis](analysis_distances_quickstart.md)
- [Run Catalytic Triad Analysis](analysis_triad_quickstart.md)

Archived experimental analyses are not active v1.3 plugins. See [Experimental analyses](../reference/experimental_analyses_archive.md) for historical access details.

## Troubleshooting

### `config` path not found

Paths in `comparison.yaml` are resolved relative to the location of `comparison.yaml`, not your current shell directory.
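To see where a relative `config:` entry actually points, you can reproduce the rule above with `pathlib`. This is a minimal sketch, not PolyzyMD's own resolver: `resolve_condition_config` is a hypothetical name, and it assumes the rule is a plain join against the directory containing `comparison.yaml` (absolute entries are kept as-is, because `pathlib` discards the base when the right-hand side is absolute).

```python
from pathlib import Path


def resolve_condition_config(comparison_yaml: Path, config_entry: str) -> Path:
    """Resolve a condition's `config:` entry relative to comparison.yaml.

    Hypothetical helper (not a PolyzyMD API). The comparison file is resolved
    first, so the current shell directory only matters if you pass a relative
    path to comparison.yaml itself.
    """
    base_dir = comparison_yaml.resolve().parent
    # Joining an absolute entry returns that entry unchanged.
    return (base_dir / config_entry).resolve()


if __name__ == "__main__":
    target = resolve_condition_config(
        Path("comparison.yaml"), "../noPoly_enzyme_DMSO/config.yaml"
    )
    print(target, "exists" if target.is_file() else "NOT FOUND")
```

Running it from inside the workspace prints the absolute path each entry refers to, which makes a `../` that climbs one directory too few or too many easy to spot.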
### `No analyses are enabled`

You need at least one configured section under `plugins:`.

### `plot-all` runs but expected figures are missing

Check that the corresponding comparison JSON files already exist under `comparison//result.json` and use `polyzymd compare plot-all --list-available` to verify the enabled plot types.

## See Also

- [Tutorial: Analyze a Study from Finished Simulations](../tutorials/analysis_complete_workflow.md)
- [Comparison and Plotting Reference](../reference/analysis_comparison_reference.md)
- [Statistical Best Practices for Analysis](../explanation/analysis_statistics_best_practices.md)