# How to Compare Simulation Conditions

Use this guide when you already have completed PolyzyMD simulations and want to
run the current `polyzymd compare` workflow.

You will:

- create a comparison workspace
- configure one or more analysis plugins under `plugins:`
- run `polyzymd compare run` or `polyzymd compare run-all`
- generate figures with `polyzymd compare plot-all`

```{important}
For the `v1.3.0` release, the stable comparison stack is RMSD, Rg, RMSF,
contacts, distances, catalytic triad, secondary structure, SASA, and hydrogen
bonds.
```

```{note}
If you have not yet run a full analysis/comparison workflow, start with
[Tutorial: Analyze a Study from Finished Simulations](../tutorials/analysis_complete_workflow.md).
```

:::{admonition} Environment Setup
:class: tip

All commands below assume you have activated the PolyzyMD pixi environment:

```bash
pixi shell -e build
```

Alternatively, prefix each command with `pixi run -e build`.
:::

:::{admonition} Resource requirements
:class: important

Validation, status, and help commands are lightweight. `polyzymd compare run`,
`run-all`, and plotting over large cached results may load trajectories and can
require substantial RAM, CPU/GPU time, and scratch I/O. On shared HPC systems,
run these commands inside an allocated job or interactive compute session, not
on a login node. If a command is killed or runs out of memory, request more
resources or use `polyzymd compare submit`.
:::

## Before You Start

Make sure each condition already has:

- a simulation `config.yaml`
- finished trajectories for the replicates you want to compare
- any shared inputs needed by the plugin you plan to run

The comparison pipeline reuses cached analysis data when it exists and computes
any missing per-condition results during `polyzymd compare run`.

## Step 1: Create a Comparison Workspace

```bash
polyzymd compare init -n polymer_stability_study
cd polymer_stability_study
```

This creates:

```text
polymer_stability_study/
├── comparison.yaml
├── comparison/
├── figures/
└── structures/
```

- `comparison.yaml` defines the conditions and enabled plugins
- `comparison/` stores cached comparison JSON, one subdirectory per analysis
- `figures/` stores generated plots
- `structures/` holds shared reference files such as an enzyme PDB for SASA
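
If you script your setup, the layout above can be checked from Python before you edit `comparison.yaml`. This is a minimal sketch using only the standard library. `missing_workspace_entries` is a hypothetical helper name, not a PolyzyMD function, and the expected entries are simply those listed above.

```python
from pathlib import Path

# Entries that `polyzymd compare init` is described as creating (see the tree above).
EXPECTED_FILES = ("comparison.yaml",)
EXPECTED_DIRS = ("comparison", "figures", "structures")


def missing_workspace_entries(workspace):
    """Return the names of expected workspace entries that are absent.

    Hypothetical helper for illustration; not part of PolyzyMD.
    """
    root = Path(workspace)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_workspace_entries(".")
    print("workspace looks complete" if not problems else f"missing: {problems}")
```

Run it from inside the workspace directory; an empty list means all four entries are present.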

## Step 2: Define a Minimal `comparison.yaml`

Start with one stable analysis. RMSF is a good first comparison because it has
few extra inputs.

```yaml
name: "polymer_stability_study"
description: "Effect of polymer composition on enzyme flexibility"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  rmsf:
    selection: "protein and name CA"
```

To enable more analyses, add more sections under `plugins:`:

```yaml
plugins:
  rmsf:
    selection: "protein and name CA"

  contacts:
    polymer_selection: "chainid C"
    protein_selection: "chainid A"
    cutoff: 4.5

  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"

  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"

  rmsd:
    runs:
      - label: "Protein Backbone"
        selection: "protein and name CA"
        alignment_selection: "protein and name CA"
        reference_mode: "centroid"
      - label: "Active Site"
        selection: "protein and (resid 77 or resid 133 or resid 156) and name CA"
        alignment_selection: "protein and name CA"
        reference_mode: "centroid"

  rg:
    runs:
      - label: "Whole Protein"
        selection: "protein"
      - label: "Protein Backbone"
        selection: "protein and name CA"
```

:::{admonition} Statistical settings for pairwise comparisons
:class: tip

Plugins that perform cross-condition statistical tests support per-plugin
settings in the `plugins:` block. For example, contacts supports `fdr_alpha`,
`min_effect_size`, and `top_residues`. See the
[Comparison Reference](../reference/analysis_comparison_reference.md#per-plugin-statistical-settings)
for the full settings table. For post-hoc method details (BH t-tests, Tukey
HSD, Cohen's d, and significance markers), see the
[Post-Hoc Testing Reference](../reference/posthoc_testing.md).
:::
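
For readers unfamiliar with the terms in the tip above, the sketch below shows the textbook definitions of Cohen's d (pooled standard deviation) and the Benjamini-Hochberg adjustment in plain Python. It illustrates the general methods only and is not PolyzyMD's implementation; the function names are hypothetical, and how the plugins apply `fdr_alpha` or `min_effect_size` is described in the linked references, not here.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the functions.
    print(cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
    print(benjamini_hochberg([0.01, 0.04, 0.03]))
```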

## Step 3: Validate the Config

```bash
polyzymd compare validate
```

You should see a passing summary with the study name, condition count, and the
enabled plugin sections.

## Step 4: Run One Comparison

```bash
polyzymd compare run rmsf
```

This command:

- resolves `plugins.rmsf` from `comparison.yaml`
- computes or reloads per-condition RMSF data
- performs the cross-condition comparison
- writes the canonical cache file to `comparison/rmsf/result.json`
- prints a formatted summary to the terminal
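
To confirm that the cache file was written, you can open it from Python. The sketch below assumes only that `result.json` holds a JSON object; it does not rely on any particular field names, since the schema is not described in this guide. `summarize_result` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_result(path):
    """Return the sorted top-level keys of a comparison result file.

    Assumes the file is a JSON object; no specific schema is relied on.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return sorted(data)


if __name__ == "__main__":
    result_path = Path("comparison/rmsf/result.json")
    if result_path.is_file():
        print(summarize_result(result_path))
    else:
        print(f"{result_path} not found; run `polyzymd compare run rmsf` first")
```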

:::{admonition} Running on an HPC cluster?
:class: tip

For expensive analyses (SASA, contacts, hydrogen bonds) or large studies with
many conditions and replicates, use `polyzymd compare submit` to dispatch
analysis as SLURM jobs instead of running interactively:

```bash
polyzymd compare submit sasa --partition <part> --mem 8G --time 02:00:00
polyzymd compare status sasa       # monitor progress
polyzymd compare finalize sasa     # (if needed) re-run compare + plot
```

Each replicate runs as an independent job, with automatic dependency wiring
for aggregation and finalization. See {doc}`hpc_execution` for the full
workflow, including dry-run previews and job arrays.
:::

To also save the formatted report to a file, pass `-o` and choose the report
format with `--format`:

```bash
polyzymd compare run rmsf --format markdown -o reports/rmsf.md
```

## Step 5: Run All Enabled Comparisons

Once you have multiple plugin sections configured, run them together:

```bash
polyzymd compare run-all
```

Or run them and generate plots in one pass:

```bash
polyzymd compare run-all --plot
```

## Step 6: Generate Figures

For a plotting smoke test:

```bash
polyzymd compare plot-all --list-available
polyzymd compare plot-all
```

`--list-available` shows which plot types are available for the currently
enabled plugins and which of them are experimental.

## Step 7: Check the Outputs

After a successful run, expect files like these:

```text
polymer_stability_study/
├── comparison.yaml
├── analysis/
│   ├── no_polymer/
│   │   └── rmsf/
│   │       ├── run_1/
│   │       │   └── result.json
│   │       ├── run_2/
│   │       │   └── result.json
│   │       └── aggregated/
│   │           └── result.json
│   └── 100_sbma/
│       └── rmsf/
│           └── ...
├── comparison/
│   ├── rmsf/
│   │   └── result.json
│   ├── contacts/
│   │   └── result.json
│   ├── distances/
│   │   └── result.json
│   └── catalytic_triad/
│       └── result.json
└── figures/
    ├── rmsf/
    │   ├── rmsf_comparison.png
    │   └── rmsf_profile.png
    └── ...
```
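
In the tree above, the per-condition directories under `analysis/` look like lowercased condition labels with non-alphanumeric characters replaced by underscores. The sketch below encodes that guess so you can predict where to look for a given label. The rule is an assumption inferred from the two examples shown (`no_polymer`, `100_sbma`), not a documented PolyzyMD behavior, and `label_to_dirname` is a hypothetical helper.

```python
import re


def label_to_dirname(label):
    """Guess the per-condition directory name for a condition label.

    ASSUMPTION: lowercase the label and collapse each run of
    non-alphanumeric characters into one underscore. This matches the two
    examples in the tree above but is not a documented rule.
    """
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


for label in ("No Polymer", "100% SBMA"):
    print(f"{label!r} -> analysis/{label_to_dirname(label)}/")
```

If a predicted directory does not exist, list `analysis/` directly rather than trusting this guess.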

If your smoke test is `polyzymd compare plot-all`, success means:

- the command completes without error
- stable plots render normally
- experimental plots, if enabled, render with explicit experimental labeling

## Programmatic Use

If you need to run the comparison pipeline from Python, use the plugin
orchestrator directly:

```python
from pathlib import Path

from polyzymd.analyses.discovery import get_analysis
from polyzymd.analyses.orchestrator import run_comparison
from polyzymd.config.comparison import ComparisonConfig

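# Load the study definition (conditions, defaults, plugins).
config = ComparisonConfig.from_yaml(Path("comparison.yaml"))
# Look up the RMSF analysis by name and instantiate it.
analysis = get_analysis("rmsf")()

# Run the per-condition analysis and the cross-condition comparison.
pipeline_result = run_comparison(
    analysis,
    config,
    equilibration="10ns",
)

# The returned mapping holds the comparison result and its saved path.
result = pipeline_result["comparison"]
print(result.ranking)
print(pipeline_result["comparison_path"])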
```

## Adding More Stable Analyses

Common next additions to `comparison.yaml` are:

- `rmsd` for RMSD timeseries and structural stability comparison
- `rg` for Radius of Gyration and structural compactness comparison
- `contacts` for polymer coverage and contact fraction
- `distances` for custom atom-pair distances
- `catalytic_triad` for active-site geometry
- `secondary_structure` for helix/strand persistence and content
- `sasa` for solvent-accessible surface area
- `hydrogen_bonds` for hydrogen-bond occupancy and lifetime summaries

For end-to-end examples, see:

- [Run RMSD Analysis](analysis_rmsd_quickstart.md)
- [Run Rg Analysis](analysis_rg_quickstart.md)
- [Run RMSF Analysis](analysis_rmsf_quickstart.md)
- [Run Contacts Analysis](analysis_contacts_quickstart.md)
- [Run Distance Analysis](analysis_distances_quickstart.md)
- [Run Catalytic Triad Analysis](analysis_triad_quickstart.md)

Archived experimental analyses are not active v1.3 plugins. See
[Experimental analyses](../reference/experimental_analyses_archive.md) for
historical access details.


## Troubleshooting

### `config` path not found

Paths in `comparison.yaml` are resolved relative to the location of
`comparison.yaml`, not your current shell directory.
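
The resolution rule above can be reproduced with `pathlib` to see exactly which file a `config:` entry points to. This is a minimal sketch; `resolve_condition_config` is a hypothetical helper that mirrors the stated rule and is not PolyzyMD's own resolver.

```python
from pathlib import Path


def resolve_condition_config(comparison_yaml, config_entry):
    """Resolve a condition's `config` entry against comparison.yaml's folder.

    Hypothetical helper mirroring the rule stated above. Absolute entries
    are not affected by the base directory.
    """
    base = Path(comparison_yaml).resolve().parent
    return (base / config_entry).resolve()


if __name__ == "__main__":
    # Example entries copied from the minimal comparison.yaml in Step 2.
    entries = (
        "../noPoly_enzyme_DMSO/config.yaml",
        "../SBMA_100_enzyme_DMSO/config.yaml",
    )
    for entry in entries:
        resolved = resolve_condition_config("comparison.yaml", entry)
        status = "ok" if resolved.is_file() else "MISSING"
        print(f"{status:8s}{resolved}")
```

Because the base is the folder containing `comparison.yaml`, the output is the same no matter which directory you launch the script from, as long as you pass the correct path to `comparison.yaml`.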

### `No analyses are enabled`

You need at least one configured section under `plugins:`.

### `plot-all` runs but expected figures are missing

Check that the corresponding comparison JSON files already exist under
`comparison/<analysis>/result.json` and use
`polyzymd compare plot-all --list-available` to verify the enabled plot types.
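
A quick way to find which analyses have no cached comparison yet is to check for the files directly. The sketch below uses the `comparison/<analysis>/result.json` layout described above; `analyses_without_results` is a hypothetical helper, and the list of names is an example that you should replace with the plugins you enabled.

```python
from pathlib import Path


def analyses_without_results(workspace, analyses):
    """Return the analyses lacking `comparison/<analysis>/result.json`."""
    root = Path(workspace) / "comparison"
    return [name for name in analyses if not (root / name / "result.json").is_file()]


if __name__ == "__main__":
    # List the plugin names you enabled under `plugins:` in comparison.yaml.
    enabled = ["rmsf", "contacts", "distances", "catalytic_triad"]
    for name in analyses_without_results(".", enabled):
        print(f"no cached comparison for {name!r}; run `polyzymd compare run {name}`")
```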

## See Also

- [Tutorial: Analyze a Study from Finished Simulations](../tutorials/analysis_complete_workflow.md)
- [Comparison and Plotting Reference](../reference/analysis_comparison_reference.md)
- [Statistical Best Practices for Analysis](../explanation/analysis_statistics_best_practices.md)