# How to Compare Simulation Conditions

Use this guide when you already have completed PolyzyMD simulations and want to
compare conditions with the `polyzymd compare` workflow.

You will:

- define a `comparison.yaml`
- validate it
- run stable comparisons
- generate figures with `polyzymd compare plot-all`

```{important}
For the `v1.2.0` presentation release, the stable comparison stack is RMSF,
contacts, distances, catalytic triad, and secondary structure. Binding
preference, exposure dynamics, binding free energy, and polymer affinity remain
available, but PolyzyMD labels them as experimental.
```

```{note}
If you have not yet run per-condition analyses, start with
[Tutorial: Analyze a Study from Finished Simulations](analysis_complete_workflow.md).
```

## Before You Start

Make sure each condition already has:

- a `config.yaml`
- finished trajectories
- per-condition analysis results from `polyzymd analyze run`

For metric-specific setup, use the individual analysis guides before comparing
conditions.

## Step 1: Create a Comparison Workspace

```bash
polyzymd compare init -n polymer_stability_study
cd polymer_stability_study
```

This creates:

```text
polymer_stability_study/
├── comparison.yaml
├── figures/
├── results/
└── structures/
```

## Step 2: Define a Minimal `comparison.yaml`

Start with one stable analysis. RMSF is a good first comparison because it has
no extra structural dependencies.

```yaml
name: "polymer_stability_study"
description: "Effect of polymer composition on enzyme flexibility"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

analysis_settings:
  rmsf:
    selection: "protein and name CA"

comparison_settings:
  rmsf: {}
```

You can add other stable analyses later by extending `analysis_settings` and
`comparison_settings`.

<!-- IMAGE OPPORTUNITY: Add an annotated `comparison.yaml` graphic showing the
roles of `conditions`, `analysis_settings`, `comparison_settings`, and
`plot_settings`. -->

## Step 3: Validate the Config

```bash
polyzymd compare validate
```

You should see a passing summary with the study name, condition count, and the
configured analysis sections.

## Step 4: Run Comparisons

### Run One Stable Comparison

```bash
polyzymd compare run rmsf
```

This loads the existing per-condition results, performs the statistical
comparison, prints a formatted summary, and writes JSON to `results/`.

### Run All Enabled Comparisons

Once you have multiple stable analyses configured, use:

```bash
polyzymd compare run-all
```

If you want figures immediately afterward:

```bash
polyzymd compare run-all --plot
```

## Step 5: Generate Figures

For a final smoke test of the comparison workspace:

```bash
polyzymd compare plot-all --list-available
polyzymd compare plot-all
```

`--list-available` is useful because it shows which plot types are enabled and
which ones are marked as experimental.

## Step 6: Check the Outputs

After a successful run, expect files like these:

```text
results/
├── rmsf_comparison_polymer_stability_study.json
├── contacts_comparison_polymer_stability_study.json
├── distances_comparison_polymer_stability_study.json
└── triad_comparison_polymer_stability_study.json

figures/
├── rmsf_comparison.png
├── rmsf_profile.png
├── triad_kde_panel.png
└── ...
```

If your smoke test is `polyzymd compare plot-all`, success means:

- the command completes without error
- stable plots render normally
- experimental plots, if enabled, render with explicit experimental labeling

## Adding More Stable Analyses

Common next additions to `comparison.yaml` are:

- `contacts` for polymer coverage and contact fraction
- `distances` for custom atom-pair distances
- `catalytic_triad` for active-site geometry

For end-to-end examples of these workflows, see:

- [Run Contacts Analysis](analysis_contacts_quickstart.md)
- [Run Distance Analysis](analysis_distances_quickstart.md)
- [Run Catalytic Triad Analysis](analysis_triad_quickstart.md)

## Experimental Workflows

Experimental workflows remain available, but they are not the default path for
the presentation release. Use them only when you explicitly want those metrics:

- [Experimental: Analyze Binding Preference](analysis_binding_preference.md)
- [Experimental: Analyze Binding Free Energy](analysis_binding_free_energy.md)
- [Experimental: Analyze Polymer Affinity](analysis_polymer_affinity.md)
- [Experimental: Analyze Exposure Dynamics](analysis_exposure_dynamics.md)

## Troubleshooting

### `config` path not found

Paths in `comparison.yaml` are resolved relative to the location of
`comparison.yaml`, not your current shell directory.

### `No analyses are enabled`

You need at least one section under `analysis_settings` and a matching entry
under `comparison_settings`.

### `plot-all` runs but expected figures are missing

Check that the corresponding comparison JSON files already exist in `results/`
and use `polyzymd compare plot-all --list-available` to verify the enabled plot
types.

## See Also

- [Tutorial: Analyze a Study from Finished Simulations](analysis_complete_workflow.md)
- [Comparison and Plotting Reference](../reference/analysis_comparison_reference.md)
- [Statistical Best Practices for Analysis](analysis_statistics_best_practices.md)

```
RMSF Comparison: polymer_stability_study
============================================================
Equilibration: 10ns
Selection: protein and name CA
Control: No Polymer

Condition Summary (ranked by RMSF, lowest first)
------------------------------------------------------------
Rank  Condition            Mean RMSF    SEM        N
------------------------------------------------------------
1     100% SBMA               0.551 A    0.0344  2
2     100% EGMA               0.597 A    0.0725  3
3     No Polymer              0.715 A    0.0203  3   *
4     50/50 Mix               0.728 A    0.0336  3
------------------------------------------------------------
* = control condition

Pairwise Comparisons
--------------------------------------------------------------------------------
Comparison                     % Change   p-value      Cohen's d  Effect
--------------------------------------------------------------------------------
100% SBMA vs No Polymer        -22.9%     0.0211*      4.06       large
100% EGMA vs No Polymer        -16.4%     0.1944       1.27       large
50/50 Mix vs No Polymer        +1.9%      0.7445       -0.29      small
--------------------------------------------------------------------------------
* p < 0.05
```

### Output Fields Explained

| Field | Meaning |
|-------|---------|
| **Mean RMSF** | Average RMSF across all replicates (Angstroms) |
| **SEM** | Standard error of the mean |
| **N** | Number of replicates |
| **% Change** | Relative to control (negative = more stable) |
| **p-value** | Two-sample t-test p-value |
| **Cohen's d** | Effect size (positive = control higher) |
| **Effect** | Effect size interpretation |

### Ranking

Conditions are ranked by mean RMSF:
- **Rank 1** = Lowest RMSF = Most stable
- Lower RMSF indicates less flexibility/movement

## Statistical Analysis

### T-Tests

Each condition is compared to the control (or all pairs if no control) using
an independent two-sample t-test:

- **p < 0.05**: Statistically significant difference (marked with *)
- **p > 0.05**: Not statistically significant

```{tip}
With small sample sizes (N=3), you can only detect large effects.
Non-significant p-values don't mean "no effect" -- just insufficient evidence.
See [Best Practices Guide](analysis_rmsf_best_practices.md#comparing-conditions)
for interpretation guidelines.
```

### Effect Size (Cohen's d)

Cohen's d quantifies the magnitude of the difference:

| Cohen's d | Interpretation |
|-----------|---------------|
| < 0.2 | Negligible |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| > 0.8 | Large |

**Why report effect size?** A large effect (d > 0.8) suggests a meaningful
difference even if p > 0.05 due to small sample size.

### ANOVA

For 3+ conditions, one-way ANOVA tests whether **any** condition differs:

```
One-way ANOVA
----------------------------------------
F-statistic: 3.151
p-value:     0.0955
Significant: No (alpha=0.05)
```

ANOVA p < 0.05 indicates at least one condition differs from the others.
Use pairwise comparisons to identify which ones.

## Output Formats

### Table (Default)

Console-friendly ASCII table:

```bash
polyzymd compare rmsf --format table
```

### Markdown

Publication-ready tables for documentation:

```bash
polyzymd compare rmsf --format markdown -o report.md
```

### JSON

Machine-readable for further analysis:

```bash
polyzymd compare rmsf --format json -o results.json
```

## Working Example

### Comparing Polymer Stabilization

Setup for a study comparing enzyme stability with different polymer coatings:

`````{tab-set}
````{tab-item} YAML + CLI (Recommended)
**comparison.yaml:**

```yaml
name: "SBMA_EGMA_stabilization"
description: "Does SBMA stabilize LipA better than EGMA?"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2]

  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "50/50 SBMA:EGMA"
    config: "../SBMA_EGMA_50_50_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

analysis_settings:
  rmsf:
    selection: "protein and name CA"

comparison_settings:
  rmsf: {}
```

**Run:**

```bash
polyzymd compare rmsf
```
````

````{tab-item} Python
```python
from polyzymd.compare import ComparisonConfig, RMSFComparator

# Load comparison configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get RMSF settings from analysis_settings
rmsf_settings = config.analysis_settings.get("rmsf")

# Run RMSF comparison
comparator = RMSFComparator(
    config=config,
    rmsf_settings=rmsf_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Print summary table
print(f"RMSF Comparison: {result.name}")
print(f"Control: {result.control_label}")
print()

# Ranked conditions
print("Conditions (ranked by RMSF, lowest first):")
for i, label in enumerate(result.ranking, 1):
    cond = result.get_condition(label)
    marker = " *" if label == result.control_label else ""
    print(f"  {i}. {label}: {cond.mean_rmsf:.3f} ± {cond.sem_rmsf:.3f} Å{marker}")

print()

# Pairwise comparisons
print("Pairwise comparisons vs control:")
for comp in result.pairwise_comparisons:
    sig = "*" if comp.significant else ""
    print(f"  {comp.condition_b} vs {comp.condition_a}: "
          f"{comp.percent_change:+.1f}%, p={comp.p_value:.3f}{sig}, d={comp.cohens_d:.2f}")

# Save results
result.save("results/rmsf_comparison.json")
```
````
`````

**Key findings from output:**

| Condition | Mean RMSF | vs Control | Significant? |
|-----------|-----------|------------|--------------|
| 100% SBMA | 0.551 A | -22.9% | Yes (p=0.021) |
| 100% EGMA | 0.597 A | -16.4% | No (p=0.194) |
| 50/50 Mix | 0.728 A | +1.9% | No (p=0.745) |

**Interpretation:**
- 100% SBMA significantly stabilizes the enzyme (22.9% lower RMSF, p < 0.05)
- 100% EGMA shows a large effect (d=1.27) but isn't statistically significant
- 50/50 Mix shows no benefit over the control

## Saved Results

Results are automatically saved to `results/`:

```
my_study/
├── comparison.yaml
├── results/
│   └── rmsf_comparison_my_study.json  # Full results
└── figures/
```

### Result JSON Structure

```python
{
    "metric": "rmsf",
    "name": "my_study",
    "control_label": "No Polymer",
    "conditions": [
        {
            "label": "No Polymer",
            "mean_rmsf": 0.715,
            "sem_rmsf": 0.020,
            "n_replicates": 3,
            "replicate_values": [0.755, 0.693, 0.696]
        },
        ...
    ],
    "pairwise_comparisons": [
        {
            "condition_a": "No Polymer",
            "condition_b": "100% SBMA",
            "percent_change": -22.9,
            "p_value": 0.0211,
            "cohens_d": 4.06,
            "significant": true
        },
        ...
    ],
    "ranking": ["100% SBMA", "100% EGMA", "No Polymer", "50/50 Mix"],
    ...
}
```

### Viewing Saved Results

```bash
# Display a saved result
polyzymd compare show results/rmsf_comparison_my_study.json

# Different format
polyzymd compare show results/rmsf_comparison_my_study.json --format markdown
```

## CLI Reference

### polyzymd compare init

Create a new comparison project:

```bash
polyzymd compare init NAME [OPTIONS]

Arguments:
  NAME                   Project name (creates directory)

Options:
  --eq-time TEXT         Default equilibration time [default: 10ns]
  -o, --output-dir PATH  Parent directory [default: current]
```

### polyzymd compare rmsf

Run RMSF comparison:

```bash
polyzymd compare rmsf [OPTIONS]

Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --selection TEXT                Override atom selection
  --recompute                     Force recompute RMSF
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging
```

### polyzymd compare validate

Check configuration:

```bash
polyzymd compare validate [OPTIONS]

Options:
  -f, --file PATH    Config file [default: comparison.yaml]
```

### polyzymd compare show

Display saved results:

```bash
polyzymd compare show RESULT_FILE [OPTIONS]

Arguments:
  RESULT_FILE                     Path to saved JSON

Options:
  --format [table|markdown|json]  Output format [default: table]
```

### polyzymd compare plot

Generate publication-ready plots from saved results:

```bash
polyzymd compare plot RESULT_FILE [OPTIONS]

Arguments:
  RESULT_FILE                     Path to saved comparison JSON

Options:
  -o, --output-dir PATH           Output directory [default: figures/]
  --format [png|pdf|svg]          Image format [default: png]
  --dpi INTEGER                   Resolution for PNG [default: 150]
  --summary / --no-summary        Generate summary panel [default: yes]
  --show / --no-show              Display interactively [default: no]
```

## Generating Plots

The `polyzymd compare plot` command creates publication-ready figures from
comparison results.

### Quick Start

`````{tab-set}
````{tab-item} CLI (Recommended)
```bash
# Generate all plots
polyzymd compare plot results/rmsf_comparison_my_study.json

# High resolution for publication
polyzymd compare plot results/rmsf_comparison_my_study.json --dpi 300

# PDF format (vector graphics)
polyzymd compare plot results/rmsf_comparison_my_study.json --format pdf

# Preview interactively
polyzymd compare plot results/rmsf_comparison_my_study.json --show
```
````

````{tab-item} Python
```python
from pathlib import Path
from polyzymd.compare.config import ComparisonConfig
from polyzymd.compare.plotter import ComparisonPlotter

# Load comparison config
config = ComparisonConfig.from_yaml("comparison.yaml")

# Generate all plots via the plotter registry
plotter = ComparisonPlotter(config)
paths = plotter.plot_all()
print(f"Generated {len(paths)} plots")
```
````
`````

### Generated Plots

The plotter registry automatically generates all applicable plots for each
analysis type configured in `comparison.yaml`. For RMSF analysis, this includes:

| Plot | Description |
|------|-------------|
| `rmsf_comparison.png` | Bar chart of mean RMSF by condition with SEM error bars |
| `rmsf_profile.png` | Per-residue RMSF line plot with optional SS annotation |

Other analysis types (contacts, distances, secondary structure, etc.) generate
their own plot sets. See the `plot_settings` section in `comparison.yaml` for
per-analysis customization options.

### Example Output

After running:

```bash
polyzymd compare plot-all
```

You get all configured plots saved to the `output_dir` specified in
`plot_settings` (default: `figures/`).

## Python API

### Programmatic Comparison

```python
from polyzymd.compare import ComparisonConfig, RMSFComparator

# Load configuration (must have analysis_settings.rmsf section)
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get RMSF settings
rmsf_settings = config.analysis_settings.get("rmsf")

# Run comparison
comparator = RMSFComparator(
    config=config,
    rmsf_settings=rmsf_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Most stable: {result.ranking[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.mean_rmsf:.3f} +/- {cond.sem_rmsf:.3f} A")

# Statistical comparisons
for comp in result.pairwise_comparisons:
    if comp.significant:
        print(f"{comp.condition_b} vs {comp.condition_a}: "
              f"p={comp.p_value:.4f}, d={comp.cohens_d:.2f}")
```

### Formatting Results

```python
from polyzymd.compare import format_markdown, format_console_table

# Get markdown output
md_text = format_markdown(result)
with open("report.md", "w") as f:
    f.write(md_text)

# Console output
print(format_console_table(result))
```

### Loading Saved Results

```python
from polyzymd.compare import ComparisonResult

# Load from JSON
result = ComparisonResult.load("results/rmsf_comparison_my_study.json")

# Access data
control = result.get_condition("No Polymer")
print(f"Control RMSF: {control.mean_rmsf:.3f} A")

# Get specific comparison
comp = result.get_comparison("100% SBMA")
print(f"SBMA vs control: {comp.percent_change:+.1f}%, p={comp.p_value:.4f}")
```

### Plotting with Python

```python
from pathlib import Path
from polyzymd.compare.config import ComparisonConfig
from polyzymd.compare.plotter import ComparisonPlotter

# Load config and generate all plots
config = ComparisonConfig.from_yaml("comparison.yaml")
plotter = ComparisonPlotter(config)
paths = plotter.plot_all()

for p in paths:
    print(f"  {p}")
```

### Plot Customization

Plot appearance is controlled via the `plot_settings` section of
`comparison.yaml`. You can set global options (DPI, format, color palette)
and per-analysis overrides (figure sizes, which plot types to generate).

```yaml
plot_settings:
  output_dir: "figures/"
  format: "png"
  dpi: 300
  style: "publication"
  color_palette: "tab10"

  # Per-analysis overrides
  rmsf:
    show_error: true
    highlight_residues: [77, 133, 156]
    figsize_profile: [14, 4]
    figsize_comparison: [8, 6]
```

See the `theme:` block in `comparison.yaml` for fine-grained control over
font sizes, bar styling, line widths, spine visibility, and legend placement.

## Troubleshooting

### "Config not found for 'Condition'"

**Cause:** Path in `config:` is incorrect

**Fix:** Check that the path exists relative to comparison.yaml:

```bash
ls ../projects/my_condition/config.yaml
```

### "Need at least 2 conditions"

**Cause:** Only one condition defined in comparison.yaml

**Fix:** Add at least one more condition to compare

### High p-value Despite Large Effect

**Cause:** Small sample size (N=2-3)

**This is expected.** With few replicates, you can only detect very large
effects. The large Cohen's d suggests a real difference exists; run more
replicates to achieve statistical significance.

### Results Don't Match Manual Calculation

**Cause:** Different equilibration time or selection string

**Fix:** Ensure `--eq-time` and `--selection` match what you used for
individual RMSF calculations. Check `defaults:` in comparison.yaml.

## Comparing Catalytic Triad Geometry

In addition to RMSF (global flexibility), you can compare **catalytic triad integrity**
across conditions. This is useful for enzymes where active site geometry is crucial
for catalytic function.

### What is Simultaneous Contact Fraction?

The catalytic triad comparison analyzes the **simultaneous contact fraction** -- the
percentage of simulation frames where ALL distance pairs in your triad are below
the contact threshold at the same time. Higher values indicate better triad integrity.

For example, a Ser-His-Asp catalytic triad:
- 95% simultaneous contact = triad is intact most of the time
- 50% simultaneous contact = triad frequently disrupted
- 10% simultaneous contact = triad rarely intact

### Adding Catalytic Triad to comparison.yaml

Add a `catalytic_triad` section to your `analysis_settings`:

```yaml
name: "polymer_stability_study"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

# Define your enzyme's catalytic triad in analysis_settings
analysis_settings:
  catalytic_triad:
    name: "LipA_Ser-His-Asp"
    description: "Lipase A catalytic triad"
    threshold: 3.5  # Angstroms (H-bond cutoff)
    pairs:
      - label: "Asp133-His156"
        selection_a: "midpoint(protein and resid 133 and name OD1 OD2)"
        selection_b: "protein and resid 156 and name ND1"
      - label: "His156-Ser77"
        selection_a: "protein and resid 156 and name NE2"
        selection_b: "protein and resid 77 and name OG"

# Must have corresponding entry in comparison_settings
comparison_settings:
  catalytic_triad: {}
```

### Running Triad Comparison

`````{tab-set}
````{tab-item} CLI (Recommended)
```bash
# From your comparison project directory
polyzymd compare triad

# With options
polyzymd compare triad --eq-time 10ns --format markdown
polyzymd compare triad -o triad_report.md
```
````

````{tab-item} Python
```python
from polyzymd.compare import ComparisonConfig, TriadComparator

# Load configuration (must have analysis_settings.catalytic_triad section)
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get triad settings from analysis_settings
triad_settings = config.analysis_settings.get("catalytic_triad")

# Run triad comparison
comparator = TriadComparator(
    config=config,
    triad_settings=triad_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Print results
print(f"Best triad integrity: {result.ranking[0]}")
for cond in result.conditions:
    contact_pct = cond.mean_simultaneous_contact * 100
    sem_pct = cond.sem_simultaneous_contact * 100
    print(f"{cond.label}: {contact_pct:.1f}% ± {sem_pct:.1f}%")

# Save to JSON
result.save("results/triad_comparison.json")
```
````
`````

### Example Output

```
Catalytic Triad Comparison: polymer_stability_study
======================================================================
Triad: LipA_Ser-His-Asp
Description: Lipase A catalytic triad
Pairs: Asp133-His156, His156-Ser77
Contact threshold: 3.5 A
Equilibration: 10ns
Control: No Polymer

Condition Summary (ranked by simultaneous contact, highest first)
----------------------------------------------------------------------
Rank  Condition            Contact %    SEM        N
----------------------------------------------------------------------
1     100% SBMA               87.3%      2.15%  3
2     No Polymer              72.1%      3.42%  3   *
3     50/50 Mix               68.5%      4.21%  3
----------------------------------------------------------------------
* = control condition

Pairwise Comparisons
-------------------------------------------------------------------------------------
Comparison                     % Change   p-value      Cohen's d  Effect
-------------------------------------------------------------------------------------
100% SBMA vs No Polymer        +21.1%     0.0156*      2.89       large
50/50 Mix vs No Polymer        -5.0%      0.5234       -0.52      medium
-------------------------------------------------------------------------------------
* p < 0.05
Positive % change = improved triad contact

Interpretation
----------------------------------------------------------------------
Best triad integrity: 100% SBMA (87.3% simultaneous contact)
  -> 21.1% higher than control (No Polymer)
  -> Statistically significant (p=0.0156, d=2.89 [large])
```

### Per-Pair Distance Table

The output also includes a per-pair distance summary showing how each
individual H-bond distance compares across conditions:

```
Per-Pair Distances (Mean ± SEM across replicates)
------------------------------------------------------------------------------------------
Condition            Asp133-His156    His156-Ser77
------------------------------------------------------------------------------------------
100% SBMA            2.81±0.08        2.74±0.05
No Polymer           3.12±0.11        2.89±0.09
50/50 Mix            3.28±0.15        2.95±0.12
------------------------------------------------------------------------------------------
```

### Interpreting Results

| Contact % | Interpretation |
|-----------|---------------|
| > 90% | Excellent triad integrity |
| 70-90% | Good triad integrity |
| 50-70% | Moderate disruption |
| < 50% | Significant triad disruption |

**Key metrics:**
- **% Change**: Positive = more triad contact = better
- **p-value**: < 0.05 indicates statistically significant difference
- **Cohen's d**: Effect size magnitude

### CLI Reference for Triad

```bash
polyzymd compare triad [OPTIONS]

Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --recompute                     Force recompute triad analysis
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging
```

### Python API for Triad Comparison

```python
from polyzymd.compare import ComparisonConfig, TriadComparator, format_triad_result

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get triad settings from analysis_settings
triad_settings = config.analysis_settings.get("catalytic_triad")

# Run triad comparison
comparator = TriadComparator(
    config=config,
    triad_settings=triad_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Best triad integrity: {result.ranking[0]}")
for cond in result.conditions:
    contact_pct = cond.mean_simultaneous_contact * 100
    print(f"{cond.label}: {contact_pct:.1f}% ± {cond.sem_simultaneous_contact*100:.1f}%")

# Format output
print(format_triad_result(result, format="markdown"))

# Save result
result.save("results/triad_comparison.json")
```

### Loading Saved Triad Results

```python
from polyzymd.compare import TriadComparisonResult

# Load from JSON
result = TriadComparisonResult.load("results/triad_comparison_my_study.json")

# Access condition data
control = result.get_condition("No Polymer")
print(f"Control contact: {control.mean_simultaneous_contact * 100:.1f}%")

# Get pairwise comparison
comp = result.get_comparison("100% SBMA")
if comp and comp.significant:
    print(f"SBMA significantly improves triad contact (p={comp.p_value:.4f})")
```

## Comparing Polymer-Protein Contacts

Compare **polymer-protein contact statistics** across conditions to understand how
different polymer compositions affect protein-polymer interactions.

```{note}
**New to contacts analysis?** Start with the [Contacts Quick Start](analysis_contacts_quickstart.md)
to run individual analyses, then return here to compare conditions.
```

### Key Metrics

The contacts comparison analyzes two aggregate metrics:

| Metric | Description | Higher means... |
|--------|-------------|-----------------|
| **Coverage** | % of protein residues contacted by polymer | More extensive binding |
| **Mean Contact Fraction** | Average % of frames each residue is in contact | Stronger/more persistent binding |

Additionally, **residence times by polymer type** show how long each polymer type
(e.g., SBMA vs EGMA) maintains contacts, revealing selectivity differences.

### Running Contacts Comparison

`````{tab-set}
````{tab-item} CLI (Recommended)
```bash
# Basic comparison
polyzymd compare contacts

# With custom equilibration time
polyzymd compare contacts --eq-time 10ns

# Override polymer selection (only SBMA monomers)
polyzymd compare contacts --polymer-selection "resname SBM"

# Different output formats
polyzymd compare contacts --format markdown -o contacts_report.md
polyzymd compare contacts --format json
```
````

````{tab-item} Python
```python
from polyzymd.compare import ComparisonConfig, ContactsComparator

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get contacts settings from analysis_settings and comparison_settings
analysis_settings = config.analysis_settings.get("contacts")
comparison_settings = config.comparison_settings.get("contacts")

# Run comparison
comparator = ContactsComparator(
    config=config,
    analysis_settings=analysis_settings,
    comparison_settings=comparison_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Highest coverage: {result.ranking_by_coverage[0]}")
print(f"Highest contact: {result.ranking_by_contact_fraction[0]}")

for cond in result.conditions:
    print(f"{cond.label}: {cond.coverage_mean*100:.1f}% coverage, "
          f"{cond.contact_fraction_mean*100:.1f}% contact")

# Save result
result.save("results/contacts_comparison.json")
```
````
`````

### Optional: Contacts Configuration in comparison.yaml

Add a `contacts` section to `analysis_settings` and `comparison_settings`:

```yaml
name: "polymer_stability_study"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

# Configure contacts analysis in analysis_settings
analysis_settings:
  contacts:
    polymer_selection: "resname SBM EGM"  # MDAnalysis selection
    protein_selection: "protein"
    cutoff: 4.5                            # Angstroms
    contact_criteria: "heavy_atom"

# Comparison-specific parameters in comparison_settings
comparison_settings:
  contacts:
    fdr_alpha: 0.05                        # Benjamini-Hochberg FDR
    min_effect_size: 0.5                   # Cohen's d threshold
    top_residues: 10                       # Top residues to show
```

If no `contacts` section is provided, defaults are used.

### Handling Conditions Without Polymer

Conditions without polymer atoms (e.g., "No Polymer" controls) are **automatically
excluded** from contacts analysis since there's nothing to measure. The command
will warn you:

```
Note: 1 condition(s) auto-excluded (no polymer atoms): No Polymer
```

This is expected behavior. Comparisons will be made between polymer-containing
conditions only.

### Example Output

```
Polymer-Protein Contacts Comparison: polymer_ratio_study
================================================================================
Analysis: polymer_protein_contacts
Polymer selection: resname SBM EGM
Contact cutoff: 4.5 A
Contact criteria: heavy_atom
Equilibration: 10ns
Auto-excluded (no polymer): No Polymer

Condition Summary - Coverage (ranked, highest first)
--------------------------------------------------------------------------------
Rank  Condition                 Coverage     SEM        N   
--------------------------------------------------------------------------------
1     100% EGMA                     88.4%       0.55%  3    
2     25% SBMA / 75% EGMA           86.9%       1.44%  3    
3     50% SBMA / 50% EGMA           82.7%       0.66%  3    
4     75% SBMA / 25% EGMA           82.7%       1.57%  3    
5     100% SBMA                     74.9%       0.28%  2    
--------------------------------------------------------------------------------

Condition Summary - Mean Contact Fraction (ranked, highest first)
--------------------------------------------------------------------------------
Rank  Condition                 Contact %    SEM        N   
--------------------------------------------------------------------------------
1     75% SBMA / 25% EGMA           30.2%       0.50%  3    
2     25% SBMA / 75% EGMA           29.1%       5.35%  3    
3     100% EGMA                     25.3%       2.64%  3    
4     100% SBMA                     22.9%       1.47%  2    
5     50% SBMA / 50% EGMA           22.9%       2.18%  3    
--------------------------------------------------------------------------------

Residence Time by Polymer Type (frames)
--------------------------------------------------------------------------------
Condition                          EGM          SBM
--------------------------------------------------------------------------------
100% SBMA                           --  10.0±0.2 
75% SBMA / 25% EGMA         7.8±0.2    9.3±0.0 
50% SBMA / 50% EGMA         7.3±0.3    8.6±0.5 
25% SBMA / 75% EGMA         7.1±0.7   10.5±0.4 
100% EGMA                   7.1±0.4            --
--------------------------------------------------------------------------------

Aggregate Comparisons
-----------------------------------------------------------------------------------------------
Comparison                     Metric          % Change   p-value      Cohen d    Effect      
-----------------------------------------------------------------------------------------------
100% EGMA vs 100% SBMA         coverage        +18.1%     0.0004*      -16.64     large       
100% EGMA vs 100% SBMA         mean contact f  +10.7%     0.5445       -0.62      medium      
...
-----------------------------------------------------------------------------------------------
* p < 0.05; positive % change = more contact in treatment

One-way ANOVA
------------------------------------------------------------
Metric                    F-stat       p-value      Significant 
------------------------------------------------------------
coverage                  18.323       0.0002       Yes*        
mean contact fraction     1.200        0.3748       No          
------------------------------------------------------------
```

### Interpreting Results

**Coverage rankings:**
- Higher coverage = polymer interacts with more of the protein surface
- 100% EGMA shows highest coverage (88.4%) - broader but possibly weaker binding

**Contact fraction rankings:**
- Higher mean contact = more persistent interactions per residue
- 75% SBMA / 25% EGMA shows highest contact fraction (30.2%) - more stable binding

**Residence time by polymer type:**
- SBMA (SBM) tends to have longer residence times than EGMA (EGM)
- This suggests SBMA forms more persistent interactions
- Useful for understanding polymer selectivity

**Statistical tests:**
- ANOVA tests whether any condition differs overall
- Pairwise comparisons with Benjamini-Hochberg FDR correction
- Cohen's d quantifies effect magnitude independent of sample size

### CLI Reference for Contacts

```bash
polyzymd compare contacts [OPTIONS]

Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --polymer-selection TEXT        Override polymer selection (MDAnalysis syntax)
  --cutoff FLOAT                  Override contact cutoff (Angstroms)
  --fdr-alpha FLOAT               FDR alpha for multiple testing correction
  --recompute                     Force recompute contacts analysis
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging
```

### Python API for Contacts Comparison

```python
from polyzymd.compare import (
    ComparisonConfig,
    ContactsComparator,
    format_contacts_result,
)

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get contacts settings from analysis_settings and comparison_settings
analysis_settings = config.analysis_settings.get("contacts")
comparison_settings = config.comparison_settings.get("contacts")

# Run comparison
comparator = ContactsComparator(
    config=config,
    analysis_settings=analysis_settings,
    comparison_settings=comparison_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Highest coverage: {result.ranking_by_coverage[0]}")
print(f"Highest contact: {result.ranking_by_contact_fraction[0]}")

for cond in result.conditions:
    print(f"{cond.label}: {cond.coverage_mean*100:.1f}% coverage, "
          f"{cond.contact_fraction_mean*100:.1f}% contact")
    
    # Residence time by polymer type
    for poly_type, (mean, sem) in cond.residence_time_by_polymer_type.items():
        print(f"  {poly_type}: {mean:.1f} ± {sem:.1f} frames")

# Format output
print(format_contacts_result(result, format="markdown"))

# Save result
result.save("results/contacts_comparison.json")
```

### Loading Saved Contacts Results

```python
from polyzymd.compare import ContactsComparisonResult

# Load from JSON
result = ContactsComparisonResult.load("results/contacts_comparison_my_study.json")

# Access condition data
for cond in result.conditions:
    print(f"{cond.label}: coverage={cond.coverage_mean*100:.1f}%")

# Get aggregate comparisons
for comp in result.aggregate_comparisons:
    if comp.significant:
        print(f"{comp.condition_a} vs {comp.condition_b} ({comp.metric}): "
              f"p={comp.p_value:.4f}")
```

### Contacts vs RMSF: Complementary Analyses

| Analysis | Question Answered |
|----------|-------------------|
| **RMSF** | Does polymer stabilize the enzyme (reduce flexibility)? |
| **Contacts** | Where and how strongly does polymer bind? |
| **Combined** | Do contact hotspots correlate with stabilization? |

For mechanistic insights correlating contacts with flexibility changes, see
`polyzymd compare report` (coming soon).

## Comparing Distances Across Conditions

Compare **inter-atomic distances** across conditions with statistical analysis.
This is useful for tracking specific interactions (e.g., substrate proximity,
hydrogen bond distances) that may change with different polymer environments.

```{note}
**New to distance analysis?** Start with the [Distance Analysis Quick Start](analysis_distances_quickstart.md)
to understand distance pair definitions and selection syntax, then return here
to compare conditions.
```

### Key Metrics

The distances comparison provides **dual-metric ranking**:

| Metric | Description | Ranking |
|--------|-------------|---------|
| **Mean Distance** | Average distance across trajectory (primary) | Lowest first (closer = better) |
| **Fraction Below Threshold** | % of frames below contact threshold (secondary) | Highest first (more contact = better) |

This dual approach captures both the typical distance AND the frequency of close contacts.

### Running Distances Comparison

`````{tab-set}
````{tab-item} CLI (Recommended)
```bash
# Basic comparison
polyzymd compare run distances -f comparison.yaml

# With custom equilibration time
polyzymd compare run distances -f comparison.yaml --eq-time 10ns

# Different output formats
polyzymd compare run distances -f comparison.yaml --format markdown
polyzymd compare run distances -f comparison.yaml --format json -o distances_report.json
```
````

````{tab-item} Python
```python
from polyzymd.compare import ComparisonConfig
from polyzymd.compare.comparators import DistancesComparator

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get distances settings from analysis_settings
distances_settings = config.analysis_settings.get("distances")

# Run comparison
comparator = DistancesComparator(
    config=config,
    analysis_settings=distances_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access results
print(f"Closest mean distance: {result.ranking[0]}")
if result.ranking_by_fraction:
    print(f"Highest contact fraction: {result.ranking_by_fraction[0]}")

for cond in result.conditions:
    print(f"{cond.label}: {cond.overall_mean_distance:.2f} A")
    if cond.overall_fraction_below is not None:
        print(f"  Contact fraction: {cond.overall_fraction_below*100:.1f}%")

# Save result
result.save("results/distances_comparison.json")
```
````
`````

### Adding Distances to comparison.yaml

Add a `distances` section to your `analysis_settings`:

```yaml
name: "substrate_proximity_study"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_LipA_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

# Define distance pairs in analysis_settings
analysis_settings:
  distances:
    threshold: 3.5  # Global default threshold (Angstroms, optional)
    pairs:
      - label: "Catalytic H-bond"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 133 and name NE2"
        threshold: 3.5  # Per-pair threshold (overrides global)
      - label: "Lid Domain Opening"
        selection_a: "com(protein and resid 141-148)"
        selection_b: "com(protein and resid 281-289)"
        threshold: 15.0  # Different threshold for this pair

# Must have corresponding entry in comparison_settings
comparison_settings:
  distances: {}
```

### Per-Pair Thresholds

Different distance pairs often have different biologically relevant thresholds:

| Type of Distance | Typical Threshold |
|------------------|-------------------|
| Hydrogen bond | 3.0 - 3.5 A |
| Salt bridge | 4.0 - 4.5 A |
| Aromatic stacking | 4.0 - 5.0 A |
| Domain separation | 10 - 20 A |
| Lid opening | 15 - 25 A |

**Threshold resolution order:**
1. Per-pair `threshold` in the pair definition (highest priority)
2. Global `threshold` in the `distances` section (fallback)
3. No threshold (fraction below not computed)

**Example with mixed thresholds:**

```yaml
analysis_settings:
  distances:
    threshold: 4.0  # Default for pairs without explicit threshold
    pairs:
      - label: "Ser77-His133 H-bond"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 133 and name NE2"
        threshold: 3.5  # H-bond cutoff

      - label: "Asp156-His133 H-bond"
        selection_a: "midpoint(protein and resid 156 and name OD1 OD2)"
        selection_b: "protein and resid 133 and name ND1"
        # Uses global threshold: 4.0 A

      - label: "Lid-to-Core Distance"
        selection_a: "com(protein and resid 141-148)"
        selection_b: "com(protein and resid 281-289)"
        threshold: 15.0  # Large-scale motion threshold
```

**Threshold cache invalidation:** When you change a threshold value, PolyzyMD
automatically detects the mismatch and recomputes contact fractions from the
stored per-replicate distance data. This avoids expensive trajectory
reprocessing - only the statistical aggregation is recalculated.

### Example Output

```
Distance Comparison: substrate_proximity_study
================================================================================
Pairs analyzed: 2
Pair labels: Ser77-Substrate, His156-Substrate
Contact threshold: 3.5 A
Equilibration: 10ns
Control: No Polymer

Condition Summary (ranked by mean distance, lowest first)
--------------------------------------------------------------------------------
Rank  Condition                 Mean Dist    SEM        % Below    N   
--------------------------------------------------------------------------------
1     100% SBMA                 7.46 A       0.421         0.4%   3    
2     100% EGMA                 7.66 A       0.270         1.6%   3    
3     No Polymer                8.02 A       0.315         0.0%   3   *
--------------------------------------------------------------------------------
* = control condition

Secondary Ranking (by % below threshold, highest first)
------------------------------------------------------------
1     100% EGMA                 1.6% (SEM: 0.31%)
2     100% SBMA                 0.4% (SEM: 0.31%)
3     No Polymer                0.0% (SEM: 0.00%)

Per-Pair Distances (Mean +/- SEM across replicates)
------------------------------------------------------------------------------------------
Condition                 Ser77-Substrate His156-Substrate
------------------------------------------------------------------------------------------
100% SBMA                 7.46+/-0.42     6.12+/-0.25    
100% EGMA                 7.66+/-0.27     6.45+/-0.18    
No Polymer                8.02+/-0.31     6.89+/-0.22    
------------------------------------------------------------------------------------------

Pairwise Comparisons (Distance Metric)
------------------------------------------------------------------------------------------
Comparison                     % Change   p-value      Cohen d    Effect       Direction 
------------------------------------------------------------------------------------------
100% SBMA vs No Polymer        -7.0%      0.0451*      0.87       large        closer    
100% EGMA vs No Polymer        -4.4%      0.1234       0.70       medium       closer    
------------------------------------------------------------------------------------------
* p < 0.05
Negative % change = lower distance (closer)

Pairwise Comparisons (Fraction Below Threshold)
------------------------------------------------------------------------------------------
Comparison                     % Change   p-value      Cohen d    Effect       Direction   
------------------------------------------------------------------------------------------
100% SBMA vs No Polymer        +0.4%      0.2161       -1.20      large        more_contact
100% EGMA vs No Polymer        +1.6%      0.0065*      -4.25      large        more_contact
------------------------------------------------------------------------------------------
* p < 0.05
Positive % change = more frames below threshold (more contact)

One-way ANOVA
--------------------------------------------------
Distance metric:
  F-statistic: 3.901
  p-value:     0.0512
  Significant: No (alpha=0.05)
Fraction metric:
  F-statistic: 5.880
  p-value:     0.0171
  Significant: Yes (alpha=0.05)

Interpretation
--------------------------------------------------------------------------------
Closest mean distance: 100% SBMA (7.46 A)
  -> 7.0% closer than control (No Polymer)
  -> Statistically significant (p=0.0451, d=0.87 [large])

Highest contact fraction: 100% EGMA (1.6% below threshold)

Analysis completed: 2026-02-16 21:03:40
PolyzyMD version: 1.0.0
```

### Interpreting Results

**Primary metric (Mean Distance):**
- Lower distance = closer = atoms more frequently in proximity
- Ranking from lowest to highest (Rank 1 = closest)
- Negative % change vs control = improvement (closer)

**Secondary metric (Fraction Below Threshold):**
- Only computed if `threshold` is specified in config
- Higher fraction = more frames with close contact
- Ranking from highest to lowest (Rank 1 = most contact)
- Positive % change vs control = improvement (more contact)

**Why dual metrics?**
- Mean distance captures typical behavior
- Fraction below threshold captures extreme events (e.g., catalytic encounters)
- A condition might have similar mean distance but more frequent close approaches

### Statistical Analysis

Both metrics undergo independent statistical testing:

| Test | Applied To | Interpretation |
|------|-----------|----------------|
| **t-test** | Each condition vs control | p < 0.05 = significant difference |
| **Cohen's d** | Each comparison | Effect magnitude (regardless of p-value) |
| **ANOVA** | All conditions | Any condition differs? (3+ conditions) |

**Effect size interpretation:**

| Cohen's d | Interpretation |
|-----------|---------------|
| < 0.2 | Negligible |
| 0.2 - 0.5 | Small |
| 0.5 - 0.8 | Medium |
| > 0.8 | Large |

### CLI Reference for Distances

```bash
polyzymd compare run distances [OPTIONS]

Options:
  -f, --file PATH                 Config file [default: comparison.yaml]
  --eq-time TEXT                  Override equilibration time
  --recompute                     Force recompute distance analysis
  --format [table|markdown|json]  Output format [default: table]
  -o, --output PATH               Save formatted output to file
  -q, --quiet                     Suppress INFO messages
  --debug                         Enable DEBUG logging
```

### Python API for Distances Comparison

```python
from polyzymd.compare import ComparisonConfig
from polyzymd.compare.comparators import DistancesComparator
from polyzymd.compare.distances_formatters import format_distances_result

# Load configuration
config = ComparisonConfig.from_yaml("comparison.yaml")

# Get distances settings
distances_settings = config.analysis_settings.get("distances")

# Run comparison
comparator = DistancesComparator(
    config=config,
    analysis_settings=distances_settings,
    equilibration="10ns",
)
result = comparator.compare()

# Access primary ranking (by mean distance)
print(f"Closest: {result.ranking[0]}")
for cond in result.conditions:
    print(f"{cond.label}: {cond.overall_mean_distance:.2f} ± {cond.overall_sem_distance:.3f} A")

# Access secondary ranking (by fraction below threshold)
if result.ranking_by_fraction:
    print(f"\nHighest contact: {result.ranking_by_fraction[0]}")
    for cond in result.conditions:
        if cond.overall_fraction_below is not None:
            print(f"{cond.label}: {cond.overall_fraction_below*100:.1f}%")

# Access per-pair details
for cond in result.conditions:
    print(f"\n{cond.label}:")
    for pair in cond.pair_summaries:
        print(f"  {pair.label}: {pair.mean_distance:.2f} ± {pair.sem_distance:.2f} A")

# Format output
print(format_distances_result(result, format="markdown"))

# Save result
result.save("results/distances_comparison.json")
```

### Loading Saved Distance Results

```python
from polyzymd.compare.results import DistanceComparisonResult

# Load from JSON
result = DistanceComparisonResult.load("results/distances_comparison_my_study.json")

# Access condition data
control = result.get_condition("No Polymer")
print(f"Control mean distance: {control.overall_mean_distance:.2f} A")

# Get pairwise comparison
comp = result.get_comparison("100% SBMA")
if comp and comp.distance_significant:
    print(f"SBMA significantly closer (p={comp.distance_p_value:.4f})")
```

### Use Cases for Distance Comparison

| Use Case | Configuration |
|----------|---------------|
| **Substrate binding** | Distance from catalytic residues to substrate atoms |
| **Active site geometry** | Similar to triad, but for non-catalytic interactions |
| **Polymer-residue proximity** | Distance from polymer termini to specific residues |
| **Conformational changes** | Distance between domains or loops |

### Distances vs Catalytic Triad Comparison

| Feature | `compare run distances` | `compare run triad` |
|---------|-------------------------|---------------------|
| **Metric** | Mean distance + fraction | Simultaneous contact fraction |
| **Pairs** | Any atom pairs | Pre-defined triad geometry |
| **Ranking** | Dual (distance + fraction) | Single (simultaneous contact) |
| **Use case** | General distance tracking | Catalytic geometry integrity |

```{tip}
Use **distances** for monitoring specific interactions with dual-metric analysis.
Use **triad** when all pairs must be in contact simultaneously (catalytic triad geometry).
```

## See Also

- [RMSF Quick Start](analysis_rmsf_quickstart.md) -- Run individual RMSF analysis
- [Contacts Quick Start](analysis_contacts_quickstart.md) -- Run individual contacts analysis
- [Distance Analysis Quick Start](analysis_distances_quickstart.md) -- Run individual distance analysis
- [Catalytic Triad Analysis](analysis_triad_quickstart.md) -- Run individual triad analysis
- [Statistical Best Practices](analysis_rmsf_best_practices.md) -- Understanding statistics
- [Reference Selection](analysis_reference_selection.md) -- Alignment options