Tutorial: Analyze a Study from Finished Simulations

This tutorial walks through one complete PolyzyMD analysis workflow, from finished simulations to comparison figures:

three simulation conditions already exist
you create one analysis.yaml
you run per-condition analyses
you compare the conditions
you finish with polyzymd compare plot-all as the smoke test

By the end, you will have a working comparison workspace with JSON results and figures for a small three-condition study.

Important

This tutorial uses the stable v1.2.0 comparison stack: RMSF, contacts, distances, and catalytic triad. Experimental workflows are linked at the end, but they are not part of the main tutorial path.

Before You Start

You need:

completed OpenMM production trajectories for each condition (DCD format in PolyzyMD’s standard directory layout; GROMACS XTC support is planned for v1.2.1)
one config.yaml per condition
a topology such as solvated_system.pdb already produced during the build

The Study We Will Analyze

We will assume a project laid out like this:

my_enzyme_study/
├── noPoly_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
├── SBMA_100_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
└── EGMA_100_enzyme_DMSO/
    ├── config.yaml
    └── scratch/

The scratch/ directories may be symlinks to large trajectory storage on your cluster. PolyzyMD resolves those paths through each condition’s config.yaml.
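
Before running any analysis, you can confirm from the shell that the layout above is in place. The following is a minimal sketch, not a PolyzyMD command: `check_study_layout` is a hypothetical helper name, and it only tests for the `config.yaml` and `scratch/` entries shown in the tree.

```shell
# Hypothetical helper (not part of PolyzyMD): check that every condition
# directory in the tutorial layout has a config.yaml and a scratch/ entry.
# The body runs in a subshell, so its variables and `exit` stay local.
check_study_layout() (
    study_root="${1:-.}"
    status=0
    for cond in noPoly_enzyme_DMSO SBMA_100_enzyme_DMSO EGMA_100_enzyme_DMSO; do
        dir="$study_root/$cond"
        if [ ! -f "$dir/config.yaml" ]; then
            echo "missing: $dir/config.yaml" >&2
            status=1
        fi
        # -d follows symlinks, so scratch/ may point at cluster storage;
        # a dangling symlink fails this test.
        if [ ! -d "$dir/scratch" ]; then
            echo "missing or broken: $dir/scratch" >&2
            status=1
        fi
    done
    exit "$status"
)
```

Called as `check_study_layout my_enzyme_study`, it prints one line per missing item and returns a non-zero status if anything is absent.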

Step 1: Create `analysis.yaml` for the First Condition

Start in the control condition:

cd my_enzyme_study/noPoly_enzyme_DMSO
polyzymd analyze init

Edit the generated analysis.yaml so it enables a small stable analysis set:

replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

rmsf:
  enabled: true
  selection: "protein and name CA"
  reference_mode: "average"

catalytic_triad:
  enabled: true
  name: "Ser-His-Asp"
  threshold: 3.5
  pairs:
    - label: "Ser77-His156"
      selection_a: "protein and resid 77 and name OG"
      selection_b: "protein and resid 156 and name NE2"
    - label: "His156-Asp133"
      selection_a: "protein and resid 156 and name ND1"
      selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"

distances:
  enabled: true
  pairs:
    - label: "Substrate-Ser77"
      selection_a: "resname SUB and name C1"
      selection_b: "protein and resid 77 and name OG"

contacts:
  enabled: true
  polymer_selection: "chainID C"
  protein_selection: "protein"
  cutoff: 4.5
  compute_residence_times: true

Step 2: Reuse the Same `analysis.yaml` for the Other Conditions

Go back up to the study root, then copy the file into the other two condition directories:

cd ../
cp noPoly_enzyme_DMSO/analysis.yaml SBMA_100_enzyme_DMSO/
cp noPoly_enzyme_DMSO/analysis.yaml EGMA_100_enzyme_DMSO/

This works because the condition-specific trajectory paths come from each condition’s own config.yaml.
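
The copy step can also be written as a loop that confirms each copy matches the control's file byte for byte. This is a sketch using only `cp` and `cmp`; `sync_analysis_yaml` is a hypothetical helper name, and it assumes the control directory holds the file you edited in Step 1.

```shell
# Hypothetical helper: copy the control's analysis.yaml into the two polymer
# conditions and confirm each copy is byte-identical to the source.
sync_analysis_yaml() (
    study_root="${1:-.}"
    src="$study_root/noPoly_enzyme_DMSO/analysis.yaml"
    [ -f "$src" ] || { echo "missing: $src" >&2; exit 1; }
    for cond in SBMA_100_enzyme_DMSO EGMA_100_enzyme_DMSO; do
        cp "$src" "$study_root/$cond/analysis.yaml" || exit 1
        # cmp -s prints nothing and returns non-zero if the files differ.
        cmp -s "$src" "$study_root/$cond/analysis.yaml" || exit 1
    done
)
```

From the study root, `sync_analysis_yaml .` performs both copies. Re-running it after a later edit to the control's file keeps the three conditions on identical analysis settings.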

Step 3: Run Per-Condition Analyses

Starting from the study root, run the same command in each condition directory:

cd noPoly_enzyme_DMSO
polyzymd analyze run

cd ../SBMA_100_enzyme_DMSO
polyzymd analyze run

cd ../EGMA_100_enzyme_DMSO
polyzymd analyze run
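
The three cd/run pairs above can be collapsed into one loop. A minimal sketch, meant to be run from the study root: `in_each_condition` is a hypothetical helper that runs whatever command you pass it inside every condition directory and stops at the first failure.

```shell
# Hypothetical helper: run a command inside each condition directory.
# Usage from the study root:  in_each_condition . polyzymd analyze run
in_each_condition() (
    study_root="$1"
    shift
    for cond in noPoly_enzyme_DMSO SBMA_100_enzyme_DMSO EGMA_100_enzyme_DMSO; do
        echo "== $cond =="
        # The inner subshell keeps each cd local to one iteration.
        ( cd "$study_root/$cond" && "$@" ) || exit 1
    done
)
```

Stopping at the first failure is a deliberate choice here: a broken run in the control is easier to debug before the other two conditions add their own output.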

After each run, expect an analysis/ directory with subdirectories such as:

analysis/
├── rmsf/
├── catalytic_triad/
├── distances/
└── contacts/

For the no-polymer control, contacts may be skipped automatically because there are no polymer atoms to analyze.
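
To check all three conditions at once, the expected directories can be tested in a loop. This is a sketch under the assumptions stated in this step: the four subdirectory names are the ones listed above, and a missing contacts/ directory is treated as acceptable only for the no-polymer control. `check_analysis_outputs` is a hypothetical helper name.

```shell
# Hypothetical helper: confirm the per-condition output directories exist.
# contacts/ is optional for the no-polymer control, which may skip it.
check_analysis_outputs() (
    study_root="${1:-.}"
    status=0
    for cond in noPoly_enzyme_DMSO SBMA_100_enzyme_DMSO EGMA_100_enzyme_DMSO; do
        for sub in rmsf catalytic_triad distances contacts; do
            [ -d "$study_root/$cond/analysis/$sub" ] && continue
            if [ "$cond" = noPoly_enzyme_DMSO ] && [ "$sub" = contacts ]; then
                echo "note: $cond has no contacts/ (expected without polymer)"
            else
                echo "missing: $cond/analysis/$sub" >&2
                status=1
            fi
        done
    done
    exit "$status"
)
```

Note that this only checks that the directories exist. It says nothing about whether the results inside them are complete.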

Step 4: Create the Comparison Workspace

Go back to the study root and initialize a comparison project:

cd ..
polyzymd compare init -n polymer_stabilization_study
cd polymer_stabilization_study

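Now edit comparison.yaml to point at the three conditions. The analysis_settings block repeats the selections and thresholds from the analysis.yaml in Step 1: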

name: "polymer_stabilization_study"
description: "Effect of SBMA vs EGMA polymer conjugation on enzyme stability"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

analysis_settings:
  rmsf:
    selection: "protein and name CA"
    reference_mode: "average"

  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"
      - label: "His156-Asp133"
        selection_a: "protein and resid 156 and name ND1"
        selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"

  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"

  contacts:
    polymer_selection: "chainID C"
    protein_selection: "protein"
    cutoff: 4.5
    compute_residence_times: true

comparison_settings:
  rmsf: {}
  catalytic_triad: {}
  distances: {}
  contacts: {}
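
A mistyped relative path in a `config:` entry is an easy error to make in this file. The sketch below extracts those entries with `sed` and checks that each file exists. It is not a PolyzyMD feature: `check_condition_configs` is a hypothetical helper, it assumes double-quoted values as in the example above, and it assumes relative paths are resolved against the directory that holds comparison.yaml.

```shell
# Hypothetical helper: check that every `config:` path in comparison.yaml exists.
check_condition_configs() (
    yaml="${1:-comparison.yaml}"
    # Assumption: relative paths are resolved from the comparison.yaml directory.
    cd "$(dirname "$yaml")" || exit 1
    # Print only the quoted value of each `config: "..."` line.
    sed -n 's/^[[:space:]]*config:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' \
        "$(basename "$yaml")" | {
        status=0
        found=0
        while IFS= read -r path; do
            found=$((found + 1))
            if [ -f "$path" ]; then
                echo "ok: $path"
            else
                echo "missing: $path" >&2
                status=1
            fi
        done
        # An empty result usually means the file or its quoting differs.
        [ "$found" -gt 0 ] || { echo "no config: entries found" >&2; status=1; }
        exit "$status"
    }
)
```

Run inside the workspace as `check_condition_configs comparison.yaml`. This is a plain text match rather than a YAML parse, so it will miss unquoted or multi-line values.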

Step 5: Validate the Comparison Config

polyzymd compare validate

You should see a passing summary that lists the three conditions and the enabled analyses.

Step 6: Run the Cross-Condition Comparison

For the tutorial, use the batch runner:

polyzymd compare run-all

This runs every enabled comparison and writes JSON results into results/.

If you prefer to inspect one comparison first, a good sanity check is:

polyzymd compare run rmsf

Step 7: Generate the Figures

Now run the plotting smoke test:

polyzymd compare plot-all --list-available
polyzymd compare plot-all

If those commands succeed, your comparison workspace is in good shape.
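
Once the workspace exists, Steps 5 to 7 can be re-run as a single smoke test. The wrapper below is a sketch that chains only the commands already shown in this tutorial. `compare_smoke_test` and the `POLYZYMD` variable are assumptions of this sketch (PolyzyMD itself does not read that variable). It is there so you can point the wrapper at a specific executable or do a dry run.

```shell
# Hypothetical wrapper: run validate, run-all and plot-all in order and stop
# at the first failure. Set POLYZYMD=echo to print the commands instead of
# running them.
compare_smoke_test() (
    workspace="${1:-.}"
    cli="${POLYZYMD:-polyzymd}"
    cd "$workspace" || exit 1
    "$cli" compare validate || exit 1
    "$cli" compare run-all || exit 1
    "$cli" compare plot-all || exit 1
)
```

For example, `POLYZYMD=echo compare_smoke_test polymer_stabilization_study` prints the three subcommands without executing anything.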

What Success Looks Like

At this point you should have:

polymer_stabilization_study/
├── comparison.yaml
├── results/
│   ├── rmsf_comparison_polymer_stabilization_study.json
│   ├── contacts_comparison_polymer_stabilization_study.json
│   ├── distances_comparison_polymer_stabilization_study.json
│   └── triad_comparison_polymer_stabilization_study.json
└── figures/
    ├── rmsf_comparison.png
    ├── rmsf_profile.png
    ├── triad_kde_panel.png
    └── ...

That is the tutorial success state: the comparison JSON exists, the figures exist, and polyzymd compare plot-all completes without error.
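
The file-based part of that success state can be checked with a few shell tests. This is a sketch built on the listing above: `check_success_state` is a hypothetical helper, the four JSON file names are copied from the listing, and the suffix is assumed to match the study name used in this tutorial.

```shell
# Hypothetical helper: check the tutorial's success state inside the workspace.
check_success_state() (
    workspace="${1:-.}"
    # Assumption: result files end in the study name, as in the listing above.
    study="${2:-polymer_stabilization_study}"
    status=0
    for kind in rmsf contacts distances triad; do
        f="$workspace/results/${kind}_comparison_${study}.json"
        # -s: the file exists and is not empty.
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; status=1; }
    done
    # At least one PNG should exist under figures/. If the glob matches
    # nothing, $1 stays the literal pattern and the -f test fails.
    set -- "$workspace"/figures/*.png
    [ -f "$1" ] || { echo "no PNG figures in $workspace/figures" >&2; status=1; }
    exit "$status"
)
```

It does not replace running polyzymd compare plot-all. It only confirms that the files a successful run leaves behind are present.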

What to Do Next

Use How to Compare Simulation Conditions for a shorter operational version of this workflow
Use Comparison and Plotting Reference for CLI, config, and output lookup
Explore the metric-specific guides

Experimental workflows remain available, but they are intentionally outside the main tutorial path for this release.