Tutorial: Analyze a Study from Finished Simulations

This tutorial walks through one complete PolyzyMD analysis workflow:

  • three simulation conditions already exist

  • you create one comparison.yaml

  • you compare the conditions

  • you finish with polyzymd compare plot-all as the smoke test

By the end, you will have a working comparison workspace with JSON results and figures for a small three-condition study.

What You Will Learn

  • How to initialize a comparison workspace with polyzymd compare init

  • How to write a comparison.yaml that defines conditions and analysis plugins

  • How to run cross-condition comparisons and generate figures

  • What the output directory structure looks like after a successful run

Prerequisites

Before starting, make sure you have:

  • Completed production trajectories for the conditions you want to compare (this tutorial uses three conditions, each with replicates 1, 2, and 3), in DCD format and PolyzyMD’s standard directory layout

  • One config.yaml per condition

  • A topology file, such as solvated_system.pdb, produced during the system build

  • PolyzyMD installed in a pixi environment (see Install PolyzyMD with pixi)

If you have not run a single-condition analysis yet, complete Tutorial: Run Your First Analysis first.

Important

This tutorial uses the stable v1.3.0 comparison stack: RMSD, Rg, RMSF, contacts, distances, catalytic triad, secondary structure, SASA, and hydrogen bonds. Experimental workflows are linked at the end, but they are not part of the main tutorial path.

Important

Resource requirements: Workspace setup and validation commands are lightweight. Commands that load trajectories or large cached results, such as polyzymd compare run, polyzymd compare run-all, and polyzymd compare plot-all, can require substantial RAM, CPU/GPU time, and scratch I/O. On shared HPC systems, run them inside an allocated batch job or interactive compute session, not on a login node. If a command is killed or runs out of memory, request more resources or submit the work as SLURM jobs with polyzymd compare submit.
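If your cluster uses SLURM, a small guard can keep trajectory-loading commands off the login node. This is a minimal sketch, not part of PolyzyMD: the function name require_compute_allocation is hypothetical, and it assumes standard SLURM behavior, where SLURM_JOB_ID is set inside sbatch jobs and salloc/srun allocations but not in a plain login shell. Other schedulers use different variables.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PolyzyMD.
# Assumes a SLURM cluster: SLURM_JOB_ID is set inside sbatch jobs and
# salloc/srun interactive allocations, but not in a login-node shell.
require_compute_allocation() {
  if [[ -z "${SLURM_JOB_ID:-}" ]]; then
    echo "No SLURM allocation detected (SLURM_JOB_ID is unset)." >&2
    echo "Start an interactive session or use 'polyzymd compare submit' instead." >&2
    return 1
  fi
  echo "Running inside SLURM job ${SLURM_JOB_ID}."
}
```

A typical use is to prefix a heavy command, for example `require_compute_allocation && pixi run -e build polyzymd compare run-all`, so the command only starts when an allocation is detected.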

The Study We Will Analyze

We will assume a project laid out like this:

my_enzyme_study/
├── noPoly_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
├── SBMA_100_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
└── EGMA_100_enzyme_DMSO/
    ├── config.yaml
    └── scratch/

The scratch/ directories may be symlinks to large trajectory storage on your cluster. PolyzyMD resolves those paths through each condition’s config.yaml.
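Because scratch/ is often a symlink, it can help to confirm that each condition directory has its config.yaml and that scratch/ actually resolves on the machine where you will run the analysis. The sketch below is a hypothetical helper, not part of PolyzyMD; it only inspects the filesystem layout shown above and does not read config.yaml. A dangling symlink is reported as missing.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper, not part of PolyzyMD: confirm each condition
# directory has a config.yaml and report where its scratch/ resolves.
check_study_layout() {
  local status=0 dir target
  for dir in "$@"; do
    if [[ ! -f "$dir/config.yaml" ]]; then
      echo "MISSING: $dir/config.yaml" >&2
      status=1
      continue
    fi
    # -d follows symlinks, so a dangling scratch/ link fails this test.
    if [[ -d "$dir/scratch" ]]; then
      # cd -P follows symlinks; pwd -P prints the physical target directory.
      target=$(cd -P "$dir/scratch" && pwd -P)
      echo "OK: $dir (scratch -> $target)"
    else
      echo "MISSING: $dir/scratch (absent or dangling symlink)" >&2
      status=1
    fi
  done
  return "$status"
}
```

From the study root you could run `check_study_layout noPoly_enzyme_DMSO SBMA_100_enzyme_DMSO EGMA_100_enzyme_DMSO`; a nonzero exit status means at least one condition needs attention.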

Step 1: Create the Comparison Workspace

From the study root, initialize a comparison project and move into it:

cd my_enzyme_study
pixi run -e build polyzymd compare init -n polymer_stabilization_study
cd polymer_stabilization_study

Now edit comparison.yaml to point at the three conditions and define the analysis settings:

name: "polymer_stabilization_study"
description: "Effect of SBMA vs EGMA polymer conjugation on enzyme stability"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  rmsf:
    selection: "protein and name CA"
    reference_mode: "average"

  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"
      - label: "His156-Asp133"
        selection_a: "protein and resid 156 and name ND1"
        selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"

  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"

  contacts:
    polymer_selection: "chainid C"
    protein_selection: "chainid A"
    cutoff: 4.5
    compute_residence_times: true

Step 2: Validate the Comparison Config

pixi run -e build polyzymd compare validate

You should see a passing summary that lists the three conditions and the enabled analyses.

Step 3: Run the Cross-Condition Comparison

For this tutorial, run every enabled analysis with the batch runner:

pixi run -e build polyzymd compare run-all

This runs each enabled analysis through its replicate, aggregate, and cross-condition comparison stages. Successful runs write canonical ReplicateArtifact files (one per replicate) and ConditionArtifact files (one per condition) under analysis/, plus cross-condition comparison outputs at comparison/<analysis>/result.json.
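To spot a condition that is missing replicates before you look at comparisons, you can count the artifacts per condition and analysis. This is a hypothetical helper, not part of PolyzyMD, and it assumes the analysis/<condition>/<analysis>/run_<n>/result.json and .../aggregated/result.json layout shown in What Success Looks Like. It only checks that the files exist; it does not read them.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PolyzyMD. Assumes the layout
#   analysis/<condition>/<analysis>/run_<n>/result.json     (ReplicateArtifact)
#   analysis/<condition>/<analysis>/aggregated/result.json  (ConditionArtifact)
# Usage: summarize_artifacts [analysis_root] [expected_replicates]
summarize_artifacts() {
  local root="${1:-analysis}" expected="${2:-}" status=0
  local cond_dir ana_dir f n agg
  for cond_dir in "$root"/*/; do
    [[ -d "$cond_dir" ]] || continue
    for ana_dir in "$cond_dir"*/; do
      [[ -d "$ana_dir" ]] || continue
      # Count per-replicate result files.
      n=0
      for f in "$ana_dir"run_*/result.json; do
        [[ -f "$f" ]] && n=$((n + 1))
      done
      # Check for the per-condition aggregate.
      if [[ -f "${ana_dir}aggregated/result.json" ]]; then agg=yes; else agg=no; fi
      printf '%s/%s: %d replicate artifact(s), aggregated: %s\n' \
        "$(basename "$cond_dir")" "$(basename "$ana_dir")" "$n" "$agg"
      # Fail on a missing aggregate, or on a replicate count that differs from the expectation.
      if [[ "$agg" == no ]] || [[ -n "$expected" && "$n" -ne "$expected" ]]; then
        status=1
      fi
    done
  done
  return "$status"
}
```

For this study, `summarize_artifacts analysis 3` run from the workspace should print one line per condition and analysis and exit with status 0.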

Tip

On an HPC cluster? For large studies, submit each analysis as a SLURM job DAG instead of running interactively:

pixi run -e build polyzymd compare submit rmsf --partition <part> --mem 8G

This parallelizes across replicates and conditions. See How To: Submit Analysis Jobs to a SLURM Cluster for the complete HPC workflow.
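Because each submit call handles one analysis, a short loop can submit every analysis enabled in comparison.yaml. This is a hypothetical wrapper, not part of PolyzyMD; it uses only the compare submit flags shown in the tip and applies the same --mem value to every analysis, which may be too little for heavier ones.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of PolyzyMD: one 'compare submit' call per analysis,
# using the same --partition/--mem flags as the tip. Stops at the first failed submission.
# Usage: submit_analyses <partition> <mem> <analysis>...
submit_analyses() {
  local partition="$1" mem="$2" analysis
  shift 2
  for analysis in "$@"; do
    echo "Submitting ${analysis}..."
    if ! pixi run -e build polyzymd compare submit "$analysis" --partition "$partition" --mem "$mem"; then
      echo "Submission failed for ${analysis}" >&2
      return 1
    fi
  done
}
```

For this study that would be `submit_analyses <part> 8G rmsf catalytic_triad distances contacts`.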

If you would rather check a single analysis before launching the full batch, run RMSF on its own:

pixi run -e build polyzymd compare run rmsf

Step 4: Generate the Figures

Now list the available plots, then run the plotting smoke test:

pixi run -e build polyzymd compare plot-all --list-available
pixi run -e build polyzymd compare plot-all

If those commands succeed, your comparison workspace is in good shape.
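Steps 2 to 4 can be chained so that trajectory loading never starts on a config that failed validation. The sketch below is a hypothetical wrapper, not part of PolyzyMD; it only calls the validate, run-all, and plot-all commands shown above, in order, and stops at the first failure. Per the resource note at the top, run it inside a compute allocation rather than on a login node.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of PolyzyMD: run validate, run-all, and plot-all
# in order and stop at the first failing step.
run_tutorial_pipeline() {
  local step
  for step in validate run-all plot-all; do
    echo "==> polyzymd compare ${step}"
    if ! pixi run -e build polyzymd compare "$step"; then
      echo "Stopped: 'polyzymd compare ${step}' failed." >&2
      return 1
    fi
  done
  echo "All steps completed."
}
```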

What Success Looks Like

At this point you should have:

polymer_stabilization_study/
├── comparison.yaml
├── analysis/
│   ├── No Polymer/
│   │   ├── rmsf/
│   │   │   ├── run_1/
│   │   │   │   └── result.json        # ReplicateArtifact
│   │   │   ├── run_2/
│   │   │   │   └── result.json        # ReplicateArtifact
│   │   │   ├── run_3/
│   │   │   │   └── result.json        # ReplicateArtifact
│   │   │   └── aggregated/
│   │   │       └── result.json        # ConditionArtifact
│   │   └── contacts/
│   │       └── ...
│   ├── 100% SBMA/
│   │   └── rmsf/
│   │       ├── run_1/
│   │       │   └── result.json        # ReplicateArtifact
│   │       └── aggregated/
│   │           └── result.json        # ConditionArtifact
│   └── 100% EGMA/
│       └── ...
├── comparison/
│   ├── rmsf/
│   │   └── result.json                # cross-condition comparison output
│   ├── contacts/
│   │   └── result.json                # cross-condition comparison output
│   ├── distances/
│   │   └── result.json                # cross-condition comparison output
│   └── catalytic_triad/
│       └── result.json                # cross-condition comparison output
└── figures/
    ├── rmsf/
    │   ├── rmsf_comparison.png
    │   └── rmsf_profile.png
    ├── catalytic_triad/
    │   ├── triad_kde_panel.png
    │   └── triad_threshold_bars.png
    └── ...

That is the tutorial success state:

  • canonical ReplicateArtifact and ConditionArtifact files exist under analysis/

  • each comparison/<analysis>/result.json contains the cross-condition comparison artifact or plugin-specific summary output

  • the figures exist under figures/

  • polyzymd compare plot-all completes without error
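Part of this success state can be checked from the shell. The helper below is hypothetical, not part of PolyzyMD: it confirms that each named analysis has a non-empty comparison/<analysis>/result.json and that at least one PNG exists under figures/. It assumes PNG output as in the tree above and does not inspect the JSON contents, so it complements, rather than replaces, a clean plot-all run.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PolyzyMD. Checks file presence only:
# a non-empty comparison/<analysis>/result.json per analysis, plus >= 1 PNG under figures/.
# Usage: check_comparison_outputs <workspace> <analysis>...
check_comparison_outputs() {
  local workspace="$1" analysis n_png status=0
  shift
  for analysis in "$@"; do
    if [[ -s "$workspace/comparison/$analysis/result.json" ]]; then
      echo "OK: comparison/$analysis/result.json"
    else
      echo "MISSING or empty: comparison/$analysis/result.json" >&2
      status=1
    fi
  done
  # Count PNG figures; a missing figures/ directory counts as zero.
  n_png=$(find "$workspace/figures" -type f -name '*.png' 2>/dev/null | wc -l | tr -d ' ')
  echo "PNG figures under figures/: ${n_png}"
  [[ "$n_png" -gt 0 ]] || status=1
  return "$status"
}
```

From inside polymer_stabilization_study/, `check_comparison_outputs . rmsf contacts distances catalytic_triad` should exit with status 0.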

What to Do Next