Tutorial: Analyze a Study from Finished Simulations
This tutorial walks through one complete PolyzyMD analysis story:
three simulation conditions already exist
you create one analysis.yaml
you run per-condition analyses
you compare the conditions
you finish with polyzymd compare plot-all as the smoke test
By the end, you will have a working comparison workspace with JSON results and figures for a small three-condition study.
Important
This tutorial uses the stable v1.2.0 comparison stack: RMSF, contacts,
distances, and catalytic triad. Experimental workflows are linked at the end,
but they are not part of the main tutorial path.
Before You Start
You need:
completed OpenMM production trajectories for each condition (DCD format in PolyzyMD’s standard directory layout; GROMACS XTC support is planned for v1.2.1)
one config.yaml per condition
a topology such as solvated_system.pdb already produced during the build
The Study We Will Analyze
We will assume a project laid out like this:
my_enzyme_study/
├── noPoly_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
├── SBMA_100_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
└── EGMA_100_enzyme_DMSO/
    ├── config.yaml
    └── scratch/
The scratch/ directories may be symlinks to large trajectory storage on your
cluster. PolyzyMD resolves those paths through each condition’s config.yaml.
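Before creating any analysis files, it can help to confirm that the layout above is actually in place. The following is a minimal sketch, not part of PolyzyMD: the helper name missing_inputs is hypothetical, and it only checks that each condition has a config.yaml and a scratch/ directory.

```python
from pathlib import Path

# Condition directory names taken from the project layout above.
CONDITIONS = ["noPoly_enzyme_DMSO", "SBMA_100_enzyme_DMSO", "EGMA_100_enzyme_DMSO"]


def missing_inputs(study_root, conditions=CONDITIONS):
    """Return the required paths that do not exist under study_root.

    Hypothetical helper for this tutorial; it is not a PolyzyMD function.
    """
    root = Path(study_root)
    missing = []
    for name in conditions:
        config = root / name / "config.yaml"
        scratch = root / name / "scratch"
        if not config.is_file():
            missing.append(config)
        # is_dir() follows symlinks, so a symlinked scratch/ passes
        # as long as its target directory exists.
        if not scratch.is_dir():
            missing.append(scratch)
    return missing
```

Calling missing_inputs("my_enzyme_study") should return an empty list when all three conditions are ready. A broken scratch/ symlink shows up as a missing path.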
Step 1: Create analysis.yaml for the First Condition
Start in the control condition:
cd my_enzyme_study/noPoly_enzyme_DMSO
polyzymd analyze init
Edit the generated analysis.yaml so it enables a small stable analysis set:
replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
rmsf:
  enabled: true
  selection: "protein and name CA"
  reference_mode: "average"
catalytic_triad:
  enabled: true
  name: "Ser-His-Asp"
  threshold: 3.5
  pairs:
    - label: "Ser77-His156"
      selection_a: "protein and resid 77 and name OG"
      selection_b: "protein and resid 156 and name NE2"
    - label: "His156-Asp133"
      selection_a: "protein and resid 156 and name ND1"
      selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"
distances:
  enabled: true
  pairs:
    - label: "Substrate-Ser77"
      selection_a: "resname SUB and name C1"
      selection_b: "protein and resid 77 and name OG"
contacts:
  enabled: true
  polymer_selection: "chainID C"
  protein_selection: "protein"
  cutoff: 4.5
  compute_residence_times: true
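To make the His156-Asp133 pair more concrete, here is a small sketch of what a midpoint selection combined with a distance threshold could compute for a single frame. It rests on several assumptions: that midpoint(...) means the geometric mean position of the selected atoms, that the threshold is in Å, and that a pair counts as intact when its distance is at or below the threshold. The coordinates are invented for illustration, and none of this code is PolyzyMD's implementation.

```python
import math


def midpoint(points):
    """Mean position of a set of 3D points (assumed meaning of midpoint(...))."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def within_threshold(a, b, threshold=3.5):
    """True if the distance between a and b is at or below the threshold.

    Whether PolyzyMD uses <= or < at the boundary is an assumption here.
    """
    return math.dist(a, b) <= threshold


# Hypothetical single-frame coordinates, in angstroms.
his_nd1 = (0.0, 0.0, 0.0)
asp_od1 = (3.0, 1.0, 0.0)
asp_od2 = (3.0, -1.0, 0.0)

asp_mid = midpoint([asp_od1, asp_od2])      # (3.0, 0.0, 0.0)
intact = within_threshold(his_nd1, asp_mid)  # 3.0 <= 3.5, so True
```

Using the midpoint of OD1 and OD2 avoids having to choose which carboxylate oxygen faces the histidine in a given frame.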
Step 2: Reuse the Same analysis.yaml for the Other Conditions
From the study root:
cd ../
cp noPoly_enzyme_DMSO/analysis.yaml SBMA_100_enzyme_DMSO/
cp noPoly_enzyme_DMSO/analysis.yaml EGMA_100_enzyme_DMSO/
This works because the condition-specific trajectory paths come from each
condition’s own config.yaml.
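Because the three copies are meant to stay identical, a quick drift check can be useful once you start tweaking settings. The sketch below is an illustrative helper, not a PolyzyMD feature, and the name drifted_configs is hypothetical. It compares each copy byte for byte against the control's file.

```python
import filecmp
from pathlib import Path


def drifted_configs(study_root, reference, others, filename="analysis.yaml"):
    """Return the condition names whose file is missing or differs from the reference.

    Hypothetical helper; raises an OSError if the reference file itself is missing.
    """
    root = Path(study_root)
    ref = root / reference / filename
    drifted = []
    for name in others:
        candidate = root / name / filename
        # shallow=False forces a content comparison instead of trusting
        # file size and modification time alone.
        if not candidate.is_file() or not filecmp.cmp(ref, candidate, shallow=False):
            drifted.append(name)
    return drifted
```

For this study, drifted_configs("my_enzyme_study", "noPoly_enzyme_DMSO", ["SBMA_100_enzyme_DMSO", "EGMA_100_enzyme_DMSO"]) should return an empty list right after the cp commands above.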
Step 3: Run Per-Condition Analyses
Run the same command in each condition directory:
cd noPoly_enzyme_DMSO
polyzymd analyze run
cd ../SBMA_100_enzyme_DMSO
polyzymd analyze run
cd ../EGMA_100_enzyme_DMSO
polyzymd analyze run
After each run, expect an analysis/ directory with subdirectories such as:
analysis/
├── rmsf/
├── catalytic_triad/
├── distances/
└── contacts/
For the no-polymer control, contacts may be skipped automatically because there are no polymer atoms to analyze.
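If you would rather script this step than change directories by hand, the commands above can be wrapped as follows. This is a sketch under stated assumptions: it only invokes the polyzymd analyze run command shown in this tutorial, the helper names are hypothetical, and the has_polymer flag is an illustrative way to tolerate the skipped contacts output for the control.

```python
import subprocess
from pathlib import Path

# Output subdirectories listed in this tutorial.
EXPECTED = ["rmsf", "catalytic_triad", "distances", "contacts"]


def run_analyses(study_root, conditions, runner=subprocess.run):
    """Run `polyzymd analyze run` inside each condition directory, in order."""
    for name in conditions:
        # cwd= replaces the explicit `cd` steps; check=True makes
        # subprocess.run raise on the first non-zero exit code.
        runner(["polyzymd", "analyze", "run"], cwd=Path(study_root) / name, check=True)


def missing_outputs(condition_dir, has_polymer=True):
    """Return the expected analysis/ subdirectories that are absent."""
    expected = EXPECTED if has_polymer else [d for d in EXPECTED if d != "contacts"]
    analysis = Path(condition_dir) / "analysis"
    return [d for d in expected if not (analysis / d).is_dir()]
```

The runner parameter exists only so the loop can be exercised with a stand-in function. In normal use you would leave it at its default and call missing_outputs on each condition afterwards, passing has_polymer=False for the no-polymer control.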
Step 4: Create the Comparison Workspace
Go back to the study root and initialize a comparison project:
cd ..
polyzymd compare init -n polymer_stabilization_study
cd polymer_stabilization_study
Now edit comparison.yaml to point at the three conditions:
name: "polymer_stabilization_study"
description: "Effect of SBMA vs EGMA polymer conjugation on enzyme stability"
control: "No Polymer"
conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
analysis_settings:
  rmsf:
    selection: "protein and name CA"
    reference_mode: "average"
  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"
      - label: "His156-Asp133"
        selection_a: "protein and resid 156 and name ND1"
        selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"
  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"
  contacts:
    polymer_selection: "chainID C"
    protein_selection: "protein"
    cutoff: 4.5
    compute_residence_times: true
comparison_settings:
  rmsf: {}
  catalytic_triad: {}
  distances: {}
  contacts: {}
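A mistyped relative path is an easy mistake in this file. The sketch below scans comparison.yaml for config: entries and reports any that do not point at an existing file. It is deliberately not a YAML parser: it uses a line-based regular expression and assumes that relative paths are resolved against the directory containing comparison.yaml, as the ../ paths above suggest. The helper name is hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as:    config: "../noPoly_enzyme_DMSO/config.yaml"
CONFIG_LINE = re.compile(
    r"""^[ \t]*config:[ \t]*["']?([^"'#\n]+?)["']?[ \t]*$""",
    re.MULTILINE,
)


def unresolved_configs(comparison_yaml):
    """Return the config: entries that do not resolve to an existing file.

    Assumption: relative paths are relative to the comparison.yaml directory.
    """
    path = Path(comparison_yaml)
    entries = CONFIG_LINE.findall(path.read_text())
    return [entry for entry in entries if not (path.parent / entry).is_file()]
```

An empty list from unresolved_configs("comparison.yaml") means every listed condition config was found. The validation command in the next step remains the authoritative check.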
Step 5: Validate the Comparison Config
polyzymd compare validate
You should see a passing summary that lists the three conditions and the enabled analyses.
Step 6: Run the Cross-Condition Comparison
For the tutorial, use the batch runner:
polyzymd compare run-all
This runs every enabled comparison and writes JSON results into results/.
If you prefer to inspect one comparison first, a good sanity check is:
polyzymd compare run rmsf
Step 7: Generate the Figures
Now run the plotting smoke test:
polyzymd compare plot-all --list-available
polyzymd compare plot-all
If those commands succeed, your comparison workspace is in good shape.
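Steps 5 through 7 can also be chained so that a failure in validation stops the run before any comparison or plotting starts. The following sketch only uses the polyzymd compare subcommands shown in this tutorial. The function name and the runner parameter are illustrative assumptions.

```python
import subprocess

# Subcommands in the order used by Steps 5 to 7 of this tutorial.
STEPS = [
    ["validate"],
    ["run-all"],
    ["plot-all", "--list-available"],
    ["plot-all"],
]


def run_comparison_pipeline(workspace, runner=subprocess.run):
    """Run validate, run-all, and the plotting smoke test inside the workspace."""
    for args in STEPS:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so later steps are skipped after a failure.
        runner(["polyzymd", "compare", *args], cwd=workspace, check=True)
```

Called as run_comparison_pipeline("polymer_stabilization_study") from the study root, this mirrors typing the four commands in order.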
What Success Looks Like
At this point you should have:
polymer_stabilization_study/
├── comparison.yaml
├── results/
│   ├── rmsf_comparison_polymer_stabilization_study.json
│   ├── contacts_comparison_polymer_stabilization_study.json
│   ├── distances_comparison_polymer_stabilization_study.json
│   └── triad_comparison_polymer_stabilization_study.json
└── figures/
    ├── rmsf_comparison.png
    ├── rmsf_profile.png
    ├── triad_kde_panel.png
    └── ...
That is the tutorial success state: the comparison JSON exists, the figures
exist, and polyzymd compare plot-all completes without error.
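The success state can be checked mechanically. The sketch below assumes the result file names follow the pattern shown in the tree above and that at least one PNG in figures/ is enough to count as "figures exist". The helper name check_workspace is hypothetical, and the check does not inspect what the JSON files contain beyond confirming that they parse.

```python
import json
from pathlib import Path

# File name prefixes taken from the results/ listing above.
RESULT_PREFIXES = ["rmsf", "contacts", "distances", "triad"]


def check_workspace(workspace, study_name):
    """Return a list of human-readable problems; an empty list means success."""
    ws = Path(workspace)
    problems = []
    for prefix in RESULT_PREFIXES:
        result = ws / "results" / f"{prefix}_comparison_{study_name}.json"
        if not result.is_file():
            problems.append(f"missing {result.name}")
            continue
        try:
            json.loads(result.read_text())
        except json.JSONDecodeError:
            problems.append(f"invalid JSON in {result.name}")
    # glob() on a missing directory yields nothing, so this also covers
    # the case where figures/ was never created.
    if not any((ws / "figures").glob("*.png")):
        problems.append("no PNG figures in figures/")
    return problems
```

For this tutorial, check_workspace("polymer_stabilization_study", "polymer_stabilization_study") returning an empty list corresponds to the success state described above.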
What to Do Next
Use How to Compare Simulation Conditions for a shorter operational version of this workflow
Use Comparison and Plotting Reference for CLI, config, and output lookup
Explore metric-specific guides:
Experimental workflows remain available, but they are intentionally outside the main tutorial path for this release.