# Tutorial: Analyze a Study from Finished Simulations

This tutorial walks through one complete PolyzyMD analysis story:

- three simulation conditions already exist
- you create one `analysis.yaml`
- you run per-condition analyses
- you compare the conditions
- you finish with `polyzymd compare plot-all` as the smoke test

By the end, you will have a working comparison workspace with JSON results and figures for a small three-condition study.

```{important}
This tutorial uses the stable `v1.2.0` comparison stack: RMSF, contacts, distances, and catalytic triad. Experimental workflows are linked at the end, but they are not part of the main tutorial path.
```

## Before You Start

You need:

- completed **OpenMM** production trajectories for each condition (DCD format in PolyzyMD's standard directory layout; GROMACS XTC support is planned for [v1.2.1](https://github.com/joelaforet/polyzymd/issues/47))
- one `config.yaml` per condition
- a topology such as `solvated_system.pdb` already produced during the build

## The Study We Will Analyze

We will assume a project laid out like this:

```text
my_enzyme_study/
├── noPoly_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
├── SBMA_100_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
└── EGMA_100_enzyme_DMSO/
    ├── config.yaml
    └── scratch/
```

The `scratch/` directories may be symlinks to large trajectory storage on your cluster. PolyzyMD resolves those paths through each condition's `config.yaml`.
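Before writing any `analysis.yaml`, you may want to confirm that each condition directory matches the layout above. The snippet below is a minimal sketch, not a PolyzyMD command. `check_study_layout` is a hypothetical shell function that only checks that `config.yaml` exists and that `scratch/` resolves to a directory. Because the `-d` test follows symlinks, a symlink whose storage target is missing is reported. The function does not inspect the trajectories or the contents of `config.yaml`.

```bash
# Hypothetical preflight helper (not part of PolyzyMD).
# For each condition, check that config.yaml exists and that scratch/
# resolves to a directory. `-d` follows symlinks, so a symlinked scratch/
# passes only if its target is an existing directory.
check_study_layout() {
  local root="$1"
  shift
  local missing=0
  local cond
  for cond in "$@"; do
    if [ ! -f "$root/$cond/config.yaml" ]; then
      echo "MISSING: $cond/config.yaml"
      missing=1
    fi
    if [ ! -d "$root/$cond/scratch" ]; then
      echo "MISSING or broken: $cond/scratch/"
      missing=1
    fi
  done
  # Non-zero exit status if anything was missing.
  return "$missing"
}

# Usage, from the directory that contains my_enzyme_study/:
# check_study_layout my_enzyme_study \
#   noPoly_enzyme_DMSO SBMA_100_enzyme_DMSO EGMA_100_enzyme_DMSO
```

The function prints one line per problem and returns a non-zero status, so it can also gate a batch script.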
## Step 1: Create `analysis.yaml` for the First Condition

Start in the control condition:

```bash
cd my_enzyme_study/noPoly_enzyme_DMSO
polyzymd analyze init
```

Edit the generated `analysis.yaml` so it enables a small set of stable analyses:

```yaml
replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

rmsf:
  enabled: true
  selection: "protein and name CA"
  reference_mode: "average"

catalytic_triad:
  enabled: true
  name: "Ser-His-Asp"
  threshold: 3.5
  pairs:
    - label: "Ser77-His156"
      selection_a: "protein and resid 77 and name OG"
      selection_b: "protein and resid 156 and name NE2"
    - label: "His156-Asp133"
      selection_a: "protein and resid 156 and name ND1"
      selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"

distances:
  enabled: true
  pairs:
    - label: "Substrate-Ser77"
      selection_a: "resname SUB and name C1"
      selection_b: "protein and resid 77 and name OG"

contacts:
  enabled: true
  polymer_selection: "chainID C"
  protein_selection: "protein"
  cutoff: 4.5
  compute_residence_times: true
```

## Step 2: Reuse the Same `analysis.yaml` for the Other Conditions

Move up to the study root and copy the file into the other two condition directories:

```bash
cd ../
cp noPoly_enzyme_DMSO/analysis.yaml SBMA_100_enzyme_DMSO/
cp noPoly_enzyme_DMSO/analysis.yaml EGMA_100_enzyme_DMSO/
```

This works because the condition-specific trajectory paths come from each condition's own `config.yaml`.

## Step 3: Run Per-Condition Analyses

Starting from the study root, run the same command in each condition directory:

```bash
cd noPoly_enzyme_DMSO
polyzymd analyze run

cd ../SBMA_100_enzyme_DMSO
polyzymd analyze run

cd ../EGMA_100_enzyme_DMSO
polyzymd analyze run
```

After each run, expect an `analysis/` directory with subdirectories such as:

```text
analysis/
├── rmsf/
├── catalytic_triad/
├── distances/
└── contacts/
```

For the no-polymer control, contacts may be skipped automatically because there are no polymer atoms to analyze.

## Step 4: Create the Comparison Workspace

Go back to the study root and initialize a comparison project:

```bash
cd ..
polyzymd compare init -n polymer_stabilization_study
cd polymer_stabilization_study
```

Now edit `comparison.yaml` to point at the three conditions:

```yaml
name: "polymer_stabilization_study"
description: "Effect of SBMA vs EGMA polymer conjugation on enzyme stability"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

analysis_settings:
  rmsf:
    selection: "protein and name CA"
    reference_mode: "average"
  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"
      - label: "His156-Asp133"
        selection_a: "protein and resid 156 and name ND1"
        selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"
  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"
  contacts:
    polymer_selection: "chainID C"
    protein_selection: "protein"
    cutoff: 4.5
    compute_residence_times: true

comparison_settings:
  rmsf: {}
  catalytic_triad: {}
  distances: {}
  contacts: {}
```

## Step 5: Validate the Comparison Config

```bash
polyzymd compare validate
```

You should see a passing summary that lists the three conditions and the enabled analyses.

## Step 6: Run the Cross-Condition Comparison

For the tutorial, use the batch runner:

```bash
polyzymd compare run-all
```

This runs every enabled comparison and writes JSON results into `results/`.
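Once `run-all` finishes, you can confirm that every file it wrote to `results/` is well-formed JSON before plotting. This is a minimal sketch: `check_results_json` is a hypothetical helper, not part of PolyzyMD. It assumes `python3` is on your `PATH` and uses the standard-library `json.tool` module, so it checks JSON syntax only, not the structure or values of the PolyzyMD results.

```bash
# Hypothetical helper (not a PolyzyMD command): parse every *.json file in a
# results directory with Python's standard-library json.tool and report any
# file that fails to parse. Checks JSON syntax only.
check_results_json() {
  local results_dir="${1:-results}"
  local status=0
  local found=0
  local f
  for f in "$results_dir"/*.json; do
    # If the glob matched nothing, "$f" is the literal pattern; skip it.
    [ -e "$f" ] || continue
    found=1
    if ! python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "INVALID JSON: $f"
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "NO JSON FILES in $results_dir/"
    status=1
  fi
  return "$status"
}

# Usage, from inside polymer_stabilization_study/:
# check_results_json results
```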
If you would rather inspect a single comparison before running them all, a good sanity check is:

```bash
polyzymd compare run rmsf
```

## Step 7: Generate the Figures

Now run the plotting smoke test:

```bash
polyzymd compare plot-all --list-available
polyzymd compare plot-all
```

If both commands succeed, your comparison workspace is in good shape.

## What Success Looks Like

At this point you should have:

```text
polymer_stabilization_study/
├── comparison.yaml
├── results/
│   ├── rmsf_comparison_polymer_stabilization_study.json
│   ├── contacts_comparison_polymer_stabilization_study.json
│   ├── distances_comparison_polymer_stabilization_study.json
│   └── triad_comparison_polymer_stabilization_study.json
└── figures/
    ├── rmsf_comparison.png
    ├── rmsf_profile.png
    ├── triad_kde_panel.png
    └── ...
```

That is the tutorial success state: the comparison JSON files exist, the figures exist, and `polyzymd compare plot-all` completes without error.

## What to Do Next

- Use [How to Compare Simulation Conditions](analysis_compare_conditions.md) for a shorter operational version of this workflow
- Use [Comparison and Plotting Reference](../reference/analysis_comparison_reference.md) for CLI, config, and output lookup
- Explore metric-specific guides:
  - [Run RMSF Analysis](analysis_rmsf_quickstart.md)
  - [Run Contacts Analysis](analysis_contacts_quickstart.md)
  - [Run Distance Analysis](analysis_distances_quickstart.md)
  - [Run Catalytic Triad Analysis](analysis_triad_quickstart.md)

Experimental workflows remain available, but they are intentionally outside the main tutorial path for this release.
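If you rerun this study often, you can also check the file layout from the What Success Looks Like section with a script. The sketch below is hypothetical: `check_success_state` is not a PolyzyMD command. It assumes the `<analysis>_comparison_<name>.json` file names shown in that tree, including the `triad_` prefix for the catalytic triad file, and it only checks that the files exist. You still need to confirm that `polyzymd compare plot-all` completes without error.

```bash
# Hypothetical helper (not a PolyzyMD command): check that a comparison
# workspace contains the files from the tutorial success state.
# Assumes the results naming shown in the "What Success Looks Like" tree.
check_success_state() {
  local workspace="$1"
  local name="$2"
  local status=0
  local prefix json png

  if [ ! -f "$workspace/comparison.yaml" ]; then
    echo "MISSING: comparison.yaml"
    status=1
  fi

  # One results file per stable analysis; the triad file uses "triad_".
  for prefix in rmsf contacts distances triad; do
    json="results/${prefix}_comparison_${name}.json"
    if [ ! -f "$workspace/$json" ]; then
      echo "MISSING: $json"
      status=1
    fi
  done

  # Require at least one PNG directly inside figures/.
  png=$(find "$workspace/figures" -maxdepth 1 -name '*.png' 2>/dev/null | head -n 1)
  if [ -z "$png" ]; then
    echo "MISSING: figures/*.png"
    status=1
  fi

  return "$status"
}

# Usage, from the study root:
# check_success_state polymer_stabilization_study polymer_stabilization_study
```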