# Tutorial: Analyze a Study from Finished Simulations

This tutorial walks through one complete PolyzyMD analysis workflow:

- three simulation conditions already exist
- you create one `comparison.yaml`
- you compare the conditions
- you finish with `polyzymd compare plot-all` as the smoke test

By the end, you will have a working comparison workspace with JSON results and figures for a small three-condition study.

## What You Will Learn

- How to initialize a comparison workspace with `polyzymd compare init`
- How to write a `comparison.yaml` that defines conditions and analysis plugins
- How to run cross-condition comparisons and generate figures
- What the output directory structure looks like after a successful run

## Prerequisites

Before starting, make sure you have:

- Completed production trajectories for at least three conditions (DCD format in PolyzyMD's standard directory layout)
- One `config.yaml` per condition
- A topology such as `solvated_system.pdb` already produced during the build
- PolyzyMD installed in a pixi environment (see {doc}`../get_started/installation`)

If you have not run a single-condition analysis yet, complete {doc}`first_analysis` first.

```{important}
This tutorial targets the stable `v1.3.0` comparison stack, which covers RMSD, Rg, RMSF, contacts, distances, catalytic triad, secondary structure, SASA, and hydrogen bonds. The walkthrough below enables four of them: RMSF, catalytic triad, distances, and contacts. Experimental workflows are linked at the end, but they are not part of the main tutorial path.
```

```{important}
**Resource requirements:** Workspace setup and validation commands are lightweight. Commands that load trajectories, such as `polyzymd compare run`, `run-all`, and plotting over large cached results, can require substantial RAM, CPU/GPU time, and scratch I/O. On shared HPC systems, run them inside an allocated job or interactive compute session, not on a login node. If a command is killed or runs out of memory, request more resources or use `polyzymd compare submit`.
```

## The Study We Will Analyze

We will assume a project laid out like this:

```text
my_enzyme_study/
├── noPoly_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
├── SBMA_100_enzyme_DMSO/
│   ├── config.yaml
│   └── scratch/
└── EGMA_100_enzyme_DMSO/
    ├── config.yaml
    └── scratch/
```

The `scratch/` directories may be symlinks to large trajectory storage on your cluster. PolyzyMD resolves those paths through each condition's `config.yaml`.

## Step 1: Create the Comparison Workspace

From the study root, initialize a comparison project and move into it:

```bash
cd my_enzyme_study
pixi run -e build polyzymd compare init -n polymer_stabilization_study
cd polymer_stabilization_study
```

Now edit `comparison.yaml` to point at the three conditions and define the analysis settings:

```yaml
name: "polymer_stabilization_study"
description: "Effect of SBMA vs EGMA polymer conjugation on enzyme stability"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../SBMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% EGMA"
    config: "../EGMA_100_enzyme_DMSO/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  rmsf:
    selection: "protein and name CA"
    reference_mode: "average"
  catalytic_triad:
    name: "Ser-His-Asp"
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"
      - label: "His156-Asp133"
        selection_a: "protein and resid 156 and name ND1"
        selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"
  distances:
    pairs:
      - label: "Substrate-Ser77"
        selection_a: "resname SUB and name C1"
        selection_b: "protein and resid 77 and name OG"
  contacts:
    polymer_selection: "chainid C"
    protein_selection: "chainid A"
    cutoff: 4.5
    compute_residence_times: true
```

## Step 2: Validate the Comparison Config

```bash
pixi run -e build polyzymd compare validate
```

You should see a
passing summary that lists the three conditions and the enabled analyses.

## Step 3: Run the Cross-Condition Comparison

For the tutorial, use the batch runner:

```bash
pixi run -e build polyzymd compare run-all
```

This runs each enabled analysis through its replicate, aggregate, and cross-condition comparison stages. Successful runs write canonical per-replicate `ReplicateArtifact` files and per-condition `ConditionArtifact` files under `analysis/`, plus cross-condition comparison outputs under `comparison//result.json`.

```{tip}
**On an HPC cluster?** For large studies, submit each analysis as a SLURM job DAG instead of running interactively:

    pixi run -e build polyzymd compare submit sasa --partition --mem 8G

This parallelizes across replicates and conditions. See {doc}`../how_to/hpc_execution` for the complete HPC workflow.
```

If you would rather test a single analysis before running the full batch, a good sanity check is:

```bash
pixi run -e build polyzymd compare run rmsf
```

## Step 4: Generate the Figures

Now run the plotting smoke test:

```bash
pixi run -e build polyzymd compare plot-all --list-available
pixi run -e build polyzymd compare plot-all
```

If both commands succeed, your comparison workspace is in good shape.

## What Success Looks Like

At this point you should have:

```text
polymer_stabilization_study/
├── comparison.yaml
├── analysis/
│   ├── No Polymer/
│   │   ├── rmsf/
│   │   │   ├── run_1/
│   │   │   │   └── result.json    # ReplicateArtifact
│   │   │   ├── run_2/
│   │   │   │   └── result.json    # ReplicateArtifact
│   │   │   ├── run_3/
│   │   │   │   └── result.json    # ReplicateArtifact
│   │   │   └── aggregated/
│   │   │       └── result.json    # ConditionArtifact
│   │   └── contacts/
│   │       └── ...
│   ├── 100% SBMA/
│   │   └── rmsf/
│   │       ├── run_1/
│   │       │   └── result.json    # ReplicateArtifact
│   │       └── aggregated/
│   │           └── result.json    # ConditionArtifact
│   └── 100% EGMA/
│       └── ...
├── comparison/
│   ├── rmsf/
│   │   └── result.json    # cross-condition comparison output
│   ├── contacts/
│   │   └── result.json    # cross-condition comparison output
│   ├── distances/
│   │   └── result.json    # cross-condition comparison output
│   └── catalytic_triad/
│       └── result.json    # cross-condition comparison output
└── figures/
    ├── rmsf/
    │   ├── rmsf_comparison.png
    │   └── rmsf_profile.png
    ├── catalytic_triad/
    │   ├── triad_kde_panel.png
    │   └── triad_threshold_bars.png
    └── ...
```

That is the tutorial success state:

- canonical `ReplicateArtifact` and `ConditionArtifact` files exist under `analysis/`
- each `comparison//result.json` contains the cross-condition comparison artifact or plugin-specific summary output
- the figures exist
- `polyzymd compare plot-all` completes without error

## What to Do Next

- Use [How to Compare Simulation Conditions](../how_to/analysis_compare_conditions.md) for a shorter operational version of this workflow
- Use [Comparison and Plotting Reference](../reference/analysis_comparison_reference.md) for CLI, config, and output lookup
- Explore metric-specific guides:
  - [Run RMSF Analysis](../how_to/analysis_rmsf_quickstart.md)
  - [Run Contacts Analysis](../how_to/analysis_contacts_quickstart.md)
  - [Run Distance Analysis](../how_to/analysis_distances_quickstart.md)
  - [Run Catalytic Triad Analysis](../how_to/analysis_triad_quickstart.md)
- For removed experimental analyses, see [Experimental analyses](../reference/experimental_analyses_archive.md); they are not active v1.3 workflows.
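If you want to repeat the success check from **What Success Looks Like** without eyeballing the tree, for example as the last step of a batch job, here is a minimal sketch. It is a hypothetical helper, not part of PolyzyMD, and it makes several assumptions. It assumes the output layout shown in that section. It hardcodes the condition labels, replicates, and analyses from this tutorial's `comparison.yaml`. It also assumes every condition mirrors the `rmsf` per-replicate layout shown for `No Polymer`. It only checks that the expected `result.json` files exist and parse as JSON, and that `figures/` holds at least one PNG. It does not inspect artifact contents or call any PolyzyMD API.

```python
"""Hypothetical helper: sanity-check a finished comparison workspace.

Assumes the layout from "What Success Looks Like". It only reads files on disk
and does not call any PolyzyMD API.
"""
import json
import sys
from pathlib import Path

# Copied from this tutorial's comparison.yaml; edit these for your own study.
CONDITIONS = ["No Polymer", "100% SBMA", "100% EGMA"]
REPLICATES = [1, 2, 3]
ANALYSES = ["rmsf", "contacts", "distances", "catalytic_triad"]


def expected_files(root, conditions, replicates, analyses, per_condition="rmsf"):
    """Return the result.json paths implied by the tutorial's success tree."""
    root = Path(root)
    # One cross-condition output per enabled analysis.
    paths = [root / "comparison" / name / "result.json" for name in analyses]
    # Per-replicate and aggregated artifacts, checked for one analysis only
    # (assumption: every condition mirrors the rmsf layout shown for "No Polymer").
    for label in conditions:
        base = root / "analysis" / label / per_condition
        paths += [base / f"run_{rep}" / "result.json" for rep in replicates]
        paths.append(base / "aggregated" / "result.json")
    return paths


def find_problems(root, conditions=CONDITIONS, replicates=REPLICATES, analyses=ANALYSES):
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    for path in expected_files(root, conditions, replicates, analyses):
        if not path.is_file():
            problems.append(f"missing: {path}")
            continue
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON: {path} ({exc})")
    figures = Path(root) / "figures"
    if not (figures.is_dir() and any(figures.rglob("*.png"))):
        problems.append("no PNG figures under figures/")
    return problems


def main(workspace):
    """Print any problems and return a shell exit code (0 means OK)."""
    problems = find_problems(workspace)
    for problem in problems:
        print(problem)
    print("workspace OK" if not problems else f"{len(problems)} problem(s) found")
    return 1 if problems else 0


# Runs only when a workspace path is passed, e.g. `python check_workspace.py .`
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Saved under a name of your choice (say, `check_workspace.py`), the script could be run from inside the workspace with `pixi run -e build python check_workspace.py .`, assuming the `build` environment provides Python. It exits with status 1 when anything is missing, so a job script fails loudly instead of silently producing an incomplete workspace. The workspace path is required on purpose. Running the file without it does nothing.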