Data Requirements & Directory Layout
This page documents the directory structures, file formats, and naming conventions that PolyzyMD uses for simulations and analysis. Use it as a lookup reference when setting up new projects or troubleshooting missing-file errors.
The Two-Project Pattern
PolyzyMD separates simulation execution from cross-condition analysis into two distinct project types, each with its own directory scaffold and configuration file:
| Project Type | Created By | Config File | Purpose |
|---|---|---|---|
| Simulation project | polyzymd init | config.yaml | Build, run, and store one simulation condition |
| Comparison project | polyzymd compare init | comparison.yaml | Analyze and compare results across conditions |
A comparison project does not contain trajectory data. Instead, its
comparison.yaml points to one or more simulation project config.yaml files,
which in turn resolve to the trajectory directories on disk.
Simulation Project Layout
Running polyzymd init -n my_simulation creates:
my_simulation/
├── config.yaml # Simulation configuration (edit this)
├── structures/ # Input PDB/SDF files
├── job_scripts/ # Generated SLURM submission scripts
└── slurm_logs/ # SLURM stdout/stderr logs
After building and running a simulation, the output directory (on scratch or in the projects directory) grows to:
{scratch_dir}/{naming_template}/ # One directory per replicate
├── solvated_system.pdb # Topology (created by polyzymd build)
├── equilibration_heating/ # Equilibration stage output
│ └── ...
├── production_0/ # First production segment
│ ├── production_0_trajectory.dcd # Trajectory
│ └── production_0_topology.pdb # Topology snapshot
├── production_1/ # Daisy-chain continuation segment
│ ├── production_1_trajectory.dcd
│ └── production_1_topology.pdb
└── ... # Additional segments if daisy-chained
Each replicate gets its own complete directory containing a topology file and one or more trajectory segments.
Directory Naming Template
The naming_template field in the output section of config.yaml controls
how per-replicate directories are named.
Default template:
{enzyme}_{substrate}_{polymer_type}_{duration}ns_{temperature}K_run{replicate}
Available placeholders:
| Placeholder | Source | Example Value |
|---|---|---|
| | | |
| | | |
| | Derived from polymer config, or | |
| | | |
| | Replicate number (1-indexed) | |
| | | |
Example resolved name:
LipA_ResorufinButyrate_SBMA-EGPMA_A70_B30_100ns_300K_run1
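The substitution can be illustrated with Python's built-in `str.format`. This is only a sketch of the naming scheme, not PolyzyMD's own code: the helper `replicate_dir_name` and the `fields` dictionary are hypothetical, and the values are read off the example name above.

```python
# Default template as documented for output.naming_template.
TEMPLATE = "{enzyme}_{substrate}_{polymer_type}_{duration}ns_{temperature}K_run{replicate}"

# Illustrative values only; in a real project they come from config.yaml.
fields = {
    "enzyme": "LipA",
    "substrate": "ResorufinButyrate",
    "polymer_type": "SBMA-EGPMA_A70_B30",
    "duration": 100,
    "temperature": 300,
}


def replicate_dir_name(template: str, fields: dict, replicate: int) -> str:
    """Fill the template for one replicate (replicates are 1-indexed)."""
    return template.format(replicate=replicate, **fields)


# One directory name per replicate.
names = [replicate_dir_name(TEMPLATE, fields, r) for r in (1, 2, 3)]
print(names[0])
```

Only the `run{replicate}` suffix changes between replicates of one condition, so the three names above differ only in their final character.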
Scratch vs Projects Directories
PolyzyMD supports separating lightweight project files (scripts, logs) from large simulation output (trajectories, checkpoints). This is common on HPC systems where long-term storage and high-performance scratch are different filesystems.
| Field | Purpose | Example |
|---|---|---|
| projects_directory | Scripts, configs, SLURM logs | |
| scratch_directory | Trajectories, checkpoints, state data | |
If scratch_directory is null or omitted, all output goes to
projects_directory.
Example config.yaml snippet:
output:
  projects_directory: "/projects/$USER/polyzymd"
  scratch_directory: "/scratch/alpine/$USER/simulations"
  naming_template: "{enzyme}_{substrate}_{polymer_type}_{duration}ns_{temperature}K_run{replicate}"
Environment variables ($USER, $HOME, ${VAR}) and ~ are expanded
automatically in both path fields.
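The fallback and expansion rules can be sketched with the standard library. This is an assumption about how such resolution might work, not PolyzyMD's actual implementation; `resolve_output_root` is a hypothetical helper and the `output_cfg` dictionary stands in for the parsed `output` section.

```python
import os
from pathlib import Path


def resolve_output_root(output_cfg: dict) -> Path:
    """Pick the directory that holds simulation output.

    Assumption: scratch_directory wins when it is set; otherwise the
    projects_directory is used, as described above.
    """
    raw = output_cfg.get("scratch_directory") or output_cfg["projects_directory"]
    # expandvars handles $USER and ${VAR}; expanduser handles a leading ~.
    return Path(os.path.expanduser(os.path.expandvars(raw)))


# Set explicitly so the example does not depend on the current login.
os.environ["USER"] = "alice"

with_scratch = resolve_output_root({
    "projects_directory": "/projects/$USER/polyzymd",
    "scratch_directory": "/scratch/alpine/$USER/simulations",
})
without_scratch = resolve_output_root({
    "projects_directory": "/projects/$USER/polyzymd",
    "scratch_directory": None,
})
print(with_scratch)
print(without_scratch)
```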
What the Analysis Framework Expects
The TrajectoryLoader class resolves trajectory paths from a simulation
config.yaml. It uses the config’s scratch_directory (or
projects_directory as fallback) combined with the naming_template to
locate each replicate’s working directory.
Topology and trajectory layout
Current OpenMM runs write solvated_system.pdb in the replicate working
directory and production trajectories as indexed daisy-chain segments:
production_N/production_N_trajectory.dcd.
When multiple daisy-chain segments exist (e.g., production_0/,
production_1/, production_2/), they are automatically stitched together in
segment-index order using the MDAnalysis ChainReader. The resulting
Universe presents all segments as a single continuous trajectory.
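A minimal sketch of that lookup, assuming only the directory layout documented on this page. The helper names `find_production_segments` and `load_replicate` are hypothetical and are not part of TrajectoryLoader. The sketch relies on the documented MDAnalysis behaviour that passing a list of trajectory files to `Universe` chains them with ChainReader.

```python
import re
from pathlib import Path

# Matches segment directories such as production_0, production_12.
SEGMENT_RE = re.compile(r"^production_(\d+)$")


def find_production_segments(replicate_dir):
    """Return production DCD paths sorted by numeric segment index."""
    segments = []
    for child in Path(replicate_dir).iterdir():
        match = SEGMENT_RE.match(child.name)
        if not (match and child.is_dir()):
            continue
        index = int(match.group(1))
        dcd = child / f"production_{index}_trajectory.dcd"
        if dcd.is_file():
            segments.append((index, dcd))
    # Sorting on the integer index keeps production_10 after production_2.
    return [path for _, path in sorted(segments)]


def load_replicate(replicate_dir):
    """Build one Universe spanning every production segment."""
    import MDAnalysis as mda  # third-party; imported lazily

    replicate_dir = Path(replicate_dir)
    topology = replicate_dir / "solvated_system.pdb"
    trajectories = [str(p) for p in find_production_segments(replicate_dir)]
    if not topology.is_file() or not trajectories:
        raise FileNotFoundError(f"incomplete replicate directory: {replicate_dir}")
    # A list of trajectory files is read through MDAnalysis' ChainReader.
    return mda.Universe(str(topology), trajectories)
```

Sorting by the parsed integer matters: a plain string sort would place `production_10` before `production_2` and stitch the segments out of order.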
Input File Requirements
These are the input files placed in the simulation project’s structures/
directory and referenced from config.yaml.
| File | Format | Config Field | Requirements |
|---|---|---|---|
| Protein structure | PDB (.pdb) | enzyme.pdb_path | Standard residue names, protonated at simulation pH, no missing heavy atoms in regions of interest |
| Substrate | SDF (.sdf) | | 3D coordinates with docked pose, explicit hydrogens preferred |
| Polymer (if pre-built) | SDF (.sdf) | | One SDF per chain, or use dynamic generation from SMILES |
| Reaction templates | RXN or "default" | | The string "default" selects the bundled reaction templates |
Note
The sentinel value "default" for reaction templates is not a file path.
Do not prepend a directory to it. PolyzyMD resolves "default" to bundled
reaction files at runtime.
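The difference between the sentinel and an ordinary path can be sketched as follows. This illustrates the rule only and is not PolyzyMD's resolver: `resolve_reaction_template` is a hypothetical helper, and the bundled file location is invented for the example.

```python
from pathlib import Path

DEFAULT_SENTINEL = "default"


def resolve_reaction_template(value: str, config_dir: Path, bundled_file: Path) -> Path:
    """Return the bundled file for the sentinel, else a config-relative path."""
    if value == DEFAULT_SENTINEL:
        return bundled_file
    # pathlib keeps absolute values as-is and joins relative ones.
    return config_dir / value


config_dir = Path("/work/my_simulation")
bundled = Path("/opt/polyzymd/data/initiation.rxn")  # hypothetical location

sentinel = resolve_reaction_template("default", config_dir, bundled)
custom = resolve_reaction_template("structures/custom.rxn", config_dir, bundled)
mistake = resolve_reaction_template("structures/default", config_dir, bundled)
print(sentinel)
print(custom)
print(mistake)
```

Only the exact string `"default"` triggers the bundled lookup in this sketch. `"structures/default"` is treated as an ordinary file path, which is why prefixing the sentinel with a directory fails.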
Comparison Project Layout
Running polyzymd compare init -n my_study creates:
my_study/
├── comparison.yaml # Analysis configuration (edit this)
├── comparison/ # Cross-condition comparison results
├── figures/ # Generated plots
└── structures/ # (Optional) shared structure files (e.g., enzyme PDB for SASA)
Analysis runs also create and populate analysis/ with canonical
ReplicateArtifact and ConditionArtifact outputs for per-replicate and
per-condition results. The comparison/ directory is reserved for
cross-condition comparison results.
comparison.yaml structure
The comparison config references simulation projects by pointing to their
config.yaml files:
name: "polymer_study"
description: "Comparison of polymer conjugation effects"
control: "No Polymer"  # Label of the control condition, or null
conditions:
  - label: "No Polymer"
    config: "../no_polymer/config.yaml"
    replicates: [1, 2, 3]
  - label: "PEG 10k"
    config: "../peg_10k/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
plugins:
  rmsf:
    selection: "protein and name CA"
  # ... additional analysis plugins
Relative paths in conditions[].config are resolved relative to the directory
containing comparison.yaml, not the current working directory.
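A short sketch of config-relative resolution using `pathlib`. The helper `resolve_condition_config` and the project location are hypothetical; the point is only that the base directory is the parent of comparison.yaml, not the process's working directory.

```python
from pathlib import Path


def resolve_condition_config(comparison_yaml, config_entry: str) -> Path:
    """Resolve a conditions[].config entry against the comparison.yaml directory."""
    base_dir = Path(comparison_yaml).resolve().parent
    # resolve() collapses ".." segments; the target need not exist yet.
    return (base_dir / config_entry).resolve()


# Hypothetical location of a comparison project.
comparison_yaml = Path("/work/studies/my_study/comparison.yaml")
resolved = resolve_condition_config(comparison_yaml, "../no_polymer/config.yaml")
print(resolved)
```

Because the base is derived from the comparison.yaml path itself, the result is the same whether the command is launched from inside `my_study/` or from any other directory.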
Connecting It All Together
polyzymd init --> config.yaml --> polyzymd build --> polyzymd run --> trajectories/
                                                                            |
polyzymd compare init --> comparison.yaml --> polyzymd compare run --> results + figures
                                 |
                   (points to config.yaml files)
The comparison framework reads each condition’s config.yaml, resolves the
scratch directory and naming template, then uses TrajectoryLoader to find
topology and trajectory files for each replicate.
Common Pitfalls
Warning
Path resolution is config-relative, not CWD-relative.
Relative paths in config.yaml (e.g., enzyme.pdb_path: "structures/enzyme.pdb")
are resolved relative to the directory containing config.yaml, not your
shell’s current working directory. The same applies to conditions[].config
paths in comparison.yaml.
Mismatched scratch directory. If you built and ran simulations with one scratch_directory value but later changed it in config.yaml, the analysis framework will look in the wrong location. The scratch_directory in config.yaml must match where the trajectory files actually reside.
The "default" sentinel for reactions. Setting polymers.reactions.initiation: "default" tells PolyzyMD to use a bundled reaction template. Writing "structures/default" or any path containing "default" will fail because no such file exists.
Missing replicate directories. Each replicate number listed in comparison.yaml must have a corresponding directory on disk. If replicate 3 was never simulated, the analysis will fail with a FileNotFoundError showing the expected path.
Incomplete replicate directories. Every replicate directory must contain at least a topology file (solvated_system.pdb) and one or more production trajectory files. Partially completed simulations that crashed before writing a trajectory will cause load failures.
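The last two pitfalls can be caught before an analysis run with a small pre-flight check. The sketch below relies only on the file layout documented on this page; `check_replicate_dir` is a hypothetical helper and not a PolyzyMD command.

```python
from pathlib import Path


def check_replicate_dir(replicate_dir) -> list:
    """Return human-readable problems; an empty list means it looks loadable."""
    replicate_dir = Path(replicate_dir)
    if not replicate_dir.is_dir():
        return [f"missing replicate directory: {replicate_dir}"]

    problems = []
    topology = replicate_dir / "solvated_system.pdb"
    if not topology.is_file():
        problems.append(f"missing topology: {topology}")
    # At least one production segment must have written its trajectory.
    if not any(replicate_dir.glob("production_*/production_*_trajectory.dcd")):
        problems.append(f"no production trajectory under: {replicate_dir}")
    return problems
```

Calling it on every resolved replicate directory and printing any non-empty result gives the expected paths up front, rather than surfacing them one at a time as load failures.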
See Also
Configuration Reference – Full configuration field reference
CLI Reference – CLI command reference, including init and compare init
How to Compare Simulation Conditions – How to set up and run a comparison
Run Your First PolyzyMD Simulation – Run your first simulation end-to-end