Architecture Guide
This guide provides a detailed map of PolyzyMD’s codebase, helping developers and power users understand where to find and modify specific functionality.
Package Structure
src/polyzymd/
├── cli/                    # Command-line interface
│   └── main.py             # Click commands (build, run, submit, continue, validate)
├── config/                 # Configuration and validation
│   ├── schema.py           # Pydantic models for YAML config
│   └── loader.py           # YAML loading utilities
├── builders/               # System construction (Stage 1-4)
│   ├── system_builder.py   # Main orchestrator - coordinates all builders
│   ├── enzyme.py           # Protein/enzyme loading and preparation
│   ├── substrate.py        # Small molecule (ligand) handling
│   ├── polymer.py          # Random copolymer generation
│   └── solvent.py          # Solvation, ions, box setup
├── simulation/             # MD execution (Stage 5-6)
│   ├── runner.py           # Initial simulation (equilibration + production)
│   └── continuation.py     # Checkpoint-based continuation for daisy-chain
├── workflow/               # HPC job management
│   ├── slurm.py            # SLURM script templates and generation
│   └── daisy_chain.py      # Job submission with dependencies
├── core/                   # Shared utilities
│   ├── parameters.py       # Simulation parameter dataclasses
│   └── restraints.py       # Distance restraint definitions
└── __init__.py             # Package version and exports
Data Flow Overview
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   config.yaml   │────▶│ SimulationConfig│────▶│  SystemBuilder  │
└─────────────────┘     │   (schema.py)   │     │ (system_builder │
                        └─────────────────┘     │      .py)       │
                                                └────────┬────────┘
                                                         │
         ┌───────────────────────┬───────────────────────┤
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  EnzymeBuilder  │     │ SubstrateBuilder│     │  PolymerBuilder │
│   (enzyme.py)   │     │  (substrate.py) │     │   (polymer.py)  │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │  SolventBuilder │
                        │   (solvent.py)  │
                        └────────┬────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │   Interchange   │
                        │    (OpenFF)     │
                        └────────┬────────┘
                                 │
         ┌───────────────────────┴───────────────────────┐
         │                                               │
         ▼                                               ▼
┌─────────────────┐                             ┌─────────────────┐
│SimulationRunner │                             │ ContinuationMgr │
│   (runner.py)   │                             │(continuation.py)│
└─────────────────┘                             └─────────────────┘
Component Details
CLI (cli/main.py)
The command-line interface is built with Click. Each command maps to a specific workflow:
| Command | Function | Primary Module Used |
|---|---|---|
| | Initialize new project directory | Template files |
| | Build system without running | |
| | Build and run initial simulation | |
| | Continue from checkpoint | |
| | Submit daisy-chain to SLURM | |
| | Validate config file | |
| | Show installation info | Version + dependencies |
Where to modify:
Add new CLI commands: cli/main.py
Change CLI option defaults: Look for @click.option decorators
Configuration (config/schema.py)
All configuration is defined as Pydantic models, providing automatic validation and type checking.
Key classes:
SimulationConfig - Top-level container, has from_yaml() class method
EnzymeConfig - Enzyme PDB path and settings
SubstrateConfig - Ligand SDF path, charge method
PolymerConfig - Polymer generation settings (monomers, length, count)
SolventConfig - Water model, ion concentration, box shape
SimulationPhaseConfig - Duration, timestep, ensemble settings
OutputConfig - Directory paths for scratch/projects
Where to modify:
Add new config options: Add fields to the appropriate Pydantic model
Change validation rules: Add @validator decorators
Change defaults: Modify field default values
Builders (builders/)
The builders follow a pipeline pattern, each responsible for one component:
system_builder.py - Orchestrator
Key methods:
from_config(config) - Create builder from SimulationConfig
build_from_config(config, working_dir, polymer_seed) - Full pipeline
build_enzyme(pdb_path) - Stage 1
build_substrate(sdf_path, ...) - Stage 2
build_polymers(characters, probabilities, ...) - Stage 3
pack_polymers(padding, working_dir) - Uses PACKMOL
solvate(composition, padding, box_shape) - Stage 4
create_interchange() - Stage 5
get_openmm_components() - Extract topology, system, positions
enzyme.py - Protein Loading
Uses OpenFF Toolkit to load PDB files with proper residue templates.
Key method: build(pdb_path) -> Topology
substrate.py - Ligand Handling
Loads docked conformers from SDF, assigns partial charges (NAGL by default).
Key method: build(sdf_path, conformer_index, charge_method) -> Molecule
polymer.py - Random Copolymer Generation
Generates random sequences based on monomer probabilities, loads pre-built SDFs.
Key method: build(count, seed) -> Tuple[List[Molecule], List[int]]
Where to modify:
Change polymer loading: _load_polymer_sdf() method
Change sequence generation: _generate_sequences() method
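As a rough illustration of the sequence-generation step described above, the sketch below draws seeded random monomer sequences from per-monomer probabilities. It is not the body of _generate_sequences(); the function name generate_sequences and its length parameter are assumptions made for this example, and only the Python standard library is used.

```python
import random
from typing import List, Sequence


def generate_sequences(
    characters: Sequence[str],
    probabilities: Sequence[float],
    length: int,
    count: int,
    seed: int,
) -> List[str]:
    """Draw `count` random monomer sequences, each `length` characters long.

    Hypothetical helper for illustration; not PolyzyMD's actual implementation.
    """
    if len(characters) != len(probabilities):
        raise ValueError("characters and probabilities must have the same length")
    # A private Random instance makes the output depend only on the seed,
    # not on any other code that touches the global random state.
    rng = random.Random(seed)
    return [
        "".join(rng.choices(characters, weights=probabilities, k=length))
        for _ in range(count)
    ]


# Three 5-mers built from monomers "A" (70%) and "B" (30%).
sequences = generate_sequences("AB", [0.7, 0.3], length=5, count=3, seed=42)
print(sequences)
```

Reusing the same seed reproduces the same sequences, which is the property a polymer_seed argument would be expected to provide.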
solvent.py - Solvation
Adds water (TIP3P), neutralizing ions, and optional co-solvents.
Key method: solvate(topology, composition, padding, box_shape) -> Topology
Simulation (simulation/)
runner.py - Initial Runs
Handles energy minimization, equilibration (NVT), and production (NPT).
Key methods:
minimize(max_iterations, tolerance) - Energy minimization
run_equilibration(temperature, duration_ns, ...) - NVT equilibration
run_production(temperature, duration_ns, pressure, segment_index, ...) - NPT production
run_from_config(config, segment_index) - Run using config settings
Reporters configured (lines 273-288, 398-413):
StateDataReporter(
state_path,
report_interval,
step=True,
time=True,
potentialEnergy=True,
kineticEnergy=True,
totalEnergy=True,
temperature=True,
volume=True,
density=True,
speed=True, # Reports performance in ns/day
)
Where to modify:
Change reported quantities: Modify StateDataReporter arguments
Change integrator settings: _create_integrator() method
Change barostat settings: _add_barostat() method
continuation.py - Checkpoint Continuation
Loads state from previous segment checkpoint and continues simulation.
Key methods:
load_previous_state() - Load checkpoint and system
run_segment(duration_ns, num_samples) - Run next segment
Workflow (workflow/)
slurm.py - SLURM Script Generation
Contains job script templates and preset configurations.
Key components:
SlurmConfig - Dataclass with partition, QOS, time limit, etc.
SlurmConfig.from_preset(preset) - Load preset (aa100, al40, testing, etc.)
INITIAL_JOB_TEMPLATE - Template for first segment (builds system)
CONTINUATION_JOB_TEMPLATE - Template for subsequent segments
SlurmJobGenerator - Fills templates with runtime values
Where to modify:
Add new SLURM presets: Add to presets dict in from_preset() method
Change job script behavior: Modify INITIAL_JOB_TEMPLATE or CONTINUATION_JOB_TEMPLATE
Change module loading: Edit the module load lines in templates
Current template structure (simplified):
#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --qos={qos}
# ... other SBATCH directives
# Load conda environment (ignore module warnings)
module purge 2>/dev/null || true
module load miniforge 2>/dev/null || true
eval "$(conda shell.bash hook)"
mamba activate {conda_env}
# Enable strict error handling after environment setup
set -e
# Run simulation
polyzymd run -c "{config_path}" --replicate {replicate} ...
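The placeholders in the template above ({partition}, {qos}, and so on) suggest str.format-style substitution. The sketch below shows one way such a template could be filled, assuming that mechanism; JOB_TEMPLATE and render_job_script are illustrative names, not the contents of slurm.py.

```python
# Hypothetical, shortened template; the real templates live in workflow/slurm.py.
JOB_TEMPLATE = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --qos={qos}
set -e
polyzymd run -c "{config_path}" --replicate {replicate}
"""


def render_job_script(partition: str, qos: str, config_path: str, replicate: int) -> str:
    """Fill the template placeholders with runtime values."""
    return JOB_TEMPLATE.format(
        partition=partition,
        qos=qos,
        config_path=config_path,
        replicate=replicate,
    )


script = render_job_script("aa100", "normal", "config.yaml", replicate=1)
print(script)
```

If a template edited this way needs literal braces (for example a shell ${VAR} expansion), they must be doubled as {{ and }} so that str.format leaves them in place.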
daisy_chain.py - Job Submission
Manages job dependencies for long simulations split across multiple SLURM jobs.
Key components:
DaisyChainConfig - Settings for the chain (segments, time per segment)
DaisyChainSubmitter - Generates scripts and submits with dependencies
submit_daisy_chain() - Main entry point function
Where to modify:
Change dependency type: Modify --dependency=afterok: in submission
Change job naming: Modify JobContext creation
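To make the dependency mechanism concrete, here is a minimal sketch of chaining segments with sbatch --dependency=afterok. It is not the DaisyChainSubmitter implementation: the function names are assumptions, and submission is passed in as a callable so the chain logic can be exercised without a SLURM cluster.

```python
import subprocess
from typing import Callable, List, Optional


def build_sbatch_command(script_path: str, previous_job_id: Optional[str] = None) -> List[str]:
    """Build an sbatch command that waits for the previous segment, if any."""
    command = ["sbatch", "--parsable"]  # --parsable prints "jobid[;cluster]"
    if previous_job_id is not None:
        # afterok: start only once the previous job has finished successfully.
        command.append(f"--dependency=afterok:{previous_job_id}")
    command.append(script_path)
    return command


def sbatch_submit(command: List[str]) -> str:
    """Submit to a real cluster and return the job ID (not called in this sketch)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip().split(";")[0]


def submit_chain(script_paths: List[str], submit: Callable[[List[str]], str]) -> List[str]:
    """Submit scripts in order, each depending on the one before it."""
    job_ids: List[str] = []
    previous: Optional[str] = None
    for path in script_paths:
        previous = submit(build_sbatch_command(path, previous))
        job_ids.append(previous)
    return job_ids


# Dry-run style demonstration with a fake submitter that records commands.
recorded: List[List[str]] = []


def fake_submit(command: List[str]) -> str:
    recorded.append(command)
    return str(1000 + len(recorded))


job_ids = submit_chain(["seg0.sh", "seg1.sh", "seg2.sh"], fake_submit)
print(job_ids)
print(recorded[1])
```

Swapping afterok for another dependency type such as afterany changes whether later segments start when an earlier one fails.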
Core Utilities (core/)
parameters.py - Simulation Parameters
Dataclasses for simulation settings, useful for programmatic use.
IntegratorParameters - Timestep, friction, thermostat
EquilibrationParameters - NVT settings
ProductionParameters - NPT settings
ReporterParameters - Output frequency settings
SimulationParameters - Combined container
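Settings like these usually have to be turned into integer step counts before a simulation can run. The sketch below shows that arithmetic with a small dataclass; PhaseSettings and its field names are hypothetical and do not mirror the actual classes in parameters.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseSettings:
    """Hypothetical container for one simulation phase (illustration only)."""

    duration_ns: float
    timestep_fs: float
    num_samples: int

    @property
    def total_steps(self) -> int:
        # 1 ns = 1,000,000 fs, so steps = duration in fs / timestep in fs.
        return round(self.duration_ns * 1_000_000 / self.timestep_fs)

    @property
    def report_interval(self) -> int:
        # Spread the requested number of samples evenly over the run.
        return max(1, self.total_steps // self.num_samples)


phase = PhaseSettings(duration_ns=10.0, timestep_fs=2.0, num_samples=1000)
print(phase.total_steps)      # 5000000
print(phase.report_interval)  # 5000
```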
restraints.py - Distance Restraints
Defines atom selections and harmonic distance restraints.
AtomSelection - MDAnalysis-style selection with chain/resid/atom name
RestraintDefinition - Two selections + distance + force constant
RestraintFactory - Creates OpenMM forces from definitions
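The data model described above can be pictured with the following sketch. It assumes the common harmonic form E = 0.5 * k * (d - d0)^2; the source does not state which convention or units restraints.py uses, and the class names Selection and HarmonicRestraint are stand-ins chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """Hypothetical stand-in for an atom selection (chain, residue ID, atom name)."""

    chain: str
    resid: int
    atom_name: str


@dataclass(frozen=True)
class HarmonicRestraint:
    """Hypothetical stand-in: two selections, a target distance, a force constant."""

    first: Selection
    second: Selection
    distance_nm: float     # target distance d0 (units assumed: nm)
    force_constant: float  # k (units assumed: kJ/mol/nm^2)

    def energy(self, current_distance_nm: float) -> float:
        # Assumed convention: E = 0.5 * k * (d - d0)^2.
        displacement = current_distance_nm - self.distance_nm
        return 0.5 * self.force_constant * displacement**2


restraint = HarmonicRestraint(
    first=Selection(chain="A", resid=77, atom_name="OG"),
    second=Selection(chain="B", resid=1, atom_name="C1"),
    distance_nm=0.3,
    force_constant=1000.0,
)
print(restraint.energy(0.4))  # roughly 5 kJ/mol for a 0.1 nm displacement
```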
Common Modifications
Adding a New Reporter Output
File: src/polyzymd/simulation/runner.py
Location: Lines 275-288 (equilibration) and 399-413 (production)
Add parameters to StateDataReporter:
StateDataReporter(
...,
remainingTime=True, # Add this for estimated time remaining
elapsedTime=True, # Add this for elapsed wall time
)
See OpenMM StateDataReporter docs for all options.
Adding a New SLURM Preset
File: src/polyzymd/workflow/slurm.py
Location: SlurmConfig.from_preset() method (~line 50)
presets = {
"aa100": {...},
"my_new_preset": {
"partition": "my_partition",
"qos": "my_qos",
"time_limit": "12:00:00",
"gpus": 2,
...
},
}
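For orientation, the preset lookup can be sketched as a classmethod that reads a dictionary and applies per-call overrides. This is a simplified stand-in, not the real SlurmConfig: the class name SlurmSettings, its fields, the override mechanism, and the default GPU count are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SlurmSettings:
    """Hypothetical, reduced version of a SLURM settings dataclass."""

    partition: str
    qos: str
    time_limit: str
    gpus: int = 1  # assumed default, for illustration only

    @classmethod
    def from_preset(cls, preset: str, **overrides: Any) -> "SlurmSettings":
        presets: Dict[str, Dict[str, Any]] = {
            "aa100": {"partition": "aa100", "qos": "normal", "time_limit": "24:00:00"},
            "my_new_preset": {
                "partition": "my_partition",
                "qos": "my_qos",
                "time_limit": "12:00:00",
                "gpus": 2,
            },
        }
        if preset not in presets:
            raise ValueError(f"Unknown preset {preset!r}; choose from {sorted(presets)}")
        # Explicit overrides (for example a custom time limit) win over preset values.
        return cls(**{**presets[preset], **overrides})


custom = SlurmSettings.from_preset("my_new_preset")
shorter = SlurmSettings.from_preset("aa100", time_limit="06:00:00")
print(custom)
print(shorter)
```

Raising on unknown names keeps a typo in a preset name from silently falling back to default settings.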
Changing the Force Field
File: src/polyzymd/builders/system_builder.py
Location: __init__ method (lines 54-58)
def __init__(
self,
protein_forcefield: str = "ff14sb_off_impropers_0.0.4.offxml", # Change this
small_molecule_forcefield: str = "openff-2.0.0.offxml", # Or this
) -> None:
Or in your config.yaml:
force_field:
protein: "ff19sb.offxml"
small_molecule: "openff-2.1.0.offxml"
Adding Custom Pre/Post-Processing Steps
File: src/polyzymd/simulation/runner.py
Add methods to SimulationRunner class, then call them from cli/main.py.
Common Workflows
Submitting Jobs to HPC
PolyzyMD provides several SLURM presets for common HPC partitions. Each replicate’s daisy chain runs in parallel, with segments within each chain running sequentially.
Available Presets
| Preset | Partition | QOS | Account | Default Time | Notes |
|---|---|---|---|---|---|
| | aa100 | normal | ucb625_asc1 | 24h | NVIDIA A100 GPUs |
| | al40 | normal | ucb625_asc1 | 24h | NVIDIA L40 GPUs |
| | blanca,blanca-shirts | preemptable | blanca-shirts | 24h | Blanca condo (preemptable) |
| | atesting_a100 | testing | ucb625_asc1 | 6min | Quick tests only |
Note: Available partitions depend on your HPC account and permissions. For example, CU Boulder users with Blanca access may not have access to blanca-shirts specifically; that requires membership in the Shirts group condo. Check with your HPC administrators or use sinfo to see which partitions you can access.
Example Commands
Quick test (single replicate, 10 minutes):
polyzymd submit -c config.yaml --preset testing --replicates 1 --time-limit 0:10:00
Production run on A100 GPUs (5 replicates):
polyzymd submit -c config.yaml --preset aa100 --replicates 1-5
Overnight run on Blanca preemptable (3 replicates):
polyzymd submit -c config.yaml --preset blanca-shirts --replicates 1-3
Dry run to inspect generated scripts without submitting:
polyzymd submit -c config.yaml --preset aa100 --replicates 1-3 --dry-run
Override time limit:
polyzymd submit -c config.yaml --preset blanca-shirts --replicates 1-3 --time-limit 12:00:00
Replicate Specification
Replicates can be specified in several formats:
Single: --replicates 1
Range: --replicates 1-5 (runs replicates 1, 2, 3, 4, 5)
List: --replicates 1,3,5 (runs replicates 1, 3, 5)
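The three formats above could be parsed along the following lines. This is a sketch of the idea rather than the parser PolyzyMD ships; parse_replicates is a hypothetical name, and combining ranges and lists in one value (such as 1-3,7) is an extra this sketch happens to accept, not documented behavior.

```python
from typing import List


def parse_replicates(spec: str) -> List[int]:
    """Parse '1', '1-5', or '1,3,5' into a sorted list of replicate numbers."""
    replicates = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range {part!r}: start exceeds end")
            replicates.update(range(start, end + 1))  # ranges are inclusive
        else:
            replicates.add(int(part))
    return sorted(replicates)


print(parse_replicates("1"))      # [1]
print(parse_replicates("1-5"))    # [1, 2, 3, 4, 5]
print(parse_replicates("1,3,5"))  # [1, 3, 5]
```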
Monitoring Jobs
After submission:
# Check job status
squeue -u $USER
# View job details
scontrol show job <job_id>
# Check SLURM output logs
cat slurm_logs/s0_r1_*.out
Testing Your Changes
After modifying the code:
Verify syntax:
python -m py_compile src/polyzymd/path/to/modified_file.py
Reinstall package:
pip install -e .
Test with dry-run:
polyzymd submit -c config.yaml --preset testing --dry-run
Check generated scripts:
cat job_scripts/initial_seg0_rep1.sh