Architecture Guide
This guide provides a detailed map of PolyzyMD’s codebase, helping developers and power users understand where to find and modify specific functionality.
Package Structure
```
src/polyzymd/
├── cli/                    # Command-line interface
│   └── main.py             # Click commands (build, run-gromacs, submit, run-segment, validate)
├── config/                 # Configuration and validation
│   ├── schema.py           # Pydantic models for YAML config
│   └── loader.py           # YAML loading utilities
├── builders/               # System construction (Stage 1-4)
│   ├── system_builder.py   # Main orchestrator - coordinates all builders
│   ├── enzyme.py           # Protein/enzyme loading and preparation
│   ├── substrate.py        # Small molecule (ligand) handling
│   ├── polymer.py          # Random copolymer generation
│   └── solvent.py          # Solvation, ions, box setup
├── simulation/             # MD execution (Stage 5-6)
│   ├── runner.py           # Initial simulation (equilibration + production)
│   └── continuation.py     # Checkpoint-based continuation for multi-segment runs
├── workflow/               # HPC job management
│   ├── slurm.py            # SLURM script templates and generation
│   └── daisy_chain.py      # Self-resubmitting job submission
├── core/                   # Shared utilities
│   ├── parameters.py       # Simulation parameter dataclasses
│   └── restraints.py       # Distance restraint definitions
└── __init__.py             # Package version and exports
```
Data Flow Overview
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   config.yaml   │────▶│ SimulationConfig│────▶│  SystemBuilder  │
└─────────────────┘     │   (schema.py)   │     │ (system_builder │
                        └─────────────────┘     │      .py)       │
                                                └────────┬────────┘
                                                         │
         ┌───────────────────────┬───────────────────────┤
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  EnzymeBuilder  │     │ SubstrateBuilder│     │ PolymerBuilder  │
│   (enzyme.py)   │     │ (substrate.py)  │     │  (polymer.py)   │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │ SolventBuilder  │
                        │  (solvent.py)   │
                        └────────┬────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │   Interchange   │
                        │    (OpenFF)     │
                        └────────┬────────┘
                                 │
         ┌───────────────────────┴───────────────────────┐
         │                                               │
         ▼                                               ▼
┌─────────────────┐                             ┌─────────────────┐
│SimulationRunner │                             │ ContinuationMgr │
│   (runner.py)   │                             │(continuation.py)│
└─────────────────┘                             └─────────────────┘
```
Component Details
CLI (cli/main.py)
The command-line interface is built with Click. Each command maps to a specific workflow:
| Command | Function | Primary Module Used |
|---|---|---|
| | Initialize new project directory | Template files |
| | Build system without running | |
| | Build and run GROMACS simulation | |
| | Run a single production segment | |
| | Submit self-resubmitting jobs to SLURM | |
| | Resume a stalled simulation | Progress scanning |
| | Check simulation completion status | |
| | Validate config file | |
| | Show installation info | Version + dependencies |
Where to modify:
- Add new CLI commands: `cli/main.py`
- Change CLI option defaults: look for `@click.option` decorators
Configuration (config/schema.py)
All configuration is defined as Pydantic models, providing automatic validation and type checking.
Key classes:
- `SimulationConfig` - Top-level container; has a `from_yaml()` class method
- `EnzymeConfig` - Enzyme PDB path and settings
- `SubstrateConfig` - Ligand SDF path, charge method
- `PolymerConfig` - Polymer generation settings (monomers, length, count)
- `SolventConfig` - Water model, ion concentration, box shape
- `SimulationPhaseConfig` - Duration, timestep, ensemble settings
- `OutputConfig` - Directory paths for scratch/projects
Where to modify:
- Add new config options: add fields to the appropriate Pydantic model
- Change validation rules: add `@validator` decorators (`@field_validator` in Pydantic v2)
- Change defaults: modify field default values
Builders (builders/)
The builders follow a pipeline pattern, each responsible for one component:
system_builder.py - Orchestrator
Key methods:
- `from_config(config)` - Create builder from `SimulationConfig`
- `build_from_config(config, working_dir, polymer_seed)` - Full pipeline
- `build_enzyme(pdb_path)` - Stage 1
- `build_substrate(sdf_path, ...)` - Stage 2
- `build_polymers(characters, probabilities, ...)` - Stage 3
- `pack_polymers(padding, working_dir)` - Uses PACKMOL
- `solvate(composition, padding, box_shape)` - Stage 4
- `create_interchange()` - Stage 5
- `get_openmm_components()` - Extract topology, system, positions
enzyme.py - Protein Loading
Uses OpenFF Toolkit to load PDB files with proper residue templates.
Key method: build(pdb_path) -> Topology
substrate.py - Ligand Handling
Loads docked conformers from SDF, assigns partial charges (NAGL by default).
Key method: build(sdf_path, conformer_index, charge_method) -> Molecule
polymer.py - Random Copolymer Generation
Generates random sequences based on monomer probabilities, loads pre-built SDFs.
Key method: build(count, seed) -> Tuple[List[Molecule], List[int]]
Where to modify:
- Change polymer loading: `_load_polymer_sdf()` method
- Change sequence generation: `_generate_sequences()` method
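As a minimal sketch of what "random sequences based on monomer probabilities" can mean, the hypothetical `generate_sequences` helper below draws every monomer independently with `random.Random(seed).choices`. The function name, its parameters, and the independent-per-position model are assumptions, not a copy of `_generate_sequences()`, and loading the matching pre-built SDFs is left out.

```python
import random
from typing import List, Sequence


def generate_sequences(
    characters: Sequence[str],
    probabilities: Sequence[float],
    length: int,
    count: int,
    seed: int,
) -> List[str]:
    """Draw `count` random copolymer sequences of `length` monomers (hypothetical helper).

    Each position is sampled independently, so the composition matches
    `probabilities` only on average.
    """
    if len(characters) != len(probabilities):
        raise ValueError("characters and probabilities must have the same length")
    if abs(sum(probabilities) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)  # private RNG: the same seed gives the same sequences
    return [
        "".join(rng.choices(characters, weights=probabilities, k=length))
        for _ in range(count)
    ]


seqs = generate_sequences(["A", "B"], [0.7, 0.3], length=10, count=3, seed=42)
print(len(seqs))  # → 3
```

A private `random.Random(seed)` keeps the draw reproducible per seed without touching the global `random` state used elsewhere.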
solvent.py - Solvation
Adds water (TIP3P), neutralizing ions, and optional co-solvents.
Key method: solvate(topology, composition, padding, box_shape) -> Topology
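The arithmetic behind the box and ion settings can be sketched for the simplest case, a cubic box. The edge is the solute's largest extent plus the padding on both sides. The number of salt ion pairs follows from concentration × volume × Avogadro's number. Both helpers are hypothetical. How `solvate()` actually sizes the box, whether it subtracts the solute volume, and how it combines neutralizing ions with added salt are not stated here.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def cubic_box_edge_nm(solute_extent_nm: float, padding_nm: float) -> float:
    """Edge of a cubic box leaving `padding_nm` of solvent on each side of the solute."""
    return solute_extent_nm + 2.0 * padding_nm


def ion_pairs_for_concentration(molar: float, box_volume_nm3: float) -> int:
    """Approximate salt ion pairs for a target molarity (ignores solute volume)."""
    litres = box_volume_nm3 * 1e-24  # 1 nm^3 = 1e-24 L
    return round(molar * litres * AVOGADRO)


edge = cubic_box_edge_nm(solute_extent_nm=8.0, padding_nm=1.0)
pairs = ion_pairs_for_concentration(0.1, edge ** 3)
print(edge, pairs)  # → 10.0 60
```

At 0.1 M, a (10 nm)^3 box holds about 60 ion pairs, which is a useful sanity check when inspecting a solvated system.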
Simulation (simulation/)
runner.py - Initial Runs
Handles energy minimization, equilibration (NVT), and production (NPT).
Key methods:
- `minimize(max_iterations, tolerance)` - Energy minimization
- `run_equilibration(temperature, duration_ns, ...)` - NVT equilibration
- `run_production(temperature, duration_ns, pressure, segment_index, ...)` - NPT production
- `run_from_config(config, segment_index)` - Run using config settings
Reporters configured (lines 273-288, 398-413):
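```python
from openmm.app import StateDataReporter

# state_path and report_interval are supplied by the runner
StateDataReporter(
    state_path,
    report_interval,
    step=True,
    time=True,
    potentialEnergy=True,
    kineticEnergy=True,
    totalEnergy=True,
    temperature=True,
    volume=True,
    density=True,
    speed=True,  # Reports performance in ns/day
)
```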
Where to modify:
- Change reported quantities: modify the `StateDataReporter` arguments
- Change integrator settings: `_create_integrator()` method
- Change barostat settings: `_add_barostat()` method
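`StateDataReporter` writes a comma-separated file. Its header line starts with `#`, and its columns depend on the flags enabled in the reporter configuration. The sketch below pulls the `Speed (ns/day)` column out of such a file to check throughput. `read_speeds` is a hypothetical helper, and `SAMPLE` is made-up data trimmed to three columns, not real PolyzyMD output. OpenMM writes `--` for speed while no wall-clock time has elapsed (typically the first report), so those entries are skipped.

```python
import csv
from statistics import mean
from typing import List


def read_speeds(text: str) -> List[float]:
    """Return the 'Speed (ns/day)' values from StateDataReporter CSV text."""
    lines = text.splitlines()
    lines[0] = lines[0].lstrip("#")  # header is written as #"Step","Time (ps)",...
    speeds = []
    for row in csv.DictReader(lines):
        value = row["Speed (ns/day)"].strip()
        if value != "--":  # no timing data yet (e.g. the first report)
            speeds.append(float(value))
    return speeds


# Made-up sample, trimmed to three columns.
SAMPLE = '''#"Step","Time (ps)","Speed (ns/day)"
1000,2.0,--
2000,4.0,101.5
3000,6.0,98.5
'''

print(mean(read_speeds(SAMPLE)))  # → 100.0
```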
continuation.py - Checkpoint Continuation
Loads state from previous segment checkpoint and continues simulation.
Key methods:
- `load_previous_state()` - Load checkpoint and system
- `run_segment(duration_ns, num_samples)` - Run next segment
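For `run_segment(duration_ns, num_samples)`, a segment's length in steps and its reporting interval follow from the timestep. The sketch below assumes that `num_samples` is the number of frames saved per segment and that the timestep is given in femtoseconds. `segment_schedule` is a hypothetical helper, not part of `continuation.py`.

```python
from typing import Tuple


def segment_schedule(duration_ns: float, timestep_fs: float, num_samples: int) -> Tuple[int, int]:
    """Return (total_steps, report_interval) for one continuation segment.

    Assumes num_samples is the number of frames to save during the segment.
    """
    total_steps = round(duration_ns * 1_000_000 / timestep_fs)  # 1 ns = 1e6 fs
    if not 1 <= num_samples <= total_steps:
        raise ValueError("num_samples must be between 1 and the number of steps")
    return total_steps, total_steps // num_samples


steps, interval = segment_schedule(duration_ns=10.0, timestep_fs=2.0, num_samples=250)
print(steps, interval)  # → 5000000 20000
```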
Workflow (workflow/)
slurm.py - SLURM Script Generation
Contains job script templates and preset configurations.
Key components:
- `SlurmConfig` - Dataclass with partition, QOS, time limit, etc.
- `SlurmConfig.from_preset(preset)` - Load preset (aa100, al40, testing, etc.)
- `JOB_TEMPLATE` - Unified self-resubmitting job template
- `SlurmScriptGenerator` - Fills template with runtime values
Where to modify:
- Add new SLURM presets: add to the `presets` dict in the `from_preset()` method
- Change job script behavior: modify `JOB_TEMPLATE`
- Add support for a new CUDA version: add a new feature block in `pixi.toml` and a mapping in `PRESET_DEFAULT_PIXI_ENV`
Current template structure (simplified):
```bash
#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --qos={qos}
# ... other SBATCH directives

# Activate pixi environment
eval "$(pixi shell-hook -e {pixi_env} --manifest-path {manifest_path})"

# Enable strict error handling after environment setup
set -e

# Exit early if the simulation is already complete.
# Testing the command inside `if` keeps a non-zero exit ("work remains")
# from aborting the script under `set -e`.
if polyzymd check-progress -c "{config_path}" -r {replicate}; then exit 0; fi

# Run next production segment
polyzymd run-segment -c "{config_path}" -r {replicate}

# Self-resubmit if work remains ($THIS_SCRIPT is assumed to be set earlier in the full template)
if ! polyzymd check-progress -c "{config_path}" -r {replicate}; then sbatch "$THIS_SCRIPT"; fi
```
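The `{partition}`-style placeholders suggest that the job template is a Python format string. This is not confirmed here. Under that assumption, the sketch below shows the substitution step a generator such as `SlurmScriptGenerator` performs. The trimmed template, the `{time_limit}` placeholder and the `render_job_script` helper are hypothetical.

```python
# Hypothetical, trimmed-down template; placeholder names mirror the ones above.
JOB_TEMPLATE = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --qos={qos}
#SBATCH --time={time_limit}

set -e
if polyzymd check-progress -c "{config_path}" -r {replicate}; then exit 0; fi
polyzymd run-segment -c "{config_path}" -r {replicate}
echo "Job ${{SLURM_JOB_ID}} finished a segment"
"""


def render_job_script(**values: object) -> str:
    """Substitute runtime values into the template.

    str.format raises KeyError if a placeholder has no value, and literal
    shell braces must be doubled ({{ }}) so format() leaves them alone.
    """
    return JOB_TEMPLATE.format(**values)


script = render_job_script(
    partition="aa100",
    qos="normal",
    time_limit="24:00:00",
    config_path="config.yaml",
    replicate=1,
)
print(script.splitlines()[1])  # → #SBATCH --partition=aa100
```

If templates are rendered this way, any `${VAR}` added to `JOB_TEMPLATE` has to be written as `${{VAR}}`. Otherwise `str.format` treats it as a placeholder and fails.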
daisy_chain.py - Job Submission
Manages self-resubmitting SLURM jobs for long simulations. Each replicate gets a single job that resubmits itself until the total production time is reached.
Key components:
- `DaisyChainConfig` - Settings for the submission (time per segment, etc.)
- `DaisyChainSubmitter` - Generates scripts and submits jobs
- `submit_daisy_chain()` - Main entry point function
Where to modify:
- Change job naming: modify the `_create_job_name()` method
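The self-resubmitting loop amounts to "run one segment, re-check progress, resubmit while work remains." The sketch below models that bookkeeping with hypothetical `segments_needed` and `simulate_chain` helpers standing in for what `DaisyChainSubmitter` and `check-progress` do. It assumes a fixed production time per segment and a shorter final segment when the total does not divide evenly. Neither behavior is confirmed here.

```python
import math
from typing import List


def segments_needed(total_ns: float, segment_ns: float) -> int:
    """Number of production segments (one SLURM job each) needed to reach total_ns."""
    if segment_ns <= 0:
        raise ValueError("segment_ns must be positive")
    return math.ceil(total_ns / segment_ns)


def simulate_chain(total_ns: float, segment_ns: float) -> List[float]:
    """Mimic the job loop and return the length of each segment that runs."""
    if segment_ns <= 0:
        raise ValueError("segment_ns must be positive")
    completed_ns = 0.0
    segments = []
    while completed_ns < total_ns:                         # check-progress: work remains
        run_ns = min(segment_ns, total_ns - completed_ns)  # assumed: last segment may be shorter
        completed_ns += run_ns                             # run-segment
        segments.append(run_ns)                            # then sbatch resubmits
    return segments


print(segments_needed(200, 50))  # → 4
```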
Core Utilities (core/)
parameters.py - Simulation Parameters
Dataclasses for simulation settings, useful for programmatic use.
- `IntegratorParameters` - Timestep, friction, thermostat
- `EquilibrationParameters` - NVT settings
- `ProductionParameters` - NPT settings
- `ReporterParameters` - Output frequency settings
- `SimulationParameters` - Combined container
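The sketch below shows how nested parameter dataclasses can compose into one container. The class names follow the list above, but every field name, unit and default value is an assumption made for illustration. `EquilibrationParameters` and `ReporterParameters` are omitted for brevity.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class IntegratorParameters:
    timestep_fs: float = 2.0      # hypothetical field, femtoseconds
    friction_per_ps: float = 1.0  # hypothetical field, friction coefficient in 1/ps
    temperature_k: float = 300.0  # hypothetical field, thermostat target in kelvin


@dataclass
class ProductionParameters:
    duration_ns: float = 10.0     # hypothetical field
    pressure_bar: float = 1.0     # hypothetical field


@dataclass
class SimulationParameters:
    # default_factory gives each container its own nested objects instead of one
    # shared instance (Python 3.11+ rejects such unhashable defaults outright).
    integrator: IntegratorParameters = field(default_factory=IntegratorParameters)
    production: ProductionParameters = field(default_factory=ProductionParameters)


params = SimulationParameters()
params.production.duration_ns = 50.0
print(asdict(params)["production"])  # → {'duration_ns': 50.0, 'pressure_bar': 1.0}
```

`asdict` recurses into the nested dataclasses, which makes a container like this easy to log or dump alongside simulation output.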
restraints.py - Distance Restraints
Defines atom selections and harmonic distance restraints.
- `AtomSelection` - MDAnalysis-style selection with chain/resid/atom name
- `RestraintDefinition` - Two selections + distance + force constant
- `RestraintFactory` - Creates OpenMM forces from definitions
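A harmonic distance restraint can be sketched in plain Python. `HarmonicDistanceRestraint` is a hypothetical stand-in for `RestraintDefinition`, and its field names are assumptions. It uses the OpenMM `HarmonicBondForce` convention E = 1/2 * k * (r - r0)^2, with distances in nm and k in kJ/mol/nm^2. Whether `RestraintFactory` builds exactly this functional form is not stated here. The selection strings are illustrative MDAnalysis syntax, and `LIG` is a placeholder residue name.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class HarmonicDistanceRestraint:
    """Hypothetical stand-in for RestraintDefinition: two selections + r0 + k."""
    selection1: str
    selection2: str
    distance_nm: float     # equilibrium distance r0
    force_constant: float  # k, in kJ/mol/nm^2

    def energy(self, pos1: Sequence[float], pos2: Sequence[float]) -> float:
        """E = 0.5 * k * (r - r0)^2 for two 3D positions given in nm."""
        r = math.dist(pos1, pos2)
        return 0.5 * self.force_constant * (r - self.distance_nm) ** 2


restraint = HarmonicDistanceRestraint(
    selection1="resid 105 and name OG",
    selection2="resname LIG and name C1",
    distance_nm=0.35,
    force_constant=1000.0,
)
# 0.1 nm away from r0 costs 0.5 * 1000 * 0.1^2 = 5 kJ/mol
print(round(restraint.energy((0.0, 0.0, 0.0), (0.45, 0.0, 0.0)), 6))  # → 5.0
```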
Common Modifications
Adding a New Reporter Output
File: src/polyzymd/simulation/runner.py
Location: Lines 275-288 (equilibration) and 399-413 (production)
Add parameters to StateDataReporter:
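```python
StateDataReporter(
    ...,
    remainingTime=True,      # Add this for estimated time remaining
    totalSteps=total_steps,  # Required by OpenMM when remainingTime=True (steps in this run)
    elapsedTime=True,        # Add this for elapsed wall time
)
```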
See OpenMM StateDataReporter docs for all options.
Adding a New SLURM Preset
File: src/polyzymd/workflow/slurm.py
Location: SlurmConfig.from_preset() method (~line 50)
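```python
presets = {
    "aa100": {...},  # existing preset (contents elided)
    "my_new_preset": {
        "partition": "my_partition",
        "qos": "my_qos",
        "time_limit": "12:00:00",
        "gpus": 2,
        # ... remaining fields, following the existing presets
    },
}
```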
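The preset lookup can be sketched in a self-contained way, assuming `SlurmConfig` is a plain dataclass and `from_preset()` builds it from the `presets` dict. The `account` field name, the `gpus` default and the `ValueError` for unknown presets are assumptions. The `aa100` values mirror the A100 row of the presets table in this guide. `dataclasses.replace` shows one way an override such as `--time-limit` could be applied without mutating the preset.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SlurmConfig:
    partition: str
    qos: str
    account: str     # hypothetical field name
    time_limit: str
    gpus: int = 1    # hypothetical default

    @classmethod
    def from_preset(cls, preset: str) -> "SlurmConfig":
        presets = {
            "aa100": {"partition": "aa100", "qos": "normal",
                      "account": "ucb625_asc1", "time_limit": "24:00:00"},
            "my_new_preset": {"partition": "my_partition", "qos": "my_qos",
                              "account": "my_account", "time_limit": "12:00:00",
                              "gpus": 2},
        }
        if preset not in presets:
            raise ValueError(f"Unknown preset {preset!r}; choose from {sorted(presets)}")
        return cls(**presets[preset])


# Override one field (e.g. for --time-limit) without touching the preset itself.
cfg = replace(SlurmConfig.from_preset("my_new_preset"), time_limit="06:00:00")
print(cfg.partition, cfg.gpus, cfg.time_limit)  # → my_partition 2 06:00:00
```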
Changing the Force Field
File: src/polyzymd/builders/system_builder.py
Location: `__init__` method (lines 54-58)
```python
class SystemBuilder:
    def __init__(
        self,
        protein_forcefield: str = "ff14sb_off_impropers_0.0.4.offxml",  # Change this
        small_molecule_forcefield: str = "openff-2.0.0.offxml",  # Or this
    ) -> None:
        ...  # body elided
```
Or in your config.yaml:
```yaml
force_field:
  protein: "ff19sb.offxml"
  small_molecule: "openff-2.1.0.offxml"
```
Adding Custom Pre/Post-Processing Steps
File: src/polyzymd/simulation/runner.py
Add methods to the `SimulationRunner` class, then call them from `cli/main.py`.
Common Workflows
Submitting Jobs to HPC
PolyzyMD provides several SLURM presets for common HPC partitions. Each replicate runs as a single self-resubmitting job that continues until the total production time is reached.
Available Presets
| Preset | Partition | QOS | Account | Default Time | Notes |
|---|---|---|---|---|---|
| | aa100 | normal | ucb625_asc1 | 24h | NVIDIA A100 GPUs |
| | al40 | normal | ucb625_asc1 | 24h | NVIDIA L40 GPUs |
| | blanca,blanca-shirts | preemptable | blanca-shirts | 24h | Blanca condo (preemptable) |
| | atesting_a100 | testing | ucb625_asc1 | 6min | Quick tests only |
Note: Available partitions depend on your HPC account and permissions. For example, CU Boulder users with Blanca access may not have access to `blanca-shirts` specifically; that requires membership in the Shirts group condo. Check with your HPC administrators or use `sinfo` to see which partitions you can access.
Example Commands
Quick test (single replicate, 10 minutes):
```bash
polyzymd submit -c config.yaml --preset testing --replicates 1 --time-limit 0:10:00
```
Production run on A100 GPUs (5 replicates):
```bash
polyzymd submit -c config.yaml --preset aa100 --replicates 1-5
```
Overnight run on Blanca preemptable (3 replicates):
```bash
polyzymd submit -c config.yaml --preset blanca-shirts --replicates 1-3
```
Dry run to inspect generated scripts without submitting:
```bash
polyzymd submit -c config.yaml --preset aa100 --replicates 1-3 --dry-run
```
Override time limit:
```bash
polyzymd submit -c config.yaml --preset blanca-shirts --replicates 1-3 --time-limit 12:00:00
```
Replicate Specification
Replicates can be specified in several formats:
- Single: `--replicates 1`
- Range: `--replicates 1-5` (runs replicates 1, 2, 3, 4, 5)
- List: `--replicates 1,3,5` (runs replicates 1, 3, 5)
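The three replicate formats can be expanded into replicate numbers with a short parser. `parse_replicates` is hypothetical and may differ from PolyzyMD's own parsing. Accepting mixed forms such as `1-3,5` and removing duplicates are choices of this sketch, not documented behavior.

```python
from typing import List


def parse_replicates(spec: str) -> List[int]:
    """Expand '1', '1-5', or '1,3,5' into a sorted list of replicate numbers."""
    replicates = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            replicates.update(range(start, end + 1))  # ranges are inclusive
        else:
            replicates.add(int(part))
    return sorted(replicates)


print(parse_replicates("1-3,5"))  # → [1, 2, 3, 5]
```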
Monitoring Jobs
After submission:
```bash
# Check job status
squeue -u $USER

# View job details
scontrol show job <job_id>

# Check SLURM output logs
cat slurm_logs/s0_r1_*.out
```
Testing Your Changes
After modifying the code:
1. Verify syntax:
   ```bash
   python -m py_compile src/polyzymd/path/to/modified_file.py
   ```
2. Reinstall package:
   ```bash
   pip install -e .
   ```
3. Test with dry-run:
   ```bash
   polyzymd submit -c config.yaml --preset testing --dry-run
   ```
4. Check generated scripts:
   ```bash
   cat job_scripts/initial_seg0_rep1.sh
   ```