Architecture Guide

This guide provides a detailed map of PolyzyMD’s codebase, helping developers and power users understand where to find and modify specific functionality.

Package Structure

src/polyzymd/
├── cli/                    # Command-line interface
│   └── main.py             # Click commands (init, build, run, continue, submit, validate, info)
├── config/                 # Configuration and validation
│   ├── schema.py           # Pydantic models for YAML config
│   └── loader.py           # YAML loading utilities
├── builders/               # System construction (Stage 1-4)
│   ├── system_builder.py   # Main orchestrator - coordinates all builders
│   ├── enzyme.py           # Protein/enzyme loading and preparation
│   ├── substrate.py        # Small molecule (ligand) handling
│   ├── polymer.py          # Random copolymer generation
│   └── solvent.py          # Solvation, ions, box setup
├── simulation/             # MD execution (Stage 5-6)
│   ├── runner.py           # Initial simulation (equilibration + production)
│   └── continuation.py     # Checkpoint-based continuation for daisy-chain
├── workflow/               # HPC job management
│   ├── slurm.py            # SLURM script templates and generation
│   └── daisy_chain.py      # Job submission with dependencies
├── core/                   # Shared utilities
│   ├── parameters.py       # Simulation parameter dataclasses
│   └── restraints.py       # Distance restraint definitions
└── __init__.py             # Package version and exports

Data Flow Overview

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   config.yaml   │────▶│ SimulationConfig│────▶│  SystemBuilder  │
└─────────────────┘     │  (schema.py)    │     │  (system_builder│
                        └─────────────────┘     │   .py)          │
                                                └────────┬────────┘
                                                         │
                        ┌────────────────────────────────┼────────────────────────────────┐
                        │                                │                                │
                        ▼                                ▼                                ▼
               ┌─────────────────┐             ┌─────────────────┐             ┌─────────────────┐
               │  EnzymeBuilder  │             │ SubstrateBuilder│             │  PolymerBuilder │
               │  (enzyme.py)    │             │  (substrate.py) │             │  (polymer.py)   │
               └────────┬────────┘             └────────┬────────┘             └────────┬────────┘
                        │                                │                                │
                        └────────────────────────────────┼────────────────────────────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │  SolventBuilder │
                                                │  (solvent.py)   │
                                                └────────┬────────┘
                                                         │
                                                         ▼
                                                ┌─────────────────┐
                                                │   Interchange   │
                                                │   (OpenFF)      │
                                                └────────┬────────┘
                                                         │
                        ┌────────────────────────────────┴────────────────────────────────┐
                        │                                                                 │
                        ▼                                                                 ▼
               ┌─────────────────┐                                               ┌─────────────────┐
               │SimulationRunner │                                               │ContinuationMgr  │
               │  (runner.py)    │                                               │(continuation.py)│
               └─────────────────┘                                               └─────────────────┘

Component Details

CLI (cli/main.py)

The command-line interface is built with Click. Each command maps to a specific workflow:

Command            Function                          Primary Module Used
-----------------  --------------------------------  --------------------------------
polyzymd init      Initialize new project directory  Template files
polyzymd build     Build system without running      SystemBuilder
polyzymd run       Build and run initial simulation  SystemBuilder + SimulationRunner
polyzymd continue  Continue from checkpoint          ContinuationManager
polyzymd submit    Submit daisy-chain to SLURM       DaisyChainSubmitter
polyzymd validate  Validate config file              SimulationConfig
polyzymd info      Show installation info            Version + dependencies

Where to modify:

  • Add new CLI commands: cli/main.py

  • Change CLI option defaults: Look for @click.option decorators

Configuration (config/schema.py)

All configuration is defined as Pydantic models, providing automatic validation and type checking.

Key classes:

  • SimulationConfig - Top-level container, has from_yaml() class method

  • EnzymeConfig - Enzyme PDB path and settings

  • SubstrateConfig - Ligand SDF path, charge method

  • PolymerConfig - Polymer generation settings (monomers, length, count)

  • SolventConfig - Water model, ion concentration, box shape

  • SimulationPhaseConfig - Duration, timestep, ensemble settings

  • OutputConfig - Directory paths for scratch/projects

Where to modify:

  • Add new config options: Add fields to the appropriate Pydantic model

  • Change validation rules: Add or edit @validator methods on the relevant model (@field_validator if the project uses Pydantic v2)

  • Change defaults: Modify field default values

Builders (builders/)

The builders follow a pipeline pattern, each responsible for one component:

system_builder.py - Orchestrator

Key methods:

  • from_config(config) - Create builder from SimulationConfig

  • build_from_config(config, working_dir, polymer_seed) - Full pipeline

  • build_enzyme(pdb_path) - Stage 1

  • build_substrate(sdf_path, ...) - Stage 2

  • build_polymers(characters, probabilities, ...) - Stage 3

  • pack_polymers(padding, working_dir) - Uses PACKMOL

  • solvate(composition, padding, box_shape) - Stage 4

  • create_interchange() - Stage 5

  • get_openmm_components() - Extract topology, system, positions

enzyme.py - Protein Loading

Uses OpenFF Toolkit to load PDB files with proper residue templates.

Key method: build(pdb_path) -> Topology

substrate.py - Ligand Handling

Loads docked conformers from SDF, assigns partial charges (NAGL by default).

Key method: build(sdf_path, conformer_index, charge_method) -> Molecule

polymer.py - Random Copolymer Generation

Generates random sequences based on monomer probabilities, loads pre-built SDFs.

Key method: build(count, seed) -> Tuple[List[Molecule], List[int]]

Where to modify:

  • Change polymer loading: _load_polymer_sdf() method

  • Change sequence generation: _generate_sequences() method
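
Sequence generation from monomer probabilities can be sketched as follows. This is a minimal illustration, not the body of _generate_sequences(): the function name generate_sequences, the length and count parameters, and the use of random.Random are assumptions chosen to mirror the characters, probabilities, and seed arguments named above.

```python
import random
from typing import List, Sequence


def generate_sequences(
    characters: Sequence[str],
    probabilities: Sequence[float],
    length: int,
    count: int,
    seed: int,
) -> List[str]:
    """Draw `count` random copolymer sequences of `length` monomers each.

    Hypothetical helper: every position is sampled independently, with
    monomer characters[i] chosen with weight probabilities[i].
    """
    if len(characters) != len(probabilities):
        raise ValueError("characters and probabilities must have equal length")
    rng = random.Random(seed)  # local RNG so a given seed is reproducible
    return [
        "".join(rng.choices(characters, weights=probabilities, k=length))
        for _ in range(count)
    ]


# Three 10-mers built from monomers "A" (70%) and "B" (30%)
sequences = generate_sequences(["A", "B"], [0.7, 0.3], length=10, count=3, seed=42)
```

Using a dedicated random.Random instance rather than the module-level functions keeps the result tied to the seed alone, which matters when each replicate is expected to rebuild the same polymers.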

solvent.py - Solvation

Adds water (TIP3P), neutralizing ions, and optional co-solvents.

Key method: solvate(topology, composition, padding, box_shape) -> Topology

Simulation (simulation/)

runner.py - Initial Runs

Handles energy minimization, equilibration (NVT), and production (NPT).

Key methods:

  • minimize(max_iterations, tolerance) - Energy minimization

  • run_equilibration(temperature, duration_ns, ...) - NVT equilibration

  • run_production(temperature, duration_ns, pressure, segment_index, ...) - NPT production

  • run_from_config(config, segment_index) - Run using config settings

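Reporters configured for equilibration and production (around lines 273-288 and 398-413; search for StateDataReporter if the line numbers have shifted):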

StateDataReporter(
    state_path,
    report_interval,
    step=True,
    time=True,
    potentialEnergy=True,
    kineticEnergy=True,
    totalEnergy=True,
    temperature=True,
    volume=True,
    density=True,
    speed=True,  # Reports performance in ns/day
)

Where to modify:

  • Change reported quantities: Modify StateDataReporter arguments

  • Change integrator settings: _create_integrator() method

  • Change barostat settings: _add_barostat() method

continuation.py - Checkpoint Continuation

Loads state from previous segment checkpoint and continues simulation.

Key methods:

  • load_previous_state() - Load checkpoint and system

  • run_segment(duration_ns, num_samples) - Run next segment
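
To make the run_segment(duration_ns, num_samples) arguments concrete, the sketch below converts a segment duration and a sample count into a step count and a reporting interval. The function name steps_for_segment, the timestep_fs parameter, and the even-spacing rule are assumptions for illustration; continuation.py may compute these differently.

```python
from typing import Tuple


def steps_for_segment(
    duration_ns: float, timestep_fs: float, num_samples: int
) -> Tuple[int, int]:
    """Return (total_steps, report_interval) for one segment.

    Assumes frames are written at evenly spaced intervals, which is an
    illustrative convention rather than a documented PolyzyMD behavior.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if timestep_fs <= 0:
        raise ValueError("timestep_fs must be positive")
    # 1 ns = 1,000,000 fs
    total_steps = round(duration_ns * 1_000_000 / timestep_fs)
    # Never report more often than every step
    report_interval = max(1, total_steps // num_samples)
    return total_steps, report_interval


# A 10 ns segment at a 2 fs timestep, saving 250 frames
total_steps, report_interval = steps_for_segment(
    duration_ns=10.0, timestep_fs=2.0, num_samples=250
)
```

With these inputs the segment spans 5,000,000 steps and a frame is written every 20,000 steps.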

Workflow (workflow/)

slurm.py - SLURM Script Generation

Contains job script templates and preset configurations.

Key components:

  • SlurmConfig - Dataclass with partition, QOS, time limit, etc.

  • SlurmConfig.from_preset(preset) - Load preset (aa100, al40, testing, etc.)

  • INITIAL_JOB_TEMPLATE - Template for first segment (builds system)

  • CONTINUATION_JOB_TEMPLATE - Template for subsequent segments

  • SlurmJobGenerator - Fills templates with runtime values

Where to modify:

  • Add new SLURM presets: Add to presets dict in from_preset() method

  • Change job script behavior: Modify INITIAL_JOB_TEMPLATE or CONTINUATION_JOB_TEMPLATE

  • Change module loading: Edit the module load lines in templates

Current template structure (simplified):

#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --qos={qos}
# ... other SBATCH directives

# Load conda environment (ignore module warnings)
module purge 2>/dev/null || true
module load miniforge 2>/dev/null || true
eval "$(conda shell.bash hook)"
mamba activate {conda_env}

# Enable strict error handling after environment setup
set -e

# Run simulation
polyzymd run -c "{config_path}" --replicate {replicate} ...
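
The {partition}-style placeholders suggest the templates are filled with Python string formatting. The sketch below shows that idea with str.format; the TEMPLATE constant and render_job_script function are hypothetical stand-ins, not the actual INITIAL_JOB_TEMPLATE or SlurmJobGenerator.

```python
# Shortened stand-in for a job script template (hypothetical)
TEMPLATE = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --qos={qos}

polyzymd run -c "{config_path}" --replicate {replicate}
"""


def render_job_script(template: str, **values: object) -> str:
    """Fill {name} placeholders; raises KeyError if a value is missing."""
    return template.format(**values)


script = render_job_script(
    TEMPLATE,
    partition="aa100",
    qos="normal",
    config_path="config.yaml",
    replicate=1,
)
```

If the templates are indeed rendered with str.format, any literal braces added to them (for example a shell variable written as ${SLURM_JOB_ID}) must be doubled, as in ${{SLURM_JOB_ID}}, or formatting will fail.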

daisy_chain.py - Job Submission

Manages job dependencies for long simulations split across multiple SLURM jobs.

Key components:

  • DaisyChainConfig - Settings for the chain (segments, time per segment)

  • DaisyChainSubmitter - Generates scripts and submits with dependencies

  • submit_daisy_chain() - Main entry point function

Where to modify:

  • Change dependency type: Modify --dependency=afterok: in submission

  • Change job naming: Modify JobContext creation
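
The dependency mechanism can be sketched as a loop that passes each job ID to the next submission. The helper names build_sbatch_command and plan_chain, the continuation script names, and the job IDs are all hypothetical; the sketch only builds the command lines and does not call sbatch.

```python
from typing import List, Optional, Sequence


def build_sbatch_command(script_path: str, depends_on: Optional[str] = None) -> List[str]:
    """Return the sbatch argument list for one segment.

    --parsable makes sbatch print just the job ID, which simplifies
    capturing it for the next segment's dependency.
    """
    cmd = ["sbatch", "--parsable"]
    if depends_on is not None:
        # afterok: start only if the previous job finished successfully
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(script_path)
    return cmd


def plan_chain(scripts: Sequence[str], job_ids: Sequence[str]) -> List[List[str]]:
    """Build the commands for a chain, given the job ID each submission returns."""
    commands = []
    previous = None
    for script, job_id in zip(scripts, job_ids):
        commands.append(build_sbatch_command(script, previous))
        previous = job_id  # in a real run this is parsed from sbatch's stdout
    return commands


scripts = ["initial_seg0_rep1.sh", "continue_seg1_rep1.sh", "continue_seg2_rep1.sh"]
commands = plan_chain(scripts, ["1001", "1002", "1003"])
```

Swapping afterok for afterany would let the next segment start even when the previous one failed, which is rarely what a checkpoint-based chain wants.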

Core Utilities (core/)

parameters.py - Simulation Parameters

Dataclasses for simulation settings, useful for programmatic use.

  • IntegratorParameters - Timestep, friction, thermostat

  • EquilibrationParameters - NVT settings

  • ProductionParameters - NPT settings

  • ReporterParameters - Output frequency settings

  • SimulationParameters - Combined container

restraints.py - Distance Restraints

Defines atom selections and harmonic distance restraints.

  • AtomSelection - MDAnalysis-style selection with chain/resid/atom name

  • RestraintDefinition - Two selections + distance + force constant

  • RestraintFactory - Creates OpenMM forces from definitions
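
As a rough picture of what a harmonic distance restraint encodes, the sketch below evaluates the penalty for a pair of positions. The class name HarmonicDistanceRestraint, its field names, the units, and the 1/2 prefactor convention are assumptions for illustration; they are not taken from restraints.py, which builds OpenMM forces rather than evaluating energies in Python.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class HarmonicDistanceRestraint:
    """Illustrative stand-in for a restraint definition (hypothetical names)."""

    target_distance: float  # assumed units: nm
    force_constant: float   # assumed units: kJ/mol/nm^2

    def energy(self, pos_a: Vec3, pos_b: Vec3) -> float:
        """Return 0.5 * k * (r - r0)^2 for the two positions."""
        r = math.dist(pos_a, pos_b)
        return 0.5 * self.force_constant * (r - self.target_distance) ** 2


restraint = HarmonicDistanceRestraint(target_distance=0.3, force_constant=1000.0)
# Atoms 0.5 nm apart are 0.2 nm beyond the target distance
energy = restraint.energy((0.0, 0.0, 0.0), (0.5, 0.0, 0.0))
```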

Common Modifications

Adding a New Reporter Output

File: src/polyzymd/simulation/runner.py

Location: Around lines 275-288 (equilibration) and 399-413 (production); search for StateDataReporter if the line numbers have shifted

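Add parameters to StateDataReporter. Note that OpenMM requires totalSteps whenever remainingTime (or progress) is enabled:

StateDataReporter(
    ...,
    remainingTime=True,      # Add this for estimated time remaining
    totalSteps=total_steps,  # Required by remainingTime; total_steps is a placeholder for this run's step count
    elapsedTime=True,        # Add this for elapsed wall time
)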

See OpenMM StateDataReporter docs for all options.

Adding a New SLURM Preset

File: src/polyzymd/workflow/slurm.py

Location: SlurmConfig.from_preset() method (~line 50)

presets = {
    "aa100": {...},  # existing preset, contents elided
    "my_new_preset": {
        "partition": "my_partition",
        "qos": "my_qos",
        "time_limit": "12:00:00",
        "gpus": 2,
        # ... any other SlurmConfig fields
    },
}
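
For orientation, a preset lookup of this shape could work roughly as sketched below. This SlurmConfig is a trimmed-down stand-in with only four fields; the field set, the gpus default, the "24:00:00" time format, and the error handling are assumptions rather than the actual contents of slurm.py.

```python
from dataclasses import dataclass


@dataclass
class SlurmConfig:
    """Trimmed-down stand-in; the real class has more fields."""

    partition: str
    qos: str
    time_limit: str
    gpus: int = 1  # assumed default

    @classmethod
    def from_preset(cls, preset: str) -> "SlurmConfig":
        """Build a config from a named preset, failing clearly on typos."""
        presets = {
            "aa100": {"partition": "aa100", "qos": "normal", "time_limit": "24:00:00"},
            "my_new_preset": {
                "partition": "my_partition",
                "qos": "my_qos",
                "time_limit": "12:00:00",
                "gpus": 2,
            },
        }
        if preset not in presets:
            raise ValueError(f"Unknown preset {preset!r}; choose from {sorted(presets)}")
        return cls(**presets[preset])


config = SlurmConfig.from_preset("my_new_preset")
```

Because each preset is a plain dict of keyword arguments, any key added to a preset must match a SlurmConfig field name, otherwise construction raises a TypeError.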

Changing the Force Field

File: src/polyzymd/builders/system_builder.py

Location: __init__ method (lines 54-58)

def __init__(
    self,
    protein_forcefield: str = "ff14sb_off_impropers_0.0.4.offxml",  # Change this
    small_molecule_forcefield: str = "openff-2.0.0.offxml",         # Or this
) -> None:
    ...  # rest of the constructor is unchanged

Or in your config.yaml:

force_field:
  protein: "ff19sb.offxml"
  small_molecule: "openff-2.1.0.offxml"

Adding Custom Pre/Post-Processing Steps

File: src/polyzymd/simulation/runner.py

Add methods to SimulationRunner class, then call them from cli/main.py.

Common Workflows

Submitting Jobs to HPC

PolyzyMD provides several SLURM presets for common HPC partitions. Replicates are independent, so their daisy chains run in parallel; within each chain, segments run sequentially because each one waits for the previous segment to finish.

Available Presets

Preset         Partition             QOS          Account        Default Time  Notes
-------------  --------------------  -----------  -------------  ------------  --------------------------
aa100          aa100                 normal       ucb625_asc1    24h           NVIDIA A100 GPUs
al40           al40                  normal       ucb625_asc1    24h           NVIDIA L40 GPUs
blanca-shirts  blanca,blanca-shirts  preemptable  blanca-shirts  24h           Blanca condo (preemptable)
testing        atesting_a100         testing      ucb625_asc1    6min          Quick tests only

Note: Available partitions depend on your HPC account and permissions. For example, CU Boulder users with Blanca access do not necessarily have access to blanca-shirts, which requires membership in the Shirts group condo. Check with your HPC administrators or run sinfo to see which partitions you can access.

Example Commands

Quick test (single replicate, 10 minutes):

polyzymd submit -c config.yaml --preset testing --replicates 1 --time-limit 0:10:00

Production run on A100 GPUs (5 replicates):

polyzymd submit -c config.yaml --preset aa100 --replicates 1-5

Overnight run on Blanca preemptable (3 replicates):

polyzymd submit -c config.yaml --preset blanca-shirts --replicates 1-3

Dry run to inspect generated scripts without submitting:

polyzymd submit -c config.yaml --preset aa100 --replicates 1-3 --dry-run

Override time limit:

polyzymd submit -c config.yaml --preset blanca-shirts --replicates 1-3 --time-limit 12:00:00

Replicate Specification

Replicates can be specified in several formats:

  • Single: --replicates 1

  • Range: --replicates 1-5 (runs replicates 1, 2, 3, 4, 5)

  • List: --replicates 1,3,5 (runs replicates 1, 3, 5)
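
The three formats above can be expanded with a small parser like the one below. The function name parse_replicates is hypothetical and this is not the CLI's actual parsing code; as a side effect of how it is written, the sketch also accepts mixed forms such as 1-3,5, which the formats listed above do not cover.

```python
from typing import List


def parse_replicates(spec: str) -> List[int]:
    """Expand '1', '1-5', or '1,3,5' into a sorted list of replicate numbers."""
    replicates = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range such as "1-5"
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            replicates.update(range(start, end + 1))
        else:
            replicates.add(int(part))  # raises ValueError on non-numeric input
    return sorted(replicates)


single = parse_replicates("1")
span = parse_replicates("1-5")
listed = parse_replicates("1,3,5")
```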

Monitoring Jobs

After submission:

# Check job status
squeue -u $USER

# View job details
scontrol show job <job_id>

# Check SLURM output logs
cat slurm_logs/s0_r1_*.out

Testing Your Changes

After modifying the code:

  1. Verify syntax:

    python -m py_compile src/polyzymd/path/to/modified_file.py
    
  2. Reinstall package:

    pip install -e .
    
  3. Test with dry-run:

    polyzymd submit -c config.yaml --preset testing --replicates 1 --dry-run
    
  4. Check generated scripts:

    cat job_scripts/initial_seg0_rep1.sh