Workflow Module
Daisy Chain Submitter
Daisy-chain job submission for the SLURM scheduler on HPC clusters.
This module provides utilities for breaking long MD simulations into smaller dependent jobs that are automatically chained together using SLURM job dependencies.
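The chaining idea can be sketched with plain `sbatch` commands. The following is a minimal illustration, not the module's actual implementation: it assumes each continuation job is gated with `--dependency=afterok:<job_id>`, and the helper names `build_sbatch_command` and `plan_chain` as well as the job IDs are hypothetical. No jobs are submitted; the code only builds the command lists.

```python
from typing import List, Optional


def build_sbatch_command(script_path: str, depends_on: Optional[str] = None) -> List[str]:
    """Build an sbatch command, optionally gated on a previous job's success."""
    # --parsable makes sbatch print the job ID in a machine-readable form
    cmd = ["sbatch", "--parsable"]
    if depends_on is not None:
        # afterok: start only after the named job finished with exit code 0
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(script_path)
    return cmd


def plan_chain(script_paths: List[str], fake_job_ids: List[str]) -> List[List[str]]:
    """Return the commands for one chain, using placeholder job IDs.

    In a real run each job ID would be read from the output of the
    previous sbatch call instead of being supplied up front.
    """
    commands = []
    previous = None
    for script, job_id in zip(script_paths, fake_job_ids):
        commands.append(build_sbatch_command(script, depends_on=previous))
        previous = job_id
    return commands


chain = plan_chain(["seg0.sh", "seg1.sh", "seg2.sh"], ["1001", "1002", "1003"])
for cmd in chain:
    print(" ".join(cmd))
```

Only the first command has no dependency; every later segment waits for the one before it, so a failed segment stops the rest of its chain.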
- class polyzymd.workflow.daisy_chain.SegmentInfo(index, duration_ns, samples, is_initial, cumulative_time_ns)[source]
Bases: object
Information about a single simulation segment.
- __init__(index, duration_ns, samples, is_initial, cumulative_time_ns)
- class polyzymd.workflow.daisy_chain.DaisyChainConfig(slurm_config, total_production_time_ns, total_segments=10, total_samples=2500, equilibration_time_ns=0.5, replicates=<factory>, dry_run=False, output_script_dir=PosixPath('daisy_chain_scripts'), config_path='config.yaml')[source]
Bases: object
Configuration for daisy-chain submission.
- slurm_config
SLURM job configuration
- output_script_dir
Directory for generated job scripts
- Type:
- slurm_config: SlurmConfig
- get_segments()[source]
Generate segment information for all segments.
- Returns:
List of SegmentInfo objects for each segment.
- Return type:
- classmethod from_simulation_config(sim_config, slurm_config, replicates='1', dry_run=False, output_script_dir='daisy_chain_scripts', config_path='config.yaml')[source]
Create DaisyChainConfig from a SimulationConfig.
- Parameters:
sim_config (SimulationConfig) – Simulation configuration
slurm_config (SlurmConfig) – SLURM configuration
replicates (str | List[int]) – Replicate range string (e.g., “1-5”) or list of ints
dry_run (bool) – If True, don’t submit jobs
config_path (str) – Path to the YAML configuration file
- Returns:
Configured DaisyChainConfig
- Return type:
- __init__(slurm_config, total_production_time_ns, total_segments=10, total_samples=2500, equilibration_time_ns=0.5, replicates=<factory>, dry_run=False, output_script_dir=PosixPath('daisy_chain_scripts'), config_path='config.yaml')
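To make the role of `get_segments()` concrete, here is a rough sketch of how a production run might be divided. It assumes an even split of time and samples across segments and that `cumulative_time_ns` is the elapsed production time at the end of each segment; neither rule is confirmed by this page. `SegmentSketch` and `split_into_segments` are hypothetical stand-ins, not part of polyzymd.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentSketch:
    """Hypothetical stand-in mirroring the documented SegmentInfo fields."""
    index: int
    duration_ns: float
    samples: int
    is_initial: bool
    cumulative_time_ns: float


def split_into_segments(
    total_time_ns: float, total_segments: int, total_samples: int
) -> List[SegmentSketch]:
    """Split a production run evenly (an assumption, not the library's confirmed rule)."""
    duration = total_time_ns / total_segments
    samples = total_samples // total_segments
    segments = []
    for i in range(total_segments):
        segments.append(
            SegmentSketch(
                index=i,
                duration_ns=duration,
                samples=samples,
                is_initial=(i == 0),  # only the first segment builds and equilibrates
                cumulative_time_ns=duration * (i + 1),
            )
        )
    return segments


segments = split_into_segments(total_time_ns=100.0, total_segments=10, total_samples=2500)
print(segments[0])
print(segments[-1].cumulative_time_ns)
```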
- class polyzymd.workflow.daisy_chain.SubmissionResult(job_id, script_path, segment_index, replicate, is_dry_run=False)[source]
Bases: object
Result of job submission.
- script_path
Path to the generated script
- Type:
- __init__(job_id, script_path, segment_index, replicate, is_dry_run=False)
- class polyzymd.workflow.daisy_chain.DaisyChainSubmitter(sim_config, dc_config, conda_env='polymerist-env', openff_logs=False, skip_build=False)[source]
Bases: object
Handles daisy-chain job submission for MD simulations.
This class generates SLURM job scripts and submits them with proper dependencies so that continuation jobs run after their prerequisites.
Example
>>> sim_config = SimulationConfig.from_yaml("config.yaml")
>>> slurm_config = SlurmConfig.from_preset("aa100", email="user@example.com")
>>> dc_config = DaisyChainConfig.from_simulation_config(
...     sim_config, slurm_config, replicates="1-3"
... )
>>> submitter = DaisyChainSubmitter(sim_config, dc_config)
>>> results = submitter.submit_all()
- __init__(sim_config, dc_config, conda_env='polymerist-env', openff_logs=False, skip_build=False)[source]
Initialize the DaisyChainSubmitter.
- Parameters:
sim_config (SimulationConfig) – Simulation configuration
dc_config (DaisyChainConfig) – Daisy-chain configuration
conda_env (str) – Conda environment name
openff_logs (bool) – Enable verbose OpenFF logs in generated scripts
skip_build (bool) – Skip system building in generated scripts (use pre-built system)
- property sim_config: SimulationConfig
Get the simulation configuration.
- property dc_config: DaisyChainConfig
Get the daisy-chain configuration.
- property job_chains: Dict[int, List[SubmissionResult]]
Get the job chains for all replicates.
- generate_continuation_script(segment_index, replicate)[source]
Generate the content of a continuation job script.
- polyzymd.workflow.daisy_chain.submit_daisy_chain(config_path, slurm_preset='aa100', replicates='1', email='', dry_run=False, conda_env='polymerist-env', output_dir=None, scratch_dir=None, projects_dir=None, time_limit=None, memory=None, openff_logs=False, skip_build=False)[source]
Convenience function to submit daisy-chain jobs from a YAML config.
- Parameters:
slurm_preset (str) – SLURM preset name (aa100, al40, blanca-shirts, testing)
replicates (str) – Replicate range string (e.g., “1-5”, “1,3,5”)
email (str) – Email for job notifications
dry_run (bool) – If True, don’t submit jobs
conda_env (str) – Conda environment name
output_dir (str | Path | None) – Directory for job scripts (default: from config or “job_scripts”)
scratch_dir (str | Path | None) – Override scratch directory for simulation output
projects_dir (str | Path | None) – Override projects directory for scripts/logs
time_limit (str | None) – Override SLURM time limit (format: HH:MM:SS or M:SS)
memory (str | None) – Override SLURM memory allocation (e.g., “4G”, “8G”)
openff_logs (bool) – Enable verbose OpenFF logs in generated scripts
skip_build (bool) – Skip system building in generated scripts (use pre-built system)
- Returns:
Dictionary mapping replicate numbers to submission results
- Return type:
Example
>>> results = submit_daisy_chain(
...     config_path="simulation.yaml",
...     slurm_preset="aa100",
...     replicates="1-5",
...     email="user@example.com",
...     dry_run=True,
... )
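The returned mapping can be inspected per replicate. The sketch below shows one way to summarize it, assuming each value is a list of result objects carrying the documented `SubmissionResult` fields. `ResultSketch`, `summarize`, the string job IDs, and the sample data are all hypothetical and used only so the example runs without a cluster.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResultSketch:
    """Hypothetical stand-in with the documented SubmissionResult fields."""
    job_id: str  # assumed to be a string; the actual type is not shown here
    script_path: str
    segment_index: int
    replicate: int
    is_dry_run: bool = False


def summarize(results: Dict[int, List[ResultSketch]]) -> List[str]:
    """Return one line per replicate listing its job IDs in segment order."""
    lines = []
    for replicate in sorted(results):
        chain = sorted(results[replicate], key=lambda r: r.segment_index)
        ids = " -> ".join(r.job_id for r in chain)
        mode = "dry run" if all(r.is_dry_run for r in chain) else "submitted"
        lines.append(f"replicate {replicate} ({mode}): {ids}")
    return lines


sample = {
    2: [
        ResultSketch("DRY_1", "r2_s1.sh", 1, 2, True),
        ResultSketch("DRY_0", "r2_s0.sh", 0, 2, True),
    ],
    1: [ResultSketch("DRY_0", "r1_s0.sh", 0, 1, True)],
}
summary = summarize(sample)
for line in summary:
    print(line)
```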
SLURM Configuration
SLURM job script generation for HPC cluster submission.
This module provides templates and utilities for generating SLURM batch scripts for MD simulations.
- class polyzymd.workflow.slurm.SlurmConfig(partition='aa100', qos='normal', account='ucb625_asc1', time_limit='23:59:59', email='', nodes=1, ntasks=1, memory='3G', gpus=1, exclude=None)[source]
Bases: object
Configuration for SLURM job submission.
- classmethod from_preset(preset, email='')[source]
Create a SlurmConfig from a preset.
- Parameters:
- Returns:
SlurmConfig with preset values.
- Return type:
- __init__(partition='aa100', qos='normal', account='ucb625_asc1', time_limit='23:59:59', email='', nodes=1, ntasks=1, memory='3G', gpus=1, exclude=None)
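As an illustration of how these fields map onto a batch script, the sketch below renders the `#SBATCH` header from a config object. The directive names follow the job templates documented on this page; `SlurmConfigSketch` and `render_header` are hypothetical, and the `--exclude` line is an assumed rendering of the template's `{exclude_line}` placeholder, whose real format is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlurmConfigSketch:
    """Hypothetical stand-in using the defaults from the documented signature."""
    partition: str = "aa100"
    qos: str = "normal"
    account: str = "ucb625_asc1"
    time_limit: str = "23:59:59"
    email: str = ""
    nodes: int = 1
    ntasks: int = 1
    memory: str = "3G"
    gpus: int = 1
    exclude: Optional[str] = None


def render_header(cfg: SlurmConfigSketch, job_name: str, output_file: str) -> str:
    """Render the #SBATCH header lines for one job script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={cfg.partition}",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --output={output_file}",
        f"#SBATCH --qos={cfg.qos}",
        f"#SBATCH --nodes={cfg.nodes}",
        f"#SBATCH --ntasks={cfg.ntasks}",
        f"#SBATCH --mem={cfg.memory}",
        f"#SBATCH --time={cfg.time_limit}",
        f"#SBATCH --gres=gpu:{cfg.gpus}",
        "#SBATCH --mail-type=FAIL",
        f"#SBATCH --mail-user={cfg.email}",
        f"#SBATCH --account={cfg.account}",
    ]
    if cfg.exclude:
        # Assumed rendering of {exclude_line}; the source does not show its format.
        lines.append(f"#SBATCH --exclude={cfg.exclude}")
    return "\n".join(lines)


header = render_header(
    SlurmConfigSketch(memory="8G", exclude="node01"), "i_my_sim", "logs/out.log"
)
print(header)
```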
- class polyzymd.workflow.slurm.JobContext(job_name, output_file, scratch_dir, projects_dir='.', segment_index=0, replicate_num=1, extra_vars=<factory>)[source]
Bases: object
Context for job script template rendering.
- extra_vars
Additional template variables.
- Type:
Dict
- __init__(job_name, output_file, scratch_dir, projects_dir='.', segment_index=0, replicate_num=1, extra_vars=<factory>)
- class polyzymd.workflow.slurm.SlurmScriptGenerator(config, conda_env='polymerist-env', openff_logs=False, skip_build=False)[source]
Bases: object
Generator for SLURM batch scripts.
Supports separate directories for:
- projects_dir: Where scripts live and jobs are submitted from
- scratch_dir: Where simulation output goes (trajectories, checkpoints)
Example
>>> config = SlurmConfig.from_preset("aa100", email="user@example.com")
>>> generator = SlurmScriptGenerator(config)
>>> script = generator.generate_initial_job(
...     context=JobContext(
...         job_name="my_sim",
...         output_file="logs/output.log",
...         scratch_dir="/scratch/user/sim_output",
...         projects_dir="/projects/user/polyzymd",
...     ),
...     config_path="config.yaml",
...     replicate=1,
...     segment_time=10.0,
...     segment_frames=250,
... )
- INITIAL_JOB_TEMPLATE = '#!/bin/bash\n#SBATCH --partition={partition}\n#SBATCH --job-name=i_{job_name}\n#SBATCH --output={output_file}\n#SBATCH --qos={qos}\n#SBATCH --nodes={nodes}\n#SBATCH --ntasks={ntasks}\n#SBATCH --mem={memory}\n#SBATCH --time={time_limit}\n#SBATCH --gres=gpu:{gpus}\n#SBATCH --mail-type=FAIL\n#SBATCH --mail-user={email}\n#SBATCH --account={account}\n{exclude_line}\n\n# =============================================================================\n# PolyzyMD Initial Simulation Job\n# Segment: {segment_index}\n# =============================================================================\n\n# Load conda environment (ignore module warnings on some HPC systems)\nmodule purge 2>/dev/null || true\nmodule load miniforge 2>/dev/null || true\n\n# Initialize conda/mamba for non-interactive shell\neval "$(conda shell.bash hook)"\nmamba activate {conda_env}\n\n# Enable strict error handling after environment setup\nset -e\n\n# Projects directory (scripts, configs, logs)\nPROJECTS_DIR="{projects_dir}"\n\n# Scratch directory (simulation output)\nSCRATCH_DIR="{scratch_dir}"\n\n# Ensure scratch directory exists\nmkdir -p "$SCRATCH_DIR"\n\n# Change to projects directory where config and scripts live\ncd "$PROJECTS_DIR"\n\necho "Starting initial simulation segment {segment_index}"\necho "Projects dir: $PROJECTS_DIR"\necho "Scratch dir: $SCRATCH_DIR"\necho "Config: {config_path}"\necho "Replicate: {replicate}"\necho "Timestamp: $(date)"\n\n# Run the initial simulation using polyzymd CLI\n# This builds the system, runs equilibration, and runs the first production segment\npolyzymd{openff_logs_flag} run -c "{config_path}" \\\n --replicate {replicate} \\\n --scratch-dir "$SCRATCH_DIR" \\\n --segment-time {segment_time} \\\n --segment-frames {segment_frames}{skip_build_flag}\n\necho "Segment {segment_index} completed successfully at $(date)"\n'
- CONTINUATION_JOB_TEMPLATE = '#!/bin/bash\n#SBATCH --partition={partition}\n#SBATCH --job-name=c_{job_name}\n#SBATCH --output={output_file}\n#SBATCH --qos={qos}\n#SBATCH --nodes={nodes}\n#SBATCH --ntasks={ntasks}\n#SBATCH --mem={memory}\n#SBATCH --time={time_limit}\n#SBATCH --gres=gpu:{gpus}\n#SBATCH --mail-type=FAIL\n#SBATCH --mail-user={email}\n#SBATCH --account={account}\n{exclude_line}\n\n# =============================================================================\n# PolyzyMD Continuation Job\n# Segment: {segment_index}\n# =============================================================================\n\n# Load conda environment (ignore module warnings on some HPC systems)\nmodule purge 2>/dev/null || true\nmodule load miniforge 2>/dev/null || true\n\n# Initialize conda/mamba for non-interactive shell\neval "$(conda shell.bash hook)"\nmamba activate {conda_env}\n\n# Enable strict error handling after environment setup\nset -e\n\n# Projects directory (scripts, configs, logs)\nPROJECTS_DIR="{projects_dir}"\n\n# Scratch directory (simulation output - where previous segment data lives)\nSCRATCH_DIR="{scratch_dir}"\n\n# Change to projects directory\ncd "$PROJECTS_DIR"\n\necho "Starting continuation segment {segment_index}"\necho "Projects dir: $PROJECTS_DIR"\necho "Scratch dir: $SCRATCH_DIR"\necho "Timestamp: $(date)"\n\n# Continue simulation from previous segment using polyzymd CLI\n# Reads checkpoint from previous segment in SCRATCH_DIR\n# Writes new trajectory and checkpoint to SCRATCH_DIR\npolyzymd{openff_logs_flag} continue \\\n -w "$SCRATCH_DIR" \\\n -s {segment_index} \\\n -t {segment_time} \\\n -n {num_samples}\n\necho "Segment {segment_index} completed successfully at $(date)"\n'
- __init__(config, conda_env='polymerist-env', openff_logs=False, skip_build=False)[source]
Initialize the generator.
- Parameters:
config (SlurmConfig) – SLURM configuration.
conda_env (str) – Conda environment name.
openff_logs (bool) – Enable verbose OpenFF logs in generated scripts.
skip_build (bool) – Skip system building in generated scripts (use pre-built system).
- property config: SlurmConfig
Get the SLURM configuration.
- generate_initial_job(context, config_path, replicate, segment_time, segment_frames)[source]
Generate an initial simulation job script.
- Parameters:
context (JobContext) – Job context information.
config_path (str) – Path to the YAML configuration file.
replicate (int) – Replicate number.
segment_time (float) – Duration of first segment in nanoseconds.
segment_frames (int) – Number of frames to save in first segment.
- Returns:
SLURM batch script content.
- Return type:
- generate_continuation_job(context, segment_time, num_samples)[source]
Generate a continuation job script.
- Parameters:
context (JobContext) – Job context information.
segment_time (float) – Duration of this segment in nanoseconds.
num_samples (int) – Number of frames to save.
- Returns:
SLURM batch script content.
- Return type:
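The templates are ordinary Python format strings, so rendering presumably comes down to `str.format` with the right placeholder values. The sketch below applies that to an excerpt of the command portion of `CONTINUATION_JOB_TEMPLATE`. The excerpt's indentation and the sample values are illustrative, and `openff_logs_flag` is left empty because the actual flag text is not shown on this page.

```python
# Excerpt of the command portion of CONTINUATION_JOB_TEMPLATE.
# "$SCRATCH_DIR" is left for the shell to expand; it contains no braces,
# so str.format passes it through unchanged.
COMMAND_EXCERPT = (
    "polyzymd{openff_logs_flag} continue \\\n"
    '    -w "$SCRATCH_DIR" \\\n'
    "    -s {segment_index} \\\n"
    "    -t {segment_time} \\\n"
    "    -n {num_samples}\n"
)

rendered = COMMAND_EXCERPT.format(
    openff_logs_flag="",  # left empty; the real flag text is not shown in the source
    segment_index=3,
    segment_time=10.0,
    num_samples=250,
)
print(rendered)
```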
- polyzymd.workflow.slurm.parse_replicate_range(replicate_range)[source]
Parse a SLURM array range into a list of replicate numbers.
- Parameters:
replicate_range (str) – SLURM array format (e.g., “1-5”, “1,3,5”, “1-10:2”).
- Returns:
List of replicate numbers.
- Return type:
Example
>>> parse_replicate_range("1-5")
[1, 2, 3, 4, 5]
>>> parse_replicate_range("1,3,5")
[1, 3, 5]
>>> parse_replicate_range("1-10:2")
[1, 3, 5, 7, 9]
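One way such a parser could work is sketched below. This is not the library's implementation: `parse_range_sketch` is a hypothetical name, and sorting and de-duplicating the result is a design choice made here that the source does not specify.

```python
from typing import List


def parse_range_sketch(replicate_range: str) -> List[int]:
    """Expand a SLURM-array-style string into a sorted list of unique integers."""
    values = set()
    for part in replicate_range.split(","):
        part = part.strip()
        step = 1
        if ":" in part:
            # "start-end:step" form
            part, step_text = part.split(":", 1)
            step = int(step_text)
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # single value such as "3"
            start = end = int(part)
        if step < 1 or start > end:
            raise ValueError(f"Invalid replicate range: {replicate_range!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


print(parse_range_sketch("1-5"))
print(parse_range_sketch("1,3,5"))
print(parse_range_sketch("1-10:2"))
```

Malformed pieces such as an empty element make `int()` raise `ValueError`, so invalid input fails loudly rather than returning a partial list.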
- polyzymd.workflow.slurm.validate_replicate_range(replicate_range)[source]
Validate that a replicate range is in proper SLURM array format.
- Parameters:
replicate_range (str) – Range string to validate.
- Returns:
True if valid.
- Raises:
ValueError – If the format is invalid.
- Return type:
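A format check of this kind can be sketched with a regular expression. The pattern below accepts comma-separated elements of the form `N`, `N-M`, or `N-M:S`, matching the examples given for `parse_replicate_range`; whether the library accepts exactly this grammar is an assumption, and `validate_range_sketch` is a hypothetical name.

```python
import re

# One element: N, N-M, or N-M:S; elements are comma-separated.
_ELEMENT = r"\d+(?:-\d+(?::\d+)?)?"
_RANGE_RE = re.compile(rf"{_ELEMENT}(?:,{_ELEMENT})*")


def validate_range_sketch(replicate_range: str) -> bool:
    """Return True for a well-formed range string, otherwise raise ValueError."""
    if not _RANGE_RE.fullmatch(replicate_range.strip()):
        raise ValueError(f"Invalid replicate range: {replicate_range!r}")
    return True


print(validate_range_sketch("1-10:2"))
```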