Workflow Module

Job Submitter

Job submission for HPC SLURM scheduler.

This module provides utilities for submitting daisy-chain MD simulation jobs to SLURM. In PolyzyMD, daisy-chain is the canonical term for running serial MD segments on preemptible hardware: each replicate gets a job script that calls polyzymd run-segment, checks progress, and resubmits itself until the simulation is complete.

Changed in version 1.1.0: Standardized daisy-chain execution on self-resubmitting jobs, where each submission advances one serial MD segment before scheduling the next segment as needed.

polyzymd.workflow.daisy_chain.check_existing_slurm_jobs(job_name)[source]

Query SLURM for RUNNING or PENDING jobs that match job_name.

This is a best-effort check: if squeue is unavailable (e.g. in a non-SLURM environment or CI), a warning is logged and an empty list is returned so that submission proceeds unimpeded.

Parameters:

job_name (str) – The SLURM --job-name to search for (exact match).

Returns:

SLURM job IDs that are RUNNING or PENDING with the given name. Empty if squeue is unavailable or returns no matches.

Return type:

list of str
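
The best-effort behaviour described for this check can be sketched as follows. This is a minimal illustration, not PolyzyMD's implementation: the helper name find_active_jobs and its squeue_cmd parameter are hypothetical, and the exact squeue options used by the library are an assumption.

```python
import logging
import subprocess
from typing import List

logger = logging.getLogger(__name__)


def find_active_jobs(job_name: str, squeue_cmd: str = "squeue") -> List[str]:
    """Return IDs of RUNNING/PENDING jobs named job_name (hypothetical helper)."""
    cmd = [
        squeue_cmd,
        "--noheader",                # no column header in the output
        f"--name={job_name}",        # filter by job name
        "--states=RUNNING,PENDING",  # only active jobs
        "--format=%i",               # print the job ID only
    ]
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError) as exc:
        # squeue missing, failing, or timing out: warn and let submission proceed.
        logger.warning("squeue unavailable (%s); skipping duplicate-job check", exc)
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]
```

Returning an empty list on any squeue failure keeps the check advisory, so a workstation or CI runner without SLURM can still generate scripts.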

polyzymd.workflow.daisy_chain.create_job_name(sim_config, replicate)[source]

Create a descriptive SLURM job name for a replicate.

Produces names like r1_310K_Fibronectin_SBMA-OEGMA_A75_B25 matching the directory naming convention.

Parameters:
  • sim_config (SimulationConfig) – Validated simulation configuration.

  • replicate (int) – Replicate number.

Returns:

Formatted job name.

Return type:

str
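
As a rough illustration of the naming convention, the sketch below rebuilds the documented example name from loose fields. The function format_job_name and its arguments are hypothetical; the real function reads these values from a SimulationConfig, whose field names are not shown here.

```python
from typing import Sequence


def format_job_name(
    replicate: int,
    temperature_k: int,
    enzyme: str,
    monomers: Sequence[str],
    percentages: Sequence[int],
) -> str:
    """Build a name such as r1_310K_Fibronectin_SBMA-OEGMA_A75_B25 (assumed layout)."""
    labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    # One letter-prefixed percentage per monomer: A75, B25, ...
    composition = "_".join(
        f"{labels[i]}{pct}" for i, pct in enumerate(percentages)
    )
    polymer = "-".join(monomers)
    return f"r{replicate}_{temperature_k}K_{enzyme}_{polymer}_{composition}"


name = format_job_name(1, 310, "Fibronectin", ["SBMA", "OEGMA"], [75, 25])
```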

class polyzymd.workflow.daisy_chain.DaisyChainConfig(slurm_config, total_production_time_ns, total_samples=2500, equilibration_time_ns=0.5, replicates=<factory>, dry_run=False, generate_only=False, force=False, output_script_dir=PosixPath('daisy_chain_scripts'), config_path='config.yaml')[source]

Bases: object

Configuration for daisy-chain job submission.

Daisy-chain submission is PolyzyMD’s canonical SLURM workflow for running serial MD segments on preemptible hardware. Each replicate is managed by one self-resubmitting job script that advances the trajectory segment by segment until production is complete.

slurm_config

SLURM job configuration.

Type:

SlurmConfig

total_production_time_ns

Total production time in nanoseconds.

Type:

float

total_samples

Total trajectory frames across the entire production run.

Type:

int

equilibration_time_ns

Equilibration time (informational only).

Type:

float

replicates

Replicate numbers to run.

Type:

list of int

dry_run

If True, preview only. No scripts are written and no jobs are submitted.

Type:

bool

generate_only

If True, create scripts but don’t submit.

Type:

bool

force

If True, skip the squeue duplicate-job check and submit even if a RUNNING/PENDING job already exists for the same replicate.

Type:

bool

output_script_dir

Directory for generated job scripts.

Type:

Path

config_path

Path to the YAML configuration file.

Type:

str

slurm_config: SlurmConfig
total_production_time_ns: float
total_samples: int = 2500
equilibration_time_ns: float = 0.5
replicates: List[int]
dry_run: bool = False
generate_only: bool = False
force: bool = False
output_script_dir: Path = PosixPath('daisy_chain_scripts')
config_path: str = 'config.yaml'
classmethod from_simulation_config(sim_config, slurm_config, replicates='1', dry_run=False, generate_only=False, force=False, output_script_dir='daisy_chain_scripts', config_path='config.yaml')[source]

Create DaisyChainConfig from a SimulationConfig.

Parameters:
  • sim_config (SimulationConfig) – Simulation configuration.

  • slurm_config (SlurmConfig) – SLURM configuration.

  • replicates (str or list of int) – Replicate range string (e.g. "1-5") or list of ints.

  • dry_run (bool) – If True, preview only and write no files.

  • generate_only (bool) – If True, create scripts without submitting.

  • force (bool) – If True, skip duplicate-job check.

  • output_script_dir (str or Path) – Directory for job scripts.

  • config_path (str) – Path to the YAML configuration file.

Returns:

Configured instance.

Return type:

DaisyChainConfig
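
The replicates argument accepts either a range string or a list of ints. A minimal sketch of how such a value could be normalised is shown here; parse_replicates is a hypothetical helper, and the library's own parser may differ in validation and ordering.

```python
from typing import List, Sequence, Union


def parse_replicates(spec: Union[str, Sequence[int]]) -> List[int]:
    """Expand "1-5" or "1,3,5" into a list of replicate numbers (hypothetical)."""
    if not isinstance(spec, str):
        return [int(r) for r in spec]
    replicates: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if end < start:
                raise ValueError(f"Invalid replicate range: {part!r}")
            replicates.extend(range(start, end + 1))  # ranges are inclusive
        else:
            replicates.append(int(part))
    return replicates
```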

__init__(slurm_config, total_production_time_ns, total_samples=2500, equilibration_time_ns=0.5, replicates=<factory>, dry_run=False, generate_only=False, force=False, output_script_dir=PosixPath('daisy_chain_scripts'), config_path='config.yaml')
class polyzymd.workflow.daisy_chain.SubmissionResult(job_id, script_path, segment_index, replicate, is_dry_run=False, is_generated_only=False)[source]

Bases: object

Result of job submission.

job_id

SLURM job ID (or dummy ID for dry run).

Type:

str

script_path

Path to the generated script.

Type:

Path

segment_index

Initial segment index for the self-resubmitting daisy-chain job.

Type:

int

replicate

Replicate number.

Type:

int

is_dry_run

Whether this was a dry run.

Type:

bool

is_generated_only

Whether this was a generate-only script output.

Type:

bool

job_id: str
script_path: Path
segment_index: int
replicate: int
is_dry_run: bool = False
is_generated_only: bool = False
__init__(job_id, script_path, segment_index, replicate, is_dry_run=False, is_generated_only=False)
class polyzymd.workflow.daisy_chain.DaisyChainSubmitter(sim_config, dc_config, pixi_env='cuda-12-4', openff_logs=False, skip_build=False)[source]

Bases: object

Handle daisy-chain job submission for MD simulations.

In PolyzyMD’s daisy-chain model, each replicate gets a single self-resubmitting job script. The script calls polyzymd run-segment, checks progress, and resubmits itself to run serial MD segments until the simulation is complete.

Example

>>> sim_config = SimulationConfig.from_yaml("config.yaml")
>>> slurm_config = SlurmConfig.from_preset("aa100", email="user@example.com")
>>> dc_config = DaisyChainConfig.from_simulation_config(
...     sim_config, slurm_config, replicates="1-3"
... )
>>> submitter = DaisyChainSubmitter(sim_config, dc_config)
>>> results = submitter.submit_all()
__init__(sim_config, dc_config, pixi_env='cuda-12-4', openff_logs=False, skip_build=False)[source]

Initialize the submitter.

Parameters:
  • sim_config (SimulationConfig) – Simulation configuration.

  • dc_config (DaisyChainConfig) – Submission configuration.

  • pixi_env (str) – Pixi environment name (e.g. "cuda-12-4", "cuda-12-6").

  • openff_logs (bool) – Enable verbose OpenFF logs in generated scripts.

  • skip_build (bool) – Skip system building in generated scripts.

property sim_config: SimulationConfig

Get the simulation configuration.

property dc_config: DaisyChainConfig

Get the submission configuration.

property job_chains: Dict[int, List[SubmissionResult]]

Get the submission results for all replicates.

generate_job_script(replicate)[source]

Generate a self-resubmitting job script for a replicate.

Parameters:

replicate (int) – Replicate number.

Returns:

Complete SLURM batch script content.

Return type:

str

submit_replicate(replicate)[source]

Generate and submit the job for a single replicate.

Before submitting, checks squeue for existing RUNNING/PENDING jobs with the same job name. If duplicates are found and force is not set, raises RuntimeError.

Parameters:

replicate (int) – Replicate number.

Returns:

Submission result.

Return type:

SubmissionResult

Raises:

RuntimeError – If a SLURM job is already RUNNING or PENDING for this replicate and force is False.
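
The interaction between dry_run, generate_only, force, and the duplicate-job check can be summarised as a small decision function. This is a sketch under stated assumptions: Action and plan_submission are hypothetical names, and the precedence of dry_run over generate_only is assumed rather than taken from the source.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    PREVIEW = "preview"        # dry run: nothing written, nothing submitted
    WRITE_ONLY = "write_only"  # script written, no sbatch call
    SUBMIT = "submit"          # script written and submitted


def plan_submission(
    dry_run: bool,
    generate_only: bool,
    force: bool,
    existing_job_ids: Sequence[str],
) -> Action:
    """Decide what to do for one replicate (hypothetical helper)."""
    if dry_run:
        return Action.PREVIEW
    if generate_only:
        return Action.WRITE_ONLY
    if existing_job_ids and not force:
        # Mirrors the documented RuntimeError for duplicate jobs.
        raise RuntimeError(
            f"Job already RUNNING or PENDING: {', '.join(existing_job_ids)}"
        )
    return Action.SUBMIT
```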

submit_all()[source]

Submit jobs for all replicates.

Returns:

Mapping of replicate numbers to daisy-chain submission results.

Return type:

dict

polyzymd.workflow.daisy_chain.submit_daisy_chain(config_path, slurm_preset='aa100', replicates='1', email='', dry_run=False, generate_only=False, force=False, pixi_env='cuda-12-4', output_dir=None, scratch_dir=None, projects_dir=None, time_limit=None, memory=None, account=None, partition=None, qos=None, gpu_type=None, constraint=None, nodelist=None, openff_logs=False, skip_build=False)[source]

Submit daisy-chain simulation jobs from a YAML config.

This is the main entry point called by polyzymd submit. Daisy-chain is PolyzyMD’s canonical term for running serial MD segments on preemptible hardware; this function submits one self-resubmitting job per replicate to advance those segments until completion.

Parameters:
  • config_path (str or Path) – Path to simulation YAML config.

  • slurm_preset (str) – SLURM preset name (aa100, al40, blanca-shirts, bridges2, testing).

  • replicates (str) – Replicate range string (e.g. "1-5", "1,3,5").

  • email (str) – Email for job notifications.

  • dry_run (bool) – If True, preview only and write no files.

  • generate_only (bool) – If True, create scripts without submitting.

  • force (bool) – If True, skip the squeue duplicate-job check.

  • pixi_env (str) – Pixi environment name (e.g. "cuda-12-4", "cuda-12-6").

  • output_dir (str or Path or None) – Directory for job scripts.

  • scratch_dir (str or Path or None) – Override scratch directory for simulation output.

  • projects_dir (str or Path or None) – Override projects directory for scripts/logs.

  • time_limit (str or None) – Override SLURM time limit (format: HH:MM:SS).

  • memory (str or None) – Override SLURM memory allocation (e.g. "4G").

  • account (str or None) – Override SLURM account / allocation ID.

  • partition (str or None) – Override SLURM partition.

  • qos (str or None) – Override SLURM QoS value.

  • gpu_type (str or None) – Override GPU type for presets that use --gpus directive.

  • constraint (str or None) – SLURM --constraint expression (e.g. "A40|A100").

  • nodelist (str or None) – Optional SLURM --nodelist override.

  • openff_logs (bool) – Enable verbose OpenFF logs in generated scripts.

  • skip_build (bool) – Skip system building in generated scripts.

Returns:

Mapping of replicate numbers to submission results.

Return type:

dict

Raises:

ValueError – If the SLURM account is empty on a preset that requires one and neither dry_run nor generate_only is set.
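
The ValueError condition can be read as a single guard. In this sketch, validate_account and the requires_account flag are hypothetical; which presets actually require an account is not stated here.

```python
def validate_account(
    account: str,
    requires_account: bool,
    dry_run: bool = False,
    generate_only: bool = False,
) -> None:
    """Raise ValueError when a real submission lacks a required account."""
    will_submit = not (dry_run or generate_only)
    if requires_account and not account and will_submit:
        raise ValueError("A SLURM account is required for this preset.")
```

Previews and generate-only runs pass the guard because no job reaches the scheduler in those modes.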

SLURM Configuration

SLURM job script generation for HPC cluster submission.

This module provides templates and utilities for generating SLURM batch scripts for self-resubmitting MD simulation jobs.

Changed in version 1.1.0: Replaced conda/module-load environment activation with pixi. The module_load and conda_command fields on SlurmConfig have been removed. Environment activation is now handled by pixi shell-hook using the pixi_env parameter on SlurmScriptGenerator.

class polyzymd.workflow.slurm.SlurmConfig(partition='aa100', qos='normal', account='ucb625_asc1', time_limit='23:59:59', email='', nodes=1, ntasks=1, cpus_per_task=1, memory='3G', gpus=1, exclude=None, nodelist=None, gpu_type=None, gpu_directive_style='gres', constraint=None)[source]

Bases: object

Configuration for SLURM job submission.

partition

SLURM partition(s) to use.

Type:

str

qos

Quality of service. Set to "" to omit the --qos directive entirely (required for clusters such as Bridges2 that do not use QoS).

Type:

str

account

Account / allocation ID for resource allocation. Set to "" to omit the --account directive entirely (e.g. Bridges2, which infers the allocation from the submitting user’s login).

Type:

str

time_limit

Wall time limit (HH:MM:SS).

Type:

str

email

Email address for SLURM failure notifications. Set to "" to omit both --mail-type and --mail-user directives.

Type:

str

nodes

Number of nodes.

Type:

int

ntasks

Number of tasks. Ignored when gpu_directive_style == "gpus" (Bridges2-style); those scripts emit #SBATCH -N {nodes} only.

Type:

int

cpus_per_task

Number of CPUs allocated per task.

Type:

int

memory

Memory allocation (e.g. "3G"). Set to None to omit the --mem directive entirely (some clusters allocate memory per GPU and reject an explicit --mem request).

Type:

str | None

gpus

Number of GPUs.

Type:

int

exclude

Nodes to exclude (omitted when None).

Type:

str | None

nodelist

Optional SLURM --nodelist value.

Type:

str | None

gpu_type

Optional GPU type string used with the --gpus directive (e.g. "v100-32" for Bridges2). When None, the classic --gres=gpu:<N> directive is emitted instead.

Type:

str | None

gpu_directive_style

"gres" (default, Alpine-style) or "gpus" (Bridges2-style). Controls which SBATCH GPU directive is written. Also governs which nodes/ntasks format is emitted.

Type:

str

constraint

Optional SLURM --constraint expression. Supports boolean expressions with | (OR) and & (AND), such as "A40|A100".

Type:

str | None
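
The omission rules described for qos, account, email, memory, and gpu_directive_style can be sketched as a small header renderer. HeaderOptions and render_optional_directives are hypothetical stand-ins for illustration only; the exact directive text and ordering written by PolyzyMD, including the --mail-type value, are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeaderOptions:
    """Hypothetical subset of the SlurmConfig fields."""
    qos: str = "normal"
    account: str = ""
    email: str = ""
    memory: Optional[str] = "3G"
    nodes: int = 1
    ntasks: int = 1
    gpus: int = 1
    gpu_type: Optional[str] = None
    gpu_directive_style: str = "gres"


def render_optional_directives(opts: HeaderOptions) -> List[str]:
    lines: List[str] = []
    if opts.qos:  # "" omits --qos
        lines.append(f"#SBATCH --qos={opts.qos}")
    if opts.account:  # "" omits --account
        lines.append(f"#SBATCH --account={opts.account}")
    if opts.memory is not None:  # None omits --mem
        lines.append(f"#SBATCH --mem={opts.memory}")
    if opts.email:  # "" omits both mail directives
        lines.append("#SBATCH --mail-type=FAIL")
        lines.append(f"#SBATCH --mail-user={opts.email}")
    if opts.gpu_directive_style == "gpus":
        # Bridges2-style: node count only, GPU type via --gpus
        lines.append(f"#SBATCH -N {opts.nodes}")
        gpu = f"{opts.gpu_type}:{opts.gpus}" if opts.gpu_type else str(opts.gpus)
        lines.append(f"#SBATCH --gpus={gpu}")
    else:
        # Alpine-style: explicit nodes/ntasks and --gres
        lines.append(f"#SBATCH --nodes={opts.nodes}")
        lines.append(f"#SBATCH --ntasks={opts.ntasks}")
        lines.append(f"#SBATCH --gres=gpu:{opts.gpus}")
    return lines


alpine = render_optional_directives(HeaderOptions(account="ucb625_asc1"))
bridges2 = render_optional_directives(
    HeaderOptions(qos="", account="", memory=None,
                  gpu_type="v100-32", gpu_directive_style="gpus")
)
```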

partition: str = 'aa100'
qos: str = 'normal'
account: str = 'ucb625_asc1'
time_limit: str = '23:59:59'
email: str = ''
nodes: int = 1
ntasks: int = 1
cpus_per_task: int = 1
memory: str | None = '3G'
gpus: int = 1
exclude: str | None = None
nodelist: str | None = None
gpu_type: str | None = None
gpu_directive_style: str = 'gres'
constraint: str | None = None
classmethod from_preset(preset, email='')[source]

Create a SlurmConfig from a named preset.

Parameters:
  • preset (Literal['aa100', 'al40', 'blanca-shirts', 'bridges2', 'testing']) – Preset name.

  • email (str) – Email for notifications.

Returns:

SlurmConfig with preset values.

Return type:

SlurmConfig

__init__(partition='aa100', qos='normal', account='ucb625_asc1', time_limit='23:59:59', email='', nodes=1, ntasks=1, cpus_per_task=1, memory='3G', gpus=1, exclude=None, nodelist=None, gpu_type=None, gpu_directive_style='gres', constraint=None)
class polyzymd.workflow.slurm.JobContext(job_name, output_file, scratch_dir, projects_dir='.', segment_index=0, replicate_num=1, extra_vars=<factory>)[source]

Bases: object

Context for job script template rendering.

job_name

SLURM job name.

Type:

str

output_file

Output file pattern (for SLURM logs).

Type:

str

scratch_dir

Directory for simulation output (trajectories, checkpoints).

Type:

str

projects_dir

Directory for scripts and logs.

Type:

str

segment_index

Current segment index.

Type:

int

replicate_num

Replicate number.

Type:

int

extra_vars

Additional template variables.

Type:

Dict

job_name: str
output_file: str
scratch_dir: str
projects_dir: str = '.'
segment_index: int = 0
replicate_num: int = 1
extra_vars: Dict
__init__(job_name, output_file, scratch_dir, projects_dir='.', segment_index=0, replicate_num=1, extra_vars=<factory>)
class polyzymd.workflow.slurm.SlurmScriptGenerator(config, pixi_env='cuda-12-4', openff_logs=False, skip_build=False)[source]

Bases: object

Generator for SLURM batch scripts.

Supports separate directories for:

  • projects_dir: Where scripts live and jobs are submitted from.

  • scratch_dir: Where simulation output goes (trajectories, checkpoints).

Example

>>> config = SlurmConfig.from_preset("aa100", email="user@example.com")
>>> generator = SlurmScriptGenerator(config)
>>> script = generator.generate_job_script(
...     config_path="/projects/user/config.yaml",
...     replicate=1,
...     working_dir="/scratch/user/sim_output",
... )
JOB_TEMPLATE = '#!/bin/bash\n#SBATCH --partition={{ partition }}\n#SBATCH --job-name={{ job_name }}\n#SBATCH --output={{ output_file }}\n{{ qos_line }}\n{{ nodes_line }}\n{{ cpus_line }}\n{{ mem_line }}\n#SBATCH --time={{ time_limit }}\n{{ gpu_line }}\n{{ mail_line }}\n{{ account_line }}\n{{ exclude_line }}\n{{ nodelist_line }}\n{{ constraint_line }}\n#SBATCH --signal=B:USR1@300\n#SBATCH --no-requeue\n\n# =============================================================================\n# PolyzyMD Self-Resubmitting Simulation Job\n# {{ FULL_CREDIT_LINE }}\n# Generated by polyzymd do not edit manually\n# =============================================================================\n\n# Activate pixi environment\n# The manifest path was resolved at submission time from `which polyzymd`.\neval "$(pixi shell-hook -e {{ pixi_env | shell_quote }} --manifest-path {{ manifest_path | shell_quote }})"\n\n# Enable strict error handling after environment setup\nset -e\n\n# Required for OpenFF Interchange.combine() functionality\nexport INTERCHANGE_EXPERIMENTAL=1\n\n# Resolve this script\'s path for self-resubmission.\n# $SLURM_JOB_SCRIPT is only available in SLURM >= 22.05; fall back to $0.\nTHIS_SCRIPT="${SLURM_JOB_SCRIPT:-$(realpath "$0")}"\n\n# Configuration\nCONFIG_PATH="{{ config_path }}"\nREPLICATE={{ replicate }}\nWORKING_DIR="{{ working_dir }}"\n\n# Ensure working directory exists\nmkdir -p "$WORKING_DIR"\n\necho "=================================================="\necho "PolyzyMD self-resubmitting job"\necho "{{ FULL_CREDIT_LINE }}"\necho "Config:    $CONFIG_PATH"\necho "Replicate: $REPLICATE"\necho "Work dir:  $WORKING_DIR"\necho "Pixi env:  {{ pixi_env }}"\necho "Job ID:    ${SLURM_JOB_ID:-local}"\necho "Timestamp: $(date)"\necho "=================================================="\n\n# =========================================================================\n# Signal forwarding: SLURM sends signals to the batch shell, not to child\n# processes.  
We trap SIGUSR1 (wall-time warning) and SIGTERM (preemption)\n# and forward them to the Python process running in the background.\n# =========================================================================\nCHILD_PID=""\nforward_signal() {\n    if [ -n "$CHILD_PID" ] && kill -0 "$CHILD_PID" 2>/dev/null; then\n        echo "Forwarding $1 to Python process (PID $CHILD_PID)"\n        kill -"$1" "$CHILD_PID"\n    fi\n}\ntrap \'forward_signal USR1\' USR1\ntrap \'forward_signal TERM\' TERM\n\n# Run the next segment (backgrounded for signal forwarding)\npolyzymd{{ openff_logs_flag }} run-segment \\\n    -c "$CONFIG_PATH" \\\n    -r "$REPLICATE" \\\n    --scratch-dir "$WORKING_DIR"{{ skip_build_flag }} &\nCHILD_PID=$!\n\n# Wait for the child; \'wait\' is interrupted by trapped signals, so loop\n# until the child actually exits.  Temporarily disable \'set -e\' so we can\n# capture non-zero exit codes (e.g. 99 for graceful shutdown) without the\n# shell exiting prematurely.\nset +e\nwait "$CHILD_PID" 2>/dev/null\nRC=$?\nwhile kill -0 "$CHILD_PID" 2>/dev/null; do\n    wait "$CHILD_PID" 2>/dev/null\n    RC=$?\ndone\nset -e\n\necho "run-segment exited with code $RC at $(date)"\n\n# =========================================================================\n# Resubmission logic\n# =========================================================================\nif [ $RC -eq 2 ]; then\n    echo "CONCURRENT: Another job is already running this replicate NOT resubmitting."\n    echo "This duplicate job chain will now terminate cleanly."\n    exit 0\nfi\n\nif [ $RC -ne 0 ] && [ $RC -ne 99 ]; then\n    echo "FATAL: run-segment failed (exit code $RC) NOT resubmitting"\n    exit $RC\nfi\n\n# Check whether more work remains\nset +e\npolyzymd check-progress -c "$CONFIG_PATH" -r "$REPLICATE" --scratch-dir "$WORKING_DIR"\nPROGRESS_RC=$?\nset -e\n\nif [ $PROGRESS_RC -eq 0 ]; then\n    echo "Simulation complete no resubmission needed."\n    exit 0\nfi\n\nif [ $PROGRESS_RC -ne 1 ]; then\n    echo 
"FATAL: check-progress failed (exit code $PROGRESS_RC) NOT resubmitting"\n    exit $PROGRESS_RC\nfi\n\n# Work remains (exit code 1) resubmit this same script\necho "Work remains resubmitting job..."\nset +e\nsbatch "$THIS_SCRIPT"\nSUBMIT_RC=$?\nset -e\n\nif [ $SUBMIT_RC -eq 0 ]; then\n    echo "Resubmitted successfully."\nelse\n    echo "WARNING: sbatch resubmission failed (exit code $SUBMIT_RC)"\n    echo "You can manually resume with:"\n    echo "  sbatch $THIS_SCRIPT"\n    exit 1\nfi\n\nexit 0\n'
__init__(config, pixi_env='cuda-12-4', openff_logs=False, skip_build=False)[source]

Initialize the generator.

Parameters:
  • config (SlurmConfig) – SLURM configuration.

  • pixi_env (str) – Pixi environment name (e.g. "cuda-12-4", "cuda-12-6").

  • openff_logs (bool) – Enable verbose OpenFF logs in generated scripts.

  • skip_build (bool) – Skip system building in generated scripts (use pre-built system).

property config: SlurmConfig

Get the SLURM configuration.

generate_job_script(config_path, replicate, working_dir, job_name=None, output_file=None)[source]

Generate a self-resubmitting SLURM job script.

This produces a single script that handles the entire simulation lifecycle. Each invocation calls polyzymd run-segment which determines what work remains, runs the next segment, and exits. The bash wrapper then checks progress and resubmits itself if more work is needed.

Parameters:
  • config_path (str) – Absolute path to the YAML configuration file.

  • replicate (int) – Replicate number.

  • working_dir (str) – Directory for simulation output (trajectories, checkpoints).

  • job_name (str or None, optional) – SLURM job name. Callers should use create_job_name() to produce descriptive names (e.g. r1_310K_Fibronectin_...). Falls back to pzmd_r{replicate} if not provided.

  • output_file (str or None, optional) – SLURM log file pattern. Falls back to slurm_logs/{job_name}.%j.out relative to the directory where sbatch is invoked.
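
The two documented fallbacks can be expressed in a few lines. resolve_names is a hypothetical helper used only to illustrate the defaults; %j is SLURM's placeholder for the job ID.

```python
from typing import Optional, Tuple


def resolve_names(
    replicate: int,
    job_name: Optional[str] = None,
    output_file: Optional[str] = None,
) -> Tuple[str, str]:
    """Apply the documented defaults for job_name and output_file."""
    if job_name is None:
        job_name = f"pzmd_r{replicate}"
    if output_file is None:
        # Relative to the directory where sbatch is invoked.
        output_file = f"slurm_logs/{job_name}.%j.out"
    return job_name, output_file
```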

Returns:

Complete SLURM batch script content.

Return type:

str
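
The resubmission branch of the generated bash wrapper follows the exit codes visible in JOB_TEMPLATE: 2 from run-segment means another job holds the replicate, 0 and 99 are treated as success or graceful shutdown, and check-progress returns 0 when complete and 1 when work remains. The Python sketch below restates that decision table; next_action is a hypothetical name, and the real logic lives in the shell script, which additionally exits 1 if the sbatch resubmission itself fails.

```python
from typing import Optional, Tuple


def next_action(run_rc: int, progress_rc: Optional[int] = None) -> Tuple[str, int]:
    """Return (action, script_exit_code) for one wrapper invocation."""
    if run_rc == 2:
        # Another job is already running this replicate: end this chain cleanly.
        return ("stop", 0)
    if run_rc not in (0, 99):
        # Any other failure is fatal and is not resubmitted.
        return ("fail", run_rc)
    if progress_rc is None:
        raise ValueError("progress_rc is required once run-segment succeeded")
    if progress_rc == 0:
        return ("done", 0)
    if progress_rc != 1:
        return ("fail", progress_rc)
    # Exit code 1 from check-progress: work remains, resubmit the same script.
    return ("resubmit", 0)
```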

save_script(script_content, output_path, make_executable=True)[source]

Save a script to a file.

Parameters:
  • script_content (str) – Script content.

  • output_path (str | Path) – Output file path.

  • make_executable (bool) – Whether to make the script executable.

Returns:

Path to the saved script.

Return type:

Path
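
A minimal sketch of saving a script and setting its executable bits is given here. write_script is a hypothetical stand-in, and creating missing parent directories is an assumption rather than documented behaviour.

```python
import os
import stat
import tempfile
from pathlib import Path
from typing import Union


def write_script(
    content: str,
    output_path: Union[str, Path],
    make_executable: bool = True,
) -> Path:
    """Write content to output_path and optionally add execute permission."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # assumption: create parents
    path.write_text(content)
    if make_executable:
        mode = path.stat().st_mode
        path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_script("#!/bin/bash\necho hello\n", Path(tmp) / "scripts" / "job.sh")
    is_executable = os.access(saved, os.X_OK)
    saved_name = saved.name
```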

Analysis SLURM Orchestration

Replicate-level SLURM orchestration for analysis comparisons.

This module provides a shared DAG submission layer for analysis plugins:

  • one replicate worker per (condition, replicate)

  • one aggregate worker per condition

  • one finalizer worker per analysis comparison

The DAG parallelizes at the per-replicate compute-stage boundary, with one SLURM job per (condition, replicate) pair. This per-replicate worker is the analysis lifecycle’s atomic unit. Sub-replicate parallelism (for example, per-run work inside SASA-style calculations) is intentionally handled inside each plugin’s compute path. Plugins can use internal threading/multiprocessing for that finer-grained work when needed.
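
The three-layer DAG described above can be pictured as a dependency map: each aggregate node waits on its condition's replicate nodes, and the finalizer waits on every aggregate node. The sketch below uses hypothetical node labels and helper names, and it assumes afterok dependencies; the dependency type PolyzyMD actually passes to sbatch is not stated here.

```python
from typing import Dict, List, Sequence


def build_dependency_plan(conditions: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Map each DAG node to the nodes it depends on.

    conditions maps a condition label to its replicate numbers.
    """
    plan: Dict[str, List[str]] = {}
    aggregate_nodes: List[str] = []
    for label, replicates in conditions.items():
        replicate_nodes = [f"rep:{label}:{r}" for r in replicates]
        for node in replicate_nodes:
            plan[node] = []  # replicate workers have no upstream jobs
        aggregate = f"agg:{label}"
        plan[aggregate] = replicate_nodes
        aggregate_nodes.append(aggregate)
    plan["finalize"] = aggregate_nodes
    return plan


def dependency_flag(job_ids: Sequence[str]) -> str:
    """Format an sbatch dependency option for already-submitted job IDs."""
    if not job_ids:
        return ""
    return "--dependency=afterok:" + ":".join(job_ids)


plan = build_dependency_plan({"ctrl": [1, 2], "poly": [1]})
```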

class polyzymd.workflow.analysis_slurm.AnalysisSlurmResources(*, pixi_path='pixi', partition=None, qos=None, account=None, ntasks=1, cpus_per_task=1, mem='4G', time='01:00:00', max_retries=3, mail_user=None, mail_type='FAIL')[source]

Bases: BaseModel

SLURM resource settings for analysis workers.

pixi_path: str
partition: str | None
qos: str | None
account: str | None
ntasks: int
cpus_per_task: int
mem: str
time: str
max_retries: int
mail_user: str | None
mail_type: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.ReplicateTaskSpec(*, condition_index, replicate, condition_label, condition_slug)[source]

Bases: BaseModel

Task spec for one replicate job.

condition_index: int
replicate: int
condition_label: str
condition_slug: str
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.ConditionTaskSpec(*, condition_index, condition_label, condition_slug, replicate_specs)[source]

Bases: BaseModel

Task spec for one condition aggregate job.

condition_index: int
condition_label: str
condition_slug: str
replicate_specs: list[ReplicateTaskSpec]
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.AnalysisJobManifest(*, analysis_name, comparison_yaml, condition_specs, settings_snapshot, snapshot_hash, pipeline_mode, partial_policy, equilibration, recompute, resources, created_at)[source]

Bases: BaseModel

Snapshot of inputs needed to run analysis workers.

analysis_name: str
comparison_yaml: str
condition_specs: list[ConditionTaskSpec]
settings_snapshot: dict[str, Any]
snapshot_hash: str
pipeline_mode: Literal['full', 'finalize_only']
partial_policy: Literal['strict', 'allow_partial']
equilibration: str
recompute: bool
resources: AnalysisSlurmResources
created_at: str
save(path)[source]

Save manifest as JSON.

classmethod load(path)[source]

Load manifest from JSON.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.SubmittedJobGraph(*, replicate_jobs, array_jobs=None, aggregator_jobs, finalizer_job_id)[source]

Bases: BaseModel

Submitted SLURM job IDs for analysis DAG nodes.

replicate_jobs: dict[tuple[int, int], str]
array_jobs: dict[str, str] | None
aggregator_jobs: dict[int, str]
finalizer_job_id: str
save(path)[source]

Save graph as JSON with portable keys.

classmethod load(path)[source]

Load graph from JSON with tuple/int key reconstruction.
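
JSON object keys must be strings, so the (condition_index, replicate) tuple keys of replicate_jobs need a portable encoding on save and reconstruction on load. The sketch below shows one way to do that; the "condition:replicate" key format and both helper names are assumptions, not the library's on-disk format.

```python
import json
from typing import Dict, Tuple


def dump_replicate_jobs(replicate_jobs: Dict[Tuple[int, int], str]) -> str:
    """Serialise tuple-keyed job IDs using "condition:replicate" string keys."""
    portable = {f"{cond}:{rep}": job for (cond, rep), job in replicate_jobs.items()}
    return json.dumps(portable, sort_keys=True)


def load_replicate_jobs(text: str) -> Dict[Tuple[int, int], str]:
    """Rebuild the tuple keys from their string form."""
    restored: Dict[Tuple[int, int], str] = {}
    for key, job in json.loads(text).items():
        cond_text, rep_text = key.split(":")
        restored[(int(cond_text), int(rep_text))] = job
    return restored


original = {(0, 1): "5001", (0, 2): "5002", (1, 1): "5003"}
round_tripped = load_replicate_jobs(dump_replicate_jobs(original))
```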

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.TaskStatus(*, state, attempt_count=0, error_message=None, last_updated, slurm_job_id=None)[source]

Bases: BaseModel

Task status persisted by worker wrappers.

state: Literal['pending', 'running', 'succeeded', 'failed', 'retrying']
attempt_count: int
error_message: str | None
last_updated: str
slurm_job_id: str | None
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

polyzymd.workflow.analysis_slurm.compute_manifest_snapshot_hash(analysis_name, settings_snapshot, condition_specs, equilibration)[source]

Compute deterministic hash for manifest-sensitive comparison inputs.
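
A deterministic hash of this kind is commonly built by serialising the inputs to canonical JSON and digesting the result. The sketch below assumes SHA-256 and plain JSON-serialisable inputs; snapshot_hash is a hypothetical name, and the library's actual algorithm and payload layout may differ.

```python
import hashlib
import json
from typing import Any, Dict, List


def snapshot_hash(
    analysis_name: str,
    settings_snapshot: Dict[str, Any],
    condition_specs: List[Dict[str, Any]],
    equilibration: str,
) -> str:
    """Hash the comparison inputs independently of dict key order."""
    payload = {
        "analysis_name": analysis_name,
        "settings_snapshot": settings_snapshot,
        "condition_specs": condition_specs,
        "equilibration": equilibration,
    }
    # sort_keys and fixed separators give one canonical byte string.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


h1 = snapshot_hash("rmsf", {"a": 1, "b": 2}, [{"condition_index": 0}], "10ns")
h2 = snapshot_hash("rmsf", {"b": 2, "a": 1}, [{"condition_index": 0}], "10ns")
h3 = snapshot_hash("rmsf", {"a": 1, "b": 2}, [{"condition_index": 0}], "20ns")
```

Because the digest ignores key order but reacts to any value change, it suits the drift check performed when a worker validates its manifest.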

polyzymd.workflow.analysis_slurm.validate_manifest_snapshot(manifest, analysis, config)[source]

Validate that live comparison inputs match the manifest snapshot.

Returns:

Prepared conditions, resolved equilibration, and analysis root.

Return type:

tuple[list[Any], str, Path]

Raises:

RuntimeError – If current config/plugin settings drift from the submitted manifest.

polyzymd.workflow.analysis_slurm.update_task_status(status_path, state, attempt_count, error_message=None)[source]

Atomically write a task status JSON file.
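
Atomic writes are typically done by writing to a temporary file in the same directory and then renaming it over the target, so readers never see a half-written file. The following sketch illustrates that pattern; write_status_atomically and the payload fields shown are assumptions modelled on TaskStatus, not the library's code.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Union


def write_status_atomically(
    status_path: Union[str, Path],
    state: str,
    attempt_count: int,
    error_message: Optional[str] = None,
) -> None:
    path = Path(status_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "state": state,
        "attempt_count": attempt_count,
        "error_message": error_message,
        "last_updated": datetime.now(timezone.utc).isoformat(),
    }
    # The temp file must share a filesystem with the target for an atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic replacement of the target
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "status" / "rep_0_1.json"
    write_status_atomically(target, "running", 1)
    loaded = json.loads(target.read_text())
    leftovers = [p.name for p in target.parent.iterdir() if p.name.endswith(".tmp")]
```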

polyzymd.workflow.analysis_slurm.build_manifest(analysis, config, resources, recompute, equilibration, allow_partial=False)[source]

Build submission manifest from comparison config and plugin settings.

polyzymd.workflow.analysis_slurm.generate_replicate_script(manifest, task_spec, resources, hpc_dir)[source]

Generate a replicate worker script with automatic retries.

polyzymd.workflow.analysis_slurm.generate_aggregate_script(manifest, cond_spec, resources, hpc_dir)[source]

Generate an aggregate worker script with retries.

polyzymd.workflow.analysis_slurm.generate_array_script(cond_spec, manifest, resources, replicates, hpc_dir)[source]

Generate one array worker script for all replicates of a condition.

Parameters:
  • cond_spec (ConditionTaskSpec) – Condition task specification from the manifest.

  • manifest (AnalysisJobManifest) – Submission manifest used by worker commands.

  • resources (AnalysisSlurmResources) – SLURM resource settings used in script header.

  • replicates (list[int]) – Replicate IDs included in this array job.

  • hpc_dir (Path) – Root directory where scripts and logs are written.

Returns:

Generated executable script path.

Return type:

Path

polyzymd.workflow.analysis_slurm.generate_finalize_script(manifest, resources, hpc_dir)[source]

Generate the final compare+plot worker script with retries.

polyzymd.workflow.analysis_slurm.reconcile_status_with_slurm(hpc_dir)[source]

Reconcile local status files with live SLURM accounting state.

Parameters:

hpc_dir (Path) – Root HPC artifact directory for one analysis submission.

Returns:

Summary with checked file count, update count, and per-file changes.

Return type:

dict[str, Any]

polyzymd.workflow.analysis_slurm.submit_analysis_graph(manifest, resources, hpc_dir, root_dependencies=())[source]

Submit replicate, aggregate, and finalizer jobs with dependencies.

Parameters:
  • manifest (AnalysisJobManifest) – Submission manifest describing all condition and replicate tasks.

  • resources (AnalysisSlurmResources) – SLURM resource settings used for generated scripts.

  • hpc_dir (Path) – Root directory where scripts, logs, and submission metadata are stored.

Returns:

Graph of submitted replicate, aggregate, and finalizer job IDs.

Return type:

SubmittedJobGraph

Raises:

RuntimeError – Propagated if any sbatch submission fails.

polyzymd.workflow.analysis_slurm.submit_analysis_graph_with_arrays(manifest, resources, hpc_dir)[source]

Submit one array per condition plus aggregate and finalizer jobs.

Parameters:
  • manifest (AnalysisJobManifest) – Submission manifest describing condition and replicate tasks.

  • resources (AnalysisSlurmResources) – SLURM resource settings used for generated scripts.

  • hpc_dir (Path) – Root directory where scripts, logs, and submission metadata are stored.

Returns:

Graph of submitted array, aggregate, and finalizer job IDs.

Return type:

SubmittedJobGraph

Raises:

RuntimeError – Propagated if any sbatch submission fails.

polyzymd.workflow.analysis_slurm.read_analysis_status(hpc_dir)[source]

Read all status files for one analysis HPC run.