Workflow Module

Job Submitter

Job submission for the SLURM scheduler on HPC clusters.

This module provides utilities for submitting daisy-chain MD simulation jobs to SLURM. In PolyzyMD, daisy-chain is the canonical term for serial MD segments on preempted hardware: each replicate gets a job script that calls polyzymd run-segment, checks progress, and resubmits itself until the simulation is complete.

Changed in version 1.1.0: Standardized daisy-chain execution on self-resubmitting jobs, where each submission advances one serial MD segment before scheduling the next segment as needed.

polyzymd.workflow.daisy_chain.check_existing_slurm_jobs(job_name)[source]

Query SLURM for RUNNING or PENDING jobs that match job_name.

This is a best-effort check: if squeue is unavailable (e.g. in a non-SLURM environment or CI), a warning is logged and an empty list is returned so that submission proceeds unimpeded.

Parameters:: job_name (str) – The SLURM --job-name to search for (exact match).
Returns:: SLURM job IDs that are RUNNING or PENDING with the given name. Empty if squeue is unavailable or returns no matches.
Return type:: list of str
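
The best-effort behaviour described above can be sketched as follows. This is an illustration only, not the module's actual implementation: find_active_jobs and its run parameter are hypothetical names, and the squeue options shown (--noheader, --name, --states, --format=%i) are standard squeue flags chosen for this sketch.

```python
import logging
import subprocess
from typing import Callable, List

logger = logging.getLogger(__name__)


def find_active_jobs(job_name: str, run: Callable = subprocess.run) -> List[str]:
    """Return IDs of RUNNING or PENDING jobs named job_name, or [] if squeue fails.

    Hypothetical helper: the run argument exists only so the sketch can be
    exercised without a SLURM installation.
    """
    cmd = [
        "squeue",
        "--noheader",                # suppress the column header
        f"--name={job_name}",        # filter by job name
        "--states=RUNNING,PENDING",  # only running or queued jobs
        "--format=%i",               # print the job ID only
    ]
    try:
        proc = run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # squeue missing (OSError) or returned non-zero: warn and carry on.
        logger.warning("squeue unavailable (%s); skipping duplicate check", exc)
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]
```

Returning an empty list on failure means a missing squeue can never block a submission. The trade-off is that duplicate jobs go undetected in that case.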

polyzymd.workflow.daisy_chain.create_job_name(sim_config, replicate)[source]

Create a descriptive SLURM job name for a replicate.

Produces names like r1_310K_Fibronectin_SBMA-OEGMA_A75_B25 matching the directory naming convention.

Parameters:

sim_config (SimulationConfig) – Validated simulation configuration.
replicate (int) – Replicate number.

Returns:

Formatted job name.

Return type:

str
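
As an illustration of the naming pattern only, the sketch below rebuilds the example name from explicit arguments. The function format_job_name, its parameters, and the reading of A75/B25 as per-monomer percentages labelled in order are all assumptions made for this example. The real function reads these values from SimulationConfig, whose fields are not documented here.

```python
from typing import List, Tuple


def format_job_name(
    replicate: int,
    temperature_k: int,
    enzyme: str,
    monomers: List[Tuple[str, int]],
) -> str:
    """Build a name such as r1_310K_Fibronectin_SBMA-OEGMA_A75_B25.

    Hypothetical helper: monomers is a list of (name, percent) pairs, and
    the letters A, B, ... are assigned in list order (an assumption).
    """
    names = "-".join(name for name, _ in monomers)
    fractions = "_".join(
        f"{chr(ord('A') + index)}{percent}"
        for index, (_, percent) in enumerate(monomers)
    )
    return f"r{replicate}_{temperature_k}K_{enzyme}_{names}_{fractions}"


name = format_job_name(1, 310, "Fibronectin", [("SBMA", 75), ("OEGMA", 25)])
```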

class polyzymd.workflow.daisy_chain.DaisyChainConfig(slurm_config, total_production_time_ns, total_samples=2500, equilibration_time_ns=0.5, replicates=<factory>, dry_run=False, generate_only=False, force=False, output_script_dir=PosixPath('daisy_chain_scripts'), config_path='config.yaml')[source]

Bases: object

Configuration for daisy-chain job submission.

Daisy-chain submission is PolyzyMD’s canonical SLURM workflow for serial MD segments on preempted hardware. Each replicate is managed by one self-resubmitting job script that advances the trajectory segment by segment until production is complete.

slurm_config

SLURM job configuration.

Type:: SlurmConfig

total_production_time_ns

Total production time in nanoseconds.

Type:: float

total_samples

Total trajectory frames across the entire production run.

Type:: int

equilibration_time_ns

Equilibration time (informational only).

Type:: float

replicates

Replicate numbers to run.

Type:: list of int

dry_run

If True, preview only. No scripts are written and no jobs are submitted.

Type:: bool

generate_only

If True, create scripts but don’t submit.

Type:: bool

force

If True, skip the squeue duplicate-job check and submit even if a RUNNING/PENDING job already exists for the same replicate.

Type:: bool

output_script_dir

Directory for generated job scripts.

Type:: Path

config_path

Path to the YAML configuration file.

Type:: str

slurm_config: SlurmConfig

total_production_time_ns: float

total_samples: int = 2500

equilibration_time_ns: float = 0.5

replicates: List[int]

dry_run: bool = False

generate_only: bool = False

force: bool = False

output_script_dir: Path = PosixPath('daisy_chain_scripts')

config_path: str = 'config.yaml'

classmethod from_simulation_config(sim_config, slurm_config, replicates='1', dry_run=False, generate_only=False, force=False, output_script_dir='daisy_chain_scripts', config_path='config.yaml')[source]

Create DaisyChainConfig from a SimulationConfig.

Parameters:

sim_config (SimulationConfig) – Simulation configuration.
slurm_config (SlurmConfig) – SLURM configuration.
replicates (str or list of int) – Replicate range string (e.g. "1-5") or list of ints.
dry_run (bool) – If True, preview only and write no files.
generate_only (bool) – If True, create scripts without submitting.
force (bool) – If True, skip duplicate-job check.
output_script_dir (str or Path) – Directory for job scripts.
config_path (str) – Path to the YAML configuration file.

Returns:

Configured instance.

Return type:

DaisyChainConfig
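
A replicate specification such as "1-5" has to be expanded into a list of integers at some point. The sketch below shows one plausible way to do that. parse_replicates is a hypothetical name, and the handling of comma-separated values, duplicates, and ordering is an assumption rather than documented behaviour.

```python
from typing import Iterable, List, Union


def parse_replicates(spec: Union[str, Iterable[int]]) -> List[int]:
    """Expand "1-5", "1,3,5", or a list of ints into a sorted, de-duplicated list."""
    if not isinstance(spec, str):
        return sorted({int(r) for r in spec})

    replicates = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid replicate range: {part!r}")
            replicates.update(range(start, end + 1))  # inclusive range
        else:
            replicates.add(int(part))
    return sorted(replicates)
```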

__init__(slurm_config, total_production_time_ns, total_samples=2500, equilibration_time_ns=0.5, replicates=<factory>, dry_run=False, generate_only=False, force=False, output_script_dir=PosixPath('daisy_chain_scripts'), config_path='config.yaml')

class polyzymd.workflow.daisy_chain.SubmissionResult(job_id, script_path, segment_index, replicate, is_dry_run=False, is_generated_only=False)[source]

Bases: object

Result of job submission.

job_id

SLURM job ID (or dummy ID for dry run).

Type:: str

script_path

Path to the generated script.

Type:: Path

segment_index

Initial segment index for the self-resubmitting daisy-chain job.

Type:: int

replicate

Replicate number.

Type:: int

is_dry_run

Whether this was a dry run.

Type:: bool

is_generated_only

Whether the script was generated without being submitted.

Type:: bool

job_id: str

script_path: Path

segment_index: int

replicate: int

is_dry_run: bool = False

is_generated_only: bool = False

__init__(job_id, script_path, segment_index, replicate, is_dry_run=False, is_generated_only=False)

class polyzymd.workflow.daisy_chain.DaisyChainSubmitter(sim_config, dc_config, pixi_env='cuda-12-4', openff_logs=False, skip_build=False)[source]

Bases: object

Handle daisy-chain job submission for MD simulations.

In PolyzyMD’s daisy-chain model, each replicate gets a single self-resubmitting job script. The script calls polyzymd run-segment, checks progress, and resubmits itself to run serial MD segments until the simulation is complete.

Example

>>> sim_config = SimulationConfig.from_yaml("config.yaml")
>>> slurm_config = SlurmConfig.from_preset("aa100", email="user@example.com")
>>> dc_config = DaisyChainConfig.from_simulation_config(
...     sim_config, slurm_config, replicates="1-3"
... )
>>> submitter = DaisyChainSubmitter(sim_config, dc_config)
>>> results = submitter.submit_all()

__init__(sim_config, dc_config, pixi_env='cuda-12-4', openff_logs=False, skip_build=False)[source]

Initialize the submitter.

Parameters:

sim_config (SimulationConfig) – Simulation configuration.
dc_config (DaisyChainConfig) – Submission configuration.
pixi_env (str) – Pixi environment name (e.g. "cuda-12-4", "cuda-12-6").
openff_logs (bool) – Enable verbose OpenFF logs in generated scripts.
skip_build (bool) – Skip system building in generated scripts.

property sim_config: SimulationConfig: Get the simulation configuration.

property dc_config: DaisyChainConfig: Get the submission configuration.

property job_chains: Dict[int, List[SubmissionResult]]: Get the submission results for all replicates.

generate_job_script(replicate)[source]

Generate a self-resubmitting job script for a replicate.

Parameters:: replicate (int) – Replicate number.
Returns:: Complete SLURM batch script content.
Return type:: str

submit_replicate(replicate)[source]

Generate and submit the job for a single replicate.

Before submitting, checks squeue for existing RUNNING/PENDING jobs with the same job name. If duplicates are found and force is not set, raises RuntimeError.

Parameters:: replicate (int) – Replicate number.
Returns:: Submission result.
Return type:: SubmissionResult
Raises:: RuntimeError – If a SLURM job is already RUNNING or PENDING for this replicate and force is False.

submit_all()[source]

Submit jobs for all replicates.

Returns:: Mapping of replicate numbers to daisy-chain submission results.
Return type:: dict

polyzymd.workflow.daisy_chain.submit_daisy_chain(config_path, slurm_preset='aa100', replicates='1', email='', dry_run=False, generate_only=False, force=False, pixi_env='cuda-12-4', output_dir=None, scratch_dir=None, projects_dir=None, time_limit=None, memory=None, account=None, partition=None, qos=None, gpu_type=None, constraint=None, nodelist=None, openff_logs=False, skip_build=False)[source]

Submit daisy-chain simulation jobs from a YAML config.

This is the main entry point called by polyzymd submit. Daisy-chain is PolyzyMD’s canonical term for serial MD segments on preempted hardware; this function submits one self-resubmitting job per replicate to advance those segments until completion.

Parameters:

config_path (str or Path) – Path to simulation YAML config.
slurm_preset (str) – SLURM preset name (aa100, al40, blanca-shirts, bridges2, testing).
replicates (str) – Replicate range string (e.g. "1-5", "1,3,5").
email (str) – Email for job notifications.
dry_run (bool) – If True, preview only and write no files.
generate_only (bool) – If True, create scripts without submitting.
force (bool) – If True, skip the squeue duplicate-job check.
pixi_env (str) – Pixi environment name (e.g. "cuda-12-4", "cuda-12-6").
output_dir (str or Path or None) – Directory for job scripts.
scratch_dir (str or Path or None) – Override scratch directory for simulation output.
projects_dir (str or Path or None) – Override projects directory for scripts/logs.
time_limit (str or None) – Override SLURM time limit (format: HH:MM:SS).
memory (str or None) – Override SLURM memory allocation (e.g. "4G").
account (str or None) – Override SLURM account / allocation ID.
partition (str or None) – Override SLURM partition.
qos (str or None) – Override SLURM QoS value.
gpu_type (str or None) – Override GPU type for presets that use --gpus directive.
constraint (str or None) – SLURM --constraint expression (e.g. "A40|A100").
nodelist (str or None) – Optional SLURM --nodelist override.
openff_logs (bool) – Enable verbose OpenFF logs in generated scripts.
skip_build (bool) – Skip system building in generated scripts.

Returns:

Mapping of replicate numbers to submission results.

Return type:

dict

Raises:

ValueError – If the SLURM account is empty on a preset that requires one and neither dry_run nor generate_only is set.

SLURM Configuration

SLURM job script generation for HPC cluster submission.

This module provides templates and utilities for generating SLURM batch scripts for self-resubmitting MD simulation jobs.

Changed in version 1.1.0: Replaced conda/module-load environment activation with pixi. The module_load and conda_command fields on SlurmConfig have been removed. Environment activation is now handled by pixi shell-hook using the pixi_env parameter on SlurmScriptGenerator.

class polyzymd.workflow.slurm.SlurmConfig(partition='aa100', qos='normal', account='ucb625_asc1', time_limit='23:59:59', email='', nodes=1, ntasks=1, cpus_per_task=1, memory='3G', gpus=1, exclude=None, nodelist=None, gpu_type=None, gpu_directive_style='gres', constraint=None)[source]

Bases: object

Configuration for SLURM job submission.

partition

SLURM partition(s) to use.

Type:: str

qos

Quality of service. Set to "" to omit the --qos directive entirely (required for clusters such as Bridges2 that do not use QoS).

Type:: str

account

Account / allocation ID for resource allocation. Set to "" to omit the --account directive entirely (e.g. Bridges2, which infers the allocation from the submitting user’s login).

Type:: str

time_limit

Wall time limit (HH:MM:SS).

Type:: str

email

Email address for SLURM failure notifications. Set to "" to omit both --mail-type and --mail-user directives.

Type:: str

nodes

Number of nodes.

Type:: int

ntasks

Number of tasks. Ignored when gpu_directive_style == "gpus" (Bridges2-style); those scripts emit #SBATCH -N {nodes} only.

Type:: int

cpus_per_task

Number of CPUs allocated per task.

Type:: int

memory

Memory allocation (e.g. "3G"). Set to None to omit the --mem directive entirely (some clusters allocate memory per GPU and reject an explicit --mem request).

Type:: str | None

gpus

Number of GPUs.

Type:: int

exclude

Nodes to exclude (omitted when None).

Type:: str | None

nodelist

Optional SLURM --nodelist value.

Type:: str | None

gpu_type

Optional GPU type string used with the --gpus directive (e.g. "v100-32" for Bridges2). When None, the classic --gres=gpu:<N> directive is emitted instead.

Type:: str | None

gpu_directive_style

"gres" (default, Alpine-style) or "gpus" (Bridges2-style). Controls which SBATCH GPU directive is written. Also governs which nodes/ntasks format is emitted.

Type:: str

constraint

Optional SLURM --constraint expression. Supports boolean expressions with | (OR) and & (AND), such as "A40|A100".

Type:: str | None

partition: str = 'aa100'

qos: str = 'normal'

account: str = 'ucb625_asc1'

time_limit: str = '23:59:59'

email: str = ''

nodes: int = 1

ntasks: int = 1

cpus_per_task: int = 1

memory: str | None = '3G'

gpus: int = 1

exclude: str | None = None

nodelist: str | None = None

gpu_type: str | None = None

gpu_directive_style: str = 'gres'

constraint: str | None = None

classmethod from_preset(preset, email='')[source]

Create a SlurmConfig from a named preset.

Parameters:

preset (Literal['aa100', 'al40', 'blanca-shirts', 'bridges2', 'testing']) – Preset name.
email (str) – Email for notifications.

Returns:

SlurmConfig with preset values.

Return type:

SlurmConfig

__init__(partition='aa100', qos='normal', account='ucb625_asc1', time_limit='23:59:59', email='', nodes=1, ntasks=1, cpus_per_task=1, memory='3G', gpus=1, exclude=None, nodelist=None, gpu_type=None, gpu_directive_style='gres', constraint=None)
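
Several SlurmConfig fields switch an SBATCH directive on or off depending on their value. The sketch below illustrates those rules in isolation. optional_sbatch_lines is a hypothetical helper, not the generator's code; the exact --gpus value format (type:count) and the use of --mail-type=FAIL are assumptions made for the example.

```python
from typing import List, Optional


def optional_sbatch_lines(
    qos: str = "normal",
    account: str = "",
    memory: Optional[str] = "3G",
    email: str = "",
    gpus: int = 1,
    gpu_type: Optional[str] = None,
    gpu_directive_style: str = "gres",
) -> List[str]:
    """Return only the SBATCH lines whose fields are set (illustrative sketch)."""
    lines = []
    if qos:  # "" omits --qos entirely
        lines.append(f"#SBATCH --qos={qos}")
    if account:  # "" omits --account entirely
        lines.append(f"#SBATCH --account={account}")
    if memory is not None:  # None omits --mem entirely
        lines.append(f"#SBATCH --mem={memory}")
    if gpu_directive_style == "gpus":
        # Assumed format: "<type>:<count>" when a GPU type is given.
        target = f"{gpu_type}:{gpus}" if gpu_type else str(gpus)
        lines.append(f"#SBATCH --gpus={target}")
    else:
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    if email:  # "" omits both mail directives
        lines.append("#SBATCH --mail-type=FAIL")  # assumed mail type
        lines.append(f"#SBATCH --mail-user={email}")
    return lines
```

With the defaults this yields the --qos, --mem, and --gres lines. A Bridges2-like call with empty qos and account, memory=None, and gpu_directive_style="gpus" yields a single --gpus line.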

class polyzymd.workflow.slurm.JobContext(job_name, output_file, scratch_dir, projects_dir='.', segment_index=0, replicate_num=1, extra_vars=<factory>)[source]

Bases: object

Context for job script template rendering.

job_name

SLURM job name.

Type:: str

output_file

Output file pattern (for SLURM logs).

Type:: str

scratch_dir

Directory for simulation output (trajectories, checkpoints).

Type:: str

projects_dir

Directory for scripts and logs.

Type:: str

segment_index

Current segment index.

Type:: int

replicate_num

Replicate number.

Type:: int

extra_vars

Additional template variables.

Type:: Dict

job_name: str

output_file: str

scratch_dir: str

projects_dir: str = '.'

segment_index: int = 0

replicate_num: int = 1

extra_vars: Dict

__init__(job_name, output_file, scratch_dir, projects_dir='.', segment_index=0, replicate_num=1, extra_vars=<factory>)

class polyzymd.workflow.slurm.SlurmScriptGenerator(config, pixi_env='cuda-12-4', openff_logs=False, skip_build=False)[source]

Bases: object

Generator for SLURM batch scripts.

Supports separate directories for:

projects_dir: Where scripts live and jobs are submitted from
scratch_dir: Where simulation output goes (trajectories, checkpoints)

Example

>>> config = SlurmConfig.from_preset("aa100", email="user@example.com")
>>> generator = SlurmScriptGenerator(config)
>>> script = generator.generate_job_script(
...     config_path="/projects/user/config.yaml",
...     replicate=1,
...     working_dir="/scratch/user/sim_output",
... )

JOB_TEMPLATE = '#!/bin/bash\n#SBATCH --partition={{ partition }}\n#SBATCH --job-name={{ job_name }}\n#SBATCH --output={{ output_file }}\n{{ qos_line }}\n{{ nodes_line }}\n{{ cpus_line }}\n{{ mem_line }}\n#SBATCH --time={{ time_limit }}\n{{ gpu_line }}\n{{ mail_line }}\n{{ account_line }}\n{{ exclude_line }}\n{{ nodelist_line }}\n{{ constraint_line }}\n#SBATCH --signal=B:USR1@300\n#SBATCH --no-requeue\n\n# =============================================================================\n# PolyzyMD Self-Resubmitting Simulation Job\n# {{ FULL_CREDIT_LINE }}\n# Generated by polyzymd — do not edit manually\n# =============================================================================\n\n# Activate pixi environment\n# The manifest path was resolved at submission time from `which polyzymd`.\neval "$(pixi shell-hook -e {{ pixi_env | shell_quote }} --manifest-path {{ manifest_path | shell_quote }})"\n\n# Enable strict error handling after environment setup\nset -e\n\n# Required for OpenFF Interchange.combine() functionality\nexport INTERCHANGE_EXPERIMENTAL=1\n\n# Resolve this script\'s path for self-resubmission.\n# $SLURM_JOB_SCRIPT is only available in SLURM >= 22.05; fall back to $0.\nTHIS_SCRIPT="${SLURM_JOB_SCRIPT:-$(realpath "$0")}"\n\n# Configuration\nCONFIG_PATH="{{ config_path }}"\nREPLICATE={{ replicate }}\nWORKING_DIR="{{ working_dir }}"\n\n# Ensure working directory exists\nmkdir -p "$WORKING_DIR"\n\necho "=================================================="\necho "PolyzyMD self-resubmitting job"\necho "{{ FULL_CREDIT_LINE }}"\necho "Config: $CONFIG_PATH"\necho "Replicate: $REPLICATE"\necho "Work dir: $WORKING_DIR"\necho "Pixi env: {{ pixi_env }}"\necho "Job ID: ${SLURM_JOB_ID:-local}"\necho "Timestamp: $(date)"\necho "=================================================="\n\n# =========================================================================\n# Signal forwarding: SLURM sends signals to the batch shell, not to child\n# processes. 
We trap SIGUSR1 (wall-time warning) and SIGTERM (preemption)\n# and forward them to the Python process running in the background.\n# =========================================================================\nCHILD_PID=""\nforward_signal() {\n if [ -n "$CHILD_PID" ] && kill -0 "$CHILD_PID" 2>/dev/null; then\n echo "Forwarding $1 to Python process (PID $CHILD_PID)"\n kill -"$1" "$CHILD_PID"\n fi\n}\ntrap \'forward_signal USR1\' USR1\ntrap \'forward_signal TERM\' TERM\n\n# Run the next segment (backgrounded for signal forwarding)\npolyzymd{{ openff_logs_flag }} run-segment \\\n -c "$CONFIG_PATH" \\\n -r "$REPLICATE" \\\n --scratch-dir "$WORKING_DIR"{{ skip_build_flag }} &\nCHILD_PID=$!\n\n# Wait for the child; \'wait\' is interrupted by trapped signals, so loop\n# until the child actually exits. Temporarily disable \'set -e\' so we can\n# capture non-zero exit codes (e.g. 99 for graceful shutdown) without the\n# shell exiting prematurely.\nset +e\nwait "$CHILD_PID" 2>/dev/null\nRC=$?\nwhile kill -0 "$CHILD_PID" 2>/dev/null; do\n wait "$CHILD_PID" 2>/dev/null\n RC=$?\ndone\nset -e\n\necho "run-segment exited with code $RC at $(date)"\n\n# =========================================================================\n# Resubmission logic\n# =========================================================================\nif [ $RC -eq 2 ]; then\n echo "CONCURRENT: Another job is already running this replicate — NOT resubmitting."\n echo "This duplicate job chain will now terminate cleanly."\n exit 0\nfi\n\nif [ $RC -ne 0 ] && [ $RC -ne 99 ]; then\n echo "FATAL: run-segment failed (exit code $RC) — NOT resubmitting"\n exit $RC\nfi\n\n# Check whether more work remains\nset +e\npolyzymd check-progress -c "$CONFIG_PATH" -r "$REPLICATE" --scratch-dir "$WORKING_DIR"\nPROGRESS_RC=$?\nset -e\n\nif [ $PROGRESS_RC -eq 0 ]; then\n echo "Simulation complete — no resubmission needed."\n exit 0\nfi\n\nif [ $PROGRESS_RC -ne 1 ]; then\n echo "FATAL: check-progress failed (exit code $PROGRESS_RC) — 
NOT resubmitting"\n exit $PROGRESS_RC\nfi\n\n# Work remains (exit code 1) — resubmit this same script\necho "Work remains — resubmitting job..."\nset +e\nsbatch "$THIS_SCRIPT"\nSUBMIT_RC=$?\nset -e\n\nif [ $SUBMIT_RC -eq 0 ]; then\n echo "Resubmitted successfully."\nelse\n echo "WARNING: sbatch resubmission failed (exit code $SUBMIT_RC)"\n echo "You can manually resume with:"\n echo " sbatch $THIS_SCRIPT"\n exit 1\nfi\n\nexit 0\n'
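
The resubmission logic at the end of JOB_TEMPLATE reduces to a small decision table over two exit codes. The sketch below restates that table in Python purely for illustration. next_action and its return labels are hypothetical names; the exit-code meanings (2 for a concurrent job, 99 for graceful shutdown, and 0 or 1 from check-progress) are taken from the template text.

```python
from typing import Optional


def next_action(run_rc: int, progress_rc: Optional[int] = None) -> str:
    """Mirror the template's branching after run-segment and check-progress.

    run_rc is the exit code of polyzymd run-segment; progress_rc is the exit
    code of polyzymd check-progress, which the template consults only when
    run_rc is 0 or 99.
    """
    if run_rc == 2:
        return "stop_duplicate"  # another job owns this replicate; exit 0
    if run_rc not in (0, 99):
        return "fail"            # fatal run-segment error; no resubmission
    if progress_rc == 0:
        return "complete"        # simulation finished; exit 0
    if progress_rc == 1:
        return "resubmit"        # work remains; sbatch the same script
    return "fail"                # check-progress itself failed
```

Only the combination of a clean or graceful run-segment exit and a check-progress code of 1 leads to resubmission. Every other path ends the chain.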

__init__(config, pixi_env='cuda-12-4', openff_logs=False, skip_build=False)[source]

Initialize the generator.

Parameters:

config (SlurmConfig) – SLURM configuration.
pixi_env (str) – Pixi environment name (e.g. "cuda-12-4", "cuda-12-6").
openff_logs (bool) – Enable verbose OpenFF logs in generated scripts.
skip_build (bool) – Skip system building in generated scripts (use pre-built system).

property config: SlurmConfig: Get the SLURM configuration.

generate_job_script(config_path, replicate, working_dir, job_name=None, output_file=None)[source]

Generate a self-resubmitting SLURM job script.

This produces a single script that handles the entire simulation lifecycle. Each invocation calls polyzymd run-segment which determines what work remains, runs the next segment, and exits. The bash wrapper then checks progress and resubmits itself if more work is needed.

Parameters:

config_path (str) – Absolute path to the YAML configuration file.
replicate (int) – Replicate number.
working_dir (str) – Directory for simulation output (trajectories, checkpoints).
job_name (str or None, optional) – SLURM job name. Callers should use create_job_name() to produce descriptive names (e.g. r1_310K_Fibronectin_...). Falls back to pzmd_r{replicate} if not provided.
output_file (str or None, optional) – SLURM log file pattern. Falls back to slurm_logs/{job_name}.%j.out relative to the directory where sbatch is invoked.

Returns:

Complete SLURM batch script content.

Return type:

str

save_script(script_content, output_path, make_executable=True)[source]

Save a script to a file.

Parameters:

script_content (str) – Script content.
output_path (str | Path) – Output file path.
make_executable (bool) – Whether to make the script executable.

Returns:

Path to the saved script.

Return type:

Path
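
A minimal sketch of what saving an executable script involves, using only the standard library. write_script is a hypothetical stand-in, not the method's actual body; creating parent directories and adding execute permission for user, group, and others are assumptions.

```python
import stat
from pathlib import Path
from typing import Union


def write_script(
    script_content: str,
    output_path: Union[str, Path],
    make_executable: bool = True,
) -> Path:
    """Write script_content to output_path and optionally mark it executable."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # assumed convenience
    path.write_text(script_content)
    if make_executable:
        mode = path.stat().st_mode
        # Add execute bits on top of the existing permissions.
        path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```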

Analysis SLURM Orchestration

Replicate-level SLURM orchestration for analysis comparisons.

This module provides a shared DAG submission layer for analysis plugins:

one replicate worker per (condition, replicate)
one aggregate worker per condition
one finalizer worker per analysis comparison

The DAG parallelizes at the per-replicate compute-stage boundary, with one SLURM job per (condition, replicate) pair. This per-replicate worker is the analysis lifecycle’s atomic unit. Sub-replicate parallelism (for example, per-run work inside SASA-style calculations) is intentionally handled inside each plugin’s compute path. Plugins can use internal threading/multiprocessing for that finer-grained work when needed.
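
The three-level DAG described above can be sketched with a submit callback standing in for sbatch. plan_analysis_dag, dependency_flag, and the job names are hypothetical. Using afterok dependencies is an assumption, since this page does not state which SLURM dependency type the module uses.

```python
from typing import Callable, Dict, List, Tuple


def dependency_flag(job_ids: List[str]) -> str:
    """Build an sbatch dependency option (afterok is an assumed choice)."""
    return f"--dependency=afterok:{':'.join(job_ids)}" if job_ids else ""


def plan_analysis_dag(
    conditions: Dict[int, List[int]],
    submit: Callable[[str, List[str]], str],
) -> Tuple[Dict[Tuple[int, int], str], Dict[int, str], str]:
    """Submit replicate, aggregate, and finalizer jobs in dependency order.

    conditions maps a condition index to its replicate numbers; submit takes
    a job name and the job IDs it depends on, and returns the new job ID.
    """
    replicate_jobs: Dict[Tuple[int, int], str] = {}
    aggregator_jobs: Dict[int, str] = {}
    for condition, replicates in conditions.items():
        replicate_ids = []
        for replicate in replicates:
            job_id = submit(f"rep_c{condition}_r{replicate}", [])
            replicate_jobs[(condition, replicate)] = job_id
            replicate_ids.append(job_id)
        # One aggregate job per condition waits on that condition's replicates.
        aggregator_jobs[condition] = submit(f"agg_c{condition}", replicate_ids)
    # The single finalizer waits on every aggregate job.
    finalizer = submit("finalize", list(aggregator_jobs.values()))
    return replicate_jobs, aggregator_jobs, finalizer
```

The returned mappings have the same shape as the replicate_jobs, aggregator_jobs, and finalizer_job_id fields of SubmittedJobGraph.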

class polyzymd.workflow.analysis_slurm.AnalysisSlurmResources(*, pixi_path='pixi', partition=None, qos=None, account=None, ntasks=1, cpus_per_task=1, mem='4G', time='01:00:00', max_retries=3, mail_user=None, mail_type='FAIL')[source]

Bases: BaseModel

SLURM resource settings for analysis workers.

pixi_path: str

partition: str | None

qos: str | None

account: str | None

ntasks: int

cpus_per_task: int

mem: str

time: str

max_retries: int

mail_user: str | None

mail_type: str

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.ReplicateTaskSpec(*, condition_index, replicate, condition_label, condition_slug)[source]

Bases: BaseModel

Task spec for one replicate job.

condition_index: int

replicate: int

condition_label: str

condition_slug: str

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.ConditionTaskSpec(*, condition_index, condition_label, condition_slug, replicate_specs)[source]

Bases: BaseModel

Task spec for one condition aggregate job.

condition_index: int

condition_label: str

condition_slug: str

replicate_specs: list[ReplicateTaskSpec]

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.AnalysisJobManifest(*, analysis_name, comparison_yaml, condition_specs, settings_snapshot, snapshot_hash, pipeline_mode, partial_policy, equilibration, recompute, resources, created_at)[source]

Bases: BaseModel

Snapshot of inputs needed to run analysis workers.

analysis_name: str

comparison_yaml: str

condition_specs: list[ConditionTaskSpec]

settings_snapshot: dict[str, Any]

snapshot_hash: str

pipeline_mode: Literal['full', 'finalize_only']

partial_policy: Literal['strict', 'allow_partial']

equilibration: str

recompute: bool

resources: AnalysisSlurmResources

created_at: str

save(path)[source]

Save manifest as JSON.

classmethod load(path)[source]

Load manifest from JSON.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.SubmittedJobGraph(*, replicate_jobs, array_jobs=None, aggregator_jobs, finalizer_job_id)[source]

Bases: BaseModel

Submitted SLURM job IDs for analysis DAG nodes.

replicate_jobs: dict[tuple[int, int], str]

array_jobs: dict[str, str] | None

aggregator_jobs: dict[int, str]

finalizer_job_id: str

save(path)[source]

Save graph as JSON with portable keys.

classmethod load(path)[source]

Load graph from JSON with tuple/int key reconstruction.
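
JSON object keys must be strings, so the tuple keys of replicate_jobs need an explicit encoding before saving and a matching decoding on load. The sketch below shows one way to round-trip such a mapping. The "condition:replicate" key format and both function names are assumptions; the encoding actually used by save() and load() is not documented here.

```python
import json
from typing import Dict, Tuple


def dump_replicate_jobs(jobs: Dict[Tuple[int, int], str]) -> str:
    """Serialize {(condition, replicate): job_id} using "c:r" string keys."""
    portable = {f"{cond}:{rep}": job_id for (cond, rep), job_id in jobs.items()}
    return json.dumps(portable, sort_keys=True)


def load_replicate_jobs(text: str) -> Dict[Tuple[int, int], str]:
    """Rebuild tuple-of-int keys from the "c:r" string keys."""
    raw = json.loads(text)
    restored = {}
    for key, job_id in raw.items():
        cond_text, rep_text = key.split(":")
        restored[(int(cond_text), int(rep_text))] = job_id
    return restored
```

The same concern applies to integer-keyed mappings such as aggregator_jobs, since JSON turns integer keys into strings as well.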

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.workflow.analysis_slurm.TaskStatus(*, state, attempt_count=0, error_message=None, last_updated, slurm_job_id=None)[source]

Bases: BaseModel

Task status persisted by worker wrappers.

state: Literal['pending', 'running', 'succeeded', 'failed', 'retrying']

attempt_count: int

error_message: str | None

last_updated: str

slurm_job_id: str | None

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

polyzymd.workflow.analysis_slurm.compute_manifest_snapshot_hash(analysis_name, settings_snapshot, condition_specs, equilibration)[source]

Compute deterministic hash for manifest-sensitive comparison inputs.
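
A common way to obtain a deterministic hash of structured inputs is to serialize them canonically and digest the result. The sketch below assumes JSON with sorted keys and SHA-256. snapshot_hash is a hypothetical name, condition_specs is assumed to be already reduced to plain dictionaries, and the algorithm used by the real function is not documented here.

```python
import hashlib
import json
from typing import Any, Dict, List


def snapshot_hash(
    analysis_name: str,
    settings_snapshot: Dict[str, Any],
    condition_specs: List[Dict[str, Any]],
    equilibration: str,
) -> str:
    """Hash the comparison inputs so that key order does not affect the result."""
    payload = {
        "analysis_name": analysis_name,
        "settings_snapshot": settings_snapshot,
        "condition_specs": condition_specs,
        "equilibration": equilibration,
    }
    # sort_keys and fixed separators give one canonical byte string per input.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Recomputing this value from live inputs and comparing it with the stored snapshot_hash is one way to detect the configuration drift that validate_manifest_snapshot reports.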

polyzymd.workflow.analysis_slurm.validate_manifest_snapshot(manifest, analysis, config)[source]

Validate that live comparison inputs match the manifest snapshot.

Returns:: Prepared conditions, resolved equilibration, and analysis root.
Return type:: tuple[list[Any], str, Path]
Raises:: RuntimeError – If current config/plugin settings drift from the submitted manifest.

polyzymd.workflow.analysis_slurm.update_task_status(status_path, state, attempt_count, error_message=None)[source]

Atomically write a task status JSON file.
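
Atomic writes are usually done by writing to a temporary file in the same directory and then renaming it over the target. The sketch below follows that pattern under stated assumptions: write_status_atomically is a hypothetical name, and the payload fields are borrowed from TaskStatus. The real function's file layout may differ.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Union


def write_status_atomically(
    status_path: Union[str, Path],
    state: str,
    attempt_count: int,
    error_message: Optional[str] = None,
) -> Path:
    """Write a status JSON file so readers never observe a partial file."""
    path = Path(status_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "state": state,
        "attempt_count": attempt_count,
        "error_message": error_message,
        "last_updated": datetime.now(timezone.utc).isoformat(),
    }
    # The temporary file must live on the same filesystem as the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic rename over the target
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return path
```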

polyzymd.workflow.analysis_slurm.build_manifest(analysis, config, resources, recompute, equilibration, allow_partial=False)[source]

Build submission manifest from comparison config and plugin settings.

polyzymd.workflow.analysis_slurm.generate_replicate_script(manifest, task_spec, resources, hpc_dir)[source]

Generate a replicate worker script with automatic retries.

polyzymd.workflow.analysis_slurm.generate_aggregate_script(manifest, cond_spec, resources, hpc_dir)[source]

Generate an aggregate worker script with retries.

polyzymd.workflow.analysis_slurm.generate_array_script(cond_spec, manifest, resources, replicates, hpc_dir)[source]

Generate one array worker script for all replicates of a condition.

Parameters:

cond_spec (ConditionTaskSpec) – Condition task specification from the manifest.
manifest (AnalysisJobManifest) – Submission manifest used by worker commands.
resources (AnalysisSlurmResources) – SLURM resource settings used in script header.
replicates (list[int]) – Replicate IDs included in this array job.
hpc_dir (Path) – Root directory where scripts and logs are written.

Returns:

Generated executable script path.

Return type:

Path

polyzymd.workflow.analysis_slurm.generate_finalize_script(manifest, resources, hpc_dir)[source]

Generate the final compare+plot worker script with retries.

polyzymd.workflow.analysis_slurm.reconcile_status_with_slurm(hpc_dir)[source]

Reconcile local status files with live SLURM accounting state.

Parameters:: hpc_dir (Path) – Root HPC artifact directory for one analysis submission.
Returns:: Summary with checked file count, update count, and per-file changes.
Return type:: dict[str, Any]
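
Reconciliation needs some mapping from SLURM accounting states to the local TaskStatus states. The sketch below shows one possible mapping. local_state_for and the mapping itself are assumptions for illustration; the SLURM state names are standard sacct values.

```python
from typing import Optional

# Assumed mapping from sacct job states to TaskStatus.state values.
_SLURM_TO_LOCAL = {
    "PENDING": "pending",
    "RUNNING": "running",
    "COMPLETED": "succeeded",
    "FAILED": "failed",
    "TIMEOUT": "failed",
    "CANCELLED": "failed",
    "NODE_FAIL": "failed",
    "OUT_OF_MEMORY": "failed",
    "PREEMPTED": "failed",
}


def local_state_for(slurm_state: str) -> Optional[str]:
    """Translate an sacct State value; return None if it is not recognized."""
    parts = slurm_state.split()
    if not parts:
        return None
    # sacct may print "CANCELLED by <uid>"; keep only the first word.
    return _SLURM_TO_LOCAL.get(parts[0].upper())
```

Returning None for unrecognized states lets a caller leave the local status file unchanged rather than guess.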

polyzymd.workflow.analysis_slurm.submit_analysis_graph(manifest, resources, hpc_dir, root_dependencies=())[source]

Submit replicate, aggregate, and finalizer jobs with dependencies.

Parameters:

manifest (AnalysisJobManifest) – Submission manifest describing all condition and replicate tasks.
resources (AnalysisSlurmResources) – SLURM resource settings used for generated scripts.
hpc_dir (Path) – Root directory where scripts, logs, and submission metadata are stored.

Returns:

Graph of submitted replicate, aggregate, and finalizer job IDs.

Return type:

SubmittedJobGraph

Raises:

RuntimeError – Propagated if any sbatch submission fails.

polyzymd.workflow.analysis_slurm.submit_analysis_graph_with_arrays(manifest, resources, hpc_dir)[source]

Submit one array per condition plus aggregate and finalizer jobs.

Parameters:

manifest (AnalysisJobManifest) – Submission manifest describing condition and replicate tasks.
resources (AnalysisSlurmResources) – SLURM resource settings used for generated scripts.
hpc_dir (Path) – Root directory where scripts, logs, and submission metadata are stored.

Returns:

Graph of submitted array, aggregate, and finalizer job IDs.

Return type:

SubmittedJobGraph

Raises:

RuntimeError – Propagated if any sbatch submission fails.

polyzymd.workflow.analysis_slurm.read_analysis_status(hpc_dir)[source]

Read all status files for one analysis HPC run.