Run PolyzyMD on SLURM Clusters
Use this guide when you already have a working config.yaml and want the
shortest path to a reliable SLURM submission workflow.
PolyzyMD generates self-resubmitting job scripts. Each replicate runs one segment, checks whether more work remains, and resubmits itself when needed. That lets long simulations continue across wall-time limits without requiring manual dependency chains.
Before you start
- validate your config locally first
- know which pixi CUDA environment matches your cluster
- know which SLURM preset you want to use
If you are still setting up the project itself, start with Run Your First PolyzyMD Simulation.
Step 1: validate and dry-run locally
From the repository root or a subdirectory under it:
pixi run -e build polyzymd validate -c config.yaml
pixi run -e cuda-12-4 polyzymd submit -c config.yaml --preset aa100 --replicates 1 --dry-run
The dry run should create a script in job_scripts/ without submitting it.
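To confirm the dry run did what you expect, check that job_scripts/ now contains at least one file. The helper below is a minimal sketch: the function name is hypothetical, and it deliberately makes no assumption about how PolyzyMD names the generated scripts.

```shell
# Hypothetical helper, not part of PolyzyMD.
# Succeeds if the directory exists and holds at least one regular file.
has_generated_script() {
    local dir="${1:-job_scripts}"
    [ -d "$dir" ] || return 1
    # -print -quit stops at the first regular file found
    [ -n "$(find "$dir" -type f -print -quit)" ]
}
```

Calling `has_generated_script job_scripts && echo "script created"` right after the dry run gives a quick yes/no answer.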
Step 2: pick a preset
PolyzyMD includes presets for common clusters:
| Preset | Cluster style | Typical use |
|---|---|---|
| aa100 | CU Boulder Alpine A100 | main production runs |
|  | CU Boulder Alpine L40 | production runs on L40 nodes |
|  | CU Boulder Blanca | preemptable or lab-specific runs |
| testing | short queue | smoke tests only |
| bridges2 | PSC Bridges2 | Bridges2 GPU jobs |
Use testing first when you are verifying a new system or workflow.
Step 3: submit one small test job
Run a short job before launching many replicates:
pixi run -e cuda-12-4 polyzymd submit \
-c config.yaml \
--preset testing \
--time-limit 0:05:00 \
--replicates 1
This is the fastest way to catch bad paths, scheduler issues, or environment problems.
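Some of these problems can be caught before the job reaches the queue at all. The sketch below checks that the config file is readable and that the tools the workflow relies on are on PATH. The function name and messages are illustrative, not part of PolyzyMD, and checking for sbatch assumes submission ultimately goes through it.

```shell
# Hypothetical pre-flight check; not a PolyzyMD command.
preflight() {
    local config="$1" status=0 tool
    if [ ! -r "$config" ]; then
        echo "missing or unreadable config: $config"
        status=1
    fi
    # Assumption: the submit step needs both pixi and sbatch on PATH.
    for tool in pixi sbatch; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not on PATH: $tool"
            status=1
        fi
    done
    return "$status"
}
```

Running `preflight config.yaml` before the submit command turns a failure in the queue into an immediate one.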
Step 4: submit your real run
Once the short test succeeds, submit production jobs:
pixi run -e cuda-12-4 polyzymd submit \
-c config.yaml \
--preset aa100 \
--replicates 1-5 \
--email your.email@university.edu
Useful variants:
# Override storage locations
pixi run -e cuda-12-4 polyzymd submit \
-c config.yaml \
--preset aa100 \
--projects-dir /projects/$USER/polyzymd \
--scratch-dir /scratch/alpine/$USER/polyzymd_sims
# Give a larger system more RAM
pixi run -e cuda-12-4 polyzymd submit \
-c config.yaml \
--preset aa100 \
--memory 8G
Monitor jobs
Use normal SLURM tools for the scheduler view:
squeue -u $USER
scontrol show job <job_id>
tail -f slurm_logs/*.out
Use PolyzyMD for simulation progress:
pixi run -e cuda-12-4 polyzymd status -c config.yaml
pixi run -e cuda-12-4 polyzymd check-progress -c config.yaml -r 1
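If you want a script to block until a particular job has left the queue, for example before inspecting the result of a short test, a polling loop around squeue is enough. This sketch uses only standard SLURM options (-h suppresses the header, -j selects a job); the function name and the 30-second default interval are arbitrary choices.

```shell
# Poll squeue until the given job no longer appears in the queue.
wait_for_job() {
    local job_id="$1" interval="${2:-30}"
    # squeue prints nothing (or fails) once the job has left the queue
    while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "job $job_id is no longer queued or running"
}
```

This waits for a single job ID only. If a resubmitted segment receives a new job ID, use the PolyzyMD progress commands above to follow the simulation as a whole.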
Recover a stalled replicate
If a replicate stops progressing, inspect it first:
pixi run -e cuda-12-4 polyzymd recover -c config.yaml -r 1
If the report shows unfinished work, resubmit a recovery job:
pixi run -e cuda-12-4 polyzymd recover -c config.yaml -r 1 --submit --preset aa100
Cluster-specific note for Bridges2
Use the bridges2 preset when running on PSC Bridges2:
pixi run -e cuda-12-6 polyzymd submit \
-c config.yaml \
--preset bridges2 \
--account abc123_gpu \
--replicates 1-3
Common Bridges2 differences:
- it uses the cuda-12-6 environment
- you may need --account if you want to charge a specific allocation
- GPU selection can be adjusted with --gpu-type
What the generated scripts do
Each generated script follows the same loop:
1. activate the selected pixi environment
2. run polyzymd run-segment
3. call polyzymd check-progress
4. resubmit itself if work remains
That is why long runs can continue automatically after wall-time expiry or a graceful interruption.
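The control flow of that loop can be sketched in a few lines of shell. This is an illustration of the pattern, not the script PolyzyMD actually generates: the three steps are passed in as command names so that nothing here claims to match the real polyzymd arguments, and stopping the chain after a failed segment is a design choice of the sketch.

```shell
# Control-flow sketch only; all three arguments are caller-supplied commands.
run_one_segment() {
    local run_segment="$1" work_remains="$2" resubmit="$3"
    # A failed segment stops the chain instead of resubmitting forever.
    "$run_segment" || { echo "segment failed; not resubmitting"; return 1; }
    if "$work_remains"; then
        "$resubmit"                      # queue the next segment
    else
        echo "all segments complete"
    fi
}
```

Each job runs this once and exits, so no job ever needs to outlive its wall-time limit; the chain ends on its own when the progress check reports nothing left to do.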
Common fixes
pixi: command not found
Make sure pixi is on PATH in non-interactive shells, such as the one a batch job runs in, and not only in shells that load your login setup.
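One way to do this is to add the pixi directory to PATH explicitly in a place that batch shells read. The sketch below assumes pixi was installed in its default location, $HOME/.pixi/bin; adjust the directory if your installation differs. The helper name is hypothetical.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already there.
prepend_path_once() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumption: default pixi install location.
prepend_path_once "$HOME/.pixi/bin"
```

Guarding against duplicates keeps PATH from growing each time the file is sourced.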
job dies with OOM
Increase --memory, reduce system size, or test with fewer polymers.
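Before changing anything, it can help to confirm that memory really was the cause. SLURM accounting reports the final state and peak memory of each job step. The sketch below uses standard sacct options and fields; the function name is hypothetical, and it assumes job accounting is enabled on your cluster.

```shell
# Succeeds if any step of the job ended in the OUT_OF_MEMORY state.
job_hit_oom() {
    sacct -j "$1" --format=State --noheader --parsable2 | grep -q 'OUT_OF_MEMORY'
}

# Requested memory versus peak usage, per job step:
#   sacct -j <job_id> --format=JobID,State,ReqMem,MaxRSS
```

If MaxRSS is close to ReqMem, raising --memory is the likely fix.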
config path no longer exists
The generated script stores the config path it was given at submission time. If you move the config, regenerate the scripts and resubmit.
need to stop a job permanently
Because standard cancellation can trigger graceful restart behavior, use:
scancel --signal=KILL <job_id>
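The difference comes down to which signals a script can intercept. A termination signal such as TERM can be caught with a trap, so a script may react to it and carry on, whereas KILL cannot be trapped. The demonstration below is generic shell behavior, assuming the generated scripts handle termination with something like a trap; it does not reproduce their actual handler.

```shell
# Generic illustration of a trapped TERM; not taken from a PolyzyMD script.
demo_term_trap() {
    bash -c '
        trap "echo caught TERM" TERM   # the handler runs instead of exiting
        kill -TERM $$                  # send TERM to this shell
        echo "still running"           # reached because TERM was handled
    '
}
```

Replacing TERM with KILL in the `kill` line would end the inner shell immediately, and neither message would be printed.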