# Run PolyzyMD on SLURM Clusters

Use this guide when you already have a working `config.yaml` and want the shortest path to a reliable SLURM submission workflow.

PolyzyMD generates self-resubmitting job scripts. Each replicate runs one segment, checks whether more work remains, and resubmits itself when needed. That lets long simulations continue across wall-time limits without requiring manual dependency chains.

## Before you start

- validate your config locally first
- know which `pixi` CUDA environment matches your cluster
- know which SLURM preset you want to use

If you are still setting up the project itself, start with {doc}`quickstart`.

## Step 1: validate and dry-run locally

From the repository root or a subdirectory under it:

```bash
pixi run -e build polyzymd validate -c config.yaml
pixi run -e cuda-12-4 polyzymd submit -c config.yaml --preset aa100 --replicates 1 --dry-run
```

The dry run should create a script in `job_scripts/` without submitting it.

## Step 2: pick a preset

PolyzyMD includes presets for common clusters:

| Preset | Cluster style | Typical use |
|--------|---------------|-------------|
| `aa100` | CU Boulder Alpine A100 | main production runs |
| `al40` | CU Boulder Alpine L40 | production runs on L40 nodes |
| `blanca-shirts` | CU Boulder Blanca | preemptable or lab-specific runs |
| `testing` | short queue | smoke tests only |
| `bridges2` | PSC Bridges2 | Bridges2 GPU jobs |

Use `testing` first when you are verifying a new system or workflow.

## Step 3: submit one small test job

Run a short job before launching many replicates:

```bash
pixi run -e cuda-12-4 polyzymd submit \
  -c config.yaml \
  --preset testing \
  --time-limit 0:05:00 \
  --replicates 1
```

This is the fastest way to catch bad paths, scheduler issues, or environment problems.

## Step 4: submit your real run

Once the short test succeeds, submit production jobs:

```bash
pixi run -e cuda-12-4 polyzymd submit \
  -c config.yaml \
  --preset aa100 \
  --replicates 1-5 \
  --email your.email@university.edu
```

Useful variants:

```bash
# Override storage locations
pixi run -e cuda-12-4 polyzymd submit \
  -c config.yaml \
  --preset aa100 \
  --projects-dir /projects/$USER/polyzymd \
  --scratch-dir /scratch/alpine/$USER/polyzymd_sims

# Give a larger system more RAM
pixi run -e cuda-12-4 polyzymd submit \
  -c config.yaml \
  --preset aa100 \
  --memory 8G
```

## Monitor jobs

Use normal SLURM tools for the scheduler view:

```bash
squeue -u $USER
scontrol show job <job_id>
tail -f slurm_logs/*.out
```

Use PolyzyMD for simulation progress:

```bash
pixi run -e cuda-12-4 polyzymd status -c config.yaml
pixi run -e cuda-12-4 polyzymd check-progress -c config.yaml -r 1
```

## Recover a stalled replicate

If a replicate stops progressing, inspect it first:

```bash
pixi run -e cuda-12-4 polyzymd recover -c config.yaml -r 1
```

If the report shows unfinished work, resubmit a recovery job:

```bash
pixi run -e cuda-12-4 polyzymd recover -c config.yaml -r 1 --submit --preset aa100
```

## Cluster-specific note for Bridges2

Use the `bridges2` preset when running on PSC Bridges2:

```bash
pixi run -e cuda-12-6 polyzymd submit \
  -c config.yaml \
  --preset bridges2 \
  --account abc123_gpu \
  --replicates 1-3
```

Common Bridges2 differences:

- it uses the `cuda-12-6` environment
- you may need `--account` if you want to charge a specific allocation
- GPU selection can be adjusted with `--gpu-type`

## What the generated scripts do

Each generated script follows the same loop:

1. activate the selected `pixi` environment
2. run `polyzymd run-segment`
3. call `polyzymd check-progress`
4. resubmit itself if work remains

That is why long runs can continue automatically after wall-time expiry or a graceful interruption.

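
The loop above can be sketched as a small simulation that runs without SLURM. This is not the script PolyzyMD generates: `run_segment`, `work_remains`, `resubmit`, `job`, `TOTAL_SEGMENTS`, and `STATE_FILE` are hypothetical stand-ins chosen for illustration, and resubmission is simulated by calling the job function again where a real SLURM script would typically hand itself back to `sbatch`.

```bash
#!/usr/bin/env bash
# Sketch of a self-resubmitting segment loop (simulation only).
# Every name here is hypothetical; the real generated script may differ.
set -euo pipefail

TOTAL_SEGMENTS="${TOTAL_SEGMENTS:-3}"      # assumed number of segments to run
STATE_FILE="${STATE_FILE:-$(mktemp)}"      # stand-in for on-disk simulation state

# Read how many segments have finished so far (empty file counts as 0).
segments_done() {
    local count
    count=$(cat "$STATE_FILE")
    echo "${count:-0}"
}

# Stand-in for `polyzymd run-segment`: mark one more segment as finished.
run_segment() {
    local next
    next=$(( $(segments_done) + 1 ))
    echo "$next" > "$STATE_FILE"
    echo "segment $next of $TOTAL_SEGMENTS finished"
}

# Stand-in for `polyzymd check-progress`: succeeds while work remains.
work_remains() {
    [ "$(segments_done)" -lt "$TOTAL_SEGMENTS" ]
}

# Stand-in for resubmission; a real script would submit itself to SLURM here.
resubmit() {
    echo "work remains: resubmitting"
    job
}

# One "job": run a single segment, check progress, resubmit if needed.
job() {
    run_segment
    if work_remains; then
        resubmit
    else
        echo "all segments complete"
    fi
}

job
```

Because progress lives in a file rather than in the process, each invocation only needs to know how to run one segment and how to ask whether more remain, which is what lets the chain survive a wall-time limit.
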
## Common fixes

### `pixi: command not found`

Make sure `pixi` is available in non-interactive shells, not only in your login shell setup.

### job dies with OOM

Increase `--memory`, reduce system size, or test with fewer polymers.

### config path no longer exists

The generated script stores the config path it was given at submission time. If you move the config, regenerate the scripts and resubmit.

### need to stop a job permanently

Because standard cancellation can trigger graceful restart behavior, use:

```bash
scancel --signal=KILL <job_id>
```

## Related reference pages

- command details: {doc}`cli_reference`
- configuration fields: {doc}`configuration`
- first-run setup: {doc}`quickstart`