Architecture

This page explains how PolyzyMD is organized, why the major subsystems are separated, and where contributors should look when they need to extend a workflow.

The high-level shape of the project

PolyzyMD is organized around a simulation lifecycle:

load and validate configuration
build a molecular system
run simulation workflows locally or through SLURM
analyze trajectories
compare conditions and plot results

That lifecycle is reflected in the package layout:

src/polyzymd/
|- cli/
|- config/
|  `- comparison.py  # comparison config and plot settings
|- builders/
|- simulation/
|- workflow/
|- analyses/     # ★ plugin system, unified analysis lifecycle
|- exporters/
|- core/
`- utils/

What each area is responsible for

`cli/`

Defines the command-line interface and maps user commands onto the lower-level workflow code.

`config/`

Holds the schema and loading logic for YAML configuration. If a user-facing setting needs validation, this is usually the first place to inspect.
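
As a rough illustration of the kind of validation this layer performs, the sketch below checks an already-parsed configuration mapping with a plain dataclass. The `SimulationSettings` class, the `load_settings()` helper, and the field names are hypothetical and are not taken from PolyzyMD's actual schema, which may rely on a different validation library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationSettings:
    """Hypothetical settings block; field names are illustrative only."""

    temperature_kelvin: float
    n_replicates: int = 1

    def __post_init__(self) -> None:
        # Reject unusable values early, before any expensive
        # build or simulation step starts.
        if self.temperature_kelvin <= 0:
            raise ValueError("temperature_kelvin must be positive")
        if self.n_replicates < 1:
            raise ValueError("n_replicates must be at least 1")


def load_settings(raw: dict) -> SimulationSettings:
    """Build validated settings from an already-parsed YAML mapping."""
    unknown = set(raw) - set(SimulationSettings.__dataclass_fields__)
    if unknown:
        # Unknown keys are usually typos in the user's config file.
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return SimulationSettings(**raw)
```

Failing at load time keeps configuration mistakes from surfacing hours later inside a running job.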

`builders/`

Turns input structures into a simulation-ready system by assembling enzyme, substrate, polymer, and solvent components.

`simulation/`

Runs minimization, equilibration, continuation, checkpoints, and production segments.

`workflow/`

Handles orchestration around the simulation engine, especially SLURM job generation, resubmission, and recovery flows.

`analyses/`

The plugin system — the primary extension point for contributors. Each analysis plugin contains its own compute logic, aggregation, comparison, plotting, and formatting in a unified lifecycle: compute → aggregate → compare → plot → format.

To add a new analysis, create a package in `analyses/<name>/` containing a class that subclasses `Analysis`, or run `polyzymd new-analysis <name>` to scaffold one automatically. See Extending the Analysis Framework for the full guide.

Comparison infrastructure (distributed)

Comparison functionality is split across focused modules:

`config/comparison.py` for comparison config and plotting settings
`cli/compare.py` for `polyzymd compare` subcommands
`analyses/shared/inferential_statistics.py` for t-tests, ANOVA, and effect sizes
`analyses/shared/result_io.py` for result discovery and loading
`analyses/shared/paths.py` for label/path helpers such as `sanitize_label()`
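
To make the statistics layer concrete, here is a minimal standard-library sketch of two quantities a comparison might report: Welch's t statistic and Cohen's d. The function names `welch_t()` and `cohens_d()` are assumptions for illustration and do not reflect the actual contents of `inferential_statistics.py`.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Effect size: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic, which does not assume equal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error
```

Each input would typically hold one value per replicate for a single condition, so both functions need at least two replicates per condition.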

Established analysis plugins delegate plotting to `_plotters.py` modules within their package; the `plot()` method in `__init__.py` orchestrates what to plot.

`core/` and `utils/`

Provide shared infrastructure such as common types, experimental workflow labeling, and helper functionality that should not be duplicated across the package.

How data moves through the system

At a conceptual level, the flow looks like this:

config.yaml
  -> config schema
  -> system builders
  -> OpenMM-ready simulation objects
  -> local or SLURM execution
  -> analysis results on disk
  -> cross-condition comparisons
  -> plots and reports

This separation is intentional:

users can stop after building or running
analysis can be repeated without rebuilding simulations
comparison workflows can reuse cached analysis outputs
plotting can be rerun without recomputing the underlying statistics
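
The reuse described above can be sketched as a small load-or-compute helper. This is an illustration only: `load_or_compute()`, its `force` flag, and the JSON-on-disk format are assumptions, not PolyzyMD's actual caching code.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_compute(path: Path, compute: Callable[[], dict], force: bool = False) -> dict:
    """Return the cached JSON result at `path`, computing it only when needed."""
    if path.exists() and not force:
        # A previous stage already wrote this result; reuse it as-is.
        return json.loads(path.read_text())
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
    return result
```

With a helper like this, a plotting step reads the stored statistics, and only an explicit `force=True` triggers recomputation.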

Design patterns you will encounter

Lazy imports for heavy dependencies

Modules that depend on OpenMM or MDAnalysis often import those packages inside functions or methods instead of at module import time. This keeps lightweight CLI operations usable even when optional heavy dependencies are absent.
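
A minimal sketch of the pattern follows. The `require()` helper and the `count_atoms()` function are hypothetical names used for illustration; only `MDAnalysis.Universe` and its `atoms` attribute come from the real MDAnalysis API.

```python
import importlib
from types import ModuleType


def require(module_name: str, purpose: str) -> ModuleType:
    """Import a heavy optional dependency only at the point of use."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required to {purpose}; install it to use this feature"
        ) from exc


def count_atoms(topology_path: str) -> int:
    # MDAnalysis is imported here, not at module import time, so importing
    # this module stays cheap and works even when MDAnalysis is not installed.
    mda = require("MDAnalysis", "read trajectories")
    return len(mda.Universe(topology_path).atoms)
```

Importing a module written this way never touches MDAnalysis; the dependency is resolved only when `count_atoms()` is actually called.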

Plugin-based extension points

Analysis is the primary extensibility axis. New analysis types are added by creating a package in `analyses/<name>/` containing a class that subclasses `Analysis`. The framework discovers plugins automatically via `pkgutil`, so contributors do not need to maintain a registry, apply decorators, or add imports by hand. Use `polyzymd new-analysis <name>` to scaffold the package structure automatically.
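
One way such discovery could work is sketched below with `pkgutil.iter_modules()`. The `discover_subclasses()` function is a hypothetical illustration of the idea and is not PolyzyMD's actual discovery code.

```python
import importlib
import inspect
import pkgutil
from types import ModuleType


def discover_subclasses(package: ModuleType, base: type) -> dict[str, type]:
    """Import each direct submodule of `package` and collect subclasses of `base`."""
    found: dict[str, type] = {}
    prefix = package.__name__ + "."
    for info in pkgutil.iter_modules(package.__path__, prefix=prefix):
        module = importlib.import_module(info.name)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Skip the base class itself and classes merely re-exported
            # from elsewhere; keep subclasses defined in this module.
            if issubclass(obj, base) and obj is not base and obj.__module__ == module.__name__:
                found[info.name.rsplit(".", 1)[-1]] = obj
    return found
```

Because the scan walks the package directory, dropping a new subpackage into place is enough for it to be found on the next run.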

Separation between per-condition and cross-condition work

The unified `analyses/` lifecycle handles both scopes in one plugin contract. Each plugin computes per-replicate results with `compute_replicate()`, aggregates per-condition outputs with `aggregate()`, and then compares across conditions with `compare()` before generating plots with `plot()`. This keeps the full scientific workflow explicit while preserving clear lifecycle stages.
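
The staging can be sketched with a toy plugin. The method names match those above, but the signatures, the `MeanValue` plugin, and the `run()` driver are assumptions made for illustration; the real `Analysis` base class may look different, and the plot and format stages are omitted here.

```python
from abc import ABC, abstractmethod
from statistics import mean


class Analysis(ABC):
    """Hypothetical stand-in for the plugin base class; real signatures may differ."""

    @abstractmethod
    def compute_replicate(self, replicate: list[float]) -> float: ...

    @abstractmethod
    def aggregate(self, replicate_results: list[float]) -> dict: ...

    @abstractmethod
    def compare(self, conditions: dict[str, dict]) -> dict: ...


class MeanValue(Analysis):
    """Toy plugin: the per-replicate metric is simply the mean of a series."""

    def compute_replicate(self, replicate):
        return mean(replicate)

    def aggregate(self, replicate_results):
        return {"mean": mean(replicate_results), "n": len(replicate_results)}

    def compare(self, conditions):
        # Treat the first condition as the reference and report differences.
        (_, reference), *others = conditions.items()
        return {name: agg["mean"] - reference["mean"] for name, agg in others}


def run(analysis: Analysis, data: dict[str, list[list[float]]]) -> tuple[dict, dict]:
    """Drive the per-replicate, per-condition, and cross-condition stages in order."""
    aggregated = {
        condition: analysis.aggregate([analysis.compute_replicate(r) for r in replicates])
        for condition, replicates in data.items()
    }
    return aggregated, analysis.compare(aggregated)
```

Keeping each stage as its own method means a later stage can be rerun on stored outputs of an earlier one without repeating the expensive per-replicate work.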

Where contributors usually need to look

Goal	Start here
add or validate config fields	`src/polyzymd/config/`
change build behavior	`src/polyzymd/builders/`
change run or restart behavior	`src/polyzymd/simulation/` and `src/polyzymd/workflow/`
add an analysis type	`src/polyzymd/analyses/` (plugin package — subclass `Analysis` and implement `compute_replicate()` / `aggregate()`)
add comparison statistics	`src/polyzymd/analyses/shared/inferential_statistics.py`
add or change CLI commands	`src/polyzymd/cli/`

A practical mental model

If you are new to the codebase, it helps to think in layers:

config describes what should happen
builders and simulation make it happen for one system
workflow makes it practical on clusters
analysis plugins measure and compare what happened
comparison workflows interpret differences across conditions

That mental model is usually enough to find the right subsystem before you dive into module-level details or API reference pages.
