comparison.yaml Schema Reference
The comparison.yaml file defines a cross-condition analysis project. It
specifies which simulation conditions to compare, which analysis plugins to
run, and how to visualize results. Create one with polyzymd compare init -n <name> and place it at the root of your comparison project directory.
Source of truth: polyzymd.config.comparison.ComparisonConfig() in
src/polyzymd/config/comparison.py.
Important
Plugin settings path fields are resolved relative to the directory containing
comparison.yaml.
For example, in plugins.rmsf.reference_file, condition config paths, and other
plugin-declared path fields, a relative path like structures/enzyme.pdb is
interpreted as:
<comparison_yaml_parent>/structures/enzyme.pdb
For CLI commands that consume this file, see Comparison and Plotting Reference. For directory layout and data expectations, see Data Requirements & Directory Layout.
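As a sketch of the path resolution rule above, suppose comparison.yaml lives at a hypothetical /projects/study/comparison.yaml. The project directory and the PDB file are illustrative assumptions, and this is a fragment rather than a complete configuration: the setting that selects an external reference is left out to keep the focus on path handling.

```yaml
# Hypothetical location of this file: /projects/study/comparison.yaml
plugins:
  rmsf:
    # Relative path: resolved against the directory containing
    # comparison.yaml, so in this illustration it would point to
    # /projects/study/structures/enzyme.pdb regardless of the shell's
    # current working directory.
    reference_file: "structures/enzyme.pdb"
```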
Typical local workflow:
pixi run -e build polyzymd compare validate -f comparison.yaml
pixi run -e build polyzymd compare run rmsf -f comparison.yaml
pixi run -e build polyzymd compare plot-all -f comparison.yaml
Typical SLURM workflow:
pixi run -e build polyzymd compare submit sasa -f comparison.yaml --dry-run
pixi run -e build polyzymd compare submit sasa -f comparison.yaml --partition <part>
pixi run -e build polyzymd compare status sasa -f comparison.yaml
pixi run -e build polyzymd compare finalize sasa -f comparison.yaml
pixi run -e build polyzymd compare plot-all -f comparison.yaml
Minimal Working Example
name: "polymer_stability_study"
conditions:
  - label: "No Polymer"
    config: "../no_polymer/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../sbma_100/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
plugins:
  rmsf:
    selection: "protein and name CA"
Top-Level Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | - | Human-readable project name |
| | string | no | | Description of what is being compared |
| control | string | no | | Label of the control condition. Must match a condition label |
| conditions | list | yes | - | List of condition entries (min 1 required) |
| defaults | mapping | no | see below | Default analysis parameters |
| plugins | mapping | no | | Analysis plugin settings: what to compute |
| mda_backend_policy | mapping | no | | Optional MDAnalysis internal backend policy for job-backed analyses |
| plot_settings | mapping | no | see below | Plot customization: how to visualize |
Unknown top-level keys raise a ValueError listing the invalid keys and valid
alternatives. Use plugins: for analysis plugin settings; unsupported keys such
as analysis_settings: are rejected.
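The contrast between the rejected and accepted keys can be sketched as follows. The condition and selection values are borrowed from the minimal working example; the commented-out block shows the shape that would be rejected.

```yaml
name: "polymer_stability_study"
conditions:
  - label: "No Polymer"
    config: "../no_polymer/config.yaml"
    replicates: [1, 2, 3]

# Rejected: analysis_settings is not a supported top-level key.
# analysis_settings:
#   rmsf:
#     selection: "protein and name CA"

# Accepted: analysis plugin settings live under plugins.
plugins:
  rmsf:
    selection: "protein and name CA"
```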
conditions[*]
Each entry describes one simulation condition to include in the comparison.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| label | string | yes | - | Display name (must be unique across all conditions) |
| config | path | yes | - | Path to the simulation’s config.yaml |
| replicates | list of int | yes | - | Replicate numbers to include. A single |
defaults
| Field | Type | Default | Description |
|---|---|---|---|
| equilibration_time | string | | Time to discard as equilibration (e.g., |
| | float (0, 1] | | Significance threshold for pairwise comparisons and ANOVA. Used as the Benjamini-Hochberg FDR threshold when |
| | | | Post-hoc pairwise comparison method. See Post-Hoc Testing Reference for details. |
| | | | Two-sample t-test variance assumption. Only used when |
equilibration_time is interpreted as an absolute MDAnalysis trajectory
timestamp when the loaded trajectory exposes finite frame times. This handles
continuation runs where the first loaded segment may begin after 0 ps. If frame
timestamps are unavailable, PolyzyMD treats the first loaded frame as time zero.
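To make the two cases concrete, the annotated fragment below walks through a hypothetical continuation run. The 5 ns start time is invented for illustration, and how a frame stamped exactly at the cutoff is treated is not specified here.

```yaml
defaults:
  # Illustrative scenario (times are assumptions, not measured values):
  #   the loaded frames carry timestamps from 5 ns to 50 ns.
  #   With finite frame times, frames stamped earlier than 10 ns are
  #   treated as equilibration, so analysis effectively starts at the
  #   10 ns timestamp, not 10 ns after the first loaded frame (15 ns).
  # If the trajectory exposes no frame timestamps, the first loaded
  # frame counts as time zero and the cutoff is measured from there.
  equilibration_time: "10ns"
```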
mda_backend_policy
The default policy is empty and forwards no backend-related keyword arguments to MDAnalysis. This avoids nested oversubscription: PolyzyMD schedules work across conditions/replicates, while each replicate remains serial unless you explicitly opt into an MDAnalysis backend.
| Field | Type | Default | Description |
|---|---|---|---|
| backend | string | | Backend name forwarded to |
| n_workers | positive int | | Worker count forwarded only when |
| n_parts | positive int | | Optional partition count forwarded only when |
Example opt-in for local MDAnalysis internal parallelism:
mda_backend_policy:
  backend: "multiprocessing"
  n_workers: 2
  n_parts: 2
Function-adapter jobs generated by the simple scaffold reject non-default
backend policies; use an AnalysisBase-compatible job for MDAnalysis internal
parallelism.
plugins
Presence of a key enables that analysis. The value is a mapping of that
plugin’s settings. An empty mapping (rmsf: {}) enables the plugin with all
defaults.
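The two ways of enabling a plugin can be sketched side by side. The fragment uses two YAML documents separated by `---` purely to show alternatives; a real comparison.yaml would contain only one plugins block.

```yaml
# Variant A: enable RMSF with every setting left at its default.
plugins:
  rmsf: {}
---
# Variant B: enable RMSF and override a single setting.
plugins:
  rmsf:
    selection: "protein and name CA"
```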
plugins.rmsf
| Field | Type | Default | Description |
|---|---|---|---|
| selection | string | | MDAnalysis selection string for RMSF computation |
| | string | | Reference structure: |
| | int | | Required when |
| reference_file | path | | Path to external PDB reference structure. Required when |
| | string | | MDAnalysis selection used for trajectory alignment before RMSF calculation |
| | string | | MDAnalysis selection used to compute the centroid reference structure when |
plugins.secondary_structure
| Field | Type | Default | Description |
|---|---|---|---|
| | string | | Chain letter for the protein to analyze via DSSP |
plugins.sasa
| Field | Type | Default | Description |
|---|---|---|---|
| runs | list | (required) | List of SASA run definitions (see sub-fields) |
| | float | | MDTraj Shrake-Rupley probe radius in nanometers |
| | int | | Number of sphere points for MDTraj Shrake-Rupley SASA |
| | int | | Frames per chunk for memory management |

Each entry in runs:

| Field | Type | Default | Description |
|---|---|---|---|
| | string | (required) | Name for this SASA computation |
| | string | (required) | MDAnalysis selection for the target surface |
| | string | same as | Atoms to include in SASA context (affects shadowing) |
| | int | | Frame stride |
plugins.catalytic_triad
| Field | Type | Default | Description |
|---|---|---|---|
| | string | | Display name for the triad analysis |
| | string | | Optional description of the triad (e.g., |
| | float | | Distance threshold in Angstroms (H-bond cutoff) |
| pairs | list | (required) | List of atom pair definitions |

Each entry in pairs:

| Field | Type | Default | Description |
|---|---|---|---|
| | string | (required) | Display label (e.g., |
| | string | (required) | MDAnalysis selection for atom/group A. Supports |
| | string | (required) | MDAnalysis selection for atom/group B |
plugins.distances
| Field | Type | Default | Description |
|---|---|---|---|
| | float | | Global default threshold in Angstroms |
| pairs | list | (required) | List of distance pair definitions |
| | bool | | Apply periodic boundary conditions to distance calculations |
| | bool | | Align trajectory before computing distances |
| | string | | MDAnalysis selection used for trajectory alignment |
| | string | | Alignment reference mode: |
| | int | | Frame index to use as reference when |

Each entry in pairs:

| Field | Type | Default | Description |
|---|---|---|---|
| | string | (required) | Display label (e.g., |
| | string | (required) | MDAnalysis selection for group A. Supports |
| | string | (required) | MDAnalysis selection for group B |
| | float | global | Per-pair threshold override |
| | string | | Display text for d ≤ threshold |
| | string | | Display text for d > threshold |
plugins.contacts
| Field | Type | Default | Description |
|---|---|---|---|
| | string | | MDAnalysis selection for polymer atoms |
| | string | | MDAnalysis selection for protein atoms |
| | float | | Contact distance cutoff in Angstroms |
| | string | | Residue grouping: |
| | bool | | Whether to compute aggregate residence-time summaries and plots. When |
| | mapping | | Custom residue groups: |
| | mapping | | Mutually exclusive partitions for contact-fraction and residence-time plots: |
| | list of string | | Explicit polymer type labels. If |
| | float | | Per-plugin FDR threshold |
| | float | | Minimum Cohen’s d for practical significance |
| | int | | Max residues shown per condition in formatted output |
plugins.rmsd
| Field | Type | Default | Description |
|---|---|---|---|
| runs | list | (required) | List of RMSD run definitions |

Each entry in runs:

| Field | Type | Default | Description |
|---|---|---|---|
| | string | (required) | Name for this RMSD computation (e.g., |
| | string | (required) | MDAnalysis selection for RMSD atoms |
| | string | same as | MDAnalysis selection for alignment |
| | string | | Reference structure mode: |
| | int | | Frame index to use as reference when |
| | path | | Path to external PDB reference structure |
| | string | | MDAnalysis selection for centroid computation. If |
| | float | | Rolling window size in nanoseconds for convergence detection |
| | float | | Step size in nanoseconds between convergence windows |
| | float | | Maximum slope (Å/ns) for a window to be considered converged |
| | float | | Duration in nanoseconds that convergence must be sustained |
plugins.rg
| Field | Type | Default | Description |
|---|---|---|---|
| runs | list | (required) | List of Rg run definitions |

Each entry in runs:

| Field | Type | Default | Description |
|---|---|---|---|
| | string | (required) | Name for this Rg computation |
| | string | (required) | MDAnalysis selection for Rg atoms |
| | string | | Computation mode: |
| | string | | How to weight fragments when |
| | bool | | Save per-frame fragment Rg distributions |
| | int | | Number of bins for Rg distribution histograms |
plugins.hydrogen_bonds
| Field | Type | Default | Description |
|---|---|---|---|
| | mapping | | Named atom groups: |
| summaries | list or mapping | one default summary ( | Named H-bond summaries (see below) |
| | float | | H-bond distance cutoff in Angstroms |
| | float | | H-bond angle cutoff in degrees |
| | bool | | Update atom selections every frame |
| | int | | Number of top residue pairs to report |
| | bool | | Allow empty group selections: |
| | bool | | Whether overlapping composition partitions are allowed |
| composition | mapping | | Composition analysis settings |
| | float | | Override trajectory timestep in picoseconds for time-axis plots |
Time-axis plots assume uniformly saved frames. PolyzyMD converts frame index to
time as frame_index * timestep_ps; variable-timestep concatenated
trajectories are not supported.
Each summary entry in summaries has:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Unique summary name |
| | | exactly one of | Inter-group H-bonds |
| | | exactly one of | Intra-group H-bonds |
For mapping-form input, keys are treated as name values.
Hydrogen detection uses MDAnalysis HydrogenBondAnalysis with hydrogens
selected as (<group union>) and element H; topologies need explicit hydrogens
and usable element metadata.
composition sub-fields:
| Field | Type | Default | Description |
|---|---|---|---|
| | mapping | - | Named partitions: |
plot_settings
| Field | Type | Default | Description |
|---|---|---|---|
| | path | | Directory for generated plots (relative to |
| format | string | | Image format: |
| dpi | int | | Resolution for raster formats. Range: 50–600. |
| style | string | | PolyzyMD theme preset: |
| color_palette | string | | Seaborn/matplotlib color palette name |
| semantic_colors | mapping | disabled | Optional condition-label color and display-order rules for condition-series plots |
| theme | mapping | from style preset | Visual theme overrides (see below) |
style selects a PolyzyMD built-in theme preset for standard analysis plots. It
is not a matplotlib or seaborn stylesheet, and it does not control format,
dpi, per-analysis figure sizes, or color palettes.
theme values are merged on top of the selected preset, so you can choose a
base style and override only the fields that need project-specific changes.
plot_settings.semantic_colors
Semantic colors let a comparison project encode condition meaning directly in
figures. The settings are optional and disabled by default; when disabled,
plots keep using color_palette and each plotter’s existing category colors.
Semantic ordering is plot-only. It changes the display order of conditions in figures, but it does not mutate comparison statistics, rankings, cached artifacts, or JSON result files.
Top-level fields:
| Field | Type | Default | Description |
|---|---|---|---|
| | bool | | Opt in to semantic condition colors and plot ordering |
| | list of string | | Explicit plot display order by condition label. Labels not present keep their relative order after condition-level |
| manual_colors | mapping | | Direct color overrides by exact condition label. Highest precedence color rule. |
| conditions | mapping | | Per-condition semantic metadata keyed by exact condition label |
| families | mapping | | Family-level colormap rules keyed by family name |
| control_color | color | | Color used for the configured |
| missing_color | color | | Fallback color for conditions with incomplete semantic metadata |
| default_color | color or | | Fallback for labels missing from |
conditions.<label> fields:
| Field | Type | Default | Description |
|---|---|---|---|
| color | color or | | Direct color for this condition, after |
| | string or | | Semantic family name used to look up |
| | scalar or | | Numeric or ordinal value mapped through the family color rule |
| | int or | | Plot-only display order used after explicit |
| role | string or | | Optional semantic role. Use |
families.<family> fields:
| Field | Type | Default | Description |
|---|---|---|---|
| | string | | Matplotlib colormap name for values in this family |
| | | | Map numeric values continuously ( |
| | list | | Explicit value order for |
| | float or | | Lower bound for |
| | float or | | Upper bound for |
| | two floats | | Fractional colormap interval to sample, useful for avoiding colors that are too pale or too dark |
| | bool | | Reverse the value-to-colormap direction |
| value_colors | mapping | | Explicit color overrides by value. These override the family colormap for matching values. |
Color precedence for each condition label is:
1. semantic_colors.manual_colors.<label>
2. semantic_colors.conditions.<label>.color
3. semantic_colors.control_color when the label is the top-level control or the condition has role: control
4. families.<family>.value_colors.<value>
5. families.<family> colormap mapping
6. semantic_colors.missing_color for incomplete condition metadata
7. semantic_colors.default_color or the regular color_palette for labels missing from semantic_colors.conditions
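A minimal sketch of how the direct-color rules in this precedence order interact, reusing the condition labels from the minimal working example. The hex color strings are an assumption about accepted color values. The boolean opt-in field and the family-based rules are left out of this fragment, so it illustrates only manual colors, per-condition colors, the control color, and the two fallbacks.

```yaml
plot_settings:
  semantic_colors:
    # The boolean opt-in field from the top-level table must also be set
    # for these rules to take effect; it is omitted from this fragment.
    manual_colors:
      "100% SBMA": "#1f77b4"   # highest precedence for this label
    conditions:
      "100% SBMA":
        color: "#ff7f0e"       # lower precedence: manual_colors wins here
      "No Polymer":
        role: control          # no direct color, so control_color applies
    control_color: "#000000"
    missing_color: "#999999"   # used for incomplete condition metadata
    default_color: "#cccccc"   # used for labels absent from conditions
```

Under these assumptions, "100% SBMA" would be drawn with the manual color and "No Polymer" with the control color.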
Semantic colors apply to plots where colors represent comparison conditions. Non-condition categories, such as secondary-structure states or residue classes, may still use categorical palettes or plot-specific colormaps.
plot_settings.theme
All fields are optional. Defaults are drawn from the selected style preset,
then any values under theme: override individual fields.
Theme presets
| Preset | Use when | Notes |
|---|---|---|
| | You want the default compact print-style output. | Uses moderate fonts, replicate dots, bar edges, and reference lines. |
| | You need slides, posters, or high-visibility figures. | Increases font sizes, replicate dot size, bar line width, error-bar caps, reference-line width, and fill opacity. |
| | You want simpler, lower-ink plots. | Hides replicate dots, removes bar edges, and reduces reference-line width and fill opacity. |
Tweakable PlotTheme fields
| Field | | | | Description |
|---|---|---|---|---|
| | | | | Axes title font size |
| | | | | Figure suptitle font size |
| | | | | Axis label font size |
| | | | | Tick label font size |
| | | | | Legend entry font size |
| | | | | Heatmap annotation font size |
| | | | | Secondary annotation font size |
| | | | | Fine-grained annotation font size |
| | | | | Bar fill opacity |
| | | | | Bar edge color |
| | | | | Bar edge line width |
| | | | | Error bar cap size in points |
| | | | | Scatter marker size for replicate dots |
| | | | | Replicate dot opacity |
| | | | | Replicate dot color |
| | | | | Line plot opacity |
| | | | | |
| | | | | Reference line color |
| | | | | Reference line style |
| | | | | Reference line width |
| | | | | Vertical highlight line opacity |
| | | | | Hide top axis spine |
| | | | | Hide right axis spine |
| | | | | Title font weight |
| | | | | Matplotlib legend location |
| | | | | |
| | | | | Render the “Made by PolyzyMD” watermark |
Per-Analysis Plot Settings
Per-analysis plot customization keys go under plot_settings: at the same
level as style, dpi, etc.
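The nesting described above can be sketched as follows. The dpi value is illustrative, and the empty mappings are placeholders for the per-analysis fields listed in the tables below; whether an empty mapping is accepted at this level is an assumption.

```yaml
plot_settings:
  dpi: 300          # general setting (documented range: 50-600)
  rmsf: {}          # per-analysis settings sit beside dpi and style
  contacts: {}      # one mapping per analysis, named after the plugin
```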
plot_settings.rmsf:
| Field | Default | Description |
|---|---|---|
| | | Show SEM fill_between bands |
| | | Residue IDs for vertical reference lines |
| | | Per-residue profile figure size |
| | | Bar comparison figure size |
plot_settings.catalytic_triad:
| Field | Default | Description |
|---|---|---|
| | | Multi-row KDE panel |
| | | Threshold bar chart |
| | | 2D joint KDE |
| | | X-axis range for KDE (Angstroms) |
plot_settings.distances:
| Field | Default | Description |
|---|---|---|
| | | Threshold line on distributions |
| | | KDE vs histogram |
| | | Above/below threshold bars |
plot_settings.contacts:
| Field | Default | Description |
|---|---|---|
| | | Per-residue contact fraction profile |
| | | Per-residue residence time profile |
| | | Contact fraction by amino acid class bar chart |
| | | Contact fraction by user partition bar charts |
| | | Residence time by amino acid class bar chart |
| | | Residence time by user partition bar charts |
plot_settings.secondary_structure:
| Field | Default | Description |
|---|---|---|
| | | Residue × time SS heatmap |
| | | Helix/strand/coil fraction bars |
| | | One bar chart per SS type |
| | | Δ(helix persistence) vs control |
| | | Diverging colormap for diff heatmap |
Tip
Common tips:
- Run polyzymd compare validate to check your comparison.yaml for errors before launching a full analysis run.
- Relative paths in config: are resolved from the directory containing comparison.yaml, not from your working directory.
- An empty plugin mapping (e.g., rmsf: {}) enables the analysis with all default settings; you only need to specify fields you want to override.
- Set control: to match one of your condition labels to get Δ-from-control columns in comparison tables and plots.
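As a sketch of the last tip, the minimal working example can be extended with a control entry that matches one of the condition labels. Only the control line is new; everything else is taken from that example.

```yaml
name: "polymer_stability_study"
control: "No Polymer"        # matches the label of the first condition
conditions:
  - label: "No Polymer"
    config: "../no_polymer/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../sbma_100/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
plugins:
  rmsf:
    selection: "protein and name CA"
```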