`comparison.yaml` Schema Reference

The comparison.yaml file defines a cross-condition analysis project. It specifies which simulation conditions to compare, which analysis plugins to run, and how to visualize results. Create one with polyzymd compare init -n <name> and place it at the root of your comparison project directory.

Source of truth: polyzymd.config.comparison.ComparisonConfig() in src/polyzymd/config/comparison.py.

Important

Relative paths in comparison.yaml are resolved against the directory that contains comparison.yaml, not against your current working directory. This applies to condition config paths, plugins.rmsf.reference_file, and any other path field a plugin declares.

For example, a relative path such as structures/enzyme.pdb is interpreted as:

<comparison_yaml_parent>/structures/enzyme.pdb
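The rule can be sketched with pathlib. resolve_config_path is a hypothetical helper written for illustration, not PolyzyMD's own code, and it assumes that absolute paths are used unchanged, which this page does not state explicitly.

```python
from pathlib import Path


def resolve_config_path(comparison_yaml: str, value: str) -> Path:
    """Resolve a path field relative to the folder holding comparison.yaml.

    Hypothetical helper. Absolute paths are assumed to pass through unchanged.
    """
    path = Path(value)
    if path.is_absolute():
        return path
    # Anchor on comparison.yaml's parent directory, not on the working directory.
    return Path(comparison_yaml).parent / path


print(resolve_config_path("/projects/study/comparison.yaml", "structures/enzyme.pdb"))
```

On a POSIX system this prints /projects/study/structures/enzyme.pdb. A condition path such as ../no_polymer/config.yaml would become /projects/study/../no_polymer/config.yaml; calling .resolve() on the result collapses the "..".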

For CLI commands that consume this file, see Comparison and Plotting Reference. For directory layout and data expectations, see Data Requirements & Directory Layout.

Typical local workflow:

```bash
pixi run -e build polyzymd compare validate -f comparison.yaml
pixi run -e build polyzymd compare run rmsf -f comparison.yaml
pixi run -e build polyzymd compare plot-all -f comparison.yaml
```

Typical SLURM workflow:

```bash
pixi run -e build polyzymd compare submit sasa -f comparison.yaml --dry-run
pixi run -e build polyzymd compare submit sasa -f comparison.yaml --partition <part>
pixi run -e build polyzymd compare status sasa -f comparison.yaml
pixi run -e build polyzymd compare finalize sasa -f comparison.yaml
pixi run -e build polyzymd compare plot-all -f comparison.yaml
```

Minimal Working Example

```yaml
name: "polymer_stability_study"

conditions:
  - label: "No Polymer"
    config: "../no_polymer/config.yaml"
    replicates: [1, 2, 3]
  - label: "100% SBMA"
    config: "../sbma_100/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  rmsf:
    selection: "protein and name CA"
```

Top-Level Fields

Field	Type	Required	Default	Description
`name`	string	yes	—	Human-readable project name
`description`	string	no	`null`	Description of what is being compared
`control`	string	no	`null`	Label of the control condition. Must match a `label` in `conditions`. Used for relative comparisons (e.g., Δ from control).
`conditions`	list	yes	—	List of condition entries (min 1 required)
`defaults`	mapping	no	see below	Default analysis parameters
`plugins`	mapping	no	`{}`	Analysis plugin settings — what to compute
`mda_backend_policy`	mapping	no	`{}`	Optional MDAnalysis internal backend policy for job-backed analyses
`plot_settings`	mapping	no	see below	Plot customization — how to visualize

Unknown top-level keys raise a ValueError listing the invalid keys and valid alternatives. Use plugins: for analysis plugin settings; unsupported keys such as analysis_settings: are rejected.

`conditions[*]`

Each entry describes one simulation condition to include in the comparison.

Field	Type	Required	Default	Description
`label`	string	yes	—	Display name (must be unique across all conditions)
`config`	path	yes	—	Path to the simulation’s `config.yaml`. Relative paths resolved from `comparison.yaml` location.
`replicates`	list of int	yes	—	Replicate numbers to include. A single `int` is auto-wrapped to a list.
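The rules described above (required name and conditions, unique labels, a control that must match a label, int-to-list wrapping of replicates, and rejection of unknown top-level keys) can be sketched as plain-Python checks. check_comparison and ALLOWED_TOP_LEVEL are hypothetical names; the real validation lives in ComparisonConfig, and its error messages will differ.

```python
ALLOWED_TOP_LEVEL = {
    "name", "description", "control", "conditions",
    "defaults", "plugins", "mda_backend_policy", "plot_settings",
}


def check_comparison(raw: dict) -> dict:
    """Apply the documented top-level and conditions[*] rules (illustrative only)."""
    unknown = sorted(set(raw) - ALLOWED_TOP_LEVEL)
    if unknown:
        raise ValueError(f"unknown top-level keys {unknown}; valid keys: {sorted(ALLOWED_TOP_LEVEL)}")
    if "name" not in raw:
        raise ValueError("'name' is required")
    conditions = raw.get("conditions") or []
    if not conditions:
        raise ValueError("at least one condition is required")

    labels = [c["label"] for c in conditions]
    if len(labels) != len(set(labels)):
        raise ValueError(f"condition labels must be unique: {labels}")
    control = raw.get("control")
    if control is not None and control not in labels:
        raise ValueError(f"control {control!r} does not match any condition label")

    normalized = []
    for cond in conditions:
        cond = dict(cond)  # copy so the caller's mapping is not modified
        # A bare int such as `replicates: 1` is wrapped into a one-element list.
        if isinstance(cond["replicates"], int):
            cond["replicates"] = [cond["replicates"]]
        normalized.append(cond)
    return {**raw, "conditions": normalized}
```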

`defaults`

Field	Type	Default	Description
`equilibration_time`	string	`"10ns"`	Time to discard as equilibration (e.g., `"10ns"`, `"5000ps"`)
`fdr_alpha`	float (0, 1]	`0.05`	Significance threshold for pairwise comparisons and ANOVA. Used as the Benjamini-Hochberg FDR threshold when `posthoc_method` is `"ttest_bh"`, and as the family-wise alpha threshold when `posthoc_method` is `"tukey_hsd"`.
`posthoc_method`	`"ttest_bh"` or `"tukey_hsd"`	`"ttest_bh"`	Post-hoc pairwise comparison method. See Post-Hoc Testing Reference for details.
`ttest_method`	`"student"` or `"welch"`	`"student"`	Two-sample t-test variance assumption. Only used when `posthoc_method` is `"ttest_bh"`.
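When posthoc_method is "ttest_bh", fdr_alpha plays the role of the false discovery rate in the Benjamini-Hochberg procedure. The sketch below is the textbook step-up rule, not code taken from PolyzyMD, which may report adjusted p-values rather than reject flags; benjamini_hochberg is a hypothetical name.

```python
def benjamini_hochberg(p_values: list, alpha: float = 0.05) -> list:
    """Textbook Benjamini-Hochberg step-up procedure.

    Returns one flag per p-value (in input order): True if that hypothesis
    is rejected at false discovery rate `alpha`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= k / m * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.03, 0.02, 0.20], alpha=0.05))
```

Here the three smallest p-values are rejected, even though 0.03 would not survive a Bonferroni cutoff of 0.05 / 4 = 0.0125.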

equilibration_time is interpreted as an absolute MDAnalysis trajectory timestamp when the loaded trajectory exposes finite frame times. This handles continuation runs where the first loaded segment may begin after 0 ps. If frame timestamps are unavailable, PolyzyMD treats the first loaded frame as time zero.
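A minimal sketch of that cutoff logic, assuming only the "ps" and "ns" suffixes shown on this page and representing "timestamps unavailable" as non-finite values. parse_time_ps and first_production_frame are hypothetical helpers, not PolyzyMD functions.

```python
import math
import re

# Assumed unit table: only the "ps" and "ns" suffixes used on this page.
_PS_PER_UNIT = {"ps": 1.0, "ns": 1000.0}


def parse_time_ps(text: str) -> float:
    """Convert strings such as "10ns" or "5000ps" to picoseconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ps|ns)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse time {text!r}")
    number, unit = match.groups()
    return float(number) * _PS_PER_UNIT[unit]


def first_production_frame(equilibration: str, frame_times_ps: list, timestep_ps: float) -> int:
    """Index of the first frame at or after the equilibration cutoff.

    Frame times are treated as absolute timestamps. If any are non-finite
    (standing in for "timestamps unavailable"), frame 0 is treated as t = 0
    and times are rebuilt from a uniform timestep.
    """
    cutoff = parse_time_ps(equilibration)
    times = list(frame_times_ps)
    if not all(math.isfinite(t) for t in times):
        times = [i * timestep_ps for i in range(len(times))]
    for index, t in enumerate(times):
        if t >= cutoff:
            return index
    return len(times)  # the whole trajectory falls inside equilibration
```

For a continuation segment whose frames start at 8000 ps, first_production_frame("10ns", [8000.0, 9000.0, 10000.0, 11000.0], 1000.0) returns 2, so only the two frames before 10 ns are discarded rather than the first 10 ns of the segment.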

`mda_backend_policy`

The default policy is empty and forwards no backend-related keyword arguments to MDAnalysis. This avoids nested oversubscription: PolyzyMD schedules work across conditions/replicates, while each replicate remains serial unless you explicitly opt into an MDAnalysis backend.

Field	Type	Default	Description
`backend`	string	`null`	Backend name forwarded to `AnalysisBase.run()`, such as `"multiprocessing"` or `"dask"`
`n_workers`	positive int	`null`	Worker count forwarded only when `backend` is set
`n_parts`	positive int	`null`	Optional partition count forwarded only when `backend` is set

Example opt-in for local MDAnalysis internal parallelism:

```yaml
mda_backend_policy:
  backend: "multiprocessing"
  n_workers: 2
  n_parts: 2
```

Function-adapter jobs generated by the simple scaffold reject non-default backend policies; use an AnalysisBase-compatible job for MDAnalysis internal parallelism.
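Recent MDAnalysis releases (2.8 and later) accept backend, n_workers and n_parts keyword arguments in AnalysisBase.run(). The sketch shows how a policy like the one above could be turned into those keyword arguments while honoring the "forwarded only when backend is set" rule; backend_run_kwargs is a hypothetical helper, not a PolyzyMD function.

```python
def backend_run_kwargs(policy: dict) -> dict:
    """Build keyword arguments for AnalysisBase.run() from mda_backend_policy.

    Hypothetical helper: nothing is forwarded unless `backend` is set, and
    n_workers / n_parts are forwarded only alongside it.
    """
    backend = policy.get("backend")
    if backend is None:
        return {}  # default: serial per-replicate runs, no backend kwargs
    kwargs = {"backend": backend}
    for key in ("n_workers", "n_parts"):
        value = policy.get(key)
        if value is not None:
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{key} must be a positive int, got {value!r}")
            kwargs[key] = value
    return kwargs
```

A job could then call analysis.run(**backend_run_kwargs(policy)); with the default empty policy this is the same as a plain analysis.run().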

`plugins`

Presence of a key enables that analysis. The value is a mapping of that plugin’s settings. An empty mapping (rmsf: {}) enables the plugin with all defaults.

`plugins.rmsf`

Field	Type	Default	Description
`selection`	string	`"protein and name CA"`	MDAnalysis selection string for RMSF computation
`reference_mode`	string	`"centroid"`	Reference structure: `"centroid"`, `"average"`, `"frame"`, or `"external"`
`reference_frame`	int	`null`	Required when `reference_mode` is `"frame"`
`reference_file`	path	`null`	Path to external PDB reference structure. Required when `reference_mode` is `"external"`. Also used for secondary structure annotation on profile plots.
`alignment_selection`	string	`"protein and name CA"`	MDAnalysis selection used for trajectory alignment before RMSF calculation
`centroid_selection`	string	`"protein"`	MDAnalysis selection used to compute the centroid reference structure when `reference_mode` is `"centroid"`
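Because an empty mapping (rmsf: {}) enables the plugin with defaults, the effective settings are the documented defaults overlaid with whatever you specify. The sketch illustrates that merge and the mode-specific requirements from the table; rmsf_settings and RMSF_DEFAULTS are hypothetical names, and PolyzyMD's own settings model performs the real checks.

```python
from typing import Optional

RMSF_DEFAULTS = {
    "selection": "protein and name CA",
    "reference_mode": "centroid",
    "reference_frame": None,
    "reference_file": None,
    "alignment_selection": "protein and name CA",
    "centroid_selection": "protein",
}
VALID_REFERENCE_MODES = {"centroid", "average", "frame", "external"}


def rmsf_settings(user: Optional[dict]) -> dict:
    """Merge user settings over the defaults and check mode-specific requirements."""
    settings = {**RMSF_DEFAULTS, **(user or {})}
    mode = settings["reference_mode"]
    if mode not in VALID_REFERENCE_MODES:
        raise ValueError(f"unknown reference_mode {mode!r}")
    if mode == "frame" and settings["reference_frame"] is None:
        raise ValueError("reference_frame is required when reference_mode is 'frame'")
    if mode == "external" and settings["reference_file"] is None:
        raise ValueError("reference_file is required when reference_mode is 'external'")
    return settings
```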

`plugins.secondary_structure`

Field	Type	Default	Description
`chain_id`	string	`"A"`	Chain letter for the protein to analyze via DSSP

`plugins.sasa`

Field	Type	Default	Description
`runs`	list	(required)	List of SASA run definitions (see sub-fields)
`probe_radius_nm`	float	`0.14`	MDTraj Shrake-Rupley probe radius in nanometers
`n_sphere_points`	int	`960`	Number of sphere points for MDTraj Shrake-Rupley SASA
`chunk_size`	int	`100`	Frames per chunk for memory management

Each entry in runs:

Field	Type	Default	Description
`label`	string	(required)	Name for this SASA computation
`target_selection`	string	(required)	MDAnalysis selection for the target surface
`context_selection`	string	same as `target_selection`	Atoms to include in SASA context (affects shadowing)
`stride`	int	`1`	Frame stride
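In MDTraj, probe_radius_nm and n_sphere_points correspond to the probe_radius and n_sphere_points arguments of mdtraj.shrake_rupley, whose defaults are also 0.14 nm and 960. The sketch below fills in the per-run defaults and splits frames into chunks; it assumes chunk_size counts the frames actually analyzed after stride is applied, and both helper names are hypothetical.

```python
def normalize_sasa_run(run: dict) -> dict:
    """Fill the documented per-run defaults (hypothetical helper)."""
    out = {"stride": 1, **run}
    # context_selection defaults to the target, i.e. no extra shadowing atoms.
    out.setdefault("context_selection", out["target_selection"])
    return out


def frame_chunks(n_frames: int, chunk_size: int = 100, stride: int = 1):
    """Yield lists of frame indices with at most chunk_size frames per chunk.

    Assumption: the stride is applied first, then the strided frames are chunked.
    """
    frames = list(range(0, n_frames, stride))
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]
```

With 250 frames and stride 2, this yields 125 analyzed frames split into chunks of 100 and 25, so only one chunk's coordinates need to be in memory at a time.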

`plugins.catalytic_triad`

Field	Type	Default	Description
`name`	string	`"catalytic_triad"`	Display name for the triad analysis
`description`	string	`null`	Optional description of the triad (e.g., `"Ser-His-Asp catalytic triad"`)
`threshold`	float	`3.5`	Distance threshold in Angstroms (H-bond cutoff)
`pairs`	list	(required)	List of atom pair definitions

Each entry in pairs:

Field	Type	Default	Description
`label`	string	(required)	Display label (e.g., `"Asp-His"`)
`selection_a`	string	(required)	MDAnalysis selection for atom/group A. Supports `midpoint(...)` syntax.
`selection_b`	string	(required)	MDAnalysis selection for atom/group B
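A minimal sketch of a per-frame triad distance, assuming midpoint(...) stands for the geometric center of the atoms it wraps (for example the two carboxylate oxygens of Asp) and that a frame counts as within the cutoff when d ≤ threshold. midpoint and fraction_within are hypothetical helpers; PolyzyMD evaluates these selections through MDAnalysis.

```python
import math


def midpoint(points):
    """Geometric center of a group of (x, y, z) coordinates, in Angstroms."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def fraction_within(distances, threshold=3.5):
    """Fraction of frames whose distance is at or below the H-bond cutoff."""
    return sum(d <= threshold for d in distances) / len(distances)


# One frame: Asp OD1/OD2 midpoint versus a His ND1 position (made-up coordinates).
asp_mid = midpoint([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
print(math.dist(asp_mid, (1.0, 3.0, 0.0)))  # 3.0
```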

`plugins.distances`

Field	Type	Default	Description
`threshold`	float	`3.5`	Global default threshold in Angstroms
`pairs`	list	(required)	List of distance pair definitions
`use_pbc`	bool	`true`	Apply periodic boundary conditions to distance calculations
`align_trajectory`	bool	`true`	Align trajectory before computing distances
`alignment_selection`	string	`"protein and name CA"`	MDAnalysis selection used for trajectory alignment
`alignment_mode`	string	`"centroid"`	Alignment reference mode: `"centroid"` or `"frame"`
`alignment_frame`	int	`null`	Frame index to use as reference when `alignment_mode` is `"frame"`

Each entry in pairs:

Field	Type	Default	Description
`label`	string	(required)	Display label (e.g., `"Ser77-Substrate"`)
`selection_a`	string	(required)	MDAnalysis selection for group A. Supports `com(...)` syntax.
`selection_b`	string	(required)	MDAnalysis selection for group B
`threshold`	float	global `threshold`	Per-pair threshold override
`below_label`	string	`"Below {threshold}Å"`	Display text for d ≤ threshold
`above_label`	string	`"Above {threshold}Å"`	Display text for d > threshold
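With use_pbc enabled, distances follow the minimum-image convention. The sketch shows that convention for an orthorhombic box only, plus the below/above labeling; it assumes the {threshold} placeholder in the labels is filled with the pair's threshold. Both helpers are hypothetical; for general (triclinic) boxes, MDAnalysis.lib.distances.distance_array accepts a box argument that handles the wrapping.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic box (Angstroms).

    Each component of the separation is shifted by whole box lengths so it
    lies within half a box length; triclinic boxes need more general code.
    """
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        d = bi - ai
        d -= length * round(d / length)
        total += d * d
    return math.sqrt(total)


def state_label(distance, threshold=3.5,
                below="Below {threshold}Å", above="Above {threshold}Å"):
    """Pick the display text (d <= threshold is 'below') and fill in {threshold}."""
    template = below if distance <= threshold else above
    return template.format(threshold=threshold)
```

Two atoms at x = 1 Å and x = 49 Å in a 50 Å box are 2 Å apart across the boundary, so state_label reports "Below 3.5Å"; without PBC the same pair would be 48 Å apart.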

`plugins.contacts`

Field	Type	Default	Description
`polymer_selection`	string	`"chainid C"`	MDAnalysis selection for polymer atoms
`protein_selection`	string	`"chainid A"`	MDAnalysis selection for protein atoms
`cutoff`	float	`4.5`	Contact distance cutoff in Angstroms
`grouping`	string	`"aa_class"`	Residue grouping: `"aa_class"`, `"secondary_structure"`, or `"none"`
`compute_residence_times`	bool	`true`	Whether to compute aggregate residence-time summaries and plots. When `false`, per-replicate contact events are still stored and the canonical artifact identity changes.
`protein_groups`	mapping	`null`	Custom residue groups: `{group_name: [resid, ...]}`
`protein_partitions`	mapping	`null`	Mutually exclusive partitions for contact-fraction and residence-time plots: `{partition_name: [group_name, ...]}`
`polymer_types`	list of string	`null`	Explicit polymer type labels. If `null`, types are auto-detected from topology.
`fdr_alpha`	float	`0.05`	Per-plugin FDR threshold
`min_effect_size`	float	`0.5`	Minimum Cohen’s d for practical significance
`top_residues`	int	`10`	Max residues shown per condition in formatted output
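The per-residue quantities behind these settings can be sketched from a boolean "in contact" series (any polymer atom within cutoff in that frame). This is an illustration under stated assumptions, not PolyzyMD's implementation: residence times are taken as run lengths of consecutive contact frames times an assumed frame spacing, and Cohen's d uses the common pooled-standard-deviation definition that min_effect_size would presumably be compared against.

```python
from statistics import mean, stdev


def contact_fraction(in_contact):
    """Fraction of frames in which the residue is in contact with the polymer."""
    return sum(in_contact) / len(in_contact)


def residence_times(in_contact, dt_ns=1.0):
    """Durations of uninterrupted contact events (run lengths of True frames) in ns."""
    events, run = [], 0
    for flag in in_contact:
        if flag:
            run += 1
        elif run:
            events.append(run * dt_ns)
            run = 0
    if run:
        events.append(run * dt_ns)
    return events


def cohens_d(a, b):
    """Cohen's d with a pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5
```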

`plugins.rmsd`

Field	Type	Default	Description
`runs`	list	(required)	List of RMSD run definitions

Each entry in runs:

Field	Type	Default	Description
`label`	string	(required)	Name for this RMSD computation (e.g., `"backbone"`)
`selection`	string	(required)	MDAnalysis selection for RMSD atoms
`alignment_selection`	string	same as `selection`	MDAnalysis selection for alignment
`reference_mode`	string	`"centroid"`	Reference structure mode: `"centroid"` or `"frame"`
`reference_frame`	int	`0`	Frame index to use as reference when `reference_mode` is `"frame"`
`reference_file`	path	`null`	Path to external PDB reference structure
`centroid_selection`	string	`null`	MDAnalysis selection for centroid computation. If `null`, uses `alignment_selection`.
`convergence_window_size_ns`	float	`15.0`	Rolling window size in nanoseconds for convergence detection
`convergence_step_size_ns`	float	`5.0`	Step size in nanoseconds between convergence windows
`convergence_slope_threshold`	float	`0.0005`	Maximum slope (Å/ns) for a window to be considered converged
`convergence_sustained_for_ns`	float	`15.0`	Duration in nanoseconds that convergence must be sustained
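One plausible reading of the four convergence parameters is sketched below: slide a window of convergence_window_size_ns in steps of convergence_step_size_ns, call a window flat when the least-squares slope magnitude is at most convergence_slope_threshold (Å/ns), and report the start of the first run of flat windows whose start times span at least convergence_sustained_for_ns. This interpretation and the helper names are assumptions, not PolyzyMD's algorithm.

```python
def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def convergence_time_ns(times_ns, rmsd_a, window=15.0, step=5.0,
                        slope_threshold=0.0005, sustained_for=15.0):
    """Start time (ns) of the first sustained run of flat windows, or None."""
    run_start = None
    t = times_ns[0]
    while t + window <= times_ns[-1]:
        pts = [(x, y) for x, y in zip(times_ns, rmsd_a) if t <= x < t + window]
        flat = len(pts) >= 2 and abs(linear_slope(*zip(*pts))) <= slope_threshold
        if flat:
            if run_start is None:
                run_start = t
            if t - run_start >= sustained_for:
                return run_start
        else:
            run_start = None  # a drifting window breaks the run
        t += step
    return None


times = [float(i) for i in range(101)]
rmsd = [min(0.05 * t, 1.0) for t in times]  # rises for 20 ns, then plateaus
print(convergence_time_ns(times, rmsd))  # 20.0
```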

`plugins.rg`

Field	Type	Default	Description
`runs`	list	(required)	List of Rg run definitions

Each entry in runs:

Field	Type	Default	Description
`label`	string	(required)	Name for this Rg computation
`selection`	string	(required)	MDAnalysis selection for Rg atoms
`calculation_mode`	string	`"selection"`	Computation mode: `"selection"` (single Rg for the whole selection) or `"fragments"` (per-fragment Rg)
`fragment_weighting`	string	`"equal"`	How to weight fragments when `calculation_mode` is `"fragments"`: `"equal"` or `"mass"`
`save_fragment_distribution`	bool	`true`	Save per-frame fragment Rg distributions
`histogram_bins`	int	`50`	Number of bins for Rg distribution histograms
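A sketch of fragment-mode aggregation for a single frame, assuming "equal" averages per-fragment Rg values and "mass" weights each fragment by its total mass. Per-fragment Rg is mass-weighted here, as MDAnalysis's AtomGroup.radius_of_gyration() is; the helper names are hypothetical.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one group of atoms (Angstroms)."""
    total = sum(masses)
    com = [sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)]
    sq = sum(m * sum((c[i] - com[i]) ** 2 for i in range(3)) for m, c in zip(masses, coords))
    return math.sqrt(sq / total)


def aggregate_fragment_rg(fragments, weighting="equal"):
    """Combine per-fragment Rg values for one frame.

    fragments: list of (coords, masses) tuples, one per fragment (e.g. a polymer chain).
    weighting: "equal" averages fragments; "mass" weights each by its total mass.
    """
    values = [radius_of_gyration(c, m) for c, m in fragments]
    if weighting == "equal":
        return sum(values) / len(values)
    if weighting == "mass":
        weights = [sum(m) for _, m in fragments]
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown fragment_weighting {weighting!r}")
```

For a light fragment with Rg 1.0 Å (total mass 2) and a heavy one with Rg 3.0 Å (total mass 6), "equal" gives 2.0 Å and "mass" gives 2.5 Å.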

`plugins.hydrogen_bonds`

Field	Type	Default	Description
`groups`	mapping	`{"protein": "chainid A", "polymer": "chainid C"}`	Named atom groups: `{name: "MDAnalysis selection"}`
`summaries`	list or mapping	one default summary (`protein_polymer` between `protein` and `polymer`)	Named H-bond summaries (see below)
`distance_cutoff`	float	`3.0`	H-bond distance cutoff in Angstroms
`angle_cutoff`	float	`150`	H-bond angle cutoff in degrees
`update_selections`	bool	`true`	Update atom selections every frame
`top_n_pairs`	int	`15`	Number of top residue pairs to report
`allow_empty_groups`	bool	`true`	Allow empty group selections: `true` = warn and skip summaries when a group matches no atoms; `false` = raise error
`allow_overlapping_composition`	bool	`false`	Whether overlapping composition partitions are allowed
`composition`	mapping	`null`	Composition analysis settings
`timestep_ps`	float	`null`	Override trajectory timestep in picoseconds for time-axis plots

Time-axis plots assume uniformly saved frames. PolyzyMD converts frame index to time as frame_index * timestep_ps; variable-timestep concatenated trajectories are not supported.

Each summary entry in summaries has:

Field	Type	Required	Description
`name`	string	yes	Unique summary name
`between`	`[group_a, group_b]`	exactly one of `between` / `within`	Inter-group H-bonds
`within`	`group_name`	exactly one of `between` / `within`	Intra-group H-bonds

When summaries is given as a mapping, each key is used as that summary's name.
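Both input forms can be normalized to the same list shape. The sketch enforces unique names and "exactly one of between / within"; it also assumes summaries may only reference names defined under groups, which this page implies but does not state. normalize_summaries is a hypothetical helper.

```python
def normalize_summaries(summaries, groups):
    """Turn list- or mapping-form `summaries` into a list of dicts with a `name`."""
    if isinstance(summaries, dict):
        # Mapping form: each key becomes that summary's name.
        summaries = [{"name": name, **spec} for name, spec in summaries.items()]
    seen = set()
    for summary in summaries:
        name = summary["name"]
        if name in seen:
            raise ValueError(f"duplicate summary name {name!r}")
        seen.add(name)
        if ("between" in summary) == ("within" in summary):
            raise ValueError(f"{name}: set exactly one of 'between' or 'within'")
        if "between" in summary:
            referenced = list(summary["between"])
            if len(referenced) != 2:
                raise ValueError(f"{name}: 'between' needs exactly two groups")
        else:
            referenced = [summary["within"]]
        # Assumption: only groups defined under `groups` may be referenced.
        missing = [g for g in referenced if g not in groups]
        if missing:
            raise ValueError(f"{name}: unknown groups {missing}")
    return list(summaries)
```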

Hydrogen detection uses MDAnalysis HydrogenBondAnalysis with hydrogens selected as (<group union>) and element H; topologies need explicit hydrogens and usable element metadata.

composition sub-fields:

Field	Type	Default	Description
`partitions`	mapping	—	Named partitions: `{name: "MDAnalysis selection"}`

`plot_settings`

Field	Type	Default	Description
`output_dir`	path	`"figures/"`	Directory for generated plots (relative to `comparison.yaml`)
`format`	string	`"png"`	Image format: `"png"`, `"pdf"`, or `"svg"`
`dpi`	int	`300`	Resolution for raster formats. Range: 50–600.
`style`	string	`"compact"`	PolyzyMD theme preset: `"compact"`, `"large_elements"`, or `"low_ink"`
`color_palette`	string	`"tab10"`	Seaborn/matplotlib color palette name
`semantic_colors`	mapping	disabled	Optional condition-label color and display-order rules for condition-series plots
`theme`	mapping	from style preset	Visual theme overrides (see below)

style selects a PolyzyMD built-in theme preset for standard analysis plots. It is not a matplotlib or seaborn stylesheet, and it does not control format, dpi, per-analysis figure sizes, or color palettes.

theme values are merged on top of the selected preset, so you can choose a base style and override only the fields that need project-specific changes.

`plot_settings.semantic_colors`

Semantic colors let a comparison project encode condition meaning directly in figures. The settings are optional and disabled by default; when disabled, plots keep using color_palette and each plotter’s existing category colors.

Semantic ordering is plot-only. It changes the display order of conditions in figures, but it does not mutate comparison statistics, rankings, cached artifacts, or JSON result files.

Top-level fields:

Field	Type	Default	Description
`enabled`	bool	`false`	Opt in to semantic condition colors and plot ordering
`order`	list of string	`[]`	Explicit plot display order by condition label. Labels not present keep their relative order after condition-level `order` sorting.
`manual_colors`	mapping	`{}`	Direct color overrides by exact condition label. Highest precedence color rule.
`conditions`	mapping	`{}`	Per-condition semantic metadata keyed by exact condition label
`families`	mapping	`{}`	Family-level colormap rules keyed by family name
`control_color`	color	`"black"`	Color used for the configured `control` condition or a condition with `role: control`
`missing_color`	color	`"lightgray"`	Fallback color for conditions with incomplete semantic metadata
`default_color`	color or `null`	`null`	Fallback for labels missing from `conditions`. If `null`, the regular palette color is used.

conditions.<label> fields:

Field	Type	Default	Description
`color`	color or `null`	`null`	Direct color for this condition, after `manual_colors` and before control/family rules
`family`	string or `null`	`null`	Semantic family name used to look up `families.<family>`
`value`	scalar or `null`	`null`	Numeric or ordinal value mapped through the family color rule
`order`	int or `null`	`null`	Plot-only display order used after explicit `semantic_colors.order`
`role`	string or `null`	`null`	Optional semantic role. Use `control` to apply `control_color`.

families.<family> fields:

Field	Type	Default	Description
`colormap`	string	`"viridis"`	Matplotlib colormap name for values in this family
`scale`	`"linear"` or `"ordinal"`	`"linear"`	Map numeric values continuously (`linear`) or ordered categories/steps discretely (`ordinal`)
`value_order`	list	`[]`	Explicit value order for `ordinal` mapping. If omitted, observed values are used in label order.
`vmin`	float or `null`	`null`	Lower bound for `linear` normalization. If omitted, observed values set the bound.
`vmax`	float or `null`	`null`	Upper bound for `linear` normalization. If omitted, observed values set the bound.
`colormap_range`	two floats	`[0.0, 1.0]`	Fractional colormap interval to sample, useful for avoiding colors that are too pale or too dark
`reverse`	bool	`false`	Reverse the value-to-colormap direction
`value_colors`	mapping	`{}`	Explicit color overrides by value. These override the family colormap for matching values.
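The family fields combine into a single position at which the colormap is sampled. The arithmetic below is an assumed reading of those fields (clipping out-of-range values and using the colormap midpoint for a single observed value are guesses), and colormap_position is a hypothetical helper. A real implementation would pass the result to a Matplotlib colormap, for example matplotlib.colormaps[name](position).

```python
def colormap_position(value, scale="linear", observed=(), value_order=(),
                      vmin=None, vmax=None, colormap_range=(0.0, 1.0), reverse=False):
    """Fraction in [0, 1] at which a family colormap would be sampled for `value`."""
    if scale == "linear":
        lo = min(observed) if vmin is None else vmin
        hi = max(observed) if vmax is None else vmax
        t = 0.5 if hi == lo else (value - lo) / (hi - lo)
        t = min(max(t, 0.0), 1.0)  # assumed: clip values outside [vmin, vmax]
    elif scale == "ordinal":
        # Without value_order, observed values are used in the order given (label order).
        order = list(value_order) or list(dict.fromkeys(observed))
        t = 0.5 if len(order) == 1 else order.index(value) / (len(order) - 1)
    else:
        raise ValueError(f"unknown scale {scale!r}")
    if reverse:
        t = 1.0 - t
    start, stop = colormap_range
    return start + t * (stop - start)
```

Setting colormap_range to, say, [0.2, 0.9] keeps the lowest value from landing on the palest end of the colormap.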

Color precedence for each condition label is (highest first):

1. semantic_colors.manual_colors.<label>
2. semantic_colors.conditions.<label>.color
3. semantic_colors.control_color when the label is the top-level control or the condition has role: control
4. families.<family>.value_colors.<value>
5. families.<family> colormap mapping
6. semantic_colors.missing_color for incomplete condition metadata
7. semantic_colors.default_color or the regular color_palette for labels missing from semantic_colors.conditions

Semantic colors apply to plots where colors represent comparison conditions. Non-condition categories, such as secondary-structure states or residue classes, may still use categorical palettes or plot-specific colormaps.
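The precedence order above can be read as a simple fall-through chain. In this sketch, resolve_color is a hypothetical helper, palette_color stands in for whatever the regular color_palette would assign, and colormap sampling is left as a placeholder tuple; treating an undefined family as incomplete metadata is an assumption.

```python
def resolve_color(label, semantic, control_label=None, palette_color="tab:blue"):
    """Walk the documented color precedence for one condition label.

    `semantic` is plot_settings.semantic_colors as a plain dict and
    `control_label` is the top-level `control`. Hypothetical helper.
    """
    if not semantic.get("enabled", False):
        return palette_color
    manual = semantic.get("manual_colors", {})
    if label in manual:
        return manual[label]
    conditions = semantic.get("conditions", {})
    meta = conditions.get(label, {})
    if meta.get("color"):
        return meta["color"]
    if label == control_label or meta.get("role") == "control":
        return semantic.get("control_color", "black")
    if label not in conditions:
        return semantic.get("default_color") or palette_color
    family = semantic.get("families", {}).get(meta.get("family"))
    value = meta.get("value")
    if family is None or value is None:
        return semantic.get("missing_color", "lightgray")  # incomplete metadata
    value_colors = family.get("value_colors", {})
    if value in value_colors:
        return value_colors[value]
    # Placeholder: a real implementation would sample the family colormap here.
    return ("colormap", family.get("colormap", "viridis"), value)
```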

`plot_settings.theme`

All fields are optional. Defaults are drawn from the selected style preset, then any values under theme: override individual fields.

Theme presets

Preset	Use when	Notes
`compact`	You want the default compact print-style output.	Uses moderate fonts, replicate dots, bar edges, and reference lines.
`large_elements`	You need slides, posters, or high-visibility figures.	Increases font sizes, replicate dot size, bar line width, error-bar caps, reference-line width, and fill opacity.
`low_ink`	You want simpler, lower-ink plots.	Hides replicate dots, removes bar edges, and reduces reference-line width and fill opacity.

Tweakable `PlotTheme` fields

Field	`compact`	`large_elements`	`low_ink`	Description
`title_fontsize`	`13`	`18`	`13`	Axes title font size
`suptitle_fontsize`	`14`	`20`	`14`	Figure suptitle font size
`label_fontsize`	`11`	`15`	`11`	Axis label font size
`tick_fontsize`	`9`	`12`	`9`	Tick label font size
`legend_fontsize`	`9`	`12`	`9`	Legend entry font size
`annotation_fontsize`	`9`	`12`	`9`	Heatmap annotation font size
`small_fontsize`	`8`	`10`	`8`	Secondary annotation font size
`tiny_fontsize`	`7`	`9`	`7`	Fine-grained annotation font size
`bar_alpha`	`0.85`	`0.85`	`0.85`	Bar fill opacity
`bar_edgecolor`	`"black"`	`"black"`	`"none"`	Bar edge color
`bar_linewidth`	`0.5`	`0.8`	`0.0`	Bar edge line width
`bar_capsize`	`4`	`5`	`3`	Error bar cap size in points
`dot_size`	`18`	`30`	`0`	Scatter marker size for replicate dots
`dot_alpha`	`0.7`	`0.7`	`0.0`	Replicate dot opacity
`dot_color`	`"black"`	`"black"`	`"black"`	Replicate dot color
`line_alpha`	`0.8`	`0.8`	`0.8`	Line plot opacity
`fill_alpha`	`0.25`	`0.3`	`0.15`	`fill_between` band opacity
`reference_line_color`	`"black"`	`"black"`	`"black"`	Reference line color
`reference_line_style`	`"--"`	`"--"`	`"--"`	Reference line style
`reference_line_width`	`1.5`	`2.0`	`1.0`	Reference line width
`highlight_line_alpha`	`0.5`	`0.5`	`0.5`	Vertical highlight line opacity
`hide_top_spine`	`true`	`true`	`true`	Hide top axis spine
`hide_right_spine`	`true`	`true`	`true`	Hide right axis spine
`title_fontweight`	`"bold"`	`"bold"`	`"bold"`	Title font weight
`legend_loc`	`"center left"`	`"center left"`	`"center left"`	Matplotlib legend location
`legend_bbox`	`[1.02, 0.5]`	`[1.02, 0.5]`	`[1.02, 0.5]`	`bbox_to_anchor` for legend placement
`show_watermark`	`true`	`true`	`true`	Render the “Made by PolyzyMD” watermark
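Because theme values are merged on top of the selected preset, the effective theme is simply the preset with your overrides applied. The sketch uses four fields copied from the table above; PRESETS and effective_theme are hypothetical names, and PolyzyMD's PlotTheme carries every field listed.

```python
# A subset of preset values copied from the table above.
PRESETS = {
    "compact": {"title_fontsize": 13, "dot_size": 18, "bar_linewidth": 0.5, "fill_alpha": 0.25},
    "large_elements": {"title_fontsize": 18, "dot_size": 30, "bar_linewidth": 0.8, "fill_alpha": 0.3},
    "low_ink": {"title_fontsize": 13, "dot_size": 0, "bar_linewidth": 0.0, "fill_alpha": 0.15},
}


def effective_theme(style="compact", overrides=None):
    """Preset values first, then any plot_settings.theme overrides on top."""
    if style not in PRESETS:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(PRESETS)}")
    return {**PRESETS[style], **(overrides or {})}
```

For example, style: "low_ink" with theme: {dot_size: 12} keeps the low-ink fills and bar edges but brings replicate dots back.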

Per-Analysis Plot Settings

Per-analysis plot customization keys go under plot_settings: at the same level as style, dpi, etc.

plot_settings.rmsf:

Field	Default	Description
`show_error`	`true`	Show SEM fill_between bands
`highlight_residues`	`[]`	Residue IDs for vertical reference lines
`figsize_profile`	`[14, 4]`	Per-residue profile figure size
`figsize_comparison`	`[8, 6]`	Bar comparison figure size

plot_settings.catalytic_triad:

Field	Default	Description
`generate_kde_panel`	`true`	Multi-row KDE panel
`generate_bars`	`true`	Threshold bar chart
`generate_2d_kde`	`false`	2D joint KDE
`kde_xlim`	`[0, 7]`	X-axis range for KDE (Angstroms)

plot_settings.distances:

Field	Default	Description
`show_threshold`	`true`	Threshold line on distributions
`use_kde`	`true`	KDE vs histogram
`generate_state_bars`	`true`	Above/below threshold bars

plot_settings.contacts:

Field	Default	Description
`generate_contact_fraction_profile`	`true`	Per-residue contact fraction profile
`generate_residence_time_profile`	`true`	Per-residue residence time profile
`generate_cf_by_aa_class_bars`	`true`	Contact fraction by amino acid class bar chart
`generate_cf_by_partition_bars`	`true`	Contact fraction by user partition bar charts
`generate_rt_by_aa_class_bars`	`true`	Residence time by amino acid class bar chart
`generate_rt_by_partition_bars`	`true`	Residence time by user partition bar charts

plot_settings.secondary_structure:

Field	Default	Description
`generate_timeline`	`true`	Residue × time SS heatmap
`generate_content_bars`	`true`	Helix/strand/coil fraction bars
`generate_individual_bars`	`true`	One bar chart per SS type
`generate_diff_heatmap`	`true`	Δ(helix persistence) vs control
`diff_colormap`	`"RdBu_r"`	Diverging colormap for diff heatmap

Tip

Common tips:

Run polyzymd compare validate to check your comparison.yaml for errors before launching a full analysis run.
Relative paths in config: are resolved from the directory containing comparison.yaml, not from your working directory.
An empty plugin mapping (e.g., rmsf: {}) enables the analysis with all default settings — you only need to specify fields you want to override.
Set control: to match one of your condition labels to get Δ-from-control columns in comparison tables and plots.
