Comparison and Plotting Reference
Use this page when you need quick lookup information for polyzymd compare,
comparison.yaml, output paths, or plotting behavior.
Comparison Project Layout
polyzymd compare init -n my_study creates a workspace like this:
my_study/
├── comparison.yaml
├── comparison/
├── figures/
└── structures/
Core comparison.yaml Fields
name: "polymer_stability_study"
description: "Optional human-readable summary"
control: "No Polymer"  # optional
conditions:
  - label: "No Polymer"
    config: "no_polymer/config.yaml"
    replicates: [1, 2, 3]
defaults:
  equilibration_time: "10ns"
plugins:
  rmsf:
    selection: "protein and name CA"
Per-Plugin Statistical Settings
Some plugins support per-plugin statistical settings configured under the
plugins: block in comparison.yaml. These control false discovery rate
correction, effect-size filtering, and output truncation for cross-condition
comparisons.
Canonical YAML Example
plugins:
  contacts:
    cutoff: 4.5
    fdr_alpha: 0.05
    min_effect_size: 0.5
    top_residues: 10
Settings Support Matrix
| Setting | contacts | Default |
|---|---|---|
| fdr_alpha | ✓ | 0.05 |
| min_effect_size | ✓ | 0.5 |
| top_residues | ✓ | 10 |
Setting Descriptions
fdr_alpha: Significance threshold for pairwise comparisons. When posthoc_method is "ttest_bh", this controls the Benjamini-Hochberg false discovery rate. When posthoc_method is "tukey_hsd", this is the family-wise alpha threshold. Also used as the ANOVA significance threshold. Lower values are more conservative.
min_effect_size: Minimum Cohen’s d required for practical significance. Pairs that meet or exceed this threshold are highlighted with “†” in formatted output; all pairs are shown regardless.
top_residues: Maximum number of contacted residues shown per condition, ranked by aggregated contact_fraction_mean. Affects both saved JSON and CLI output.
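The effect-size flag and the residue truncation described above can be sketched in a few lines of Python. This is an illustration, not PolyzyMD's implementation. It assumes the pooled-standard-deviation form of Cohen's d, assumes the threshold is compared against the magnitude of d, and uses hypothetical function names (cohens_d, flag_pair, top_n).

```python
import math
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        return None  # a spread cannot be estimated from one replicate
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    if pooled_var == 0:
        return None  # identical values in both groups: d is undefined
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def flag_pair(d, min_effect_size=0.5):
    """Return the dagger marker when |d| meets or exceeds the threshold."""
    if d is not None and abs(d) >= min_effect_size:
        return "†"
    return ""


def top_n(contact_fraction_mean, top_residues=10):
    """Keep the residues with the highest mean contact fraction."""
    ranked = sorted(contact_fraction_mean.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_residues]
```

Note that flag_pair only decides whether a pair is marked; it never removes a pair, which matches the statement that all pairs are shown regardless of the threshold.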
Stable Plugin Keys
Stable analysis plugins:
rmsd
rg
rmsf
contacts
distances
catalytic_triad
secondary_structure
sasa
hydrogen_bonds
Plugin Summary Table
| Plugin | Default compare? | Primary metric | Key feature | Statistical method |
|---|---|---|---|---|
| rmsd | No (custom) | | Backbone stability over time | Per-run pairwise t-tests + ANOVA |
| rg | No (custom) | | Protein compactness | Per-run pairwise t-tests + ANOVA |
| rmsf | Yes | | Per-residue flexibility | FDR-corrected pairwise t-tests + ANOVA |
| contacts | No (custom) | Coverage + contact fraction | Per-residue contact mapping | FDR-corrected pairwise t-tests per residue |
| distances | No (custom) | Multiple distance metrics | Named distance pairs | Per-distance t-tests + ANOVA |
| catalytic_triad | Yes | | Active-site geometry | FDR-corrected pairwise t-tests + ANOVA |
| secondary_structure | Yes | | Secondary structure content | FDR-corrected pairwise t-tests + ANOVA |
| sasa | No (custom) | Per-run mean SASA | Multi-run target/context model | Per-run pairwise t-tests + ANOVA |
| hydrogen_bonds | Custom loader with default-style scalar statistics | | Flexible named groups + summaries + composition analysis | FDR-corrected pairwise t-tests + ANOVA per configured summary |
Path Rules
relative paths in config: are resolved relative to comparison.yaml
absolute paths are used as-is
replicates must be an explicit list such as [1, 2, 3]
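The path rules above amount to a small resolution step, sketched here in Python. The function names resolve_config and check_replicates are hypothetical, and the requirement that replicate entries be integers is an assumption drawn from the [1, 2, 3] example. PurePosixPath is used only to keep the example platform-independent; real code would normally use pathlib.Path.

```python
from pathlib import PurePosixPath


def resolve_config(comparison_yaml, config):
    """Resolve a condition's config: value against comparison.yaml's folder."""
    config_path = PurePosixPath(config)
    if config_path.is_absolute():
        return config_path  # absolute paths are used as-is
    # relative paths are anchored at the directory holding comparison.yaml
    return PurePosixPath(comparison_yaml).parent / config_path


def check_replicates(replicates):
    """Accept only an explicit, non-empty list of integers (assumed rule)."""
    if not isinstance(replicates, list) or not replicates:
        raise ValueError("replicates must be an explicit list such as [1, 2, 3]")
    for r in replicates:
        if not isinstance(r, int) or isinstance(r, bool):
            raise ValueError(f"replicate entries must be integers, got {r!r}")
    return replicates
```

For example, a shorthand string such as "1-3" would be rejected by check_replicates, while [1, 2, 3] passes through unchanged.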
Replicate Counts
All stable shipped analyses support replicates: [1] for smoke tests and
protocol validation. One-replicate runs compute aggregate metrics and plots, but
inferential statistics, FDR correction, and uncertainty bands require at least
two independent replicates per condition. Singleton pairwise tests and ANOVA are
reported as not testable rather than significant.
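A minimal Python sketch of that behavior: with one replicate the mean is still reported, but the standard error and any pairwise test are marked as unavailable instead of being computed. The names summarize and pair_status and the returned dictionary keys are hypothetical, not fields of PolyzyMD's output.

```python
import math
from statistics import mean, stdev


def summarize(values):
    """Aggregate per-replicate values; SEM needs at least two replicates."""
    n = len(values)
    summary = {"n": n, "mean": mean(values), "sem": None}
    if n >= 2:
        summary["sem"] = stdev(values) / math.sqrt(n)
    return summary


def pair_status(a, b):
    """Report whether a pairwise comparison can be tested at all."""
    if len(a) < 2 or len(b) < 2:
        return "not testable"
    return "testable"
```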
Commands
| Command | Purpose |
|---|---|
| polyzymd compare init | Create a comparison workspace |
| | Check |
| polyzymd compare run <analysis> | Run one analysis plugin |
| | List available comparison types |
| polyzymd compare run-all | Run every enabled plugin in one pass |
| polyzymd compare plot-all | Generate configured figures |
| polyzymd compare plot-all --list-available | List available plots and experimental labels |
| | Submit a SLURM DAG for one analysis plugin |
| | Show status of a submitted SLURM DAG |
| | Run comparison + plotting from on-disk aggregated results |
Common Stable Commands
All commands below assume you are inside the pixi environment
(pixi shell -e build) or are prefixed with pixi run -e build.
polyzymd compare run rmsd
polyzymd compare run rg
polyzymd compare run rmsf
polyzymd compare run contacts
polyzymd compare run distances
polyzymd compare run catalytic_triad
polyzymd compare run sasa
polyzymd compare run hydrogen_bonds
polyzymd compare run-all
polyzymd compare plot-all
Output Locations
per-replicate cache files are written under analysis/<condition>/<analysis>/run_<replicate>/
per-condition aggregate files are written under analysis/<condition>/<analysis>/aggregated/
cross-condition comparison JSON files are written to comparison/<analysis>/result.json
figures are written under the configured plot_settings.output_dir, usually figures/<analysis>/
polyzymd compare init scaffolds comparison/, figures/, and structures/ next to comparison.yaml; analysis/ is created and populated during analysis runs
Typical comparison cache paths:
comparison/rmsd/result.json
comparison/rg/result.json
comparison/rmsf/result.json
comparison/contacts/result.json
comparison/distances/result.json
comparison/catalytic_triad/result.json
comparison/sasa/result.json
comparison/hydrogen_bonds/result.json
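If you need these locations in a script, the layout can be encoded as small path helpers. The sketch below is Python with hypothetical function names. It assumes the condition argument is already the on-disk directory name, because this page does not say how a condition label maps to a folder name.

```python
from pathlib import PurePosixPath


def replicate_cache_dir(root, condition, analysis, replicate):
    """analysis/<condition>/<analysis>/run_<replicate>/"""
    return PurePosixPath(root) / "analysis" / condition / analysis / f"run_{replicate}"


def aggregated_dir(root, condition, analysis):
    """analysis/<condition>/<analysis>/aggregated/"""
    return PurePosixPath(root) / "analysis" / condition / analysis / "aggregated"


def comparison_result(root, analysis):
    """comparison/<analysis>/result.json"""
    return PurePosixPath(root) / "comparison" / analysis / "result.json"
```

Here root is the directory that contains comparison.yaml, for example my_study.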
Plotting Smoke Test
For a final smoke test after comparisons finish:
polyzymd compare plot-all --list-available
polyzymd compare plot-all
Plugin-Specific Metadata Fields
Some plugins include additional metadata in their comparison output beyond the standard ranking and statistical fields. These fields are additive diagnostics — they do not affect rankings, p-values, or effect sizes.
RMSD Convergence Output
The RMSD plugin includes per-run convergence diagnostics generated by the
sliding-window convergence heuristic in analyses/shared/convergence.py.
These fields appear in the per-condition summaries within
comparison/rmsd/result.json:
| Field | Type | Description |
|---|---|---|
| | | Fraction of replicates that converged (0.0–1.0) |
| | | Count of replicates where sustained convergence was detected |
| | | Mean convergence time across converged replicates (ns) |
| | | Median convergence time across converged replicates (ns) |
Note
Convergence metadata is purely informational. It does not influence the RMSD ranking, pairwise t-tests, ANOVA, or effect-size calculations. Use it to identify conditions where one or more replicates failed to reach a stable plateau, which may warrant longer production runs or additional replicates.
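As an illustration of how the four summary values relate to each other, the Python sketch below aggregates per-replicate convergence times. It does not reproduce the sliding-window heuristic itself, and the dictionary keys are hypothetical placeholders, not the actual field names in result.json. A replicate that never converged is represented by None.

```python
from statistics import mean, median


def summarize_convergence(times_ns):
    """Summarize per-replicate convergence times (ns); None = not converged."""
    converged = [t for t in times_ns if t is not None]
    total = len(times_ns)
    return {
        # hypothetical key names, chosen for this example only
        "fraction_converged": len(converged) / total if total else None,
        "n_converged": len(converged),
        "mean_time_ns": mean(converged) if converged else None,
        "median_time_ns": median(converged) if converged else None,
    }
```

A fraction below 1.0 is the signal the note describes: at least one replicate did not reach a stable plateau.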
Statistical Terms
p-value: probability of observing a difference at least as large as the one measured if the null hypothesis were true
Cohen's d: effect size magnitude
ANOVA: omnibus test across multiple conditions
SEM: standard error of the mean across replicates
Benjamini-Hochberg (BH): step-up procedure for controlling the false discovery rate across multiple hypothesis tests
Adjusted p-value (p_adj): p-value corrected for multiple comparisons via the BH procedure (for ttest_bh) or family-wise Tukey adjustment (for tukey_hsd)
False Discovery Rate (FDR): expected proportion of false positives among rejected hypotheses
Effect size threshold: minimum Cohen’s d required for a pairwise difference to be considered practically significant
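To make the Benjamini-Hochberg adjustment concrete, here is a short standard-library Python sketch of the textbook step-up procedure. It is not taken from PolyzyMD's source, and the function name benjamini_hochberg is chosen for this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, rejection flags) in the input order."""
    m = len(p_values)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [p <= alpha for p in adjusted]
    return adjusted, rejected
```

With raw p-values [0.01, 0.04, 0.03, 0.20] and alpha = 0.05, the adjusted values are approximately [0.04, 0.053, 0.053, 0.20], so only the first comparison remains significant even though three raw p-values were below 0.05.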
For interpretation guidance rather than lookup, see:
Post-Hoc Testing Reference — full post-hoc method details, output fields, and edge cases