RMSD Plugin Reference

For a step-by-step guide to running RMSD analysis, see RMSD Analysis: Quick Start.

Configuration Reference

All fields for RMSDRunSettings:

Field	Type	Default	Description
`label`	`str`	required	Human-readable run label (must be unique)
`selection`	`str`	`"protein and name CA"`	MDAnalysis selection for RMSD calculation
`alignment_selection`	`str`	`"protein and name CA"`	MDAnalysis selection for trajectory alignment
`reference_mode`	`str`	`"centroid"`	Reference mode: `centroid`, `average`, `frame`, or `external`
`reference_frame`	`int`	`0`	0-indexed frame for `reference_mode: frame`
`reference_file`	`str | None`	`null`	Path to external PDB for `reference_mode: external`
`centroid_selection`	`str | None`	`null`	Selection for centroid finding; defaults to `alignment_selection`
`convergence_window_size_ns`	`float`	`15.0`	Sliding window size for convergence detection (ns)
`convergence_step_size_ns`	`float`	`5.0`	Step between successive windows (ns)
`convergence_slope_threshold`	`float`	`0.0005`	Max absolute slope to qualify as “flat” (Å/ns)
`convergence_sustained_for_ns`	`float`	`15.0`	Required sustained duration below threshold (ns)

Top-level RMSDSettings contains a single field:

Field	Type	Default	Description
`runs`	`list[RMSDRunSettings]`	required	One or more named RMSD runs (at least one required)

Note

Run labels must be unique within a single comparison.yaml. Duplicate labels raise a validation error.
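
The field table and the uniqueness rule above can be sketched with standard-library dataclasses. This is an illustration only: RunSettingsSketch and validate_runs are hypothetical names, not the plugin's actual classes, and the real validation may be implemented differently. Defaults are copied from the table.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_REFERENCE_MODES = {"centroid", "average", "frame", "external"}


@dataclass
class RunSettingsSketch:
    """Hypothetical stand-in for RMSDRunSettings (defaults taken from the table)."""

    label: str
    selection: str = "protein and name CA"
    alignment_selection: str = "protein and name CA"
    reference_mode: str = "centroid"
    reference_frame: int = 0
    reference_file: Optional[str] = None
    centroid_selection: Optional[str] = None
    convergence_window_size_ns: float = 15.0
    convergence_step_size_ns: float = 5.0
    convergence_slope_threshold: float = 0.0005
    convergence_sustained_for_ns: float = 15.0


def validate_runs(runs: List[RunSettingsSketch]) -> List[RunSettingsSketch]:
    """Reject an empty run list, unknown reference modes, and duplicate labels."""
    if not runs:
        raise ValueError("At least one RMSD run must be defined")
    seen = set()
    for run in runs:
        if run.reference_mode not in ALLOWED_REFERENCE_MODES:
            raise ValueError(f"Unknown reference_mode: {run.reference_mode!r}")
        if run.label in seen:
            raise ValueError(f"Duplicate run label: {run.label!r}")
        seen.add(run.label)
    return runs
```

Only label has to be supplied for each run; every other field falls back to the default listed in the table.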

Output Files

Results are saved as canonical v1.3 artifacts. JSON files are stable artifact envelopes, while per-frame RMSD arrays are stored in NPZ sidecars.

<comparison_workspace>/
├── analysis/
│   └── <condition>/
│       └── rmsd/
│           ├── run_1/
│           │   ├── result.json
│           │   └── sidecars/
│           │       ├── rmsd_Protein_Backbone_timeseries.npz
│           │       └── rmsd_Active_Site_timeseries.npz
│           ├── run_2/
│           │   └── ...
│           ├── run_3/
│           │   └── ...
│           └── aggregated/
│               ├── result.json
│               └── sidecars/
│                   └── rmsd_Protein_Backbone_timeseries.npz
└── comparison/
    └── rmsd/
        └── result.json

The canonical paths are:

Level	Artifact	Path
Per replicate	`ReplicateArtifact`	`analysis/<condition>/rmsd/run_<replicate>/result.json`
Per condition	`ConditionArtifact`	`analysis/<condition>/rmsd/aggregated/result.json`
Cross condition	Comparison result	`comparison/rmsd/result.json`
Large arrays	NPZ sidecars	`analysis/<condition>/rmsd/*/sidecars/*.npz`

Each replicate artifact contains JSON-compatible summaries for all configured runs. Raw per-frame RMSD timeseries are stored as NPZ sidecars; each one is referenced from payload and listed in sidecars together with its recorded size and hash metadata.

Artifact envelope fields

Field	Description
`payload`	RMSD run summaries, scalar metrics, convergence diagnostics, and sidecar paths
`metadata`	Settings such as selections, reference modes, equilibration labels, and units
`provenance`	Input topology/trajectory identity and workflow details
`sidecars`	Validated references to `sidecars/*.npz` arrays for timeseries and aggregate profiles

Use ArtifactStore for programmatic access:

from pathlib import Path

from polyzymd.analyses.mda import ArtifactStore

replicate = ArtifactStore(Path("analysis/PEGylated/rmsd/run_1")).read_replicate_result()
condition = ArtifactStore(Path("analysis/PEGylated/rmsd/aggregated")).read_condition_result()
print(replicate.payload["runs"][0]["mean_rmsd"])
print(condition.payload["runs"][0]["metrics"]["mean_rmsd"])
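
When ArtifactStore is not at hand, the envelope can also be inspected with the standard library alone. The sketch below is not part of the plugin: sidecar_paths is a hypothetical helper, and it assumes that sidecar paths are stored relative to the directory containing result.json, as the directory tree and example paths suggest.

```python
import json
from pathlib import Path


def sidecar_paths(result_json):
    """List the sidecar files named in an artifact envelope.

    Assumption: each entry in "sidecars" has a "path" key that is
    relative to the directory holding result.json.
    """
    result_json = Path(result_json)
    envelope = json.loads(result_json.read_text())
    return [result_json.parent / entry["path"] for entry in envelope.get("sidecars", [])]
```

For example, sidecar_paths("analysis/PEGylated/rmsd/run_1/result.json") would return the NPZ paths for that replicate, which can then be opened with any NPZ reader.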

JSON result structure

Per-replicate result (ReplicateArtifact), representative structure:

{
    "schema_version": "1",
    "artifact_type": "replicate",
    "analysis_name": "rmsd",
    "condition_label": "PEGylated",
    "replicate": 1,
    "payload": {
        "runs": [
            {
                "run_label": "Protein Backbone",
                "selection": "protein and name CA",
                "alignment_selection": "protein and name CA",
                "reference_mode": "centroid",
                "mean_rmsd": 1.823,
                "std_rmsd": 0.312,
                "median_rmsd": 1.791,
                "sem_rmsd": 0.078,
                "converged": true,
                "convergence_time_ns": 12.5,
                "timeseries_sidecar": "sidecars/rmsd_Protein_Backbone_timeseries.npz"
            }
        ]
    },
    "metadata": {"equilibration": "10ns", "time_unit": "ns"},
    "provenance": {"trajectory_files": ["..."], "n_frames_used": 9000},
    "sidecars": [
        {
            "path": "sidecars/rmsd_Protein_Backbone_timeseries.npz",
            "metadata": {"kind": "timeseries", "run_label": "Protein Backbone"}
        }
    ]
}

Aggregated result (ConditionArtifact), representative structure:

{
    "schema_version": "1",
    "artifact_type": "condition",
    "analysis_name": "rmsd",
    "condition_label": "PEGylated",
    "replicates": [1, 2, 3],
    "payload": {
        "runs": [
            {
                "run_label": "Protein Backbone",
                "selection": "protein and name CA",
                "metrics": {
                    "mean_rmsd": {"values": [1.823, 1.891, 1.854], "mean": 1.856, "sem": 0.034}
                },
                "convergence": {
                    "n_converged_replicates": 3,
                    "convergence_fraction": 1.0,
                    "mean_convergence_time_ns": 13.2
                }
            }
        ]
    },
    "metadata": {"equilibration": "10ns"},
    "provenance": {"source_replicates": [1, 2, 3]}
}
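
The convergence block of the aggregated result can be related to the per-replicate fields with a short sketch. This is one plausible way to derive it, not the plugin's confirmed implementation; summarize_convergence is a hypothetical helper, and the replicate values in the usage note are invented for illustration.

```python
def summarize_convergence(replicate_runs):
    """Combine per-replicate convergence fields into an aggregate summary.

    Each item is a dict with "converged" (bool) and "convergence_time_ns"
    (float or None), mirroring the per-replicate run entries.
    """
    n_total = len(replicate_runs)
    converged = [r for r in replicate_runs if r.get("converged")]
    times = [
        r["convergence_time_ns"]
        for r in converged
        if r.get("convergence_time_ns") is not None
    ]
    return {
        "n_converged_replicates": len(converged),
        "convergence_fraction": len(converged) / n_total if n_total else 0.0,
        # Assumption: the mean is taken over converged replicates only.
        "mean_convergence_time_ns": sum(times) / len(times) if times else None,
    }
```

With three converged replicates at 12.5, 13.0, and 14.1 ns (hypothetical values), the summary reports a convergence fraction of 1.0 and a mean convergence time of about 13.2 ns.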

Plot Types

The RMSD plugin generates figures through polyzymd compare plot-all:

Plot output	Description
`rmsd_timeseries_<run>.png`	Mean RMSD vs time with SEM shading, one per run
`rmsd_comparison_<run>.png`	Grouped bar chart of mean RMSD across conditions, one per run
`rmsd_convergence_<condition>_<run>.png`	Dual-axis plot: RMSD timeseries with sliding-window slope and convergence marker (requires `show_convergence_plots: true`)

Timeseries plot features:

Mean RMSD curve per condition with SEM shading
Legend placed outside the plot area (bbox_to_anchor=(1.02, 0.5))
Optional per-replicate traces via show_per_replicate: true

RMSD plot behavior can be customized in comparison.yaml:

plot_settings:
  rmsd:
    show_per_replicate: false    # Overlay individual replicate traces
    figsize: [10, 6]             # Default figure size (bar charts)
    timeseries_figsize: [12, 5]  # Timeseries figure size (wider)
    show_convergence_plots: false  # Generate per-replicate convergence diagnostics
    convergence_figsize: [12, 5]   # Convergence panel figure size

Convergence Detection

Convergence detection is always on: every RMSD run applies a sliding-window slope heuristic to determine whether the RMSD timeseries has plateaued. The diagnostic is purely additive and does not affect ranking, statistical tests, or any other comparison output. Convergence results appear as additional fields in the per-replicate and aggregated JSON files, and optional convergence plots can be enabled with show_convergence_plots: true.

For a conceptual explanation of the algorithm, its parameters, and its limitations, see Establishing Convergence in MD Simulations.
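
To make the four convergence_* settings concrete, here is a minimal sketch of a sliding-window slope heuristic. It is an assumption-laden illustration rather than the plugin's algorithm: the function names are hypothetical, the slope is an ordinary least-squares fit per window, and "sustained" is interpreted as the time span covered by consecutive flat windows.

```python
def window_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    denom = sum((t - t_mean) ** 2 for t in times)
    if denom == 0:
        return 0.0
    return sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values)) / denom


def detect_convergence(times_ns, rmsd, window_ns=15.0, step_ns=5.0,
                       slope_threshold=0.0005, sustained_ns=15.0):
    """Return the start time (ns) of the first sustained flat stretch, or None."""
    flat_since = None              # start of the current run of flat windows
    start, end = times_ns[0], times_ns[-1]
    while start + window_ns <= end:
        idx = [i for i, t in enumerate(times_ns) if start <= t <= start + window_ns]
        if len(idx) >= 2:
            slope = window_slope([times_ns[i] for i in idx], [rmsd[i] for i in idx])
            if abs(slope) <= slope_threshold:
                if flat_since is None:
                    flat_since = start
                # Span covered by consecutive flat windows so far.
                if start + window_ns - flat_since >= sustained_ns:
                    return flat_since
            else:
                flat_since = None  # a steep window resets the run
        start += step_ns
    return None
```

Under this interpretation, the default 15 ns window with a 15 ns sustained duration means a single flat window is enough; a longer sustained duration would require several consecutive flat windows.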

Common CLI Options

Option	Default	Description
`-f, --file`	`comparison.yaml`	Path to comparison configuration
`--eq-time`	`0ns`	Equilibration time to skip
`--recompute`	off	Ignore cached results and recompute
`--format`	`table`	Output format (`table` or `json`)
`-o, --output`	(none)	Save formatted output to file
`-q, --quiet`	off	Suppress INFO messages
`--debug`	off	Enable DEBUG logging

Troubleshooting

“Selection matched no atoms”

Cause: MDAnalysis selection doesn’t match any atoms in your topology.

Fix:

Check residue numbering: in MDAnalysis, resid follows the numbering stored in the topology file (typically 1-indexed), whereas resindex is 0-indexed
Verify atom names match your topology
Use polyzymd --debug compare run rmsd -f comparison.yaml ... for detailed diagnostics
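
Selections can also be checked directly in MDAnalysis before running the plugin. The sketch below uses Universe.select_atoms and the n_atoms attribute of the returned AtomGroup; the helper names and the topology path are placeholders, not part of the plugin.

```python
def count_selected_atoms(universe, selections):
    """Map each selection string to the number of atoms it matches."""
    return {sel: universe.select_atoms(sel).n_atoms for sel in selections}


def empty_selections(counts):
    """Return the selection strings that matched no atoms."""
    return [sel for sel, n in counts.items() if n == 0]


def check_topology(topology_path, selections):
    """Load a topology with MDAnalysis and report selections that match nothing."""
    import MDAnalysis as mda  # imported lazily; requires MDAnalysis to be installed

    universe = mda.Universe(topology_path)
    return empty_selections(count_selected_atoms(universe, selections))
```

Calling check_topology("topology.pdb", ["protein and name CA"]) with your own topology file returns an empty list when every selection matches at least one atom.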

“At least one RMSD run must be defined”

Cause: The runs list in plugins.rmsd is empty or missing.

Fix: Add at least one run entry with a label field:

plugins:
  rmsd:
    runs:
      - label: "Protein Backbone"

“reference_file does not exist”

Cause: reference_mode: external is set, but the reference_file path does not point to an existing PDB file.

Fix: Provide an absolute path or a path relative to the working directory:

reference_mode: "external"
reference_file: "/absolute/path/to/crystal.pdb"
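
A quick way to confirm how a path resolves from the working directory is a small pathlib check. This is a standalone sketch, not the plugin's own validation; resolve_reference_file is a hypothetical helper.

```python
from pathlib import Path


def resolve_reference_file(reference_file, working_dir=None):
    """Resolve reference_file against working_dir (default: current directory)."""
    base = Path(working_dir) if working_dir is not None else Path.cwd()
    path = Path(reference_file).expanduser()
    if not path.is_absolute():
        path = base / path
    if not path.is_file():
        raise FileNotFoundError(f"reference_file does not exist: {path}")
    return path.resolve()
```

Running it from the directory where the analysis is launched shows the absolute path that a relative reference_file would resolve to.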

“atom count mismatch between trajectory and external PDB”

Cause: The selection string matches different numbers of atoms in the trajectory vs. the external reference PDB.

Fix:

Ensure both systems use the same atom naming convention
Check that the external PDB contains the same residues as your simulation
Use a more specific selection if topologies differ

Very high RMSD values (> 10 Å)

Cause: Usually indicates alignment issues, wrong selection, or unfolding.

Fix:

Check that alignment_selection matches atoms in your system
Try reference_mode: "average" and compare the result with the one from your current reference mode
Verify trajectory files are complete
Check for protein unfolding or large conformational changes

“Low statistical reliability” warning

Cause: Long correlation time relative to trajectory length.

This warning is informational, not an error. Results remain usable, especially for qualitative comparisons, but uncertainties may be underestimated.

Mitigation:

Use multiple replicates (aggregated SEM is more reliable)
Run longer simulations

Missing replicate data

Message: Skipping replicate N: trajectory data not found

Cause: The requested replicate has not completed, or its trajectory path is incorrect.

Fix: This message is informational; analysis continues with the available replicates. Check simulation status if the skip is unexpected.

RMSD vs RMSF Comparison

Feature	RMSD	RMSF
Measures	Global deviation from reference	Per-residue fluctuation
Output	One value per frame (timeseries)	One value per residue (profile)
Reference	Fixed structure (centroid/average/frame/external)	Time-averaged position
Detects	Conformational drift, unfolding	Flexible loops, rigid core
Multi-run	Yes (`runs` list with different selections)	Single selection
Best for	Equilibration assessment, stability comparison	Flexibility mapping
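
The difference between the two quantities in the table can be shown on toy coordinates. The sketch below uses plain Python and performs no alignment, so it illustrates only the definitions (one value per frame for RMSD, one value per atom for RMSF), not the plugin's workflow; the function names are illustrative.

```python
from math import sqrt


def rmsd_per_frame(frames, reference):
    """RMSD of each frame from a fixed reference: one value per frame."""
    out = []
    for frame in frames:
        sq = [
            sum((a - b) ** 2 for a, b in zip(atom, ref_atom))
            for atom, ref_atom in zip(frame, reference)
        ]
        out.append(sqrt(sum(sq) / len(sq)))
    return out


def rmsf_per_atom(frames):
    """RMSF of each atom about its time-averaged position: one value per atom."""
    n_frames = len(frames)
    out = []
    for i in range(len(frames[0])):
        mean_pos = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for frame in frames
        ) / n_frames
        out.append(sqrt(msd))
    return out


# Two atoms, three frames: atom 0 is static, atom 1 drifts along x.
frames = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
]
rmsd = rmsd_per_frame(frames, reference=frames[0])
rmsf = rmsf_per_atom(frames)
```

Here rmsd grows frame by frame because the structure drifts away from the first frame, while rmsf is zero for the static atom and nonzero only for the atom that moves.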

Tip

Use RMSD first to assess overall stability and choose equilibration time, then use RMSF to identify which regions drive flexibility differences.