# RMSD Interpretation: Use, Limits, and Cautions

RMSD is a useful structural similarity diagnostic, but it is not proof of thermodynamic equilibration, statistical convergence, or biological stability by itself. This page explains what RMSD can and cannot support when interpreting PolyzyMD trajectories, with emphasis on reference choice, atom selection, autocorrelation, and cautious condition-level comparison.

```{versionadded} 1.3.0
The RMSD analysis plugin was added in PolyzyMD 1.3.0.
```

```{note}
**Just need quick results?** See the [Quick Start Guide](../how_to/analysis_rmsd_quickstart.md) for copy-paste commands and minimal setup.
```

```{seealso}
**For foundational statistical concepts** (autocorrelation, correlation time, the difference between means and variances), see the [Statistics Best Practices Guide](analysis_statistics_best_practices.md).

This page focuses on **RMSD-specific** interpretation: what the values mean, which assumptions they depend on, and where the conclusions can be ambiguous.
```

## What is RMSD?

**Root Mean Square Deviation (RMSD)** measures the root-mean-square distance between corresponding atoms in a structure and a reference structure after optimal superposition:

$$
\text{RMSD}(t) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{r}_i(t) - \mathbf{r}_i^{\text{ref}} \right\|^2}
$$

Where:

- $\mathbf{r}_i(t)$ is the position of atom $i$ at time $t$, after superposition onto the reference
- $\mathbf{r}_i^{\text{ref}}$ is the position of atom $i$ in the reference structure
- $N$ is the number of atoms in the selection

Unlike RMSF, which averages over time to give one value per residue, RMSD gives one value per frame. The result is a timeseries describing how a selected set of atoms moves relative to a chosen reference.

### What RMSD can and cannot tell you

RMSD measures distance from a chosen reference for a chosen atom selection. It does not directly measure free energy, functional activity, or thermodynamic stability.
The same RMSD value can arise from different molecular motions, and a single RMSD timeseries can hide local rearrangements that matter biologically.

| RMSD behavior | Cautious structural interpretation |
|---------------|------------------------------------|
| Low and stable | Selected atoms remain close to the chosen reference over the sampled interval. |
| Gradually increasing | Possible drift away from the reference; the cause is not unique. |
| Plateau after rise | Suggests structural stationarity for the selected atoms/reference, not full thermodynamic equilibration. |
| Sudden jump | May reflect a transition, alignment artifact, domain motion, ligand event, or unfolding; inspect structures. |
| Oscillating | May indicate repeated conformational motion or reference/alignment sensitivity; not necessarily two-state behavior. |

## Interpreting RMSD Values

### Rough Cα RMSD heuristics for folded globular proteins

The following values are rough heuristics for **Cα RMSD of small-to-medium folded globular proteins**. They are not universal quality thresholds and should not be used to label a system as stable or unstable without additional context.
| Cα RMSD (Å) | Possible interpretation | Common contributors |
|-------------|-------------------------|---------------------|
| 0.5 – 1.5 | Close to reference for the selected atoms | Rigid core, short trajectory, restrained or crystal-like geometry |
| 1.5 – 2.5 | Modest deviation from reference | Typical backbone fluctuations for many compact proteins |
| 2.5 – 3.5 | Larger deviation from reference | Flexible loops, termini, lid opening, domain motion |
| 3.5 – 5.0 | Large reference-relative change | Domain rearrangement, alignment sensitivity, partial unfolding |
| > 5.0 | Very large reference-relative change | Major rearrangement, unfolding, or different conformational basin |

```{note}
These heuristics assume comparable atom selections, alignment choices, reference modes, protein sizes, simulation lengths, and force-field contexts. Always compare like-with-like: same selection, same reference mode, same atoms.
```

### Selection matters

The choice of atoms for RMSD calculation strongly affects the result:

| Selection | Typical use | Interpretation caution |
|-----------|-------------|------------------------|
| `protein and name CA` | Global backbone similarity | Flexible termini and domain motions can dominate. |
| `protein and backbone` | Backbone conformation | Includes more atoms than Cα and can change the scale. |
| `protein and name CA and resid 50:150` | Core-region similarity | Excludes regions that may be scientifically important. |
| Active site residues | Local catalytic-geometry proxy | Low RMSD does not prove catalytic competence. |
| `chainid C and not name H*` | Polymer conformation relative to reference | Polymer RMSD can be highly reference- and alignment-dependent. |

## RMSD vs Time: Interpreting Patterns Cautiously

### Plateau-like behavior

```text
RMSD
  3 |            ___________
    |           /
  2 |         /
    |       /
  1 |     /
    |___/
  0 +----------------------------→ Time
    0     10     20     30     40 ns
```

An initial rise followed by an apparent plateau can suggest that the selected atoms have reached a reference-relative stationary regime. This is useful, but limited: it does not prove thermodynamic equilibration, convergence of other observables, or adequate sampling of all relevant conformations.

### Conformational drift

```text
RMSD
  5 |                         /
    |                       /
  4 |                     /
    |                   /
  3 |                 /
    |_______________/
  0 +----------------------------→ Time
```

A continuously rising RMSD suggests ongoing movement away from the reference for the selected atoms. Possible explanations include slow relaxation, domain motion, unfolding, reference mismatch, alignment choices, or insufficient sampling. RMSD alone usually cannot identify which explanation is correct.

### Sudden jumps

A sharp RMSD increase mid-trajectory indicates a rapid change in reference-relative geometry, but the molecular cause is non-unique. It may reflect loop flipping, lid opening, domain rearrangement, ligand motion, alignment sensitivity, imaging artifacts, or partial unfolding.

```{tip}
When you observe a jump, load the trajectory in a molecular viewer and examine frames around the transition. Visual inspection can distinguish chemically meaningful events from alignment, imaging, or selection artifacts.
```

### Oscillations

Regular RMSD oscillations can be consistent with repeated motion between reference-relative geometries, but they do not by themselves establish discrete metastable states. Hinge bending, active-site lid dynamics, allosteric motion, alignment choices, and periodic boundary artifacts can all produce oscillatory patterns. For oscillating systems, the range and timescale of the oscillation often convey more than the mean RMSD alone.
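
The patterns above can be quantified before opening a molecular viewer. The block below is a minimal NumPy sketch under stated assumptions, not PolyzyMD's implementation: it computes one RMSD value per frame using Kabsch superposition (centering plus an SVD-derived optimal rotation, with equal atom weights) onto a single fixed reference, then summarizes the trace by its mean, its range, the linear slope over its second half, and its largest frame-to-frame step. The function names `kabsch_rmsd`, `rmsd_timeseries`, and `describe_trace`, the `(n_frames, N, 3)` array layout, and the second-half slope window are illustrative choices. These numbers describe the shape of a trace; they do not identify its molecular cause.

```python
import numpy as np


def kabsch_rmsd(coords: np.ndarray, ref: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Both sets are centered on their centroids; the rotation that minimizes
    the summed squared deviation is found with the Kabsch algorithm (SVD of
    the 3x3 covariance matrix). All atoms are weighted equally.
    """
    p = coords - coords.mean(axis=0)
    q = ref - ref.mean(axis=0)
    h = p.T @ q                                  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # -1 would signal a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # proper rotation taking p onto q
    diff = p @ rot.T - q                         # per-atom displacement vectors
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def rmsd_timeseries(traj: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """One RMSD value per frame; traj has shape (n_frames, N, 3)."""
    return np.array([kabsch_rmsd(frame, ref) for frame in traj])


def describe_trace(rmsd: np.ndarray, dt_ns: float) -> dict:
    """Descriptive shape summaries of an RMSD trace (needs >= 4 frames)."""
    t = np.arange(len(rmsd)) * dt_ns
    half = len(rmsd) // 2
    late_slope = np.polyfit(t[half:], rmsd[half:], 1)[0]  # RMSD units per ns
    steps = np.abs(np.diff(rmsd))                          # frame-to-frame changes
    return {
        "mean": float(rmsd.mean()),
        "range": float(rmsd.max() - rmsd.min()),
        "late_slope_per_ns": float(late_slope),
        "largest_step": float(steps.max()),
        "largest_step_frame": int(steps.argmax()) + 1,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(50, 3))      # synthetic "reference" coordinates
    noise = rng.normal(size=ref.shape)
    # Synthetic trajectory that drifts steadily away from the reference.
    traj = np.stack([ref + 0.02 * k * noise for k in range(40)])
    print(describe_trace(rmsd_timeseries(traj, ref), dt_ns=0.5))
```

Fitting the slope only over the later part of the trace keeps the initial relaxation from dominating the fit, but the window choice, and what slope counts as "near zero," are judgment calls worth reporting alongside any plateau or drift claim. A large `largest_step` is a prompt to inspect the frames around `largest_step_frame`, not a diagnosis.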
## How PolyzyMD Handles Autocorrelation

RMSD timeseries are autocorrelated because adjacent MD frames are not independent samples. A trajectory with many saved frames can still contain far fewer statistically independent observations.

Where possible, PolyzyMD expresses uncertainty in terms of the statistical inefficiency $g$. Conceptually, the effective sample size is $N_{\text{eff}} = N_{\text{frames}} / g$. For a simple integrated autocorrelation-time estimate, $g \approx 1 + 2\tau/\Delta t$, with $\tau$ the integrated autocorrelation time and $\Delta t$ the frame spacing. Larger $g$ means stronger correlation and fewer effective samples.

This correction helps avoid treating adjacent frames as independent, but it does not replace independent replicate simulations or guarantee convergence of the underlying conformational ensemble.

```{seealso}
For the mathematical details of autocorrelation functions and the LiveCoMS recommendations, see the [Statistics Best Practices Guide](analysis_statistics_best_practices.md).
```

## Multi-Run Analysis: Why It Helps Interpretation

Different RMSD selections answer different questions:

| Run Label | Selection | Question |
|-----------|-----------|----------|
| "Protein Backbone" | `protein and name CA` | How close is the global backbone to this reference? |
| "Active Site" | Catalytic residues CA | How close is the local active-site geometry to this reference? |
| "Polymer Core" | `chainid C and not name H*` | How close is the polymer conformation to this reference? |
| "Crystal Deviation" | `protein and name CA` (external ref) | How close is the protein to an external structural state? |

Each run is ranked independently across conditions.
This prevents averaging RMSD from structurally different selections, which would be difficult to interpret:

```text
Rankings:
  Protein Backbone: With Polymer < No Polymer (closer to reference)
  Active Site:      With Polymer < No Polymer (closer to reference)
  Polymer Core:     With Polymer — (single condition only)
```

## External Reference for Catalytic Competence

When studying enzyme catalysis across multiple conditions, the standard reference modes (`centroid`, `average`) use a **condition-specific** reference: each condition's trajectory determines its own reference structure.

The `external` reference mode uses a **condition-independent** reference, typically a crystal structure representing a specific geometry of interest. RMSD then measures deviation from that external structure:

$$
\text{RMSD}^{\text{ext}}(t) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\| \mathbf{r}_i(t) - \mathbf{r}_i^{\text{crystal}} \right\|^2}
$$

**Interpretation changes with external reference:**

| Metric | Standard RMSD (centroid/average) | External Reference RMSD |
|--------|----------------------------------|-------------------------|
| Low value | Structure stays near its own trajectory-derived reference | Structure stays near the external geometry |
| High value | Structure deviates from its trajectory-derived reference | Structure deviates from the external geometry |
| Condition comparison | Which condition remains closer to its chosen internal reference? | Which condition remains closer to the external structure? |

```{tip}
**Which reference mode for enzymes?**

Use `centroid` or `average` for trajectory-internal reference-relative motion. Use `external` to ask whether a trajectory remains close to a specific known structure. External-reference RMSD does not make "closer" inherently better or more stable unless the external structure is justified as the relevant state for the scientific question.
```

## Replicates vs Longer Simulations

### The LiveCoMS recommendation

> "Multiple independent simulations are preferable to a single long simulation"
> — Grossfield et al. (2018)

### Why replicates matter for RMSD

| Multiple Replicates | Single Long Simulation |
|---------------------|------------------------|
| Independent starting points | Frames remain correlated |
| Tests reproducibility of drift/plateau patterns | May remain trapped in one metastable state |
| Supports uncertainty from replicate means | Requires autocorrelation correction within trajectory |
| Parallelizable | Sequential |

```{note}
With only 1 replicate, PolyzyMD still computes RMSD and includes the condition in descriptive summaries and rankings. Replicate SEM is unavailable because variability across independent simulations cannot be estimated from a singleton. Pairwise inferential tests require at least 2 replicates per condition.
```

## Comparing Conditions

### What PolyzyMD computes

For each RMSD run, the comparison produces:

| Statistic | Description |
|-----------|-------------|
| **Ranking** | Conditions sorted by mean RMSD (lowest = closest to the chosen reference) |
| **Percent change** | Relative to control condition |
| **Direction** | Plugin labels such as `stabilizing`, `destabilizing`, or `unchanged`; interpret as reference-relative unless separately justified |
| **t-statistic** | Two-sample t-test on replicate means |
| **p-value** | Two-tailed significance |
| **Cohen's d** | Effect size magnitude |
| **ANOVA** | Omnibus F-test when 3+ conditions (per-run) |

### Direction labels

PolyzyMD classifies the direction of change based on percent change in mean RMSD relative to control. For RMSD, these labels are shorthand and should be read as changes in closeness to the chosen reference, not proof of biological stability.
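
The comparison quantities listed above can be sketched from per-replicate mean RMSD values. The following minimal pure-Python example is an illustration under stated assumptions, not PolyzyMD's code: the helper `compare_to_control` is hypothetical, percent change is taken as treatment relative to control, the direction label uses the ±1% cutoffs PolyzyMD documents for these labels, the t-statistic is Student's pooled-variance form (this page does not state which two-sample variant PolyzyMD uses), and Cohen's d divides by the pooled standard deviation. The p-value is omitted because it needs the t distribution; `scipy.stats.ttest_ind` is one standard way to obtain it. Because the label depends only on percent change, a `stabilizing` label can coexist with a small, non-significant t-statistic.

```python
import math
import statistics
from typing import Sequence


def compare_to_control(control: Sequence[float], treatment: Sequence[float],
                       threshold_pct: float = 1.0) -> dict:
    """Compare per-replicate mean RMSD values (one value per replicate).

    Hypothetical helper for illustration; not part of the PolyzyMD API.
    Signs follow treatment minus control, so negative values mean the
    treatment stayed closer to the chosen reference on average.
    """
    n1, n2 = len(control), len(treatment)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least 2 replicates per condition")
    m1, m2 = statistics.mean(control), statistics.mean(treatment)
    pct = 100.0 * (m2 - m1) / m1

    # Direction label from percent change alone (uncertainty is not used).
    if pct < -threshold_pct:
        direction = "stabilizing"
    elif pct > threshold_pct:
        direction = "destabilizing"
    else:
        direction = "unchanged"

    # Pooled standard deviation across replicates (equal-variance assumption).
    v1, v2 = statistics.variance(control), statistics.variance(treatment)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    if sp == 0:
        raise ValueError("zero pooled variance: t and d are undefined")
    cohens_d = (m2 - m1) / sp
    t_stat = (m2 - m1) / (sp * math.sqrt(1 / n1 + 1 / n2))  # Student's t
    return {"percent_change": pct, "direction": direction,
            "t_statistic": t_stat, "cohens_d": cohens_d}


if __name__ == "__main__":
    # Made-up per-replicate mean RMSD values (Å), for illustration only.
    no_polymer = [2.10, 2.25, 2.18]
    with_polymer = [1.95, 2.02, 2.08]
    print(compare_to_control(no_polymer, with_polymer))
```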
| Percent Change | Direction | Meaning |
|----------------|-----------|---------|
| < −1% | `stabilizing` | Treatment reduces reference-relative deviation |
| > +1% | `destabilizing` | Treatment increases reference-relative deviation |
| −1% to +1% | `unchanged` | No meaningful difference by this threshold |

### Interpreting the comparison

When one condition has lower mean RMSD than another, the most direct statement is that it stayed closer to the chosen reference for the selected atoms over the analyzed interval. Stronger claims, such as improved stability or functional preservation, require supporting evidence from the scientific context and other observables.

PolyzyMD writes canonical RMSD artifacts through the analysis lifecycle. The stable locations are:

- `analysis//rmsd/run_/result.json` for replicate-level artifacts
- `analysis//rmsd/aggregated/result.json` for condition-level artifacts
- `comparison/rmsd/result.json` for comparison artifacts

Treat artifact contents as structured payloads and provenance that may refer to sidecars for larger data. Avoid depending on undocumented raw JSON field names unless they are described in reference documentation.

## Common Pitfalls

### 1. Treating a plateau as proof of equilibration

**Symptom:** A plateau-like RMSD trace is described as complete equilibration.

**Caution:** A plateau suggests stationarity of the selected atoms relative to the chosen reference. Other coordinates, slow modes, ligand states, solvent structure, or functional observables may still be unequilibrated.

**Better interpretation:** "RMSD reached an apparent plateau for this selection and reference after the initial relaxation period."

### 2. Comparing different selections

**Symptom:** RMSD values are not comparable across runs or publications.

**Cause:** Different atom selections yield different RMSD magnitudes.

**Better interpretation:** Always report the exact selection string.
Compare only runs with identical selections, references, and alignment conventions.

### 3. Over-interpreting small differences

**Symptom:** Claiming significance for 0.05 Å differences.

**Cause:** Not accounting for uncertainty.

**Better interpretation:** Report uncertainty and avoid implying meaningful structural differences when confidence intervals overlap substantially or replicate variation dominates:

```text
# WRONG: "Condition A (1.856 Å) is more stable than B (1.861 Å)"
# RIGHT: "Condition A (1.856 ± 0.034 Å) and B (1.861 ± 0.028 Å)
#         are not significantly different (p = 0.91, unchanged)"
```

### 4. Ignoring timeseries shape

**Symptom:** Reporting only mean RMSD without inspecting the timeseries.

**Cause:** Two conditions can have the same mean RMSD but very different dynamics, such as one plateau-like trace and one drifting trace.

**Better interpretation:** Inspect the timeseries shape before reducing the trajectory to a mean. Similar means can arise from stationary, drifting, or multi-regime trajectories.

### 5. Using all-atom RMSD without justification

**Symptom:** Very high RMSD values even for compact proteins.

**Cause:** Side-chain motions can dominate all-atom RMSD, obscuring backbone changes.

**Better interpretation:** Use Cα, backbone, all-atom, or local selections according to the scientific question. Side-chain-rich selections are valid when side-chain rearrangements are the intended observable, but their RMSD scale is not interchangeable with Cα RMSD.

### 6. Ignoring replicate variation

**Symptom:** Reporting within-trajectory SEM as the total uncertainty.

**Cause:** Treating autocorrelation-corrected SEM as sufficient.

**Better interpretation:** Use independent replicate statistics when available. Within-trajectory uncertainty can account for adjacent-frame correlation, but replicate-to-replicate variability better reflects sensitivity to initial conditions and sampling path.

### 7. Choosing the wrong reference mode

**Symptom:** Unexpected or hard-to-interpret comparison results.

**Cause:** Using `centroid` when `external` is more appropriate for the scientific question, or interpreting external-reference RMSD as inherently better when it is merely closer to the supplied structure.

**Better interpretation:** Match reference mode to your scientific question:

- Trajectory-internal reference-relative motion → `centroid` or `average`
- Closeness to a specified structural state → `external` with a justified reference structure

### 8. Treating automated convergence as ground truth

**Symptom:** Trusting an automated convergence diagnostic without further inspection.

**Cause:** A sliding-window heuristic is parameter-dependent and can miss slow drift, metastable trapping, or convergence issues in observables other than RMSD.

**Better interpretation:** Use convergence diagnostics as one input among several. Inspect the RMSD timeseries, run multiple independent replicates when possible, and check other relevant observables such as Rg, SASA, contacts, or active-site distances. See {doc}`/explanation/convergence_detection` for a full discussion of limitations.

## RMSD as one equilibration diagnostic

RMSD is commonly used as an equilibration diagnostic because large structural relaxations often appear as changes in reference-relative distance. Its role is diagnostic, not definitive. A plateau can support the claim that the selected atoms are no longer drifting relative to the reference on the observed timescale, but it does not establish thermodynamic equilibration or convergence of all relevant observables.

```{tip}
If RMSD never appears stationary within the simulation time, possible explanations include slow relaxation, reference mismatch, large-amplitude domain motion, unfolding, or simply insufficient sampling. Distinguish these by inspecting structures and complementary observables.
```

### Automated convergence detection

```{versionadded} 1.3.0
```

PolyzyMD can run a sliding-window convergence diagnostic on RMSD timeseries. The diagnostic evaluates whether the RMSD slope stays below a configured threshold over a sustained interval. The resulting information is stored as part of the canonical RMSD artifact payload and provenance, with condition-level summaries represented in aggregated artifacts. Larger timeseries or plot-ready data may be represented through sidecars referenced by the artifact.

**This is a diagnostic tool, not a definitive convergence proof.** The heuristic can miss slow drift below the slope threshold, and convergence in RMSD does not guarantee convergence of other observables. Always use multiple replicates and visual inspection alongside automated diagnostics.

For command-oriented usage, see the [RMSD Quick Start Guide](../how_to/analysis_rmsd_quickstart.md). For a full conceptual treatment of convergence diagnostics (including the algorithm, parameters, tuning guidance, and limitations), see {doc}`/explanation/convergence_detection`.

## References

### Primary Reference

**Grossfield A, Patrone PN, Roe DR, Schultz AJ, Siderius DW, Zuckerman DM.** (2018) "Best Practices for Quantification of Uncertainty and Sampling Quality in Molecular Simulations." *Living Journal of Computational Molecular Science* 1(1):5067. https://doi.org/10.33011/livecoms.1.1.5067

### Additional References

**Knapp B, Frantal S, Greshake B, Schwarz R, et al.** (2018) "Is an Intuitive Convergence Definition of Molecular Dynamics Simulations Solely Based on the Root Mean Square Deviation Possible?" *Journal of Computational Biology* 25:1069-1077. Discussion of RMSD-based convergence assessment and its limitations.

**Maiorov VN, Crippen GM.** (1994) "Significance of Root-Mean-Square Deviation in Comparing Three-dimensional Structures of Globular Proteins." *Journal of Molecular Biology* 235(2):625-634.
https://doi.org/10.1006/jmbi.1994.1017 Foundational work on RMSD as a structural similarity measure.

**Sargsyan K, Grauffel C, Lim C.** (2017) "How Molecular Size Impacts RMSD Applications in Molecular Dynamics Simulations." *Journal of Chemical Theory and Computation* 13(4):1518-1524. https://doi.org/10.1021/acs.jctc.7b00028 Analysis of how protein size affects expected RMSD values.

## See Also

- [Quick Start Guide](../how_to/analysis_rmsd_quickstart.md) — Get results fast
- [Convergence Detection](convergence_detection.md) — Conceptual guide to convergence: algorithm, parameters, and limitations
- [Statistics Best Practices](analysis_statistics_best_practices.md) — Foundational statistics for MD
- [RMSF Best Practices](analysis_rmsf_best_practices.md) — Per-residue fluctuation analysis
- [Reference Structure Selection](analysis_reference_selection.md) — Choose alignment reference
- [Compare Simulation Conditions](../how_to/analysis_compare_conditions.md) — Full comparison workflow
- [LiveCoMS Best Practices](https://livecomsjournal.org/index.php/livecoms/article/view/v1i1e5067) — Full methodology paper