# Establishing Convergence in MD Simulations

Understanding when a molecular dynamics simulation has converged — and what
convergence means in practice — is essential for drawing reliable conclusions.

```{versionadded} 1.3.0
Automated convergence detection was added alongside the RMSD analysis plugin.
```

## What Is Convergence in MD?

A simulation has *converged* when its observable of interest has stopped
drifting and is sampling from a stationary distribution. In the RMSD context,
this means the protein's deviation from a reference structure has settled into
a fluctuating plateau rather than continuing to increase or decrease.

Convergence is **not the same as equilibration**. Equilibration refers to the
initial transient period after simulation launch, during which the system
relaxes from its starting configuration. Convergence refers to the state of
the production region itself — whether the trajectory has sampled long enough
that running averages are stable and the statistical properties of the
observable are no longer evolving.

## Why It Matters

Conclusions drawn from non-converged simulations are unreliable. If the RMSD
is still drifting upward, the mean RMSD and its uncertainty will change
depending on how much data you include. Effect sizes between conditions may
appear significant or insignificant depending on where you truncate the
timeseries.

Grossfield et al. (2019) emphasize that quantifying uncertainty requires
sampling from a stationary distribution. If the distribution itself is still
evolving — as it is during a drift — standard error estimates understate the
true uncertainty.
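To make the truncation effect concrete, the sketch below builds a synthetic, noise-free RMSD trace that drifts linearly. The trace is invented for illustration and is not PolyzyMD output. It compares the mean over the first half with the mean over the full series, using only the Python standard library.

```python
from statistics import fmean

# Synthetic RMSD trace (illustrative only): starts at 1.0 Å and drifts
# upward at 0.01 Å/ns, sampled once per ns for 100 ns.
times_ns = list(range(100))
rmsd = [1.0 + 0.01 * t for t in times_ns]

# The reported average depends on where the series is truncated.
mean_first_half = fmean(rmsd[:50])
mean_full = fmean(rmsd)

print(f"mean over first 50 ns:  {mean_first_half:.3f} Å")
print(f"mean over all 100 ns:   {mean_full:.3f} Å")
```

Because the trace never settles, neither number is "the" mean RMSD: extending the simulation would shift the average again.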

## Visual Indicators of Convergence

Before turning to any automated tool, researchers typically assess convergence
by inspecting timeseries plots. Signs that a trajectory has converged include:

- **Plateau in the timeseries.** The observable fluctuates around a stable
  mean rather than trending in one direction.
- **Stable running averages.** A running mean computed over successively
  longer windows stops changing appreciably.
- **Decorrelation time stabilization.** The estimated autocorrelation time
  of the observable converges to a consistent value rather than growing.

These visual checks remain valuable even when automated diagnostics are
available. Automated methods can miss patterns — such as oscillations between
two metastable states — that are obvious to a trained eye.
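The running-average check can be sketched in a few lines. The example below is an illustration only: the helper name `running_mean` and the synthetic relaxation trace are assumptions, not part of PolyzyMD. It computes a cumulative mean and compares how much that mean changes early versus late in the series.

```python
from itertools import accumulate
from math import exp


def running_mean(values):
    """Cumulative mean: element i is the mean of values[0..i]."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]


# Synthetic trace (illustrative only): exponential relaxation towards a
# 2.0 Å plateau with a 10 ns time constant, sampled every 1 ns.
rmsd = [2.0 * (1.0 - exp(-t / 10.0)) for t in range(200)]
means = running_mean(rmsd)

# Change in the running mean over the first and the last 50 samples.
early_change = abs(means[50] - means[0])
late_change = abs(means[-1] - means[-51])

print(f"early change: {early_change:.3f} Å, late change: {late_change:.3f} Å")
```

The late change is much smaller than the early one, but it is not zero: a cumulative mean that includes the initial transient keeps creeping towards the plateau value, which is one reason to discard the equilibration period before averaging.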

## PolyzyMD's Sliding-Window Approach

PolyzyMD implements a sliding-window slope heuristic for convergence detection.
The algorithm operates on any 1D timeseries (typically RMSD vs. time) and
proceeds as follows:

1. **Divide the timeseries into overlapping windows.** Each window spans a
   fixed duration (default: 15 ns) and successive windows are offset by a
   step size (default: 5 ns).

2. **Compute the mean observable in each window.** This smooths out
   frame-to-frame noise while preserving slow drift.

3. **Estimate the slope between successive window means.** The slope captures
   the rate of change in the smoothed signal.

4. **Check for sustained low slope.** If the absolute slope remains below a
   threshold for a sustained duration, the timeseries is declared converged.
   The convergence time is the start of the first sustained plateau.

This approach tolerates small fluctuations: a window whose slope rises slightly
but stays below the threshold does not reset the clock. Only a slope that
exceeds the threshold does. The requirement for *sustained* low slope avoids
false positives from momentary pauses in an otherwise drifting trajectory.
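The four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not PolyzyMD's actual implementation: the function name `sliding_window_convergence`, its argument names, the choice to report the *start* of the first flat window, and the way the sustained duration is measured (from the start of the first flat window to the start of the latest one) are all assumptions made for this example.

```python
from math import exp
from statistics import fmean


def sliding_window_convergence(times_ns, values, window_ns=15.0, step_ns=5.0,
                               slope_threshold=0.0005, sustained_ns=15.0):
    """Return the start time (ns) of the first sustained plateau, or None."""
    # Steps 1-2: mean of the observable in each overlapping window.
    starts, means = [], []
    start = float(times_ns[0])
    while start + window_ns <= times_ns[-1]:
        window = [v for t, v in zip(times_ns, values)
                  if start <= t < start + window_ns]
        if window:
            starts.append(start)
            means.append(fmean(window))
        start += step_ns

    # Steps 3-4: slope between successive window means, then look for the
    # first run of flat slopes that lasts at least `sustained_ns`.
    run_start = None
    for i in range(1, len(means)):
        slope = (means[i] - means[i - 1]) / (starts[i] - starts[i - 1])
        if abs(slope) < slope_threshold:
            if run_start is None:
                run_start = starts[i - 1]
            if starts[i] - run_start >= sustained_ns:
                return run_start
        else:
            run_start = None  # slope too steep: reset the clock
    return None


# Synthetic traces sampled every 1 ns for 200 ns (illustrative only).
times = list(range(200))
relaxing = [2.0 * (1.0 - exp(-t / 10.0)) for t in times]  # settles near 2 Å
drifting = [1.0 + 0.01 * t for t in times]                 # never flattens

print(sliding_window_convergence(times, relaxing))
print(sliding_window_convergence(times, drifting))
```

With these synthetic inputs the sketch reports 55.0 ns for the relaxing trace and `None` for the drifting one, whose slope of 0.01 Å/ns never falls below the threshold.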

## Default Parameters and When to Tune Them

| Parameter | Default | Description |
|-----------|---------|-------------|
| `convergence_window_size_ns` | 15.0 | Width of each averaging window (ns) |
| `convergence_step_size_ns` | 5.0 | Step between successive window starts (ns) |
| `convergence_slope_threshold` | 0.0005 | Maximum absolute slope (Å/ns) to qualify as "flat" |
| `convergence_sustained_for_ns` | 15.0 | Required duration below threshold to declare convergence (ns) |

```{important}
The slope threshold is an **absolute** value in Å/ns.  It is calibrated for
protein backbone RMSD, where typical plateau values are 1–5 Å and a slope of
0.0005 Å/ns represents ~0.05 Å drift over 100 ns — well below the noise
floor for most systems.

For observables on a different scale (e.g. radius of gyration in nm, SASA in
Å², or unitless order parameters), you **must** rescale the threshold to match
the magnitude and natural variability of your signal. A threshold that is
appropriate for RMSD in Ångströms will be far too stringent for SASA (hundreds
of Å²) or far too permissive for a normalised order parameter (0–1).
```
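One simple way to rescale the threshold is sketched below. This is a heuristic assumed for illustration, not a PolyzyMD feature: the helper `rescale_threshold` is hypothetical, and the fluctuation magnitudes are made-up placeholders that you would replace with values measured from your own plateau region. The sketch first converts units, then scales the threshold by the ratio of typical fluctuation sizes.

```python
ANGSTROM_PER_NM = 10.0
rmsd_threshold = 0.0005  # Å/ns, the documented default


def rescale_threshold(reference_threshold, reference_spread, target_spread):
    """Scale a slope threshold by the ratio of typical fluctuation sizes.

    Hypothetical helper: a proportional rescaling is only one possible
    heuristic for matching the threshold to a different observable.
    """
    return reference_threshold * target_spread / reference_spread


# Unit conversion alone: the same physical slope expressed in nm/ns.
threshold_nm_per_ns = rmsd_threshold / ANGSTROM_PER_NM

# Placeholder spreads: 0.2 Å for backbone RMSD, 150 Å² for SASA.
sasa_threshold = rescale_threshold(rmsd_threshold,
                                   reference_spread=0.2,
                                   target_spread=150.0)

print(f"RMSD threshold in nm/ns: {threshold_nm_per_ns:g}")
print(f"Rescaled SASA threshold: {sasa_threshold:g} Å²/ns")
```

Whatever rescaling you choose, check the result against a plot of the observable: the threshold should sit well below the slopes seen during obvious drift and above the slopes produced by plateau noise.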

**Guidance for tuning:**

- **Very long simulations (> 500 ns):** Increase `convergence_window_size_ns`
  and `convergence_sustained_for_ns` in proportion to the trajectory length.
  A 15 ns window in a 1 μs trajectory may be too sensitive to
  short-timescale fluctuations.

- **High-precision comparisons:** Decrease `convergence_slope_threshold` to
  require a flatter plateau before declaring convergence.

- **Noisy observables:** Increase `convergence_slope_threshold` to tolerate
  larger fluctuations. Polymer RMSD, for example, tends to be noisier than
  protein backbone RMSD.

- **Short simulations (< 50 ns):** Decrease `convergence_window_size_ns` and
  `convergence_sustained_for_ns` so the trajectory still spans enough windows
  to assess. Be aware that shorter windows average over fewer frames, which
  reduces the reliability of the assessment.
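A quick way to judge whether the window settings suit a given trajectory length is to count how many windows, and therefore how many slopes, the trajectory can supply. The helper below is a hypothetical sketch that assumes only full windows are used and that window starts advance by exactly the step size.

```python
def count_windows(total_ns, window_ns=15.0, step_ns=5.0):
    """Number of full windows that fit in a trajectory of length total_ns."""
    if total_ns < window_ns:
        return 0
    return int((total_ns - window_ns) // step_ns) + 1


for total in (20.0, 50.0, 1000.0):
    n_windows = count_windows(total)
    n_slopes = max(n_windows - 1, 0)
    print(f"{total:6.0f} ns -> {n_windows} windows, {n_slopes} slopes")
```

Under these assumptions a 20 ns trajectory yields only one slope with the default settings, which is too few to demonstrate a plateau sustained for 15 ns, while a 1 μs trajectory yields nearly 200 slopes computed over comparatively short windows.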

## Limitations

The sliding-window heuristic is a practical diagnostic, not a theoretical
proof of convergence. Important limitations include:

- **Not a proof of ergodic sampling.** A flat RMSD timeseries does not
  guarantee that the simulation has explored all relevant conformational
  states. The system could be trapped in a metastable basin.

- **Insensitive to slow conformational drift.** If the drift rate is below the
  slope threshold, the algorithm will declare convergence even though the
  observable is still changing — just slowly.

- **Parameter dependent.** The four tunable parameters (window size, step size,
  slope threshold, sustained duration) introduce subjective choices. Different
  parameter values can yield different convergence conclusions for the same
  trajectory.

- **Scale-dependent threshold.** The default slope threshold (0.0005 Å/ns) is
  calibrated for protein backbone RMSD in Ångströms. Applying it to observables
  with different units or magnitudes without adjustment will produce incorrect
  convergence calls.

- **Single-observable limitation.** Convergence in RMSD does not imply
  convergence in other observables (e.g., hydrogen bond occupancy, active site
  geometry). Different metrics may converge at different rates.

- **Multiple independent replicates remain essential.** Even with a converged
  RMSD timeseries in a single replicate, independent replicates are needed to
  quantify system-level variability and confirm reproducibility.
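The slow-drift limitation is easy to quantify. The arithmetic below uses a hypothetical drift rate just under the default threshold to show how much an observable can move while every slope check still passes.

```python
slope_threshold = 0.0005  # Å/ns, the documented default
drift_rate = 0.0004       # Å/ns, hypothetical slow drift
duration_ns = 1000.0      # a 1 μs trajectory

# The drift is below the threshold, so every window-to-window slope
# would count as "flat" even though the observable keeps changing.
passes_flatness_check = abs(drift_rate) < slope_threshold
total_drift = drift_rate * duration_ns

print(f"passes flatness check: {passes_flatness_check}")
print(f"accumulated drift: {total_drift:.2f} Å")
```

A drift of this size accumulates to 0.4 Å over the microsecond, which may or may not matter depending on the magnitude of the effect you are trying to resolve.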

## Relationship to Equilibration Time

Convergence detection and equilibration time (`--eq-time`) address related but
distinct concerns:

- **Equilibration** removes transient artifacts from the start of the
  simulation — the period during which the system relaxes from its initial
  configuration. The equilibration time is a fixed cutoff applied *before*
  analysis begins.

- **Convergence** confirms that the production region (after equilibration) is
  stationary. It answers whether the remaining data is reliable for computing
  time-averaged properties.

Convergence detection can help *inform* the choice of `--eq-time`. If the
convergence diagnostic reports a convergence time of 12 ns but you set
`--eq-time 5ns`, the first 7 ns of your "production" data may still contain
drift. Conversely, if convergence is detected at 5 ns and you set
`--eq-time 20ns`, you may be discarding usable data.

In practice, the two are complementary: set `--eq-time` conservatively based
on visual inspection of the RMSD timeseries, then use convergence detection as
an independent consistency check.
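The consistency check amounts to comparing two numbers. The helper below is a hypothetical illustration (the name `eq_time_gap` and its messages are not part of PolyzyMD) that reproduces the two scenarios described above.

```python
def eq_time_gap(eq_time_ns, convergence_time_ns):
    """Describe how a chosen equilibration cutoff relates to the detected
    convergence time. Both arguments are in ns."""
    gap = convergence_time_ns - eq_time_ns
    if gap > 0:
        return f"first {gap:g} ns of production may still contain drift"
    if gap < 0:
        return f"{-gap:g} ns of usable data may be discarded"
    return "eq-time matches the detected convergence time"


print(eq_time_gap(eq_time_ns=5, convergence_time_ns=12))
print(eq_time_gap(eq_time_ns=20, convergence_time_ns=5))
```

Treat the result as a prompt to look at the timeseries again rather than as an instruction to change `--eq-time` automatically.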

## Beyond RMSD

The convergence utility (`analyses/shared/convergence.py`) accepts any 1D
timeseries — it is not specific to RMSD. The `find_convergence_time()`
function takes arrays of time values and signal values, making it applicable to
radius of gyration, solvent-accessible surface area, or any other scalar
observable that should plateau when the system reaches equilibrium.

In future PolyzyMD versions, convergence detection may be integrated into
additional analysis plugins. The algorithmic foundation is already
general-purpose.

## References

**Grossfield A, Patrone PN, Roe DR, Schultz AJ, Siderius DW, Zuckerman DM.**
(2019) "Best Practices for Quantification of Uncertainty and Sampling Quality
in Molecular Simulations." *Living Journal of Computational Molecular Science*
1(1):5067. [doi:10.33011/livecoms.1.1.5067](https://doi.org/10.33011/livecoms.1.1.5067)

The authoritative guide for uncertainty quantification in MD, including
discussion of convergence assessment, autocorrelation, and effective sample
sizes.

**Knapp B, Frantal S, Cibena M, Schreiner W, Bauer P.** (2011) "Is an Intuitive
Convergence Definition of Molecular Dynamics Simulations Solely Based on the
Root Mean Square Deviation Possible?" *Journal of Computational Biology*
18(8):997-1005.

Analysis of RMSD-based convergence criteria and their reliability, motivating
the use of sliding-window and sustained-plateau approaches over simple visual
inspection alone.

## See Also

- {doc}`/how_to/analysis_rmsd_quickstart` — RMSD quick start with convergence configuration
- {doc}`/explanation/analysis_rmsd_best_practices` — RMSD interpretation and best practices
- {doc}`/explanation/analysis_statistics_best_practices` — Statistical foundations for MD analysis
- [pymbar timeseries](https://pymbar.readthedocs.io/en/latest/timeseries.html)
  — `detectEquilibration()` and `statisticalInefficiency()` provide
  statistically rigorous alternatives to heuristic convergence detection
  (pymbar 4 renames these to `detect_equilibration()` and
  `statistical_inefficiency()`). Integration with PolyzyMD is under
  consideration for a future release.