# RMSD Plugin Reference

For a step-by-step guide to running RMSD analysis, see {doc}`../how_to/analysis_rmsd_quickstart`.

## Configuration Reference

All fields for `RMSDRunSettings`:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `label` | `str` | *required* | Human-readable run label (must be unique) |
| `selection` | `str` | `"protein and name CA"` | MDAnalysis selection for RMSD calculation |
| `alignment_selection` | `str` | `"protein and name CA"` | MDAnalysis selection for trajectory alignment |
| `reference_mode` | `str` | `"centroid"` | Reference mode: `centroid`, `average`, `frame`, or `external` |
| `reference_frame` | `int` | `0` | 0-indexed frame for `reference_mode: frame` |
| `reference_file` | `str \| None` | `null` | Path to external PDB for `reference_mode: external` |
| `centroid_selection` | `str \| None` | `null` | Selection for centroid finding; defaults to `alignment_selection` |
| `convergence_window_size_ns` | `float` | `15.0` | Sliding window size for convergence detection (ns) |
| `convergence_step_size_ns` | `float` | `5.0` | Step between successive windows (ns) |
| `convergence_slope_threshold` | `float` | `0.0005` | Max absolute slope to qualify as "flat" (Å/ns) |
| `convergence_sustained_for_ns` | `float` | `15.0` | Required sustained duration below threshold (ns) |

Top-level `RMSDSettings` contains a single field:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `runs` | `list[RMSDRunSettings]` | *required* | One or more named RMSD runs (at least one required) |

```{note}
Run labels must be unique within a single `comparison.yaml`. Duplicate labels raise a validation error.
```

## Output Files

Results are saved in your project's analysis directory:

```text
/
└── analysis/
    └── rmsd/
        ├── run_1/
        │   ├── rmsd_eq10ns.json
        │   ├── rmsd_Protein Backbone_timeseries.npz
        │   └── rmsd_Active Site_timeseries.npz
        ├── run_2/
        │   ├── rmsd_eq10ns.json
        │   ├── rmsd_Protein Backbone_timeseries.npz
        │   └── rmsd_Active Site_timeseries.npz
        ├── run_3/
        │   └── ...
        └── aggregated/
            └── rmsd_reps1-3_eq10ns.json
```

Each replicate directory contains:

- **JSON result**: summary statistics for all configured runs
- **NPZ sidecar(s)**: raw per-frame RMSD timeseries (one per run)

### JSON Result Structure

Per-replicate result (`RMSDResult`):

```python
{
    "config_hash": "abc123...",
    "replicate": 1,
    "equilibration_time": 10.0,
    "equilibration_unit": "ns",
    "selection_string": "protein and name CA; ...",
    "n_frames_total": 10000,
    "n_frames_used": 9000,
    "trajectory_files": ["..."],
    "run_results": [
        {
            "run_label": "Protein Backbone",
            "selection": "protein and name CA",
            "alignment_selection": "protein and name CA",
            "reference_mode": "centroid",
            "mean_rmsd": 1.823,
            "std_rmsd": 0.312,
            "median_rmsd": 1.791,
            "min_rmsd": 0.987,
            "max_rmsd": 3.104,
            "final_rmsd": 1.956,
            "sem_rmsd": 0.078,
            "correlation_time": 4521.3,
            "correlation_time_unit": "ps",
            "n_independent_frames": 16,
            "statistical_inefficiency": 562.7,
            "n_frames_total": 10000,
            "n_frames_used": 9000,
            "npz_path": ".../rmsd_Protein Backbone_timeseries.npz",
            "time_unit": "ns",
            "timestep_ps": 10.0,
            "converged": true,
            "convergence_assessable": true,
            "convergence_time_ns": 12.5,
            "convergence_message": "Converged at 12.500 ns"
        }
    ]
}
```

Aggregated result (`RMSDAggregatedResult`):

```python
{
    "replicates": [1, 2, 3],
    "n_replicates": 3,
    "run_results": [
        {
            "run_label": "Protein Backbone",
            "selection": "protein and name CA",
            "overall_mean": 1.856,
            "overall_sem": 0.034,
            "overall_median": 1.823,
            "per_replicate_means": [1.823, 1.891, 1.854],
            "per_replicate_stds": [0.312, 0.298, 0.324],
            "per_replicate_medians": [1.791, 1.862, 1.816],
            "n_converged_replicates": 3,
            "convergence_fraction": 1.0,
            "mean_convergence_time_ns": 13.2,
            "median_convergence_time_ns": 12.5
        }
    ]
}
```

## Plot Types

The RMSD plugin generates figures through `polyzymd compare plot-all`:

| Plot output | Description |
|-------------|-------------|
| `rmsd_timeseries_.png` | Mean RMSD vs time with SEM shading, one per run |
| `rmsd_comparison_.png` | Grouped bar chart of mean RMSD across conditions, one per run |
| `rmsd_convergence__.png` | Dual-axis plot: RMSD timeseries with sliding-window slope and convergence marker (requires `show_convergence_plots: true`) |

**Timeseries plot features:**

- Mean RMSD curve per condition with SEM shading
- Legend placed outside the plot area (`bbox_to_anchor=(1.02, 0.5)`)
- Optional per-replicate traces via `show_per_replicate: true`

RMSD plot behavior can be customized in `comparison.yaml`:

```yaml
plot_settings:
  rmsd:
    show_per_replicate: false       # Overlay individual replicate traces
    figsize: [10, 6]                # Default figure size (bar charts)
    timeseries_figsize: [12, 5]     # Timeseries figure size (wider)
    show_convergence_plots: false   # Generate per-replicate convergence diagnostics
    convergence_figsize: [12, 5]    # Convergence panel figure size
```

## Convergence Detection

Convergence detection is always on: every RMSD run automatically applies a sliding-window slope heuristic to determine whether the RMSD timeseries has plateaued. This is a purely additive diagnostic: it does not affect ranking, statistical tests, or any other comparison output. Convergence results appear as additional fields in per-replicate and aggregated JSON files, and optional convergence plots can be enabled via `show_convergence_plots: true`.

For a conceptual explanation of the algorithm, its parameters, and its limitations, see {doc}`../explanation/convergence_detection`.
## Common CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `-f, --file` | `comparison.yaml` | Path to comparison configuration |
| `--eq-time` | `0ns` | Equilibration time to skip |
| `--recompute` | off | Ignore cached results and recompute |
| `--format` | `table` | Output format (`table` or `json`) |
| `-o, --output` | (none) | Save formatted output to file |
| `-q, --quiet` | off | Suppress INFO messages |
| `--debug` | off | Enable DEBUG logging |

## Troubleshooting

### "Selection matched no atoms"

**Cause:** MDAnalysis selection doesn't match any atoms in your topology.

**Fix:**

- Check residue numbering in your PDB vs. MDAnalysis (0-indexed vs 1-indexed)
- Verify atom names match your topology
- Use `polyzymd --debug compare run rmsd -f comparison.yaml ...` for detailed diagnostics

### "At least one RMSD run must be defined"

**Cause:** The `runs` list in `plugins.rmsd` is empty or missing.

**Fix:** Add at least one run entry with a `label` field:

```yaml
plugins:
  rmsd:
    runs:
      - label: "Protein Backbone"
```

### "reference_file does not exist"

**Cause:** Using `reference_mode: external` but the PDB path is invalid.

**Fix:** Provide an absolute path or a path relative to the working directory:

```yaml
reference_mode: "external"
reference_file: "/absolute/path/to/crystal.pdb"
```

### "atom count mismatch between trajectory and external PDB"

**Cause:** The `selection` string matches different numbers of atoms in the trajectory vs. the external reference PDB.

**Fix:**

- Ensure both systems use the same atom naming convention
- Check that the external PDB contains the same residues as your simulation
- Use a more specific selection if topologies differ

### Very high RMSD values (> 10 Å)

**Cause:** Usually indicates alignment issues, wrong selection, or unfolding.

**Fix:**

- Check that `alignment_selection` matches atoms in your system
- Try `reference_mode: "average"` to compare
- Verify trajectory files are complete
- Check for protein unfolding or large conformational changes

### "Low statistical reliability" warning

**Cause:** Long correlation time relative to trajectory length.

**This is informational, not an error.** Results are still valid but uncertainties may be underestimated.

**Mitigation:**

- Use multiple replicates (aggregated SEM is more reliable)
- Run longer simulations
- Results are still useful for qualitative comparisons

### Missing replicate data

**Message:** `Skipping replicate N: trajectory data not found`

**Cause:** The requested replicate hasn't completed or path is incorrect.

**Fix:** This is informational; analysis continues with available replicates. Check simulation status if unexpected.

## RMSD vs RMSF Comparison

| Feature | RMSD | RMSF |
|---------|------|------|
| **Measures** | Global deviation from reference | Per-residue fluctuation |
| **Output** | One value per frame (timeseries) | One value per residue (profile) |
| **Reference** | Fixed structure (centroid/average/external) | Time-averaged position |
| **Detects** | Conformational drift, unfolding | Flexible loops, rigid core |
| **Multi-run** | Yes (`runs` list with different selections) | Single selection |
| **Best for** | Equilibration assessment, stability comparison | Flexibility mapping |

```{tip}
Use RMSD first to assess overall stability and choose equilibration time, then use RMSF to identify which regions drive flexibility differences.
```
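
The difference between the two quantities in the table above comes down to which axis is averaged. The NumPy sketch below makes that concrete. It assumes the coordinates are already superposed onto the reference and are stored as an array of shape `(n_frames, n_atoms, 3)`. The function names are hypothetical and this is not how the plugin computes either quantity.

```python
import numpy as np


def rmsd_per_frame(coords, reference):
    """RMSD of every frame against a fixed reference structure.

    coords: (n_frames, n_atoms, 3), assumed already aligned.
    reference: (n_atoms, 3).
    Returns one value per frame (a timeseries).
    """
    sq_dist = ((coords - reference) ** 2).sum(axis=2)  # (n_frames, n_atoms)
    return np.sqrt(sq_dist.mean(axis=1))  # average over atoms


def rmsf_per_atom(coords):
    """RMSF of every atom about its time-averaged position.

    Returns one value per atom (a profile).
    """
    mean_pos = coords.mean(axis=0)  # (n_atoms, 3)
    sq_dist = ((coords - mean_pos) ** 2).sum(axis=2)  # (n_frames, n_atoms)
    return np.sqrt(sq_dist.mean(axis=0))  # average over frames
```

Both functions start from the same squared displacements. RMSD averages over atoms and keeps the frame axis, while RMSF averages over frames and keeps the atom axis, which is why one is suited to tracking drift over time and the other to mapping flexibility along the structure.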