RMSF Analysis: Statistical Best Practices
A comprehensive guide to statistically rigorous RMSF analysis, following LiveCoMS recommendations for uncertainty quantification in molecular dynamics simulations.
Note
Just need quick results? See the Quick Start Guide for copy-paste commands and minimal setup.
See also
For foundational statistical concepts (autocorrelation, correlation time, the difference between means vs. variances), see the Statistics Best Practices Guide.
This page focuses on RMSF-specific guidance: what the warning messages mean, how to interpret RMSF values, and how to compare conditions.
Introduction
RMSF (Root Mean Square Fluctuation) measures how much each residue fluctuates around its average position. Because RMSF is a variance-based quantity (a second moment), it requires special statistical treatment—correlated samples don’t just reduce precision, they introduce systematic bias.
This guide explains:
What RMSF measures and how to interpret values
What the warning messages mean and what to do about them
How to properly compare conditions with statistical rigor
Common pitfalls and how to avoid them
Tip
Why does RMSF need subsampling? Unlike mean distances (which can use all frames with corrected SEM), variance estimates from correlated samples are systematically biased—they underestimate true fluctuations. PolyzyMD automatically subsamples to independent frames when computing RMSF. See the Statistics Best Practices for the full explanation.
What is RMSF?
Root Mean Square Fluctuation (RMSF) measures how much each atom or residue fluctuates around its average position during a simulation:
Where:
\(\mathbf{r}_i(t)\) is the position of atom/residue \(i\) at time \(t\)
\(\langle \mathbf{r}_i \rangle\) is the time-averaged position
\(T\) is the number of frames
What RMSF Measures
High RMSF |
Low RMSF |
|---|---|
Flexible regions |
Rigid regions |
Loops, termini |
Core β-sheets |
Solvent-exposed |
Buried residues |
Functionally mobile |
Structurally constrained |
Relationship to B-factors
RMSF is related to crystallographic B-factors (temperature factors):
This allows comparison between simulation and experimental data, though crystal packing effects mean the correspondence is imperfect.
How PolyzyMD Handles Correlation for RMSF
PolyzyMD automatically performs autocorrelation analysis and subsamples to independent frames before computing RMSF. This is essential because RMSF measures variance, which is systematically biased by correlated samples.
See also
For the mathematical details of autocorrelation functions, correlation time estimation, and the LiveCoMS formula, see the Statistics Best Practices Guide.
What PolyzyMD Does
Computes RMSD timeseries after alignment to reference structure
Estimates correlation time (τ) via ACF integration
Selects independent frames spaced by ≥2τ
Computes RMSF using only these independent frames
Example Output
When you run RMSF analysis, PolyzyMD reports:
Correlation time: 15394 ps (15.4 ns)
Statistical inefficiency: 308.9
Independent samples: 6 (from 2000 frames)
This means:
The RMSD decorrelates over ~15 ns timescales
Each independent sample represents ~300 correlated frames
You effectively have only 6 independent measurements
Understanding the Warnings
The “Low Statistical Reliability” Warning
When N_independent < 10, PolyzyMD issues this warning:
WARNING: Low statistical reliability: only 6 independent samples
(recommended >= 10). Correlation time τ = 15394 ps is comparable to
or longer than the trajectory sampling window. Consider:
(1) extending simulation time,
(2) using multiple independent trajectories, or
(3) interpreting results with caution.
See Grossfield et al. (2018) LiveCoMS 1:5067.
What This Warning Means
The warning indicates that your trajectory may not have sampled enough independent conformations to precisely estimate equilibrium properties. It does not mean:
Your simulation is broken
Your data is useless
You must start over
Decision Tree: What To Do
Got "Low statistical reliability" warning?
│
├─ Are you using multiple independent replicates?
│ ├─ YES → You're probably fine. Replicate-based statistics are robust.
│ └─ NO → Consider running 3-5 replicates instead of extending.
│
├─ What kind of conclusion do you need?
│ ├─ Qualitative (stable vs unstable) → Warning is informational only.
│ └─ Quantitative (precise mean ± SEM) → Need more sampling.
│
└─ Are you comparing conditions?
├─ YES → Focus on replicate-to-replicate variation for significance.
└─ NO → Report uncertainty based on replicate spread if available.
When the Warning is Not a Problem
You have multiple replicates: Inter-replicate variation provides a robust uncertainty estimate independent of within-trajectory correlation.
You’re making qualitative conclusions: “The protein is stable” doesn’t require precise RMSF values.
The effect size is large: A 50% difference in RMSF is meaningful even with uncertain absolute values.
Replicates vs Longer Simulations
The LiveCoMS Recommendation
“Multiple independent simulations are preferable to a single long simulation” — Grossfield et al. (2018)
Why Replicates Are Better
Multiple Replicates |
Single Long Simulation |
|---|---|
Truly independent samples |
Samples remain correlated |
Tests reproducibility |
May be trapped in metastable state |
Detects rare events/outliers |
Rare events may dominate or be missed |
Parallelizable |
Sequential |
Robust to system issues |
Single point of failure |
The Independence Argument
Even with τ = 20 ns, a 200 ns simulation yields only ~10 independent samples. Running that same simulation 3 times gives you 3 truly independent measurements of the mean RMSF:
Replicate 1: mean RMSF = 0.755 Å
Replicate 2: mean RMSF = 0.693 Å
Replicate 3: mean RMSF = 0.696 Å
Replicate mean: 0.715 Å
Replicate std: 0.035 Å
Replicate SEM: 0.020 Å (= std/√3)
This replicate-based uncertainty is valid regardless of within-trajectory correlation because the replicates started from different initial velocities.
How Many Replicates?
Replicates |
Power to Detect |
Practical Guidance |
|---|---|---|
3 |
Large effects (d > 2) |
Minimum for publication |
5 |
Medium effects (d > 1.3) |
Recommended standard |
10 |
Small effects (d > 0.9) |
High-precision studies |
For most enzyme-polymer studies, 3-5 replicates per condition is a reasonable balance between statistical power and computational cost.
Handling Incomplete Data
During active research, simulations are often in progress. You may want to analyze available data without waiting for all replicates to complete. PolyzyMD handles this gracefully by skipping missing or failed replicates with informative warnings.
Missing Replicates
When a requested replicate’s data cannot be found, PolyzyMD logs a warning and continues with the remaining replicates:
WARNING: Skipping replicate 2: trajectory data not found.
Working directory not found: /scratch/user/project/run_2
Has replicate 2 been simulated?
Failed Replicates
If a replicate exists but analysis fails (e.g., corrupted trajectory, missing atoms), PolyzyMD logs the error and continues:
WARNING: Skipping replicate 3: analysis failed with error:
Could not read DCD file: unexpected end of file
Aggregation Summary
When aggregating with incomplete data, PolyzyMD reports which replicates were successfully analyzed:
WARNING: Aggregating 2 of 3 requested replicates. Skipped: [2]
Minimum Requirements
At least 2 successful replicates are required for aggregation. This is because SEM calculation requires n ≥ 2. If fewer than 2 replicates succeed, PolyzyMD raises an error:
ValueError: Aggregation requires at least 2 successful replicates, but only
1 succeeded. Failed replicates: [2, 3]
Output File Naming
Output filenames reflect the actual replicates used, not the requested range:
Requested |
Successful |
Filename |
|---|---|---|
1-3 |
1, 2, 3 |
|
1-3 |
1, 3 |
|
1-5 |
1, 2, 4 |
|
Note: Contiguous ranges use a hyphen (1-3), non-contiguous use underscores (1_3).
Best Practices for Incomplete Data
Investigate missing data: If replicates consistently fail, check:
Did the simulation complete? Check SLURM logs for timeouts or errors.
Are trajectory files in the expected location? Verify paths in config.yaml.
Is the trajectory corrupted? Try loading it manually with MDAnalysis.
Document which replicates were used: When publishing results from incomplete data, clearly state which replicates contributed to aggregated statistics. The JSON output includes a
replicatesfield for this purpose.Re-run with complete data: Once all simulations finish, re-run analysis with
--recomputeto include all replicates:polyzymd analyze rmsf -c config.yaml -r 1-5 --recompute
Consider statistical implications: Results from 2 replicates have larger uncertainty than 3+. Be appropriately cautious when interpreting results with fewer replicates than planned.
Example: Analyzing During Active Simulations
A common workflow when simulations are still running:
# Request all 5 planned replicates, but only 3 have completed
polyzymd analyze rmsf -c config.yaml -r 1-5 --eq-time 100ns
# Output shows:
# Skipping replicate 4: trajectory data not found...
# Skipping replicate 5: trajectory data not found...
# Aggregating 3 of 5 requested replicates. Skipped: [4, 5]
#
# Results saved to: rmsf_reps1-3_eq100ns.json
Later, when all simulations complete:
# Re-run to include all replicates
polyzymd analyze rmsf -c config.yaml -r 1-5 --eq-time 100ns --recompute
# Now all 5 are included:
# Results saved to: rmsf_reps1-5_eq100ns.json
Comparing Conditions
Experimental Design
To compare two conditions (e.g., polymer vs no-polymer):
Run N replicates of each condition (N ≥ 3)
Compute RMSF for each replicate
Use replicate means as your data points (not per-frame values)
Perform statistical test on replicate means
The Statistical Test
Use a two-sample t-test on the per-replicate mean RMSF values:
from scipy import stats
import numpy as np
# Per-replicate mean RMSF (not per-frame!)
condition_A = [0.755, 0.693, 0.696] # No polymer
condition_B = [0.558, 0.738, 0.496] # With polymer
# Two-sample t-test
t_stat, p_value = stats.ttest_ind(condition_A, condition_B)
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
Effect Size
The p-value tells you if the difference is statistically significant, but not if it’s meaningful. Effect size quantifies the magnitude:
# Cohen's d
mean_A = np.mean(condition_A)
mean_B = np.mean(condition_B)
pooled_std = np.sqrt((np.var(condition_A, ddof=1) + np.var(condition_B, ddof=1)) / 2)
cohens_d = (mean_A - mean_B) / pooled_std
print(f"Cohen's d: {cohens_d:.2f}")
Cohen’s d |
Interpretation |
|---|---|
< 0.2 |
Negligible effect |
0.2 - 0.5 |
Small effect |
0.5 - 0.8 |
Medium effect |
> 0.8 |
Large effect |
Power Analysis
With small sample sizes, you can only detect large effects. The minimum detectable difference depends on:
Sample size per group (N)
Within-group variability (σ)
Desired significance level (α, typically 0.05)
Desired power (typically 0.80)
# Minimum detectable difference (approximate)
# For t-test with α=0.05, power=0.80
critical_t = {3: 2.78, 5: 2.31, 10: 2.10} # df = 2*(N-1)
se_diff = pooled_std * np.sqrt(2/N)
min_detectable = critical_t[N] * se_diff
print(f"Minimum detectable difference: {min_detectable:.3f} Å")
Worked Example
From a real analysis comparing polymer vs no-polymer:
RESULTS SUMMARY
===============
No Polymer With Polymer
Replicate 1 0.755 Å 0.558 Å
Replicate 2 0.693 Å 0.738 Å
Replicate 3 0.696 Å 0.496 Å
-----------------------------------------------
Mean 0.715 Å 0.597 Å
Std 0.035 Å 0.126 Å
STATISTICAL ANALYSIS
====================
Difference: 0.117 Å (16.4% reduction with polymer)
t-statistic: 1.56
p-value: 0.194
Cohen's d: 1.27 (large effect)
INTERPRETATION
==============
- Large effect size (d = 1.27) suggests a real difference
- p > 0.05 means we cannot reject null hypothesis
- High variability in polymer condition (replicate 2 is outlier)
- CONCLUSION: Suggestive but not conclusive; need more replicates
Non-Significant p-value with Large Effect Size
This situation (p > 0.05 but large Cohen’s d) is common with small samples. It means:
The effect might be real but you lack power to detect it
More replicates would likely achieve significance
You should not conclude “no effect” — only “insufficient evidence”
The appropriate response is to run additional replicates, not to abandon the hypothesis.
Interpreting RMSF Values
Typical Ranges
RMSF (Å) |
Protein Region |
Interpretation |
|---|---|---|
0.3 - 0.5 |
Core β-sheets, buried helices |
Very rigid |
0.5 - 1.0 |
Surface helices, structured loops |
Normal flexibility |
1.0 - 2.0 |
Exposed loops, active site flaps |
Flexible |
2.0 - 5.0 |
Termini, disordered regions |
Highly flexible |
> 5.0 |
Disordered tails, unfolded regions |
May indicate instability |
Factors Affecting RMSF
Temperature: Higher T → higher RMSF (more kinetic energy)
Solvent:
Water → normal flexibility
Organic co-solvents → may increase or decrease depending on interactions
Crowding agents → typically reduce RMSF
Simulation parameters:
Force field → affects flexibility
Timestep → too large can cause instability
Thermostat → Langevin adds friction, affects dynamics
System composition:
Polymers can restrict or enhance motion
Ligand binding often rigidifies active site
Oligomerization affects interface flexibility
Active Site Flexibility
For enzyme studies, active site RMSF is particularly relevant:
Lower RMSF may indicate:
Better substrate positioning
More stable catalytic geometry
Potentially higher activity (if not too rigid)
Higher RMSF may indicate:
Conformational sampling for substrate binding
Induced fit dynamics
Potentially reduced activity if catalytic residues misalign
The “optimal” flexibility depends on the enzyme mechanism and should be interpreted in context of experimental activity data.
External Reference for Catalytic Competence
When studying enzyme catalysis across multiple conditions (polymer baths,
temperatures, mutants), the standard RMSF modes (centroid, average) use
a condition-specific reference — each condition’s trajectory determines
its own reference structure. This makes direct comparison difficult: a low
RMSF might mean the enzyme is rigid in a non-functional conformation.
The external reference mode solves this by using a condition-independent
reference, typically a crystal structure that represents the catalytically
competent geometry. RMSF then measures deviation from the functional state:
Interpretation changes with external reference:
Metric |
Standard RMSF (centroid/average) |
External Reference RMSF |
|---|---|---|
Low value |
Residue is rigid |
Residue stays near crystal geometry |
High value |
Residue is flexible |
Residue deviates from functional state |
Condition comparison |
Which condition is more flexible? |
Which condition best maintains catalytic geometry? |
Practical guidance:
Focus on catalytic residues (triad, oxyanion hole) rather than overall protein RMSF — these are the residues whose geometry directly affects activity
Use residue range truncation (
resid 5:175) to exclude flexible termini that dominate statistics and obscure active-site signalsReport both overall protein and active-site RMSF separately — polymer effects may stabilize the active site without affecting overall flexibility
Tip
Which reference mode for enzymes? Use centroid or average to answer
“how flexible is the protein?” Use external to answer “how well does the
protein maintain its catalytically competent geometry?” These are complementary
questions — consider running both analyses.
See also
For detailed setup instructions (Python, YAML, CLI), see the External PDB section of the Reference Selection Guide.
Common Pitfalls
Pitfall 1: Using Unaligned Trajectories
Symptom: RMSF values > 10 Å, sometimes 50+ Å
Cause: Without alignment, RMSF includes global translation and rotation of the entire protein through the simulation box.
Solution: PolyzyMD automatically aligns trajectories. If you see very high values, check that:
Your selection string matches atoms in the system
The reference structure is valid
Trajectory files are complete
Pitfall 2: Insufficient Equilibration
Symptom: RMSF values drift over time; different equilibration times give different results
Cause: Including non-equilibrated frames in the analysis
Solution: Use --eq-time to skip the equilibration period:
# Skip first 10% of a 200 ns simulation
polyzymd analyze rmsf -c config.yaml -r 1 --eq-time 20ns
Monitor RMSD vs time — equilibration is complete when RMSD plateaus.
Pitfall 3: Over-Interpreting Small Differences
Symptom: Claiming significance for 0.05 Å differences
Cause: Not accounting for uncertainty
Solution: Always report uncertainty and perform statistical tests:
# WRONG: "Condition A (0.715 Å) is more stable than B (0.720 Å)"
# RIGHT: "Condition A (0.715 ± 0.020 Å) and B (0.720 ± 0.025 Å)
# are not significantly different (p = 0.89)"
Pitfall 4: Ignoring Replicate Variation
Symptom: Reporting within-trajectory SEM as the uncertainty
Cause: Treating correlated frames as independent
Solution: Use replicate-based statistics:
# WRONG: SEM from 2000 correlated frames
sem_wrong = np.std(all_frames) / np.sqrt(2000) # Way too small
# RIGHT: SEM from 3 independent replicates
replicate_means = [0.755, 0.693, 0.696]
sem_right = np.std(replicate_means, ddof=1) / np.sqrt(3) # Correct
Pitfall 5: Cherry-Picking Replicates
Symptom: Excluding “outlier” replicates without justification
Cause: Desire for cleaner results
Solution: Include all replicates unless there’s a technical reason to exclude (e.g., simulation crashed, obvious system error). Biological variability is real and should be reported.
If one replicate differs substantially, investigate why:
Did a conformational change occur?
Did the ligand unbind?
Is there a slow process being sampled?
Advanced: Python API
Accessing Autocorrelation Results
from polyzymd.analysis import RMSFCalculator
import json
# Run analysis
calc = RMSFCalculator(config_path="config.yaml", equilibration="100ns")
result = calc.compute(replicate=1)
# Access autocorrelation data
print(f"Correlation time: {result.correlation_time:.1f} ps")
print(f"N independent: {result.n_independent_frames}")
# Load full result for detailed inspection
with open(result.source_file) as f:
data = json.load(f)
# Per-residue data
for resid, rmsf in zip(data['residue_ids'], data['rmsf_per_residue']):
print(f"Residue {resid}: {rmsf:.3f} Å")
Custom Statistical Analysis
import numpy as np
from scipy import stats
# Load multiple conditions
conditions = {
'no_polymer': [0.755, 0.693, 0.696],
'polymer_A': [0.558, 0.738, 0.496],
'polymer_B': [0.612, 0.598, 0.621],
}
# Pairwise comparisons with Bonferroni correction
n_comparisons = 3
alpha = 0.05 / n_comparisons
for name_a, values_a in conditions.items():
for name_b, values_b in conditions.items():
if name_a >= name_b:
continue
t, p = stats.ttest_ind(values_a, values_b)
sig = "**" if p < alpha else ""
print(f"{name_a} vs {name_b}: p = {p:.4f} {sig}")
Exporting for External Analysis
import pandas as pd
import json
# Load aggregated result
with open("analysis/rmsf/aggregated/rmsf_reps1-3_eq10ns.json") as f:
data = json.load(f)
# Create DataFrame for export
df = pd.DataFrame({
'residue_id': data['residue_ids'],
'residue_name': data['residue_names'],
'mean_rmsf': data['mean_rmsf_per_residue'],
'sem_rmsf': data['sem_rmsf_per_residue'],
})
# Export to CSV
df.to_csv("rmsf_results.csv", index=False)
# Export to Excel with multiple sheets
with pd.ExcelWriter("rmsf_analysis.xlsx") as writer:
df.to_excel(writer, sheet_name="Per-Residue", index=False)
summary = pd.DataFrame({
'Metric': ['Mean RMSF', 'SEM', 'Min', 'Max', 'N_replicates'],
'Value': [
data['overall_mean_rmsf'],
data['overall_sem_rmsf'],
data['overall_min_rmsf'],
data['overall_max_rmsf'],
data['n_replicates'],
]
})
summary.to_excel(writer, sheet_name="Summary", index=False)
References
Primary Reference
Grossfield A, Patrone PN, Roe DR, Schultz AJ, Siderius DW, Zuckerman DM. (2018) “Best Practices for Quantification of Uncertainty and Sampling Quality in Molecular Simulations.” Living Journal of Computational Molecular Science 1(1):5067. https://doi.org/10.33011/livecoms.1.1.5067
This is the definitive guide to uncertainty quantification in MD. The methodology in PolyzyMD follows their recommendations.
Additional References
Flyvbjerg H, Petersen HG. (1989) “Error estimates on averages of correlated data.” Journal of Chemical Physics 91:461-466. https://doi.org/10.1063/1.457480
Classic paper on block averaging for correlated data.
Chodera JD, Swope WC, Pitera JW, Seok C, Dill KA. (2007) “Use of the Weighted Histogram Analysis Method for the Analysis of Simulated and Parallel Tempering Simulations.” Journal of Chemical Theory and Computation 3:26-41. https://doi.org/10.1021/ct0502864
Introduces statistical inefficiency for MD analysis.
Knapp B, Frantal S, Greshake B, Schwarz R, et al. (2018) “Is an Intuitive Convergence Definition of Molecular Dynamics Simulations Solely Based on the Root Mean Square Deviation Possible?” Journal of Computational Biology 25:1069-1077.
Discussion of RMSD-based convergence assessment.
See Also
Quick Start Guide — Get results fast
Implementation Verification — 3-way benchmark against MDAnalysis and GROMACS
Statistics Best Practices — Foundational statistics for MD
Reference Structure Selection — Choose alignment reference
LiveCoMS Best Practices — Full methodology paper