RMSF Implementation Verification
PolyzyMD uses a custom NumPy-based RMSF implementation rather than
MDAnalysis’s built-in RMSF analysis class. This page documents the
3-way benchmark that validates the custom implementation against two
industry-standard tools.
Why a Custom Implementation?
MDAnalysis provides a built-in
RMSF
class, but it cannot support two features PolyzyMD requires:
Autocorrelation-based frame subsampling. PolyzyMD identifies statistically independent frames via autocorrelation analysis (following Grossfield et al., 2018), producing arbitrary non-uniform frame indices. MDAnalysis’s
RMSFclass only accepts uniformstart/stop/stepslicing.External reference positions. In
externalreference mode, RMSF is computed as deviations from a crystal structure’s atomic positions rather than the trajectory’s own average. MDAnalysis’sRMSFclass always uses the trajectory average internally.
The core computation is mathematically identical across all implementations:
Source Code
The RMSF and RMSD calculations now live in the RMSF plugin class:
RMSF + RMSD workflow:
RMSFAnalysis.compute_replicate()insrc/polyzymd/analyses/rmsf/__init__.py. The method performs the same two-pass RMSF logic and internal RMSD timeseries calculation for autocorrelation analysis.
The implementation is plain NumPy with no MDAnalysis analysis class dependencies.
Benchmark Methodology
A 3-way benchmark was performed comparing:
Method |
Implementation |
Notes |
|---|---|---|
PolyzyMD |
Custom NumPy (two-pass) |
The code under test |
MDAnalysis |
|
Industry-standard Python MD library |
GROMACS |
|
Industry-standard C simulation engine |
Test System
Protein: Lipase A (LipA) from Bacillus subtilis, 181 residues
Trajectory: 2,500 frames from a 1,000 ns NPT production run
Selection: 171 C-alpha atoms (residues 5–175, excluding flexible termini)
Topology:
solvated_system.pdb(11,636 atoms)
Modes Tested
Three RMSF reference modes were benchmarked:
Mode |
Alignment Reference |
RMSF Reference |
Methods |
|---|---|---|---|
Average |
Trajectory average structure |
Trajectory average positions |
3-way |
Centroid |
K-Means centroid frame (k=1) |
Trajectory average positions |
3-way |
External |
External crystal PDB |
External PDB positions |
2-way |
GROMACS is excluded from external mode because gmx rmsf always
computes RMSF relative to the trajectory average — the -s file is
used only for fitting, not as the RMSF reference. There is no option
to override this behavior.
Procedure
For each mode:
Load the trajectory and align all frames to the reference using MDAnalysis
AlignTraj(in-memory).Compute per-residue C-alpha RMSF using each method on the same aligned coordinates.
For GROMACS: export the aligned trajectory to GRO/XTC, run
gmx rmsf -nofit -res, convert output from nm to Angstroms.Compare per-residue values: Pearson correlation, mean absolute difference, maximum absolute difference.
Results
Average Mode (3-way)
Comparison |
Pearson r |
Mean |delta| (Angstrom) |
Max |delta| (Angstrom) |
|---|---|---|---|
PolyzyMD vs MDAnalysis |
1.00000000 |
0.000000 |
0.000000 |
PolyzyMD vs GROMACS |
0.99999998 |
0.000250 |
0.000579 |
MDAnalysis vs GROMACS |
0.99999998 |
0.000250 |
0.000579 |
Centroid Mode (3-way)
Comparison |
Pearson r |
Mean |delta| (Angstrom) |
Max |delta| (Angstrom) |
|---|---|---|---|
PolyzyMD vs MDAnalysis |
1.00000000 |
0.000000 |
0.000000 |
PolyzyMD vs GROMACS |
0.99999998 |
0.000249 |
0.000616 |
MDAnalysis vs GROMACS |
0.99999998 |
0.000249 |
0.000616 |
External Mode (2-way)
Comparison |
Pearson r |
Mean |delta| (Angstrom) |
Max |delta| (Angstrom) |
|---|---|---|---|
PolyzyMD vs MDAnalysis |
1.00000000 |
0.000000 |
0.000000 |
Interpretation
PolyzyMD and MDAnalysis produce bit-identical results in all three modes. This is expected — both run in the same Python process on the same NumPy arrays, and the RMSF formula is deterministic.
GROMACS differs by < 0.0007 Angstrom (max absolute deviation across all residues and modes). This small discrepancy is fully explained by the GRO coordinate format, which truncates positions to 3 decimal places in nanometers (0.001 nm = 0.01 Angstrom precision). The residual pattern is consistent with rounding noise, not a systematic bias.
Conclusion: PolyzyMD’s custom RMSF implementation is verified correct against both MDAnalysis (bit-identical) and GROMACS (limited only by coordinate format precision). The custom implementation is functionally equivalent to industry-standard tools while supporting the additional features (non-uniform frame subsampling, external reference positions) that PolyzyMD requires.
Reproducing the Benchmark
The benchmark script is located at scripts/benchmark_rmsf_3way.py in the
repository root. To reproduce:
pixi run -e build python scripts/benchmark_rmsf_3way.py
Requirements:
PolyzyMD
buildpixi environment (MDAnalysis, NumPy, scikit-learn, matplotlib)GROMACS >= 2020 (
gmx rmsf) — configure theGMXvariable at the top of the scriptA simulation directory with topology (PDB) and trajectory (DCD) files
The script outputs per-mode CSV data and a publication-quality benchmark
figure to scripts/benchmark_rmsf_output/.
See Also
RMSF Quick Start — Get RMSF results in 5 minutes
RMSF Best Practices — Statistical rigor, interpretation, pitfalls
Reference Structure Selection — Average, centroid, and external reference modes