# RMSF Plugin Reference

For a step-by-step guide to running RMSF analysis, see
{doc}`../how_to/analysis_rmsf_quickstart`.

## Configuration Reference

RMSF settings live under `plugins.rmsf` in `comparison.yaml`.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | `bool` | `true` | Enable or disable RMSF analysis |
| `selection` | `str` | `"protein and name CA"` | MDAnalysis selection used for RMSF calculation |
| `reference_mode` | `str` | `"centroid"` | Alignment reference mode: `centroid`, `average`, `frame`, `external` |
| `reference_frame` | `int \| null` | `null` | 1-indexed frame when `reference_mode: frame` |
| `reference_file` | `str \| null` | `null` | Path to external PDB when `reference_mode: external` |
| `alignment_selection` | `str` | `"protein and name CA"` | Selection used for trajectory alignment |
| `centroid_selection` | `str` | `"protein"` | Selection used to find centroid reference frame |

```{note}
Validation rules:

- `reference_mode: frame` requires `reference_frame`
- `reference_mode: external` requires `reference_file`
- `reference_file` must point to an existing PDB file
```
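The rules above can be sketched as a small pre-flight check. This is an illustrative helper, not the plugin's own validator: the function name `validate_rmsf_config` and the error strings are assumptions, and the sketch checks `reference_file` whenever it is set, which may be stricter than the plugin.

```python
from pathlib import Path

# Allowed values taken from the `reference_mode` row of the table above.
VALID_MODES = {"centroid", "average", "frame", "external"}


def validate_rmsf_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the block passes."""
    errors = []
    mode = cfg.get("reference_mode", "centroid")  # documented default
    ref = cfg.get("reference_file")

    if mode not in VALID_MODES:
        errors.append(f"unknown reference_mode: {mode!r}")
    if mode == "frame" and cfg.get("reference_frame") is None:
        errors.append("reference_mode 'frame' requires reference_frame")
    if mode == "external" and ref is None:
        errors.append("reference_mode 'external' requires reference_file")
    # Assumption: the file is checked whenever it is set, not only in external mode.
    if ref is not None and not Path(ref).is_file():
        errors.append(f"reference_file does not exist: {ref}")
    return errors


print(validate_rmsf_config({"reference_mode": "frame"}))
```

Running a check like this on the parsed `plugins.rmsf` mapping before launching an analysis surfaces configuration mistakes early.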

### Minimal plugin block

```yaml
plugins:
  rmsf:
    enabled: true
    selection: "protein and name CA"
    reference_mode: "centroid"
```

### External reference example

```yaml
plugins:
  rmsf:
    selection: "protein and name CA"
    reference_mode: "external"
    reference_file: "/path/to/crystal_structure.pdb"
```

## Output Files

RMSF writes per-replicate and aggregated JSON files under each condition's
analysis directory:

```text
<projects_directory>/
└── analysis/
    └── rmsf/
        ├── run_1/
        │   └── rmsf_eq10ns.json
        ├── run_2/
        │   └── rmsf_eq10ns.json
        ├── run_3/
        │   └── rmsf_eq10ns.json
        └── aggregated/
            └── rmsf_reps1-3_eq10ns.json
```

Comparison-level output is written separately in the comparison workspace:

```text
<comparison_workspace>/
└── comparison/
    └── rmsf/
        └── result.json
```
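To collect the per-replicate files from the per-condition layout (the first tree above), a glob over `run_*` directories is usually enough. The helper below is a sketch: `find_replicate_results` is a hypothetical name, and the `rmsf_*.json` pattern is inferred from the example filenames rather than from a documented naming rule.

```python
from pathlib import Path


def find_replicate_results(analysis_dir: Path) -> dict[int, list[Path]]:
    """Map replicate number -> RMSF JSON files found in analysis/rmsf/run_N/."""
    results = {}
    for run_dir in (analysis_dir / "rmsf").glob("run_*"):
        suffix = run_dir.name[len("run_"):]
        if not suffix.isdigit():
            continue  # ignore anything that is not run_<number>
        files = sorted(run_dir.glob("rmsf_*.json"))
        if files:
            results[int(suffix)] = files
    # Sort numerically so run_10 comes after run_2.
    return dict(sorted(results.items()))
```

Each replicate maps to a list because a directory may hold results for more than one equilibration time.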

### Per-replicate JSON (`RMSFResult`)

```json
{
    "config_hash": "abc123...",
    "replicate": 1,
    "equilibration_time": 10.0,
    "equilibration_unit": "ns",
    "selection_string": "protein and name CA",
    "correlation_time": 15394.5,
    "correlation_time_unit": "ps",
    "n_independent_frames": 6,
    "residue_ids": [1, 2, 3],
    "residue_names": ["MET", "ALA", "SER"],
    "rmsf_values": [0.45, 0.52, 0.49],
    "mean_rmsf": 0.621,
    "std_rmsf": 0.215,
    "min_rmsf": 0.248,
    "max_rmsf": 3.160,
    "reference_mode": "centroid",
    "reference_frame": 401,
    "alignment_selection": "protein and name CA",
    "reference_file": null,
    "n_frames_total": 10000,
    "n_frames_used": 9000,
    "trajectory_files": [".../prod_1.xtc"]
}
```
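Because `residue_ids`, `residue_names`, and `rmsf_values` are parallel lists, the most flexible residues can be pulled out with a short script. The sketch below uses an inline JSON string in place of a real result file, and `most_flexible` is an illustrative helper name, not part of the plugin.

```python
import json


def most_flexible(result: dict, n: int = 5) -> list[tuple[int, str, float]]:
    """Return the n residues with the highest RMSF as (resid, resname, rmsf)."""
    rows = zip(result["residue_ids"], result["residue_names"], result["rmsf_values"])
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# Stand-in for a per-replicate file; a real one would be read with json.load().
example = json.loads("""
{
  "residue_ids": [1, 2, 3],
  "residue_names": ["MET", "ALA", "SER"],
  "rmsf_values": [0.45, 0.52, 0.49],
  "reference_file": null
}
""")

for resid, resname, rmsf in most_flexible(example, n=2):
    print(f"{resname}{resid}: {rmsf:.2f}")
# ALA2: 0.52
# SER3: 0.49
```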

### Aggregated JSON (`RMSFAggregatedResult`)

```json
{
    "replicates": [1, 2, 3],
    "n_replicates": 3,
    "residue_ids": [1, 2, 3],
    "residue_names": ["MET", "ALA", "SER"],
    "mean_rmsf_per_residue": [0.46, 0.50, 0.47],
    "sem_rmsf_per_residue": [0.02, 0.03, 0.02],
    "per_replicate_mean_rmsf": [0.64, 0.59, 0.63],
    "overall_mean_rmsf": 0.62,
    "overall_sem_rmsf": 0.02,
    "overall_min_rmsf": 0.30,
    "overall_max_rmsf": 4.21
}
```
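The aggregated means and SEMs can be cross-checked by hand. The sketch below assumes the SEM is the sample standard deviation (`ddof=1`) divided by the square root of the number of replicates. That estimator is an assumption, although it reproduces the example's `overall_mean_rmsf` and `overall_sem_rmsf` to two decimals. The function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample std over sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def per_residue_stats(profiles: list[list[float]]) -> tuple[list[float], list[float]]:
    """Aggregate equally long per-replicate RMSF profiles residue by residue."""
    stats = [mean_and_sem(list(column)) for column in zip(*profiles)]
    return [m for m, _ in stats], [s for _, s in stats]


# per_replicate_mean_rmsf from the example above
overall_mean, overall_sem = mean_and_sem([0.64, 0.59, 0.63])
print(round(overall_mean, 2), round(overall_sem, 2))  # 0.62 0.02
```

`per_residue_stats` takes one `rmsf_values` list per replicate and returns lists shaped like `mean_rmsf_per_residue` and `sem_rmsf_per_residue`.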

### Comparison JSON (`result.json`)

```json
{
    "metric": "rmsf",
    "conditions": [
        {
            "label": "No Polymer",
            "n_replicates": 3,
            "mean_rmsf": 0.715,
            "sem_rmsf": 0.020,
            "replicate_values": [0.755, 0.693, 0.696]
        },
        {
            "label": "With Polymer",
            "n_replicates": 3,
            "mean_rmsf": 0.551,
            "sem_rmsf": 0.034,
            "replicate_values": [0.590, 0.520, 0.542]
        }
    ],
    "pairwise_comparisons": [
        {
            "condition_a": "No Polymer",
            "condition_b": "With Polymer",
            "percent_change": -22.9,
            "p_value": 0.0211,
            "cohens_d": 4.06,
            "significant": true,
            "direction": "stabilizing"
        }
    ],
    "ranking": ["With Polymer", "No Polymer"]
}
```
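Two of the derived fields can be reproduced from the condition means. The sketch below assumes `percent_change` is the change of condition B relative to condition A, and that `ranking` orders conditions from lowest to highest mean RMSF. Both relations are inferred from the example values, not from the plugin source, and the function names are hypothetical.

```python
def percent_change(mean_a: float, mean_b: float) -> float:
    """Relative change of condition B with respect to condition A, in percent."""
    return (mean_b - mean_a) / mean_a * 100.0


def rank_by_mean_rmsf(conditions: list[dict]) -> list[str]:
    """Order condition labels from lowest to highest mean RMSF."""
    return [c["label"] for c in sorted(conditions, key=lambda c: c["mean_rmsf"])]


conditions = [
    {"label": "No Polymer", "mean_rmsf": 0.715},
    {"label": "With Polymer", "mean_rmsf": 0.551},
]
print(round(percent_change(0.715, 0.551), 1))  # -22.9
print(rank_by_mean_rmsf(conditions))  # ['With Polymer', 'No Polymer']
```

Under this reading, a negative `percent_change` means condition B fluctuates less than condition A, which matches the `"stabilizing"` direction in the example.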

## Plot Types

RMSF plots are generated by `polyzymd compare plot-all -f comparison.yaml`.

| Plot output | Description |
|-------------|-------------|
| `rmsf_profile.png` | Per-residue RMSF profile by condition; optional SEM shading |
| `rmsf_comparison.png` | Horizontal bar chart of condition-level mean RMSF with SEM |

The profile plot can include a reference secondary-structure annotation row
when `reference_file` is set and readable.

### RMSF plot settings

```yaml
plot_settings:
  rmsf:
    show_error: true                # Show SEM band/bars
    highlight_residues: [77, 133]   # Vertical guide lines in profile plot
    figsize_profile: [14, 4]        # Profile figure size
    figsize_comparison: [8, 6]      # Comparison figure size
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `show_error` | `bool` | `true` | Show SEM shading/bars |
| `highlight_residues` | `list[int]` | `[]` | Residue numbers to mark on profile plot |
| `figsize_profile` | `tuple[float, float]` | `[14, 4]` | Profile figure size |
| `figsize_comparison` | `tuple[float, float]` | `[8, 6]` | Comparison figure size |

## Common CLI Options

| Option | Default | Description |
|--------|---------|-------------|
| `-f, --file` | `comparison.yaml` | Path to comparison configuration |
| `--eq-time` | `0ns` | Equilibration time to skip |
| `--recompute` | off | Ignore cached results and recompute |
| `--format` | `table` | Output format (`table` or `json`) |
| `-o, --output` | (none) | Save formatted output to file |
| `-q, --quiet` | off | Suppress INFO messages |
| `--debug` | off | Enable DEBUG logging |

## Troubleshooting

### "Selection matched no atoms"

**Cause:** The MDAnalysis selection does not match atoms in the topology.

**Fix:**

- Check residue numbering and atom names in your input structure
- Start with `selection: "protein and name CA"`
- Re-run with `--debug` for detailed selection diagnostics
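A quick way to test candidate selections outside the pipeline is to count the atoms each one matches. The sketch below uses MDAnalysis `Universe.select_atoms` and the `n_atoms` attribute of the returned group. The helper names and the `topology.pdb` path are placeholders.

```python
def count_matches(universe, selections: list[str]) -> dict[str, int]:
    """Return the number of atoms each selection string matches."""
    return {sel: universe.select_atoms(sel).n_atoms for sel in selections}


def check_topology(topology_path: str, selections: list[str]) -> dict[str, int]:
    """Load a topology with MDAnalysis and count matches per selection."""
    import MDAnalysis as mda  # imported lazily; only needed for real files

    return count_matches(mda.Universe(topology_path), selections)


# Hypothetical usage; "topology.pdb" is a placeholder path:
# check_topology("topology.pdb", ["protein", "protein and name CA", "name CA"])
```

A count of zero for `protein` but not for `name CA` would suggest non-standard residue names rather than missing atoms.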

### "reference_file does not exist"

**Cause:** `reference_mode: external` is set, but the path is invalid.

**Fix:** Use an absolute path or a path relative to your working directory.

### "External PDB atom count does not match trajectory selection"

**Cause:** The `selection` string resolves to a different number of atoms in
the trajectory than in the external reference.

**Fix:**

- Ensure both structures use compatible atom naming
- Use a stricter selection that matches in both systems
- Confirm the external PDB contains the same residue set
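To find which atoms cause the mismatch, the two selections can be compared atom by atom. The sketch below keys each atom by residue id, residue name, and atom name, using the per-atom `resids`, `resnames`, and `names` attributes of an MDAnalysis `AtomGroup`. The helper names are illustrative.

```python
def atom_keys(atoms) -> set[tuple[int, str, str]]:
    """Identify each atom by (residue id, residue name, atom name)."""
    return {
        (int(resid), str(resname), str(name))
        for resid, resname, name in zip(atoms.resids, atoms.resnames, atoms.names)
    }


def selection_mismatch(traj_atoms, ref_atoms) -> dict[str, list]:
    """List atoms that appear in only one of the two selections."""
    traj, ref = atom_keys(traj_atoms), atom_keys(ref_atoms)
    return {
        "only_in_trajectory": sorted(traj - ref),
        "only_in_reference": sorted(ref - traj),
    }


# Hypothetical usage with two MDAnalysis universes and the same selection string:
# selection_mismatch(traj_u.select_atoms(sel), ref_u.select_atoms(sel))
```

If nearly every atom shows up on both sides of the report, the two structures probably use different residue numbering rather than different residue sets.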

### Very high RMSF values (> 10 Å)

**Cause:** Usually an alignment mismatch, an overly broad selection, or
genuine structural instability.

**Fix:**

- Verify `alignment_selection` and `selection`
- Try `reference_mode: "average"` as a cross-check
- Confirm trajectory files are complete

### "Low statistical reliability" warning

**Cause:** The correlation time is large relative to the available production
data, so few statistically independent frames remain.

**Fix:**

- Use more replicates
- Extend simulation length
- Treat the result as qualitative if uncertainty is large

For interpretation guidance, see
{doc}`../explanation/analysis_rmsf_best_practices`.

### Missing replicate data

**Message:** `Skipping replicate N: trajectory data not found`

**Cause:** Replicate output is missing or path configuration is incorrect.

**Fix:** No action is needed if the replicate is expected to be missing;
analysis continues with the available replicates. Otherwise, verify that the
simulation completed and that the file paths are correct.