Rg Plugin Reference

For a step-by-step guide to running Rg analysis, see Rg Analysis: Quick Start.

Configuration Reference

All fields for RgRunSettings:

Field	Type	Default	Description
`label`	`str`	required	Human-readable run label (must be unique)
`selection`	`str`	required	MDAnalysis selection for Rg calculation
`calculation_mode`	`str`	`"selection"`	`"selection"` for whole-group Rg, `"fragments"` for per-fragment reduction
`fragment_weighting`	`str`	`"equal"`	`"equal"` (arithmetic mean) or `"mass"` (mass-weighted mean). Only valid when `calculation_mode="fragments"`
`save_fragment_distribution`	`bool`	`true`	Save per-fragment Rg values in NPZ sidecar for distribution analysis
`histogram_bins`	`int`	`50`	Number of bins for fragment/reduced distribution histograms (minimum 2)

Top-level RgSettings contains a single field:

Field	Type	Default	Description
`runs`	`list[RgRunSettings]`	required	Named Rg runs; at least one is required

Note

Run labels must be unique within a single comparison.yaml. Duplicate labels raise a validation error.

Warning

Unlike RMSD (which defaults selection to "protein and name CA"), Rg has no default selection. You must always specify selection explicitly for each run.
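The constraints in this section can be summarized in a small validation sketch. This is not PolyzyMD's settings code: `RunSketch` and `validate_runs` are hypothetical names, the error messages are borrowed from the Troubleshooting section, and rejecting `fragment_weighting="mass"` outside fragment mode is an assumption about how the "only valid when" rule is enforced.

```python
from dataclasses import dataclass


@dataclass
class RunSketch:
    """Hypothetical stand-in for RgRunSettings (illustration only)."""

    label: str
    selection: str  # no default: Rg always needs an explicit selection
    calculation_mode: str = "selection"
    fragment_weighting: str = "equal"
    save_fragment_distribution: bool = True
    histogram_bins: int = 50

    def __post_init__(self) -> None:
        if self.calculation_mode not in ("selection", "fragments"):
            raise ValueError(f"unknown calculation_mode: {self.calculation_mode!r}")
        if self.fragment_weighting not in ("equal", "mass"):
            raise ValueError(f"unknown fragment_weighting: {self.fragment_weighting!r}")
        # Assumption: a non-default weighting is rejected outside fragment mode.
        if self.calculation_mode == "selection" and self.fragment_weighting != "equal":
            raise ValueError("fragment_weighting requires calculation_mode='fragments'")
        if self.histogram_bins < 2:
            raise ValueError("histogram_bins must be at least 2")


def validate_runs(runs: list[RunSketch]) -> list[RunSketch]:
    """Apply the list-level rules: at least one run, all labels unique."""
    if not runs:
        raise ValueError("At least one Rg run must be defined")
    labels = [run.label for run in runs]
    if len(labels) != len(set(labels)):
        raise ValueError("Rg run labels must be unique")
    return runs


runs = validate_runs(
    [
        RunSketch(label="protein_rg", selection="protein"),
        RunSketch(
            label="polymer_blob_rg",
            selection="resname SBM or resname EGM or resname EGP",
            calculation_mode="fragments",
        ),
    ]
)
print([run.label for run in runs])
```

Omitting `selection` fails at construction time in this sketch, which mirrors the warning above that Rg has no default selection.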

Fragment Mode Reference

Added in version 1.3.0: Fragment-aware Rg calculation.

Standard selection mode computes one Rg value per frame for the full atom group matched by selection. Fragment mode first computes Rg for each disconnected topological fragment, then reduces those per-fragment values to one per-frame value.

Use fragment mode when your selection includes many independent molecules and you care about the average fragment size rather than the size of the entire multi-molecule cloud.

Fragment mode configuration

plugins:
  rg:
    runs:
      - label: protein_rg
        selection: protein

      - label: polymer_blob_rg
        selection: "resname SBM or resname EGM or resname EGP"
        calculation_mode: fragments
        fragment_weighting: equal

Setting	Meaning
`calculation_mode: "selection"`	Whole-group Rg
`calculation_mode: "fragments"`	Per-fragment Rg with reduction
`fragment_weighting: "equal"`	Arithmetic mean over fragments
`fragment_weighting: "mass"`	Mass-weighted mean over fragments

How fragment mode reduction works

1. Identify disconnected fragments within the selected atom group
2. Compute fragment-level Rg for each frame
3. Reduce fragment values to one per-frame value using equal or mass weighting
4. Use the reduced timeseries for summary statistics and comparisons
5. Optionally save pooled fragment values in NPZ sidecars for distribution plots
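The per-frame reduction described above can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, fragments are passed in already split (MDAnalysis exposes connected components as `AtomGroup.fragments`, but whether the plugin uses that attribute is not confirmed here), and "mass-weighted mean" is assumed to weight each fragment's Rg by the fragment's total mass.

```python
import math

Coord = tuple[float, float, float]


def radius_of_gyration(coords: list[Coord], masses: list[float]) -> float:
    """Mass-weighted Rg of one group of atoms for a single frame."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    weighted_sq = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3)) for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)


def reduce_fragments(
    fragments: list[tuple[list[Coord], list[float]]], weighting: str = "equal"
) -> float:
    """Reduce per-fragment Rg values to one value for the frame."""
    rg_values = [radius_of_gyration(coords, masses) for coords, masses in fragments]
    if weighting == "equal":
        weights = [1.0] * len(fragments)
    elif weighting == "mass":
        # Assumption: each fragment is weighted by its total mass.
        weights = [sum(masses) for _, masses in fragments]
    else:
        raise ValueError(f"unknown weighting: {weighting!r}")
    return sum(w * rg for w, rg in zip(weights, rg_values)) / sum(weights)


# Two small, well-separated fragments (made-up coordinates in Å).
frag_a = ([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0])    # Rg = 1 Å
frag_b = ([(10.0, 0.0, 0.0), (10.0, 4.0, 0.0)], [3.0, 3.0])  # Rg = 2 Å

equal_rg = reduce_fragments([frag_a, frag_b], "equal")
mass_rg = reduce_fragments([frag_a, frag_b], "mass")
# Whole-group Rg of the same atoms, as calculation_mode "selection" would treat them.
whole_rg = radius_of_gyration(frag_a[0] + frag_b[0], frag_a[1] + frag_b[1])
print(f"equal={equal_rg:.2f}  mass={mass_rg:.2f}  whole-group={whole_rg:.2f}")
```

The whole-group value is dominated by the distance between the two fragments, which is why fragment mode is the better fit when the question is about typical fragment size.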

Why Rg Has No Alignment or Reference Fields

Rg is based on mass-weighted distances from the center of mass, so it is translation and rotation invariant.

Because of that, Rg runs do not use RMSD-style fields such as alignment_selection, reference_mode, reference_file, or reference_frame.
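The invariance follows directly from the standard mass-weighted definition of Rg. The derivation below uses that textbook definition (assumed to match the plugin's calculation, as the description above suggests), with a rigid-body transform written as an orthogonal matrix $Q$ plus a translation $\mathbf{t}$.

```latex
\begin{align}
R_g &= \sqrt{\frac{\sum_i m_i \,\lVert \mathbf{r}_i - \mathbf{r}_{\mathrm{com}} \rVert^2}{\sum_i m_i}},
\qquad
\mathbf{r}_{\mathrm{com}} = \frac{\sum_i m_i\, \mathbf{r}_i}{\sum_i m_i} \\
\mathbf{r}_i' &= Q\,\mathbf{r}_i + \mathbf{t}
\;\Longrightarrow\;
\mathbf{r}_{\mathrm{com}}' = Q\,\mathbf{r}_{\mathrm{com}} + \mathbf{t} \\
\lVert \mathbf{r}_i' - \mathbf{r}_{\mathrm{com}}' \rVert
&= \lVert Q\,(\mathbf{r}_i - \mathbf{r}_{\mathrm{com}}) \rVert
= \lVert \mathbf{r}_i - \mathbf{r}_{\mathrm{com}} \rVert
\;\Longrightarrow\; R_g' = R_g
\end{align}
```

The translation cancels in the difference and the rotation preserves lengths, so aligning frames to a reference beforehand would not change the result.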

Output Files

Results are saved as canonical v1.3 artifacts. JSON files use framework-owned artifact envelopes, and per-frame or distribution arrays are stored as NPZ sidecars.

<comparison_workspace>/
├── analysis/
│   └── <condition>/
│       └── rg/
│           ├── run_1/
│           │   ├── result.json
│           │   └── sidecars/
│           │       ├── rg_protein_rg_timeseries.npz
│           │       └── rg_polymer_blob_rg_timeseries.npz
│           ├── run_2/
│           │   └── ...
│           ├── run_3/
│           │   └── ...
│           └── aggregated/
│               ├── result.json
│               └── sidecars/
│                   └── rg_polymer_blob_rg_distribution.npz
└── comparison/
    └── rg/
        └── result.json

The canonical paths are:

Level	Artifact	Path
Per replicate	`ReplicateArtifact`	`analysis/<condition>/rg/run_<replicate>/result.json`
Per condition	`ConditionArtifact`	`analysis/<condition>/rg/aggregated/result.json`
Cross condition	Comparison result	`comparison/rg/result.json`
Large arrays	NPZ sidecars	`analysis/<condition>/rg/*/sidecars/*.npz`

Each replicate artifact contains JSON summaries for all configured runs. NPZ sidecars store per-frame Rg timeseries and optional fragment distributions.
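The layout above is regular enough to build paths programmatically. The helpers below are hypothetical (they are not part of PolyzyMD) and only restate the directory tree and the `rg_<label>_timeseries.npz` naming documented on this page; `PurePosixPath` is used so the printed paths look the same on every platform.

```python
from pathlib import PurePosixPath


def rg_result_paths(
    workspace: str, condition: str, replicates: list[int]
) -> dict[str, PurePosixPath]:
    """Hypothetical helper: map each artifact level to its result.json."""
    root = PurePosixPath(workspace)
    rg_dir = root / "analysis" / condition / "rg"
    paths = {f"run_{rep}": rg_dir / f"run_{rep}" / "result.json" for rep in replicates}
    paths["aggregated"] = rg_dir / "aggregated" / "result.json"
    paths["comparison"] = root / "comparison" / "rg" / "result.json"
    return paths


def timeseries_sidecar(result_json: PurePosixPath, run_label: str) -> PurePosixPath:
    """Sidecars sit in a sidecars/ directory next to result.json."""
    return result_json.parent / "sidecars" / f"rg_{run_label}_timeseries.npz"


paths = rg_result_paths("my_workspace", "PEGylated", [1, 2, 3])
for name, path in paths.items():
    print(f"{name:>10}  {path}")
print(timeseries_sidecar(paths["run_1"], "protein_rg"))
```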

Artifact envelope fields

Field	Description
`payload`	Rg run summaries, scalar metrics, fragment statistics, and relative sidecar paths
`metadata`	Run settings, calculation modes, equilibration labels, and units
`provenance`	Input topology/trajectory identity and workflow details
`sidecars`	Validated references to `sidecars/*.npz` arrays with hashes and sizes

Use ArtifactStore for programmatic access:

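from pathlib import Path

from polyzymd.analyses.mda import ArtifactStore

# Paths are relative to the comparison workspace root.
replicate = ArtifactStore(Path("analysis/PEGylated/rg/run_1")).read_replicate_result()
condition = ArtifactStore(Path("analysis/PEGylated/rg/aggregated")).read_condition_result()

# Replicate payload: scalar mean Rg of the first run in the payload.
print(replicate.payload["runs"][0]["mean_rg"])
# Condition payload: per-replicate values plus their mean and SEM.
print(condition.payload["runs"][0]["metrics"]["mean_rg"])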

NPZ sidecar arrays

Each rg_<label>_timeseries.npz may include:

Array	Mode	Description
`rg_values`	always	Per-frame Rg timeseries (Å); in fragment mode, the reduced per-frame value
`time_ns`	always	Time axis in ns
`frames`	always	0-indexed frame indices
`fragment_rg_values`	fragments only	Pooled fragment-level Rg values across all frames
`fragment_counts_per_frame`	fragments only	Number of fragments detected per frame
`fragment_masses`	fragments + mass weighting	Fragment masses used for weighted reduction
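Sidecars are ordinary NumPy `.npz` archives, so they can also be opened with `numpy.load`. The sketch below first writes a tiny stand-in file with made-up values so that it runs without PolyzyMD output; only the array names come from the table above. Because the fragment arrays exist only for fragment-mode runs, the code checks for them before reading.

```python
import tempfile
from pathlib import Path

import numpy as np

# Stand-in sidecar with invented values; array names follow the table above.
tmp_dir = Path(tempfile.mkdtemp())
sidecar = tmp_dir / "rg_polymer_blob_rg_timeseries.npz"
np.savez(
    sidecar,
    rg_values=np.array([8.40, 8.43, 8.41]),
    time_ns=np.array([10.0, 10.1, 10.2]),
    frames=np.array([1000, 1001, 1002]),
    fragment_rg_values=np.array([8.10, 8.70, 8.20, 8.66, 8.30, 8.52]),
    fragment_counts_per_frame=np.array([2, 2, 2]),
)

with np.load(sidecar) as data:
    available = set(data.files)
    rg = data["rg_values"]
    time_ns = data["time_ns"]
    # Fragment arrays are optional, so test for the key before indexing.
    has_fragments = "fragment_rg_values" in available
    pooled = data["fragment_rg_values"] if has_fragments else None

print(f"{len(rg)} frames from {time_ns[0]:.1f} to {time_ns[-1]:.1f} ns")
print(f"mean Rg = {rg.mean():.3f} Å")
if pooled is not None:
    print(f"{pooled.size} pooled fragment values")
```

For a real run, replace the stand-in path with the `timeseries_sidecar` entry from the replicate's `result.json`, resolved relative to that file's directory.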

JSON result structures

Per-replicate result (ReplicateArtifact), representative structure:

{
    "schema_version": "1",
    "artifact_type": "replicate",
    "analysis_name": "rg",
    "condition_label": "PEGylated",
    "replicate": 1,
    "payload": {
        "runs": [
            {
                "run_label": "protein_rg",
                "selection": "protein",
                "calculation_mode": "selection",
                "mean_rg": 18.234,
                "sem_rg": 0.098,
                "timeseries_sidecar": "sidecars/rg_protein_rg_timeseries.npz"
            },
            {
                "run_label": "polymer_blob_rg",
                "selection": "resname SBM or resname EGM or resname EGP",
                "calculation_mode": "fragments",
                "fragment_weighting": "equal",
                "mean_rg": 8.412,
                "sem_rg": 0.054,
                "mean_fragments_per_frame": 50.0,
                "timeseries_sidecar": "sidecars/rg_polymer_blob_rg_timeseries.npz"
            }
        ]
    },
    "metadata": {"equilibration": "10ns", "time_unit": "ns"},
    "provenance": {"trajectory_files": ["..."], "n_frames_used": 9000},
    "sidecars": [
        {"path": "sidecars/rg_protein_rg_timeseries.npz", "metadata": {"kind": "timeseries"}},
        {"path": "sidecars/rg_polymer_blob_rg_timeseries.npz", "metadata": {"kind": "timeseries"}}
    ]
}

Aggregated result (ConditionArtifact), representative structure:

{
    "schema_version": "1",
    "artifact_type": "condition",
    "analysis_name": "rg",
    "condition_label": "PEGylated",
    "replicates": [1, 2, 3],
    "payload": {
        "runs": [
            {
                "run_label": "protein_rg",
                "calculation_mode": "selection",
                "metrics": {
                    "mean_rg": {"values": [18.234, 18.291, 18.244], "mean": 18.256, "sem": 0.044}
                },
                "distribution_sidecar": "sidecars/rg_protein_rg_distribution.npz"
            },
            {
                "run_label": "polymer_blob_rg",
                "calculation_mode": "fragments",
                "metrics": {
                    "mean_rg": {"values": [8.412, 8.445, 8.439], "mean": 8.432, "sem": 0.021}
                },
                "fragment_summary": {"overall_mean_fragments_per_frame": 50.0},
                "distribution_sidecar": "sidecars/rg_polymer_blob_rg_distribution.npz"
            }
        ]
    },
    "metadata": {"equilibration": "10ns"},
    "provenance": {"source_replicates": [1, 2, 3]},
    "sidecars": [
        {"path": "sidecars/rg_polymer_blob_rg_distribution.npz", "metadata": {"kind": "distribution"}}
    ]
}
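When `ArtifactStore` is not available, the aggregated JSON can be read with the standard library alone. The sketch below walks a trimmed copy of the representative structure above and prints one summary line per run; the field names come from that structure, while the inline JSON string stands in for a real `result.json`.

```python
import json

# Trimmed copy of the representative ConditionArtifact shown above.
condition_json = """
{
  "artifact_type": "condition",
  "condition_label": "PEGylated",
  "payload": {
    "runs": [
      {"run_label": "protein_rg", "calculation_mode": "selection",
       "metrics": {"mean_rg": {"values": [18.234, 18.291, 18.244],
                               "mean": 18.256, "sem": 0.044}}},
      {"run_label": "polymer_blob_rg", "calculation_mode": "fragments",
       "metrics": {"mean_rg": {"values": [8.412, 8.445, 8.439],
                               "mean": 8.432, "sem": 0.021}},
       "fragment_summary": {"overall_mean_fragments_per_frame": 50.0}}
    ]
  }
}
"""

artifact = json.loads(condition_json)
rows = []
for run in artifact["payload"]["runs"]:
    stats = run["metrics"]["mean_rg"]
    # fragment_summary appears only on fragment-mode runs in the example above.
    n_frag = run.get("fragment_summary", {}).get("overall_mean_fragments_per_frame")
    rows.append((run["run_label"], stats["mean"], stats["sem"], len(stats["values"]), n_frag))

for label, mean, sem, n_rep, n_frag in rows:
    extra = f", ~{n_frag:.0f} fragments/frame" if n_frag is not None else ""
    print(f"{label}: {mean:.3f} ± {sem:.3f} Å (n={n_rep}{extra})")
```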

Plot Types

The Rg plugin generates figures via polyzymd compare plot-all.

Plot output	Description
`rg_timeseries_<run>.png`	Mean Rg vs time with SEM shading, one figure per run
`rg_comparison_<run>.png`	Grouped bar chart of mean Rg across conditions, one figure per run
`rg_distribution_<run>.png`	Distribution view. Selection mode: reduced distribution panel only. Fragment mode: reduced + pooled fragment distributions

Distribution plots are generated for runs that include histogram data in aggregated results.

Plot settings in comparison.yaml:

plot_settings:
  rg:
    show_per_replicate: false    # Overlay individual replicate traces
    figsize: [10, 6]             # Default figure size (bar charts)
    timeseries_figsize: [12, 5]  # Timeseries figure size

Common CLI Options

Option	Default	Description
`-f, --file`	`comparison.yaml`	Path to comparison configuration
`--eq-time`	`0ns`	Equilibration time to skip
`--recompute`	off	Ignore cached results and recompute
`--format`	`table`	Output format (`table` or `json`)
`-o, --output`	(none)	Save formatted output to file
`-q, --quiet`	off	Suppress INFO messages
`--debug`	off	Enable DEBUG logging

Replicates are configured per condition in comparison.yaml:

conditions:
  - label: "no_polymer"
    config: "configs/no_polymer.yaml"
    replicates: [1, 3, 5]

Troubleshooting

“Selection matched no atoms”

When a run selection matches zero atoms, that run is skipped with a warning. Analysis continues for runs and conditions with valid data.

If the empty match is unexpected:

Confirm residue numbering and atom naming in your topology
Check selection syntax directly against your system
Re-run with --debug for detailed diagnostics

“At least one Rg run must be defined”

Cause: plugins.rg.runs is missing or empty.

Fix: add at least one run with label and selection.

“Rg run labels must be unique”

Cause: duplicate run labels.

Fix: assign a unique label to each run.

“Equilibration removed all frames”

Cause: --eq-time exceeds trajectory duration.

Fix: lower equilibration time or verify simulation completion.
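To see why this error occurs, consider how an equilibration cutoff interacts with the trajectory's time axis. The sketch below is a guess at the general mechanism, not the plugin's code: `parse_time_ns` and `frames_after_equilibration` are hypothetical, only the `ns` suffix seen in this page is handled, and keeping frames at or after the cutoff is an assumed convention.

```python
def parse_time_ns(text: str) -> float:
    """Hypothetical parser for values such as '0ns' or '10ns'."""
    text = text.strip().lower()
    if not text.endswith("ns"):
        raise ValueError(f"expected a value like '10ns', got {text!r}")
    return float(text[:-2])


def frames_after_equilibration(time_ns: list[float], eq_time: str) -> list[int]:
    """Return indices of frames kept after skipping the equilibration window."""
    cutoff = parse_time_ns(eq_time)
    # Assumption: frames exactly at the cutoff are kept.
    kept = [index for index, t in enumerate(time_ns) if t >= cutoff]
    if not kept:
        raise ValueError("Equilibration removed all frames")
    return kept


time_axis = [0.0, 5.0, 10.0, 15.0]  # a 15 ns trajectory sampled every 5 ns
print(frames_after_equilibration(time_axis, "10ns"))

try:
    frames_after_equilibration(time_axis, "20ns")
except ValueError as exc:
    print(f"error: {exc}")
```

A trajectory that stopped early behaves like the second call: the cutoff lies beyond the last frame, so nothing remains to analyze.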

Very large Rg fluctuations (> 5 Å std)

Cause: often unfolding, large flexibility, or an overly broad selection.

Fix:

Validate the selection (whole protein vs backbone vs core)
Check trajectory integrity
Inspect timeseries for transitions or discontinuities

“Low statistical reliability” warning

Cause: correlation time is long relative to trajectory length.

This is informational. Consider longer simulations or more replicates.

Missing replicate data

Message: Skipping replicate N: trajectory data not found

Cause: trajectory files are missing or replicate is incomplete.

Fix: verify simulation output paths and replicate status.

Control condition missing for a run in fragment-mode workflows

Message: the control has no data for a run, and the comparison falls back to all-vs-all pairwise testing for that run.

Cause: control selection matched no atoms for that run.

Behavior: expected in mixed run sets (for example, polymer-only runs with a no-polymer control).

Rg vs Other Metrics

Rg vs RMSD

Feature	Rg	RMSD
Measures	Structural compactness (mass-weighted size)	Deviation from a reference structure
Output	One value per frame (timeseries)	One value per frame (timeseries)
Reference	None required	Required (`centroid`, `average`, `frame`, or `external`)
Alignment	Not required	Required
Configuration	`label` + `selection`	`label` + `selection` + alignment + reference
Direction labels	`compaction` / `expansion` / `unchanged`	`stabilizing` / `destabilizing` / `unchanged`
Best for	Compaction, swelling, folding state shifts	Drift and reference-relative stability

Rg vs RMSF

Feature	Rg	RMSF
Measures	Global compactness over time	Per-residue positional fluctuation
Primary output	Timeseries	Residue profile
Best question	Is the structure compacting or expanding?	Which regions are flexible or rigid?