Extending the Analysis Framework

This guide shows you how to add a new analysis type to PolyzyMD. By the end, you will have a working analysis plugin that can compute metrics per replicate, aggregate across replicates, compare conditions, generate figures, and display results on the CLI.

Note

This guide covers the plugin system in analyses/. Each plugin is a self-contained package — you implement the science (compute, aggregate) and the framework handles discovery, iteration, result saving, comparison statistics, and CLI wiring.

Prerequisites

  • A working pixi environment: pixi install -e build

  • Basic familiarity with Pydantic v2 BaseModel

  • Understanding of the PolyzyMD chain convention (A=protein, B=substrate, C=polymer)

How the Plugin System Works

Every analysis in PolyzyMD is a single class that subclasses Analysis and lives in a package under src/polyzymd/analyses/<name>/. The framework discovers plugins automatically — no registry edits, no imports, no boilerplate. You drop a package, and the CLI can run it.

The lifecycle for each analysis is:

For each condition:
  For each replicate:
    result = compute_replicate(ctx, replicate)       # your code
  aggregated = aggregate(ctx, [replicate_results])   # your code

filter_conditions(conditions)  →  filtered list
compare(ctx)                   →  ComparisonResult
plot(ctx)                      →  [figure paths]
format(result, "text")         →  CLI output string

You implement the science (compute, aggregate). The framework provides discovery, replicate iteration, result saving, comparison statistics, and CLI wiring.

Anatomy of a Plugin

Before seeing the full code, here is what each piece does and why it exists.

name (required)

A unique, lowercase string identifier. Used in CLI commands (polyzymd compare run solvent_contacts), config files, and output directories.

Settings (required)

A Pydantic v2 BaseModel inner class. Users configure your analysis through the plugins: section of comparison.yaml, and the framework deserializes those settings into your Settings model. Provide sensible defaults for every field so the analysis works out of the box.

compute_replicate(ctx, replicate) (required)

Runs once per replicate per condition. The framework passes a ReplicateContext containing everything you need: the loaded SimulationConfig, your Settings, output paths, and equilibration time. Use TrajectoryLoader from analyses/shared/ to load trajectories — it handles topology discovery, trajectory segment daisy-chaining, timestep detection, and equilibration skipping.

Return a dict or Pydantic model. The framework collects these for aggregate().

aggregate(ctx, results) (required)

Runs once per condition after all replicates complete. Receives the list of objects your compute_replicate() returned. Compute summary statistics (mean, SEM) and return the aggregated result. The framework auto-saves this to ctx.result_path.

compare(ctx) (optional override)

Override only for complex multi-metric or entry-table comparisons. If you implement extract_metrics(), you typically do not need to override compare().

plot(ctx) (optional)

Generate matplotlib figures. Use self._build_plot_data(ctx) to collect per-condition paths, then self._load_aggregated_result(agg_dir) to load each condition’s data. Use shared plotting helpers (get_theme, apply_axis_style, save_figure, get_colors) for consistent theming. Return a list of figure paths. The scaffold generates a working plot() method automatically.

format(result, output_format) (optional)

Custom CLI display. Use format_scalar_comparison() from analyses.stats for formatted tables with rankings, effect sizes, and significance stars.

filter_conditions(conditions, settings) (optional)

Exclude conditions where the analysis cannot run at all. The framework calls this before dispatching any compute_replicate() or aggregate() jobs, so excluded conditions have zero work done.

When to use it: Your analysis requires a molecular component that may not be present in all conditions. The canonical case is polymer-specific analyses — if the comparison includes a “No Polymer (Control)” condition, analyses like contacts, exposure, polymer_affinity, and polymer_bridging must skip it because there are no polymer atoms to analyse.

Canonical pattern (check sim_config.polymers, fail-open on errors):

from polyzymd.analyses.base import Condition


def filter_conditions(
    self,
    conditions: list[Condition],
    settings: "BaseModel | None" = None,
) -> list[Condition]:
    """Exclude conditions without polymer."""
    valid: list[Condition] = []
    for cond in conditions:
        try:
            sim_config = cond.sim_config
            polymers = getattr(sim_config, "polymers", None)
            has_polymer = polymers is not None and getattr(polymers, "enabled", True)
            if not has_polymer:
                logger.info("Excluding '%s': no polymer configured", cond.label)
                continue
            valid.append(cond)
        except (AttributeError, ValueError, KeyError, OSError) as e:
            logger.warning("Error checking '%s': %s — including anyway", cond.label, e)
            valid.append(cond)  # fail-open
    return valid

Key rules:

  • Always fail-open on unexpected errors — include the condition rather than crashing the entire comparison.

  • Log excluded conditions at INFO level so users can see what was skipped.

  • The settings parameter carries the resolved plugin Settings instance. Use it if your exclusion logic depends on user-configurable selections (e.g., a custom polymer selection string).

Existing implementations: contacts, exposure, polymer_affinity, and polymer_bridging all implement filter_conditions(). binding_free_energy has a no-op override (intentionally keeps all conditions).

Tip

If only some sub-computations within your analysis are inapplicable — not the entire analysis — use graceful empty-selection handling inside compute_replicate() instead. See the next section.

Handling partially-applicable conditions in compute_replicate()

Sometimes an analysis is valid for a condition overall, but specific sub-computations within it are not. In that case, don’t exclude the entire condition — instead, handle empty AtomGroup selections gracefully inside compute_replicate().

Example: The hydrogen_bonds plugin computes H-bonds between multiple group pairs (protein–protein, protein–polymer, protein–substrate). For a “No Polymer” control, protein–protein H-bonds are still meaningful, but protein–polymer H-bonds are not because there are no polymer atoms. Rather than excluding the condition entirely (which would lose the protein–protein data), the plugin:

  1. Detects that the “polymer” group selection matched zero atoms.

  2. Logs a warning: "Group 'polymer' selection 'chainid C' matched no atoms will skip summaries using it".

  3. Produces zero-filled placeholder results for summaries involving that empty group (e.g., 0 H-bonds, 0.0 occupancy). This is scientifically correct — zero protein–polymer H-bonds is the ground truth when no polymer is present.

  4. Computes normally for summaries that don’t involve the empty group (protein–protein H-bonds still run).

# Pattern: graceful empty-selection handling (from hydrogen_bonds)
for group_name, selection_str in settings.groups.items():
    atom_group = u.select_atoms(selection_str)
    if len(atom_group) == 0:
        if settings.allow_empty_groups:
            logger.warning(
                "Group '%s' selection '%s' matched no atoms — will skip summaries using it",
                group_name,
                selection_str,
            )
        else:
            raise ValueError(
                f"Group '{group_name}' selection '{selection_str}' matched no atoms."
            )

When to use which pattern:

Situation

Pattern

Example

Entire analysis is meaningless for a condition

filter_conditions()

Polymer–protein contacts with no polymer

Some sub-computations are invalid, others are valid

Graceful empty-selection handling

H-bonds: protein–protein valid, protein–polymer not

Analysis works for all conditions

Neither (use defaults)

RMSF, RMSD, secondary structure

The Complete Example

Here is a single, complete plugin file that uses real framework utilities. This is the pattern you should follow for new analyses.

Important

The SolventContactsAnalysis implementation below is a walkthrough example for a hypothetical solvent_contacts plugin name that does not collide with built-in plugins.

Create src/polyzymd/analyses/solvent_contacts/ as a package in your own branch or downstream project, or use the scaffold command to generate the boilerplate automatically (the scaffold includes compute, aggregate, comparison, plotting, and tests):

# Default: dict-based results (simplest starting point)
polyzymd new-analysis solvent_contacts

# Typed Pydantic result models (better for complex analyses)
polyzymd new-analysis solvent_contacts --style pydantic

The --style flag controls how results are structured:

  • dict (default) — Uses plain dicts for replicate/aggregated results. No result model classes needed. Best for getting started quickly.

  • pydantic — Generates typed ReplicateResult and AggregatedResult Pydantic models with AggregatedResultClass. Better when you need validation, IDE autocomplete, or NPZ sidecar storage.

If creating manually:

"""Solvent contacts analysis plugin."""
from __future__ import annotations

import logging
from pathlib import Path
from typing import Any, ClassVar, Sequence

import numpy as np
from pydantic import BaseModel, Field

from polyzymd.analyses.base import (
    AggregateContext,
    Analysis,
    MetricValue,
    PlotContext,
    ReplicateContext,
)
from polyzymd.analyses.shared import TrajectoryLoader, convert_time, parse_time_string
from polyzymd.analyses.stats import format_scalar_comparison

logger = logging.getLogger(__name__)


class SolventContactsAnalysis(Analysis):
    """Solvent contacts around a selected protein region."""

    name: ClassVar[str] = "solvent_contacts"

    class Settings(BaseModel):
        """Settings for solvent contact-fraction analysis.

        Attributes
        ----------
        selection : str
            MDAnalysis selection string for target atoms
        solvent_selection : str
            MDAnalysis selection for solvent atoms used in contacts
        cutoff : float
            Contact cutoff distance in Angstrom
        """

        selection: str = Field(
            default="protein and name CA",
            description="MDAnalysis atom selection for target atoms",
        )
        solvent_selection: str = Field(
            default="resname HOH and name O",
            description="MDAnalysis atom selection for solvent contact atoms",
        )
        cutoff: float = Field(
            default=4.5,
            gt=0,
            description="Distance cutoff in Angstrom for a contact",
        )

    def compute_replicate(self, ctx: ReplicateContext, replicate: int) -> dict[str, Any]:
        """Compute solvent contact fraction for one replicate.

        Uses TrajectoryLoader for topology discovery, trajectory segment
        daisy-chaining, and equilibration frame skipping
        """
        # Lazy-import heavy third-party dependency
        import MDAnalysis as mda  # noqa: F401
        from MDAnalysis.analysis.distances import distance_array

        loader = TrajectoryLoader(ctx.sim_config)
        u = loader.load_universe(replicate)

        eq_value, eq_unit = parse_time_string(ctx.equilibration)
        timestep_ps = loader.get_timestep(replicate, unit="ps")
        eq_time_ps = convert_time(eq_value, eq_unit, "ps")
        start_frame = int(eq_time_ps / timestep_ps)

        target_atoms = u.select_atoms(ctx.settings.selection)
        solvent_atoms = u.select_atoms(ctx.settings.solvent_selection)
        contact_fractions = []
        for _ts in u.trajectory[start_frame:]:
            if len(target_atoms) == 0 or len(solvent_atoms) == 0:
                contact_fractions.append(0.0)
                continue
            dists = distance_array(target_atoms.positions, solvent_atoms.positions)
            contact_pairs = np.sum(dists <= ctx.settings.cutoff)
            normalizer = len(target_atoms) * len(solvent_atoms)
            contact_fractions.append(float(contact_pairs / normalizer))

        return {
            "mean_contact_fraction": float(np.mean(contact_fractions)),
            "std_contact_fraction": float(np.std(contact_fractions)),
            "n_frames": len(contact_fractions),
            "replicate": replicate,
        }

    def aggregate(self, ctx: AggregateContext, results: Sequence[Any]) -> dict[str, Any]:
        """Average contact fraction across replicates with SEM."""
        values = [r["mean_contact_fraction"] for r in results]
        return {
            "mean_contact_fraction": float(np.mean(values)),
            "sem_contact_fraction": float(np.std(values, ddof=1) / np.sqrt(len(values))),
            "replicate_values": values,
        }

    def extract_metrics(self, summary: Any) -> dict[str, MetricValue]:
        """Expose contact fraction as scalar metric for default comparison."""
        return {
            "mean_contact_fraction": MetricValue(
                name="mean_contact_fraction",
                mean=summary["mean_contact_fraction"],
                sem=summary["sem_contact_fraction"],
                replicate_values=summary["replicate_values"],
                higher_is_better=False,
                direction_labels=("more exposed", "unchanged", "more shielded"),
            )
        }

    def plot(self, ctx: PlotContext) -> list[Path]:
        """Generate a contact-fraction bar chart comparing conditions."""
        import matplotlib.pyplot as plt

        from polyzymd.analyses.shared import apply_axis_style, get_colors, get_theme, save_figure

        # Use the base class helper to gather per-condition paths
        data, labels = self._build_plot_data(ctx)
        if not labels:
            return []

        # Load aggregated results for each condition
        plot_labels = []
        means = []
        sems = []
        for label in labels:
            if label not in data or label == "__meta__":
                continue
            agg_dir = data[label]["aggregated_dir"]
            summary = self._load_aggregated_result(agg_dir)
            if summary is None:
                continue
            plot_labels.append(label)
            means.append(summary["mean_contact_fraction"])
            sems.append(summary["sem_contact_fraction"])

        if not plot_labels:
            return []

        # Use shared plotting utilities for consistent theming
        plot_settings = ctx.plot_settings
        theme = get_theme(plot_settings)
        colors = get_colors(len(plot_labels), plot_settings)

        fig, ax = plt.subplots(figsize=(8, 5))
        ax.bar(plot_labels, means, yerr=sems, capsize=5, color=colors[: len(plot_labels)])
        apply_axis_style(
            ax,
            plot_settings,
            title="Solvent Contact Fraction Across Conditions",
            ylabel="Contact fraction",
        )

        output_path = ctx.output_dir / "solvent_contacts_comparison.png"
        output_path.parent.mkdir(parents=True, exist_ok=True)
        save_figure(fig, output_path, plot_settings)
        plt.close(fig)
        return [output_path]

    def format(self, result: Any, output_format: str = "text") -> str:
        """Format comparison result for CLI display."""
        from polyzymd.analyses.base import ComparisonResult

        if isinstance(result, ComparisonResult):
            return format_scalar_comparison(
                result,
                title="Solvent Contacts Comparison",
                metric_label="Mean contact fraction",
                metric_unit="",
                metric_key="mean_contact_fraction",
                output_format=output_format,
                higher_is_better=False,
            )
        return super().format(result, output_format)

That file is now discoverable, runnable through compare workflows, and compatible with default framework formatting/statistics.

Base Class Helpers

The Analysis base class provides three utility methods that reduce boilerplate across the lifecycle. The SolventContactsAnalysis example above uses _build_plot_data() in its plot() method, but all three are available to every plugin.

_build_plot_data(ctx, *, include_replicates=False)

Collects per-condition paths from a PlotContext into a dict ready for plotter functions. Returns (data, labels) where data[label] contains "analysis_dir" and "aggregated_dir" keys:

def plot(self, ctx: PlotContext) -> list[Path]:
    # One line replaces the 8-12 line loop through ctx.conditions
    data, labels = self._build_plot_data(ctx, include_replicates=True)
    if not labels:
        return []

    for label in labels:
        agg_dir = data[label]["aggregated_dir"]
        summary = self._load_aggregated_result(agg_dir)
        # ... plot summary data ...

_check_cache(result_cls, cache_path, *, recompute, sim_config=None)

Consolidates the cache-checking pattern used in compute_replicate(). Loads a Pydantic result from disk if the cache file exists and recompute is False, optionally validating the config hash:

def compute_replicate(self, ctx: ReplicateContext, replicate: int) -> dict:
    result_file = ctx.output_dir / "solvent_contacts_result.json"

    # One line replaces the 5-line cache check pattern
    cached = self._check_cache(
        SolventContactsReplicateResult, result_file,
        recompute=ctx.recompute, sim_config=ctx.sim_config,
    )
    if cached is not None:
        return cached

    # ... expensive computation ...

Note

_check_cache() requires a Pydantic model class with a .load(path) method (i.e., a BaseAnalysisResult subclass). Dict-based plugins that want caching should use explicit json.loads() / Path.exists() checks instead.

_format_replicate_range(replicates)

Formats replicate tuples into compact strings for log messages and filenames. Contiguous ranges are collapsed:

cls._format_replicate_range((1, 2, 3))    # → "reps1-3"
cls._format_replicate_range((1, 3, 5))    # → "reps1_3_5"

Useful in aggregate() for logging or naming output files:

def aggregate(self, ctx: AggregateContext, results):
    rep_str = self._format_replicate_range(ctx.replicates)
    logger.info("Aggregating %s across %s", self.name, rep_str)
    # ...

Summary of helpers

Helper

Used in

Purpose

_build_plot_data()

plot()

Collect per-condition paths from PlotContext

_check_cache()

compute_replicate()

Load cached Pydantic result if valid

_format_replicate_range()

aggregate(), logging

Compact replicate string for messages/filenames

Import Rules

The import structure in the example above is intentional. Getting imports wrong causes subtle test failures, so follow these rules:

Module-level imports (top of file):

  • Standard library (logging, pathlib)

  • NumPy — imported at module level because it is available in project environments

  • Framework utilities from analyses/shared/ (TrajectoryLoader, parse_time_string, convert_time, AlignmentConfig, etc.)

  • Framework base classes from analyses/base and analyses/stats

Lazy imports inside methods (inside compute_replicate, plot, etc.):

  • MDAnalysis / mdtraj — heavy simulation libraries

  • matplotlib / seaborn — plotting libraries

  • Any package that may not be installed in all environments

# At module level — always available, needed for patch targets in tests
from polyzymd.analyses.shared import TrajectoryLoader, convert_time, parse_time_string


# Inside methods — heavy deps that may not be installed
def compute_replicate(self, ctx, replicate):
    import MDAnalysis as mda
    ...


def plot(self, ctx):
    import matplotlib.pyplot as plt
    ...

Why this matters for testing: when you mock TrajectoryLoader in tests, you write @patch("polyzymd.analyses.solvent_contacts.TrajectoryLoader"). This replaces the module-level name. If TrajectoryLoader were imported lazily inside the method, the patch target would be different and easier to get wrong.

Context Objects Reference

The framework provides context objects so plugins never need to load configs or discover paths themselves:

Context

Passed To

Key Attributes

ReplicateContext

compute_replicate()

.sim_config, .settings, .output_dir, .replicate, .recompute, .equilibration, .result_path

AggregateContext

aggregate()

.condition, .replicates, .output_dir, .settings, .equilibration, .result_path

ComparisonContext

compare()

.name, .conditions, .analysis_dirs, .results_dir, .effective_control, .control_label, .settings, .equilibration, .recompute, .fdr_alpha, .result_path, .aggregated_results

PlotContext

plot()

.conditions, .analysis_dirs, .output_dir, .results_dir, .settings, .plot_settings, .comparison_path, .control_label

PlotContext.plot_settings

PlotContext.plot_settings carries global plotting configuration from polyzymd.config.comparison.PlotSettings. Use this object instead of hard-coding figure style, DPI, or output behavior.

Type and global fields

ctx.plot_settings is always an instance of PlotSettings (the orchestrator guarantees this — it is never None).

Global fields available on PlotSettings include:

  • output_dir

  • format

  • dpi

  • style

  • color_palette

  • theme

Theme system

PlotSettings.theme uses a PlotTheme model that includes 28 visual style fields (for example: title_fontsize, label_fontsize, tick_fontsize, dot_size, line widths, transparency, and grid controls). The theme system has three presets:

  • PlotTheme.publication()

  • PlotTheme.presentation()

  • PlotTheme.minimal()

Per-analysis settings

Per-analysis settings are accessed as attributes on PlotSettings, for example:

  • plot_settings.contacts

  • plot_settings.rmsf

These fields use a __getattr__ fallback pattern, so access is stable even when a specific analysis block is not configured; defaults are returned.

Standard plugin pattern

The orchestrator guarantees that ctx.plot_settings is always a valid PlotSettings instance — never None. Use it directly:

plot_settings = ctx.plot_settings

Example using shared plotting helpers

from pathlib import Path

from polyzymd.analyses.base import PlotContext
from polyzymd.analyses.shared import (
    apply_axis_style,
    get_colors,
    get_theme,
    save_figure,
)


def plot(self, ctx: PlotContext) -> list[Path]:
    import matplotlib.pyplot as plt

    plot_settings = ctx.plot_settings
    theme = get_theme(plot_settings)
    colors = get_colors(len(ctx.conditions), plot_settings)

    fig, ax = plt.subplots(figsize=(8, 5))
    _ = theme
    _ = colors
    # ... plotting code ...
    apply_axis_style(ax, plot_settings, title="My Plot", ylabel="Value (Å)")
    out = ctx.output_dir / "my_plot.png"
    save_figure(fig, out, plot_settings)
    plt.close(fig)
    return [out]

Tip

Prefer shared plotting helpers over custom style code in each plugin. This keeps visual output consistent across analyses and makes global theming work.

Choosing Your Comparison Path

Path B: Custom Comparison

Override compare() for multi-metric or entry-table analyses:

from typing import Any

from polyzymd.analyses.base import ComparisonContext


def compare(self, ctx: ComparisonContext) -> Any:
    # Load results, compute custom statistics, return your result model
    # The returned object must support save(path)
    ...

See src/polyzymd/analyses/contacts/ or src/polyzymd/analyses/distances/ for full examples.

Loading Results in plot()

PlotContext provides paths and conditions but does not carry pre-loaded aggregated results. The recommended approach is to use _build_plot_data() to collect per-condition paths, then _load_aggregated_result() to load each condition’s data:

from pathlib import Path

from polyzymd.analyses.base import PlotContext


def plot(self, ctx: PlotContext) -> list[Path]:
    import matplotlib.pyplot as plt

    data, labels = self._build_plot_data(ctx)
    for label in labels:
        if label not in data or label == "__meta__":
            continue
        agg_dir = data[label]["aggregated_dir"]
        summary = self._load_aggregated_result(agg_dir)
        if summary is not None:
            # summary is a dict or AggregatedResultClass instance
            # ... plot data from summary ...
            pass

    return []

You can also loop through ctx.conditions directly for fine-grained control:

def plot(self, ctx: PlotContext) -> list[Path]:
    import matplotlib.pyplot as plt

    for cond in ctx.conditions:
        agg_dir = ctx.analysis_dirs[cond.label] / "aggregated"
        summary = self._load_aggregated_result(agg_dir)
        if summary is not None:
            # ... plot data from summary ...
            pass

    return []

_load_aggregated_result() looks for result.json in the aggregated directory, falling back to the most recent *.json file. It deserializes using AggregatedResultClass if set, otherwise json.loads(). Returns None if no result file is found.

Warning

Do not bypass _load_aggregated_result() with custom file parsing unless your plugin requires a non-standard storage format. The helper already handles both dict and model-based deserialization behavior.

Formatting with format_scalar_comparison()

For analyses using the default comparison path, format_scalar_comparison() from analyses.stats produces formatted tables with rankings, effect sizes, and significance stars:

from typing import Any

from polyzymd.analyses.base import ComparisonResult
from polyzymd.analyses.stats import format_scalar_comparison


def format(self, result: Any, output_format: str = "text") -> str:
    if isinstance(result, ComparisonResult):
        return format_scalar_comparison(
            result,
            title="My Analysis Comparison",
            metric_label="My Metric",
            metric_unit="Å",
            metric_key="my_metric",
            output_format=output_format,
            higher_is_better=True,
        )
    return super().format(result, output_format)

Parameters:

Parameter

Type

Description

result

ComparisonResult

The result to format

title

str

Display title (for example, "RMSF Comparison")

metric_label

str

Human-readable metric name for table headers

metric_unit

str

Unit string appended to values (for example, "Å")

metric_key

str | None

Key prefix in ConditionSummary extra fields; auto-detected if None

output_format

str

"text", "markdown", or "json"

higher_is_better

bool

Affects interpretation wording

Return Types: Dicts vs Pydantic Models

Plugins can return either plain dicts or typed Pydantic models from compute_replicate() and aggregate(). Both paths are fully supported.

For new plugins, start with dicts. They require no extra imports, no extra classes, and they work with the default comparison pipeline out of the box. You can always migrate to Pydantic models later if your plugin grows in complexity.

What is Pydantic, and when would you use it?

Pydantic is a Python data validation library. A Pydantic BaseModel subclass is like a dataclass with automatic type checking: if you declare a field as float, Pydantic raises a validation error if someone passes a string. PolyzyMD already uses Pydantic for its config system (Settings classes), so it is always available in the environment.

For result models, Pydantic adds:

  • Validation on construction — catches type errors early (e.g., passing a string where a float is expected)

  • Typed attribute accessresult.mean_rg instead of result["mean_rg"], with IDE autocomplete

  • Structured extension path — start with scaffold-generated BaseModel classes, then upgrade to BaseAnalysisResult if you need helper I/O methods or NPZ sidecar storage

  • Nested structure — complex result hierarchies with validated sub-models

Pydantic model path

Use models when you need validation, nested structure, strict typing, or more explicit result contracts.

Key pieces:

  1. The scaffold --style pydantic path generates result models inheriting from Pydantic BaseModel

  2. Plugin class sets AggregatedResultClass to the aggregated model class

  3. Framework deserializes aggregated files through AggregatedResultClass.model_validate_json(...) (or .load(path) when the class provides it)

  4. BaseModel classes are the simple default and do not include .save() / .load() helpers

  5. If you need NPZ sidecars or model-level I/O helpers, manually upgrade your result models to polyzymd.analyses._results_base.BaseAnalysisResult

from pathlib import Path
from typing import ClassVar

from pydantic import BaseModel

from polyzymd.analyses.base import Analysis


class ReplicateResult(BaseModel):
    mean_rg: float
    replicate: int


class AggregatedResult(BaseModel):
    mean_rg: float
    sem_rg: float
    replicate_values: list[float]


class SolventContactsAnalysis(Analysis):
    name: ClassVar[str] = "solvent_contacts"
    AggregatedResultClass = AggregatedResult

    class Settings(BaseModel):
        selection: str = "protein and name CA"

    def compute_replicate(self, ctx, replicate) -> ReplicateResult:
        return ReplicateResult(mean_rg=15.0, replicate=replicate)

    def aggregate(self, ctx, results) -> AggregatedResult:
        values = [r.mean_rg for r in results]
        return AggregatedResult(
            mean_rg=sum(values) / len(values),
            sem_rg=0.3,
            replicate_values=values,
        )

If you later need NPZ sidecar storage or .save() / .load() helper methods, replace BaseModel with BaseAnalysisResult in those result classes.

Important

Gotcha: AggregatedResultClass requires model returns

When you set AggregatedResultClass, your compute_replicate() and aggregate() should return instances of the corresponding Pydantic result models. Do not return plain dicts in this mode.

Why: _load_aggregated_result() / _deserialize_result() uses the declared class for deserialization. If on-disk JSON represents dict-shaped data that does not match the model contract expected by AggregatedResultClass.load(path), deserialization fails.

Comparison: dicts vs Pydantic models

Aspect

Dicts

Pydantic models

Setup effort

None — just return {}

Define result classes (scaffold uses BaseModel by default)

Validation

None — typos in keys are silent

Automatic type checking on construction

Attribute access

result["mean_rg"]

result.mean_rg with IDE autocomplete

Large array support

Manual — save NumPy arrays yourself

BaseModel: manual; BaseAnalysisResult: NPZ sidecar helpers

Nested structure

Manual — dicts of dicts

Validated sub-models with typed fields

Serialization

json.dumps() / json.loads()

BaseModel JSON by framework; optional BaseAnalysisResult.save() / .load()

Framework wiring

No AggregatedResultClass needed

Set AggregatedResultClass on plugin class

Best for

Scalar analyses, quick prototypes, new contributors

Complex results, per-residue arrays, production plugins

Rule of thumb: Start with dicts. If you find yourself writing defensive result.get("key", default) checks, duplicating key strings, or needing to store large NumPy arrays alongside JSON, migrate to Pydantic models.

A Note on Comparison Infrastructure

Shared comparison infrastructure now lives across the core packages:

  • polyzymd.config.comparison for comparison config and plot settings

  • polyzymd.cli.compare for polyzymd compare commands

  • polyzymd.analyses.shared for inferential statistics and result/path helpers

You do NOT need to create new top-level comparison modules for a plugin. Keep your plotting logic in your plugin’s plot() method and your formatting in format(). Start with all logic in __init__.py; as your plugin grows, extract plotting functions into a _plotters.py module within the package (see Extracting Plotting to _plotters.py below).

Canonical comparison base classes

If your plugin needs a custom comparison result (Path B), import base classes from analyses.base — this is the only canonical import path:

from polyzymd.analyses.base import (
    ANOVAResult,
    BaseComparisonResult,
    BaseConditionSummary,
    PairwiseResult,
)

Class

Purpose

BaseComparisonResult

Abstract base for custom plugin comparison results (generic over summary/pairwise types)

BaseConditionSummary

Abstract base for per-condition summaries (requires primary_metric_value / primary_metric_sem properties)

PairwiseResult

Standard pairwise t-test result with FDR fields

ANOVAResult

Standard ANOVA result

These classes provide .save() / .load() helpers, FDR-adjusted p-value fields, and serialization with NaN/Inf handling. The RMSD, Rg, and SASA plugins all subclass them for their multi-run comparison results.

Shared Utilities (analyses/shared/)

The analyses/shared/ package provides reusable infrastructure for plugin authors. The example above uses TrajectoryLoader, parse_time_string, and convert_time. The package now also re-exports plotting and config-hash helper symbols for direct plugin use.

TrajectoryLoader

The most important shared utility. Handles topology discovery, daisy-chain trajectory segments, timestep detection, and equilibration frame skipping:

from polyzymd.analyses.shared import TrajectoryLoader, convert_time, parse_time_string

loader = TrajectoryLoader(ctx.sim_config)
u = loader.load_universe(replicate)

# Convert equilibration time to frame offset
eq_value, eq_unit = parse_time_string(ctx.equilibration)
timestep_ps = loader.get_timestep(replicate, unit="ps")
eq_time_ps = convert_time(eq_value, eq_unit, "ps")
start_frame = int(eq_time_ps / timestep_ps)

Available re-exports

The following symbols are re-exported from polyzymd.analyses.shared for convenient one-line imports:

Source module

Re-exported symbols

shared.loader

TrajectoryLoader, TrajectoryInfo, parse_time_string, convert_time, time_to_frame

shared.alignment

AlignmentConfig, ReferenceMode, align_trajectory, get_alignment_description

shared.statistics

compute_sem, aggregate_per_residue_stats, aggregate_region_stats, weighted_mean_with_sem, StatResult, PerResidueStats

shared.autocorrelation

compute_acf, estimate_correlation_time, statistical_inefficiency, statistical_inefficiency_multiple, n_effective, get_independent_indices, check_statistical_reliability, ACFResult, CorrelationTimeResult

shared.pbc

minimum_image_distance, pairwise_distances_pbc

shared.constants

DEFAULT_CONTACT_CUTOFF, DEFAULT_DISTANCE_THRESHOLD, DEFAULT_SURFACE_EXPOSURE_THRESHOLD

shared.defaults

AnalysisDefaults

shared.config_hash

compute_config_hash, validate_config_hash

shared.plotting

get_theme, apply_axis_style, apply_legend, get_colors, get_output_path, save_figure, grouped_bars

Convergence detection

analyses/shared/convergence.py provides find_convergence_time(), a sliding-window slope heuristic for detecting sustained convergence in any timeseries. The RMSD plugin uses it to annotate per-run convergence diagnostics, but it is generic enough for any signal (e.g., Rg, energy).

from polyzymd.analyses.shared.convergence import find_convergence_time

result = find_convergence_time(time_ns, rmsd_values)
if result.converged:
    print(f"Converged at {result.convergence_time_ns:.1f} ns")

Multi-run comparison helpers

Some plugins track multiple named runs per condition — for example, per-chain RMSD, per-domain Rg, or per-target SASA. These plugins share a common compare loop: filter conditions → build pairs → run pairwise t-tests → run ANOVA → apply FDR correction. The multi_run_comparison module extracts that orchestration logic so plugins do not reimplement it.

from polyzymd.analyses.shared.multi_run_comparison import (
    apply_fdr_correction,
    build_condition_pairs,
    filter_summaries_with_run,
)

Function

Purpose

filter_summaries_with_run()

Keep only conditions that contain a specific named run

build_condition_pairs()

Build control-vs-all or all-vs-all condition pairs

apply_fdr_correction()

Benjamini-Hochberg FDR correction across pairwise and ANOVA results

Usage sketch inside a custom compare() override:

def compare(self, ctx):
    summaries = self._load_all_summaries(ctx)
    for run_label in run_labels:
        # Keep only conditions that have this run
        filtered = filter_summaries_with_run(summaries, run_label, get_run_fn)
        # Build pairwise pairs (control-vs-all or all-vs-all)
        pairs = build_condition_pairs(
            list(filtered.keys()), ctx.control_label
        )
        # ... run t-tests, ANOVA ...
    # Correct p-values across all runs
    apply_fdr_correction(all_pairwise, all_anova, fdr_alpha=ctx.fdr_alpha)

The RMSD, Rg, and SASA plugins all use this pattern. Study src/polyzymd/analyses/rmsd/__init__.py for a complete example.

Multi-run formatting helpers

After computing multi-run statistics, you need to format them for the CLI. The multi_run_formatting module provides reusable building blocks for ranked tables, pairwise comparison lines, and ANOVA summaries:

from polyzymd.analyses.shared.multi_run_formatting import (
    format_anova_line,
    format_pairwise_line,
    make_ranked_table_header,
    make_section_title,
)

Function

Purpose

make_ranked_table_header()

Build a ranked-table header with separator

format_pairwise_line()

Format one pairwise comparison as a text line

format_anova_line()

Format ANOVA results as a text line

make_section_title()

Build a section title with separator

make_ranked_rows()

Build (label, mean, sem, rank) tuples for table rows

make_ranked_markdown_header()

Build ranked-table headers for markdown output

These helpers keep CLI output visually consistent across multi-run plugins. See src/polyzymd/analyses/rmsd/_formatters.py for a complete formatting example.

Tip

If your plugin tracks multiple named runs (like per-chain or per-domain metrics), use the shared multi_run_comparison and multi_run_formatting helpers instead of reimplementing the compare → rank → pairwise → ANOVA → FDR loop. This keeps the statistical pipeline consistent and avoids subtle bugs in p-value correction.

Other submodules

Other submodules (selections, aggregation, diagnostics, aa_classification, logging_utils) are available via direct import but are not re-exported from the package root:

The metric_type classification is a planned future enhancement and is not yet implemented in the current codebase.

# Direct submodule import (not re-exported)
from polyzymd.analyses.shared.selections import get_positions
from polyzymd.analyses.shared.aggregation import collect_replicate_results

Plugins can now import plotting and config hash utilities directly from the package root:

from polyzymd.analyses.shared import (
    compute_config_hash,
    get_output_path,
    save_figure,
)

Note

Re-export imports are preferred for plugin code because they keep import paths stable even if internal module locations change.

Contributor Checklist

When creating a new analysis plugin:

  • File: src/polyzymd/analyses/<name>/ (use polyzymd new-analysis <name> to scaffold; add --style pydantic for typed result models)

  • name: ClassVar[str] — unique, lowercase, used in CLI

  • Settings class — Pydantic v2 BaseModel with defaults

  • compute_replicate() — uses TrajectoryLoader, returns dict or Pydantic model

  • aggregate() — returns dict or Pydantic model; framework auto-saves to ctx.result_path

  • Choose comparison path:

    • Default: implement extract_metrics()

    • Custom: override compare() returning a model with .save()

    • Multi-run custom: use multi_run_comparison + multi_run_formatting shared helpers

  • plot() — use _build_plot_data() + shared plotting helpers (get_theme, apply_axis_style, save_figure, get_colors)

  • (Optional) format() — CLI display via format_scalar_comparison()

  • (Optional) Use _check_cache() in compute_replicate() for result caching

  • (Optional) Use _format_replicate_range() for compact log messages

  • If your analysis requires polymer (or other optional molecular groups), implement filter_conditions() to exclude inapplicable conditions — or handle empty selections gracefully in compute_replicate()

  • If using AggregatedResultClass, return matching result model instances

  • Tests: tests/analyses/plugins/test_<name>.py (scaffold generates plot tests too)

  • Verify: pixi run -e build pytest tests/analyses/plugins/test_<name>.py -v

What a Good Community PR Looks Like

A strong contribution usually includes:

  • one new plugin package in src/polyzymd/analyses/<name>/

  • one focused test file

  • one short docs update showing the plugins: config block

  • figures only if they add scientific value

Aim for a plugin that another MDAnalysis user can understand in one read.

Extracting Plotting to _plotters.py

As a plugin grows, its plot() method may accumulate many helper functions. Established plugins extract these into a _plotters.py module within the plugin package:

analyses/rmsf/
├── __init__.py      # Analysis lifecycle (compute, aggregate, compare, plot)
├── _plotters.py     # Plotting functions called by plot()
├── _results.py      # Result models
└── ...

The plot() method in __init__.py delegates to functions in _plotters.py:

from ._plotters import plot_comparison_bars, plot_per_residue

def plot(self, ctx: PlotContext) -> list[Path]:
    data, labels = self._build_plot_data(ctx)
    paths = []
    paths.append(plot_comparison_bars(data, labels, ctx.output_dir, ctx.plot_settings))
    paths.append(plot_per_residue(data, labels, ctx.output_dir, ctx.plot_settings))
    return [p for p in paths if p is not None]

When to extract: Consider extracting when your plugin has 3+ plot functions or __init__.py exceeds ~500 lines. For simple plugins (like catalytic_triad/), keeping plotting inline is fine.

All 6 established plugins have _plotters.py: rmsf, secondary_structure, distances, contacts, catalytic_triad, hydrogen_bonds.

Anti-Patterns to Avoid

Do not bypass the context

# Wrong
def compute_replicate(self, ctx, replicate):
    config = SimulationConfig.from_yaml("/my/path/config.yaml")
# Right
def compute_replicate(self, ctx, replicate):
    sim_config = ctx.sim_config  # already loaded by framework

Understand the result-saving convention

The framework auto-saves return values from compute_replicate() and aggregate() to ctx.result_path. For dict plugins, returning your result is usually sufficient.

For caching with explicit target naming, save explicitly:

import json


def aggregate(self, ctx, results):
    agg = {"mean": 15.0}
    target_path = ctx.result_path
    if target_path is None:
        target_path = ctx.output_dir / "result.json"
    target_path.parent.mkdir(parents=True, exist_ok=True)
    target_path.write_text(json.dumps(agg, indent=2), encoding="utf-8")
    return agg

Do not import heavy deps at module level

# Wrong — breaks environments without MDAnalysis
import MDAnalysis as mda


# Right — lazy import inside method
def compute_replicate(self, ctx, replicate):
    import MDAnalysis as mda

Framework utilities from analyses/shared/ are imported at module level — they are always available and patchable in tests. Only heavy third-party dependencies (MDAnalysis, mdtraj, matplotlib) should be imported inside methods.

Do not create top-level comparison packages

New plugins should keep all logic in the plugin package. Use the canonical base classes from analyses.base (BaseComparisonResult, BaseConditionSummary, PairwiseResult, ANOVAResult) for custom comparison results, and the shared helpers from analyses/shared/ for multi-run orchestration and formatting.

Existing Plugins to Study

Plugin

Comparison path

Good example of

catalytic_triad/

Default (extract_metrics)

Simplest lifecycle — minimal compute logic with _plotters.py for KDE/bar plots

rmsf/

Default (extract_metrics)

Default scalar comparison with _plotters.py extraction

secondary_structure/

Default (extract_metrics)

Wrapping existing calculators with _plotters.py and typed result classes

rmsd/

Custom (multi-run)

Multi-run comparison with multi_run_comparison + multi_run_formatting shared helpers

rg/

Custom (multi-run)

Same multi-run pattern as RMSD, applied to radius of gyration

contacts/

Custom (compare override)

Multi-metric comparison, condition filtering, large _plotters.py

distances/

Custom (compare override)

Entry-table style comparison with _plotters.py extraction

Important

Do not interpret plugin differences as “simple” versus “complex” quality. Several existing plugins reflect earlier architectural phases. Use them as pattern references for specific tasks (result modeling, comparison style, plotting), not as maturity rankings.

Testing Your Plugin

Test file naming

Name your test file tests/analyses/plugins/test_<name>.py (for example, tests/analyses/plugins/test_solvent_contacts.py).

Testing strategy

A reliable test suite for plugins usually combines:

  • discovery and class-contract tests (name, Settings, discovery)

  • pure logic tests (extract_metrics, aggregate)

  • isolated compute tests (compute_replicate with mocked trajectory loader)

  • filesystem-backed integration tests for compare() and plot()

This layering keeps tests fast while still validating framework integration.

Basic tests (discovery, settings, metrics)

These tests use no trajectory data and verify the plugin integrates with the framework:

from polyzymd.analyses import get_analysis, list_analyses
from polyzymd.analyses.base import MetricValue


class TestSolventContactsDiscovery:
    """Test that the plugin is discoverable."""

    def test_discovered(self):
        assert "solvent_contacts" in list_analyses()

    def test_class_variables(self):
        cls = get_analysis("solvent_contacts")
        assert cls.name == "solvent_contacts"
        assert hasattr(cls, "Settings")

    def test_settings_defaults(self):
        cls = get_analysis("solvent_contacts")
        settings = cls.Settings()
        assert settings.selection == "protein and name CA"


class TestSolventContactsMetrics:
    """Test metric extraction for default comparison."""

    def test_extract_metrics(self):
        cls = get_analysis("solvent_contacts")
        analysis = cls()
        summary = {
            "mean_contact_fraction": 0.142,
            "sem_contact_fraction": 0.010,
            "replicate_values": [0.136, 0.148],
        }
        metrics = analysis.extract_metrics(summary)
        assert "mean_contact_fraction" in metrics
        assert isinstance(metrics["mean_contact_fraction"], MetricValue)
        assert metrics["mean_contact_fraction"].higher_is_better is False

Testing compute_replicate() with mocked trajectories

compute_replicate() requires MDAnalysis and trajectory files. Mock TrajectoryLoader at the module level to test compute logic without real trajectory data:

from pathlib import Path
from unittest.mock import MagicMock, patch

from polyzymd.analyses import get_analysis
from polyzymd.analyses.base import Condition, ReplicateContext


class TestSolventContactsCompute:
    """Test compute_replicate with mocked trajectories."""

    @patch("polyzymd.analyses.solvent_contacts.TrajectoryLoader")
    def test_computes_metric(self, mock_loader_cls, tmp_path):
        cls = get_analysis("solvent_contacts")
        analysis = cls()
        settings = cls.Settings()

        mock_loader = MagicMock()
        mock_loader_cls.return_value = mock_loader

        mock_universe = MagicMock()
        mock_target = MagicMock()
        mock_target.__len__.return_value = 10
        mock_target.positions = [[0.0, 0.0, 0.0]] * 10
        mock_solvent = MagicMock()
        mock_solvent.__len__.return_value = 20
        mock_solvent.positions = [[1.0, 1.0, 1.0]] * 20
        mock_universe.select_atoms.side_effect = [mock_target, mock_solvent]

        mock_trajectory = MagicMock()
        mock_trajectory.__getitem__.return_value = range(50)
        mock_universe.trajectory = mock_trajectory

        mock_loader.load_universe.return_value = mock_universe
        mock_loader.get_timestep.return_value = 10.0

        condition = Condition(
            label="Test",
            config_path=Path("/fake/config.yaml"),
            replicates=(1,),
            sim_config=MagicMock(),
        )
        ctx = ReplicateContext(
            condition=condition,
            replicate=1,
            sim_config=condition.sim_config,
            output_dir=tmp_path / "run_1",
            equilibration="0ns",
            recompute=True,
            settings=settings,
        )

        result = analysis.compute_replicate(ctx, replicate=1)
        assert "mean_contact_fraction" in result
        assert result["mean_contact_fraction"] >= 0
        assert result["replicate"] == 1

Note the patch target: @patch("polyzymd.analyses.solvent_contacts.TrajectoryLoader"). This works because TrajectoryLoader is imported at module level in solvent_contacts/__init__.py.

Testing aggregate()

Aggregation takes plain data and needs no trajectory mocks:

from pathlib import Path
from unittest.mock import MagicMock

import numpy as np

from polyzymd.analyses import get_analysis
from polyzymd.analyses.base import AggregateContext, Condition


class TestSolventContactsAggregate:
    """Test aggregation across replicates."""

    def test_aggregate(self, tmp_path):
        cls = get_analysis("solvent_contacts")
        analysis = cls()
        settings = cls.Settings()

        condition = Condition(
            label="Test",
            config_path=Path("/fake/config.yaml"),
            replicates=(1, 2, 3),
            sim_config=MagicMock(),
        )
        ctx = AggregateContext(
            condition=condition,
            replicates=(1, 2, 3),
            output_dir=tmp_path / "aggregated",
            equilibration="10ns",
            settings=settings,
        )

        results = [
            {"mean_contact_fraction": 0.12, "replicate": 1},
            {"mean_contact_fraction": 0.10, "replicate": 2},
            {"mean_contact_fraction": 0.11, "replicate": 3},
        ]

        agg = analysis.aggregate(ctx, results)
        expected_mean = np.mean([0.12, 0.10, 0.11])
        assert abs(agg["mean_contact_fraction"] - expected_mean) < 1e-10
        assert agg["replicate_values"] == [0.12, 0.10, 0.11]

Testing compare() with filesystem-backed aggregated results

compare() reads condition-specific result files from analysis directories, so tests should construct realistic filesystem layout with result.json files and a ComparisonContext:

import json
from pathlib import Path
from unittest.mock import MagicMock

from polyzymd.analyses import get_analysis
from polyzymd.analyses.base import ComparisonContext, Condition


class TestCompare:
    def test_compare_two_conditions(self, tmp_path):
        cls = get_analysis("solvent_contacts")
        analysis = cls()

        # Set up filesystem with aggregated results
        for label, value in [("WT", 0.12), ("Mutant", 0.18)]:
            agg_dir = tmp_path / label / "solvent_contacts" / "aggregated"
            agg_dir.mkdir(parents=True)
            (agg_dir / "result.json").write_text(
                json.dumps(
                    {
                        "mean_contact_fraction": value,
                        "sem_contact_fraction": 0.01,
                        "replicate_values": [value - 0.01, value + 0.01],
                    }
                ),
                encoding="utf-8",
            )

        conditions = [
            Condition(
                label="WT",
                config_path=Path("/fake"),
                replicates=(1, 2),
                sim_config=MagicMock(),
            ),
            Condition(
                label="Mutant",
                config_path=Path("/fake"),
                replicates=(1, 2),
                sim_config=MagicMock(),
            ),
        ]

        ctx = ComparisonContext(
            name="test",
            conditions=conditions,
            excluded_conditions=[],
            control_label="WT",
            analysis_dirs={
                "WT": tmp_path / "WT" / "solvent_contacts",
                "Mutant": tmp_path / "Mutant" / "solvent_contacts",
            },
            results_dir=tmp_path / "results",
            equilibration="10ns",
            settings=cls.Settings(),
            recompute=True,
        )

        result = analysis.compare(ctx)
        assert result is not None

Tip

This style of test is robust because it validates the same on-disk contract used by real compare runs (<condition>/<analysis>/aggregated/result.json).

Matplotlib backend in CI/headless tests

When testing plot code in CI or headless environments, force a non-interactive backend before importing matplotlib.pyplot:

import matplotlib

matplotlib.use("Agg")  # headless backend for CI / testing
import matplotlib.pyplot as plt

This avoids backend/display errors and keeps plotting tests deterministic.

Run tests with:

pixi run -e build pytest tests/analyses/plugins/test_solvent_contacts.py -v

See Also

Appendix: The comparison.yaml File

The comparison.yaml file is the single configuration file that drives the entire comparison workflow. It defines which simulation conditions to compare, which analyses to run, and how to generate plots. Understanding this file is important for plugin authors because your Settings class maps directly to the plugins: section.

Generating a template

The fastest way to create a comparison.yaml is with the scaffold command:

polyzymd compare init -n my_study

This creates a project directory with a fully commented template. You can also specify a custom equilibration time:

polyzymd compare init -n my_study --eq-time 20ns

File structure

A comparison.yaml has four main sections:

# Project metadata
name: "my_study"
description: "Comparison of polymer conditions"
control: "Condition A"  # or null if no control

# Conditions — each points to a simulation's config.yaml
conditions:
  - label: "Condition A"
    config: "../path/to/condition_a/config.yaml"
    replicates: [1, 2, 3]

  - label: "Condition B"
    config: "../path/to/condition_b/config.yaml"
    replicates: [1, 2, 3]

# Defaults
defaults:
  equilibration_time: "10ns"

# Plugins — which analyses to run and their settings
plugins:
  rmsf:
    selection: "protein and name CA"
    reference_mode: "centroid"

  # Your plugin's Settings fields go here:
  my_analysis:
    example_parameter: "value"

# Plot settings — visual customization
plot_settings:
  output_dir: "figures/"
  format: "png"
  dpi: 300
  style: "publication"

How plugins: connects to your Settings class

When the framework runs your plugin, it reads the plugins: section of comparison.yaml and deserializes the matching key into your Settings model. For example, if your plugin has:

class MySettings(BaseModel):
    cutoff: float = 4.5
    selection: str = "protein and name CA"

Then users configure it in comparison.yaml as:

plugins:
  my_analysis:
    cutoff: 5.0
    selection: "chainID A"

Any fields not specified in the YAML use the defaults from your Settings class. This is why providing sensible defaults for every field is important — it means the analysis works without any explicit configuration.

Running comparisons

After editing comparison.yaml:

# Validate the configuration
polyzymd compare validate

# Run a specific analysis
polyzymd compare run rmsf

# Run all enabled analyses
polyzymd compare run-all

# Generate plots
polyzymd compare plot-all

See How to Compare Simulation Conditions for a complete user guide on running comparisons.