Analysis Base Classes

API reference for polyzymd.analyses.base, including the plugin base class, context objects, and shared comparison/result models.

Base class and context objects for the PolyzyMD analysis plugin system.

Every analysis in PolyzyMD — RMSF, contacts, distances, etc. — is a single class that inherits from Analysis. The framework discovers these classes automatically (no registry edits) and owns replicate iteration, caching, dependency ordering, and CLI wiring.

How to Add a New Analysis

  1. Create src/polyzymd/analyses/<name>/ as a sub-package.

  2. Define a Settings model (Pydantic v2 BaseModel) as a class attribute.

  3. Subclass Analysis and implement the required methods.

  4. Done — the framework discovers it via pkgutil.
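Step 4 relies on pkgutil. The sketch below shows the general pattern that this kind of discovery usually follows: walk a package's sub-modules, import each one, and collect subclasses of a base class. It is a stdlib-only illustration under that assumption, not the code in analyses.discovery, and discover_plugins is a hypothetical name.

```python
import importlib
import inspect
import pkgutil
from types import ModuleType


def discover_plugins(package: ModuleType, base: type) -> list[type]:
    """Import each direct sub-module of `package`; return concrete subclasses of `base`.

    Hypothetical helper for illustration only.
    """
    found: dict[str, type] = {}
    # iter_modules yields the package's direct children (modules and sub-packages).
    for info in pkgutil.iter_modules(package.__path__, prefix=package.__name__ + "."):
        module = importlib.import_module(info.name)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            is_plugin = (
                issubclass(obj, base)
                and obj is not base
                and not inspect.isabstract(obj)
                # Ignore classes merely imported from outside this package.
                and obj.__module__.startswith(package.__name__)
            )
            if is_plugin:
                found[f"{obj.__module__}.{obj.__qualname__}"] = obj
    # Sort by dotted path for a deterministic order.
    return [found[key] for key in sorted(found)]
```

Importing each sub-package is what makes its classes visible, so no central list has to be edited when a plugin is added.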

Required methods:

compute_replicate(ctx, replicate) -> dict | BaseModel
aggregate(ctx, results)           -> dict | BaseModel | None

Optional overrides (sensible defaults provided):

filter_conditions(conditions, settings=None)  -> list[Condition]
compare(ctx)                                  -> ComparisonResult | BaseModel | None
plot(ctx)                                     -> list[Path]
format(result, output_format='text')          -> str
extract_metrics(summary)                      -> dict[str, MetricValue]

Notes

The orchestrator auto-saves results returned by compute_replicate() and aggregate() to disk. Simple plugins can rely on this fallback. Plugins that need equilibration-aware caching or custom filenames should save explicitly (see rmsf/ for the pattern).
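A minimal sketch of the "save only if missing" fallback, assuming results are plain dicts written as JSON. Both helper names are hypothetical, and the <name>_eq<equilibration>.json pattern is inferred from the rmsf_eq10ns.json example mentioned under compute_replicate(), not from a documented naming rule.

```python
import json
from pathlib import Path


def equilibration_cache_name(analysis_name: str, equilibration: str) -> str:
    """Hypothetical equilibration-aware filename, e.g. ('rmsf', '10ns') -> 'rmsf_eq10ns.json'."""
    return f"{analysis_name}_eq{equilibration}.json"


def save_if_missing(result: dict, path: Path) -> bool:
    """Write `result` as JSON unless `path` already exists.

    Returns True if a file was written, False if an existing file was left untouched.
    """
    if path.exists():
        return False  # a plugin (or an earlier run) already saved this result
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
    return True
```

A second call with different data leaves the first file in place, which is why a plugin that writes its own result first keeps that file.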

See also

analyses.stats

Shared statistical utility functions.

analyses.discovery

Automatic plugin discovery.

analyses.orchestrator

Framework engine for running analyses.

class polyzymd.analyses.base.BasePlotSettings

Bases: BaseModel

Base class for per-analysis plot settings.

Each analysis plugin that supports plot customization should subclass this in its _plot_settings.py module and set PlotSettingsModel = MyPlotSettings on its Analysis subclass.

The class is intentionally minimal — it exists only so the framework can enforce a common type for all per-analysis plot settings.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.analyses.base.SlurmResourceHint(*, mem=None, time=None, cpus_per_task=None)

Bases: BaseModel

Per-plugin SLURM resource hints for HPC submission.

These values are used as default SLURM resources when users do not pass explicit resource flags on the CLI. Explicit CLI flags always take precedence over plugin hints.
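One way to read "explicit CLI flags always take precedence" is a per-field merge in which plugin hints fill only the fields the CLI left unset. The sketch assumes that per-field behavior; SlurmResources and resolve_resources are hypothetical stand-ins, not PolyzyMD APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SlurmResources:
    """Hypothetical stand-in carrying the same three fields as SlurmResourceHint."""
    mem: str | None = None
    time: str | None = None
    cpus_per_task: int | None = None


def resolve_resources(cli: SlurmResources, hint: SlurmResources | None) -> SlurmResources:
    """Explicit CLI values win; plugin hints fill only fields the CLI left as None."""
    if hint is None:
        return cli
    return SlurmResources(
        mem=cli.mem if cli.mem is not None else hint.mem,
        time=cli.time if cli.time is not None else hint.time,
        cpus_per_task=cli.cpus_per_task if cli.cpus_per_task is not None else hint.cpus_per_task,
    )
```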

Parameters:
  • mem (str | None) – Memory request string, for example "16G".

  • time (str | None) – Walltime string, for example "04:00:00".

  • cpus_per_task (int | None) – Number of CPUs per task.

mem: str | None
time: str | None
cpus_per_task: int | None
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.analyses.base.Condition(label, config_path, replicates, sim_config)

Bases: object

A single simulation condition within a comparison.

Mirrors the essential fields of ConditionConfig but is decoupled from the comparison config module, so analyses don’t need to import it.

label

Human-readable condition name (e.g. “100% SBMA”).

Type:

str

config_path

Path to this condition’s config.yaml.

Type:

Path

replicates

1-indexed replicate numbers to process.

Type:

tuple[int, …]

sim_config

Loaded simulation configuration.

Type:

SimulationConfig

label: str
config_path: Path
replicates: tuple[int, ...]
sim_config: SimulationConfig
classmethod from_condition_config(cond)

Create from a ConditionConfig (lazy-loads SimulationConfig).

__init__(label, config_path, replicates, sim_config)

class polyzymd.analyses.base.ReplicateContext(condition, replicate, sim_config, output_dir, equilibration, recompute, settings, result_path=None)

Bases: object

Context passed to Analysis.compute_replicate().

Provides everything needed to analyse a single replicate of a single condition.

condition

The condition being analysed.

Type:

Condition

replicate

1-indexed replicate number.

Type:

int

sim_config

Already-loaded simulation configuration.

Type:

SimulationConfig

output_dir

Where to write per-replicate results (<analysis_root>/<condition_label>/<analysis_name>/run_<rep>).

Type:

Path

equilibration

Equilibration time string (e.g. "10ns").

Type:

str

recompute

If True, ignore cached results and recompute.

Type:

bool

settings

Analysis-specific settings (the analysis’s Settings class).

Type:

BaseModel

result_path

Canonical cache path for the per-replicate result. May be None if the plugin is invoked outside the normal orchestrator pipeline.

Type:

Path | None

condition: Condition
replicate: int
sim_config: SimulationConfig
output_dir: Path
equilibration: str
recompute: bool
settings: BaseModel
result_path: Path | None = None
__init__(condition, replicate, sim_config, output_dir, equilibration, recompute, settings, result_path=None)

class polyzymd.analyses.base.AggregateContext(condition, replicates, output_dir, equilibration, settings, result_path=None)

Bases: object

Context passed to Analysis.aggregate().

condition

The condition being aggregated.

Type:

Condition

replicates

Replicate numbers that were successfully computed.

Type:

tuple[int, …]

output_dir

Where to write the aggregated result (<analysis_root>/<condition_label>/<analysis_name>/aggregated).

Type:

Path

equilibration

Equilibration time string.

Type:

str

settings

Analysis-specific settings.

Type:

BaseModel

result_path

Canonical cache path for the aggregated result. May be None if the plugin is invoked outside the normal orchestrator pipeline.

Type:

Path | None

condition: Condition
replicates: tuple[int, ...]
output_dir: Path
equilibration: str
settings: BaseModel
result_path: Path | None = None
__init__(condition, replicates, output_dir, equilibration, settings, result_path=None)

class polyzymd.analyses.base.ComparisonContext(name, conditions, excluded_conditions, control_label, analysis_dirs, results_dir, equilibration, settings, recompute, fdr_alpha=0.05, ttest_method='student', posthoc_method='ttest_bh', result_path=None, failed_conditions=<factory>, aggregated_results=<factory>)

Bases: object

Context passed to Analysis.compare().

Provides all conditions, their analysis directories, and the comparison-level configuration.

name

Comparison project name (from comparison.yaml).

Type:

str

conditions

Conditions that passed filter_conditions().

Type:

list[Condition]

excluded_conditions

Conditions removed by filter_conditions().

Type:

list[Condition]

failed_conditions

Conditions that were valid but failed during compute/aggregate (e.g., insufficient replicates). Empty by default.

Type:

list[Condition]

control_label

Label of the control condition (None if not specified or if the control was excluded).

Type:

str | None

analysis_dirs

Mapping condition_label -> analysis_dir (contains run_N/ and aggregated/).

Type:

dict[str, Path]

results_dir

Analysis-specific comparison directory.

Type:

Path

equilibration

Equilibration time string.

Type:

str

settings

Analysis-specific settings.

Type:

BaseModel

fdr_alpha

Significance threshold for pairwise tests and ANOVA. Used as the BH false-discovery-rate threshold when posthoc_method is "ttest_bh" and as the family-wise significance threshold when posthoc_method is "tukey_hsd".

Type:

float

ttest_method

Two-sample t-test method for default scalar pairwise tests: "student" assumes equal variances (Student's t-test) and "welch" does not (Welch's t-test).

Type:

str

posthoc_method

Post-hoc testing method for default scalar pairwise tests. "ttest_bh" applies pairwise t-tests with BH correction and "tukey_hsd" applies Tukey HSD across all groups.

Type:

str

recompute

Whether to force recomputation.

Type:

bool

result_path

Canonical cache path for the comparison result.

Type:

Path | None

aggregated_results

Mapping condition_label -> aggregated result for conditions that succeeded. Plugins can use this instead of re-loading from disk.

Type:

dict[str, Any]

name: str
conditions: list[Condition]
excluded_conditions: list[Condition]
control_label: str | None
analysis_dirs: dict[str, Path]
results_dir: Path
equilibration: str
settings: BaseModel
recompute: bool
fdr_alpha: float = 0.05
ttest_method: str = 'student'
posthoc_method: str = 'ttest_bh'
result_path: Path | None = None
failed_conditions: list[Condition]
aggregated_results: dict[str, Any]
property effective_control: str | None

Return control_label if the control condition was not excluded, otherwise None.

__init__(name, conditions, excluded_conditions, control_label, analysis_dirs, results_dir, equilibration, settings, recompute, fdr_alpha=0.05, ttest_method='student', posthoc_method='ttest_bh', result_path=None, failed_conditions=<factory>, aggregated_results=<factory>)
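The ttest_method, posthoc_method="ttest_bh", and fdr_alpha settings correspond to standard statistics. The sketch below computes Student and Welch t statistics and Benjamini-Hochberg adjusted p-values using only the standard library; it illustrates the math and is not PolyzyMD's implementation. Converting t into a p-value needs the t distribution (for example scipy.stats.ttest_ind(a, b, equal_var=...)), and Tukey HSD is not shown.

```python
import math
from statistics import mean, variance


def t_statistic(a: list[float], b: list[float], method: str = "student") -> float:
    """Two-sample t statistic for mean(a) - mean(b); needs >= 2 values per group.

    "student" pools the variances (equal-variance assumption);
    "welch" keeps them separate.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    if method == "student":
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
    elif method == "welch":
        se = math.sqrt(va / na + vb / nb)
    else:
        raise ValueError(f"unknown t-test method: {method!r}")
    return (mean(a) - mean(b)) / se


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down: each adjusted value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With equal group sizes the two t statistics coincide (only the degrees of freedom differ), so the choice of ttest_method mostly matters for unbalanced replicate counts. Flagging a pair as significant when its adjusted p-value is at most fdr_alpha is an assumption here, borrowed from the "less than or equal to" rule documented for ANOVAResult.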

class polyzymd.analyses.base.PlotContext(conditions, analysis_dirs, results_dir, output_dir, settings, plot_settings=<factory>, comparison_path=None, control_label=None, equilibration='0ns')

Bases: object

Context passed to Analysis.plot().

conditions

All conditions included in the comparison.

Type:

list[Condition]

analysis_dirs

Mapping condition_label -> analysis_dir.

Type:

dict[str, Path]

results_dir

Where comparison result JSONs live.

Type:

Path

output_dir

Where to write figures.

Type:

Path

settings

Analysis-specific settings.

Type:

BaseModel

plot_settings

Global plot settings (theme, DPI, format, etc.). The framework guarantees this is never None — a PlotSettings() default is provided when the comparison config has no plot_settings: section. Plugins can access this directly without None guards.

Type:

PlotSettings

comparison_path

Canonical comparison result path for this analysis.

Type:

Path | None

control_label

Label of the control condition, or None if not specified / excluded. Mirrors ComparisonContext.control_label.

Type:

str | None

equilibration

Equilibration time string used for equilibration-aware cache filenames in plot helpers.

Type:

str

Notes

PlotContext does not carry pre-loaded aggregated results. Use Analysis._build_plot_data() to collect per-condition paths, then Analysis._load_aggregated_result() to load each result:

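from pathlib import Path

def plot(self, ctx: PlotContext) -> list[Path]:
    data, labels = self._build_plot_data(ctx)
    figure_paths: list[Path] = []
    for label in labels:
        agg_dir = data[label]["aggregated_dir"]
        summary = self._load_aggregated_result(agg_dir)
        # ... plot data from summary, save each figure under ctx.output_dir,
        # and append the saved paths to figure_paths ...
    return figure_paths
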
conditions: list[Condition]
analysis_dirs: dict[str, Path]
results_dir: Path
output_dir: Path
settings: BaseModel
plot_settings: PlotSettings
comparison_path: Path | None = None
control_label: str | None = None
equilibration: str = '0ns'
__post_init__()

Ensure plot_settings is always materialized for plugins.

__init__(conditions, analysis_dirs, results_dir, output_dir, settings, plot_settings=<factory>, comparison_path=None, control_label=None, equilibration='0ns')
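The guarantee that plot_settings is never None can be implemented with a dataclass default_factory plus a __post_init__ fallback for an explicit None. This is a minimal sketch with hypothetical stand-in classes; the theme and dpi fields are placeholders, not the actual PlotSettings schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlotSettingsStub:
    """Hypothetical stand-in for the global PlotSettings model."""
    theme: str = "default"
    dpi: int = 300


@dataclass
class PlotContextSketch:
    output_dir: str
    plot_settings: PlotSettingsStub | None = field(default_factory=PlotSettingsStub)
    equilibration: str = "0ns"

    def __post_init__(self) -> None:
        # A caller may still pass plot_settings=None explicitly;
        # replace it so plugins never need a None guard.
        if self.plot_settings is None:
            self.plot_settings = PlotSettingsStub()
```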

class polyzymd.analyses.base.MetricValue(name, mean, sem, replicate_values, higher_is_better=True, direction_labels=('decreased', 'unchanged', 'increased'))

Bases: object

A single scalar metric extracted from a condition summary.

Used by the default Analysis.compare() implementation. If your analysis overrides compare() entirely, you don’t need this.

name

Metric identifier (e.g. "mean_rmsf", "coverage").

Type:

str

mean

Mean value across replicates.

Type:

float

sem

Standard error of the mean.

Type:

float

replicate_values

Per-replicate values (for t-tests / ANOVA).

Type:

list[float]

higher_is_better

If True, higher values rank first. If False, lower values rank first (e.g. RMSF). If None, no universal quality direction is assumed and conditions are ranked by descending mean value for neutral display.

Type:

bool | None

direction_labels

(negative_label, unchanged_label, positive_label) for interpreting percent-change direction. Defaults to ("decreased", "unchanged", "increased").

Type:

tuple[str, str, str]

name: str
mean: float
sem: float
replicate_values: list[float]
higher_is_better: bool | None = True
direction_labels: tuple[str, str, str] = ('decreased', 'unchanged', 'increased')
__init__(name, mean, sem, replicate_values, higher_is_better=True, direction_labels=('decreased', 'unchanged', 'increased'))
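How higher_is_better and direction_labels could drive ranking and labeling, sketched as plain functions. The exact zero-change rule and the optional tolerance dead band are assumptions; the documentation does not say how "unchanged" is decided.

```python
from __future__ import annotations


def rank_conditions(means: dict[str, float], higher_is_better: bool | None) -> list[str]:
    """Order condition labels best-first.

    True  -> descending mean (higher is better)
    False -> ascending mean (lower is better, e.g. RMSF)
    None  -> descending mean as a neutral display order
    """
    descending = higher_is_better is not False
    return sorted(means, key=means.__getitem__, reverse=descending)


def direction_label(
    percent_change: float,
    direction_labels: tuple[str, str, str] = ("decreased", "unchanged", "increased"),
    tolerance: float = 0.0,
) -> str:
    """Pick the (negative, unchanged, positive) label for a percent change.

    `tolerance` (in percent) is a hypothetical dead band treated as "unchanged".
    """
    negative, unchanged, positive = direction_labels
    if percent_change > tolerance:
        return positive
    if percent_change < -tolerance:
        return negative
    return unchanged
```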

class polyzymd.analyses.base.ConditionSummary(*, label, n_replicates=0, **extra_data)

Bases: BaseModel

Summary statistics for one condition in a scalar comparison.

For simple scalar analyses (RMSF, catalytic_triad, secondary_structure), dynamic <metric>_mean, <metric>_sem, and <metric>_replicate_values fields are added via model_extra.

label

Condition display name.

Type:

str

n_replicates

Number of replicates included.

Type:

int

model_config = {'extra': 'allow'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

label: str
n_replicates: int
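The dynamic field names follow a <metric>_mean / <metric>_sem / <metric>_replicate_values pattern. The hypothetical helper below builds such a mapping from per-metric statistics; the nested input shape is an assumption made for illustration.

```python
def summary_fields(label: str, metrics: dict[str, dict], n_replicates: int) -> dict:
    """Flatten per-metric stats into <metric>_mean / _sem / _replicate_values keys.

    Hypothetical helper; `metrics` maps metric name -> {"mean", "sem", "replicate_values"}.
    """
    data: dict = {"label": label, "n_replicates": n_replicates}
    for metric_name, stats in metrics.items():
        data[f"{metric_name}_mean"] = stats["mean"]
        data[f"{metric_name}_sem"] = stats["sem"]
        data[f"{metric_name}_replicate_values"] = list(stats["replicate_values"])
    return data
```

Because ConditionSummary sets extra='allow', passing such a dict as keyword arguments would keep the metric keys in model_extra alongside label and n_replicates.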

class polyzymd.analyses.base.PairwiseResult(*, condition_a, condition_b, metric='default', t_statistic, p_value, p_value_adjusted=None, posthoc_method='ttest_bh', cohens_d, effect_size_interpretation, direction, significant, percent_change)

Bases: BaseModel

Statistical comparison between two conditions for one metric.

condition_a

Label of first condition (typically control/reference).

Type:

str

condition_b

Label of second condition (typically treatment).

Type:

str

metric

Name of the metric being compared.

Type:

str

t_statistic

T-test statistic.

Type:

float

p_value

Two-tailed p-value.

Type:

float

p_value_adjusted

Multiplicity-corrected p-value. For "ttest_bh" this is the Benjamini-Hochberg adjusted value; for "tukey_hsd" this mirrors the Tukey family-wise p-value (already corrected). None only for legacy payloads missing this field.

Type:

float | None

posthoc_method

Post-hoc method used to generate this pairwise p-value.

Type:

str

cohens_d

Effect size (Cohen’s d).

Type:

float

effect_size_interpretation

"negligible", "small", "medium", or "large".

Type:

str

direction

Interpretation of change (e.g. "stabilizing").

Type:

str

significant

Whether the comparison is significant. Uses adjusted p-value when available, otherwise raw p-value.

Type:

bool

percent_change

Percent change from condition_a to condition_b.

Type:

float

model_config = {'ser_json_inf_nan': 'strings'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

condition_a: str
condition_b: str
metric: str
t_statistic: float
p_value: float
p_value_adjusted: float | None
posthoc_method: str
cohens_d: float
effect_size_interpretation: str
direction: str
significant: bool
percent_change: float
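Cohen's d, its verbal interpretation, and percent_change can be computed as below. The sign convention (condition_b relative to condition_a), the conventional 0.2 / 0.5 / 0.8 thresholds, and returning NaN for a zero baseline are assumptions for illustration. The ser_json_inf_nan='strings' entry in model_config makes Pydantic serialize such non-finite floats as JSON strings.

```python
import math
from statistics import mean, variance


def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d of b relative to a, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / math.sqrt(pooled_var)


def interpret_effect_size(d: float) -> str:
    """Conventional Cohen thresholds on |d| (assumed, not confirmed for PolyzyMD)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def percent_change(mean_a: float, mean_b: float) -> float:
    """Percent change from condition_a to condition_b; NaN when mean_a is zero."""
    if mean_a == 0:
        return math.nan
    return (mean_b - mean_a) / mean_a * 100.0
```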

class polyzymd.analyses.base.ANOVAResult(*, metric='default', f_statistic, p_value, significant)

Bases: BaseModel

One-way ANOVA result for one metric.

metric

Name of the metric tested.

Type:

str

f_statistic

F-statistic from ANOVA.

Type:

float

p_value

P-value for the test.

Type:

float

significant

Whether p_value is less than or equal to the configured significance threshold.

Type:

bool

metric: str
f_statistic: float
p_value: float
significant: bool
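For reference, the one-way ANOVA F statistic is the ratio of the between-group to the within-group mean square. This stdlib sketch computes only F; a p-value needs the F distribution with (k - 1, N - k) degrees of freedom, which scipy.stats.f_oneway returns together with the statistic. It is not necessarily how PolyzyMD fills ANOVAResult.

```python
from statistics import mean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)      # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)  # N - k degrees of freedom
    return ms_between / ms_within
```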
model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.analyses.base.ComparisonResult(*, analysis_type, name, control_label=None, fdr_alpha=None, ttest_method='student', posthoc_method='ttest_bh', conditions=<factory>, pairwise_comparisons=<factory>, anova=None, ranking=<factory>, rankings_by_metric=None, equilibration_time='0ns', created_at='', polyzymd_version='')

Bases: BaseModel

Serializable result of a cross-condition comparison.

This is the universal comparison output model. The default Analysis.compare() returns an instance of this class. Complex analyses (contacts, distances, exposure, BFE, polymer_affinity) may return their own typed Pydantic models — as long as those models have a .save() method, the framework handles them identically.

The CLI calls result.save(path) and analysis.format(result) for every comparison, so all result objects must support these two operations.

analysis_type

Analysis identifier (e.g. "rmsf").

Type:

str

name

Comparison project name.

Type:

str

control_label

Control condition label.

Type:

str | None

fdr_alpha

Significance threshold for pairwise tests and ANOVA. Used as the BH false-discovery-rate threshold ("ttest_bh") or the Tukey family-wise threshold ("tukey_hsd"). None when unknown (legacy payloads).

Type:

float | None

ttest_method

Two-sample t-test method used for pairwise tests.

Type:

str

posthoc_method

Post-hoc testing method used for pairwise tests.

Type:

str

conditions

Per-condition summary statistics.

Type:

list[ConditionSummary]

pairwise_comparisons

Pairwise statistical tests.

Type:

list[PairwiseResult]

anova

ANOVA results (None if < 3 conditions).

Type:

list[ANOVAResult] | None

ranking

Condition labels ranked by primary metric (best first).

Type:

list[str]

rankings_by_metric

Per-metric rankings for multi-metric analyses.

Type:

dict[str, list[str]] | None

equilibration_time

Equilibration time used.

Type:

str

created_at

ISO 8601 timestamp.

Type:

str

polyzymd_version

PolyzyMD version string.

Type:

str

model_config = {'ser_json_inf_nan': 'strings'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

analysis_type: str
name: str
control_label: str | None
fdr_alpha: float | None
ttest_method: str
posthoc_method: str
conditions: list[ConditionSummary]
pairwise_comparisons: list[PairwiseResult]
anova: list[ANOVAResult] | None
ranking: list[str]
rankings_by_metric: dict[str, list[str]] | None
equilibration_time: str
created_at: str
polyzymd_version: str
save(path)

Save result to JSON file.

Parameters:

path (Path or str) – Output path.

Returns:

Path to saved file.

Return type:

Path

classmethod load(path)

Load result from JSON file.

Parameters:

path (Path or str) – Path to JSON file.

Returns:

Loaded result.

Return type:

Self
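Any result object works with the CLI as long as it supports save(path) and can be passed to format(). A typing.Protocol can express that structural contract; the sketch pairs one with a dataclass-based toy result that round-trips through JSON. SavableResult and ToyComparisonResult are hypothetical, and the real ComparisonResult is a Pydantic model rather than a dataclass.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class SavableResult(Protocol):
    """Hypothetical structural type for the save(path) contract."""

    def save(self, path: Path | str) -> Path: ...


@dataclass
class ToyComparisonResult:
    """Hypothetical result with the same save/load shape as ComparisonResult."""
    analysis_type: str
    name: str
    ranking: list[str]

    def save(self, path: Path | str) -> Path:
        path = Path(path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(asdict(self), indent=2))
        return path

    @classmethod
    def load(cls, path: Path | str) -> ToyComparisonResult:
        return cls(**json.loads(Path(path).read_text()))
```

runtime_checkable only verifies that a save attribute exists, which matches the documented "has a .save() method" contract rather than checking its signature.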

class polyzymd.analyses.base.BaseConditionSummary(*, label, config_path, n_replicates, replicate_values)

Bases: BaseModel, ABC

Abstract base class for condition-level custom comparison summaries.

label

Display name for this condition

Type:

str

config_path

Path to the simulation config file

Type:

str

n_replicates

Number of replicates included

Type:

int

replicate_values

Per-replicate values of the primary metric

Type:

list[float]

label: str
config_path: str
n_replicates: int
replicate_values: list[float]
abstract property primary_metric_value: float

Return the primary metric value for ranking and comparison.

abstract property primary_metric_sem: float

Return the SEM of the primary metric.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.analyses.base.BaseComparisonResult(*, metric, name, control_label=None, conditions, pairwise_comparisons, anova=None, ranking, equilibration_time, created_at, polyzymd_version)

Bases: BaseModel, ABC, Generic[TConditionSummary, TPairwiseResult]

Abstract base class for custom plugin comparison results.

metric

The primary metric being compared

Type:

str

name

Name of the comparison project

Type:

str

control_label

Label of the control condition

Type:

str | None

conditions

Condition summaries

Type:

list[TConditionSummary]

pairwise_comparisons

Pairwise statistical comparisons

Type:

list[TPairwiseResult]

anova

ANOVA result(s)

Type:

ANOVAResult | list[ANOVAResult] | None

ranking

Condition labels ranked by primary metric

Type:

list[str]

equilibration_time

Equilibration time used

Type:

str

created_at

Timestamp for result generation

Type:

datetime

polyzymd_version

PolyzyMD version used

Type:

str

model_config = {'ser_json_inf_nan': 'strings'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

comparison_type: ClassVar[str] = 'base'
metric: str
name: str
control_label: str | None
conditions: list[TConditionSummary]
pairwise_comparisons: list[TPairwiseResult]
anova: ANOVAResult | list[ANOVAResult] | None
ranking: list[str]
equilibration_time: str
created_at: datetime
polyzymd_version: str
save(path)

Save result to JSON file.

Parameters:

path (Path | str) – Output path

Returns:

Path to saved file

Return type:

Path

classmethod load(path)

Load result from JSON file.

Parameters:

path (Path | str) – Path to JSON file

Returns:

Loaded result

Return type:

Self

get_condition(label)

Get a condition by label.

Parameters:

label (str) – Condition label

Returns:

The matching condition summary

Return type:

TConditionSummary

Raises:

KeyError – If condition not found

get_comparison(label)

Get a pairwise comparison by condition pair.

Parameters:

label (str | tuple[str, str]) – Comparison key:

  • (condition_a, condition_b) performs an exact pair lookup

  • condition_b performs legacy lookup by treatment label only

Returns:

The comparison, or None if not found

Return type:

TPairwiseResult | None

Notes

Legacy lookup by condition_b can be ambiguous for all-vs-all comparisons. Prefer tuple lookup for unambiguous retrieval.
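The ambiguity is easiest to see with two pairs that share a treatment label. In this sketch (hypothetical Pair type; first-match behavior for a string key is an assumption), the string "B" can only return one of the two comparisons, while the tuple key is exact.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical minimal pairwise result."""
    condition_a: str
    condition_b: str
    p_value: float


def get_comparison(pairs: list[Pair], key: str | tuple[str, str]) -> Pair | None:
    """Tuple keys match (condition_a, condition_b) exactly; a string matches condition_b only."""
    for pair in pairs:
        if isinstance(key, tuple):
            if (pair.condition_a, pair.condition_b) == key:
                return pair
        elif pair.condition_b == key:
            return pair  # legacy lookup: first pair whose treatment label matches
    return None
```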

class polyzymd.analyses.base.Analysis

Bases: ABC

Base class for all PolyzyMD analyses.

Subclasses represent a complete analysis lifecycle: per-replicate computation, aggregation across replicates, cross-condition comparison, plotting, and CLI formatting.

Class Variables

name : str

Unique identifier used in config files and CLI (e.g. "rmsf").

Settings : type[BaseModel]

Pydantic model for this analysis’s settings.

PlotSettingsModel : type[BasePlotSettings] | None

Optional per-analysis plot settings model. When set, the comparison configuration loader parses plot_settings.<name> using this model and provides default-constructed values on attribute access when omitted in YAML. Defaults to None.

AggregatedResultClass : type[BaseModel] | None

Optional Pydantic model class for aggregated results. When set, the default _deserialize_result() uses this class’s .load(path) method (if available) or .model_validate_json() to load aggregated results from disk. When None (the default), aggregated results are loaded as plain dicts via json.loads().

Setting this class variable replaces the need to override _deserialize_result() in most cases.

Example:

from polyzymd.analyses.rmsf._results import RMSFAggregatedResult

class RMSFAnalysis(Analysis):
    name = "rmsf"
    AggregatedResultClass = RMSFAggregatedResult
    ...

aliases : tuple[str, …]

Alternative CLI names (e.g. ("triad",) for catalytic_triad).

dependencies : tuple[str, …]

Names of analyses that must run before this one (topological sort).

min_replicates : int

Minimum successful replicates required for aggregation.

has_compute_stage : bool

Whether the framework should run compute_replicate().

has_aggregate_stage : bool

Whether the framework should run aggregate().

slurm_resource_hint : SlurmResourceHint | None

Optional per-plugin SLURM resource defaults for HPC submission.

settings_path_fields : tuple[str, …]

Settings field names that contain filesystem paths to resolve relative to comparison.yaml.
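The aliases and dependencies class variables imply two small lookups: mapping a CLI name to its canonical analysis, and ordering analyses so that dependencies run first. The sketch uses the standard library's graphlib.TopologicalSorter for the topological sort; the function names are hypothetical and this is not the orchestrator's code.

```python
from graphlib import TopologicalSorter


def resolve_name(requested: str, aliases: dict[str, tuple[str, ...]]) -> str:
    """Map a CLI name or alias to the canonical analysis name (hypothetical helper)."""
    for name, alternatives in aliases.items():
        if requested == name or requested in alternatives:
            return name
    raise KeyError(f"unknown analysis: {requested!r}")


def execution_order(dependencies: dict[str, tuple[str, ...]]) -> list[str]:
    """Order analyses so each runs after everything in its dependencies tuple."""
    # TopologicalSorter takes {node: predecessors}; static_order() raises
    # graphlib.CycleError if the dependency graph contains a cycle.
    return list(TopologicalSorter(dependencies).static_order())
```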

Examples

Simple plugin using the default comparison pipeline (t-tests, ANOVA, ranking). Implement extract_metrics() — the framework deserializes aggregated results automatically via json.loads():

from polyzymd.analyses.base import (
    AggregateContext, Analysis, MetricValue, ReplicateContext,
)
from pydantic import BaseModel

class RgAnalysis(Analysis):
    name = "rg"

    class Settings(BaseModel):
        selection: str = "protein and name CA"

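    def compute_replicate(self, ctx: ReplicateContext, replicate: int) -> dict:
        import MDAnalysis as mda
        import numpy as np
        # Use ctx.sim_config and ctx.settings; never load configs yourself.
        # Elided: load this replicate's trajectory with MDAnalysis and collect
        # per-frame Rg values for ctx.settings.selection into rg_values.
        rg_values = ...
        return {"mean_rg": float(np.mean(rg_values)), "replicate": replicate}

    def aggregate(self, ctx: AggregateContext, results: list[dict]) -> dict:
        import numpy as np
        values = [r["mean_rg"] for r in results]
        # Sample SEM (ddof=1); min_replicates defaults to 2, so len(values) >= 2.
        return {"mean_rg": float(np.mean(values)),
                "sem_rg": float(np.std(values, ddof=1) / np.sqrt(len(values))),
                "replicate_values": values}

    def extract_metrics(self, summary: dict) -> dict[str, MetricValue]: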
        return {"mean_rg": MetricValue(
            name="mean_rg", mean=summary["mean_rg"],
            sem=summary["sem_rg"],
            replicate_values=summary["replicate_values"],
            higher_is_better=False,
            direction_labels=("compacting", "unchanged", "expanding"),
        )}

If your aggregated results use a typed Pydantic model, set AggregatedResultClass to have the framework deserialize into that model automatically instead of returning a plain dict:

class MyAnalysis(Analysis):
    name = "my_analysis"
    AggregatedResultClass = MyAggregatedResult  # your Pydantic model
    ...  # framework auto-deserializes via .load() or model_validate_json()

Custom compare plugin — override compare() entirely for multi-metric or entry-table analyses. See analyses/contacts/ or analyses/distances/ for full examples.

See also

analyses.stats

default_scalar_comparison(), format_scalar_comparison()

analyses.discovery

How the framework discovers plugins automatically.

analyses.orchestrator

How the framework runs the lifecycle.

contributor_guide

name: ClassVar[str]
Settings: ClassVar[type]
PlotSettingsModel: ClassVar[type[BasePlotSettings] | None] = None
AggregatedResultClass: ClassVar[type | None] = None
ReplicateResultClass: ClassVar[type | None] = None
execution_cost_hint: ClassVar[str] = 'medium'
aliases: ClassVar[tuple[str, ...]] = ()
dependencies: ClassVar[tuple[str, ...]] = ()
min_replicates: ClassVar[int] = 2
has_compute_stage: ClassVar[bool] = True
has_aggregate_stage: ClassVar[bool] = True
slurm_resource_hint: ClassVar[SlurmResourceHint | None] = None
settings_path_fields: ClassVar[tuple[str, ...]] = ()
compute_replicate(ctx, replicate)

Compute analysis for a single replicate.

Parameters:
  • ctx (ReplicateContext) – Framework-provided context (paths, config, settings).

  • replicate (int) – 1-indexed replicate number.

Returns:

Per-replicate results. Can be a plain dict (simplest) or a Pydantic BaseModel. The framework serializes both via save_result() — dicts are written as JSON, models use model_dump_json().

Return type:

dict or BaseModel

Notes

The orchestrator has a fallback that saves the return value to ctx.result_path only if the file doesn’t already exist. Existing plugins save explicitly for custom per-replicate caching (e.g. rmsf_eq10ns.json). Simple plugins can skip manual saves and rely on the fallback.

aggregate(ctx, results)

Aggregate results across replicates for one condition.

Parameters:
  • ctx (AggregateContext) – Framework-provided context (paths, replicates, settings).

  • results (Sequence[dict | BaseModel]) – Per-replicate results from compute_replicate(). Guaranteed to have at least min_replicates entries.

Returns:

Aggregated result, or None if aggregation is not meaningful for this analysis. Can be a plain dict or a Pydantic BaseModel.

Return type:

dict or BaseModel or None

Notes

The orchestrator has a fallback that saves the return value to ctx.result_path only if the file doesn’t already exist. Existing plugins save to ctx.result_path explicitly in aggregate() (see rmsf.py, contacts.py). Simple plugins can skip manual saves and rely on the fallback.

filter_conditions(conditions, settings=None)

Filter conditions before comparison.

Override to exclude conditions where this analysis is not applicable (e.g. exclude no-polymer conditions for contacts).

The default implementation keeps all conditions.

Parameters:
  • conditions (list[Condition]) – All conditions from the comparison config.

  • settings (BaseModel or None) – Resolved plugin settings from the comparison config. The orchestrator passes the fully-resolved Settings instance so overrides can use user-customized values (e.g. polymer selection strings) instead of class-level defaults.

Returns:

Conditions to include in analysis.

Return type:

list[Condition]
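A sketch of an override that drops no-polymer conditions, plus the split into kept and excluded lists that ComparisonContext exposes. ConditionStub and its polymer_fraction field are hypothetical; a real override would inspect Condition.sim_config and the resolved settings, whose structure is not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ConditionStub:
    """Hypothetical stand-in for Condition; polymer_fraction is an invented field."""
    label: str
    polymer_fraction: float


def filter_conditions(conditions: list[ConditionStub], settings: Any = None) -> list[ConditionStub]:
    """Keep only conditions that contain polymer; `settings` is unused in this sketch."""
    return [c for c in conditions if c.polymer_fraction > 0]


def partition(
    conditions: list[ConditionStub],
    keep: Callable[[list[ConditionStub]], list[ConditionStub]],
) -> tuple[list[ConditionStub], list[ConditionStub]]:
    """Split conditions into (kept, excluded) using a filter_conditions-style callable."""
    kept = keep(conditions)
    kept_labels = {c.label for c in kept}
    excluded = [c for c in conditions if c.label not in kept_labels]
    return kept, excluded
```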

compare(ctx)

Compare results across conditions.

The default implementation uses extract_metrics() to build a scalar comparison with t-tests, ANOVA, and rankings, returning a ComparisonResult.

Override this entirely for multi-metric, per-pair, or entry-table comparisons that return a custom Pydantic model (e.g. ContactsComparisonResult). The only contract is that the returned object must have a .save(path) method.

Parameters:

ctx (ComparisonContext) – Framework-provided context (conditions, paths, settings).

Returns:

Comparison result, or None if comparison is not supported.

Return type:

ComparisonResult or BaseModel or None

extract_metrics(summary)

Extract scalar metrics from an aggregated result for comparison.

Only called by the default compare() implementation. If you override compare() entirely, you do not need to implement this.

The default compare() loads aggregated results via _load_aggregated_result(), which uses AggregatedResultClass (if set) or falls back to json.loads(). You do not need to implement _deserialize_result() unless you need custom loading logic.

Parameters:

summary (dict or BaseModel) – Aggregated result (from aggregate()).

Returns:

Mapping metric_name -> MetricValue. For single-metric analyses, return one entry. For dual-metric (e.g. contacts), return two entries.

Return type:

dict[str, MetricValue]
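The loading order described for AggregatedResultClass can be sketched as follows. deserialize_result is a hypothetical stand-in for the framework's _deserialize_result(); model_validate_json is the Pydantic v2 classmethod that the documentation names.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def deserialize_result(path: Path, result_cls: type | None = None) -> Any:
    """Load an aggregated result (hypothetical helper).

    1. No class configured -> plain dict via json.loads().
    2. Class defines load() -> result_cls.load(path).
    3. Otherwise -> result_cls.model_validate_json(text) (Pydantic v2).
    """
    if result_cls is None:
        return json.loads(path.read_text())
    if hasattr(result_cls, "load"):
        return result_cls.load(path)
    return result_cls.model_validate_json(path.read_text())
```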

plot(ctx)

Generate comparison figures.

Override to produce matplotlib/seaborn figures. The default implementation produces no plots.

Parameters:

ctx (PlotContext) – Framework-provided context (conditions, paths, settings).

Returns:

Paths to generated figure files.

Return type:

list[Path]

format(result, output_format='text')

Format a comparison result for CLI display.

Override to provide analysis-specific formatted output. The default implementation returns JSON when possible and otherwise falls back to str(result).

Parameters:
  • result (ComparisonResult or BaseModel) – The comparison result to format.

  • output_format (str) – Output format: "text", "json", or "markdown".

Returns:

Formatted string ready for CLI display.

Return type:

str
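A minimal version of the documented fallback ("JSON when possible, otherwise str(result)"), assuming Pydantic models are recognized by their model_dump_json method. format_result is a hypothetical name, and it ignores output_format.

```python
import json
from typing import Any


def format_result(result: Any, output_format: str = "text") -> str:
    """Fallback formatter: JSON when the object can be serialized, else str(result).

    output_format is accepted for signature parity but ignored in this sketch.
    """
    if hasattr(result, "model_dump_json"):  # Pydantic v2 models
        return result.model_dump_json(indent=2)
    try:
        return json.dumps(result, indent=2)
    except TypeError:  # objects the json module cannot encode
        return str(result)
```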

static replicate_result_path(output_dir)

Return the canonical per-replicate cache path.

static aggregate_result_path(output_dir)

Return the canonical aggregated cache path.

comparison_result_path(results_dir)

Return the canonical comparison cache path.

figures_output_dir(figures_root)

Return the analysis-specific figure directory.

save_result(result, path)

Save a result object to disk using a common contract.
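The save contract described under compute_replicate() (dicts written as JSON, models via model_dump_json()) could look like this. Raising TypeError for other objects and returning the written Path are assumptions; the function mirrors, but is not, Analysis.save_result().

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def save_result(result: Any, path: Path | str) -> Path:
    """Write a dict as JSON, or a Pydantic-style model via model_dump_json() (hypothetical)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    if isinstance(result, dict):
        text = json.dumps(result, indent=2)
    elif hasattr(result, "model_dump_json"):
        text = result.model_dump_json(indent=2)
    else:
        raise TypeError(f"cannot save result of type {type(result).__name__}")
    path.write_text(text)
    return path
```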

resolve_output_dir(analysis_root, condition_label)

Build the analysis output directory for a condition.

Parameters:
  • analysis_root (Path) – Root analysis directory (e.g. the analysis/ directory next to comparison.yaml).

  • condition_label (str) – Condition label (will be sanitised for filesystem).

Returns:

<analysis_root>/<sanitized_label>/<analysis_name>

Return type:

Path
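The directory layout documented for ReplicateContext, AggregateContext, and resolve_output_dir() composes as below. The sanitizing rule is hypothetical (the documentation only says labels are sanitised for the filesystem), and analysis_name is passed explicitly here instead of being read from the class's name.

```python
import re
from pathlib import Path


def sanitize_label(label: str) -> str:
    """Hypothetical sanitizer: collapse runs of unsafe characters into '_'."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", label).strip("_")


def resolve_output_dir(analysis_root: Path, condition_label: str, analysis_name: str) -> Path:
    """Build <analysis_root>/<sanitized_label>/<analysis_name>."""
    return analysis_root / sanitize_label(condition_label) / analysis_name
```

Per-replicate results then go under run_<rep>/ and the aggregated result under aggregated/ inside that directory.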

classmethod __init_subclass__(**kwargs)

Validate that subclasses set required class variables.
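Subclass-time validation of this kind is commonly done in __init_subclass__. The sketch checks that a subclass defines name and Settings; which class variables PolyzyMD actually requires is not listed here, so REQUIRED and PluginBase are hypothetical. Unlike a framework that tolerates abstract intermediates, this sketch would also reject an intermediate subclass that omits them.

```python
from abc import ABC


class PluginBase(ABC):
    """Hypothetical stand-in for Analysis showing subclass-time validation."""

    # Hypothetical list of mandatory class variables.
    REQUIRED = ("name", "Settings")

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        missing = [attr for attr in cls.REQUIRED if not hasattr(cls, attr)]
        if missing:
            raise TypeError(f"{cls.__name__} must define class variable(s): {', '.join(missing)}")
```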