Analyses Plugin System API

This page documents the polyzymd.analyses package — the plugin system for adding new analysis types to PolyzyMD.

Public API

Discovery

  • list_analyses() — return dict of {name: class} for all discovered plugins

  • list_all_names() — return list of all names including aliases

  • get_analysis(name) — get a plugin class by name or alias

  • clear_cache() — reset the discovery cache

Orchestration

  • run_analysis(analysis: Analysis, condition: Condition, settings: Any, equilibration: str = "0ns", output_dir: Path | None = None, recompute: bool = False) — run compute + aggregate for one condition

  • run_comparison(analysis: Analysis, config: ComparisonConfig, recompute: bool = False, equilibration: str | None = None) — run full lifecycle (compute + aggregate + compare + plot)

  • run_all_comparisons(config: ComparisonConfig, analysis_names: list[str] | None = None, recompute: bool = False, equilibration: str | None = None) — run multiple analyses from one comparison config

ComparisonConfig is defined in polyzymd.config.comparison.

Base Class

  • Analysis — abstract base class all plugins inherit from

Context Objects

  • ReplicateContext — passed to compute_replicate()

  • AggregateContext — passed to aggregate()

  • ComparisonContext — passed to compare()

  • PlotContext — passed to plot()

  • Condition — represents one simulation condition
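The four context objects above correspond to four lifecycle stages: compute per replicate, aggregate per condition, compare across conditions, then plot. The following toy sketch shows only that ordering. Every name in it (ToyAnalysis, ToyMeanAnalysis, toy_run_comparison) is hypothetical, and plain dicts stand in for the real context objects; the actual Analysis base class and its method signatures may differ.

```python
from abc import ABC, abstractmethod
from statistics import fmean


class ToyAnalysis(ABC):
    """Stand-in for a plugin base class. NOT polyzymd's Analysis."""

    @abstractmethod
    def compute_replicate(self, ctx):
        ...

    @abstractmethod
    def aggregate(self, ctx):
        ...

    @abstractmethod
    def compare(self, ctx):
        ...

    @abstractmethod
    def plot(self, ctx):
        ...


class ToyMeanAnalysis(ToyAnalysis):
    """Toy plugin: the metric is the mean of a per-replicate trace."""

    def compute_replicate(self, ctx):
        return fmean(ctx["trace"])

    def aggregate(self, ctx):
        return fmean(ctx["replicate_values"])

    def compare(self, ctx):
        # Rank conditions from lowest to highest aggregated value.
        return sorted(ctx["summaries"], key=ctx["summaries"].get)

    def plot(self, ctx):
        # A real plugin would draw a figure; here we just build a string.
        return "ranking: " + " < ".join(ctx["ranking"])


def toy_run_comparison(analysis, conditions):
    """Run compute -> aggregate -> compare -> plot on hypothetical data."""
    summaries = {}
    for name, traces in conditions.items():
        per_replicate = [analysis.compute_replicate({"trace": t}) for t in traces]
        summaries[name] = analysis.aggregate({"replicate_values": per_replicate})
    ranking = analysis.compare({"summaries": summaries})
    return analysis.plot({"ranking": ranking})


report = toy_run_comparison(
    ToyMeanAnalysis(),
    {
        "control": [[2.0, 2.2], [2.4, 2.6]],
        "polymer": [[1.0, 1.2], [1.4, 1.6]],
    },
)
print(report)
```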

Result Models

  • ComparisonResult — universal comparison result (Pydantic model with .save()/.load())

  • ConditionSummary — per-condition statistics

  • PairwiseResult — pairwise t-test result

  • ANOVAResult — ANOVA result

  • MetricValue — scalar metric descriptor for default comparison

  • BaseComparisonResult — abstract base for custom plugin comparison results

  • BaseConditionSummary — abstract base for per-condition summaries

Available Plugins

Plugin               Module                        Comparison Style
-------------------  ----------------------------  ---------------------
rmsd                 analyses.rmsd                 Custom (per-run)
rg                   analyses.rg                   Custom (per-run)
rmsf                 analyses.rmsf                 Default (scalar)
catalytic_triad      analyses.catalytic_triad      Default (scalar)
secondary_structure  analyses.secondary_structure  Default (scalar)
sasa                 analyses.sasa                 Custom (per-run)
distances            analyses.distances            Custom
contacts             analyses.contacts             Custom
exposure             analyses.exposure             Custom (experimental)
hydrogen_bonds       analyses.hydrogen_bonds       Default (scalar)
binding_free_energy  analyses.binding_free_energy  Custom (experimental)
polymer_affinity     analyses.polymer_affinity     Custom (experimental)
polymer_bridging     analyses.polymer_bridging     Custom (experimental)

Shared Utilities

The analyses/shared/ package provides reusable infrastructure used across plugins.

Convergence Diagnostics

The convergence module provides a sliding-window slope heuristic for detecting sustained convergence in timeseries data (e.g., RMSD traces).

Convergence diagnostics for sliding-window timeseries analysis.

This module implements a sliding-window convergence heuristic adapted from a collaborator notebook used for RMSD equilibration checks.

class polyzymd.analyses.shared.convergence.ConvergenceResult(converged, assessable, convergence_time_ns, window_start_times_ns, window_mean_values, slope_times_ns, slopes, window_size_ns, step_size_ns, slope_threshold, sustained_for_ns, message)

Bases: object

Container for convergence diagnostics.

Attributes:
  • converged (bool) – Whether sustained convergence was detected.

  • assessable (bool) – Whether convergence could be assessed from available data.

  • convergence_time_ns (float | None) – Start time of the first sustained converged period.

  • window_start_times_ns (list[float]) – Start times for each sliding window.

  • window_mean_values (list[float]) – Mean signal value in each sliding window.

  • slope_times_ns (list[float]) – Time points associated with slope estimates.

  • slopes (list[float]) – Slopes between successive window means.

  • window_size_ns (float) – Sliding window width in ns.

  • step_size_ns (float) – Sliding window stride in ns.

  • slope_threshold (float) – Absolute slope cutoff used for convergence.

  • sustained_for_ns (float) – Required sustained duration below slope threshold.

  • message (str) – Human-readable status message.

polyzymd.analyses.shared.convergence.find_convergence_time(time_ns, values, window_size_ns=15.0, step_size_ns=5.0, slope_threshold=0.0005, sustained_for_ns=15.0)

Find sustained convergence time using a sliding-window slope heuristic.

Parameters:
  • time_ns (array_like) – Monotonically increasing time values in ns.

  • values (array_like) – Signal values sampled at time_ns.

  • window_size_ns (float, optional) – Width of each averaging window in ns.

  • step_size_ns (float, optional) – Sliding step between successive windows in ns.

  • slope_threshold (float, optional) – Absolute slope threshold for classifying a window-to-window change as converged.

  • sustained_for_ns (float, optional) – Required cumulative duration below threshold before declaring convergence.

Returns:

Full convergence diagnostics, including intermediate window means and slope traces.

Return type:

ConvergenceResult

Raises:

ValueError – Raised when inputs are invalid.
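To make the heuristic concrete, here is a minimal pure-Python sketch of a sliding-window slope check. It is not the library's implementation. It assumes half-open windows [start, start + window_size_ns), one slope per pair of successive window means, a strict |slope| < threshold test, and that the sustained duration is counted in whole steps. The function name sketch_convergence_time is hypothetical, and it returns only the time rather than a full ConvergenceResult.

```python
from statistics import fmean


def sketch_convergence_time(time_ns, values, window_size_ns=15.0,
                            step_size_ns=5.0, slope_threshold=0.0005,
                            sustained_for_ns=15.0):
    """Return the start time of the first sustained flat stretch, or None."""
    if len(time_ns) != len(values) or len(time_ns) < 2:
        raise ValueError("time_ns and values must have equal length >= 2")

    # 1. Mean of the signal in each sliding window [start, start + window).
    starts, means = [], []
    start = time_ns[0]
    while start + window_size_ns <= time_ns[-1] + 1e-9:
        window = [v for t, v in zip(time_ns, values)
                  if start <= t < start + window_size_ns]
        if window:
            starts.append(start)
            means.append(fmean(window))
        start += step_size_ns

    # 2. Slope between successive window means (signal units per ns).
    slopes = [(means[i + 1] - means[i]) / (starts[i + 1] - starts[i])
              for i in range(len(means) - 1)]

    # 3. First run of |slope| < threshold lasting at least sustained_for_ns.
    run_start, run_len = None, 0.0
    for i, slope in enumerate(slopes):
        if abs(slope) < slope_threshold:
            if run_start is None:
                run_start, run_len = starts[i], 0.0
            run_len += starts[i + 1] - starts[i]
            if run_len >= sustained_for_ns - 1e-9:
                return run_start
        else:
            run_start, run_len = None, 0.0
    return None


# Synthetic trace: linear decay for 20 ns, then a flat plateau at 1.0.
time_ns = [float(t) for t in range(101)]
decaying = [1.0 + (20.0 - t) * 0.05 if t < 20 else 1.0 for t in time_ns]
drifting = [0.01 * t for t in time_ns]

plateau_time = sketch_convergence_time(time_ns, decaying)
drift_time = sketch_convergence_time(time_ns, drifting)
```

In this sketch the plateau trace is reported as converged from the first window that lies entirely on the plateau, while the steadily drifting trace never satisfies the threshold and yields None.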

Multi-Run Comparison Orchestration

The multi_run_comparison module provides helpers for plugins that compare multiple named runs (e.g., per-chain RMSD, per-domain Rg) across conditions. Used by the RMSD, Rg, and SASA plugins.

Shared helpers for multi-run comparison orchestration.

These helpers keep run-wise comparison logic concise across plugins that compare multiple named runs (RMSD, Rg, SASA).

polyzymd.analyses.shared.multi_run_comparison.filter_summaries_with_run(summaries, run_label, get_run_fn, logger=None)

Filter condition summaries to those containing a specific run.

Parameters:
  • summaries (dict[str, Any]) – Mapping from condition label to condition summary.

  • run_label (str) – Run label to keep.

  • get_run_fn (Callable[[Any, str], Any]) – Callback that returns run summary for (summary, run_label) and raises KeyError when the run is missing.

  • logger (logging.Logger | None, optional) – Optional logger for missing-run warnings.

Returns:

Subset of summaries with run data available.

Return type:

dict[str, Any]
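The KeyError contract for get_run_fn can be illustrated with a short stand-alone sketch. This is an assumption-based illustration, not the library code: the function name is hypothetical, and plain dicts keyed by run label stand in for real condition summaries.

```python
import logging


def sketch_filter_summaries_with_run(summaries, run_label, get_run_fn, logger=None):
    """Keep only the conditions whose summary contains run_label."""
    kept = {}
    for condition_label, summary in summaries.items():
        try:
            # The callback signals a missing run by raising KeyError.
            get_run_fn(summary, run_label)
        except KeyError:
            if logger is not None:
                logger.warning("Condition %r has no run %r; skipping",
                               condition_label, run_label)
            continue
        kept[condition_label] = summary
    return kept


# Hypothetical summaries: plain dicts mapping run label to a value.
summaries = {
    "control": {"chain_A": 1.2, "chain_B": 1.5},
    "polymer": {"chain_A": 1.0},
}
filtered = sketch_filter_summaries_with_run(
    summaries,
    "chain_B",
    lambda summary, label: summary[label],  # dict lookup raises KeyError
    logger=logging.getLogger(__name__),
)
```

Here only "control" survives, because the "polymer" summary has no "chain_B" run.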

polyzymd.analyses.shared.multi_run_comparison.build_condition_pairs(condition_labels, control_label, on_control_missing='all_pairs', logger=None)

Build pairwise condition pairs for comparison.

Parameters:
  • condition_labels (list[str]) – Ordered condition labels to compare.

  • control_label (str | None) – Preferred control label for control-vs-treatment comparisons.

  • on_control_missing (str, optional) –

    Behavior when control_label is requested but unavailable.

    Supported values:

    • "all_pairs": fall back to all-vs-all

    • "skip": return no pairs

  • logger (logging.Logger | None, optional) – Optional logger for fallback/skip messages.

Returns:

Pair list as (condition_a, condition_b) tuples.

Return type:

list[tuple[str, str]]

Raises:

ValueError – Raised when on_control_missing is not "all_pairs" or "skip".
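The pairing rules described above can be sketched as follows. This is a hedged illustration under two assumptions the documentation does not state: that control_label=None means all-vs-all, and that the control is placed first in each control-vs-treatment pair. The function name is hypothetical and logging is omitted.

```python
from itertools import combinations


def sketch_build_condition_pairs(condition_labels, control_label,
                                 on_control_missing="all_pairs"):
    """Return (condition_a, condition_b) pairs for pairwise comparison."""
    if on_control_missing not in ("all_pairs", "skip"):
        raise ValueError(
            f"on_control_missing must be 'all_pairs' or 'skip', "
            f"got {on_control_missing!r}"
        )

    all_pairs = list(combinations(condition_labels, 2))

    # Assumption: no control requested means all-vs-all.
    if control_label is None:
        return all_pairs

    # Control present: compare it against every other condition.
    if control_label in condition_labels:
        return [(control_label, label)
                for label in condition_labels if label != control_label]

    # Control requested but unavailable: fall back or skip.
    return all_pairs if on_control_missing == "all_pairs" else []


labels = ["no_polymer", "peg", "pnipam"]
control_pairs = sketch_build_condition_pairs(labels, "no_polymer")
fallback_pairs = sketch_build_condition_pairs(labels, "missing")
skipped_pairs = sketch_build_condition_pairs(labels, "missing", "skip")
```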

polyzymd.analyses.shared.multi_run_comparison.apply_fdr_correction(pairwise_results, anova_by_run=None, fdr_alpha=0.05, get_p_value=None, set_corrected=None)

Apply Benjamini-Hochberg FDR correction across statistical result families.

Parameters:
  • pairwise_results (list[Any]) – Pairwise comparison result objects.

  • anova_by_run (dict[Any, Any] | list[Any] | None, optional) – ANOVA result objects, as either list-like or dict-like container.

  • fdr_alpha (float, optional) – FDR threshold.

  • get_p_value (Callable[[Any], float | None] | None, optional) – Callback extracting raw p-value from a result object. Defaults to reading .p_value.

  • set_corrected (Callable[[Any, Any], None] | None, optional) – Callback applying BH output to each result object. Defaults to setting .p_value_adjusted (when available) and .significant.
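For reference, the Benjamini-Hochberg step-up procedure itself can be written in a few lines of pure Python. The sketch below operates on a bare list of p-values; how apply_fdr_correction groups pairwise and ANOVA results into families, and how it writes results back through set_corrected, is not shown and is not assumed here. The function name benjamini_hochberg is illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted_p_values, significant_flags) in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # adjusted p at rank k is min over ranks j >= k of p_(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min

    significant = [p <= alpha for p in adjusted]
    return adjusted, significant


raw = [0.01, 0.04, 0.03, 0.20]
adjusted, significant = benjamini_hochberg(raw, alpha=0.05)
```

With these four p-values only the smallest one stays significant after correction, even though three of the raw values are below 0.05.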

Multi-Run Formatting

The multi_run_formatting module provides text and markdown formatting helpers for multi-run analysis CLI output — ranked tables, pairwise lines, and ANOVA summaries.

Shared formatting helpers for multi-run analysis outputs.

polyzymd.analyses.shared.multi_run_formatting.make_section_title(title, width)

Build a section title and separator lines.

polyzymd.analyses.shared.multi_run_formatting.make_ranked_table_header(*, mean_label)

Build standard ranked-table headers for text output.

polyzymd.analyses.shared.multi_run_formatting.make_ranked_markdown_header(*, mean_label)

Build standard ranked-table headers for markdown output.

polyzymd.analyses.shared.multi_run_formatting.format_pairwise_line(*, condition_a, condition_b, direction, p_value, effect_size, effect_label, percent_change, significant, prefix='Pairwise')

Format one standard pairwise comparison line.

polyzymd.analyses.shared.multi_run_formatting.format_anova_line(*, f_statistic, p_value, significant)

Format one standard ANOVA line.

polyzymd.analyses.shared.multi_run_formatting.format_markdown_bullet(prefix, line)

Format a markdown bullet line with consistent prefixing.

polyzymd.analyses.shared.multi_run_formatting.make_ranked_rows(ranking, get_values)

Build ranked rows as (label, mean, sem, rank) tuples.