Analyses Plugin System API

This page documents the polyzymd.analyses package — the plugin system for adding new analysis types to PolyzyMD.

Public API

Discovery

list_analyses() — return dict of {name: class} for all discovered plugins
list_all_names() — return list of all names including aliases
get_analysis(name) — get a plugin class by name or alias
clear_cache() — reset the discovery cache

Orchestration

run_analysis(analysis: Analysis, condition: Condition, settings: Any, equilibration: str = "0ns", output_dir: Path | None = None, recompute: bool = False) — run compute + aggregate for one condition
run_comparison(analysis: Analysis, config: ComparisonConfig, recompute: bool = False, equilibration: str | None = None) — run full lifecycle (compute + aggregate + compare + plot)
run_all_comparisons(config: ComparisonConfig, analysis_names: list[str] | None = None, recompute: bool = False, equilibration: str | None = None) — run multiple analyses from one comparison config

ComparisonConfig is defined in polyzymd.config.comparison.

Base Class

Analysis — abstract base class all plugins inherit from

Context Objects

ReplicateContext — passed to compute_replicate()
AggregateContext — passed to aggregate()
ComparisonContext — passed to compare()
PlotContext — passed to plot()
Condition — represents one simulation condition
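
The lifecycle implied by these context objects (compute per replicate, aggregate per condition, compare across conditions) can be sketched with a toy plugin. This is a minimal illustration under stated assumptions, not PolyzyMD's Analysis class: ToyAnalysis, MeanSignal, run_toy_lifecycle, and the plain-dict contexts are all hypothetical stand-ins, and the plot step is omitted.

```python
from abc import ABC, abstractmethod
from statistics import fmean


class ToyAnalysis(ABC):
    """Hypothetical stand-in for a plugin base class (not polyzymd's Analysis)."""

    @abstractmethod
    def compute_replicate(self, ctx: dict) -> float:
        """Compute one value for a single replicate."""

    @abstractmethod
    def aggregate(self, ctx: dict) -> dict:
        """Combine replicate values for one condition."""

    @abstractmethod
    def compare(self, ctx: dict) -> dict:
        """Compare aggregated summaries across conditions."""


class MeanSignal(ToyAnalysis):
    """Toy plugin: averages a signal, then ranks conditions by that average."""

    def compute_replicate(self, ctx):
        return fmean(ctx["values"])

    def aggregate(self, ctx):
        return {"mean": fmean(ctx["replicate_results"])}

    def compare(self, ctx):
        summaries = ctx["summaries"]
        # Rank condition names from lowest to highest mean.
        return {"ranking": sorted(summaries, key=lambda name: summaries[name]["mean"])}


def run_toy_lifecycle(analysis, conditions):
    """Run compute -> aggregate for each condition, then compare once."""
    summaries = {}
    for name, replicates in conditions.items():
        results = [analysis.compute_replicate({"values": v}) for v in replicates]
        summaries[name] = analysis.aggregate({"replicate_results": results})
    return analysis.compare({"summaries": summaries})


result = run_toy_lifecycle(
    MeanSignal(),
    {"control": [[1.0, 2.0], [2.0, 3.0]], "polymer": [[0.5, 1.5], [1.0, 1.0]]},
)
```

In the real package the contexts are dedicated objects (ReplicateContext, AggregateContext, ComparisonContext, PlotContext) whose fields are not documented on this page, so the dict keys above are placeholders only.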

Result Models

ComparisonResult — universal comparison result (Pydantic model with .save()/.load())
ConditionSummary — per-condition statistics
PairwiseResult — pairwise t-test result
ANOVAResult — ANOVA result
MetricValue — scalar metric descriptor for default comparison
BaseComparisonResult — abstract base for custom plugin comparison results
BaseConditionSummary — abstract base for per-condition summaries

Available Plugins

Plugin	Module	Comparison Style
`rmsd`	`analyses.rmsd`	Custom (per-run)
`rg`	`analyses.rg`	Custom (per-run)
`rmsf`	`analyses.rmsf`	Default (scalar)
`catalytic_triad`	`analyses.catalytic_triad`	Default (scalar)
`secondary_structure`	`analyses.secondary_structure`	Default (scalar)
`sasa`	`analyses.sasa`	Custom (per-run)
`distances`	`analyses.distances`	Custom
`contacts`	`analyses.contacts`	Custom
`exposure`	`analyses.exposure`	Custom (experimental)
`hydrogen_bonds`	`analyses.hydrogen_bonds`	Default (scalar)
`binding_free_energy`	`analyses.binding_free_energy`	Custom (experimental)
`polymer_affinity`	`analyses.polymer_affinity`	Custom (experimental)
`polymer_bridging`	`analyses.polymer_bridging`	Custom (experimental)

Shared Utilities

The analyses/shared/ package provides reusable infrastructure used across plugins.

Convergence Diagnostics

The convergence module provides a sliding-window slope heuristic for detecting sustained convergence in timeseries data (e.g., RMSD traces).

Convergence diagnostics for sliding-window timeseries analysis.

This module implements a sliding-window convergence heuristic adapted from a collaborator notebook used for RMSD equilibration checks.

class polyzymd.analyses.shared.convergence.ConvergenceResult(converged, assessable, convergence_time_ns, window_start_times_ns, window_mean_values, slope_times_ns, slopes, window_size_ns, step_size_ns, slope_threshold, sustained_for_ns, message)

Bases: object

Container for convergence diagnostics.

Attributes:

converged (bool) – Whether sustained convergence was detected.
assessable (bool) – Whether convergence could be assessed from available data.
convergence_time_ns (float | None) – Start time of the first sustained converged period.
window_start_times_ns (list[float]) – Start times for each sliding window.
window_mean_values (list[float]) – Mean signal value in each sliding window.
slope_times_ns (list[float]) – Time points associated with slope estimates.
slopes (list[float]) – Slopes between successive window means.
window_size_ns (float) – Sliding window width in ns.
step_size_ns (float) – Sliding window stride in ns.
slope_threshold (float) – Absolute slope cutoff used for convergence.
sustained_for_ns (float) – Required sustained duration below slope threshold.
message (str) – Human-readable status message.

polyzymd.analyses.shared.convergence.find_convergence_time(time_ns, values, window_size_ns=15.0, step_size_ns=5.0, slope_threshold=0.0005, sustained_for_ns=15.0)

Find sustained convergence time using a sliding-window slope heuristic.

Parameters:

time_ns (array_like) – Monotonically increasing time values in ns.
values (array_like) – Signal values sampled at time_ns.
window_size_ns (float, optional) – Width of each averaging window in ns.
step_size_ns (float, optional) – Sliding step between successive windows in ns.
slope_threshold (float, optional) – Absolute slope threshold for classifying a window-to-window change as converged.
sustained_for_ns (float, optional) – Required cumulative duration below threshold before declaring convergence.

Returns:

Full convergence diagnostics, including intermediate window means and slope traces.

Return type:

ConvergenceResult

Raises:

ValueError – Raised when inputs are invalid.
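
The sliding-window slope heuristic described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: sketch_convergence_time is a hypothetical name, it returns only the convergence time instead of a full ConvergenceResult, and the details (half-open windows starting at the first time point, slopes taken as the difference of successive window means divided by the spacing of their start times, convergence reported at the start of the first qualifying window) are assumptions.

```python
from statistics import fmean


def sketch_convergence_time(time_ns, values, window_size_ns=15.0, step_size_ns=5.0,
                            slope_threshold=0.0005, sustained_for_ns=15.0):
    """Hypothetical sketch; returns a convergence time in ns, or None."""
    if len(time_ns) != len(values) or len(time_ns) < 2:
        raise ValueError("time_ns and values must have equal length >= 2")

    # 1. Mean signal value in each half-open window [start, start + window_size_ns).
    starts, means = [], []
    start = time_ns[0]
    while start + window_size_ns <= time_ns[-1] + 1e-9:
        window = [v for t, v in zip(time_ns, values)
                  if start <= t < start + window_size_ns]
        if window:  # skip windows that contain no samples
            starts.append(start)
            means.append(fmean(window))
        start += step_size_ns

    # 2. Slopes between successive window means.
    slopes = [(means[i + 1] - means[i]) / (starts[i + 1] - starts[i])
              for i in range(len(means) - 1)]

    # 3. Find the first unbroken run of sub-threshold slopes that spans
    #    at least sustained_for_ns, and report where that run starts.
    run_start = None
    for i, slope in enumerate(slopes):
        if abs(slope) < slope_threshold:
            if run_start is None:
                run_start = i
            if starts[i + 1] - starts[run_start] >= sustained_for_ns:
                return starts[run_start]
        else:
            run_start = None  # the run was broken; start over
    return None


# Toy signal: linear decay until 50 ns, then flat.
time_ns = [float(t) for t in range(201)]
values = [max(0.0, 1.0 - t / 50.0) for t in time_ns]
t_conv = sketch_convergence_time(time_ns, values)
```

With the toy signal, every window that starts at or after 50 ns has a mean of exactly zero, so the sketch reports 50 ns as the start of the first sustained flat period.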

Multi-Run Comparison Orchestration

The multi_run_comparison module provides helpers for plugins that compare multiple named runs (e.g., per-chain RMSD, per-domain Rg) across conditions. Used by the RMSD, Rg, and SASA plugins.

Shared helpers for multi-run comparison orchestration.

These helpers keep run-wise comparison logic concise across plugins that compare multiple named runs (RMSD, Rg, SASA).

polyzymd.analyses.shared.multi_run_comparison.filter_summaries_with_run(summaries, run_label, get_run_fn, logger=None)

Filter condition summaries to those containing a specific run.

Parameters:

summaries (dict[str, Any]) – Mapping from condition label to condition summary.
run_label (str) – Run label to keep.
get_run_fn (Callable[[Any, str], Any]) – Callback that returns run summary for (summary, run_label) and raises KeyError when the run is missing.
logger (logging.Logger | None, optional) – Optional logger for missing-run warnings.

Returns:

Subset of summaries with run data available.

Return type:

dict[str, Any]
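
The callback contract above (get_run_fn raises KeyError when a run is missing) suggests a filter along the following lines. This is a hedged sketch with a hypothetical name, sketch_filter_summaries_with_run, and toy summaries represented as plain dicts; the real summaries are plugin-specific objects.

```python
def sketch_filter_summaries_with_run(summaries, run_label, get_run_fn, logger=None):
    """Hypothetical sketch: keep only summaries for which get_run_fn succeeds."""
    kept = {}
    for label, summary in summaries.items():
        try:
            get_run_fn(summary, run_label)
        except KeyError:
            # Missing run: optionally warn, then drop this condition.
            if logger is not None:
                logger.warning("Condition %r has no run %r; skipping", label, run_label)
            continue
        kept[label] = summary
    return kept


# Toy summaries keyed by run label (an assumption for illustration only).
summaries = {
    "control": {"chain_A": 1.2, "chain_B": 1.5},
    "polymer": {"chain_A": 1.0},
}
filtered = sketch_filter_summaries_with_run(
    summaries, "chain_B", lambda summary, run: summary[run]
)
```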

polyzymd.analyses.shared.multi_run_comparison.build_condition_pairs(condition_labels, control_label, on_control_missing='all_pairs', logger=None)

Build pairwise condition pairs for comparison.

Parameters:

condition_labels (list[str]) – Ordered condition labels to compare.
control_label (str | None) – Preferred control label for control-vs-treatment comparisons.
on_control_missing (str, optional) – Behavior when control_label is requested but unavailable. Supported values: "all_pairs" (fall back to all-vs-all) and "skip" (return no pairs).
logger (logging.Logger | None, optional) – Optional logger for fallback/skip messages.

Returns:

Pair list as (condition_a, condition_b) tuples.

Return type:

list[tuple[str, str]]

Raises:

ValueError – Raised when on_control_missing is not "all_pairs" or "skip".
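
The pairing rules above can be sketched as follows. The function name sketch_build_condition_pairs is hypothetical, and two details are assumptions not stated on this page: that a control_label of None yields all-vs-all pairs, and that the control appears first in each control-vs-treatment tuple.

```python
from itertools import combinations


def sketch_build_condition_pairs(condition_labels, control_label,
                                 on_control_missing="all_pairs"):
    """Hypothetical sketch of control-vs-treatment pairing with a fallback."""
    if on_control_missing not in ("all_pairs", "skip"):
        raise ValueError(f"Unsupported on_control_missing: {on_control_missing!r}")

    all_pairs = list(combinations(condition_labels, 2))
    if control_label is None:
        # Assumption: no control requested means all-vs-all.
        return all_pairs
    if control_label in condition_labels:
        # Control present: pair it with every other condition.
        return [(control_label, label)
                for label in condition_labels if label != control_label]
    # Control requested but missing: fall back or skip.
    return all_pairs if on_control_missing == "all_pairs" else []


labels = ["ctrl", "a", "b"]
with_control = sketch_build_condition_pairs(labels, "ctrl")
fallback = sketch_build_condition_pairs(labels, "missing")
skipped = sketch_build_condition_pairs(labels, "missing", on_control_missing="skip")
```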

polyzymd.analyses.shared.multi_run_comparison.apply_fdr_correction(pairwise_results, anova_by_run=None, fdr_alpha=0.05, get_p_value=None, set_corrected=None)

Apply Benjamini-Hochberg FDR correction across statistical result families.

Parameters:

pairwise_results (list[Any]) – Pairwise comparison result objects.
anova_by_run (dict[Any, Any] | list[Any] | None, optional) – ANOVA result objects, as either list-like or dict-like container.
fdr_alpha (float, optional) – FDR threshold.
get_p_value (Callable[[Any], float | None] | None, optional) – Callback extracting raw p-value from a result object. Defaults to reading .p_value.
set_corrected (Callable[[Any, Any], None] | None, optional) – Callback applying BH output to each result object. Defaults to setting .p_value_adjusted (when available) and .significant.
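
For readers unfamiliar with the Benjamini-Hochberg procedure named above, a minimal sketch on a flat list of p-values follows. The function name benjamini_hochberg is hypothetical, and the sketch does not show how apply_fdr_correction pools pairwise and ANOVA results or writes values back through set_corrected; it illustrates only the adjustment step.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted_p_values, significant_flags) in the input order."""
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted, [p <= alpha for p in adjusted]


adjusted, significant = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Here only the smallest p-value survives correction at alpha = 0.05: its adjusted value is 0.04, while 0.03 and 0.04 are both adjusted to about 0.053.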

Multi-Run Formatting

The multi_run_formatting module provides text and markdown formatting helpers for multi-run analysis CLI output — ranked tables, pairwise lines, and ANOVA summaries.

Shared formatting helpers for multi-run analysis outputs.

polyzymd.analyses.shared.multi_run_formatting.make_section_title(title, width)

Build a section title and separator lines.

polyzymd.analyses.shared.multi_run_formatting.make_ranked_table_header(*, mean_label)

Build standard ranked-table headers for text output.

polyzymd.analyses.shared.multi_run_formatting.make_ranked_markdown_header(*, mean_label)

Build standard ranked-table headers for markdown output.

polyzymd.analyses.shared.multi_run_formatting.format_pairwise_line(*, condition_a, condition_b, direction, p_value, effect_size, effect_label, percent_change, significant, prefix='Pairwise')

Format one standard pairwise comparison line.

polyzymd.analyses.shared.multi_run_formatting.format_anova_line(*, f_statistic, p_value, significant)

Format one standard ANOVA line.

polyzymd.analyses.shared.multi_run_formatting.format_markdown_bullet(prefix, line)

Format a markdown bullet line with consistent prefixing.

polyzymd.analyses.shared.multi_run_formatting.make_ranked_rows(ranking, get_values)

Build ranked rows as (label, mean, sem, rank) tuples.
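
The row shape above can be illustrated with a small sketch. The argument types are not documented on this page, so this version assumes that ranking is an ordered list of condition labels, that get_values(label) returns a (mean, sem) pair, and that ranks start at 1; sketch_make_ranked_rows is a hypothetical name.

```python
def sketch_make_ranked_rows(ranking, get_values):
    """Hypothetical sketch: build (label, mean, sem, rank) tuples in rank order."""
    rows = []
    for rank, label in enumerate(ranking, start=1):
        mean, sem = get_values(label)  # assumed to return a (mean, sem) pair
        rows.append((label, mean, sem, rank))
    return rows


# Toy statistics keyed by condition label (illustrative values only).
stats = {"polymer": (1.0, 0.1), "control": (2.0, 0.2)}
rows = sketch_make_ranked_rows(["polymer", "control"], lambda label: stats[label])
```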
