Analyses Plugin System API

This reference page summarizes the public polyzymd.analyses API surface for plugin discovery, orchestration, statistics, the base facade, the public MDAnalysis layer, and built-in analysis plugin packages.

PolyzyMD analyses run trajectory-native work at the replicate level and lift the outputs into condition and comparison artifacts. The stable contributor import surfaces are polyzymd.analyses.base, polyzymd.analyses.mda, selected polyzymd.analyses.shared utilities, and the built-in plugin packages listed below.

Public package facade

The package root exposes discovery helpers and selected convenience imports. Use the narrower modules below for detailed autodoc reference; the package facade is summarized here to avoid duplicating class entries from the dedicated pages.

Discovery

Discovery is package-based and uses importable modules under polyzymd.analyses. A plugin can be a single module or a package with private helper modules for plotting, result models, or MDAnalysis job construction.

Function             Purpose
-------------------  ----------------------------------------------------------------------------------
list_analyses()      Return discovered plugin classes keyed by canonical name.
list_all_names()     Return canonical plugin names.
get_analysis(name)   Resolve a plugin class by canonical name.
clear_cache()        Clear the discovery cache, primarily for tests and dynamic plugin development.

Automatic discovery of top-level Analysis plugin modules via pkgutil.

Scans src/polyzymd/analyses/ for plugin packages or modules, imports them, and collects all concrete Analysis subclasses. No bootstrap files, no package-level registry edits, no decorators needed.

How Discovery Works

  1. pkgutil.iter_modules() yields direct children of polyzymd.analyses.

  2. Each non-infrastructure top-level module or package is imported via importlib.import_module().

  3. All module-level names are inspected; concrete subclasses of Analysis are collected.

  4. Name collisions (two plugins declaring the same name) raise an error immediately, during discovery.
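
The four steps above can be sketched with the standard library alone. This is a minimal illustration, not the PolyzyMD implementation: the throwaway demo_analyses package, the INFRASTRUCTURE set, the discover() helper, and the ValueError used for collisions are all assumptions made so the example runs anywhere.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path

# Source for a hypothetical base module and two hypothetical plugin modules.
BASE_SOURCE = '''\
from abc import ABC, abstractmethod
from typing import ClassVar


class Analysis(ABC):
    name: ClassVar[str]

    @abstractmethod
    def compute(self): ...
'''

PLUGIN_SOURCE = '''\
from typing import ClassVar

from demo_analyses.base import Analysis


class Plugin(Analysis):
    name: ClassVar[str] = "{name}"

    def compute(self):
        return "{name} result"
'''

# Write the throwaway package to a temporary directory and make it importable.
root = Path(tempfile.mkdtemp())
package_dir = root / "demo_analyses"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "base.py").write_text(BASE_SOURCE)
for plugin_name in ("rmsf", "rg"):
    (package_dir / f"{plugin_name}.py").write_text(PLUGIN_SOURCE.format(name=plugin_name))
sys.path.insert(0, str(root))
importlib.invalidate_caches()

INFRASTRUCTURE = {"base"}  # assumed: modules that are never scanned for plugins


def discover(package_name):
    """Import direct children of a package and collect concrete Analysis subclasses."""
    package = importlib.import_module(package_name)
    base = importlib.import_module(f"{package_name}.base").Analysis
    found = {}
    # Step 1: iterate over direct children of the package.
    for info in pkgutil.iter_modules(package.__path__):
        if info.name in INFRASTRUCTURE or info.name.startswith("_"):
            continue
        # Step 2: import each non-infrastructure module or package.
        module = importlib.import_module(f"{package_name}.{info.name}")
        # Step 3: inspect module-level names and keep concrete subclasses.
        for _, obj in inspect.getmembers(module, inspect.isclass):
            if not issubclass(obj, base) or inspect.isabstract(obj):
                continue
            # Step 4: two different classes with the same name are an error.
            existing = found.get(obj.name)
            if existing is not None and existing is not obj:
                raise ValueError(f"duplicate analysis name: {obj.name!r}")
            found[obj.name] = obj
    return dict(sorted(found.items()))


registry = discover("demo_analyses")
print(list(registry))
```

The abstract base class is re-imported into every plugin module, so the inspect.isabstract() check is what keeps it out of the registry.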

Contributor Impact

To add a new analysis, create a package in src/polyzymd/analyses/<name>/ or a simple module at src/polyzymd/analyses/<name>.py, define a class inheriting from Analysis, and set name as a ClassVar[str].
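
As a rough illustration of the naming contract, the sketch below declares name as a ClassVar[str] on a subclass. The Analysis class here is a local stand-in, not the real base class from polyzymd.analyses.base, and the __init_subclass__ check and the EndToEndDistance plugin are hypothetical; real plugins must also implement the lifecycle methods the real base class requires.

```python
from typing import ClassVar


class Analysis:
    """Local stand-in base class; only the name contract is modeled here."""

    name: ClassVar[str]

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Hypothetical check: each subclass must set a non-empty string name.
        name = cls.__dict__.get("name")
        if not isinstance(name, str) or not name:
            raise TypeError(f"{cls.__name__} must define a non-empty class-level 'name'")


class EndToEndDistance(Analysis):
    """Hypothetical plugin; the canonical name is what discovery would key on."""

    name: ClassVar[str] = "end_to_end_distance"


print(EndToEndDistance.name)
```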

polyzymd.analyses.discovery.clear_cache()[source]

Clear the discovery cache. Useful in tests.

polyzymd.analyses.discovery.get_analysis(name)[source]

Look up an Analysis class by canonical name.

Parameters:

name (str) – Canonical analysis name, for example "rmsf".

Returns:

The concrete Analysis subclass.

Return type:

type[Analysis]

Raises:

KeyError – If no analysis matches name.

polyzymd.analyses.discovery.list_analyses()[source]

Return all discovered analyses.

Returns:

Mapping canonical_name -> Analysis subclass, sorted by name.

Return type:

dict[str, type[Analysis]]

polyzymd.analyses.discovery.list_all_names()[source]

Return all canonical analysis names, sorted.

Returns:

All canonical names.

Return type:

list[str]

Orchestration

The orchestrator coordinates compute, aggregation, comparison, plotting, cache identity, and result persistence for one analysis or a set of analyses defined by ComparisonConfig.

Function                Scope
----------------------  ------------------------------------------------------------------------
run_analysis()          Run compute and aggregation for one analysis on one condition.
run_comparison()        Run the full lifecycle for one analysis across all configured conditions.
run_all_comparisons()   Run selected or discovered analyses from one comparison configuration.

Orchestrator for running analyses through the plugin system.

The orchestrator owns the boring-but-critical plumbing:

  • Replicate iteration with error handling and minimum-replicate checks.

  • Dependency ordering via topological sort.

  • Condition filtering (delegates to each analysis’s filter_conditions).

  • Context construction: builds the right context objects and passes them to the analysis plugin.

The orchestrator does NOT own:

  • Science code (that lives in each Analysis subclass).

  • CLI wiring (that lives in cli/).

  • Configuration parsing (that lives in config/).

Usage

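from polyzymd.analyses.discovery import get_analysis
from polyzymd.analyses.orchestrator import run_analysis, run_comparison

# The signatures below take an Analysis plugin instance, so resolve the class
# by canonical name first. Instantiating it without arguments is an assumption;
# check the plugin's constructor.
analysis = get_analysis("rmsf")()

# Run a single analysis for one condition (compute + aggregate)
run_analysis(analysis, condition, settings, equilibration="10ns")

# Run the full comparison pipeline
run_comparison(analysis, comparison_config)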

polyzymd.analyses.orchestrator.run_replicate_once(analysis, condition, settings, equilibration, output_dir, replicate, recompute, backend_policy=None)[source]

Run a single replicate compute stage and save canonical output.

Parameters:
  • analysis (Analysis) – Analysis plugin instance.

  • condition (Condition) – Condition being analyzed.

  • settings (BaseModel) – Resolved analysis settings.

  • equilibration (str) – Equilibration time string.

  • output_dir (Path) – Replicate run directory (for example run_1).

  • replicate (int) – Replicate number.

  • recompute (bool) – Whether to force recomputation.

  • backend_policy (MDABackendPolicy or None, optional) – MDAnalysis internal backend policy for MDA job-backed analyses.

Returns:

Replicate result returned by the plugin.

Return type:

Any
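
The interaction between the saved canonical output and the recompute flag can be pictured as a compute-or-load pattern. The sketch below is an assumption about the general shape only: compute_or_load(), fake_compute(), the result.json file name, and the JSON format are hypothetical and are not taken from the orchestrator.

```python
import json
import tempfile
from pathlib import Path

calls = []  # records every real computation, to show when the cache is used


def fake_compute(replicate):
    """Stand-in for a plugin's replicate-level compute stage."""
    calls.append(replicate)
    return {"replicate": replicate, "mean_rmsf": 1.25}


def compute_or_load(compute, output_dir, replicate, recompute=False, filename="result.json"):
    """Reuse a saved replicate result unless recompute is requested."""
    path = Path(output_dir) / filename
    if path.exists() and not recompute:
        return json.loads(path.read_text())
    result = compute(replicate)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result


run_dir = Path(tempfile.mkdtemp()) / "run_1"
first = compute_or_load(fake_compute, run_dir, replicate=1)   # computes and saves
second = compute_or_load(fake_compute, run_dir, replicate=1)  # loads from disk
third = compute_or_load(fake_compute, run_dir, replicate=1, recompute=True)  # recomputes
print(len(calls))
```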

polyzymd.analyses.orchestrator.aggregate_condition_from_disk(analysis, condition, settings, equilibration, output_dir, replicates, recompute=False)[source]

Aggregate one condition by loading replicate results from disk.

Parameters:
  • analysis (Analysis) – Analysis plugin instance.

  • condition (Condition) – Condition being aggregated.

  • settings (BaseModel) – Resolved analysis settings.

  • equilibration (str) – Equilibration time string.

  • output_dir (Path) – Analysis output directory for the condition.

  • replicates (Sequence[int]) – Replicate numbers to load.

  • recompute (bool, optional) – Force regeneration of aggregate outputs.

Returns:

Aggregated result returned by analysis.aggregate().

Return type:

Any

Raises:

ValueError – If fewer than analysis.min_replicates replicate results are available.
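
A minimal sketch of disk-based aggregation with a minimum-replicate check follows. The run_N directory naming mirrors the run_1 example above; everything else (the result.json file name, the mean_rmsf field, the mean and standard-error summary, and the aggregate_from_disk() helper) is a hypothetical stand-in for whatever analysis.aggregate() does in a real plugin.

```python
import json
import statistics
import tempfile
from pathlib import Path


def load_replicates(output_dir, replicates, filename="result.json"):
    """Load whichever requested replicate results exist on disk."""
    loaded = {}
    for replicate in replicates:
        path = Path(output_dir) / f"run_{replicate}" / filename
        if path.exists():
            loaded[replicate] = json.loads(path.read_text())
    return loaded


def aggregate_from_disk(output_dir, replicates, min_replicates=2):
    """Combine replicate scalars, refusing to aggregate too few replicates."""
    results = load_replicates(output_dir, replicates)
    if len(results) < min_replicates:
        raise ValueError(
            f"need at least {min_replicates} replicate results, found {len(results)}"
        )
    values = [result["mean_rmsf"] for result in results.values()]
    return {
        "n_replicates": len(values),
        "mean": statistics.fmean(values),
        "sem": statistics.stdev(values) / len(values) ** 0.5,
    }


# Demo: replicates 1 and 2 exist on disk, replicate 3 is missing.
condition_dir = Path(tempfile.mkdtemp())
for replicate, value in ((1, 1.0), (2, 2.0)):
    run_dir = condition_dir / f"run_{replicate}"
    run_dir.mkdir()
    (run_dir / "result.json").write_text(json.dumps({"mean_rmsf": value}))

summary = aggregate_from_disk(condition_dir, replicates=[1, 2, 3])
print(summary)
```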

polyzymd.analyses.orchestrator.run_analysis(analysis, condition, settings, equilibration='0ns', output_dir=None, recompute=False, backend_policy=None)[source]

Run a single analysis for one condition (compute + aggregate).

Parameters:
  • analysis (Analysis) – The analysis plugin instance.

  • condition (Condition) – The condition to analyse.

  • settings (BaseModel) – Analysis-specific settings.

  • equilibration (str) – Equilibration time string (e.g. "10ns").

  • output_dir (Path | None) – Output directory. If None, auto-resolved from condition config.

  • recompute (bool) – Force recomputation of cached results.

  • backend_policy (MDABackendPolicy or None, optional) – MDAnalysis internal backend policy for MDA job-backed analyses.

Returns:

Aggregated result.

Return type:

BaseModel

polyzymd.analyses.orchestrator.prepare_comparison_run(analysis, config, equilibration)[source]

Resolve shared comparison state before compute/aggregate/compare.

Parameters:
  • analysis (Analysis) – Analysis plugin instance.

  • config (ComparisonConfig) – Comparison configuration.

  • equilibration (str | None) – Optional equilibration override.

Returns:

Prepared comparison state including filtered conditions.

Return type:

dict[str, Any]

polyzymd.analyses.orchestrator.finalize_comparison_from_disk(analysis, config, analysis_dirs, aggregated_results, results_dir, figures_dir, settings, effective_control, prepared_state=None, allow_partial=False, recompute=False)[source]

Run compare and plot using already-aggregated condition results.

Parameters:
  • analysis (Analysis) – Analysis plugin instance.

  • config (ComparisonConfig) – Comparison configuration.

  • analysis_dirs (dict[str, Path]) – Mapping of condition label to analysis directory.

  • aggregated_results (dict[str, Any]) – Mapping of condition label to aggregated result objects.

  • results_dir (Path) – Directory for comparison result JSON.

  • figures_dir (Path) – Output directory for generated figures.

  • settings (BaseModel) – Resolved analysis settings.

  • effective_control (str | None) – Control condition label if available in successful conditions.

  • allow_partial (bool, optional) – If True, proceed with dropped conditions. If False, fail when any configured condition lacks aggregated results.

  • recompute (bool, optional) – Force regeneration of comparison and plot outputs.

Returns:

Dictionary with comparison, comparison_path, and plots.

Return type:

dict[str, Any]
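
The allow_partial and effective_control parameters describe a small decision that can be sketched in isolation. The resolve_partial() helper and the ValueError below are hypothetical; the source does not state which exception the orchestrator raises when conditions are missing.

```python
def resolve_partial(configured, aggregated, control, allow_partial=False):
    """Decide which conditions proceed to compare/plot and which control applies."""
    missing = [label for label in configured if label not in aggregated]
    if missing and not allow_partial:
        # Assumed error type; the real orchestrator may raise something else.
        raise ValueError(f"conditions without aggregated results: {missing}")
    kept = [label for label in configured if label in aggregated]
    # The control only counts if it survived into the successful conditions.
    effective_control = control if control in kept else None
    return kept, effective_control, missing


configured = ["control", "polymer_a", "polymer_b"]
aggregated = {"control": {"mean": 1.0}, "polymer_a": {"mean": 1.4}}

kept, effective_control, missing = resolve_partial(
    configured, aggregated, control="control", allow_partial=True
)
print(kept, effective_control, missing)
```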

polyzymd.analyses.orchestrator.order_analyses_for_execution(analysis_names, satisfied=None)[source]

Return analysis names ordered by dependency constraints.

Parameters:
  • analysis_names (Sequence[str]) – Canonical analysis names to order.

  • satisfied (set[str] | None, optional) – Dependency names that are already satisfied outside this run list, such as excluded analyses with completed results on disk.

Returns:

Canonical analysis names in dependency-safe execution order.

Return type:

list[str]

Raises:
  • KeyError – If an analysis name cannot be resolved.

  • DependencyError – If declared dependencies are missing from the requested set.

  • ValueError – If a dependency cycle is detected.
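
Dependency-safe ordering of this kind can be sketched with graphlib from the standard library (Python 3.9+). The dependency table, the plugin names alpha, beta, and gamma, the local DependencyError class, and order_for_execution() are all hypothetical; the sketch only mirrors the documented behavior for satisfied dependencies, missing dependencies, and cycles.

```python
from graphlib import CycleError, TopologicalSorter


class DependencyError(Exception):
    """Local stand-in for the orchestrator's DependencyError."""


# Hypothetical dependency table: plugin name -> names it depends on.
DEPENDS_ON = {
    "alpha": [],
    "beta": ["alpha"],
    "gamma": ["beta"],
}


def order_for_execution(names, satisfied=None, depends_on=DEPENDS_ON):
    """Return names in an order where every dependency runs first."""
    requested = list(names)
    satisfied = set(satisfied or ())
    graph = {}
    for name in requested:
        if name not in depends_on:
            raise KeyError(f"unknown analysis: {name!r}")
        # Dependencies already satisfied outside this run need no ordering.
        pending = [dep for dep in depends_on[name] if dep not in satisfied]
        missing = [dep for dep in pending if dep not in requested]
        if missing:
            raise DependencyError(f"{name!r} requires {missing}, which are not in the run list")
        graph[name] = pending
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle detected: {exc.args[1]}") from exc


order = order_for_execution(["gamma", "alpha", "beta"])
print(order)
```

Passing satisfied={"alpha"} lets beta run on its own, which corresponds to the case of an excluded analysis whose results are already on disk.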

polyzymd.analyses.orchestrator.run_comparison(analysis, config, recompute=False, equilibration=None)[source]

Run the full comparison pipeline for one analysis type.

Steps:

  1. Build Condition objects from ComparisonConfig.

  2. Filter conditions via analysis.filter_conditions().

  3. For each condition: compute replicates + aggregate.

  4. Run analysis.compare().

  5. Run analysis.plot().

Parameters:
  • analysis (Analysis) – The analysis plugin instance.

  • config (ComparisonConfig) – Comparison configuration.

  • recompute (bool) – Force recomputation.

  • equilibration (str or None) – Override equilibration time. If None, uses config.defaults.equilibration_time.

Returns:

Dictionary with the keys "aggregated", "comparison", and "plots".

Return type:

dict[str, Any]
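
The five steps can be traced end to end with a toy plugin. Everything in this sketch is a simplified stand-in: ToyAnalysis, its method signatures, the dictionary-based conditions, and the placeholder plot file names are assumptions and do not reflect the real Analysis contract or ComparisonConfig.

```python
from statistics import fmean


class ToyAnalysis:
    """Toy plugin exposing the lifecycle stages as plain methods."""

    name = "toy"

    def filter_conditions(self, conditions):
        # Keep only conditions that have at least one replicate.
        return [c for c in conditions if c["replicates"]]

    def compute(self, condition, replicate):
        # Replicate-level scalar: mean of the per-frame values.
        return fmean(condition["replicates"][replicate])

    def aggregate(self, replicate_results):
        return {"mean": fmean(replicate_results), "n": len(replicate_results)}

    def compare(self, aggregated, control):
        baseline = aggregated[control]["mean"]
        return {
            label: result["mean"] - baseline
            for label, result in aggregated.items()
            if label != control
        }

    def plot(self, aggregated, comparison):
        # Placeholder file names only; nothing is written to disk.
        return [f"{self.name}_{label}.png" for label in aggregated]


def run_comparison(analysis, conditions, control):
    kept = analysis.filter_conditions(conditions)               # steps 1-2
    aggregated = {}
    for condition in kept:                                      # step 3
        per_replicate = [analysis.compute(condition, rep) for rep in condition["replicates"]]
        aggregated[condition["label"]] = analysis.aggregate(per_replicate)
    comparison = analysis.compare(aggregated, control)          # step 4
    plots = analysis.plot(aggregated, comparison)               # step 5
    return {"aggregated": aggregated, "comparison": comparison, "plots": plots}


conditions = [
    {"label": "control", "replicates": {1: [1.0, 1.0], 2: [3.0, 3.0]}},
    {"label": "treated", "replicates": {1: [2.0, 4.0], 2: [4.0, 6.0]}},
    {"label": "empty", "replicates": {}},
]
result = run_comparison(ToyAnalysis(), conditions, control="control")
print(result["comparison"])
```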

polyzymd.analyses.orchestrator.run_all_comparisons(config, analysis_names=None, recompute=False, equilibration=None)[source]

Run comparisons for multiple (or all enabled) analysis types.

Analyses are run in dependency order.

Parameters:
  • config (ComparisonConfig) – Comparison configuration.

  • analysis_names (list[str] | None) – Analysis names to run. If None, runs all analyses enabled in the config.

  • recompute (bool) – Force recomputation.

  • equilibration (str or None) – Override equilibration time. If None, uses config.defaults.equilibration_time.

Returns:

Mapping analysis_name -> run_comparison() result.

Return type:

dict[str, dict[str, Any]]

polyzymd.analyses.orchestrator.run_plot_only(analysis, config, equilibration=None)[source]

Run only the plot step for a single analysis type.

Uses the same path resolution and context construction as run_comparison() but skips compute, aggregate, and compare. Aggregated results and comparison results must already exist on disk.

Parameters:
  • analysis (Analysis) – The analysis plugin instance.

  • config (ComparisonConfig) – Comparison configuration.

  • equilibration (str | None) – Override equilibration time.

Returns:

A tuple of (generated_paths, failures) where failures is a list of (analysis_name, error_message) tuples.

Return type:

tuple[list[Path], list[tuple[str, str]]]

polyzymd.analyses.orchestrator.run_all_plots(config, analysis_names=None, equilibration=None)[source]

Run plot-only for all (or selected) enabled analyses.

Parameters:
  • config (ComparisonConfig) – Comparison configuration.

  • analysis_names (list[str] | None) – Analyses to plot. None means all enabled analyses.

  • equilibration (str | None) – Override equilibration time. If None, uses config.defaults.equilibration_time.

Returns:

A tuple of (generated_paths, failures) where failures is a list of (analysis_name, error_message) tuples.

Return type:

tuple[list[Path], list[tuple[str, str]]]
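
The (generated_paths, failures) return shape suggests that a failure in one analysis is recorded rather than aborting the rest. A minimal sketch of that pattern, assuming hypothetical plotter callables and a hypothetical plot_all() helper:

```python
from pathlib import Path


def plot_rmsf(figures_dir):
    """Hypothetical plotter that succeeds and reports the path it would write."""
    return [figures_dir / "rmsf.png"]


def plot_sasa(figures_dir):
    """Hypothetical plotter that fails because its inputs are missing."""
    raise FileNotFoundError("no aggregated result on disk")


def plot_all(plotters, figures_dir):
    """Run every plotter, collecting paths and per-analysis failures."""
    generated, failures = [], []
    for name, plotter in plotters.items():
        try:
            generated.extend(plotter(Path(figures_dir)))
        except Exception as exc:
            # Record the failure and continue with the remaining analyses.
            failures.append((name, str(exc)))
    return generated, failures


generated_paths, failures = plot_all({"rmsf": plot_rmsf, "sasa": plot_sasa}, "figures")
print(generated_paths, failures)
```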

Public base facade

polyzymd.analyses.base is the stable public facade for contributor imports. It re-exports the Analysis base class, lifecycle context objects, scalar metric descriptors, and comparison result models from implementation modules.

At a high level, compute-stage plugins implement build_mda_jobs() and build_mda_collector(). Collectors produce ReplicateArtifact objects; aggregation combines those into ConditionArtifact objects; comparison produces ComparisonArtifact outputs or an active custom comparison contract for plugins that still need specialized comparison models.

For the complete class and context reference, including the detailed autodoc for this facade, see Analysis Base Classes.

Public MDAnalysis layer

polyzymd.analyses.mda is the public MDAnalysis extension layer for jobs, frame selection, collectors, artifact envelopes, artifact storage, default aggregation, and artifact-based comparison. The primary contributor surface and the detailed autodoc for this layer are documented on MDAnalysis Extension-Layer API.

Statistics helpers

polyzymd.analyses.stats contains reusable scalar comparison helpers used by the default Analysis.compare() path and by plugin-specific comparison code.

Key public helpers include default_scalar_comparison() and format_scalar_comparison().
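
To give a feel for what a scalar comparison against a control can look like, the sketch below computes per-condition means and their change relative to the control. It is purely illustrative: compare_to_control() and format_comparison() are hypothetical names, and the statistics actually computed by default_scalar_comparison() and the layout produced by format_scalar_comparison() are not described on this page.

```python
from statistics import fmean


def compare_to_control(replicate_values, control):
    """Compare each condition's replicate mean with the control's mean."""
    baseline = fmean(replicate_values[control])
    rows = {}
    for label, values in replicate_values.items():
        if label == control:
            continue
        mean = fmean(values)
        delta = mean - baseline
        rows[label] = {
            "mean": mean,
            "delta": delta,
            "percent_change": 100.0 * delta / baseline,
        }
    return rows


def format_comparison(rows):
    """Render one line per condition: signed change and percent change."""
    return "\n".join(
        f"{label}: {row['delta']:+.2f} ({row['percent_change']:+.1f}%)"
        for label, row in rows.items()
    )


# Hypothetical replicate-level scalars for two conditions.
values = {"control": [2.0, 4.0], "treated": [3.0, 6.0]}
rows = compare_to_control(values, control="control")
print(format_comparison(rows))
```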

Shared utilities

Reusable plugin utilities live in polyzymd.analyses.shared. They are documented separately on Analysis Shared Utilities; this overview intentionally omits detailed shared-utility autodoc blocks.

Built-in plugin packages

Plugin name          Public package                         Primary output contract                 Comparison style
-------------------  -------------------------------------  --------------------------------------  ---------------------------
contacts             polyzymd.analyses.contacts             Contact-event artifacts and sidecars    Custom comparison
distances            polyzymd.analyses.distances            Pair-distance artifacts                 Custom comparison
hydrogen_bonds       polyzymd.analyses.hydrogen_bonds       Hydrogen-bond event artifacts           Custom comparison
rmsd                 polyzymd.analyses.rmsd                 Per-run RMSD artifacts                  Custom multi-run comparison
rg                   polyzymd.analyses.rg                   Per-run radius-of-gyration artifacts    Custom multi-run comparison
rmsf                 polyzymd.analyses.rmsf                 Per-residue profile artifacts           Default scalar comparison
sasa                 polyzymd.analyses.sasa                 SASA artifacts and sidecars             Custom multi-run comparison
catalytic_triad      polyzymd.analyses.catalytic_triad      Pair-distance-derived artifacts         Default scalar comparison
secondary_structure  polyzymd.analyses.secondary_structure  Secondary-structure matrix artifacts    Default scalar comparison

Built-in plugin packages expose their public Analysis subclass and supported settings/result contracts from the package root. Helper modules with leading underscores inside those packages are implementation details unless a page explicitly labels them as internal developer reference.

Private framework internals

polyzymd.analyses._framework is private/internal infrastructure for lifecycle contexts, artifact I/O, comparison models, and plugin contract enforcement. It is not a contributor import surface; contributor plugins should import public symbols from polyzymd.analyses.base and polyzymd.analyses.mda.