How PolyzyMD analysis plugins work

PolyzyMD analysis plugins turn trajectory calculations into reusable ensemble evidence. A plugin is not only a loop over frames; it is a small participant in a larger workflow that must compare conditions, preserve provenance, reuse cached work, render figures, and report results consistently across replicate simulations.

That is why the current analysis architecture separates MDAnalysis-native trajectory work from the ensemble workflow that PolyzyMD owns. Contributors write an Analysis subclass and describe the per-replicate trajectory jobs it needs. PolyzyMD then turns those per-replicate outputs into condition-level summaries, cross-condition comparisons, plots, and formatted reports.

The central idea

The conceptual lifecycle is:

Analysis subclass
  -> build_mda_jobs()
  -> MDAAnalysisJob
  -> collector
  -> ReplicateArtifact
  -> ConditionArtifact
  -> ComparisonArtifact or custom comparison output
  -> plot() / format()

Each stage has a narrow responsibility. The plugin defines scientific work and how to interpret its results. The framework supplies consistent orchestration, storage, and cross-condition behavior.
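The lifecycle above can be sketched in plain Python. This is a minimal illustration, not PolyzyMD code: every class and function name below (SketchJob, SketchReplicateArtifact, SketchConditionArtifact, build_jobs, collect, aggregate, format_condition) is a hypothetical stand-in, the "trajectory work" is a toy list of numbers, and the condition label is invented. The real types come from the public facades and may look quite different.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence

# Every class here is a hypothetical stand-in, not a real PolyzyMD type.


@dataclass(frozen=True)
class SketchJob:
    """Stand-in for a per-replicate trajectory job."""
    replicate: int
    compute: Callable[[], Sequence[float]]  # one value per selected frame


@dataclass(frozen=True)
class SketchReplicateArtifact:
    """Stand-in for the durable result of one replicate."""
    replicate: int
    mean_value: float
    n_frames: int


@dataclass(frozen=True)
class SketchConditionArtifact:
    """Stand-in for the condition-level summary."""
    label: str
    mean_value: float
    n_replicates: int


def build_jobs() -> List[SketchJob]:
    """Plugin side: describe the per-replicate work (toy values, no trajectory)."""
    return [
        SketchJob(1, lambda: [1.0, 2.0, 3.0]),
        SketchJob(2, lambda: [2.0, 4.0]),
    ]


def collect(job: SketchJob, raw: Sequence[float]) -> SketchReplicateArtifact:
    """Plugin side: reduce finished job output to a compact replicate result."""
    values = [float(v) for v in raw]
    return SketchReplicateArtifact(job.replicate, mean(values), len(values))


def aggregate(label: str, replicates: List[SketchReplicateArtifact]) -> SketchConditionArtifact:
    """Framework side: combine replicate results into one condition summary."""
    return SketchConditionArtifact(label, mean(r.mean_value for r in replicates), len(replicates))


def format_condition(condition: SketchConditionArtifact) -> str:
    """Presentation stage: reads the saved summary only."""
    return f"{condition.label}: mean={condition.mean_value:.2f} (n={condition.n_replicates})"


# Framework side: run each job, pass its output to the collector, then aggregate.
replicates = [collect(job, job.compute()) for job in build_jobs()]
condition = aggregate("polymer_A", replicates)
summary = format_condition(condition)
```

Note that format_condition never sees a job or its raw output. It only reads the condition summary, which mirrors the narrow responsibility of each stage.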

Why plugins build MDAAnalysisJob objects

Direct loops over frames inside a plugin-specific replicate hook are easy to imagine, but they make every plugin responsible for concerns that are not specific to its science: frame selection, job identity, cache behavior, artifact shape, provenance, aggregation inputs, parallel execution, and safe plotting inputs.

PolyzyMD instead asks compute-stage plugins to build MDAAnalysisJob objects and collectors. This keeps the frame-level calculation close to MDAnalysis while letting PolyzyMD keep ownership of the ensemble workflow.

| Owner | Responsibility |
| --- | --- |
| MDAnalysis | Loading trajectories, selecting atoms, iterating selected frames, and running trajectory-native calculations. |
| Plugin code | Choosing selections, defining settings, building MDAAnalysisJob objects, and collecting completed job output into meaningful plugin artifacts. |
| PolyzyMD | Cache identity, artifact storage, replicate-to-condition aggregation, cross-condition comparison, provenance, plotting lifecycle, and CLI formatting lifecycle. |

This split makes plugins easier to review. A maintainer can ask whether the trajectory job computes the right quantity, whether the collector returns the right artifact, and whether later stages read artifacts instead of rerunning trajectory work.
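One reason a declarative job description helps is that a framework can derive cache identity from it without knowing anything about the science. The sketch below illustrates that idea only. It does not describe how PolyzyMD actually computes cache keys: JobSpec, its fields, and cache_key are hypothetical names, and the hashing scheme is an assumption chosen for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical declarative description of one trajectory job."""
    analysis: str
    selection: str
    start: int
    stop: int
    step: int


def cache_key(spec: JobSpec) -> str:
    """Derive a deterministic key from the job description (assumed scheme)."""
    # Sorted keys and fixed separators keep the serialized form stable.
    canonical = json.dumps(asdict(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


same_a = JobSpec("rmsf", "protein and name CA", 0, 1000, 10)
same_b = JobSpec("rmsf", "protein and name CA", 0, 1000, 10)
different = JobSpec("rmsf", "protein and name CA", 0, 1000, 5)

# Identical descriptions share a key; changing the frame step changes it.
reuses_cache = cache_key(same_a) == cache_key(same_b)
invalidates_cache = cache_key(same_a) != cache_key(different)
```

A hand-written frame loop hides the selection and frame range inside plugin code, so there is nothing comparable for a framework to hash.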

Public API boundaries

Contributor-facing plugins should use stable public facades:

  • polyzymd.analyses.base for Analysis, lifecycle contexts, metrics, and comparison result models.

  • polyzymd.analyses.mda for MDAAnalysisJob, frame selection, artifacts, artifact stores, and MDAnalysis integration concepts.

  • Documented utilities from polyzymd.analyses.shared when a shared helper already fits the problem.

polyzymd.analyses._framework is private and internal. It exists behind the public facades and should not appear in contributor imports or examples, except when a document explicitly labels it as an internal implementation detail.
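A reviewer could check the import boundary mechanically. The following is a small sketch using the standard-library ast module; the helper name private_imports is hypothetical, and the submodule polyzymd.analyses._framework.jobs in the "bad" sample is invented purely to show a violation. The private package appears here only as the thing being detected.

```python
import ast
from typing import List

PRIVATE = "polyzymd.analyses._framework"


def _is_private(name: str) -> bool:
    """True if a dotted module name is the private package or lives inside it."""
    return name == PRIVATE or name.startswith(PRIVATE + ".")


def private_imports(source: str) -> List[str]:
    """Return the private module names imported by a piece of plugin source."""
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend(a.name for a in node.names if _is_private(a.name))
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Also catch "from polyzymd.analyses import _framework".
            candidates = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
            hit = next((n for n in candidates if _is_private(n)), None)
            if hit is not None:
                found.append(hit)
    return found


good = (
    "from polyzymd.analyses.base import Analysis\n"
    "from polyzymd.analyses.mda import MDAAnalysisJob\n"
)
# Hypothetical submodule name, used only to demonstrate a violation.
bad = "from polyzymd.analyses._framework.jobs import something\n"

good_hits = private_imports(good)
bad_hits = private_imports(bad)
```

The sketch parses source text only and never imports PolyzyMD, so it can run in a lint step. It ignores relative imports, which a stricter check would need to resolve.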

What artifacts mean

Artifacts are the durable boundary between lifecycle stages. They let PolyzyMD rerun, aggregate, compare, and plot without depending on transient Python objects created during a trajectory pass.

| Artifact concept | Meaning |
| --- | --- |
| ReplicateArtifact | The validated result of one plugin for one replicate. It is produced by a collector after the MDAAnalysisJob finishes. |
| ConditionArtifact | The condition-level summary assembled from replicate artifacts and any referenced sidecars. |
| ComparisonArtifact | The cross-condition comparison output used by default comparison workflows, or the canonical artifact form of a plugin comparison result. |
| Custom comparison output | Some established plugins intentionally return custom comparison models. They should still follow the active output contract and remain saveable/loadable. |

Collectors should not serialize raw MDAnalysis Results objects. Those objects are useful during compute, but they are not the storage contract for PolyzyMD analysis results. The collector translates compute output into a ReplicateArtifact with a stable payload and provenance.
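The translation step might look like the sketch below. It assumes nothing about the real ReplicateArtifact model: the payload is a plain dictionary, collect_payload is a hypothetical helper, and a SimpleNamespace stands in for an in-memory results object that carries both useful values and state that must not be stored.

```python
import json
from types import SimpleNamespace

# Stand-in for a compute-time results object. The "scratch" attribute mimics
# transient state that has no place in a stored result.
raw_results = SimpleNamespace(
    values=[1.5, 2.5, 3.5],
    selection="protein and name CA",
    scratch=object(),
)


def collect_payload(results, *, analysis: str, replicate: int) -> dict:
    """Copy only stable, JSON-safe fields out of the compute-time object."""
    values = [float(v) for v in results.values]
    if not values:
        raise ValueError("job produced no frames")
    payload = {
        "analysis": analysis,
        "replicate": replicate,
        "summary": {"mean": sum(values) / len(values), "n_frames": len(values)},
        "provenance": {"selection": results.selection},
    }
    # Fail early if anything non-serializable slipped into the payload.
    json.dumps(payload)
    return payload


payload = collect_payload(raw_results, analysis="example_metric", replicate=1)
round_tripped = json.loads(json.dumps(payload))
```

Because the payload is built field by field, the transient scratch object never reaches storage, and the stored form survives a save and load cycle unchanged.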

Sidecars keep artifacts practical

Artifacts should contain compact, validated metadata and summary values. Large arrays, per-frame tables, event streams, or dense per-residue outputs belong in sidecar files referenced by the artifact.

This design keeps artifact JSON readable and versionable while preserving access to richer data for aggregation, comparison, and plots. A sidecar is not an arbitrary cache file that a later stage discovers by filename. It is part of the artifact contract: the artifact records what the sidecar is, where it is, and why it belongs to that result.
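As an illustration of a sidecar that belongs to the artifact contract, the sketch below writes per-frame values to a CSV file and stores a record of it on the artifact. The record fields (role, description, path, format, sha256), the file names, and the helpers write_sidecar and load_sidecar are all assumptions made for this example, not the PolyzyMD schema.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def write_sidecar(directory: Path, name: str, rows) -> dict:
    """Write per-frame rows to CSV and return the record the artifact will keep."""
    path = directory / name
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["frame", "value"])
        writer.writerows(rows)
    return {
        "role": "per_frame_values",                     # what the sidecar is
        "description": "per-frame values behind the replicate summary",  # why it belongs
        "path": name,                                   # where it is, relative to the artifact
        "format": "csv",
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def load_sidecar(directory: Path, record: dict):
    """Load a sidecar through its artifact record, verifying its checksum."""
    path = directory / record["path"]
    if hashlib.sha256(path.read_bytes()).hexdigest() != record["sha256"]:
        raise ValueError(f"sidecar {record['path']} does not match its record")
    with path.open(newline="") as handle:
        return [(int(r["frame"]), float(r["value"])) for r in csv.DictReader(handle)]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    record = write_sidecar(root, "rep1_values.csv", [(0, 1.5), (10, 2.5)])
    artifact = {"replicate": 1, "summary": {"n_frames": 2}, "sidecars": [record]}
    (root / "rep1.json").write_text(json.dumps(artifact, indent=2))

    # A later stage starts from the artifact, never from a guessed filename.
    reloaded = json.loads((root / "rep1.json").read_text())
    rows = load_sidecar(root, reloaded["sidecars"][0])
```

The later stage reaches the CSV only through the record stored on the artifact, and the checksum lets it notice a file that was replaced after the analysis ran.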

Plotting is artifact-only

plot() is a rendering stage, not a hidden compute stage. It should load cached artifacts and referenced sidecars, then render figures from those durable outputs. This artifact-only plotting rule prevents plots from silently changing scientific results, rerunning expensive trajectory work, or depending on data that was not part of the saved analysis result.

If a plot needs data that is too large for the artifact payload, the compute or aggregation stage should write a validated sidecar and register it on the artifact. The plot can then load that sidecar through the artifact record.
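The artifact-only rule can be made visible in a function signature: a rendering function that receives only a directory of saved artifacts has no way to touch a trajectory. The sketch below is hypothetical throughout. The *.condition.json naming, the label and mean fields, and the helpers load_condition_artifacts and render_bars are invented for illustration, and a text bar chart stands in for a real figure so the example needs no plotting library.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def load_condition_artifacts(directory: Path) -> List[dict]:
    """Read saved condition summaries (hypothetical file naming and schema)."""
    return [json.loads(p.read_text()) for p in sorted(directory.glob("*.condition.json"))]


def render_bars(artifacts: List[dict], width: int = 20) -> List[str]:
    """Render one text bar per condition from saved values only.

    Assumes non-negative means; falls back to a scale of 1.0 if all are zero.
    """
    top = max(a["mean"] for a in artifacts) or 1.0
    lines = []
    for a in artifacts:
        bar = "#" * round(width * a["mean"] / top)
        lines.append(f"{a['label']:<10} {bar} {a['mean']:.2f}")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Pretend an earlier aggregation stage saved these summaries.
    for label, value in [("polymer_A", 2.0), ("control", 1.0)]:
        (root / f"{label}.condition.json").write_text(
            json.dumps({"label": label, "mean": value})
        )
    lines = render_bars(load_condition_artifacts(root))
```

Rendering twice from the same saved files gives the same output, which is the property the artifact-only rule protects.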

Simple and advanced plugin shapes

PolyzyMD supports small plugins and larger package-style plugins. The choice is about maintainability, not status.

| Choose this shape | When it fits | Typical structure |
| --- | --- | --- |
| Simple plugin module | The analysis has one clear metric, limited plotting, and straightforward settings. | One module with the Analysis subclass, settings, job construction, collection, metrics, and perhaps a small plot method. |
| Advanced plugin package | The analysis has multiple jobs, richer artifact payloads, several plot types, substantial sidecar handling, or custom comparison output. | A package with a public __init__.py for the Analysis subclass and private helper modules such as _mda.py, _plotters.py, _models.py, or _formatters.py. |

The same lifecycle applies in both cases. A package layout only moves complexity into smaller files; it does not change the public import boundary for contributors or the artifact contract with PolyzyMD.

How to think about plugin responsibility

A healthy analysis plugin answers three questions clearly:

  1. What trajectory-native work is needed? That belongs in MDAnalysis jobs built by build_mda_jobs().

  2. What durable result should represent one replicate? That belongs in the collector that returns a ReplicateArtifact and registers sidecars when needed.

  3. What comparison or presentation should users see? That belongs in aggregation, comparison, plot(), and format() stages that consume saved artifacts rather than raw trajectory state.

Keeping these questions separate is what lets PolyzyMD add new analyses without changing core orchestration code.