How PolyzyMD analysis plugins work
PolyzyMD analysis plugins turn trajectory calculations into reusable ensemble evidence. A plugin is not only a loop over frames; it is a small participant in a larger workflow that must compare conditions, preserve provenance, reuse cached work, render figures, and report results consistently across replicate simulations.
That is why the current analysis architecture separates MDAnalysis-native
trajectory work from PolyzyMD ensemble workflow ownership. Contributors
write an Analysis subclass and describe the per-replicate trajectory jobs it
needs. PolyzyMD then handles how those per-replicate outputs become condition,
comparison, plotting, and formatting artifacts.
The central idea
The conceptual lifecycle is:
Analysis subclass
-> build_mda_jobs()
-> MDAAnalysisJob
-> collector
-> ReplicateArtifact
-> ConditionArtifact
-> ComparisonArtifact or custom comparison output
-> plot() / format()
Each stage has a narrow responsibility. The plugin defines scientific work and how to interpret its results. The framework supplies consistent orchestration, storage, and cross-condition behavior.
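The hand-off from replicate results to a condition summary can be sketched with plain dataclasses. This is a minimal illustration, not PolyzyMD's implementation: ReplicateSummary, ConditionSummary, and aggregate_condition are hypothetical stand-ins for the artifact types and the framework-owned aggregation step, and the mean and standard-error summary is an assumed choice.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass(frozen=True)
class ReplicateSummary:
    """Hypothetical stand-in for a per-replicate artifact."""
    condition: str
    replicate: int
    value: float  # one summary number produced by a collector


@dataclass(frozen=True)
class ConditionSummary:
    """Hypothetical stand-in for a condition-level artifact."""
    condition: str
    n_replicates: int
    mean: float
    sem: float


def aggregate_condition(replicates: list[ReplicateSummary]) -> ConditionSummary:
    """Combine replicate summaries for one condition (assumed: mean and SEM)."""
    conditions = {r.condition for r in replicates}
    if len(conditions) != 1:
        raise ValueError("all replicates must belong to exactly one condition")
    values = [r.value for r in replicates]
    # The sample standard deviation needs at least two replicates.
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return ConditionSummary(conditions.pop(), len(values), mean(values), sem)


# Three replicate values for one made-up condition name.
reps = [ReplicateSummary("polymer_A", i, v) for i, v in enumerate([1.0, 2.0, 3.0], 1)]
summary = aggregate_condition(reps)
```

The aggregation function never sees a trajectory. It only consumes what the earlier stage saved, which is the property the lifecycle is designed to protect.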
Why plugins build MDAAnalysisJob objects
Direct loops over frames inside a plugin-specific replicate hook are easy to imagine, but they make every plugin responsible for concerns that are not specific to its science: frame selection, job identity, cache behavior, artifact shape, provenance, aggregation inputs, parallel execution, and safe plotting inputs.
PolyzyMD instead asks compute-stage plugins to build MDAAnalysisJob objects
and collectors. This keeps the frame-level calculation close to MDAnalysis while
letting PolyzyMD keep ownership of the ensemble workflow.
| Owner | Responsibility |
|---|---|
| MDAnalysis | Loading trajectories, selecting atoms, iterating selected frames, and running trajectory-native calculations. |
| Plugin code | Choosing selections, defining settings, building |
| PolyzyMD | Cache identity, artifact storage, replicate-to-condition aggregation, cross-condition comparison, provenance, plotting lifecycle, and CLI formatting lifecycle. |
This split makes plugins easier to review. A maintainer can ask whether the trajectory job computes the right quantity, whether the collector returns the right artifact, and whether later stages read artifacts instead of rerunning trajectory work.
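One reason a declarative job object helps is that its identity can be derived from its settings, so identical requests can reuse cached work. The sketch below is an assumption-laden illustration: JobSpec, cache_key, and frames are hypothetical names and do not describe the real MDAAnalysisJob interface. The analysis name "rmsf" is also only an example value.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical job description; not the real MDAAnalysisJob."""
    analysis: str
    selection: str
    start: int = 0
    stop: Optional[int] = None
    step: int = 1

    def cache_key(self) -> str:
        """Derive a stable identity from the settings alone."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

    def frames(self, n_frames: int) -> range:
        """Resolve the frame selection against a trajectory length."""
        return range(*slice(self.start, self.stop, self.step).indices(n_frames))


a = JobSpec("rmsf", "protein and name CA", step=10)
b = JobSpec("rmsf", "protein and name CA", step=10)
c = JobSpec("rmsf", "protein and name CA", step=5)
```

Here a and b produce the same key, while c differs because its frame stride differs. A plugin that loops over frames by hand would have to reinvent this bookkeeping itself.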
Public API boundaries
Contributor-facing plugins should use stable public facades:
polyzymd.analyses.base for Analysis, lifecycle contexts, metrics, and comparison result models.
polyzymd.analyses.mda for MDAAnalysisJob, frame selection, artifacts, artifact stores, and MDAnalysis integration concepts.
Documented utilities from polyzymd.analyses.shared when a shared helper already fits the problem.
polyzymd.analyses._framework is private/internal. It exists behind the public
facades and should not appear in contributor imports or examples except when a
document explicitly labels it as internal implementation detail.
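A reviewer could check this boundary mechanically. The following is a small sketch using only the standard-library ast module. The private_imports helper is hypothetical and is not part of PolyzyMD, and the runner name in the sample source is invented for illustration. The source strings are only parsed, never executed, so PolyzyMD does not need to be installed.

```python
import ast

PRIVATE_PREFIX = "polyzymd.analyses._framework"


def private_imports(source: str) -> list[str]:
    """Return dotted names imported from the private framework package."""
    hits = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Join module and name so "from pkg import _framework" is caught too.
            candidates = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        # Appending "." avoids matching a sibling such as "_frameworkx".
        hits.update(
            name for name in candidates
            if (name + ".").startswith(PRIVATE_PREFIX + ".")
        )
    return sorted(hits)


good = (
    "from polyzymd.analyses.base import Analysis\n"
    "from polyzymd.analyses.mda import MDAAnalysisJob\n"
)
bad = "from polyzymd.analyses._framework import runner\n"
```

Calling private_imports(good) returns an empty list, while private_imports(bad) reports the private import.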
What artifacts mean
Artifacts are the durable boundary between lifecycle stages. They let PolyzyMD rerun, aggregate, compare, and plot without depending on transient Python objects created during a trajectory pass.
| Artifact concept | Meaning |
|---|---|
|  | The validated result of one plugin for one replicate. It is produced by a collector after the |
|  | The condition-level summary assembled from replicate artifacts and any referenced sidecars. |
|  | The cross-condition comparison output used by default comparison workflows, or the canonical artifact form of a plugin comparison result. |
| Custom comparison output | Some established plugins intentionally return custom comparison models. They should still follow the active output contract and remain saveable/loadable. |
Collectors should not serialize raw MDAnalysis Results objects. Those objects are
useful during compute, but they are not the storage contract for PolyzyMD
analysis results. The collector translates compute output into a
ReplicateArtifact with a stable payload and provenance.
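The translation step might look like the sketch below. Everything here is an assumption made for illustration: ReplicateRecord is a stand-in whose fields are invented rather than taken from ReplicateArtifact, the collect signature is hypothetical, and a SimpleNamespace plays the role of a compute-time results object.

```python
import json
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass(frozen=True)
class ReplicateRecord:
    """Hypothetical stand-in for ReplicateArtifact (fields are assumptions)."""
    analysis: str
    replicate: int
    payload: dict
    provenance: dict = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(
            {
                "analysis": self.analysis,
                "replicate": self.replicate,
                "payload": self.payload,
                "provenance": self.provenance,
            },
            sort_keys=True,
        )


def collect(results, *, replicate: int, selection: str) -> ReplicateRecord:
    """Copy only plain, JSON-safe values out of the compute-time object."""
    values = [float(v) for v in results.distances]
    payload = {"n_frames": len(values), "mean_distance": sum(values) / len(values)}
    return ReplicateRecord("distance", replicate, payload, {"selection": selection})


raw = SimpleNamespace(distances=[4.0, 5.0, 6.0])  # stand-in for compute output
record = collect(raw, replicate=1, selection="resid 10 and name CA")
restored = json.loads(record.to_json())
```

The record survives a JSON round trip because it holds only built-in types. The transient results object is discarded once the collector returns.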
Sidecars keep artifacts practical
Artifacts should contain compact, validated metadata and summary values. Large arrays, per-frame tables, event streams, or dense per-residue outputs belong in sidecar files referenced by the artifact.
This design keeps artifact JSON readable and versionable while preserving access to richer data for aggregation, comparison, and plots. A sidecar is not an arbitrary cache file that a later stage discovers by filename. It is part of the artifact contract: the artifact records what the sidecar is, where it is, and why it belongs to that result.
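As a rough sketch of that contract, the artifact below records what its sidecar is, where it lives, and why it exists, and a later stage loads the sidecar only through that record. The function names, the record keys, and the checksum are assumptions for illustration and are not PolyzyMD's sidecar schema.

```python
import csv
import hashlib
import json
import tempfile
from pathlib import Path


def write_sidecar(artifact_dir: Path, name: str, rows: list[tuple[int, float]]) -> dict:
    """Write a per-frame table and return the reference the artifact stores."""
    path = artifact_dir / name
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["frame", "value"])
        writer.writerows(rows)
    return {
        "kind": "per_frame_table",                      # what it is
        "path": name,                                   # where it is (relative)
        "description": "per-frame values behind the summary",  # why it belongs
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def load_sidecar(artifact_dir: Path, ref: dict) -> list[tuple[int, float]]:
    """Load a sidecar through its artifact record, verifying the checksum."""
    path = artifact_dir / ref["path"]
    if hashlib.sha256(path.read_bytes()).hexdigest() != ref["sha256"]:
        raise ValueError(f"sidecar {ref['path']} does not match its artifact record")
    with path.open(newline="") as handle:
        return [(int(r["frame"]), float(r["value"])) for r in csv.DictReader(handle)]


with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp)
    rows = [(0, 1.5), (10, 2.5), (20, 3.5)]
    artifact = {
        "summary": {"mean": 2.5},
        "sidecars": [write_sidecar(artifact_dir, "per_frame.csv", rows)],
    }
    (artifact_dir / "artifact.json").write_text(json.dumps(artifact, indent=2))
    saved = json.loads((artifact_dir / "artifact.json").read_text())
    loaded = load_sidecar(artifact_dir, saved["sidecars"][0])
```

The artifact JSON stays small because it carries only the summary and the reference. The dense table lives beside it and is never discovered by guessing a filename.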
Plotting is artifact-only
plot() is a rendering stage, not a hidden compute stage. It should load cached
artifacts and referenced sidecars, then render figures from those durable
outputs. This artifact-only plotting rule prevents plots from silently changing
scientific results, rerunning expensive trajectory work, or depending on data
that was not part of the saved analysis result.
If a plot needs data that is too large for the artifact payload, the compute or aggregation stage should write a validated sidecar and register it on the artifact. The plot can then load that sidecar through the artifact record.
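One way to picture the rule is a rendering helper whose only input is a directory of saved artifacts. This sketch is illustrative only: bar_chart_inputs, the JSON keys, and the condition names are hypothetical, and the drawing call is left out so the example needs nothing beyond the standard library.

```python
import json
import tempfile
from pathlib import Path


def bar_chart_inputs(artifact_dir: Path) -> dict:
    """Build plot inputs from saved condition artifacts, never from trajectories."""
    labels, heights, errors = [], [], []
    for path in sorted(artifact_dir.glob("*.json")):
        artifact = json.loads(path.read_text())
        labels.append(artifact["condition"])
        heights.append(artifact["mean"])
        errors.append(artifact["sem"])
    return {"labels": labels, "heights": heights, "errors": errors}


with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp)
    # Made-up condition artifacts standing in for cached analysis output.
    for condition, mean_value, sem in [("polymer_A", 2.0, 0.3), ("no_polymer", 1.2, 0.2)]:
        record = {"condition": condition, "mean": mean_value, "sem": sem}
        (artifact_dir / f"{condition}.json").write_text(json.dumps(record))
    inputs = bar_chart_inputs(artifact_dir)
```

Because the helper takes a path to saved results rather than a trajectory or a live analysis object, it cannot rerun the computation or change a reported number while rendering.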
Simple and advanced plugin shapes
PolyzyMD supports small plugins and larger package-style plugins. The choice is about maintainability, not status.
| Choose this shape | When it fits | Typical structure |
|---|---|---|
| Simple plugin module | The analysis has one clear metric, limited plotting, and straightforward settings. | One module with the |
| Advanced plugin package | The analysis has multiple jobs, richer artifact payloads, several plot types, substantial sidecar handling, or custom comparison output. | A package with a public |
The same lifecycle applies in both cases. A package layout only moves complexity into smaller files; it does not change the public import boundary for contributors or the artifact contract with PolyzyMD.
How to think about plugin responsibility
A healthy analysis plugin answers three questions clearly:
What trajectory-native work is needed? That belongs in MDAnalysis jobs built by build_mda_jobs().
What durable result should represent one replicate? That belongs in the collector that returns a ReplicateArtifact and registers sidecars when needed.
What comparison or presentation should users see? That belongs in aggregation, comparison, plot(), and format() stages that consume saved artifacts rather than raw trajectory state.
Keeping these questions separate is what lets PolyzyMD add new analyses without changing core orchestration code.
What to read next
Extend PolyzyMD with MDAnalysis-native analyses for the current full implementation guide.
Analyses Plugin System API for the analysis plugin API overview.
Analysis Base Classes for the public polyzymd.analyses.base facade.
MDAnalysis Extension-Layer API for the public MDAnalysis integration API.
Analysis Shared Utilities for documented shared analysis utilities.