# How PolyzyMD analysis plugins work

PolyzyMD analysis plugins turn trajectory calculations into reusable ensemble evidence. A plugin is not only a loop over frames; it is a small participant in a larger workflow that must compare conditions, preserve provenance, reuse cached work, render figures, and report results consistently across replicate simulations.

That is why the current analysis architecture separates **MDAnalysis-native trajectory work** from **PolyzyMD ensemble workflow ownership**. Contributors write an `Analysis` subclass and describe the per-replicate trajectory jobs it needs. PolyzyMD then handles how those per-replicate outputs become condition, comparison, plotting, and formatting artifacts.

## The central idea

The conceptual lifecycle is:

```text
Analysis subclass
  -> build_mda_jobs()
  -> MDAAnalysisJob
  -> collector
  -> ReplicateArtifact
  -> ConditionArtifact
  -> ComparisonArtifact or custom comparison output
  -> plot() / format()
```

Each stage has a narrow responsibility. The plugin defines scientific work and how to interpret its results. The framework supplies consistent orchestration, storage, and cross-condition behavior.

## Why plugins build MDAAnalysisJob objects

Direct loops over frames inside a plugin-specific replicate hook are easy to imagine, but they make every plugin responsible for concerns that are not specific to its science: frame selection, job identity, cache behavior, artifact shape, provenance, aggregation inputs, parallel execution, and safe plotting inputs.

PolyzyMD instead asks compute-stage plugins to build `MDAAnalysisJob` objects and collectors. This keeps the frame-level calculation close to MDAnalysis while letting PolyzyMD keep ownership of the ensemble workflow.

| Owner | Responsibility |
| --- | --- |
| **MDAnalysis** | Loading trajectories, selecting atoms, iterating selected frames, and running trajectory-native calculations. |
| **Plugin code** | Choosing selections, defining settings, building `MDAAnalysisJob` objects, and collecting completed job output into meaningful plugin artifacts. |
| **PolyzyMD** | Cache identity, artifact storage, replicate-to-condition aggregation, cross-condition comparison, provenance, plotting lifecycle, and CLI formatting lifecycle. |

This split makes plugins easier to review. A maintainer can ask whether the trajectory job computes the right quantity, whether the collector returns the right artifact, and whether later stages read artifacts instead of rerunning trajectory work.

## Public API boundaries

Contributor-facing plugins should use stable public facades:

- `polyzymd.analyses.base` for `Analysis`, lifecycle contexts, metrics, and comparison result models.
- `polyzymd.analyses.mda` for `MDAAnalysisJob`, frame selection, artifacts, artifact stores, and MDAnalysis integration concepts.
- Documented utilities from `polyzymd.analyses.shared` when a shared helper already fits the problem.

`polyzymd.analyses._framework` is private/internal. It exists behind the public facades and should not appear in contributor imports or examples except when a document explicitly labels it as internal implementation detail.

## What artifacts mean

Artifacts are the durable boundary between lifecycle stages. They let PolyzyMD rerun, aggregate, compare, and plot without depending on transient Python objects created during a trajectory pass.

| Artifact concept | Meaning |
| --- | --- |
| `ReplicateArtifact` | The validated result of one plugin for one replicate. It is produced by a collector after the `MDAAnalysisJob` finishes. |
| `ConditionArtifact` | The condition-level summary assembled from replicate artifacts and any referenced sidecars. |
| `ComparisonArtifact` | The cross-condition comparison output used by default comparison workflows, or the canonical artifact form of a plugin comparison result. |
| Custom comparison output | Some established plugins intentionally return custom comparison models. They should still follow the active output contract and remain saveable/loadable. |

Collectors should not serialize raw MDAnalysis Results objects. Those objects are useful during compute, but they are not the storage contract for PolyzyMD analysis results. The collector translates compute output into a `ReplicateArtifact` with a stable payload and provenance.

## Sidecars keep artifacts practical

Artifacts should contain compact, validated metadata and summary values. Large arrays, per-frame tables, event streams, or dense per-residue outputs belong in sidecar files referenced by the artifact.

This design keeps artifact JSON readable and versionable while preserving access to richer data for aggregation, comparison, and plots. A sidecar is not an arbitrary cache file that a later stage discovers by filename. It is part of the artifact contract: the artifact records what the sidecar is, where it is, and why it belongs to that result.

## Plotting is artifact-only

`plot()` is a rendering stage, not a hidden compute stage. It should load cached artifacts and referenced sidecars, then render figures from those durable outputs.

This artifact-only plotting rule prevents plots from silently changing scientific results, rerunning expensive trajectory work, or depending on data that was not part of the saved analysis result.

If a plot needs data that is too large for the artifact payload, the compute or aggregation stage should write a validated sidecar and register it on the artifact. The plot can then load that sidecar through the artifact record.

## Simple and advanced plugin shapes

PolyzyMD supports small plugins and larger package-style plugins. The choice is about maintainability, not status.

| Choose this shape | When it fits | Typical structure |
| --- | --- | --- |
| **Simple plugin module** | The analysis has one clear metric, limited plotting, and straightforward settings. | One module with the `Analysis` subclass, settings, job construction, collection, metrics, and perhaps a small plot method. |
| **Advanced plugin package** | The analysis has multiple jobs, richer artifact payloads, several plot types, substantial sidecar handling, or custom comparison output. | A package with a public `__init__.py` for the `Analysis` subclass and private helper modules such as `_mda.py`, `_plotters.py`, `_models.py`, or `_formatters.py`. |

The same lifecycle applies in both cases. A package layout only moves complexity into smaller files; it does not change the public import boundary for contributors or the artifact contract with PolyzyMD.

## How to think about plugin responsibility

A healthy analysis plugin answers three questions clearly:

1. **What trajectory-native work is needed?** That belongs in MDAnalysis jobs built by `build_mda_jobs()`.
2. **What durable result should represent one replicate?** That belongs in the collector that returns a `ReplicateArtifact` and registers sidecars when needed.
3. **What comparison or presentation should users see?** That belongs in aggregation, comparison, `plot()`, and `format()` stages that consume saved artifacts rather than raw trajectory state.

Keeping these questions separate is what lets PolyzyMD add new analyses without changing core orchestration code.

## What to read next

- {doc}`../extending_analyses` for the current full implementation guide.
- {doc}`../../api/analyses` for the analysis plugin API overview.
- {doc}`../../api/analyses_base` for the public `polyzymd.analyses.base` facade.
- {doc}`../../api/analyses_mda` for the public MDAnalysis integration API.
- {doc}`../../api/analyses_shared` for documented shared analysis utilities.
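
Before moving on to those guides, it may help to see the separation of stages in miniature. The sketch below is a toy, standard-library-only illustration of the job, collector, replicate artifact, condition artifact, and format flow described on this page. It does not import PolyzyMD or MDAnalysis, and every name in it (`ToyJob`, `ToyReplicateArtifact`, `ToyConditionArtifact`, `collect`, `aggregate`, `format_condition`) is a hypothetical stand-in, not the real API; see the implementation guide for the actual classes and signatures.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence

# NOTE: every class and function here is a hypothetical stand-in used for
# illustration only. None of these names belong to PolyzyMD or MDAnalysis.


@dataclass(frozen=True)
class ToyJob:
    """Stand-in for a per-replicate trajectory job."""

    replicate_id: str
    frames: Sequence[float]              # pretend per-frame observable
    per_frame: Callable[[float], float]  # the "trajectory-native" calculation

    def run(self) -> list[float]:
        # The only place that iterates over frames.
        return [self.per_frame(value) for value in self.frames]


@dataclass(frozen=True)
class ToyReplicateArtifact:
    """Compact, durable result for one replicate."""

    replicate_id: str
    summary: dict[str, float]
    provenance: dict[str, str]


@dataclass(frozen=True)
class ToyConditionArtifact:
    """Condition-level summary built only from replicate artifacts."""

    condition: str
    mean_of_means: float
    n_replicates: int


def collect(job: ToyJob, raw: list[float]) -> ToyReplicateArtifact:
    # Translate raw compute output into a stable payload plus provenance,
    # instead of storing the raw result object itself.
    return ToyReplicateArtifact(
        replicate_id=job.replicate_id,
        summary={"mean": mean(raw), "n_frames": float(len(raw))},
        provenance={"job": "toy_square"},
    )


def aggregate(
    condition: str, artifacts: Sequence[ToyReplicateArtifact]
) -> ToyConditionArtifact:
    # Reads artifacts only; never touches frames.
    return ToyConditionArtifact(
        condition=condition,
        mean_of_means=mean(a.summary["mean"] for a in artifacts),
        n_replicates=len(artifacts),
    )


def format_condition(artifact: ToyConditionArtifact) -> str:
    # Presentation stage: consumes the saved summary, computes nothing new.
    return (
        f"{artifact.condition}: mean={artifact.mean_of_means:.2f} "
        f"(n={artifact.n_replicates})"
    )


jobs = [
    ToyJob("rep1", [1.0, 2.0, 3.0], lambda x: x * x),
    ToyJob("rep2", [2.0, 4.0], lambda x: x * x),
]
replicates = [collect(job, job.run()) for job in jobs]
condition = aggregate("wild_type", replicates)
report = format_condition(condition)
print(report)
```

In this toy, only `ToyJob.run()` ever sees frames. `aggregate` and `format_condition` work purely from artifact fields, which mirrors the rule that later stages consume saved artifacts rather than raw trajectory state.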