Convert a plugin into an advanced package
This how-to is for contributors whose analysis plugin has outgrown a single module. Use it when the plugin already works and the next task is to make its structure easier to maintain, test, and review.
An advanced package is still one plugin. The public entry point remains one
Analysis subclass discovered by PolyzyMD. The package split only moves helper
code into focused private modules inside that plugin package.
Decide whether to split
Do not split a plugin just because it might grow later. Keep a plugin as a single module when:
it has one small Settings model and one readable Analysis subclass;
build_mda_jobs() and the collector fit on the same screen;
plotting is absent or limited to one short helper;
results are plain artifacts with compact payloads and no custom output model;
formatting uses the default formatter or a short format() method; and
tests can find the behavior without navigating several files.
Split into a package when the current module is becoming hard to review. Good signals include:
MDAnalysis job setup or collectors need several helper classes or functions;
the plugin uses array or table sidecars and needs loading helpers;
plotting has several artifact-only figures or repeated data preparation;
aggregation, comparison, or CLI output uses structured Pydantic models;
formatting has enough logic to distract from the lifecycle methods; or
maintainers are repeatedly asking for clearer separation of responsibilities.
The goal is not abstraction for its own sake. The goal is a smaller public
facade in __init__.py plus private modules that each have one reason to
change.
Use built-in packages as examples, not import targets
Built-in plugins such as rmsf, distances, contacts, sasa, rg, rmsd,
hydrogen_bonds, secondary_structure, and catalytic_triad show real package
shapes. Their private helper modules are useful examples of organization:
_mda.py for trajectory-native job and collector helpers;
_plotters.py for artifact-only plotting helpers;
_models.py for domain schemas that validate artifact payload entries;
_formatters.py for substantial CLI formatting; and
plugin-specific modules such as _comparison.py or _filters.py only when a plugin has a genuine need.
Do not import private modules from another plugin. For contributor plugins, use the public facades only:
from polyzymd.analyses.base import Analysis, PlotContext
from polyzymd.analyses.mda import MDAAnalysisJob, MDAReplicateJobContext
Documented utilities from polyzymd.analyses.shared are also acceptable when an
existing shared helper fits the task. Do not import from
polyzymd.analyses._framework; it is a private implementation detail behind the
public facades.
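The import rules above can be checked mechanically. The following is a minimal sketch, not a PolyzyMD tool: it uses only the standard-library ast module, and the function name find_forbidden_imports is hypothetical. It flags absolute imports of polyzymd.analyses._framework and of underscore-prefixed modules that belong to a different plugin. It inspects only the module path of each import statement, so a form such as "from polyzymd.analyses.rmsf import _mda" would need an extra check.

```python
import ast

# Rule set assumed for this sketch: the private framework package named in
# the text, plus any underscore-prefixed module of another plugin.
FRAMEWORK_PREFIX = "polyzymd.analyses._framework"
ANALYSES_PREFIX = "polyzymd.analyses."


def find_forbidden_imports(source: str, plugin_name: str) -> list[str]:
    """Return imported module paths that break the import rules above."""
    forbidden: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            # Relative imports (level > 0) stay inside the plugin package.
            continue
        for module in modules:
            if module == FRAMEWORK_PREFIX or module.startswith(FRAMEWORK_PREFIX + "."):
                forbidden.append(module)
                continue
            if not module.startswith(ANALYSES_PREFIX):
                continue
            parts = module[len(ANALYSES_PREFIX):].split(".")
            # parts[0] is a plugin or facade name; any later part that starts
            # with an underscore is a private helper module.
            is_private = any(part.startswith("_") for part in parts[1:])
            if is_private and parts[0] != plugin_name:
                forbidden.append(module)
    return forbidden
```

Running this over each file of a contributor plugin before review gives a quick signal; an empty list means no violation of the kinds checked here was found.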
Target package layout
For a plugin named solvent_shell, convert the single module into a directory
like this:
src/polyzymd/analyses/solvent_shell/
├── __init__.py
├── _mda.py
├── _plotters.py
├── _models.py
└── _formatters.py
Create only the files that have a current job. If your plugin has no custom
formatting yet, skip _formatters.py. If plain artifact payload dictionaries
are enough, skip _models.py.
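As an illustration of "create only the files that have a current job", the sketch below scaffolds the package directory with the standard library. The helper name scaffold_plugin_package is hypothetical and not part of PolyzyMD; it creates empty files only and does not move any code out of the original module.

```python
from pathlib import Path

# File names follow the layout above. The caller lists the helpers that have
# a job today, so nothing is created "just in case".
OPTIONAL_HELPERS = ("_mda.py", "_plotters.py", "_models.py", "_formatters.py")


def scaffold_plugin_package(analyses_dir: Path, plugin_name: str, helpers: list[str]) -> list[Path]:
    """Create <plugin_name>/__init__.py plus only the requested helper files."""
    unknown = sorted(set(helpers) - set(OPTIONAL_HELPERS))
    if unknown:
        raise ValueError(f"Unexpected helper modules: {unknown}")
    package_dir = analyses_dir / plugin_name
    package_dir.mkdir(parents=True, exist_ok=True)
    created: list[Path] = []
    for file_name in ["__init__.py", *helpers]:
        path = package_dir / file_name
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    return created
```

For example, a plugin with trajectory work and plots but no custom models or formatting would pass ["_mda.py", "_plotters.py"] as helpers.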
| File | Responsibility | Keep out |
|---|---|---|
| __init__.py | Public plugin class, settings model, class variables, lifecycle wiring, and small orchestration methods. | Long MDAnalysis loops, large plot functions, bulky formatters. |
| _mda.py | MDAnalysis job functions, | Public plugin registration, CLI formatting, plot rendering. |
| _plotters.py | Artifact-only plot helpers that load | Trajectory loading, MDAnalysis job execution, recomputation. |
| _models.py | Pydantic domain schemas for validating nested artifact payload entries or custom comparison outputs. | Alternate aggregate cache loaders, raw MDAnalysis |
| _formatters.py | Substantial text/table formatting used by | Scientific computation, artifact writes, plotting side effects. |
Keep __init__.py as the public facade
After the split, __init__.py should still be the place where discovery finds
the plugin class. Keep the public class readable and import private helpers from
the same package with relative imports.
from typing import ClassVar

from pydantic import BaseModel, Field

from polyzymd.analyses.base import Analysis, PlotContext
from polyzymd.analyses.mda import MDACollectorContext, MDAReplicateJobContext

from ._formatters import format_solvent_shell
from ._mda import SolventShellCollector, build_solvent_shell_jobs
from ._plotters import plot_solvent_shell_summary


class SolventShellSettings(BaseModel):
    """Settings for solvent-shell analysis."""

    selection: str = Field(default="protein and chainid A")


class SolventShellAnalysis(Analysis):
    """Analyze solvent-shell behavior around the protein."""

    name: ClassVar[str] = "solvent_shell"
    Settings: ClassVar[type[BaseModel]] = SolventShellSettings

    def build_mda_jobs(self, ctx: MDAReplicateJobContext):
        return build_solvent_shell_jobs(ctx)

    def build_mda_collector(self, ctx: MDACollectorContext):
        del ctx
        return SolventShellCollector()

    def plot(self, ctx: PlotContext):
        return plot_solvent_shell_summary(ctx)

    def format(self, result, output_format: str = "text") -> str:
        return format_solvent_shell(result, output_format=output_format)
This file wires together the lifecycle but does not hide heavy computation or plotting details inside the class body.
Move trajectory work to _mda.py
Put trajectory-native details in _mda.py: job construction, function-adapter
jobs, AnalysisBase-compatible workers, collectors, and sidecar writes. Import
heavy dependencies lazily inside functions or methods that need them.
from polyzymd.analyses.mda import (
    MDAAnalysisJob,
    MDACollectorContext,
    MDAJobResult,
    MDAReplicateJobContext,
    ReplicateArtifact,
    frame_selection_payload,
)


def calculate_shell_counts(
    universe,
    *,
    selection: str,
    start=None,
    stop=None,
    step=None,
    frames=None,
    **_frame_kwargs,
):
    """Calculate compact per-replicate values for one trajectory."""
    import numpy as np

    atoms = universe.select_atoms(selection)
    values: list[float] = []
    iterator = (
        universe.trajectory[frames]
        if frames is not None
        else universe.trajectory[start:stop:step]
    )
    for _ts in iterator:
        values.append(float(np.asarray(atoms.positions).shape[0]))
    return {"mean_count": float(np.mean(values)), "n_frames": len(values)}


def build_solvent_shell_jobs(ctx: MDAReplicateJobContext) -> list[MDAAnalysisJob]:
    settings = ctx.settings
    return [
        MDAAnalysisJob.from_function(
            name="solvent_shell_counts",
            function=calculate_shell_counts,
            universe=ctx.universe,
            frame_selection=ctx.frame_selection,
            universe_policy=ctx.universe_policy,
            function_kwargs={"selection": settings.selection},
        )
    ]


class SolventShellCollector:
    """Collect one completed job into a replicate artifact."""

    def __call__(
        self,
        ctx: MDACollectorContext,
        completed_jobs: list[MDAJobResult],
    ) -> ReplicateArtifact:
        job = completed_jobs[0]
        mean_count = float(job.results["mean_count"])
        metadata = {"result_kind": "solvent_shell_replicate"}
        if ctx.settings_fingerprint is not None:
            metadata["settings_fingerprint"] = ctx.settings_fingerprint
        return ReplicateArtifact(
            analysis_name=ctx.analysis_name,
            condition_label=ctx.condition_label,
            replicate=ctx.replicate,
            payload={"metrics": {"mean_shell_count": mean_count}},
            provenance={"frame_selection": frame_selection_payload(ctx.frame_selection)},
            metadata=metadata,
            warnings=list(ctx.warnings),
        )
The collector returns a durable artifact. It should not serialize raw MDAnalysis
Results objects. If the job produces arrays or event tables, write registered
sidecars as shown in "Store large analysis outputs with artifact sidecars".
Function-adapter workers receive frame-selection keyword arguments from
PolyzyMD’s FrameSelection. They must respect both explicit frames selectors
and start/stop/step selectors so equilibration cuts, analysis windows, and
strides are honored consistently.
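The selector precedence described above can be isolated in a small helper. This is a sketch under stated assumptions: iter_selected_frames is a hypothetical name, and a plain Python list stands in for the MDAnalysis trajectory so the logic can be run without a trajectory file. An explicit frames list wins over start/stop/step, matching the worker shown earlier in this section.

```python
from typing import Optional, Sequence


def iter_selected_frames(
    trajectory: Sequence,
    *,
    start: Optional[int] = None,
    stop: Optional[int] = None,
    step: Optional[int] = None,
    frames: Optional[Sequence[int]] = None,
):
    """Yield frames, preferring an explicit index list over a slice."""
    if frames is not None:
        # Explicit selector: visit exactly these indices, in the given order.
        for index in frames:
            yield trajectory[index]
        return
    # Slice selector: None bounds fall back to the full trajectory.
    yield from trajectory[start:stop:step]
```

A unit test can pass a list of integers as the trajectory and assert that equilibration cuts and strides select the expected frames before the same logic is trusted on real data.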
Move artifact-only plotting to _plotters.py
Plot helpers should read cached artifacts and registered sidecars only. They
should not load trajectories, create MDAAnalysisJob objects, or rerun compute
work.
from pathlib import Path

from polyzymd.analyses.base import PlotContext
from polyzymd.analyses.mda import ArtifactStore


def plot_solvent_shell_summary(ctx: PlotContext) -> list[Path]:
    """Plot from cached condition artifacts and sidecars."""
    import matplotlib.pyplot as plt

    output_path = ctx.output_dir / "solvent_shell_summary.png"
    labels: list[str] = []
    values: list[float] = []
    for condition in ctx.conditions:
        analysis_dir = ctx.analysis_dirs.get(condition.label)
        if analysis_dir is None:
            continue
        artifact = ArtifactStore(analysis_dir).read_condition_result()
        metric = artifact.payload.get("metrics", {}).get("mean_shell_count")
        if metric is None:
            continue
        labels.append(condition.label)
        values.append(float(metric["mean"]))
    if not values:
        return []
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_ylabel("Mean shell count")
    fig.tight_layout()
    fig.savefig(output_path)
    plt.close(fig)
    return [output_path]
Keep all heavy plotting dependencies inside the plotting function. This preserves fast imports for users who only need configuration, discovery, or API docs.
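When several figures repeat the same data preparation, one option is to separate preparation from rendering so the preparation can be tested without matplotlib. The sketch below is illustrative only: collect_metric_rows is a hypothetical helper, and plain dictionaries stand in for cached artifact payloads, using the same payload shape that the plotting function above reads.

```python
from typing import Mapping, Optional


def collect_metric_rows(
    payloads_by_label: Mapping[str, Optional[dict]],
    metric_name: str = "mean_shell_count",
) -> tuple[list[str], list[float]]:
    """Return bar labels and heights from cached payload dictionaries."""
    labels: list[str] = []
    values: list[float] = []
    for label, payload in payloads_by_label.items():
        if payload is None:
            # No cached artifact for this condition: skip it, never recompute.
            continue
        metric = payload.get("metrics", {}).get(metric_name)
        if metric is None:
            continue
        labels.append(label)
        values.append(float(metric["mean"]))
    return labels, values
```

The rendering function then only draws the returned labels and values, which keeps the artifact-only rule easy to verify in review.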
Add _models.py only for domain schemas
Use _models.py when the plugin has a custom Pydantic output model or typed
helpers that make aggregation and comparison clearer. Do not add a result model
only to wrap one scalar already stored under payload["metrics"].
Good uses for _models.py include:
custom comparison outputs with several tables or ranked entries;
aggregate summaries that need schema validation beyond artifact payloads;
small typed models that point to sidecars by reference; and
typed custom comparison models when the default comparison artifact is not enough.
Large arrays, per-frame matrices, and event streams should remain in registered
sidecars, with _models.py storing only validated summaries or sidecar
references.
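A small typed model that points to a sidecar by reference might look like the sketch below. To keep the example dependency-free it uses a standard-library dataclass as a stand-in; a real _models.py would more likely define a Pydantic model. The class name ShellSummary and all of its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShellSummary:
    """Validated summary that references a sidecar instead of embedding it."""

    condition_label: str
    mean_shell_count: float
    n_replicates: int
    sidecar_name: str  # name of a registered sidecar, not the array itself

    def __post_init__(self) -> None:
        # Validation only reads the fields, so it works on a frozen dataclass.
        if self.n_replicates < 1:
            raise ValueError("n_replicates must be at least 1")
        if self.mean_shell_count < 0:
            raise ValueError("mean_shell_count cannot be negative")
        if not self.sidecar_name:
            raise ValueError("sidecar_name must name a registered sidecar")
```

The model stays compact because the per-frame data lives in the sidecar; the model carries only the summary values and the name needed to find the sidecar.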
Add _formatters.py only for substantial formatting
Use _formatters.py when format() has enough branches or table-building logic
to obscure the Analysis subclass. Keep format() in __init__.py as a small
delegation method and put the formatting implementation in _formatters.py.
def format_solvent_shell(result, *, output_format: str = "text") -> str:
    """Format solvent-shell comparison output for the CLI."""
    if output_format == "json":
        return result.model_dump_json(indent=2)
    return "Solvent-shell comparison complete."
Do not compute new scientific results in a formatter. It should only render data that aggregation or comparison already produced.
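For the text branch, a formatter that has grown table-building logic could resemble the following sketch. The helper name format_metric_table and its row shape, a list of (label, value) tuples, are assumptions made for illustration. It renders values it is given and computes nothing new.

```python
def format_metric_table(rows: list[tuple[str, float]], *, header: str = "Mean shell count") -> str:
    """Render precomputed (label, value) rows as an aligned text table."""
    if not rows:
        return "No solvent-shell results to display."
    # Pad the first column to the longest label (or the column title).
    label_width = max(len("Condition"), *(len(label) for label, _ in rows))
    lines = [f"{'Condition':<{label_width}}  {header}"]
    lines.append("-" * len(lines[0]))
    for label, value in rows:
        lines.append(f"{label:<{label_width}}  {value:.2f}")
    return "\n".join(lines)
```

Because the function takes plain tuples, its layout can be tested without running aggregation or comparison first.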
Migrate in small steps
Start from a passing single-file plugin. Run the focused plugin tests before the split so you know whether a later failure came from the refactor.
Create the package directory. Replace solvent_shell.py with solvent_shell/__init__.py and keep the public Analysis subclass name, name, and Settings unchanged.
Move trajectory helpers first. Put job functions, workers, collectors, and sidecar writes in _mda.py. Keep public imports from polyzymd.analyses.base, polyzymd.analyses.mda, and documented polyzymd.analyses.shared utilities.
Move plot helpers next. Put plot data loading and rendering in _plotters.py. Verify plots read artifacts and sidecars only.
Move structured models only if needed. Add _models.py when the plugin has a custom result contract. Otherwise, keep relying on canonical artifacts.
Move formatting only if needed. Add _formatters.py when CLI rendering is substantial enough to deserve its own file.
Run focused tests and docs checks. Use the plugin test file and the docs build before opening a pull request.
Example validation commands:
pixi run -e build pytest tests/analyses/plugins/test_solvent_shell.py -v
pixi run -e build make -C docs clean html
Migration checklist
Before you consider the package split complete, check that:
__init__.py still exposes exactly one public plugin class for discovery.
The plugin’s name and Settings are unchanged unless you intentionally made a user-facing configuration change.
All imports from PolyzyMD use public facades or documented shared utilities.
No contributor code imports from another plugin’s private helper modules.
Heavy dependencies such as MDAnalysis, NumPy-heavy routines, or matplotlib are imported lazily inside functions that need them when practical.
Collectors return ReplicateArtifact objects and write large arrays or tables as registered sidecars.
Aggregation and comparison consume artifacts or validated structured outputs, not raw runtime containers.
plot() delegates to helpers that load cached artifacts and sidecars only.
_models.py and _formatters.py exist only if they have clear current responsibilities.
Focused plugin tests pass after the file move.
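The lazy-import item in this checklist can be spot-checked with a short script. This is a sketch, not a project tool: the function name module_level_heavy_imports and the set of packages treated as heavy are assumptions. It reports heavy packages imported at module scope and ignores imports placed inside functions or methods.

```python
import ast

# Packages treated as "heavy" for this sketch; adjust to the plugin's needs.
HEAVY_MODULES = {"MDAnalysis", "matplotlib", "numpy"}


def module_level_heavy_imports(source: str) -> list[str]:
    """List heavy packages imported at module scope rather than lazily."""
    found: list[str] = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in HEAVY_MODULES:
                found.append(root)
    return found
```

An empty result for every module in the package suggests that heavy dependencies are imported lazily; a non-empty result points at the import lines to move.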
Success state
You have converted a plugin into an advanced package when the public plugin class is easier to read, private helpers have focused responsibilities, imports stay on public PolyzyMD facades, heavy dependencies remain lazy, and plotting is still artifact-only.