# Build a simple scalar analysis plugin

This tutorial turns the scaffold pattern into one small trajectory-native analysis. You will sketch a plugin that computes one scalar value per replicate, stores it in a `ReplicateArtifact`, lets PolyzyMD aggregate replicate metrics into a `ConditionArtifact`, and lets the MDAnalysis artifact comparison path build the default scalar comparison from that artifact.

The metric in this tutorial is **teaching-only**: the mean x-coordinate of the selected protein atoms. It is not validated science and should not be used to interpret enzyme-polymer simulations. It exists only because it is small enough to show the plugin lifecycle in one sitting.

## Before you start

You should already have completed {doc}`first_scaffold`. Keep using a scratch plugin while you learn. The snippets below are meant to replace the scaffolded placeholder pieces in a tutorial plugin such as `solvent_shell`; they are not a request to add this metric to PolyzyMD.

You also need the architecture vocabulary from {doc}`architecture`:

- `build_mda_jobs()` creates `MDAAnalysisJob` objects.
- A collector maps completed jobs to a `ReplicateArtifact`.
- Default aggregation reads scalar replicate metrics from `payload["metrics"]` and writes a `ConditionArtifact`.
- Default comparison reads the canonical `ConditionArtifact` payload; use `extract_metrics()` only to customize metric descriptors for that payload.

## 1. Define a tiny settings model

Start with one atom selection and one constant metric key. Use lowercase `chainid` in MDAnalysis selections. In the PolyzyMD chain convention, `chainid A` is the protein.

```python
from typing import ClassVar

from pydantic import BaseModel, Field

from polyzymd.analyses.base import Analysis
from polyzymd.analyses.mda import (
    MDAAnalysisJob,
    MDACollectorContext,
    MDAJobResult,
    MDAReplicateJobContext,
    ReplicateArtifact,
    frame_selection_payload,
    strict_json_payload,
)

METRIC_NAME: str = "mean_protein_x_nm"


class MeanProteinXSettings(BaseModel):
    """Settings for the teaching-only scalar metric."""

    selection: str = Field(
        default="protein and chainid A",
        description="MDAnalysis selection for atoms to summarize.",
    )
```

This imports PolyzyMD symbols only from the public contributor facades: `polyzymd.analyses.base` and `polyzymd.analyses.mda`. The metric key is a module constant instead of a setting so every replicate and condition writes the same scalar name.

## 2. Write a function job that returns JSON-compatible data

`MDAAnalysisJob.from_function()` adapts a plain function into the MDAnalysis job shape. The function receives the loaded universe plus frame-selection keyword arguments from PolyzyMD.

Return a JSON-compatible dictionary. Do not return or store raw MDAnalysis `Results` objects in artifacts.

```python
def calculate_mean_x(
    universe,
    *,
    selection: str,
    start=None,
    stop=None,
    step=None,
    frames=None,
    **_frame_kwargs,
) -> dict[str, float | int | str]:
    """Calculate a teaching-only scalar for one replicate."""
    atoms = universe.select_atoms(selection)
    values: list[float] = []
    iterator = (
        universe.trajectory[frames]
        if frames is not None
        else universe.trajectory[start:stop:step]
    )
    for _ts in iterator:
        if len(atoms) == 0:
            raise ValueError(f"Selection matched no atoms: {selection!r}")
        # MDAnalysis positions are commonly in Angstrom; divide by 10 for nm.
        values.append(float(atoms.positions[:, 0].mean() / 10.0))

    mean_x = sum(values) / len(values) if values else 0.0
    return {
        "mean_x_nm": mean_x,
        "n_frames": len(values),
        "selection": selection,
    }
```

The important boundary is the return type.
The function can use MDAnalysis while it is running, but its output is already reduced to primitive values that can be validated and serialized.

## 3. Build the function job

In your `Analysis` subclass, use the framework-provided context. Do not load the configuration again, and do not pass a backend policy to `from_function()` for a function-adapter job.

```python
class MeanProteinXAnalysis(Analysis):
    """Teaching-only scalar analysis plugin."""

    name: ClassVar[str] = "mean_protein_x"
    Settings: ClassVar[type[BaseModel]] = MeanProteinXSettings

    def build_mda_jobs(self, ctx: MDAReplicateJobContext) -> list[MDAAnalysisJob]:
        settings = ctx.settings
        return [
            MDAAnalysisJob.from_function(
                name="mean_protein_x",
                function=calculate_mean_x,
                universe=ctx.universe,
                frame_selection=ctx.frame_selection,
                universe_policy=ctx.universe_policy,
                function_kwargs={"selection": settings.selection},
            )
        ]
```

This job runs once for a replicate. The function handles the selected frame iteration internally and returns a compact result dictionary.

## 4. Collect the job result into a ReplicateArtifact

The collector is where you translate completed job output into the durable replicate contract. Put the scalar that should be aggregated under `payload["metrics"]`.

```python
class MeanProteinXCollector:
    """Collect one completed function job into a replicate artifact."""

    def __call__(
        self,
        ctx: MDACollectorContext,
        completed_jobs: list[MDAJobResult],
    ) -> ReplicateArtifact:
        if len(completed_jobs) != 1:
            raise ValueError(f"Expected one job, got {len(completed_jobs)}")

        job = completed_jobs[0]
        result = dict(job.results)
        mean_x = float(result["mean_x_nm"])

        metadata = {"result_kind": "teaching_scalar"}
        if ctx.settings_fingerprint is not None:
            metadata["settings_fingerprint"] = ctx.settings_fingerprint

        return ReplicateArtifact(
            analysis_name=ctx.analysis_name,
            condition_label=ctx.condition_label,
            replicate=ctx.replicate,
            payload={
                "selection": result["selection"],
                "n_frames": int(result["n_frames"]),
                "metrics": {METRIC_NAME: mean_x},
            },
            provenance={
                "source": "mean_protein_x_teaching_function",
                "frame_selection": frame_selection_payload(ctx.frame_selection),
                "universe_policy": strict_json_payload(
                    ctx.universe_policy.as_dict(),
                    analysis_name=ctx.analysis_name,
                ),
            },
            metadata=metadata,
            warnings=list(ctx.warnings),
        )
```

Then add the collector hook to the `MeanProteinXAnalysis` class above:

```python
    def build_mda_collector(self, ctx: MDACollectorContext):
        del ctx
        return MeanProteinXCollector()
```

Because `payload["metrics"]` contains one finite scalar, the default aggregation path can combine replicate artifacts without a custom `aggregate()` method.

## 5. Let default aggregation build the ConditionArtifact

For this simple scalar plugin, do not override `aggregate()`.
PolyzyMD's default MDAnalysis aggregation reads each replicate artifact's `payload["metrics"]` and creates a `ConditionArtifact` with a payload shaped like this:

```python
{
    "metrics": {
        "mean_protein_x_nm": {
            "name": "mean_protein_x_nm",
            "values": [1.2, 1.3, 1.1],
            "mean": 1.2,
            "sem": 0.0577,
            "std": 0.1,
            "n": 3,
        }
    },
    "replicate_metrics": {
        "1": {"mean_protein_x_nm": 1.2},
        "2": {"mean_protein_x_nm": 1.3},
        "3": {"mean_protein_x_nm": 1.1},
    },
    "n_replicates": 3,
}
```

The exact numbers will differ. The stable idea is that aggregation computes replicate values, mean, standard deviation, and SEM for each named scalar metric.

## 6. Let artifact comparison use the condition metrics

For this simple MDAnalysis artifact tutorial, the default comparison reads the canonical `ConditionArtifact.payload["metrics"]`, builds the internal `MetricValue` inputs from the stored means, SEMs, and replicate values, and returns a comparison artifact. Implement `extract_metrics()` only when you need to customize metric direction, labels, or units from the same canonical payload.

Because the teaching metric has no scientific direction, treat any comparison output as a lifecycle check only. For a real plugin, define metric direction and labels through the supported comparison contract only after the metric has a validated interpretation.

## Success state

You have the pieces of a simple scalar plugin when:

- `build_mda_jobs()` returns one `MDAAnalysisJob.from_function()` job.
- The function returns a JSON-compatible dictionary, not raw MDAnalysis results.
- The collector returns a `ReplicateArtifact` with a finite scalar under `payload["metrics"]`.
- You do not override `aggregate()` because default aggregation can create the `ConditionArtifact` from replicate metrics.
- Default comparison reads the `ConditionArtifact` metrics directly, or your `extract_metrics()` reads that canonical payload to add metric metadata.

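
To sanity-check an aggregated payload by hand, the example numbers in section 5 can be reproduced with the standard library. The sketch below is not PolyzyMD's aggregation code: `summarize_replicates` is a hypothetical helper, and both the sample standard deviation (n - 1 denominator) and the zero fallback for a single replicate are assumptions. The n - 1 choice was picked because it matches the example values `std: 0.1` and `sem: 0.0577` for three replicates.

```python
import math
import statistics


def summarize_replicates(values: list[float]) -> dict:
    """Summarize one scalar metric across replicates (illustrative only)."""
    if not values:
        raise ValueError("Need at least one replicate value")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("Replicate values must be finite")

    n = len(values)
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    # Assumption: report 0.0 when only one replicate is available.
    std = statistics.stdev(values) if n > 1 else 0.0
    # Standard error of the mean: std / sqrt(n).
    sem = std / math.sqrt(n)
    return {"values": list(values), "mean": mean, "std": std, "sem": sem, "n": n}


summary = summarize_replicates([1.2, 1.3, 1.1])
print(round(summary["mean"], 4), round(summary["std"], 4), round(summary["sem"], 4))
```

If the values you compute this way disagree with a real `ConditionArtifact`, check the API reference for the conventions the default aggregation actually uses before assuming either side is wrong.
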
## Common mistakes

- **Treating the teaching metric as science.** The x-coordinate example is only a lifecycle exercise. Replace it with a validated quantity for real work.
- **Using the wrong chain-selection spelling.** Use lowercase `chainid`, for example `protein and chainid A`.
- **Passing backend settings to `MDAAnalysisJob.from_function()`.** Function jobs use the default adapter path; do not pass `backend_policy` in this tutorial.
- **Serializing raw MDAnalysis results.** Store primitive values in the artifact payload. Use {doc}`sidecars` later for arrays or tables.
- **Hiding aggregation inside the collector.** The collector should describe one replicate. Let the default aggregation stage combine replicates.
- **Importing private framework modules.** Contributor examples should use `polyzymd.analyses.base` and `polyzymd.analyses.mda`.

## What to read next

- {doc}`../extending_analyses` for the full implementation guide.
- {doc}`sidecars` for the next practical guide when your plugin needs array or table outputs.
- {doc}`../../api/analyses_base` for `Analysis` and `MetricValue` API details.
- {doc}`../../api/analyses_mda` for `MDAAnalysisJob`, collectors, and artifact models.