# Create Custom Plots from Analysis Artifacts Want to use the PolyzyMD artifacts to make your own plots? Use this guide when you already have cached analysis artifacts and sidecars and want to make your own matplotlib plots from those existing results. The example loads cached hydrogen-bond aggregate artifacts and combines the `ser_his` and `asp_his` summaries on one graph without rerunning the analysis. This workflow is intended for JupyterLab, Jupyter Notebook, VS Code notebooks, or an IPython session. ::::{tip} Launch your notebook server from the PolyzyMD pixi environment, or select a kernel created from that environment: ```bash pixi run -e build jupyter lab ``` :::: ```{important} This is a post-processing workflow for existing artifacts and sidecars. It does not customize a plugin's `plot()` method, load trajectories, or rerun MDAnalysis. To adjust PolyzyMD's standard analysis plots, start with {doc}`publication_plots` instead. ``` The code below reads small JSON artifacts from `analysis//hydrogen_bonds/aggregated/result.json`. ## Prepare a notebook context cell Add a short Markdown cell at the top of the notebook so exported notebooks keep the intent of the figure clear. ````markdown ## Custom hydrogen-bond occupancy plot This notebook loads existing PolyzyMD hydrogen-bond aggregate artifacts and plots the `ser_his` and `asp_his` summaries together. It does not rerun the hydrogen-bond analysis. ```` ## Import notebook dependencies ```python from pathlib import Path import matplotlib.pyplot as plt import numpy as np import pandas as pd from polyzymd.analyses.mda import ArtifactStore ``` ## Set the project paths and summaries Edit `project_dir`, `conditions`, and `condition_dirs` to match the directory that contains your `comparison.yaml` and `analysis/` tree. The condition labels should match the labels in `comparison.yaml`. The directory values should match the corresponding directories under `analysis/`. ```python project_dir = Path("/path/to/polyzymd/project").expanduser().resolve() conditions = [ "No Polymer", "100% SBMA", "100% EGMA", "1% EGPMA", "2% EGPMA", "5% EGPMA", "10% EGPMA", ] condition_dirs = { "No Polymer": "no_polymer", "100% SBMA": "100_sbma", "100% EGMA": "100_egma", "1% EGPMA": "1_egpma", "2% EGPMA": "2_egpma", "5% EGPMA": "5_egpma", "10% EGPMA": "10_egpma", # Edit these to match directories under analysis/ } summary_names = ["ser_his", "asp_his"] ``` If you are unsure how labels map to directories, list the available analysis directories and update `condition_dirs` to match them. ```python for path in sorted((project_dir / "analysis").iterdir()): if path.is_dir(): print(path.name) ``` This example is useful when the built-in hydrogen-bond plot for `within: catalytic_triad` includes an unwanted Ser-Asp component. Loading the named summaries directly lets you plot only Ser-His and Asp-His occupancy. ## Validate that expected artifacts exist ```python artifact_paths = { condition: project_dir / "analysis" / condition_dirs[condition] / "hydrogen_bonds" / "aggregated" / "result.json" for condition in conditions } missing = [path for path in artifact_paths.values() if not path.exists()] if missing: raise FileNotFoundError( "Missing hydrogen-bond aggregate artifacts:\n" + "\n".join(str(path) for path in missing) ) ``` ## Load and validate hydrogen-bond artifacts ```python condition_artifacts = {} for condition, artifact_path in artifact_paths.items(): artifact_dir = artifact_path.parent artifact = ArtifactStore(artifact_dir).read_condition_result("result.json") if artifact.analysis_name != "hydrogen_bonds": raise ValueError( f"Expected a hydrogen_bonds artifact for {condition!r}, " f"got {artifact.analysis_name!r}" ) summaries = artifact.payload.get("summaries") if not isinstance(summaries, list): raise TypeError( f"Expected artifact.payload['summaries'] to be a list for {condition!r}" ) condition_artifacts[condition] = artifact ``` ## Extract a tidy DataFrame This cell extracts one row per condition and summary. The occupancy fields mean: - `mean_fraction_with_any`: mean across replicates of the fraction of analyzed frames with at least one H-bond matching that summary. - `sem_fraction_with_any`: SEM across replicates. - `per_replicate_fraction_with_any`: one occupancy value per replicate. For this example, `ser_his` is the fraction of frames with at least one Ser-His H-bond, and `asp_his` is the fraction of frames with at least one Asp-His H-bond. ```{note} Artifact payload shapes are plugin- and version-specific. This example uses the current hydrogen-bond artifact summary fields. Inspect `artifact.payload` and the plugin documentation for your PolyzyMD version before adapting the field names. ``` ```python rows = [] for condition, artifact in condition_artifacts.items(): summaries_by_name = { summary["name"]: summary for summary in artifact.payload["summaries"] if isinstance(summary, dict) and "name" in summary } missing_summaries = [ summary_name for summary_name in summary_names if summary_name not in summaries_by_name ] if missing_summaries: available = sorted(summaries_by_name) raise KeyError( f"Missing summaries for {condition!r}: {missing_summaries}. " f"Available summaries: {available}" ) for summary_name in summary_names: summary = summaries_by_name[summary_name] replicate_values = summary["per_replicate_fraction_with_any"] rows.append( { "condition": condition, "summary": summary_name, "mean_fraction_with_any": summary["mean_fraction_with_any"], "sem_fraction_with_any": summary["sem_fraction_with_any"], "per_replicate_fraction_with_any": replicate_values, } ) df = pd.DataFrame(rows) df ``` ## Plot grouped bars with SEM and replicate dots The bars show mean occupancy, error bars show SEM across replicates, and black dots show per-replicate occupancy values. ```python fig, ax = plt.subplots(figsize=(10, 5)) x = np.arange(len(conditions)) width = 0.8 / len(summary_names) colors = dict(zip(summary_names, plt.get_cmap("tab10").colors)) for idx, summary_name in enumerate(summary_names): offset = (idx - (len(summary_names) - 1) / 2) * width bar_x = x + offset means = [] sems = [] replicate_series = [] for condition in conditions: row = df[(df["condition"] == condition) & (df["summary"] == summary_name)].iloc[0] means.append(row["mean_fraction_with_any"]) sems.append(row["sem_fraction_with_any"]) replicate_series.append(row["per_replicate_fraction_with_any"]) ax.bar( bar_x, means, width=width, yerr=sems, capsize=4, label=summary_name, color=colors[summary_name], edgecolor="black", linewidth=0.6, alpha=0.85, ) for xpos, values in zip(bar_x, replicate_series): values = np.asarray(values, dtype=float) jitter = np.linspace(-width * 0.25, width * 0.25, num=len(values)) ax.scatter( np.full_like(values, xpos) + jitter, values, color="black", s=24, zorder=3, alpha=0.8, ) ax.set_xticks(x) ax.set_xticklabels(conditions, rotation=35, ha="right") ax.set_ylabel("Fraction of analyzed frames with ≥1 H-bond") ax.set_title("Catalytic triad hydrogen-bond occupancy") ax.set_ylim(bottom=0) ax.legend(title="Summary") ax.grid(axis="y", alpha=0.25) fig.tight_layout() ``` ## Save outside the canonical analysis tree Save notebook-generated figures under a separate directory such as `figures/custom/`. Avoid writing custom outputs inside the canonical `analysis/` tree, which PolyzyMD owns. ```python output_dir = project_dir / "figures" / "custom" output_dir.mkdir(parents=True, exist_ok=True) figure_path = output_dir / "hbond_ser_his_asp_his_occupancy.png" fig.savefig(figure_path, dpi=300, bbox_inches="tight") figure_path ``` ## Adapt this pattern - Change condition labels and order by editing the `conditions` list. Use the display labels from `comparison.yaml`. - Change artifact directory names by editing `condition_dirs` to match the directories under `analysis/`. - Change which hydrogen-bond summaries appear by editing `summary_names`. - Summary names must match the names configured in `comparison.yaml`. - If artifacts live on an HPC filesystem, copy the small JSON artifact files to local storage before opening the notebook to improve responsiveness. - Use the same `ArtifactStore(...).read_condition_result("result.json")` pattern for other condition-level artifacts, then inspect `artifact.payload` for the fields you want to plot. ## Troubleshoot common notebook issues ### `ImportError: No module named polyzymd` Start the notebook server from the pixi environment: ```bash pixi run -e build jupyter lab ``` If you use VS Code or an existing Jupyter server, select the kernel associated with the PolyzyMD `build` environment. ### `FileNotFoundError` for `result.json` Check that the hydrogen-bond analysis has been run and that `project_dir` points to the directory containing `analysis/`: ```bash pixi run -e build polyzymd compare run hydrogen_bonds -f comparison.yaml ``` Also list the directories under `analysis/` and compare them with `condition_dirs`: ```python for path in sorted((project_dir / "analysis").iterdir()): if path.is_dir(): print(path.name) ``` ### `KeyError` for a summary name Print the available summary names from each artifact and compare them with the `summaries` section of `comparison.yaml`. ```python for condition, artifact in condition_artifacts.items(): available = [summary.get("name") for summary in artifact.payload["summaries"]] print(condition, available) ``` ## See also - {doc}`publication_plots` for settings that control PolyzyMD's standard analysis plots. - {doc}`hydrogen_bonds` for configuring hydrogen-bond summaries. - {doc}`../reference/analysis_hydrogen_bonds_reference` for hydrogen-bond settings and the generated output files and plots. - {doc}`../reference/comparison_yaml` for `comparison.yaml` schema details. - {doc}`../reference/analysis_comparison_reference` for comparison output paths and plotting behavior.