# Extending the Plotter Framework This guide shows developers how to create custom plotters using PolyzyMD's registry-based framework. The framework follows the **Open-Closed Principle** (open for extension, closed for modification) and provides automatic discovery of plotter implementations. ```{note} This guide covers the **plotter** subsystem for generating comparison visualizations. For statistical comparisons, see {doc}`extending_comparators`. ``` ## Overview The plotter framework provides: - **BasePlotter**: Abstract base class with shared utilities (`_save_figure`, `_get_output_path`) - **PlotterRegistry**: Registry for auto-discovery of plotter types via `@register` decorator - **PlotSettings**: Configuration model for plot appearance (DPI, colors, formats) - **Automatic discovery**: `plot_all()` finds all registered plotters for an analysis type ### Architecture ``` compare/ ├── plotter.py # BasePlotter, PlotterRegistry, ComparisonPlotter ├── plotters/ │ ├── triad.py # TriadKDEPanelPlotter, TriadThresholdBarsPlotter │ ├── contacts.py # BindingPreferenceHeatmapPlotter, BindingPreferenceBarPlotter │ ├── rmsf.py # RMSFBarPlotter, RMSFLinePlotter │ └── __init__.py └── config.py # PlotSettings, PlotSettingsTriad, PlotSettingsContacts ``` ## Quick Start: Minimal Plotter Here's the minimal code to create a new plotter: ```python from pathlib import Path from typing import Any, Sequence from polyzymd.compare.plotter import BasePlotter, PlotterRegistry @PlotterRegistry.register("my_custom_plot") class MyCustomPlotter(BasePlotter): """Generate custom visualization for my analysis type.""" @classmethod def plot_type(cls) -> str: return "my_custom_plot" def can_plot(self, comparison_config, analysis_type: str) -> bool: """Return True if this plotter handles the analysis type.""" return analysis_type == "my_analysis" def plot( self, data: dict[str, Any], labels: Sequence[str], output_dir: Path, **kwargs, ) -> list[Path]: """Generate and save plot(s).""" import matplotlib.pyplot as plt # Load your data from each condition's analysis_dir results = self._load_results(data, labels) if not results: return [] # Create your visualization fig, ax = plt.subplots(figsize=(10, 6), dpi=self.settings.dpi) # ... plotting logic ... # Save using helper method output_path = self._get_output_path(output_dir, "my_custom_plot") return [self._save_figure(fig, output_path)] def _load_results(self, data, labels): """Load analysis results from filesystem.""" from my_module.results import MyResult results = {} for label in labels: cond_data = data.get(label) if cond_data is None: continue analysis_dir = Path(cond_data["analysis_dir"]) result_file = analysis_dir / "my_result.json" if result_file.exists(): results[label] = MyResult.load(result_file) return results ``` ## Critical: The Data Contract ```{warning} The most common mistake when implementing plotters is expecting data to be passed via `kwargs`. **This does not work!** The orchestrator only provides filesystem paths—plotters must load their own data. ``` ### What `plot()` Receives The `ComparisonPlotter.plot_analysis()` method calls your plotter with: ```python plotter.plot( data=data, # Dict of condition metadata (see structure below) labels=labels, # List of condition labels in display order output_dir=output_dir, # Where to save plots # **kwargs is reserved for future use—DO NOT rely on it ) ``` ### The `data` Dictionary Structure The `data` dict contains metadata for each condition, **not analysis results**: ```python data = { "SBMA_75_25": { "condition": ConditionConfig(...), # Condition metadata "sim_config": SimulationConfig(...), # Full simulation config "analysis_dir": Path("path/to/analysis/contacts/"), # CRITICAL! "aggregated_dir": Path("path/to/analysis/contacts/aggregated/"), "replicates": [1, 2, 3], # Replicate numbers }, "EGMA_75_25": { # Same structure... }, } ``` ### Correct Pattern: Load From Filesystem `````{tab-set} ````{tab-item} Correct ✓ :sync: correct ```python def plot(self, data, labels, output_dir, **kwargs): # Load data from filesystem paths for label in labels: analysis_dir = Path(data[label]["analysis_dir"]) result_file = analysis_dir / "my_result_aggregated.json" result = MyAggregatedResult.load(result_file) # ... use result for plotting ... ``` ```` ````{tab-item} Incorrect ✗ :sync: incorrect ```python def plot(self, data, labels, output_dir, **kwargs): # WRONG: Expecting pre-loaded result in kwargs comparison_result = kwargs.get("comparison_result") if comparison_result is None: return [] # Always returns empty! ``` ```` ````` ## Step-by-Step Guide ### Step 1: Choose Your Analysis Type Each plotter handles one analysis type (e.g., `"contacts"`, `"catalytic_triad"`, `"rmsf"`). The `can_plot()` method determines which analysis types your plotter supports: ```python def can_plot(self, comparison_config: "ComparisonConfig", analysis_type: str) -> bool: """Check if this plotter can handle the analysis type.""" if analysis_type != "contacts": return False # Optionally check settings to enable/disable return self.settings.contacts.generate_my_plot ``` ### Step 2: Register Your Plotter Use the `@PlotterRegistry.register()` decorator with a unique key: ```python @PlotterRegistry.register("binding_preference_heatmap") class BindingPreferenceHeatmapPlotter(BasePlotter): ... ``` The registry key should be: - Lowercase with underscores - Descriptive of the plot type - Unique across all plotters ### Step 3: Implement Required Methods #### `plot_type()` (classmethod) Return the registry key: ```python @classmethod def plot_type(cls) -> str: return "binding_preference_heatmap" ``` #### `can_plot()` Check if this plotter should be called: ```python def can_plot(self, comparison_config: "ComparisonConfig", analysis_type: str) -> bool: # Only handle contacts analysis if analysis_type != "contacts": return False # Check if this specific plot type is enabled in settings return self.settings.contacts.generate_enrichment_heatmap ``` #### `plot()` Generate and save the visualization. Key responsibilities: 1. **Load data from filesystem** (not kwargs!) 2. **Create matplotlib figure(s)** 3. **Save using `_save_figure()`** 4. **Return list of output paths** ```python def plot( self, data: dict[str, Any], labels: Sequence[str], output_dir: Path, **kwargs, ) -> list[Path]: import matplotlib.pyplot as plt # 1. Load data from each condition results = {} for label in labels: analysis_dir = Path(data[label]["analysis_dir"]) result_file = analysis_dir / "binding_preference_aggregated_reps1-3.json" if result_file.exists(): results[label] = AggregatedBindingPreferenceResult.load(result_file) if not results: return [] # 2. Create figure fig, ax = plt.subplots(figsize=self.settings.contacts.figsize_enrichment_heatmap) # ... plotting logic ... # 3. Save and return output_path = self._get_output_path(output_dir, "my_plot_name") return [self._save_figure(fig, output_path)] ``` ### Step 4: Access Plot Settings The `self.settings` attribute provides access to `PlotSettings`: ```python # Global settings self.settings.dpi # int, default 150 self.settings.format # str, "png" or "svg" self.settings.color_palette # str, default "Set2" # Analysis-specific settings self.settings.contacts.figsize_enrichment_heatmap # tuple self.settings.contacts.enrichment_colormap # str self.settings.contacts.show_enrichment_error # bool self.settings.triad.figsize_kde_panel # tuple self.settings.triad.kde_fill_alpha # float ``` To add new settings, extend the settings models in `compare/config.py`. ### Step 5: Handle Multiple Plot Files If your plotter generates multiple files (e.g., one per polymer type), return all paths: ```python def plot(self, data, labels, output_dir, **kwargs) -> list[Path]: output_paths = [] for polymer_type in polymer_types: fig, ax = plt.subplots(...) # ... plot for this polymer type ... output_path = self._get_output_path( output_dir, f"binding_bars_{polymer_type.lower()}" ) output_paths.append(self._save_figure(fig, output_path)) return output_paths ``` ## Complete Example: Bar Chart Plotter Here's a complete example of a plotter that generates grouped bar charts: ```python """Example plotter for binding preference bar charts.""" from __future__ import annotations import logging from pathlib import Path from typing import TYPE_CHECKING, Any, Sequence import numpy as np from polyzymd.compare.plotter import BasePlotter, PlotterRegistry if TYPE_CHECKING: from polyzymd.analysis.contacts.binding_preference import ( AggregatedBindingPreferenceResult, ) from polyzymd.compare.config import ComparisonConfig logger = logging.getLogger(__name__) @PlotterRegistry.register("binding_preference_bars") class BindingPreferenceBarPlotter(BasePlotter): """Generate grouped bar chart of binding preference enrichment. Creates a figure showing enrichment ratios as grouped bars with: - Groups: Protein groups (e.g., aromatic, polar, charged) - Bars within group: One per condition - Error bars: SEM across replicates - Reference line at 1.0 (neutral enrichment) One plot is generated per polymer type. """ @classmethod def plot_type(cls) -> str: return "binding_preference_bars" def can_plot(self, comparison_config: "ComparisonConfig", analysis_type: str) -> bool: """Check if this plotter can handle the analysis type.""" if analysis_type != "contacts": return False return self.settings.contacts.generate_enrichment_bars def plot( self, data: dict[str, Any], labels: Sequence[str], output_dir: Path, **kwargs, ) -> list[Path]: """Generate enrichment bar chart from filesystem data.""" import matplotlib.pyplot as plt # Load binding preference results from each condition binding_results = self._load_binding_preference_results(data, labels) if not binding_results: logger.info("No binding preference data found - skipping bar plots") return [] # Collect all polymer types and protein groups all_polymer_types: set[str] = set() all_protein_groups: set[str] = set() for result in binding_results.values(): all_polymer_types.update(result.polymer_types()) all_protein_groups.update(result.protein_groups()) polymer_types = sorted(all_polymer_types) protein_groups = sorted(all_protein_groups) if not polymer_types or not protein_groups: return [] # Generate one plot per polymer type output_paths: list[Path] = [] valid_labels = [label for label in labels if label in binding_results] for poly_type in polymer_types: fig, ax = plt.subplots( figsize=self.settings.contacts.figsize_enrichment_bars, dpi=self.settings.dpi, ) n_groups = len(protein_groups) n_conditions = len(valid_labels) bar_width = 0.8 / n_conditions x = np.arange(n_groups) for i, cond_label in enumerate(valid_labels): result = binding_results[cond_label] means = [] sems = [] for prot_group in protein_groups: entry = result.get_entry(poly_type, prot_group) if entry and entry.mean_enrichment is not None: means.append(entry.mean_enrichment) sems.append(entry.sem_enrichment or 0.0) else: means.append(0.0) sems.append(0.0) offset = (i - n_conditions / 2 + 0.5) * bar_width ax.bar(x + offset, means, bar_width, yerr=sems, label=cond_label) ax.axhline(y=1.0, color="black", linestyle="--", label="Neutral") ax.set_xlabel("Protein Group") ax.set_ylabel("Enrichment Ratio") ax.set_title(f"Binding Preference: {poly_type}") ax.set_xticks(x) ax.set_xticklabels(protein_groups, rotation=45, ha="right") ax.legend() plt.tight_layout() output_path = self._get_output_path( output_dir, f"binding_preference_bars_{poly_type.lower()}" ) output_paths.append(self._save_figure(fig, output_path)) return output_paths def _load_binding_preference_results( self, data: dict[str, Any], labels: Sequence[str], ) -> dict[str, "AggregatedBindingPreferenceResult"]: """Load aggregated binding preference results for each condition.""" from polyzymd.analysis.contacts.binding_preference import ( AggregatedBindingPreferenceResult, ) results: dict[str, AggregatedBindingPreferenceResult] = {} for label in labels: cond_data = data.get(label) if cond_data is None: continue analysis_dir = Path(cond_data.get("analysis_dir", "")) if not analysis_dir: continue # Find aggregated file agg_files = list(analysis_dir.glob("binding_preference_aggregated*.json")) if not agg_files: continue result_file = sorted(agg_files)[-1] # Most recent try: results[label] = AggregatedBindingPreferenceResult.load(result_file) except Exception as e: logger.warning(f"Failed to load {result_file}: {e}") return results ``` ## Testing Your Plotter ### Unit Test ```python import pytest from pathlib import Path from unittest.mock import MagicMock, patch class TestMyCustomPlotter: """Tests for MyCustomPlotter.""" def test_plot_type_returns_registry_key(self): plotter = MyCustomPlotter(settings=MagicMock()) assert plotter.plot_type() == "my_custom_plot" def test_can_plot_returns_true_for_correct_type(self): settings = MagicMock() settings.my_analysis.generate_custom_plot = True plotter = MyCustomPlotter(settings=settings) assert plotter.can_plot(MagicMock(), "my_analysis") is True assert plotter.can_plot(MagicMock(), "other_type") is False def test_plot_returns_empty_when_no_data(self): plotter = MyCustomPlotter(settings=MagicMock()) result = plotter.plot({}, [], Path("/tmp")) assert result == [] ``` ### Integration Test ```bash # Run plot-all to test discovery and execution pixi run -e build polyzymd compare plot-all -f comparison.yaml ``` ## Common Patterns ### Pattern: Load Aggregated Results Most plotters load aggregated (replicate-averaged) results: ```python def _load_aggregated(self, data, labels): from my_module import MyAggregatedResult results = {} for label in labels: agg_dir = Path(data[label].get("aggregated_dir", "")) result_file = agg_dir / "my_aggregated.json" if result_file.exists(): results[label] = MyAggregatedResult.load(result_file) return results ``` ### Pattern: Pool Per-Replicate Data For distribution plots (KDEs, histograms), pool raw data across replicates: ```python def _pool_replicate_data(self, data, labels): pooled = {} for label in labels: analysis_dir = Path(data[label]["analysis_dir"]) replicates = data[label]["replicates"] all_values = [] for rep in replicates: rep_file = analysis_dir / f"run_{rep}" / "result.json" if rep_file.exists(): result = MyResult.load(rep_file) all_values.extend(result.values) if all_values: pooled[label] = np.array(all_values) return pooled ``` ### Pattern: Conditional Plot Generation Only generate plots when data is available: ```python def plot(self, data, labels, output_dir, **kwargs): results = self._load_results(data, labels) if not results: logger.info("No data found - skipping plot") return [] # Return empty list, not raise exception # Continue with plotting... ``` ## Adding Plot Settings To add configurable settings for your plotter: ### 1. Extend Settings Model In `compare/config.py`: ```python class PlotSettingsMyAnalysis(BaseModel): """Plot settings for my_analysis type.""" generate_custom_plot: bool = True figsize_custom: tuple[int, int] = (10, 6) custom_colormap: str = "viridis" class PlotSettings(BaseModel): """Root plot settings.""" # Existing fields... my_analysis: PlotSettingsMyAnalysis = Field(default_factory=PlotSettingsMyAnalysis) ``` ### 2. Access in Plotter ```python def plot(self, data, labels, output_dir, **kwargs): fig, ax = plt.subplots(figsize=self.settings.my_analysis.figsize_custom) ax.imshow(matrix, cmap=self.settings.my_analysis.custom_colormap) ``` ## Troubleshooting ### "No data found" but data exists Check that your file pattern matches: ```python # Verify file exists agg_files = list(analysis_dir.glob("binding_preference_aggregated*.json")) print(f"Found files: {agg_files}") # Debug ``` ### Plot not appearing in `plot-all` 1. Verify registration: ```python from polyzymd.compare.plotter import PlotterRegistry print(PlotterRegistry.list()) # Should include your key ``` 2. Check `can_plot()` returns `True`: ```python assert plotter.can_plot(config, "my_analysis") is True ``` 3. Verify settings enable the plot: ```yaml plot_settings: my_analysis: generate_custom_plot: true ``` ### Type errors in matplotlib Use tuples for `add_axes` and `tight_layout`: ```python # Correct cbar_ax = fig.add_axes((0.92, 0.15, 0.02, 0.7)) # tuple plt.tight_layout(rect=(0, 0, 0.9, 0.95)) # tuple # Incorrect (causes type errors) cbar_ax = fig.add_axes([0.92, 0.15, 0.02, 0.7]) # list ``` ## See Also - {doc}`extending_comparators` — Creating custom statistical comparators - {doc}`analysis_binding_preference` — Binding preference analysis details - {doc}`architecture` — Overall system architecture