Compare Module

Core

Base classes for comparison analysis.

This module provides abstract base classes that consolidate common patterns across all comparator types, following the Template Method design pattern.

Classes

BaseConditionSummary: Abstract base for condition-level summary statistics.
BaseComparisonResult: Abstract base for complete comparison results with save/load.
PairwiseComparison: Shared model for statistical comparison between two conditions.
ANOVASummary: Shared model for ANOVA results.
BaseComparator: Abstract base implementing the Template Method pattern for comparisons.

Design Principles

Open-Closed Principle: New comparators extend base classes without modifying them.
Template Method: compare() defines the algorithm skeleton; subclasses fill in specifics.
DRY: Statistical tests, pairwise logic, and serialization are implemented once.

class polyzymd.compare.core.base.PairwiseComparison(*, condition_a, condition_b, metric='default', t_statistic, p_value, cohens_d, effect_size_interpretation, direction, significant, percent_change)[source]

Bases: BaseModel

Statistical comparison between two conditions.

This is the standard pairwise comparison result used across all comparator types. For comparators that need additional fields (e.g., multiple metrics), subclass this model.

Variables:

condition_a (str) – Label of first condition (typically control/reference).
condition_b (str) – Label of second condition (typically treatment).
metric (str) – Name of the metric being compared.
t_statistic (float) – T-test statistic.
p_value (float) – Two-tailed p-value.
cohens_d (float) – Effect size (Cohen’s d).
effect_size_interpretation (str) – “negligible”, “small”, “medium”, or “large”.
direction (str) – Interpretation of change (e.g., “stabilizing”, “improving”).
significant (bool) – Whether p < 0.05.
percent_change (float) – Percent change from condition_a to condition_b.

condition_a: str

condition_b: str

metric: str

t_statistic: float

p_value: float

cohens_d: float

effect_size_interpretation: str

direction: str

significant: bool

percent_change: float

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.core.base.ANOVASummary(*, metric='default', f_statistic, p_value, significant)[source]

Bases: BaseModel

One-way ANOVA result summary.

Variables:

metric (str) – Name of the metric tested (e.g., “rmsf”, “coverage”).
f_statistic (float) – F-statistic from ANOVA.
p_value (float) – P-value for the test.
significant (bool) – Whether p < 0.05.

metric: str

f_statistic: float

p_value: float

significant: bool

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.core.base.BaseConditionSummary(*, label, config_path, n_replicates, replicate_values)[source]

Bases: BaseModel, ABC

Abstract base class for condition-level summary statistics.

All condition summaries share these common fields. Subclasses add analysis-specific fields (e.g., mean_rmsf, coverage_mean).

Variables:

label (str) – Display name for this condition.
config_path (str) – Path to the simulation config file.
n_replicates (int) – Number of replicates included.
replicate_values (list[float]) – Per-replicate values of the primary metric (for statistical tests).

label: str

config_path: str

n_replicates: int

replicate_values: list[float]

abstract property primary_metric_value: float

Return the primary metric value for ranking/comparison.

This is used by BaseComparator for sorting and statistical tests.

abstract property primary_metric_sem: float

Return the SEM of the primary metric.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.core.base.BaseComparisonResult(*, metric, name, control_label=None, conditions, pairwise_comparisons, anova=None, ranking, equilibration_time, created_at, polyzymd_version='1.2.1')[source]

Bases: BaseModel, ABC, Generic[TConditionSummary, TPairwiseComparison]

Abstract base class for comparison results.

Provides common serialization (save/load) and accessor methods. Subclasses define analysis-specific fields.

Variables:

metric (str) – The primary metric being compared (e.g., “rmsf”, “simultaneous_contact_fraction”).
name (str) – Name of the comparison project.
control_label (str, optional) – Label of the control condition.
conditions (list[TConditionSummary]) – Summary for each condition.
pairwise_comparisons (list[TPairwiseComparison]) – Statistical comparisons (all vs control, or all pairs).
anova (ANOVASummary or list[ANOVASummary], optional) – ANOVA result(s) if 3+ conditions.
ranking (list[str]) – Labels sorted by primary metric.
equilibration_time (str) – Equilibration time used.
created_at (datetime) – When the analysis was run.
polyzymd_version (str) – Version of polyzymd used.

comparison_type: ClassVar[str] = 'base'

metric: str

name: str

control_label: str | None

conditions: list[Any]

pairwise_comparisons: list[Any]

anova: ANOVASummary | list[ANOVASummary] | None

ranking: list[str]

equilibration_time: str

created_at: datetime

polyzymd_version: str

save(path)[source]

Save result to JSON file.

Parameters: path (Path or str) – Output path.
Returns: Path to saved file.
Return type: Path

classmethod load(path)[source]

Load result from JSON file.

Parameters: path (Path or str) – Path to JSON file.
Returns: Loaded result.
Return type: Self
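The save/load pair amounts to a JSON round trip. The following sketch mimics it with a standard-library dataclass that carries only a few of the documented fields. The class name, the ISO-8601 encoding of created_at, and the creation of parent directories are assumptions made for illustration; the real model may serialize differently.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class ResultSketch:
    """Stand-in with a subset of the documented fields; not the real model."""

    metric: str
    name: str
    ranking: list
    created_at: datetime

    def save(self, path):
        """Write the result as JSON and return the path that was written."""
        path = Path(path)
        path.parent.mkdir(parents=True, exist_ok=True)
        payload = asdict(self)
        # JSON has no datetime type, so store an ISO-8601 string (assumed format).
        payload["created_at"] = self.created_at.isoformat()
        path.write_text(json.dumps(payload, indent=2))
        return path

    @classmethod
    def load(cls, path):
        """Read a JSON file written by save() and rebuild the object."""
        payload = json.loads(Path(path).read_text())
        payload["created_at"] = datetime.fromisoformat(payload["created_at"])
        return cls(**payload)


original = ResultSketch(
    "rmsf", "demo", ["100% SBMA", "No Polymer"], datetime(2024, 1, 1, 12, 0)
)
with tempfile.TemporaryDirectory() as tmp:
    saved_path = original.save(Path(tmp) / "results" / "rmsf.json")
    restored = ResultSketch.load(saved_path)
print(restored == original)
```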

get_condition(label)[source]

Get a condition by label.

Parameters: label (str) – Condition label.
Returns: The matching condition.
Return type: BaseConditionSummary
Raises: KeyError – If condition not found.

get_comparison(label)[source]

Get pairwise comparison for a condition vs control.

Parameters: label (str) – Treatment condition label.
Returns: The comparison, or None if not found.
Return type: PairwiseComparison or None
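The two accessors differ in how they report a miss: get_condition raises KeyError, while get_comparison returns None. A minimal sketch of that contract follows, written as free functions over plain dictionaries. Matching the label against condition_b is an assumption based on the "treatment" wording above, and the data is hypothetical.

```python
def get_condition(conditions, label):
    """Return the condition with this label; raise KeyError if absent."""
    for condition in conditions:
        if condition["label"] == label:
            return condition
    raise KeyError(f"Condition {label!r} not found")


def get_comparison(pairwise_comparisons, label):
    """Return the comparison for this treatment label, or None if absent."""
    for comparison in pairwise_comparisons:
        if comparison["condition_b"] == label:  # assumed matching rule
            return comparison
    return None


# Hypothetical stand-ins for the conditions and pairwise_comparisons fields.
conditions = [{"label": "No Polymer"}, {"label": "100% SBMA"}]
comparisons = [{"condition_a": "No Polymer", "condition_b": "100% SBMA"}]

print(get_condition(conditions, "No Polymer"))
print(get_comparison(comparisons, "50% SBMA"))
```

Callers therefore need a try/except around get_condition but only a None check after get_comparison.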

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.core.base.BaseComparator(config, analysis_settings, equilibration=None)[source]

Bases: ABC, Generic[TAnalysisSettings, TConditionData, TConditionSummary, TResult]

Abstract base class for all comparators using Template Method pattern.

The compare() method defines the comparison algorithm skeleton:

1. Load/compute analysis for each condition
2. Build condition summaries
3. Compute pairwise statistical comparisons
4. Compute ANOVA (if 3+ conditions)
5. Rank conditions
6. Build and return result

Subclasses implement the abstract methods to customize each step.

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (TAnalysisSettings) – Analysis-specific settings.
equilibration (str, optional) – Equilibration time override.

Type Parameters:

TAnalysisSettings – Type of analysis settings (e.g., RMSFAnalysisSettings).
TConditionData – Type of raw data loaded for each condition.
TConditionSummary – Type of condition summary (e.g., RMSFConditionSummary).
TResult – Type of comparison result (e.g., RMSFComparisonResult).

comparison_type: ClassVar[str] = 'base'

__init__(config, analysis_settings, equilibration=None)[source]

abstractmethod classmethod comparison_type_name()[source]

Return the comparison type identifier (e.g., “rmsf”, “contacts”).

Returns: Type identifier used in registry and CLI.
Return type: str

abstract property metric_type: MetricType

Declare whether this comparator’s metric is mean or variance-based.

This determines how autocorrelation is handled in the underlying analysis:

MEAN_BASED: Use all frames for computation, correct uncertainty using N_eff (effective sample size). Examples: average distance, contact fraction, catalytic triad proximity.
VARIANCE_BASED: Subsample to independent frames separated by 2τ (correlation time) to avoid bias in variance estimates. Examples: RMSF, fluctuation metrics.

Contributors implementing new comparators MUST declare the appropriate metric type to ensure correct statistical treatment per LiveCoMS best practices (Grossfield et al., 2018).

Returns: The metric type for this comparator.
Return type: MetricType

References

Grossfield et al. (2018) LiveCoMS 1:5067 (Best Practices for Uncertainty)
GitHub: dmzuckerman/Sampling-Uncertainty
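The two treatments described for metric_type can be sketched as follows. The enum and function names are hypothetical stand-ins. The relation N_eff = N / (2τ), with τ expressed in frames, is an assumption chosen to match the "separated by 2τ" wording above; this page does not state the formula polyzymd uses.

```python
import math
import statistics
from enum import Enum


class MetricTypeSketch(Enum):
    """Stand-in for MetricType; member values are placeholders."""

    MEAN_BASED = "mean_based"
    VARIANCE_BASED = "variance_based"


def mean_with_corrected_sem(frames, tau_frames):
    """MEAN_BASED: use every frame, but divide by sqrt(N_eff), not sqrt(N)."""
    n_eff = max(1.0, len(frames) / (2.0 * tau_frames))  # assumed N_eff formula
    return statistics.mean(frames), statistics.stdev(frames) / math.sqrt(n_eff)


def subsample_independent(frames, tau_frames):
    """VARIANCE_BASED: keep only frames separated by 2 * tau."""
    stride = max(1, round(2 * tau_frames))
    return frames[::stride]


# Hypothetical correlated series: 100 frames, correlation time of 5 frames.
frames = [float(i % 7) for i in range(100)]
mean_value, corrected_sem = mean_with_corrected_sem(frames, tau_frames=5)
independent = subsample_independent(frames, tau_frames=5)
print(len(independent), corrected_sem)
```

In this sketch the corrected SEM is larger than the naive one that divides by sqrt(N), which is the purpose of the correction.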

compare(recompute=False)[source]

Run comparison across all conditions (Template Method).

This method defines the algorithm skeleton. Subclasses customize behavior by implementing the abstract hook methods.

Parameters: recompute (bool, optional) – If True, force recompute even if cached results exist.
Returns: Complete comparison results with statistics and rankings.
Return type: TResult
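The Template Method structure of compare() can be sketched with a small stand-in class. The hook names (_load, _summarize, _compare_pair, _anova, _higher_is_better) are hypothetical, since this page does not list the real abstract methods. Comparing every condition against the first one is likewise a simplification made for the example.

```python
import statistics
from abc import ABC, abstractmethod


class ComparatorSketch(ABC):
    """Stand-in for a Template Method comparator; not polyzymd's BaseComparator."""

    def __init__(self, replicate_values_by_label):
        self.replicate_values_by_label = replicate_values_by_label

    def compare(self):
        # 1. Load/compute analysis for each condition.
        data = {label: self._load(label) for label in self.replicate_values_by_label}
        # 2. Build condition summaries.
        summaries = {label: self._summarize(values) for label, values in data.items()}
        # 3. Pairwise comparisons (here: each condition against the first one).
        labels = list(summaries)
        control = labels[0]
        pairwise = [self._compare_pair(control, other, data) for other in labels[1:]]
        # 4. ANOVA only when there are three or more conditions.
        anova = self._anova(data) if len(labels) >= 3 else None
        # 5. Rank conditions by the primary metric.
        ranking = sorted(
            labels, key=lambda label: summaries[label], reverse=self._higher_is_better()
        )
        # 6. Build and return the result.
        return {
            "conditions": summaries,
            "pairwise_comparisons": pairwise,
            "anova": anova,
            "ranking": ranking,
        }

    def _load(self, label):
        return self.replicate_values_by_label[label]

    @abstractmethod
    def _summarize(self, values): ...

    @abstractmethod
    def _compare_pair(self, control, other, data): ...

    @abstractmethod
    def _anova(self, data): ...

    @abstractmethod
    def _higher_is_better(self): ...


class MeanDifferenceComparator(ComparatorSketch):
    """Hypothetical subclass that fills in the hooks with simple means."""

    def _summarize(self, values):
        return statistics.mean(values)

    def _compare_pair(self, control, other, data):
        difference = statistics.mean(data[other]) - statistics.mean(data[control])
        return {"condition_a": control, "condition_b": other, "difference": difference}

    def _anova(self, data):
        return {"n_groups": len(data)}

    def _higher_is_better(self):
        return False  # e.g. lower fluctuation ranks first


result = MeanDifferenceComparator(
    {"No Polymer": [1.2, 1.1, 1.3], "100% SBMA": [0.9, 1.0, 0.8]}
).compare()
print(result["ranking"], result["anova"])
```

The skeleton in compare() never changes; a new comparator only overrides the hooks, which is the Open-Closed property the module description refers to.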

Registry for comparator types.

This module provides extensible infrastructure for registering comparator types following the Open-Closed Principle (OCP). New comparators can be added by registering with the ComparatorRegistry without modifying core code.

Example

Registering a new comparator:

>>> from polyzymd.compare.core.registry import ComparatorRegistry
>>> from polyzymd.compare.core.base import BaseComparator
>>>
>>> @ComparatorRegistry.register("my_metric")
... class MyComparator(BaseComparator):
...     @classmethod
...     def comparison_type_name(cls) -> str:
...         return "my_metric"
...     ...
>>>
>>> # Create comparator instance via registry
>>> comparator = ComparatorRegistry.create("my_metric", config, settings)

class polyzymd.compare.core.registry.ComparatorRegistry[source]

Bases: object

Registry for comparator implementations.

Allows new comparators to be registered without modifying core code. Use the register decorator to add new comparator classes.

Examples

>>> @ComparatorRegistry.register("rmsf")
... class RMSFComparator(BaseComparator):
...     ...
>>>
>>> # List available comparators
>>> ComparatorRegistry.list_available()
['contacts', 'rmsf', 'triad']
>>>
>>> # Create comparator instance
>>> comparator = ComparatorRegistry.create("rmsf", config, settings)

classmethod register(name=None)[source]

Decorator to register a comparator class.

Parameters: name (str, optional) – Registry key. If None, uses the class’s comparison_type_name().
Returns: Decorator function.
Return type: Callable

Examples

>>> @ComparatorRegistry.register("rmsf")
... class RMSFComparator(BaseComparator):
...     @classmethod
...     def comparison_type_name(cls) -> str:
...         return "rmsf"

classmethod get(name)[source]

Get comparator class by name.

Parameters: name (str) – Comparator type identifier.
Returns: The registered comparator class.
Return type: Type[BaseComparator]
Raises: ValueError – If the comparator type is not registered.

classmethod list_available()[source]

List all registered comparator types.

Returns: Sorted list of registered type names.
Return type: list[str]

classmethod is_registered(name)[source]

Check if a comparator type is registered.

Parameters: name (str) – Comparator type identifier.
Returns: True if registered, False otherwise.
Return type: bool

classmethod create(name, config, analysis_settings, equilibration=None, **kwargs)[source]

Factory to create a comparator instance.

Parameters:

name (str) – Comparator type identifier.
config (ComparisonConfig) – Comparison configuration.
analysis_settings – Analysis-specific settings.
equilibration (str, optional) – Equilibration time override.
**kwargs – Additional comparator-specific arguments.

Returns:

Configured comparator instance.

Return type:

BaseComparator

classmethod clear()[source]

Clear the registry (for testing purposes).
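To make the documented behaviour concrete, here is a minimal sketch of how a decorator-based registry with this interface could be written. RegistrySketch and ToyComparator are hypothetical; the real ComparatorRegistry may store, validate, or normalize names differently.

```python
class RegistrySketch:
    """Stand-in mirroring the documented classmethods; not the real registry."""

    _comparators = {}

    @classmethod
    def register(cls, name=None):
        def decorator(comparator_cls):
            # Fall back to the class's own identifier when no name is given.
            key = name if name is not None else comparator_cls.comparison_type_name()
            cls._comparators[key] = comparator_cls
            return comparator_cls

        return decorator

    @classmethod
    def get(cls, name):
        try:
            return cls._comparators[name]
        except KeyError:
            available = ", ".join(cls.list_available()) or "none"
            raise ValueError(
                f"Unknown comparator type {name!r}; available: {available}"
            ) from None

    @classmethod
    def list_available(cls):
        return sorted(cls._comparators)

    @classmethod
    def is_registered(cls, name):
        return name in cls._comparators

    @classmethod
    def create(cls, name, config, analysis_settings, equilibration=None, **kwargs):
        comparator_cls = cls.get(name)
        return comparator_cls(
            config, analysis_settings, equilibration=equilibration, **kwargs
        )

    @classmethod
    def clear(cls):
        cls._comparators.clear()


@RegistrySketch.register()
class ToyComparator:
    """Hypothetical comparator with the documented constructor arguments."""

    def __init__(self, config, analysis_settings, equilibration=None):
        self.config = config
        self.analysis_settings = analysis_settings
        self.equilibration = equilibration

    @classmethod
    def comparison_type_name(cls):
        return "toy"


comparator = RegistrySketch.create("toy", {"name": "demo"}, {}, equilibration="10ns")
print(RegistrySketch.list_available(), comparator.equilibration)
```

Because registration happens when the decorated class is defined, a comparator module must be imported before its name can be resolved through the registry.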

Configuration

Configuration schema for comparison projects.

This module defines the YAML schema for comparison.yaml files that specify which simulation conditions to compare.

The schema has two main sections:

analysis_settings: Defines WHAT analyses to run (shared across conditions)
comparison_settings: Defines HOW to compare (statistical parameters)

Both sections use a registry-based approach for extensibility. New analysis types can be added by registering with AnalysisSettingsRegistry and ComparisonSettingsRegistry (see polyzymd.compare.settings).

class polyzymd.compare.config.ConditionConfig(*, label, config, replicates)[source]

Bases: BaseModel

Configuration for one condition in a comparison.

Variables:

label (str) – Display name for this condition (e.g., “No Polymer”, “100% SBMA”)
config (Path) – Path to the simulation’s config.yaml file
replicates (list[int]) – List of replicate numbers to include in the analysis

label: str

config: Path

replicates: list[int]

classmethod resolve_path(v)[source]

Convert string paths to Path objects.

classmethod ensure_list(v)[source]

Ensure replicates is a list.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.AnalysisSettingsContainer(**data)[source]

Bases: BaseModel

Container for analysis settings (WHAT to analyze).

Uses dynamic attribute access to support any registered analysis type without hardcoding field names.

model_config = {'extra': 'allow'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

__init__(**data)[source]

Initialize with dynamic analysis settings.

Parameters: **data (Any) – Analysis settings keyed by analysis type name.

get(analysis_type)[source]

Get settings for a specific analysis type.

Parameters: analysis_type (str) – Analysis type identifier (e.g., “rmsf”, “contacts”).
Returns: Settings for the analysis type, or None if not configured.
Return type: BaseAnalysisSettings or None

get_enabled_analyses()[source]

Get list of enabled analysis types.

Returns: Names of configured analyses (presence implies enabled).
Return type: list[str]

Notes

Uses actual model data from comparison.yaml rather than relying on a registry. This makes comparison.yaml the source of truth for which analyses are enabled.

to_analysis_yaml_dict(replicates, eq_time)[source]

Convert to analysis.yaml-compatible dictionary.

Parameters:

replicates (list[int]) – Replicate numbers for the analysis.yaml.
eq_time (str) – Equilibration time for the analysis.yaml.

Returns:

Dictionary suitable for writing to analysis.yaml.

Return type:

dict[str, Any]

class polyzymd.compare.config.ComparisonSettingsContainer(**data)[source]

Bases: BaseModel

Container for comparison settings (HOW to compare).

Uses dynamic attribute access to support any registered comparison type. Each analysis type in analysis_settings must have a corresponding entry here (can be empty dict) to enable comparison.

model_config = {'extra': 'allow'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

__init__(**data)[source]

Initialize with dynamic comparison settings.

Parameters: **data (Any) – Comparison settings keyed by analysis type name.

get(analysis_type)[source]

Get settings for a specific comparison type.

Parameters: analysis_type (str) – Analysis type identifier (e.g., “rmsf”, “contacts”).
Returns: Comparison settings, or None if not configured.
Return type: BaseComparisonSettings or None

get_enabled_comparisons()[source]

Get list of enabled comparison types.

Returns: Names of configured comparisons.
Return type: list[str]
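The relationship between the two containers can be illustrated with plain dictionaries: an analysis is enabled by being present, and it is compared only if the comparison settings contain an entry under the same key, even an empty one. The helper names below are hypothetical, and the settings bodies are left empty because their fields are not documented on this page.

```python
def enabled_analyses(analysis_settings):
    """Presence of a key implies the analysis is enabled."""
    return list(analysis_settings)


def comparable_analyses(analysis_settings, comparison_settings):
    """Analyses that also have a comparison entry (an empty dict is enough)."""
    return [name for name in analysis_settings if name in comparison_settings]


# Hypothetical parsed sections of a comparison.yaml file.
analysis_settings = {"rmsf": {}, "contacts": {}, "triad": {}}
comparison_settings = {"rmsf": {}, "contacts": {}}

print(enabled_analyses(analysis_settings))
print(comparable_analyses(analysis_settings, comparison_settings))
```

In this example "triad" would be analysed but not compared, because it has no entry in the comparison section.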

class polyzymd.compare.config.RMSFPlotSettings(*, show_error=True, highlight_residues=<factory>, figsize_profile=(14, 4), figsize_comparison=(8, 6))[source]

Bases: BasePlotSettings

RMSF-specific plot customization.

Variables:

show_error (bool) – Show error bands/bars on plots (default True)
highlight_residues (list[int]) – Residue numbers to highlight with vertical lines (e.g., active site)
figsize_profile (tuple[float, float]) – Figure size for per-residue profile plots
figsize_comparison (tuple[float, float]) – Figure size for bar comparison plots

show_error: bool

highlight_residues: list[int]

figsize_profile: tuple[float, float]

figsize_comparison: tuple[float, float]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.TriadPlotSettings(*, generate_kde_panel=True, generate_bars=True, generate_2d_kde=False, threshold_line_color='red', kde_fill_alpha=0.7, kde_xlim=(0.0, 7.0), figsize_kde_panel=None, figsize_bars=(10, 6))[source]

Bases: BasePlotSettings

Triad-specific plot customization.

Variables:

generate_kde_panel (bool) – Generate multi-row KDE panel plot (default True)
generate_bars (bool) – Generate grouped threshold bar chart (default True)
generate_2d_kde (bool) – Generate 2D joint KDE plot (default False, more specialized)
threshold_line_color (str) – Color for threshold vertical line
kde_fill_alpha (float) – Transparency for KDE fill (0-1)
kde_xlim (tuple[float, float]) – X-axis limits for KDE panel in Angstroms (default (0, 7)).
figsize_kde_panel (tuple[float, float] | None) – Figure size for KDE panel (auto-calculated if None)
figsize_bars (tuple[float, float]) – Figure size for bar chart

generate_kde_panel: bool

generate_bars: bool

generate_2d_kde: bool

threshold_line_color: str

kde_fill_alpha: float

kde_xlim: tuple[float, float]

figsize_kde_panel: tuple[float, float] | None

figsize_bars: tuple[float, float]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.DistancesPlotSettings(*, show_threshold=True, use_kde=True, generate_state_bars=True, figsize=(10, 6))[source]

Bases: BasePlotSettings

Distance analysis plot customization.

Variables:

show_threshold (bool) – Show threshold line on distribution plots
use_kde (bool) – Use KDE instead of histogram for distributions
generate_state_bars (bool) – Generate per-pair state bar charts (above/below threshold). Each pair gets its own figure showing the fraction of frames in each state per condition. Default True.
figsize (tuple[float, float]) – Default figure size for distance plots

show_threshold: bool

use_kde: bool

generate_state_bars: bool

figsize: tuple[float, float]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.ContactsPlotSettings(*, figsize=(10, 8), generate_enrichment_heatmap=True, generate_enrichment_bars=True, figsize_enrichment_heatmap=None, figsize_enrichment_bars=(10, 6), enrichment_colormap='RdBu_r', show_enrichment_error=True, generate_system_coverage_heatmap=True, generate_system_coverage_bars=True, figsize_system_coverage_heatmap=None, figsize_system_coverage_bars=(10, 6), show_system_coverage_error=True, generate_user_partition_bars=True, figsize_user_partition_bars=(10, 6), show_user_partition_error=True, generate_contact_fraction_profile=True, figsize_contact_fraction_profile=(14, 5), show_contact_fraction_profile_error=True, contact_fraction_profile_threshold=None, generate_residence_time_profile=True, figsize_residence_time_profile=(14, 5), show_residence_time_profile_error=True, generate_cf_by_aa_class_bars=True, figsize_cf_by_aa_class_bars=(10, 6), show_cf_by_aa_class_error=True, generate_cf_by_partition_bars=True, figsize_cf_by_partition_bars=(10, 6), show_cf_by_partition_error=True, generate_rt_by_aa_class_bars=True, figsize_rt_by_aa_class_bars=(10, 6), show_rt_by_aa_class_error=True, generate_rt_by_partition_bars=True, figsize_rt_by_partition_bars=(10, 6), show_rt_by_partition_error=True, highlight_residues=<factory>)[source]

Bases: BasePlotSettings

Contacts analysis plot customization.

Variables:

figsize (tuple[float, float]) – Default figure size for contact plots
generate_enrichment_heatmap (bool) – Generate binding preference enrichment heatmap (default True)
generate_enrichment_bars (bool) – Generate binding preference bar charts (default True)
figsize_enrichment_heatmap (tuple[float, float] | None) – Figure size for enrichment heatmap (auto-calculated if None)
figsize_enrichment_bars (tuple[float, float]) – Figure size for enrichment bar charts
enrichment_colormap (str) – Colormap for enrichment heatmap (diverging recommended)
show_enrichment_error (bool) – Show error bars on enrichment bar charts (default True)
generate_system_coverage_heatmap (bool) – Generate system coverage enrichment heatmap (default True)
generate_system_coverage_bars (bool) – Generate system coverage bar charts (default True)
figsize_system_coverage_heatmap (tuple[float, float] | None) – Figure size for system coverage heatmap (auto-calculated if None)
figsize_system_coverage_bars (tuple[float, float]) – Figure size for system coverage bar charts
show_system_coverage_error (bool) – Show error bars on system coverage bar charts (default True)
generate_user_partition_bars (bool) – Generate user-defined partition bar charts (default True)
figsize_user_partition_bars (tuple[float, float]) – Figure size for user-defined partition bar charts
show_user_partition_error (bool) – Show error bars on user-defined partition bar charts (default True)
generate_contact_fraction_profile (bool) – Generate per-residue contact fraction line plot (default True)
figsize_contact_fraction_profile (tuple[float, float]) – Figure size for contact fraction profile plot
show_contact_fraction_profile_error (bool) – Show SEM fill_between bands on contact fraction profile (default True)
contact_fraction_profile_threshold (float or None) – If set, draw a horizontal threshold line on the contact fraction profile. Residues above this value are considered “high contact”.
generate_residence_time_profile (bool) – Generate per-residue mean residence time line plot (default True)
figsize_residence_time_profile (tuple[float, float]) – Figure size for residence time profile plot
show_residence_time_profile_error (bool) – Show SEM fill_between bands on residence time profile (default True)
generate_cf_by_aa_class_bars (bool) – Generate contact fraction by AA class grouped bar chart (default True)
figsize_cf_by_aa_class_bars (tuple[float, float]) – Figure size for contact fraction by AA class bar chart
show_cf_by_aa_class_error (bool) – Show error bars on contact fraction by AA class bar chart (default True)
generate_cf_by_partition_bars (bool) – Generate contact fraction by user-defined partition bar charts (default True)
figsize_cf_by_partition_bars (tuple[float, float]) – Figure size for contact fraction by partition bar charts
show_cf_by_partition_error (bool) – Show error bars on contact fraction by partition bar charts (default True)
generate_rt_by_aa_class_bars (bool) – Generate residence time by AA class grouped bar chart (default True)
figsize_rt_by_aa_class_bars (tuple[float, float]) – Figure size for residence time by AA class bar chart
show_rt_by_aa_class_error (bool) – Show error bars on residence time by AA class bar chart (default True)
generate_rt_by_partition_bars (bool) – Generate residence time by user-defined partition bar charts (default True)
figsize_rt_by_partition_bars (tuple[float, float]) – Figure size for residence time by partition bar charts
show_rt_by_partition_error (bool) – Show error bars on residence time by partition bar charts (default True)
highlight_residues (list[int]) – Residue IDs to mark with vertical dashed lines on profile plots. Useful for highlighting active-site residues or known anchor points.

figsize: tuple[float, float]

generate_enrichment_heatmap: bool

generate_enrichment_bars: bool

figsize_enrichment_heatmap: tuple[float, float] | None

figsize_enrichment_bars: tuple[float, float]

enrichment_colormap: str

show_enrichment_error: bool

generate_system_coverage_heatmap: bool

generate_system_coverage_bars: bool

figsize_system_coverage_heatmap: tuple[float, float] | None

figsize_system_coverage_bars: tuple[float, float]

show_system_coverage_error: bool

generate_user_partition_bars: bool

figsize_user_partition_bars: tuple[float, float]

show_user_partition_error: bool

generate_contact_fraction_profile: bool

figsize_contact_fraction_profile: tuple[float, float]

show_contact_fraction_profile_error: bool

contact_fraction_profile_threshold: float | None

generate_residence_time_profile: bool

figsize_residence_time_profile: tuple[float, float]

show_residence_time_profile_error: bool

generate_cf_by_aa_class_bars: bool

figsize_cf_by_aa_class_bars: tuple[float, float]

show_cf_by_aa_class_error: bool

generate_cf_by_partition_bars: bool

figsize_cf_by_partition_bars: tuple[float, float]

show_cf_by_partition_error: bool

generate_rt_by_aa_class_bars: bool

figsize_rt_by_aa_class_bars: tuple[float, float]

show_rt_by_aa_class_error: bool

generate_rt_by_partition_bars: bool

figsize_rt_by_partition_bars: tuple[float, float]

show_rt_by_partition_error: bool

highlight_residues: list[int]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.BFEPlotSettings(*, generate_heatmap=True, generate_bars=True, figsize_heatmap=None, figsize_bars=(10, 6), colormap='RdBu_r', show_error_bars=True, annotate_heatmap=True)[source]

Bases: BasePlotSettings

Binding free energy plot customization.

Variables:

generate_heatmap (bool) – Generate ΔG_sel heatmap (rows = AA groups, columns = conditions). Default True.
generate_bars (bool) – Generate ΔG_sel grouped bar chart (one bar per condition per AA group). Default True.
figsize_heatmap (tuple[float, float] | None) – Figure size for ΔG_sel heatmap (auto-calculated if None).
figsize_bars (tuple[float, float]) – Figure size for ΔG_sel bar charts.
colormap (str) – Diverging colormap for heatmap (default “RdBu_r”: red = avoidance, blue = preference).
show_error_bars (bool) – Show SEM error bars on bar charts. Default True.
annotate_heatmap (bool) – Annotate each heatmap cell with its ΔG_sel value. Default True.

generate_heatmap: bool

generate_bars: bool

figsize_heatmap: tuple[float, float] | None

figsize_bars: tuple[float, float]

colormap: str

show_error_bars: bool

annotate_heatmap: bool

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.AffinityPlotSettings(*, generate_stacked_bars=True, generate_group_bars=True, figsize_stacked=(10, 6), figsize_group_bars=(10, 6), show_error_bars=True)[source]

Bases: BasePlotSettings

Polymer affinity score plot customization.

Variables:

generate_stacked_bars (bool) – Generate stacked bar chart of total score by condition, broken down by polymer type. Default True.
generate_group_bars (bool) – Generate grouped bar chart showing per-group contributions across conditions. Default True.
figsize_stacked (tuple[float, float]) – Figure size for stacked bar chart.
figsize_group_bars (tuple[float, float]) – Figure size for grouped bar charts.
show_error_bars (bool) – Show SEM error bars on plots. Default True.

generate_stacked_bars: bool

generate_group_bars: bool

figsize_stacked: tuple[float, float]

figsize_group_bars: tuple[float, float]

show_error_bars: bool

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.SSPlotSettings(*, generate_timeline=True, generate_content_bars=True, generate_individual_bars=True, generate_diff_heatmap=True, figsize_timeline=(14, 6), figsize_content_bars=(10, 6), figsize_diff_heatmap=None, diff_colormap='RdBu_r')[source]

Bases: BasePlotSettings

Secondary structure plot customization.

Variables:

generate_timeline (bool) – Generate per-condition residue x time SS heatmap. Default True.
generate_content_bars (bool) – Generate grouped bar chart of helix/strand/coil fractions. Default True.
generate_individual_bars (bool) – Generate one bar chart per SS type (helix, beta-sheet, no-SS). Default True.
generate_diff_heatmap (bool) – Generate condition x residue persistence difference heatmap. Default True.
figsize_timeline (tuple[float, float]) – Figure size for timeline heatmap.
figsize_content_bars (tuple[float, float]) – Figure size for content bar chart.
figsize_diff_heatmap (tuple[float, float] | None) – Figure size for difference heatmap (auto-calculated if None).
diff_colormap (str) – Diverging colormap for difference heatmap.

generate_timeline: bool

generate_content_bars: bool

generate_individual_bars: bool

generate_diff_heatmap: bool

figsize_timeline: tuple[float, float]

figsize_content_bars: tuple[float, float]

figsize_diff_heatmap: tuple[float, float] | None

diff_colormap: str

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.PlotTheme(*, title_fontsize=13, suptitle_fontsize=14, label_fontsize=11, tick_fontsize=9, legend_fontsize=9, annotation_fontsize=9, small_fontsize=8, tiny_fontsize=7, bar_alpha=0.85, bar_edgecolor='black', bar_linewidth=0.5, bar_capsize=4, dot_size=18, dot_alpha=0.7, dot_color='black', line_alpha=0.8, fill_alpha=0.25, reference_line_color='black', reference_line_style='--', reference_line_width=1.5, highlight_line_alpha=0.5, hide_top_spine=True, hide_right_spine=True, title_fontweight='bold', legend_loc='center left', legend_bbox=(1.02, 0.5), show_watermark=True)[source]

Bases: BaseModel

Centralized visual defaults for all comparison plots.

Replaces ~219 hardcoded style values (font sizes, alphas, line widths, marker sizes, spine visibility, etc.) across all plotter files with a single configurable Pydantic model.

Three presets are available via class methods:

PlotTheme.publication() — default; print-ready sizes and weights.
PlotTheme.presentation() — ~1.3x larger fonts/dots/lines for slides.
PlotTheme.minimal() — no dots, no bar edges, thinner lines.

Users can override individual values in comparison.yaml:

plot_settings:
  style: "publication"
  theme:
    title_fontsize: 16
    dot_size: 24

Parameters:

title_fontsize (int) – Font size for axes titles.
suptitle_fontsize (int) – Font size for figure suptitles.
label_fontsize (int) – Font size for axis labels (xlabel/ylabel).
tick_fontsize (int) – Font size for tick labels.
legend_fontsize (int) – Font size for legend entries.
annotation_fontsize (int) – Font size for heatmap cell annotations and inline text.
small_fontsize (int) – Font size for secondary annotations (e.g. SEM ± labels).
tiny_fontsize (int) – Font size for fine-grained annotations (e.g. residue IDs).
bar_alpha (float) – Opacity for bar chart fill.
bar_edgecolor (str) – Edge colour for bar outlines.
bar_linewidth (float) – Edge line width for bars.
bar_capsize (int) – Error bar cap size in points.
dot_size (int) – Marker size for replicate dot overlays (s= in scatter).
dot_alpha (float) – Opacity for replicate dots.
dot_color (str) – Colour for replicate dots.
line_alpha (float) – Opacity for line plots (e.g. RMSF profiles).
fill_alpha (float) – Opacity for fill_between bands (e.g. SEM regions).
reference_line_color (str) – Colour for horizontal/vertical reference lines.
reference_line_style (str) – Linestyle for reference lines (e.g. "--").
reference_line_width (float) – Line width for reference lines.
highlight_line_alpha (float) – Opacity for highlight / vertical reference lines.
hide_top_spine (bool) – Whether to hide the top axis spine.
hide_right_spine (bool) – Whether to hide the right axis spine.
title_fontweight (str) – Font weight for titles (e.g. "bold", "normal").
legend_loc (str) – Matplotlib legend location string (e.g. "center left"). Used with legend_bbox to place the legend outside the axes.
legend_bbox (tuple of float) – bbox_to_anchor for legend placement, relative to axes. Default (1.02, 0.5) places it just outside the right edge, vertically centred.
show_watermark (bool) – Whether to render a subtle “Made by PolyzyMD” watermark in the bottom-right corner of every saved figure. Default True.

title_fontsize: int

suptitle_fontsize: int

label_fontsize: int

tick_fontsize: int

legend_fontsize: int

annotation_fontsize: int

small_fontsize: int

tiny_fontsize: int

bar_alpha: float

bar_edgecolor: str

bar_linewidth: float

bar_capsize: int

dot_size: int

dot_alpha: float

dot_color: str

line_alpha: float

fill_alpha: float

reference_line_color: str

reference_line_style: str

reference_line_width: float

highlight_line_alpha: float

hide_top_spine: bool

hide_right_spine: bool

title_fontweight: str

legend_loc: str

legend_bbox: tuple[float, float]

show_watermark: bool

classmethod publication()[source]

Publication preset — default values, print-ready.

classmethod presentation()[source]

Presentation preset — ~1.3x larger fonts/dots/lines for slides.

classmethod minimal()[source]

Minimal preset — no dots, no bar edges, thinner lines.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.PlotSettings(*, output_dir=PosixPath('figures'), format='png', dpi=300, style='publication', color_palette='tab10', theme=<factory>, **data)[source]

Bases: BaseModel

Global plot settings for comparison.yaml.

Controls plot generation for all analyses. Per-analysis plot settings are discovered via PlotSettingsRegistry — any key in the YAML that matches a registered analysis type is parsed into the corresponding settings class. Unrecognised keys that are not global fields are logged and skipped.

Variables:

output_dir (Path) – Directory for generated plots (relative to comparison.yaml)
format (str) – Image format: “png”, “pdf”, or “svg”
dpi (int) – Resolution for raster formats (PNG)
style (str) – Plot style preset: “publication”, “presentation”, or “minimal”
color_palette (str) – Seaborn/matplotlib color palette name
theme (PlotTheme) – Resolved visual theme. Built from the style preset and any user overrides in the theme: YAML block.

Notes

Attribute access for any registered analysis type always succeeds: if the user did not provide that section in YAML, a default-constructed settings instance is returned. This means self.settings.rmsf.show_error is always safe, even when the YAML has no rmsf: block.

Examples

In comparison.yaml:

plot_settings:
  output_dir: "figures/"
  format: "png"
  dpi: 300
  style: "publication"

  rmsf:
    highlight_residues: [77, 133, 156]

  triad:
    generate_2d_kde: true

model_config = {'extra': 'allow'}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

output_dir: Path

format: str

dpi: int

style: str

color_palette: str

theme: PlotTheme

__init__(**data)[source]

Initialize with global fields and registry-discovered per-analysis settings.

Theme resolution: the style field selects a preset (publication, presentation, or minimal) and then any user-supplied theme: overrides are merged on top. This allows style: presentation with theme: {dot_size: 40} to use the presentation preset but override just the dot size.

Parameters:

**data (Any) – Plot settings from YAML. Keys matching registered analysis types are parsed into their settings classes; global keys are handled by Pydantic; unknown keys are logged and skipped.
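The preset-then-override resolution described above can be sketched in a few lines. This is a minimal illustration, not PolyzyMD's implementation: ThemeSketch, PRESETS, and resolve_theme are hypothetical names, and the numeric preset values are placeholders because this reference does not list the real ones.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical stand-in for PlotTheme with only two fields.
@dataclass(frozen=True)
class ThemeSketch:
    title_fontsize: int = 14  # placeholder default, not the real value
    dot_size: int = 20        # placeholder default, not the real value

# Placeholder presets keyed by the documented style names.
PRESETS = {
    "publication": ThemeSketch(),
    "presentation": ThemeSketch(title_fontsize=18, dot_size=26),
    "minimal": ThemeSketch(dot_size=0),
}

def resolve_theme(style: str, overrides: Optional[dict] = None) -> ThemeSketch:
    """Select the preset named by `style`, then apply user overrides on top."""
    base = PRESETS[style]
    return replace(base, **(overrides or {}))

# style: presentation with theme: {dot_size: 40}
theme = resolve_theme("presentation", {"dot_size": 40})
```

Only dot_size changes here; title_fontsize keeps the presentation preset's value.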

__getattr__(name)[source]

Fall back to default-constructed settings for registered types.

This ensures self.settings.rmsf.show_error works even when the user omitted the rmsf: block from their YAML.

Parameters:

name (str) – Attribute name.

Returns:

Default-constructed settings if name is a registered type.

Return type:

BasePlotSettings

Raises:

AttributeError – If name is not a registered plot settings type.

classmethod resolve_output_dir(v)[source]

Convert string paths to Path objects.

class polyzymd.compare.config.ComparisonConfig(*, name, description=None, control=None, conditions, defaults=<factory>, analysis_settings=<factory>, comparison_settings=<factory>, plot_settings=<factory>, source_path=None)[source]

Bases: BaseModel

Schema for comparison.yaml configuration files.

A comparison config defines multiple simulation conditions to compare, along with analysis settings and comparison-specific parameters.

The schema follows a three-section pattern:

analysis_settings: WHAT to analyze (shared across conditions)
comparison_settings: HOW to compare (statistical parameters)
plot_settings: HOW to visualize (plot customization)

Variables:

name (str) – Name of the comparison project
description (str, optional) – Description of what is being compared
control (str, optional) – Label of the control condition for relative comparisons
conditions (list[ConditionConfig]) – List of conditions to compare
defaults (AnalysisDefaults) – Default analysis parameters (equilibration_time)
analysis_settings (AnalysisSettingsContainer) – Analysis parameters (WHAT to analyze)
comparison_settings (ComparisonSettingsContainer) – Comparison parameters (HOW to compare)
plot_settings (PlotSettings) – Plot customization (HOW to visualize)

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> print(config.name)
Polymer Stabilization Study
>>> for cond in config.conditions:
...     print(f"{cond.label}: {cond.config}")
>>> print("Enabled analyses:", config.analysis_settings.get_enabled_analyses())
>>> rmsf_settings = config.analysis_settings.get("rmsf")
>>> if rmsf_settings:
...     print(f"RMSF selection: {rmsf_settings.selection}")

name: str

description: str | None

control: str | None

conditions: list[ConditionConfig]

defaults: AnalysisDefaults

analysis_settings: AnalysisSettingsContainer

comparison_settings: ComparisonSettingsContainer

plot_settings: PlotSettings

source_path: Path | None

classmethod parse_analysis_settings(v)[source]

Parse analysis_settings from dict or container.

classmethod parse_comparison_settings(v)[source]

Parse comparison_settings from dict or container.

validate_comparison_coverage()[source]

Validate that comparison_settings covers all analysis_settings.

Each analysis type in analysis_settings must have a corresponding entry in comparison_settings (can be empty {}).
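The coverage rule can be illustrated with plain dictionaries. This is a sketch under the assumption that both sections behave like mappings keyed by analysis type; missing_comparison_entries is a hypothetical helper, not part of the documented API.

```python
def missing_comparison_entries(analysis_settings: dict, comparison_settings: dict) -> list:
    """Return analysis types that lack a comparison_settings entry."""
    return sorted(set(analysis_settings) - set(comparison_settings))

analysis = {"rmsf": {"selection": "protein and name CA"}, "contacts": {"cutoff": 4.5}}
comparison = {"rmsf": {}}  # an empty mapping still counts as coverage

missing = missing_comparison_entries(analysis, comparison)
```

Here contacts would be reported as uncovered until a contacts entry (possibly empty) is added to comparison_settings.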

classmethod from_yaml(path)[source]

Load comparison config from YAML file.

Parameters:

path (Path or str) – Path to comparison.yaml file

Returns:

Loaded and validated configuration

Return type:

ComparisonConfig

Raises:

FileNotFoundError – If the config file doesn’t exist
ValidationError – If the config is invalid

to_yaml(path)[source]

Save comparison config to YAML file.

Parameters:

path (Path or str) – Output path for comparison.yaml

get_condition(label)[source]

Get a condition by its label.

Parameters:

label (str) – The condition label to find

Returns:

The matching condition

Return type:

ConditionConfig

Raises:

KeyError – If no condition with that label exists

validate_config()[source]

Validate the comparison configuration.

Returns:

List of error messages (empty if valid)

Return type:

list[str]

generate_analysis_yaml(condition)[source]

Generate analysis.yaml content for a specific condition.

Parameters:

condition (ConditionConfig) – The condition to generate analysis.yaml for.

Returns:

YAML content for the analysis.yaml file.

Return type:

str

generate_analysis_yaml_for_all()[source]

Generate analysis.yaml content for all conditions.

Returns:

Dictionary mapping condition labels to analysis.yaml content.

Return type:

dict[str, str]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

polyzymd.compare.config.generate_comparison_template(name, eq_time='10ns')[source]

Generate a template comparison.yaml file.

Parameters:

name (str) – Project name
eq_time (str) – Default equilibration time

Returns:

YAML template content

Return type:

str

Settings

Analysis and comparison settings for the comparison workflow.

This module defines the concrete settings classes for each analysis type, registered via the AnalysisSettingsRegistry and ComparisonSettingsRegistry.

Analysis Settings (WHAT to analyze):

RMSFAnalysisSettings: RMSF calculation parameters
DistancesAnalysisSettings: Distance pair monitoring parameters
CatalyticTriadAnalysisSettings: Active site distance analysis
ContactsAnalysisSettings: Polymer-protein contact parameters

Comparison Settings (HOW to compare):

RMSFComparisonSettings: (no comparison-specific params)
DistancesComparisonSettings: (no comparison-specific params)
CatalyticTriadComparisonSettings: (no comparison-specific params)
ContactsComparisonSettings: FDR, effect size, top residues

All settings classes are auto-registered on module import.

class polyzymd.compare.settings.RMSFAnalysisSettings(*, selection='protein and name CA', reference_mode='centroid', reference_frame=None, reference_file=None)[source]

Bases: BaseAnalysisSettings

RMSF analysis settings.

Variables:

selection (str) – MDAnalysis selection string for RMSF calculation.
reference_mode (str) – Reference structure mode: centroid, average, frame, or external.
reference_frame (int, optional) – Frame number if reference_mode is ‘frame’ (1-indexed).
reference_file (str, optional) – Path to external PDB file if reference_mode is ‘external’.

selection: str

reference_mode: str

reference_frame: int | None

reference_file: str | None

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_reference_mode(v)[source]

Validate reference mode is one of the allowed values.

validate_reference_params()[source]

Validate reference_frame and reference_file for their modes.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.RMSFComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for RMSF analysis.

Currently empty — all RMSF comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when RMSF-specific comparison parameters are needed (e.g., a per-residue significance threshold) without modifying the orchestrator or other comparison types.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.DistancePairSettings(*, label, selection_a, selection_b, threshold=None, below_label=None, above_label=None)[source]

Bases: BaseAnalysisSettings

Configuration for a single distance pair.

Variables:

label (str) – Human-readable label for this pair.
selection_a (str) – First atom/point selection.
selection_b (str) – Second atom/point selection.
threshold (float, optional) – Per-pair distance threshold (Angstroms). If None, uses the global threshold from DistancesAnalysisSettings.
below_label (str, optional) – Display label for the “below threshold” state (e.g. "Bound", "Closed"). When None, defaults to "Below {threshold}Å".
above_label (str, optional) – Display label for the “above threshold” state (e.g. "Unbound", "Open"). When None, defaults to "Above {threshold}Å".

label: str

selection_a: str

selection_b: str

threshold: float | None

below_label: str | None

above_label: str | None

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.DistancesAnalysisSettings(*, threshold=3.5, pairs=<factory>, use_pbc=True, align_trajectory=True, alignment_selection='protein and name CA', alignment_mode='centroid', alignment_frame=None)[source]

Bases: BaseAnalysisSettings

Distance analysis settings.

Variables:

threshold (float, optional) – Distance threshold for contact analysis (Angstroms).
pairs (list[DistancePairSettings]) – List of atom pairs to measure distances between.
use_pbc (bool) – Use PBC-aware minimum image distances. Default True.
align_trajectory (bool) – Align trajectory before distance calculation. Default True. When enabled, removes rotational drift and COM motion that can add noise to inter-domain distance measurements.
alignment_selection (str) – MDAnalysis selection for trajectory alignment. Default: “protein and name CA”.
alignment_mode (str) – Reference mode for alignment: “centroid”, “average”, or “frame”. Default: “centroid”.
alignment_frame (int, optional) – Reference frame (1-indexed) when alignment_mode=”frame”.

threshold: float | None

pairs: list[DistancePairSettings]

use_pbc: bool

align_trajectory: bool

alignment_selection: str

alignment_mode: str

alignment_frame: int | None

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_pairs(v)[source]

Ensure at least one pair is defined.

classmethod validate_alignment_mode(v)[source]

Validate alignment mode is one of the allowed values.

validate_alignment_frame_required()[source]

Ensure alignment_frame is provided when alignment_mode is ‘frame’.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

get_pair_selections()[source]

Get list of (selection_a, selection_b) tuples.

get_pair_labels()[source]

Get list of pair labels.

get_pair_thresholds()[source]

Get list of thresholds per pair, using global threshold as fallback.

Returns:

List of thresholds, one per pair. If a pair has no explicit threshold, the global threshold is used. If neither is set, None is returned.

Return type:

list[float | None]

get_alignment_config()[source]

Build an AlignmentConfig from these settings.

Returns:

Configuration for trajectory alignment, ready to pass to align_trajectory() or DistanceCalculator.

Return type:

AlignmentConfig

Notes

Import is done inside the method to avoid circular imports.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.DistancesComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for distance analysis.

Currently empty — all distance comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when distance-specific comparison parameters are needed (e.g., per-pair significance thresholds) without modifying the orchestrator or other comparison types.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.TriadPairSettings(*, label, selection_a, selection_b)[source]

Bases: BaseAnalysisSettings

Configuration for one distance pair in a catalytic triad/active site.

Variables:

label (str) – Human-readable label for this pair (e.g., “Asp133-His156”).
selection_a (str) – First atom/point selection.
selection_b (str) – Second atom/point selection.

label: str

selection_a: str

selection_b: str

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.CatalyticTriadAnalysisSettings(*, name, pairs, threshold=3.5, description=None)[source]

Bases: BaseAnalysisSettings

Catalytic triad/active site analysis settings.

Variables:

name (str) – Name of the triad/active site (e.g., “LipA_catalytic_triad”).
pairs (list[TriadPairSettings]) – Distance pairs to monitor.
threshold (float) – Distance threshold for contact/H-bond analysis (Angstroms).
description (str, optional) – Description of the active site.

name: str

pairs: list[TriadPairSettings]

threshold: float

description: str | None

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_pairs(v)[source]

Ensure at least one pair is defined.

property n_pairs: int: Number of distance pairs.

get_pair_selections()[source]

Get list of (selection_a, selection_b) tuples.

get_pair_labels()[source]

Get list of pair labels.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.CatalyticTriadComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for catalytic triad analysis.

Currently empty — all triad comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when triad-specific comparison parameters are needed (e.g., functional distance thresholds) without modifying the orchestrator or other comparison types.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.BindingPreferenceFieldsMixin(*, surface_exposure_threshold=0.2, enzyme_pdb_for_sasa=None, include_default_aa_groups=True, protein_groups=None, protein_partitions=None, polymer_type_selections=None)[source]

Bases: BaseAnalysisSettings

Shared fields for experimental binding-preference-derived analyses.

Both ContactsAnalysisSettings and BindingFreeEnergyAnalysisSettings need identical fields for surface exposure, protein grouping, and polymer type selection. This mixin provides them once, keeping defaults in sync.

Variables:

surface_exposure_threshold (float) – Relative SASA threshold for surface exposure (0.0-1.0).
enzyme_pdb_for_sasa (str, optional) – Path to enzyme PDB for SASA calculation.
include_default_aa_groups (bool) – Include default AA class groupings (aromatic, polar, etc.).
protein_groups (dict[str, list[int]], optional) – Custom protein groups as {name: [resid1, resid2, …]}.
protein_partitions (dict[str, list[str]], optional) – Custom partitions for system coverage comparison.
polymer_type_selections (dict[str, str], optional) – Custom polymer type selections as {name: “MDAnalysis selection”}.

surface_exposure_threshold: float

enzyme_pdb_for_sasa: str | None

include_default_aa_groups: bool

protein_groups: dict[str, list[int]] | None

protein_partitions: dict[str, list[str]] | None

polymer_type_selections: dict[str, str] | None

classmethod analysis_type()[source]

Return the analysis type identifier (override in subclass).

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.ContactsAnalysisSettings(*, surface_exposure_threshold=0.2, enzyme_pdb_for_sasa=None, include_default_aa_groups=True, protein_groups=None, protein_partitions=None, polymer_type_selections=None, polymer_selection='chainID C', protein_selection='protein', cutoff=4.5, polymer_types=None, grouping='aa_class', compute_residence_times=True, compute_binding_preference=False, enrichment_normalization='residue')[source]

Bases: BindingPreferenceFieldsMixin

Polymer-protein contact analysis settings.

Inherits binding preference fields (surface_exposure_threshold, enzyme_pdb_for_sasa, include_default_aa_groups, protein_groups, protein_partitions, polymer_type_selections) from BindingPreferenceFieldsMixin.

Variables:

polymer_selection (str) – MDAnalysis selection for polymer atoms.
protein_selection (str) – MDAnalysis selection for protein atoms.
cutoff (float) – Distance cutoff for contacts in Angstroms.
polymer_types (list[str], optional) – Filter contacts by polymer residue names.
grouping (str) – How to group protein residues: aa_class, secondary_structure, or none.
compute_residence_times (bool) – If True, compute residence time statistics.
compute_binding_preference (bool) – If True, compute binding preference enrichment analysis.
enrichment_normalization (str) – DEPRECATED (kept for backward compatibility). Enrichment is now always normalized by protein surface availability. This field is ignored.

polymer_selection: str

protein_selection: str

cutoff: float

polymer_types: list[str] | None

grouping: str

compute_residence_times: bool

compute_binding_preference: bool

enrichment_normalization: str

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_grouping(v)[source]

Validate grouping mode.

classmethod validate_enrichment_normalization(v)[source]

Validate enrichment normalization method.

validate_protein_partitions()[source]

Validate protein_partitions references and mutual exclusivity.

Validates:

1. All groups referenced in partitions exist in protein_groups.
2. Groups within each partition don’t overlap (mutually exclusive).
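Both checks can be sketched over plain dictionaries. The partition_errors helper is hypothetical, and it assumes that only custom protein_groups are consulted; the real validator may also accept the default AA groups and presumably raises rather than returning messages.

```python
def partition_errors(protein_groups: dict, protein_partitions: dict) -> list:
    """Collect messages for unknown group references and overlapping residues."""
    errors = []
    for part_name, group_names in protein_partitions.items():
        # Check 1: every referenced group must exist.
        unknown = [g for g in group_names if g not in protein_groups]
        if unknown:
            errors.append(f"{part_name}: unknown groups {unknown}")
            continue
        # Check 2: no residue may belong to two groups of the same partition.
        owner = {}
        for g in group_names:
            for resid in protein_groups[g]:
                if resid in owner:
                    errors.append(f"{part_name}: residue {resid} in both {owner[resid]} and {g}")
                else:
                    owner[resid] = g
    return errors

groups = {"aromatic": [10, 11], "polar": [11, 12], "charged": [20]}
partitions = {
    "ok": ["aromatic", "charged"],
    "overlap": ["aromatic", "polar"],  # residue 11 appears twice
    "bad_ref": ["missing"],            # group does not exist
}
errors = partition_errors(groups, partitions)
```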

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.ContactsComparisonSettings(*, fdr_alpha=0.05, min_effect_size=0.5, top_residues=10)[source]

Bases: BaseComparisonSettings

Comparison settings for polymer-protein contacts analysis.

Variables:

fdr_alpha (float) – False discovery rate alpha for Benjamini-Hochberg correction.
min_effect_size (float) – Minimum Cohen’s d effect size to highlight in reports.
top_residues (int) – Number of top residues (by effect size) to display in console.
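For readers unfamiliar with the correction that fdr_alpha controls, here is a minimal pure-Python sketch of the standard Benjamini-Hochberg step-up procedure. It illustrates the textbook method only; how PolyzyMD applies it internally is not shown in this reference, and benjamini_hochberg is a hypothetical name.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

In this example 0.03 and 0.04 would pass an uncorrected 0.05 cut-off but are not rejected after correction.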

fdr_alpha: float

min_effect_size: float

top_residues: int

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_fdr_alpha(v)[source]

Validate FDR alpha is in valid range.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.ExposureAnalysisSettings(*, protein_selection='protein', polymer_selection='chainID C', exposure_threshold=0.2, transient_lower=0.2, transient_upper=0.8, min_event_length=1, probe_radius_nm=0.14, n_sphere_points=960, protein_chain='A', polymer_resnames=None)[source]

Bases: BaseAnalysisSettings

Experimental exposure dynamics settings (dynamic SASA-based chaperone analysis).

Variables:

protein_selection (str) – MDAnalysis selection for protein atoms (chain A by default).
polymer_selection (str) – MDAnalysis selection for polymer atoms (chain C by default).
exposure_threshold (float) – Relative SASA threshold for classifying a residue as exposed.
transient_lower (float) – Lower bound of exposure fraction for “transient” classification.
transient_upper (float) – Upper bound of exposure fraction for “transient” classification.
min_event_length (int) – Minimum exposed-window length (frames) to count as an event.
probe_radius_nm (float) – Probe radius for MDTraj shrake_rupley, in nm.
n_sphere_points (int) – Number of sphere points for shrake_rupley.
protein_chain (str) – Chain letter for protein (default “A”).
polymer_resnames (list[str], optional) – Subset of polymer monomer resnames to include. If None, all detected.
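One plausible reading of min_event_length and the transient bounds is sketched below for a single residue's per-frame exposure flags. The function names, the "buried" and "exposed" labels, and the inclusive handling of the bounds are all assumptions made for illustration, not the documented behaviour.

```python
from itertools import groupby

def count_exposure_events(exposed, min_event_length=1):
    """Count runs of consecutive exposed frames at least min_event_length long."""
    return sum(
        1
        for is_exposed, run in groupby(exposed)
        if is_exposed and len(list(run)) >= min_event_length
    )

def classify_residue(exposed, transient_lower=0.2, transient_upper=0.8):
    """Label a residue by the fraction of frames in which it is exposed."""
    fraction = sum(exposed) / len(exposed)
    if fraction < transient_lower:
        return "buried"      # hypothetical label
    if fraction > transient_upper:
        return "exposed"     # hypothetical label
    return "transient"

# Ten frames with exposed runs of length 2, 1, and 3.
frames = [True, True, False, True, False, False, True, True, True, False]
events = count_exposure_events(frames, min_event_length=2)
label = classify_residue(frames)
```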

protein_selection: str

polymer_selection: str

exposure_threshold: float

transient_lower: float

transient_upper: float

min_event_length: int

probe_radius_nm: float

n_sphere_points: int

protein_chain: str

polymer_resnames: list[str] | None

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.ExposureComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for exposure dynamics analysis.

Currently empty — all exposure comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when exposure-specific comparison parameters are needed (e.g., transient classification thresholds) without modifying the orchestrator or other comparison types.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.BindingFreeEnergyAnalysisSettings(*, surface_exposure_threshold=0.2, enzyme_pdb_for_sasa=None, include_default_aa_groups=True, protein_groups=None, protein_partitions=None, polymer_type_selections=None, units='kT', compute_binding_preference=True)[source]

Bases: BindingPreferenceFieldsMixin

Experimental settings for binding free energy analysis via Boltzmann inversion.

Computes the selectivity free energy:

ΔG_sel = -k_B·T · ln(contact_share / expected_share)

where:

contact_share = fraction of polymer contacts directed at an AA group
expected_share = fraction of exposed surface belonging to that AA group
T = simulation temperature (from SimulationConfig)

This is a post-processing analysis that consumes binding preference results from the contacts analysis layer (no new per-frame computation is needed).
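As a worked instance of the formula, the sketch below evaluates ΔG_sel in kT units (so k_B·T = 1). The function name is hypothetical and the input shares are made-up numbers chosen for easy arithmetic.

```python
import math

def selectivity_free_energy_kT(contact_share: float, expected_share: float) -> float:
    """ΔG_sel in units of kT: -ln(contact_share / expected_share)."""
    return -math.log(contact_share / expected_share)

# A group receiving twice its surface-proportional share of contacts.
enriched = selectivity_free_energy_kT(contact_share=0.5, expected_share=0.25)

# A group contacted exactly in proportion to its exposed surface.
neutral = selectivity_free_energy_kT(contact_share=0.25, expected_share=0.25)
```

A ratio above 1 gives a negative ΔG_sel (preferred contacts), a ratio below 1 gives a positive value, and a ratio of exactly 1 gives zero.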

Inherits binding preference fields (surface_exposure_threshold, enzyme_pdb_for_sasa, include_default_aa_groups, protein_groups, protein_partitions, polymer_type_selections) from BindingPreferenceFieldsMixin.

Variables:

units (str) – Energy units for output. One of “kT” (dimensionless, in units of the thermal energy k_B·T), “kcal/mol”, or “kJ/mol”.
compute_binding_preference (bool) – Compute binding preference from contacts data when cached results are not found.

units: str

compute_binding_preference: bool

classmethod validate_units(v)[source]

Validate energy units.

classmethod analysis_type()[source]

Return the analysis type identifier.

k_b()[source]

Return k_B in the selected energy units.

Returns:

Boltzmann constant in kcal/(mol·K) or kJ/(mol·K). When units=’kT’, returns 0.0; callers should use kT=1.0 directly instead of k_b() * T.

Return type:

float

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

Returns:

Dictionary suitable for writing to analysis.yaml.

Return type:

dict

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.BindingFreeEnergyComparisonSettings(*, fdr_alpha=0.05)[source]

Bases: BaseComparisonSettings

Comparison settings for binding free energy analysis.

Variables:

fdr_alpha (float) – False discovery rate alpha for Benjamini-Hochberg correction of p-values across (polymer_type, AA_group) pairs.

fdr_alpha: float

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.PolymerAffinityScoreSettings(*, surface_exposure_threshold=0.2, enzyme_pdb_for_sasa=None, include_default_aa_groups=True, protein_groups=None, protein_partitions=None, polymer_type_selections=None, compute_binding_preference=True)[source]

Bases: BindingPreferenceFieldsMixin

Experimental settings for polymer affinity score analysis.

The polymer affinity score is a comparative metric that quantifies total polymer-protein interaction strength:

S = Σ_{p,g} N_{p,g} × ΔG_sel_{p,g} [kT]

where:

N = mean_contact_fraction × n_exposed_in_group
ΔG_sel = -ln(contact_share / expected_share)

This is a post-processing analysis that consumes binding preference results from the contacts analysis layer — no new per-frame computation is needed. All scores are in kT (dimensionless); the temperature factor cancels in the Boltzmann inversion ratio.

Important

This metric assumes thermodynamic independence of contacts. The absolute values are NOT rigorous binding free energies. Only relative differences between polymer compositions are meaningful (comparative ranking).
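The sum can be sketched directly from the two definitions. The dictionary keys below mirror the symbols in the formula, but the data layout and the affinity_score name are assumptions made for this example, and the numbers are invented.

```python
import math

def affinity_score(entries) -> float:
    """S = sum over (polymer type, AA group) pairs of N × ΔG_sel, in kT."""
    total = 0.0
    for e in entries:
        n = e["mean_contact_fraction"] * e["n_exposed_in_group"]
        dg_sel = -math.log(e["contact_share"] / e["expected_share"])
        total += n * dg_sel
    return total

entries = [
    # Enriched pair: share ratio 2, so ΔG_sel = -ln 2; N = 0.5 × 4 = 2.
    {"mean_contact_fraction": 0.5, "n_exposed_in_group": 4,
     "contact_share": 0.5, "expected_share": 0.25},
    # Neutral pair: share ratio 1, so it contributes nothing.
    {"mean_contact_fraction": 0.25, "n_exposed_in_group": 8,
     "contact_share": 0.25, "expected_share": 0.25},
]
score = affinity_score(entries)
```

Consistent with the note above, a score like this is only useful for ranking polymer compositions against each other.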

Inherits binding preference fields (surface_exposure_threshold, enzyme_pdb_for_sasa, include_default_aa_groups, protein_groups, protein_partitions, polymer_type_selections) from BindingPreferenceFieldsMixin.

Variables:

compute_binding_preference (bool) – Compute binding preference from contacts data when cached results are not found.

compute_binding_preference: bool

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

Returns:

Dictionary suitable for writing to analysis.yaml.

Return type:

dict

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.PolymerAffinityScoreComparisonSettings(*, fdr_alpha=0.05)[source]

Bases: BaseComparisonSettings

Comparison settings for polymer affinity score analysis.

Variables:

fdr_alpha (float) – False discovery rate alpha for Benjamini-Hochberg correction of pairwise p-values across conditions.

fdr_alpha: float

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.SecondaryStructureAnalysisSettings(*, chain_id='A')[source]

Bases: BaseAnalysisSettings

Secondary structure (DSSP) analysis settings.

Variables:

chain_id (str) – Chain letter for the protein to analyze (default “A”).

chain_id: str

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.SecondaryStructureComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for secondary structure analysis.

Currently empty — all secondary structure comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when SS-specific comparison parameters are needed without modifying the orchestrator.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

polyzymd.compare.settings.get_all_analysis_types()[source]

Get all registered analysis types.

Returns:

Sorted list of registered analysis type names.

Return type:

list[str]

polyzymd.compare.settings.get_all_comparison_types()[source]

Get all registered comparison settings types.

Returns:

Sorted list of registered comparison type names.

Return type:

list[str]

Statistics

Statistical tests for comparing simulation conditions.

This module provides statistical functions for comparing analysis results across multiple conditions, including t-tests, ANOVA, and effect sizes.

All functions use SciPy for statistical calculations.

class polyzymd.compare.statistics.TTestResult(t_statistic, p_value)[source]

Bases: object

Result of a two-sample t-test.

Variables:

t_statistic (float) – The t-statistic
p_value (float) – Two-tailed p-value

t_statistic: float

p_value: float

property significant: bool: Whether the result is significant at p < 0.05.

to_dict()[source]

Convert to dictionary.

__init__(t_statistic, p_value)
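
The attributes above describe a small value object. The following is a minimal sketch of that shape, assuming a plain dataclass; the class name `TTestResultSketch` and the choice to include the derived flag in `to_dict()` are illustrative assumptions, not the library source.

```python
from dataclasses import asdict, dataclass


@dataclass
class TTestResultSketch:
    """Illustrative stand-in for TTestResult; not the library class."""

    t_statistic: float
    p_value: float

    @property
    def significant(self) -> bool:
        # Documented threshold: significant when p < 0.05.
        return self.p_value < 0.05

    def to_dict(self) -> dict:
        # Assumption: the derived flag is serialized alongside the raw fields.
        return {**asdict(self), "significant": self.significant}


result = TTestResultSketch(t_statistic=5.51, p_value=0.012)
print(result.significant)
print(result.to_dict())
```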

class polyzymd.compare.statistics.EffectSize(cohens_d, interpretation, direction)[source]

Bases: object

Cohen’s d effect size with interpretation.

Variables:

cohens_d (float) – The effect size (positive = group1 > group2)
interpretation (str) – Categorical interpretation: “negligible”, “small”, “medium”, “large”
direction (str) – For RMSF: “stabilizing” (d > 0, lower RMSF) or “destabilizing” (d < 0)

cohens_d: float

interpretation: str

direction: str

to_dict()[source]

Convert to dictionary.

__init__(cohens_d, interpretation, direction)

class polyzymd.compare.statistics.ANOVAResult(f_statistic, p_value)[source]

Bases: object

Result of one-way ANOVA.

Variables:

f_statistic (float) – The F-statistic
p_value (float) – P-value for the test

f_statistic: float

p_value: float

property significant: bool: Whether the result is significant at p < 0.05.

to_dict()[source]

Convert to dictionary.

__init__(f_statistic, p_value)

polyzymd.compare.statistics.independent_ttest(group1, group2)[source]

Perform two-sample independent t-test.

Tests the null hypothesis that two independent samples have identical expected values.

Parameters:

group1 (array_like) – First group of values (e.g., control replicate means)
group2 (array_like) – Second group of values (e.g., treatment replicate means)

Returns:

Result containing t-statistic and p-value

Return type:

TTestResult

Examples

>>> from polyzymd.compare.statistics import independent_ttest
>>> control = [0.715, 0.693, 0.696]  # No polymer RMSF
>>> treatment = [0.517, 0.586]        # 100% SBMA RMSF
>>> result = independent_ttest(control, treatment)
>>> print(f"t = {result.t_statistic:.3f}, p = {result.p_value:.4f}")
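
For orientation, the statistic behind a two-sample test can be written out with the standard library alone. This sketch assumes the pooled-variance (Student) form; whether the library calls SciPy with equal variances or with Welch's correction is not stated here, and the helper name `pooled_t_statistic` is hypothetical. The two-tailed p-value would come from the t distribution with n1 + n2 - 2 degrees of freedom, which `scipy.stats.ttest_ind` evaluates.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group1, group2):
    """Student's two-sample t-statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    # Pooled sample variance, weighted by each group's degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error


control = [0.715, 0.693, 0.696]
treatment = [0.517, 0.586]
t_stat = pooled_t_statistic(control, treatment)
print(f"t = {t_stat:.3f}")
```

Swapping the argument order flips the sign of the statistic, so the group passed first acts as the reference for the direction of the difference.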

polyzymd.compare.statistics.cohens_d(group1, group2, rmsf_mode=True)[source]

Compute Cohen’s d effect size.

Cohen’s d is the difference between means divided by the pooled standard deviation. A positive d means group1 has higher values.

For RMSF comparisons (rmsf_mode=True), direction is interpreted as:

d > 0 (control > treatment) = “stabilizing” (treatment reduces RMSF)
d < 0 (control < treatment) = “destabilizing” (treatment increases RMSF)

Parameters:

group1 (array_like) – First group (typically control)
group2 (array_like) – Second group (typically treatment)
rmsf_mode (bool, optional) – If True, interpret direction for RMSF (lower = better). Default is True.

Returns:

Effect size with interpretation

Return type:

EffectSize

Notes

Effect size interpretation (Cohen, 1988):

|d| < 0.2: negligible
0.2 <= |d| < 0.5: small
0.5 <= |d| < 0.8: medium
|d| >= 0.8: large
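
The definition and thresholds above can be sketched as follows. This is a standard-library illustration, not the library implementation; the helper names are hypothetical, and the handling of d == 0 (which the text does not specify) is an assumption.

```python
import math
from statistics import mean, variance


def interpret_effect_size(d):
    """Map |d| onto the Cohen (1988) categories listed above."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"


def cohens_d_sketch(group1, group2):
    """Difference of means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


control = [0.715, 0.693, 0.696]
treatment = [0.517, 0.586]
d = cohens_d_sketch(control, treatment)
# RMSF convention from the text: d > 0 means the treatment lowered RMSF.
# Assumption: d == 0 falls into the "destabilizing" branch here.
direction = "stabilizing" if d > 0 else "destabilizing"
print(f"d = {d:.2f} ({interpret_effect_size(d)}, {direction})")
```

With only two or three replicates per group the pooled standard deviation is itself poorly determined, so very large values of d should be read as qualitative.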

polyzymd.compare.statistics.one_way_anova(*groups)[source]

Perform one-way ANOVA across multiple groups.

Tests the null hypothesis that all groups have the same mean.

Parameters:: *groups (array_like) – Variable number of groups to compare
Returns:: Result containing F-statistic and p-value
Return type:: ANOVAResult

Examples

>>> from polyzymd.compare.statistics import one_way_anova
>>> no_poly = [0.715, 0.693, 0.696]
>>> sbma = [0.517, 0.586]
>>> egma = [0.558, 0.738, 0.496]
>>> result = one_way_anova(no_poly, sbma, egma)
>>> print(f"F = {result.f_statistic:.3f}, p = {result.p_value:.4f}")
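
The F-statistic itself is the ratio of the between-group to the within-group mean square. The sketch below computes it with the standard library only, as an illustration of the quantity reported; the function name `f_statistic` is hypothetical, and the p-value (from the F distribution with k - 1 and N - k degrees of freedom) is left to SciPy's `scipy.stats.f_oneway`.

```python
from statistics import mean


def f_statistic(*groups):
    """One-way ANOVA F-statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


no_poly = [0.715, 0.693, 0.696]
sbma = [0.517, 0.586]
egma = [0.558, 0.738, 0.496]
print(f"F = {f_statistic(no_poly, sbma, egma):.3f}")
```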

polyzymd.compare.statistics.percent_change(control_mean, treatment_mean)[source]

Calculate percent change from control.

Parameters:

control_mean (float) – Mean value of control condition
treatment_mean (float) – Mean value of treatment condition

Returns:

Percent change: (treatment - control) / control * 100. Negative = reduction, positive = increase.

Return type:

float
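
A minimal sketch of the formula above. The function name is hypothetical, and raising on a zero control mean is an assumption; the library's behavior in that case is not documented here.

```python
def percent_change_sketch(control_mean, treatment_mean):
    """Percent change of the treatment mean relative to the control mean."""
    if control_mean == 0:
        # Assumption: an undefined ratio is reported as an error rather than inf/NaN.
        raise ValueError("control_mean must be non-zero")
    return (treatment_mean - control_mean) / control_mean * 100


# Mean RMSF drops from 0.70 to 0.56: a 20% reduction.
print(f"{percent_change_sketch(0.70, 0.56):+.1f}%")
```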

Comparators

Contacts

Contacts comparator for comparing polymer-protein contacts across conditions.

This module provides the ContactsComparator class that orchestrates contacts analysis and statistical comparison across multiple conditions.

Key features:

Aggregate-level comparisons (coverage, mean contact fraction)
Effect size (Cohen’s d) for practical significance
ANOVA for 3+ conditions
Auto-exclusion of conditions without polymer (e.g., “No Polymer” controls)

The comparator inherits from BaseComparator and implements the Template Method pattern for DRY comparison logic. Since contacts has TWO primary metrics (coverage and mean_contact_fraction), some methods are customized.

Note

Per-residue pairwise comparisons have been removed. Contact data is mechanistic (explains WHY stability changes), not an observable. Per-residue contact-RMSF correlations are computed in polyzymd compare report.

class polyzymd.compare.comparators.contacts.ContactsComparator(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

Bases: BaseComparator[ContactsAnalysisSettings, dict[str, Any], ContactsConditionSummary, ContactsComparisonResult]

Compare polymer-protein contacts across multiple simulation conditions.

This class loads contacts analysis results for each condition (computing them if necessary), then performs statistical comparisons including:

Aggregate-level comparisons (coverage, mean contact fraction)
ANOVA for 3+ conditions
Effect sizes (Cohen’s d) for practical significance

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (ContactsAnalysisSettings) – Settings defining what contacts to analyze (selections, cutoff).
comparison_settings (ContactsComparisonSettings, optional) – Settings for how to compare (FDR alpha, effect sizes). Defaults to ContactsComparisonSettings() if not provided.
equilibration (str, optional) – Equilibration time override (e.g., “10ns”). If None, uses config.defaults.equilibration_time.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> analysis_settings = config.analysis_settings.get("contacts")
>>> comparison_settings = config.comparison_settings.get("contacts")
>>> comparator = ContactsComparator(config, analysis_settings, comparison_settings)
>>> result = comparator.compare()
>>> print(result.ranking_by_coverage)
['100% SBMA', '50/50 Mix', '100% EGMA']

Notes

Higher contact fraction is considered “better” (more polymer-protein interaction)
Conditions without polymer atoms are automatically excluded
This is a MEAN_BASED metric (contact fractions are averages)

comparison_type: ClassVar[str] = 'contacts'

__init__(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Contact fraction is a mean-based metric.

Contact fraction is the average fraction of frames where a residue is in contact with the polymer. This is an average over frames, so the mean converges regardless of autocorrelation. However, we need to correct uncertainty using N_eff (effective sample size).

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run comparison across all conditions.

Overrides base to handle contacts-specific logic:

Dual metrics (coverage and mean_contact_fraction)
Auto-exclusion of no-polymer conditions
Custom result building

Parameters:: recompute (bool, optional) – If True, force recompute even if cached results exist.
Returns:: Complete comparison results with statistics and rankings.
Return type:: ContactsComparisonResult

RMSF

RMSF comparator for comparing flexibility across conditions.

This module provides the RMSFComparator class that orchestrates RMSF analysis and statistical comparison across multiple conditions.

The comparator inherits from BaseComparator and implements the Template Method pattern for DRY comparison logic.

class polyzymd.compare.comparators.rmsf.RMSFComparator(config, analysis_settings, equilibration=None, selection_override=None, reference_mode_override=None, reference_frame_override=None, reference_file_override=None)[source]

Bases: BaseComparator[RMSFAnalysisSettings, dict[str, Any], RMSFConditionSummary, RMSFComparisonResult]

Compare RMSF across multiple simulation conditions.

This class loads RMSF results for each condition (computing them if necessary), then performs statistical comparisons including t-tests, ANOVA, and effect size calculations.

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (RMSFAnalysisSettings) – RMSF analysis settings (from config.analysis_settings.get("rmsf")).
equilibration (str, optional) – Equilibration time override (e.g., “10ns”). If None, uses config.defaults.equilibration_time.
selection_override (str, optional) – Override for atom selection (requires --override flag on CLI).
reference_mode_override (str, optional) – Override for reference mode (requires --override flag on CLI).
reference_frame_override (int, optional) – Override for reference frame (requires --override flag on CLI).
reference_file_override (str, optional) – Override for external reference PDB file path (requires --override flag on CLI). Used when reference_mode is “external”.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> rmsf_settings = config.analysis_settings.get("rmsf")
>>> comparator = RMSFComparator(config, rmsf_settings, equilibration="10ns")
>>> result = comparator.compare()
>>> print(result.ranking)
['100% SBMA', '100% EGMA', 'No Polymer', '50/50 Mix']

comparison_type: ClassVar[str] = 'rmsf'

__init__(config, analysis_settings, equilibration=None, selection_override=None, reference_mode_override=None, reference_frame_override=None, reference_file_override=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

RMSF is a variance-based metric.

RMSF measures root-mean-square fluctuations, which are inherently variance-based. Correlated frames lead to biased variance estimates, so independent subsampling (2τ separation) is required for accurate uncertainty quantification.

Returns:: MetricType.VARIANCE_BASED
Return type:: MetricType
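
The 2τ subsampling rule can be illustrated with a few lines. This sketch assumes τ is already known in units of frames; the function name `independent_frame_indices` is hypothetical, and how the library estimates τ is not shown here.

```python
import math


def independent_frame_indices(n_frames, tau_frames):
    """Indices of frames spaced at least 2*tau apart (tau in units of frames)."""
    if tau_frames < 0:
        raise ValueError("tau_frames must be non-negative")
    # A stride of at least one frame keeps the range well-defined as tau -> 0.
    stride = max(1, math.ceil(2 * tau_frames))
    return list(range(0, n_frames, stride))


# 1000 frames with a correlation time of 12.5 frames -> stride of 25 frames.
indices = independent_frame_indices(1000, 12.5)
print(len(indices), indices[:4])
```

RMSF would then be computed from the retained frames only, so that the fluctuation estimate is built from approximately independent samples.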

Triad

Catalytic triad comparator for comparing active site geometry across conditions.

This module provides the TriadComparator class that orchestrates catalytic triad analysis and statistical comparison across multiple conditions.

The key metric is “simultaneous contact fraction” - the percentage of frames where ALL pairs in the triad are below the contact threshold simultaneously. Higher values indicate better triad integrity and potentially better catalytic competence.
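
As a sketch of this metric, the fraction can be computed from per-frame distances for each pair. The pair labels, the 3.5 threshold, and the function name are illustrative assumptions; the library's own data structures are not shown here.

```python
def simultaneous_contact_fraction(pair_distances, threshold):
    """Fraction of frames in which every pair distance is below the threshold.

    pair_distances maps a pair label to its per-frame distances; all series
    must have the same length.
    """
    series = list(pair_distances.values())
    if not series:
        raise ValueError("at least one distance pair is required")
    n_frames = len(series[0])
    if n_frames == 0 or any(len(s) != n_frames for s in series):
        raise ValueError("pair series must be non-empty and equally long")
    # zip(*series) yields one tuple of pair distances per frame.
    in_contact = sum(all(d < threshold for d in frame) for frame in zip(*series))
    return in_contact / n_frames


distances = {
    "Ser-His": [2.9, 3.1, 3.6, 3.0],
    "His-Asp": [2.8, 3.8, 3.0, 3.2],
}
print(simultaneous_contact_fraction(distances, threshold=3.5))
```

In this toy input each pair is individually in contact in three of four frames, but both are in contact at once in only two, which is why the simultaneous fraction is stricter than any per-pair fraction.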

The comparator inherits from BaseComparator and implements the Template Method pattern for DRY comparison logic.

class polyzymd.compare.comparators.triad.TriadComparator(config, analysis_settings, equilibration=None)[source]

Bases: BaseComparator[CatalyticTriadAnalysisSettings, dict[str, Any], TriadConditionSummary, TriadComparisonResult]

Compare catalytic triad geometry across multiple simulation conditions.

This class loads triad analysis results for each condition (computing them if necessary), then performs statistical comparisons including t-tests, ANOVA, and effect size calculations on the simultaneous contact fraction.

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (CatalyticTriadAnalysisSettings) – Catalytic triad analysis settings (from config.analysis_settings.get("catalytic_triad")).
equilibration (str, optional) – Equilibration time override (e.g., “10ns”). If None, uses config.defaults.equilibration_time.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> triad_settings = config.analysis_settings.get("catalytic_triad")
>>> comparator = TriadComparator(config, triad_settings, equilibration="10ns")
>>> result = comparator.compare()
>>> print(result.ranking)
['100% SBMA', '100% EGMA', 'No Polymer', '50/50 Mix']

Notes

Higher simultaneous contact fraction is better (triad is more intact).

comparison_type: ClassVar[str] = 'triad'

__init__(config, analysis_settings, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Catalytic triad contact fraction is a mean-based metric.

The simultaneous contact fraction is an average over frames (fraction of frames where all pairs are in contact). The mean converges regardless of autocorrelation, but we need to correct the uncertainty using N_eff (effective sample size = N/g where g is the statistical inefficiency).

Returns:: MetricType.MEAN_BASED
Return type:: MetricType
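
The N_eff correction can be sketched as below. This assumes the common estimator g = 1 + 2 Σ_t (1 - t/N) C(t), truncated at the first non-positive autocorrelation; the library's actual estimator is not documented here, the function names are hypothetical, and the O(N²) loop is for illustration only.

```python
import math
from statistics import fmean


def statistical_inefficiency(series):
    """Estimate g = 1 + 2 * sum_t (1 - t/N) * C(t), truncated at the first C(t) <= 0."""
    n = len(series)
    if n < 2:
        return 1.0
    mu = fmean(series)
    var = fmean((x - mu) ** 2 for x in series)
    if var == 0:
        return 1.0
    g = 1.0
    for t in range(1, n):
        # Normalized autocorrelation at lag t.
        c_t = sum((series[i] - mu) * (series[i + t] - mu) for i in range(n - t)) / ((n - t) * var)
        if c_t <= 0:
            break
        g += 2.0 * c_t * (1.0 - t / n)
    return max(g, 1.0)


def corrected_sem(series):
    """Standard error of the mean using N_eff = N / g instead of N."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two samples")
    mu = fmean(series)
    sample_var = sum((x - mu) ** 2 for x in series) / (n - 1)
    n_eff = n / statistical_inefficiency(series)
    return math.sqrt(sample_var / n_eff)


# A slowly switching contact signal: strongly correlated frames.
signal = [0.0] * 50 + [1.0] * 50
print(statistical_inefficiency(signal), corrected_sem(signal))
```

The mean of `signal` is unaffected by the correlation, but its uncertainty grows by a factor of √g relative to the naive standard error.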

Distances

Distances comparator for comparing distance metrics across conditions.

This module provides the DistancesComparator class that orchestrates distance analysis and statistical comparison across multiple conditions.

The primary ranking metric is mean distance (lower = closer interactions). Secondary metric is fraction below threshold (if threshold specified).

The comparator inherits from BaseComparator and implements the Template Method pattern for DRY comparison logic.

class polyzymd.compare.comparators.distances.DistancesComparator(config, analysis_settings, equilibration=None)[source]

Bases: BaseComparator[DistancesAnalysisSettings, dict[str, Any], DistanceConditionSummary, DistanceComparisonResult]

Compare distance metrics across multiple simulation conditions.

This class loads distance analysis results for each condition (computing them if necessary), then performs statistical comparisons including t-tests, ANOVA, and effect size calculations on both mean distance and fraction below threshold.

Each distance pair is compared independently - there is no cross-pair averaging since different pairs measure fundamentally different physical quantities (e.g., H-bond distances vs lid-opening distances).

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (DistancesAnalysisSettings) – Distance analysis settings (from config.analysis_settings.get("distances")).
equilibration (str, optional) – Equilibration time override (e.g., “10ns”). If None, uses config.defaults.equilibration_time.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> dist_settings = config.analysis_settings.get("distances")
>>> comparator = DistancesComparator(config, dist_settings, equilibration="10ns")
>>> result = comparator.compare()
>>> print(result.ranking_by_pair["Catalytic H-bond"])  # Per-pair ranking
['100% SBMA', 'No Polymer', '50/50 Mix', '100% EGMA']

Notes

Lower mean distance is better (closer interactions). Higher fraction below threshold is better (more time in contact).

comparison_type: ClassVar[str] = 'distances'

__init__(config, analysis_settings, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Distance analysis is a mean-based metric.

The mean distance is an average over frames. The mean converges regardless of autocorrelation, but we need to correct the uncertainty using N_eff (effective sample size = N/g where g is the statistical inefficiency).

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run the comparison across all conditions.

Each distance pair is compared independently - rankings and statistics are computed per-pair since averaging unrelated distances (e.g., H-bond + lid-opening) is not semantically meaningful.

Parameters:: recompute (bool) – Force recompute even if cached results exist.
Returns:: Complete comparison result with per-pair rankings.
Return type:: DistanceComparisonResult
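
Per-pair ranking can be illustrated as follows. The nested-dict input, the made-up distances, and the function name are assumptions for the sketch; the real result object exposes rankings through ranking_by_pair.

```python
from statistics import fmean


def rank_conditions_per_pair(replicate_means):
    """Rank conditions separately for each distance pair (lower mean distance first).

    replicate_means[pair_label][condition_label] is a list of per-replicate
    mean distances.
    """
    rankings = {}
    for pair_label, by_condition in replicate_means.items():
        condition_means = {cond: fmean(values) for cond, values in by_condition.items()}
        rankings[pair_label] = sorted(condition_means, key=condition_means.get)
    return rankings


# Made-up numbers for illustration only.
data = {
    "Catalytic H-bond": {"No Polymer": [3.4, 3.6], "100% SBMA": [3.0, 3.1]},
    "Lid opening": {"No Polymer": [12.0, 12.4], "100% SBMA": [14.1, 13.8]},
}
rankings = rank_conditions_per_pair(data)
for pair, ranking in rankings.items():
    print(pair, ranking)
```

The two pairs rank the conditions in opposite order, which is the situation that cross-pair averaging would hide.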

Exposure Dynamics

Exposure dynamics comparator for chaperone-like polymer-protein interaction analysis.

This module provides ExposureDynamicsComparator, which orchestrates:

1. SASA computation (MDTraj shrake_rupley, protein-only)
2. Exposure dynamics analysis (classify residues, detect chaperone events)
3. Chaperone enrichment (dual residue/atom normalization)
4. Statistical comparison of chaperone fraction across conditions

Design follows the ContactsComparator pattern:

compare() is fully overridden (custom multi-metric flow)
_load_or_compute() handles caching at replicate level
Condition summaries aggregate per-replicate ExposureDynamicsResults

Registration: @ComparatorRegistry.register("exposure")
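
Decorator-based registration of this kind is commonly built on a name-to-class mapping. The sketch below shows the general pattern under that assumption; `RegistrySketch`, its methods, and the stub class are hypothetical and do not reproduce the ComparatorRegistry source.

```python
class RegistrySketch:
    """Hypothetical name-to-class registry; not the ComparatorRegistry source."""

    _registry = {}

    @classmethod
    def register(cls, name):
        def decorator(comparator_cls):
            if name in cls._registry:
                raise ValueError(f"comparator {name!r} is already registered")
            cls._registry[name] = comparator_cls
            return comparator_cls  # the class itself is returned unchanged

        return decorator

    @classmethod
    def get(cls, name):
        return cls._registry[name]

    @classmethod
    def list_available(cls):
        return sorted(cls._registry)


@RegistrySketch.register("exposure")
class ExposureComparatorStub:
    comparison_type = "exposure"


print(RegistrySketch.list_available())
```

Registering at class-definition time means a new comparator becomes discoverable by name without editing any dispatch code.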

class polyzymd.compare.comparators.exposure.ExposureDynamicsComparator(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

Bases: BaseComparator[ExposureAnalysisSettings, dict[str, Any], ExposureConditionSummary, ExposureComparisonResult]

Compare chaperone-like polymer activity across simulation conditions.

Combines per-frame SASA data with polymer-protein contact data to:

Classify each protein residue as stably exposed, stably buried, or transiently exposed.
Detect “chaperone events” (buried → exposed → polymer contact → re-buried) and unassisted refolding events.
Compute dynamic chaperone enrichment per (polymer_type, aa_group) pair with dual residue/atom normalization.
Statistically compare chaperone_fraction across conditions.
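
The event definition above can be sketched for a single residue, given per-frame booleans for exposure and polymer contact. This is a simplified illustration: the function name is hypothetical, and details such as minimum window lengths or SASA thresholds used by the library are not represented.

```python
def classify_exposure_windows(exposed, in_contact):
    """Count exposure windows with and without polymer contact for one residue.

    exposed and in_contact are per-frame booleans. A window is a run of
    exposed frames that starts after a buried frame and ends at a buried
    frame, i.e. buried -> exposed -> ... -> re-buried.
    """
    if len(exposed) != len(in_contact):
        raise ValueError("series must have the same length")
    chaperone = unassisted = 0
    start = None  # first frame of the currently open window, if any
    for i in range(1, len(exposed)):
        if exposed[i] and not exposed[i - 1]:
            start = i  # buried -> exposed transition opens a window
        elif not exposed[i] and exposed[i - 1] and start is not None:
            # exposed -> buried transition closes the window
            if any(in_contact[start:i]):
                chaperone += 1
            else:
                unassisted += 1
            start = None
    return chaperone, unassisted


F, T = False, True
exposed = [F, T, T, F, F, T, F, T, T]
contact = [F, F, T, F, F, F, F, T, T]
n_chaperone, n_unassisted = classify_exposure_windows(exposed, contact)
print(n_chaperone, n_unassisted, n_chaperone / (n_chaperone + n_unassisted))
```

Windows that are already open at the first frame or still open at the last frame are not counted here, because the full buried, exposed, re-buried cycle is not observed for them.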

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (ExposureAnalysisSettings) – Settings defining SASA and exposure parameters.
comparison_settings (ExposureComparisonSettings, optional) – Settings for statistical comparison. Defaults to ExposureComparisonSettings() if not provided.
equilibration (str, optional) – Equilibration time override. If None, uses config.defaults.equilibration_time.

Notes

This is a MEAN_BASED metric (chaperone fraction is an average over frames).
Conditions without polymer (no chaperone events possible) are excluded.
Contacts must be pre-computed (contacts_rep{n}.json must exist).
SASA is computed on demand and cached under analysis_dir/sasa/.

comparison_type: ClassVar[str] = 'exposure'

__init__(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Chaperone fraction is a mean-based metric.

Chaperone fraction is the fraction of exposed windows that coincide with polymer contact — an average over discrete events. The mean converges regardless of autocorrelation; uncertainty is corrected using N_eff.

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run exposure dynamics comparison across all conditions.

Parameters:: recompute (bool, optional) – If True, force recompute even if cached results exist.
Returns:: Complete comparison with statistics and rankings.
Return type:: ExposureComparisonResult

Binding Free Energy

Binding free energy comparator via Boltzmann inversion of binding preference.

This module implements BindingFreeEnergyComparator, which converts the existing binding preference (enrichment) data into a selectivity free energy ΔG_sel in real units (kT, kcal/mol, or kJ/mol).

Physics

In the NPT ensemble the correct thermodynamic potential is the Gibbs free energy G. The polymer distributes its contacts across protein surface groups. Both the observed contact distribution (contact_share) and the null reference distribution (expected_share, proportional to each group’s solvent-exposed surface area) are proper probability distributions that sum to 1 over the partition. Boltzmann inversion of their ratio gives the selectivity free energy:

ΔG_sel(j) = -k_B·T · ln(contact_share_j / expected_share_j)

Because both distributions are normalized over the same partition, there is no arbitrary constant — ΔG_sel(j) is fully determined by the data.

Because contact_share / expected_share = enrichment + 1, per replicate:

ΔG_sel,rep = -k_B·T · ln(enrichment_rep + 1)

This is the exact Boltzmann-inverted version of the dimensionless enrichment score.

Sign convention:

ΔG_sel < 0 → preferential contact (observed > surface-availability reference)
ΔG_sel > 0 → contact avoidance (observed < surface-availability reference)
ΔG_sel = 0 → contacts match the surface-availability reference exactly

Differences between groups (ΔG_sel(i) - ΔG_sel(j)) give the relative selectivity. Differences between conditions (ΔG_sel,B(j) - ΔG_sel,A(j)) give a true ΔΔG.
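
A minimal sketch of the Boltzmann inversion, assuming k_B = 0.0019872041 kcal mol⁻¹ K⁻¹ and the conversion 1 kcal = 4.184 kJ. The function name and the choice to return None for undefined ratios are illustrative assumptions.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1
KCAL_TO_KJ = 4.184


def selectivity_free_energy(contact_share, expected_share, temperature_K, units="kT"):
    """Return dG_sel = -kT * ln(contact_share / expected_share), or None if undefined."""
    if contact_share <= 0 or expected_share <= 0:
        return None  # logarithm undefined or reference missing
    dg_kt = -math.log(contact_share / expected_share)
    if units == "kT":
        return dg_kt
    if units == "kcal/mol":
        return dg_kt * K_B_KCAL * temperature_K
    if units == "kJ/mol":
        return dg_kt * K_B_KCAL * temperature_K * KCAL_TO_KJ
    raise ValueError(f"unsupported units: {units!r}")


# A group holding 10% of the exposed surface but receiving 20% of the contacts.
dg_kt = selectivity_free_energy(0.20, 0.10, 300.0)
dg_kcal = selectivity_free_energy(0.20, 0.10, 300.0, "kcal/mol")
print(f"{dg_kt:.3f} kT, {dg_kcal:.3f} kcal/mol")
```

A twofold over-representation corresponds to -ln 2 in kT units, so the sign is negative for preferential contact, in line with the sign convention.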

Temperature handling

ΔG_sel computed at temperature T is not comparable to ΔG_sel at T’ (in physical units). Pairwise statistics are suppressed between conditions at different simulation temperatures.

Design

Consumes cached binding preference files produced by ContactsComparator / binding_preference.py. When cached data is missing, computes binding preference on-demand from per-replicate contacts_rep{N}.json files (following the same load-or-compute contract as every other comparator).
Inherits BaseComparator but overrides compare() (like ContactsComparator) because the result type (BindingFreeEnergyResult) does not conform to BaseComparisonResult.

class polyzymd.compare.comparators.binding_free_energy.BindingFreeEnergyComparator(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

Bases: BaseComparator[BindingFreeEnergyAnalysisSettings, dict[str, Any], FreeEnergyConditionSummary, BindingFreeEnergyResult]

Compare selectivity free energy (ΔG_sel) across simulation conditions.

Consumes cached binding preference results (produced by the contacts analysis layer) and converts them to selectivity free energies via Boltzmann inversion:

ΔG_sel = -k_B·T · ln(contact_share / expected_share)

Statistical comparisons are only computed between conditions that share the same simulation temperature. Cross-temperature pairs are flagged and their statistics suppressed.

Parameters:

config (ComparisonConfig) – Comparison configuration.
analysis_settings (BindingFreeEnergyAnalysisSettings) – Units, surface-exposure threshold, custom partitions.
comparison_settings (BindingFreeEnergyComparisonSettings, optional) – FDR alpha. Defaults to BindingFreeEnergyComparisonSettings().
equilibration (str, optional) – Equilibration time override.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> settings = BindingFreeEnergyAnalysisSettings(units="kcal/mol")
>>> comparator = BindingFreeEnergyComparator(config, settings)
>>> result = comparator.compare()
>>> print(result.units)
kcal/mol

Notes

This is a MEAN_BASED metric (contact fractions are averages over frames, not fluctuation-based quantities).

comparison_type: ClassVar[str] = 'binding_free_energy'

__init__(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Contact share is a mean-based metric.

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run binding free energy comparison across all conditions.

Parameters:: recompute (bool, optional) – Ignored (binding free energy is always recomputed from cached binding preference data; it is fast and stateless).
Returns:: Complete ΔG_sel comparison result.
Return type:: BindingFreeEnergyResult

Polymer Affinity Score

Polymer affinity score comparator.

This module implements PolymerAffinityScoreComparator, which quantifies the total strength of polymer-protein interactions by summing per-contact free energy contributions weighted by the number of simultaneous contacts.

Physics

For each (polymer_type, protein_group) pair:

S_{p,g} = N_{p,g} × ΔG_sel(p,g)

where N_{p,g} = mean_contact_fraction × n_exposed_in_group and ΔG_sel(p,g) = -ln(contact_share / expected_share) (in kT).

Because contact_share / expected_share = enrichment + 1:

ΔG_sel,rep = -ln(enrichment_rep + 1)

The total affinity score for a polymer type is:

S_p = Σ_g S_{p,g}

The total affinity score for a condition is:

S = Σ_p S_p
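
The sums above can be sketched with plain dictionaries standing in for the per-group entries. The field names follow the quantities named in the text, but the input structure, the made-up numbers, and the decision to skip pairs with an undefined logarithm are assumptions.

```python
import math
from collections import defaultdict


def affinity_scores(entries):
    """Sum S_{p,g} = N_{p,g} * dG_sel(p,g) (in kT) per polymer type and overall.

    Each entry is a dict with polymer_type, mean_contact_fraction,
    n_exposed_in_group, contact_share and expected_share.
    """
    per_polymer = defaultdict(float)
    for e in entries:
        if e["contact_share"] <= 0 or e["expected_share"] <= 0:
            continue  # dG_sel undefined; assumption: such pairs are skipped
        n_contacts = e["mean_contact_fraction"] * e["n_exposed_in_group"]
        dg_sel = -math.log(e["contact_share"] / e["expected_share"])
        per_polymer[e["polymer_type"]] += n_contacts * dg_sel
    return dict(per_polymer), sum(per_polymer.values())


# Made-up two-group partition for one polymer type.
entries = [
    {"polymer_type": "SBM", "mean_contact_fraction": 0.5, "n_exposed_in_group": 10,
     "contact_share": 0.2, "expected_share": 0.1},
    {"polymer_type": "SBM", "mean_contact_fraction": 0.1, "n_exposed_in_group": 20,
     "contact_share": 0.8, "expected_share": 0.9},
]
per_polymer, total = affinity_scores(entries)
print(per_polymer, total)
```

Here the enriched group contributes a negative term and the depleted group a small positive one, so the total is negative (net favorable).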

Independence assumption

This formulation assumes contacts are thermodynamically independent — each contact contributes the same free energy regardless of what other contacts exist simultaneously. The absolute values are NOT rigorous binding free energies. Only the relative differences between polymer compositions are meaningful as a comparative scoring metric.

Sign convention

S < 0 →  net favorable polymer-protein interaction
S > 0 →  net unfavorable (avoidance dominates)
S = 0 →  contacts match the surface-availability reference

Temperature handling

All scores are in kT (dimensionless); the temperature factor cancels in the Boltzmann inversion ratio. Pairwise statistics are still suppressed between conditions at different simulation temperatures because the contact count N depends on temperature.

Design

Consumes cached binding preference files produced by the contacts analysis layer. When cached data is missing, computes binding preference on-demand from per-replicate contacts_rep{N}.json files.
Inherits BaseComparator but overrides compare() (like the BFE comparator) because the result type does not conform to BaseComparisonResult.
Uses AggregatedBindingPreferenceEntry objects (from bp_result.entries) for the group-level data that includes mean_contact_fraction.

class polyzymd.compare.comparators.polymer_affinity.PolymerAffinityScoreComparator(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

Bases: BaseComparator[PolymerAffinityScoreSettings, dict[str, Any], AffinityScoreConditionSummary, PolymerAffinityScoreResult]

Compare polymer affinity scores across simulation conditions.

Computes a composite interaction score for each (polymer_type, protein_group) pair by multiplying the mean number of simultaneous contacts by the per-contact selectivity free energy:

S = N × ΔG_sel [kT]

The total score is summed across all polymer types and protein groups. More negative = stronger net polymer-protein affinity.

Statistical comparisons use per-replicate total scores and are only computed between conditions at the same simulation temperature.

Parameters:

config (ComparisonConfig) – Comparison configuration.
analysis_settings (PolymerAffinityScoreSettings) – Surface-exposure threshold, protein groups, etc.
comparison_settings (PolymerAffinityScoreComparisonSettings, optional) – FDR alpha. Defaults to PolymerAffinityScoreComparisonSettings().
equilibration (str, optional) – Equilibration time override.

Notes

This is a MEAN_BASED metric (contact fractions are averages over frames, not fluctuation-based quantities).

comparison_type: ClassVar[str] = 'polymer_affinity'

__init__(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Contact fractions and shares are mean-based metrics.

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run polymer affinity score comparison across all conditions.

Parameters:: recompute (bool, optional) – Ignored (affinity scores are always recomputed from cached binding preference data; the computation is fast and stateless).
Returns:: Complete polymer affinity score comparison result.
Return type:: PolymerAffinityScoreResult

Results

Common result modules live under polyzymd.compare.results.

Stable result families include:

polyzymd.compare.results.rmsf
polyzymd.compare.results.triad
polyzymd.compare.results.contacts
polyzymd.compare.results.distances
polyzymd.compare.results.secondary_structure

Result models for binding free energy comparison analysis.

Physics background

In the NPT ensemble (constant pressure, as used in all polyzymd simulations) the correct thermodynamic potential is the Gibbs free energy G.

The quantity computed here is a selectivity free energy (ΔG_sel) that measures how much more (or less) favorable it is for a polymer to contact a given group of protein residues compared to what would be expected if the polymer contacted each exposed surface residue in proportion to that residue group’s share of the total solvent-exposed protein surface.

Concretely: if aromatic residues make up 10% of the solvent-exposed surface but receive 20% of the polymer’s contacts, the polymer preferentially contacts aromatic residues. The reference (expected) distribution is simply proportional to surface availability — not any property of the polymer itself.

ΔG_sel(j) = -k_B·T · ln(contact_share_j / expected_share_j)

where:

contact_share_j = (contact frames involving residues in group j) / (total contact frames across all protein residues) — the observed fraction of polymer contacts directed at group j
expected_share_j = (number of solvent-exposed residues in group j) / (total number of solvent-exposed protein residues) — the fraction of the protein surface belonging to group j; this is the reference assuming contacts are distributed purely by surface area
k_B = Boltzmann constant (0.0019872041 kcal mol⁻¹ K⁻¹)
T = simulation temperature in Kelvin

Because both distributions are normalized over the same partition (they sum to 1 over all groups), there is no arbitrary additive constant — ΔG_sel is fully determined by the data.

When units='kT' (default), the formula simplifies to:

ΔG_sel(j) / k_BT = -ln(contact_share_j / expected_share_j)

yielding a dimensionless value directly comparable to the thermal energy scale. A value of -1.0 means the binding preference is exactly 1 k_BT favorable relative to the surface-availability reference.

Note: contact_share / expected_share = enrichment_ratio = enrichment + 1 (where enrichment is the existing dimensionless enrichment score from binding preference analysis). So ΔG_sel = -kT·ln(enrichment + 1), and the two representations are mathematically equivalent; ΔG_sel simply puts the enrichment score on a physically meaningful energy scale.

Sign convention

ΔG_sel < 0 →  preferential contact (observed > surface-availability reference)
ΔG_sel > 0 →  contact avoidance (observed < surface-availability reference)
ΔG_sel = 0 →  contacts match the surface-availability reference exactly

Differences between conditions (ΔG_sel,B(j) − ΔG_sel,A(j)) give a true ΔΔG, stored in FreeEnergyPairwiseEntry.delta_delta_G.

Uncertainty propagation

When multiple independent replicates are available, two uncertainty estimates are reported:

Between-replicate SEM on ΔG_sel (primary, used for pairwise statistics): ΔG_sel is computed independently for each replicate, and the SEM is taken directly across those values. This is the most statistically sound approach for independent replicates and is the quantity used in t-tests.
Delta-method propagation (analytical approximation, stored for reference): For the mean contact_share and its SEM, uncertainty is propagated through the logarithm using first-order error propagation (Taylor 1997, ch. 3; Bevington & Robinson 2003, ch. 3):
```
σ(ΔG_sel) ≈ k_B·T · √[(σ_cs / cs)² + (σ_es / es)²]
(or simply √[...] when units='kT')
```
where σ_cs = SEM of contact_share across replicates, and σ_es ≈ 0 because expected_share is computed from a single static PDB structure (no replicate variance). This simplifies to σ(ΔG_sel) ≈ k_B·T · (σ_cs / cs) (or σ_cs / cs when units='kT').
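
Both estimates can be computed side by side. This sketch works in kT units (multiply by k_B·T for physical units), uses made-up replicate values, and treats expected_share as exact; the function names are hypothetical.

```python
import math
from statistics import fmean, stdev


def sem(values):
    """Standard error of the mean across independent replicates."""
    return stdev(values) / math.sqrt(len(values))


def delta_g_uncertainties(contact_shares, expected_share):
    """Return (between-replicate SEM, delta-method sigma) for dG_sel in kT.

    contact_shares holds one value per replicate; expected_share is treated
    as exact (sigma_es = 0).
    """
    per_replicate = [-math.log(cs / expected_share) for cs in contact_shares]
    between_replicate = sem(per_replicate)
    # First-order propagation through the logarithm: sigma ~ sigma_cs / mean(cs).
    delta_method = sem(contact_shares) / fmean(contact_shares)
    return between_replicate, delta_method


between, delta = delta_g_uncertainties([0.18, 0.20, 0.22], expected_share=0.10)
print(f"between-replicate SEM = {between:.4f} kT, delta method = {delta:.4f} kT")
```

The two numbers agree closely when the replicate spread is small relative to the mean contact_share, which is the regime where the first-order approximation holds.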

References:
- Taylor, J. R. (1997). An Introduction to Error Analysis, 2nd ed. University Science Books. (Ch. 3: Error propagation for functions of one or more variables)
- Bevington, P. R. & Robinson, D. K. (2003). Data Reduction and Error Analysis for the Physical Sciences, 3rd ed. McGraw-Hill. (Ch. 3)
- Wikipedia: Delta method, https://en.wikipedia.org/wiki/Delta_method

Temperature handling

When units='kT', ΔG_sel = -ln(ratio) is temperature-independent (the same ratio at any temperature gives the same dimensionless value). However, the underlying contact probabilities ARE temperature-dependent, so cross-temperature comparisons still require caution.

When units='kcal/mol' or 'kJ/mol', ΔG_sel computed at temperature T is NOT directly comparable to ΔG_sel at temperature T’. Pairwise statistical comparisons are only computed between conditions sharing the same simulation temperature.
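
The same-temperature rule amounts to a filter over condition pairs. The sketch below assumes a mapping from condition label to temperature and a small tolerance for floating-point comparison; the function name, the tolerance, and the "100% SBMA (hot)" label are hypothetical.

```python
from itertools import combinations


def comparable_pairs(temperatures_K, tolerance_K=1e-6):
    """Split condition pairs into same-temperature and cross-temperature sets.

    temperatures_K maps a condition label to its simulation temperature.
    """
    same, cross = [], []
    for a, b in combinations(temperatures_K, 2):
        if abs(temperatures_K[a] - temperatures_K[b]) <= tolerance_K:
            same.append((a, b))
        else:
            cross.append((a, b))  # statistics would be suppressed for these
    return same, cross


temps = {"No Polymer": 300.0, "100% SBMA": 300.0, "100% SBMA (hot)": 330.0}
same, cross = comparable_pairs(temps)
print(same)
print(cross)
```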

class polyzymd.compare.results.binding_free_energy.FreeEnergyEntry(*, polymer_type, protein_group, partition_name='aa_class', contact_share, expected_share, enrichment_ratio, delta_G=None, delta_G_uncertainty=None, delta_G_per_replicate=<factory>, units='kT', temperature_K, n_replicates=0, n_exposed_in_group=0)[source]

Bases: BaseModel

Free energy analysis for one (polymer_type, protein_group) pair in one condition.

Stores both the ΔG_sel value and the raw probability quantities used to compute it, enabling reproducibility and downstream verification.

Variables:

polymer_type (str) – Polymer residue type (e.g., “SBM”, “EGM”).
protein_group (str) – Protein amino acid group label (e.g., “aromatic”, “charged_positive”).
partition_name (str) – Name of the partition this group belongs to (e.g., “aa_class”).
contact_share (float) – Observed fraction of polymer contacts directed at this group. This is P_obs in ΔG_sel = -kT·ln(P_obs / P_ref).
expected_share (float) – Surface-availability-weighted reference fraction. This is P_ref in ΔG_sel = -kT·ln(P_obs / P_ref).
enrichment_ratio (float) – contact_share / expected_share (= enrichment + 1). Stored for traceability; ΔG_sel = -kT·ln(enrichment_ratio).
delta_G (float | None) – ΔG_sel in the configured units. None when contact_share = 0 or expected_share = 0 (log undefined or reference missing).
delta_G_uncertainty (float | None) – σ(ΔG_sel) from delta-method error propagation. None if delta_G is None or if SEM data is unavailable (single replicate).
delta_G_per_replicate (list[float]) – Per-replicate ΔG_sel values used for cross-condition statistics.
units (str) – Energy units (“kT”, “kcal/mol”, or “kJ/mol”).
temperature_K (float) – Simulation temperature in Kelvin (used to compute k_B·T).
n_replicates (int) – Number of replicates with valid data for this entry.
n_exposed_in_group (int) – Number of surface-exposed residues in this group (used for expected_share).

polymer_type: str

protein_group: str

partition_name: str

contact_share: float

expected_share: float

enrichment_ratio: float

delta_G: float | None

delta_G_uncertainty: float | None

delta_G_per_replicate: list[float]

units: str

temperature_K: float

n_replicates: int

n_exposed_in_group: int

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.binding_free_energy.FreeEnergyConditionSummary(*, label, config_path, temperature_K, n_replicates, units='kT', entries=<factory>, polymer_types=<factory>, protein_groups=<factory>)[source]

Bases: BaseModel

Free energy summary for one simulation condition.

Aggregates FreeEnergyEntry objects across all (polymer_type, protein_group) pairs for a single condition, together with condition metadata.

Variables:

label (str) – Display name for this condition.
config_path (str) – Path to the SimulationConfig YAML used.
temperature_K (float) – Simulation temperature in Kelvin.
n_replicates (int) – Number of replicates in this condition.
units (str) – Energy units (“kT”, “kcal/mol”, or “kJ/mol”).
entries (list[FreeEnergyEntry]) – All (polymer_type, protein_group) ΔG_sel entries.
polymer_types (list[str]) – Polymer residue types present.
protein_groups (list[str]) – Protein group labels analyzed.

label: str

config_path: str

temperature_K: float

n_replicates: int

units: str

entries: list[FreeEnergyEntry]

polymer_types: list[str]

protein_groups: list[str]

property primary_metric_value: float: Mean ΔG_sel across all valid entries (for BaseConditionSummary compatibility).

property primary_metric_sem: float: Mean σ(ΔG_sel) across all valid entries.

get_entry(polymer_type, protein_group, partition_name=None)[source]

Get the FreeEnergyEntry for a (polymer_type, protein_group) pair.

Parameters:

polymer_type (str) – Polymer type.
protein_group (str) – AA group label.
partition_name (str or None, optional) – If given, further restrict to entries belonging to this partition. Necessary when the same protein_group label appears in multiple partitions (e.g., “rest_of_protein” in several user-defined partitions).

Return type:

FreeEnergyEntry or None
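The lookup semantics described above can be sketched with a stand-in class. `Entry` below is a hypothetical simplification of FreeEnergyEntry, and the loop is an assumption about the behavior, not the library's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in carrying only the three lookup keys."""
    polymer_type: str
    protein_group: str
    partition_name: str = "aa_class"


def get_entry(entries, polymer_type, protein_group, partition_name=None) -> Optional[Entry]:
    """Return the first matching entry, or None when nothing matches."""
    for entry in entries:
        if entry.polymer_type != polymer_type or entry.protein_group != protein_group:
            continue
        # Only filter on partition when the caller asked for a specific one.
        if partition_name is not None and entry.partition_name != partition_name:
            continue
        return entry
    return None


entries = [
    Entry("SBM", "rest_of_protein", "lid_helices"),
    Entry("SBM", "rest_of_protein", "whole_lid_domain"),
    Entry("SBM", "aromatic"),
]

# Without partition_name the first "rest_of_protein" entry wins, which is ambiguous.
print(get_entry(entries, "SBM", "rest_of_protein"))
# With partition_name the lookup is unambiguous.
print(get_entry(entries, "SBM", "rest_of_protein", "whole_lid_domain"))
```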

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.binding_free_energy.FreeEnergyPairwiseEntry(*, polymer_type, protein_group, condition_a, condition_b, temperature_a_K, temperature_b_K, cross_temperature=False, delta_G_a=None, delta_G_b=None, delta_delta_G=None, t_statistic=None, p_value=None)[source]

Bases: BaseModel

Pairwise comparison between two conditions for one (polymer, group) pair.

Each condition has a per-group selectivity free energy ΔG_sel. The difference ΔΔG = ΔG_sel,B − ΔG_sel,A is a true double-delta quantity.

Statistics are only computed when both conditions share the same simulation temperature. If temperatures differ, all stat fields are None and the cross_temperature flag is set to True.
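A minimal sketch of this rule follows. Several details are assumptions, since the source does not specify them: a Welch (unequal-variance) t statistic is used, its sign follows B − A to match ΔΔG, and ΔΔG itself is still reported for cross-temperature pairs while only the test statistic is suppressed. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def compare_pair(dg_a, dg_b, temperature_a_K, temperature_b_K):
    """Compare two lists of per-replicate ΔG_sel values."""
    cross_temperature = not math.isclose(temperature_a_K, temperature_b_K)
    delta_delta_G = mean(dg_b) - mean(dg_a)  # ΔΔG = ΔG_sel,B − ΔG_sel,A

    t_statistic = None
    if not cross_temperature and len(dg_a) > 1 and len(dg_b) > 1:
        # Welch standard error: each group contributes its own variance.
        standard_error = math.sqrt(
            variance(dg_a) / len(dg_a) + variance(dg_b) / len(dg_b)
        )
        if standard_error > 0:
            t_statistic = delta_delta_G / standard_error

    return {
        "cross_temperature": cross_temperature,
        "delta_delta_G": delta_delta_G,
        "t_statistic": t_statistic,
    }


dg_a = [-0.20, -0.25, -0.30]  # hypothetical replicates, condition A
dg_b = [0.10, 0.05, 0.15]     # hypothetical replicates, condition B
print(compare_pair(dg_a, dg_b, 300.0, 300.0))
print(compare_pair(dg_a, dg_b, 300.0, 350.0))
```

A two-tailed p-value would additionally need the t-distribution, for example via `scipy.stats.ttest_ind(dg_a, dg_b, equal_var=False)`.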

Variables:

polymer_type (str) – Polymer residue type.
protein_group (str) – Protein group label.
condition_a (str) – Label of the first condition.
condition_b (str) – Label of the second condition.
temperature_a_K (float) – Temperature of condition A in Kelvin.
temperature_b_K (float) – Temperature of condition B in Kelvin.
cross_temperature (bool) – True when temperatures differ — statistics are suppressed.
delta_G_a (float | None) – ΔG_sel for condition A.
delta_G_b (float | None) – ΔG_sel for condition B.
delta_delta_G (float | None) – ΔΔG = ΔG_sel,B − ΔG_sel,A. Positive → B has less favorable selectivity.
t_statistic (float | None) – T-test statistic (None for cross-temperature pairs).
p_value (float | None) – Two-tailed p-value (None for cross-temperature pairs).

polymer_type: str

protein_group: str

condition_a: str

condition_b: str

temperature_a_K: float

temperature_b_K: float

cross_temperature: bool

delta_G_a: float | None

delta_G_b: float | None

delta_delta_G: float | None

t_statistic: float | None

p_value: float | None

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.binding_free_energy.BindingFreeEnergyResult(*, name, units='kT', formula='ΔG_sel = -ln(contact_share / expected_share) [units: k_bT]', mixed_temperatures=False, temperature_groups=<factory>, conditions=<factory>, pairwise_comparisons=<factory>, polymer_types=<factory>, protein_groups=<factory>, surface_exposure_threshold=None, equilibration_time='', created_at=<factory>, polyzymd_version=<factory>)[source]

Bases: BaseModel

Complete binding free energy comparison result.

This is the main output from BindingFreeEnergyComparator.compare().

Physics summary

Formula: ΔG_sel = -k_B·T · ln(contact_share / expected_share)

Uncertainty: σ(ΔG_sel) ≈ k_B·T · √[(σ_cs/cs)² + (σ_es/es)²] (first-order delta-method approximation)

Temperature note: pairwise statistics are suppressed between conditions at different temperatures. The mixed_temperatures flag indicates this occurred. Each condition’s temperature is stored in its summary.
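How conditions might be grouped by temperature can be sketched as below. The helper name and the key format (`str(float(T))`) are assumptions; the source only establishes that the groups map a temperature to condition labels and that the result is saved as JSON, where object keys must be strings.

```python
import json
from collections import defaultdict


def group_by_temperature(conditions):
    """conditions: iterable of (label, temperature_K) pairs."""
    groups = defaultdict(list)
    for label, temperature_K in conditions:
        # String keys survive a JSON round trip unchanged; float keys would not.
        groups[str(float(temperature_K))].append(label)
    return dict(groups)


conditions = [("0% SBMA", 300), ("25% SBMA", 300.0), ("25% SBMA hot", 350)]
temperature_groups = group_by_temperature(conditions)
mixed_temperatures = len(temperature_groups) > 1

print(temperature_groups)
print(mixed_temperatures)
print(json.loads(json.dumps(temperature_groups)) == temperature_groups)
```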

Variables:

name (str) – Name of the comparison project.
units (str) – Energy units (“kT”, “kcal/mol”, or “kJ/mol”).
formula (str) – Human-readable formula string (for documentation/output).
mixed_temperatures (bool) – True if conditions span more than one simulation temperature.
temperature_groups (dict[str, list[str]]) – Mapping of temperature (K, as str) to condition labels at that temperature.
conditions (list[FreeEnergyConditionSummary]) – Summary for each condition.
pairwise_comparisons (list[FreeEnergyPairwiseEntry]) – All pairwise comparisons (cross-T pairs have stats suppressed).
polymer_types (list[str]) – All polymer types found.
protein_groups (list[str]) – All protein groups analyzed.
surface_exposure_threshold (float | None) – SASA threshold used (from binding preference settings).
equilibration_time (str) – Equilibration time used.
created_at (datetime) – When the analysis was run.
polyzymd_version (str) – Version of polyzymd used.

name: str

units: str

formula: str

mixed_temperatures: bool

temperature_groups: dict[str, list[str]]

conditions: list[FreeEnergyConditionSummary]

pairwise_comparisons: list[FreeEnergyPairwiseEntry]

polymer_types: list[str]

protein_groups: list[str]

surface_exposure_threshold: float | None

equilibration_time: str

created_at: datetime

polyzymd_version: str

save(path)[source]

Save result to JSON file.

Parameters:: path (Path or str) – Output path.
Returns:: Path to the saved file.
Return type:: Path

classmethod load(path)[source]

Load result from JSON file.

Parameters:: path (Path or str) – Path to JSON file.
Returns:: Loaded result.
Return type:: BindingFreeEnergyResult

get_condition(label)[source]

Get a condition summary by label.

Parameters:: label (str) – Condition label.
Return type:: FreeEnergyConditionSummary
Raises:: KeyError – If not found.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Exposure dynamics condition summary and comparison result models.

These classes inherit from the base classes in compare/core/ and add exposure-dynamics-specific fields for chaperone event analysis.

class polyzymd.compare.results.exposure.ExposureConditionSummary(*, label, config_path, n_replicates, replicate_values, mean_transient_fraction, sem_transient_fraction, mean_chaperone_fraction, sem_chaperone_fraction, mean_n_transient, mean_total_chaperone_events=0.0, mean_total_unassisted_events=0.0, enrichment_by_polymer_type=<factory>, polymer_types=<factory>, aa_groups=<factory>)[source]

Bases: BaseConditionSummary

Summary statistics for one condition in an exposure dynamics comparison.

Variables:

label (str) – Display name for this condition.
config_path (str) – Path to the simulation config file.
n_replicates (int) – Number of replicates included.
replicate_values (list[float]) – Per-replicate mean chaperone fraction across transient residues.
mean_transient_fraction (float) – Mean fraction of protein residues that are transiently exposed, averaged across replicates.
sem_transient_fraction (float) – Standard error of mean_transient_fraction.
mean_chaperone_fraction (float) – Mean chaperone fraction (chaperone events / total exposed windows) across transient residues and replicates.
sem_chaperone_fraction (float) – Standard error of mean_chaperone_fraction.
mean_n_transient (float) – Mean number of transient residues across replicates.
mean_total_chaperone_events (float) – Mean total chaperone event count across replicates.
mean_total_unassisted_events (float) – Mean total unassisted event count across replicates.
enrichment_by_polymer_type (dict[str, dict[str, float]]) – Nested dict: polymer_type → aa_group → mean enrichment_residue.
polymer_types (list[str]) – Polymer types present in this condition.
aa_groups (list[str]) – Amino-acid groups present in this condition.

mean_transient_fraction: float

sem_transient_fraction: float

mean_chaperone_fraction: float

sem_chaperone_fraction: float

mean_n_transient: float

mean_total_chaperone_events: float

mean_total_unassisted_events: float

enrichment_by_polymer_type: dict[str, dict[str, float]]

polymer_types: list[str]

aa_groups: list[str]

property primary_metric_value: float: Return mean chaperone fraction as the primary metric.

property primary_metric_sem: float: Return SEM of chaperone fraction.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.exposure.ExposureComparisonResult(*, metric='chaperone_fraction', name, control_label=None, conditions=<factory>, pairwise_comparisons=<factory>, anova=None, ranking, equilibration_time='0ns', created_at=<factory>, polyzymd_version='1.2.1', ranking_by_transient_fraction=<factory>, excluded_conditions=<factory>)[source]

Bases: BaseComparisonResult[ExposureConditionSummary, PairwiseComparison]

Complete exposure dynamics comparison result.

This is the main output from ExposureDynamicsComparator.compare(). Contains per-condition summaries of transient exposure and chaperone event statistics, plus pairwise statistical comparisons.

Variables:

metric (str) – Always “chaperone_fraction”.
name (str) – Comparison project name.
control_label (str, optional) – Label of the control condition.
conditions (list[ExposureConditionSummary]) – Summary for each condition.
pairwise_comparisons (list[PairwiseComparison]) – Pairwise t-tests on chaperone_fraction.
anova (ANOVASummary, optional) – One-way ANOVA across all conditions.
ranking (list[str]) – Condition labels sorted by chaperone_fraction (highest first).
ranking_by_transient_fraction (list[str]) – Condition labels sorted by transient_fraction (highest first).
excluded_conditions (list[str]) – Conditions excluded (e.g., no-polymer controls).
equilibration_time (str) – Equilibration time used.
created_at (datetime) – When the analysis was run.
polyzymd_version (str) – Version of polyzymd used.
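The two rankings can be illustrated with plain Python. The condition labels and replicate values are invented, and the helper `rank_by` is hypothetical; only the "highest first" ordering comes from the field descriptions above.

```python
from statistics import mean

# Hypothetical data: label -> (per-replicate chaperone fractions,
#                              per-replicate transient fractions)
conditions = {
    "0% SBMA": ([0.10, 0.12, 0.11], [0.30, 0.28, 0.29]),
    "25% SBMA": ([0.25, 0.27, 0.23], [0.22, 0.20, 0.24]),
    "50% SBMA": ([0.18, 0.20, 0.19], [0.35, 0.33, 0.34]),
}


def rank_by(index):
    """Sort labels by the replicate mean of the chosen metric, highest first."""
    return sorted(
        conditions,
        key=lambda label: mean(conditions[label][index]),
        reverse=True,
    )


ranking = rank_by(0)                        # by chaperone fraction
ranking_by_transient_fraction = rank_by(1)  # by transient fraction

print(ranking)
print(ranking_by_transient_fraction)
```

The two orderings need not agree, which is why both are stored.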

comparison_type: ClassVar[str] = 'exposure'

metric: str

conditions: list[ExposureConditionSummary]

pairwise_comparisons: list[PairwiseComparison]

ranking_by_transient_fraction: list[str]

excluded_conditions: list[str]

equilibration_time: str

created_at: datetime

polyzymd_version: str

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Result models for polymer affinity score comparison analysis.

The polymer affinity score is a comparative metric that quantifies the total strength of polymer-protein interactions by summing per-contact free energy contributions weighted by the number of simultaneous contacts.

Physics

For each (polymer_type, protein_group) pair, the affinity score is:

S_{p,g} = N_{p,g} × ΔG_sel(p,g)

where N_{p,g} = mean_contact_fraction × n_exposed_in_group and ΔG_sel(p,g) = -ln(contact_share / expected_share) (in units of k_B·T).

The total affinity score for a polymer type is:

S_p = Σ_g S_{p,g}

The total affinity score for a condition is:

S = Σ_p S_p
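The three levels of aggregation can be traced in a short sketch. All input numbers are invented and the function name `pair_score` is hypothetical; the arithmetic follows the formulas above.

```python
import math
from collections import defaultdict


def pair_score(mean_contact_fraction, n_exposed_in_group, contact_share, expected_share):
    """S_{p,g} = N_{p,g} * ΔG_sel(p,g), in kT."""
    n_contacts = mean_contact_fraction * n_exposed_in_group
    delta_g_sel = -math.log(contact_share / expected_share)
    return n_contacts * delta_g_sel


# Hypothetical inputs keyed by (polymer_type, protein_group):
# (mean_contact_fraction, n_exposed_in_group, contact_share, expected_share)
pairs = {
    ("SBM", "aromatic"): (0.20, 10, 0.30, 0.20),          # enriched: negative score
    ("SBM", "charged_positive"): (0.05, 20, 0.10, 0.20),  # depleted: positive score
    ("EGM", "aromatic"): (0.10, 10, 0.20, 0.20),          # matches reference: zero
}

pair_scores = {key: pair_score(*values) for key, values in pairs.items()}

# S_p: sum over protein groups for each polymer type.
polymer_scores = defaultdict(float)
for (polymer_type, _group), score in pair_scores.items():
    polymer_scores[polymer_type] += score

# S: sum over polymer types.
total_score = sum(polymer_scores.values())

print(pair_scores)
print(dict(polymer_scores))
print(total_score)
```

Note how a favorable and an unfavorable group partially cancel within one polymer type, so the total is a net score rather than a count of favorable contacts.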

Independence assumption

This formulation assumes contacts are thermodynamically independent — each contact contributes the same free energy regardless of what other contacts exist simultaneously. This is the standard polyvalent binding approximation (Mammen et al., Angew. Chem. Int. Ed. 1998, 37, 2754).

The absolute values are NOT rigorous thermodynamic binding free energies. However, the relative differences between polymer compositions are meaningful as a comparative scoring function, analogous to scoring functions in molecular docking or MM/PBSA decomposition.

Sign convention

S < 0 →  net favorable polymer-protein interaction
S > 0 →  net unfavorable (avoidance dominates)
S = 0 →  contacts match the surface-availability reference

Interpretation

More negative total score → stronger net polymer-protein affinity. When combined with structural stability metrics (RMSF, triad contacts), the affinity score helps rank polymer compositions by total interaction strength.

Uncertainty propagation

Per-replicate scores are computed independently:

S_rep = N_rep × ΔG_sel,rep

where N_rep = contact_fraction_rep × n_exposed_in_group, and ΔG_sel,rep = -ln(enrichment_rep + 1). The mean and SEM are taken across replicates. This approach naturally captures the covariance between N and ΔG_sel.

When per-replicate data is unavailable, analytical error propagation is used:

σ(S) = √[(N·σ_ΔG_sel)² + (ΔG_sel·σ_N)²]
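Both routes can be sketched as follows. The replicate values and function names are hypothetical; the per-replicate formula and the analytical expression are the ones given above.

```python
import math
from statistics import mean, stdev


def replicate_scores(contact_fractions, enrichments, n_exposed_in_group):
    """S_rep = N_rep * ΔG_sel,rep with ΔG_sel,rep = -ln(enrichment_rep + 1)."""
    return [
        (cf * n_exposed_in_group) * -math.log(enrichment + 1.0)
        for cf, enrichment in zip(contact_fractions, enrichments)
    ]


def analytical_sigma(n_contacts, sigma_n, delta_g_sel, sigma_delta_g_sel):
    """Fallback: σ(S) = sqrt((N·σ_ΔG)² + (ΔG·σ_N)²), which ignores covariance."""
    return math.sqrt(
        (n_contacts * sigma_delta_g_sel) ** 2 + (delta_g_sel * sigma_n) ** 2
    )


# Route 1: per-replicate scores, then mean and SEM across replicates.
scores = replicate_scores([0.20, 0.25, 0.15], [0.50, 0.60, 0.40], 10)
score_mean = mean(scores)
score_sem = stdev(scores) / math.sqrt(len(scores))

# Route 2: analytical propagation from summary statistics only.
sigma_fallback = analytical_sigma(2.0, 0.3, -0.4, 0.1)

print(score_mean, score_sem)
print(sigma_fallback)
```

The replicate route is preferred when available because N and ΔG_sel tend to vary together within a replicate, and the analytical formula has no term for that covariance.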

class polyzymd.compare.results.polymer_affinity.AffinityScoreEntry(*, polymer_type, protein_group, partition_name='aa_class', n_contacts, delta_G_per_contact=None, affinity_score=None, affinity_score_uncertainty=None, affinity_score_per_replicate=<factory>, mean_contact_fraction=0.0, n_exposed_in_group=0, contact_share=0.0, expected_share=0.0, temperature_K=0.0, n_replicates=0)[source]

Bases: BaseModel

Affinity score for one (polymer_type, protein_group) pair in one condition.

Stores both the composite score and its constituent quantities for reproducibility and downstream verification.

Variables:

polymer_type (str) – Polymer residue type (e.g., “SBM”, “EGM”).
protein_group (str) – Protein amino acid group label (e.g., “aromatic”, “charged_positive”).
partition_name (str) – Name of the partition this group belongs to (e.g., “aa_class”).
n_contacts (float) – Mean number of simultaneous contacts per frame. Computed as mean_contact_fraction * n_exposed_in_group.
delta_G_per_contact (float | None) – Per-contact selectivity free energy in kT. Computed as -ln(contact_share / expected_share).
affinity_score (float | None) – Composite score: n_contacts * delta_G_per_contact (kT). More negative = stronger favorable interaction.
affinity_score_uncertainty (float | None) – Uncertainty on affinity_score. From replicate SEM when available, otherwise from analytical error propagation.
affinity_score_per_replicate (list[float]) – Per-replicate affinity scores for statistical testing.
mean_contact_fraction (float) – Mean per-residue contact fraction in this group (from binding preference).
n_exposed_in_group (int) – Number of surface-exposed residues in this group.
contact_share (float) – Observed fraction of polymer contacts directed at this group.
expected_share (float) – Surface-availability reference fraction.
temperature_K (float) – Simulation temperature in Kelvin.
n_replicates (int) – Number of replicates with valid data.

polymer_type: str

protein_group: str

partition_name: str

n_contacts: float

delta_G_per_contact: float | None

affinity_score: float | None

affinity_score_uncertainty: float | None

affinity_score_per_replicate: list[float]

mean_contact_fraction: float

n_exposed_in_group: int

contact_share: float

expected_share: float

temperature_K: float

n_replicates: int

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.polymer_affinity.PolymerTypeScore(*, polymer_type, total_score, total_score_uncertainty=None, total_score_per_replicate=<factory>, total_n_contacts=0.0, group_contributions=<factory>)[source]

Bases: BaseModel

Aggregated affinity score for one polymer type across all protein groups.

The score is the sum of per-group affinity scores: S_p = Σ_g (N_g × ΔG_sel(g))

Variables:

polymer_type (str) – Polymer residue type (e.g., “SBM”, “EGM”).
total_score (float) – Sum of affinity scores across all protein groups (kT).
total_score_uncertainty (float | None) – Uncertainty on total_score.
total_score_per_replicate (list[float]) – Per-replicate total scores for statistical testing.
total_n_contacts (float) – Total mean simultaneous contacts per frame across all groups.
group_contributions (list[AffinityScoreEntry]) – Breakdown by protein group (for detail reporting).

polymer_type: str

total_score: float

total_score_uncertainty: float | None

total_score_per_replicate: list[float]

total_n_contacts: float

group_contributions: list[AffinityScoreEntry]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.polymer_affinity.AffinityScoreConditionSummary(*, label, config_path, temperature_K, n_replicates=0, total_score=0.0, total_score_uncertainty=None, total_score_per_replicate=<factory>, total_n_contacts=0.0, polymer_type_scores=<factory>, entries=<factory>, polymer_types=<factory>, protein_groups=<factory>)[source]

Bases: BaseModel

Affinity score summary for one simulation condition.

Aggregates scores at three levels: per (polymer_type, protein_group), per polymer_type, and total condition score.

Variables:

label (str) – Display name for this condition.
config_path (str) – Path to the SimulationConfig YAML used.
temperature_K (float) – Simulation temperature in Kelvin.
n_replicates (int) – Number of replicates in this condition.
total_score (float) – Grand total affinity score across all polymer types and groups (kT).
total_score_uncertainty (float | None) – Uncertainty on total_score.
total_score_per_replicate (list[float]) – Per-replicate grand total scores for pairwise statistics.
total_n_contacts (float) – Total mean simultaneous contacts per frame (all types, all groups).
polymer_type_scores (list[PolymerTypeScore]) – Per-polymer-type score breakdown.
entries (list[AffinityScoreEntry]) – All (polymer_type, protein_group) entries.
polymer_types (list[str]) – Polymer types present.
protein_groups (list[str]) – Protein groups analyzed.

label: str

config_path: str

temperature_K: float

n_replicates: int

total_score: float

total_score_uncertainty: float | None

total_score_per_replicate: list[float]

total_n_contacts: float

polymer_type_scores: list[PolymerTypeScore]

entries: list[AffinityScoreEntry]

polymer_types: list[str]

protein_groups: list[str]

property primary_metric_value: float: Total affinity score (for ranking compatibility).

property primary_metric_sem: float: Uncertainty on total affinity score.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.polymer_affinity.AffinityScorePairwiseEntry(*, condition_a, condition_b, temperature_a_K, temperature_b_K, cross_temperature=False, score_a=0.0, score_b=0.0, delta_score=None, t_statistic=None, p_value=None)[source]

Bases: BaseModel

Pairwise affinity score comparison between two conditions.

Compares total affinity scores. Statistics are suppressed for cross-temperature pairs.

Variables:

condition_a (str) – Label of the first condition (typically control or reference).
condition_b (str) – Label of the second condition.
temperature_a_K (float) – Temperature of condition A in Kelvin.
temperature_b_K (float) – Temperature of condition B in Kelvin.
cross_temperature (bool) – True when temperatures differ (statistics suppressed).
score_a (float) – Total affinity score for condition A (kT).
score_b (float) – Total affinity score for condition B (kT).
delta_score (float | None) – Difference: score_B - score_A (kT). Negative = B has stronger affinity than A.
t_statistic (float | None) – T-test statistic (None for cross-temperature pairs).
p_value (float | None) – Two-tailed p-value (None for cross-temperature pairs).

condition_a: str

condition_b: str

temperature_a_K: float

temperature_b_K: float

cross_temperature: bool

score_a: float

score_b: float

delta_score: float | None

t_statistic: float | None

p_value: float | None

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.polymer_affinity.PolymerAffinityScoreResult(*, name, methodology='Polymer Affinity Score: S = Σ (N_contacts × ΔG_sel_per_contact) [kT]. N_contacts = mean_contact_fraction × n_exposed_in_group. ΔG_sel_per_contact = -ln(contact_share / expected_share). More negative = stronger net polymer-protein affinity. Assumes contact independence; interpret as comparative scoring metric.', mixed_temperatures=False, temperature_groups=<factory>, conditions=<factory>, pairwise_comparisons=<factory>, polymer_types=<factory>, protein_groups=<factory>, surface_exposure_threshold=None, equilibration_time='', created_at=<factory>, polyzymd_version=<factory>)[source]

Bases: BaseModel

Complete polymer affinity score comparison result.

This is the main output from PolymerAffinityScoreComparator.compare().

The polymer affinity score quantifies total polymer-protein interaction strength as a comparative metric. It is computed by summing per-contact selectivity free energies weighted by the number of simultaneous contacts:

S = Σ_{p,g} N_{p,g} × ΔG_sel(p,g)

where the sum runs over all (polymer_type, protein_group) pairs.

Important

This quantity assumes contact independence and should be interpreted as a relative affinity score, not a rigorous thermodynamic binding free energy. See the module docstring for details.

Variables:

name (str) – Name of the comparison project.
methodology (str) – Human-readable description of the scoring methodology.
mixed_temperatures (bool) – True if conditions span more than one simulation temperature.
temperature_groups (dict[str, list[str]]) – Mapping of temperature (K, as str) to condition labels.
conditions (list[AffinityScoreConditionSummary]) – Summary for each condition.
pairwise_comparisons (list[AffinityScorePairwiseEntry]) – All pairwise comparisons.
polymer_types (list[str]) – All polymer types found.
protein_groups (list[str]) – All protein groups analyzed.
surface_exposure_threshold (float | None) – SASA threshold used (from settings).
equilibration_time (str) – Equilibration time used.
created_at (datetime) – When the analysis was run.
polyzymd_version (str) – Version of polyzymd used.

name: str

methodology: str

mixed_temperatures: bool

temperature_groups: dict[str, list[str]]

conditions: list[AffinityScoreConditionSummary]

pairwise_comparisons: list[AffinityScorePairwiseEntry]

polymer_types: list[str]

protein_groups: list[str]

surface_exposure_threshold: float | None

equilibration_time: str

created_at: datetime

polyzymd_version: str

save(path)[source]

Save result to JSON file.

Parameters:: path (Path or str) – Output path.
Returns:: Path to the saved file.
Return type:: Path

classmethod load(path)[source]

Load result from JSON file.

Parameters:: path (Path or str) – Path to JSON file.
Returns:: Loaded result.
Return type:: PolymerAffinityScoreResult

get_condition(label)[source]

Look up a condition summary by label.

Parameters:: label (str) – Condition display name.
Return type:: AffinityScoreConditionSummary or None

get_ranking()[source]

Return conditions ranked by total affinity score (most negative first).

Returns:: Conditions sorted by total_score ascending.
Return type:: list[AffinityScoreConditionSummary]
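The ordering can be illustrated with a stand-in class. `ConditionScore` is a hypothetical simplification of AffinityScoreConditionSummary, and the one-line sort is an assumption consistent with "total_score ascending", not the library's code.

```python
from dataclasses import dataclass


@dataclass
class ConditionScore:
    """Hypothetical stand-in with only the fields needed for ranking."""
    label: str
    total_score: float


def get_ranking(conditions):
    """Most negative total_score (strongest net affinity) first."""
    return sorted(conditions, key=lambda condition: condition.total_score)


conditions = [
    ConditionScore("0% SBMA", 0.4),
    ConditionScore("25% SBMA", -1.2),
    ConditionScore("50% SBMA", -0.3),
]

for condition in get_ranking(conditions):
    print(f"{condition.label}: {condition.total_score:+.2f} kT")
```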

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Formatters

Output formatters for binding free energy comparison results.

Provides console table, Markdown, and JSON output for BindingFreeEnergyResult.

polyzymd.compare.binding_free_energy_formatters.format_bfe_console_table(result)[source]

Format a BindingFreeEnergyResult as a console-friendly ASCII table.

Parameters:: result (BindingFreeEnergyResult) – Comparison result to format.
Returns:: ASCII table string.
Return type:: str

polyzymd.compare.binding_free_energy_formatters.format_bfe_markdown(result)[source]

Format a BindingFreeEnergyResult as Markdown.

Parameters:: result (BindingFreeEnergyResult) – Comparison result to format.
Returns:: Markdown-formatted string.
Return type:: str

polyzymd.compare.binding_free_energy_formatters.format_bfe_json(result)[source]

Format a BindingFreeEnergyResult as JSON.

Parameters:: result (BindingFreeEnergyResult) – Comparison result to format.
Returns:: JSON string.
Return type:: str

polyzymd.compare.binding_free_energy_formatters.format_bfe_result(result, format='table')[source]

Format a BindingFreeEnergyResult in the requested format.

Parameters:

result (BindingFreeEnergyResult) – Comparison result to format.
format (str) – Output format: “table” (default), “markdown”, or “json”.

Returns:

Formatted string.

Return type:

str

Raises:

ValueError – If format is not recognized.
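The dispatch behavior described above can be sketched with a lookup table. Everything here is a stand-in: the formatter bodies, the `FORMATTERS` mapping, and the plain dict used as the result are hypothetical and do not reflect the real output layout.

```python
import json


def format_table(result):
    return f"{result['name']} | units: {result['units']}"


def format_markdown(result):
    return f"# {result['name']}\n\nUnits: {result['units']}"


def format_json(result):
    return json.dumps(result, indent=2)


FORMATTERS = {
    "table": format_table,
    "markdown": format_markdown,
    "json": format_json,
}


def format_result(result, format="table"):
    """Dispatch to the named formatter; reject unknown format names."""
    try:
        formatter = FORMATTERS[format]
    except KeyError:
        valid = ", ".join(sorted(FORMATTERS))
        raise ValueError(f"Unknown format {format!r}; expected one of: {valid}") from None
    return formatter(result)


demo = {"name": "demo", "units": "kT"}
print(format_result(demo))
print(format_result(demo, "markdown"))
```

A table-driven dispatcher keeps the set of valid names in one place, so the error message cannot drift from the supported formats.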

Output formatters for polymer affinity score comparison results.

Provides console table, Markdown, and JSON output for PolymerAffinityScoreResult.

polyzymd.compare.polymer_affinity_formatters.format_affinity_console_table(result)[source]

Format a PolymerAffinityScoreResult as a console-friendly ASCII table.

Parameters:: result (PolymerAffinityScoreResult) – Comparison result to format.
Returns:: ASCII table string.
Return type:: str

polyzymd.compare.polymer_affinity_formatters.format_affinity_markdown(result)[source]

Format a PolymerAffinityScoreResult as Markdown.

Parameters:: result (PolymerAffinityScoreResult) – Comparison result to format.
Returns:: Markdown-formatted string.
Return type:: str

polyzymd.compare.polymer_affinity_formatters.format_affinity_json(result)[source]

Format a PolymerAffinityScoreResult as JSON.

Parameters:: result (PolymerAffinityScoreResult) – Comparison result to format.
Returns:: JSON string.
Return type:: str

polyzymd.compare.polymer_affinity_formatters.format_affinity_result(result, format='table')[source]

Format a PolymerAffinityScoreResult in the requested format.

Parameters:

result (PolymerAffinityScoreResult) – Comparison result to format.
format (str) – Output format: “table” (default), “markdown”, or “json”.

Returns:

Formatted string.

Return type:

str

Raises:

ValueError – If format is not recognized.

Plotters

Binding free energy plotters for comparison workflow.

This module provides registered plotters for ΔG_sel (selectivity free energy) analysis:

BFEHeatmapPlotter: ΔG_sel heatmap with rows = AA groups, columns = conditions
BFEBarPlotter: Grouped bar chart of ΔG_sel by AA residue class

Both plotters load a BindingFreeEnergyResult JSON saved by the polyzymd compare binding-free-energy command (in results/ adjacent to comparison.yaml) rather than per-condition analysis directories.

Partition-aware plotting

Each FreeEnergyEntry carries a partition_name field (e.g., “aa_class”, “lid_helices”, “whole_lid_domain”) that identifies which residue grouping scheme produced that entry. Different partitions use different denominators (each partition’s total exposed surface area), so mixing groups from different partitions on the same figure is scientifically misleading.

Both plotters therefore produce one figure per (partition, polymer_type) combination. When only a single partition is present (the common case for datasets that only use default AA-class grouping), filenames and titles omit the partition name to preserve backward compatibility.
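The grouping rule can be sketched as follows. The dict-based entries and the filename pattern are hypothetical; the source states only that one figure is produced per (partition, polymer_type) and that the partition name is omitted when a single partition exists.

```python
from collections import defaultdict


def group_entries(entries):
    """Collect protein groups per (partition_name, polymer_type) figure."""
    figures = defaultdict(list)
    for entry in entries:
        key = (entry["partition_name"], entry["polymer_type"])
        figures[key].append(entry["protein_group"])
    return dict(figures)


def figure_name(partition_name, polymer_type, n_partitions):
    """Hypothetical naming scheme: drop the partition when there is only one."""
    if n_partitions == 1:
        return f"bfe_heatmap_{polymer_type}.png"
    return f"bfe_heatmap_{partition_name}_{polymer_type}.png"


entries = [
    {"partition_name": "aa_class", "polymer_type": "SBM", "protein_group": "aromatic"},
    {"partition_name": "aa_class", "polymer_type": "SBM", "protein_group": "polar"},
    {"partition_name": "lid_helices", "polymer_type": "SBM", "protein_group": "helix_1"},
]

figures = group_entries(entries)
n_partitions = len({partition for partition, _polymer in figures})
for (partition, polymer_type), groups in figures.items():
    print(figure_name(partition, polymer_type, n_partitions), groups)
```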

Physics interpretation

ΔG_sel < 0 →  preferential contact (polymer contacts this group more than expected from surface availability alone)
ΔG_sel > 0 →  contact avoidance (polymer contacts this group less than expected)
ΔG_sel = 0 →  contacts match surface-availability reference exactly

Diverging colormap (RdBu_r by default) is centered at 0.0:

Blue (negative) → preference
White (zero) → neutral
Red (positive) → avoidance
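One simple way to keep a diverging colormap centered at zero is to use color limits that are symmetric about zero. The helper below is a hypothetical sketch of that idea, not the plotter's code; the fallback bound of 1.0 for empty input is an arbitrary choice.

```python
def symmetric_limits(values, fallback=1.0):
    """Return (vmin, vmax) symmetric about zero, ignoring missing (None) values."""
    magnitudes = [abs(value) for value in values if value is not None]
    bound = max(magnitudes, default=fallback)
    return -bound, bound


# Hypothetical ΔG_sel values for one heatmap; None marks an undefined entry.
delta_g_values = [-0.4, None, 0.9, 0.1]
vmin, vmax = symmetric_limits(delta_g_values)
print(vmin, vmax)
```

Passing these as `vmin` and `vmax` to a matplotlib image call with `cmap="RdBu_r"` places white exactly at ΔG_sel = 0, so blue and red intensities are directly comparable.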

Units are those specified in analysis_settings.binding_free_energy.units (kT by default, i.e., dimensionless, in units of k_B·T).

class polyzymd.compare.plotters.binding_free_energy.BFEHeatmapPlotter(settings)[source]

Bases: BasePlotter

Generate ΔG_sel heatmap comparing binding free energy across conditions.

Creates one figure per (partition, polymer_type) combination:

Rows: protein groups belonging to that partition
Columns: conditions (e.g., 0% SBMA, 25% SBMA, …)
Color: ΔG_sel value with diverging colormap centered at 0

When only a single partition exists (e.g., just “aa_class”), filenames and titles match the previous single-partition behavior for backward compatibility.

Loads BindingFreeEnergyResult from results/ adjacent to comparison.yaml (accepts both binding_free_energy_comparison_*.json and bfe_comparison_*.json naming conventions).

Sign convention

Blue (negative ΔG_sel) = preferential contact
Red (positive ΔG_sel) = contact avoidance

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:

Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)

Return type:

str

can_plot(comparison_config, analysis_type)[source]

Return True for ‘binding_free_energy’ when heatmap is enabled.

plot(data, labels, output_dir, **kwargs)[source]

Generate ΔG_sel heatmaps, one per (partition, polymer_type).

Parameters:

data (dict) – Mapping of condition_label -> condition data dict from ComparisonPlotter._load_analysis_data().
labels (sequence of str) – Condition labels in desired display order.
output_dir (Path) – Directory to save plot files.
**kwargs (Any) – Unused; for interface compatibility.

Returns:

Paths to generated plot files, or empty list.

Return type:

list[Path]

class polyzymd.compare.plotters.binding_free_energy.BFEBarPlotter(settings)[source]

Bases: BasePlotter

Generate ΔG_sel grouped bar charts comparing binding free energy across conditions.

Creates one figure per (partition, polymer_type) combination with:

Groups on x-axis: protein groups from that partition
Bars within each group: one per condition
Error bars: between-replicate SEM on ΔG_sel (delta-method fallback)
Reference line at ΔG_sel = 0
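
For reference, a between-replicate SEM is the sample standard deviation of the per-replicate values divided by the square root of the number of replicates. The sketch below shows that calculation only; the function name and the replicate values are hypothetical, and the delta-method fallback mentioned above is not implemented here.

```python
import math
import statistics


def between_replicate_sem(replicate_values):
    """Standard error of the mean across replicates (sample std / sqrt(n))."""
    n = len(replicate_values)
    if n < 2:
        return None  # undefined for a single replicate; a fallback would be needed
    return statistics.stdev(replicate_values) / math.sqrt(n)


# Hypothetical per-replicate ΔG_sel values (kT) for one group and condition.
replicates = [-0.52, -0.47, -0.58]
mean = statistics.mean(replicates)
sem = between_replicate_sem(replicates)
print(f"{mean:.3f} +/- {sem:.3f}")
```

Returning None for a single replicate makes the missing-error-bar case explicit, which is the situation where a fallback estimate becomes necessary.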

When only a single partition exists, filenames and titles match the previous single-partition behavior for backward compatibility.

Loads BindingFreeEnergyResult from results/ adjacent to comparison.yaml (accepts both binding_free_energy_comparison_*.json and bfe_comparison_*.json naming conventions).

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:

Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)

Return type:

str

can_plot(comparison_config, analysis_type)[source]

Return True for ‘binding_free_energy’ when bar charts are enabled.

plot(data, labels, output_dir, **kwargs)[source]

Generate ΔG_sel grouped bar charts, one per (partition, polymer_type).

Parameters:

data (dict) – Mapping of condition_label -> condition data dict from ComparisonPlotter._load_analysis_data().
labels (sequence of str) – Condition labels in desired display order.
output_dir (Path) – Directory to save plot files.
**kwargs (Any) – Unused; for interface compatibility.

Returns:

Paths to generated plot files, or empty list.

Return type:

list[Path]

Exposure dynamics plotters for comparison workflow.

Provides two registered plotters:

ExposureChaperoneFractionPlotter ("exposure_chaperone_fraction"): Bar chart comparing mean chaperone fraction across conditions.
ExposureEnrichmentHeatmapPlotter ("exposure_enrichment_heatmap"): Heatmap of residue-based chaperone enrichment per (polymer_type, aa_group).

Both plotters follow the established BasePlotter pattern: load data from data[label]["analysis_dir"] paths rather than expecting data to be passed via kwargs.
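
The pattern can be sketched with a stand-in base class. PlotterSketch below is not polyzymd's BasePlotter; it only mirrors the three methods documented in this section (plot_type, can_plot, plot), and DemoPlotter, its "demo_plot" identifier, and the example paths are hypothetical. The sketch resolves analysis directories and returns the path it would write, without drawing or saving anything.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class PlotterSketch(ABC):
    """Stand-in for BasePlotter; mirrors only the three documented methods."""

    def __init__(self, settings):
        self.settings = settings

    @classmethod
    @abstractmethod
    def plot_type(cls): ...

    @abstractmethod
    def can_plot(self, comparison_config, analysis_type): ...

    @abstractmethod
    def plot(self, data, labels, output_dir, **kwargs): ...


class DemoPlotter(PlotterSketch):
    @classmethod
    def plot_type(cls):
        return "demo_plot"

    def can_plot(self, comparison_config, analysis_type):
        return analysis_type == "exposure"

    def plot(self, data, labels, output_dir, **kwargs):
        # Resolve inputs from each condition's analysis_dir instead of kwargs.
        dirs = [Path(data[label]["analysis_dir"]) for label in labels if label in data]
        if not dirs:
            return []
        # A real plotter would read results from `dirs` and draw here.
        return [Path(output_dir) / f"{self.plot_type()}.png"]


plotter = DemoPlotter(settings=None)
data = {"0% SBMA": {"analysis_dir": "runs/sbma_0/analysis"}}
paths = plotter.plot(data, ["0% SBMA"], "figures")
print(paths)
```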

class polyzymd.compare.plotters.exposure.ExposureChaperoneFractionPlotter(settings)[source]

Bases: BasePlotter

Bar chart comparing chaperone fraction across conditions.

Shows mean chaperone fraction (with SEM error bars) per condition, ordered by the ranking from ExposureDynamicsComparator.compare().

Compatible with analysis_type=“exposure”.

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:

Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)

Return type:

str

can_plot(comparison_config, analysis_type)[source]

Check if this plotter can handle the given analysis type.

Parameters:

comparison_config (ComparisonConfig) – Full comparison configuration
analysis_type (str) – Analysis type to check (e.g., “rmsf”, “triad”, “distances”)

Returns:

True if this plotter can generate plots for the analysis type

Return type:

bool

plot(data, labels, output_dir, **kwargs)[source]

Generate chaperone fraction bar chart.

class polyzymd.compare.plotters.exposure.ExposureEnrichmentHeatmapPlotter(settings)[source]

Bases: BasePlotter

Heatmap of chaperone enrichment per (polymer_type, aa_group).

One subplot per condition; rows = polymer types, columns = AA groups. Color encodes residue-based enrichment (warm = enriched, cool = depleted).

Compatible with analysis_type=“exposure”.

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:

Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)

Return type:

str

can_plot(comparison_config, analysis_type)[source]

Check if this plotter can handle the given analysis type.

Parameters:

comparison_config (ComparisonConfig) – Full comparison configuration
analysis_type (str) – Analysis type to check (e.g., “rmsf”, “triad”, “distances”)

Returns:

True if this plotter can generate plots for the analysis type

Return type:

bool

plot(data, labels, output_dir, **kwargs)[source]

Generate enrichment heatmaps from cached ExposureComparisonResult.

Polymer affinity score plotters for comparison workflow.

This module provides registered plotters for the polymer affinity score:

AffinityStackedBarPlotter: Total affinity score per condition, with stacked segments showing each polymer type’s contribution.
AffinityGroupBarPlotter: Per-group breakdown comparing conditions, one figure per polymer type.

Both plotters load a PolymerAffinityScoreResult JSON saved by the polyzymd compare polymer-affinity command (in results/ adjacent to comparison.yaml).

Physics interpretation

Score < 0 → net favorable polymer-protein affinity
Score > 0 → net unfavorable (avoidance dominates)
Score = 0 → contacts match the surface-availability reference

Units are always kT (dimensionless, in units of k_bT).

Sign convention

More negative = stronger polymer-protein interaction. A diverging colormap is not used here (unlike the BFE heatmaps) because the primary displays are bar charts, where the sign is visually obvious.

class polyzymd.compare.plotters.polymer_affinity.AffinityStackedBarPlotter(settings)[source]

Bases: BasePlotter

Stacked bar chart of total affinity score per condition.

Each bar represents one condition’s total affinity score, with segments colored by polymer type contribution. This gives a quick overview of which polymer types contribute most to the total interaction strength.
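
Because contributions can be negative or positive, stacking them needs some care. The sketch below shows one possible convention, stacking negative segments downward from zero and positive segments upward; it is an assumption for illustration, not a description of how AffinityStackedBarPlotter lays out its bars, and the polymer type names and values are hypothetical.

```python
def stacked_bottoms(contributions):
    """Return each segment's bar bottom, stacking positives up and negatives down."""
    pos_top, neg_top = 0.0, 0.0
    bottoms = []
    for value in contributions:
        if value >= 0:
            bottoms.append(pos_top)
            pos_top += value
        else:
            bottoms.append(neg_top)
            neg_top += value
    return bottoms


# Hypothetical per-polymer-type contributions (kT) for one condition.
contributions = {"SBMA": -1.5, "EGMA": -0.5, "OEGMA": 0.25}
bottoms = stacked_bottoms(contributions.values())
total = sum(contributions.values())
print(bottoms, total)
```

Each (value, bottom) pair could then be drawn as one bar segment with a bottom offset, with the total score equal to the sum of the segments.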

Loads PolymerAffinityScoreResult from results/ adjacent to comparison.yaml.

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:

Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)

Return type:

str

can_plot(comparison_config, analysis_type)[source]

Check if this plotter can handle the given analysis type.

Parameters:

comparison_config (ComparisonConfig) – Full comparison configuration
analysis_type (str) – Analysis type to check (e.g., “rmsf”, “triad”, “distances”)

Returns:

True if this plotter can generate plots for the analysis type

Return type:

bool

plot(data, labels, output_dir, **kwargs)[source]

Generate stacked bar chart of affinity scores by condition.

Parameters:

data (dict) – Condition data dict from orchestrator.
labels (sequence of str) – Condition labels in display order.
output_dir (Path) – Directory to save plot files.

Returns:

Paths to generated plot files.

Return type:

list[Path]

class polyzymd.compare.plotters.polymer_affinity.AffinityGroupBarPlotter(settings)[source]

Bases: BasePlotter

Grouped bar chart of per-group affinity score contributions.

Creates one figure per polymer type with:

Groups on x-axis: protein groups (AA classes)
Bars within each group: one per condition
Error bars: SEM on per-group affinity score
Reference line at score = 0

Loads PolymerAffinityScoreResult from results/ adjacent to comparison.yaml.

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:

Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)

Return type:

str

can_plot(comparison_config, analysis_type)[source]

Check if this plotter can handle the given analysis type.

Parameters:

comparison_config (ComparisonConfig) – Full comparison configuration
analysis_type (str) – Analysis type to check (e.g., “rmsf”, “triad”, “distances”)

Returns:

True if this plotter can generate plots for the analysis type

Return type:

bool

plot(data, labels, output_dir, **kwargs)[source]

Generate grouped bar charts of per-group affinity scores.

Parameters:

data (dict) – Condition data dict from orchestrator.
labels (sequence of str) – Condition labels in display order.
output_dir (Path) – Directory to save plot files.

Returns:

Paths to generated plot files.

Return type:

list[Path]

CLI

CLI commands for the compare module.

This module provides the polyzymd compare command group with subcommands for initializing comparison projects and running comparisons.