Compare Module

Core

Base classes for comparison analysis.

This module provides abstract base classes that consolidate common patterns across all comparator types, following the Template Method design pattern.

Classes

BaseConditionSummary: Abstract base for condition-level summary statistics.
BaseComparisonResult: Abstract base for complete comparison results with save/load.
PairwiseComparison: Shared model for statistical comparison between two conditions.
ANOVASummary: Shared model for ANOVA results.
BaseComparator: Abstract base implementing the Template Method pattern for comparisons.

Design Principles

Open-Closed Principle: New comparators extend base classes without modifying them.
Template Method: compare() defines the algorithm skeleton; subclasses fill in specifics.
DRY: Statistical tests, pairwise logic, and serialization are implemented once.

class polyzymd.compare.core.base.PairwiseComparison(*, condition_a, condition_b, metric='default', t_statistic, p_value, cohens_d, effect_size_interpretation, direction, significant, percent_change)[source]

Bases: BaseModel

Statistical comparison between two conditions.

This is the standard pairwise comparison result used across all comparator types. For comparators that need additional fields (e.g., multiple metrics), subclass this model.

condition_a

Label of first condition (typically control/reference).

Type:: str

condition_b

Label of second condition (typically treatment).

Type:: str

metric

Name of the metric being compared.

Type:: str

t_statistic

T-test statistic.

Type:: float

p_value

Two-tailed p-value.

Type:: float

cohens_d

Effect size (Cohen’s d).

Type:: float

effect_size_interpretation

“negligible”, “small”, “medium”, or “large”.

Type:: str

direction

Interpretation of change (e.g., “stabilizing”, “improving”).

Type:: str

significant

Whether p < 0.05.

Type:: bool

percent_change

Percent change from condition_a to condition_b.

Type:: float

condition_a: str

condition_b: str

metric: str

t_statistic: float

p_value: float

cohens_d: float

effect_size_interpretation: str

direction: str

significant: bool

percent_change: float

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.core.base.ANOVASummary(*, metric='default', f_statistic, p_value, significant)[source]

Bases: BaseModel

One-way ANOVA result summary.

metric

Name of the metric tested (e.g., “rmsf”, “coverage”).

Type:: str

f_statistic

F-statistic from ANOVA.

Type:: float

p_value

P-value for the test.

Type:: float

significant

Whether p < 0.05.

Type:: bool

metric: str

f_statistic: float

p_value: float

significant: bool

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.core.base.BaseConditionSummary(*, label, config_path, n_replicates, replicate_values)[source]

Bases: BaseModel, ABC

Abstract base class for condition-level summary statistics.

All condition summaries share these common fields. Subclasses add analysis-specific fields (e.g., mean_rmsf, coverage_mean).

label

Display name for this condition.

Type:: str

config_path

Path to the simulation config file.

Type:: str

n_replicates

Number of replicates included.

Type:: int

replicate_values

Per-replicate values of the primary metric (for statistical tests).

Type:: list[float]

label: str

config_path: str

n_replicates: int

replicate_values: list[float]

abstract property primary_metric_value: float

Return the primary metric value for ranking/comparison.

This is used by BaseComparator for sorting and statistical tests.

abstract property primary_metric_sem: float

Return the SEM (standard error of the mean) of the primary metric.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.core.base.BaseComparisonResult(*, metric, name, control_label=None, conditions, pairwise_comparisons, anova=None, ranking, equilibration_time, created_at, polyzymd_version='1.2.1')[source]

Bases: BaseModel, ABC, Generic[TConditionSummary, TPairwiseComparison]

Abstract base class for comparison results.

Provides common serialization (save/load) and accessor methods. Subclasses define analysis-specific fields.

metric

The primary metric being compared (e.g., “rmsf”, “simultaneous_contact_fraction”).

Type:: str

name

Name of the comparison project.

Type:: str

control_label

Label of the control condition.

Type:: str, optional

conditions

Summary for each condition.

Type:: list[TConditionSummary]

pairwise_comparisons

Statistical comparisons (all vs control, or all pairs).

Type:: list[TPairwiseComparison]

anova

ANOVA result, present only when three or more conditions are compared.

Type:: ANOVASummary or list[ANOVASummary], optional

ranking

Labels sorted by primary metric.

Type:: list[str]

equilibration_time

Equilibration time used.

Type:: str

created_at

When the analysis was run.

Type:: datetime

polyzymd_version

Version of polyzymd used.

Type:: str

comparison_type: ClassVar[str] = 'base'

metric: str

name: str

control_label: str | None

conditions: list[Any]

pairwise_comparisons: list[Any]

anova: ANOVASummary | list[ANOVASummary] | None

ranking: list[str]

equilibration_time: str

created_at: datetime

polyzymd_version: str

save(path)[source]

Save result to JSON file.

Parameters:: path (Path or str) – Output path.
Returns:: Path to saved file.
Return type:: Path

classmethod load(path)[source]

Load result from JSON file.

Parameters:: path (Path or str) – Path to JSON file.
Returns:: Loaded result.
Return type:: Self

get_condition(label)[source]

Get a condition by label.

Parameters:: label (str) – Condition label.
Returns:: The matching condition.
Return type:: BaseConditionSummary
Raises:: KeyError – If condition not found.

get_comparison(label)[source]

Get pairwise comparison for a condition vs control.

Parameters:: label (str) – Treatment condition label.
Returns:: The comparison, or None if not found.
Return type:: PairwiseComparison or None

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.core.base.BaseComparator(config, analysis_settings, equilibration=None)[source]

Bases: ABC, Generic[TAnalysisSettings, TConditionData, TConditionSummary, TResult]

Abstract base class for all comparators using Template Method pattern.

The compare() method defines the comparison algorithm skeleton:

1. Load/compute analysis for each condition
2. Build condition summaries
3. Compute pairwise statistical comparisons
4. Compute ANOVA (if 3+ conditions)
5. Rank conditions
6. Build and return result

Subclasses implement the abstract methods to customize each step.

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (TAnalysisSettings) – Analysis-specific settings.
equilibration (str, optional) – Equilibration time override.

Type Parameters:

TAnalysisSettings – Type of analysis settings (e.g., RMSFAnalysisSettings).
TConditionData – Type of raw data loaded for each condition.
TConditionSummary – Type of condition summary (e.g., RMSFConditionSummary).
TResult – Type of comparison result (e.g., RMSFComparisonResult).
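
The skeleton-plus-hooks structure described for compare() can be illustrated with a small, self-contained Template Method sketch. Everything here is hypothetical: the class names, the hook names (_summarize, _higher_is_better), and the dict-based inputs and result are stand-ins chosen for brevity, not BaseComparator's actual abstract methods. Steps 3 and 4 (pairwise tests and ANOVA) are omitted to keep the example short.

```python
from abc import ABC, abstractmethod


class SketchComparator(ABC):
    """Hypothetical stand-in for a Template Method base class."""

    def __init__(self, conditions: dict[str, list[float]]) -> None:
        self.conditions = conditions

    def compare(self) -> dict:
        """Fixed algorithm skeleton; subclasses only supply the hooks."""
        # Steps 1-2: obtain data and build one summary value per condition.
        summaries = {
            label: self._summarize(values) for label, values in self.conditions.items()
        }
        # Step 5: rank condition labels by their summary value.
        ranking = sorted(summaries, key=summaries.get, reverse=self._higher_is_better())
        # Step 6: assemble and return the result.
        return {"summaries": summaries, "ranking": ranking}

    @abstractmethod
    def _summarize(self, values: list[float]) -> float:
        """Hook: reduce one condition's values to its primary metric."""

    @abstractmethod
    def _higher_is_better(self) -> bool:
        """Hook: decide the sort direction used for ranking."""


class MeanLowerIsBetter(SketchComparator):
    """Example subclass: rank by mean, lowest first (as one might for a fluctuation metric)."""

    def _summarize(self, values: list[float]) -> float:
        return sum(values) / len(values)

    def _higher_is_better(self) -> bool:
        return False


result = MeanLowerIsBetter({"control": [1.0, 3.0], "treated": [0.5, 1.5]}).compare()
```

The point of the pattern is that compare() never changes when a new analysis is added; only the hooks do.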

comparison_type: ClassVar[str] = 'base'

__init__(config, analysis_settings, equilibration=None)[source]

abstractmethod classmethod comparison_type_name()[source]

Return the comparison type identifier (e.g., “rmsf”, “contacts”).

Returns:: Type identifier used in registry and CLI.
Return type:: str

abstract property metric_type: MetricType

Declare whether this comparator’s metric is mean-based or variance-based.

This determines how autocorrelation is handled in the underlying analysis:

MEAN_BASED: Use all frames for computation, correct uncertainty using N_eff (effective sample size). Examples: average distance, contact fraction, catalytic triad proximity.
VARIANCE_BASED: Subsample to independent frames separated by 2τ (correlation time) to avoid bias in variance estimates. Examples: RMSF, fluctuation metrics.

Contributors implementing new comparators MUST declare the appropriate metric type to ensure correct statistical treatment per LiveCoMS best practices (Grossfield et al., 2018).

Returns:: The metric type for this comparator.
Return type:: MetricType

References

Grossfield et al. (2018) LiveCoMS 1:5067 (Best Practices for Uncertainty)
GitHub: dmzuckerman/Sampling-Uncertainty
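
The two treatments can be contrasted in a short sketch. It assumes the correlation time is expressed in frames and uses the approximation N_eff ≈ N / (2τ), which matches the 2τ spacing mentioned above; the estimator polyzymd actually uses for τ and N_eff is not given in this reference, and both function names are hypothetical.

```python
import math
import statistics


def sem_mean_based(values: list[float], tau_frames: float) -> float:
    """Mean-based: keep every frame, but divide by sqrt(N_eff) instead of sqrt(N)."""
    n_eff = max(1.0, len(values) / (2.0 * tau_frames))  # assumed approximation
    return statistics.stdev(values) / math.sqrt(n_eff)


def subsample_variance_based(values: list[float], tau_frames: float) -> list[float]:
    """Variance-based: keep only frames spaced 2*tau apart before computing the metric."""
    stride = max(1, math.ceil(2.0 * tau_frames))
    return values[::stride]


# Hypothetical 100-frame time series with a correlation time of 5 frames.
frames = [float(i % 7) for i in range(100)]
corrected_sem = sem_mean_based(frames, tau_frames=5.0)
naive_sem = statistics.stdev(frames) / math.sqrt(len(frames))
independent_frames = subsample_variance_based(frames, tau_frames=5.0)
```

With correlated frames the corrected SEM is larger than the naive one, which is the intended effect: the naive value overstates how much independent information the trajectory contains.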

compare(recompute=False)[source]

Run comparison across all conditions (Template Method).

This method defines the algorithm skeleton. Subclasses customize behavior by implementing the abstract hook methods.

Parameters:: recompute (bool, optional) – If True, force recompute even if cached results exist.
Returns:: Complete comparison results with statistics and rankings.
Return type:: TResult

Registry for comparator types.

This module provides extensible infrastructure for registering comparator types following the Open-Closed Principle (OCP). New comparators can be added by registering with the ComparatorRegistry without modifying core code.

Example

Registering a new comparator:

>>> from polyzymd.compare.core.registry import ComparatorRegistry
>>> from polyzymd.compare.core.base import BaseComparator
>>>
>>> @ComparatorRegistry.register("my_metric")
... class MyComparator(BaseComparator):
...     @classmethod
...     def comparison_type_name(cls) -> str:
...         return "my_metric"
...     ...
>>>
>>> # Create comparator instance via registry
>>> comparator = ComparatorRegistry.create("my_metric", config, settings)

class polyzymd.compare.core.registry.ComparatorRegistry[source]

Bases: object

Registry for comparator implementations.

Allows new comparators to be registered without modifying core code. Use the register decorator to add new comparator classes.

Examples

>>> @ComparatorRegistry.register("rmsf")
... class RMSFComparator(BaseComparator):
...     ...
>>>
>>> # List available comparators
>>> ComparatorRegistry.list_available()
['contacts', 'rmsf', 'triad']
>>>
>>> # Create comparator instance
>>> comparator = ComparatorRegistry.create("rmsf", config, settings)
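
For readers unfamiliar with the pattern, a decorator-based registry of this shape can be written in a few lines. The sketch below is a simplified stand-in, not ComparatorRegistry's actual implementation: the class and comparator names are hypothetical, the registry key is required here (whereas register() accepts name=None), and create() simply forwards its arguments to the constructor.

```python
from typing import Any, Callable


class SketchRegistry:
    """Hypothetical minimal registry mapping string keys to classes."""

    _registry: dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(klass: type) -> type:
            cls._registry[name] = klass
            return klass  # return the class unchanged so it stays usable directly
        return decorator

    @classmethod
    def get(cls, name: str) -> type:
        if name not in cls._registry:
            raise ValueError(
                f"Unknown type {name!r}; available: {cls.list_available()}"
            )
        return cls._registry[name]

    @classmethod
    def list_available(cls) -> list[str]:
        return sorted(cls._registry)

    @classmethod
    def create(cls, name: str, *args: Any, **kwargs: Any) -> Any:
        """Factory: look up the class and instantiate it."""
        return cls.get(name)(*args, **kwargs)


@SketchRegistry.register("rmsf")
class RMSFSketch:
    def __init__(self, config: dict) -> None:
        self.config = config


@SketchRegistry.register("contacts")
class ContactsSketch:
    def __init__(self, config: dict) -> None:
        self.config = config


instance = SketchRegistry.create("rmsf", {"name": "demo"})
```

Because registration happens at class-definition time, adding a comparator only requires importing the module that defines it; no central list needs editing.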

classmethod register(name=None)[source]

Decorator to register a comparator class.

Parameters:: name (str, optional) – Registry key. If None, uses the class’s comparison_type_name().
Returns:: Decorator function.
Return type:: Callable

Examples

>>> @ComparatorRegistry.register("rmsf")
... class RMSFComparator(BaseComparator):
...     @classmethod
...     def comparison_type_name(cls) -> str:
...         return "rmsf"

classmethod get(name)[source]

Get comparator class by name.

Parameters:: name (str) – Comparator type identifier.
Returns:: The registered comparator class.
Return type:: Type[BaseComparator]
Raises:: ValueError – If the comparator type is not registered.

classmethod list_available()[source]

List all registered comparator types.

Returns:: Sorted list of registered type names.
Return type:: list[str]

classmethod is_registered(name)[source]

Check if a comparator type is registered.

Parameters:: name (str) – Comparator type identifier.
Returns:: True if registered, False otherwise.
Return type:: bool

classmethod create(name, config, analysis_settings, equilibration=None, **kwargs)[source]

Factory to create a comparator instance.

Parameters:

name (str) – Comparator type identifier.
config (ComparisonConfig) – Comparison configuration.
analysis_settings – Analysis-specific settings.
equilibration (str, optional) – Equilibration time override.
**kwargs – Additional comparator-specific arguments.

Returns:

Configured comparator instance.

Return type:

BaseComparator

classmethod clear()[source]

Clear the registry (for testing purposes).

Configuration

Configuration schema for comparison projects.

This module defines the YAML schema for comparison.yaml files that specify which simulation conditions to compare.

The schema has two main sections:

analysis_settings: Defines WHAT analyses to run (shared across conditions)
comparison_settings: Defines HOW to compare (statistical parameters)

Both sections use a registry-based approach for extensibility. New analysis types can be added by registering with AnalysisSettingsRegistry and ComparisonSettingsRegistry (see polyzymd.compare.settings).

class polyzymd.compare.config.ConditionConfig(*, label, config, replicates)[source]

Bases: BaseModel

Configuration for one condition in a comparison.

label

Display name for this condition (e.g., “No Polymer”, “100% SBMA”)

Type:: str

config

Path to the simulation’s config.yaml file

Type:: Path

replicates

List of replicate numbers to include in the analysis

Type:: list[int]

label: str

config: Path

replicates: list[int]

classmethod resolve_path(v)[source]

Convert string paths to Path objects.

classmethod ensure_list(v)[source]

Ensure replicates is a list.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.AnalysisSettingsContainer(**data)[source]

Bases: BaseModel

Container for analysis settings (WHAT to analyze).

Uses dynamic attribute access to support any registered analysis type without hardcoding field names.

model_config = {'extra': 'allow'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

__init__(**data)[source]

Initialize with dynamic analysis settings.

Parameters:: **data (Any) – Analysis settings keyed by analysis type name.

get(analysis_type)[source]

Get settings for a specific analysis type.

Parameters:: analysis_type (str) – Analysis type identifier (e.g., “rmsf”, “contacts”).
Returns:: Settings for the analysis type, or None if not configured.
Return type:: BaseAnalysisSettings or None

get_enabled_analyses()[source]

Get list of enabled analysis types.

Returns:: Names of configured analyses (presence implies enabled).
Return type:: list[str]

Notes

Uses actual model data from comparison.yaml rather than relying on a registry. This makes comparison.yaml the source of truth for which analyses are enabled.

to_analysis_yaml_dict(replicates, eq_time)[source]

Convert to analysis.yaml-compatible dictionary.

Parameters:

replicates (list[int]) – Replicate numbers for the analysis.yaml.
eq_time (str) – Equilibration time for the analysis.yaml.

Returns:

Dictionary suitable for writing to analysis.yaml.

Return type:

dict[str, Any]

class polyzymd.compare.config.ComparisonSettingsContainer(**data)[source]

Bases: BaseModel

Container for comparison settings (HOW to compare).

Uses dynamic attribute access to support any registered comparison type. To enable comparison for an analysis type, analysis_settings and comparison_settings must both contain an entry for it; the comparison_settings entry may be an empty dict.

model_config = {'extra': 'allow'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

__init__(**data)[source]

Initialize with dynamic comparison settings.

Parameters:: **data (Any) – Comparison settings keyed by analysis type name.

get(analysis_type)[source]

Get settings for a specific comparison type.

Parameters:: analysis_type (str) – Analysis type identifier (e.g., “rmsf”, “contacts”).
Returns:: Comparison settings, or None if not configured.
Return type:: BaseComparisonSettings or None

get_enabled_comparisons()[source]

Get list of enabled comparison types.

Returns:: Names of configured comparisons.
Return type:: list[str]
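
The pairing rule between the two containers (an analysis is only compared when comparison_settings also has an entry for it, even an empty one) can be checked with plain dictionaries. This is a sketch only: the function name is hypothetical, the settings keys inside each entry are invented for illustration, and the real containers are pydantic models rather than dicts.

```python
def missing_comparison_entries(
    analysis_settings: dict[str, dict],
    comparison_settings: dict[str, dict],
) -> list[str]:
    """Return analysis types that have no matching comparison_settings entry."""
    return sorted(set(analysis_settings) - set(comparison_settings))


# Hypothetical contents mirroring the two sections of a comparison.yaml file.
analysis_settings = {
    "rmsf": {"selection": "protein"},  # illustrative key, not a documented field
    "contacts": {"cutoff": 4.5},       # illustrative key, not a documented field
}
comparison_settings = {
    "rmsf": {},  # an empty dict is enough to enable the comparison
}

missing = missing_comparison_entries(analysis_settings, comparison_settings)
```

Here "contacts" would be analyzed but not compared until an entry for it, possibly empty, is added to comparison_settings.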

class polyzymd.compare.config.RMSFPlotSettings(*, show_error=True, highlight_residues=<factory>, figsize_profile=(14, 4), figsize_comparison=(8, 6))[source]

Bases: BasePlotSettings

RMSF-specific plot customization.

show_error

Show error bands/bars on plots (default True)

Type:: bool

highlight_residues

Residue numbers to highlight with vertical lines (e.g., active site)

Type:: list[int]

figsize_profile

Figure size for per-residue profile plots

Type:: tuple[float, float]

figsize_comparison

Figure size for bar comparison plots

Type:: tuple[float, float]

show_error: bool

highlight_residues: list[int]

figsize_profile: tuple[float, float]

figsize_comparison: tuple[float, float]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.TriadPlotSettings(*, generate_kde_panel=True, generate_bars=True, generate_2d_kde=False, threshold_line_color='red', kde_fill_alpha=0.7, kde_xlim=(0.0, 7.0), figsize_kde_panel=None, figsize_bars=(10, 6))[source]

Bases: BasePlotSettings

Triad-specific plot customization.

generate_kde_panel

Generate multi-row KDE panel plot (default True)

Type:: bool

generate_bars

Generate grouped threshold bar chart (default True)

Type:: bool

generate_2d_kde

Generate 2D joint KDE plot (default False, more specialized)

Type:: bool

threshold_line_color

Color for threshold vertical line

Type:: str

kde_fill_alpha

Transparency for KDE fill (0-1)

Type:: float

kde_xlim

X-axis limits for KDE panel in Angstroms (default (0, 7)).

Type:: tuple[float, float]

figsize_kde_panel

Figure size for KDE panel (auto-calculated if None)

Type:: tuple[float, float] | None

figsize_bars

Figure size for bar chart

Type:: tuple[float, float]

generate_kde_panel: bool

generate_bars: bool

generate_2d_kde: bool

threshold_line_color: str

kde_fill_alpha: float

kde_xlim: tuple[float, float]

figsize_kde_panel: tuple[float, float] | None

figsize_bars: tuple[float, float]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.DistancesPlotSettings(*, show_threshold=True, use_kde=True, generate_state_bars=True, figsize=(10, 6))[source]

Bases: BasePlotSettings

Distance analysis plot customization.

show_threshold

Show threshold line on distribution plots

Type:: bool

use_kde

Use KDE instead of histogram for distributions

Type:: bool

generate_state_bars

Generate per-pair state bar charts (above/below threshold). Each pair gets its own figure showing the fraction of frames in each state per condition. Default True.

Type:: bool

figsize

Default figure size for distance plots

Type:: tuple[float, float]

show_threshold: bool

use_kde: bool

generate_state_bars: bool

figsize: tuple[float, float]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.config.ContactsPlotSettings(*, figsize=(10, 8), generate_enrichment_heatmap=True, generate_enrichment_bars=True, figsize_enrichment_heatmap=None, figsize_enrichment_bars=(10, 6), enrichment_colormap='RdBu_r', show_enrichment_error=True, generate_system_coverage_heatmap=True, generate_system_coverage_bars=True, figsize_system_coverage_heatmap=None, figsize_system_coverage_bars=(10, 6), show_system_coverage_error=True, generate_user_partition_bars=True, figsize_user_partition_bars=(10, 6), show_user_partition_error=True, generate_contact_fraction_profile=True, figsize_contact_fraction_profile=(14, 5), show_contact_fraction_profile_error=True, contact_fraction_profile_threshold=None, generate_residence_time_profile=True, figsize_residence_time_profile=(14, 5), show_residence_time_profile_error=True, generate_cf_by_aa_class_bars=True, figsize_cf_by_aa_class_bars=(10, 6), show_cf_by_aa_class_error=True, generate_cf_by_partition_bars=True, figsize_cf_by_partition_bars=(10, 6), show_cf_by_partition_error=True, generate_rt_by_aa_class_bars=True, figsize_rt_by_aa_class_bars=(10, 6), show_rt_by_aa_class_error=True, generate_rt_by_partition_bars=True, figsize_rt_by_partition_bars=(10, 6), show_rt_by_partition_error=True, highlight_residues=<factory>)[source]

Bases: BasePlotSettings

Contacts analysis plot customization.

figsize

Default figure size for contact plots

Type:: tuple[float, float]

generate_enrichment_heatmap

Generate binding preference enrichment heatmap (default True)

Type:: bool

generate_enrichment_bars

Generate binding preference bar charts (default True)

Type:: bool

figsize_enrichment_heatmap

Figure size for enrichment heatmap (auto-calculated if None)

Type:: tuple[float, float] | None

figsize_enrichment_bars

Figure size for enrichment bar charts

Type:: tuple[float, float]

enrichment_colormap

Colormap for enrichment heatmap (diverging recommended)

Type:: str

show_enrichment_error

Show error bars on enrichment bar charts (default True)

Type:: bool

generate_system_coverage_heatmap

Generate system coverage enrichment heatmap (default True)

Type:: bool

generate_system_coverage_bars

Generate system coverage bar charts (default True)

Type:: bool

figsize_system_coverage_heatmap

Figure size for system coverage heatmap (auto-calculated if None)

Type:: tuple[float, float] | None

figsize_system_coverage_bars

Figure size for system coverage bar charts

Type:: tuple[float, float]

show_system_coverage_error

Show error bars on system coverage bar charts (default True)

Type:: bool

generate_user_partition_bars

Generate user-defined partition bar charts (default True)

Type:: bool

figsize_user_partition_bars

Figure size for user-defined partition bar charts

Type:: tuple[float, float]

show_user_partition_error

Show error bars on user-defined partition bar charts (default True)

Type:: bool

generate_contact_fraction_profile

Generate per-residue contact fraction line plot (default True)

Type:: bool

figsize_contact_fraction_profile

Figure size for contact fraction profile plot

Type:: tuple[float, float]

show_contact_fraction_profile_error

Show SEM fill_between bands on contact fraction profile (default True)

Type:: bool

contact_fraction_profile_threshold

If set, draw a horizontal threshold line on the contact fraction profile. Residues above this value are considered “high contact”.

Type:: float or None

generate_residence_time_profile

Generate per-residue mean residence time line plot (default True)

Type:: bool

figsize_residence_time_profile

Figure size for residence time profile plot

Type:: tuple[float, float]

show_residence_time_profile_error

Show SEM fill_between bands on residence time profile (default True)

Type:: bool

generate_cf_by_aa_class_bars

Generate contact fraction by AA class grouped bar chart (default True)

Type:: bool

figsize_cf_by_aa_class_bars

Figure size for contact fraction by AA class bar chart

Type:: tuple[float, float]

show_cf_by_aa_class_error

Show error bars on contact fraction by AA class bar chart (default True)

Type:: bool

generate_cf_by_partition_bars

Generate contact fraction by user-defined partition bar charts (default True)

Type:: bool

figsize_cf_by_partition_bars

Figure size for contact fraction by partition bar charts

Type:: tuple[float, float]

show_cf_by_partition_error

Show error bars on contact fraction by partition bar charts (default True)

Type:: bool

generate_rt_by_aa_class_bars

Generate residence time by AA class grouped bar chart (default True)

Type:: bool

figsize_rt_by_aa_class_bars

Figure size for residence time by AA class bar chart

Type:: tuple[float, float]

show_rt_by_aa_class_error

Show error bars on residence time by AA class bar chart (default True)

Type:: bool

generate_rt_by_partition_bars

Generate residence time by user-defined partition bar charts (default True)

Type:: bool

figsize_rt_by_partition_bars

Figure size for residence time by partition bar charts

Type:: tuple[float, float]

show_rt_by_partition_error

Show error bars on residence time by partition bar charts (default True)

Type:: bool

highlight_residues

Residue IDs to mark with vertical dashed lines on profile plots. Useful for highlighting active-site residues or known anchor points.

Type:: list[int]

figsize: tuple[float, float]

generate_enrichment_heatmap: bool

generate_enrichment_bars: bool

figsize_enrichment_heatmap: tuple[float, float] | None

figsize_enrichment_bars: tuple[float, float]

enrichment_colormap: str

show_enrichment_error: bool

generate_system_coverage_heatmap: bool

generate_system_coverage_bars: bool

figsize_system_coverage_heatmap: tuple[float, float] | None

figsize_system_coverage_bars: tuple[float, float]

show_system_coverage_error: bool

generate_user_partition_bars: bool

figsize_user_partition_bars: tuple[float, float]

show_user_partition_error: bool

generate_contact_fraction_profile: bool

figsize_contact_fraction_profile: tuple[float, float]

show_contact_fraction_profile_error: bool

contact_fraction_profile_threshold: float | None

generate_residence_time_profile: bool

figsize_residence_time_profile: tuple[float, float]

show_residence_time_profile_error: bool

generate_cf_by_aa_class_bars: bool

figsize_cf_by_aa_class_bars: tuple[float, float]

show_cf_by_aa_class_error: bool

generate_cf_by_partition_bars: bool

figsize_cf_by_partition_bars: tuple[float, float]

show_cf_by_partition_error: bool

generate_rt_by_aa_class_bars: bool

figsize_rt_by_aa_class_bars: tuple[float, float]

show_rt_by_aa_class_error: bool

generate_rt_by_partition_bars: bool

figsize_rt_by_partition_bars: tuple[float, float]

show_rt_by_partition_error: bool

highlight_residues: list[int]

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.config.BFEPlotSettings(*, generate_heatmap=True, generate_bars=True, figsize_heatmap=None, figsize_bars=(10, 6), colormap='RdBu_r', show_error_bars=True, annotate_heatmap=True)[source]

Bases: BasePlotSettings

Binding free energy plot customization.

generate_heatmap

Generate ΔG_sel heatmap (rows = AA groups, columns = conditions). Default True.

Type:: bool

generate_bars

Generate ΔG_sel grouped bar chart (one bar per condition per AA group). Default True.

Type:: bool

figsize_heatmap

Figure size for ΔG_sel heatmap (auto-calculated if None).

Type:: tuple[float, float] | None

figsize_bars

Figure size for ΔG_sel bar charts.

Type:: tuple[float, float]

colormap

Diverging colormap for heatmap (default “RdBu_r”: red = avoidance, blue = preference).

Type:: str

show_error_bars

Show SEM error bars on bar charts. Default True.

Type:: bool

annotate_heatmap

Annotate each heatmap cell with its ΔG_sel value. Default True.

Type:: bool

generate_heatmap: bool

generate_bars: bool

figsize_heatmap: tuple[float, float] | None

figsize_bars: tuple[float, float]

colormap: str

show_error_bars: bool

annotate_heatmap: bool

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.config.AffinityPlotSettings(*, generate_stacked_bars=True, generate_group_bars=True, figsize_stacked=(10, 6), figsize_group_bars=(10, 6), show_error_bars=True)[source]

Bases: BasePlotSettings

Polymer affinity score plot customization.

generate_stacked_bars

Generate stacked bar chart of total score by condition, broken down by polymer type. Default True.

Type:: bool

generate_group_bars

Generate grouped bar chart showing per-group contributions across conditions. Default True.

Type:: bool

figsize_stacked

Figure size for stacked bar chart.

Type:: tuple[float, float]

figsize_group_bars

Figure size for grouped bar charts.

Type:: tuple[float, float]

show_error_bars

Show SEM error bars on plots. Default True.

Type:: bool

generate_stacked_bars: bool

generate_group_bars: bool

figsize_stacked: tuple[float, float]

figsize_group_bars: tuple[float, float]

show_error_bars: bool

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.config.SSPlotSettings(*, generate_timeline=True, generate_content_bars=True, generate_individual_bars=True, generate_diff_heatmap=True, figsize_timeline=(14, 6), figsize_content_bars=(10, 6), figsize_diff_heatmap=None, diff_colormap='RdBu_r')[source]

Bases: BasePlotSettings

Secondary structure plot customization.

generate_timeline

Generate per-condition residue × time SS heatmap. Default True.

Type:: bool

generate_content_bars

Generate grouped bar chart of helix/strand/coil fractions. Default True.

Type:: bool

generate_individual_bars

Generate one bar chart per SS type (helix, beta-sheet, no-SS). Default True.

Type:: bool

generate_diff_heatmap

Generate condition × residue persistence difference heatmap. Default True.

Type:: bool

figsize_timeline

Figure size for timeline heatmap.

Type:: tuple[float, float]

figsize_content_bars

Figure size for content bar chart.

Type:: tuple[float, float]

figsize_diff_heatmap

Figure size for difference heatmap (auto-calculated if None).

Type:: tuple[float, float] | None

diff_colormap

Diverging colormap for difference heatmap.

Type:: str

generate_timeline: bool

generate_content_bars: bool

generate_individual_bars: bool

generate_diff_heatmap: bool

figsize_timeline: tuple[float, float]

figsize_content_bars: tuple[float, float]

figsize_diff_heatmap: tuple[float, float] | None

diff_colormap: str

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.config.PlotTheme(*, title_fontsize=13, suptitle_fontsize=14, label_fontsize=11, tick_fontsize=9, legend_fontsize=9, annotation_fontsize=9, small_fontsize=8, tiny_fontsize=7, bar_alpha=0.85, bar_edgecolor='black', bar_linewidth=0.5, bar_capsize=4, dot_size=18, dot_alpha=0.7, dot_color='black', line_alpha=0.8, fill_alpha=0.25, reference_line_color='black', reference_line_style='--', reference_line_width=1.5, highlight_line_alpha=0.5, hide_top_spine=True, hide_right_spine=True, title_fontweight='bold', legend_loc='center left', legend_bbox=(1.02, 0.5), show_watermark=True)[source]

Bases: BaseModel

Centralized visual defaults for all comparison plots.

Replaces ~219 hardcoded style values (font sizes, alphas, line widths, marker sizes, spine visibility, etc.) across all plotter files with a single configurable Pydantic model.

Three presets are available via class methods:

PlotTheme.publication() — default; print-ready sizes and weights.
PlotTheme.presentation() — ~1.3x larger fonts/dots/lines for slides.
PlotTheme.minimal() — no dots, no bar edges, thinner lines.

Users can override individual values in comparison.yaml:

plot_settings:
  style: "publication"
  theme:
    title_fontsize: 16
    dot_size: 24

Parameters:

title_fontsize (int) – Font size for axes titles.
suptitle_fontsize (int) – Font size for figure suptitles.
label_fontsize (int) – Font size for axis labels (xlabel/ylabel).
tick_fontsize (int) – Font size for tick labels.
legend_fontsize (int) – Font size for legend entries.
annotation_fontsize (int) – Font size for heatmap cell annotations and inline text.
small_fontsize (int) – Font size for secondary annotations (e.g. SEM ± labels).
tiny_fontsize (int) – Font size for fine-grained annotations (e.g. residue IDs).
bar_alpha (float) – Opacity for bar chart fill.
bar_edgecolor (str) – Edge colour for bar outlines.
bar_linewidth (float) – Edge line width for bars.
bar_capsize (int) – Error bar cap size in points.
dot_size (int) – Marker size for replicate dot overlays (s= in scatter).
dot_alpha (float) – Opacity for replicate dots.
dot_color (str) – Colour for replicate dots.
line_alpha (float) – Opacity for line plots (e.g. RMSF profiles).
fill_alpha (float) – Opacity for fill_between bands (e.g. SEM regions).
reference_line_color (str) – Colour for horizontal/vertical reference lines.
reference_line_style (str) – Linestyle for reference lines (e.g. "--").
reference_line_width (float) – Line width for reference lines.
highlight_line_alpha (float) – Opacity for highlight / vertical reference lines.
hide_top_spine (bool) – Whether to hide the top axis spine.
hide_right_spine (bool) – Whether to hide the right axis spine.
title_fontweight (str) – Font weight for titles (e.g. "bold", "normal").
legend_loc (str) – Matplotlib legend location string (e.g. "center left"). Used with legend_bbox to place the legend outside the axes.
legend_bbox (tuple of float) – bbox_to_anchor for legend placement, relative to axes. Default (1.02, 0.5) places it just outside the right edge, vertically centred.
show_watermark (bool) – Whether to render a subtle “Made by PolyzyMD” watermark in the bottom-right corner of every saved figure. Default True.

title_fontsize: int

suptitle_fontsize: int

label_fontsize: int

tick_fontsize: int

legend_fontsize: int

annotation_fontsize: int

small_fontsize: int

tiny_fontsize: int

bar_alpha: float

bar_edgecolor: str

bar_linewidth: float

bar_capsize: int

dot_size: int

dot_alpha: float

dot_color: str

line_alpha: float

fill_alpha: float

reference_line_color: str

reference_line_style: str

reference_line_width: float

highlight_line_alpha: float

hide_top_spine: bool

hide_right_spine: bool

title_fontweight: str

legend_loc: str

legend_bbox: tuple[float, float]

show_watermark: bool

classmethod publication()[source]

Publication preset — default values, print-ready.

classmethod presentation()[source]

Presentation preset — ~1.3x larger fonts/dots/lines for slides.

classmethod minimal()[source]

Minimal preset — no dots, no bar edges, thinner lines.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.config.PlotSettings(*, output_dir=PosixPath('figures'), format='png', dpi=300, style='publication', color_palette='tab10', theme=<factory>, **data)[source]

Bases: BaseModel

Global plot settings for comparison.yaml.

Controls plot generation for all analyses. Per-analysis plot settings are discovered via PlotSettingsRegistry — any key in the YAML that matches a registered analysis type is parsed into the corresponding settings class. Unrecognised keys that are not global fields are logged and skipped.

output_dir

Directory for generated plots (relative to comparison.yaml)

Type:: Path

format

Image format: “png”, “pdf”, or “svg”

Type:: str

dpi

Resolution for raster formats (PNG)

Type:: int

style

Plot style preset: “publication”, “presentation”, or “minimal”

Type:: str

color_palette

Seaborn/matplotlib color palette name

Type:: str

theme

Resolved visual theme. Built from the style preset and any user overrides in the theme: YAML block.

Type:: PlotTheme

Notes

Attribute access for any registered analysis type always succeeds: if the user did not provide that section in YAML, a default-constructed settings instance is returned. This means self.settings.rmsf.show_error is always safe, even when the YAML has no rmsf: block.

Examples

In comparison.yaml:

plot_settings:
  output_dir: "figures/"
  format: "png"
  dpi: 300
  style: "publication"

  rmsf:
    highlight_residues: [77, 133, 156]

  triad:
    generate_2d_kde: true

model_config = {'extra': 'allow'}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

output_dir: Path

format: str

dpi: int

style: str

color_palette: str

theme: PlotTheme

__init__(**data)[source]

Initialize with global fields and registry-discovered per-analysis settings.

Theme resolution: the style field selects a preset (publication, presentation, or minimal) and then any user-supplied theme: overrides are merged on top. This allows style: presentation with theme: {dot_size: 40} to use the presentation preset but override just the dot size.

Parameters:: **data (Any) – Plot settings from YAML. Keys matching registered analysis types are parsed into their settings classes; global keys are handled by Pydantic; unknown keys are logged and skipped.
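The preset-then-override order described above can be sketched with plain dataclasses. This is a minimal illustration, not the library's implementation: ThemeSketch, PRESETS, and resolve_theme are hypothetical names, only three theme fields are modelled, and the presentation and minimal values are illustrative guesses based on the preset descriptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ThemeSketch:
    """Hypothetical stand-in for PlotTheme with only three fields."""

    title_fontsize: int = 13
    dot_size: int = 18
    bar_linewidth: float = 0.5


# Only the publication values come from the documented defaults; the other
# two presets are illustrative guesses (roughly 1.3x, and "no dots/edges").
PRESETS = {
    "publication": ThemeSketch(),
    "presentation": ThemeSketch(title_fontsize=17, dot_size=23, bar_linewidth=0.65),
    "minimal": ThemeSketch(dot_size=0, bar_linewidth=0.0),
}


def resolve_theme(style="publication", overrides=None):
    """Pick the preset named by `style`, then apply user overrides on top."""
    if style not in PRESETS:
        raise ValueError(f"unknown style: {style!r}")
    return replace(PRESETS[style], **(overrides or {}))


theme = resolve_theme("presentation", {"dot_size": 40})
print(theme.title_fontsize, theme.dot_size)
```

In this sketch the override touches only dot_size, so every other value still comes from the selected preset.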

__getattr__(name)[source]

Fall back to default-constructed settings for registered types.

This ensures self.settings.rmsf.show_error works even when the user omitted the rmsf: block from their YAML.

Parameters:: name (str) – Attribute name.
Returns:: Default-constructed settings if name is a registered type.
Return type:: BasePlotSettings
Raises:: AttributeError – If name is not a registered plot settings type.
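A rough sketch of this fallback pattern, assuming a simple dictionary registry. RMSFPlotSketch, REGISTRY, and PlotSettingsSketch are hypothetical names used only for illustration; the real class is a Pydantic model and may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class RMSFPlotSketch:
    """Hypothetical per-analysis settings with a single field."""

    show_error: bool = True


# Hypothetical registry: analysis type name -> settings class.
REGISTRY = {"rmsf": RMSFPlotSketch}


class PlotSettingsSketch:
    def __init__(self, **data):
        # Keys matching a registered type are parsed; other keys are skipped.
        self._sections = {
            key: REGISTRY[key](**value)
            for key, value in data.items()
            if key in REGISTRY
        }

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        sections = self.__dict__.get("_sections", {})
        if name in sections:
            return sections[name]
        if name in REGISTRY:
            return REGISTRY[name]()  # default-constructed fallback
        raise AttributeError(name)


settings = PlotSettingsSketch()  # no rmsf section supplied
print(settings.rmsf.show_error)

explicit = PlotSettingsSketch(rmsf={"show_error": False})
print(explicit.rmsf.show_error)
```

Reading through self.__dict__ inside __getattr__ avoids infinite recursion if the lookup happens before _sections has been assigned.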

classmethod resolve_output_dir(v)[source]

Convert string paths to Path objects.

class polyzymd.compare.config.ComparisonConfig(*, name, description=None, control=None, conditions, defaults=<factory>, analysis_settings=<factory>, comparison_settings=<factory>, plot_settings=<factory>, source_path=None)[source]

Bases: BaseModel

Schema for comparison.yaml configuration files.

A comparison config defines multiple simulation conditions to compare, along with analysis settings and comparison-specific parameters.

The schema follows a three-section pattern:

analysis_settings: WHAT to analyze (shared across conditions)
comparison_settings: HOW to compare (statistical parameters)
plot_settings: HOW to visualize (plot customization)

name

Name of the comparison project

Type:: str

description

Description of what is being compared

Type:: str, optional

control

Label of the control condition for relative comparisons

Type:: str, optional

conditions

List of conditions to compare

Type:: list[ConditionConfig]

defaults

Default analysis parameters (equilibration_time)

Type:: AnalysisDefaults

analysis_settings

Analysis parameters (WHAT to analyze)

Type:: AnalysisSettingsContainer

comparison_settings

Comparison parameters (HOW to compare)

Type:: ComparisonSettingsContainer

plot_settings

Plot customization (HOW to visualize)

Type:: PlotSettings

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> print(config.name)
Polymer Stabilization Study
>>> for cond in config.conditions:
...     print(f"{cond.label}: {cond.config}")
>>> print("Enabled analyses:", config.analysis_settings.get_enabled_analyses())
>>> rmsf_settings = config.analysis_settings.get("rmsf")
>>> if rmsf_settings:
...     print(f"RMSF selection: {rmsf_settings.selection}")

name: str

description: str | None

control: str | None

conditions: list[ConditionConfig]

defaults: AnalysisDefaults

analysis_settings: AnalysisSettingsContainer

comparison_settings: ComparisonSettingsContainer

plot_settings: PlotSettings

source_path: Path | None

classmethod parse_analysis_settings(v)[source]

Parse analysis_settings from dict or container.

classmethod parse_comparison_settings(v)[source]

Parse comparison_settings from dict or container.

validate_comparison_coverage()[source]

Validate that comparison_settings covers all analysis_settings.

Each analysis type in analysis_settings must have a corresponding entry in comparison_settings (can be empty {}).
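The coverage rule can be illustrated with plain dictionaries. This is a sketch under the assumption that both sections are keyed by analysis type name; missing_comparison_entries is a hypothetical helper, not part of the package.

```python
def missing_comparison_entries(analysis_settings, comparison_settings):
    """Return analysis types that lack a comparison_settings entry."""
    return sorted(set(analysis_settings) - set(comparison_settings))


analysis = {
    "rmsf": {"selection": "protein and name CA"},
    "contacts": {"cutoff": 4.5},
}
comparison = {"rmsf": {}}  # an empty mapping still counts as an entry

missing = missing_comparison_entries(analysis, comparison)
if missing:
    print("comparison_settings is missing:", ", ".join(missing))
```

Adding an empty contacts entry to comparison_settings would satisfy the check in this sketch.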

classmethod from_yaml(path)[source]

Load comparison config from YAML file.

Parameters:

path (Path or str) – Path to comparison.yaml file

Returns:

Loaded and validated configuration

Return type:

ComparisonConfig

Raises:

FileNotFoundError – If the config file doesn’t exist
ValidationError – If the config is invalid

to_yaml(path)[source]

Save comparison config to YAML file.

Parameters:: path (Path or str) – Output path for comparison.yaml

get_condition(label)[source]

Get a condition by its label.

Parameters:: label (str) – The condition label to find
Returns:: The matching condition
Return type:: ConditionConfig
Raises:: KeyError – If no condition with that label exists

validate_config()[source]

Validate the comparison configuration.

Returns:: List of error messages (empty if valid)
Return type:: list[str]

generate_analysis_yaml(condition)[source]

Generate analysis.yaml content for a specific condition.

Parameters:: condition (ConditionConfig) – The condition to generate analysis.yaml for.
Returns:: YAML content for the analysis.yaml file.
Return type:: str

generate_analysis_yaml_for_all()[source]

Generate analysis.yaml content for all conditions.

Returns:: Dictionary mapping condition labels to analysis.yaml content.
Return type:: dict[str, str]

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

polyzymd.compare.config.generate_comparison_template(name, eq_time='10ns')[source]

Generate a template comparison.yaml file.

Parameters:

name (str) – Project name
eq_time (str) – Default equilibration time

Returns:

YAML template content

Return type:

str

Settings

Analysis and comparison settings for the comparison workflow.

This module defines the concrete settings classes for each analysis type, registered via the AnalysisSettingsRegistry and ComparisonSettingsRegistry.

Analysis Settings (WHAT to analyze):

RMSFAnalysisSettings: RMSF calculation parameters
DistancesAnalysisSettings: Distance pair monitoring parameters
CatalyticTriadAnalysisSettings: Active site distance analysis
ContactsAnalysisSettings: Polymer-protein contact parameters

Comparison Settings (HOW to compare):

RMSFComparisonSettings: (no comparison-specific params)
DistancesComparisonSettings: (no comparison-specific params)
CatalyticTriadComparisonSettings: (no comparison-specific params)
ContactsComparisonSettings: FDR, effect size, top residues

All settings classes are auto-registered on module import.
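One common way to get registration on import is a class decorator that stores each class under its analysis_type() key. The sketch below shows that general pattern only; SettingsRegistrySketch and RMSFSettingsSketch are hypothetical, and the actual mechanism used by AnalysisSettingsRegistry is not described here.

```python
class SettingsRegistrySketch:
    """Hypothetical registry keyed by each class's analysis_type()."""

    _classes = {}

    @classmethod
    def register(cls, settings_cls):
        cls._classes[settings_cls.analysis_type()] = settings_cls
        return settings_cls  # returning the class lets this act as a decorator

    @classmethod
    def get(cls, name):
        return cls._classes[name]


@SettingsRegistrySketch.register  # runs when the module is imported
class RMSFSettingsSketch:
    @classmethod
    def analysis_type(cls):
        return "rmsf"


print(SettingsRegistrySketch.get("rmsf").__name__)
```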

class polyzymd.compare.settings.RMSFAnalysisSettings(*, selection='protein and name CA', reference_mode='centroid', reference_frame=None, reference_file=None)[source]

Bases: BaseAnalysisSettings

RMSF analysis settings.

selection

MDAnalysis selection string for RMSF calculation.

Type:: str

reference_mode

Reference structure mode: centroid, average, frame, or external.

Type:: str

reference_frame

Frame number if reference_mode is ‘frame’ (1-indexed).

Type:: int, optional

reference_file

Path to external PDB file if reference_mode is ‘external’.

Type:: str, optional

selection: str

reference_mode: str

reference_frame: int | None

reference_file: str | None

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_reference_mode(v)[source]

Validate reference mode is one of the allowed values.

validate_reference_params()[source]

Validate reference_frame and reference_file for their modes.
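A sketch of what such cross-field validation might look like, based only on the field descriptions above. The exact rules and error messages are assumptions, and check_reference is a hypothetical standalone function rather than the model validator itself.

```python
ALLOWED_MODES = ("centroid", "average", "frame", "external")


def check_reference(reference_mode, reference_frame=None, reference_file=None):
    """Return a list of problems; an empty list means the combination is valid."""
    problems = []
    if reference_mode not in ALLOWED_MODES:
        problems.append(f"reference_mode must be one of {ALLOWED_MODES}")
    if reference_mode == "frame":
        if reference_frame is None:
            problems.append("reference_frame is required when reference_mode is 'frame'")
        elif reference_frame < 1:
            problems.append("reference_frame is 1-indexed and must be >= 1")
    if reference_mode == "external" and reference_file is None:
        problems.append("reference_file is required when reference_mode is 'external'")
    return problems


print(check_reference("centroid"))
print(check_reference("frame"))
print(check_reference("external", reference_file="reference.pdb"))
```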

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.RMSFComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for RMSF analysis.

Currently empty — all RMSF comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when RMSF-specific comparison parameters are needed (e.g., a per-residue significance threshold) without modifying the orchestrator or other comparison types.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.DistancePairSettings(*, label, selection_a, selection_b, threshold=None, below_label=None, above_label=None)[source]

Bases: BaseAnalysisSettings

Configuration for a single distance pair.

label

Human-readable label for this pair.

Type:: str

selection_a

First atom/point selection.

Type:: str

selection_b

Second atom/point selection.

Type:: str

threshold

Per-pair distance threshold (Angstroms). If None, uses the global threshold from DistancesAnalysisSettings.

Type:: float, optional

below_label

Display label for the “below threshold” state (e.g. "Bound", "Closed"). When None, defaults to "Below {threshold}Å".

Type:: str, optional

above_label

Display label for the “above threshold” state (e.g. "Unbound", "Open"). When None, defaults to "Above {threshold}Å".

Type:: str, optional

label: str

selection_a: str

selection_b: str

threshold: float | None

below_label: str | None

above_label: str | None

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.DistancesAnalysisSettings(*, threshold=3.5, pairs=<factory>, use_pbc=True, align_trajectory=True, alignment_selection='protein and name CA', alignment_mode='centroid', alignment_frame=None)[source]

Bases: BaseAnalysisSettings

Distance analysis settings.

threshold

Distance threshold for contact analysis (Angstroms).

Type:: float, optional

pairs

List of atom pairs to measure distances between.

Type:: list[DistancePairSettings]

use_pbc

Use PBC-aware minimum image distances. Default True.

Type:: bool

align_trajectory

Align trajectory before distance calculation. Default True. When enabled, removes rotational drift and COM motion that can add noise to inter-domain distance measurements.

Type:: bool

alignment_selection

MDAnalysis selection for trajectory alignment. Default: “protein and name CA”.

Type:: str

alignment_mode

Reference mode for alignment: “centroid”, “average”, or “frame”. Default: “centroid”.

Type:: str

alignment_frame

Reference frame (1-indexed) when alignment_mode is “frame”.

Type:: int, optional

threshold: float | None

pairs: list[DistancePairSettings]

use_pbc: bool

align_trajectory: bool

alignment_selection: str

alignment_mode: str

alignment_frame: int | None

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_pairs(v)[source]

Ensure at least one pair is defined.

classmethod validate_alignment_mode(v)[source]

Validate alignment mode is one of the allowed values.

validate_alignment_frame_required()[source]

Ensure alignment_frame is provided when alignment_mode is ‘frame’.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

get_pair_selections()[source]

Get list of (selection_a, selection_b) tuples.

get_pair_labels()[source]

Get list of pair labels.

get_pair_thresholds()[source]

Get list of thresholds per pair, using global threshold as fallback.

Returns:: List of thresholds, one per pair. If a pair has no explicit threshold, the global threshold is used. If neither is set, the entry for that pair is None.
Return type:: list[float | None]
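The per-pair fallback, together with the default state labels documented for DistancePairSettings, can be sketched with dictionaries standing in for the settings objects. pair_thresholds and state_labels are hypothetical helpers, the "lid-core" pair is invented for illustration, and how the threshold number is formatted inside the default label is an assumption.

```python
def pair_thresholds(pairs, global_threshold=None):
    """One threshold per pair; a pair's own value wins over the global one."""
    return [
        p["threshold"] if p.get("threshold") is not None else global_threshold
        for p in pairs
    ]


def state_labels(pair, threshold):
    """Labels for the below/above states, using the documented default pattern."""
    below = pair.get("below_label") or f"Below {threshold}Å"
    above = pair.get("above_label") or f"Above {threshold}Å"
    return below, above


pairs = [
    {"label": "Asp133-His156", "threshold": 4.0, "below_label": "Bound"},
    {"label": "lid-core"},  # no per-pair threshold, no custom labels
]
thresholds = pair_thresholds(pairs, global_threshold=3.5)
for pair, thr in zip(pairs, thresholds):
    print(pair["label"], thr, state_labels(pair, thr))
```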

get_alignment_config()[source]

Build an AlignmentConfig from these settings.

Returns:: Configuration for trajectory alignment, ready to pass to align_trajectory() or DistanceCalculator.
Return type:: AlignmentConfig

Notes

Import is done inside the method to avoid circular imports.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.DistancesComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for distance analysis.

Currently empty — all distance comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when distance-specific comparison parameters are needed (e.g., per-pair significance thresholds) without modifying the orchestrator or other comparison types.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.TriadPairSettings(*, label, selection_a, selection_b)[source]

Bases: BaseAnalysisSettings

Configuration for one distance pair in a catalytic triad/active site.

label

Human-readable label for this pair (e.g., “Asp133-His156”).

Type:: str

selection_a

First atom/point selection.

Type:: str

selection_b

Second atom/point selection.

Type:: str

label: str

selection_a: str

selection_b: str

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.CatalyticTriadAnalysisSettings(*, name, pairs, threshold=3.5, description=None)[source]

Bases: BaseAnalysisSettings

Catalytic triad/active site analysis settings.

name

Name of the triad/active site (e.g., “LipA_catalytic_triad”).

Type:: str

pairs

Distance pairs to monitor.

Type:: list[TriadPairSettings]

threshold

Distance threshold for contact/H-bond analysis (Angstroms).

Type:: float

description

Description of the active site.

Type:: str, optional

name: str

pairs: list[TriadPairSettings]

threshold: float

description: str | None

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_pairs(v)[source]

Ensure at least one pair is defined.

property n_pairs: int: Number of distance pairs.

get_pair_selections()[source]

Get list of (selection_a, selection_b) tuples.

get_pair_labels()[source]

Get list of pair labels.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.CatalyticTriadComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for catalytic triad analysis.

Currently empty — all triad comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when triad-specific comparison parameters are needed (e.g., functional distance thresholds) without modifying the orchestrator or other comparison types.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.BindingPreferenceFieldsMixin(*, surface_exposure_threshold=0.2, enzyme_pdb_for_sasa=None, include_default_aa_groups=True, protein_groups=None, protein_partitions=None, polymer_type_selections=None)[source]

Bases: BaseAnalysisSettings

Shared fields for experimental binding-preference-derived analyses.

Both ContactsAnalysisSettings and BindingFreeEnergyAnalysisSettings need identical fields for surface exposure, protein grouping, and polymer type selection. This mixin provides them once, keeping defaults in sync.

surface_exposure_threshold

Relative SASA threshold for surface exposure (0.0-1.0).

Type:: float

enzyme_pdb_for_sasa

Path to enzyme PDB for SASA calculation.

Type:: str, optional

include_default_aa_groups

Include default AA class groupings (aromatic, polar, etc.).

Type:: bool

protein_groups

Custom protein groups as {name: [resid1, resid2, …]}.

Type:: dict[str, list[int]], optional

protein_partitions

Custom partitions for system coverage comparison.

Type:: dict[str, list[str]], optional

polymer_type_selections

Custom polymer type selections as {name: “MDAnalysis selection”}.

Type:: dict[str, str], optional

surface_exposure_threshold: float

enzyme_pdb_for_sasa: str | None

include_default_aa_groups: bool

protein_groups: dict[str, list[int]] | None

protein_partitions: dict[str, list[str]] | None

polymer_type_selections: dict[str, str] | None

classmethod analysis_type()[source]

Return the analysis type identifier (override in subclass).

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.ContactsAnalysisSettings(*, surface_exposure_threshold=0.2, enzyme_pdb_for_sasa=None, include_default_aa_groups=True, protein_groups=None, protein_partitions=None, polymer_type_selections=None, polymer_selection='chainID C', protein_selection='protein', cutoff=4.5, polymer_types=None, grouping='aa_class', compute_residence_times=True, compute_binding_preference=False, enrichment_normalization='residue')[source]

Bases: BindingPreferenceFieldsMixin

Polymer-protein contact analysis settings.

Inherits binding preference fields (surface_exposure_threshold, enzyme_pdb_for_sasa, include_default_aa_groups, protein_groups, protein_partitions, polymer_type_selections) from BindingPreferenceFieldsMixin.

polymer_selection

MDAnalysis selection for polymer atoms.

Type:: str

protein_selection

MDAnalysis selection for protein atoms.

Type:: str

cutoff

Distance cutoff for contacts in Angstroms.

Type:: float

polymer_types

Filter contacts by polymer residue names.

Type:: list[str], optional

grouping

How to group protein residues: aa_class, secondary_structure, or none.

Type:: str

compute_residence_times

If True, compute residence time statistics.

Type:: bool

compute_binding_preference

If True, compute binding preference enrichment analysis.

Type:: bool

enrichment_normalization

DEPRECATED (kept for backward compatibility). Enrichment is now always normalized by protein surface availability. This field is ignored.

Type:: str

polymer_selection: str

protein_selection: str

cutoff: float

polymer_types: list[str] | None

grouping: str

compute_residence_times: bool

compute_binding_preference: bool

enrichment_normalization: str

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_grouping(v)[source]

Validate grouping mode.

classmethod validate_enrichment_normalization(v)[source]

Validate enrichment normalization method.

validate_protein_partitions()[source]

Validate protein_partitions references and mutual exclusivity.

Validates:

1. All groups referenced in partitions exist in protein_groups
2. Groups within each partition don’t overlap (mutually exclusive)
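Both checks can be sketched over the documented field shapes, protein_groups as {name: [resids]} and protein_partitions as {name: [group names]}. check_partitions is a hypothetical helper, the group names and residue IDs are invented for illustration, and the real validator may report problems differently (for example by raising).

```python
from itertools import combinations


def check_partitions(protein_groups, protein_partitions):
    """Return error messages for unknown group names and overlapping groups."""
    errors = []
    for part_name, group_names in protein_partitions.items():
        # Check 1: every referenced group must be defined in protein_groups.
        for g in group_names:
            if g not in protein_groups:
                errors.append(f"{part_name}: unknown group {g!r}")
        # Check 2: defined groups in one partition must not share residues.
        known = [g for g in group_names if g in protein_groups]
        for a, b in combinations(known, 2):
            shared = set(protein_groups[a]) & set(protein_groups[b])
            if shared:
                errors.append(
                    f"{part_name}: {a!r} and {b!r} share residues {sorted(shared)}"
                )
    return errors


groups = {"lid": [10, 11, 12], "core": [12, 40, 41], "active_site": [77, 133, 156]}
partitions = {"domains": ["lid", "core"], "function": ["active_site", "tunnel"]}

errors = check_partitions(groups, partitions)
for message in errors:
    print(message)
```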

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class polyzymd.compare.settings.ContactsComparisonSettings(*, fdr_alpha=0.05, min_effect_size=0.5, top_residues=10)[source]

Bases: BaseComparisonSettings

Comparison settings for polymer-protein contacts analysis.

fdr_alpha

False discovery rate alpha for Benjamini-Hochberg correction.

Type:: float

min_effect_size

Minimum Cohen’s d effect size to highlight in reports.

Type:: float

top_residues

Number of top residues (by effect size) to display in console.

Type:: int

fdr_alpha: float

min_effect_size: float

top_residues: int

classmethod analysis_type()[source]

Return the analysis type identifier.

classmethod validate_fdr_alpha(v)[source]

Validate FDR alpha is in valid range.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.ExposureAnalysisSettings(*, protein_selection='protein', polymer_selection='chainID C', exposure_threshold=0.2, transient_lower=0.2, transient_upper=0.8, min_event_length=1, probe_radius_nm=0.14, n_sphere_points=960, protein_chain='A', polymer_resnames=None)[source]

Bases: BaseAnalysisSettings

Experimental exposure dynamics settings (dynamic SASA-based chaperone analysis).

protein_selection

MDAnalysis selection for protein atoms (chain A by default).

Type:: str

polymer_selection

MDAnalysis selection for polymer atoms (chain C by default).

Type:: str

exposure_threshold

Relative SASA threshold for classifying a residue as exposed.

Type:: float

transient_lower

Lower bound of exposure fraction for “transient” classification.

Type:: float

transient_upper

Upper bound of exposure fraction for “transient” classification.

Type:: float

min_event_length

Minimum exposed-window length (frames) to count as an event.

Type:: int

probe_radius_nm

Probe radius for MDTraj shrake_rupley, in nm.

Type:: float

n_sphere_points

Number of sphere points for shrake_rupley.

Type:: int

protein_chain

Chain letter for protein (default “A”).

Type:: str

polymer_resnames

Subset of polymer monomer resnames to include. If None, all detected.

Type:: list[str], optional

protein_selection: str

polymer_selection: str

exposure_threshold: float

transient_lower: float

transient_upper: float

min_event_length: int

probe_radius_nm: float

n_sphere_points: int

protein_chain: str

polymer_resnames: list[str] | None

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.ExposureComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for exposure dynamics analysis.

Currently empty — all exposure comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when exposure-specific comparison parameters are needed (e.g., transient classification thresholds) without modifying the orchestrator or other comparison types.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.BindingFreeEnergyAnalysisSettings(*, surface_exposure_threshold=0.2, enzyme_pdb_for_sasa=None, include_default_aa_groups=True, protein_groups=None, protein_partitions=None, polymer_type_selections=None, units='kT', compute_binding_preference=True)[source]

Bases: BindingPreferenceFieldsMixin

Experimental settings for binding free energy analysis via Boltzmann inversion.

Computes the selectivity free energy:

ΔG_sel = -k_B·T · ln(contact_share / expected_share)

where:

- contact_share = fraction of polymer contacts directed at an AA group
- expected_share = fraction of exposed surface belonging to that AA group
- T = simulation temperature (from SimulationConfig)

This is a post-processing analysis that consumes binding preference results from the contacts analysis layer (no new per-frame computation is needed).
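A minimal numerical sketch of the Boltzmann inversion, including the unit conversion, is given below. The function name and the 300 K default are assumptions for illustration; the molar constants are the gas constant R expressed in kJ/(mol·K) and kcal/(mol·K).

```python
import math

# Molar Boltzmann constant (gas constant R), so energies come out per mole.
R_KJ_PER_MOL_K = 0.008314462618
R_KCAL_PER_MOL_K = R_KJ_PER_MOL_K / 4.184  # 1 kcal = 4.184 kJ

K_B = {"kJ/mol": R_KJ_PER_MOL_K, "kcal/mol": R_KCAL_PER_MOL_K}


def selectivity_free_energy(contact_share, expected_share,
                            temperature_k=300.0, units="kT"):
    """Hypothetical helper: ΔG_sel = -k_B·T · ln(contact_share / expected_share)."""
    dg_kt = -math.log(contact_share / expected_share)  # value in units of kT
    if units == "kT":
        return dg_kt
    return dg_kt * K_B[units] * temperature_k
```

With contact_share = 0.4 and expected_share = 0.2 the group receives twice its surface-proportional share of contacts, so ΔG_sel = -ln 2 in kT, which is negative (preferential contact).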

Inherits binding preference fields (surface_exposure_threshold, enzyme_pdb_for_sasa, include_default_aa_groups, protein_groups, protein_partitions, polymer_type_selections) from BindingPreferenceFieldsMixin.

units

Energy units for output. One of “kT” (dimensionless, in units of the thermal energy k_B·T), “kcal/mol”, or “kJ/mol”.

Type:: str

compute_binding_preference

Compute binding preference from contacts data when cached results are not found.

Type:: bool

units: str

compute_binding_preference: bool

classmethod validate_units(v)[source]

Validate energy units.

classmethod analysis_type()[source]

Return the analysis type identifier.

k_b()[source]

Return k_B in the selected energy units.

Returns:: Boltzmann constant in kcal/(mol·K) or kJ/(mol·K). When units=’kT’, returns 0.0 — callers should use kT=1.0 directly instead of k_b() * T.
Return type:: float

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

Returns:: Dictionary suitable for writing to analysis.yaml.
Return type:: dict

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.BindingFreeEnergyComparisonSettings(*, fdr_alpha=0.05)[source]

Bases: BaseComparisonSettings

Comparison settings for binding free energy analysis.

fdr_alpha

False discovery rate alpha for Benjamini-Hochberg correction of p-values across (polymer_type, AA_group) pairs.

Type:: float

fdr_alpha: float

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.PolymerAffinityScoreSettings(*, surface_exposure_threshold=0.2, enzyme_pdb_for_sasa=None, include_default_aa_groups=True, protein_groups=None, protein_partitions=None, polymer_type_selections=None, compute_binding_preference=True)[source]

Bases: BindingPreferenceFieldsMixin

Experimental settings for polymer affinity score analysis.

The polymer affinity score is a comparative metric that quantifies total polymer-protein interaction strength:

S = Σ_{p,g} N_{p,g} × ΔG_sel_{p,g} [kT]

where:

N = mean_contact_fraction × n_exposed_in_group
ΔG_sel = -ln(contact_share / expected_share)

This is a post-processing analysis that consumes binding preference results from the contacts analysis layer — no new per-frame computation is needed. All scores are in kT (dimensionless); the temperature factor cancels in the Boltzmann inversion ratio.

Important

This metric assumes thermodynamic independence of contacts. The absolute values are NOT rigorous binding free energies. Only relative differences between polymer compositions are meaningful (comparative ranking).
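The score formula above can be written out as a short sketch. The function name and the dictionary keys are hypothetical; they simply mirror the symbols in the formula, one dictionary per (polymer_type, protein_group) pair.

```python
import math


def affinity_score(pairs):
    """Sum N × ΔG_sel over (polymer_type, protein_group) pairs, in kT.

    Each entry of `pairs` is a dict with the assumed keys
    mean_contact_fraction, n_exposed_in_group, contact_share, expected_share.
    """
    total = 0.0
    for p in pairs:
        n_contacts = p["mean_contact_fraction"] * p["n_exposed_in_group"]
        dg_sel = -math.log(p["contact_share"] / p["expected_share"])
        total += n_contacts * dg_sel
    return total
```

A more negative score indicates more contacts concentrated on groups the polymer prefers; as stated above, only differences between conditions should be interpreted.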

Inherits binding preference fields (surface_exposure_threshold, enzyme_pdb_for_sasa, include_default_aa_groups, protein_groups, protein_partitions, polymer_type_selections) from BindingPreferenceFieldsMixin.

compute_binding_preference

Compute binding preference from contacts data when cached results are not found.

Type:: bool

compute_binding_preference: bool

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

Returns:: Dictionary suitable for writing to analysis.yaml.
Return type:: dict

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.PolymerAffinityScoreComparisonSettings(*, fdr_alpha=0.05)[source]

Bases: BaseComparisonSettings

Comparison settings for polymer affinity score analysis.

fdr_alpha

False discovery rate alpha for Benjamini-Hochberg correction of pairwise p-values across conditions.

Type:: float

fdr_alpha: float

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.SecondaryStructureAnalysisSettings(*, chain_id='A')[source]

Bases: BaseAnalysisSettings

Secondary structure (DSSP) analysis settings.

chain_id

Chain letter for the protein to analyze (default “A”).

Type:: str

chain_id: str

classmethod analysis_type()[source]

Return the analysis type identifier.

to_analysis_yaml_dict()[source]

Convert to analysis.yaml-compatible dictionary.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.settings.SecondaryStructureComparisonSettings[source]

Bases: BaseComparisonSettings

Comparison settings for secondary structure analysis.

Currently empty — all secondary structure comparison behavior uses defaults from BaseComparisonSettings. This class exists as an extension point: add fields here when SS-specific comparison parameters are needed without modifying the orchestrator.

classmethod analysis_type()[source]

Return the analysis type identifier.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

polyzymd.compare.settings.get_all_analysis_types()[source]

Get all registered analysis types.

Returns:: Sorted list of registered analysis type names.
Return type:: list[str]

polyzymd.compare.settings.get_all_comparison_types()[source]

Get all registered comparison settings types.

Returns:: Sorted list of registered comparison type names.
Return type:: list[str]

Statistics

Statistical tests for comparing simulation conditions.

This module provides statistical functions for comparing analysis results across multiple conditions, including t-tests, ANOVA, and effect sizes.

All functions use SciPy for statistical calculations.

class polyzymd.compare.statistics.TTestResult(t_statistic, p_value)[source]

Bases: object

Result of a two-sample t-test.

t_statistic

The t-statistic

Type:: float

p_value

Two-tailed p-value

Type:: float

t_statistic: float

p_value: float

property significant: bool: Whether the result is significant at p < 0.05.

to_dict()[source]

Convert to dictionary.

__init__(t_statistic, p_value)

class polyzymd.compare.statistics.EffectSize(cohens_d, interpretation, direction)[source]

Bases: object

Cohen’s d effect size with interpretation.

cohens_d

The effect size (positive = group1 > group2)

Type:: float

interpretation

Categorical interpretation: “negligible”, “small”, “medium”, “large”

Type:: str

direction

For RMSF: “stabilizing” (d > 0, lower RMSF) or “destabilizing” (d < 0)

Type:: str

cohens_d: float

interpretation: str

direction: str

to_dict()[source]

Convert to dictionary.

__init__(cohens_d, interpretation, direction)

class polyzymd.compare.statistics.ANOVAResult(f_statistic, p_value)[source]

Bases: object

Result of one-way ANOVA.

f_statistic

The F-statistic

Type:: float

p_value

P-value for the test

Type:: float

f_statistic: float

p_value: float

property significant: bool: Whether the result is significant at p < 0.05.

to_dict()[source]

Convert to dictionary.

__init__(f_statistic, p_value)

polyzymd.compare.statistics.independent_ttest(group1, group2)[source]

Perform two-sample independent t-test.

Tests the null hypothesis that two independent samples have identical expected values.

Parameters:

group1 (array_like) – First group of values (e.g., control replicate means)
group2 (array_like) – Second group of values (e.g., treatment replicate means)

Returns:

Result containing t-statistic and p-value

Return type:

TTestResult

Examples

>>> control = [0.715, 0.693, 0.696]  # No polymer RMSF
>>> treatment = [0.517, 0.586]        # 100% SBMA RMSF
>>> result = independent_ttest(control, treatment)
>>> print(f"t = {result.t_statistic:.3f}, p = {result.p_value:.4f}")

polyzymd.compare.statistics.cohens_d(group1, group2, rmsf_mode=True)[source]

Compute Cohen’s d effect size.

Cohen’s d is the difference between means divided by the pooled standard deviation. A positive d means group1 has higher values.

For RMSF comparisons (rmsf_mode=True), direction is interpreted as:

- d > 0 (control > treatment) = “stabilizing” (treatment reduces RMSF)
- d < 0 (control < treatment) = “destabilizing” (treatment increases RMSF)

Parameters:

group1 (array_like) – First group (typically control)
group2 (array_like) – Second group (typically treatment)
rmsf_mode (bool, optional) – If True, interpret direction for RMSF (lower = better). Default is True.

Returns:

Effect size with interpretation

Return type:

EffectSize

Notes

Effect size interpretation (Cohen, 1988):

- |d| < 0.2: negligible
- 0.2 <= |d| < 0.5: small
- 0.5 <= |d| < 0.8: medium
- |d| >= 0.8: large
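The definition and thresholds above can be checked by hand with a standard-library sketch. The function name is hypothetical, the return value is a plain tuple rather than an EffectSize, and the label returned for d = 0 is an assumption because the documentation does not specify it.

```python
import math
import statistics


def cohens_d_sketch(group1, group2):
    """Cohen's d with pooled standard deviation; each group needs >= 2 values."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

    # Thresholds from Cohen (1988), as listed above.
    size = abs(d)
    if size < 0.2:
        interpretation = "negligible"
    elif size < 0.5:
        interpretation = "small"
    elif size < 0.8:
        interpretation = "medium"
    else:
        interpretation = "large"

    # RMSF convention: group1 (control) higher than group2 means stabilizing.
    if d > 0:
        direction = "stabilizing"
    elif d < 0:
        direction = "destabilizing"
    else:
        direction = "none"  # assumption: not specified in the documentation
    return d, interpretation, direction
```

For the RMSF values used in the independent_ttest example (control [0.715, 0.693, 0.696], treatment [0.517, 0.586]) this gives d of roughly 5, a large stabilizing effect.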

polyzymd.compare.statistics.one_way_anova(*groups)[source]

Perform one-way ANOVA across multiple groups.

Tests the null hypothesis that all groups have the same mean.

Parameters:: *groups (array_like) – Variable number of groups to compare
Returns:: Result containing F-statistic and p-value
Return type:: ANOVAResult
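To make the F-statistic concrete, here is a standard-library sketch of the one-way ANOVA ratio of between-group to within-group mean squares. The function name is hypothetical, and the p-value is omitted because it requires the F distribution, which the module obtains from SciPy.

```python
import statistics


def f_statistic(*groups):
    """One-way ANOVA F = MS_between / MS_within (no p-value)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

The sketch assumes at least two groups and nonzero within-group variance; otherwise the division is undefined.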

Examples

>>> no_poly = [0.715, 0.693, 0.696]
>>> sbma = [0.517, 0.586]
>>> egma = [0.558, 0.738, 0.496]
>>> result = one_way_anova(no_poly, sbma, egma)
>>> print(f"F = {result.f_statistic:.3f}, p = {result.p_value:.4f}")

polyzymd.compare.statistics.percent_change(control_mean, treatment_mean)[source]

Calculate percent change from control.

Parameters:

control_mean (float) – Mean value of control condition
treatment_mean (float) – Mean value of treatment condition

Returns:

Percent change: (treatment - control) / control * 100. Negative = reduction, positive = increase.

Return type:

float
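As a quick arithmetic check of the formula, a one-line sketch (hypothetical name) follows. How the real function treats a control mean of zero is not documented here; this sketch simply lets Python raise ZeroDivisionError in that case.

```python
def percent_change_sketch(control_mean, treatment_mean):
    """(treatment - control) / control * 100; negative means a reduction."""
    return (treatment_mean - control_mean) / control_mean * 100.0
```

For example, a control mean of 0.70 and a treatment mean of 0.56 correspond to a change of about -20 percent.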

Comparators

Contacts

Contacts comparator for comparing polymer-protein contacts across conditions.

This module provides the ContactsComparator class that orchestrates contacts analysis and statistical comparison across multiple conditions.

Key features:

- Aggregate-level comparisons (coverage, mean contact fraction)
- Effect size (Cohen’s d) for practical significance
- ANOVA for 3+ conditions
- Auto-exclusion of conditions without polymer (e.g., “No Polymer” controls)

The comparator inherits from BaseComparator and implements the Template Method pattern for DRY comparison logic. Since contacts has TWO primary metrics (coverage and mean_contact_fraction), some methods are customized.

Note

Per-residue pairwise comparisons have been removed. Contact data is mechanistic (explains WHY stability changes), not an observable. Per-residue contact-RMSF correlations are computed in polyzymd compare report.

class polyzymd.compare.comparators.contacts.ContactsComparator(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

Bases: BaseComparator[ContactsAnalysisSettings, dict[str, Any], ContactsConditionSummary, ContactsComparisonResult]

Compare polymer-protein contacts across multiple simulation conditions.

This class loads contacts analysis results for each condition (computing them if necessary), then performs statistical comparisons including:

- Aggregate-level comparisons (coverage, mean contact fraction)
- ANOVA for 3+ conditions
- Effect sizes (Cohen’s d) for practical significance

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (ContactsAnalysisSettings) – Settings defining what contacts to analyze (selections, cutoff).
comparison_settings (ContactsComparisonSettings, optional) – Settings for how to compare (FDR alpha, effect sizes). Defaults to ContactsComparisonSettings() if not provided.
equilibration (str, optional) – Equilibration time override (e.g., “10ns”). If None, uses config.defaults.equilibration_time.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> analysis_settings = config.analysis_settings.get("contacts")
>>> comparison_settings = config.comparison_settings.get("contacts")
>>> comparator = ContactsComparator(config, analysis_settings, comparison_settings)
>>> result = comparator.compare()
>>> print(result.ranking_by_coverage)
['100% SBMA', '50/50 Mix', '100% EGMA']

Notes

Higher contact fraction is considered “better” (more polymer-protein interaction)
Conditions without polymer atoms are automatically excluded
This is a MEAN_BASED metric (contact fractions are averages)

comparison_type: ClassVar[str] = 'contacts'

__init__(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Contact fraction is a mean-based metric.

Contact fraction is the average fraction of frames where a residue is in contact with the polymer. This is an average over frames, so the mean converges regardless of autocorrelation. However, we need to correct uncertainty using N_eff (effective sample size).

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run comparison across all conditions.

Overrides base to handle contacts-specific logic:

- Dual metrics (coverage and mean_contact_fraction)
- Auto-exclusion of no-polymer conditions
- Custom result building

Parameters:: recompute (bool, optional) – If True, force recompute even if cached results exist.
Returns:: Complete comparison results with statistics and rankings.
Return type:: ContactsComparisonResult

RMSF

RMSF comparator for comparing flexibility across conditions.

This module provides the RMSFComparator class that orchestrates RMSF analysis and statistical comparison across multiple conditions.

The comparator inherits from BaseComparator and implements the Template Method pattern for DRY comparison logic.

class polyzymd.compare.comparators.rmsf.RMSFComparator(config, analysis_settings, equilibration=None, selection_override=None, reference_mode_override=None, reference_frame_override=None, reference_file_override=None)[source]

Bases: BaseComparator[RMSFAnalysisSettings, dict[str, Any], RMSFConditionSummary, RMSFComparisonResult]

Compare RMSF across multiple simulation conditions.

This class loads RMSF results for each condition (computing them if necessary), then performs statistical comparisons including t-tests, ANOVA, and effect size calculations.

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (RMSFAnalysisSettings) – RMSF analysis settings (from config.analysis_settings.get(“rmsf”)).
equilibration (str, optional) – Equilibration time override (e.g., “10ns”). If None, uses config.defaults.equilibration_time.
selection_override (str, optional) – Override for atom selection (requires --override flag on CLI).
reference_mode_override (str, optional) – Override for reference mode (requires --override flag on CLI).
reference_frame_override (int, optional) – Override for reference frame (requires --override flag on CLI).
reference_file_override (str, optional) – Override for external reference PDB file path (requires --override flag on CLI). Used when reference_mode is “external”.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> rmsf_settings = config.analysis_settings.get("rmsf")
>>> comparator = RMSFComparator(config, rmsf_settings, equilibration="10ns")
>>> result = comparator.compare()
>>> print(result.ranking)
['100% SBMA', '100% EGMA', 'No Polymer', '50/50 Mix']

comparison_type: ClassVar[str] = 'rmsf'

__init__(config, analysis_settings, equilibration=None, selection_override=None, reference_mode_override=None, reference_frame_override=None, reference_file_override=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

RMSF is a variance-based metric.

RMSF measures root-mean-square fluctuations, which are inherently variance-based. Correlated frames lead to biased variance estimates, so independent subsampling (2τ separation) is required for accurate uncertainty quantification.

Returns:: MetricType.VARIANCE_BASED
Return type:: MetricType
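The subsampling idea can be sketched with two small helpers. Both names are hypothetical, τ is assumed to be the correlation time expressed in frames, and the RMSF helper handles a single atom with positions given as (x, y, z) tuples that are already aligned to a reference.

```python
import math


def independent_frame_indices(n_frames, tau_frames):
    """Frame indices separated by at least 2·τ frames (stride rounded up)."""
    stride = max(1, math.ceil(2 * tau_frames))
    return list(range(0, n_frames, stride))


def rmsf(positions):
    """RMSF of one atom: sqrt of the mean squared deviation from its mean position."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    msd = sum(
        sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions
    ) / n
    return math.sqrt(msd)
```

A per-atom RMSF on decorrelated frames would then be rmsf([positions[i] for i in independent_frame_indices(len(positions), tau)]).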

Triad

Catalytic triad comparator for comparing active site geometry across conditions.

This module provides the TriadComparator class that orchestrates catalytic triad analysis and statistical comparison across multiple conditions.

The key metric is “simultaneous contact fraction” - the percentage of frames where ALL pairs in the triad are below the contact threshold simultaneously. Higher values indicate better triad integrity and potentially better catalytic competence.
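As an illustration of this metric, the sketch below counts the frames in which every pair distance is below the threshold. The function name and the input layout (a mapping from pair label to a per-frame distance list) are assumptions; the value is returned as a fraction between 0 and 1.

```python
def simultaneous_contact_fraction(distances_by_pair, threshold):
    """Fraction of frames where ALL pair distances are below `threshold`.

    distances_by_pair: dict, pair label -> list of per-frame distances,
    all lists having the same length (assumed layout).
    """
    series = list(distances_by_pair.values())
    n_frames = len(series[0])
    frames_in_contact = sum(
        all(d[i] < threshold for d in series) for i in range(n_frames)
    )
    return frames_in_contact / n_frames
```

A single pair drifting above the threshold is enough to break the triad for that frame, which is why this value can be much lower than any individual pair's contact fraction.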

The comparator inherits from BaseComparator and implements the Template Method pattern for DRY comparison logic.

class polyzymd.compare.comparators.triad.TriadComparator(config, analysis_settings, equilibration=None)[source]

Bases: BaseComparator[CatalyticTriadAnalysisSettings, dict[str, Any], TriadConditionSummary, TriadComparisonResult]

Compare catalytic triad geometry across multiple simulation conditions.

This class loads triad analysis results for each condition (computing them if necessary), then performs statistical comparisons including t-tests, ANOVA, and effect size calculations on the simultaneous contact fraction.

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (CatalyticTriadAnalysisSettings) – Catalytic triad analysis settings (from config.analysis_settings.get(“catalytic_triad”)).
equilibration (str, optional) – Equilibration time override (e.g., “10ns”). If None, uses config.defaults.equilibration_time.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> triad_settings = config.analysis_settings.get("catalytic_triad")
>>> comparator = TriadComparator(config, triad_settings, equilibration="10ns")
>>> result = comparator.compare()
>>> print(result.ranking)
['100% SBMA', '100% EGMA', 'No Polymer', '50/50 Mix']

Notes

Higher simultaneous contact fraction is better (triad is more intact).

comparison_type: ClassVar[str] = 'triad'

__init__(config, analysis_settings, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Catalytic triad contact fraction is a mean-based metric.

The simultaneous contact fraction is an average over frames (fraction of frames where all pairs are in contact). The mean converges regardless of autocorrelation, but we need to correct the uncertainty using N_eff (effective sample size = N/g where g is the statistical inefficiency).

Returns:: MetricType.MEAN_BASED
Return type:: MetricType
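The N_eff correction can be sketched with the standard library alone. The estimator below sums the normalized autocorrelation until it first becomes non-positive, a common heuristic; the function names are hypothetical and the package may estimate g differently.

```python
import math
import statistics


def statistical_inefficiency(series):
    """Estimate g = 1 + 2·Σ C(t)·(1 - t/N), truncated at the first C(t) <= 0."""
    n = len(series)
    mean = statistics.fmean(series)
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0.0:
        return 1.0  # a constant series carries no correlation information
    g = 1.0
    for lag in range(1, n - 1):
        # Normalized autocorrelation at this lag.
        c = sum(
            (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
        ) / ((n - lag) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - lag / n)
    return max(g, 1.0)


def corrected_sem(series):
    """Standard error of the mean using N_eff = N / g instead of N."""
    n_eff = len(series) / statistical_inefficiency(series)
    return statistics.stdev(series) / math.sqrt(n_eff)
```

For strongly correlated data g is well above 1, so the corrected standard error is larger than the naive stdev / sqrt(N).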

Distances

Distances comparator for comparing distance metrics across conditions.

This module provides the DistancesComparator class that orchestrates distance analysis and statistical comparison across multiple conditions.

The primary ranking metric is mean distance (lower = closer interactions). The secondary metric is the fraction of frames below the threshold (if a threshold is specified).

The comparator inherits from BaseComparator and implements the Template Method pattern for DRY comparison logic.

class polyzymd.compare.comparators.distances.DistancesComparator(config, analysis_settings, equilibration=None)[source]

Bases: BaseComparator[DistancesAnalysisSettings, dict[str, Any], DistanceConditionSummary, DistanceComparisonResult]

Compare distance metrics across multiple simulation conditions.

This class loads distance analysis results for each condition (computing them if necessary), then performs statistical comparisons including t-tests, ANOVA, and effect size calculations on both mean distance and fraction below threshold.

Each distance pair is compared independently - there is no cross-pair averaging since different pairs measure fundamentally different physical quantities (e.g., H-bond distances vs lid-opening distances).

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (DistancesAnalysisSettings) – Distance analysis settings (from config.analysis_settings.get(“distances”)).
equilibration (str, optional) – Equilibration time override (e.g., “10ns”). If None, uses config.defaults.equilibration_time.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> dist_settings = config.analysis_settings.get("distances")
>>> comparator = DistancesComparator(config, dist_settings, equilibration="10ns")
>>> result = comparator.compare()
>>> print(result.ranking_by_pair["Catalytic H-bond"])  # Per-pair ranking
['100% SBMA', 'No Polymer', '50/50 Mix', '100% EGMA']

Notes

Lower mean distance is better (closer interactions). Higher fraction below threshold is better (more time in contact).

comparison_type: ClassVar[str] = 'distances'

__init__(config, analysis_settings, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Distance analysis is a mean-based metric.

The mean distance is an average over frames. The mean converges regardless of autocorrelation, but we need to correct the uncertainty using N_eff (effective sample size = N/g where g is the statistical inefficiency).

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run the comparison across all conditions.

Each distance pair is compared independently - rankings and statistics are computed per-pair since averaging unrelated distances (e.g., H-bond + lid-opening) is not semantically meaningful.

Parameters:: recompute (bool) – Force recompute even if cached results exist.
Returns:: Complete comparison result with per-pair rankings.
Return type:: DistanceComparisonResult
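Per-pair ranking amounts to sorting conditions separately for each distance pair. A minimal sketch follows; the function name and the nested-dictionary input are assumptions, and ascending order reflects the convention that a lower mean distance ranks first.

```python
def rank_conditions_per_pair(mean_distance):
    """Rank conditions independently for each distance pair.

    mean_distance: dict, pair label -> dict of condition label -> mean distance
    (assumed layout). Returns pair label -> condition labels, best first.
    """
    return {
        pair: sorted(by_condition, key=by_condition.get)
        for pair, by_condition in mean_distance.items()
    }
```

Nothing is averaged across pairs, so a condition can rank first for one pair and last for another.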

Exposure Dynamics

Exposure dynamics comparator for chaperone-like polymer-protein interaction analysis.

This module provides ExposureDynamicsComparator, which orchestrates:

1. SASA computation (MDTraj shrake_rupley, protein-only)
2. Exposure dynamics analysis (classify residues, detect chaperone events)
3. Chaperone enrichment (dual residue/atom normalization)
4. Statistical comparison of chaperone fraction across conditions

Design follows the ContactsComparator pattern:

- compare() is fully overridden (custom multi-metric flow)
- _load_or_compute() handles caching at replicate level
- Condition summaries aggregate per-replicate ExposureDynamicsResults

Registration: @ComparatorRegistry.register("exposure")
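The decorator-based registration pattern can be illustrated with a generic sketch. SketchRegistry and DemoComparator are hypothetical names; this shows the general technique only and makes no claim about how ComparatorRegistry is implemented.

```python
class SketchRegistry:
    """Hypothetical name-to-class registry populated by a class decorator."""

    _registry = {}

    @classmethod
    def register(cls, name):
        def decorator(klass):
            if name in cls._registry:
                raise ValueError(f"{name!r} is already registered")
            cls._registry[name] = klass
            return klass  # the decorated class is returned unchanged
        return decorator

    @classmethod
    def get(cls, name):
        return cls._registry[name]

    @classmethod
    def names(cls):
        return sorted(cls._registry)


@SketchRegistry.register("exposure")
class DemoComparator:
    """Placeholder class standing in for a comparator."""
```

With this pattern, adding a comparator type only requires defining and decorating a new class; the lookup code does not change.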

class polyzymd.compare.comparators.exposure.ExposureDynamicsComparator(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

Bases: BaseComparator[ExposureAnalysisSettings, dict[str, Any], ExposureConditionSummary, ExposureComparisonResult]

Compare chaperone-like polymer activity across simulation conditions.

Combines per-frame SASA data with polymer-protein contact data to:

Classify each protein residue as stably exposed, stably buried, or transiently exposed.
Detect “chaperone events” (buried → exposed → polymer contact → re-buried) and unassisted refolding events.
Compute dynamic chaperone enrichment per (polymer_type, aa_group) pair with dual residue/atom normalization.
Statistically compare chaperone_fraction across conditions.

Parameters:

config (ComparisonConfig) – Comparison configuration defining conditions.
analysis_settings (ExposureAnalysisSettings) – Settings defining SASA and exposure parameters.
comparison_settings (ExposureComparisonSettings, optional) – Settings for statistical comparison. Defaults to ExposureComparisonSettings() if not provided.
equilibration (str, optional) – Equilibration time override. If None, uses config.defaults.equilibration_time.

Notes

This is a MEAN_BASED metric (chaperone fraction is an average over frames).
Conditions without polymer (no chaperone events possible) are excluded.
Contacts must be pre-computed (contacts_rep{n}.json must exist).
SASA is computed on demand and cached under analysis_dir/sasa/.

comparison_type: ClassVar[str] = 'exposure'

__init__(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Chaperone fraction is a mean-based metric.

Chaperone fraction is the fraction of exposed windows that coincide with polymer contact — an average over discrete events. The mean converges regardless of autocorrelation; uncertainty is corrected using N_eff.

Returns:: MetricType.MEAN_BASED
Return type:: MetricType
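One way to picture the chaperone fraction is the sketch below, which works on per-frame boolean lists for a single residue. It rests on several assumptions not confirmed by this page: a window is a contiguous run of exposed frames bounded by buried frames on both sides, a window is polymer-assisted if any of its frames has polymer contact, and the fraction is 0.0 when no window exists. The function names are hypothetical.

```python
def exposed_windows(exposed, min_event_length=1):
    """Return (start, end) index pairs of complete exposed runs, end exclusive."""
    windows = []
    start = None
    for i, flag in enumerate(exposed):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            # The run is closed by a re-burial at frame i. Runs starting at
            # frame 0 are skipped because no preceding buried frame was seen.
            if start > 0 and i - start >= min_event_length:
                windows.append((start, i))
            start = None
    return windows  # a run still open at the last frame is not counted


def chaperone_fraction(exposed, in_contact, min_event_length=1):
    """Fraction of exposed windows that coincide with polymer contact."""
    windows = exposed_windows(exposed, min_event_length)
    if not windows:
        return 0.0
    assisted = sum(any(in_contact[s:e]) for s, e in windows)
    return assisted / len(windows)
```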

compare(recompute=False)[source]

Run exposure dynamics comparison across all conditions.

Parameters:: recompute (bool, optional) – If True, force recompute even if cached results exist.
Returns:: Complete comparison with statistics and rankings.
Return type:: ExposureComparisonResult

Binding Free Energy

Binding free energy comparator via Boltzmann inversion of binding preference.

This module implements BindingFreeEnergyComparator, which converts the existing binding preference (enrichment) data into a selectivity free energy ΔG_sel in real units (kT, kcal/mol, or kJ/mol).

Physics

In the NPT ensemble the correct thermodynamic potential is the Gibbs free energy G. The polymer distributes its contacts across protein surface groups. Both the observed contact distribution (contact_share) and the null reference distribution (expected_share, proportional to each group’s solvent-exposed surface area) are proper probability distributions that sum to 1 over the partition. Boltzmann inversion of their ratio gives the selectivity free energy:

ΔG_sel(j) = -k_B·T · ln(contact_share_j / expected_share_j)

Because both distributions are normalized over the same partition, there is no arbitrary constant — ΔG_sel(j) is fully determined by the data.

Because contact_share / expected_share = enrichment + 1, per replicate:

ΔG_sel,rep = -k_B·T · ln(enrichment_rep + 1)

This is the exact Boltzmann-inverted version of the dimensionless enrichment score.

Sign convention:

ΔG_sel < 0 → preferential contact (observed > surface-availability reference)
ΔG_sel > 0 → contact avoidance (observed < surface-availability reference)
ΔG_sel = 0 → contacts match the surface-availability reference exactly

Differences between groups (ΔG_sel(i) - ΔG_sel(j)) give the relative selectivity. Differences between conditions (ΔG_sel,B(j) - ΔG_sel,A(j)) give a true ΔΔG.
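The relation between enrichment, ΔG_sel, and ΔΔG can be sketched as follows. The function names are hypothetical, and averaging per-replicate ΔG_sel values before taking the difference is an assumption about the aggregation step.

```python
import math
import statistics


def dg_sel_from_enrichment(enrichment, kT=1.0):
    """ΔG_sel = -kT · ln(enrichment + 1); kT=1.0 gives the result in units of kT."""
    return -kT * math.log(enrichment + 1.0)


def delta_delta_g(enrichments_a, enrichments_b, kT=1.0):
    """Mean ΔG_sel of condition B minus mean ΔG_sel of condition A for one group.

    Each argument is a list of per-replicate enrichment values.
    """
    dg_a = [dg_sel_from_enrichment(e, kT) for e in enrichments_a]
    dg_b = [dg_sel_from_enrichment(e, kT) for e in enrichments_b]
    return statistics.fmean(dg_b) - statistics.fmean(dg_a)
```

An enrichment of 0 maps to ΔG_sel = 0, positive enrichment to negative ΔG_sel (preferential contact), and negative enrichment to positive ΔG_sel (avoidance), matching the sign convention above.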

Temperature handling

ΔG_sel computed at temperature T is not comparable to ΔG_sel at T’ (in physical units). Pairwise statistics are suppressed between conditions at different simulation temperatures.
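The suppression rule can be pictured as a filter over condition pairs. The sketch below is hypothetical: the function name, the tolerance, and the input mapping from condition label to temperature in kelvin are assumptions for illustration.

```python
from itertools import combinations


def comparable_pairs(temperatures, tol=1e-6):
    """Split condition pairs into same-temperature and cross-temperature lists.

    temperatures: dict, condition label -> simulation temperature in K.
    Returns (comparable, suppressed), each a list of (label_a, label_b).
    """
    comparable, suppressed = [], []
    for a, b in combinations(temperatures, 2):
        if abs(temperatures[a] - temperatures[b]) <= tol:
            comparable.append((a, b))
        else:
            suppressed.append((a, b))
    return comparable, suppressed
```

Pairwise statistics would be computed only for the first list, while pairs in the second list would be flagged and left without statistics.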

Design

Consumes cached binding preference files produced by ContactsComparator / binding_preference.py. When cached data is missing, computes binding preference on-demand from per-replicate contacts_rep{N}.json files (following the same load-or-compute contract as every other comparator).
Inherits from BaseComparator but overrides compare() (like ContactsComparator) because the result type (BindingFreeEnergyResult) does not conform to BaseComparisonResult.

class polyzymd.compare.comparators.binding_free_energy.BindingFreeEnergyComparator(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

Bases: BaseComparator[BindingFreeEnergyAnalysisSettings, dict[str, Any], FreeEnergyConditionSummary, BindingFreeEnergyResult]

Compare selectivity free energy (ΔG_sel) across simulation conditions.

Consumes cached binding preference results (produced by the contacts analysis layer) and converts them to selectivity free energies via Boltzmann inversion:

ΔG_sel = -k_B·T · ln(contact_share / expected_share)

Statistical comparisons are only computed between conditions that share the same simulation temperature. Cross-temperature pairs are flagged and their statistics suppressed.

Parameters:

config (ComparisonConfig) – Comparison configuration.
analysis_settings (BindingFreeEnergyAnalysisSettings) – Units, surface-exposure threshold, custom partitions.
comparison_settings (BindingFreeEnergyComparisonSettings, optional) – FDR alpha. Defaults to BindingFreeEnergyComparisonSettings().
equilibration (str, optional) – Equilibration time override.

Examples

>>> config = ComparisonConfig.from_yaml("comparison.yaml")
>>> settings = BindingFreeEnergyAnalysisSettings(units="kcal/mol")
>>> comparator = BindingFreeEnergyComparator(config, settings)
>>> result = comparator.compare()
>>> print(result.units)
kcal/mol

Notes

This is a MEAN_BASED metric (contact fractions are averages over frames, not fluctuation-based quantities).

comparison_type: ClassVar[str] = 'binding_free_energy'

__init__(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Contact share is a mean-based metric.

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run binding free energy comparison across all conditions.

Parameters:: recompute (bool, optional) – Ignored (binding free energy is always recomputed from cached binding preference data; it is fast and stateless).
Returns:: Complete ΔG_sel comparison result.
Return type:: BindingFreeEnergyResult

Polymer Affinity Score

Polymer affinity score comparator.

This module implements PolymerAffinityScoreComparator, which quantifies the total strength of polymer-protein interactions by summing per-contact free energy contributions weighted by the number of simultaneous contacts.

Physics

For each (polymer_type, protein_group) pair:

S_{p,g} = N_{p,g} × ΔG_sel(p,g)

where:

N_{p,g} = mean_contact_fraction × n_exposed_in_group
ΔG_sel(p,g) = -ln(contact_share / expected_share) [kT]

Because contact_share / expected_share = enrichment + 1:

ΔG_sel,rep = -ln(enrichment_rep + 1)

The total affinity score for a polymer type is:

S_p = Σ_g S_{p,g}

The total affinity score for a condition is:

S = Σ_p S_p
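The three levels of the score (per pair, per polymer type, per condition) can be sketched as follows. The input dictionary, its numbers, and all variable names are hypothetical; only a subset of protein groups is shown, so the shares do not sum to 1. Only the formulas are taken from the text above.

```python
import math
from collections import defaultdict

# Hypothetical inputs: (polymer_type, protein_group) ->
# (mean_contact_fraction, n_exposed_in_group, contact_share, expected_share)
pairs = {
    ("SBM", "aromatic"): (0.30, 10, 0.20, 0.10),
    ("SBM", "polar"): (0.05, 40, 0.20, 0.40),
    ("EGM", "aromatic"): (0.10, 10, 0.10, 0.10),
}

scores_by_pair = {}
scores_by_polymer = defaultdict(float)
for (polymer, group), (frac, n_exposed, share, expected) in pairs.items():
    n_contacts = frac * n_exposed            # N_{p,g}
    dg_sel = -math.log(share / expected)     # dG_sel(p,g) in kT
    score = n_contacts * dg_sel              # S_{p,g}
    scores_by_pair[(polymer, group)] = score
    scores_by_polymer[polymer] += score      # S_p accumulates over groups

total_score = sum(scores_by_polymer.values())  # S for the condition
```

In this made-up case the favorable aromatic term for SBM (three contacts at -ln 2 each) outweighs the unfavorable polar term (two contacts at +ln 2 each), so the total is negative.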

Independence assumption

This formulation assumes contacts are thermodynamically independent — each contact contributes the same free energy regardless of what other contacts exist simultaneously. The absolute values are NOT rigorous binding free energies. Only the relative differences between polymer compositions are meaningful as a comparative scoring metric.

Sign convention

S < 0 → net favorable polymer-protein interaction
S > 0 → net unfavorable (avoidance dominates)
S = 0 → contacts match the surface-availability reference

Temperature handling

All scores are in kT (dimensionless), so the k_B·T prefactor drops out of the Boltzmann inversion. Pairwise statistics are still suppressed between conditions at different simulation temperatures because the contact count N is temperature-dependent.

Design

Consumes cached binding preference files produced by the contacts analysis layer. When cached data is missing, computes binding preference on-demand from per-replicate contacts_rep{N}.json files.
Inherits BaseComparator but overrides compare() (like the BFE comparator) because the result type does not conform to BaseComparisonResult.
Uses AggregatedBindingPreferenceEntry objects (from bp_result.entries) for the group-level data that includes mean_contact_fraction.

class polyzymd.compare.comparators.polymer_affinity.PolymerAffinityScoreComparator(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

Bases: BaseComparator[PolymerAffinityScoreSettings, dict[str, Any], AffinityScoreConditionSummary, PolymerAffinityScoreResult]

Compare polymer affinity scores across simulation conditions.

Computes a composite interaction score for each (polymer_type, protein_group) pair by multiplying the mean number of simultaneous contacts by the per-contact selectivity free energy:

S = N × ΔG_sel [kT]

The total score is summed across all polymer types and protein groups. More negative = stronger net polymer-protein affinity.

Statistical comparisons use per-replicate total scores and are only computed between conditions at the same simulation temperature.

Parameters:

config (ComparisonConfig) – Comparison configuration.
analysis_settings (PolymerAffinityScoreSettings) – Surface-exposure threshold, protein groups, etc.
comparison_settings (PolymerAffinityScoreComparisonSettings, optional) – FDR alpha. Defaults to PolymerAffinityScoreComparisonSettings().
equilibration (str, optional) – Equilibration time override.

Notes

This is a MEAN_BASED metric (contact fractions are averages over frames, not fluctuation-based quantities).

comparison_type: ClassVar[str] = 'polymer_affinity'

__init__(config, analysis_settings, comparison_settings=None, equilibration=None)[source]

classmethod comparison_type_name()[source]

Return the comparison type identifier.

property metric_type: MetricType

Contact fractions and shares are mean-based metrics.

Returns:: MetricType.MEAN_BASED
Return type:: MetricType

compare(recompute=False)[source]

Run polymer affinity score comparison across all conditions.

Parameters:: recompute (bool, optional) – Ignored (affinity scores are always recomputed from cached binding preference data; the computation is fast and stateless).
Returns:: Complete polymer affinity score comparison result.
Return type:: PolymerAffinityScoreResult

Results

Common result modules live under polyzymd.compare.results.

Stable result families include:

polyzymd.compare.results.rmsf
polyzymd.compare.results.triad
polyzymd.compare.results.contacts
polyzymd.compare.results.distances
polyzymd.compare.results.secondary_structure

Result models for binding free energy comparison analysis.

Physics background

In the NPT ensemble (constant pressure, as used in all polyzymd simulations) the correct thermodynamic potential is the Gibbs free energy G.

The quantity computed here is a selectivity free energy (ΔG_sel) that measures how much more (or less) favorable it is for a polymer to contact a given group of protein residues compared to what would be expected if the polymer contacted each exposed surface residue in proportion to that residue group’s share of the total solvent-exposed protein surface.

Concretely: if aromatic residues make up 10% of the solvent-exposed surface but receive 20% of the polymer’s contacts, the polymer preferentially contacts aromatic residues. The reference (expected) distribution is simply proportional to surface availability — not any property of the polymer itself.

ΔG_sel(j) = -k_B·T · ln(contact_share_j / expected_share_j)

where:

contact_share_j = (contact frames involving residues in group j) / (total contact frames across all protein residues): the observed fraction of polymer contacts directed at group j
expected_share_j = (number of solvent-exposed residues in group j) / (total number of solvent-exposed protein residues): the fraction of the protein surface belonging to group j; this is the reference assuming contacts are distributed purely by surface area

k_B = Boltzmann constant (0.0019872041 kcal mol⁻¹ K⁻¹)
T = simulation temperature in Kelvin

Because both distributions are normalized over the same partition (they sum to 1 over all groups), there is no arbitrary additive constant — ΔG_sel is fully determined by the data.

When units=’kT’ (default), the formula simplifies to:

ΔG_sel(j) / k_BT = -ln(contact_share_j / expected_share_j)

yielding a dimensionless value directly comparable to the thermal energy scale. A value of -1.0 means the binding preference is exactly 1 k_BT favorable relative to the surface-availability reference.
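As a worked example of the definitions above, the sketch below builds both shares from raw counts and inverts them. The group names, counts, and variable names are invented for illustration; the aromatic group is chosen to reproduce the 10% of surface, 20% of contacts case described earlier.

```python
import math

# Hypothetical raw counts for one polymer type over a three-group partition.
contact_frames = {"aromatic": 200, "polar": 500, "charged": 300}
exposed_residues = {"aromatic": 10, "polar": 60, "charged": 30}

total_contacts = sum(contact_frames.values())
total_exposed = sum(exposed_residues.values())

delta_g_kt = {}
for group in contact_frames:
    contact_share = contact_frames[group] / total_contacts
    expected_share = exposed_residues[group] / total_exposed
    delta_g_kt[group] = -math.log(contact_share / expected_share)

# Convert the aromatic value to kcal/mol at 300 K using the k_B quoted above.
K_B_KCAL = 0.0019872041
aromatic_kcal = delta_g_kt["aromatic"] * K_B_KCAL * 300.0
```

Here the aromatic ratio is 2, so ΔG_sel is -ln 2, about -0.69 kT, or roughly -0.41 kcal/mol at 300 K. The polar group is under-contacted and comes out positive, and the charged group matches its reference and comes out at zero.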

Note: contact_share / expected_share = enrichment_ratio = enrichment + 1 (where enrichment is the existing dimensionless enrichment score from binding preference analysis). So ΔG_sel = -kT·ln(enrichment + 1), and the two representations are mathematically equivalent; ΔG_sel simply puts the enrichment score on a physically meaningful energy scale.

Sign convention:

ΔG_sel < 0 → preferential contact (observed > surface-availability reference)
ΔG_sel > 0 → contact avoidance (observed < surface-availability reference)
ΔG_sel = 0 → contacts match the surface-availability reference exactly

Differences between conditions (ΔG_sel,B(j) − ΔG_sel,A(j)) give a true ΔΔG, stored in FreeEnergyPairwiseEntry.delta_delta_G.

Uncertainty propagation

When multiple independent replicates are available, two uncertainty estimates are reported:

Between-replicate SEM on ΔG_sel (primary, used for pairwise statistics): ΔG_sel is computed independently for each replicate, and the SEM is taken directly across those values. This is the most statistically sound approach for independent replicates and is the quantity used in t-tests.
Delta-method propagation (analytical approximation, stored for reference): For the mean contact_share and its SEM, uncertainty is propagated through the logarithm using first-order error propagation (Taylor 1997, ch. 3; Bevington & Robinson 2003, ch. 3):

σ(ΔG_sel) ≈ k_B·T · √[(σ_cs / cs)² + (σ_es / es)²] (or simply √[…] when units=’kT’)

where σ_cs = SEM of contact_share across replicates, and σ_es ≈ 0 because expected_share is computed from a single static PDB structure (no replicate variance). This simplifies to σ(ΔG_sel) ≈ k_B·T · (σ_cs / cs) (or σ_cs / cs when units=’kT’).
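The two uncertainty estimates can be compared side by side. This is a sketch in kT units with invented replicate values and variable names; it assumes the sample standard deviation (n - 1 denominator) for the SEM, which this documentation does not specify.

```python
import math
import statistics

# Hypothetical per-replicate contact shares for one (polymer, group) pair.
contact_share_reps = [0.18, 0.20, 0.22]
expected_share = 0.10  # from a single static structure, so sigma_es is ~0

n = len(contact_share_reps)

# 1) Between-replicate SEM: invert each replicate, then take the SEM.
dg_reps = [-math.log(cs / expected_share) for cs in contact_share_reps]
sem_between = statistics.stdev(dg_reps) / math.sqrt(n)

# 2) Delta method: propagate the SEM of contact_share through the logarithm.
cs_mean = statistics.mean(contact_share_reps)
cs_sem = statistics.stdev(contact_share_reps) / math.sqrt(n)
sem_delta = cs_sem / cs_mean  # in kT; multiply by k_B*T for physical units
```

For small relative scatter the two estimates agree closely, as expected from a first-order expansion; they diverge as the spread in contact_share grows.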

References:

Taylor, J. R. (1997). An Introduction to Error Analysis, 2nd ed. University Science Books. (Ch. 3: Error propagation for functions of one or more variables)
Bevington, P. R. & Robinson, D. K. (2003). Data Reduction and Error Analysis for the Physical Sciences, 3rd ed. McGraw-Hill. (Ch. 3)
Wikipedia: Delta method, https://en.wikipedia.org/wiki/Delta_method

Temperature handling:: When units=’kT’, ΔG_sel = -ln(ratio) is temperature-independent (the same ratio at any temperature gives the same dimensionless value). However, the underlying contact probabilities ARE temperature-dependent, so cross-temperature comparisons still require caution. When units=’kcal/mol’ or ‘kJ/mol’, ΔG_sel computed at temperature T is NOT directly comparable to ΔG_sel at temperature T’. Pairwise statistical comparisons are only computed between conditions sharing the same simulation temperature.

class polyzymd.compare.results.binding_free_energy.FreeEnergyEntry(*, polymer_type, protein_group, partition_name='aa_class', contact_share, expected_share, enrichment_ratio, delta_G=None, delta_G_uncertainty=None, delta_G_per_replicate=<factory>, units='kT', temperature_K, n_replicates=0, n_exposed_in_group=0)[source]

Bases: BaseModel

Free energy analysis for one (polymer_type, protein_group) pair in one condition.

Stores both the ΔG_sel value and the raw probability quantities used to compute it, enabling reproducibility and downstream verification.

polymer_type

Polymer residue type (e.g., “SBM”, “EGM”).

Type:: str

protein_group

Protein amino acid group label (e.g., “aromatic”, “charged_positive”).

Type:: str

partition_name

Name of the partition this group belongs to (e.g., “aa_class”).

Type:: str

contact_share

Observed fraction of polymer contacts directed at this group. This is P_obs in ΔG_sel = -kT·ln(P_obs / P_ref).

Type:: float

expected_share

Surface-availability-weighted reference fraction. This is P_ref in ΔG_sel = -kT·ln(P_obs / P_ref).

Type:: float

enrichment_ratio

contact_share / expected_share (= enrichment + 1). Stored for traceability; ΔG_sel = -kT·ln(enrichment_ratio).

Type:: float

delta_G

ΔG_sel in the configured units. None when contact_share = 0 or expected_share = 0 (log undefined or reference missing).

Type:: float | None

delta_G_uncertainty

σ(ΔG_sel) from delta-method error propagation. None if delta_G is None or if SEM data is unavailable (single replicate).

Type:: float | None

delta_G_per_replicate

Per-replicate ΔG_sel values used for cross-condition statistics.

Type:: list[float]

units

Energy units (“kT”, “kcal/mol”, or “kJ/mol”).

Type:: str

temperature_K

Simulation temperature in Kelvin (used as kT denominator).

Type:: float

n_replicates

Number of replicates with valid data for this entry.

Type:: int

n_exposed_in_group

Number of surface-exposed residues in this group (used for expected_share).

Type:: int

polymer_type: str

protein_group: str

partition_name: str

contact_share: float

expected_share: float

enrichment_ratio: float

delta_G: float | None

delta_G_uncertainty: float | None

delta_G_per_replicate: list[float]

units: str

temperature_K: float

n_replicates: int

n_exposed_in_group: int

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.binding_free_energy.FreeEnergyConditionSummary(*, label, config_path, temperature_K, n_replicates, units='kT', entries=<factory>, polymer_types=<factory>, protein_groups=<factory>)[source]

Bases: BaseModel

Free energy summary for one simulation condition.

Aggregates FreeEnergyEntry objects across all (polymer_type, protein_group) pairs for a single condition, together with condition metadata.

label

Display name for this condition.

Type:: str

config_path

Path to the SimulationConfig YAML used.

Type:: str

temperature_K

Simulation temperature in Kelvin.

Type:: float

n_replicates

Number of replicates in this condition.

Type:: int

units

Energy units (“kT”, “kcal/mol”, or “kJ/mol”).

Type:: str

entries

All (polymer_type, protein_group) ΔG_sel entries.

Type:: list[FreeEnergyEntry]

polymer_types

Polymer residue types present.

Type:: list[str]

protein_groups

Protein group labels analyzed.

Type:: list[str]

label: str

config_path: str

temperature_K: float

n_replicates: int

units: str

entries: list[FreeEnergyEntry]

polymer_types: list[str]

protein_groups: list[str]

property primary_metric_value: float: Mean ΔG_sel across all valid entries (for BaseConditionSummary compatibility).

property primary_metric_sem: float: Mean σ(ΔG_sel) across all valid entries.

get_entry(polymer_type, protein_group, partition_name=None)[source]

Get the FreeEnergyEntry for a (polymer_type, protein_group) pair.

Parameters:

polymer_type (str) – Polymer type.
protein_group (str) – AA group label.
partition_name (str or None, optional) – If given, further restrict to entries belonging to this partition. Necessary when the same protein_group label appears in multiple partitions (e.g., “rest_of_protein” in several user-defined partitions).

Return type:

FreeEnergyEntry or None
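The lookup semantics described above might look like the following sketch. It uses a small dataclass as a stand-in for FreeEnergyEntry and a free function instead of the real method. Returning the first match when several partitions share a group label is an assumption, as are the partition names used in the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Stand-in for FreeEnergyEntry, holding only the lookup keys."""

    polymer_type: str
    protein_group: str
    partition_name: str = "aa_class"


def get_entry(entries, polymer_type, protein_group, partition_name=None) -> Optional[Entry]:
    """Return the first matching entry, or None (hypothetical re-implementation)."""
    for entry in entries:
        if entry.polymer_type != polymer_type:
            continue
        if entry.protein_group != protein_group:
            continue
        # Only filter on the partition when the caller asked for one.
        if partition_name is not None and entry.partition_name != partition_name:
            continue
        return entry
    return None


entries = [
    Entry("SBM", "rest_of_protein", "active_site"),
    Entry("SBM", "rest_of_protein", "lid_domain"),
]
first = get_entry(entries, "SBM", "rest_of_protein")
lid = get_entry(entries, "SBM", "rest_of_protein", partition_name="lid_domain")
missing = get_entry(entries, "EGM", "aromatic")
```

Without partition_name the call is ambiguous for "rest_of_protein" and simply yields whichever entry comes first, which is why passing the partition is necessary in that situation.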

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.binding_free_energy.FreeEnergyPairwiseEntry(*, polymer_type, protein_group, condition_a, condition_b, temperature_a_K, temperature_b_K, cross_temperature=False, delta_G_a=None, delta_G_b=None, delta_delta_G=None, t_statistic=None, p_value=None)[source]

Bases: BaseModel

Pairwise comparison between two conditions for one (polymer, group) pair.

Each condition has a per-group selectivity free energy ΔG_sel. The difference ΔΔG = ΔG_sel,B − ΔG_sel,A is a true double-delta quantity.

Statistics are only computed when both conditions share the same simulation temperature. If temperatures differ, all stat fields are None and the cross_temperature flag is set to True.
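The suppression rule can be sketched with the standard library alone. Everything here is illustrative: the function name, the returned dictionary, and the choice of a Welch-style t-statistic are assumptions, since this documentation does not state which t-test variant is used. The p-value is omitted because it needs a t-distribution CDF (for example via scipy.stats.ttest_ind with equal_var=False).

```python
import math
import statistics


def pairwise_entry(dg_a_reps, dg_b_reps, temp_a_k, temp_b_k):
    """Hypothetical sketch mirroring a few FreeEnergyPairwiseEntry fields."""
    cross = not math.isclose(temp_a_k, temp_b_k)
    entry = {"cross_temperature": cross, "delta_delta_G": None, "t_statistic": None}
    if cross:
        return entry  # statistics suppressed for cross-temperature pairs

    mean_a = statistics.mean(dg_a_reps)
    mean_b = statistics.mean(dg_b_reps)
    entry["delta_delta_G"] = mean_b - mean_a  # ddG = dG_sel,B - dG_sel,A

    # Welch-style standard error from the per-replicate sample variances.
    se = math.sqrt(
        statistics.variance(dg_a_reps) / len(dg_a_reps)
        + statistics.variance(dg_b_reps) / len(dg_b_reps)
    )
    if se > 0.0:
        entry["t_statistic"] = (mean_b - mean_a) / se
    return entry


reps_a = [-0.70, -0.60, -0.65]
reps_b = [-0.30, -0.20, -0.25]
same_t = pairwise_entry(reps_a, reps_b, 300.0, 300.0)
cross_t = pairwise_entry(reps_a, reps_b, 300.0, 320.0)
```

For the same-temperature pair ΔΔG is positive, meaning condition B is less selective than A for this group; the cross-temperature pair returns only the flag.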

polymer_type

Polymer residue type.

Type:: str

protein_group

Protein group label.

Type:: str

condition_a

Label of the first condition.

Type:: str

condition_b

Label of the second condition.

Type:: str

temperature_a_K

Temperature of condition A in Kelvin.

Type:: float

temperature_b_K

Temperature of condition B in Kelvin.

Type:: float

cross_temperature

True when temperatures differ — statistics are suppressed.

Type:: bool

delta_G_a

ΔG_sel for condition A.

Type:: float | None

delta_G_b

ΔG_sel for condition B.

Type:: float | None

delta_delta_G

ΔΔG = ΔG_sel,B − ΔG_sel,A. Positive → B has less favorable selectivity.

Type:: float | None

t_statistic

T-test statistic (None for cross-temperature pairs).

Type:: float | None

p_value

Two-tailed p-value (None for cross-temperature pairs).

Type:: float | None

polymer_type: str

protein_group: str

condition_a: str

condition_b: str

temperature_a_K: float

temperature_b_K: float

cross_temperature: bool

delta_G_a: float | None

delta_G_b: float | None

delta_delta_G: float | None

t_statistic: float | None

p_value: float | None

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.binding_free_energy.BindingFreeEnergyResult(*, name, units='kT', formula='ΔG_sel = -ln(contact_share / expected_share) [units: k_bT]', mixed_temperatures=False, temperature_groups=<factory>, conditions=<factory>, pairwise_comparisons=<factory>, polymer_types=<factory>, protein_groups=<factory>, surface_exposure_threshold=None, equilibration_time='', created_at=<factory>, polyzymd_version=<factory>)[source]

Bases: BaseModel

Complete binding free energy comparison result.

This is the main output from BindingFreeEnergyComparator.compare().

Physics summary

Formula: ΔG_sel = -k_B·T · ln(contact_share / expected_share)

Uncertainty: σ(ΔG_sel) ≈ k_B·T · √[(σ_cs/cs)² + (σ_es/es)²] (first-order propagation)

Temperature note: pairwise statistics are suppressed between conditions at different temperatures. The mixed_temperatures flag indicates this occurred. Each condition’s temperature is stored in its summary.

name

Name of the comparison project.

Type:: str

units

Energy units (“kT”, “kcal/mol”, or “kJ/mol”).

Type:: str

formula

Human-readable formula string (for documentation/output).

Type:: str

mixed_temperatures

True if conditions span more than one simulation temperature.

Type:: bool

temperature_groups

Mapping of temperature (K, stored as a string key) to condition labels at that temperature.

Type:: dict[str, list[str]]

conditions

Summary for each condition.

Type:: list[FreeEnergyConditionSummary]

pairwise_comparisons

All pairwise comparisons (cross-T pairs have stats suppressed).

Type:: list[FreeEnergyPairwiseEntry]

polymer_types

All polymer types found.

Type:: list[str]

protein_groups

All protein groups analyzed.

Type:: list[str]

surface_exposure_threshold

SASA threshold used (from binding preference settings).

Type:: float | None

equilibration_time

Equilibration time used.

Type:: str

created_at

When the analysis was run.

Type:: datetime

polyzymd_version

Version of polyzymd used.

Type:: str

name: str

units: str

formula: str

mixed_temperatures: bool

temperature_groups: dict[str, list[str]]

conditions: list[FreeEnergyConditionSummary]

pairwise_comparisons: list[FreeEnergyPairwiseEntry]

polymer_types: list[str]

protein_groups: list[str]

surface_exposure_threshold: float | None

equilibration_time: str

created_at: datetime

polyzymd_version: str

save(path)[source]

Save result to JSON file.

Parameters:: path (Path or str) – Output path.
Returns:: Path to the saved file.
Return type:: Path

classmethod load(path)[source]

Load result from JSON file.

Parameters:: path (Path or str) – Path to JSON file.
Returns:: Loaded result.
Return type:: BindingFreeEnergyResult

get_condition(label)[source]

Get a condition summary by label.

Parameters:: label (str) – Condition label.
Return type:: FreeEnergyConditionSummary
Raises:: KeyError – If not found.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Exposure dynamics condition summary and comparison result models.

These classes inherit from the base classes in compare/core/ and add exposure-dynamics-specific fields for chaperone event analysis.

class polyzymd.compare.results.exposure.ExposureConditionSummary(*, label, config_path, n_replicates, replicate_values, mean_transient_fraction, sem_transient_fraction, mean_chaperone_fraction, sem_chaperone_fraction, mean_n_transient, mean_total_chaperone_events=0.0, mean_total_unassisted_events=0.0, enrichment_by_polymer_type=<factory>, polymer_types=<factory>, aa_groups=<factory>)[source]

Bases: BaseConditionSummary

Summary statistics for one condition in an exposure dynamics comparison.

label

Display name for this condition.

Type:: str

config_path

Path to the simulation config file.

Type:: str

n_replicates

Number of replicates included.

Type:: int

replicate_values

Per-replicate mean chaperone fraction across transient residues.

Type:: list[float]

mean_transient_fraction

Mean fraction of protein residues that are transiently exposed, averaged across replicates.

Type:: float

sem_transient_fraction

Standard error of mean_transient_fraction.

Type:: float

mean_chaperone_fraction

Mean chaperone fraction (chaperone events / total exposed windows) across transient residues and replicates.

Type:: float

sem_chaperone_fraction

Standard error of mean_chaperone_fraction.

Type:: float

mean_n_transient

Mean number of transient residues across replicates.

Type:: float

mean_total_chaperone_events

Mean total chaperone event count across replicates.

Type:: float

mean_total_unassisted_events

Mean total unassisted event count across replicates.

Type:: float

enrichment_by_polymer_type

Nested dict: polymer_type → aa_group → mean enrichment_residue.

Type:: dict[str, dict[str, float]]

polymer_types

Polymer types present in this condition.

Type:: list[str]

aa_groups

Amino-acid groups present in this condition.

Type:: list[str]

mean_transient_fraction: float

sem_transient_fraction: float

mean_chaperone_fraction: float

sem_chaperone_fraction: float

mean_n_transient: float

mean_total_chaperone_events: float

mean_total_unassisted_events: float

enrichment_by_polymer_type: dict[str, dict[str, float]]

polymer_types: list[str]

aa_groups: list[str]

property primary_metric_value: float: Return mean chaperone fraction as the primary metric.

property primary_metric_sem: float: Return SEM of chaperone fraction.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.exposure.ExposureComparisonResult(*, metric='chaperone_fraction', name, control_label=None, conditions=<factory>, pairwise_comparisons=<factory>, anova=None, ranking, equilibration_time='0ns', created_at=<factory>, polyzymd_version='1.2.1', ranking_by_transient_fraction=<factory>, excluded_conditions=<factory>)[source]

Bases: BaseComparisonResult[ExposureConditionSummary, PairwiseComparison]

Complete exposure dynamics comparison result.

This is the main output from ExposureDynamicsComparator.compare(). Contains per-condition summaries of transient exposure and chaperone event statistics, plus pairwise statistical comparisons.

metric

Always “chaperone_fraction”.

Type:: str

name

Comparison project name.

Type:: str

control_label

Label of the control condition.

Type:: str, optional

conditions

Summary for each condition.

Type:: list[ExposureConditionSummary]

pairwise_comparisons

Pairwise t-tests on chaperone_fraction.

Type:: list[PairwiseComparison]

anova

One-way ANOVA across all conditions.

Type:: ANOVASummary, optional

ranking

Condition labels sorted by chaperone_fraction (highest first).

Type:: list[str]

ranking_by_transient_fraction

Condition labels sorted by transient_fraction (highest first).

Type:: list[str]

excluded_conditions

Conditions excluded (e.g., no-polymer controls).

Type:: list[str]

equilibration_time

Equilibration time used.

Type:: str

created_at

When the analysis was run.

Type:: datetime

polyzymd_version

Version of polyzymd used.

Type:: str

comparison_type: ClassVar[str] = 'exposure'

metric: str

conditions: list[ExposureConditionSummary]

pairwise_comparisons: list[PairwiseComparison]

ranking_by_transient_fraction: list[str]

excluded_conditions: list[str]

equilibration_time: str

created_at: datetime

polyzymd_version: str

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Result models for polymer affinity score comparison analysis.

The polymer affinity score is a comparative metric that quantifies the total strength of polymer-protein interactions by summing per-contact free energy contributions weighted by the number of simultaneous contacts.

Physics

For each (polymer_type, protein_group) pair, the affinity score is:

S_{p,g} = N_{p,g} × ΔG_sel(p,g)

where:

N_{p,g} = mean number of simultaneous contacts per frame = mean_contact_fraction × n_exposed_in_group

ΔG_sel(p,g) = -ln(contact_share / expected_share) [in units of k_BT]

The total affinity score for a polymer type is:

S_p = Σ_g S_{p,g}

The total affinity score for a condition is:

S = Σ_p S_p

Independence assumption

This formulation assumes contacts are thermodynamically independent — each contact contributes the same free energy regardless of what other contacts exist simultaneously. This is the standard polyvalent binding approximation (Mammen et al., Angew. Chem. Int. Ed. 1998, 37, 2754).

The absolute values are NOT rigorous thermodynamic binding free energies. However, the relative differences between polymer compositions are meaningful as a comparative scoring function, analogous to scoring functions in molecular docking or MM/PBSA decomposition.

Sign convention

S < 0 → net favorable polymer-protein interaction
S > 0 → net unfavorable (avoidance dominates)
S = 0 → contacts match the surface-availability reference

Interpretation

More negative total score → stronger net polymer-protein affinity. When combined with structural stability metrics (RMSF, triad contacts), the affinity score helps rank polymer compositions by total interaction strength.

Uncertainty propagation

Per-replicate scores are computed independently:

S_rep = N_rep × ΔG_sel,rep

where N_rep = contact_fraction_rep × n_exposed_in_group, and ΔG_sel,rep = -ln(enrichment_rep + 1). The mean and SEM are taken across replicates. This approach naturally captures the covariance between N and ΔG_sel.

When per-replicate data is unavailable, analytical error propagation is used:

σ(S) = √[(N·σ_ΔG_sel)² + (ΔG_sel·σ_N)²]
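Both routes can be sketched for a single (polymer_type, protein_group) pair. All inputs and names are hypothetical. The sketch assumes that σ_N and σ_ΔG_sel in the analytical formula are standard errors across replicates, which the text above does not state explicitly.

```python
import math
import statistics

n_exposed_in_group = 10
expected_share = 0.10

# Hypothetical per-replicate inputs for one (polymer_type, protein_group) pair.
contact_fraction_reps = [0.28, 0.30, 0.32]
contact_share_reps = [0.18, 0.20, 0.22]


def sem(values):
    """Standard error of the mean using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Per-replicate route: S_rep = N_rep * dG_sel,rep, then mean and SEM.
n_reps = [f * n_exposed_in_group for f in contact_fraction_reps]
dg_reps = [-math.log(cs / expected_share) for cs in contact_share_reps]
s_reps = [n * dg for n, dg in zip(n_reps, dg_reps)]
s_mean, s_sem = statistics.mean(s_reps), sem(s_reps)

# Analytical fallback: sigma(S) = sqrt((N*sigma_dG)^2 + (dG*sigma_N)^2),
# which treats N and dG_sel as uncorrelated.
n_mean, dg_mean = statistics.mean(n_reps), statistics.mean(dg_reps)
s_sigma = math.sqrt((n_mean * sem(dg_reps)) ** 2 + (dg_mean * sem(n_reps)) ** 2)
```

In this made-up data N and ΔG_sel move together across replicates, so the per-replicate SEM comes out larger than the analytical estimate, which ignores that covariance.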

class polyzymd.compare.results.polymer_affinity.AffinityScoreEntry(*, polymer_type, protein_group, partition_name='aa_class', n_contacts, delta_G_per_contact=None, affinity_score=None, affinity_score_uncertainty=None, affinity_score_per_replicate=<factory>, mean_contact_fraction=0.0, n_exposed_in_group=0, contact_share=0.0, expected_share=0.0, temperature_K=0.0, n_replicates=0)[source]

Bases: BaseModel

Affinity score for one (polymer_type, protein_group) pair in one condition.

Stores both the composite score and its constituent quantities for reproducibility and downstream verification.

polymer_type

Polymer residue type (e.g., “SBM”, “EGM”).

Type:: str

protein_group

Protein amino acid group label (e.g., “aromatic”, “charged_positive”).

Type:: str

partition_name

Name of the partition this group belongs to (e.g., “aa_class”).

Type:: str

n_contacts

Mean number of simultaneous contacts per frame. Computed as mean_contact_fraction * n_exposed_in_group.

Type:: float

delta_G_per_contact

Per-contact selectivity free energy in kT. Computed as -ln(contact_share / expected_share).

Type:: float | None

affinity_score

Composite score: n_contacts * delta_G_per_contact (kT). More negative = stronger favorable interaction.

Type:: float | None

affinity_score_uncertainty

Uncertainty on affinity_score. From replicate SEM when available, otherwise from analytical error propagation.

Type:: float | None

affinity_score_per_replicate

Per-replicate affinity scores for statistical testing.

Type:: list[float]

mean_contact_fraction

Mean per-residue contact fraction in this group (from binding preference).

Type:: float

n_exposed_in_group

Number of surface-exposed residues in this group.

Type:: int

contact_share

Observed fraction of polymer contacts directed at this group.

Type:: float

expected_share

Surface-availability reference fraction.

Type:: float

temperature_K

Simulation temperature in Kelvin.

Type:: float

n_replicates

Number of replicates with valid data.

Type:: int

polymer_type: str

protein_group: str

partition_name: str

n_contacts: float

delta_G_per_contact: float | None

affinity_score: float | None

affinity_score_uncertainty: float | None

affinity_score_per_replicate: list[float]

mean_contact_fraction: float

n_exposed_in_group: int

contact_share: float

expected_share: float

temperature_K: float

n_replicates: int

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.polymer_affinity.PolymerTypeScore(*, polymer_type, total_score, total_score_uncertainty=None, total_score_per_replicate=<factory>, total_n_contacts=0.0, group_contributions=<factory>)[source]

Bases: BaseModel

Aggregated affinity score for one polymer type across all protein groups.

The score is the sum of per-group affinity scores:

S_p = Σ_g (N_g × ΔG_sel(g))

polymer_type

Polymer residue type (e.g., “SBM”, “EGM”).

Type:: str

total_score

Sum of affinity scores across all protein groups (kT).

Type:: float

total_score_uncertainty

Uncertainty on total_score.

Type:: float | None

total_score_per_replicate

Per-replicate total scores for statistical testing.

Type:: list[float]

total_n_contacts

Total mean simultaneous contacts per frame across all groups.

Type:: float

group_contributions

Breakdown by protein group (for detail reporting).

Type:: list[AffinityScoreEntry]

polymer_type: str

total_score: float

total_score_uncertainty: float | None

total_score_per_replicate: list[float]

total_n_contacts: float

group_contributions: list[AffinityScoreEntry]

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.polymer_affinity.AffinityScoreConditionSummary(*, label, config_path, temperature_K, n_replicates=0, total_score=0.0, total_score_uncertainty=None, total_score_per_replicate=<factory>, total_n_contacts=0.0, polymer_type_scores=<factory>, entries=<factory>, polymer_types=<factory>, protein_groups=<factory>)[source]

Bases: BaseModel

Affinity score summary for one simulation condition.

Aggregates scores at three levels: per (polymer_type, protein_group), per polymer_type, and total condition score.

label

Display name for this condition.

Type:: str

config_path

Path to the SimulationConfig YAML used.

Type:: str

temperature_K

Simulation temperature in Kelvin.

Type:: float

n_replicates

Number of replicates in this condition.

Type:: int

total_score

Grand total affinity score across all polymer types and groups (kT).

Type:: float

total_score_uncertainty

Uncertainty on total_score.

Type:: float | None

total_score_per_replicate

Per-replicate grand total scores for pairwise statistics.

Type:: list[float]

total_n_contacts

Total mean simultaneous contacts per frame (all types, all groups).

Type:: float

polymer_type_scores

Per-polymer-type score breakdown.

Type:: list[PolymerTypeScore]

entries

All (polymer_type, protein_group) entries.

Type:: list[AffinityScoreEntry]

polymer_types

Polymer types present.

Type:: list[str]

protein_groups

Protein groups analyzed.

Type:: list[str]

label: str

config_path: str

temperature_K: float

n_replicates: int

total_score: float

total_score_uncertainty: float | None

total_score_per_replicate: list[float]

total_n_contacts: float

polymer_type_scores: list[PolymerTypeScore]

entries: list[AffinityScoreEntry]

polymer_types: list[str]

protein_groups: list[str]

property primary_metric_value: float: Total affinity score (for ranking compatibility).

property primary_metric_sem: float: Uncertainty on total affinity score.

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.polymer_affinity.AffinityScorePairwiseEntry(*, condition_a, condition_b, temperature_a_K, temperature_b_K, cross_temperature=False, score_a=0.0, score_b=0.0, delta_score=None, t_statistic=None, p_value=None)[source]

Bases: BaseModel

Pairwise affinity score comparison between two conditions.

Compares total affinity scores. Statistics are suppressed for cross-temperature pairs.
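A minimal sketch of the suppression rule, using only the standard library. Several choices here are assumptions rather than documented behavior: the score for each condition is taken as the mean of its per-replicate totals, the statistic is Welch's t with sign B minus A, and delta_score is reported even for cross-temperature pairs. The p-value is omitted because it needs a t-distribution CDF that the standard library does not provide.

```python
import math
import statistics


def compare_totals(reps_a, reps_b, temp_a_K, temp_b_K):
    """Return a dict mirroring some fields of a pairwise entry (sketch only)."""
    cross = not math.isclose(temp_a_K, temp_b_K)
    score_a = statistics.fmean(reps_a)
    score_b = statistics.fmean(reps_b)
    result = {
        "cross_temperature": cross,
        "score_a": score_a,
        "score_b": score_b,
        "delta_score": score_b - score_a,
        "t_statistic": None,  # stays None when statistics are suppressed
    }
    if not cross and len(reps_a) > 1 and len(reps_b) > 1:
        # Welch's t statistic (an assumption; the source only says "T-test").
        se = math.sqrt(
            statistics.variance(reps_a) / len(reps_a)
            + statistics.variance(reps_b) / len(reps_b)
        )
        result["t_statistic"] = (score_b - score_a) / se if se > 0 else None
    return result


same = compare_totals([-3.0, -3.5, -2.5], [-5.0, -5.5, -4.5], 300.0, 300.0)
mixed = compare_totals([-3.0, -3.5], [-5.0, -5.5], 300.0, 330.0)
```

In this sketch, same carries a negative t statistic (B is more negative than A), while mixed has cross_temperature set and no t statistic.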

condition_a

Label of the first condition (typically control or reference).

Type:: str

condition_b

Label of the second condition.

Type:: str

temperature_a_K

Temperature of condition A in Kelvin.

Type:: float

temperature_b_K

Temperature of condition B in Kelvin.

Type:: float

cross_temperature

True when temperatures differ (statistics suppressed).

Type:: bool

score_a

Total affinity score for condition A (kT).

Type:: float

score_b

Total affinity score for condition B (kT).

Type:: float

delta_score

Difference: score_b - score_a (kT). Negative = condition B has stronger affinity than condition A.

Type:: float | None

t_statistic

T-test statistic (None for cross-temperature pairs).

Type:: float | None

p_value

Two-tailed p-value (None for cross-temperature pairs).

Type:: float | None

condition_a: str

condition_b: str

temperature_a_K: float

temperature_b_K: float

cross_temperature: bool

score_a: float

score_b: float

delta_score: float | None

t_statistic: float | None

p_value: float | None

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class polyzymd.compare.results.polymer_affinity.PolymerAffinityScoreResult(*, name, methodology='Polymer Affinity Score: S = Σ (N_contacts × ΔG_sel_per_contact) [kT]. N_contacts = mean_contact_fraction × n_exposed_in_group. ΔG_sel_per_contact = -ln(contact_share / expected_share). More negative = stronger net polymer-protein affinity. Assumes contact independence; interpret as comparative scoring metric.', mixed_temperatures=False, temperature_groups=<factory>, conditions=<factory>, pairwise_comparisons=<factory>, polymer_types=<factory>, protein_groups=<factory>, surface_exposure_threshold=None, equilibration_time='', created_at=<factory>, polyzymd_version=<factory>)[source]

Bases: BaseModel

Complete polymer affinity score comparison result.

This is the main output from PolymerAffinityScoreComparator.compare().

The polymer affinity score quantifies total polymer-protein interaction strength as a comparative metric. It is computed by summing per-contact selectivity free energies weighted by the number of simultaneous contacts:

S = Σ_{p,g} N_{p,g} × ΔG_sel(p,g)

where the sum runs over all (polymer_type, protein_group) pairs.

Important

This quantity assumes contact independence and should be interpreted as a relative affinity score, not a rigorous thermodynamic binding free energy. See the module docstring for details.
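The sum above can be worked through numerically. The sketch below follows the definitions in the default methodology string (N_contacts = mean_contact_fraction × n_exposed_in_group, ΔG_sel = -ln(contact_share / expected_share)); the function names, dictionary layout, and input numbers are illustrative assumptions, not the comparator's actual code.

```python
import math


def delta_g_sel(contact_share, expected_share):
    """Per-contact selectivity free energy in kT: -ln(observed / expected)."""
    return -math.log(contact_share / expected_share)


def affinity_score(groups):
    """Sum N_contacts * dG_sel over (polymer_type, protein_group) pairs."""
    total = 0.0
    for g in groups:
        n_contacts = g["mean_contact_fraction"] * g["n_exposed_in_group"]
        total += n_contacts * delta_g_sel(g["contact_share"], g["expected_share"])
    return total


groups = [
    # Over-represented group: contact_share > expected_share, so dG_sel < 0.
    {"mean_contact_fraction": 0.3, "n_exposed_in_group": 10,
     "contact_share": 0.6, "expected_share": 0.4},
    # Under-represented group: contact_share < expected_share, so dG_sel > 0.
    {"mean_contact_fraction": 0.1, "n_exposed_in_group": 20,
     "contact_share": 0.4, "expected_share": 0.6},
]
score = affinity_score(groups)
```

With these inputs the first group contributes 3 × (-ln 1.5) and the second 2 × (+ln 1.5), so the net score is -ln 1.5 kT: the preferred group outweighs the avoided one because it carries more simultaneous contacts.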

name

Name of the comparison project.

Type:: str

methodology

Human-readable description of the scoring methodology.

Type:: str

mixed_temperatures

True if conditions span more than one simulation temperature.

Type:: bool

temperature_groups

Mapping of temperature (K, as str) to condition labels.

Type:: dict[str, list[str]]

conditions

Summary for each condition.

Type:: list[AffinityScoreConditionSummary]

pairwise_comparisons

All pairwise comparisons.

Type:: list[AffinityScorePairwiseEntry]

polymer_types

All polymer types found.

Type:: list[str]

protein_groups

All protein groups analyzed.

Type:: list[str]

surface_exposure_threshold

SASA threshold used (from settings).

Type:: float | None

equilibration_time

Equilibration time used.

Type:: str

created_at

When the analysis was run.

Type:: datetime

polyzymd_version

Version of polyzymd used.

Type:: str

name: str

methodology: str

mixed_temperatures: bool

temperature_groups: dict[str, list[str]]

conditions: list[AffinityScoreConditionSummary]

pairwise_comparisons: list[AffinityScorePairwiseEntry]

polymer_types: list[str]

protein_groups: list[str]

surface_exposure_threshold: float | None

equilibration_time: str

created_at: datetime

polyzymd_version: str

save(path)[source]

Save result to JSON file.

Parameters:: path (Path or str) – Output path.
Returns:: Path to the saved file.
Return type:: Path

classmethod load(path)[source]

Load result from JSON file.

Parameters:: path (Path or str) – Path to JSON file.
Returns:: Loaded result.
Return type:: PolymerAffinityScoreResult

get_condition(label)[source]

Look up a condition summary by label.

Parameters:: label (str) – Condition display name.
Return type:: AffinityScoreConditionSummary or None

get_ranking()[source]

Return conditions ranked by total affinity score (most negative first).

Returns:: Conditions sorted by total_score ascending.
Return type:: list[AffinityScoreConditionSummary]
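The ordering rule is a plain ascending sort on total_score. A minimal sketch, with ConditionStub as a hypothetical stand-in for AffinityScoreConditionSummary and made-up scores:

```python
from dataclasses import dataclass


@dataclass
class ConditionStub:
    """Hypothetical stand-in exposing only label and total_score."""
    label: str
    total_score: float  # kT


def rank_conditions(conditions):
    """Sort ascending so the most negative (strongest) score comes first."""
    return sorted(conditions, key=lambda c: c.total_score)


ranked = rank_conditions([
    ConditionStub("0% SBMA", -1.2),
    ConditionStub("25% SBMA", -4.8),
    ConditionStub("50% SBMA", 0.3),
])
labels = [c.label for c in ranked]
```

Because more negative means stronger affinity, the first element of the ranking is the condition with the strongest net polymer-protein interaction.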

model_config = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

Formatters

Output formatters for binding free energy comparison results.

Provides console table, Markdown, and JSON output for BindingFreeEnergyResult.

polyzymd.compare.binding_free_energy_formatters.format_bfe_console_table(result)[source]

Format a BindingFreeEnergyResult as a console-friendly ASCII table.

Parameters:: result (BindingFreeEnergyResult) – Comparison result to format.
Returns:: ASCII table string.
Return type:: str

polyzymd.compare.binding_free_energy_formatters.format_bfe_markdown(result)[source]

Format a BindingFreeEnergyResult as Markdown.

Parameters:: result (BindingFreeEnergyResult) – Comparison result to format.
Returns:: Markdown-formatted string.
Return type:: str

polyzymd.compare.binding_free_energy_formatters.format_bfe_json(result)[source]

Format a BindingFreeEnergyResult as JSON.

Parameters:: result (BindingFreeEnergyResult) – Comparison result to format.
Returns:: JSON string.
Return type:: str

polyzymd.compare.binding_free_energy_formatters.format_bfe_result(result, format='table')[source]

Format a BindingFreeEnergyResult in the requested format.

Parameters:

result (BindingFreeEnergyResult) – Comparison result to format.
format (str) – Output format: “table” (default), “markdown”, or “json”.

Returns:

Formatted string.

Return type:

str

Raises:

ValueError – If format is not recognized.
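The dispatch-by-name behavior described above, including the ValueError for unknown formats, might look roughly like the following. This is a generic sketch over a plain dict, not the polyzymd implementation; format_result and the three inline formatters are hypothetical.

```python
import json


def format_result(result, format="table"):
    """Dispatch to a formatter by name; raise ValueError for unknown names."""
    formatters = {
        "table": lambda r: "\n".join(f"{k:<12}{v}" for k, v in r.items()),
        "markdown": lambda r: "\n".join(f"- **{k}**: {v}" for k, v in r.items()),
        "json": lambda r: json.dumps(r, indent=2),
    }
    try:
        formatter = formatters[format]
    except KeyError:
        raise ValueError(
            f"Unknown format {format!r}; expected one of {sorted(formatters)}"
        ) from None
    return formatter(result)


text = format_result({"name": "demo", "n_conditions": 2}, format="markdown")
```

Keeping the name-to-function mapping in one dictionary means the error message can list the valid choices without a second hard-coded list.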

Output formatters for polymer affinity score comparison results.

Provides console table, Markdown, and JSON output for PolymerAffinityScoreResult.

polyzymd.compare.polymer_affinity_formatters.format_affinity_console_table(result)[source]

Format a PolymerAffinityScoreResult as a console-friendly ASCII table.

Parameters:: result (PolymerAffinityScoreResult) – Comparison result to format.
Returns:: ASCII table string.
Return type:: str

polyzymd.compare.polymer_affinity_formatters.format_affinity_markdown(result)[source]

Format a PolymerAffinityScoreResult as Markdown.

Parameters:: result (PolymerAffinityScoreResult) – Comparison result to format.
Returns:: Markdown-formatted string.
Return type:: str

polyzymd.compare.polymer_affinity_formatters.format_affinity_json(result)[source]

Format a PolymerAffinityScoreResult as JSON.

Parameters:: result (PolymerAffinityScoreResult) – Comparison result to format.
Returns:: JSON string.
Return type:: str

polyzymd.compare.polymer_affinity_formatters.format_affinity_result(result, format='table')[source]

Format a PolymerAffinityScoreResult in the requested format.

Parameters:

result (PolymerAffinityScoreResult) – Comparison result to format.
format (str) – Output format: “table” (default), “markdown”, or “json”.

Returns:

Formatted string.

Return type:

str

Raises:

ValueError – If format is not recognized.

Plotters

Binding free energy plotters for comparison workflow.

This module provides registered plotters for ΔG_sel (selectivity free energy) analysis:

BFEHeatmapPlotter: ΔG_sel heatmap with rows = AA groups, columns = conditions.
BFEBarPlotter: Grouped bar chart of ΔG_sel by AA residue class.

Both plotters load a BindingFreeEnergyResult JSON saved by the polyzymd compare binding-free-energy command (in results/ adjacent to comparison.yaml) rather than per-condition analysis directories.

Partition-aware plotting

Each FreeEnergyEntry carries a partition_name field (e.g., “aa_class”, “lid_helices”, “whole_lid_domain”) that identifies which residue grouping scheme produced that entry. Different partitions use different denominators (each partition’s total exposed surface area), so mixing groups from different partitions on the same figure is scientifically misleading.

Both plotters therefore produce one figure per (partition, polymer_type) combination. When only a single partition is present (the common case for datasets that only use default AA-class grouping), filenames and titles omit the partition name to preserve backward compatibility.
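The one-figure-per-(partition, polymer_type) rule and the single-partition naming shortcut can be sketched as follows. Entries are represented as plain dicts, and the helper names, the "bfe_heatmap" prefix, and the file-stem layout are assumptions for illustration; the actual filenames produced by the plotters are not specified here.

```python
from collections import defaultdict


def group_for_figures(entries):
    """Bucket entries by (partition_name, polymer_type), one bucket per figure."""
    buckets = defaultdict(list)
    for e in entries:
        buckets[(e["partition_name"], e["polymer_type"])].append(e)
    return dict(buckets)


def figure_stem(partition, polymer_type, n_partitions, prefix="bfe_heatmap"):
    """Build a file stem; drop the partition when only one partition exists."""
    if n_partitions == 1:
        return f"{prefix}_{polymer_type}"
    return f"{prefix}_{partition}_{polymer_type}"


entries = [
    {"partition_name": "aa_class", "polymer_type": "SBMA", "protein_group": "aromatic"},
    {"partition_name": "aa_class", "polymer_type": "SBMA", "protein_group": "charged"},
    {"partition_name": "lid_helices", "polymer_type": "SBMA", "protein_group": "helix_1"},
]
buckets = group_for_figures(entries)
n_partitions = len({partition for partition, _ in buckets})
stems = sorted(figure_stem(p, t, n_partitions) for p, t in buckets)
```

Grouping before plotting guarantees that groups normalized against different exposed-surface denominators never share an axis.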

Physics interpretation

ΔG_sel < 0 → preferential contact (polymer contacts this group more than expected from surface availability alone)
ΔG_sel > 0 → contact avoidance (polymer contacts this group less than expected)
ΔG_sel = 0 → contacts match the surface-availability reference exactly

The diverging colormap (RdBu_r by default) is centered at 0.0:

Blue (negative) → preference
White (zero) → neutral
Red (positive) → avoidance

Units are those specified in analysis_settings.binding_free_energy.units (kT by default, i.e. dimensionless, in units of k_BT).
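Centering a diverging colormap at zero amounts to choosing color limits that are symmetric about 0. The sketch below shows one way to compute such limits and to map a ΔG_sel value to the reading given above; symmetric_limits and interpret are hypothetical helpers, and the plotters may center the colormap differently.

```python
def symmetric_limits(values):
    """Return (vmin, vmax) symmetric about 0 so that 0 maps to the midpoint."""
    bound = max((abs(v) for v in values), default=0.0)
    bound = bound or 1.0  # avoid a zero-width range when all values are 0
    return -bound, bound


def interpret(delta_g_sel, tol=1e-12):
    """Map the sign of dG_sel to its physical reading."""
    if delta_g_sel < -tol:
        return "preferential contact"
    if delta_g_sel > tol:
        return "contact avoidance"
    return "neutral"


vmin, vmax = symmetric_limits([-0.8, 0.3, 0.1])
readings = [interpret(v) for v in (-0.8, 0.0, 0.3)]
```

The resulting vmin and vmax could be passed to a matplotlib call such as imshow together with cmap="RdBu_r", so that white falls exactly on ΔG_sel = 0.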

class polyzymd.compare.plotters.binding_free_energy.BFEHeatmapPlotter(settings)[source]

Bases: BasePlotter

Generate ΔG_sel heatmap comparing binding free energy across conditions.

Creates one figure per (partition, polymer_type) combination:

Rows: Protein groups belonging to that partition
Columns: Conditions (e.g., 0% SBMA, 25% SBMA, …)
Color: ΔG_sel value with diverging colormap centered at 0

When only a single partition exists (e.g., just “aa_class”), filenames and titles match the previous single-partition behavior for backward compatibility.

Loads BindingFreeEnergyResult from results/ adjacent to comparison.yaml (accepts both binding_free_energy_comparison_*.json and bfe_comparison_*.json naming conventions).

Sign convention

Blue (negative ΔG_sel) = preferential contact
Red (positive ΔG_sel) = contact avoidance

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:: Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)
Return type:: str

can_plot(comparison_config, analysis_type)[source]

Return True for ‘binding_free_energy’ when heatmap is enabled.

plot(data, labels, output_dir, **kwargs)[source]

Generate ΔG_sel heatmaps, one per (partition, polymer_type).

Parameters:

data (dict) – Mapping of condition_label -> condition data dict from ComparisonPlotter._load_analysis_data().
labels (sequence of str) – Condition labels in desired display order.
output_dir (Path) – Directory to save plot files.
**kwargs (Any) – Unused; for interface compatibility.

Returns:

Paths to generated plot files, or empty list.

Return type:

list[Path]

class polyzymd.compare.plotters.binding_free_energy.BFEBarPlotter(settings)[source]

Bases: BasePlotter

Generate ΔG_sel grouped bar charts comparing binding free energy across conditions.

Creates one figure per (partition, polymer_type) combination with:

Groups on x-axis: Protein groups from that partition
Bars within each group: One per condition
Error bars: Between-replicate SEM on ΔG_sel (delta-method fallback)
Reference line at ΔG_sel = 0

When only a single partition exists, filenames and titles match the previous single-partition behavior for backward compatibility.

Loads BindingFreeEnergyResult from results/ adjacent to comparison.yaml (accepts both binding_free_energy_comparison_*.json and bfe_comparison_*.json naming conventions).
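One plausible reading of the error-bar rule is sketched below: use the between-replicate SEM when at least two per-replicate ΔG_sel values exist, and otherwise propagate the uncertainty on the contact share through ΔG_sel = -ln(contact_share / expected_share) to first order. The function names, the two-replicate threshold, and the treatment of expected_share as exact are assumptions of this sketch, not confirmed details of BFEBarPlotter.

```python
import math
import statistics


def sem_between_replicates(values):
    """Standard error of the mean from per-replicate values (needs n >= 2)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def sem_delta_method(contact_share, contact_share_sem):
    """First-order propagation through dG = -ln(share / expected).

    d(dG)/d(share) = -1 / share, so SEM(dG) ~= SEM(share) / share,
    treating expected_share as exact (an assumption of this sketch).
    """
    return contact_share_sem / contact_share


def error_bar(per_replicate_dg, contact_share, contact_share_sem):
    """Prefer between-replicate SEM; fall back to the delta method."""
    if len(per_replicate_dg) >= 2:
        return sem_between_replicates(per_replicate_dg)
    return sem_delta_method(contact_share, contact_share_sem)


from_replicates = error_bar([-0.4, -0.6, -0.5], 0.5, 0.05)
from_fallback = error_bar([-0.5], 0.5, 0.05)
```

The fallback only matters when replicate-level values are unavailable, for example with a single replicate.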

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:: Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)
Return type:: str

can_plot(comparison_config, analysis_type)[source]

Return True for ‘binding_free_energy’ when bar charts are enabled.

plot(data, labels, output_dir, **kwargs)[source]

Generate ΔG_sel grouped bar charts, one per (partition, polymer_type).

Parameters:

data (dict) – Mapping of condition_label -> condition data dict from ComparisonPlotter._load_analysis_data().
labels (sequence of str) – Condition labels in desired display order.
output_dir (Path) – Directory to save plot files.
**kwargs (Any) – Unused; for interface compatibility.

Returns:

Paths to generated plot files, or empty list.

Return type:

list[Path]

Exposure dynamics plotters for comparison workflow.

Provides two registered plotters:

ExposureChaperoneFractionPlotter ("exposure_chaperone_fraction"): Bar chart comparing mean chaperone fraction across conditions.
ExposureEnrichmentHeatmapPlotter ("exposure_enrichment_heatmap"): Heatmap of residue-based chaperone enrichment per (polymer_type, aa_group).

Both plotters follow the established BasePlotter pattern: load data from data[label]["analysis_dir"] paths rather than expecting data to be passed via kwargs.
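The pattern described above, in which a plotter exposes plot_type(), can_plot(), and plot() and resolves its inputs from data[label]["analysis_dir"], can be sketched with a stand-in base class. PlotterStub and DemoExposurePlotter are hypothetical; the real BasePlotter may define additional hooks or a registration mechanism not shown here, and the paths are made up.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class PlotterStub(ABC):
    """Hypothetical stand-in for BasePlotter; the real base class may differ."""

    def __init__(self, settings):
        self.settings = settings

    @classmethod
    @abstractmethod
    def plot_type(cls): ...

    @abstractmethod
    def can_plot(self, comparison_config, analysis_type): ...

    @abstractmethod
    def plot(self, data, labels, output_dir, **kwargs): ...


class DemoExposurePlotter(PlotterStub):
    @classmethod
    def plot_type(cls):
        return "demo_exposure"

    def can_plot(self, comparison_config, analysis_type):
        return analysis_type == "exposure"

    def plot(self, data, labels, output_dir, **kwargs):
        # Resolve inputs from each condition's analysis_dir, in display order.
        self.sources = [Path(data[label]["analysis_dir"]) for label in labels]
        # A real plotter would read results and write figures here; this
        # sketch only records the resolved paths and returns no files.
        return []


plotter = DemoExposurePlotter(settings={})
data = {"0% SBMA": {"analysis_dir": "runs/sbma_0/analysis"}}
outputs = plotter.plot(data, ["0% SBMA"], Path("figures"))
```

Resolving paths inside plot() keeps the caller's interface uniform: every plotter receives the same data mapping regardless of which files it needs.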

class polyzymd.compare.plotters.exposure.ExposureChaperoneFractionPlotter(settings)[source]

Bases: BasePlotter

Bar chart comparing chaperone fraction across conditions.

Shows mean chaperone fraction (with SEM error bars) per condition, ordered by the ranking from ExposureDynamicsComparator.compare().

Compatible with analysis_type="exposure".

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:: Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)
Return type:: str

can_plot(comparison_config, analysis_type)[source]

Check if this plotter can handle the given analysis type.

Parameters:

comparison_config (ComparisonConfig) – Full comparison configuration
analysis_type (str) – Analysis type to check (e.g., “rmsf”, “triad”, “distances”)

Returns:

True if this plotter can generate plots for the analysis type

Return type:

bool

plot(data, labels, output_dir, **kwargs)[source]

Generate chaperone fraction bar chart.

class polyzymd.compare.plotters.exposure.ExposureEnrichmentHeatmapPlotter(settings)[source]

Bases: BasePlotter

Heatmap of chaperone enrichment per (polymer_type, aa_group).

One subplot per condition; rows = polymer types, columns = AA groups. Color encodes residue-based enrichment (warm = enriched, cool = depleted).

Compatible with analysis_type="exposure".

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:: Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)
Return type:: str

can_plot(comparison_config, analysis_type)[source]

Check if this plotter can handle the given analysis type.

Parameters:

comparison_config (ComparisonConfig) – Full comparison configuration
analysis_type (str) – Analysis type to check (e.g., “rmsf”, “triad”, “distances”)

Returns:

True if this plotter can generate plots for the analysis type

Return type:

bool

plot(data, labels, output_dir, **kwargs)[source]

Generate enrichment heatmaps from cached ExposureComparisonResult.

Polymer affinity score plotters for comparison workflow.

This module provides registered plotters for the polymer affinity score:

AffinityStackedBarPlotter: Total affinity score per condition, with stacked segments showing each polymer type’s contribution.
AffinityGroupBarPlotter: Per-group breakdown comparing conditions, one figure per polymer type.

Both plotters load a PolymerAffinityScoreResult JSON saved by the polyzymd compare polymer-affinity command (in results/ adjacent to comparison.yaml).

Physics interpretation

Score < 0 → net favorable polymer-protein affinity
Score > 0 → net unfavorable (avoidance dominates)
Score = 0 → contacts match the surface-availability reference

Units are always kT (dimensionless, in units of k_BT).

Sign convention

More negative = stronger polymer-protein interaction. Diverging colormap is not used here (unlike BFE heatmaps) because the primary display is bar charts where sign is visually obvious.

class polyzymd.compare.plotters.polymer_affinity.AffinityStackedBarPlotter(settings)[source]

Bases: BasePlotter

Stacked bar chart of total affinity score per condition.

Each bar represents one condition’s total affinity score, with segments colored by polymer type contribution. This gives a quick overview of which polymer types contribute most to the total interaction strength.

Loads PolymerAffinityScoreResult from results/ adjacent to comparison.yaml.
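Because per-polymer-type contributions can have either sign, stacking them needs separate running baselines for negative and positive segments. The helper below is a hypothetical sketch of that bookkeeping, not the plotter's actual code.

```python
def stacked_bottoms(contributions):
    """Return (bottom, height) per segment, stacking by sign from zero."""
    pos_top = 0.0
    neg_top = 0.0
    segments = []
    for value in contributions:
        if value >= 0:
            segments.append((pos_top, value))
            pos_top += value
        else:
            segments.append((neg_top, value))
            neg_top += value
    return segments


# Illustrative per-polymer-type contributions to one condition (kT).
segments = stacked_bottoms([-2.0, -1.0, 0.5])
```

Each (bottom, height) pair could be passed to matplotlib's bar as height and bottom, so that negative segments extend downward from zero and positive ones upward.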

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:: Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)
Return type:: str

can_plot(comparison_config, analysis_type)[source]

Check if this plotter can handle the given analysis type.

Parameters:

comparison_config (ComparisonConfig) – Full comparison configuration
analysis_type (str) – Analysis type to check (e.g., “rmsf”, “triad”, “distances”)

Returns:

True if this plotter can generate plots for the analysis type

Return type:

bool

plot(data, labels, output_dir, **kwargs)[source]

Generate stacked bar chart of affinity scores by condition.

Parameters:

data (dict) – Condition data dict from orchestrator.
labels (sequence of str) – Condition labels in display order.
output_dir (Path) – Directory to save plot files.

Returns:

Paths to generated plot files.

Return type:

list[Path]

class polyzymd.compare.plotters.polymer_affinity.AffinityGroupBarPlotter(settings)[source]

Bases: BasePlotter

Grouped bar chart of per-group affinity score contributions.

Creates one figure per polymer type with:

Groups on x-axis: Protein groups (AA classes)
Bars within each group: One per condition
Error bars: SEM on per-group affinity score
Reference line at score = 0

Loads PolymerAffinityScoreResult from results/.

classmethod plot_type()[source]

Return the unique identifier for this plotter.

Returns:: Plot type identifier (e.g., “triad_kde_panel”, “rmsf_comparison”)
Return type:: str

can_plot(comparison_config, analysis_type)[source]

Check if this plotter can handle the given analysis type.

Parameters:

comparison_config (ComparisonConfig) – Full comparison configuration
analysis_type (str) – Analysis type to check (e.g., “rmsf”, “triad”, “distances”)

Returns:

True if this plotter can generate plots for the analysis type

Return type:

bool

plot(data, labels, output_dir, **kwargs)[source]

Generate grouped bar charts of per-group affinity scores.

Parameters:

data (dict) – Condition data dict from orchestrator.
labels (sequence of str) – Condition labels in display order.
output_dir (Path) – Directory to save plot files.

Returns:

Paths to generated plot files.

Return type:

list[Path]

CLI

CLI commands for the compare module.

This module provides the polyzymd compare command group with subcommands for initializing comparison projects and running comparisons.