How-To: Analyze Hydrogen Bonds Across Conditions

This guide shows you how to configure, run, and interpret the hydrogen_bonds analysis plugin. By the end, you will have a working hydrogen-bond comparison across simulation conditions, with grouped summaries, composition breakdowns, and publication-ready figures.

Environment Setup

All commands below assume you have activated the PolyzyMD pixi environment:

pixi shell -e build

Alternatively, prefix each command with pixi run -e build.

When to Use This Plugin

Use hydrogen_bonds when you need to answer questions like:

Does the polymer form more H-bonds with the protein than the substrate does?
Are protein internal H-bonds disrupted by polymer conjugation?
Which specific residue pairs form the most persistent H-bonds with the polymer?
How is the total H-bond budget distributed across protein, substrate, and polymer?

The plugin wraps MDAnalysis HydrogenBondAnalysis with a flexible groups + summaries model: you define named atom groups, then specify which pairs or self-interactions to summarize. This lets you answer multiple questions in a single analysis run.

Before You Start

You need:

A working pixi environment (pixi install -e build)
Completed simulation trajectories for at least two conditions
A comparison.yaml defining your conditions (see How to Compare Simulation Conditions)
Familiarity with the PolyzyMD chain convention (A=protein, B=substrate, C=polymer)

Quick Start

Add a hydrogen_bonds section to the plugins: block in your comparison.yaml:

plugins:
  hydrogen_bonds:
    groups:
      protein_all: "protein"
      polymer_all: "chainid C"
    summaries:
      protein_polymer:
        between: [protein_all, polymer_all]

Run the analysis:

pixi run -e build polyzymd compare run hydrogen_bonds

That is all you need for a minimal protein–polymer H-bond comparison. The sections below explain every setting and show more advanced configurations.

Configuration Reference

All settings live under plugins: hydrogen_bonds: in comparison.yaml. Every field has a sensible default, so you only need to specify what you want to change.

Geometric Criteria

Field	Type	Default	Description
`distance_cutoff`	float	`3.0`	Donor–acceptor distance cutoff in Å
`angle_cutoff`	float	`150`	D–H···A angle cutoff in degrees

These match the standard MDAnalysis HydrogenBondAnalysis defaults. The distance is measured between the heavy-atom donor and the acceptor (not the hydrogen). The angle is measured at the hydrogen: D–H···A.

Groups

groups:
  protein_all: "protein"
  substrate: "resname pNB"
  polymer_all: "chainid C"

Each entry maps a name (used in summaries) to an MDAnalysis selection string. Groups are resolved once at the start of the analysis. You can use any valid MDAnalysis selection syntax including protein, chainid, resname, resid, boolean operators, and parentheses.

Group overlaps

If two groups share atoms (e.g., "protein" and "protein and resid 106"), the plugin logs a warning and H-bonds in the overlap region may be counted in multiple summaries. For clean composition accounting, use disjoint groups.

Summaries

Summaries define what to measure. Each summary has a unique name and exactly one mode:

between: [group_a, group_b] — Count H-bonds where one partner is in group A and the other in group B (inter-group).
within: group_name — Count H-bonds where both partners are in the same group (intra-group).

summaries:
  protein_polymer:
    between: [protein_all, polymer_all]
  protein_internal:
    within: protein_all

Each summary produces an independent metric (mean_hbonds_<name>) in the comparison output. You can define as many summaries as you need — the plugin runs one global MDAnalysis H-bond search and post-classifies events into summaries.

Composition

Composition analysis breaks down the total H-bond budget by donor→acceptor partition pairs. This is independent of summaries and uses its own set of disjoint selections.

composition:
  partitions:
    protein: "protein"
    substrate: "resname pNB"
    polymer: "chainid C"

The plugin counts every detected H-bond event, classifies donor and acceptor atoms into partitions, and reports both absolute counts (mean H-bonds/frame) and fractional shares per partition pair (e.g., protein→polymer, protein→protein, polymer→substrate).

Composition is opt-in. If composition is omitted, no composition results are computed.

Partition disjointness

Composition partitions should be disjoint (no shared atoms). If partitions overlap, the plugin raises an error by default. Set allow_overlapping_composition: true to allow overlap with warnings (overlapping atoms are counted in both partitions, so fractions may exceed 1.0).

Other Settings

Field	Type	Default	Description
`update_selections`	bool	`true`	Re-evaluate atom selections each frame. Required for coordinate-dependent selections like `around`. For purely structural selections (`chainid`, `resname`), this adds overhead without changing results.
`top_n_pairs`	int	`15`	Number of top residue pairs to report per summary
`allow_empty_groups`	bool	`true`	If `true` (default), warn and skip summaries when a referenced group selection matches no atoms. Set `false` for strict validation (raise `ValueError`).
`allow_overlapping_composition`	bool	`false`	If `false`, overlapping composition partitions raise an error. Set `true` to allow overlap with warnings.
`timestep_ps`	float or null	`null`	Manual frame spacing in ps for time-axis plots. If null, read from trajectory metadata.

If you want composition partitions to mirror group selections, define them explicitly in YAML with the same keys/selections.

Technical Detail

update_selections: true makes MDAnalysis re-evaluate atom selections every frame, which is important for coordinate-dependent selections such as around, sphzone, and cyzone.

However, the plugin resolves group membership used for donor/acceptor pair-classification once at the start from the initial frame index sets. This means pair-tracking classification does not change if atoms move between groups later in the trajectory. This behavior is deliberate for performance and for consistent summary definitions across frames.

When to set update_selections to false

If all your groups use structural selections (chainid, resname, resid, protein), setting update_selections: false is safe and faster. Only set it to true when you use coordinate-dependent keywords like around, sphzone, or cyzone. Note that even with update_selections: true, group membership for post-classification is evaluated once — this is exact for structural selections and an approximation for coordinate-dependent ones.

Common Recipes

Protein ↔ polymer H-bonds (minimal)

plugins:
  hydrogen_bonds:
    groups:
      protein_all: "protein"
      polymer_all: "chainid C"
    summaries:
      protein_polymer:
        between: [protein_all, polymer_all]

Standard 4-summary configuration with composition

This is the recommended starting point for a typical enzyme–polymer–substrate system. It covers the four most common interactions and adds a composition breakdown.

plugins:
  hydrogen_bonds:
    distance_cutoff: 3.0
    angle_cutoff: 150
    groups:
      protein_all: "protein"
      substrate: "resname pNB"
      polymer_all: "chainid C"
    summaries:
      protein_polymer:
        between: [protein_all, polymer_all]
      protein_substrate:
        between: [protein_all, substrate]
      protein_internal:
        within: protein_all
      polymer_internal:
        within: polymer_all
    composition:
      partitions:
        protein: "protein"
        substrate: "resname pNB"
        polymer: "chainid C"

Monomer-resolved polymer analysis

If your polymer contains distinct monomer types (e.g., SBMA and EGMA), you can define separate groups to compare their H-bonding behavior:

plugins:
  hydrogen_bonds:
    groups:
      protein_all: "protein"
      substrate: "resname pNB"
      polymer_all: "chainid C"
      polymer_sbm: "chainid C and resname SBM"
      polymer_egm: "chainid C and resname EGM"
      active_site: "protein and (resid 106 or resid 225 or resid 188)"
    summaries:
      protein_polymer:
        between: [protein_all, polymer_all]
      protein_sbm:
        between: [protein_all, polymer_sbm]
      protein_egm:
        between: [protein_all, polymer_egm]
      active_site_polymer:
        between: [active_site, polymer_all]
      protein_internal:
        within: protein_all
      polymer_internal:
        within: polymer_all
    composition:
      partitions:
        protein: "protein"
        substrate: "resname pNB"
        polymer_sbm: "chainid C and resname SBM"
        polymer_egm: "chainid C and resname EGM"

Protein ↔ substrate only

For systems without a polymer (or for a control condition comparison):

plugins:
  hydrogen_bonds:
    groups:
      protein_all: "protein"
      substrate: "resname pNB"
    summaries:
      protein_substrate:
        between: [protein_all, substrate]

Active-site focused analysis

To focus on H-bonds near the catalytic site:

plugins:
  hydrogen_bonds:
    groups:
      active_site: "protein and (resid 106 or resid 225 or resid 188)"
      polymer_all: "chainid C"
      substrate: "resname pNB"
    summaries:
      active_site_polymer:
        between: [active_site, polymer_all]
      active_site_substrate:
        between: [active_site, substrate]

Running the Analysis

Local execution

pixi run -e build polyzymd compare run hydrogen_bonds

This runs the full pipeline: compute_replicate for every replicate, aggregate for every condition, then compare and plot.

HPC execution

For long trajectories or many replicates, submit as a SLURM job:

pixi run -e build polyzymd compare submit hydrogen_bonds \
    -f comparison.yaml \
    --partition aa100 \
    --mem 8G \
    --time 02:00:00

Performance note

The hydrogen bonds plugin is marked with execution_cost_hint = "high". H-bond detection iterates over donor–acceptor pairs across all selected atoms every frame. For large systems (50k+ atoms) with many frames, allocate at least 8 GB memory and 1–2 hours per replicate.

See How To: Submit Analysis Jobs to a SLURM Cluster for the full HPC submission guide.

Understanding the Output

Per-summary metrics

Each summary produces these fields in the aggregated result:

Field	Description
`mean_hbonds_per_frame`	Average number of H-bonds per frame, averaged across replicates
`sem_hbonds_per_frame`	Standard error of the mean across replicates
`per_replicate_mean_hbonds`	Per-replicate mean values (for statistical tests)
`mean_unique_pairs_per_frame`	Mean number of unique donor-residue/acceptor-residue pairs per frame
`sem_unique_pairs_per_frame`	Standard error across replicates for the unique-pairs metric
`mean_fraction_with_any`	Fraction of frames that had at least one H-bond

The primary comparison metric is mean_hbonds_per_frame per summary. In the comparison output, this appears as mean_hbonds_<summary_name> (e.g., mean_hbonds_protein_polymer).

Residue pairs

Each summary reports the top residue pairs ranked by occupancy (fraction of frames where the pair has at least one H-bond). Two views are provided:

Directed pairs — Donor→Acceptor direction is preserved. Useful for understanding H-bond directionality (which residue donates).
Undirected pairs — Both directions are merged into one pair. Useful for identifying which residue pairs interact most frequently regardless of direction.

Each pair reports mean_occupancy, sem_occupancy, mean_events_per_frame, and per-replicate occupancy values.

Composition entries

When composition is configured, the result includes per-partition-pair entries:

Field	Description
`donor_partition → acceptor_partition`	The partition pair (e.g., protein→polymer)
`mean_hbonds_per_frame`	Absolute count for this pair
`mean_fraction_of_total`	This pair’s share of all detected H-bonds

Composition fractions help normalize for differences in donor/acceptor availability across conditions. If a polymer condition has more total H-bonds but the protein→protein fraction stays constant, the polymer is adding new interactions rather than disrupting existing ones.

Statistical comparison

The default comparison pipeline produces:

Pairwise t-tests between all condition pairs for each summary metric, with Benjamini–Hochberg FDR correction
Cohen’s d effect sizes for each pairwise comparison
ANOVA (when ≥3 conditions) as an omnibus test
Rankings of conditions by each metric

The higher_is_better flag is set to None for H-bond counts (neither direction is universally better), and direction labels use “fewer H-bonds” / “similar” / “more H-bonds”.

Interpreting the Plots

The plugin generates up to five plot types. All plots use the shared PolyzyMD theming system for consistent publication-quality output.

Summary comparison bar chart

File: hbond_summary_comparison.png

Faceted grouped bars with one subplot per summary and one bar per condition. Each summary gets its own y-axis scale, improving readability when summaries have very different magnitudes. Error bars show SEM across replicates.

Time series

File: hbond_timeseries_<summary>.png (one per summary)

Per-frame H-bond counts over time in physical units (ns). The mean across replicates is shown as a solid line with a ±1 SD shaded band (single-replicate runs show the trace only). Useful for detecting equilibration issues (early drift) or transient events (sudden spikes indicating a binding/unbinding event).

Top residue pairs occupancy

File: hbond_top_pairs_<summary>.png (one per summary)

Horizontal grouped bar chart showing top undirected residue pairs by mean occupancy. To focus on cross-condition comparison, only pairs present in at least two conditions are shown.

Composition absolute bars

File: hbond_composition_absolute.png

Stacked bar chart showing mean H-bonds/frame broken down by partition pair (e.g., protein→protein, protein→polymer, polymer→polymer). Each condition gets one stacked bar. Reveals how the total H-bond budget is distributed.

Composition fractional bars

File: hbond_composition_fraction.png

Same as above but normalized to fractions. The y-axis auto-scales: for disjoint partitions it ranges 0–1, but when allow_overlapping_composition: true is set and partitions overlap, stacked fractions may exceed 1.0. Useful for comparing the relative H-bond distribution even when total counts differ between conditions.

Scientific Considerations

Interpreting H-bond counts

Keep these caveats in mind when interpreting results:

Geometric criteria, not energetics. H-bond counts reflect MDAnalysis geometric criteria (distance and angle cutoffs), not interaction energies. A “hydrogen bond” by this definition may not correspond to a thermodynamically significant interaction.
System size matters. Cross-condition comparisons of raw H-bond counts should account for differences in system size (number of donors/acceptors). The composition fractional view helps normalize for this.
Composition fractions normalize availability. If condition A has 200 polymer atoms and condition B has 400, condition B will likely show more protein–polymer H-bonds in absolute terms. Composition fractions reveal whether the proportion also changes.
Residue-pair rankings are noisy. Individual residue-pair occupancies can vary substantially across replicates. Focus on the top 3–5 pairs and cross-reference with structural knowledge. The top_n_pairs setting controls how many pairs are reported (default: 15).
Statistical correction. Pairwise comparisons use FDR-corrected (Benjamini–Hochberg) p-values. When comparing many summaries across many conditions, check adjusted p-values rather than raw values.

Full Example: comparison.yaml

Here is a complete comparison.yaml for a CALB enzyme study with three polymer conditions:

name: "calb_hbond_study"
description: "Hydrogen bond analysis for CALB with polymer conjugates"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]

  - label: "SBMA-100"
    config: "../SBMA_100_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]

  - label: "EGMA-100"
    config: "../EGMA_100_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  hydrogen_bonds:
    distance_cutoff: 3.0
    angle_cutoff: 150
    groups:
      protein_all: "protein"
      substrate: "resname pNB"
      polymer_all: "chainid C"
    summaries:
      protein_polymer:
        between: [protein_all, polymer_all]
      protein_substrate:
        between: [protein_all, substrate]
      protein_internal:
        within: protein_all
      polymer_internal:
        within: polymer_all
    composition:
      partitions:
        protein: "protein"
        substrate: "resname pNB"
        polymer: "chainid C"

plot_settings:
  format: "png"
  dpi: 300
  style: "publication"

Aliases

The plugin responds to these CLI names:

hydrogen_bonds (canonical)
hbonds
hbond

# These are equivalent:
polyzymd compare run hydrogen_bonds
polyzymd compare run hbonds
polyzymd compare run hbond