How-To: Analyze Hydrogen Bonds Across Conditions
This guide shows you how to configure, run, and interpret the hydrogen_bonds
analysis plugin. By the end, you will have a working hydrogen-bond comparison
across simulation conditions, with grouped summaries, composition breakdowns,
and publication-ready figures.
Environment Setup
All commands below assume you have activated the PolyzyMD pixi environment:
pixi shell -e build
Alternatively, prefix each command with pixi run -e build.
When to Use This Plugin
Use hydrogen_bonds when you need to answer questions like:
Does the polymer form more H-bonds with the protein than the substrate does?
Are protein internal H-bonds disrupted by polymer conjugation?
Which specific residue pairs form the most persistent H-bonds with the polymer?
How is the total H-bond budget distributed across protein, substrate, and polymer?
The plugin wraps MDAnalysis HydrogenBondAnalysis with a flexible
groups + summaries model: you define named atom groups, then specify
which pairs or self-interactions to summarize. This lets you answer multiple
questions in a single analysis run.
Before You Start
You need:
A working pixi environment (
pixi install -e build)Completed simulation trajectories for at least two conditions
A
comparison.yamldefining your conditions (see How to Compare Simulation Conditions)Familiarity with the PolyzyMD chain convention (A=protein, B=substrate, C=polymer)
Quick Start
Add a hydrogen_bonds section to the plugins: block in your
comparison.yaml:
plugins:
hydrogen_bonds:
groups:
protein_all: "protein"
polymer_all: "chainid C"
summaries:
protein_polymer:
between: [protein_all, polymer_all]
Run the analysis:
pixi run -e build polyzymd compare run hydrogen_bonds
That is all you need for a minimal protein–polymer H-bond comparison. The sections below explain every setting and show more advanced configurations.
Configuration Reference
All settings live under plugins: hydrogen_bonds: in comparison.yaml. Every
field has a sensible default, so you only need to specify what you want to
change.
Geometric Criteria
Field |
Type |
Default |
Description |
|---|---|---|---|
|
float |
|
Donor–acceptor distance cutoff in Å |
|
float |
|
D–H···A angle cutoff in degrees |
These match the standard MDAnalysis HydrogenBondAnalysis defaults. The
distance is measured between the heavy-atom donor and the acceptor (not the
hydrogen). The angle is measured at the hydrogen: D–H···A.
Groups
groups:
protein_all: "protein"
substrate: "resname pNB"
polymer_all: "chainid C"
Each entry maps a name (used in summaries) to an MDAnalysis selection
string. Groups are resolved once at the start of the analysis. You can use
any valid MDAnalysis selection syntax including protein, chainid,
resname, resid, boolean operators, and parentheses.
Group overlaps
If two groups share atoms (e.g., "protein" and
"protein and resid 106"), the plugin logs a warning and H-bonds in the
overlap region may be counted in multiple summaries. For clean composition
accounting, use disjoint groups.
Summaries
Summaries define what to measure. Each summary has a unique name and exactly one mode:
between: [group_a, group_b]— Count H-bonds where one partner is in group A and the other in group B (inter-group).within: group_name— Count H-bonds where both partners are in the same group (intra-group).
summaries:
protein_polymer:
between: [protein_all, polymer_all]
protein_internal:
within: protein_all
Each summary produces an independent metric (mean_hbonds_<name>) in the
comparison output. You can define as many summaries as you need — the plugin
runs one global MDAnalysis H-bond search and post-classifies events into
summaries.
Composition
Composition analysis breaks down the total H-bond budget by donor→acceptor partition pairs. This is independent of summaries and uses its own set of disjoint selections.
composition:
partitions:
protein: "protein"
substrate: "resname pNB"
polymer: "chainid C"
The plugin counts every detected H-bond event, classifies donor and acceptor atoms into partitions, and reports both absolute counts (mean H-bonds/frame) and fractional shares per partition pair (e.g., protein→polymer, protein→protein, polymer→substrate).
Composition is opt-in. If composition is omitted, no composition results are
computed.
Partition disjointness
Composition partitions should be disjoint (no shared atoms). If partitions
overlap, the plugin raises an error by default. Set
allow_overlapping_composition: true to allow overlap with warnings
(overlapping atoms are counted in both partitions, so fractions may exceed 1.0).
Other Settings
Field |
Type |
Default |
Description |
|---|---|---|---|
|
bool |
|
Re-evaluate atom selections each frame. Required for coordinate-dependent selections like |
|
int |
|
Number of top residue pairs to report per summary |
|
bool |
|
If |
|
bool |
|
If |
|
float or null |
|
Manual frame spacing in ps for time-axis plots. If null, read from trajectory metadata. |
If you want composition partitions to mirror group selections, define them explicitly in YAML with the same keys/selections.
Technical Detail
update_selections: true makes MDAnalysis re-evaluate atom selections every
frame, which is important for coordinate-dependent selections such as
around, sphzone, and cyzone.
However, the plugin resolves group membership used for donor/acceptor pair-classification once at the start from the initial frame index sets. This means pair-tracking classification does not change if atoms move between groups later in the trajectory. This behavior is deliberate for performance and for consistent summary definitions across frames.
When to set update_selections to false
If all your groups use structural selections (chainid, resname, resid,
protein), setting update_selections: false is safe and faster. Only set
it to true when you use coordinate-dependent keywords like around,
sphzone, or cyzone. Note that even with update_selections: true, group
membership for post-classification is evaluated once — this is exact for
structural selections and an approximation for coordinate-dependent ones.
Common Recipes
Protein ↔ polymer H-bonds (minimal)
plugins:
hydrogen_bonds:
groups:
protein_all: "protein"
polymer_all: "chainid C"
summaries:
protein_polymer:
between: [protein_all, polymer_all]
Standard 4-summary configuration with composition
This is the recommended starting point for a typical enzyme–polymer–substrate system. It covers the four most common interactions and adds a composition breakdown.
plugins:
hydrogen_bonds:
distance_cutoff: 3.0
angle_cutoff: 150
groups:
protein_all: "protein"
substrate: "resname pNB"
polymer_all: "chainid C"
summaries:
protein_polymer:
between: [protein_all, polymer_all]
protein_substrate:
between: [protein_all, substrate]
protein_internal:
within: protein_all
polymer_internal:
within: polymer_all
composition:
partitions:
protein: "protein"
substrate: "resname pNB"
polymer: "chainid C"
Monomer-resolved polymer analysis
If your polymer contains distinct monomer types (e.g., SBMA and EGMA), you can define separate groups to compare their H-bonding behavior:
plugins:
hydrogen_bonds:
groups:
protein_all: "protein"
substrate: "resname pNB"
polymer_all: "chainid C"
polymer_sbm: "chainid C and resname SBM"
polymer_egm: "chainid C and resname EGM"
active_site: "protein and (resid 106 or resid 225 or resid 188)"
summaries:
protein_polymer:
between: [protein_all, polymer_all]
protein_sbm:
between: [protein_all, polymer_sbm]
protein_egm:
between: [protein_all, polymer_egm]
active_site_polymer:
between: [active_site, polymer_all]
protein_internal:
within: protein_all
polymer_internal:
within: polymer_all
composition:
partitions:
protein: "protein"
substrate: "resname pNB"
polymer_sbm: "chainid C and resname SBM"
polymer_egm: "chainid C and resname EGM"
Protein ↔ substrate only
For systems without a polymer (or for a control condition comparison):
plugins:
hydrogen_bonds:
groups:
protein_all: "protein"
substrate: "resname pNB"
summaries:
protein_substrate:
between: [protein_all, substrate]
Active-site focused analysis
To focus on H-bonds near the catalytic site:
plugins:
hydrogen_bonds:
groups:
active_site: "protein and (resid 106 or resid 225 or resid 188)"
polymer_all: "chainid C"
substrate: "resname pNB"
summaries:
active_site_polymer:
between: [active_site, polymer_all]
active_site_substrate:
between: [active_site, substrate]
Running the Analysis
Local execution
pixi run -e build polyzymd compare run hydrogen_bonds
This runs the full pipeline: compute_replicate for every replicate,
aggregate for every condition, then compare and plot.
HPC execution
For long trajectories or many replicates, submit as a SLURM job:
pixi run -e build polyzymd compare submit hydrogen_bonds \
-f comparison.yaml \
--partition aa100 \
--mem 8G \
--time 02:00:00
Performance note
The hydrogen bonds plugin is marked with execution_cost_hint = "high".
H-bond detection iterates over donor–acceptor pairs across all selected atoms
every frame. For large systems (50k+ atoms) with many frames, allocate at
least 8 GB memory and 1–2 hours per replicate.
See How To: Submit Analysis Jobs to a SLURM Cluster for the full HPC submission guide.
Understanding the Output
Per-summary metrics
Each summary produces these fields in the aggregated result:
Field |
Description |
|---|---|
|
Average number of H-bonds per frame, averaged across replicates |
|
Standard error of the mean across replicates |
|
Per-replicate mean values (for statistical tests) |
|
Mean number of unique donor-residue/acceptor-residue pairs per frame |
|
Standard error across replicates for the unique-pairs metric |
|
Fraction of frames that had at least one H-bond |
The primary comparison metric is mean_hbonds_per_frame per summary. In the
comparison output, this appears as mean_hbonds_<summary_name> (e.g.,
mean_hbonds_protein_polymer).
Residue pairs
Each summary reports the top residue pairs ranked by occupancy (fraction of frames where the pair has at least one H-bond). Two views are provided:
Directed pairs — Donor→Acceptor direction is preserved. Useful for understanding H-bond directionality (which residue donates).
Undirected pairs — Both directions are merged into one pair. Useful for identifying which residue pairs interact most frequently regardless of direction.
Each pair reports mean_occupancy, sem_occupancy, mean_events_per_frame,
and per-replicate occupancy values.
Composition entries
When composition is configured, the result includes per-partition-pair entries:
Field |
Description |
|---|---|
|
The partition pair (e.g., protein→polymer) |
|
Absolute count for this pair |
|
This pair’s share of all detected H-bonds |
Composition fractions help normalize for differences in donor/acceptor availability across conditions. If a polymer condition has more total H-bonds but the protein→protein fraction stays constant, the polymer is adding new interactions rather than disrupting existing ones.
Statistical comparison
The default comparison pipeline produces:
Pairwise t-tests between all condition pairs for each summary metric, with Benjamini–Hochberg FDR correction
Cohen’s d effect sizes for each pairwise comparison
ANOVA (when ≥3 conditions) as an omnibus test
Rankings of conditions by each metric
The higher_is_better flag is set to None for H-bond counts (neither
direction is universally better), and direction labels use “fewer H-bonds” /
“similar” / “more H-bonds”.
Interpreting the Plots
The plugin generates up to five plot types. All plots use the shared PolyzyMD theming system for consistent publication-quality output.
Summary comparison bar chart
File: hbond_summary_comparison.png
Faceted grouped bars with one subplot per summary and one bar per condition. Each summary gets its own y-axis scale, improving readability when summaries have very different magnitudes. Error bars show SEM across replicates.
Time series
File: hbond_timeseries_<summary>.png (one per summary)
Per-frame H-bond counts over time in physical units (ns). The mean across replicates is shown as a solid line with a ±1 SD shaded band (single-replicate runs show the trace only). Useful for detecting equilibration issues (early drift) or transient events (sudden spikes indicating a binding/unbinding event).
Top residue pairs occupancy
File: hbond_top_pairs_<summary>.png (one per summary)
Horizontal grouped bar chart showing top undirected residue pairs by mean occupancy. To focus on cross-condition comparison, only pairs present in at least two conditions are shown.
Composition absolute bars
File: hbond_composition_absolute.png
Stacked bar chart showing mean H-bonds/frame broken down by partition pair (e.g., protein→protein, protein→polymer, polymer→polymer). Each condition gets one stacked bar. Reveals how the total H-bond budget is distributed.
Composition fractional bars
File: hbond_composition_fraction.png
Same as above but normalized to fractions. The y-axis auto-scales: for disjoint
partitions it ranges 0–1, but when allow_overlapping_composition: true is set
and partitions overlap, stacked fractions may exceed 1.0. Useful for comparing
the relative H-bond distribution even when total counts differ between
conditions.
Scientific Considerations
Interpreting H-bond counts
Keep these caveats in mind when interpreting results:
Geometric criteria, not energetics. H-bond counts reflect MDAnalysis geometric criteria (distance and angle cutoffs), not interaction energies. A “hydrogen bond” by this definition may not correspond to a thermodynamically significant interaction.
System size matters. Cross-condition comparisons of raw H-bond counts should account for differences in system size (number of donors/acceptors). The composition fractional view helps normalize for this.
Composition fractions normalize availability. If condition A has 200 polymer atoms and condition B has 400, condition B will likely show more protein–polymer H-bonds in absolute terms. Composition fractions reveal whether the proportion also changes.
Residue-pair rankings are noisy. Individual residue-pair occupancies can vary substantially across replicates. Focus on the top 3–5 pairs and cross-reference with structural knowledge. The
top_n_pairssetting controls how many pairs are reported (default: 15).Statistical correction. Pairwise comparisons use FDR-corrected (Benjamini–Hochberg) p-values. When comparing many summaries across many conditions, check adjusted p-values rather than raw values.
Full Example: comparison.yaml
Here is a complete comparison.yaml for a CALB enzyme study with three
polymer conditions:
name: "calb_hbond_study"
description: "Hydrogen bond analysis for CALB with polymer conjugates"
control: "No Polymer"
conditions:
- label: "No Polymer"
config: "../noPoly_CALB_pNPB/config.yaml"
replicates: [1, 2, 3]
- label: "SBMA-100"
config: "../SBMA_100_CALB_pNPB/config.yaml"
replicates: [1, 2, 3]
- label: "EGMA-100"
config: "../EGMA_100_CALB_pNPB/config.yaml"
replicates: [1, 2, 3]
defaults:
equilibration_time: "10ns"
plugins:
hydrogen_bonds:
distance_cutoff: 3.0
angle_cutoff: 150
groups:
protein_all: "protein"
substrate: "resname pNB"
polymer_all: "chainid C"
summaries:
protein_polymer:
between: [protein_all, polymer_all]
protein_substrate:
between: [protein_all, substrate]
protein_internal:
within: protein_all
polymer_internal:
within: polymer_all
composition:
partitions:
protein: "protein"
substrate: "resname pNB"
polymer: "chainid C"
plot_settings:
format: "png"
dpi: 300
style: "publication"
Aliases
The plugin responds to these CLI names:
hydrogen_bonds(canonical)hbondshbond
# These are equivalent:
polyzymd compare run hydrogen_bonds
polyzymd compare run hbonds
polyzymd compare run hbond
See Also
How to Compare Simulation Conditions — Setting up
comparison.yamlComparison and Plotting Reference — Plugin listing and statistical terms
How To: Submit Analysis Jobs to a SLURM Cluster — Submitting analysis jobs to SLURM
Statistics Best Practices for MD Analysis — Autocorrelation and uncertainty
Extending the Analysis Framework — Writing your own analysis plugin