# How-To: Analyze Hydrogen Bonds Across Conditions This guide shows you how to configure, run, and interpret the `hydrogen_bonds` analysis plugin. By the end, you will have a working hydrogen-bond comparison across simulation conditions, with grouped summaries, composition breakdowns, and publication-ready figures. :::{admonition} Environment Setup :class: tip All commands below assume you have activated the PolyzyMD pixi environment: ```bash pixi shell -e build ``` Alternatively, prefix each command with `pixi run -e build`. ::: ## When to Use This Plugin Use `hydrogen_bonds` when you need to answer questions like: - Does the polymer form more H-bonds with the protein than the substrate does? - Are protein internal H-bonds disrupted by polymer conjugation? - Which specific residue pairs form the most persistent H-bonds with the polymer? - How is the total H-bond budget distributed across protein, substrate, and polymer? The plugin wraps MDAnalysis `HydrogenBondAnalysis` with a flexible **groups + summaries** model: you define named atom groups, then specify which pairs or self-interactions to summarize. This lets you answer multiple questions in a single analysis run. ## Before You Start You need: - A working pixi environment (`pixi install -e build`) - Completed simulation trajectories for at least two conditions - A `comparison.yaml` defining your conditions (see {doc}`analysis_compare_conditions`) - Familiarity with the PolyzyMD chain convention (A=protein, B=substrate, C=polymer) ## Quick Start Add a `hydrogen_bonds` section to the `plugins:` block in your `comparison.yaml`: ```yaml plugins: hydrogen_bonds: groups: protein_all: "protein" polymer_all: "chainid C" summaries: protein_polymer: between: [protein_all, polymer_all] ``` Run the analysis: ```bash pixi run -e build polyzymd compare run hydrogen_bonds ``` That is all you need for a minimal protein–polymer H-bond comparison. The sections below explain every setting and show more advanced configurations. ## Configuration Reference All settings live under `plugins: hydrogen_bonds:` in `comparison.yaml`. Every field has a sensible default, so you only need to specify what you want to change. ### Geometric Criteria | Field | Type | Default | Description | |-------|------|---------|-------------| | `distance_cutoff` | float | `3.0` | Donor–acceptor distance cutoff in Å | | `angle_cutoff` | float | `150` | D–H···A angle cutoff in degrees | These match the standard MDAnalysis `HydrogenBondAnalysis` defaults. The distance is measured between the heavy-atom donor and the acceptor (not the hydrogen). The angle is measured at the hydrogen: D–H···A. :::{admonition} Topology and hydrogen caveats :class: important The plugin passes your configured group union to MDAnalysis as the donor and acceptor selection, and uses `() and element H` for hydrogens. This means your topology must contain explicit hydrogen atoms and usable element metadata for hydrogen selection. Topologies without hydrogens, with missing or incorrect element records, or with coarse-grained beads will undercount or return no H-bonds. The current public settings do not expose separate donor, hydrogen, and acceptor selection strings. ::: ### Groups ```yaml groups: protein_all: "protein" substrate: "resname pNB" polymer_all: "chainid C" ``` Each entry maps a **name** (used in summaries) to an **MDAnalysis selection string**. Groups are resolved once at the start of the analysis. You can use any valid MDAnalysis selection syntax including `protein`, `chainid`, `resname`, `resid`, boolean operators, and parentheses. :::{admonition} Group overlaps :class: warning If two groups share atoms (e.g., `"protein"` and `"protein and resid 106"`), the plugin logs a warning and H-bonds in the overlap region may be counted in multiple summaries. For clean composition accounting, use disjoint groups. ::: ### Summaries Summaries define **what to measure**. Each summary has a unique name and exactly one mode: - **`between: [group_a, group_b]`** — Count H-bonds where one partner is in group A and the other in group B (inter-group). - **`within: group_name`** — Count H-bonds where both partners are in the same group (intra-group). ```yaml summaries: protein_polymer: between: [protein_all, polymer_all] protein_internal: within: protein_all ``` Each summary produces an independent metric (`mean_hbonds_`) in the comparison output. You can define as many summaries as you need — the plugin runs one global MDAnalysis H-bond search and post-classifies events into summaries. ### Composition Composition analysis breaks down the **total** H-bond budget by donor→acceptor partition pairs. This is independent of summaries and uses its own set of disjoint selections. ```yaml composition: partitions: protein: "protein" substrate: "resname pNB" polymer: "chainid C" ``` The plugin counts every detected H-bond event, classifies donor and acceptor atoms into partitions, and reports both absolute counts (mean H-bonds/frame) and fractional shares per partition pair (e.g., protein→polymer, protein→protein, polymer→substrate). Composition is opt-in. If `composition` is omitted, no composition results are computed. :::{admonition} Partition disjointness :class: important Composition partitions should be disjoint (no shared atoms). If partitions overlap, the plugin raises an error by default. Set `allow_overlapping_composition: true` to allow overlap with warnings (overlapping atoms are counted in both partitions, so fractions may exceed 1.0). ::: ### Other Settings | Field | Type | Default | Description | |-------|------|---------|-------------| | `update_selections` | bool | `true` | Re-evaluate simple atom-group selections each frame. Coordinate-dependent group selections such as `around`, `sphzone`, and `cyzone` are not supported when this is `true`; use static groups or separate preprocessed groups instead. | | `top_n_pairs` | int | `15` | Number of top residue pairs to report per summary | | `allow_empty_groups` | bool | `true` | If `true` (default), warn and skip summaries when a referenced group selection matches no atoms. Set `false` for strict validation (raise `ValueError`). | | `allow_overlapping_composition` | bool | `false` | If `false`, overlapping composition partitions raise an error. Set `true` to allow overlap with warnings. | | `timestep_ps` | float or null | `null` | Manual frame spacing in ps for time-axis plots. If null, read from trajectory metadata. | If you want composition partitions to mirror group selections, define them explicitly in YAML with the same keys/selections. :::{admonition} Technical Detail :class: note `update_selections: true` makes MDAnalysis re-evaluate simple atom-group selections every frame. This is appropriate for structural selections such as `protein`, `chainid`, `resname`, and `resid`. Coordinate-dependent group selections such as `around`, `sphzone`, and `cyzone` are rejected by source validation when `update_selections: true` because the plugin resolves group membership for donor/acceptor post-classification from initial frame index sets. Use static groups, or create separate preprocessed groups before running the analysis. However, the plugin resolves group membership used for donor/acceptor pair-classification once at the start from the initial frame index sets. This means pair-tracking classification does not change if atoms move between groups later in the trajectory. This behavior is deliberate for performance and for consistent summary definitions across frames. ::: :::{admonition} When to set update_selections to false :class: note If all your groups use structural selections (`chainid`, `resname`, `resid`, `protein`), setting `update_selections: false` is safe and faster. Keep coordinate-dependent selections out of `groups` when `update_selections: true`; use static groups or separate preprocessed groups for distance-based membership. ::: ## Common Recipes ### Protein ↔ polymer H-bonds (minimal) ```yaml plugins: hydrogen_bonds: groups: protein_all: "protein" polymer_all: "chainid C" summaries: protein_polymer: between: [protein_all, polymer_all] ``` ### Standard 4-summary configuration with composition This is the recommended starting point for a typical enzyme–polymer–substrate system. It covers the four most common interactions and adds a composition breakdown. ```yaml plugins: hydrogen_bonds: distance_cutoff: 3.0 angle_cutoff: 150 groups: protein_all: "protein" substrate: "resname pNB" polymer_all: "chainid C" summaries: protein_polymer: between: [protein_all, polymer_all] protein_substrate: between: [protein_all, substrate] protein_internal: within: protein_all polymer_internal: within: polymer_all composition: partitions: protein: "protein" substrate: "resname pNB" polymer: "chainid C" ``` ### Monomer-resolved polymer analysis If your polymer contains distinct monomer types (e.g., SBMA and EGMA), you can define separate groups to compare their H-bonding behavior: ```yaml plugins: hydrogen_bonds: groups: protein_all: "protein" substrate: "resname pNB" polymer_all: "chainid C" polymer_sbm: "chainid C and resname SBM" polymer_egm: "chainid C and resname EGM" active_site: "protein and (resid 106 or resid 225 or resid 188)" summaries: protein_polymer: between: [protein_all, polymer_all] protein_sbm: between: [protein_all, polymer_sbm] protein_egm: between: [protein_all, polymer_egm] active_site_polymer: between: [active_site, polymer_all] protein_internal: within: protein_all polymer_internal: within: polymer_all composition: partitions: protein: "protein" substrate: "resname pNB" polymer_sbm: "chainid C and resname SBM" polymer_egm: "chainid C and resname EGM" ``` ### Protein ↔ substrate only For systems without a polymer (or for a control condition comparison): ```yaml plugins: hydrogen_bonds: groups: protein_all: "protein" substrate: "resname pNB" summaries: protein_substrate: between: [protein_all, substrate] ``` ### Active-site focused analysis To focus on H-bonds near the catalytic site: ```yaml plugins: hydrogen_bonds: groups: active_site: "protein and (resid 106 or resid 225 or resid 188)" polymer_all: "chainid C" substrate: "resname pNB" summaries: active_site_polymer: between: [active_site, polymer_all] active_site_substrate: between: [active_site, substrate] ``` ## Running the Analysis ### Local execution ```bash pixi run -e build polyzymd compare run hydrogen_bonds ``` This runs the full pipeline: the per-replicate compute stage for every replicate, `aggregate` for every condition, then `compare` and `plot`. ### HPC execution For long trajectories or many replicates, submit as a SLURM job: ```bash pixi run -e build polyzymd compare submit hydrogen_bonds \ -f comparison.yaml \ --partition aa100 \ --mem 8G \ --time 02:00:00 ``` :::{admonition} Performance note :class: note The hydrogen bonds plugin is marked with `execution_cost_hint = "high"`. H-bond detection iterates over donor–acceptor pairs across all selected atoms every frame. For large systems (50k+ atoms) with many frames, allocate at least 8 GB memory and 1–2 hours per replicate. ::: See {doc}`hpc_execution` for the full HPC submission guide. ## Understanding the Output ### Per-summary metrics Each summary produces these fields in the aggregated result: | Field | Description | |-------|-------------| | `mean_hbonds_per_frame` | Average number of H-bonds per frame, averaged across replicates | | `sem_hbonds_per_frame` | Standard error of the mean across replicates | | `per_replicate_mean_hbonds` | Per-replicate mean values (for statistical tests) | | `mean_unique_pairs_per_frame` | Mean number of unique donor-residue/acceptor-residue pairs per frame | | `sem_unique_pairs_per_frame` | Standard error across replicates for the unique-pairs metric | | `mean_fraction_with_any` | Fraction of frames that had at least one H-bond | The primary comparison metric is `mean_hbonds_per_frame` per summary. In the comparison output, this appears as `mean_hbonds_` (e.g., `mean_hbonds_protein_polymer`). ### Residue pairs Each summary reports the top residue pairs ranked by **occupancy** (fraction of frames where the pair has at least one H-bond). Two views are provided: - **Directed pairs** — Donor→Acceptor direction is preserved. Useful for understanding H-bond directionality (which residue donates). - **Undirected pairs** — Both directions are merged into one pair. Useful for identifying which residue *pairs* interact most frequently regardless of direction. Each pair reports `mean_occupancy`, `sem_occupancy`, `mean_events_per_frame`, and per-replicate occupancy values. ### Composition entries When composition is configured, the result includes per-partition-pair entries: | Field | Description | |-------|-------------| | `donor_partition → acceptor_partition` | The partition pair (e.g., protein→polymer) | | `mean_hbonds_per_frame` | Absolute count for this pair | | `mean_fraction_of_total` | This pair's share of all detected H-bonds | Composition fractions help normalize for differences in donor/acceptor availability across conditions. If a polymer condition has more total H-bonds but the protein→protein fraction stays constant, the polymer is adding new interactions rather than disrupting existing ones. ### Statistical comparison The default comparison pipeline produces: - **Pairwise t-tests** between all condition pairs for each summary metric, with Benjamini–Hochberg FDR correction - **Cohen's d** effect sizes for each pairwise comparison - **ANOVA** (when ≥3 conditions) as an omnibus test - **Rankings** of conditions by each metric The `higher_is_better` flag is set to `None` for H-bond counts (neither direction is universally better), and direction labels use "fewer H-bonds" / "similar" / "more H-bonds". ## Interpreting the Plots The plugin generates up to five plot types. All plots use the shared PolyzyMD theming system for consistent publication-quality output. ### Summary comparison bar chart **File:** `hbond_summary_comparison.png` Faceted grouped bars with one subplot per summary and one bar per condition. Each summary gets its own y-axis scale, improving readability when summaries have very different magnitudes. Error bars show SEM across replicates. ### Time series **File:** `hbond_timeseries_.png` (one per summary) Per-frame H-bond counts over time in physical units (ns). The mean across replicates is shown as a solid line with a ±1 SD shaded band (single-replicate runs show the trace only). Useful for detecting equilibration issues (early drift) or transient events (sudden spikes indicating a binding/unbinding event). ### Top residue pairs occupancy **File:** `hbond_top_pairs_.png` (one per summary) Horizontal grouped bar chart showing top undirected residue pairs by mean occupancy. To focus on cross-condition comparison, only pairs present in at least two conditions are shown. ### Composition absolute bars **File:** `hbond_composition_absolute.png` Stacked bar chart showing mean H-bonds/frame broken down by partition pair (e.g., protein→protein, protein→polymer, polymer→polymer). Each condition gets one stacked bar. Reveals how the total H-bond budget is distributed. ### Composition fractional bars **File:** `hbond_composition_fraction.png` Same as above but normalized to fractions. The y-axis auto-scales: for disjoint partitions it ranges 0–1, but when `allow_overlapping_composition: true` is set and partitions overlap, stacked fractions may exceed 1.0. Useful for comparing the *relative* H-bond distribution even when total counts differ between conditions. ## Scientific Considerations :::{admonition} Interpreting H-bond counts :class: warning Keep these caveats in mind when interpreting results: 1. **Geometric criteria, not energetics.** H-bond counts reflect MDAnalysis geometric criteria (distance and angle cutoffs), not interaction energies. A "hydrogen bond" by this definition may not correspond to a thermodynamically significant interaction. 2. **System size matters.** Cross-condition comparisons of raw H-bond counts should account for differences in system size (number of donors/acceptors). The composition fractional view helps normalize for this. 3. **Composition fractions normalize availability.** If condition A has 200 polymer atoms and condition B has 400, condition B will likely show more protein–polymer H-bonds in absolute terms. Composition fractions reveal whether the *proportion* also changes. 4. **Residue-pair rankings are noisy.** Individual residue-pair occupancies can vary substantially across replicates. Focus on the top 3–5 pairs and cross-reference with structural knowledge. The `top_n_pairs` setting controls how many pairs are reported (default: 15). 5. **Statistical correction.** Pairwise comparisons use FDR-corrected (Benjamini–Hochberg) p-values. When comparing many summaries across many conditions, check adjusted p-values rather than raw values. ::: ## Full Example: comparison.yaml Here is a complete `comparison.yaml` for a CALB enzyme study with three polymer conditions: ```yaml name: "calb_hbond_study" description: "Hydrogen bond analysis for CALB with polymer conjugates" control: "No Polymer" conditions: - label: "No Polymer" config: "../noPoly_CALB_pNPB/config.yaml" replicates: [1, 2, 3] - label: "SBMA-100" config: "../SBMA_100_CALB_pNPB/config.yaml" replicates: [1, 2, 3] - label: "EGMA-100" config: "../EGMA_100_CALB_pNPB/config.yaml" replicates: [1, 2, 3] defaults: equilibration_time: "10ns" plugins: hydrogen_bonds: distance_cutoff: 3.0 angle_cutoff: 150 groups: protein_all: "protein" substrate: "resname pNB" polymer_all: "chainid C" summaries: protein_polymer: between: [protein_all, polymer_all] protein_substrate: between: [protein_all, substrate] protein_internal: within: protein_all polymer_internal: within: polymer_all composition: partitions: protein: "protein" substrate: "resname pNB" polymer: "chainid C" plot_settings: format: "png" dpi: 300 style: "compact" ``` ## CLI name Run the plugin with its canonical CLI name: - `hydrogen_bonds` ```bash polyzymd compare run hydrogen_bonds ``` ## See Also - {doc}`analysis_compare_conditions` — Setting up `comparison.yaml` - {doc}`../reference/analysis_comparison_reference` — Plugin listing and statistical terms - {doc}`hpc_execution` — Submitting analysis jobs to SLURM - {doc}`../explanation/analysis_statistics_best_practices` — Autocorrelation and uncertainty - {doc}`../contributor_guide/extending_analyses` — Writing your own analysis plugin