# Binding Free Energy Analysis Quantify **how much each polymer type preferentially contacts specific protein residue groups** in physically meaningful energy units (kcal/mol or kJ/mol). This analysis answers the experimentalist's core question: *"Which polymer chemistry has the most favorable interaction with the enzyme surface—and by how much?"* ```{note} Binding free energy analysis is a **post-processing step**. It requires: 1. Completed contacts analysis with [`compute_binding_preference: true`](analysis_binding_preference.md) 2. Cached binding preference files on disk (produced automatically by `polyzymd compare run contacts`) ``` ```{warning} Binding free energy is currently labeled **experimental** in the presentation release. The command and plots remain available, but PolyzyMD now adds explicit experimental warnings because the scientific framing and interpretation may still change. ``` :::{admonition} Environment Setup :class: tip All commands below assume you have activated the PolyzyMD pixi environment: ```bash pixi shell -e build ``` Alternatively, prefix each command with `pixi run -e build`. ::: ## Scientific Motivation ### From Enrichment to Free Energy [Binding preference analysis](analysis_binding_preference.md) reports a dimensionless **enrichment**: ```{math} \text{Enrichment} = \frac{\text{contact\_share}}{\text{expected\_share}} - 1 ``` Enrichment is intuitive (+0.5 means 50% more contacts than random) but has no units. It cannot answer: *"Is a +0.5 enrichment for aromatic residues thermodynamically significant?"* Binding free energy converts enrichment into **selectivity free energy (ΔG$_{\mathrm{sel}}$)** via Boltzmann inversion. Because polyzymd simulations run in the NPT ensemble, the correct thermodynamic potential is the Gibbs free energy. ### The Core Formula ```{math} \Delta G_{\mathrm{sel}} = -k_B T \ln\!\left(\frac{\text{contact\_share}}{\text{expected\_share}}\right) ``` Equivalently, since `contact_share / expected_share = enrichment + 1`: ```{math} \Delta G_{\mathrm{sel}} = -k_B T \ln(\text{enrichment} + 1) ``` | ΔG_sel | Meaning | |--------|---------| | **< 0** | Preferential binding — the polymer favors this residue group | | **= 0** | Neutral — contact frequency matches random | | **> 0** | Avoidance — the polymer contacts this group less than expected | ### Relation to Enrichment ΔG$_{\mathrm{sel}}$ is the **exact** Boltzmann-inverted version of enrichment. The enrichment score is a first-order Taylor expansion (without units); ΔG$_{\mathrm{sel}}$ is exact and carries physical units. The relationship is: ```{math} \text{enrichment} = e^{-\Delta G_{\mathrm{sel}} / k_B T} - 1 ``` ### Uncertainty (Delta Method) Uncertainty is propagated analytically via the delta method: ```{math} \sigma(\Delta G_{\mathrm{sel}}) = k_B T \sqrt{\left(\frac{\sigma_\text{cs}}{\text{cs}}\right)^2 + \left(\frac{\sigma_\text{es}}{\text{es}}\right)^2} ``` where `σ_cs` is the SEM of the contact share across replicates, and `σ_es ≈ 0` (expected share is computed from a single PDB SASA calculation). ### Temperature and Comparability ΔG$_{\mathrm{sel}}$ computed at temperature *T* is **not directly comparable** to ΔG$_{\mathrm{sel}}$ at temperature *T'* — the k_BT scale factor differs. When conditions span multiple simulation temperatures, pairwise statistics are automatically suppressed for cross-temperature pairs. ## Quick Start ### Step 1: Enable Binding Preference in contacts analysis Binding free energy is a post-processor — it reads the files produced by contacts analysis. Ensure binding preference is enabled when you run contacts: ```yaml # comparison.yaml name: "SBMA vs EGMA Free Energy Study" control: "No Polymer" structures: enzyme_pdb: "structures/enzyme.pdb" conditions: - label: "No Polymer" config: "../no_polymer/config.yaml" replicates: [1, 2, 3] - label: "100% SBMA" config: "../sbma_100/config.yaml" replicates: [1, 2, 3] - label: "100% EGMA" config: "../egma_100/config.yaml" replicates: [1, 2, 3] plugins: contacts: name: "polymer_contacts" polymer_selection: "resname SBM EGM" protein_selection: "protein" cutoff: 4.5 compute_binding_preference: true surface_exposure_threshold: 0.2 enzyme_pdb_for_sasa: "structures/enzyme.pdb" include_default_aa_groups: true # Binding free energy: reads cached binding preference data binding_free_energy: units: "kcal/mol" # or "kJ/mol" surface_exposure_threshold: 0.2 fdr_alpha: 0.05 ``` ### Step 2: Run contacts analysis first (if not already done) ```bash polyzymd compare run contacts -f comparison.yaml ``` This generates the cached binding preference files under each condition's `analysis/contacts/` directory. ### Step 3: Run binding free energy analysis ```bash polyzymd compare run binding_free_energy -f comparison.yaml ``` ## Example Output ``` Binding Free Energy Comparison: SBMA vs EGMA Free Energy Study ================================================================================ Formula : ΔG_sel = -k_B·T · ln(contact_share / expected_share) Units : kcal/mol Equilibration: 10ns ΔG_sel Summary by Condition (sign: negative = preferential binding) -------------------------------------------------------------------------------- 100% SBMA (T = 300.0 K, n = 3) Polymer AA Group ΔG_sel ±σ N_rep --------------------------------------------------------------- SBM aromatic -0.243 +0.031 3 SBM charged_negative +0.091 +0.022 3 SBM charged_positive -0.013 +0.015 3 SBM nonpolar +0.044 +0.019 3 SBM polar -0.001 +0.001 3 100% EGMA (T = 300.0 K, n = 3) Polymer AA Group ΔG_sel ±σ N_rep --------------------------------------------------------------- EGM aromatic -0.523 +0.034 3 EGM charged_negative +0.319 +0.028 3 EGM charged_positive +0.112 +0.049 3 EGM nonpolar -0.166 +0.021 3 EGM polar +0.146 +0.010 3 Pairwise ΔΔG Differences (ΔG_sel,B − ΔG_sel,A) -------------------------------------------------------------------------------- 100% SBMA → 100% EGMA Polymer AA Group ΔG_sel,A ΔG_sel,B ΔΔG_B−A p-value p_adj BH sig -------------------------------------------------------------------------------------------- EGM aromatic N/A -0.523 N/A -- -- -- SBM aromatic -0.243 N/A N/A -- -- -- ``` ### Interpreting the Numbers From the example: - **EGMA (EGM) aromatic: ΔG$_{\mathrm{sel}}$ = −0.52 kcal/mol** — strong preferential contact with aromatic residues (about half a kcal/mol more favorable than random). This is equivalent to the enrichment score of +0.90 from binding preference analysis. - **EGMA charged_negative: ΔG$_{\mathrm{sel}}$ = +0.32 kcal/mol** — avoidance of charged negative residues. - **SBMA (SBM) charged_positive: ΔG$_{\mathrm{sel}}$ = −0.01 kcal/mol** — near-neutral, consistent with SBMA's balanced zwitterionic character. ```{tip} A rule of thumb from protein biophysics: differences of **|ΔG$_{\mathrm{sel}}$| > 0.5 kcal/mol** are generally considered thermodynamically significant at room temperature. Smaller values may still be real but require larger replicate sets to distinguish from noise. ``` ## Configuration Reference ### `plugins.binding_free_energy` | Field | Type | Default | Description | |-------|------|---------|-------------| | `units` | str | `"kcal/mol"` | Energy units: `"kcal/mol"` or `"kJ/mol"` | | `surface_exposure_threshold` | float | `0.2` | Minimum relative SASA to consider a residue exposed (0.0–1.0) | | `protein_partitions` | dict | `null` | Custom named partitions (see below) | `fdr_alpha` controls the Benjamini-Hochberg correction threshold applied to pairwise p-values within each temperature group. It is configured in the `plugins.binding_free_energy` block. ### Full YAML Example ```yaml plugins: binding_free_energy: units: "kcal/mol" surface_exposure_threshold: 0.2 # Optional: custom partitions (groups must be defined under contacts.protein_groups) protein_partitions: catalytic_regions: - catalytic_triad - oxyanion_hole lid_helices: - lid_helix_5 - lid_helix_10 fdr_alpha: 0.05 ``` ## CLI Reference ``` polyzymd compare run binding_free_energy [OPTIONS] ``` | Option | Description | |--------|-------------| | `-f, --file PATH` | Path to `comparison.yaml` (default: `comparison.yaml`) | | `--recompute` | Force recompute (clears binding preference cache) | | `--format [table\|markdown\|json]` | Output format (default: `table`) | | `-o, --output PATH` | Save output to file | | `-q, --quiet` | Suppress INFO messages | | `--debug` | Enable DEBUG logging | | `--eq-time TEXT` | Override equilibration time (e.g., `10ns`) | ### Usage Examples ```bash # Default: table output to console polyzymd compare run binding_free_energy -f comparison.yaml # Units are configured in plugins.binding_free_energy.units # Save markdown report polyzymd compare run binding_free_energy --format markdown -o report.md # JSON for downstream processing polyzymd compare run binding_free_energy --format json -o bfe_result.json # Tune units/fdr_alpha in comparison.yaml, then rerun # Verbose debugging polyzymd compare run binding_free_energy --debug ``` ## Custom Protein Partitions You can analyze specific structural regions of interest alongside the default amino acid class groups. Define custom groups in the contacts settings, then group them into named partitions in the binding free energy settings. ```yaml plugins: contacts: compute_binding_preference: true include_default_aa_groups: true protein_groups: catalytic_triad: [77, 133, 156] oxyanion_hole: [10, 77] lid_helix_5: [141, 142, 143, 144, 145, 146, 147, 148, 149, 150] lid_helix_10: [281, 282, 283, 284, 285, 286, 287, 288, 289, 290] binding_free_energy: units: "kcal/mol" protein_partitions: # Must be mutually exclusive groups within each partition catalytic_regions: - catalytic_triad - oxyanion_hole lid_helices: - lid_helix_5 - lid_helix_10 ``` The pairwise and per-condition tables will include one row per (polymer_type, partition_element) pair for each custom partition. ```{important} Partition groups must be **mutually exclusive** (no residue in two groups of the same partition). PolyzyMD validates this at config load time. See [Binding Preference Analysis](analysis_binding_preference.md#partition-based-architecture) for a full explanation. ``` ## Multi-Temperature Studies When conditions were simulated at different temperatures (e.g., a temperature stability study), ΔG$_{\mathrm{sel}}$ values are reported per condition but **pairwise statistics are suppressed** between conditions at different temperatures: ``` Note: pairwise statistics suppressed for cross-temperature pairs. Temperatures: 300.0 K (100% SBMA, 100% EGMA), 320.0 K (High-T SBMA) ``` Per-condition ΔG$_{\mathrm{sel}}$ values are still shown — they are valid within each temperature group. Only the pairwise ΔΔG_B−A values and p-values are omitted for cross-temperature pairs, since k_BT differs. ## Output File Location When you use `-o / --output`, the formatted report is saved to the specified path. The canonical JSON cache is saved to `comparison/binding_free_energy/result.json`: ``` project/ └── comparison/ └── binding_free_energy/ └── result.json ``` The JSON file can be reloaded for downstream processing: ```python import json from pathlib import Path result = json.loads( Path("comparison/binding_free_energy/result.json").read_text() ) # Access per-condition data for summary in result["conditions"]: print(f"\n{summary['label']} ({summary['temperature_K']} K)") for entry in summary["entries"]: if entry["delta_G"] is not None: unc = entry.get("delta_G_uncertainty") or 0.0 print( f" {entry['polymer_type']} × {entry['protein_group']}: " f"ΔG_sel = {entry['delta_G']:+.3f} ± {unc:.3f} " f"{result['units']}" ) # Access pairwise comparisons for pair in result["pairwise_comparisons"]: if not pair["cross_temperature"] and pair["p_value"] is not None: print( f"{pair['condition_a']} → {pair['condition_b']} | " f"{pair['polymer_type']} × {pair['protein_group']}: " f"ΔΔG_B−A = {pair['delta_delta_G']:+.3f}, " f"p = {pair['p_value']:.4f}" ) ``` ## Statistical Notes ### Pairwise t-tests For each shared (polymer_type, protein_group) pair between two conditions at the same temperature, an independent two-sample t-test is performed on the per-replicate ΔG$_{\mathrm{sel}}$ distributions. Requires **at least 2 replicates per condition**. ### FDR Correction P-values are corrected for multiple comparisons using the **Benjamini-Hochberg** (BH) false discovery rate procedure at the configured `fdr_alpha` (default 0.05). Same-temperature pairwise entries form one hypothesis family per temperature group. Both raw and BH-adjusted p-values are shown in the table and Markdown formatters, with significance determined by the adjusted value. ### Statistical Inefficiency and planned MetricType classification Contact share is a **mean-based metric** (frame averages, not fluctuations). Its mean converges regardless of autocorrelation, though the SEM reported here is a naive cross-replicate SEM (not autocorrelation-corrected). With 3+ independent replicates this is generally conservative and appropriate. The `MetricType` classification is a planned future enhancement and is not yet implemented in the current codebase. See [Statistics Best Practices](../explanation/analysis_statistics_best_practices.md) for guidance on replicate counts and equilibration. ## Worked Example: Comparing Polymer Stabilization ### Scientific Question *"Does EGMA or SBMA show stronger preferential binding to the enzyme's aromatic surface residues? Can we quantify the difference in kcal/mol?"* ### Setup ```yaml # comparison.yaml name: "EGMA vs SBMA Aromatic Preference" control: "100% SBMA" conditions: - label: "100% SBMA" config: "../sbma_100/config.yaml" replicates: [1, 2, 3, 4, 5] - label: "100% EGMA" config: "../egma_100/config.yaml" replicates: [1, 2, 3, 4, 5] plugins: contacts: polymer_selection: "resname SBM EGM" cutoff: 4.5 compute_binding_preference: true surface_exposure_threshold: 0.2 enzyme_pdb_for_sasa: "structures/lipase.pdb" include_default_aa_groups: true binding_free_energy: units: "kcal/mol" fdr_alpha: 0.05 ``` ### Run ```bash # Step 1: Generate binding preference data polyzymd compare run contacts -f comparison.yaml # Step 2: Compute ΔG_sel polyzymd compare run binding_free_energy -f comparison.yaml --format markdown -o bfe_report.md ``` ### Interpretation Table | Observation | Interpretation | |-------------|----------------| | EGMA aromatic: ΔG_sel = −0.52 ± 0.03 kcal/mol | EGMA has a significant thermodynamic preference for aromatic surface residues | | SBMA aromatic: ΔG_sel = −0.24 ± 0.03 kcal/mol | SBMA also preferentially contacts aromatics, but less strongly | | ΔΔG = ΔG_sel,EGMA − ΔG_sel,SBMA = −0.28 kcal/mol, p < 0.01 | The difference is statistically significant with 5 replicates | | SBMA charged groups: ΔG_sel ≈ 0 | SBMA's zwitterionic balance leads to near-neutral electrostatic preference | ## Troubleshooting ### "No 'binding_free_energy' plugin configured" Ensure your `comparison.yaml` has a `binding_free_energy:` block under `plugins`: ```yaml plugins: contacts: compute_binding_preference: true # ... other contacts settings binding_free_energy: # ← required units: "kcal/mol" ``` ### "No binding preference data found for condition" The binding preference cache files are missing. Run contacts analysis first: ```bash polyzymd compare run contacts -f comparison.yaml ``` If the files still aren't found, check that `compute_binding_preference: true` was set in the contacts analysis settings when you ran contacts. ### "N/A" entries in the output ΔG$_{\mathrm{sel}}$ is undefined when `contact_share = 0` (no contacts observed) or `expected_share = 0` (no exposed residues in that group). These are shown as `N/A` and excluded from pairwise comparisons. ### "Pairwise statistics suppressed for cross-temperature pairs" Your conditions were simulated at different temperatures. ΔG$_{\mathrm{sel}}$ at different temperatures is not directly comparable — k_BT differs. You can still compare ΔG$_{\mathrm{sel}}$ within a temperature group, but not across groups. Consider re-running all conditions at the same temperature if cross-condition comparisons are needed. ### kJ/mol vs kcal/mol To convert: 1 kcal/mol = 4.184 kJ/mol. The thermal energy at 300 K is: - 0.596 kcal/mol (k_BT) - 2.494 kJ/mol (k_BT) Values smaller than ~0.3 k_BT (0.18 kcal/mol or 0.75 kJ/mol) are likely within thermal noise. ## See Also - [Binding Preference Analysis](analysis_binding_preference.md) — the prerequisite enrichment analysis - [Contacts Analysis Quick Start](analysis_contacts_quickstart.md) — basic contacts - [Statistics Best Practices](../explanation/analysis_statistics_best_practices.md) — replicate planning - [Comparing Conditions](analysis_compare_conditions.md) — multi-condition workflows - [Extending the Analysis Framework](../contributor_guide/extending_analyses.md) — add a new plugin