Binding Free Energy Analysis
Quantify how much each polymer type preferentially contacts specific protein residue groups in physically meaningful energy units (kcal/mol or kJ/mol).
This analysis answers the experimentalist’s core question: “Which polymer chemistry has the most favorable interaction with the enzyme surface—and by how much?”
Note
Binding free energy analysis is a post-processing step. It requires:
Completed contacts analysis with
compute_binding_preference: trueCached binding preference files on disk (produced automatically by
polyzymd compare contacts)
Warning
Binding free energy is currently labeled experimental in the presentation release. The command and plots remain available, but PolyzyMD now adds explicit experimental warnings because the scientific framing and interpretation may still change.
Scientific Motivation
From Enrichment to Free Energy
Binding preference analysis reports a dimensionless enrichment:
Enrichment is intuitive (+0.5 means 50% more contacts than random) but has no units. It cannot answer: “Is a +0.5 enrichment for aromatic residues thermodynamically significant?”
Binding free energy converts enrichment into selectivity free energy (ΔG\(_{\mathrm{sel}}\)) via Boltzmann inversion. Because polyzymd simulations run in the NPT ensemble, the correct thermodynamic potential is the Gibbs free energy.
The Core Formula
Equivalently, since contact_share / expected_share = enrichment + 1:
ΔG_sel |
Meaning |
|---|---|
< 0 |
Preferential binding — the polymer favors this residue group |
= 0 |
Neutral — contact frequency matches random |
> 0 |
Avoidance — the polymer contacts this group less than expected |
Relation to Enrichment
ΔG\(_{\mathrm{sel}}\) is the exact Boltzmann-inverted version of enrichment. The enrichment score is a first-order Taylor expansion (without units); ΔG\(_{\mathrm{sel}}\) is exact and carries physical units.
The relationship is:
Uncertainty (Delta Method)
Uncertainty is propagated analytically via the delta method:
where σ_cs is the SEM of the contact share across replicates, and
σ_es ≈ 0 (expected share is computed from a single PDB SASA calculation).
Temperature and Comparability
ΔG\(_{\mathrm{sel}}\) computed at temperature T is not directly comparable to ΔG\(_{\mathrm{sel}}\) at temperature T’ — the k_BT scale factor differs. When conditions span multiple simulation temperatures, pairwise statistics are automatically suppressed for cross-temperature pairs.
Quick Start
Step 1: Enable Binding Preference in contacts analysis
Binding free energy is a post-processor — it reads the files produced by contacts analysis. Ensure binding preference is enabled when you run contacts:
# comparison.yaml
name: "SBMA vs EGMA Free Energy Study"
control: "No Polymer"
structures:
enzyme_pdb: "structures/enzyme.pdb"
conditions:
- label: "No Polymer"
config: "../no_polymer/config.yaml"
replicates: [1, 2, 3]
- label: "100% SBMA"
config: "../sbma_100/config.yaml"
replicates: [1, 2, 3]
- label: "100% EGMA"
config: "../egma_100/config.yaml"
replicates: [1, 2, 3]
analysis_settings:
contacts:
name: "polymer_contacts"
polymer_selection: "resname SBM EGM"
protein_selection: "protein"
cutoff: 4.5
compute_binding_preference: true
surface_exposure_threshold: 0.2
enzyme_pdb_for_sasa: "structures/enzyme.pdb"
include_default_aa_groups: true
# Binding free energy: reads cached binding preference data
binding_free_energy:
units: "kcal/mol" # or "kJ/mol"
surface_exposure_threshold: 0.2
comparison_settings:
binding_free_energy:
fdr_alpha: 0.05
Step 2: Run contacts analysis first (if not already done)
polyzymd compare contacts -f comparison.yaml
This generates the cached binding preference files under each condition’s
analysis/contacts/ directory.
Step 3: Run binding free energy analysis
polyzymd compare binding-free-energy -f comparison.yaml
Example Output
Binding Free Energy Comparison: SBMA vs EGMA Free Energy Study
================================================================================
Formula : ΔG_sel = -k_B·T · ln(contact_share / expected_share)
Units : kcal/mol
Equilibration: 10ns
ΔG_sel Summary by Condition (sign: negative = preferential binding)
--------------------------------------------------------------------------------
100% SBMA (T = 300.0 K, n = 3)
Polymer AA Group ΔG_sel ±σ N_rep
---------------------------------------------------------------
SBM aromatic -0.243 +0.031 3
SBM charged_negative +0.091 +0.022 3
SBM charged_positive -0.013 +0.015 3
SBM nonpolar +0.044 +0.019 3
SBM polar -0.001 +0.001 3
100% EGMA (T = 300.0 K, n = 3)
Polymer AA Group ΔG_sel ±σ N_rep
---------------------------------------------------------------
EGM aromatic -0.523 +0.034 3
EGM charged_negative +0.319 +0.028 3
EGM charged_positive +0.112 +0.049 3
EGM nonpolar -0.166 +0.021 3
EGM polar +0.146 +0.010 3
Pairwise ΔΔG Differences (ΔG_sel,B − ΔG_sel,A)
--------------------------------------------------------------------------------
100% SBMA → 100% EGMA
Polymer AA Group ΔG_sel,A ΔG_sel,B ΔΔG_B−A p-value
-----------------------------------------------------------------------------
EGM aromatic N/A -0.523 N/A --
SBM aromatic -0.243 N/A N/A --
Interpreting the Numbers
From the example:
EGMA (EGM) aromatic: ΔG\(_{\mathrm{sel}}\) = −0.52 kcal/mol — strong preferential contact with aromatic residues (about half a kcal/mol more favorable than random). This is equivalent to the enrichment score of +0.90 from binding preference analysis.
EGMA charged_negative: ΔG\(_{\mathrm{sel}}\) = +0.32 kcal/mol — avoidance of charged negative residues.
SBMA (SBM) charged_positive: ΔG\(_{\mathrm{sel}}\) = −0.01 kcal/mol — near-neutral, consistent with SBMA’s balanced zwitterionic character.
Tip
A rule of thumb from protein biophysics: differences of |ΔG\(_{\mathrm{sel}}\)| > 0.5 kcal/mol are generally considered thermodynamically significant at room temperature. Smaller values may still be real but require larger replicate sets to distinguish from noise.
Configuration Reference
analysis_settings.binding_free_energy
Field |
Type |
Default |
Description |
|---|---|---|---|
|
str |
|
Energy units: |
|
float |
|
Minimum relative SASA to consider a residue exposed (0.0–1.0) |
|
dict |
|
Custom named partitions (see below) |
comparison_settings.binding_free_energy
Field |
Type |
Default |
Description |
|---|---|---|---|
|
float |
|
Benjamini-Hochberg FDR threshold for pairwise t-tests |
Full YAML Example
analysis_settings:
binding_free_energy:
units: "kcal/mol"
surface_exposure_threshold: 0.2
# Optional: custom partitions (groups must be defined under contacts.protein_groups)
protein_partitions:
catalytic_regions:
- catalytic_triad
- oxyanion_hole
lid_helices:
- lid_helix_5
- lid_helix_10
comparison_settings:
binding_free_energy:
fdr_alpha: 0.05
CLI Reference
polyzymd compare binding-free-energy [OPTIONS]
Option |
Description |
|---|---|
|
Path to |
|
Override energy units |
|
Override FDR alpha (default: 0.05) |
|
Force recompute (clears binding preference cache) |
|
Output format (default: |
|
Save output to file |
|
Suppress INFO messages |
|
Enable DEBUG logging |
|
Override equilibration time (e.g., |
Usage Examples
# Default: table output to console
polyzymd compare binding-free-energy -f comparison.yaml
# kJ/mol units
polyzymd compare binding-free-energy --units kJ/mol
# Save markdown report
polyzymd compare binding-free-energy --format markdown -o report.md
# JSON for downstream processing
polyzymd compare binding-free-energy --format json -o bfe_result.json
# Stricter p-value threshold
polyzymd compare binding-free-energy --fdr-alpha 0.01
# Verbose debugging
polyzymd compare binding-free-energy --debug
Custom Protein Partitions
You can analyze specific structural regions of interest alongside the default amino acid class groups. Define custom groups in the contacts settings, then group them into named partitions in the binding free energy settings.
analysis_settings:
contacts:
compute_binding_preference: true
include_default_aa_groups: true
protein_groups:
catalytic_triad: [77, 133, 156]
oxyanion_hole: [10, 77]
lid_helix_5: [141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
lid_helix_10: [281, 282, 283, 284, 285, 286, 287, 288, 289, 290]
binding_free_energy:
units: "kcal/mol"
protein_partitions:
# Must be mutually exclusive groups within each partition
catalytic_regions:
- catalytic_triad
- oxyanion_hole
lid_helices:
- lid_helix_5
- lid_helix_10
The pairwise and per-condition tables will include one row per (polymer_type, partition_element) pair for each custom partition.
Important
Partition groups must be mutually exclusive (no residue in two groups of the same partition). PolyzyMD validates this at config load time. See Binding Preference Analysis for a full explanation.
Multi-Temperature Studies
When conditions were simulated at different temperatures (e.g., a temperature stability study), ΔG\(_{\mathrm{sel}}\) values are reported per condition but pairwise statistics are suppressed between conditions at different temperatures:
Note: pairwise statistics suppressed for cross-temperature pairs.
Temperatures: 300.0 K (100% SBMA, 100% EGMA), 320.0 K (High-T SBMA)
Per-condition ΔG\(_{\mathrm{sel}}\) values are still shown — they are valid within each temperature group. Only the pairwise ΔΔG_B−A values and p-values are omitted for cross-temperature pairs, since k_BT differs.
Output File Location
When you use -o / --output, the formatted report is saved to the specified
path. A JSON copy of the result is also saved to the comparison’s results/
directory:
project/
└── comparison_results/
└── binding_free_energy/
└── binding_free_energy_YYYYMMDD_HHMMSS.json
The JSON file can be reloaded for downstream processing:
from polyzymd.compare.results.binding_free_energy import BindingFreeEnergyResult
result = BindingFreeEnergyResult.load("binding_free_energy_20260221_143000.json")
# Access per-condition data
for summary in result.conditions:
print(f"\n{summary.label} ({summary.temperature_K} K)")
for entry in summary.entries:
if entry.delta_G is not None:
print(
f" {entry.polymer_type} × {entry.protein_group}: "
f"ΔG_sel = {entry.delta_G:+.3f} ± {entry.delta_G_uncertainty:.3f} "
f"{result.units}"
)
# Access pairwise comparisons
for pair in result.pairwise_comparisons:
if not pair.cross_temperature and pair.p_value is not None:
print(
f"{pair.condition_a} → {pair.condition_b} | "
f"{pair.polymer_type} × {pair.protein_group}: "
f"ΔΔG_B−A = {pair.delta_delta_G:+.3f}, p = {pair.p_value:.4f}"
)
Statistical Notes
Pairwise t-tests
For each shared (polymer_type, protein_group) pair between two conditions at the same temperature, an independent two-sample t-test is performed on the per-replicate ΔG\(_{\mathrm{sel}}\) distributions. Requires at least 2 replicates per condition.
FDR Correction
P-values are corrected for multiple comparisons using the Benjamini-Hochberg false discovery rate procedure at the specified alpha (default 0.05). Adjusted p-values are stored in the JSON output but not currently shown in the table or markdown formatters.
Statistical Inefficiency and MetricType
Contact share is a mean-based metric (frame averages, not fluctuations). Its mean converges regardless of autocorrelation, though the SEM reported here is a naive cross-replicate SEM (not autocorrelation-corrected). With 3+ independent replicates this is generally conservative and appropriate.
See Statistics Best Practices for guidance on replicate counts and equilibration.
Worked Example: Comparing Polymer Stabilization
Scientific Question
“Does EGMA or SBMA show stronger preferential binding to the enzyme’s aromatic surface residues? Can we quantify the difference in kcal/mol?”
Setup
# comparison.yaml
name: "EGMA vs SBMA Aromatic Preference"
control: "100% SBMA"
conditions:
- label: "100% SBMA"
config: "../sbma_100/config.yaml"
replicates: [1, 2, 3, 4, 5]
- label: "100% EGMA"
config: "../egma_100/config.yaml"
replicates: [1, 2, 3, 4, 5]
analysis_settings:
contacts:
polymer_selection: "resname SBM EGM"
cutoff: 4.5
compute_binding_preference: true
surface_exposure_threshold: 0.2
enzyme_pdb_for_sasa: "structures/lipase.pdb"
include_default_aa_groups: true
binding_free_energy:
units: "kcal/mol"
comparison_settings:
binding_free_energy:
fdr_alpha: 0.05
Run
# Step 1: Generate binding preference data
polyzymd compare contacts -f comparison.yaml
# Step 2: Compute ΔG_sel
polyzymd compare binding-free-energy -f comparison.yaml --format markdown -o bfe_report.md
Interpretation Table
Observation |
Interpretation |
|---|---|
EGMA aromatic: ΔG_sel = −0.52 ± 0.03 kcal/mol |
EGMA has a significant thermodynamic preference for aromatic surface residues |
SBMA aromatic: ΔG_sel = −0.24 ± 0.03 kcal/mol |
SBMA also preferentially contacts aromatics, but less strongly |
ΔΔG = ΔG_sel,EGMA − ΔG_sel,SBMA = −0.28 kcal/mol, p < 0.01 |
The difference is statistically significant with 5 replicates |
SBMA charged groups: ΔG_sel ≈ 0 |
SBMA’s zwitterionic balance leads to near-neutral electrostatic preference |
Troubleshooting
“No ‘binding_free_energy’ in analysis_settings”
Ensure your comparison.yaml has a binding_free_energy: block under
analysis_settings:
analysis_settings:
contacts:
compute_binding_preference: true
# ... other contacts settings
binding_free_energy: # ← required
units: "kcal/mol"
“No binding preference data found for condition”
The binding preference cache files are missing. Run contacts analysis first:
polyzymd compare contacts -f comparison.yaml
If the files still aren’t found, check that compute_binding_preference: true
was set in the contacts analysis settings when you ran contacts.
“N/A” entries in the output
ΔG\(_{\mathrm{sel}}\) is undefined when contact_share = 0 (no contacts observed) or
expected_share = 0 (no exposed residues in that group). These are shown as
N/A and excluded from pairwise comparisons.
“Pairwise statistics suppressed for cross-temperature pairs”
Your conditions were simulated at different temperatures. ΔG\(_{\mathrm{sel}}\) at different temperatures is not directly comparable — k_BT differs. You can still compare ΔG\(_{\mathrm{sel}}\) within a temperature group, but not across groups. Consider re-running all conditions at the same temperature if cross-condition comparisons are needed.
kJ/mol vs kcal/mol
To convert: 1 kcal/mol = 4.184 kJ/mol. The thermal energy at 300 K is:
0.596 kcal/mol (k_BT)
2.494 kJ/mol (k_BT)
Values smaller than ~0.3 k_BT (0.18 kcal/mol or 0.75 kJ/mol) are likely within thermal noise.
See Also
Binding Preference Analysis — the prerequisite enrichment analysis
Contacts Analysis Quick Start — basic contacts
Statistics Best Practices — replicate planning
Comparing Conditions — multi-condition workflows
Extending Comparators — add custom comparators