Catalytic Triad Analysis: Quick Start
Analyze catalytic triad geometry and integrity in under 5 minutes.
Note
Want to understand the statistics? This guide focuses on getting results quickly. For proper interpretation and statistical best practices, see the Best Practices Guide.
TL;DR
# Single replicate
polyzymd analyze triad -c comparison.yaml --condition "No Polymer" -r 1 --eq-time 100ns
# Multiple replicates (recommended)
polyzymd analyze triad -c comparison.yaml --condition "No Polymer" --eq-time 100ns
# All conditions in comparison.yaml
polyzymd analyze triad -c comparison.yaml --eq-time 100ns
# Force recompute (ignore cache)
polyzymd analyze triad -c comparison.yaml --condition "No Polymer" --recompute
Prerequisites
Before running catalytic triad analysis, you need:
Completed production simulation(s) - at least one replicate
A comparison.yaml file - with a
catalytic_triadsection defining your triadTrajectory files - in the scratch directory specified in your simulation configs
Setting Up comparison.yaml
The triad analysis uses a comparison.yaml file to define both your simulation
conditions and the catalytic triad geometry to analyze.
Basic Structure
name: "my_enzyme_study"
description: "Effect of polymer on enzyme stability"
conditions:
- label: "No Polymer"
config: "../projects/noPoly/config.yaml"
replicates: [1, 2, 3]
- label: "With Polymer"
config: "../projects/withPoly/config.yaml"
replicates: [1, 2, 3]
defaults:
equilibration_time: "100ns"
# Catalytic triad configuration
catalytic_triad:
name: "MyEnzyme_catalytic_triad"
description: "Ser-His-Asp catalytic triad"
threshold: 3.5 # Angstroms - H-bond distance cutoff
pairs:
- label: "Asp-His"
selection_a: "midpoint(protein and resid 133 and name OD1 OD2)"
selection_b: "protein and resid 156 and name ND1"
- label: "His-Ser"
selection_a: "protein and resid 156 and name NE2"
selection_b: "protein and resid 77 and name OG"
Example: LipA (Lipase A)
For the Ser-His-Asp catalytic triad of Bacillus subtilis Lipase A:
catalytic_triad:
name: "LipA_catalytic_triad"
description: "Ser-His-Asp catalytic triad of Lipase A (Bacillus subtilis)"
threshold: 3.5
pairs:
- label: "Asp133-His156"
selection_a: "midpoint(protein and resid 133 and name OD1 OD2)"
selection_b: "protein and resid 156 and name ND1"
- label: "His156-Ser77"
selection_a: "protein and resid 156 and name NE2"
selection_b: "protein and resid 77 and name OG"
Selection Syntax
PolyzyMD supports three types of atom selections:
Syntax |
Description |
Example |
|---|---|---|
Standard |
MDAnalysis selection |
|
|
Geometric midpoint of selected atoms |
|
|
Center of mass of selected atoms |
|
Warning
Chain-Aware Selections Required
Residue numbers restart at 1 for each chain in PolyzyMD systems. A selection like
resid 77 will match residues from all chains (protein, polymer, and water).
For protein residues, always use protein and resid X:
# INCORRECT - may include polymer/water atoms with same residue number
selection_a: "resid 77 and name OG"
# CORRECT - restricts to protein chain only
selection_a: "protein and resid 77 and name OG"
PolyzyMD will emit a runtime warning if your selection spans multiple chains, but it’s best to write correct selections from the start.
Tip
Use midpoint() for carboxylate groups (Asp, Glu) where either oxygen can
accept an H-bond. This gives a single representative point for the acceptor.
Basic Usage
For reproducible, version-controlled analysis, define triad analysis settings
in a comparison.yaml file (see Setting Up comparison.yaml):
# comparison.yaml
name: "my_enzyme_study"
conditions:
- label: "No Polymer"
config: "../projects/noPoly/config.yaml"
replicates: [1, 2, 3]
defaults:
equilibration_time: "100ns"
catalytic_triad:
name: "LipA_catalytic_triad"
description: "Ser-His-Asp catalytic triad"
threshold: 3.5
pairs:
- label: "Asp133-His156"
selection_a: "midpoint(protein and resid 133 and name OD1 OD2)"
selection_b: "protein and resid 156 and name ND1"
- label: "His156-Ser77"
selection_a: "protein and resid 156 and name NE2"
selection_b: "protein and resid 77 and name OG"
Then run with minimal CLI arguments:
# Analyze all conditions using settings from comparison.yaml
polyzymd analyze triad -c comparison.yaml
# Analyze specific condition
polyzymd analyze triad -c comparison.yaml --condition "No Polymer"
# Force recompute (ignore cache)
polyzymd analyze triad -c comparison.yaml --recompute
Benefits of YAML configuration:
All analysis parameters are version-controlled
Reproducible across team members and machines
Self-documenting experiment setup
Single Replicate
polyzymd analyze triad -c comparison.yaml --condition "No Polymer" -r 1 --eq-time 100ns
Expected output:
Loading comparison config from: comparison.yaml
Triad: LipA_catalytic_triad
Description: Ser-His-Asp catalytic triad of Lipase A (Bacillus subtilis)
Pairs: 2
- Asp133-His156
- His156-Ser77
Threshold: 3.5 A
Equilibration: 100ns
=== No Polymer ===
Replicates: [1]
Triad Analysis Complete
Asp133-His156: 2.91 A (93.2% below threshold)
His156-Ser77: 3.12 A (78.4% below threshold)
Simultaneous contact: 74.1%
(SEM: +/-8.2%)
Multiple Replicates (Recommended)
Omit the -r flag to analyze all replicates defined for that condition:
polyzymd analyze triad -c comparison.yaml --condition "No Polymer" --eq-time 100ns
Analyzing All Conditions
To analyze every condition in your comparison.yaml:
polyzymd analyze triad -c comparison.yaml --eq-time 100ns
This loops through all conditions and reports results for each.
Use the Python API for programmatic analysis and integration with custom workflows:
from polyzymd.compare.config import ComparisonConfig
from polyzymd.analysis.triad import CatalyticTriadAnalyzer
from polyzymd.config.loader import load_config
# Load comparison configuration
comp_config = ComparisonConfig.from_yaml("comparison.yaml")
# Get the first condition's simulation config
condition = comp_config.conditions[0] # "No Polymer"
sim_config = load_config(condition.config)
# Create analyzer
analyzer = CatalyticTriadAnalyzer(
config=sim_config,
triad_config=comp_config.catalytic_triad,
equilibration="100ns",
)
# Single replicate analysis
result = analyzer.compute(replicate=1)
print(f"Simultaneous contact: {result.simultaneous_contact_fraction * 100:.1f}%")
# Per-pair results
for pair in result.pair_results:
print(f" {pair.pair_label}: {pair.mean_distance:.2f} A "
f"({pair.fraction_below_threshold * 100:.1f}% below threshold)")
# Multi-replicate aggregation
agg_result = analyzer.compute_aggregated(replicates=[1, 2, 3])
print(f"\nAggregated: {agg_result.overall_simultaneous_contact * 100:.1f} "
f"+/- {agg_result.sem_simultaneous_contact * 100:.1f}%")
# Save results
result.save("triad_rep1.json")
agg_result.save("triad_aggregated.json")
When to use Python:
Integrating triad analysis into larger pipelines
Custom post-processing or visualization
Programmatic iteration over many conditions
Combining with other analysis modules
Understanding the Output
Field |
Meaning |
|---|---|
Per-pair distance |
Mean distance between the two selections |
% below threshold |
Fraction of frames where that pair is in contact |
Simultaneous contact |
Fraction where ALL pairs are in contact at once |
SEM |
Standard error (autocorrelation-corrected) |
The simultaneous contact fraction is the key metric - it tells you what percentage of simulation time the catalytic triad maintains proper geometry for catalysis.
Aggregated Output
When analyzing multiple replicates, you get aggregated statistics:
Triad Analysis Complete (Aggregated)
Asp133-His156: 3.09 +/- 0.21 A (88.7 +/- 5.2% below)
His156-Ser77: 4.03 +/- 1.07 A (65.3 +/- 18.4% below)
Simultaneous contact: 49.9 +/- 27.3%
Per-replicate:
Rep 1: 74.1%
Rep 2: 51.2%
Rep 3: 24.5%
The aggregated result shows:
Mean and SEM across replicates for each pair
Overall simultaneous contact with SEM
Per-replicate breakdown to assess variability
Comparing Conditions
To compare triad geometry across conditions (e.g., with vs. without polymer) with proper statistical analysis, use one of these approaches:
Create a comparison.yaml file with your conditions, then run the comparison
with a single command:
# comparison.yaml
name: "polymer_triad_study"
description: "Effect of polymer on catalytic triad integrity"
control: "No Polymer"
conditions:
- label: "No Polymer"
config: "../no_polymer/config.yaml"
replicates: [1, 2, 3]
- label: "With Polymer"
config: "../with_polymer/config.yaml"
replicates: [1, 2, 3]
defaults:
equilibration_time: "100ns"
catalytic_triad:
name: "LipA_catalytic_triad"
threshold: 3.5
pairs:
- label: "Asp133-His156"
selection_a: "midpoint(protein and resid 133 and name OD1 OD2)"
selection_b: "protein and resid 156 and name ND1"
- label: "His156-Ser77"
selection_a: "protein and resid 156 and name NE2"
selection_b: "protein and resid 77 and name OG"
# Run comparison with automatic t-tests, effect sizes, and ranking
polyzymd compare triad -f comparison.yaml
# Output formats
polyzymd compare triad -f comparison.yaml --format markdown # For docs
polyzymd compare triad -f comparison.yaml --format json # Machine-readable
Output includes:
Simultaneous contact % ± SEM for each condition
% change relative to control
p-value (two-sample t-test)
Cohen’s d effect size
Ranking (highest contact = best triad integrity)
See Comparing Conditions for the full guide.
Run analysis on each condition separately, then use polyzymd compare show:
# Step 1: Analyze each condition
polyzymd analyze triad -c comparison.yaml --condition "No Polymer"
polyzymd analyze triad -c comparison.yaml --condition "With Polymer"
# Step 2: Run comparison (uses cached triad results)
polyzymd compare triad -f comparison.yaml
The comparison command automatically loads cached results if available, so you don’t recompute triad analysis.
Use TriadComparator for programmatic comparison with full statistical output:
from polyzymd.compare import ComparisonConfig, TriadComparator
# Load comparison configuration (must have catalytic_triad: section)
config = ComparisonConfig.from_yaml("comparison.yaml")
# Run comparison (computes triad analysis if not cached)
comparator = TriadComparator(config, equilibration="100ns")
result = comparator.compare()
# Access results
print(f"Ranking (best triad first): {result.ranking}")
for cond in result.conditions:
contact_pct = cond.mean_simultaneous_contact * 100
sem_pct = cond.sem_simultaneous_contact * 100
print(f"{cond.label}: {contact_pct:.1f} ± {sem_pct:.1f}%")
# Statistical comparisons
for comp in result.pairwise_comparisons:
sig = "*" if comp.significant else ""
print(f"{comp.condition_b} vs {comp.condition_a}: "
f"{comp.percent_change:+.1f}%, p={comp.p_value:.4f}{sig}, "
f"d={comp.cohens_d:.2f}")
# Save result for later
result.save("results/triad_comparison.json")
Example output:
Ranking (best triad first): ['With Polymer', 'No Polymer']
No Polymer: 49.9 ± 27.3%
With Polymer: 87.3 ± 2.2%
With Polymer vs No Polymer: +74.9%, p=0.0892, d=1.93
Tip
For proper statistical interpretation (understanding p-values with small N, effect sizes, ANOVA for 3+ conditions), see the Best Practices Guide.
Output Files
Results are saved in your project’s analysis directory:
<projects_directory>/
└── analysis/
└── triad/
├── run_1/
│ └── triad_LipA_catalytic_triad_eq100ns.json
├── run_2/
│ └── triad_LipA_catalytic_triad_eq100ns.json
├── run_3/
│ └── triad_LipA_catalytic_triad_eq100ns.json
└── aggregated/
└── triad_LipA_catalytic_triad_reps1-3_eq100ns.json
JSON Result Structure (Single Replicate)
{
"analysis_type": "catalytic_triad",
"triad_name": "LipA_catalytic_triad",
"triad_description": "Ser-His-Asp catalytic triad...",
"threshold": 3.5,
"simultaneous_contact_fraction": 0.741,
"n_frames_simultaneous": 1482,
"sim_contact_sem": 0.082,
"sim_contact_correlation_time": 8542.5,
"sim_contact_n_independent": 12,
"pair_results": [
{
"pair_label": "Asp133-His156",
"mean_distance": 2.91,
"std_distance": 0.45,
"fraction_below_threshold": 0.932,
...
},
...
],
"n_frames_used": 2000,
"n_frames_total": 3000,
...
}
JSON Result Structure (Aggregated)
{
"analysis_type": "catalytic_triad_aggregated",
"n_replicates": 3,
"replicates": [1, 2, 3],
"overall_simultaneous_contact": 0.499,
"sem_simultaneous_contact": 0.273,
"per_replicate_simultaneous": [0.741, 0.512, 0.245],
"pair_results": [
{
"pair_label": "Asp133-His156",
"overall_mean": 3.09,
"overall_sem": 0.21,
"per_replicate_means": [2.91, 3.05, 3.31],
...
},
...
],
...
}
Common Options
Option |
Default |
Description |
|---|---|---|
|
(required) |
Path to comparison.yaml |
|
(all) |
Specific condition label to analyze |
|
(from yaml) |
Override replicate specification |
|
(from yaml) |
Equilibration time to skip |
|
off |
Ignore cached results |
|
(auto) |
Custom output location |
Replicate Specification Formats
Format |
Meaning |
|---|---|
|
Single replicate |
|
Replicates 1 through 5 |
|
Specific replicates |
Equilibration Time
Always skip the equilibration period to ensure you’re analyzing equilibrated conformations:
# Skip first 100 ns
polyzymd analyze triad -c comparison.yaml --eq-time 100ns
# Skip first 10 ns (shorter simulations)
polyzymd analyze triad -c comparison.yaml --eq-time 10ns
Troubleshooting
“No catalytic_triad section found”
Cause: comparison.yaml doesn’t have a catalytic_triad section
Fix: Add the catalytic_triad section to your comparison.yaml. See the
Setting Up comparison.yaml section above.
Very High Distances (> 10 A)
Cause: Usually indicates wrong atom selections or residue numbering
Fix:
Check that residue numbers match your PDB file
Verify atom names (OD1/OD2 for Asp, ND1/NE2 for His, OG for Ser)
Load trajectory in VMD/PyMOL to visually verify selections
Very Low Simultaneous Contact (< 10%)
Cause: Could indicate:
Triad disruption (real result)
Wrong threshold
Wrong selections
Fix:
Check individual pair percentages - which pair is failing?
Try increasing threshold slightly (e.g., 4.0 A)
Visualize trajectory to confirm triad state
Selection Errors
“Selection ‘resid X and name Y’ returned 0 atoms”
Cause: Atom name or residue number doesn’t exist
Fix:
Check residue numbering in your topology
Verify atom names match force field conventions
For
midpoint(), ensure all atoms in the selection exist
Low Statistical Reliability Warning
Cause: Correlation time is long relative to simulation length
Fix: This is informational. Use multiple replicates for robust statistics. See Best Practices Guide.
Missing Replicate Warning
Message: Skipping replicate N: trajectory data not found
Cause: The requested replicate hasn’t been simulated yet or the path is incorrect
Fix: This is informational - analysis continues with available replicates. If this is unexpected, check that the simulation completed and paths are correct. See Handling Incomplete Data for details.
Next Steps
Understand the statistics: Best Practices Guide
Compare conditions: Comparing Conditions Guide
Analyze flexibility:
polyzymd analyze rmsf --help