Tutorial: Measure Surface Accessibility with the SASA Plugin
This tutorial walks you through using the SASA (Solvent Accessible Surface Area) analysis plugin to measure how polymer conjugation affects enzyme surface exposure. By the end, you will have configured a multi-run SASA analysis, executed it, and interpreted the comparison results.
What You Will Learn
How to configure multiple SASA “runs” with different target and context selections
How the target/context model lets you isolate polymer shielding effects
How to run SASA analysis locally and on HPC
How to read the comparison output and identify shielding signals in the plots
Prerequisites
Before starting, make sure you have:
A working pixi environment (pixi install -e build)
Completed simulation trajectories for at least two conditions
A comparison.yaml defining your conditions (see How to Compare Simulation Conditions)
Familiarity with the PolyzyMD chain convention (A=protein, B=substrate, C=polymer)
If you have not run a basic analysis yet, complete Tutorial: Run Your First Analysis first.
How the SASA Plugin Works
The SASA plugin computes Solvent Accessible Surface Area using the Shrake-Rupley algorithm via MDAnalysis. It reports per-frame total SASA and per-residue SASA profiles.
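The sketch below shows one way per-atom Shrake-Rupley areas could be reduced to the two outputs described above: a per-frame total and a per-residue profile averaged over frames. The helper name sasa_profiles and the frames-by-atoms input layout are assumptions for illustration. They are not the plugin's API.

```python
import numpy as np

def sasa_profiles(areas, atom_resids):
    """Hypothetical helper: reduce per-atom SASA to per-frame totals and a per-residue profile.

    areas       -- array of shape (n_frames, n_atoms) with per-atom SASA values
    atom_resids -- residue id of each atom, length n_atoms
    """
    areas = np.atleast_2d(np.asarray(areas, dtype=float))
    resids, inverse = np.unique(np.asarray(atom_resids), return_inverse=True)
    # One-hot matrix mapping each atom (row) to its residue (column)
    onehot = np.zeros((areas.shape[1], len(resids)))
    onehot[np.arange(areas.shape[1]), inverse] = 1.0
    per_residue = areas @ onehot             # (n_frames, n_residues)
    per_frame_total = areas.sum(axis=1)      # (n_frames,)
    return resids, per_frame_total, per_residue.mean(axis=0)
```

Summing atoms into residues before averaging over frames gives the same profile as the reverse order. Computing per-frame totals first, however, keeps the time series available for the autocorrelation-aware statistics used later.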
The key feature is the multi-run model. Instead of computing one number, you define multiple “runs” — each with its own target and context selections:
| Selection | What it controls |
|---|---|
| target_selection | Atoms whose SASA is reported |
| context_selection | Atoms that can block solvent access during the calculation |
For example, setting target_selection: "protein" and
context_selection: "protein or chainID C" reports the protein’s SASA while
treating the polymer (chain C) as an obstruction. Comparing this to a run
where context_selection: "protein" (polymer ignored) tells you how much
surface area the polymer covers.
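To make the target/context split concrete, here is a small, unoptimized Shrake-Rupley sketch in NumPy. It is not the plugin's implementation, which the text above says delegates to MDAnalysis. The function name shrake_rupley, the index-based selections, the toy radii, and the golden-spiral test points are all assumptions. Each target atom gets a sphere of test points at its radius plus the probe radius. A point counts as buried only if it falls inside the expanded sphere of some other atom in the context.

```python
import numpy as np

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-section spiral)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    rho = np.sqrt(1.0 - z * z)
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i        # golden-angle increments
    return np.column_stack((rho * np.cos(theta), rho * np.sin(theta), z))

def shrake_rupley(xyz, radii, target_idx, context_idx,
                  probe_radius_nm=0.14, n_sphere_points=960):
    """Per-atom SASA (nm^2) of the target atoms; only context atoms can block."""
    xyz = np.asarray(xyz, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe_radius_nm
    context_idx = np.asarray(context_idx, dtype=int)
    unit = sphere_points(n_sphere_points)
    areas = np.empty(len(target_idx))
    for out, i in enumerate(target_idx):
        blockers = context_idx[context_idx != i]     # an atom never blocks itself
        points = xyz[i] + expanded[i] * unit         # test points on the expanded sphere
        if blockers.size:
            dist = np.linalg.norm(points[:, None, :] - xyz[blockers][None, :, :], axis=-1)
            exposed = ~(dist < expanded[blockers][None, :]).any(axis=1)
        else:
            exposed = np.ones(len(points), dtype=bool)
        # Exposed area = full expanded-sphere area times the exposed fraction of points
        areas[out] = 4.0 * np.pi * expanded[i] ** 2 * exposed.mean()
    return areas
```

Passing the protein atom indices as both target and context mimics a "protein_isolated" run. Appending the polymer atom indices to the context mimics "protein_with_polymer", and the difference between the two totals is the area the polymer hides. The sketch checks every test point against every context atom, with no neighbor search. It is only suitable for toy systems.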
Step 1: Configure SASA Runs
Add a sasa section to the plugins: block in your comparison.yaml. Each
run defines one SASA calculation with its own target and context.
Here is a four-run configuration that fully characterizes polymer shielding:
```yaml
plugins:
  sasa:
    runs:
      - label: "protein_isolated"
        target_selection: "protein"
        context_selection: "protein"
      - label: "protein_with_polymer"
        target_selection: "protein"
        context_selection: "protein or chainID C"
      - label: "active_site_isolated"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein"
      - label: "active_site_with_polymer"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein or chainID C"
    probe_radius_nm: 0.14
    n_sphere_points: 960
```
What each run tells you
| Run | Question it answers |
|---|---|
| protein_isolated | What is the total protein SASA when only protein atoms block solvent? (baseline) |
| protein_with_polymer | What is the protein SASA when the polymer can also block solvent? (shielded) |
| active_site_isolated | How exposed is the active site without polymer effects? |
| active_site_with_polymer | How exposed is the active site when the polymer is present? |
The difference between protein_isolated and protein_with_polymer is the
polymer’s shielding effect. A decrease in SASA indicates that the polymer
covers part of the protein surface.
Tip
For the “No Polymer” control condition, the protein_with_polymer run will
produce the same result as protein_isolated because there is no chain C. This
is expected and provides a useful internal consistency check.
SASA algorithm parameters
| Parameter | Default | Description |
|---|---|---|
| probe_radius_nm | 0.14 | Shrake-Rupley probe radius in nm (standard water-sized probe) |
| n_sphere_points | 960 | Number of points on each atom's test sphere (higher = more accurate, slower) |
| chunk_size | | Frames processed per chunk (lower = less memory, slower) |
The defaults are suitable for most enzyme-polymer systems. Increase
n_sphere_points to 1500+ only if you need very high precision for
small SASA differences.
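One way to see why the default point count is usually enough is to compare the point-counting estimate against an exact answer. Consider a sphere partly buried by a single neighbor. Its exposed area is 4*pi*R1^2 - 2*pi*R1*h, where h is the height of the buried cap. This sketch assumes golden-spiral test points, and the plugin's point distribution may differ. R1 and R2 are expanded radii (atomic radius plus probe radius).

```python
import numpy as np

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-section spiral)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    rho = np.sqrt(1.0 - z * z)
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i
    return np.column_stack((rho * np.cos(theta), rho * np.sin(theta), z))

def exposed_fraction_numeric(n_points, R1, R2, d):
    """Fraction of test points on sphere 1 not inside sphere 2 (placed on the x axis)."""
    points = R1 * sphere_points(n_points)
    other = np.array([d, 0.0, 0.0])
    buried = np.linalg.norm(points - other, axis=1) < R2
    return 1.0 - buried.mean()

def exposed_fraction_exact(R1, R2, d):
    """Exact exposed fraction for two partially overlapping spheres."""
    a = (d * d + R1 * R1 - R2 * R2) / (2.0 * d)   # center of sphere 1 to intersection plane
    h = R1 - a                                     # height of the buried cap
    return 1.0 - h / (2.0 * R1)                    # cap area 2*pi*R1*h over 4*pi*R1**2

exact = exposed_fraction_exact(0.31, 0.31, 0.3)
for n in (100, 960, 1500, 5000):
    err = abs(exposed_fraction_numeric(n, 0.31, 0.31, 0.3) - exact) / exact
    print(f"n_sphere_points={n:5d}  relative error={err:.4%}")
```

The neighbor is deliberately placed off the spiral's z axis. Along that axis, the z-uniform point spacing would make the cap count look unrealistically exact.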
Step 2: The Full comparison.yaml
Here is a complete example for a CALB enzyme study:
```yaml
name: "calb_sasa_study"
description: "SASA analysis for CALB with polymer conjugates"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]
  - label: "SBMA-100"
    config: "../SBMA_100_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]
  - label: "EGMA-100"
    config: "../EGMA_100_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  sasa:
    runs:
      - label: "protein_isolated"
        target_selection: "protein"
        context_selection: "protein"
      - label: "protein_with_polymer"
        target_selection: "protein"
        context_selection: "protein or chainID C"
      - label: "active_site_isolated"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein"
      - label: "active_site_with_polymer"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein or chainID C"
    probe_radius_nm: 0.14
    n_sphere_points: 960

plot_settings:
  format: "png"
  dpi: 300
  style: "publication"
```
Step 3: Run Locally
For small systems or interactive exploration, run the SASA analysis locally:
```shell
pixi run -e build polyzymd compare run sasa \
    -f comparison.yaml
```
This runs the full pipeline sequentially: compute_replicate for every
replicate, aggregate for every condition, then compare and plot. Expect
this to take several minutes per replicate depending on trajectory length and
system size.
Note
SASA computation is CPU-intensive and memory-intensive. For every frame, the
Shrake-Rupley algorithm places test points around each target atom and checks
them against the atoms in the context selection. The chunk_size parameter
controls how many frames are loaded into memory at once.
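The trade-off that chunk_size controls can be sketched as plain Python. The helper names iter_frame_chunks and chunk_coordinate_bytes are hypothetical, and the plugin's real chunking code is not shown in this tutorial. The memory estimate counts only the xyz coordinates, assuming float32 (4 bytes per value). Treat it as a lower bound, since neighbor-search buffers and per-atom results add to it.

```python
def iter_frame_chunks(n_frames, chunk_size, stride=1):
    """Yield lists of frame indices, at most chunk_size frames per list."""
    frames = list(range(0, n_frames, stride))
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]

def chunk_coordinate_bytes(n_atoms, chunk_size, bytes_per_value=4):
    """Bytes needed to hold one chunk of xyz coordinates (float32 by default)."""
    return n_atoms * 3 * bytes_per_value * chunk_size

# Example: a 50,000-atom system processed 100 frames at a time
print(chunk_coordinate_bytes(50_000, 100) / 1e6, "MB of coordinates per chunk")
```

Halving chunk_size halves this figure but doubles the number of read-and-compute cycles.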
Step 4: Run on HPC
For large systems or many replicates, submit the analysis as SLURM jobs:
```shell
pixi run -e build polyzymd compare submit sasa \
    -f comparison.yaml \
    --partition aa100 \
    --mem 8G \
    --time 02:00:00
```
This submits a DAG of SLURM jobs that process replicates in parallel. See How To: Submit Analysis Jobs to a SLURM Cluster for the full HPC guide, including dry runs, monitoring, and troubleshooting.
Tip
SASA jobs are marked with execution_cost_hint = "high" in the plugin.
Allocate at least 8 GB of memory and 1–2 hours of wall time per replicate
for systems with 50,000+ atoms.
Step 5: Interpret the Results
Comparison JSON
The comparison result is saved to
comparison/sasa/result.json. It contains:
Per-condition summaries with mean SASA ± SEM for each run
Pairwise comparisons between conditions for each run (t-test, Cohen’s d, percent change)
ANOVA results for each run when three or more conditions are compared
Rankings of conditions by SASA for each run
Direction labels: "shielding" (SASA decreased by more than 1%), "exposure" (SASA increased by more than 1%), or "unchanged"
Here is a simplified excerpt showing how to read the key fields:
```json
{
  "run_labels": ["protein_isolated", "protein_with_polymer"],
  "conditions": [
    {
      "label": "No Polymer",
      "run_summaries": [
        {
          "label": "protein_isolated",
          "mean_sasa": 12450.3,
          "sem_sasa": 85.2
        },
        {
          "label": "protein_with_polymer",
          "mean_sasa": 12448.1,
          "sem_sasa": 84.9
        }
      ]
    },
    {
      "label": "SBMA-100",
      "run_summaries": [
        {
          "label": "protein_isolated",
          "mean_sasa": 12380.5,
          "sem_sasa": 91.0
        },
        {
          "label": "protein_with_polymer",
          "mean_sasa": 11200.7,
          "sem_sasa": 102.3
        }
      ]
    }
  ],
  "pairwise_comparisons": [
    {
      "run_label": "protein_with_polymer",
      "condition_a": "No Polymer",
      "condition_b": "SBMA-100",
      "p_value": 0.003,
      "cohens_d": -1.82,
      "direction": "shielding",
      "significant": true,
      "percent_change": -10.0
    }
  ]
}
```
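The direction rule can be written as a small function. This is a sketch under two assumptions. First, percent_change is computed relative to condition_a, the first condition in each pairwise comparison. Second, a change of exactly ±1% counts as "unchanged". The function names are hypothetical. Check the plugin source if the boundary matters for your data.

```python
def percent_change(reference, value):
    """Percent change of value relative to reference (assumed: condition_a)."""
    return (value - reference) / reference * 100.0

def direction_label(pct, threshold=1.0):
    """Map a percent change to the direction labels used in result.json."""
    if pct < -threshold:
        return "shielding"
    if pct > threshold:
        return "exposure"
    return "unchanged"

# protein_with_polymer means from the excerpt: No Polymer -> SBMA-100
pct = percent_change(12448.1, 11200.7)
print(round(pct, 1), direction_label(pct))
```

With the excerpt's means, this reproduces the percent_change of -10.0 and the "shielding" label shown in the pairwise comparison.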
How to read the results
Compare protein_isolated across conditions. This measures intrinsic differences in protein compactness, without polymer effects. Small differences here suggest that the protein adopts similar conformations across conditions.
Compare protein_with_polymer across conditions. The “No Polymer” condition serves as the baseline. A polymer condition shows lower SASA in this run if the polymer shields the protein surface.
Calculate the shielding effect. For each polymer condition, subtract protein_with_polymer from protein_isolated. The larger the difference, the more surface the polymer covers.
Check the active site runs. If active_site_with_polymer is significantly lower than active_site_isolated for a polymer condition, the polymer may be blocking substrate access, which is a concern for enzyme activity.
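The shielding calculation can be scripted directly against result.json. This is a minimal sketch that assumes only the field names visible in the excerpt above ("conditions", "run_summaries", "label", "mean_sasa"). The helper name shielding_by_condition is hypothetical. To keep the block self-contained, it parses a trimmed copy of the excerpt instead of opening comparison/sasa/result.json.

```python
import json

EXCERPT = """
{"conditions": [
  {"label": "No Polymer", "run_summaries": [
     {"label": "protein_isolated", "mean_sasa": 12450.3, "sem_sasa": 85.2},
     {"label": "protein_with_polymer", "mean_sasa": 12448.1, "sem_sasa": 84.9}]},
  {"label": "SBMA-100", "run_summaries": [
     {"label": "protein_isolated", "mean_sasa": 12380.5, "sem_sasa": 91.0},
     {"label": "protein_with_polymer", "mean_sasa": 11200.7, "sem_sasa": 102.3}]}
]}
"""

def shielding_by_condition(result, isolated="protein_isolated",
                           shielded="protein_with_polymer"):
    """Mean SASA hidden by the polymer: isolated run minus shielded run, per condition."""
    out = {}
    for cond in result["conditions"]:
        means = {run["label"]: run["mean_sasa"] for run in cond["run_summaries"]}
        out[cond["label"]] = means[isolated] - means[shielded]
    return out

# For a real file: result = json.load(open("comparison/sasa/result.json"))
for label, value in shielding_by_condition(json.loads(EXCERPT)).items():
    print(f"{label}: {value:.1f}")
```

With the excerpt values this gives about 1180 for SBMA-100 and about 2 for No Polymer. The near-zero control value is the internal consistency check described in the Tip in Step 1. Passing the active_site labels as isolated and shielded applies the same check to the active site runs.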
Plots
The SASA plugin generates three types of plots:
| Plot | File pattern | What it shows |
|---|---|---|
| Comparison bars | | Mean SASA ± SEM per condition, with replicate scatter points |
| Time series | | Per-frame SASA traces overlaid for each condition |
| Residue profiles | | Per-residue mean SASA across conditions |
The bar plots are the most informative for quick assessment. Look for
conditions where the protein_with_polymer bar is significantly lower than
the protein_isolated bar — this is the polymer shielding signal.
Common Configurations
Recipe collection (how-to mode)
The configurations below are task-oriented recipes rather than step-by-step tutorial content. Use them as starting points for your own SASA analysis.
Minimal: whole-protein SASA only
```yaml
plugins:
  sasa:
    runs:
      - label: "protein_total"
        target_selection: "protein"
```
When context_selection is omitted, it defaults to match target_selection.
This measures the protein’s self-SASA without considering any other molecules.
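The defaulting rule above can be sketched as a small pre-flight check on the sasa block, for example before submitting HPC jobs. The helper name resolve_runs is hypothetical, and so is the duplicate-label check. This tutorial does not say whether the plugin itself rejects duplicate labels. The sketch takes an already-parsed dict, so it does not depend on a particular YAML library.

```python
def resolve_runs(sasa_config):
    """Hypothetical check: fill in context_selection and catch common mistakes."""
    resolved, seen = [], set()
    for run in sasa_config["runs"]:
        label = run["label"]
        if label in seen:
            raise ValueError(f"duplicate run label: {label!r}")
        seen.add(label)
        if "target_selection" not in run:
            raise ValueError(f"run {label!r} has no target_selection")
        # Documented default: context_selection falls back to target_selection
        context = run.get("context_selection", run["target_selection"])
        resolved.append({**run, "context_selection": context})
    return resolved

print(resolve_runs({"runs": [{"label": "protein_total", "target_selection": "protein"}]}))
```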
Two-run: basic shielding comparison
```yaml
plugins:
  sasa:
    runs:
      - label: "protein_only"
        target_selection: "protein"
        context_selection: "protein"
      - label: "protein_full_context"
        target_selection: "protein"
        context_selection: "all"
```
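Using "all" as the context includes everything in the system (protein, polymer, substrate, solvent). If explicit solvent is present, water molecules packed against the protein also count as blockers and can hide much of its surface. This makes the run hard to interpret and to compare across conditions, for example when solvent box sizes differ.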
Monomer-specific shielding
If your polymer contains specific monomer types, you can test which monomer contributes more to shielding:
```yaml
plugins:
  sasa:
    runs:
      - label: "protein_isolated"
        target_selection: "protein"
        context_selection: "protein"
      - label: "protein_with_sbma"
        target_selection: "protein"
        context_selection: "protein or resname SBMA"
      - label: "protein_with_egma"
        target_selection: "protein"
        context_selection: "protein or resname EGMA"
```
Warning
Ensure your monomer residue names (resname SBMA, resname EGMA) match
the actual residue names in your topology. Check with:
```shell
pixi run -e build python -c "
import MDAnalysis as mda
u = mda.Universe('solvated_system.pdb')
print(set(u.select_atoms('chainID C').residues.resnames))
"
```
Active site focus with specific residues
```yaml
plugins:
  sasa:
    runs:
      - label: "catalytic_triad"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein or chainID C"
      - label: "binding_pocket"
        target_selection: "protein and (resid 77 or resid 156 or resid 262 or resid 80 or resid 155)"
        context_selection: "protein or chainID C"
```
Stride for long trajectories
If your trajectory has many frames and SASA computation is slow, increase the stride to analyze every Nth frame:
```yaml
plugins:
  sasa:
    runs:
      - label: "protein_with_polymer"
        target_selection: "protein"
        context_selection: "protein or chainID C"
    stride: 5
```
Note
A stride of 5 uses every 5th frame, reducing computation time by ~5x. The plugin still computes autocorrelation-corrected SEM on the subsampled data.
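The note above mentions an autocorrelation-corrected SEM. The plugin's exact estimator is not documented in this tutorial, so the following is a sketch of one common choice, with hypothetical function names. It computes the autocorrelation function by zero-padded FFT and sums it into a statistical inefficiency tau = 1 + 2*sum(rho_k), truncating at the first non-positive lag. The SEM then uses N_eff = N / tau in place of N.

```python
import numpy as np

def autocorrelation(x):
    """Normalized autocorrelation function via zero-padded FFT."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    spectrum = np.fft.rfft(x, n=2 * n)              # zero-pad to avoid wrap-around
    acf = np.fft.irfft(spectrum * np.conj(spectrum))[:n]
    return acf / acf[0]

def statistical_inefficiency(x):
    """tau = 1 + 2 * sum(rho_k), truncated at the first non-positive lag."""
    rho = autocorrelation(x)
    tau = 1.0
    for k in range(1, len(rho)):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return tau

def corrected_sem(x):
    """SEM using an effective sample size N_eff = N / tau (capped at N)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_eff = min(float(n), n / statistical_inefficiency(x))
    return np.std(x, ddof=1) / np.sqrt(n_eff), n_eff
```

For strongly correlated SASA time series, N_eff is much smaller than the number of frames, so the corrected SEM is larger than the naive one. Calling corrected_sem on x[::5] shows why a stride often costs little precision. Subsampling removes frames, but it also lowers tau, so N_eff tends to drop far less than fivefold.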
Result Models
Reference material
This section is reference-style content for plugin developers and advanced users who need to inspect the data models programmatically.
For plugin developers or advanced users, the SASA result hierarchy is:
| Model | Level | Key fields |
|---|---|---|
| | Per-replicate, per-run | |
| | Per-replicate (all runs) | |
| | Per-condition, per-run | |
| | Per-condition (all runs) | |
| | Cross-condition | |
Raw per-frame and per-residue SASA data is stored as NPZ sidecars alongside the JSON result files.
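This tutorial does not list the NPZ sidecar file names or the array keys inside them. A safe first step is to list whatever a sidecar contains, using NumPy's standard np.load. The helper describe_npz is hypothetical, and the array names in the usage example are made up for illustration.

```python
import numpy as np

def describe_npz(path):
    """Map each array name in an .npz file to its (shape, dtype)."""
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}

# Usage with a real sidecar (path is illustrative):
# for name, (shape, dtype) in describe_npz("path/to/sidecar.npz").items():
#     print(name, shape, dtype)
```

Once the keys are known, data[name] returns the full array for plotting or re-analysis.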
What You Have Now
After following this tutorial, you have:
a multi-run SASA configuration that isolates the polymer shielding effect
comparison results with per-run statistical tests across conditions
bar, time series, and residue-profile plots for visual assessment
the understanding to design custom SASA run configurations for your system
See Also
How To: Submit Analysis Jobs to a SLURM Cluster — Submitting analysis jobs to SLURM
How to Compare Simulation Conditions — Setting up comparison.yaml
Extending the Analysis Framework — Writing your own analysis plugin
Statistics Best Practices for MD Analysis — Autocorrelation and uncertainty