Tutorial: Measure Surface Accessibility with the SASA Plugin

This tutorial walks you through using the SASA (Solvent Accessible Surface Area) analysis plugin to measure how polymer conjugation affects enzyme surface exposure. By the end, you will have configured a multi-run SASA analysis, executed it, and interpreted the comparison results.

What You Will Learn

How to configure multiple SASA “runs” with different target and context selections
How the target/context model lets you isolate polymer shielding effects
How to run SASA analysis locally and on HPC
How to read the comparison output and identify shielding signals in the plots

Prerequisites

Before starting, make sure you have:

A working pixi environment (pixi install -e build)
Completed simulation trajectories for at least two conditions
A comparison.yaml defining your conditions (see How to Compare Simulation Conditions)
Familiarity with the PolyzyMD chain convention (A=protein, B=substrate, C=polymer)

If you have not run a basic analysis yet, complete Tutorial: Run Your First Analysis first.

How the SASA Plugin Works

The SASA plugin computes Solvent Accessible Surface Area using the Shrake-Rupley algorithm via MDAnalysis. It reports per-frame total SASA and per-residue SASA profiles.

The key feature is the multi-run model. Instead of computing one number, you define multiple “runs” — each with its own target and context selections:

Selection	What it controls
`target_selection`	Atoms whose SASA is reported
`context_selection`	Atoms that can block solvent access to the target during the calculation

For example, setting target_selection: "protein" and context_selection: "protein or chainID C" reports the protein’s SASA while treating the polymer (chain C) as an obstruction. Comparing this to a run where context_selection: "protein" (polymer ignored) tells you how much surface area the polymer covers.
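To make the target/context idea concrete, here is a minimal pure-Python sketch of a Shrake-Rupley style calculation on two stand-in atoms. It is not the plugin's implementation (which goes through MDAnalysis); the function names, the 0.17 nm atom radius, and the coordinates are all illustrative assumptions.

```python
import math

def unit_sphere_points(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        ring = math.sqrt(1.0 - z * z)
        phi = i * golden_angle
        points.append((ring * math.cos(phi), ring * math.sin(phi), z))
    return points

def toy_sasa(target, context, probe=0.14, n_points=960):
    """Sum the exposed area of `target` atoms, letting `context` atoms block.

    Atoms are (x, y, z, radius) tuples in nm; the result is in nm^2.
    """
    unit = unit_sphere_points(n_points)
    total = 0.0
    for atom in target:
        x, y, z, radius = atom
        reach = radius + probe  # radius of the sphere traced by the probe centre
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = x + reach * ux, y + reach * uy, z + reach * uz
            blocked = False
            for other in context:
                if other == atom:
                    continue  # an atom never blocks itself
                ox, oy, oz, orad = other
                dist_sq = (px - ox) ** 2 + (py - oy) ** 2 + (pz - oz) ** 2
                if dist_sq < (orad + probe) ** 2:
                    blocked = True
                    break
            if not blocked:
                exposed += 1
        total += 4.0 * math.pi * reach ** 2 * exposed / n_points
    return total

protein = [(0.0, 0.0, 0.0, 0.17)]  # one stand-in "protein" atom (hypothetical)
polymer = [(0.3, 0.0, 0.0, 0.17)]  # one stand-in "polymer" atom nearby

isolated = toy_sasa(protein, context=protein)            # polymer ignored
shielded = toy_sasa(protein, context=protein + polymer)  # polymer can block
print(f"isolated={isolated:.3f} nm^2, shielded={shielded:.3f} nm^2")
```

The target list is the same in both calls; only the context changes, which is why the difference between the two numbers can be attributed to the extra context atoms.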

Step 1: Configure SASA Runs

Add a sasa section to the plugins: block in your comparison.yaml. Each run defines one SASA calculation with its own target and context.

Here is a four-run configuration that fully characterizes polymer shielding:

plugins:
  sasa:
    runs:
      - label: "protein_isolated"
        target_selection: "protein"
        context_selection: "protein"

      - label: "protein_with_polymer"
        target_selection: "protein"
        context_selection: "protein or chainID C"

      - label: "active_site_isolated"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein"

      - label: "active_site_with_polymer"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein or chainID C"

    probe_radius_nm: 0.14
    n_sphere_points: 960

What each run tells you

Run	Question it answers
`protein_isolated`	What is the total protein SASA when only protein atoms block solvent? (baseline)
`protein_with_polymer`	What is the protein SASA when the polymer can also block solvent? (shielded)
`active_site_isolated`	How exposed is the active site without polymer effects?
`active_site_with_polymer`	How exposed is the active site when the polymer is present?

The difference between protein_isolated and protein_with_polymer within the same condition is the polymer’s shielding effect. A lower protein_with_polymer value indicates that the polymer covers part of the protein surface.

Tip

For the “No Polymer” control condition, the protein_with_polymer run will produce the same result as protein_isolated because there is no chain C. This is expected and provides a useful internal consistency check.

SASA algorithm parameters

Parameter	Default	Description
`probe_radius_nm`	`0.14`	Shrake-Rupley probe radius in nm (standard water-sized probe)
`n_sphere_points`	`960`	Number of points on each atom’s test sphere (higher = more accurate, slower)
`chunk_size`	`100`	Frames processed per chunk (lower = less memory, slower)

The defaults are suitable for most enzyme-polymer systems. Increase n_sphere_points to 1500+ only if you need very high precision for small SASA differences.
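As a rough guide to what n_sphere_points buys you, the sketch below computes the surface area that a single test point represents on one atom. The helper name and the 0.17 nm atom radius are assumptions chosen for illustration; real atoms use their own radii.

```python
import math

def area_per_point(atom_radius_nm, probe_radius_nm=0.14, n_sphere_points=960):
    """Surface area (nm^2) represented by one test point on one atom's sphere."""
    reach = atom_radius_nm + probe_radius_nm
    return 4.0 * math.pi * reach ** 2 / n_sphere_points

for n in (960, 1500):
    quantum = area_per_point(0.17, n_sphere_points=n)
    print(f"n_sphere_points={n}: {quantum:.6f} nm^2 per point")
```

Each atom's contribution can only change in steps of this size, so more points matter mainly when the difference you want to resolve involves a small number of atoms.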

Step 2: The Full comparison.yaml

Here is a complete example for a CALB enzyme study:

name: "calb_sasa_study"
description: "SASA analysis for CALB with polymer conjugates"
control: "No Polymer"

conditions:
  - label: "No Polymer"
    config: "../noPoly_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]

  - label: "SBMA-100"
    config: "../SBMA_100_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]

  - label: "EGMA-100"
    config: "../EGMA_100_CALB_pNPB/config.yaml"
    replicates: [1, 2, 3]

defaults:
  equilibration_time: "10ns"

plugins:
  sasa:
    runs:
      - label: "protein_isolated"
        target_selection: "protein"
        context_selection: "protein"
      - label: "protein_with_polymer"
        target_selection: "protein"
        context_selection: "protein or chainID C"
      - label: "active_site_isolated"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein"
      - label: "active_site_with_polymer"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein or chainID C"
    probe_radius_nm: 0.14
    n_sphere_points: 960

plot_settings:
  format: "png"
  dpi: 300
  style: "publication"

Step 3: Run Locally

For small systems or interactive exploration, run the SASA analysis locally:

pixi run -e build polyzymd compare run sasa \
    -f comparison.yaml

This runs the full pipeline sequentially: compute_replicate for every replicate, aggregate for every condition, then compare and plot. Expect this to take several minutes per replicate depending on trajectory length and system size.

Note

SASA computation is CPU-intensive and memory-intensive because the Shrake-Rupley algorithm iterates over every atom in the context selection for every frame. The chunk_size parameter controls how many frames are loaded into memory at once.
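The effect of chunk_size can be pictured as splitting the frame range into blocks that are processed one at a time. The generator below is a sketch of that idea only; the plugin's actual chunking logic is not shown in this tutorial, and frame_chunks is a hypothetical name.

```python
def frame_chunks(n_frames, chunk_size=100):
    """Yield (start, stop) index pairs that cover n_frames in blocks."""
    for start in range(0, n_frames, chunk_size):
        yield start, min(start + chunk_size, n_frames)

chunks = list(frame_chunks(250, chunk_size=100))
print(chunks)  # [(0, 100), (100, 200), (200, 250)]
```

A smaller chunk_size produces more, shorter blocks, so fewer frames are held in memory at any moment.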

Step 4: Run on HPC

For large systems or many replicates, submit the analysis as SLURM jobs:

pixi run -e build polyzymd compare submit sasa \
    -f comparison.yaml \
    --partition aa100 \
    --mem 8G \
    --time 02:00:00

This submits a DAG of SLURM jobs that process replicates in parallel. See How To: Submit Analysis Jobs to a SLURM Cluster for the full HPC guide, including dry runs, monitoring, and troubleshooting.
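The dependency structure implied by the pipeline stages (per-replicate computation, per-condition aggregation, then one comparison) can be sketched with the standard library. The job names below are hypothetical labels, not the names the submit command actually uses.

```python
from graphlib import TopologicalSorter

# Conditions and replicates from the example comparison.yaml.
conditions = {"No Polymer": [1, 2, 3], "SBMA-100": [1, 2, 3], "EGMA-100": [1, 2, 3]}

# Map each job to the set of jobs it must wait for.
dag = {"compare": set()}
for label, replicates in conditions.items():
    aggregate_job = f"aggregate[{label}]"
    dag[aggregate_job] = {f"compute_replicate[{label}, rep {r}]" for r in replicates}
    dag["compare"].add(aggregate_job)

order = list(TopologicalSorter(dag).static_order())
print(len(order), "jobs; last:", order[-1])
```

The replicate jobs have no dependencies on one another, which is what allows a scheduler to run them in parallel.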

Tip

SASA jobs are marked with execution_cost_hint = "high" in the plugin. Allocate at least 8 GB of memory and 1–2 hours of wall time per replicate for systems with 50,000+ atoms.

Step 5: Interpret the Results

Comparison JSON

The comparison result is saved to comparison/sasa/result.json. It contains:

Per-condition summaries with mean SASA ± SEM for each run
Pairwise comparisons between conditions for each run (t-test, Cohen’s d, percent change)
ANOVA results for each run when three or more conditions are compared
Rankings of conditions by SASA for each run
Direction labels: "shielding" (SASA decreased >1%), "exposure" (SASA increased >1%), or "unchanged"
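The sketch below shows how the percent change, Cohen's d, and direction label relate to one another. It assumes the pooled-standard-deviation form of Cohen's d and a sign convention of treatment minus control; the plugin may compute these differently, and the replicate values are made up.

```python
import math
import statistics

def percent_change(control_mean, treatment_mean):
    """Relative change of the treatment mean with respect to the control."""
    return 100.0 * (treatment_mean - control_mean) / control_mean

def cohens_d(control, treatment):
    """Effect size using the pooled sample standard deviation."""
    n_a, n_b = len(control), len(treatment)
    pooled_var = ((n_a - 1) * statistics.variance(control)
                  + (n_b - 1) * statistics.variance(treatment)) / (n_a + n_b - 2)
    return (statistics.mean(treatment) - statistics.mean(control)) / math.sqrt(pooled_var)

def direction(pct, threshold=1.0):
    """Apply the 1% rule described in the list above."""
    if pct < -threshold:
        return "shielding"
    if pct > threshold:
        return "exposure"
    return "unchanged"

control = [10.0, 12.0, 14.0]   # hypothetical per-replicate mean SASA values
treatment = [7.0, 8.0, 9.0]
pct = percent_change(statistics.mean(control), statistics.mean(treatment))
print(round(pct, 1), round(cohens_d(control, treatment), 2), direction(pct))
# -33.3 -2.53 shielding
```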

Here is a simplified excerpt showing how to read the key fields:

{
  "run_labels": ["protein_isolated", "protein_with_polymer"],
  "conditions": [
    {
      "label": "No Polymer",
      "run_summaries": [
        {
          "label": "protein_isolated",
          "mean_sasa": 12450.3,
          "sem_sasa": 85.2
        },
        {
          "label": "protein_with_polymer",
          "mean_sasa": 12450.3,
          "sem_sasa": 85.2
        }
      ]
    },
    {
      "label": "SBMA-100",
      "run_summaries": [
        {
          "label": "protein_isolated",
          "mean_sasa": 12380.5,
          "sem_sasa": 91.0
        },
        {
          "label": "protein_with_polymer",
          "mean_sasa": 11200.7,
          "sem_sasa": 102.3
        }
      ]
    }
  ],
  "pairwise_comparisons": [
    {
      "run_label": "protein_with_polymer",
      "condition_a": "No Polymer",
      "condition_b": "SBMA-100",
      "p_value": 0.003,
      "cohens_d": -1.82,
      "direction": "shielding",
      "significant": true,
      "percent_change": -10.0
    }
  ]
}

How to read the results

Compare protein_isolated across conditions. This measures intrinsic protein compactness differences (without polymer effects). Small differences here indicate that the protein folds similarly across conditions.
Compare protein_with_polymer across conditions. The “No Polymer” condition serves as the baseline. Conditions with polymer show lower SASA in this run if the polymer shields the surface.
Calculate the shielding effect. For each polymer condition, subtract protein_with_polymer from protein_isolated. The larger the difference, the more surface the polymer covers.
Check active site runs. If active_site_with_polymer is significantly lower than active_site_isolated for a polymer condition, the polymer may be blocking substrate access — a concern for enzyme activity.
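The shielding calculation described above can be scripted against the JSON structure from the excerpt. This sketch uses an inline dictionary with illustrative values modeled on that excerpt; shielding_by_condition is a hypothetical helper, not part of the plugin.

```python
result = {
    "conditions": [
        {"label": "No Polymer", "run_summaries": [
            {"label": "protein_isolated", "mean_sasa": 12450.3},
            {"label": "protein_with_polymer", "mean_sasa": 12450.3}]},
        {"label": "SBMA-100", "run_summaries": [
            {"label": "protein_isolated", "mean_sasa": 12380.5},
            {"label": "protein_with_polymer", "mean_sasa": 11200.7}]},
    ]
}

def shielding_by_condition(result, isolated="protein_isolated",
                           shielded="protein_with_polymer"):
    """Return {condition: (covered_area, covered_percent)} for two run labels."""
    out = {}
    for condition in result["conditions"]:
        means = {run["label"]: run["mean_sasa"] for run in condition["run_summaries"]}
        covered = means[isolated] - means[shielded]
        out[condition["label"]] = (covered, 100.0 * covered / means[isolated])
    return out

for label, (covered, pct) in shielding_by_condition(result).items():
    print(f"{label}: {covered:.1f} covered ({pct:.1f}% of isolated SASA)")
```

To run it on real output, load comparison/sasa/result.json with json.load and pass the resulting dictionary instead. Passing the active-site run labels as the isolated and shielded arguments gives the same calculation for the active site.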

Plots

The SASA plugin generates three types of plots:

Plot	File pattern	What it shows
Comparison bars	`sasa_comparison_<run>.png`	Mean SASA ± SEM per condition, with replicate scatter points
Time series	`sasa_timeseries_<run>.png`	Per-frame SASA traces overlaid for each condition
Residue profiles	`sasa_profile_<run>.png`	Per-residue mean SASA across conditions

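The bar plots are the most informative for a quick assessment. Each file covers a single run, so place the protein_with_polymer plot next to the protein_isolated plot and look for conditions whose bar is clearly lower in the protein_with_polymer plot, by more than the SEM error bars. That drop is the polymer shielding signal.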

Common Configurations

Recipe collection (how-to mode)

The configurations below are task-oriented recipes rather than step-by-step tutorial content. Use them as starting points for your own SASA analysis.

Minimal: whole-protein SASA only

plugins:
  sasa:
    runs:
      - label: "protein_total"
        target_selection: "protein"

When context_selection is omitted, it defaults to the same value as target_selection. This measures the protein’s self-SASA without considering any other molecules.
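The defaulting rule can be expressed in a few lines. This is a sketch of the behavior described above, not the plugin's own code, and resolve_run is a hypothetical helper name.

```python
def resolve_run(run):
    """Return a copy of a run dict with context_selection filled in."""
    resolved = dict(run)
    # Fall back to the target selection when no context is given.
    resolved.setdefault("context_selection", resolved["target_selection"])
    return resolved

minimal = {"label": "protein_total", "target_selection": "protein"}
print(resolve_run(minimal))
```

An explicit context_selection is left untouched, so the minimal recipe is equivalent to the protein_isolated run from Step 1.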

Two-run: basic shielding comparison

plugins:
  sasa:
    runs:
      - label: "protein_only"
        target_selection: "protein"
        context_selection: "protein"
      - label: "protein_full_context"
        target_selection: "protein"
        context_selection: "all"

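Using "all" as the context includes every atom in the system (protein, polymer, substrate, and solvent). Because explicit solvent atoms then also act as blocking atoms, the reported values can be much lower than in the protein-only run and depend on how the solvent is arranged around the protein, which makes comparisons between conditions harder. If you want every solute molecule but not the solvent, spell out the selection instead, for example "protein or chainID B or chainID C".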

Monomer-specific shielding

If your polymer contains specific monomer types, you can test which monomer contributes more to shielding:

plugins:
  sasa:
    runs:
      - label: "protein_isolated"
        target_selection: "protein"
        context_selection: "protein"
      - label: "protein_with_sbma"
        target_selection: "protein"
        context_selection: "protein or resname SBMA"
      - label: "protein_with_egma"
        target_selection: "protein"
        context_selection: "protein or resname EGMA"

Warning

Ensure your monomer residue names (resname SBMA, resname EGMA) match the actual residue names in your topology. Check with:

pixi run -e build python -c "
import MDAnalysis as mda
u = mda.Universe('solvated_system.pdb')
print(set(u.select_atoms('chainID C').residues.resnames))
"

Active site focus with specific residues

plugins:
  sasa:
    runs:
      - label: "catalytic_triad"
        target_selection: "protein and (resid 77 or resid 156 or resid 262)"
        context_selection: "protein or chainID C"
      - label: "binding_pocket"
        target_selection: "protein and (resid 77 or resid 156 or resid 262 or resid 80 or resid 155)"
        context_selection: "protein or chainID C"

Stride for long trajectories

If your trajectory has many frames and SASA computation is slow, increase the stride to analyze every Nth frame:

plugins:
  sasa:
    runs:
      - label: "protein_with_polymer"
        target_selection: "protein"
        context_selection: "protein or chainID C"
        stride: 5

Note

A stride of 5 uses every 5th frame, reducing computation time by ~5x. The plugin still computes autocorrelation-corrected SEM on the subsampled data.
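For intuition about what an autocorrelation-corrected SEM does, here is one common approach: estimate the statistical inefficiency from the autocorrelation function, truncated at the first non-positive value, and divide the frame count by it. This is an assumption about the general technique; the plugin's exact estimator is not described in this tutorial, and the signal below is synthetic.

```python
import math

def corrected_sem(series):
    """SEM using an effective sample size from the autocorrelation function."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        return 0.0
    inefficiency = 1.0
    for lag in range(1, n):
        corr = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / ((n - lag) * var)
        if corr <= 0.0:
            break  # truncate at the first non-positive autocorrelation
        inefficiency += 2.0 * corr * (1.0 - lag / n)
    n_effective = n / inefficiency
    sample_var = var * n / (n - 1)
    return math.sqrt(sample_var / n_effective)

trajectory = [math.sin(i / 20.0) for i in range(1000)]  # slowly varying toy signal
strided = trajectory[::5]                                # stride: 5
print(len(strided), "frames used; corrected SEM:", corrected_sem(strided))
```

Striding removes frames, but neighbouring frames that remain can still be correlated, which is why a correction of this kind remains useful on subsampled data.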

Result Models

Reference material

This section is reference-style content for plugin developers and advanced users who need to inspect the data models programmatically.

The SASA result hierarchy is:

Model	Level	Key fields
`SASARunResult`	Per-replicate, per-run	`mean_sasa`, `std_sasa`, `sem_sasa`, `n_frames_used`, `n_target_atoms`
`SASAResult`	Per-replicate (all runs)	`run_results: list[SASARunResult]`
`SASARunAggregatedResult`	Per-condition, per-run	`overall_mean`, `overall_sem`, `per_replicate_means`, `per_residue_mean_sasa`
`SASAAggregatedResult`	Per-condition (all runs)	`run_results: list[SASARunAggregatedResult]`
`SASAComparisonResult`	Cross-condition	`pairwise_comparisons`, `ranking_by_run`, `anova_by_run`

Raw per-frame and per-residue SASA data is stored as NPZ sidecars alongside the JSON result files.
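The two per-replicate levels of the hierarchy can be pictured with stand-in dataclasses. The field names come from the table above; the types, the label field, and the run helper are assumptions, and the real models may be defined differently.

```python
from dataclasses import dataclass, field

@dataclass
class SASARunResult:
    label: str            # assumed field, used here to look runs up by name
    mean_sasa: float
    std_sasa: float
    sem_sasa: float
    n_frames_used: int
    n_target_atoms: int

@dataclass
class SASAResult:
    run_results: list[SASARunResult] = field(default_factory=list)

    def run(self, label):
        """Return the run result with the given label (hypothetical helper)."""
        for run_result in self.run_results:
            if run_result.label == label:
                return run_result
        raise KeyError(label)

# Made-up numbers, only to show how the pieces nest.
replicate = SASAResult(run_results=[
    SASARunResult("protein_isolated", 100.0, 5.0, 0.5, 200, 3000),
    SASARunResult("protein_with_polymer", 90.0, 6.0, 0.6, 200, 3000),
])
print(replicate.run("protein_with_polymer").mean_sasa)  # 90.0
```

The aggregated and comparison models follow the same nesting pattern one level up: one entry per run inside each condition, then cross-condition statistics keyed by run.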

What You Have Now

After following this tutorial, you have:

a multi-run SASA configuration that isolates the polymer shielding effect
comparison results with per-run statistical tests across conditions
bar, time series, and residue-profile plots for visual assessment
the understanding to design custom SASA run configurations for your system