Distance Analysis: Quick Start

Compute inter-atomic distances with proper statistical handling in under 5 minutes.

Note

Want to understand the statistics? This guide focuses on getting results quickly. For proper uncertainty quantification (autocorrelation correction, SEM vs. SD), see Statistics Best Practices for MD Analysis.

Environment Setup

All commands below assume you have activated the PolyzyMD pixi environment:

pixi shell -e build

Alternatively, prefix each command with pixi run -e build.

TL;DR

# Configure distance pairs in comparison.yaml, then run:
polyzymd compare run distances -f comparison.yaml --eq-time 10ns

# Run all enabled analyses in the same workflow
polyzymd compare run-all -f comparison.yaml --eq-time 10ns

# Force recompute and machine-readable output
polyzymd compare run distances -f comparison.yaml --eq-time 10ns --recompute --format json

Prerequisites

Before running distance analysis, you need:

  1. Completed production simulation(s): at least one replicate

  2. Comparison config: comparison.yaml with conditions and plugin settings

  3. Trajectory files: in the scratch directory specified in the config

Verify your setup:

# Check that trajectories exist
ls $(polyzymd info -c config.yaml --scratch-dir)/production_*/

What Distance Analysis Provides

The distance analysis module computes:

| Feature | Description |
|---|---|
| Mean distance | Average distance over the equilibrated portion of the trajectory |
| SEM | Autocorrelation-corrected standard error of the mean |
| Mode (KDE peak) | Most probable distance from kernel density estimation |
| Contact fraction | Percentage of frames below a distance threshold |
| Distribution | Full histogram and KDE for visualization |
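
To make these quantities concrete, here is a minimal standard-library sketch that computes a mean, an SEM, and a KDE mode from a toy distance series. It is not PolyzyMD's implementation: block averaging is swapped in as a simple stand-in for an autocorrelation correction, and the function names, bandwidth, block count, and data are all assumptions made for illustration.

```python
import math
import statistics


def block_sem(values, n_blocks=5):
    """SEM from block averages (a stand-in for an autocorrelation correction)."""
    block_len = len(values) // n_blocks
    if n_blocks < 2 or block_len < 1:
        raise ValueError("need at least two blocks with one frame each")
    block_means = [
        statistics.fmean(values[i * block_len:(i + 1) * block_len])
        for i in range(n_blocks)
    ]
    return statistics.stdev(block_means) / math.sqrt(n_blocks)


def kde_mode(values, bandwidth=0.1, grid_points=200):
    """Grid location of the highest point of a Gaussian kernel density estimate."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]

    def density(x):
        # Unnormalized sum of Gaussian kernels; normalization does not move the peak
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return max(grid, key=density)


# Hypothetical distance series: mostly near 3.0 with a short excursion near 5.0
distances = [3.0, 3.1, 2.9, 3.0, 3.2, 3.0, 2.9, 3.1, 5.0, 5.1, 3.0, 3.1]

mean_distance = statistics.fmean(distances)
sem = block_sem(distances, n_blocks=4)
mode = kde_mode(distances)
print(f"mean={mean_distance:.2f} sem={sem:.2f} mode={mode:.2f}")
```

In this toy series the excursion pulls the mean above the mode, which is one reason to look at both values.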

Tip

When to use distances vs. contacts vs. triad:

  • Distances: Specific atom pairs with continuous distance values

  • Contacts: All residue-residue contacts at an interface (binary count)

  • Triad: Pre-defined catalytic geometry with simultaneous contact analysis

Basic Usage

Define distance pairs in comparison.yaml:

# comparison.yaml
name: "distance_quickstart"
control: "no_polymer"

conditions:
  - label: "no_polymer"
    config: "configs/no_polymer.yaml"
    replicates: [1, 2, 3]
  - label: "with_polymer"
    config: "configs/with_polymer.yaml"
    replicates: [1, 2, 3]

plugins:
  distances:
    enabled: true
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"
      - label: "His156-Asp133"
        selection_a: "protein and resid 156 and name ND1"
        selection_b: "midpoint(protein and resid 133 and name OD1 OD2)"

Run analysis:

# Run distances only
polyzymd compare run distances -f comparison.yaml --eq-time 10ns

# Run all enabled analyses
polyzymd compare run-all -f comparison.yaml --eq-time 10ns

# Force recompute
polyzymd compare run distances -f comparison.yaml --eq-time 10ns --recompute

Expected behavior:

Loading comparison config from: comparison.yaml
Running plugin: distances
  Equilibration: 10ns
  Conditions: no_polymer, with_polymer
Distance comparison complete
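
The equilibration setting means statistics are computed over the equilibrated portion of each trajectory only. The following sketch illustrates that trimming idea, assuming frames are saved at a fixed interval starting at t = 0; the function name, the frame interval, and the choice to keep the frame exactly at the cutoff are hypothetical, and this is not PolyzyMD's code.

```python
import math


def equilibrated_frames(values, frame_interval_ns, eq_time_ns):
    """Drop frames recorded before eq_time_ns (frame 0 assumed at t = 0)."""
    first_kept = math.ceil(eq_time_ns / frame_interval_ns)
    return values[first_kept:]


# Hypothetical run: 10 frames saved every 2 ns (t = 0, 2, ..., 18 ns)
frame_values = list(range(10))
kept = equilibrated_frames(frame_values, frame_interval_ns=2, eq_time_ns=10)
print(kept)  # frames at t = 10 ns and later
```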

Run all enabled analyses:

polyzymd compare run-all -f comparison.yaml --eq-time 10ns

Add Contact-Style Thresholds

Set a threshold to report the fraction of frames below a cutoff (useful for hydrogen-bond-style geometry checks).

plugins:
  distances:
    enabled: true
    threshold: 3.5
    pairs:
      - label: "Ser77-His156"
        selection_a: "protein and resid 77 and name OG"
        selection_b: "protein and resid 156 and name NE2"

Run analysis:

polyzymd compare run distances -f comparison.yaml --eq-time 10ns
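
As a rough illustration of what a threshold-based contact fraction measures, the sketch below counts the share of frames whose distance falls below a cutoff. The function name and data are hypothetical, and whether the plugin uses a strict or inclusive comparison is an assumption here.

```python
def fraction_below(distances, threshold):
    """Fraction of frames with distance strictly below the threshold."""
    if not distances:
        raise ValueError("no frames to analyze")
    return sum(d < threshold for d in distances) / len(distances)


# Hypothetical per-frame distances, in the same units as the threshold
frame_distances = [2.8, 3.1, 3.4, 3.6, 4.2, 3.0, 3.9, 3.3]
contact_fraction = fraction_below(frame_distances, threshold=3.5)
print(f"{contact_fraction:.1%} of frames below threshold")
```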

Write Robust Selections

PolyzyMD supports standard MDAnalysis selections plus helper syntax like midpoint(...), com(...), and pdbindex N.
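
The difference between a midpoint and a center of mass can be sketched in plain Python. This assumes midpoint(...) means the unweighted geometric center of the selected atoms and com(...) means the mass-weighted center; the function names, coordinates, and rounded masses below are illustrative and not taken from PolyzyMD.

```python
import math


def midpoint(positions):
    """Unweighted geometric center of a set of 3D points."""
    n = len(positions)
    return tuple(sum(p[i] for p in positions) / n for i in range(3))


def center_of_mass(positions, masses):
    """Mass-weighted center of a set of 3D points."""
    total = sum(masses)
    return tuple(
        sum(m * p[i] for p, m in zip(positions, masses)) / total
        for i in range(3)
    )


# Hypothetical coordinates for two carboxylate oxygens and a partner atom
od1, od2 = (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)
nd1 = (2.0, 3.0, 0.0)

carboxylate_mid = midpoint([od1, od2])
distance_to_mid = math.dist(nd1, carboxylate_mid)

# Carbon (mass ~12) at the origin, oxygen (mass ~16) at x = 2.8
com = center_of_mass([(0.0, 0.0, 0.0), (2.8, 0.0, 0.0)], [12.0, 16.0])
print(carboxylate_mid, distance_to_mid, com)
```

Measuring to the midpoint of two equivalent oxygens avoids having to pick one of them, so a flip of the carboxylate group does not change the measured distance.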

Warning

Chain-aware selections are required

Residue numbers restart by chain in PolyzyMD systems. A selection like resid 141-148 can match multiple chains.

For protein residues, include protein and ...:

# Incorrect
selection_a: "com(resid 141-148)"

# Correct
selection_a: "com(protein and resid 141-148)"
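
The ambiguity can be reproduced with a toy atom list. This is a plain-Python filter, not the MDAnalysis selection engine, and the Atom fields and "polymer" label are hypothetical; it only shows why a bare residue range can pick up atoms from more than one chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    segment: str  # hypothetical label: "protein" or "polymer"
    resid: int
    name: str


atoms = [
    Atom("protein", 141, "CA"),
    Atom("protein", 142, "CA"),
    Atom("polymer", 141, "C1"),  # same residue number on a different chain
    Atom("polymer", 142, "C1"),
    Atom("protein", 77, "OG"),
]


def select(atoms, resid_range, segment=None):
    """Keep atoms in an inclusive residue range, optionally limited to a segment."""
    lo, hi = resid_range
    return [
        a for a in atoms
        if lo <= a.resid <= hi and (segment is None or a.segment == segment)
    ]


ambiguous = select(atoms, (141, 148))                    # like "resid 141-148"
specific = select(atoms, (141, 148), segment="protein")  # like "protein and resid 141-148"
print(len(ambiguous), len(specific))
```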

Common patterns:

```yaml
# Midpoint of Asp carboxylate oxygens
selection_a: "midpoint(protein and resid 133 and name OD1 OD2)"

# Center of mass of ligand
selection_b: "com(resname LIG)"

# Single atom
selection_a: "protein and resid 77 and name OG"

# Atom by PDB serial number
selection_a: "pdbindex 2740"
```

Use PBC and Alignment Defaults

Distance analysis uses PBC-aware distances (use_pbc: true) and trajectory alignment (align_trajectory: true) by default.

These defaults reduce artifacts from periodic wrapping and global rigid-body motion, so measured distances reflect local geometry.
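
To see why periodic wrapping matters, the sketch below compares a naive distance with a minimum-image distance. It assumes an orthorhombic box and uses hypothetical coordinates; it illustrates the general minimum-image convention rather than how PolyzyMD computes distances internally.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between two points under the minimum-image convention.

    Assumes an orthorhombic box given as (lx, ly, lz).
    """
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        delta = bi - ai
        # Shift the separation by whole box lengths to the nearest periodic image
        delta -= length * round(delta / length)
        total += delta * delta
    return math.sqrt(total)


# Two atoms sitting on opposite sides of a 50-unit periodic box
atom_a = (1.0, 5.0, 5.0)
atom_b = (49.0, 5.0, 5.0)
box = (50.0, 50.0, 50.0)

naive = math.dist(atom_a, atom_b)
wrapped = minimum_image_distance(atom_a, atom_b, box)
print(naive, wrapped)
```

The naive value spans nearly the whole box, while the minimum-image value reflects that the two atoms are close neighbors across the boundary.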

If you need to override either behavior, see Distances Plugin Reference.

Compare Distances Across Conditions

Use the same command after defining conditions and pairs in comparison.yaml:

polyzymd compare run distances -f comparison.yaml --eq-time 10ns

This provides:

  • Pair-level summaries across conditions

  • Ranking by mean distance (primary) and fraction below threshold (secondary)

  • Statistical tests (t-tests, effect sizes, ANOVA)
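
For intuition about the comparison statistics, here is a small sketch that computes a Welch t statistic and Cohen's d from per-replicate mean distances. The replicate values are made up, and the choice of Welch's variant and a pooled-SD effect size is an assumption; the Distances Plugin Reference is the authority on what PolyzyMD actually reports.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


def cohens_d(a, b):
    """Cohen's d effect size using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


# Hypothetical per-replicate mean distances for one pair in two conditions
no_polymer = [3.0, 3.2, 3.1]
with_polymer = [3.6, 3.8, 3.7]

t_stat = welch_t(no_polymer, with_polymer)
effect = cohens_d(no_polymer, with_polymer)
print(f"t={t_stat:.2f} d={effect:.2f}")
```

With only three replicates per condition, effect sizes are often more informative than p-values alone.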

For broader multi-plugin workflows, see How to Compare Simulation Conditions.

Reference and Troubleshooting

For full field tables, output JSON schemas, plot types, CLI options, and troubleshooting fixes, see Distances Plugin Reference.

Next Steps