Help! My Simulation Shows a Ball of Bonds Instead of Reasonable Looking Molecules!

A debugging case study from the PolyzyMD development team (January 2026)

You’ve waited hours (or days) for your MD simulation to finish. You excitedly load the trajectory in PyMOL or VMD, and… chaos. Bonds stretch across the entire simulation box. Molecules look like abstract art. Your beautiful enzyme-polymer system looks like someone threw spaghetti at the screen.

If this sounds familiar, you’re not alone. This document chronicles our journey debugging this exact issue in PolyzyMD, and we hope it saves future scientists the hours of head-scratching we went through.

TL;DR (The Quick Fix)

If you’re in a hurry, here’s the essence:

Symptom: Some molecules look broken (bonds span the box), but others look fine
Root Cause: The atom order in your DCD trajectory doesn’t match the atom order in your topology file (PDB)
Why it happens: Any code that processes molecules by type (grouping waters together, ions together, etc.) can reorder atoms
Solution: Ensure the code that creates your OpenMM system processes molecules in the SAME order as the topology file written for visualization

Key diagnostic: If only SOME molecules are broken (especially water/ions) while others (protein, ligand) look fine, you almost certainly have an atom order mismatch.

The Problem We Encountered

What We Saw

After running a production MD simulation with PolyzyMD, we loaded the trajectory in PyMOL:

PyMOL> load solvated_system.pdb
PyMOL> load_traj trajectory.dcd, solvated_system

Instead of a nice solvated enzyme system, we saw:

Bonds spanning the entire simulation box - hundreds of angstroms long
Molecules that looked “exploded” - atoms scattered everywhere
A tangled mess that was completely unanalyzable

The Crucial Observation

Here’s what made this bug particularly insidious: not everything was broken.

The protein looked perfect - proper secondary structure, reasonable motion
The substrate/ligand looked fine - stayed near the active site
The polymers looked correct - maintained their chain structure

But:

Water molecules were completely scrambled
Ions appeared in nonsensical positions
The entire solvent shell was chaos

This selective breakage was the key clue that ultimately led us to the solution.

Why This Matters

Broken trajectories aren’t just ugly - they’re scientifically dangerous:

Analysis becomes meaningless - RMSD, RMSF, distances are all wrong
Subtle bugs might go unnoticed - if you’re only looking at the protein, you might miss the problem
Wasted compute time - hours/days of simulation that can’t be used
Potential for incorrect conclusions - if you don’t notice the problem

Background: How DCD Trajectories Work

Before diving into our debugging journey, let’s understand the fundamentals. This knowledge is essential for diagnosing trajectory problems.

The DCD File Format

DCD (CHARMM/NAMD trajectory format) is one of the most common trajectory formats in MD. Here’s what’s inside:

DCD File Contents:
├── Header
│   ├── Number of atoms
│   ├── Number of frames
│   └── Timestep information
└── Frames (repeated for each saved frame)
    ├── Box vectors (if periodic)
    └── Coordinates
        ├── X coordinates for ALL atoms (atom 0, atom 1, ..., atom N-1)
        ├── Y coordinates for ALL atoms
        └── Z coordinates for ALL atoms

What DCD files contain:

Atomic coordinates (X, Y, Z for each atom)
Box dimensions (for periodic systems)
Frame count and timing information

What DCD files do NOT contain:

Atom names
Residue names or numbers
Chain identifiers
Bond information
Element types
Charges
Any topology information whatsoever

The Critical Assumption

When you load a DCD file with a topology file (PDB, PSF, etc.), the visualization software makes a simple assumption:

Atom 0 in the DCD corresponds to Atom 0 in the topology. Atom 1 in the DCD corresponds to Atom 1 in the topology. And so on…

There is no other way to map coordinates to atoms. The DCD format provides no atom identifiers - just coordinates indexed by position.

A Simple Analogy

Imagine you have:

A guest list (topology): “Seat 1: Alice, Seat 2: Bob, Seat 3: Carol…”
A seating chart update (DCD frame): “Seat 1 is at position (x1,y1), Seat 2 is at (x2,y2)…”

If your seating chart was made with a different guest list order, you’d have Alice sitting where Bob should be, Bob where Carol should be, etc. The party would be chaos.

That’s exactly what happens with mismatched atom order.

Why DCD Works This Way

This design isn’t a flaw - it’s a feature:

Efficiency: No redundant topology data in every frame
Speed: Sequential coordinate arrays are fast to read/write
Compatibility: Works with any simulation package
Size: Trajectories stay small (coordinates only)

The tradeoff is that you are responsible for ensuring atom order consistency.

Our Debugging Journey

Here’s where we share our wrong turns, so you can avoid them.

Attempt 1: “It Must Be PBC Wrapping!”

Initial Hypothesis: Periodic boundary conditions (PBC) are causing molecules to be split across box edges. The visualization software is drawing bonds across the box instead of wrapping them.

Why this seemed reasonable:

PBC wrapping is a common source of visualization artifacts
Molecules near box edges DO get split across boundaries
Many trajectory post-processing discussions mention this

What we tried:

# In the simulation runner, we tried various combinations:
state = simulation.context.getState(
    getPositions=True,
    enforcePeriodicBox=True   # Try to keep molecules whole
)

# Later tried:
enforcePeriodicBox=False  # Maybe this was the problem?

We also implemented custom unwrapping algorithms, trying to “make molecules whole” after the fact.

Why it didn’t work:

PBC wrapping causes molecules to appear split at box boundaries, but the atoms themselves are still in the correct relative positions. If you zoom in on a “split” water molecule due to PBC, you’d see:

One hydrogen on the left edge of the box
The oxygen and other hydrogen on the right edge
But the H-O distance would still be ~1 Å (correct)

Our problem was different: atoms were in completely wrong positions relative to each other. A water molecule might have one hydrogen near the protein and another hydrogen 50 Å away in the bulk solvent.

The clue we missed: If this were PBC wrapping, ALL molecules would be equally affected. But our protein looked perfect. Only the water/ions were scrambled.

Attempt 2: “Maybe It’s the DCD Reporter?”

Second Hypothesis: The OpenMM DCDReporter is writing coordinates incorrectly, perhaps with some buffering or ordering issue.

What we tried:

Examining the reporter configuration
Checking frame-by-frame coordinate outputs
Comparing DCD output to PDB snapshots

Why it didn’t work:

The DCDReporter was working correctly. When we extracted coordinates for a specific atom index from the DCD and from a PDB snapshot at the same timestep, they matched. The coordinates were being written correctly.

The problem wasn’t that coordinates were wrong - it was that we were interpreting them with the wrong atom mapping.

Attempt 3: “Is It the Topology File?”

Third Hypothesis: Maybe the PDB topology file we’re using for visualization is corrupted or wrong.

What we tried:

Regenerating the topology file
Checking atom counts (they matched!)
Visual inspection of the PDB

What we found:

The topology file was fine. Atom count matched the DCD perfectly. The PDB looked correct when loaded by itself.

The key realization: Both files were internally consistent. The problem was that they were consistent with DIFFERENT atom orderings.

The Breakthrough: Pattern Recognition

We finally sat down and carefully observed which molecules were broken:

Molecule Type	Appearance	Position in Old Processing Order
Protein	Perfect	1st
Substrate	Perfect	2nd
Polymers	Perfect	3rd
Water	Broken	LAST (grouped)
Ions	Broken	LAST (grouped)

The pattern: Everything that was processed early (in its “natural” order) looked fine. Everything that was grouped and processed at the end was broken.

This led us to examine our Interchange creation code…

Root Cause: Atom Order Mismatch

How OpenFF Interchange Works

OpenFF Interchange is a powerful tool that converts OpenFF topology objects into OpenMM systems. The key operation for our story is Interchange.combine():

# Combining two interchanges appends atoms
combined = interchange_A.combine(interchange_B)

# Result atom order:
# [all atoms from A] + [all atoms from B]

The order of combination determines the final atom order.

The Old Algorithm (Broken)

Our original _create_interchange_batched() method was designed for efficiency. It grouped molecules by type to minimize redundant parameterization:

# OLD CODE (simplified) - THE PROBLEM

# Group molecules by SMILES (molecule type)
smiles_groups = defaultdict(list)
for molecule in solvated_topology.molecules:
    smiles = molecule.to_smiles()
    if smiles not in water_ion_smiles:
        smiles_groups[smiles].append(molecule)

# Create interchange for each molecule type
for smiles, molecules in smiles_groups.items():
    inc = ff.create_interchange(topology=molecules)
    all_interchanges.append(inc)

# Process water + ions LAST as a single batch
water_ion_mols = [m for m in topology if m.to_smiles() in water_ion_smiles]
water_ion_inc = ff.create_interchange(topology=water_ion_mols)
all_interchanges.append(water_ion_inc)  # <-- Always at the end!

# Combine all
combined = all_interchanges[0]
for inc in all_interchanges[1:]:
    combined = combined.combine(inc)

This produced an atom order like:

[protein atoms] [substrate atoms] [polymer atoms] [ALL water atoms] [ALL ion atoms]

The Topology File Order

But solvated_system.pdb was written directly from the OpenFF topology BEFORE this reordering:

# How the PDB was written
solvated_topology.to_file("solvated_system.pdb")

The OpenFF solvated topology had molecules in their original packing order:

[protein] [substrate] [polymer1] [water1] [Na+] [polymer2] [water2] [Cl-] [water3] ...

Water and ions were interspersed throughout, not grouped at the end!

The Mismatch Visualized

PDB Topology (used for visualization):
Index:  0    1    2    3    4    5    6    7    8    9    ...
Atom:   N    CA   C    O    ...  OW   HW1  HW2  NA+  OW   ...
        └─protein─┘              └water┘     └ion┘ └water...

DCD Trajectory (after our processing):  
Index:  0    1    2    3    4    5    6    7    8    9    ...
Atom:   N    CA   C    O    ...  OW   OW   OW   OW   OW   ...
        └─protein─┘              └──ALL water grouped──┘

When PyMOL loads both:
- DCD atom 5 (a water oxygen) → mapped to PDB atom 5 (also water) ✓
- DCD atom 6 (a water oxygen) → mapped to PDB atom 6 (water hydrogen!) ✗
- DCD atom 7 (a water oxygen) → mapped to PDB atom 7 (water hydrogen!) ✗
- DCD atom 8 (a water oxygen) → mapped to PDB atom 8 (sodium ion!) ✗

Result: Complete chaos in the solvent

The protein matched because it was first in both orderings. The solvent was scrambled because it was reordered.

The Solution

New Algorithm: Preserve Order

The fix was conceptually simple: process molecules in their original order, only batching consecutive molecules of the same type:

# NEW CODE (simplified) - THE FIX

current_smiles = None
current_batch = []

def flush_batch():
    """Create interchange for current batch."""
    if current_batch:
        inc = ff.create_interchange(topology=current_batch)
        all_interchanges.append(inc)
    current_batch.clear()

# Process molecules in ORIGINAL ORDER
for molecule in solvated_topology.molecules:  # Original iteration order!
    mol_smiles = molecule.to_smiles()
    
    if mol_smiles != current_smiles:
        # Different molecule type - flush previous batch
        flush_batch()
        current_smiles = mol_smiles
    
    current_batch.append(molecule)

flush_batch()  # Don't forget the last batch

# Combine all (now in correct order!)
combined = all_interchanges[0]
for inc in all_interchanges[1:]:
    combined = combined.combine(inc)

This preserves the exact molecule order while still batching consecutive same-type molecules for efficiency.

Key Principle

The atom order in your trajectory MUST match the atom order in your topology file.

Any code that manipulates molecular systems must preserve this order, or must regenerate the topology file to match the new order.

Why Batching Consecutive Molecules Works

If the original order is:

[protein] [water1] [water2] [water3] [Na+] [water4] [water5] [Cl-]

The new algorithm produces batches:

Batch 1: [protein]           → Interchange 1
Batch 2: [water1,2,3]        → Interchange 2  (consecutive waters)
Batch 3: [Na+]               → Interchange 3
Batch 4: [water4,5]          → Interchange 4  (consecutive waters)
Batch 5: [Cl-]               → Interchange 5

Combined order: [protein][water1,2,3][Na+][water4,5][Cl-] - matches original!

Lessons for Future Developers

1. DCD Files Are “Dumb”

Never forget that DCD files contain no topology information. They rely entirely on external files for atom identification. This means:

The DCD doesn’t know atom names, elements, or residues
Coordinates are purely positional (atom 0, atom 1, …)
Any mismatch with the topology file causes silent corruption

2. Diagnosis Checklist

When you see broken molecules in a trajectory, ask yourself:

Are ALL molecules broken, or just some?
- All broken → probably PBC wrapping
- Some broken → probably atom order mismatch
Which molecules are affected?
- Grouped at end of processing → likely reordering issue
- Random distribution → look elsewhere

Do atom counts match?

import MDAnalysis as mda
u = mda.Universe('topology.pdb', 'trajectory.dcd')
print(f"PDB atoms: {len(u.atoms)}")
print(f"DCD atoms: {u.trajectory.n_atoms}")
# These MUST match

Is there code that groups molecules by type?
- Dictionary grouping by SMILES/residue name
- Separate processing of water/ions
- Any sorting or reordering operations

3. Quick Verification Script

Use this script to sanity-check your trajectory:

import MDAnalysis as mda

def verify_trajectory(topology_path, trajectory_path):
    """Verify trajectory is compatible with topology."""
    u = mda.Universe(topology_path, trajectory_path)
    
    # Check 1: Atom counts match
    topo_atoms = len(u.atoms)
    traj_atoms = u.trajectory.n_atoms
    print(f"Topology atoms: {topo_atoms}")
    print(f"Trajectory atoms: {traj_atoms}")
    
    if topo_atoms != traj_atoms:
        print("ERROR: Atom count mismatch!")
        return False
    
    # Check 2: Sample some atoms and their types
    print("\nFirst 10 atoms in topology:")
    for atom in u.atoms[:10]:
        print(f"  {atom.index}: {atom.name} ({atom.resname} {atom.resid})")
    
    # Check 3: Look at water molecules specifically
    waters = u.select_atoms("resname HOH or resname WAT or resname TIP3")
    if len(waters) > 0:
        print(f"\nFound {len(waters)} water atoms")
        print("First water molecule atoms:")
        first_water = u.select_atoms(f"resid {waters[0].resid}")
        for atom in first_water:
            print(f"  {atom.index}: {atom.name}")
    
    print("\nVisualize in PyMOL and check if molecules look whole!")
    return True

# Usage:
verify_trajectory('solvated_system.pdb', 'trajectory.dcd')

4. Prevention Strategies

Test early: Run a short simulation and check visualization BEFORE committing to long runs
Document atom order assumptions: If your code manipulates molecule order, document it clearly
Write topology AFTER processing: If you must reorder atoms, write the topology file from the same data structure that creates the simulation
Use consistent iteration: When iterating over molecules multiple times, ensure the order is deterministic

How to Verify the Fix Worked

After applying a fix for atom order mismatch:

Visual Verification

Load your files in PyMOL:

load solvated_system.pdb
load_traj trajectory.dcd, solvated_system

Check the protein: Should maintain secondary structure throughout
Check water molecules: Zoom in on individual waters
- Should see proper H-O-H geometry
- Bond lengths should be ~1 Å (not 50 Å!)
Check ions: Should appear as single atoms, not bonded to distant atoms
Scrub through frames: Problems often become more apparent with motion

Programmatic Verification

import MDAnalysis as mda

u = mda.Universe('solvated_system.pdb', 'trajectory.dcd')

# Check water geometry in first frame
waters = u.select_atoms("resname HOH or resname WAT")
for ts in u.trajectory[:1]:  # First frame
    # Get first water residue
    first_water_resid = waters[0].resid
    water_atoms = u.select_atoms(f"resid {first_water_resid}")
    
    if len(water_atoms) == 3:
        positions = water_atoms.positions
        # Calculate O-H distances
        oh1 = np.linalg.norm(positions[0] - positions[1])
        oh2 = np.linalg.norm(positions[0] - positions[2])
        
        print(f"Water O-H distances: {oh1:.2f} Å, {oh2:.2f} Å")
        
        if oh1 > 2.0 or oh2 > 2.0:
            print("WARNING: Water geometry looks wrong!")
        else:
            print("Water geometry looks correct!")

Appendix: Technical Details

Relevant Code Location

File: src/polyzymd/builders/system_builder.py

Method: _create_interchange_batched() (around line 500)

Git commit: 2047647 - “Fix atom order mismatch between solvated_system.pdb and Interchange”

The Actual Fix (Abbreviated)

def _create_interchange_batched(self, ff: ForceField, water_mol: Molecule) -> Interchange:
    """Create Interchange using batched molecule processing with preserved order.
    
    This implementation batches molecules by type for efficiency while preserving
    the exact molecule order from self._solvated_topology. This is critical for
    DCD trajectory compatibility - the atom order in the Interchange must match
    the atom order in solvated_system.pdb.
    """
    # ... setup code ...
    
    current_smiles = None
    current_batch: List[Molecule] = []

    def flush_batch():
        """Create interchange for current batch and add to list."""
        nonlocal current_batch, current_smiles
        if not current_batch:
            return
        # ... create interchange for batch ...
        current_batch = []

    # Process molecules in ORIGINAL ORDER
    for molecule in self._solvated_topology.molecules:
        mol_smiles = molecule.to_smiles()

        if mol_smiles != current_smiles:
            flush_batch()  # Different type - flush previous batch
            current_smiles = mol_smiles

        current_batch.append(molecule)

    flush_batch()  # Don't forget last batch
    
    # ... combine interchanges ...

When This Problem Might Appear Elsewhere

Watch out for atom order issues when:

Using any molecule grouping/batching code - efficiency optimizations often group by type
Converting between file formats - some formats sort atoms by element or residue
Adding/removing atoms mid-simulation - ensure all files are regenerated
Combining systems from different sources - verify consistent ordering
Using topology manipulation tools - some tools sort or reorder atoms

Last updated: January 2026

Found this helpful? Found an error? Open an issue at https://github.com/joelaforet/polyzymd/issues

Help! My Simulation Shows a Ball of Bonds Instead of Reasonable Looking Molecules!

TL;DR (The Quick Fix)

The Problem We Encountered

What We Saw

The Crucial Observation

Why This Matters

Background: How DCD Trajectories Work

The DCD File Format

The Critical Assumption

A Simple Analogy

Why DCD Works This Way

Our Debugging Journey

Attempt 1: “It Must Be PBC Wrapping!”

Attempt 2: “Maybe It’s the DCD Reporter?”

Attempt 3: “Is It the Topology File?”

The Breakthrough: Pattern Recognition

Root Cause: Atom Order Mismatch

How OpenFF Interchange Works

The Old Algorithm (Broken)

The Topology File Order

The Mismatch Visualized

The Solution

New Algorithm: Preserve Order

Key Principle

Why Batching Consecutive Molecules Works

Lessons for Future Developers

1. DCD Files Are “Dumb”

2. Diagnosis Checklist

3. Quick Verification Script

4. Prevention Strategies

How to Verify the Fix Worked

Visual Verification

Programmatic Verification

Appendix: Technical Details

Relevant Code Location

The Actual Fix (Abbreviated)

Related Resources

When This Problem Might Appear Elsewhere