Configuration Reference
This document describes all configuration options for PolyzyMD YAML files.
Configuration Structure
A complete configuration file has these sections:
name: "simulation_name"
description: "optional description"
enzyme: { ... } # Required
substrate: { ... } # Optional (null for apo)
polymers: { ... } # Optional (null to disable)
solvent: { ... } # Required
restraints: [ ... ] # Optional
thermodynamics: { ... } # Required
simulation_phases: { ... } # Required
output: { ... } # Required
force_field: { ... } # Optional (has defaults)
Enzyme Configuration
enzyme:
  name: "LipA" # Identifier (required)
  pdb_path: "structures/enzyme.pdb" # Path to PDB file (required)
  description: "Bacillus subtilis Lipase A" # Optional description
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Short identifier for the enzyme |
| pdb_path | path | Yes | Path to prepared PDB file |
| description | string | No | Human-readable description |
Substrate Configuration
substrate:
  name: "Resorufin-Butyrate" # Identifier (required)
  sdf_path: "structures/substrate.sdf" # Path to SDF file (required)
  conformer_index: 0 # Which conformer to use (default: 0)
  charge_method: "nagl" # Charge assignment method
  residue_name: "LIG" # 3-letter residue name
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Substrate identifier |
| sdf_path | path | Yes | - | Path to SDF with docked conformers |
| conformer_index | int | No | 0 | Index of conformer to use (0-indexed) |
| charge_method | string | No | "nagl" | Options: nagl, espaloma, am1bcc |
| residue_name | string | No | "LIG" | 3-letter code for topology |
Charge Methods
| Method | Description | Speed |
|---|---|---|
| nagl | Graph neural network charges | Fast |
| espaloma | Machine learning charges | Medium |
| am1bcc | Semi-empirical QM charges | Slow |
No Substrate (Apo Simulation)
substrate: null
Polymer Configuration
PolyzyMD supports two modes for polymer generation: cached (load from pre-built SDF files) and dynamic (generate on-the-fly from SMILES).
Tip
For a complete guide on dynamic polymer generation, see Dynamic Polymer Generation.
Basic Configuration (Cached Mode)
polymers:
  enabled: true # Enable/disable polymers
  type_prefix: "SBMA-EGPMA" # Polymer type identifier
  monomers: # Monomer definitions
    - label: "A" # Single character label
      probability: 0.98 # Selection probability (0-1)
      name: "SBMA" # Full name (optional)
    - label: "B"
      probability: 0.02
      name: "EGPMA"
  length: 5 # Monomers per chain
  count: 2 # Number of polymer chains
  sdf_directory: null # Pre-built polymer SDFs (optional)
  cache_directory: ".polymer_cache" # Cache for generated polymers
Dynamic Generation Configuration
To generate polymers on-the-fly from monomer SMILES (without pre-built SDF files):
polymers:
  enabled: true
  generation_mode: "dynamic" # Enable dynamic generation
  type_prefix: "SBMA-EGPMA"
  # ATRP reaction templates (use bundled defaults or custom paths)
  reactions:
    initiation: "default" # or "/path/to/custom.rxn"
    polymerization: "default"
    termination: "default"
  monomers:
    - label: "A"
      probability: 0.7
      name: "SBMA"
      smiles: "[H]C([H])=C(C(=O)OC...)..." # Required for dynamic mode
      residue_name: "SBM" # Optional 3-letter residue name
    - label: "B"
      probability: 0.3
      name: "EGPMA"
      smiles: "[H]C([H])=C(C(=O)OC...)..."
      residue_name: "EGM"
  length: 5
  count: 2
  charger: "nagl" # Charge method: nagl, espaloma, am1bcc
  max_retries: 10 # Retries for ring-piercing detection
  cache_directory: ".polymer_cache"
All Polymer Options
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | true | Enable polymer addition |
| generation_mode | string | No | "cached" | "cached" or "dynamic" |
| type_prefix | string | Yes | - | Identifier for polymer type |
| monomers | list | Yes | - | Monomer specifications |
| length | int | Yes | - | Chain length (number of monomers) |
| count | int | Yes | - | Number of chains to add |
| sdf_directory | path | No | null | Directory with pre-built polymer SDFs |
| cache_directory | path | No | ".polymer_cache" | Cache directory |
| reactions | object | No | all "default" | ATRP reaction templates (dynamic mode) |
| charger | string | No | "nagl" | Charge method for dynamic generation |
| max_retries | int | No | 10 | Max attempts for ring-piercing avoidance |
Monomer Specification
| Field | Type | Required | Description |
|---|---|---|---|
| label | string | Yes | Single character (A, B, C…) |
| probability | float | Yes | Selection probability (values across all monomers must sum to 1.0) |
| name | string | No | Full monomer name |
| smiles | string | Dynamic only | Raw monomer SMILES (with C=C double bond) |
| residue_name | string | No | 3-letter residue code for topology |
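The probability field can be read as a per-position sampling weight. The sketch below is only an illustration of that reading, not PolyzyMD's own sampler: the helper name sample_sequence and the tolerance used for the sum check are assumptions made here.

```python
import math
import random


def sample_sequence(monomers, length, seed=None):
    """Draw a monomer sequence such as 'AABAA' from label/probability pairs.

    Hypothetical helper for illustration; PolyzyMD's sampler may differ.
    """
    labels = [m["label"] for m in monomers]
    weights = [m["probability"] for m in monomers]
    # Probabilities across all monomers must sum to 1.0.
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-6):
        raise ValueError(f"probabilities sum to {sum(weights)}, expected 1.0")
    rng = random.Random(seed)  # seeded for reproducible sequences
    return "".join(rng.choices(labels, weights=weights, k=length))


monomers = [
    {"label": "A", "probability": 0.98},
    {"label": "B", "probability": 0.02},
]
sequence = sample_sequence(monomers, length=5, seed=0)
print(sequence)
```

With these weights most chains of length 5 consist only of "A"; a "B" appears in roughly one chain out of ten.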
Charge Methods for Dynamic Generation
| Method | Description | Speed | Accuracy |
|---|---|---|---|
| nagl | Graph neural network charges | Fast | Good |
| espaloma | Machine learning charges | Medium | Good |
| am1bcc | Semi-empirical QM charges | Slow | Best |
No Polymers
polymers: null
# or
polymers:
  enabled: false
Solvent Configuration
solvent:
  primary:
    type: "water"
    model: "tip3p" # Water model
  co_solvents: [] # List of co-solvents (optional)
  ions:
    neutralize: true # Add counter-ions
    nacl_concentration: 0.15 # NaCl concentration (M)
  box:
    padding: 1.2 # nm from solute to box edge
    shape: "rhombic_dodecahedron" # Box shape
    target_density: 1.0 # g/mL
    tolerance: 2.0 # PACKMOL tolerance (Angstrom)
Water Models
| Model | Description |
|---|---|
| tip3p | TIP3P (default, fast) |
| | SPC/E |
| | TIP4P-Ew |
| | OPC (accurate, slower) |
Box Shapes
| Shape | Description |
|---|---|
| | Cubic box |
| | Space-efficient (default) |
| | Alternative space-efficient |
Co-solvents
PolyzyMD supports adding co-solvents to your simulation system. You can specify co-solvents using either volume fraction (v/v) or molar concentration.
Specification Methods
| Method | Field | Description | Effect on Water |
|---|---|---|---|
| Volume Fraction | volume_fraction | Fraction of box volume (0-1) | Reduces water proportionally |
| Concentration | concentration | Molar concentration (mol/L) | Additive (water unchanged) |
Important: Use exactly ONE method per co-solvent. Do not specify both volume_fraction and concentration for the same co-solvent.
Volume Fraction Method
Use this when you want a specific percentage of the solvent to be the co-solvent (e.g., “30% DMSO”).
co_solvents:
  - name: "dmso"
    volume_fraction: 0.30 # 30% v/v DMSO
Formula:
n = (V_box × phi × rho × N_A) / M
Where:
n = number of co-solvent molecules
V_box = simulation box volume (in mL here, so that V_box × rho is a mass in grams; 1 L = 1000 mL)
phi = volume fraction (e.g., 0.30 for 30%)
rho = co-solvent density (g/mL)
M = molar mass (g/mol)
N_A = Avogadro's number (implicit in OpenMM)
Source: src/polyzymd/builders/solvent.py:267-287
The water count is reduced proportionally: if you specify 30% DMSO, water fills the remaining 70% of the box.
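As a worked example of the volume-fraction calculation, the sketch below writes out the unit conversion (L to mL) and Avogadro's number explicitly. It is an illustration only, not the code in solvent.py: the function name and the 8 nm cubic box are assumptions, and DMSO's molar mass is taken as 78.13 g/mol.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def count_from_volume_fraction(box_volume_l, phi, density_g_per_ml, molar_mass):
    """Number of co-solvent molecules occupying a fraction phi of the box."""
    cosolvent_volume_ml = box_volume_l * 1000.0 * phi  # L -> mL
    mass_g = cosolvent_volume_ml * density_g_per_ml
    moles = mass_g / molar_mass
    return round(moles * AVOGADRO)


# Assumed box: 8 nm cube = 512 nm^3, and 1 nm^3 = 1e-24 L.
box_volume_l = 512 * 1e-24
n_dmso = count_from_volume_fraction(box_volume_l, 0.30, 1.10, 78.13)
print(n_dmso)  # about 1300 DMSO molecules
```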
Concentration Method
Use this when you want a specific molar concentration (e.g., “2 M urea for protein denaturation studies”).
co_solvents:
  - name: "urea"
    concentration: 2.0 # 2 M urea
Formula:
n = C × V_box × N_A
Where:
n = number of co-solvent molecules
C = concentration (mol/L)
V_box = simulation box volume (L)
N_A = Avogadro's number (implicit in OpenMM)
Source: src/polyzymd/builders/solvent.py:295-312
The water count is NOT reduced when using concentration. The co-solvent molecules are added to the existing water, which may slightly increase the effective density.
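The concentration formula can be checked by hand in the same way. This is a minimal sketch with Avogadro's number made explicit; the function name and the 8 nm cubic box are assumptions for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def count_from_concentration(concentration_molar, box_volume_l):
    """Number of co-solvent molecules for a molar concentration: n = C * V * N_A."""
    return round(concentration_molar * box_volume_l * AVOGADRO)


# Assumed box: 8 nm cube = 512 nm^3, and 1 nm^3 = 1e-24 L.
box_volume_l = 512 * 1e-24
n_urea = count_from_concentration(2.0, box_volume_l)
print(n_urea)  # 617
```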
Built-in Co-solvent Library
PolyzyMD includes a library of common co-solvents with pre-defined SMILES and densities. Density values are sourced from PubChem, a public database of chemical compounds. Each compound has a unique Compound Identification Number (CID) that can be used to look up detailed information including density, structure, and safety data.
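| Name | SMILES | Density (g/mL) | Reference |
|---|---|---|---|
| DMSO | | 1.10 | |
| DMF | | 0.95 | |
| Acetonitrile | | 0.786 | |
| Urea | | 1.32 | |
| Ethanol | | 0.789 | |
| Methanol | | 0.792 | |
| Glycerol | | 1.261 | |
| Isopropanol | | 0.786 | |
| Acetone | | 0.784 | |
| THF | | 0.883 | |
| Dioxane | | 1.033 | |
| Ethylene Glycol | | 1.114 | |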
For library co-solvents, you only need to specify the name and either volume_fraction or concentration:
co_solvents:
  - name: "dmso"
    volume_fraction: 0.10 # 10% v/v DMSO - smiles and density auto-populated
Custom Co-solvents
For molecules not in the library, you must provide the SMILES string. Density is required only when using volume_fraction:
co_solvents:
  # Custom co-solvent with volume fraction (density required)
  - name: "ethyl_acetate"
    smiles: "CCOC(=O)C"
    density: 0.902 # g/mL - required for volume_fraction
    volume_fraction: 0.15
  # Custom co-solvent with concentration (density not needed)
  - name: "my_additive"
    smiles: "CC(=O)NC"
    concentration: 0.5 # 0.5 M
Multiple Co-solvents
You can combine multiple co-solvents. Each can use either specification method independently:
co_solvents:
  - name: "dmso"
    volume_fraction: 0.20 # 20% v/v DMSO
  - name: "urea"
    concentration: 1.0 # Plus 1 M urea
Warning: When using multiple co-solvents with volume_fraction, ensure the total does not exceed 1.0 (100%). The remaining fraction is filled with water.
Warning
YAML List Syntax
A common mistake is placing each field on a separate line with its own -, which creates multiple list items instead of one object with multiple fields.
Incorrect (creates 3 separate incomplete items):
co_solvents:
  - name: "dmso"
  - volume_fraction: 0.30
  - residue_name: "DMS"
Correct (one item with 3 fields):
co_solvents:
  - name: "dmso"
    volume_fraction: 0.30
    residue_name: "DMS"
The - character starts a new list item. The first field of an item follows the -, and every other field of the same item must be indented to line up with that first field, without a leading - of its own.
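A small check on the parsed list catches both this mistake and a co-solvent that specifies zero or two methods. The sketch below works on the plain Python lists a YAML parser would produce; validate_co_solvents is a hypothetical helper written for this illustration, not part of PolyzyMD.

```python
def validate_co_solvents(entries):
    """Return a list of problems; an empty list means the entries look fine."""
    problems = []
    for index, entry in enumerate(entries):
        if "name" not in entry:
            problems.append(f"item {index}: missing 'name' (stray '-' in the YAML?)")
        # Exactly one of the two specification methods is allowed per item.
        methods = [key for key in ("volume_fraction", "concentration") if key in entry]
        if len(methods) != 1:
            problems.append(
                f"item {index}: expected exactly one of "
                f"volume_fraction/concentration, found {len(methods)}"
            )
    # Volume fractions across all co-solvents must leave room for water.
    total = sum(entry.get("volume_fraction", 0.0) for entry in entries)
    if total > 1.0:
        problems.append(f"volume fractions sum to {total}, which exceeds 1.0")
    return problems


# What a YAML parser yields for the incorrect and correct forms above.
incorrect = [{"name": "dmso"}, {"volume_fraction": 0.30}, {"residue_name": "DMS"}]
correct = [{"name": "dmso", "volume_fraction": 0.30, "residue_name": "DMS"}]

for problem in validate_co_solvents(incorrect):
    print(problem)
print(validate_co_solvents(correct))  # []
```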
Assumptions and Limitations
Ideal mixing: Volume fractions assume ideal mixing (volumes are additive). Real solutions may deviate.
Room temperature densities: Library densities are approximate values at ~25 °C.
PACKMOL placement: Co-solvent molecules are placed randomly by PACKMOL and may require equilibration to achieve uniform distribution.
Solvent Parameterization
PolyzyMD uses pre-computed partial charges for all solvent molecules to ensure consistency and performance.
Why Pre-computed Charges?
When adding many copies of the same solvent molecule (e.g., 1000 DMSO molecules), each molecule should have identical partial charges. However, charge calculation methods like AM1BCC have numerical variability - running the calculation twice on the same molecule can produce slightly different charges.
If charges were computed independently for each solvent molecule:
Inconsistency: Identical molecules would have different parameters (physically incorrect)
Performance: AM1BCC is expensive; computing it 1000x is wasteful
Force field issues: Parameter variability can cause OpenFF Interchange errors
How It Works
PolyzyMD solves this by computing charges once and reusing them:
Built-in solvents: Pre-computed SDF files are bundled with the package (in src/polyzymd/data/solvents/)
User cache: Custom solvents are cached in ~/.polyzymd/solvent_cache/ after first use
Lookup order: Memory cache → Bundled SDFs → User cache → Generate and cache
# Lookup order for get_solvent_molecule("dmso")
1. Check in-memory cache (fastest)
2. Check bundled library: src/polyzymd/data/solvents/dmso.sdf
3. Check user cache: ~/.polyzymd/solvent_cache/dmso.sdf
4. Generate from SMILES + AM1BCC, save to user cache
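The lookup order can be sketched as a tiered cache. This is a simplified illustration, not the package's implementation: get_solvent_path, its parameters, and the generate callback (standing in for SMILES + AM1BCC parameterization) are all hypothetical.

```python
from pathlib import Path

_memory_cache = {}  # tier 1: paths already resolved in this process


def get_solvent_path(name, bundled_dir, user_cache_dir, generate):
    """Resolve a solvent SDF by trying each tier in order."""
    if name in _memory_cache:
        return _memory_cache[name]
    # Tiers 2 and 3: bundled library first, then the user cache.
    for directory in (Path(bundled_dir), Path(user_cache_dir)):
        candidate = directory / f"{name}.sdf"
        if candidate.exists():
            _memory_cache[name] = candidate
            return candidate
    # Tier 4: generate once, then store in the user cache for later runs.
    target = Path(user_cache_dir) / f"{name}.sdf"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(generate(name))
    _memory_cache[name] = target
    return target
```

Because the expensive step runs only when every cheaper tier misses, each custom solvent is parameterized at most once per cache directory.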
Available Pre-computed Solvents
All 12 library co-solvents plus water models have pre-computed charges:
| Solvent | File | Charge Method |
|---|---|---|
| TIP3P Water | | Literature values |
| DMSO | dmso.sdf | AM1BCC |
| DMF | | AM1BCC |
| Acetonitrile | | AM1BCC |
| Urea | | AM1BCC |
| Ethanol | | AM1BCC |
| Methanol | | AM1BCC |
| Glycerol | | AM1BCC |
| Isopropanol | | AM1BCC |
| Acetone | | AM1BCC |
| THF | | AM1BCC |
| Dioxane | | AM1BCC |
| Ethylene Glycol | | AM1BCC |
Custom Solvents
When you use a custom co-solvent (not in the library), PolyzyMD will:
Generate the molecule from your SMILES string
Compute AM1BCC partial charges (this may take a few seconds)
Cache the parameterized molecule to ~/.polyzymd/solvent_cache/
Reuse the cached version for all future simulations
co_solvents:
  - name: "my_custom_solvent"
    smiles: "CC(=O)OCC" # First use: computes and caches charges
    concentration: 0.5 # Future uses: loads from cache instantly
Managing the Cache
You can inspect and manage the solvent cache programmatically:
from polyzymd.data import list_available_solvents, clear_cache
# List all available solvents (bundled + cached)
solvents = list_available_solvents()
print(solvents)
# {'bundled': ['dmso', 'ethanol', ...], 'cached': ['my_custom_solvent']}
# Clear the user cache (does not affect bundled solvents)
clear_cache()
The user cache location is ~/.polyzymd/solvent_cache/. You can safely delete this directory to force re-computation of custom solvents.
Restraints Configuration
restraints:
  - type: "flat_bottom" # Restraint type
    name: "substrate_active_site" # Identifier
    atom1:
      selection: "resid 77 and name OG" # First atom selection
      description: "Catalytic serine" # Optional description
    atom2:
      selection: "resname LIG and name C1"
      description: "Substrate carbon"
    distance: 3.3 # Angstroms
    force_constant: 10000.0 # kJ/mol/nm²
    enabled: true # Enable/disable
See Restraints Guide for detailed selection syntax.
Restraint Types
| Type | Description |
|---|---|
| flat_bottom | No force within threshold, harmonic beyond |
| | Harmonic potential at target distance |
| | Prevent distance exceeding threshold |
| | Prevent distance below threshold |
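To make the flat-bottom behaviour concrete, the sketch below evaluates the restraint energy for the example values above (3.3 Å threshold, 10000 kJ/mol/nm²). It assumes the conventional 1/2 · k · x² harmonic form beyond the threshold; the exact prefactor PolyzyMD uses is not stated here, and the function name is hypothetical.

```python
def flat_bottom_energy(distance_nm, threshold_nm, force_constant):
    """Energy in kJ/mol: zero inside the threshold, harmonic beyond it.

    Assumes E = 0.5 * k * (r - r0)^2 for r > r0 (conventional form).
    """
    excess = max(0.0, distance_nm - threshold_nm)
    return 0.5 * force_constant * excess ** 2


threshold_nm = 3.3 / 10.0  # the config distance is given in Angstroms
k = 10000.0                # kJ/mol/nm^2

print(flat_bottom_energy(0.30, threshold_nm, k))  # inside the threshold: 0.0
print(flat_bottom_energy(0.43, threshold_nm, k))  # 0.1 nm beyond: about 50 kJ/mol
```

Note the unit mismatch in the configuration: the distance is in Angstroms while the force constant is per nm², so one of them has to be converted before the energy is evaluated.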
Thermodynamics Configuration
thermodynamics:
  temperature: 300.0 # Kelvin
  pressure: 1.0 # atmospheres
Simulation Phases Configuration
simulation_phases:
  equilibration:
    ensemble: "NVT" # NVT, NPT, or NVE
    duration: 1.0 # nanoseconds
    samples: 100 # frames to save
    time_step: 2.0 # femtoseconds
    thermostat: "LangevinMiddle"
    thermostat_timescale: 1.0 # picoseconds
  production:
    ensemble: "NPT"
    duration: 100.0 # nanoseconds total
    samples: 2500 # total frames
    time_step: 2.0
    thermostat: "LangevinMiddle"
    thermostat_timescale: 1.0
    barostat: "MC" # Monte Carlo barostat
    barostat_frequency: 25 # steps between barostat moves
    segments: 10 # Split production into segments
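The duration, time_step, samples, and segments values together determine step counts and save intervals. The arithmetic below is a sketch under the assumption that frames and segments are spaced evenly; phase_schedule is a hypothetical helper, and PolyzyMD may round differently.

```python
def phase_schedule(duration_ns, time_step_fs, samples, segments=1):
    """Derive step counts from a phase's duration, time step, and frame count."""
    total_steps = round(duration_ns * 1_000_000 / time_step_fs)  # 1 ns = 1e6 fs
    return {
        "total_steps": total_steps,
        "save_interval_steps": total_steps // samples,
        "steps_per_segment": total_steps // segments,
        "frames_per_segment": samples // segments,
    }


production = phase_schedule(100.0, 2.0, 2500, segments=10)
print(production)
# {'total_steps': 50000000, 'save_interval_steps': 20000,
#  'steps_per_segment': 5000000, 'frames_per_segment': 250}
```

For the production phase above this means one frame every 40 ps, and each of the 10 segments covers 10 ns.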
Ensembles
| Ensemble | Description |
|---|---|
| NVT | Constant volume, temperature |
| NPT | Constant pressure, temperature |
| NVE | Microcanonical (no thermostat) |
Thermostats
| Thermostat | Description |
|---|---|
| LangevinMiddle | Langevin integrator (recommended) |
| | Standard Langevin |
| | Andersen thermostat |
| | Nosé-Hoover chain |
Barostats
| Barostat | Description |
|---|---|
| MC | Monte Carlo barostat (recommended) |
| | Monte Carlo anisotropic |
Output Configuration
Environment variables ($USER, $HOME, ${VAR}) and ~ are automatically expanded in path fields.
output:
  # Directory structure - environment variables are expanded automatically
  projects_directory: "/projects/$USER/polyzymd" # Scripts, logs
  scratch_directory: "/scratch/alpine/$USER/simulations" # Trajectories
  # You can also use ~ for home directory
  # projects_directory: "~/polyzymd"
  # Subdirectories within projects_directory
  job_scripts_subdir: "job_scripts"
  slurm_logs_subdir: "slurm_logs"
  # Naming
  naming_template: "{enzyme}_{substrate}_{polymer_type}_{temperature}K_run{replicate}"
  # Output options
  save_checkpoint: true # Save restart files
  save_state_data: true # Save energy/temperature CSV
  trajectory_format: "dcd" # dcd or xtc
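The path expansion described above behaves like the Python standard library's expandvars followed by expanduser. The sketch below shows that behaviour; it is not taken from PolyzyMD's source, and expand_path is a hypothetical name.

```python
import os


def expand_path(raw):
    """Expand $VAR / ${VAR} first, then a leading ~."""
    return os.path.expanduser(os.path.expandvars(raw))


os.environ["USER"] = "alice"  # set here only so the example is reproducible

print(expand_path("/projects/$USER/polyzymd"))      # /projects/alice/polyzymd
print(expand_path("/scratch/${USER}/simulations"))  # /scratch/alice/simulations
print(expand_path("~/polyzymd"))                    # home directory + /polyzymd
```

Variables that are not set are left unchanged by os.path.expandvars, so a literal $VAR in a resulting path usually points to a missing environment variable.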
Naming Template Variables
| Variable | Description | Example |
|---|---|---|
| {enzyme} | Enzyme name | "LipA" |
| {substrate} | Substrate name | "ResorufinButyrate" |
| {polymer_type} | Polymer type | "SBMA-EGPMA" |
| {temperature} | Temperature in K | "300" |
| {replicate} | Replicate number | "1" |
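The template uses the same placeholder syntax as Python's str.format, so the example values from the table can be substituted directly. This is an illustration of the resulting name only; how PolyzyMD fills the template internally is an assumption.

```python
template = "{enzyme}_{substrate}_{polymer_type}_{temperature}K_run{replicate}"

run_name = template.format(
    enzyme="LipA",
    substrate="ResorufinButyrate",
    polymer_type="SBMA-EGPMA",
    temperature=300,
    replicate=1,
)
print(run_name)  # LipA_ResorufinButyrate_SBMA-EGPMA_300K_run1
```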
Force Field Configuration
force_field:
  protein: "ff14sb_off_impropers_0.0.4.offxml" # Protein force field
  small_molecule: "openff-2.0.0.offxml" # Ligand/polymer force field
Available Force Fields
Protein:
ff14sb_off_impropers_0.0.4.offxml - Amber ff14SB (recommended)
Small Molecule:
openff-2.0.0.offxml - OpenFF Sage 2.0 (recommended)
openff-2.1.0.offxml - OpenFF Sage 2.1
Key Collision Warnings
When building systems with both proteins and small molecules, you may see warnings like:
Key collision with different parameters, fixing. Key is [#6X4:1]-[#1:2]
This is expected behavior and does not indicate a problem.
Why This Happens
PolyzyMD uses different force fields for different molecule types:
Proteins: ff14SB (Amber force field ported to OpenFF format)
Small molecules: OpenFF Sage 2.0 (general small molecule force field)
When these force fields are combined, the same SMIRKS pattern (e.g., [#6X4:1]-[#1:2] for sp³ carbon-hydrogen bonds) may appear in both, but with different parameter values. This is expected because:
ff14SB was optimized for protein behavior
OpenFF Sage was optimized for general organic molecules
Both are valid parameterizations for their respective domains
How OpenFF Handles This
OpenFF Interchange detects these collisions and resolves them by appending _DUPLICATE to the key, allowing both parameter sets to coexist:
# Simplified OpenFF behavior
if key in existing_parameters:
    if parameters_are_identical:
        pass  # No action needed
    else:
        key.id += "_DUPLICATE"  # Keep both parameter sets
This ensures that:
Protein atoms use ff14SB parameters
Small molecule atoms use OpenFF Sage parameters
The simulation runs correctly with appropriate parameters for each molecule type
What You’ll See in Logs
With PolyzyMD’s logging, you can identify which molecule combinations trigger collisions:
Combining 7 component Interchange(s)
Components: LipA, ResorufinButyrate, EGPMA-SBMA_AAABA, ..., dmso, water/ions
[DEBUG] Combining 'LipA' with 'ResorufinButyrate'...
Key collision with different parameters, fixing. Key is [#6X4:1]-[#1:2]
...
Collisions typically occur when combining protein Interchanges (using ff14SB) with small molecule Interchanges (using OpenFF Sage).
Further Reading
For more details on this behavior, see the OpenFF Interchange documentation.
Complete Example
See the example configurations in src/polyzymd/configs/examples/:
enzyme_only.yaml - Enzyme + substrate, no polymers
enzyme_polymer.yaml - Full enzyme + polymer simulation
enzyme_cosolvent.yaml - Enzyme with DMSO co-solvent
See Also
Dynamic Polymer Generation - Dynamic polymer generation from SMILES
GROMACS Export and Simulation - Running simulations with GROMACS
Polymer Setup Guide - Polymer setup guide
Restraints Guide - Atom selection and restraints
CLI Reference - CLI documentation