Builders Module
System Builder
System builder for orchestrating the complete simulation system construction.
This module coordinates all builders to create a complete, solvated molecular system and generate the OpenFF Interchange for simulation.
- class polyzymd.builders.system_builder.SystemBuilder(protein_forcefield='ff14sb_off_impropers_0.0.4.offxml', small_molecule_forcefield='openff-2.0.0.offxml')[source]
Bases: object
Orchestrator for building complete simulation systems.
This class coordinates the 5-stage build pipeline:
1. Enzyme loading and partitioning
2. Substrate loading and charging
3. Polymer generation and packing
4. Solvation with water, ions, and co-solvents
5. Interchange creation with optimized combining
The Interchange.combine() optimization is used when polymers are present, which significantly speeds up parameterization for systems with many unique molecules.
Example
>>> builder = SystemBuilder.from_config(config)
>>> interchange = builder.build()
>>> # Or step by step:
>>> builder = SystemBuilder()
>>> builder.build_enzyme("enzyme.pdb")
>>> builder.build_substrate("docked.sdf")
>>> builder.build_polymers(...)
>>> builder.solvate()
>>> interchange = builder.create_interchange()
- __init__(protein_forcefield='ff14sb_off_impropers_0.0.4.offxml', small_molecule_forcefield='openff-2.0.0.offxml')[source]
Initialize the SystemBuilder.
- classmethod from_config(config)[source]
Create a SystemBuilder from a configuration object.
- Parameters:
config (SimulationConfig) – SimulationConfig with all system settings.
- Returns:
Configured SystemBuilder instance.
- Return type:
SystemBuilder
- build_substrate(sdf_path, conformer_index=0, charge_method='nagl', residue_name='LIG')[source]
Build the substrate component.
- build_polymers(characters, probabilities, length, count, type_prefix, sdf_directory=None, seed=None, generation_mode='cached', monomer_smiles=None, monomer_names=None, residue_names=None, reactions=None, charger_type='nagl', max_retries=10, cache_directory=None)[source]
Build polymer components.
Supports two generation modes:
- “cached”: Load pre-built polymer SDFs from sdf_directory
- “dynamic”: Generate polymers on-the-fly from monomer SMILES
- Parameters:
characters (List[str]) – Monomer unit labels (e.g., [“A”, “B”]).
probabilities (List[float]) – Monomer selection probabilities.
length (int) – Monomers per chain.
count (int) – Number of polymer chains.
type_prefix (str) – Filename prefix.
sdf_directory (str | Path | None) – Directory with pre-built polymer SDFs (cached mode).
seed (int | None) – Random seed for reproducibility.
generation_mode (str) – “cached” or “dynamic”.
monomer_smiles (Dict[str, str] | None) – Dict of monomer name -> SMILES (dynamic mode).
monomer_names (Dict[str, str] | None) – Dict of label -> monomer name (dynamic mode).
residue_names (Dict[str, str] | None) – Dict of monomer name -> 3-char residue name.
reactions (Any | None) – ReactionConfig with ATRP reaction paths (dynamic mode).
charger_type (str) – Charge method (“nagl”, “espaloma”, “am1bcc”).
max_retries (int) – Max retries for polymer generation.
cache_directory (str | Path | None) – Directory for caching generated polymers.
- Returns:
Tuple of (unique polymer molecules, counts).
- Return type:
Tuple[List[Molecule], List[int]]
- combine_solutes()[source]
Combine enzyme, substrate, and polymers into a single topology.
- Returns:
Combined topology ready for solvation.
- Raises:
RuntimeError – If enzyme has not been built.
- Return type:
openff.toolkit.Topology
- pack_polymers(padding=2.0, tolerance=2.0, movebadrandom=False, working_directory=None, box_vectors_nm=None)[source]
Pack polymers around the combined solute topology.
- Parameters:
padding (float) – Box padding in nm. Larger values give polymers more room and can significantly speed up PACKMOL convergence. Ignored when box_vectors_nm is provided.
tolerance (float) – PACKMOL tolerance in Angstrom.
movebadrandom (bool) – When True, pass the movebadrandom keyword to PACKMOL. Improves convergence for dense or heterogeneous polymer systems (many unique chain types) by placing badly-packed molecules at random positions in the box.
working_directory (str | Path | None) – Directory for PACKMOL files.
box_vectors_nm (List[float] | None) – Optional explicit box dimensions [Lx, Ly, Lz] in nanometers. When provided, overrides the auto-computed bounding box + padding. The protein is centered at the midpoint of this box.
- Returns:
Topology with polymers packed.
- Raises:
RuntimeError – If solutes not combined or no polymers built.
- Return type:
openff.toolkit.Topology
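The interplay between padding and box_vectors_nm can be sketched in a few lines. The helper name resolve_box and the choice to add the padding on both sides of the solute extent are assumptions made for illustration; PolyzyMD's actual box computation may differ.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_box(
    coords_nm: Sequence[Tuple[float, float, float]],
    padding: float = 2.0,
    box_vectors_nm: Optional[List[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Return (box lengths, box midpoint) in nm. Hypothetical helper."""
    if box_vectors_nm is not None:
        # Explicit dimensions take precedence; padding is ignored.
        lengths = [float(v) for v in box_vectors_nm]
    else:
        # Assumption: padding is added on each side of the bounding box.
        lengths = []
        for axis in range(3):
            values = [c[axis] for c in coords_nm]
            lengths.append(max(values) - min(values) + 2.0 * padding)
    # The solute would be centered at the midpoint of the box.
    midpoint = [length / 2.0 for length in lengths]
    return lengths, midpoint


solute = [(0.0, 0.0, 0.0), (3.0, 4.0, 5.0)]
auto_lengths, auto_mid = resolve_box(solute, padding=2.0)
fixed_lengths, fixed_mid = resolve_box(
    solute, padding=2.0, box_vectors_nm=[12.0, 12.0, 12.0]
)
```

With the explicit box, the padding value has no effect on the result, which matches the documented behavior of box_vectors_nm.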
- solvate(composition=None, padding=0.9, box_shape='rhombic_dodecahedron')[source]
Solvate the system with water and ions.
- Parameters:
composition (SolventComposition | None) – Solvent composition specification.
padding (float) – Box padding in nm.
box_shape (str) – Box geometry.
- Returns:
Solvated topology.
- Raises:
RuntimeError – If solutes not combined.
- Return type:
openff.toolkit.Topology
- create_interchange(use_optimized_combining=True, use_batched_combining=False)[source]
Create the OpenFF Interchange for simulation.
By default, passes the entire solvated topology to a single ForceField.create_interchange() call. OpenFF’s internal identical_molecule_groups mechanism deduplicates SMIRKS matching and charge assignment so that work scales with the number of unique molecule types, not total molecules.
Pre-computed charges are supplied via charge_from_molecules for water, polymers, substrate, and co-solvents so that no AM1BCC calculations are triggered at runtime.
The legacy batched approach (multiple create_interchange() calls joined by Interchange.combine()) is available via use_batched_combining=True for A/B benchmarking.
- Parameters:
- Returns:
OpenFF Interchange object.
- Raises:
RuntimeError – If system not solvated.
- Return type:
openff.interchange.Interchange
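The Raises entries for combine_solutes(), solvate(), and create_interchange() imply a fixed stage order. The toy class below is not SystemBuilder; it is a hypothetical stand-in that only mimics those documented guards, to show what calling a stage too early looks like.

```python
class StagedBuilder:
    """Toy stand-in (not SystemBuilder) that mimics the documented guards."""

    def __init__(self) -> None:
        self.enzyme = None
        self.combined = None
        self.solvated = None

    def build_enzyme(self, pdb_path: str) -> None:
        self.enzyme = f"topology from {pdb_path}"

    def combine_solutes(self) -> str:
        if self.enzyme is None:
            raise RuntimeError("Enzyme has not been built.")
        self.combined = f"combined({self.enzyme})"
        return self.combined

    def solvate(self) -> str:
        if self.combined is None:
            raise RuntimeError("Solutes have not been combined.")
        self.solvated = f"solvated({self.combined})"
        return self.solvated

    def create_interchange(self) -> str:
        if self.solvated is None:
            raise RuntimeError("System has not been solvated.")
        return f"interchange({self.solvated})"


def try_stage(stage) -> str:
    """Run a stage and report either its result or the guard message."""
    try:
        return stage()
    except RuntimeError as err:
        return f"error: {err}"


builder = StagedBuilder()
early = try_stage(builder.create_interchange)  # called before solvation
builder.build_enzyme("enzyme.pdb")
builder.combine_solutes()
builder.solvate()
late = try_stage(builder.create_interchange)   # called in order
```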
- save_topology(path, topology=None)[source]
Save a topology to PDB file.
Assigns PDB-compliant chain IDs and residue numbers before writing to ensure downstream analysis tools can uniquely identify all atoms.
- build_from_config(config, working_dir=None, polymer_seed=None)[source]
Build the complete system from a configuration.
This is the main entry point for config-driven builds.
- Parameters:
config (SimulationConfig) – SimulationConfig with all settings.
working_dir (Optional[Union[str, Path]]) – Working directory for output files.
polymer_seed (Optional[int]) – Random seed for polymer generation. This is used as a fallback if config.polymers.random_seed is not set.
- Returns:
OpenFF Interchange ready for simulation.
- Return type:
Interchange
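The fallback rule for polymer_seed can be sketched as follows. The helper resolve_polymer_seed is hypothetical, and the config objects are plain stand-ins built with SimpleNamespace rather than real SimulationConfig instances.

```python
from types import SimpleNamespace
from typing import Optional


def resolve_polymer_seed(config, polymer_seed: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: prefer the config seed, else the argument."""
    configured = getattr(config.polymers, "random_seed", None)
    # Compare against None so that a configured seed of 0 is still honored.
    return configured if configured is not None else polymer_seed


# Stand-in config objects; only the attribute path matters here.
with_seed = SimpleNamespace(polymers=SimpleNamespace(random_seed=0))
without_seed = SimpleNamespace(polymers=SimpleNamespace(random_seed=None))

seed_a = resolve_polymer_seed(with_seed, polymer_seed=42)
seed_b = resolve_polymer_seed(without_seed, polymer_seed=42)
```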
- get_openmm_components()[source]
Extract OpenMM components from the Interchange.
- Returns:
Tuple of (topology, system, positions).
- Raises:
RuntimeError – If Interchange not created.
- Return type:
- export_to_gromacs(output_dir, prefix=None, gmx_command='gmx', generate_mdps=True)[source]
Export the system to GROMACS format with full simulation setup.
Generates a complete GROMACS simulation setup including:
- .gro (coordinates) and .top (topology) files
- MDP files for energy minimization, equilibration, and production
- Position restraint files for equilibration stages
- Run script for executing the full workflow
The topology is split into separate .itp files for each molecule type (monolithic=False), which is cleaner for multi-component systems.
MDP files are generated from config.yaml parameters to match OpenMM simulation settings (temperature, pressure, duration, etc.). OpenFF defaults are used for force field parameters (rcoulomb=0.9, rvdw=0.9, PME, etc.) to ensure 1:1 parity with OpenMM.
- Parameters:
output_dir (str | Path) – Directory to write GROMACS files. Will be created if it doesn’t exist.
prefix (str | None) – Filename prefix for output files. If None, generates a descriptive name from the config (e.g., “LipA_EGPMA-SBMA”).
gmx_command (str) – GROMACS command/path for the run script (default “gmx”).
generate_mdps (bool) – If True (default), generate MDP files from config. If False, only export coordinates and topology.
- Returns:
Dictionary with paths to generated files:
“gro”: Path to coordinate file
“top”: Path to topology file
“em_mdp”: Path to energy minimization MDP (if generate_mdps=True)
“eq_mdps”: List of equilibration MDP paths (if generate_mdps=True)
“prod_mdp”: Path to production MDP (if generate_mdps=True)
“posres”: Dict of position restraint files (if applicable)
“run_script”: Path to run script (if generate_mdps=True)
- Return type:
dict
- Raises:
RuntimeError – If Interchange has not been created.
Example
>>> builder = SystemBuilder.from_config(config)
>>> builder.build_from_config(config, working_dir)
>>> result = builder.export_to_gromacs(
...     output_dir="gromacs/",
...     prefix="my_system"
... )
>>> print(f"Run: cd {result['gro'].parent} && ./{result['run_script'].name}")
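If you prefer to run the minimization by hand instead of using the generated run script, the returned dictionary carries the paths you need. The sketch below assembles standard gmx grompp and gmx mdrun invocations from a stand-in dictionary; the helper name and the file names are made up for illustration, and when the MDP enables position restraints grompp additionally needs a -r reference structure.

```python
from pathlib import Path
from typing import Dict, List


def minimization_commands(
    result: Dict[str, Path], gmx_command: str = "gmx"
) -> List[List[str]]:
    """Hypothetical helper: build grompp and mdrun commands for minimization."""
    if "em_mdp" not in result:
        # With generate_mdps=False only coordinates and topology are exported.
        raise KeyError("no 'em_mdp' entry; MDP files were not generated")
    run_dir = Path(result["gro"]).parent
    grompp = [
        gmx_command, "grompp",
        "-f", str(result["em_mdp"]),
        "-c", str(result["gro"]),
        "-p", str(result["top"]),
        "-o", str(run_dir / "em.tpr"),
    ]
    # -deffnm makes mdrun read em.tpr and name all outputs em.*
    mdrun = [gmx_command, "mdrun", "-deffnm", str(run_dir / "em")]
    return [grompp, mdrun]


# Stand-in for the dictionary returned by export_to_gromacs(); paths are made up.
example_result = {
    "gro": Path("gromacs/my_system.gro"),
    "top": Path("gromacs/my_system.top"),
    "em_mdp": Path("gromacs/em.mdp"),
}
commands = minimization_commands(example_result)
```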
- get_component_info()[source]
Get system component information for atom group resolution.
This method returns a SystemComponentInfo dataclass containing the atom counts and chain assignments for each system component (protein, substrate, polymers, solvent). This information is needed by the AtomGroupResolver to resolve predefined atom group names to indices.
- Returns:
SystemComponentInfo with atom counts and chain assignments
- Raises:
RuntimeError – If solvated topology not created.
- Return type:
SystemComponentInfo
Enzyme Builder
Builder for enzyme/protein components.
This module handles loading PDB structures and partitioning them for use with OpenFF force fields.
- class polyzymd.builders.enzyme.EnzymeBuilder[source]
Bases: object
Builder for loading and preparing enzyme structures.
This class handles:
- Loading PDB structures into OpenFF Topology
- Basic validation of the loaded structure
The PDB file should be properly prepared with:
- Correct protonation states
- Standard amino acid residue names
- Sequential residue numbering
Example
>>> builder = EnzymeBuilder()
>>> topology = builder.build("path/to/enzyme.pdb")
>>> print(f"Loaded enzyme with {topology.n_atoms} atoms")
- build(pdb_path)[source]
Load an enzyme structure from a PDB file.
- Parameters:
- Returns:
OpenFF Topology with the enzyme structure.
- Raises:
FileNotFoundError – If the PDB file does not exist.
- Return type:
openff.toolkit.Topology
- build_from_config(config)[source]
Load enzyme from a configuration object.
- Parameters:
config (EnzymeConfig) – EnzymeConfig with PDB path.
- Returns:
OpenFF Topology with the enzyme structure.
- Return type:
Topology
- get_molecule()[source]
Get the first (and typically only) molecule from the topology.
- Returns:
The enzyme molecule.
- Raises:
RuntimeError – If no topology has been loaded.
- Return type:
openff.toolkit.Topology
- validate()[source]
Validate the loaded enzyme topology.
- Returns:
True if validation passes.
- Raises:
RuntimeError – If no topology has been loaded.
ValueError – If validation fails.
- Return type:
bool
Substrate Builder
Builder for substrate/ligand components.
This module handles loading docked conformers from SDF files and assigning partial charges using various methods (NAGL, Espaloma, AM1BCC).
- class polyzymd.builders.substrate.SubstrateBuilder[source]
Bases: object
Builder for loading and preparing substrate/ligand structures.
This class handles:
- Loading docked conformers from SDF files
- Selecting specific conformers
- Assigning partial charges using NAGL, Espaloma, or AM1BCC
- Setting residue metadata
Example
>>> builder = SubstrateBuilder()
>>> molecule = builder.build(
...     sdf_path="docked.sdf",
...     conformer_index=0,
...     charge_method="nagl"
... )
>>> print(f"Loaded substrate with {molecule.n_atoms} atoms")
- property all_conformers: List[openff.toolkit.Molecule] | None
Get all conformers loaded from the SDF file.
- build(sdf_path, conformer_index=0, charge_method='nagl', residue_name='LIG')[source]
Load a substrate conformer and assign partial charges.
- Parameters:
- Returns:
OpenFF Molecule with assigned partial charges.
- Raises:
FileNotFoundError – If the SDF file does not exist.
IndexError – If the conformer index is out of range.
ValueError – If the charge method is not supported.
- Return type:
openff.toolkit.Molecule
- build_from_config(config)[source]
Load substrate from a configuration object.
- Parameters:
config (SubstrateConfig) – SubstrateConfig with SDF path and options.
- Returns:
OpenFF Molecule with assigned partial charges.
- Return type:
Molecule
- get_n_conformers()[source]
Get the number of conformers available in the loaded SDF.
- Returns:
Number of conformers.
- Raises:
RuntimeError – If no SDF has been loaded.
- Return type:
int
- validate()[source]
Validate the loaded substrate molecule.
- Returns:
True if validation passes.
- Raises:
RuntimeError – If no molecule has been loaded.
ValueError – If validation fails.
- Return type:
bool
Polymer Builder
Builder for polymer components.
This module handles random co-polymer sequence generation, loading pre-built polymer structures from SDF files, and optionally generating new polymers using Polymerist when cached structures are not available.
Supports two generation modes:
Cached: Load pre-built SDF files from disk
Dynamic: Generate polymers on-the-fly using Polymerist from raw monomer SMILES
Made by PolyzyMD, by Joseph R. Laforet Jr.
- polyzymd.builders.polymer.canonical_sequence(sequence)[source]
Get the canonical form of a polymer sequence.
Since polymers can be read in either direction, we use the lexicographically smaller of the sequence and its reverse as the canonical form.
- Parameters:
sequence (str) – Polymer sequence string (e.g., “AABBA”).
- Returns:
Canonical form of the sequence.
- Return type:
str
Example
>>> canonical_sequence("ABAAA")
'AAABA'  # reverse is smaller
>>> canonical_sequence("AABBA")
'AABBA'  # original is smaller
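The rule described above (take the lexicographically smaller of the sequence and its reverse) fits in one line. The function below is a minimal sketch of that rule under a different name, not the library's source.

```python
def canonical_sequence_sketch(sequence: str) -> str:
    """Return the lexicographically smaller of a sequence and its reverse."""
    return min(sequence, sequence[::-1])


forward = canonical_sequence_sketch("ABAAA")
backward = canonical_sequence_sketch("AAABA")
```

Because a sequence and its reverse map to the same string, the canonical form can serve as a dictionary key for chains that differ only in reading direction.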
- polyzymd.builders.polymer.generate_random_sequence(length, characters, probabilities)[source]
Generate a random polymer sequence.
- Parameters:
- Returns:
Random sequence string.
- Return type:
str
Example
>>> generate_random_sequence(5, ["A", "B"], [0.7, 0.3])
'AABAA'  # example output
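Weighted sampling of this kind can be sketched with the standard library. The function name and the seed parameter are assumptions added for illustration; the documented generate_random_sequence takes only length, characters, and probabilities, and may use a different random number generator.

```python
import random
from typing import Optional, Sequence


def random_sequence_sketch(
    length: int,
    characters: Sequence[str],
    probabilities: Sequence[float],
    seed: Optional[int] = None,
) -> str:
    """Draw `length` monomer labels, weighted by `probabilities`."""
    rng = random.Random(seed)  # local generator; global state is untouched
    return "".join(rng.choices(characters, weights=probabilities, k=length))


first = random_sequence_sketch(5, ["A", "B"], [0.7, 0.3], seed=1)
second = random_sequence_sketch(5, ["A", "B"], [0.7, 0.3], seed=1)
only_a = random_sequence_sketch(5, ["A", "B"], [1.0, 0.0], seed=1)
```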
- class polyzymd.builders.polymer.PolymerBuilder(characters, probabilities, length, type_prefix, sdf_directory=None, cache_directory=None, allow_generation=False, generation_mode='cached', monomer_smiles=None, monomer_names=None, residue_names=None, reactions=None, charger_type='nagl', max_retries=10)[source]
Bases: object
Builder for loading and generating polymer structures.
This class supports two generation modes:
Cached mode (legacy): Load pre-built SDF files from disk
- Requires sdf_directory with pre-built polymer files
- Filenames: {type_prefix}_seq={sequence}_{length}-mer_charged.sdf
Dynamic mode: Generate polymers on-the-fly using Polymerist
- Requires monomer SMILES and ATRP reaction templates
- Automatically generates fragments, builds chains, assigns charges
- Caches results for subsequent runs
- Example (cached mode):
>>> builder = PolymerBuilder(
...     characters=["A", "B"],
...     probabilities=[0.7, 0.3],
...     length=5,
...     sdf_directory="polymers/",
...     type_prefix="SBMA-EGPMA"
... )
>>> molecules, counts = builder.build(count=10)
- Example (dynamic mode):
>>> builder = PolymerBuilder(
...     characters=["A", "B"],
...     probabilities=[0.7, 0.3],
...     length=5,
...     type_prefix="SBMA-EGPMA",
...     generation_mode="dynamic",
...     monomer_smiles={"SBMA": "...", "EGPMA": "..."},
...     monomer_names={"A": "SBMA", "B": "EGPMA"},
...     reactions=reaction_config,
... )
>>> molecules, counts = builder.build(count=10)
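For cached mode, the documented filename pattern {type_prefix}_seq={sequence}_{length}-mer_charged.sdf can be expanded as shown below. The helper cached_sdf_path is hypothetical, and it assumes that the length field equals the number of characters in the sequence string.

```python
from pathlib import Path
from typing import Union


def cached_sdf_path(
    sdf_directory: Union[str, Path], type_prefix: str, sequence: str
) -> Path:
    """Hypothetical helper: expand the documented cached-mode filename pattern."""
    # Assumption: the chain length equals the number of sequence characters.
    length = len(sequence)
    filename = f"{type_prefix}_seq={sequence}_{length}-mer_charged.sdf"
    return Path(sdf_directory) / filename


example_path = cached_sdf_path("polymers", "SBMA-EGPMA", "AABAA")
```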
- __init__(characters, probabilities, length, type_prefix, sdf_directory=None, cache_directory=None, allow_generation=False, generation_mode='cached', monomer_smiles=None, monomer_names=None, residue_names=None, reactions=None, charger_type='nagl', max_retries=10)[source]
Initialize the PolymerBuilder.
- Parameters:
characters (List[str]) – Monomer unit labels (e.g., [“A”, “B”]).
probabilities (List[float]) – Selection probability for each monomer.
length (int) – Number of monomers per polymer chain.
type_prefix (str) – Prefix for filenames (e.g., “SBMA-EGPMA”).
sdf_directory (Optional[Union[str, Path]]) – Directory containing pre-built polymer SDFs (cached mode).
cache_directory (Optional[Union[str, Path]]) – Directory for caching generated polymers.
allow_generation (bool) – If True, generate missing polymers (for cached mode fallback).
generation_mode (str) – “cached” for pre-built SDFs, “dynamic” for on-the-fly generation.
monomer_smiles (Optional[Dict[str, str]]) – Dictionary of monomer name -> raw SMILES (dynamic mode).
monomer_names (Optional[Dict[str, str]]) – Dictionary of label -> monomer name (dynamic mode).
residue_names (Optional[Dict[str, str]]) – Dictionary of monomer name -> 3-char PDB residue name.
reactions (Optional['ReactionConfig']) – ReactionConfig with paths to ATRP .rxn files (dynamic mode).
charger_type (str) – Charge method (“nagl”, “espaloma”, “am1bcc”) for dynamic mode.
max_retries (int) – Maximum retries for polymer generation (ring-piercing failures).
- Raises:
ValueError – If probabilities don’t sum to 1.0 or lengths mismatch.
ValueError – If dynamic mode but missing required parameters.
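The first ValueError condition can be illustrated with a small check. The function name and the tolerance of 1e-6 are assumptions; the tolerance PolymerBuilder actually applies is not stated here.

```python
import math
from typing import Sequence


def check_composition(
    characters: Sequence[str], probabilities: Sequence[float], tol: float = 1e-6
) -> None:
    """Raise ValueError on a length mismatch or probabilities not summing to 1."""
    if len(characters) != len(probabilities):
        raise ValueError(
            f"{len(characters)} characters but {len(probabilities)} probabilities"
        )
    total = math.fsum(probabilities)
    # Compare with a tolerance: values such as 0.7 + 0.3 are not exact in binary.
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"probabilities sum to {total}, expected 1.0")


def is_valid_composition(characters, probabilities) -> bool:
    """Convenience wrapper that converts the exception into a boolean."""
    try:
        check_composition(characters, probabilities)
    except ValueError:
        return False
    return True


ok = is_valid_composition(["A", "B"], [0.7, 0.3])
bad_sum = is_valid_composition(["A", "B"], [0.7, 0.2])
bad_len = is_valid_composition(["A", "B"], [1.0])
```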
- property loaded_molecules: Dict[str, openff.toolkit.Molecule]
Get loaded polymer molecules keyed by canonical sequence.
- build(count, seed=None)[source]
Generate random polymer sequences and load/create corresponding molecules.
- Parameters:
- Returns:
Tuple of (list of molecules for packing, list of canonical sequences).
- Raises:
FileNotFoundError – If SDF file not found and generation not allowed.
- Return type:
- build_from_config(config, seed=None)[source]
Build polymers from a configuration object.
This method extracts all configuration values and updates the builder state accordingly, then calls build() to generate the polymers.
- Parameters:
config (PolymerConfig) – PolymerConfig with polymer settings.
seed (Optional[int]) – Random seed for reproducibility.
- Returns:
Tuple of (list of unique molecules, list of counts).
- Return type:
Tuple[List[Molecule], List[int]]
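The (unique molecules, counts) shape of the return value can be pictured by grouping sampled sequences by their canonical form. This sketch works on plain sequence strings instead of OpenFF molecules, and the function name is made up for illustration.

```python
from collections import Counter
from typing import List, Tuple


def count_unique_chains(sequences: List[str]) -> Tuple[List[str], List[int]]:
    """Group sequences by canonical form; return (unique forms, counts)."""
    # A chain and its reverse describe the same polymer, so both map to one key.
    tally = Counter(min(seq, seq[::-1]) for seq in sequences)
    unique = list(tally)  # Counter keeps first-seen order
    return unique, [tally[key] for key in unique]


unique_forms, chain_counts = count_unique_chains(["AAB", "BAA", "ABA", "AAB"])
```

Here "AAB" and "BAA" collapse into one entry, so four sampled chains yield two unique structures with counts 3 and 1.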
- get_packing_info()[source]
Get molecules and counts for PACKMOL packing.
- Returns:
Tuple of (list of unique molecules, list of counts).
- Raises:
RuntimeError – If build() has not been called.
- Return type:
Tuple[List[Molecule], List[int]]
- validate()[source]
Validate the loaded polymers.
- Returns:
True if validation passes.
- Raises:
RuntimeError – If no polymers have been loaded.
ValueError – If validation fails.
- Return type:
bool
Solvent Builder
Builder for solvent components.
This module handles solvation with water, ions, and optional co-solvents using PACKMOL for molecular packing.
Solvent Parameterization
This module uses pre-computed partial charges for solvent molecules to ensure:
Consistency: All copies of a solvent molecule have identical parameters
Speed: Charges are computed once, not N times for N molecules
Reproducibility: Same charges across different simulation runs
For built-in solvents (water models, DMSO, ethanol, etc.), charges are loaded from pre-computed SDF files. For custom solvents, AM1BCC charges are computed once and cached in ~/.polyzymd/solvent_cache/ for future use.
See also
polyzymd.data.solvent_molecules
Module providing pre-parameterized solvents
- class polyzymd.builders.solvent.CoSolvent(name, smiles, volume_fraction=None, concentration=None, density=None, residue_name='COS', molecule=None)[source]
Bases: object
Specification for a co-solvent component.
Supports two specification methods:
- volume_fraction: Specify as fraction (0-1), e.g., 0.30 for 30% v/v
- concentration: Specify as molarity (mol/L)
Note: The molecule is NOT created in __post_init__ to avoid running charge calculations prematurely. Instead, molecules are loaded via get_solvent_molecule() in SolventBuilder.solvate() which uses cached charges to ensure all copies have identical parameters.
- Variables:
name (str) – Identifier for the co-solvent.
smiles (str) – SMILES string for the molecule.
volume_fraction (float | None) – Volume fraction (0-1), mutually exclusive with concentration.
concentration (float | None) – Molar concentration (mol/L), mutually exclusive with volume_fraction.
density (float | None) – Density in g/mL (required for volume_fraction calculation).
residue_name (str) – 3-letter residue name.
molecule (openff.toolkit.Molecule | None) – OpenFF Molecule (assigned later with cached charges).
- __post_init__()[source]
Validate co-solvent specification.
Note: We intentionally do NOT create the molecule here. This is done later in SolventBuilder.solvate() using get_solvent_molecule() which provides proper charge caching.
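A validation-only __post_init__ of this kind might look like the sketch below. CoSolventSpec is a hypothetical reduced dataclass, not the real CoSolvent; the rule that exactly one of the two fields must be given is an assumption (the documentation states only that they are mutually exclusive), and no molecule is created or charged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoSolventSpec:
    """Hypothetical, reduced stand-in for CoSolvent (validation only)."""

    name: str
    smiles: str
    volume_fraction: Optional[float] = None
    concentration: Optional[float] = None

    def __post_init__(self) -> None:
        given = sum(
            v is not None for v in (self.volume_fraction, self.concentration)
        )
        # Assumption: exactly one of the two specification methods is required.
        if given != 1:
            raise ValueError("give exactly one of volume_fraction or concentration")
        if self.volume_fraction is not None and not 0.0 <= self.volume_fraction <= 1.0:
            raise ValueError("volume_fraction must lie in [0, 1]")
        if self.concentration is not None and self.concentration < 0.0:
            raise ValueError("concentration must be non-negative")
        # Deliberately no molecule creation or charge assignment here.


def is_valid_spec(**kwargs) -> bool:
    """Convenience wrapper that converts the exception into a boolean."""
    try:
        CoSolventSpec(**kwargs)
    except ValueError:
        return False
    return True


dmso_ok = is_valid_spec(name="dmso", smiles="CS(=O)C", volume_fraction=0.30)
both_given = is_valid_spec(
    name="dmso", smiles="CS(=O)C", volume_fraction=0.30, concentration=1.0
)
```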
- __init__(name, smiles, volume_fraction=None, concentration=None, density=None, residue_name='COS', molecule=None)
- class polyzymd.builders.solvent.SolvationCounts(water, na, cl, co_solvents=<factory>)[source]
Bases: object
Molecule counts from solvation for PDB chain/residue assignment.
This dataclass stores the number of each molecule type added during solvation. It is used by SystemBuilder to assign PDB-compliant chain IDs and residue numbers.
The molecule order in the solvated topology is: [solute molecules] + [water] + [Na+] + [Cl-] + [co-solvent 1] + [co-solvent 2] + …
- Variables:
- property total_solvent_molecules: int
Total number of solvent molecules (water + ions + co-solvents).
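The total and the documented molecule order can be sketched together. SolvationCountsSketch is a hypothetical stand-in; in particular, modelling co_solvents as a dict of name to count is an assumption, since the field's type is not listed above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SolvationCountsSketch:
    """Hypothetical stand-in for SolvationCounts."""

    water: int
    na: int
    cl: int
    # Assumption: co-solvent counts are stored as name -> count.
    co_solvents: Dict[str, int] = field(default_factory=dict)

    @property
    def total_solvent_molecules(self) -> int:
        return self.water + self.na + self.cl + sum(self.co_solvents.values())


def molecule_ranges(n_solute: int, counts: SolvationCountsSketch) -> Dict[str, range]:
    """Index ranges following the order solute, water, Na+, Cl-, co-solvents."""
    ranges: Dict[str, range] = {}
    start = 0
    blocks = [("solute", n_solute), ("water", counts.water),
              ("na", counts.na), ("cl", counts.cl)]
    blocks += list(counts.co_solvents.items())
    for name, n in blocks:
        ranges[name] = range(start, start + n)
        start += n
    return ranges


counts = SolvationCountsSketch(water=1000, na=3, cl=2, co_solvents={"dmso": 50})
ranges = molecule_ranges(n_solute=2, counts=counts)
```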
- __init__(water, na, cl, co_solvents=<factory>)
- class polyzymd.builders.solvent.SolventComposition(water_model='tip3p', co_solvents=<factory>, nacl_concentration=0.1, kcl_concentration=0.0, mgcl2_concentration=0.0, neutralize=True)[source]
Bases: object
Complete specification of solvent composition.
- Variables:
water_model (Literal['tip3p', 'spce', 'tip4p', 'tip4pew', 'opc']) – Water model to use.
co_solvents (List[polyzymd.builders.solvent.CoSolvent]) – List of co-solvent specifications.
nacl_concentration (float) – NaCl concentration in mol/L.
kcl_concentration (float) – KCl concentration in mol/L.
mgcl2_concentration (float) – MgCl2 concentration in mol/L.
neutralize (bool) – Whether to neutralize system charge.
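To get a feel for what a salt concentration in mol/L means in a simulation box, the standard conversion N = c · V · N_A can be used. This is generic arithmetic, not necessarily how SolventBuilder counts ions, and it ignores any extra counter-ions added when neutralize is True.

```python
AVOGADRO = 6.02214076e23   # particles per mol (exact SI value)
LITERS_PER_NM3 = 1e-24     # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def ion_pairs_for_volume(concentration_mol_per_l: float, volume_nm3: float) -> int:
    """Number of salt formula units for a concentration and box volume."""
    n = concentration_mol_per_l * volume_nm3 * LITERS_PER_NM3 * AVOGADRO
    return round(n)


# Default 0.1 mol/L NaCl in a 10 nm x 10 nm x 10 nm cube.
pairs_default = ion_pairs_for_volume(0.1, 1000.0)
```

The result is about 60 Na+/Cl- pairs for this box, that is, roughly 0.6 pairs per nm³ per mol/L.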
- property water_volume_fraction: float
Get the water volume fraction (1 - sum of co-solvent fractions).
Note: Only counts co-solvents specified by volume_fraction. Co-solvents specified by concentration don’t reduce water volume.
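The rule in the note above can be written out directly. The function takes a plain list of volume fractions, with None standing in for co-solvents that were specified by concentration; this input shape is an assumption made to keep the sketch self-contained.

```python
from typing import Optional, Sequence


def water_volume_fraction_sketch(fractions: Sequence[Optional[float]]) -> float:
    """1 minus the sum of co-solvent volume fractions; None entries are skipped."""
    return 1.0 - sum(f for f in fractions if f is not None)


# Two co-solvents by volume fraction, one by concentration (None).
water_fraction = water_volume_fraction_sketch([0.30, None, 0.10])
```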
- __init__(water_model='tip3p', co_solvents=<factory>, nacl_concentration=0.1, kcl_concentration=0.0, mgcl2_concentration=0.0, neutralize=True)
- class polyzymd.builders.solvent.SolventBuilder[source]
Bases: object
Builder for solvating molecular systems.
This class handles:
- Water solvation with different water models
- Ion addition for neutralization and ionic strength
- Co-solvent addition with specified volume fractions
- Box shape and padding configuration
Example
>>> builder = SolventBuilder()
>>> composition = SolventComposition(
...     water_model="tip3p",
...     nacl_concentration=0.15,
...     co_solvents=[CoSolvent("dmso", "CS(=O)C", 0.1)]
... )
>>> solvated = builder.solvate(
...     topology=solute_topology,
...     composition=composition,
...     padding=1.2,  # nm
... )
- property solvation_counts: SolvationCounts | None
Get the molecule counts from solvation.
- Returns:
SolvationCounts with water, ion, and co-solvent molecule counts, or None if solvate() has not been called.
- solvate(topology, composition=None, padding=1.2, box_shape='rhombic_dodecahedron', target_density=1.0, tolerance=2.0)[source]
Solvate a topology with water, ions, and optional co-solvents.
- Parameters:
topology (openff.toolkit.Topology) – OpenFF Topology to solvate.
composition (SolventComposition | None) – Solvent composition specification.
padding (float) – Distance from solute to box edge in nm.
box_shape (Literal['cube', 'rhombic_dodecahedron', 'truncated_octahedron']) – Box geometry.
target_density (float) – Target density in g/mL.
tolerance (float) – Minimum molecular spacing for PACKMOL in Angstrom.
- Returns:
Solvated OpenFF Topology.
- Return type:
openff.toolkit.Topology
- solvate_from_config(topology, config)[source]
Solvate using configuration object.
- Parameters:
topology (Topology) – OpenFF Topology to solvate.
config (SolventConfig) – SolventConfig with solvent settings.
- Returns:
Solvated OpenFF Topology.
- Return type:
Topology
- validate()[source]
Validate the solvated topology.
- Returns:
True if validation passes.
- Raises:
RuntimeError – If no topology has been solvated.
ValueError – If validation fails.
- Return type:
bool