OpenFF PDB ingestion reference
This reference separates OpenFF chemistry requirements from PolyzyMD enzyme-input
expectations and lists known error signatures for protein PDB ingestion through
openff.toolkit.Topology.from_pdb().
OpenFF chemistry requirements
OpenFF does not require PolyzyMD’s chain IDs. It requires a PDB whose inferred chemical graph can be matched to supported residue chemistry.
| Requirement | Expected state | Notes |
|---|---|---|
| Hydrogens | Explicit | OpenFF protein PDB ingestion expects chemically complete hydrogens |
| TER records | Present where fragments are disconnected | Mature cleaved proteins may have multiple fragments |
| Missing residues | Curated intentionally | Header records such as |
| Disulfides | SG-SG connectivity clear; no SG-HG proton | Verify |
| Direct validation | | Run this before relying on PolyzyMD build steps |
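Before direct validation, a quick text-level pass over the PDB can show whether the records named in the table above are present at all. The sketch below is illustrative and is not part of OpenFF or PolyzyMD: summarize_pdb is a hypothetical helper, it assumes standard fixed-column PDB formatting, and it assumes that missing residues are declared in REMARK 465 lines. It only counts records; it makes no judgment about chemistry.

```python
from collections import Counter

# Keys reported by the summary, in a fixed order.
SUMMARY_KEYS = (
    "atoms",
    "hydrogens",
    "ter_records",
    "ssbond_records",
    "conect_records",
    "remark_465_lines",
)


def summarize_pdb(pdb_text: str) -> dict:
    """Count the PDB records that the requirements table asks about."""
    counts = Counter()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            counts["atoms"] += 1
            # The element symbol sits in columns 77-78. If it is blank,
            # fall back to the first letter of the atom name (a heuristic).
            element = line[76:78].strip().upper()
            if not element:
                name = line[12:16].strip().lstrip("0123456789")
                element = name[:1].upper()
            if element == "H":
                counts["hydrogens"] += 1
        elif record == "TER":
            counts["ter_records"] += 1
        elif record == "SSBOND":
            counts["ssbond_records"] += 1
        elif record == "CONECT":
            counts["conect_records"] += 1
        elif line.startswith("REMARK 465"):
            # Includes the explanatory header lines of the remark block,
            # so this is a presence indicator, not a residue count.
            counts["remark_465_lines"] += 1
    return {key: counts[key] for key in SUMMARY_KEYS}
```

A hydrogen count of zero or a nonzero remark_465_lines value is a prompt to curate the file before attempting ingestion, not a verdict on whether ingestion will succeed.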
PolyzyMD enzyme-input expectations
PolyzyMD uses chain IDs to assign biological roles during system building and analysis. These are project conventions, not OpenFF parser requirements.
| Role | PolyzyMD chain convention | Notes |
|---|---|---|
| Protein/enzyme | | The enzyme PDB passed to OpenFF is usually protein-only on chain |
| Substrate | | Usually kept separate from the enzyme PDB and configured as substrate input |
| Polymer | | Used for conjugates and polymer-specific selections |
| Solvent/ions/other | | Usually generated or handled outside the enzyme PDB |
An enzyme PDB can satisfy PolyzyMD chain conventions and still fail OpenFF ingestion if the residue graph, hydrogens, termini, or disulfide connectivity do not match supported chemistry.
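Because chain conventions and chemistry are checked separately, it can help to inspect the chain layout on its own. The following is a minimal sketch, not PolyzyMD code: residues_by_chain and check_enzyme_chain are hypothetical helpers, fixed-column PDB formatting is assumed, and the expected chain ID is left as a caller-supplied argument rather than hard-coded.

```python
from collections import defaultdict


def residues_by_chain(pdb_text: str) -> dict:
    """Map each chain ID to the residue names it contains, in file order."""
    chains = defaultdict(list)
    seen = set()
    for line in pdb_text.splitlines():
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue
        chain_id = line[21:22]
        # Residue identity: chain, sequence number plus insertion code, name.
        key = (chain_id, line[22:27], line[17:20])
        if key not in seen:
            seen.add(key)
            chains[chain_id].append(line[17:20].strip())
    return dict(chains)


def check_enzyme_chain(pdb_text: str, expected_chain: str) -> list:
    """Return chain-layout problems for a protein-only enzyme PDB."""
    found = residues_by_chain(pdb_text)
    problems = []
    if expected_chain not in found:
        problems.append(f"no atoms on expected chain {expected_chain!r}")
    extra = sorted(chain for chain in found if chain != expected_chain)
    if extra:
        problems.append(f"unexpected chain IDs: {extra}")
    return problems
```

An empty result here says nothing about OpenFF ingestion; it only confirms that the file has the single-chain layout the caller asked for.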
OpenFF disulfide behavior
- CYX may be accepted as a cysteine-like residue alias during parsing, but it is not a stable public OpenFF residue template for every disulfide case.
- Disulfide cysteine SG atoms should be bonded to each other and should not have an attached HG proton.
- SSBOND records identify intended disulfides.
- CONECT records can make the SG-SG bond explicit for parser paths that depend on connectivity.
- N-terminal cystines combine terminal hydrogens with disulfide chemistry and can expose template/charge mismatches.
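The SG-SG and SG-HG conditions can be checked geometrically before ingestion. The sketch below is an illustration under stated assumptions, not OpenFF behavior: find_disulfide_problems is a hypothetical helper, the 2.5 Å cutoff for calling two SG atoms bonded is an assumed threshold (a typical S-S bond is close to 2.05 Å), and only residues named CYS or CYX are considered.

```python
import math

CYS_NAMES = {"CYS", "CYX"}


def _sulfur_sites(pdb_text: str) -> dict:
    """Collect SG coordinates and HG presence per cysteine-like residue."""
    sites = {}
    for line in pdb_text.splitlines():
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue
        if line[17:20].strip() not in CYS_NAMES:
            continue
        key = (line[21:22], int(line[22:26]))  # (chain ID, residue number)
        site = sites.setdefault(key, {"sg": None, "has_hg": False})
        name = line[12:16].strip()
        if name == "SG":
            site["sg"] = (
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
        elif name == "HG":
            site["has_hg"] = True
    return sites


def find_disulfide_problems(pdb_text: str, max_ss: float = 2.5):
    """Return (bonded_pairs, problems) based on SG-SG distances."""
    sites = {k: v for k, v in _sulfur_sites(pdb_text).items() if v["sg"]}
    keys = sorted(sites)
    pairs, problems = [], []
    for i, first in enumerate(keys):
        for second in keys[i + 1:]:
            distance = math.dist(sites[first]["sg"], sites[second]["sg"])
            if distance > max_ss:
                continue
            pairs.append((first, second))
            # A disulfide-bonded SG should not carry an HG proton.
            for chain, number in (first, second):
                if sites[(chain, number)]["has_hg"]:
                    problems.append(
                        f"{chain}{number}: SG is {distance:.2f} A from "
                        "another SG but still has HG"
                    )
    return pairs, problems
```

Pairs found this way can then be compared by hand against the file's SSBOND and CONECT records; a free cysteine with HG and no nearby SG is not reported.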
Custom substructures JSON
PolyzyMD loads the JSON file named by enzyme.custom_substructures_path and passes its contents to
Topology.from_pdb(..., _custom_substructures=...).
Warning
_custom_substructures is a private/experimental OpenFF API. Treat examples as
proofs of concept or upstream-PR candidates, not as stable public OpenFF support.
Shape:
    {
      "RESNAME": {
        "[SMARTS:1]": ["ATOM1"]
      }
    }
Each residue name maps to SMARTS patterns, and each SMARTS pattern maps to the corresponding PDB atom names for that residue.
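A shape check on the JSON before handing it to OpenFF can catch simple mistakes early. The following is a minimal sketch rather than PolyzyMD's actual loader: validate_substructures and load_topology are hypothetical names, the check covers only the nesting described above (not whether the SMARTS is chemically valid), and load_topology assumes openff-toolkit is installed and that the private _custom_substructures keyword is still available.

```python
import json


def validate_substructures(data) -> list:
    """Return shape problems; an empty list means the nesting is as described."""
    if not isinstance(data, dict):
        return ["top level must be an object keyed by residue name"]
    problems = []
    for resname, patterns in data.items():
        if not isinstance(patterns, dict) or not patterns:
            problems.append(f"{resname}: expected a non-empty object of SMARTS")
            continue
        for smarts, atom_names in patterns.items():
            ok = (
                isinstance(atom_names, list)
                and bool(atom_names)
                and all(isinstance(name, str) for name in atom_names)
            )
            if not ok:
                problems.append(
                    f"{resname}: {smarts!r} must map to a non-empty list "
                    "of atom-name strings"
                )
    return problems


def load_topology(pdb_path: str, substructures_path: str):
    """Load the JSON, check its shape, then call Topology.from_pdb."""
    # Imported lazily so the shape check works without openff-toolkit.
    from openff.toolkit import Topology

    with open(substructures_path) as handle:
        data = json.load(handle)
    problems = validate_substructures(data)
    if problems:
        raise ValueError("; ".join(problems))
    # Private/experimental keyword; may change between OpenFF releases.
    return Topology.from_pdb(pdb_path, _custom_substructures=data)
```

Passing the shape check does not mean the substructure matches the PDB atom graph; that is only established by the ingestion itself.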
Charge diagnostics
Charge mismatch messages are blockers. They usually mean one of these is wrong:
- protonation state or terminal hydrogen count
- disulfide SG-HG or SG-SG bonding
- residue atom naming
- missing heavy atoms or missing residues
- ambiguous TER, SSBOND, or CONECT records
- a custom substructure that does not match the PDB atom graph
Acceptable fixes are chemically explicit: curate the PDB, correct hydrogens and connectivity, model missing atoms when scientifically justified, or document a narrow custom-substructure proof of concept. Do not suppress the error.
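One way to keep the error visible is to wrap direct ingestion so that the exact message is recorded and then re-raised. This is a sketch under assumptions, not a PolyzyMD function: require_clean_ingestion, its loader argument, and its log argument are hypothetical, and the default loader assumes openff-toolkit is installed.

```python
def require_clean_ingestion(pdb_path: str, loader=None, log=print):
    """Ingest a PDB directly, logging the exact error text before re-raising."""
    if loader is None:
        # Imported lazily; requires openff-toolkit.
        from openff.toolkit import Topology

        loader = Topology.from_pdb
    try:
        return loader(pdb_path)
    except Exception as error:
        # Record the message verbatim so it can be matched against the
        # error catalog, then re-raise. The failure is never swallowed.
        log(f"OpenFF ingestion failed for {pdb_path}")
        log(f"{type(error).__name__}: {error}")
        raise
```

The loader argument exists only so the wrapper can be exercised without OpenFF; in normal use it is left at its default.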
Running error catalog
| Exact signature | Likely cause | Diagnostic | Acceptable fix | Caveats |
|---|---|---|---|---|
| | OpenFF matched a residue graph whose formal charge differs from the PDB graph | Inspect the named residue’s atom list, bonds, hydrogens, TER records, and disulfide records | Correct residue chemistry or use a reviewed custom substructure proof of concept | Do not ignore; private custom substructures are not stable API |
| Error dump names | N-terminal cysteine has terminal hydrogens plus disulfide chemistry that does not match OpenFF’s template | Check SG-HG absence, SG-SG bond, N-terminal hydrogens, and residue naming | Curate the cystine or test a structure-specific | Seen in 4CHA proof of concept; not universal |
| Renaming disulfide cysteine to | | Validate direct OpenFF ingestion and inspect charge mismatch | Fix connectivity/hydrogens or prepare an upstream OpenFF issue/PR | Avoid relying on residue rename alone |
| Failure adjacent to residues listed in | Missing-coordinate residues or missing heavy atoms alter termini or local chemistry | Read PDB header and visualize gaps | Model missing regions externally if required for the study | Automatic filling is a modeling decision |
Catalog maintenance rule
When a new OpenFF PDB ingestion error is diagnosed, update this table and Troubleshoot OpenFF PDB ingestion before closing the task. Record the exact error text, likely cause, diagnostic command, acceptable fix, and caveats. The only exception is when the user explicitly defers the durable documentation update.
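To keep new rows complete, an entry can be assembled in code and rendered as a table row. The sketch below is purely illustrative: CatalogEntry is a hypothetical helper, its field names mirror the five catalog columns, and it refuses to render a row with an empty cell.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One row of the running error catalog (five columns)."""

    signature: str
    likely_cause: str
    diagnostic: str
    acceptable_fix: str
    caveats: str

    def to_row(self) -> str:
        """Render the entry as a Markdown table row."""
        cells = [
            # Escape pipes and flatten newlines so the row stays on one line.
            cell.replace("|", "\\|").replace("\n", " ").strip()
            for cell in astuple(self)
        ]
        if any(not cell for cell in cells):
            raise ValueError("every catalog column must be filled in")
        return "| " + " | ".join(cells) + " |"
```

The error text should be pasted into signature verbatim from the failing run, not paraphrased.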