Package Structure and Development Guide
This guide explains how PolyzyMD is structured as a Python package, the design decisions behind it, and how to contribute or create similar packages.
Overview
PolyzyMD is designed to be:
pip-installable: pip install polyzymd
pixi-first for heavy scientific dependencies: use pixi install -e <env> for the reproducible OpenMM/OpenFF stack
Testable in CI: Works with GitHub Actions for automated testing
Developer-friendly: Clear structure for contributors
This document explains the key concepts that make this work.
Package Layout
The src/ Layout
PolyzyMD uses a src layout, which is considered best practice for pip-installable packages:
polyzymd/ # Repository root
├── src/
│ └── polyzymd/ # The actual Python package
│ ├── __init__.py
│ ├── config/
│ ├── builders/
│ ├── simulation/
│ └── ...
├── tests/ # Test suite (outside the package)
├── docs/ # Documentation
├── devtools/ # Development tools and helper scripts
├── pyproject.toml # Package metadata and build config
├── README.md
└── LICENSE
Why src/ Layout?
The src/ layout has several advantages over a “flat” layout:
| Aspect | src/ Layout | Flat Layout |
|---|---|---|
| Import safety | Forces testing against installed package | Can accidentally import local directory |
| Editable installs | Works correctly | May have import conflicts |
| Clear separation | Package code is isolated | Package mixed with repo files |
Example of the problem with flat layout:
# With flat layout, this might import local files instead of installed package:
import polyzymd # Which polyzymd? Local or installed?
With src/ layout, you must install the package (even in editable mode) before imports work, ensuring you always test the real package.
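To see which copy of a package Python would actually import, you can ask the import system for its origin. The following is a minimal sketch using only the standard library; `import_origin` and `is_imported_from` are hypothetical helper names, not part of PolyzyMD.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def import_origin(package: str) -> Optional[Path]:
    """Return the file Python would load for `package`, or None if it is not importable."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def is_imported_from(package: str, directory: Path) -> bool:
    """Return True if `package` resolves to a file located somewhere under `directory`."""
    origin = import_origin(package)
    if origin is None:
        return False
    return directory.resolve() in origin.resolve().parents
```

With a src/ layout and an editable install, `import_origin("polyzymd")` would be expected to point under `src/polyzymd/`. After a regular install it should point into the environment's site-packages instead.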
Alternative: Flat Layout
Some packages (like Polymerist) use a flat layout:
polymerist/ # Repository root
├── polymerist/ # Package has same name as repo
│ ├── __init__.py
│ └── ...
├── pyproject.toml
└── ...
This is simpler but requires more care to avoid import issues.
The pyproject.toml File
The pyproject.toml file is the modern standard for Python package configuration. It replaces the older setup.py approach.
Key Sections
Build System
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
This tells pip which tool to use for building the package. Common options:
hatchling: Modern, fast, good defaults (what PolyzyMD uses)
setuptools: Traditional, widely supported
flit: Simple, minimal configuration
Project Metadata
[project]
name = "polyzymd"
version = "1.0.0"
description = "Molecular dynamics simulation toolkit for enzyme-polymer systems"
readme = "README.md"
license = "MIT"
authors = [
{ name = "Joseph R. Laforet Jr.", email = "jola3134@colorado.edu" }
]
requires-python = ">=3.10"
Dependencies
dependencies = [
# Core dependencies that pip can install
"pydantic>=2.0.0",
"pyyaml>=6.0",
"click>=8.0.0",
"numpy>=1.21.0,<2.0.0",
]
[project.optional-dependencies]
# Groups of optional dependencies
dev = ["pytest>=7.0.0", "ruff>=0.1.0"]
docs = ["sphinx>=6.0.0", "myst-parser>=1.0.0"]
Entry Points (CLI Commands)
[project.scripts]
polyzymd = "polyzymd.cli.main:main"
This creates the polyzymd command that users can run from the terminal.
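The value on the right-hand side has the form `module:attribute`. The sketch below shows roughly how such a string maps to a Python object. It is a simplified illustration of the lookup, not the wrapper script that the installer generates, and `resolve_entry_point` is a hypothetical name.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted (e.g. "Class.method"); walk it step by step.
    for attr in filter(None, attr_path.split(".")):
        obj = getattr(obj, attr)
    return obj
```

In an environment where PolyzyMD is installed, `resolve_entry_point("polyzymd.cli.main:main")` should return the function that the polyzymd command calls.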
Handling Heavy Dependencies
The Challenge
PolyzyMD depends on packages that are difficult or impossible to install via pip:
OpenMM: GPU-accelerated MD engine (resolved from conda-forge via pixi)
OpenFF Toolkit: Force field tools (resolved from conda-forge via pixi)
RDKit: Chemistry toolkit (managed in the pixi environment)
PACKMOL: Molecular packing (resolved from conda-forge via pixi)
AmberTools: Optional, for AM1-BCC charging backend (resolved from conda-forge via pixi)
If we list these in dependencies, pip install polyzymd will fail.
The Solution: Lazy Imports
We use lazy imports so the package can be imported without these heavy dependencies:
# polyzymd/__init__.py
__version__ = "1.0.0"

def __getattr__(name):
    """Lazy import heavy modules only when accessed."""
    if name == "SystemBuilder":
        from polyzymd.builders.system_builder import SystemBuilder
        return SystemBuilder
    raise AttributeError(f"module 'polyzymd' has no attribute {name!r}")
This means:
import polyzymd # Works without OpenFF installed
print(polyzymd.__version__) # Works - just returns "1.0.0"
from polyzymd import SystemBuilder # NOW imports OpenFF (fails if not installed)
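The same module-level `__getattr__` mechanism can be driven by a lookup table when several names need lazy loading. The sketch below is self-contained: it builds a throwaway module object and uses `json.JSONDecoder` as a stand-in for a heavy dependency. The table, the helper name `make_lazy_module`, and the caching step are illustrative assumptions, not PolyzyMD code.

```python
import importlib
import types

# Hypothetical table: public name -> (module to import, attribute to fetch).
# "json"/"JSONDecoder" stand in for a heavy dependency here.
_LAZY_ATTRS = {"Decoder": ("json", "JSONDecoder")}


def make_lazy_module(name: str) -> types.ModuleType:
    """Build a module whose listed attributes are imported on first access."""
    module = types.ModuleType(name)

    def _lazy_getattr(attr: str):
        try:
            target_module, target_attr = _LAZY_ATTRS[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(target_module), target_attr)
        # Cache on the module so later lookups bypass __getattr__ entirely.
        setattr(module, attr, value)
        return value

    # Python consults a module's __getattr__ only after normal lookup fails.
    module.__getattr__ = _lazy_getattr
    return module
```

In a real package the function would be defined directly at the top level of `__init__.py`, as in the example above, rather than attached to a module object at runtime.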
Why This Matters
CI can run basic tests without a full simulation environment
Users can install the package and see helpful error messages
Documentation builds don’t require simulation dependencies
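One way to produce the helpful error messages mentioned above is to wrap the deferred import in a small helper that adds an installation hint. This is a sketch under the assumption that such a helper is wanted; `require` is a hypothetical name and the hint text is only an example.

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import `module_name`, or raise ImportError carrying an installation hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature but is not installed. "
            f"{install_hint}"
        ) from exc
```

A builder module could then call something like `require("openmm", "Run 'pixi install -e build' to set up the simulation stack.")` at the point where the dependency is first needed.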
Recommended Installation
We recommend users install via pixi so the heavy scientific dependencies are
resolved reproducibly from pixi.toml:
git clone https://github.com/joelaforet/polyzymd.git
cd polyzymd
pixi install -e build
pixi shell -e build
Optional Charging Backends
PolyzyMD defaults to NAGL for partial charge assignment, which is fast and doesn’t require additional dependencies beyond the OpenFF stack.
For AM1-BCC charges, you can optionally install additional backends:
AmberTools (sqm)
AmberTools is already included in the PolyzyMD pixi environments, so no extra installation step is usually required.
To request AM1-BCC charges from AmberTools in your code:
from polyzymd.utils.charging import get_charger
# Use AmberTools for AM1-BCC
charger = get_charger("am1bcc", toolkit="ambertools")
charged_mol = charger.charge_molecule(molecule)
OpenEye Toolkit (Commercial)
If you have an OpenEye license:
charger = get_charger("am1bcc", toolkit="openeye")
Note: NAGL is recommended for most use cases as it’s faster and produces comparable results to AM1-BCC.
Continuous Integration (CI)
Two-Tier CI Strategy
PolyzyMD uses a two-tier CI approach:
Tier 1: Basic CI (Runs on Every PR)
Fast checks that don’t require conda:
Linting: Code style with ruff
Type checking: Static analysis with mypy
Build verification: Ensure package builds correctly
Import tests: Test that config module imports
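# .github/workflows/ci.yml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: pip install ruff
      - run: ruff check src/polyzymd/
  build:
    runs-on: ubuntu-latest
    steps:
      # The build job also needs the repository contents and a Python interpreter
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: pip install build
      - run: python -m build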
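An import test in Tier 1 can also verify that importing the package does not pull in the heavy stack. The sketch below runs the import in a fresh interpreter and reports which of the named modules ended up loaded. The helper name `modules_loaded_by_import` and the choice of which modules count as heavy are assumptions for illustration.

```python
import subprocess
import sys
from typing import Iterable, List


def modules_loaded_by_import(package: str, candidates: Iterable[str]) -> List[str]:
    """Import `package` in a fresh interpreter; return the `candidates` that got loaded."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        # Print one loaded candidate per line so the parent can split the output.
        f"print('\\n'.join(m for m in {list(candidates)!r} if m in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

A test could then assert that `modules_loaded_by_import("polyzymd", ["openmm", "openff.toolkit"])` returns an empty list, which would catch an eager import added to `__init__.py` by mistake.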
Tier 2: Full CI (Runs Weekly or on Release)
Comprehensive tests with the full pixi-managed simulation environment:
# .github/workflows/full-test.yml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so pixi.toml and the tests are available
      - uses: actions/checkout@v4
      - uses: prefix-dev/setup-pixi@v0
      - run: pixi install -e build
      - run: pixi run -e build pip install . --no-deps
      - run: pixi run -e build pytest tests/
The Pixi Manifest
Environment definitions live in pixi.toml:
[workspace]
name = "polyzymd"
channels = ["conda-forge"]
[environments]
build = { features = ["base", "build", "test", "docs"] }
cuda-12-4 = { features = ["base", "cuda-12-4", "test", "docs"] }
cuda-12-6 = { features = ["base", "cuda-12-6", "test", "docs"] }
Version Management
Simple Approach (Current)
PolyzyMD uses a simple hardcoded version:
# polyzymd/__init__.py
__version__ = "1.0.0"
# pyproject.toml
[project]
version = "1.0.0"
When releasing a new version, update both files.
Dynamic Versioning (Alternative)
Some packages use tools like versioningit or setuptools-scm to derive the version from git tags:
[project]
dynamic = ["version"]
[tool.versioningit]
default-version = "1+unknown"
[tool.versioningit.write]
file = "polymerist/_version.py"
from importlib.metadata import version
__version__ = version(__name__)
Pros: Version always matches git tags, no manual updates
Cons: More complex, requires understanding of the tooling
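One caveat of reading the version from installed metadata is that the lookup fails when the package is run from a source tree that was never installed. A common guard is to fall back to a placeholder, sketched below. The helper name `installed_version` and the default string are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str, default: str = "0+unknown") -> str:
    """Return the installed version of `distribution`, or `default` if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # No installed metadata, e.g. running from an uninstalled checkout.
        return default
```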
Creating a Release
Steps for a New Release
Update version numbers
# Edit pyproject.toml and src/polyzymd/__init__.py
# Change version = "1.0.0" to version = "1.1.0"
Commit the version bump
git add pyproject.toml src/polyzymd/__init__.py
git commit -m "Bump version to 1.1.0"
git push origin main
Create and push a git tag
git tag -a v1.1.0 -m "Release 1.1.0"
git push origin v1.1.0
GitHub Actions handles the rest
Builds the package
Publishes to PyPI
Creates GitHub release
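Since the tag and the version number are set by hand in separate steps, a release workflow could guard against a mismatch before publishing. The check below is a sketch of such a guard. Whether PolyzyMD's workflow performs it is not stated here, and `tag_matches_version` is a hypothetical name.

```python
def tag_matches_version(tag: str, package_version: str, prefix: str = "v") -> bool:
    """Return True if a git tag such as 'v1.1.0' corresponds to version '1.1.0'."""
    return tag.startswith(prefix) and tag[len(prefix):] == package_version
```

In GitHub Actions, the tag name is available in the `GITHUB_REF_NAME` environment variable when a workflow is triggered by a tag push.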
PyPI Publishing
The release workflow uses “trusted publishing” - no API tokens needed:
# .github/workflows/release.yml
- name: Publish to PyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  # Uses OIDC authentication - configure at pypi.org
To set up trusted publishing:
Go to https://pypi.org/manage/project/polyzymd/settings/publishing/
Add a new “pending publisher” with:
Owner: joelaforet
Repository: polyzymd
Workflow: release.yml
Directory Structure Reference
polyzymd/
├── .github/
│ └── workflows/
│ ├── ci.yml # Basic CI (lint, build, import tests)
│ ├── release.yml # Publish to PyPI on tag
│ └── full-test.yml # Full pixi test suite
│
├── devtools/ # Optional helper scripts / dev tooling
│
├── docs/
│ ├── source/
│ │ ├── conf.py # Sphinx configuration
│ │ ├── index.rst # Documentation index
│ │ └── tutorials/ # User guides
│ └── requirements.txt # Docs build dependencies
│
├── src/
│ └── polyzymd/
│ ├── __init__.py # Package init (lazy imports)
│ ├── config/ # Configuration (pydantic schemas)
│ ├── builders/ # System building
│ ├── simulation/ # MD runners
│ ├── workflow/ # SLURM/HPC integration
│ ├── exporters/ # GROMACS export
│ ├── cli/ # Command-line interface
│ └── data/ # Bundled data files
│
├── tests/
│ ├── __init__.py
│ ├── test_imports.py # Basic import tests
│ └── test_config.py # Configuration tests
│
├── .gitignore
├── .readthedocs.yaml # ReadTheDocs configuration
├── LICENSE
├── README.md
└── pyproject.toml # Package metadata and build config
Best Practices Summary
Do
Use src/ layout for clear package boundaries
Use lazy imports for heavy dependencies
Keep __init__.py lightweight
Separate fast CI (lint) from slow CI (full tests)
Use pyproject.toml for all configuration
Document installation requirements clearly
Don’t
Don’t assume pip alone can provision the full OpenMM/OpenFF stack
Don’t eagerly import heavy modules at package init
Don’t mix test artifacts with source code
Don’t hardcode paths - use importlib.resources for data files
Don’t commit generated files (.pyc, __pycache__, etc.)
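As an illustration of the importlib.resources recommendation above, the sketch below reads a text file that ships inside an installed package without building a filesystem path by hand. The helper name `read_package_text` is hypothetical.

```python
from importlib.resources import files


def read_package_text(package: str, *path_parts: str, encoding: str = "utf-8") -> str:
    """Read a text file shipped inside an installed package."""
    resource = files(package)
    # Descend one path component at a time; this works for any Traversable.
    for part in path_parts:
        resource = resource.joinpath(part)
    return resource.read_text(encoding=encoding)
```

For bundled files under `polyzymd/data/`, a call would look like `read_package_text("polyzymd", "data", "example.yaml")`, where `example.yaml` is a placeholder file name.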
Further Reading
See Also
Contributing to PolyzyMD - How to contribute to PolyzyMD
Architecture - Internal architecture overview
Install PolyzyMD with pixi - User installation guide