# Quick Start Guide

This guide walks through running your first PolyzyMD simulation.

## Overview

A typical PolyzyMD workflow:

1. **Initialize project** with `polyzymd init`
2. **Add structure files** (PDB, SDF)
3. **Configure simulation** (edit YAML file)
4. **Validate** the configuration
5. **Submit** jobs to HPC cluster

## Step 1: Initialize Your Project

The easiest way to start is with the `init` command, which creates a complete
project scaffold:

```bash
polyzymd init --name my_simulation
```

This creates:

```
my_simulation/
├── config.yaml              <- Template configuration (edit this)
├── structures/              <- Add your PDB/SDF files here
│   ├── place_protein_here.placeholder.txt
│   └── place_ligand_here.placeholder.txt
├── job_scripts/             <- SLURM scripts will go here
└── slurm_logs/              <- Job output will go here
```

```{tip}
The `config.yaml` template has all sections commented out with example values.
This teaches you the configuration syntax while you customize it for your system.
```

## Step 2: Add Your Structure Files

Copy your prepared structure files into the `structures/` directory:

```bash
cd my_simulation

# Copy your enzyme PDB
cp /path/to/my_enzyme.pdb structures/enzyme.pdb

# Copy your substrate SDF (if using)
cp /path/to/my_substrate.sdf structures/substrate.sdf

# Remove the placeholder files
rm structures/*.placeholder.txt
```

### Enzyme PDB Requirements

Your enzyme PDB must be **simulation-ready**:

- All hydrogens added
- Missing residues/atoms modeled  
- Proper protonation states
- No alternate conformations
- No crystallographic waters/ligands (unless intentional)

**Recommended preparation tools:**
- [PDB2PQR](https://server.poissonboltzmann.org/) - Protonation
- [CHARMM-GUI](https://www.charmm-gui.org/) - Full preparation
- PyMOL/Chimera - Manual inspection

### Substrate SDF Requirements (if using)

- 3D coordinates (from docking or crystal structure)
- Correct protonation state for simulation pH
- Single conformer (or specify `conformer_index` in config)

## Step 3: Configure Your Simulation

Open `config.yaml` in your editor. The template has all sections commented out
with placeholder values. You need to:

1. **Uncomment** the sections you need
2. **Replace** placeholder values with your actual data
3. **Leave commented** sections you don't need

### Minimal Configuration (Enzyme Only)

For a simple enzyme-in-water simulation, uncomment and edit these sections:

```yaml
name: "my_first_simulation"
description: "Testing PolyzyMD with my enzyme"

# Enzyme - REQUIRED
enzyme:
  name: "MyEnzyme"
  pdb_path: "structures/enzyme.pdb"

# Solvent - REQUIRED
solvent:
  primary:
    type: "water"
    model: "tip3p"
  co_solvents: []
  ions:
    neutralize: true
    nacl_concentration: 0.15
  box:
    padding: 1.2
    shape: "rhombic_dodecahedron"
    target_density: 1.0
    tolerance: 2.0

# Thermodynamics - REQUIRED
thermodynamics:
  temperature: 300.0
  pressure: 1.0

# Simulation phases - REQUIRED
simulation_phases:
  equilibration:
    ensemble: "NVT"
    duration: 1.0
    samples: 100
    time_step: 2.0
    thermostat: "LangevinMiddle"
    thermostat_timescale: 1.0
  production:
    ensemble: "NPT"
    duration: 10.0       # Short for testing
    samples: 250
    time_step: 2.0
    thermostat: "LangevinMiddle"
    thermostat_timescale: 1.0
    barostat: "MC"
    barostat_frequency: 25


# Output - REQUIRED
output:
  projects_directory: "."
  scratch_directory: null
  job_scripts_subdir: "job_scripts"
  slurm_logs_subdir: "slurm_logs"
  naming_template: "{enzyme}_{temperature}K_run{replicate}"
  save_checkpoint: true
  save_state_data: true
  trajectory_format: "dcd"
```

### Adding a Substrate

To include a docked ligand, uncomment the substrate section:

```yaml
substrate:
  name: "MyLigand"
  sdf_path: "structures/substrate.sdf"
  conformer_index: 0
  charge_method: "nagl"
  residue_name: "LIG"
```

### Adding Polymers

To add co-polymer chains around your enzyme:

```yaml
polymers:
  enabled: true
  type_prefix: "SBMA-EGPMA"
  monomers:
    - label: "A"
      probability: 0.98
      name: "SBMA"
    - label: "B"
      probability: 0.02
      name: "EGPMA"
  length: 5
  count: 2
  cache_directory: ".polymer_cache"
```

```{warning}
**YAML List Syntax**

When defining monomers (or any list), each `-` starts a **new list item**. All fields for one monomer must be grouped together:

**Incorrect:**
~~~yaml
monomers:
  - label: "A"
  - probability: 0.98
  - name: "SBMA"
~~~

**Correct:**
~~~yaml
monomers:
  - label: "A"
    probability: 0.98
    name: "SBMA"
~~~
```

See {doc}`polymers` for detailed polymer configuration.

### Adding Restraints

To keep a substrate in the active site:

```yaml
restraints:
  - type: "flat_bottom"
    name: "substrate_restraint"
    atom1:
      selection: "resid 77 and name OG"
      description: "Catalytic serine"
    atom2:
      selection: "resname LIG and name C1"
      description: "Substrate carbon"
    distance: 3.5
    force_constant: 10000.0
    enabled: true
```

See {doc}`restraints` for detailed restraint configuration.

## Step 4: Validate Configuration

Before running, validate your configuration:

```bash
polyzymd validate -c config.yaml
```

This checks:
- All required sections are present
- File paths exist
- Values are within valid ranges
- No conflicting settings

Expected output:
```
Validating configuration: config.yaml
Configuration is valid!

Summary:
  Name: my_first_simulation
  Enzyme: MyEnzyme
  Substrate: None (apo simulation)
  Polymers: Disabled
  Temperature: 300.0 K
  Pressure: 1.0 atm

Simulation phases:
  Equilibration: 1.0 ns (NVT)
  Production: 10.0 ns (NPT)

```

## Step 5: Test Build (Dry Run)

Test that the system can be built without actually building:

```bash
polyzymd build -c config.yaml --dry-run
```

## Step 6: Run Locally (Optional)

For testing on a local machine with GPU, first build the system, then run a
single production segment:

```bash
# Build the system (energy minimization + equilibration)
polyzymd build -c config.yaml

# Run one production segment
polyzymd run-segment -c config.yaml -r 1
```

## Step 7: Submit to HPC

For production runs on an HPC cluster:

```bash
# Generate scripts without submitting (dry run)
polyzymd submit -c config.yaml \
    --replicates 1-3 \
    --preset aa100 \
    --dry-run

# Submit for real
polyzymd submit -c config.yaml \
    --replicates 1-3 \
    --preset aa100 \
    --email your.email@university.edu
```

## Step 8: Monitor Jobs

```bash
# Check job status
squeue -u $USER

# View job output
tail -f slurm_logs/*.out
```

## Output Files

After completion:

```
my_simulation/
├── config.yaml
├── job_scripts/           # SLURM scripts
├── slurm_logs/            # Job output
└── MyEnzyme_300K_run1/    # Simulation output
    ├── solvated_system.pdb
    ├── equilibration/
    │   └── trajectory.dcd
    └── production_0/
        ├── trajectory.dcd
        ├── checkpoint.chk
        └── state_data.csv
```

## Next Steps

- Add a substrate: See {doc}`configuration`
- Add polymers: See {doc}`polymers`
- Add restraints: See {doc}`restraints`
- Run on HPC: See {doc}`hpc_slurm`