# Engram Production-Ready Refactor

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Transform engram from a flat prototype into an installable, type-hinted, tested Python package with a clean root, fixed bugs, and professional documentation.

**Architecture:** Core library moves to `src/engram/` as an installable package. Shared utilities (`atomic_dump`, `setup_logging`) consolidated into `utils.py`. Configuration (paths, keyword maps) centralized in `config.py`. Operational scripts stay in `scripts/`. Entry points exposed via `pyproject.toml`.

**Tech Stack:** Python 3.10+, MCP SDK, PyYAML, ripgrep (system), pytest, asyncio

---

## File Structure

### Created
- `src/engram/__init__.py` — package marker with version
- `src/engram/server.py` — MCP server (from `mcp_server.py`, bugs fixed)
- `src/engram/librarian.py` — indexer (from `librarian.py`, typed)
- `src/engram/segregation.py` — segregation engine (from `segregation_full.py`, typed)
- `src/engram/guardrail.py` — tag verification (from `scripts/vault_guardrail.py`, template artifacts fixed)
- `src/engram/config.py` — centralized paths and keyword maps
- `src/engram/utils.py` — shared `atomic_dump()` and `setup_logging()`
- `tests/__init__.py` — test package
- `tests/test_guardrail.py` — guardrail unit tests
- `tests/test_server.py` — MCP server schema + behavior tests
- `pyproject.toml` — packaging, deps, entry points

### Modified
- `.gitignore` — fix `.DS_Store` bug, add `venv/`, `*.bak`, `*_state.json`, `*.log`
- `dispatcher/bridge.py` — remove `shell=True`, use `shlex.split()`
- `scripts/metrics.py` — fix regex escaping
- `README.md` — full rewrite with Quick Start and Verified-Retrieval Layer branding

### Deleted
- `segregation.py` (10-file prototype, superseded)
- `mcp_server.py.bak`, `segregation.py.bak`, `util/import_goose.py.bak`, `util/import_history.py.bak`
- Root copies after migration: `mcp_server.py`, `librarian.py`, `segregation_full.py`
- `scripts/vault_guardrail.py` (moved to `src/engram/guardrail.py`)

### Moved to `scripts/`
- `auto-export.py`, `engram-paste.py`, `metrics.py` (from root)

---

## Task 1: Create package scaffolding and shared utilities

**Files:**
- Create: `src/engram/__init__.py`
- Create: `src/engram/config.py`
- Create: `src/engram/utils.py`

- [ ] **Step 1: Create `src/engram/__init__.py`**

```python
"""Engram: Sovereign AI Memory Vault."""
__version__ = "0.1.0"
```

- [ ] **Step 2: Create `src/engram/config.py`**

Centralize the `ENGRAM_ROOT`, `ENTITIES_DIR`, keyword topic map, agent pattern map, and CLI regex. All currently hardcoded across `librarian.py`, `segregation_full.py`, and `mcp_server.py`.

- [ ] **Step 3: Create `src/engram/utils.py`**

Extract `atomic_dump()` (duplicated in `librarian.py:26-30` and `segregation_full.py:26-30`) and `setup_logging()` (duplicated in `librarian.py:13-24` and `segregation_full.py:13-24`). Add type hints.

- [ ] **Step 4: Verify imports**

Run: `python -c "from engram.config import ENGRAM_ROOT; from engram.utils import atomic_dump, setup_logging; print('OK')"`
Expected: ImportError (package not installed yet — that's fine, syntax check only)

Run: `python -m py_compile src/engram/config.py && python -m py_compile src/engram/utils.py && echo OK`
Expected: OK

- [ ] **Step 5: Commit**

```bash
git add src/engram/__init__.py src/engram/config.py src/engram/utils.py
git commit -m "feat: create src/engram package with config and shared utils"
```

---

## Task 2: Migrate and fix `mcp_server.py` → `src/engram/server.py`

**Files:**
- Create: `src/engram/server.py`
- Source: `mcp_server.py` (root)

- [ ] **Step 1: Create `src/engram/server.py`**

Fixes applied during migration:
1. **Session filter bug (line 37):** `cmd[3] = f"^{sid}"` overwrites the query. Fix: use `--glob` flag (`["--glob", f"*{sid}*"]`) appended to cmd.
2. **Async hygiene:** Replace `subprocess.run` with `asyncio.create_subprocess_exec`.
3. **Input validation:** Reject `session_id` containing `/` or `..`. Cap query length.
4. **Type hints:** All function signatures typed.
5. **Import from config/utils.**

- [ ] **Step 2: Verify syntax**

Run: `python -m py_compile src/engram/server.py && echo OK`
Expected: OK

- [ ] **Step 3: Commit**

```bash
git add src/engram/server.py
git commit -m "feat: migrate MCP server with async fix and session filter bug fix"
```

---

## Task 3: Migrate and fix `vault_guardrail.py` → `src/engram/guardrail.py`

**Files:**
- Create: `src/engram/guardrail.py`
- Source: `scripts/vault_guardrail.py`

- [ ] **Step 1: Create `src/engram/guardrail.py`**

Fixes applied:
1. **Template artifacts:** All `{{}}` → `{}`, `{{new_fm}}` → `{new_fm}`, etc.
2. **Regex escaping:** `\\n` in regex strings → `\n` (use raw strings).
3. **Type hints on all functions.**
4. **Use `asyncio.create_subprocess_exec` for ripgrep calls** (consistency with server.py).

- [ ] **Step 2: Verify syntax**

Run: `python -m py_compile src/engram/guardrail.py && echo OK`
Expected: OK

- [ ] **Step 3: Commit**

```bash
git add src/engram/guardrail.py
git commit -m "fix: scrub template artifacts from guardrail, add type hints"
```

---

## Task 4: Migrate `librarian.py` and `segregation_full.py`

**Files:**
- Create: `src/engram/librarian.py`
- Create: `src/engram/segregation.py`
- Source: `librarian.py`, `segregation_full.py` (root)

- [ ] **Step 1: Create `src/engram/librarian.py`**

Changes: Import from `engram.config` and `engram.utils`. Add type hints. Replace f-string logging with `%s` style. Keep `main()` as entry point.

- [ ] **Step 2: Create `src/engram/segregation.py`**

Changes: Import from `engram.config` and `engram.utils`. Add type hints. Already uses `%s` logging — good.

- [ ] **Step 3: Verify syntax**

Run: `python -m py_compile src/engram/librarian.py && python -m py_compile src/engram/segregation.py && echo OK`
Expected: OK

- [ ] **Step 4: Commit**

```bash
git add src/engram/librarian.py src/engram/segregation.py
git commit -m "feat: migrate librarian and segregation engine to src/engram"
```

---

## Task 5: Fix remaining bugs (metrics.py, bridge.py)

**Files:**
- Modify: `scripts/metrics.py`
- Modify: `dispatcher/bridge.py`

- [ ] **Step 1: Fix `scripts/metrics.py` regex**

Change double-escaped `\\[`, `\\d`, `\\w` to single-escaped `\[`, `\d`, `\w` in raw strings. Add `if __name__ == "__main__":` guard.

- [ ] **Step 2: Fix `dispatcher/bridge.py` security**

Add `import shlex`. Replace `shell=True` with `shlex.split(cmd)` + `shell=False`.

- [ ] **Step 3: Verify syntax**

Run: `python -m py_compile scripts/metrics.py && python -m py_compile dispatcher/bridge.py && echo OK`
Expected: OK

- [ ] **Step 4: Commit**

```bash
git add scripts/metrics.py dispatcher/bridge.py
git commit -m "fix: correct regex escaping in metrics, remove shell=True from bridge"
```

---

## Task 6: Clean root — move scripts, delete dead files

**Files:**
- Move: `auto-export.py` → `scripts/auto-export.py`
- Move: `engram-paste.py` → `scripts/engram-paste.py`
- Move: `metrics.py` → `scripts/metrics.py`
- Delete: `segregation.py`, `mcp_server.py`, `librarian.py`, `segregation_full.py`
- Delete: `mcp_server.py.bak`, `segregation.py.bak`, `util/import_goose.py.bak`, `util/import_history.py.bak`
- Delete: `scripts/vault_guardrail.py` (now in `src/engram/guardrail.py`)

- [ ] **Step 1: Move operational scripts to `scripts/`**

```bash
git mv auto-export.py scripts/auto-export.py
git mv engram-paste.py scripts/engram-paste.py
git mv metrics.py scripts/metrics.py
```

- [ ] **Step 2: Delete superseded root files**

```bash
git rm segregation.py mcp_server.py librarian.py segregation_full.py
git rm mcp_server.py.bak segregation.py.bak
git rm util/import_goose.py.bak util/import_history.py.bak
git rm scripts/vault_guardrail.py
```

- [ ] **Step 3: Verify clean root**

Run: `ls *.py` in repo root
Expected: No `.py` files in root

- [ ] **Step 4: Commit**

```bash
git add -A
git commit -m "chore: clean root — move scripts, delete dead files and .bak artifacts"
```

---

## Task 7: Packaging and .gitignore

**Files:**
- Create: `pyproject.toml`
- Modify: `.gitignore`

- [ ] **Step 1: Create `pyproject.toml`**

Define: `[build-system]` (setuptools), `[project]` metadata, dependencies (`mcp`, `pyyaml`), optional-deps (`dev`: `pytest`), `[project.scripts]` with `engram-serve` → `engram.server:main` and `engram-index` → `engram.librarian:main`.

- [ ] **Step 2: Fix `.gitignore`**

Fix `.DS_Storetools/` → separate lines for `.DS_Store` and (remove tools/ — it's tracked). Add: `venv/`, `*.bak`, `*_state.json`, `*.log`, `dist/`, `*.egg-info`.

- [ ] **Step 3: Verify install**

Run: `pip install -e . && engram-serve --help 2>&1 | head -5`
Expected: Either help text or clean import (no crash)

- [ ] **Step 4: Commit**

```bash
git add pyproject.toml .gitignore
git commit -m "feat: add pyproject.toml with CLI entry points, fix .gitignore"
```

---

## Task 8: Tests

**Files:**
- Create: `tests/__init__.py`
- Create: `tests/test_guardrail.py`
- Create: `tests/test_server.py`

- [ ] **Step 1: Create `tests/__init__.py`** (empty)

- [ ] **Step 2: Write `tests/test_guardrail.py`**

Test: `verify_tags()` returns verified/pruned split. Test: `update_frontmatter()` writes valid YAML. Use `tmp_path` fixture with sample `.md` files. Mock `subprocess.run` for ripgrep calls.

- [ ] **Step 3: Write `tests/test_server.py`**

Test: tool schema has correct structure. Test: session_id with `..` is rejected. Test: query length cap. Mock `asyncio.create_subprocess_exec`.

- [ ] **Step 4: Run tests**

Run: `pytest tests/ -v`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add tests/
git commit -m "test: add pytest suites for guardrail and MCP server"
```

---

## Task 9: README rewrite

**Files:**
- Modify: `README.md`

- [ ] **Step 1: Rewrite `README.md`**

Sections: badge line, one-liner, Quick Start (`pip install -e .`, `engram-serve`, `engram-index`), System Architecture (keep ASCII diagram, update paths), **Engram Verified-Retrieval Layer** (dedicated section branding the guardrail — ALIAS_MAP + ripgrep evidence checks = ground-truth tag verification eliminating RAG hallucinations), Roadmap, License.

- [ ] **Step 2: Commit**

```bash
git add README.md
git commit -m "docs: rewrite README with Quick Start and Verified-Retrieval Layer branding"
```

---

## Execution Order

Tasks 1→2→3→4 (package scaffolding, then core migrations — sequential, each imports from prior)
Task 5 (independent bug fixes — can run after Task 1)
Task 6 (cleanup — after Tasks 2-4 so new files exist before deleting old ones)
Task 7 (packaging — after Task 6 so root is clean)
Task 8 (tests — after Task 7 so package is installable)
Task 9 (README — after Task 7 so Quick Start instructions are accurate)
