# Engram Production-Ready Refactor Log

**Date:** 2026-03-28
**Scope:** Full architectural refactor from prototype to installable package

---

## 1. Repository Structure — "Clean Root" Initiative

- Migrated all core logic into `src/engram/` package:
  - `mcp_server.py` → `src/engram/server.py`
  - `librarian.py` → `src/engram/librarian.py`
  - `segregation_full.py` → `src/engram/segregation.py`
  - `scripts/vault_guardrail.py` → `src/engram/guardrail.py`
- Created shared modules:
  - `src/engram/config.py` — centralized paths, keyword maps, validation patterns
  - `src/engram/utils.py` — `atomic_dump()` and `setup_logging()` (deduplicated)
- Moved operational scripts to `scripts/`:
  - `auto-export.py`, `engram-paste.py`, `metrics.py`
- Root now contains zero `.py` files

## 2. Critical Bug Fixes (P0)

- **MCP session filter** (`server.py`): `cmd[3] = f"^{sid}"` was overwriting the search query with the session ID. Fixed to use ripgrep `--glob *{session_id}*` as an additive filter.
- **Async hygiene** (`server.py`): Replaced blocking `subprocess.run` with `asyncio.create_subprocess_exec` to avoid stalling the MCP event loop.
- **Input validation** (`server.py`): Added query length cap (`MAX_QUERY_LENGTH=500`), session_id regex validation (rejects path traversal like `../../etc/passwd`), and `FileNotFoundError` handling for missing ripgrep.
- **Template artifacts** (`guardrail.py`): Scrubbed all `{{}}` double-brace artifacts that made the file syntactically invalid. Rewrote with correct f-strings and dict literals.
- **Regex escaping** (`metrics.py`): Fixed double-escaped `\\[`, `\\d`, `\\w` → proper `\[`, `\d`, `\w` in raw strings. Regex now actually matches log lines.
- **Shell injection** (`dispatcher/bridge.py`): Removed `shell=True` from `subprocess.run`. Commands now parsed via `shlex.split()` with `shell=False`.

## 3. Packaging & Hygiene

- Created `pyproject.toml`:
  - Build system: setuptools
  - Dependencies: `mcp`, `pyyaml`
  - Optional dev deps: `pytest`, `pytest-asyncio`
  - Console scripts: `engram-serve` → `engram.server:main`, `engram-index` → `engram.librarian:main`
- Fixed `.gitignore`:
  - Corrected `.DS_Storetools/` newline bug → separate `.DS_Store` entry
  - Added: `venv/`, `*.bak`, `*_state.json`, `*.log`, `dist/`, `*.egg-info/`
- Deleted dead files:
  - `segregation.py` (10-file prototype, superseded by `segregation_full.py`)
  - `mcp_server.py.bak`, `segregation.py.bak`, `util/import_goose.py.bak`, `util/import_history.py.bak`

## 4. Test Suite

- Created `tests/test_guardrail.py` (6 tests):
  - `verify_tags()`: all-verified, all-pruned, mixed, unknown-tag-fallback
  - `update_frontmatter()`: bare file, existing frontmatter replacement
- Created `tests/test_server.py` (10 tests):
  - Tool schema validation
  - Input validation: empty query, overlength query, path traversal, valid session_id
  - Ripgrep execution: successful search, no match, missing binary, output truncation
- All 16 tests passing

## 5. Documentation

- Rewrote `README.md`:
  - Added Quick Start section (`pip install -e .`, `engram-serve`, `engram-index`)
  - Updated architecture diagram with new module paths
  - Branded the guardrail as **"Engram Verified-Retrieval Layer"** with technical explanation of ALIAS_MAP + ripgrep evidence checks
  - Added project structure tree and development instructions

## 6. Type Hints & Logging

- PEP 484 type hints added to all function signatures in `src/engram/`
- All `print()` calls in core library replaced with `logging` module
- Checkpoint writes batched to every 100 files (was every file)
