# Engram Data Separation Design

**Date:** 2026-04-07
**Status:** Draft
**Goal:** Separate runtime data from code so the engram repo contains only program files. User data lives at `~/.local/share/engram/`.

---

## Problem

Currently `~/engram/` serves as both the git repo and the data store: 12GB of memory entities, databases, logs, and telemetry sit alongside the source code. This means:

- Sharing the codebase risks leaking personal data
- Deleting/re-cloning the repo destroys user data
- Git operations are slow due to untracked data files
- No clean boundary between "the program" and "my stuff"

## Solution

Split into two locations:

| Location | Contents |
|----------|----------|
| `~/engram/` | Code only — src, scripts, tests, docs, config, docker |
| `~/.local/share/engram/` | Data only — entities, databases, logs, telemetry, encryption keys |

## Data Directory Layout

```
~/.local/share/engram/
├── entities/              # Memory vault (markdown files)
├── agent_memory/          # Per-agent memory namespaces
├── logs/                  # Operation logs (audit.jsonl, *.log)
├── telemetry/             # CLI telemetry traces
├── synapse_data/          # Matrix homeserver data
├── .encryption/           # Encryption keys
├── vault_index.sqlite     # Main memory index (78MB)
├── memory_store.db        # Knowledge graph database
├── workers.sqlite         # Jetson worker heartbeats
├── engram.db/             # Additional database directory
├── librarian_state.json   # Librarian checkpoint state
├── segregation_state.json # Segregation checkpoint state
├── PRUNED_DICTIONARY.md   # Pruned terms reference
├── MASTER_PROFILE.md      # Master user profile
└── SELF_MODEL.md          # Agent self-model (consolidator output)
```

## Code Changes

### 1. config.py (central fix, cascades to 13 modules)

```python
from pathlib import Path

# Before
ENGRAM_ROOT: Path = Path.home() / "engram"

# After
ENGRAM_DATA: Path = Path.home() / ".local" / "share" / "engram"

# All data paths are now derived from ENGRAM_DATA instead of ENGRAM_ROOT:
ENTITIES_DIR: Path = ENGRAM_DATA / "entities"
TELEMETRY_DIR: Path = ENGRAM_DATA / "telemetry"
AGENT_MEMORY_DIR: Path = ENGRAM_DATA / "agent_memory"
LOG_DIR: Path = ENGRAM_DATA / "logs"
INDEX_PATH: Path = ENGRAM_DATA / "vault_index.sqlite"
ENCRYPTION_KEY_DIR: Path = ENGRAM_DATA / ".encryption"
LIBRARIAN_STATE: Path = ENGRAM_DATA / "librarian_state.json"
SEGREGATION_STATE: Path = ENGRAM_DATA / "segregation_state.json"
PRUNED_DICTIONARY: Path = ENGRAM_DATA / "PRUNED_DICTIONARY.md"
MASTER_PROFILE: Path = ENGRAM_DATA / "MASTER_PROFILE.md"
```

Modules that import these constants and need NO further changes (auto-fixed):
- `server.py`, `mcp_tools.py`, `graph.py`, `consolidator.py`, `retrieval.py`
- `vector.py`, `metrics.py`, `librarian.py`, `encryption.py`
- `segregation.py`, `audit.py`, `versions.py`, `lint.py`

### 2. Hardcoded path fixes (5 files)

| File | Line | Current | Fix |
|------|------|---------|-----|
| `src/engram/db_models.py` | 50 | `Path.home() / "engram" / "workers.sqlite"` | Import `ENGRAM_DATA` from config |
| `src/engram/run_worker_api.py` | 44 | `Path.home() / "engram" / "workers.sqlite"` | Import `ENGRAM_DATA` from config |
| `scripts/engram_watchdog.py` | 44-50 | Own `ENGRAM_ROOT` + derived paths | Import from `engram.config` or redefine to `~/.local/share/engram` |
| `scripts/interview.py` | 63-69 | Own `ENGRAM_ROOT` + derived paths | Import from `engram.config` or redefine to `~/.local/share/engram` |
| `scripts/hyperloop.py` | 44-45 | Own `ENGRAM_ROOT` + derived paths | Import from `engram.config` or redefine to `~/.local/share/engram` |
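For the three scripts, the "import or redefine" options in the table can be combined into one pattern. The following is a minimal sketch, assuming the scripts are sometimes run without `engram` on `PYTHONPATH`. `WORKERS_DB` is a hypothetical name used for illustration, not an existing constant in `config.py`:

```python
from pathlib import Path

try:
    # Preferred: reuse the single source of truth when the package is importable.
    from engram.config import ENGRAM_DATA
except ImportError:
    # Fallback for scripts launched without engram on PYTHONPATH.
    # Must be kept in sync with ENGRAM_DATA in config.py.
    ENGRAM_DATA = Path.home() / ".local" / "share" / "engram"

# Hypothetical name for the worker heartbeat database path.
WORKERS_DB: Path = ENGRAM_DATA / "workers.sqlite"
```

The fallback duplicates the path and can drift from `config.py`, so a plain import is preferable wherever the package is installed, as in `src/engram/db_models.py` and `src/engram/run_worker_api.py`.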

### 3. Hardcoded absolute paths (2 files)

| File | Line | Current | Fix |
|------|------|---------|-----|
| `scripts/find_dupes.py` | 14, 16, 122 | `/home/geodesix/engram/entities` | Use `ENGRAM_DATA / "entities"` |
| `scripts/persistent_ingest.py` | 46 | `/home/geodesix/entities` | Use `ENGRAM_DATA / "entities"` |

### 4. Systemd service files — NO CHANGES

The service files reference `WorkingDirectory` (the code repo) and `PYTHONPATH` (the code). These correctly stay pointing at `~/engram/`. The running Python code will find data at the new location via the updated config.

### 5. Documentation updates

| File | What to update |
|------|----------------|
| `CONFIGURATION.md` | Line 63: Update path table, add data directory docs |
| `ARCHITECTURE.md` | Line 261: Update ENGRAM_ROOT reference |

## Migration Script

`scripts/migrate-data.sh` — one-time manual script that:

1. Creates `~/.local/share/engram/` directory structure
2. Moves each data item (entities/, logs/, databases, etc.)
3. Verifies each move succeeded (file count, size check)
4. Reports what was moved and what remains in the repo

No symlinks. No automatic detection. Run once, verify, done.
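The planned script is shell; the following Python sketch only illustrates the move-and-verify logic of steps 1 to 3 and the "moved" part of step 4. The item list mirrors the data directory layout above. The function names are hypothetical, and "verify" is assumed to mean matching file count and total byte size before and after each move:

```python
import shutil
from pathlib import Path

# Items to move, taken from the data directory layout.
DATA_ITEMS = [
    "entities", "agent_memory", "logs", "telemetry", "synapse_data",
    ".encryption", "vault_index.sqlite", "memory_store.db", "workers.sqlite",
    "engram.db", "librarian_state.json", "segregation_state.json",
    "PRUNED_DICTIONARY.md", "MASTER_PROFILE.md", "SELF_MODEL.md",
]


def tree_stats(path: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for a single file or a directory tree."""
    if path.is_file():
        return 1, path.stat().st_size
    files = [p for p in path.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def migrate(repo: Path, data: Path) -> dict[str, str]:
    """Move each data item from repo to data and report the outcome per item."""
    data.mkdir(parents=True, exist_ok=True)
    report: dict[str, str] = {}
    for name in DATA_ITEMS:
        src, dst = repo / name, data / name
        if not src.exists():
            report[name] = "absent"
            continue
        if dst.exists():
            # Refuse to overwrite: a previous partial run needs manual review.
            report[name] = "skipped (destination exists)"
            continue
        before = tree_stats(src)
        shutil.move(str(src), str(dst))
        after = tree_stats(dst)
        report[name] = "moved" if before == after else f"MISMATCH {before} -> {after}"
    return report
```

When both locations share a filesystem, `shutil.move` is a rename and the check is trivially satisfied; it becomes meaningful when the move falls back to copy-and-delete across filesystems.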

## What Does NOT Change

- Repo location (`~/engram/`)
- Systemd service `WorkingDirectory` paths
- Docker volume mounts (already map to `/app/` internally)
- Shell scripts that derive paths relatively from their own location (`prep_node.sh`, `build-ota-payload.sh`)
- External data paths (`~/.local/share/goose/`, `~/.claude/`, `~/.config/`)

## Verification

After migration + code changes:
1. `engram-server.service` starts and serves on :8001
2. MCP tools can search/save/retrieve memories
3. `vault_index.sqlite` is read/written at new location
4. `entities/` directory no longer exists under `~/engram/`
5. Logs appear in `~/.local/share/engram/logs/`
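Items 1, 3, 4 and 5 can be partly automated. This is a minimal sketch, assuming the server listens on `127.0.0.1:8001`. It checks only that `vault_index.sqlite` exists at the new location; confirming that the file is actually read and written would need, for example, an mtime comparison around a save. Item 2 (MCP tools) is not covered. All names are hypothetical:

```python
import socket
from pathlib import Path

REPO = Path.home() / "engram"
DATA = Path.home() / ".local" / "share" / "engram"


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_layout(repo: Path = REPO, data: Path = DATA) -> dict[str, bool]:
    """File-level checks for verification items 1, 3 (existence only), 4 and 5."""
    logs = data / "logs"
    return {
        "server listening on :8001": port_open("127.0.0.1", 8001),
        "vault_index.sqlite at new location": (data / "vault_index.sqlite").is_file(),
        "entities/ gone from repo": not (repo / "entities").exists(),
        "logs present in data dir": logs.is_dir() and any(logs.iterdir()),
    }
```

Printing the returned dictionary after a service restart gives a quick pass/fail summary before running the test suite.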

## Risk

- **Low:** config.py is already the single source of truth for 80% of code
- **Medium:** Scripts with their own path definitions could be missed. This is mitigated by the exhaustive search summarized in sections 2 and 3 above
- **Mitigation:** Run the engram test suite after changes, verify service starts
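The search can be re-run after the changes to catch anything missed. The following is a sketch, assuming the patterns below capture the kinds of hardcoded paths found in sections 2 and 3. Matches are a review list, not automatic failures: `ENGRAM_ROOT` may legitimately remain if `config.py` keeps it for code paths. Hidden directories such as `.venv` and `.git` are skipped:

```python
import re
from pathlib import Path

# Patterns suggesting a data path that bypasses ENGRAM_DATA (assumed list).
SUSPECT_PATTERNS = [
    re.compile(r'Path\.home\(\)\s*/\s*["\']engram["\']'),
    re.compile(r"/home/[^/\s\"']+/(engram|entities)"),
    re.compile(r"\bENGRAM_ROOT\b"),
]


def find_hardcoded_paths(root: Path) -> list[tuple[Path, int, str]]:
    """Return (file, line_number, stripped_line) for each suspect match under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        # Skip hidden directories such as .venv and .git.
        if any(part.startswith(".") for part in path.relative_to(root).parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in SUSPECT_PATTERNS):
                hits.append((path, lineno, line.strip()))
    return hits
```

Running `find_hardcoded_paths(Path.home() / "engram")` should report nothing beyond the intentional references in `config.py`.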
