# Agent Memory Scraper + User Profile Synthesizer

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Scrape Claude Code's learned memories into Engram and synthesize a unified user profile that all agents share, so every agent knows who the user is and how they work.

**Architecture:** Two scripts in the existing Engram scripts directory. `memory_watcher.py` runs on a timer (every 2 hours), reads Claude Code memory files, and ingests them as Engram entities. `profile_synthesizer.py` runs daily, queries all user-related data from Engram, and generates a `USER_PROFILE.md` distributed to each agent's config directory.

**Tech Stack:** Python 3, PyYAML (already available), Engram's file-based entity store (Markdown files under `~/engram/entities/`), systemd user timers

---

## File Map

| File | Responsibility |
|------|---------------|
| `scripts/memory_watcher.py` | Scrape Claude Code memory files → Engram entities |
| `scripts/profile_synthesizer.py` | Query Engram → build unified USER_PROFILE.md → distribute to agents |
| `systemd/engram-memory-watcher.service` | systemd service for memory scraper |
| `systemd/engram-memory-watcher.timer` | Timer: every 2 hours |
| `systemd/engram-profile-synth.service` | systemd service for profile synthesizer |
| `systemd/engram-profile-synth.timer` | Timer: daily at 7am (after 6am briefing) |

---

## Task 1: Memory Watcher Script

**Files:**
- Create: `/home/geodesix/engram/scripts/memory_watcher.py`

- [ ] **Step 1: Write the memory watcher**

```python
#!/usr/bin/env python3
"""
Memory Watcher — Scrape agent memory files into Engram.

Reads learned facts from each agent's memory store and ingests them
as Engram entities. Currently supports Claude Code; other agents
(Gemini, Rook) already write to Engram directly.

Watched locations:
    ~/.claude/projects/*/memory/*.md  (Claude Code memories)

Runs as a timer every 2 hours alongside session_watcher.py.

Usage:
    python3 memory_watcher.py           # Run once (for timer)
    python3 memory_watcher.py --dry-run # Preview without writing
"""

import json
import logging
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

import yaml

logging.basicConfig(level=logging.INFO, format="[memory-watcher] %(message)s")
logger = logging.getLogger(__name__)

ENGRAM_ENTITIES = Path(__file__).parent.parent / "entities"
STATE_FILE = Path(__file__).parent.parent / "logs" / "memory_watcher_state.json"

# Agent memory locations
CLAUDE_MEMORY_GLOB = str(Path.home() / ".claude" / "projects" / "*" / "memory" / "*.md")


def load_state() -> dict:
    """Load watcher state (processed files + mtimes)."""
    if STATE_FILE.exists():
        try:
            return json.loads(STATE_FILE.read_text())
        except Exception:
            pass
    return {"processed_files": {}, "last_run": None}


def save_state(state: dict) -> None:
    """Persist watcher state."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    state["last_run"] = datetime.now().isoformat()
    STATE_FILE.write_text(json.dumps(state, indent=2))


def parse_memory_file(path: Path) -> Optional[dict]:
    """Parse a Claude Code memory file with YAML frontmatter.

    Returns dict with: name, description, type, content, source_path, project
    """
    text = path.read_text(errors="replace").strip()
    if not text:
        return None

    # Skip MEMORY.md index files
    if path.name == "MEMORY.md":
        return None

    # Parse YAML frontmatter
    frontmatter = {}
    content = text
    if text.startswith("---"):
        parts = text.split("---", 2)
        if len(parts) >= 3:
            try:
                frontmatter = yaml.safe_load(parts[1]) or {}
            except yaml.YAMLError:
                pass
            content = parts[2].strip()

    # Extract project name from path
    # ~/.claude/projects/-home-geodesix-Jetson/memory/user_profile.md
    # → project = "Jetson"
    project = "unknown"
    path_parts = str(path).split("/projects/")
    if len(path_parts) > 1:
        project_slug = path_parts[1].split("/")[0]  # -home-geodesix-Jetson
        project = project_slug.rsplit("-", 1)[-1] if "-" in project_slug else project_slug

    return {
        "name": frontmatter.get("name", path.stem),
        "description": frontmatter.get("description", ""),
        "type": frontmatter.get("type", "unknown"),
        "content": content,
        "source_path": str(path),
        "project": project,
    }


def to_engram_entity(memory: dict) -> str:
    """Convert a parsed memory to Engram Markdown entity format."""
    # Map Claude Code memory types to Engram categories
    type_to_category = {
        "user": "user_profile",
        "feedback": "user_pref",
        "project": "project",
        "reference": "reference",
    }
    category = type_to_category.get(memory["type"], "memory")

    tags = [
        "claude_code_memory",
        f"project:{memory['project']}",
        memory["type"],
    ]

    frontmatter = {
        "name": memory["name"],
        "source": "claude_code",
        "category": category,
        "created_at": datetime.now().isoformat(),
        "ingested_at": datetime.now().isoformat(),
        "topics": [memory["project"], memory["type"]],
        "tags": tags,
        "summary": memory["description"][:200] if memory["description"] else memory["name"],
        "source_path": memory["source_path"],
    }

    yaml_block = yaml.dump(frontmatter, default_flow_style=False, allow_unicode=True)
    return f"---\n{yaml_block}---\n\n{memory['content']}"


def entity_filename(memory: dict) -> str:
    """Generate a stable filename for dedup."""
    # Slugify the memory name + project so re-runs overwrite the same entity
    slug = re.sub(r"[^a-z0-9]+", "_", memory["name"].lower()).strip("_")[:50]
    return f"claude_memory_{memory['project']}_{slug}.md"


def scrape_claude_memories(state: dict, dry_run: bool = False) -> int:
    """Scrape Claude Code memory files and ingest into Engram."""
    import glob

    files = glob.glob(CLAUDE_MEMORY_GLOB)
    processed = state.get("processed_files", {})
    new_count = 0

    for filepath in sorted(files):
        path = Path(filepath)
        mtime = path.stat().st_mtime

        # Skip if already processed and not modified
        if filepath in processed and processed[filepath] >= mtime:
            continue

        memory = parse_memory_file(path)
        if memory is None:
            continue

        filename = entity_filename(memory)
        entity_path = ENGRAM_ENTITIES / filename

        if dry_run:
            logger.info("DRY RUN: Would ingest %s → %s", path.name, filename)
            new_count += 1
            continue

        # Write entity
        ENGRAM_ENTITIES.mkdir(parents=True, exist_ok=True)
        entity_content = to_engram_entity(memory)
        entity_path.write_text(entity_content)
        logger.info("Ingested: %s → %s (type=%s, project=%s)",
                     path.name, filename, memory["type"], memory["project"])

        # Update state
        processed[filepath] = mtime
        new_count += 1

    state["processed_files"] = processed
    return new_count


def main():
    import argparse
    parser = argparse.ArgumentParser(description="Scrape agent memories into Engram")
    parser.add_argument("--dry-run", action="store_true", help="Preview without writing")
    args = parser.parse_args()

    state = load_state()
    logger.info("Memory watcher starting (last run: %s)", state.get("last_run", "never"))

    total = 0

    # Claude Code memories
    count = scrape_claude_memories(state, dry_run=args.dry_run)
    if count:
        logger.info("Claude Code: %d memories ingested", count)
    total += count

    # Future: add scrapers here for agents that don't write to Engram directly
    # count = scrape_goose_memories(state, dry_run=args.dry_run)

    if not args.dry_run:
        save_state(state)

    if total == 0:
        logger.info("No new memories to ingest")
    else:
        logger.info("Total: %d memories ingested", total)


if __name__ == "__main__":
    main()
```
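As a quick sanity check of the parsing logic above, the frontmatter split and project-slug extraction can be exercised standalone with pure stdlib (the sample memory text and slug below are hypothetical):

```python
# Minimal sketch of memory_watcher.py's two parsing steps.
text = """---
name: user_profile
type: user
---
Prefers concise answers."""

frontmatter_raw, content = "", text
if text.startswith("---"):
    parts = text.split("---", 2)  # ["", frontmatter, body]
    if len(parts) >= 3:
        frontmatter_raw = parts[1].strip()
        content = parts[2].strip()

print(content)  # → Prefers concise answers.

# Project extraction: -home-geodesix-Jetson → Jetson
slug = "-home-geodesix-Jetson"
project = slug.rsplit("-", 1)[-1] if "-" in slug else slug
print(project)  # → Jetson
```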

- [ ] **Step 2: Test it in dry-run mode**

Run: `python3 /home/geodesix/engram/scripts/memory_watcher.py --dry-run`
Expected: Lists Claude Code memory files it would ingest

- [ ] **Step 3: Run it for real**

Run: `python3 /home/geodesix/engram/scripts/memory_watcher.py`
Expected: Ingests memories, creates entity files in `~/engram/entities/`

- [ ] **Step 4: Verify entities were created**

Run: `ls -la ~/engram/entities/claude_memory_*.md | head -10`
Expected: Files like `claude_memory_Jetson_user_profile.md`, `claude_memory_Jetson_feedback_ssh_password_auth.md`

- [ ] **Step 5: Verify content looks right**

Run: `head -20 ~/engram/entities/claude_memory_Jetson_user_profile.md`
Expected: YAML frontmatter with source=claude_code, category=user_profile, followed by the memory content
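For illustration, an ingested entity might look like the following (all field values are hypothetical; note that `yaml.dump` sorts keys alphabetically, so this is the order to expect):

```markdown
---
category: user_profile
created_at: '2025-01-01T07:00:00'
ingested_at: '2025-01-01T07:00:00'
name: user_profile
source: claude_code
source_path: /home/geodesix/.claude/projects/-home-geodesix-Jetson/memory/user_profile.md
summary: Core facts about the user
tags:
- claude_code_memory
- project:Jetson
- user
topics:
- Jetson
- user
---

Prefers concise answers; works primarily on the Jetson project.
```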

- [ ] **Step 6: Run again — should be no-op**

Run: `python3 /home/geodesix/engram/scripts/memory_watcher.py`
Expected: "No new memories to ingest" (already processed)
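The no-op behavior rests on the mtime comparison in `scrape_claude_memories`; in isolation (paths and timestamps hypothetical):

```python
# Sketch of the mtime-based skip: a file is reprocessed only when its
# current mtime is newer than the one recorded in the state file.
processed = {"/fake/path.md": 100.0}   # state from a previous run
current_mtime = 100.0                  # file unchanged since then
skip = "/fake/path.md" in processed and processed["/fake/path.md"] >= current_mtime
print(skip)  # → True

current_mtime = 200.0                  # file was edited after the last run
skip = processed["/fake/path.md"] >= current_mtime
print(skip)  # → False
```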

- [ ] **Step 7: Commit**

```bash
cd ~/engram && git add scripts/memory_watcher.py
git commit -m "feat: add memory watcher — scrapes Claude Code memories into Engram"
```

---

## Task 2: Profile Synthesizer Script

**Files:**
- Create: `/home/geodesix/engram/scripts/profile_synthesizer.py`

- [ ] **Step 1: Write the profile synthesizer**

```python
#!/usr/bin/env python3
"""
Profile Synthesizer — Build a unified user profile from all agent memories.

Queries Engram for all user-related data (profiles, preferences, feedback)
from every agent source and synthesizes a single USER_PROFILE.md. This
profile is distributed to each agent's config directory so every agent
starts with the same understanding of the user.

Output locations:
    ~/engram/entities/user_profile_synthesized.md  (Engram entity)
    ~/.claude/USER.md                               (Claude Code)
    ~/.gemini/USER.md                               (Gemini CLI)

Runs daily at 7am (after the 6am briefing).

Usage:
    python3 profile_synthesizer.py           # Run once
    python3 profile_synthesizer.py --dry-run # Preview without writing
"""

import glob
import logging
from datetime import datetime
from pathlib import Path

import yaml

logging.basicConfig(level=logging.INFO, format="[profile-synth] %(message)s")
logger = logging.getLogger(__name__)

ENGRAM_ENTITIES = Path(__file__).parent.parent / "entities"

# Where to distribute the profile
DISTRIBUTE_TO = [
    Path.home() / ".claude" / "USER.md",
    Path.home() / ".gemini" / "USER.md",
]

# Source patterns for user-related entities
USER_ENTITY_PATTERNS = [
    "claude_memory_*_user_profile*.md",
    "claude_memory_*_feedback_*.md",
    "claude_memory_*_project_*.md",
    "claude_memory_*_reference_*.md",
    "*user_profile*.md",
    "*user_pref*.md",
]


def collect_user_data() -> dict:
    """Collect all user-related data from Engram entities."""
    data = {
        "identity": [],
        "preferences": [],
        "feedback": [],
        "projects": [],
        "references": [],
    }

    seen_names = set()

    for pattern in USER_ENTITY_PATTERNS:
        for filepath in glob.glob(str(ENGRAM_ENTITIES / pattern)):
            path = Path(filepath)
            # Skip our own output so the profile never re-ingests itself
            if path.name == "user_profile_synthesized.md":
                continue
            text = path.read_text(errors="replace").strip()
            if not text:
                continue

            # Parse frontmatter
            frontmatter = {}
            content = text
            if text.startswith("---"):
                parts = text.split("---", 2)
                if len(parts) >= 3:
                    try:
                        frontmatter = yaml.safe_load(parts[1]) or {}
                    except yaml.YAMLError:
                        pass
                    content = parts[2].strip()

            name = frontmatter.get("name", path.stem)
            if name in seen_names:
                continue
            seen_names.add(name)

            category = frontmatter.get("category", "")
            source = frontmatter.get("source", "unknown")
            mem_type = frontmatter.get("type", category)

            entry = {
                "name": name,
                "content": content,
                "source": source,
                "category": category,
                "file": path.name,
            }

            if mem_type in ("user", "user_profile"):
                data["identity"].append(entry)
            elif mem_type in ("feedback", "user_pref"):
                data["feedback"].append(entry)
            elif mem_type == "project":
                data["projects"].append(entry)
            elif mem_type == "reference":
                data["references"].append(entry)
            else:
                data["preferences"].append(entry)

    return data


def synthesize_profile(data: dict) -> str:
    """Generate the unified user profile document."""
    now = datetime.now().isoformat()
    sections = []

    sections.append(f"""# User Profile — Synthesized
> Auto-generated by profile_synthesizer.py on {now}
> Sources: {_count_sources(data)} agent memories from Engram
> Refresh: Daily at 7am

This profile is shared across all agents (Claude Code, Gemini CLI, Rook).
It represents what all agents collectively know about the user.""")

    # Identity
    if data["identity"]:
        sections.append("\n## Identity\n")
        for entry in data["identity"]:
            sections.append(f"*From {entry['source']}:*\n{entry['content']}\n")

    # Feedback / Preferences
    if data["feedback"]:
        sections.append("\n## Preferences & Feedback\n")
        sections.append("These are rules the user has given about how to work with them.\n")
        for entry in data["feedback"]:
            sections.append(f"### {entry['name']}\n*Source: {entry['source']}*\n\n{entry['content']}\n")

    # Active Projects
    if data["projects"]:
        sections.append("\n## Active Projects\n")
        for entry in data["projects"]:
            sections.append(f"### {entry['name']}\n*Source: {entry['source']}*\n\n{entry['content']}\n")

    # References
    if data["references"]:
        sections.append("\n## References\n")
        for entry in data["references"]:
            sections.append(f"- **{entry['name']}**: {entry['content'][:200]}\n")

    return "\n".join(sections)


def _count_sources(data: dict) -> int:
    """Count total entries across all categories."""
    return sum(len(v) for v in data.values())


def distribute_profile(profile_text: str, dry_run: bool = False) -> None:
    """Write profile to Engram and to each agent's config directory."""
    # Engram entity
    engram_path = ENGRAM_ENTITIES / "user_profile_synthesized.md"
    if dry_run:
        logger.info("DRY RUN: Would write %s (%d bytes)", engram_path, len(profile_text))
    else:
        engram_path.write_text(profile_text)
        logger.info("Wrote Engram entity: %s", engram_path)

    # Agent config directories
    for dest in DISTRIBUTE_TO:
        if dry_run:
            logger.info("DRY RUN: Would write %s", dest)
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_text(profile_text)
            logger.info("Distributed to: %s", dest)


def main():
    import argparse
    parser = argparse.ArgumentParser(description="Synthesize unified user profile")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()

    logger.info("Profile synthesizer starting")

    data = collect_user_data()
    total = _count_sources(data)
    logger.info("Collected %d user-related entries: %d identity, %d feedback, %d projects, %d references",
                total, len(data["identity"]), len(data["feedback"]),
                len(data["projects"]), len(data["references"]))

    if total == 0:
        logger.warning("No user data found in Engram — run memory_watcher.py first")
        return

    profile = synthesize_profile(data)

    if args.dry_run:
        print("\n" + "=" * 60)
        print(profile)
        print("=" * 60 + "\n")

    distribute_profile(profile, dry_run=args.dry_run)
    logger.info("Done — profile synthesized from %d entries", total)


if __name__ == "__main__":
    main()
```
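The type-to-section routing inside `collect_user_data` can be sketched as a standalone function (the sample types fed to it at the end are hypothetical):

```python
# Sketch of collect_user_data's memory-type → profile-section routing.
def route(mem_type: str) -> str:
    if mem_type in ("user", "user_profile"):
        return "identity"
    if mem_type in ("feedback", "user_pref"):
        return "feedback"
    if mem_type == "project":
        return "projects"
    if mem_type == "reference":
        return "references"
    return "preferences"  # anything unrecognized lands here

for t in ("user", "user_pref", "project", "reference", "note"):
    print(t, "→", route(t))
```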

- [ ] **Step 2: Run in dry-run mode**

Run: `python3 /home/geodesix/engram/scripts/profile_synthesizer.py --dry-run`
Expected: Shows the synthesized profile content and where it would be written

- [ ] **Step 3: Run for real**

Run: `python3 /home/geodesix/engram/scripts/profile_synthesizer.py`
Expected: Writes to Engram entities + `~/.claude/USER.md` + `~/.gemini/USER.md`

- [ ] **Step 4: Verify outputs**

Run: `head -30 ~/.claude/USER.md && echo "---" && head -30 ~/.gemini/USER.md`
Expected: Same profile content in both locations

- [ ] **Step 5: Commit**

```bash
cd ~/engram && git add scripts/profile_synthesizer.py
git commit -m "feat: add profile synthesizer — unified user profile across all agents"
```

---

## Task 3: systemd Timers

**Files:**
- Create: `/home/geodesix/.config/systemd/user/engram-memory-watcher.service`
- Create: `/home/geodesix/.config/systemd/user/engram-memory-watcher.timer`
- Create: `/home/geodesix/.config/systemd/user/engram-profile-synth.service`
- Create: `/home/geodesix/.config/systemd/user/engram-profile-synth.timer`

- [ ] **Step 1: Create memory watcher service**

Write to `~/.config/systemd/user/engram-memory-watcher.service`:

```ini
[Unit]
Description=Engram Memory Watcher — scrape agent memories
After=engram.service

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /home/geodesix/engram/scripts/memory_watcher.py
WorkingDirectory=/home/geodesix/engram
Environment=PYTHONPATH=/home/geodesix/engram/src

[Install]
WantedBy=default.target
```

- [ ] **Step 2: Create memory watcher timer**

Write to `~/.config/systemd/user/engram-memory-watcher.timer`:

```ini
[Unit]
Description=Run memory watcher every 2 hours

[Timer]
OnBootSec=10min
OnUnitActiveSec=2h

[Install]
WantedBy=timers.target
```

- [ ] **Step 3: Create profile synth service**

Write to `~/.config/systemd/user/engram-profile-synth.service`:

```ini
[Unit]
Description=Engram Profile Synthesizer — unified user profile
After=engram.service engram-memory-watcher.service

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /home/geodesix/engram/scripts/profile_synthesizer.py
WorkingDirectory=/home/geodesix/engram
Environment=PYTHONPATH=/home/geodesix/engram/src

[Install]
WantedBy=default.target
```

- [ ] **Step 4: Create profile synth timer**

Write to `~/.config/systemd/user/engram-profile-synth.timer`:

```ini
[Unit]
Description=Run profile synthesizer daily at 7am

[Timer]
OnCalendar=*-*-* 07:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

- [ ] **Step 5: Enable and start timers**

```bash
systemctl --user daemon-reload
systemctl --user enable --now engram-memory-watcher.timer
systemctl --user enable --now engram-profile-synth.timer
```

- [ ] **Step 6: Verify timers are active**

Run: `systemctl --user list-timers | grep engram`
Expected: Both timers shown with next run time

- [ ] **Step 7: Commit systemd files**

```bash
cd ~/engram && mkdir -p systemd
cp ~/.config/systemd/user/engram-memory-watcher.{service,timer} systemd/
cp ~/.config/systemd/user/engram-profile-synth.{service,timer} systemd/
git add systemd/
git commit -m "feat: add systemd timers for memory watcher and profile synthesizer"
```

---

## Post-Implementation Checklist

- [ ] `memory_watcher.py` ingests Claude Code memories into Engram entities
- [ ] `memory_watcher.py` skips already-processed files on re-run
- [ ] `memory_watcher.py` skips MEMORY.md index files
- [ ] `profile_synthesizer.py` collects all user data from Engram
- [ ] `profile_synthesizer.py` generates readable, structured profile
- [ ] Profile is written to `~/.claude/USER.md` and `~/.gemini/USER.md`
- [ ] Both timers are active and scheduled
- [ ] Next session: Claude Code and Gemini CLI both see the synthesized profile
