# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**Jetson Hermes NeMo Integration** — Clone NVIDIA NeMo Agent Toolkit, replace OpenClaw with Rook from NousResearch, implement Jetson Nano and Orin hardware workarounds, and generate a deployable install script.

**Target directory:** `go-mobile/jetson-agent/`

---

## Target Hardware

- **Device:** NVIDIA Jetson Orin Nano Super (8GB unified memory, 67 TOPS)
- **OS:** Ubuntu (JetPack 6.2.2+)
- **Swap:** 32GB swap file (mandatory — configured at install or ISO time)
- **Desktop:** GNOME must not auto-load on boot; boot to text login (multi-user.target). User can type `desktop` at prompt to start GNOME on demand. Monitor + keyboard must always reach a TTY login for debugging.
- **Inference:** Remote only (NVIDIA NIM API or other remote endpoints) — no local model loading, preserving memory for multi-agent workloads (6-8 concurrent agents within a single sandbox)
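
These constraints translate roughly into the following first-boot provisioning fragment. This is a sketch, not the final `first-boot.sh`: the `desktop` helper name comes from the requirement above, and isolating `graphical.target` is one plausible way to start GNOME on demand.

```shell
#!/usr/bin/env sh
# Hypothetical first-boot fragment for the hardware requirements above.
set -eu

configure_headless_boot() {
  # Boot to a TTY login; GNOME loads only on demand.
  sudo systemctl set-default multi-user.target
  # `desktop` command the user can type at the prompt to start GNOME.
  printf '#!/bin/sh\nexec sudo systemctl isolate graphical.target\n' \
    | sudo tee /usr/local/bin/desktop >/dev/null
  sudo chmod +x /usr/local/bin/desktop
}

configure_swap() {
  # 32GB swap file, mandatory on the 8GB Orin Nano Super.
  sudo fallocate -l 32G /swapfile
  sudo chmod 600 /swapfile
  sudo mkswap /swapfile
  sudo swapon /swapfile
  echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab >/dev/null
}
```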

---

## Architecture

### Three-Layer Stack

```
┌──────────────────────────────────────────────────────────┐
│  Layer 3: Rook                                           │
│  - Runs inside Citadel sandbox as sole tenant            │
│  - OpenAI-compatible API on :8642 (container-internal)   │
│  - Remote inference only (NIM, OpenRouter, free models)  │
│  - Own tool routing, memory, skills, messaging gateways  │
│  - State at /sandbox/.hermes/ (bind-mounted to host)     │
│  - 6-8 concurrent agent processes within one sandbox     │
├──────────────────────────────────────────────────────────┤
│  Layer 2: Citadel Sandbox                                │
│  - Landlock filesystem restrictions                      │
│  - seccomp syscall filtering                             │
│  - Network namespace isolation                           │
│  - Shell-based lifecycle manager (replaces TypeScript)   │
│  - OpenClaw fully removed                                │
├──────────────────────────────────────────────────────────┤
│  Layer 1: Host OS                                        │
│  - Ubuntu JetPack 6.2.2+, headless (multi-user.target)   │
│  - TTY login always available; `desktop` loads GNOME     │
│  - 32GB swap file on SSD                                 │
│  - Tailscale (mandatory)                                 │
│  - Fail2ban + UFW (default deny inbound)                 │
│  - Rook Nerve Dashboard on :3080 (auto-start)            │
│  - rook-doctor watchdog service                          │
│  - Encrypted backup cron → Google Drive                  │
│  - Avahi/mDNS (hermes.local)                             │
│  - All crypto routed through Orin SE                     │
└──────────────────────────────────────────────────────────┘
```

### Sandbox Management (TypeScript Eliminated)

**Decision:** Remove TypeScript plugin entirely. Replace with shell-based lifecycle manager (Option B).

**Research Finding:** The TypeScript plugin was specific to OpenClaw integration. NeMo Agent Toolkit core is Python-based with no mandatory TypeScript dependency. The sandbox architecture is container-based (Docker + Landlock/seccomp), not TypeScript-dependent.

**Why Eliminate:**
- OpenClaw is being completely replaced by Rook
- Rook handles tool routing, messaging, and inference internally
- No upstream NeMo dependency on TypeScript for sandbox lifecycle

**Implementation (Shell-Based):**
- `src/sandbox-manager/docker-compose.yml` — Citadel+Rook container definition
- `src/sandbox-manager/rook-sandbox.service` — systemd unit for auto-start
- `src/sandbox-manager/manager.sh` — Shell script for start/stop/health
- Hono backend calls docker CLI directly for lifecycle operations

**Benefits:**
- Zero Node.js runtime in sandbox container
- Smaller attack surface
- Simpler debugging (shell scripts vs compiled TS)
- No build step for sandbox layer
- No upstream dependency on OpenClaw/TypeScript
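
As a concrete sketch, the lifecycle script might look like the hypothetical skeleton below. The compose path, container name, and the use of `/v1/models` as a liveness probe are assumptions; the real `src/sandbox-manager/manager.sh` will differ.

```shell
#!/usr/bin/env sh
# Hypothetical skeleton of src/sandbox-manager/manager.sh.
set -eu

COMPOSE_FILE="${COMPOSE_FILE:-/opt/hermes/sandbox-manager/docker-compose.yml}"
CONTAINER="${CONTAINER:-citadel-rook}"

manager() {
  case "${1:-}" in
    start)  docker compose -f "$COMPOSE_FILE" up -d ;;
    stop)   docker compose -f "$COMPOSE_FILE" down ;;
    health)
      # Rook exposes an OpenAI-compatible API inside the container on :8642;
      # /v1/models is assumed here as a cheap liveness check.
      docker exec "$CONTAINER" curl -fsS http://127.0.0.1:8642/v1/models >/dev/null \
        && echo healthy ;;
    *)      echo "usage: manager.sh {start|stop|health}" ;;
  esac
}

manager "${1:-}"
```

The Hono backend can invoke the same three verbs over the docker CLI, so no Node.js code runs inside the sandbox container itself.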

---

## Dependencies & Sources

### Upstream Sources

| Component | Source | Tracking Method |
|-----------|--------|-----------------|
| NeMo Agent Toolkit | `https://github.com/NVIDIA/NeMo-Agent-Toolkit` | Git submodule, pinned to release tag |
| Rook | Local: `src/rook-agent/` → will push to `github.com/YOURUSERNAME/rook-agent` | Local repo during dev, git submodule after OSS |
| Rook Nerve Dashboard | Local: `src/nerve-dashboard/` or separate repo | Separate repo recommended: `github.com/YOURUSERNAME/rook-nerve` |

### Jetson-Specific Notes

- **Node.js 22+ on ARM64:** Use unofficial builds from `https://unofficial-builds.nodejs.org/` or build from source with JetsonHacks scripts
- **JetsonHacks Integration:** Apply Nano Super-specific optimizations from `docs/hardware/jetson-setup.md`
- **Orin SE Crypto:** All cryptographic operations route through hardware Security Engine (AES, SHA, RSA, ECC, TRNG)

---

## Security

### Hardware Security Engine (Orin SE)

All crypto operations route through Orin SE:
- SSH host keys and session encryption → SE via kernel crypto API
- TLS (OpenSSL) → SE engine for AES, RSA, ECC operations
- LUKS (if FDE enabled) → SE-accelerated AES-XTS
- Backup encryption → SE-accelerated AES-256-GCM
- Tailscale WireGuard → kernel crypto API → SE for ChaCha20-Poly1305

Configuration: OpenSSL engine config (`/etc/ssl/openssl.cnf`) set to use Orin SE as default.
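
As a sketch, routing OpenSSL through the kernel crypto API (which the Tegra SE drivers back) could use an engine stanza like the one below. The `afalg` engine id is an assumption, and OpenSSL 3.x deployments may need a provider rather than an engine; the actual JetPack configuration should be taken from `docs/hardware/orin-se.md`.

```ini
# /etc/ssl/openssl.cnf (hypothetical fragment)
openssl_conf = openssl_init

[openssl_init]
engines = engine_section

[engine_section]
afalg = afalg_section

[afalg_section]
engine_id = afalg
default_algorithms = ALL
init = 1
```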

### Secret Management System (ARCHITECTURE FINALIZED)

**Full spec:** `docs/architecture/secret-management.md`

**Architecture:** BIP39 Mnemonic → Bitwarden Master Password → LUKS-encrypted Vault Container

**Summary:**
- BIP39 mnemonic (12/24 words) is the root of trust, generated on first boot
- Google Drive folder ID used as salt for domain separation
- HKDF-SHA256 derives three independent keys: Bitwarden master password, LUKS passphrase, backup AES key
- Vaultwarden (self-hosted Bitwarden) runs inside LUKS container
- Agent retrieves secrets via scoped SDK token — never sees master password or full vault
- Daily boot: TPM-backed auto-unlock (Option A) or user PIN (Option B)
- Recovery: mnemonic + Google account = full restore on new hardware
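
The derivation chain can be sketched with the `openssl` CLI as below. The mnemonic, folder ID, and `info` labels are placeholders; the real encoding and labels are defined in `docs/architecture/secret-management.md`.

```shell
#!/usr/bin/env sh
# Illustrative HKDF-SHA256 derivation: BIP39 mnemonic (IKM) + Drive folder
# ID (salt) -> three independent keys. Values below are placeholders.
set -eu

MNEMONIC="legal winner thank year wave sausage worth useful legal winner thank yellow"
DRIVE_FOLDER_ID="1aBcDeFgHiJkL_exampleFolderId"   # salt for domain separation

hex() { od -A n -t x1 | tr -d ' \n'; }
hmac() {  # $1 = hex key; stdin = message; prints hex digest
  openssl dgst -sha256 -mac HMAC -macopt "hexkey:$1" | awk '{print $NF}'
}

# HKDF-Extract: PRK = HMAC-SHA256(salt, IKM)
PRK=$(printf '%s' "$MNEMONIC" | hmac "$(printf '%s' "$DRIVE_FOLDER_ID" | hex)")

# HKDF-Expand, single 32-byte block per key: OKM = HMAC-SHA256(PRK, info || 0x01)
derive() { printf '%s\001' "$1" | hmac "$PRK"; }

BITWARDEN_MASTER=$(derive "hermes/bitwarden-master-password")
LUKS_PASSPHRASE=$(derive "hermes/luks-passphrase")
BACKUP_AES_KEY=$(derive "hermes/backup-aes-key")
```

Because the three `info` labels differ, compromise of one derived key reveals nothing about the others, and the same mnemonic plus folder ID always reproduces all three.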

**Current production (interim):** Diceware passphrase + plaintext key files in `~/.hermes/secrets/`. Will migrate to BIP39/LUKS/Vaultwarden stack.

**Recovery Process:**
- User provides passphrase from their records during restore
- System verifies against Argon2id hash
- If valid, decrypts backup stream
- **Forgotten passphrase = unrecoverable backups (by design)**

**Rejected Alternatives (Documented for Reference):**
- GPG (too complex UX for non-technical users)
- Drive folder ID as key material (not secret, easily guessed)
- Bitwarden Secrets Manager (external dependency, rate limits, complexity)
- Hardcoded keys (security risk, no recovery capability)

**Future Enhancement (Post-MVP):**
Shamir's Secret Sharing (2-of-3 scheme)
- Part 1: User writes down (paper/physical)
- Part 2: Encrypted to user's personal Gmail (different from agent Gmail)
- Part 3: Hardware-bound to Orin SE
- Any 2 of 3 parts reconstruct passphrase
- Enables recovery even if user loses written copy

### Dashboard Authentication

**Primary:** Tailscale mandatory
- Dashboard binds to `localhost:3080` and `tailscale0:3080` only
- No public internet access to dashboard
- Tailscale identity provides authentication

**Optional:** Local password
- Can enable password for additional layer (defense in depth)
- Password only checked after Tailscale auth
- Not required for operation

**Rationale:** Agent communication requires Tailscale anyway (for remote access), so dashboard follows same security boundary.

### Other Security Layers

- **Tailscale:** Mandatory for remote access; optional VPS exit node for IP masking
- **Fail2ban + UFW:** Default deny inbound, brute force protection
- **Citadel sandbox:** Landlock, seccomp, network namespace (unchanged from upstream)
- **Encrypted backups only:** Nothing leaves the device unencrypted
- **FDE:** Optional (default off), user-chosen at flash time

---

## Data Architecture

### Training Databases (Per-Model)

Location: `/opt/hermes-data/training/<model>/`

```
/opt/hermes-data/training/
├── gemma-3b/
│   ├── user-sessions.db    # Conversations
│   └── tool-calls.db       # Agent tool invocations
├── llama-3-8b/
│   ├── user-sessions.db
│   └── tool-calls.db
└── [model-identifier]/
    ├── user-sessions.db
    └── tool-calls.db
```

**Export:** CLI script to dump any model's databases to JSONL or Parquet for LoRA training pipelines.

### Checkpoint & Resumption System

**Purpose:** Track long-running projects (LoRA training, data curation, fine-tuning)

**Location:** `/opt/hermes-data/projects/`

```
/opt/hermes-data/projects/
├── active/
│   └── [project-id]/
│       ├── checklist.json       # Task list with status
│       ├── state.json           # Current progress, variables
│       ├── checkpoints/         # Incremental snapshots
│       │   ├── checkpoint-001/
│       │   ├── checkpoint-002/
│       │   └── latest -> checkpoint-002/
│       └── logs/
│           └── execution.log    # Timestamped operations
├── archived/                    # Completed projects (compressed)
└── templates/                   # Reusable project templates
```

**Lifecycle:**
1. Project created → checklist initialized from template
2. Active → tasks executed, checkpoints auto-created every 15 min or on task completion
3. Completed → moved to `archived/`, compressed, auto-scrubbed after 30 days
4. Resumed → load latest checkpoint, continue from last completed task

**Checkpoint Content:**
- Full project state
- Database snapshots (for training projects)
- Relevant intermediate files
- LZ4-compressed, encrypted with backup key
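
The checkpoint mechanics above (sequential naming plus the `latest` symlink) can be sketched as follows; compression and encryption are left as a comment since they depend on the backup key, and the function name is hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical checkpoint creation for an active project directory.
set -eu

PROJECT_DIR="${PROJECT_DIR:-$(mktemp -d)}"  # e.g. /opt/hermes-data/projects/active/<project-id>
mkdir -p "$PROJECT_DIR/checkpoints"

checkpoint() {
  # Next sequential id, zero-padded: checkpoint-001, checkpoint-002, ...
  n=$(find "$PROJECT_DIR/checkpoints" -maxdepth 1 -type d -name 'checkpoint-*' | wc -l)
  id=$(printf 'checkpoint-%03d' $((n + 1)))
  dest="$PROJECT_DIR/checkpoints/$id"
  mkdir "$dest"
  cp "$PROJECT_DIR/state.json" "$PROJECT_DIR/checklist.json" "$dest/" 2>/dev/null || true
  # Real system: also tar the snapshot, pipe through lz4, encrypt with the backup key.
  ln -sfn "$id" "$PROJECT_DIR/checkpoints/latest"
  echo "$id"
}
```

Resumption then only needs to follow `checkpoints/latest` and reload `state.json` from it.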

### Project Templates (MVP)

**Primary Template: Multi-Agent Orchestration**

Workflow pattern:
```
1. DECOMPOSE
   └── Break project into atomic, verifiable tasks
   └── Checkpoint: decomposition.json

2. EVALUATE (using high-quality model: Opus 4.6)
   └── Review task completeness and correctness
   └── Generate evaluation function for each task
   └── Checkpoint: evaluation_criteria.json

3. DELEGATE (to fast/cheap models)
   └── Assign subtasks to appropriate models
   └── Parallel execution where possible
   └── Checkpoint: task_assignments.json

4. VERIFY
   └── Run evaluation functions on results
   └── Retry failed tasks with different model
   └── Checkpoint: verification_results.json

5. ASSEMBLE
   └── Combine verified subtask results
   └── Final quality check
   └── Checkpoint: final_output.json
```

**Alternative Mode: Self-Delegating Agent**
- Send decomposed project to self-delegating model (Kimi 2.5)
- Model handles its own task management
- Rook monitors progress via checkpoints
- Intervene only on failure or stall

**Template Files:**
```
templates/
├── multi-agent-orchestration/
│   ├── checklist.json           # Standard task phases
│   ├── models.yaml              # Recommended model assignments
│   ├── evaluation_prompts/      # Evaluation templates
│   └── example_projects/        # Sample workflows
└── blank/
    └── checklist.json           # Empty checklist for custom projects
```

---

## Project Structure

```
go-mobile/jetson-agent/
├── .github/
│   ├── workflows/
│   │   ├── build-image.yml       # Build SSD image in CI
│   │   ├── test-sandbox.yml      # Test Citadel integration
│   │   └── release.yml           # Release new versions
│   └── ISSUE_TEMPLATE/
├── docs/
│   ├── superpowers/
│   │   ├── specs/                # Design specifications
│   │   └── plans/                # Implementation plans
│   ├── hardware/
│   │   ├── jetson-setup.md       # JetsonHacks tips for Nano Super
│   │   └── orin-se.md            # Security Engine configuration
│   └── user/
│       ├── quick-start.md
│       └── troubleshooting.md
├── src/
│   ├── rook-agent/             # LOCAL HERMES REPO (will push to GitHub)
│   │   ├── Cargo.toml            # If Rust
│   │   ├── pyproject.toml        # If Python
│   │   └── src/
│   ├── nerve-dashboard/          # Dashboard code
│   │   ├── package.json
│   │   ├── server/
│   │   └── src/
│   ├── citadel-patches/        # Patches for NVIDIA NeMo
│   │   ├── 0001-remove-openclaw.patch
│   │   └── 0002-add-hermes-lifecycle.patch
│   ├── sandbox-manager/          # Shell-based lifecycle (REPLACES TYPESCRIPT)
│   │   ├── docker-compose.yml
│   │   ├── rook-sandbox.service
│   │   └── manager.sh
│   ├── doctor/                   # rook-doctor watchdog
│   │   ├── rook-doctor.service
│   │   └── doctor.sh
│   └── backup/                   # Backup scripts
│       ├── encrypt-and-upload.sh
│       └── restore-from-backup.sh
├── scripts/
│   ├── install.sh                # Main installation script
│   ├── first-boot.sh             # First-boot provisioning
│   ├── build-image.sh            # Build flashable SSD image
│   └── verify-install.sh         # Post-install verification
├── config/
│   ├── ufw/
│   ├── fail2ban/
│   ├── systemd/
│   └── logrotate/
├── container/
│   ├── Dockerfile.citadel-rook
│   └── entrypoint.sh
└── Makefile                      # Build orchestration
```

---

## Git Workflow

### Main Branches

- `main` — Production-ready, tagged releases
- `develop` — Integration branch
- `feature/*` — Feature branches

### Rook (local repo)

**Source location:** `/home/geo/.hermes/rook-agent/` (existing git repo, Python project)

**Development workflow:**
1. **Current:** Use existing repo at `/home/geo/.hermes/rook-agent/`
2. **Integration:** Copy/symlink into `go-mobile/jetson-agent/src/rook-agent/` for unified development
3. **Modifications:** Make changes needed for NeMo/dashboard compatibility
4. **Pre-production:** Push to `github.com/YOURUSERNAME/rook-agent`
5. **Post-OSS:** Package as pip installable or keep as submodule

### Nerve Dashboard

**Recommendation:** Separate repo (`github.com/YOURUSERNAME/rook-nerve`)
- Independent versioning
- Can be developed/tested without full Jetson environment
- Reusable for other Rook deployments

### NeMo Upstream

- Track: `https://github.com/NVIDIA/NeMo-Agent-Toolkit`
- Pin to specific release tag (e.g., `v0.5.0`)
- Apply patches from `src/citadel-patches/` during build
- Check for updates monthly, test before applying

### Release Process

1. Tag `main` with semantic version (e.g., `v1.2.3`)
2. GitHub Actions builds SSD image artifact
3. Image tested on actual Jetson hardware (manual QA)
4. Release published with:
   - Flashable `.img` file
   - Changelog
   - Checksums (SHA-256)
   - Upgrade instructions

---

## Requirements

- Maintain detailed execution log of all operations
- Document each step completed with timestamps
- Capture any errors or dependency issues encountered
- Enable resumption from last successful checkpoint if process interrupts
- Iterate on failed approaches without repetition
- Produce final install script optimized for Jetson hardware constraints

## Logging

**Dual-format logging:**

| Format | Location | Purpose |
|--------|----------|---------|
| Structured JSON | `/var/log/hermes/*.jsonl` | Machine parseable, dashboard analytics, log aggregation |
| Plain text | `/var/log/hermes/*.log` | Human readable, debugging, `tail -f` |

**Log rotation:** Applies to both formats
- Max total: 1GB
- Retention: 7 days
- Compression: gzip for archived logs
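
A plausible logrotate policy implementing this is sketched below. Note that `maxsize` bounds individual files, so a separate prune job may still be needed to enforce the 1GB total cap; the file path and size are assumptions.

```
# /etc/logrotate.d/hermes (hypothetical)
/var/log/hermes/*.log /var/log/hermes/*.jsonl {
    daily
    rotate 7
    maxsize 128M
    compress
    delaycompress
    missingok
    notifempty
}
```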

**JSON schema includes:**
- `timestamp` (ISO 8601)
- `level` (DEBUG, INFO, WARN, ERROR)
- `component` (agent, dashboard, sandbox, etc.)
- `event` (action name)
- `context` (session_id, project_id, etc.)
- `data` (arbitrary structured data)
- `message` (human-readable string)
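
For concreteness, a hypothetical shell emitter producing one record in this schema (real components will log through their own libraries, and would write to `/var/log/hermes/*.jsonl` rather than stdout):

```shell
#!/usr/bin/env sh
# Hypothetical JSONL emitter matching the schema above; assumes the message
# contains no characters needing JSON escaping.
log_json() {  # usage: log_json LEVEL COMPONENT EVENT MESSAGE
  printf '{"timestamp":"%s","level":"%s","component":"%s","event":"%s","context":{"session_id":"%s"},"data":{},"message":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3" "${SESSION_ID:-unknown}" "$4"
}
```

Keeping one JSON object per line is what makes the `.jsonl` files trivially consumable by the dashboard and by log aggregators.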

---

## Specs & Plans

- **Design spec:** `docs/superpowers/specs/2026-04-02-jetson-hermes-nemo-design.md`
- **Implementation plans:** `docs/superpowers/plans/`

---

## Rook Nerve Dashboard

- **Goal:** Streaming chat, voice I/O, telemetry, memory/skills/cron management, multi-agent fleet view, Kanban task board, and file browser
- **Stack:** React 19, Tailwind CSS 4, shadcn/ui, Vite 7, Hono 4, TypeScript, Node.js 22+
- **Architecture:** Hono backend proxies to Rook at `localhost:8642`, reads state from `~/.hermes/`, exposes WebSocket + SSE + REST to a React frontend
- **Implementation location:** `/home/geo/rook-nerve/` (or `src/nerve-dashboard/` if kept in main repo)
- **Frontend dev port:** 3080 (Vite)
- **Backend port:** 3081 (Hono)
- **Rook API:** `http://127.0.0.1:8642`

---

## Multi-Agent Clarification

**Architecture:** Single sandbox, multiple agent processes within

- One Citadel container runs on the device
- Inside that container, Rook spawns 6-8 concurrent agent processes
- Each agent gets 250MB memory limit via cgroup (primary agent: 500MB)
- All agents share the same sandbox security boundary
- Training data segregated per model, not per agent

**Rationale:**
- Lower overhead than 6-8 separate containers
- Shared memory space for inter-agent communication
- Simpler resource management
- Single point of control for sandbox security
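
One plausible mechanism for the per-agent limits, assuming cgroup v2 and a delegated `hermes-agents` subtree inside the container; the real supervisor may use a different interface, and the path and function names are hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical per-agent cgroup setup. Limits follow this section:
# 250MB per worker agent, 500MB for the primary agent.
set -eu

agent_limit_bytes() {  # $1 = role (primary | worker)
  case "$1" in
    primary) echo $((500 * 1024 * 1024)) ;;
    *)       echo $((250 * 1024 * 1024)) ;;
  esac
}

# Requires root and a writable cgroup v2 hierarchy (not executed in this sketch).
setup_agent_cgroup() {  # $1 = agent name, $2 = role, $3 = pid
  cg="/sys/fs/cgroup/hermes-agents/$1"
  mkdir -p "$cg"
  agent_limit_bytes "$2" > "$cg/memory.max"
  echo "$3" > "$cg/cgroup.procs"
}
```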

---

## Remote Development & Debugging

**Primary (Remote):** Tailscale SSH
- Access from anywhere via Tailscale network
- Key-based authentication
- `tailscale ssh hermes@<device-name>`

**Secondary (Local):** USB-C Ethernet (Jetson Orin Nano)
- USB-C port provides RNDIS Ethernet interface
- Direct connection gives IP address (usually 192.168.55.1 or auto-assigned)
- SSH available over this link without network setup
- Useful for: initial setup, recovery, debugging when Tailscale down

**Emergency:** TTY console
- Physical monitor + keyboard always works
- Boot to multi-user.target ensures TTY login available
- Fallback for complete network failure
