# Agent Work-Claim & Liveness System

**Date:** 2026-04-06
**Status:** Design (not yet implemented)
**Scope:** Standalone script alongside Engram, or future Engram module

---

## Problem

User runs 3-4 AI agents concurrently (Claude Code, Gemini CLI, Copilot, etc.) across the same repos. Without coordination:

1. Two agents edit the same file simultaneously, creating conflicts
2. An agent starts work on a module another agent is mid-refactor on
3. A crashed agent leaves orphaned worktrees or half-applied changes that block others

Git worktrees provide filesystem isolation but don't communicate **intent** — an agent has no way to know what the others are working on.

## Design

### Claim Registry

Shared directory: `~/.agent-claims/`

Each active claim is a JSON file named `{agent_id}.json`:

```json
{
  "agent_id": "claude-code-session-abc123",
  "agent_type": "claude-code",
  "scope": {
    "repo": "/home/geodesix/engram",
    "paths": ["src/engram/consolidator.py", "src/engram/mcp_tools.py"],
    "branch": "fix/consolidator-bugs",
    "worktree": "/tmp/engram-wt-abc123"
  },
  "task_description": "Fixing consolidator tag filtering and archive collision",
  "claimed_at": "2026-04-06T14:30:00Z",
  "heartbeat": "2026-04-06T14:31:30Z",
  "ttl_seconds": 180,
  "pid": 54321
}
```

### Heartbeat / Canary

- Claiming agent updates `heartbeat` timestamp every 60 seconds
- Implementation: background thread or cron-style touch
- For Claude Code: could be a hook that fires on tool use (natural heartbeat)
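A minimal sketch of the background-thread variant, assuming the claim file layout above (the session filename and 60 s interval are illustrative, not fixed API):

```python
import json
import threading
import time
from datetime import datetime, timezone
from pathlib import Path

CLAIM_FILE = Path.home() / ".agent-claims" / "claude-code-session-abc123.json"

def touch_heartbeat(claim: dict) -> dict:
    """Stamp the claim with the current UTC time."""
    claim["heartbeat"] = datetime.now(timezone.utc).isoformat()
    return claim

def heartbeat_loop(interval: float = 60.0) -> None:
    """Rewrite the claim file until it disappears (i.e. work is released)."""
    while CLAIM_FILE.exists():
        claim = touch_heartbeat(json.loads(CLAIM_FILE.read_text()))
        CLAIM_FILE.write_text(json.dumps(claim, indent=2))
        time.sleep(interval)

# A daemon thread dies with the agent process, so a crash silently stops
# the heartbeat; that staleness is exactly the signal other agents watch for.
threading.Thread(target=heartbeat_loop, daemon=True).start()
```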

### Liveness Detection (Two Layers)

1. **PID check** (fast, local only): `kill -0 $pid` — is the process still running?
2. **Heartbeat expiry** (works for remote agents too): `now - heartbeat > ttl_seconds` means stale

Both checks must fail before a claim is considered abandoned. If a claim records no PID (e.g. a remote agent), heartbeat expiry alone decides.
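The two layers combine into a single staleness test. A sketch assuming the claim schema above (`claim_is_stale` is an illustrative name; Python 3.11+ parses the `Z`-suffixed timestamps):

```python
import os
from datetime import datetime, timedelta, timezone

def claim_is_stale(claim: dict) -> bool:
    """Stale only if the heartbeat expired AND the PID (if any) is gone."""
    now = datetime.now(timezone.utc)
    heartbeat = datetime.fromisoformat(claim["heartbeat"])
    if now - heartbeat <= timedelta(seconds=claim["ttl_seconds"]):
        return False  # heartbeat is fresh; definitely alive
    pid = claim.get("pid")
    if pid is None:
        return True   # remote agent: heartbeat expiry alone decides
    try:
        os.kill(pid, 0)   # signal 0: existence check, delivers nothing
        return False      # process still running; maybe a wedged heartbeat
    except ProcessLookupError:
        return True
    except PermissionError:
        return False      # exists but belongs to another user
```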

### Claim Lifecycle

```
Agent starts work
  → Check existing claims for scope overlap
  → If overlap: warn and pick different scope, or wait
  → If clear: write claim file
  → Start heartbeat loop

Agent working
  → Heartbeat updates every 60s
  → Other agents see claim, avoid those paths

Agent finishes
  → Delete claim file
  → Clean up worktree if applicable

Agent crashes
  → Heartbeat stops updating
  → After ttl_seconds (default 180s), other agents treat claim as expired
  → Cleanup script can remove stale claim files and orphaned worktrees
```
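The start/finish steps above can be sketched as a claim/release pair. Writing through a temp file plus `os.replace` keeps readers from ever seeing a half-written claim; the function names are illustrative, not a fixed CLI:

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path

CLAIMS_DIR = Path.home() / ".agent-claims"

def write_claim(agent_id: str, scope: dict, task: str, ttl: int = 180) -> Path:
    """Register a claim for this process; heartbeat starts equal to claimed_at."""
    CLAIMS_DIR.mkdir(exist_ok=True)
    now = datetime.now(timezone.utc).isoformat()
    claim = {
        "agent_id": agent_id,
        "scope": scope,
        "task_description": task,
        "claimed_at": now,
        "heartbeat": now,
        "ttl_seconds": ttl,
        "pid": os.getpid(),
    }
    path = CLAIMS_DIR / f"{agent_id}.json"
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(claim, indent=2))
    os.replace(tmp, path)  # atomic rename: readers see old or new, never half
    return path

def release_claim(agent_id: str) -> None:
    """Drop the claim; worktree cleanup would go here if applicable."""
    (CLAIMS_DIR / f"{agent_id}.json").unlink(missing_ok=True)
```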

### Scope Overlap Detection

An agent checking before starting work:

```python
import json
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Wrap the `kill -0` probe described above (POSIX only)."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user


def check_conflicts(my_repo: str, my_paths: list[str]) -> list[dict]:
    """Return list of active claims that overlap with intended work."""
    claims_dir = Path.home() / ".agent-claims"
    conflicts = []
    now = datetime.now(timezone.utc)

    for claim_file in claims_dir.glob("*.json"):
        claim = json.loads(claim_file.read_text())

        # Liveness: heartbeat expiry first, PID check as backup
        heartbeat = datetime.fromisoformat(claim["heartbeat"])
        ttl = timedelta(seconds=claim["ttl_seconds"])
        if now - heartbeat > ttl:
            pid = claim.get("pid")
            # With no recorded PID (e.g. a remote agent), heartbeat
            # expiry alone marks the claim stale
            if pid is None or not pid_alive(pid):
                claim_file.unlink()  # Clean up stale claim
                continue

        # Only claims on the same repo can conflict
        if claim["scope"]["repo"] != my_repo:
            continue

        # Check path overlap
        claimed_paths = set(claim["scope"].get("paths", []))
        if claimed_paths.intersection(my_paths):
            conflicts.append(claim)

    return conflicts
```

### Stale Claim Cleanup

Standalone script or cron job:

```bash
# Run every 5 minutes via cron or systemd timer
engram-claim-cleanup
```

- Removes claim files where heartbeat expired AND PID is dead
- Logs removed claims for audit
- Optionally cleans up orphaned git worktrees listed in stale claims
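A sketch of what the cleanup pass could look like, assuming the claim schema above; the `pid_alive` helper, audit log format, and `remove_worktrees` flag are all illustrative:

```python
import json
import os
import subprocess
from datetime import datetime, timedelta, timezone
from pathlib import Path

CLAIMS_DIR = Path.home() / ".agent-claims"

def pid_alive(pid: int) -> bool:
    """Signal-0 probe: existence check without delivering a signal."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True

def cleanup_stale_claims(remove_worktrees: bool = False) -> list[str]:
    """Remove claims whose heartbeat expired AND whose PID is dead."""
    removed = []
    now = datetime.now(timezone.utc)
    for claim_file in CLAIMS_DIR.glob("*.json"):
        claim = json.loads(claim_file.read_text())
        expired = now - datetime.fromisoformat(claim["heartbeat"]) > timedelta(
            seconds=claim["ttl_seconds"]
        )
        pid = claim.get("pid")
        if not (expired and (pid is None or not pid_alive(pid))):
            continue
        worktree = claim.get("scope", {}).get("worktree")
        if remove_worktrees and worktree:
            # Best-effort: prune the orphaned worktree recorded in the claim
            subprocess.run(
                ["git", "-C", claim["scope"]["repo"],
                 "worktree", "remove", "--force", worktree],
                check=False,
            )
        print(f"removed stale claim: {claim['agent_id']}")  # audit trail
        claim_file.unlink()
        removed.append(claim["agent_id"])
    return removed
```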

## Write-Own, Read-All Memory Model

Separate but related: Engram memories should have ownership.

- Each memory tagged with `owner_agent` in frontmatter
- `save_memory` auto-sets owner from `ENGRAM_AGENT_ID` env var
- `update_memory` / `delete_memory` enforce owner match
- `search_memory` / all read operations: unrestricted
- Not a security boundary — a coordination guard against accidental overwrites

Agent identity set via environment:
```bash
export ENGRAM_AGENT_ID=claude-code    # in Claude Code
export ENGRAM_AGENT_ID=gemini-cli     # in Gemini CLI
export ENGRAM_AGENT_ID=rook           # in Rook agent
```
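Under these assumptions, the write-own guard could look like the sketch below; `check_owner` is a hypothetical helper, not Engram's actual API:

```python
import os

def check_owner(memory_frontmatter: dict) -> None:
    """Raise if the current agent tries to mutate another agent's memory."""
    me = os.environ.get("ENGRAM_AGENT_ID", "unknown")
    owner = memory_frontmatter.get("owner_agent")
    # Unowned memories stay writable by anyone; this is a coordination
    # guard against accidental overwrites, not a security boundary
    if owner is not None and owner != me:
        raise PermissionError(
            f"memory owned by {owner!r}; refusing write as {me!r}"
        )
```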

## Implementation Options

### Option A: Standalone Script (Recommended for MVP)
- `scripts/agent-claims.py` — CLI tool: `claim`, `release`, `check`, `cleanup`
- No Engram dependency — works independently
- Any agent can call it via shell
- Add to Engram later as a module if it proves useful

### Option B: Engram Module
- `src/engram/claims.py` + MCP tools: `claim_work`, `release_work`, `check_claims`
- Integrated with Engram's existing MCP server
- Agents that already use Engram MCP get it for free
- Tighter integration with memory ownership model

### Option C: Both
- Standalone script for non-Engram agents
- Engram MCP tools that call the same underlying logic
- Shared claims directory either way

## Open Questions

1. **Scope granularity**: File-level? Directory-level? Branch-level? Probably all three, with the agent picking whichever fits the task.
2. **Remote agents**: Agents on Jarvis working on the same repo as agents on the desktop — claims need to be visible across Tailscale. Share claims via an Engram MCP endpoint?
3. **Conflict resolution**: What happens when an agent NEEDS to touch a claimed file? Force-claim with warning? Queue?
4. **Integration with git worktrees**: Should claiming automatically create a worktree? Or just record which worktree is being used?
