# Engineering Rig — Proposed Improvements

Five improvements to close the gaps identified in `architecture-current.md`. Inspired by patterns from Gastown, a production multi-agent orchestration system.
## 1. Session Recovery (Prime)

### Problem
When an agent restarts, crashes, or Claude Code compacts its context, all working state is lost. The agent starts fresh with no idea what it was working on. Humans hit the same issue when starting a new Claude Code session.
### Current flow (broken)

```mermaid
sequenceDiagram
    participant A as Agent
    participant C as Conductor-E
    A->>A: Restart / context compaction
    A->>A: Lost: branch, issue, PR, review comments
    A->>C: GET /api/assignments/next
    Note over A: Picks up NEW work<br/>instead of resuming
```
### Proposed flow

```mermaid
sequenceDiagram
    participant A as Agent / Human
    participant S as Prime Script
    participant C as Conductor-E
    participant G as GitHub
    A->>A: Session starts (or compaction)
    A->>S: SessionStart hook fires
    S->>S: Read current git branch
    S->>C: GET /api/agents/{agentId}
    S->>G: Check open PRs for this branch
    S->>S: Build context summary
    S-->>A: Inject: "You are working on repo#42,<br/>branch feature/issue-42-login,<br/>PR #15 has 2 review comments"
    A->>A: Resume work where it left off
```
### Implementation

Add `hooks/conductor-e-prime.sh` to rig-tools (and bake it into the devcontainer):

```shell
# Reads: git branch, Conductor-E agent status, GitHub PRs
# Outputs: context summary injected via SessionStart hook
```
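A minimal sketch of what the script could do. The `CONDUCTOR_URL` env var, the fallback values, and the use of the `gh` CLI are assumptions, not confirmed by this doc:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of hooks/conductor-e-prime.sh.
# CONDUCTOR_URL and CONDUCTOR_AGENT_ID are assumed environment variables.
set -uo pipefail

BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
AGENT_ID="${CONDUCTOR_AGENT_ID:-unknown}"

# Ask Conductor-E for this agent's current assignment (endpoint from the diagram above)
STATUS=$(curl -fsS -m 2 "${CONDUCTOR_URL:-http://conductor-e}/api/agents/${AGENT_ID}" 2>/dev/null || echo "{}")

# Ask GitHub for an open PR on this branch (via the gh CLI, if available)
PR=$(gh pr list --head "$BRANCH" --json number --jq '.[0].number // empty' 2>/dev/null || true)

# Whatever the script prints to stdout is what the SessionStart hook injects into the session
echo "Resume context: agent=${AGENT_ID} branch=${BRANCH} pr=${PR:-none} status=${STATUS}"
```

Every lookup degrades to a fallback, so the script still injects a usable summary when Conductor-E or GitHub is unreachable.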
Add to Claude Code `settings.json`:
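A sketch of the hook registration; the script path is an assumption, and the exact schema should be checked against the Claude Code hooks documentation:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "$HOME/rig-tools/hooks/conductor-e-prime.sh" }
        ]
      }
    ]
  }
}
```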
**Effort:** Small (1 shell script + hook config)
## 2. Pre-Tool Guards

### Problem
Agents can run destructive commands without guardrails. A confused agent could `git push --force`, `rm -rf /workspace`, or `kubectl delete namespace production`.
### Current flow (unprotected)

```mermaid
graph LR
    agent[Agent] -->|git push --force| git[Git]
    agent -->|rm -rf /| fs[Filesystem]
    agent -->|kubectl delete ns| k8s[Cluster]
    style git fill:#ff6666,color:#000
    style fs fill:#ff6666,color:#000
    style k8s fill:#ff6666,color:#000
```
### Proposed flow

```mermaid
graph LR
    agent[Agent] -->|command| guard[PreToolUse Guard]
    guard -->|safe| exec[Execute]
    guard -->|dangerous| block[Block + Log]
    block -->|event| conductor[Conductor-E]
    style block fill:#ff9999,color:#000
    style exec fill:#99ff99,color:#000
```
### What gets blocked

| Pattern | Why |
|---|---|
| `git push --force` | Destroys remote history |
| `git reset --hard` | Loses uncommitted work |
| `rm -rf /` or `rm -rf ~` | Filesystem destruction |
| `kubectl delete namespace` | Cluster destruction |
| `DROP TABLE`, `DROP DATABASE` | Data loss |
| `chmod 777` | Security risk |
### Implementation

Add `hooks/pretool-guard.sh` to rig-tools:

```shell
# Reads tool_input from Claude Code PreToolUse hook
# Checks against blocklist
# Exit 2 to block, exit 0 to allow
```
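A sketch of the check, using the blocklist from the table above. The real script would first extract the command from the hook's JSON on stdin (e.g. with `jq -r '.tool_input.command // empty'`, an assumption about the payload shape):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of hooks/pretool-guard.sh.
is_blocked() {
  local cmd="$1"
  local patterns=(
    'git push --force'
    'git reset --hard'
    'rm -rf /'
    'rm -rf ~'
    'kubectl delete namespace'
    'DROP TABLE'
    'DROP DATABASE'
    'chmod 777'
  )
  local p
  for p in "${patterns[@]}"; do
    # Simple substring match; a production version would want word boundaries
    [[ "$cmd" == *"$p"* ]] && return 0
  done
  return 1
}

guard() {
  if is_blocked "$1"; then
    echo "Blocked dangerous command: $1" >&2
    return 2   # in the hook script this is: exit 2 (block); exit 0 allows
  fi
  return 0
}

guard "ls -la" && echo "allowed: ls -la"
```

Note that plain substring matching overblocks (e.g. any path starting with `/` after `rm -rf`); that is a deliberate fail-closed bias for a first version.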
Add to Claude Code `settings.json`:
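A sketch of the registration; the script path is an assumption, and the `matcher`/`hooks` shape should be verified against the Claude Code hooks documentation:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "$HOME/rig-tools/hooks/pretool-guard.sh" }
        ]
      }
    ]
  }
}
```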
**Effort:** Small (1 shell script + hook config)
## 3. Agent Identity Attribution

### Problem
Git commits from agents use generic names. When reviewing history, you can't tell which agent made a change or trace quality issues to a specific agent instance.
### Current state

```
abc1234 feat: add login (Dev-E <noreply@dashecorp.com>)
def5678 fix: auth bug (Dev-E <noreply@dashecorp.com>)
```

Which Dev-E? Node? Dotnet? Was it a human or an agent?
### Proposed state

```
abc1234 feat: add login (dev-e-node <agent@dashecorp.com>)
def5678 fix: auth bug (human-stig <stig@dashecorp.com>)
ghi9012 refactor: cleanup (dev-e-dotnet <agent@dashecorp.com>)
```
### Implementation

```mermaid
graph TB
    subgraph "Agent (k8s)"
        env[AGENT_ID=dev-e-node]
        git_config[git config user.name = dev-e-node]
    end
    subgraph "Human (local)"
        hooks_env[CONDUCTOR_AGENT_ID=human-stig]
        git_user[git config user.name = human-stig]
    end
    subgraph "Conductor-E"
        history[Work history per agent identity]
        cost[Cost tracking per agent identity]
    end
```
Set in HelmRelease values:

```yaml
extraEnv:
  - name: GIT_AUTHOR_NAME
    value: "dev-e-node"
  - name: GIT_AUTHOR_EMAIL
    value: "agent@dashecorp.com"
```
For humans, `rig-tools install` sets `CONDUCTOR_AGENT_ID=human-$(whoami)`.
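The human side could be this small (a sketch; the variable name comes from the doc, everything else is an assumption about `install.sh`):

```shell
# Hypothetical fragment of rig-tools install.sh
AGENT_ID="human-$(whoami)"
export CONDUCTOR_AGENT_ID="$AGENT_ID"

# Match git authorship to the identity so history is attributable
git config --global user.name "$AGENT_ID" 2>/dev/null || true
echo "CONDUCTOR_AGENT_ID=$CONDUCTOR_AGENT_ID"
```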
**Effort:** Small (env vars in HelmRelease + rig-tools)
## 4. Centralized Hooks Config

### Problem
Each developer and agent workspace configures Claude Code hooks independently, so there is no consistency: new team members miss critical hooks, and every update requires manual changes in every workspace.
### Current state

```mermaid
graph TB
    ws1[Workspace 1<br/>settings.json] -->|manual| hooks1[heartbeat hook]
    ws2[Workspace 2<br/>settings.json] -->|manual| hooks2[heartbeat + guard]
    ws3[Workspace 3<br/>settings.json] -->|missing| hooks3[no hooks]
    style hooks3 fill:#ff9999,color:#000
```
### Proposed state

```mermaid
graph TB
    base[rig-tools/hooks-base.json<br/>Base config for everyone]
    dev_override[hooks-overrides/dev.json<br/>Dev-specific overrides]
    review_override[hooks-overrides/reviewer.json<br/>Reviewer overrides]
    base --> merge1[Merge]
    dev_override --> merge1
    merge1 --> ws1[Dev workspace<br/>settings.json]
    base --> merge2[Merge]
    review_override --> merge2
    merge2 --> ws2[Reviewer workspace<br/>settings.json]
    base --> ws3[Default workspace<br/>settings.json]
```
### Base hooks (all roles)

| Hook | Event | Purpose |
|---|---|---|
| `conductor-e-prime` | SessionStart | Resume context after restart |
| `conductor-e-hook` | PostToolUse | Heartbeat + event detection |
| `conductor-e-hook` | Stop | Mark idle |
| `pretool-guard` | PreToolUse | Block dangerous commands |
### Implementation

Add to rig-tools:

```
hooks-base.json          # Shared base config
hooks-overrides/
  dev.json               # Dev-E specific
  reviewer.json          # Review-E specific
scripts/hooks-sync.sh    # Generate settings.json from merged config
```

`./install.sh` runs `hooks-sync.sh` automatically.
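The merge at the heart of `hooks-sync.sh` can be a one-liner. A sketch, assuming `jq` is available and that override keys win on conflicts (both assumptions; the sample file contents are made up for the demo):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the merge inside scripts/hooks-sync.sh.
set -euo pipefail
cd "$(mktemp -d)"

mkdir -p hooks-overrides .claude
cat > hooks-base.json <<'EOF'
{"hooks": {"SessionStart": ["conductor-e-prime"], "Stop": ["conductor-e-hook"]}}
EOF
cat > hooks-overrides/dev.json <<'EOF'
{"hooks": {"Stop": ["conductor-e-hook", "dev-extra-hook"]}}
EOF

# jq's '*' operator deep-merges objects; keys in the override replace base keys
# (arrays are replaced wholesale, not concatenated)
jq -s '.[0] * .[1]' hooks-base.json hooks-overrides/dev.json > .claude/settings.json
cat .claude/settings.json
```

In the merged output, `SessionStart` comes from the base and `Stop` is taken from the dev override.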
**Effort:** Medium (config files + merge script + install update)
## 5. Escalation with Severity Routing

### Problem
When agents get stuck, they post a message to Discord. No severity levels, no routing, no tracking, no re-escalation. Critical issues get the same treatment as minor blockers.
### Current flow

```mermaid
graph LR
    agent[Stuck Agent] -->|"🛑 Stuck on repo#42"| discord[Discord #tasks]
    discord -->|human notices... eventually| human[Human]
    style discord fill:#ffcc00,color:#000
```
### Proposed flow

```mermaid
graph TB
    agent[Agent] -->|ESCALATE P2| conductor[Conductor-E]
    conductor -->|P2: Medium| thread[Discord Thread<br/>on the PR]
    conductor -->|P1: High| channel[Discord Channel<br/>#admin]
    conductor -->|P0: Critical| dm[Discord DM<br/>+ @mention]
    conductor -->|4h unacked?| bump[Bump Severity<br/>P2→P1→P0]
    bump -->|re-route| conductor
```
### Severity levels
| Level | When | Notification | Auto-escalate |
|---|---|---|---|
| P2 | Minor blocker, needs guidance | Discord thread | → P1 after 4h |
| P1 | CI stuck, review conflict | Discord #admin | → P0 after 4h |
| P0 | Security issue, data risk | Discord DM + @mention | Stays P0 |
### Implementation

New Conductor-E events:

```
ESCALATION_CREATED { severity, reason, agentId, repo, issueNumber }
ESCALATION_ACKED   { escalationId }
ESCALATION_CLOSED  { escalationId, resolution }
```
New rig-tools command:
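The doc doesn't name the command, so here is a hypothetical shape. The function name, flags, payload, and endpoint are all assumptions based on the `ESCALATION_CREATED` event above:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a rig-tools escalate command.
escalate() {
  local severity="$1" reason="$2"
  local payload
  payload=$(printf '{"severity":"%s","reason":"%s","agentId":"%s"}' \
    "$severity" "$reason" "${CONDUCTOR_AGENT_ID:-unknown}")
  echo "$payload"
  # The real command would POST the payload, e.g.:
  # curl -fsS -X POST "$CONDUCTOR_URL/api/escalations" -d "$payload"
}

escalate P2 "CI stuck on flaky integration test"
```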
A Conductor-E cron job checks unacknowledged escalations every hour and bumps their severity once the threshold is exceeded.
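The bump rule from the severity table is tiny; a sketch of the helper the cron check could call:

```shell
# Hypothetical severity-bump helper for the hourly cron check.
# P2 -> P1 -> P0; P0 stays P0 (per the severity table above).
bump_severity() {
  case "$1" in
    P2) echo "P1" ;;
    P1) echo "P0" ;;
    P0) echo "P0" ;;
    *)  echo "$1" ;;
  esac
}
```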
**Effort:** Medium (Conductor-E API changes + rig-tools CLI + Discord routing)
## Implementation Roadmap

```mermaid
gantt
    title Rig Improvements
    dateFormat YYYY-MM-DD
    section Phase 1 (Quick Wins)
    Session Recovery (Prime)    :p1, 2026-04-17, 2d
    Pre-Tool Guards             :p2, 2026-04-17, 1d
    Agent Identity Attribution  :p3, 2026-04-17, 1d
    section Phase 2 (Consistency)
    Centralized Hooks Config    :p4, after p1, 3d
    section Phase 3 (Reliability)
    Escalation System           :p5, after p4, 5d
```
## What This Does NOT Change
- Conductor-E stays as the central coordinator (not replaced by a CLI)
- GitHub Issues stays as the issue tracker (not replaced by Beads)
- FluxCD stays for GitOps (no change)
- PostgreSQL + Marten stays for event sourcing (not replaced by Dolt)
- Discord stays for communication (enhanced, not replaced)
- KEDA scale-to-zero stays (no change)
These improvements layer on top of the existing architecture. No rewrites.