
Engineering Rig — Proposed Improvements

Five improvements to close the gaps identified in architecture-current.md. Inspired by patterns from Gastown, a production multi-agent orchestration system.

1. Session Recovery (Prime)

Problem

When an agent restarts, crashes, or Claude Code compacts its context, all working state is lost. The agent starts fresh with no idea what it was working on. Humans hit the same problem when starting a new Claude Code session.

Current flow (broken)

sequenceDiagram
    participant A as Agent
    participant C as Conductor-E

    A->>A: Restart / context compaction
    A->>A: Lost: branch, issue, PR, review comments
    A->>C: GET /api/assignments/next
    Note over A: Picks up NEW work<br/>instead of resuming

Proposed flow

sequenceDiagram
    participant A as Agent / Human
    participant S as Prime Script
    participant C as Conductor-E
    participant G as GitHub

    A->>A: Session starts (or compaction)
    A->>S: SessionStart hook fires
    S->>S: Read current git branch
    S->>C: GET /api/agents/{agentId}
    S->>G: Check open PRs for this branch
    S->>S: Build context summary
    S-->>A: Inject: "You are working on repo#42,<br/>branch feature/issue-42-login,<br/>PR #15 has 2 review comments"
    A->>A: Resume work where it left off

Implementation

Add hooks/conductor-e-prime.sh to rig-tools (and bake it into the devcontainer image):

# Reads: git branch, Conductor-E agent status, GitHub PRs
# Outputs: context summary injected via SessionStart hook
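A minimal sketch of what the script could do, assuming AGENT_ID and CONDUCTOR_E_URL env vars and the gh CLI; the /api/agents endpoint comes from the diagram above, but the JSON field name (`currentIssue`) is a guess:

```shell
#!/usr/bin/env bash
# Sketch of hooks/conductor-e-prime.sh -- not the final implementation.
set -uo pipefail

# Summary line that the SessionStart hook injects as context.
build_summary() {
  local branch=$1 issue=$2 pr=$3
  echo "You are working on issue ${issue}, branch ${branch}, PR ${pr}"
}

branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")

# Current assignment from Conductor-E (field name is an assumption)
issue=$(curl -fsS "${CONDUCTOR_E_URL:-http://localhost:8080}/api/agents/${AGENT_ID:-unknown}" \
          2>/dev/null | jq -r '.currentIssue // "none"' 2>/dev/null || echo "none")

# Open PR for this branch, if any (requires the gh CLI)
pr=$(gh pr list --head "$branch" --json number --jq '.[0].number // "none"' \
       2>/dev/null || echo "none")

build_summary "$branch" "$issue" "$pr"
```

Whatever the script prints to stdout is what the SessionStart hook hands back to the session as context.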

Add to Claude Code settings.json:

{
  "hooks": {
    "SessionStart": [{"type": "command", "command": "conductor-e-prime"}]
  }
}

Effort: Small (1 shell script + hook config)


2. Pre-Tool Guards

Problem

Agents can run destructive commands without guardrails. A confused agent could run git push --force, rm -rf /workspace, or kubectl delete namespace production.

Current flow (unprotected)

graph LR
    agent[Agent] -->|git push --force| git[Git]
    agent -->|rm -rf /| fs[Filesystem]
    agent -->|kubectl delete ns| k8s[Cluster]
    style git fill:#ff6666,color:#000
    style fs fill:#ff6666,color:#000
    style k8s fill:#ff6666,color:#000

Proposed flow

graph LR
    agent[Agent] -->|command| guard[PreToolUse Guard]
    guard -->|safe| exec[Execute]
    guard -->|dangerous| block[Block + Log]
    block -->|event| conductor[Conductor-E]
    style block fill:#ff9999,color:#000
    style exec fill:#99ff99,color:#000

What gets blocked

| Pattern | Why |
| --- | --- |
| git push --force | Destroys remote history |
| git reset --hard | Loses uncommitted work |
| rm -rf / or rm -rf ~ | Filesystem destruction |
| kubectl delete namespace | Cluster destruction |
| DROP TABLE, DROP DATABASE | Data loss |
| chmod 777 | Security risk |

Implementation

Add hooks/pretool-guard.sh to rig-tools:

# Reads tool_input from Claude Code PreToolUse hook
# Checks against blocklist
# Exit 2 to block, exit 0 to allow
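A sketch of the guard logic. In the real hook the command arrives as PreToolUse JSON on stdin (roughly `jq -r .tool_input.command`); this sketch takes it as an argument to stay self-contained, and the patterns simply mirror the blocklist table above:

```shell
#!/usr/bin/env bash
# Sketch of hooks/pretool-guard.sh -- patterns are illustrative.
set -uo pipefail

# Exit 2 blocks the tool call; exit 0 allows it.
check_command() {
  local cmd=$1
  local patterns=(
    'git push .*--force'
    'git reset --hard'
    'rm -rf +(/|~)'
    'kubectl delete (ns|namespace)'
    'DROP (TABLE|DATABASE)'
    'chmod 777'
  )
  local p
  for p in "${patterns[@]}"; do
    if grep -Eq "$p" <<<"$cmd"; then
      echo "Blocked dangerous command: $cmd" >&2
      return 2
    fi
  done
  return 0
}

check_command "${1:-}"
```

Exit code 2 is what Claude Code treats as a block from a PreToolUse hook; anything the script writes to stderr is surfaced to the agent as the reason.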

Add to Claude Code settings.json:

{
  "hooks": {
    "PreToolUse": [{"type": "command", "command": "pretool-guard"}]
  }
}

Effort: Small (1 shell script + hook config)


3. Agent Identity Attribution

Problem

Git commits from agents use generic names. When reviewing history, you can't tell which agent made a change or trace quality issues to a specific agent instance.

Current state

abc1234 feat: add login (Dev-E <noreply@dashecorp.com>)
def5678 fix: auth bug (Dev-E <noreply@dashecorp.com>)

Which Dev-E? Node? Dotnet? Was it a human or agent?

Proposed state

abc1234 feat: add login (dev-e-node <agent@dashecorp.com>)
def5678 fix: auth bug (human-stig <stig@dashecorp.com>)
ghi9012 refactor: cleanup (dev-e-dotnet <agent@dashecorp.com>)

Implementation

graph TB
    subgraph "Agent (k8s)"
        env[AGENT_ID=dev-e-node]
        git_config[git config user.name = dev-e-node]
    end

    subgraph "Human (local)"
        hooks_env[CONDUCTOR_AGENT_ID=human-stig]
        git_user[git config user.name = human-stig]
    end

    subgraph "Conductor-E"
        history[Work history per agent identity]
        cost[Cost tracking per agent identity]
    end

Set in HelmRelease values:

extraEnv:
  - name: GIT_AUTHOR_NAME
    value: "dev-e-node"
  - name: GIT_AUTHOR_EMAIL
    value: "agent@dashecorp.com"

For humans, rig-tools install sets CONDUCTOR_AGENT_ID=human-$(whoami).
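A sketch of the identity setup rig-tools install could run. The human-$(whoami) convention and the agent@dashecorp.com address come from this doc; the helper name is illustrative:

```shell
#!/usr/bin/env bash
# Sketch: derive and apply the commit identity -- not the final script.
set -uo pipefail

# Agents get AGENT_ID from the HelmRelease; humans fall back to whoami.
resolve_identity() {
  if [ -n "${AGENT_ID:-}" ]; then
    echo "${AGENT_ID} agent@dashecorp.com"
  else
    echo "human-$(whoami) $(whoami)@dashecorp.com"
  fi
}

read -r name email <<<"$(resolve_identity)"
git config --global user.name "$name"
git config --global user.email "$email"
export CONDUCTOR_AGENT_ID="$name"
```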

Effort: Small (env vars in HelmRelease + rig-tools)


4. Centralized Hooks Config

Problem

Each developer and agent workspace configures Claude Code hooks independently. No consistency. New team members miss critical hooks. Updates require manual changes everywhere.

Current state

graph TB
    ws1[Workspace 1<br/>settings.json] -->|manual| hooks1[heartbeat hook]
    ws2[Workspace 2<br/>settings.json] -->|manual| hooks2[heartbeat + guard]
    ws3[Workspace 3<br/>settings.json] -->|missing| hooks3[no hooks]
    style hooks3 fill:#ff9999,color:#000

Proposed state

graph TB
    base[rig-tools/hooks-base.json<br/>Base config for everyone]
    dev_override[hooks-overrides/dev.json<br/>Dev-specific overrides]
    review_override[hooks-overrides/reviewer.json<br/>Reviewer overrides]

    base --> merge1[Merge]
    dev_override --> merge1
    merge1 --> ws1[Dev workspace<br/>settings.json]

    base --> merge2[Merge]
    review_override --> merge2
    merge2 --> ws2[Reviewer workspace<br/>settings.json]

    base --> ws3[Default workspace<br/>settings.json]

Base hooks (all roles)

| Hook | Event | Purpose |
| --- | --- | --- |
| conductor-e-prime | SessionStart | Resume context after restart |
| conductor-e-hook | PostToolUse | Heartbeat + event detection |
| conductor-e-hook | Stop | Mark idle |
| pretool-guard | PreToolUse | Block dangerous commands |
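As a sketch, hooks-base.json could mirror the base hooks directly, using the same shape as the settings.json snippets earlier in this doc (verify against the actual Claude Code settings schema):

```json
{
  "hooks": {
    "SessionStart": [{"type": "command", "command": "conductor-e-prime"}],
    "PostToolUse":  [{"type": "command", "command": "conductor-e-hook"}],
    "Stop":         [{"type": "command", "command": "conductor-e-hook"}],
    "PreToolUse":   [{"type": "command", "command": "pretool-guard"}]
  }
}
```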

Implementation

Add to rig-tools:

hooks-base.json                    # Shared base config
hooks-overrides/
  dev.json                         # Dev-E specific
  reviewer.json                    # Review-E specific
scripts/hooks-sync.sh             # Generate settings.json from merged config

./install.sh runs hooks-sync.sh automatically.
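The merge step can be sketched with jq (assumed available). File names follow the layout above; the merge semantics shown (override keys win, arrays replaced wholesale rather than concatenated) are a design choice, not a spec:

```shell
#!/usr/bin/env bash
# Sketch of scripts/hooks-sync.sh -- not the final implementation.
set -uo pipefail

# Merge the base config with a role override into settings.json.
# jq's `*` operator merges objects recursively; on conflicts the
# override wins, and arrays are replaced, not concatenated.
hooks_sync() {
  local base=$1 override=$2 out=$3
  if [ -f "$override" ]; then
    jq -s '.[0] * .[1]' "$base" "$override" > "$out"
  else
    cp "$base" "$out"
  fi
}

# As ./install.sh might call it (ROLE is illustrative):
#   hooks_sync hooks-base.json "hooks-overrides/${ROLE}.json" ~/.claude/settings.json
```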

Effort: Medium (config files + merge script + install update)


5. Escalation with Severity Routing

Problem

When agents get stuck, they post a message to Discord. No severity levels, no routing, no tracking, no re-escalation. Critical issues get the same treatment as minor blockers.

Current flow

graph LR
    agent[Stuck Agent] -->|"🛑 Stuck on repo#42"| discord[Discord #tasks]
    discord -->|human notices... eventually| human[Human]
    style discord fill:#ffcc00,color:#000

Proposed flow

graph TB
    agent[Agent] -->|ESCALATE P2| conductor[Conductor-E]

    conductor -->|P2: Medium| thread[Discord Thread<br/>on the PR]
    conductor -->|P1: High| channel[Discord Channel<br/>#admin]
    conductor -->|P0: Critical| dm[Discord DM<br/>+ @mention]

    conductor -->|4h unacked?| bump[Bump Severity<br/>P2→P1→P0]
    bump -->|re-route| conductor

Severity levels

| Level | When | Notification | Auto-escalate |
| --- | --- | --- | --- |
| P2 | Minor blocker, needs guidance | Discord thread | → P1 after 4h |
| P1 | CI stuck, review conflict | Discord #admin | → P0 after 4h |
| P0 | Security issue, data risk | Discord DM + @mention | Stays P0 |

Implementation

New Conductor-E events:

ESCALATION_CREATED  { severity, reason, agentId, repo, issueNumber }
ESCALATION_ACKED    { escalationId }
ESCALATION_CLOSED   { escalationId, resolution }

New rig-tools command:

conductor-e-hook ESCALATE --severity P1 "CI fails on auth tests, tried 3 times"

A Conductor-E cron job checks unacked escalations every hour and bumps their severity once the threshold passes.
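The hourly bump check reduces to a small state transition. A sketch, with the P-level ordering and 4h threshold taken from the severity table above (helper names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch of the severity-bump rule -- not the final implementation.
set -uo pipefail

# One step up the ladder: P2 -> P1 -> P0; P0 stays P0.
bump_severity() {
  case $1 in
    P2) echo P1 ;;
    P1) echo P0 ;;
    P0) echo P0 ;;
  esac
}

# Bump only when an escalation has sat unacked past the threshold (hours).
maybe_bump() {
  local severity=$1 unacked_hours=$2 threshold=${3:-4}
  if [ "$unacked_hours" -ge "$threshold" ]; then
    bump_severity "$severity"
  else
    echo "$severity"
  fi
}
```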

Effort: Medium (Conductor-E API changes + rig-tools CLI + Discord routing)


Implementation Roadmap

gantt
    title Rig Improvements
    dateFormat YYYY-MM-DD

    section Phase 1 (Quick Wins)
    Session Recovery (Prime)        :p1, 2026-04-17, 2d
    Pre-Tool Guards                 :p2, 2026-04-17, 1d
    Agent Identity Attribution      :p3, 2026-04-17, 1d

    section Phase 2 (Consistency)
    Centralized Hooks Config        :p4, after p1, 3d

    section Phase 3 (Reliability)
    Escalation System               :p5, after p4, 5d

What This Does NOT Change

  • Conductor-E stays as the central coordinator (not replaced by a CLI)
  • GitHub Issues stays as the issue tracker (not replaced by Beads)
  • FluxCD stays for GitOps (no change)
  • PostgreSQL + Marten stays for event sourcing (not replaced by Dolt)
  • Discord stays for communication (enhanced, not replaced)
  • KEDA scale-to-zero stays (no change)

These improvements layer on top of the existing architecture. No rewrites.