# Engineering Rig — Current Architecture

## Overview

The engineering rig is an AI-assisted development platform where AI agents and humans collaborate on code. Agents run on a GCP k3s cluster; humans run Claude Code locally. Both report to a central coordinator (Conductor-E) and follow the same workflow rules.

## System Diagram
```mermaid
graph TB
    subgraph "GitHub"
        issues[GitHub Issues]
        prs[Pull Requests]
        webhooks[Webhooks]
    end
    subgraph "GCP k3s Cluster"
        subgraph "Conductor-E"
            api[Conductor-E API<br/>.NET 10]
            postgres[(PostgreSQL<br/>Event Store)]
            valkey[(Valkey<br/>Streams + Signals)]
            cost[Cost Dashboard]
        end
        subgraph "Dev-E Agents"
            dev_node[Dev-E Node<br/>StatefulSet]
            dev_dotnet[Dev-E Dotnet<br/>StatefulSet]
            dev_python[Dev-E Python<br/>StatefulSet]
        end
        subgraph "Review-E"
            review[Review-E<br/>StatefulSet]
        end
        subgraph "Infrastructure"
            keda[KEDA<br/>Autoscaler]
            flux[FluxCD<br/>GitOps]
            tunnel[Cloudflare<br/>Tunnel]
            weave[Weave GitOps<br/>Dashboard]
        end
    end
    subgraph "Human Workstations"
        claude[Claude Code<br/>Workspaces]
        hooks[rig-tools<br/>Hooks]
    end
    subgraph "Monitoring"
        discord[Discord<br/>Channels]
        flux_dash[flux.dashecorp.com]
        conductor_dash[conductor-e.dashecorp.com]
    end
    issues -->|label: agent-ready| webhooks
    webhooks -->|POST /api/webhook/github| api
    api --> postgres
    api --> valkey
    valkey -->|signal| keda
    keda -->|scale 0→1| dev_node
    keda -->|scale 0→1| dev_dotnet
    keda -->|scale 0→1| dev_python
    keda -->|scale 0→1| review
    dev_node -->|clone, branch, implement| prs
    dev_dotnet -->|clone, branch, implement| prs
    review -->|review PR| prs
    prs -->|webhook| api
    api -->|alerts| discord
    claude -->|hooks| hooks
    hooks -->|events| api
    flux -->|reconcile| dev_node
    flux -->|reconcile| review
    tunnel --> api
    tunnel --> weave
```
## Components

### Conductor-E (the brain)

Event-sourced coordinator. Receives GitHub webhooks, assigns work to agents, and tracks progress.

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/webhook/github` | POST | Receives issues, PRs, reviews, check_runs |
| `/api/events` | POST | Agents report: WORK_STARTED, PR_CREATED, HEARTBEAT, AGENT_STUCK |
| `/api/assignments/next` | GET | Agent claims next assignment |
| `/api/issues` | GET | All tracked issues with state |
| `/api/agents` | GET | Agent status (working/idle/stuck) |
Tech: .NET 10, Marten event sourcing, PostgreSQL, Valkey for streams/signals.
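As a sketch of the reporting contract, an agent-side call to `/api/events` might look like the following. The endpoint and event type names come from the table above; the payload field names (`agentId`, `type`, `issue`) are assumptions for illustration, not Conductor-E's documented schema.

```typescript
// Hypothetical agent-side event reporting against Conductor-E.
// Event types come from the doc; the payload shape is an assumption.
type RigEvent = {
  agentId: string;
  type: "WORK_STARTED" | "PR_CREATED" | "HEARTBEAT" | "AGENT_STUCK";
  issue?: string; // e.g. "repo#42"
};

function buildEvent(agentId: string, type: RigEvent["type"], issue?: string): RigEvent {
  return { agentId, type, issue };
}

async function postEvent(baseUrl: string, event: RigEvent): Promise<void> {
  // Node 18+ ships a global fetch, so no HTTP library is needed
  const res = await fetch(`${baseUrl}/api/events`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
  if (!res.ok) throw new Error(`event rejected: ${res.status}`);
}
```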
State machine for each issue:

```mermaid
stateDiagram-v2
    [*] --> queued: issue labeled agent-ready
    queued --> assigned: agent claims
    assigned --> in_progress: WORK_STARTED
    in_progress --> in_review: PR_CREATED
    in_review --> changes_requested: review rejects
    changes_requested --> in_review: agent pushes fix
    in_review --> ready_to_merge: review approves
    ready_to_merge --done: PR merged
    in_progress --> stuck: AGENT_STUCK
    stuck --> in_progress: unstuck / reassigned
```
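The transitions above can be encoded as a simple event-to-transition table. This is an illustrative sketch of the state machine, not Conductor-E's actual Marten implementation; the synthetic event names (`AGENT_CLAIMS`, `REVIEW_REJECTED`, etc.) mirror the diagram's edge labels.

```typescript
// Issue state machine from the diagram, as a lookup table.
// States match the diagram; event names for non-agent edges are invented here.
type IssueState =
  | "queued" | "assigned" | "in_progress" | "in_review"
  | "changes_requested" | "ready_to_merge" | "stuck" | "done";

const transitions: Record<string, Partial<Record<IssueState, IssueState>>> = {
  AGENT_CLAIMS:    { queued: "assigned" },
  WORK_STARTED:    { assigned: "in_progress" },
  PR_CREATED:      { in_progress: "in_review" },
  REVIEW_REJECTED: { in_review: "changes_requested" },
  FIX_PUSHED:      { changes_requested: "in_review" },
  REVIEW_APPROVED: { in_review: "ready_to_merge" },
  PR_MERGED:       { ready_to_merge: "done" },
  AGENT_STUCK:     { in_progress: "stuck" },
  UNSTUCK:         { stuck: "in_progress" },
};

function next(state: IssueState, event: string): IssueState {
  const to = transitions[event]?.[state];
  if (!to) throw new Error(`illegal transition: ${event} from ${state}`);
  return to;
}
```

A table like this makes illegal transitions (e.g. `PR_MERGED` on a `queued` issue) fail loudly instead of silently corrupting state.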
### Rig Agent Runtime (the hands)
Shared Node.js runtime that all agents use. Loaded with a character config (personality, tools, LLM provider, MCP servers). One image, many agents.
```mermaid
graph LR
    char[character.json] --> runtime[Rig Agent Runtime]
    runtime --> discord[Discord Gateway]
    runtime --> llm[LLM Provider<br/>Claude CLI / Codex / API]
    runtime --> mcp[MCP Servers<br/>GitHub, Advisor, Memory]
    runtime --> heartbeat[Heartbeat<br/>→ Conductor-E]
    runtime --> dashboard[Dashboard<br/>:3000]
```
Character config defines everything about an agent:

```yaml
character:
  name: "Dev-E (Node)"
  personality: "You are Dev-E, a development agent..."
llm:
  provider: claude-cli
  model: claude-sonnet-4-6
mcpServers:
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
cron:
  schedule: "*/5 * * * *"
  prompt: "Check for work..."
```
Multi-stack images — same runtime, different language tooling:

| Tag | Tools | Used by |
|---|---|---|
| `base` | Node.js 22, Claude CLI, Codex CLI, gh | Base for all |
| `node` | + TypeScript, Jest, ESLint | Dev-E Node |
| `dotnet` | + .NET 10 SDK | Dev-E Dotnet |
| `python` | + Python 3, pytest, black | Dev-E Python |
### Dev-E (the developers)
Three stack variants, all running the same runtime with different character configs. Each polls Conductor-E for assignments every 5 minutes.
Work flow:

```mermaid
sequenceDiagram
    participant C as Conductor-E
    participant D as Dev-E
    participant G as GitHub
    participant R as Review-E
    D->>C: GET /api/assignments/next?agentId=dev-e
    C-->>D: Assignment: repo#42 "Add login"
    D->>C: POST WORK_STARTED
    D->>G: Clone repo, create branch
    D->>D: Implement with Claude Code CLI
    D->>G: Push branch, create PR
    D->>C: POST PR_CREATED
    G->>C: Webhook: pull_request opened
    C->>R: Routes to Review-E
    R->>G: Review PR
    alt Approved
        R->>G: Approve
        G->>G: Auto-merge
        G->>C: Webhook: PR merged
        C->>C: Issue → done
    else Changes Requested
        R->>G: Request changes
        G->>C: Webhook: review submitted
        C->>D: Routes back to Dev-E
        D->>D: Fix, push
    end
```
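The Dev-E side of this exchange reduces to a poll-claim-work loop. A minimal sketch, assuming Node's global `fetch` and a 204 response when no work is queued (the endpoint and the 5-minute interval come from the text; the response shape and status-code convention are assumptions):

```typescript
// Illustrative Dev-E polling loop; not the runtime's actual code.
type Assignment = { issue: string; repo: string } | null; // assumed shape

function assignmentUrl(baseUrl: string, agentId: string): string {
  return `${baseUrl}/api/assignments/next?agentId=${encodeURIComponent(agentId)}`;
}

async function pollOnce(baseUrl: string, agentId: string): Promise<Assignment> {
  const res = await fetch(assignmentUrl(baseUrl, agentId));
  if (res.status === 204) return null; // assumed "no work" signal
  if (!res.ok) throw new Error(`poll failed: ${res.status}`);
  return (await res.json()) as Assignment;
}

async function pollLoop(baseUrl: string, agentId: string): Promise<void> {
  for (;;) {
    const a = await pollOnce(baseUrl, agentId);
    if (a) {
      // clone, branch, implement, push, open PR, report PR_CREATED ...
    }
    await new Promise((r) => setTimeout(r, 5 * 60 * 1000)); // every 5 minutes
  }
}
```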
### Review-E (quality gate)

Reviews every PR from Dev-E. Structurally separate — the agent that writes code cannot approve it.

Review checklist:

1. Correctness — does it match the issue?
2. Security — OWASP Top 10
3. Tests — adequate coverage
4. Docs — updated if behavior changed, valid YAML frontmatter
5. Commits — conventional format

Human gate: sensitive files (auth, payment, migration, GDPR, schema) trigger escalation to a human. Review-E will NOT approve these.
### KEDA (scale-to-zero)

Agents idle most of the time. KEDA watches Valkey for signals and scales agents 0→1 when work arrives. Cooldown: 20 minutes.

1. No work → 0 pods (zero cost)
2. Issue labeled → Conductor-E writes a signal to Valkey
3. KEDA detects the signal → scales Dev-E to 1 pod
4. Work completes → 20 min cooldown → back to 0
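A `ScaledObject` for one agent might look like the following sketch. It assumes Conductor-E pushes signals onto a Valkey list consumed by KEDA's Redis-compatible `redis` list scaler; the list key name and service address are placeholders, not taken from the actual manifests.

```yaml
# Illustrative KEDA ScaledObject; key name and address are assumptions.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-e-node
spec:
  scaleTargetRef:
    kind: StatefulSet
    name: dev-e-node
  minReplicaCount: 0          # scale-to-zero when idle
  maxReplicaCount: 1
  cooldownPeriod: 1200        # 20 minutes, per the cooldown above
  triggers:
    - type: redis             # Valkey speaks the Redis protocol
      metadata:
        address: valkey:6379            # placeholder service address
        listName: signals:dev-e-node    # assumed signal key
        listLength: "1"                 # any pending signal wakes the agent
```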
### Human Developers
Humans use Claude Code locally with the same workflow rules as agents.
rig-tools hooks connect humans to Conductor-E:
```mermaid
graph LR
    human[Human + Claude Code] -->|PostToolUse| hook[conductor-e-hook]
    hook -->|HEARTBEAT| conductor[Conductor-E API]
    human -->|git checkout -b| hook
    hook -->|WORK_STARTED| conductor
    human -->|gh pr create| hook
    hook -->|PR_CREATED| conductor
```
Hooks fire automatically via Claude Code settings.json. For other AI tools (Codex, Copilot, Cursor), call the CLI directly.
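The wiring in `settings.json` might look like the sketch below. The `PostToolUse`/`matcher`/`command` structure is Claude Code's hook format; the `conductor-e-hook heartbeat` invocation and the matcher pattern are illustrative guesses — check rig-tools for the real command.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash|Edit|Write",
        "hooks": [
          { "type": "command", "command": "conductor-e-hook heartbeat" }
        ]
      }
    ]
  }
}
```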
### Devcontainers

Humans can work inside the same container image as agents:

- Agent on k8s: `rig-agent-runtime:dotnet` → Conductor-E
- Human locally: `rig-agent-runtime:dotnet` (devcontainer) → Conductor-E

Each repo has a `.devcontainer/devcontainer.json` pointing to the right stack image.
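For a .NET repo, such a file could be as small as the sketch below; the full image path is assembled here from the Artifact Registry prefix listed under Infrastructure, so verify it against an existing repo before copying.

```jsonc
// Minimal devcontainer.json sketch; the image path is an assumption
// built from the Artifact Registry prefix in the Infrastructure table.
{
  "name": "rig dotnet",
  "image": "europe-north1-docker.pkg.dev/invotek-github-infra/dashecorp/rig-agent-runtime:dotnet"
}
```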
## Infrastructure

### GCP k3s Cluster
| Resource | Detail |
|---|---|
| VM | invotek-k3s — e2-standard-2 (2 vCPU, 8GB) |
| Region | europe-north1-b |
| K8s | k3s v1.34, single node |
| GitOps | FluxCD — watches dashecorp/rig-gitops |
| Images | GCP Artifact Registry: europe-north1-docker.pkg.dev/invotek-github-infra/dashecorp/ |
| Tunnel | Cloudflare dashecorp-gcp → conductor-e.dashecorp.com, flux.dashecorp.com |
| Access | gcloud compute ssh invotek-k3s --zone europe-north1-b --project invotek-github-infra |
### Monitoring
| System | What | Where |
|---|---|---|
| Flux Discord alerts | Reconciliation failures | Discord channel |
| Weave GitOps | Visual Flux dashboard | https://flux.dashecorp.com |
| Conductor-E cost dashboard | Per-agent token usage | https://conductor-e.dashecorp.com |
| Conductor-E API | Agent status, issue state | https://conductor-e.dashecorp.com/api/agents |
### GitOps Flow

```mermaid
graph LR
    dev[Developer] -->|PR| gitops[dashecorp/rig-gitops]
    gitops -->|FluxCD watches| flux[Flux on k3s]
    flux -->|reconcile| cluster[k8s Resources]
    flux -->|error| discord[Discord Alert]
```

All deployments go through git; no manual `kubectl apply`.
## Repositories

| Repo | Purpose |
|---|---|
| `dashecorp/conductor-e` | Event store, assignment engine, API |
| `dashecorp/rig-agent-runtime` | Shared agent runtime + Helm chart |
| `dashecorp/rig-gitops` | FluxCD manifests, AGENTS.md, docs, templates |
| `dashecorp/dev-e` | Dev agent .NET worker (future replacement) |
| `dashecorp/review-e` | Review agent .NET worker (future replacement) |
| `dashecorp/rig-tools` | Developer hooks, workflow sync |
| `dashecorp/infra` | OpenTofu — GCP VM, Cloudflare, GitHub repos |
## Memory

All agents use a shared rig-memory-mcp server backed by the same PostgreSQL instance as the Marten event store (with pgvector).
Memory is cross-agent — Dev-E, Dev-E Dotnet, Dev-E Python, and Review-E all read/write to the same store.
| Component | Detail |
|---|---|
| MCP server | @dashecorp/rig-memory-mcp (pre-installed in rig-agent-runtime image) |
| Backend | PostgreSQL + pgvector extension — same instance as Marten event store |
| Connection | DB_URL env var from {agent}-secrets.database-url |
| Extension init | apps/conductor-e/postgres-pgvector-job.yaml (one-time Job) |
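Following the `mcpServers` pattern from the character config above, wiring the memory server into an agent might look like this. The `npx` invocation and the `env` passthrough are assumptions — the package is pre-installed in the image, so the actual config may launch it differently.

```yaml
# Illustrative mcpServers entry; command and env wiring are assumptions.
mcpServers:
  memory:
    command: npx
    args: ["-y", "@dashecorp/rig-memory-mcp"]
    env:
      DB_URL: "${DB_URL}"   # injected from {agent}-secrets.database-url
```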
## Gaps and Limitations
- No pre-tool guards — agents can run destructive commands unchecked
- No agent identity attribution — git commits use generic names
- No escalation routing — stuck agents just post to Discord
- No centralized hooks config — each workspace configured independently
- Single node — no HA, single point of failure
- No inter-agent messaging — all communication routes through Conductor-E or Discord