Engineering Rig — Current Architecture

Overview

The engineering rig is an AI-assisted development platform where AI agents and humans collaborate on code. Agents run on a GCP k3s cluster. Humans run Claude Code locally. Both report to a central coordinator (Conductor-E) and follow the same workflow rules.

System Diagram

```mermaid
graph TB
    subgraph "GitHub"
        issues[GitHub Issues]
        prs[Pull Requests]
        webhooks[Webhooks]
    end

    subgraph "GCP k3s Cluster"
        subgraph "Conductor-E"
            api[Conductor-E API<br/>.NET 10]
            postgres[(PostgreSQL<br/>Event Store)]
            valkey[(Valkey<br/>Streams + Signals)]
            cost[Cost Dashboard]
        end

        subgraph "Dev-E Agents"
            dev_node[Dev-E Node<br/>StatefulSet]
            dev_dotnet[Dev-E Dotnet<br/>StatefulSet]
            dev_python[Dev-E Python<br/>StatefulSet]
        end

        subgraph "Review-E"
            review[Review-E<br/>StatefulSet]
        end

        subgraph "Infrastructure"
            keda[KEDA<br/>Autoscaler]
            flux[FluxCD<br/>GitOps]
            tunnel[Cloudflare<br/>Tunnel]
            weave[Weave GitOps<br/>Dashboard]
        end
    end

    subgraph "Human Workstations"
        claude[Claude Code<br/>Workspaces]
        hooks[rig-tools<br/>Hooks]
    end

    subgraph "Monitoring"
        discord[Discord<br/>Channels]
        flux_dash[flux.dashecorp.com]
        conductor_dash[conductor-e.dashecorp.com]
    end

    issues -->|label: agent-ready| webhooks
    webhooks -->|POST /api/webhook/github| api
    api --> postgres
    api --> valkey
    valkey -->|signal| keda
    keda -->|scale 0→1| dev_node
    keda -->|scale 0→1| dev_dotnet
    keda -->|scale 0→1| dev_python
    keda -->|scale 0→1| review
    dev_node -->|clone, branch, implement| prs
    dev_dotnet -->|clone, branch, implement| prs
    review -->|review PR| prs
    prs -->|webhook| api
    api -->|alerts| discord
    claude -->|hooks| hooks
    hooks -->|events| api
    flux -->|reconcile| dev_node
    flux -->|reconcile| review
    tunnel --> api
    tunnel --> weave
```

Components

Conductor-E (the brain)

Event-sourced coordinator. Receives GitHub webhooks, assigns work to agents, tracks progress.

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/webhook/github` | POST | Receives issues, PRs, reviews, check_runs |
| `/api/events` | POST | Agents report: WORK_STARTED, PR_CREATED, HEARTBEAT, AGENT_STUCK |
| `/api/assignments/next` | GET | Agent claims next assignment |
| `/api/issues` | GET | All tracked issues with state |
| `/api/agents` | GET | Agent status (working/idle/stuck) |

Tech: .NET 10, Marten event sourcing, PostgreSQL, Valkey for streams/signals.
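For illustration, an agent-side call to `POST /api/events` might look like the sketch below. Only the endpoint path and the event names come from the table above; the payload fields (`agentId`, `repo`, `issueNumber`, `timestamp`) are assumptions, not the documented schema.

```typescript
// Sketch: reporting a lifecycle event to Conductor-E.
// Payload shape is assumed — only /api/events and the event names are documented.
type RigEvent = {
  agentId: string;
  type: "WORK_STARTED" | "PR_CREATED" | "HEARTBEAT" | "AGENT_STUCK";
  repo?: string;
  issueNumber?: number;
  timestamp: string;
};

function buildEvent(
  agentId: string,
  type: RigEvent["type"],
  extra: Partial<RigEvent> = {},
): RigEvent {
  // Stamp the event at creation time so Conductor-E can order it.
  return { agentId, type, timestamp: new Date().toISOString(), ...extra };
}

async function reportEvent(baseUrl: string, event: RigEvent): Promise<void> {
  const res = await fetch(`${baseUrl}/api/events`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
  if (!res.ok) throw new Error(`Conductor-E rejected event: ${res.status}`);
}
```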

State machine for each issue:

```mermaid
stateDiagram-v2
    [*] --> queued: issue labeled agent-ready
    queued --> assigned: agent claims
    assigned --> in_progress: WORK_STARTED
    in_progress --> in_review: PR_CREATED
    in_review --> changes_requested: review rejects
    changes_requested --> in_review: agent pushes fix
    in_review --> ready_to_merge: review approves
    ready_to_merge --> done: PR merged
    in_progress --> stuck: AGENT_STUCK
    stuck --> in_progress: unstuck / reassigned
```
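The same state machine can be written as a transition table. State and event names mirror the diagram; the table form itself is an illustrative sketch, not Conductor-E's implementation.

```typescript
// Transition table for the issue state machine above.
type IssueState =
  | "queued" | "assigned" | "in_progress" | "in_review"
  | "changes_requested" | "ready_to_merge" | "stuck" | "done";

const transitions: Record<string, IssueState> = {
  "queued:claim": "assigned",
  "assigned:WORK_STARTED": "in_progress",
  "in_progress:PR_CREATED": "in_review",
  "in_review:changes_requested": "changes_requested",
  "changes_requested:push_fix": "in_review",
  "in_review:approved": "ready_to_merge",
  "ready_to_merge:merged": "done",
  "in_progress:AGENT_STUCK": "stuck",
  "stuck:unstuck": "in_progress",
};

function next(state: IssueState, event: string): IssueState {
  const target = transitions[`${state}:${event}`];
  // Reject anything the diagram doesn't allow (e.g. merging from "queued").
  if (!target) throw new Error(`Illegal transition: ${state} + ${event}`);
  return target;
}
```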

Rig Agent Runtime (the hands)

Shared Node.js runtime that all agents use. Loaded with a character config (personality, tools, LLM provider, MCP servers). One image, many agents.

```mermaid
graph LR
    char[character.json] --> runtime[Rig Agent Runtime]
    runtime --> discord[Discord Gateway]
    runtime --> llm[LLM Provider<br/>Claude CLI / Codex / API]
    runtime --> mcp[MCP Servers<br/>GitHub, Advisor, Memory]
    runtime --> heartbeat[Heartbeat<br/>→ Conductor-E]
    runtime --> dashboard[Dashboard<br/>:3000]
```

Character config defines everything about an agent:

```yaml
character:
  name: "Dev-E (Node)"
  personality: "You are Dev-E, a development agent..."
  llm:
    provider: claude-cli
    model: claude-sonnet-4-6
  mcpServers:
    github:
      command: npx
      args: ["-y", "@modelcontextprotocol/server-github"]
  cron:
    schedule: "*/5 * * * *"
    prompt: "Check for work..."
```

Multi-stack images — same runtime, different language tooling:

| Tag | Tools | Used by |
|---|---|---|
| `base` | Node.js 22, Claude CLI, Codex CLI, gh | Base for all |
| `node` | + TypeScript, Jest, ESLint | Dev-E Node |
| `dotnet` | + .NET 10 SDK | Dev-E Dotnet |
| `python` | + Python 3, pytest, black | Dev-E Python |

Dev-E (the developers)

Three stack variants, all running the same runtime with different character configs. Each polls Conductor-E for assignments every 5 minutes.

Work flow:

```mermaid
sequenceDiagram
    participant C as Conductor-E
    participant D as Dev-E
    participant G as GitHub
    participant R as Review-E

    D->>C: GET /api/assignments/next?agentId=dev-e
    C-->>D: Assignment: repo#42 "Add login"
    D->>C: POST WORK_STARTED
    D->>G: Clone repo, create branch
    D->>D: Implement with Claude Code CLI
    D->>G: Push branch, create PR
    D->>C: POST PR_CREATED
    G->>C: Webhook: pull_request opened
    C->>R: Routes to Review-E
    R->>G: Review PR
    alt Approved
        R->>G: Approve
        G->>G: Auto-merge
        G->>C: Webhook: PR merged
        C->>C: Issue → done
    else Changes Requested
        R->>G: Request changes
        G->>C: Webhook: review submitted
        C->>D: Routes back to Dev-E
        D->>D: Fix, push
    end
```

Review-E (quality gate)

Reviews every PR from Dev-E. Structurally separate — the agent that writes code cannot approve it.

Review checklist:

1. Correctness — does it match the issue?
2. Security — OWASP top 10
3. Tests — adequate coverage
4. Docs — updated if behavior changed, valid YAML frontmatter
5. Commits — conventional format

Human gate: Sensitive files (auth, payment, migration, GDPR, schema) trigger escalation to human. Review-E will NOT approve these.
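A check along these lines could implement the human gate. The pattern list is inferred from the categories named above (auth, payment, migration, GDPR, schema); the real rules Review-E applies may be stricter or path-specific.

```typescript
// Sketch: flag PRs touching sensitive files for human review.
// Patterns are assumptions inferred from the documented categories.
const SENSITIVE_PATTERNS: RegExp[] = [
  /auth/i,
  /payment/i,
  /migration/i,
  /gdpr/i,
  /schema/i,
];

function requiresHumanReview(changedFiles: string[]): boolean {
  // Any single sensitive file escalates the whole PR.
  return changedFiles.some((file) =>
    SENSITIVE_PATTERNS.some((pattern) => pattern.test(file)),
  );
}
```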

KEDA (scale-to-zero)

Agents idle most of the time. KEDA watches Valkey for signals and scales agents 0→1 when work arrives. Cooldown: 20 minutes.

No work → 0 pods (zero cost)
Issue labeled → Conductor-E writes signal to Valkey
KEDA detects signal → scales Dev-E to 1 pod
Work completes → 20 min cooldown → back to 0
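A ScaledObject along these lines could express this flow. The trigger metadata, stream name, and resource names here are assumptions for illustration — the rig's actual manifests live in dashecorp/rig-gitops.

```yaml
# Sketch only — names and trigger metadata are assumed, not the real manifests.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-e-node
spec:
  scaleTargetRef:
    kind: StatefulSet
    name: dev-e-node
  minReplicaCount: 0        # scale to zero when idle
  maxReplicaCount: 1
  cooldownPeriod: 1200      # the 20-minute cooldown described above
  triggers:
    - type: redis-streams   # Valkey speaks the Redis protocol
      metadata:
        address: valkey:6379
        stream: agent-signals
        consumerGroup: dev-e-node
        pendingEntriesCount: "1"
```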

Human Developers

Humans use Claude Code locally with the same workflow rules as agents.

rig-tools hooks connect humans to Conductor-E:

```mermaid
graph LR
    human[Human + Claude Code] -->|PostToolUse| hook[conductor-e-hook]
    hook -->|HEARTBEAT| conductor[Conductor-E API]
    human -->|git checkout -b| hook
    hook -->|WORK_STARTED| conductor
    human -->|gh pr create| hook
    hook -->|PR_CREATED| conductor
```

Hooks fire automatically via Claude Code settings.json. For other AI tools (Codex, Copilot, Cursor), call the CLI directly.
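A settings.json wiring might look like the fragment below. The hook structure follows Claude Code's hooks schema; the `conductor-e-hook heartbeat` invocation is an assumed subcommand of the rig-tools hook shown in the diagram above.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "conductor-e-hook heartbeat" }
        ]
      }
    ]
  }
}
```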

Devcontainers

Humans can work inside the same container image as agents:

Agent on k8s:  rig-agent-runtime:dotnet → Conductor-E
Human locally: rig-agent-runtime:dotnet (devcontainer) → Conductor-E

Each repo has .devcontainer/devcontainer.json pointing to the right stack image.
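A minimal devcontainer.json for a .NET repo might look like this. The image path combines the Artifact Registry prefix listed under Infrastructure with the runtime image name, and the `CONDUCTOR_E_URL` variable name is assumed for illustration.

```json
{
  "name": "rig-dotnet",
  "image": "europe-north1-docker.pkg.dev/invotek-github-infra/dashecorp/rig-agent-runtime:dotnet",
  "remoteEnv": {
    "CONDUCTOR_E_URL": "https://conductor-e.dashecorp.com"
  }
}
```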

Infrastructure

GCP k3s Cluster

| Resource | Detail |
|---|---|
| VM | `invotek-k3s` — e2-standard-2 (2 vCPU, 8 GB) |
| Zone | europe-north1-b |
| K8s | k3s v1.34, single node |
| GitOps | FluxCD — watches dashecorp/rig-gitops |
| Images | GCP Artifact Registry: `europe-north1-docker.pkg.dev/invotek-github-infra/dashecorp/` |
| Tunnel | Cloudflare tunnel `dashecorp-gcp`: conductor-e.dashecorp.com, flux.dashecorp.com |
| Access | `gcloud compute ssh invotek-k3s --zone europe-north1-b --project invotek-github-infra` |

Monitoring

| System | What | Where |
|---|---|---|
| Flux Discord alerts | Reconciliation failures | Discord channel |
| Weave GitOps | Visual Flux dashboard | https://flux.dashecorp.com |
| Conductor-E cost dashboard | Per-agent token usage | https://conductor-e.dashecorp.com |
| Conductor-E API | Agent status, issue state | https://conductor-e.dashecorp.com/api/agents |

GitOps Flow

```mermaid
graph LR
    dev[Developer] -->|PR| gitops[dashecorp/rig-gitops]
    gitops -->|FluxCD watches| flux[Flux on k3s]
    flux -->|reconcile| cluster[k8s Resources]
    flux -->|error| discord[Discord Alert]
```

All deployments go through git. No manual `kubectl apply`.

Repositories

| Repo | Purpose |
|---|---|
| dashecorp/conductor-e | Event store, assignment engine, API |
| dashecorp/rig-agent-runtime | Shared agent runtime + Helm chart |
| dashecorp/rig-gitops | FluxCD manifests, AGENTS.md, docs, templates |
| dashecorp/dev-e | Dev agent .NET worker (future replacement) |
| dashecorp/review-e | Review agent .NET worker (future replacement) |
| dashecorp/rig-tools | Developer hooks, workflow sync |
| dashecorp/infra | OpenTofu — GCP VM, Cloudflare, GitHub repos |

Memory

All agents use a shared rig-memory-mcp server backed by the Marten Postgres (with pgvector). Memory is cross-agent — Dev-E, Dev-E Dotnet, Dev-E Python, and Review-E all read/write to the same store.

| Component | Detail |
|---|---|
| MCP server | `@dashecorp/rig-memory-mcp` (pre-installed in rig-agent-runtime image) |
| Backend | PostgreSQL + pgvector extension — same instance as Marten event store |
| Connection | `DB_URL` env var from `{agent}-secrets.database-url` |
| Extension init | `apps/conductor-e/postgres-pgvector-job.yaml` (one-time Job) |
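By analogy with the GitHub MCP entry in the character config above, wiring the memory server into a character might look like the fragment below. The `command`/`args` shape is assumed from that GitHub example; only the package name and the `DB_URL` variable come from the table.

```yaml
mcpServers:
  memory:
    command: npx
    args: ["-y", "@dashecorp/rig-memory-mcp"]
    env:
      DB_URL: "${DB_URL}"  # injected from {agent}-secrets.database-url
```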

Gaps and Limitations

  1. No pre-tool guards — agents can run destructive commands unchecked
  2. No agent identity attribution — git commits use generic names
  3. No escalation routing — stuck agents just post to Discord
  4. No centralized hooks config — each workspace configured independently
  5. Single node — no HA, single point of failure
  6. No inter-agent messaging — all communication routes through Conductor-E or Discord