
The Trusted Rig — Whitepaper

Read this first

This is a target document, not a description of the rig as it exists today. It describes the engineering rig we would trust to complete architect-level work, ship autonomously within bounded blast radius, and fix its own bugs with near-zero downtime. Closing the gap between today and this target is a roughly nine-to-twelve-month roadmap (see below). For what runs in production right now, read architecture-current.md. For the immediate next step, read architecture-proposed-v2.md.

A multi-agent engineering rig only earns trust when it reliably translates intent into shipped production code within bounded blast radius, measures what it does, and fails known rather than guessing. This whitepaper is the architectural specification for that end state.

The whitepaper is the reading-order master. Each concern has a companion document that goes deep. The chain of companions:

  1. principles.md — the design rules every other document is answerable to
  2. trust-model.md — what the rig decides alone, what needs humans, tiered autonomy by blast radius
  3. safety.md — guards, stuck detection, hallucination mitigation, prompt injection defense
  4. security.md — supply chain (Sigstore + SLSA), runtime (Kyverno + Cilium), attestation chain, secrets
  5. observability.md — OpenTelemetry + Langfuse/Phoenix + Prometheus, SLOs, traces, cost attribution
  6. cost-framework.md — LiteLLM proxy, per-agent budgets, rate-limit-aware dispatch, prompt caching
  7. self-healing.md — Flagger canary, SLO-gated rollout, kill switches, DB migrations, reproduction harness
  8. quality-and-evaluation.md — nightly eval harness, SWE-bench Pro, property-based testing, DORA-adapted metrics
  9. drift-detection.md — model drift, prompt drift, code drift, config drift
  10. memory.md — agent memory (pgvector-backed): storage schema, aspirational 4-tier scoping vs. soft-tagging reality, 5 MCP tools, advisor handoff protocol, 11 honest limitations, memory-as-attack-surface, integration points.
  11. implementation-status.md — single source of truth for what's deployed vs. planned vs. deferred vs. rejected. 78 capabilities tracked across 11 domains. Updated as PRs land.
  12. mvp-scope.md — the minimum viable rig: 10 capabilities in Phase 0-3a, ~3-4 pair-mode weeks. Exit criterion: one full issue→merge with safety + cost + stuck detection without destructive autonomy.
  13. tool-choices.md — ADR-style evaluation of every tool picked (license, backing, pricing, lock-in, migration path). Every "we use X" line in the other docs cites this for the why.
  14. provider-portability.md — the rig is vendor-neutral at four layers (coordination, gateway, instrumentation, instructions). Claude + Anthropic is the default runtime; OpenAI, Gemini, Ollama, and any LiteLLM-supported provider work too.
  15. development-process.md — how the rig gets built and operated: three-era bootstrap, team topology, per-tier testing, quality gates, release cadence, emergency process. The operating manual.
  16. example-first-story.md — a worked example of the development process applied to the first user story (dangerous-command guard). TaskSpec, issue decomposition, dependency graph, per-issue specs, rollout sequence, risks. Reusable template.
  17. limitations.md — what the rig cannot do, where humans remain indispensable
  18. glossary.md — vocabulary for both humans and AI readers

Who this is for

Both humans and AI readers. A senior engineer joining the team and an agent invoked to implement the next feature should both be able to read top-to-bottom and know where the shape comes from. To serve both:

  • Concepts are defined before they are used.
  • Every technical decision states the problem, the candidate options, the pick, and the reason.
  • Every section has a "what NOT to do" companion to prevent cargo-culting.
  • External references are concrete (paper links, CVE numbers, specific tools, specific version numbers).
  • Diagrams are mermaid, not prose-hidden.

Executive summary

The ten properties in one breath

Measurable. Bounded blast radius. Reversible before irreversible. Execute, don't trust. Attestable. Progressive autonomy. Humans at semantic boundaries. Trusted control plane, untrusted data plane. Fail closed, fail known. Simple enough to operate.

The rig we would trust has ten properties. They are not independent — each one fails without the others.

graph TB
    classDef foundation fill:#e8f5e9,stroke:#2e7d32,color:#000
    classDef safety fill:#fff3e0,stroke:#e65100,color:#000
    classDef autonomy fill:#e3f2fd,stroke:#1565c0,color:#000
    classDef honesty fill:#fce4ec,stroke:#ad1457,color:#000

    P1[1. Measurable]:::foundation
    P2[2. Bounded blast radius]:::safety
    P3[3. Reversible before irreversible]:::safety
    P4[4. Execute, don't trust]:::foundation
    P5[5. Attestable + replayable]:::foundation
    P6[6. Progressive autonomy]:::autonomy
    P7[7. Humans at semantic boundaries]:::autonomy
    P8[8. Trusted control + untrusted data]:::safety
    P9[9. Fail closed, fail known]:::honesty
    P10[10. Simple enough to operate]:::honesty

    P1 --> P6
    P4 --> P2
    P5 --> P6
    P8 --> P2
    P2 --> P3
    P9 --> P7
    P6 --> P7

These are the ten principles of the trusted rig. principles.md unpacks each one with the engineering consequences that follow. The rest of this whitepaper is an application of these ten rules to concrete subsystems.

The philosophical point in one sentence

The trusted rig is not a collection of features — it is a closed loop where every action is measured, attested, bounded, and observable, and the rig uses those measurements to decide what to do next. Without the loop, adding features compounds risk; with the loop, each feature adds verified capability.

The central claim

A rig earns trust for a given task only when:

  1. The task's blast radius is bounded — code can be rolled back, effects can be reversed, side channels are closed.
  2. The rig has a measured track record on that task class — not just "agents are good at this" but "this rig's agents, with this prompt, with these tools, on this repo, succeed N% of the time."
  3. Every action is attestable — there is a cryptographic chain from the original intent to the deployed artifact, and any step can be replayed.
  4. Failure modes are known and handled — loops detected, stuck states escalated, budget exhaustion stops work, drift is measured.

Trust is a function of all four. Any one missing and the rig must fall back to human-gated execution. The whitepaper defines what "all four present" looks like for each class of work.

Architecture at a glance

graph TB
    subgraph "Intent Layer"
        GH[GitHub Issues + Spec Kit]
        Chat[Discord / Human]
    end

    subgraph "Control Plane — Conductor-E"
        CE[Conductor-E API<br/>Event store<br/>Marten + Postgres]
        AR[Agent Cursor +<br/>Subscription Registry]
        EB[Error Budget<br/>Projection]
        ATT[Attestation<br/>Projection]
    end

    subgraph "Agent Plane — Execution"
        DE[Dev-E<br/>writes code]
        RE[Review-E<br/>reviews PRs]
        HE[Dev-E repair-dispatch mode<br/>triggered by SLO burn]
        AE[Architect-E<br/>shapes interfaces]
        SE[Spec-E<br/>refines intent]
    end

    subgraph "Gate Layer — Safety"
        SG[StuckGuard]
        DG[Dangerous-cmd Guard]
        PG[Prompt-injection Guard<br/>CaMeL separation]
        BG[Budget Gate<br/>LiteLLM proxy]
    end

    subgraph "Delivery — Self-healing"
        CI[CI + SLSA L3]
        KV[Kyverno admission]
        FL[Flagger canary]
        FD[flagd feature flags]
        RB[Auto-rollback]
    end

    subgraph "Observability"
        OT[OpenTelemetry]
        LF[Langfuse — LLM traces]
        PR[Prometheus — SLO metrics]
        GC[Grafana Cloud — logs + traces]
    end

    subgraph "Production"
        K3S[k3s cluster<br/>+ service mesh]
        Apps[User-facing services]
    end

    GH --> SE
    Chat --> SE
    SE --> CE
    CE --> AR
    AR --> DE
    AR --> RE
    AR --> HE
    DE --> SG
    DE --> DG
    DE --> PG
    SG --> CE
    DG --> CE
    PG --> CE
    DE --> BG
    BG --> CE
    DE --> CI
    CI --> KV
    KV --> FL
    FL --> K3S
    K3S --> Apps
    Apps --> PR
    PR --> EB
    EB --> CE
    RB --> FL
    PR --> RB
    FD --> K3S
    CE --> ATT
    OT --> LF
    OT --> GC
    DE --> OT
    RE --> OT
    HE --> OT
    AE --> CE
    CE --> HE

The shape in one sentence: GitHub Issues feed a spec-refinement agent, which commits typed intent to Conductor-E; Conductor-E's cursor-driven registry dispatches to execution agents; agents pass through guard middleware and budget proxies before touching tools; every commit is signed, every image attested, every deploy canaried with SLO-gated promotion; Langfuse + Prometheus measure everything; failures escalate via severity-routed channels; production repair runs the same pipeline in reverse.

Eight subsystems, drawn as a pipeline:

graph LR
    classDef in fill:#e1f5fe,color:#000
    classDef p fill:#fff3e0,color:#000
    classDef o fill:#e8f5e9,color:#000

    A[Intent<br/>Issue / Spec]:::in
    B[Refine<br/>Spec-E]:::p
    C[Plan<br/>Architect-E]:::p
    D[Implement<br/>Dev-E]:::p
    E[Review<br/>Review-E + human]:::p
    F[Attest<br/>Sigstore]:::p
    G[Admit<br/>Kyverno]:::p
    H[Roll<br/>Flagger]:::p
    I[Measure<br/>Prometheus]:::o
    J[Repair<br/>Dev-E repair mode]:::p

    A --> B --> C --> D --> E --> F --> G --> H --> I
    I -.->|SLO breach| J
    J --> D

Subsystem tour

Intent and spec refinement

Raw GitHub Issues are too fuzzy. A Spec-E agent reads each new issue, asks clarifying questions, and produces a typed TaskSpec artifact (acceptance criteria, non-goals, blast-radius class, expected test surface) before any implementation work is dispatched. Inspired by the CAMEL framework's TaskSpecifyAgent (distinct from the CaMeL prompt-injection defense used in the gate layer) and GPT Pilot's Spec Writer pattern. For multi-PR work the spec is refined via GitHub Spec Kit in a .specify/ directory in the target repo.

Without this step, implementation agents burn tokens reverse-engineering intent. With it, Dev-E gets a deterministic scope and the CI can check acceptance criteria against the delivered PR.
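
A minimal sketch of what the TaskSpec artifact could look like; the field names and the dispatchability rule here are illustrative assumptions, not the rig's real schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class BlastRadius(Enum):
    """Tier labels from the trust model (T0 = non-blast ... T3 = irreversible)."""
    T0 = "non-blast"
    T1 = "contained"
    T2 = "multi-repo"
    T3 = "irreversible"

@dataclass
class TaskSpec:
    """Typed intent artifact produced by Spec-E before any dispatch."""
    issue_id: int
    title: str
    acceptance_criteria: list[str]
    non_goals: list[str]
    blast_radius: BlastRadius
    expected_test_surface: list[str] = field(default_factory=list)

    def is_dispatchable(self) -> bool:
        # A spec with no acceptance criteria is still fuzzy intent;
        # refuse to hand it to Dev-E.
        return bool(self.acceptance_criteria)
```

The point of the type is the CI hook: acceptance criteria become checkable items against the delivered PR instead of prose buried in an issue thread.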

Control plane: Conductor-E with cursor + registry

The existing event-sourced Conductor-E stays the nervous system. Added:

  • Agent Cursor Projection — per-agent (AgentId, LastEventOrdinal, SubscribedEventTypes, ConcurrentSlots, InFlightAssignments). Derived from LangGraph's versions_seen and MetaGPT's _watch + msg_buffer patterns. Makes "exactly-once-per-agent" and "capacity-aware-assignment" queryable instead of inferred.
  • Agent Subscription Registry — YAML-in-git declaring per-agent consumes and produces event types. AutoGen 0.4's declared message-handler contract, adapted. Enables deploy-time topology validation: a produced event with no consumer is a build-break.
  • Error Budget Projection — per-service SLO compliance over a rolling 28-day window. Google SRE's error-budget pattern. Becomes the gate every deploy goes through.
  • Attestation Projection — per-change cryptographic chain (plan → commit → build → image → deploy). Materializes the evidence needed for post-hoc audit or replay.
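
The registry's deploy-time topology check can be sketched in a few lines (the dict shape standing in for the YAML-in-git registry is an assumption):

```python
def validate_topology(registry: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return deploy-time errors: any produced event type with no consumer.

    `registry` mirrors the subscription registry:
    agent name -> {"consumes": [...], "produces": [...]}.
    """
    consumed = {e for spec in registry.values() for e in spec.get("consumes", [])}
    errors = []
    for agent, spec in registry.items():
        for event in spec.get("produces", []):
            if event not in consumed:
                errors.append(f"{agent} produces '{event}' but no agent consumes it")
    return errors

registry = {
    "spec-e": {"consumes": ["IssueOpened"], "produces": ["TaskSpecCommitted"]},
    "dev-e": {"consumes": ["TaskSpecCommitted"], "produces": ["PrOpened"]},
}
# "PrOpened" has no consumer, so this topology is a build-break.
assert validate_topology(registry) == ["dev-e produces 'PrOpened' but no agent consumes it"]
```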

See trust-model.md for how the registry + cursor + budget interact with autonomy tiers.

Agent plane: five roles (four agents plus humans)

Dev-E writes code. Review-E reviews PRs. Spec-E (new) refines fuzzy intent. Architect-E (new, high bar) shapes interfaces where the semantic decision matters.

A production-incident dispatch ("Repair-E") is a mode of Dev-E, not a fifth agent: same pod class, same model, same tools, a different system prompt triggered by an SLO-burn alert rather than an issue assignment. A separate Repair-E role does not cleanly pass the event-shaped-boundary test (my own standard from principles.md), so we don't claim one.

Four roles is the upper bound. GPT Pilot tried six and is archived. Cognition's published "Don't Build Multi-Agents" essay argues against fine-grained intra-task multi-agent — our coarse role separation with explicit event handoffs is the defensible form that Cognition's warning specifically does not target. Humans remain the fifth participant, always on call for blast-radius-raising actions.

All agents run as stateless K8s pods. KEDA scales to zero. Git worktrees per task (Cursor 2026 pattern) replace full per-task clones for fast cold-start.

Gate layer: safety middleware

Every agent call passes through:

  1. StuckGuard — deterministic loop detection at the tool-call layer. Five patterns from OpenHands' StuckDetector, Goose's RepetitionInspector, Sweep's visited_set — three independent codebases converged here, strongest "build this" signal in the multi-agent research (see research-multi-agent-platforms.md).
  2. Dangerous-command Guard — PreToolUse hook, rejects sudo, rm -rf /, git push --force (without --force-with-lease), drop table, kubectl delete namespace, and package-manager installs. No override flag — Gastown's deliberate design, copied wholesale.
  3. Prompt-injection Guard — CaMeL-style separation: a trusted control-plane LLM plans, a quarantined data-plane LLM processes untrusted content without tool access. DeepMind's arXiv:2503.18813 shows this is the only prompt-injection defense with a formal guarantee. The 2025-2026 CVE wave (CVE-2025-54794/54795, CVE-2025-59536/CVE-2026-21852, CVE-2025-68143/68144/68145) makes this non-optional.
  4. Budget Gate — LiteLLM proxy between agent and the LLM provider, enforcing per-agent hourly + daily token budgets and 429 circuit-breaking. Prevents one looping agent from burning a shared plan (default: Anthropic Max) for everyone. LiteLLM also handles automatic fallback to secondary providers on 429/529 — see provider-portability.md.
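
To illustrate the deterministic flavor of these guards, here is a minimal stuck detector for the simplest of the five patterns (identical consecutive tool calls); the window size and threshold are placeholder values, not the rig's tuned ones:

```python
import hashlib
from collections import deque

class StuckGuard:
    """Flags an agent as stuck after N identical consecutive tool calls.

    Purely deterministic: no model involved, just fingerprints of
    (tool name, arguments) compared over a sliding window.
    """
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.recent: deque = deque(maxlen=threshold)

    def observe(self, tool: str, args: dict) -> bool:
        """Record a tool call; return True when the agent looks stuck."""
        fingerprint = hashlib.sha256(
            f"{tool}:{sorted(args.items())}".encode()
        ).hexdigest()
        self.recent.append(fingerprint)
        # Stuck when the whole window is one repeated fingerprint.
        return len(self.recent) == self.threshold and len(set(self.recent)) == 1
```

A True return would emit an AgentStuck event toward Conductor-E rather than killing the agent directly; escalation policy lives upstream of the guard.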

See safety.md and security.md.

Delivery: supply chain + self-healing

Every image cosign-signed at build time (keyless, Sigstore). Every build emits SLSA v1.0 Provenance (L3 via slsa-github-generator). Every agent commit gitsign-signed. Kyverno's ImageValidatingPolicy rejects any manifest reaching namespace=prod without (a) a valid Fulcio-bound signature from our org's GitHub Actions workflows and (b) a valid SLSA attestation. "Human co-signer for prod" is expressed as a two-attestor policy.

Deploys flow through Flagger (Flux-native, idiomatic for our GitOps): 5% canary → Prometheus SLI analysis → 25% → 50% → 100%. Error-budget-exhausted services cannot promote non-fix changes. flagd + OpenFeature Operator provide kill switches smaller than rollback (~30 seconds vs ~5 minutes). DB migrations use pgroll for enforced expand/contract with never-destructive first-deploy rules.
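
The SLO-gated promotion ladder can be sketched as a per-interval decision function. This is a simplification of what Flagger actually evaluates; the gating inputs and the weight list are taken from the text, everything else is an assumption:

```python
from typing import Optional

def next_canary_weight(current: int, sli_ok: bool,
                       budget_remaining: bool, is_fix: bool) -> Optional[int]:
    """One promotion decision: advance the 5 -> 25 -> 50 -> 100 ladder,
    abort on failed analysis, refuse non-fix promotion on exhausted budget."""
    steps = [5, 25, 50, 100]
    if not budget_remaining and not is_fix:
        return None   # error budget exhausted: non-fix changes cannot promote
    if not sli_ok:
        return 0      # failed Prometheus analysis: route canary traffic to zero
    i = steps.index(current)
    return steps[min(i + 1, len(steps) - 1)]

assert next_canary_weight(5, sli_ok=True, budget_remaining=True, is_fix=False) == 25
assert next_canary_weight(25, sli_ok=False, budget_remaining=True, is_fix=False) == 0
assert next_canary_weight(5, sli_ok=True, budget_remaining=False, is_fix=False) is None
```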

Production failures trigger Dev-E in repair-dispatch mode. It reads the OTel trace, extracts the offending span's code.function + code.filepath, runs git log -p -S<function> --since="24h", cross-references recent deploys, proposes a forward-fix or revert PR. The same canary pipeline promotes the fix.
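
The git forensics step reduces to building a pickaxe query from the span attributes. A sketch, with hypothetical function and path names:

```python
def suspect_commits_cmd(function_name: str, filepath: str,
                        since: str = "24h") -> list[str]:
    """Command the repair dispatch runs to find commits that recently
    touched the offending symbol. Inputs come from the OTel span's
    code.function and code.filepath attributes."""
    return [
        "git", "log", "-p",
        f"-S{function_name}",   # pickaxe: commits changing occurrences of the symbol
        f"--since={since}",
        "--", filepath,         # narrow the search to the file the span points at
    ]

cmd = suspect_commits_cmd("ChargeCard", "src/billing/charge.py")
assert cmd == ["git", "log", "-p", "-SChargeCard", "--since=24h",
               "--", "src/billing/charge.py"]
```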

See self-healing.md.

Observability: OTel, Langfuse, Prometheus

Claude Code, Codex CLI, and Gemini CLI all support OpenTelemetry natively as of late 2025 (e.g., Claude Code via CLAUDE_CODE_ENABLE_TELEMETRY=1). One OTel Collector forwards spans + metrics + logs from all agent pods to Langfuse (self-hosted, for LLM traces and cost attribution) and Grafana Cloud Free (for traces, logs, long-term metrics). Because every agent runtime emits OpenTelemetry GenAI semantic conventions, swapping runtime or provider doesn't break the dashboards — see provider-portability.md. Local Prometheus stays for Flagger canary analysis (so SLO gates work even if egress is down). Total memory budget: ~1.5 GB added.

LLM-specific signals we track: tokens per agent per task, tool-call latency + error rate, compaction events, 429/529 rate, prompt cache hit rate, session duration.

Production SLO signals: error rate, p99 latency, availability, budget burn rate. Honeycomb-style burn-rate alerts (interpolate the last hour forward) are the alert shape agents consume.
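
The burn-rate math, made concrete (the 28-day window is assumed from the error-budget projection above):

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """How many times faster than sustainable the error budget is burning.

    A burn rate of 1.0 spends exactly the whole budget over the SLO window.
    """
    return error_rate / (1.0 - slo)

def hours_to_exhaustion(rate: float, window_hours: float = 28 * 24) -> float:
    """Interpolate the last hour's burn rate forward: when does the budget run out?"""
    return window_hours / rate

# 0.5% errors against a 99.9% SLO burns the budget five times too fast,
# exhausting a 28-day budget in about 5.6 days instead of 28.
assert abs(burn_rate(0.005, 0.999) - 5.0) < 1e-6
```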

See observability.md.

Cost framework

Per-agent token budgets enforced at the LiteLLM proxy layer. Pre-flight cost prediction using a cheap model (default: Haiku 4.5, or Ollama local; configurable per provider-portability.md) on large tasks. Prompt caching on the long system prompts (10× cheaper reads on Anthropic; OpenAI and Gemini offer their own cache economics). Conductor-E token-bucket on dispatch prevents runaway loops from even reaching the proxy. A shared plan (default: Anthropic Max) across agents with hard per-agent ceilings ensures no single looping agent evicts the others; LiteLLM fallback_models hops to a secondary provider on 429/529.

See cost-framework.md.

Quality and evaluation

Nightly eval harness runs a 30-task subset of SWE-bench Pro (SWE-bench Verified is contaminated as of late 2025 — see quality-and-evaluation.md) plus a 10-task internal-repo golden suite, results piped to Langfuse. GitHub Action fails PRs that regress the golden suite beyond tolerance. Property-based testing via Hypothesis (arXiv:2510.09907 shows LLM-generated property tests find bugs beyond unit-test coverage) runs on every agent-authored change. DORA metrics adapted to agents: PR-merge-rate-without-rework, review-comment-count, rollback-rate, time-to-merge, tokens-per-merged-PR.
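
Two of the adapted metrics, made concrete with a small sketch (the PR record shape is an assumption):

```python
def tokens_per_merged_pr(prs: list[dict]) -> float:
    """Total tokens spent across all PRs, divided by PRs that merged."""
    merged = sum(1 for p in prs if p["merged"])
    if merged == 0:
        return float("inf")
    return sum(p["tokens"] for p in prs) / merged

def merge_rate_without_rework(prs: list[dict]) -> float:
    """Share of opened PRs that merged with zero rework cycles."""
    clean = sum(1 for p in prs if p["merged"] and not p["rework"])
    return clean / len(prs)

prs = [
    {"merged": True, "rework": False, "tokens": 120_000},
    {"merged": True, "rework": True, "tokens": 300_000},
    {"merged": False, "rework": False, "tokens": 80_000},
]
assert tokens_per_merged_pr(prs) == 250_000.0
assert merge_rate_without_rework(prs) == 1 / 3
```

Note the denominator choice: abandoned PRs still count their tokens, so a cheap agent that abandons often still looks expensive per merge.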

Drift

Four drift channels, each measured independently:

  1. Model drift — same model string, silently different behavior. Measured via a 20-prompt canary suite nightly; >30% output hash delta week-over-week pages on-call. InsightFinder data: 91% of production LLMs drift within 90 days, median detection lag 14-18 days without explicit monitoring.
  2. Prompt drift — agent system prompt changes. Versioned in git; regression eval on every change.
  3. Code drift — deployed code vs. main branch. Flux already detects this; surfaced as a Conductor-E event.
  4. Config drift — deployed manifests vs. gitops source. Same Flux pathway.
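
The model-drift check reduces to hashing canary outputs and comparing week-over-week; the whitespace normalization and the 30% paging threshold follow the text, everything else is illustrative:

```python
import hashlib

def output_hash(text: str) -> str:
    """Stable fingerprint of a canary prompt's output (whitespace-normalized)."""
    return hashlib.sha256(" ".join(text.split()).encode()).hexdigest()

def drift_fraction(last_week: dict[str, str], this_week: dict[str, str]) -> float:
    """Fraction of the canary suite whose output hash changed week-over-week.

    last_week maps prompt id -> stored hash; this_week maps prompt id -> raw output.
    """
    changed = sum(
        1 for prompt, h in last_week.items()
        if output_hash(this_week.get(prompt, "")) != h
    )
    return changed / len(last_week)

last = {"p1": output_hash("four"), "p2": output_hash("Paris")}
this = {"p1": "four", "p2": "The capital is Paris"}
# One of two canary outputs changed: 0.5, above the 30% paging threshold.
assert drift_fraction(last, this) == 0.5
```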

See drift-detection.md.

Trust model: tiered autonomy by blast radius

Trust is earned per task class. The rig has four tiers:

| Tier | Example | Autonomy | Gates |
|------|---------|----------|-------|
| T0 — Non-blast | Doc updates, YAML linting, test scaffolding | Full — no human | CI, Review-E |
| T1 — Contained | Single-repo feature, test-covered refactor | Full under canary | CI + Review-E + Flagger + SLO gate |
| T2 — Multi-repo or architect-level | Event schema change, new subscription type | Plan by agent, human approves interface | CI + Review-E + human co-sign |
| T3 — Irreversible | DB migration destructive steps, auth changes, payment code | Human drives, agent assists | Explicit human approval, two-attestor Kyverno policy |
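
A sketch of how intake classification might map TaskSpec facts to a tier; the rules here are illustrative, and trust-model.md holds the real ones:

```python
def classify_blast_radius(touches: set, repos: int) -> str:
    """Map coarse facts about a change to an autonomy tier (T0-T3).

    `touches` is a set of surface labels Spec-E extracts from the issue;
    the labels used here are hypothetical.
    """
    IRREVERSIBLE = {"db-destructive", "auth", "payments"}
    if touches & IRREVERSIBLE:
        return "T3"   # human drives, agent assists
    if repos > 1 or "event-schema" in touches:
        return "T2"   # agent plans, human approves the interface
    if touches <= {"docs", "lint", "test-scaffolding"}:
        return "T0"   # full autonomy, CI + Review-E only
    return "T1"       # full autonomy under canary

assert classify_blast_radius({"auth"}, 1) == "T3"
assert classify_blast_radius({"event-schema"}, 1) == "T2"
assert classify_blast_radius({"docs"}, 1) == "T0"
```

The classifier errs upward by construction: any irreversible surface dominates, regardless of how contained the rest of the change looks.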

The tier for any change is classified by Spec-E at intake, stored as the TaskSpec.blastRadius field, and enforced by Conductor-E's dispatch policy and Kyverno's admission policy.

See trust-model.md for the full tier policy, classification rules, and escalation paths.

Failure modes and mitigations

The rig is designed around named failures, not hoped-for successes. Twelve classes, drawn from the research and from known incidents:

| Failure mode | Where it shows | Mitigation |
|--------------|----------------|------------|
| Agent loops indefinitely on tool calls | Runaway cost, no progress | StuckGuard (5 patterns) → emit AgentStuck → escalate |
| Agent fabricates API / package names (slopsquatting) | Install step, runtime crash | Package allowlist, ephemeral install sandbox, SBOM check |
| Agent hallucinates tool name or arguments | Tool call fails or misfires | Schema-validated tool use (Pydantic), deferred-tools pattern to keep active toolset <50 |
| Prompt injection via issue/comment/README | Agent exfiltrates secrets, runs attacker commands | CaMeL-style trust separation + L7 egress allowlist + tool scoping |
| Event loss during Conductor-E downtime | Heartbeats vanish, silent agent | Hook reliability spool, at-least-once delivery |
| Model provider silent behavior drift (any vendor) | Outputs change without version bump | Nightly canary suite per provider, output-hash delta alerting |
| Shared rate limit burned by one agent | All agents 429, work stalls on that provider | LiteLLM proxy per-agent budgets + circuit breaker + cross-provider fallback |
| Canary false-positive abort | Flaky deploy, human on call unnecessarily | Analysis template requires consecutive successes + failureLimit ≥ 2 |
| Canary false-negative promotion | Bad code reaches 100% | SLO burn-rate alerts, flagd kill switch flips in ~30s |
| Emergency fast path bypasses canary | Global outage (Cloudflare Dec 2025 lesson) | No fast path — all mutable surfaces flow through the same gated pipeline |
| Destructive DB migration | Data loss | pgroll enforces expand/contract, Kyverno rejects migrations without human-approved attestation |
| Stuck production incident, no human available | Prolonged outage | Severity-routed escalation (P2 thread → P1 channel → P0 DM + @mention), stale escalations auto-bump |

Every failure in this table has a specific code path, alert, and on-call response documented in self-healing.md and safety.md.

Evaluation: how we know any of this works

Claims without measurement are hope. The trusted rig publishes metrics on itself:

  • Weekly rig-quality dashboard: SWE-bench-Pro 30-task pass rate, internal golden-suite pass rate, PR-merge-rate-without-rework per agent, rollback rate, cost-per-merged-PR, p50/p99 time-to-merge, escalation rate, false-positive escalation rate.
  • Drift dashboard: model-output-hash delta week-over-week, prompt-eval regression count, Flux-detected code/config drift.
  • Production health: standard SRE golden signals (rate, errors, duration) per service, error-budget burn across services, mean-time-to-resolve for auto-fixed incidents.
  • Cost dashboard: token cost per agent per repo, prompt cache hit rate, budget-gate rejection count.

Every number on every dashboard comes from Langfuse, Grafana, or Conductor-E projections. No "we feel like it's working" allowed.

Limitations

The trusted rig is not a silver bullet. It does not:

  • Replace human judgment on ambiguous value trade-offs. "Is this the right feature to build" remains human.
  • Replace humans for T3 actions (DB destructive steps, auth, payments, anything truly irreversible).
  • Write security-critical code unsupervised (review + attestation + human co-sign required).
  • Handle novel categories of work reliably on the first try — track record is earned per task class.
  • Recover from loss of foundational infra (Conductor-E Postgres corruption, full Flux outage) without human intervention.
  • Guarantee against new classes of prompt-injection attack. CaMeL formally secures against today's class; tomorrow's novel variants need new guards.
  • Scale linearly with added agents — Anthropic's "Building Effective Agents" warning (applicable regardless of which provider backs the agent) stands: most multi-agent setups are slower than single-agent-with-good-tools. Our five-role shape is the ceiling.

limitations.md enumerates these and more with honest reasons.

Roadmap: from today to the trusted rig

gantt
    title Phases to the Trusted Rig
    dateFormat YYYY-MM-DD

    section Phase 0 — Safety floor
    Dangerous-command guard       :p01, 2026-04-17, 1d
    Agent identity in git         :p02, 2026-04-17, 1d
    Egress NetworkPolicy          :p03, 2026-04-17, 2d
    Git worktrees per task        :p04, 2026-04-18, 2d

    section Phase 1 — Reliability floor
    Hook reliability spool        :p11, after p04, 3d
    StuckGuard middleware         :p12, after p04, 3d
    Human Prime SessionStart      :p13, after p04, 2d

    section Phase 2 — Measurement
    OTel + Langfuse self-hosted   :p21, after p11, 5d
    Nightly eval harness          :p22, after p21, 5d
    LiteLLM budget proxy          :p23, after p21, 4d

    section Phase 3 — Coordination
    Per-consumer cursor           :p31, after p22, 7d
    Subscription registry         :p32, after p31, 3d
    Bounded-loop sentinel         :p33, after p31, 3d

    section Phase 4 — Supply chain
    Sigstore image signing        :p41, after p11, 3d
    SLSA L3 provenance            :p42, after p41, 2d
    Kyverno admission             :p43, after p42, 4d
    Gitsign agent commits         :p44, after p43, 3d

    section Phase 5 — Self-healing
    Flagger canary                :p51, after p23, 5d
    flagd feature flags           :p52, after p51, 3d
    pgroll DB migrations          :p53, after p51, 4d
    Repair-E + reproduction       :p54, after p53, 14d
    Error-budget projection       :p55, after p51, 3d

    section Phase 6 — Defense in depth
    CaMeL trust separation        :p61, after p44, 14d
    Escalation routing            :p62, after p33, 5d
    Drift canary suite            :p63, after p22, 3d

    section Phase 7 — New agent roles
    Spec-E intake refinement      :p71, after p22, 7d
    Architect-E interface gate    :p72, after p32, 14d
    Repair-dispatch integration   :p73, after p54, 5d

Phase 0 must ship before Phase 1+

The dependency ordering is load-bearing: if safety guards and identity attribution are not in place first, every later phase raises the blast radius of a bug in that phase. Do not skip Phase 0.

Phases are dependency tiers, not calendar weeks. Phase 0 is ~1-3 weeks of focused pair-mode work (bootstrap the floor). Phases 1-3 each are ~2-4 weeks. Phase 4 is ~6-8 weeks. Phase 5 is honestly 3-6 months, possibly longer — stage 3 autonomous logic-bug repair with a reproduction harness is at the public frontier of what anyone ships, not a routine engineering project. Total nominal effort to a full trusted rig: ~9-12 months by a small team, assuming parallelism where the graph allows. The original "6 months" estimate was aggressive; this is the honest number.

The ordering is load-bearing: safety floor first (Phase 0) so that bugs in later phases don't take down production; measurement next (Phase 2) so phases 3+ can be evaluated rather than hoped about; supply chain parallel to coordination (Phases 3 + 4) because they don't interact; self-healing and defense-in-depth (Phases 5 + 6) only on top of measurement + supply chain; new agent roles last (Phase 7) because roles are cheap to add and expensive to retire.

Each phase has explicit exit criteria: measurable outcomes that confirm the phase is done. See self-healing.md and quality-and-evaluation.md for the phase-level exit criteria.

Why these choices and not others

The trusted rig is opinionated. A quick log of the picks and their opposites:

  • Flagger over Argo Rollouts — Flux-native, YAML CRDs instead of Argo Rollouts' Rollout replacement. Argo is better if you run ArgoCD; we run Flux.
  • Langfuse self-hosted over SaaS (LangSmith, Braintrust) — data stays in-house, matches our SOPS + Postgres pattern.
  • Grafana Cloud Free over self-hosted LGTM — self-hosted on 8GB VM memory-starves the rig. Hybrid keeps Prometheus local for Flagger, everything else managed.
  • Kyverno over OPA Gatekeeper — YAML CRDs, native Sigstore verification, operational cost lower for a 2-person team. Gatekeeper is better at general-purpose policy; we don't need that breadth.
  • Sigstore keyless over HSM-backed keys — no long-lived secrets to rotate. For our threat model (internal code on internal infra), keyless is strictly better.
  • Cilium L7 over Istio — L7 DNS + HTTP allowlists cover 80% of the egress-control value; Istio's mesh is overkill until we need mTLS to external services.
  • GitHub Issues + Spec Kit over Backlog.md or Beads + Dolt — the switching cost is real (ATL-E, Review-E, dashboards, webhooks all wired to Issues); add specs on top via Spec Kit's .specify/ layout, not underneath via a new store.
  • pgroll over hand-written migrations — automates expand/contract safely; human-written "I'll be careful" is how destructive migrations ship.
  • flagd + OpenFeature over GrowthBook or Unleash — CNCF spec alignment, operator-injected sidecar pattern, YAML flag definitions in the Flux repo.
  • LiteLLM proxy over direct API calls — per-agent budget enforcement at the request layer, not trust-based in-agent.
  • CaMeL separation over "better prompting" — only the former has a formal guarantee against prompt injection.

Every pick is justified in the companion document for the relevant concern, with links to opposing implementations and the reason we rejected them.

Reading order for AI agents

If an AI agent is reading this to implement a task:

  1. Check TaskSpec.blastRadius to determine the tier (T0/T1/T2/T3).
  2. Read trust-model.md for what you are and aren't allowed to decide alone.
  3. Read safety.md for the guards your tools will pass through.
  4. Read cost-framework.md for budget awareness.
  5. Read the specific companion doc for the area being changed (security, observability, self-healing, etc.).
  6. Check limitations.md to ensure the task is in scope for agent work.
  7. If the task touches an irreversible surface (DB migration, auth, payments), stop and escalate.

Reading order for human engineers: same as above, but also read principles.md first for the underlying design rules.

Reading order for humans onboarding

Read architecture-current.md first for what the rig is today, then this whitepaper for the target, then architecture-proposed-v2.md for the near-term step between the two. onboarding.md covers devcontainer setup.

The philosophical point

The trusted rig is not a collection of features. It is a closed loop — every action the rig takes is measured, attested, bounded, and observable, and the rig uses those measurements to decide what to do next. Without the loop, adding features compounds risk; with the loop, each feature adds verified capability.

Gastown gets most of this right for fully-autonomous direct-to-main work. Cursor and Devin get most of this right for scale. Anthropic's internal tooling gets most of this right for reliability. No single system gets all of it. The claim of this whitepaper is that a small team can get all of it by composing open-source pieces — Flagger, Langfuse, Sigstore, Kyverno, Cilium, flagd, pgroll, LiteLLM, OpenTelemetry — into the single closed loop described above.

That composition, not any single tool, is the contribution.

What happens next

This whitepaper is documentation. It is not a PR. The phases in the roadmap turn into tracking issues, then into PRs, then into working software. The first PR is Phase 0 (dangerous-command guard + identity + egress + worktrees); the last PR is the Phase 7 repair-dispatch integration. Between them is ~9-12 months of focused work by a small team.

Open questions the whitepaper asserts without fully resolving

Honest disclosure. Things presented with more confidence than the evidence supports:

  • Is Conductor-E over-engineered for our scale? It's already built, so it's not going away. But a 3-agent rig arguably doesn't need event sourcing + Marten + projections. A 500-line Python dispatcher polling GitHub Issues + SQLite for track records might cover it. The whitepaper assumes Conductor-E stays load-bearing; that assumption hasn't been tested against a simpler alternative.
  • Does CaMeL scale operationally for a small team? The paper shows formal security guarantees in lab conditions. Whether a 1-2 person team can maintain the privileged/quarantined split across many tool surfaces without drift is unproven. We adopt it because the alternative (hope-based prompt-injection defense) is worse, not because we have evidence the operational cost is sustainable.
  • Does the weekly 30-minute review actually work? I called it the most load-bearing ritual. I have zero production evidence. It's an educated guess that sounds right. Might need to be 15 minutes, might need to be 2 hours, might not be weekly. Calibrate once it's been running for 60 days.
  • Can Dev-E in repair-dispatch mode reliably diagnose production incidents? The whole Stage-2-to-Stage-3 story rests on this. No production system publicly demonstrates reliable AI-driven incident diagnosis for logic bugs. We treat it as a design target, not a proven capability.
  • Is the evaluation harness cost-sustainable? SWE-bench Pro at 30 tasks × nightly could be $50-150 per run at today's Sonnet 4.6 pricing (comparable numbers on GPT-5.2 / Gemini 3.1 Pro — see provider-portability.md for the cross-vendor view), not the $20-40 originally quoted. Times 365 nights = $18-55k/year on eval alone. Might need to drop to weekly, change the task count, or route eval to a cheaper-per-token provider.

Each of these is a candidate for an ADR update after we have operating data. They are explicitly not decided.

The gap between where we are and the trusted rig is what this document makes legible. Closing that gap is the project.


This whitepaper is maintained in dashecorp/rig-gitops/docs/whitepaper/. Published at https://rig-docs.pages.dev/whitepaper/. The companion documents referenced above live next to this file.