Dashecorp Rig — Brain¶

Fresh-agent entry point. Read this first. One fetch (~27 KB) gives you the repo manifest, deployed surfaces (including rig-conductor's 60+ endpoints and built-in Dashboard), agent instances, primary flows, frontmatter schema, 40+ event types (summary; full schemas at /events.md), the whitepaper catalog (20 entries), and the current backlog with prior_art links. Every claim traces to its source file in facts/.

Compiled from facts/*.yaml + live GitHub state (gh api /orgs/dashecorp/repos for the repo list; manifest validation for agents). Do not hand-edit BRAIN.md. Regenerate with npm run brain. CI runs --check and fails on drift.

What this is¶

The Dashecorp rig is an autonomous coding-agent system. A human posts a user story; agents research, propose, code, review, and ship. Canonical docs live in dashecorp/rig-docs (Astro Starlight); operational memory lives in a Postgres + pgvector Memory MCP; deployments are Flux-managed on a k3s cluster running on a GCE VM (invotek-k3s in invotek-github-infra).

Published surfaces¶

Rig landing — discoverable index of all surfaces¶

URL: https://rig.dashecorp.com/
Type: html

Canonical brain entry point (this file, rendered)¶

URL: https://docs.rig.dashecorp.com/brain/
Raw: https://research.rig.dashecorp.com/BRAIN.md
Type: markdown

Brain map — visual architecture + doc-linkage graph¶

URL: https://research.rig.dashecorp.com/map/
Type: astro-starlight
Note: Two auto-derived diagrams (architecture from facts/, linkage from doc frontmatter). See the shape of what the rig knows before fetching individual pages.

LLM site map (research, proposals, user-stories)¶

URL: https://research.rig.dashecorp.com/llms.txt
Type: llms-txt

Full content dump (single-shot ingestion)¶

URL: https://research.rig.dashecorp.com/llms-full.txt
Type: llms-full-txt

Research, proposals, user-stories (rendered Starlight site)¶

URL: https://research.rig.dashecorp.com/
Type: astro-starlight
Source: dashecorp/rig-docs

Aggregated engineering docs (architecture, guides, whitepapers, per-repo docs)¶

URL: https://docs.rig.dashecorp.com/
Type: mkdocs-material
Source: dashecorp/rig-gitops (docs-site/)
Note: Built by scripts/build-docs.sh in rig-gitops on push + hourly cron. Pulls each rig repo's docs/ via gh api. Different scope from research.rig.dashecorp.com (engineering reference vs. research).

Sitemap (XML)¶

URL: https://research.rig.dashecorp.com/sitemap-index.xml
Type: sitemap-xml

rig-conductor API (cluster-internal)¶

Type: rest-api
Visibility: cluster-internal-only
Endpoints:
POST /api/events — Submit any of the 40+ event types — see /events.md
GET /api/assignments/next — Claim next issue assignment. Query: agentId=dev-e-node
GET /api/pr-reviews/next — Claim direct-PR review (no issue) for infra/tooling PRs
GET /api/pr-reviews/item — Inspect a single PR review item. Query: repo, prNumber
POST /api/pr-reviews/merge — Server-side merge gate for direct PR reviews (rc#1028)
GET /api/issues — List tracked issues. Query: state=open|done|stuck
GET /api/issues/item — Fetch a single issue projection by (repo, issueNumber)
GET /api/issues/trace — Per-issue event trace + state transitions for debugging
GET /api/stuck-issues — List issues in a non-terminal state for too long (stuck-watcher candidate set)
GET /api/queue — Current dispatch queue state
GET /api/usage — Token / cost usage by agent and/or repo. Query: agentId, repo
GET /api/costs/issue — Cost for a specific issue. Query: repo, issueNumber
GET /api/costs/summary — Aggregate cost. Query: days (default 7)
GET /api/costs/daily — Daily cost time series. Query: days
GET /api/events/live — SSE stream of live events (for Dashboard.html)
GET /api/streams/status — Stream consumer status
GET /api/streams/{agentId} — Per-agent stream tail (recent assignment messages). Query: count
GET /api/agents — List registered agents (heartbeat + status). Query: archived=true
DELETE /api/agents/{agentId} — Forcibly archive a specific agent (admin)
DELETE /api/agents/offline — Bulk-archive all agents that are offline (no recent heartbeat)
GET /api/agent-capacity — Per-agent capacity / quota / dispatch eligibility snapshot
POST /api/webhook/github — GitHub webhook intake — normalizes GH events into rig-conductor stream
POST /api/webhook/flux — Flux deploy confirmation webhook (rc#413 in_deploy → deployed)
POST /api/merge — Server-side merge gate
POST /api/execution-logs — Create execution log envelope
POST /api/execution-logs/{id}/logs — Append log entries
POST /api/execution-logs/{id}/steps — Append structured step
POST /api/execution-logs/{id}/complete — Mark log complete
GET /api/execution-logs/{id} — Fetch log by id
GET /api/execution-logs/issue — Logs per issue. Query: repo, issueNumber
GET /api/execution-logs — List logs. Query: limit, status
POST /api/execution-logs/cleanup — Prune old logs
GET /api/repo-learnings — Fetch learnings. Query: repo
POST /api/repo-learnings — Upsert learning
DELETE /api/repo-learnings — Delete learning. Query: repo, key
GET /api/guard-blocked — Guard-block counts per agent. Query: agentId
GET /health — Liveness probe — always 200 if the process is alive
GET /healthz/deep — Deep readiness probe — Marten + Valkey + dependency checks (rc#1188)
GET /api/health — Detailed health snapshot for the dashboard (component-level)
GET /api/version — Build version + git SHA
GET /dashboard — Built-in single-page dashboard (HTML) — Engineering Rig control plane
GET /api/events/stream — Single event-stream tail by stream id. Query: id
GET /api/events/recent — Recent events across all streams. Query: hours
GET /api/main-ci — Main-branch CI status snapshot. Query: repo
GET /api/ci-failures — List CI failures across repos. Query: repo, includeAcked
POST /api/ci-failures/{repo}/{workflowName}/{runId:long}/ack — Ack a CI failure so it stops showing as active
GET /api/main-guard/incidents — Main-guard incidents (rc#1226 + rc#1234). Query: repo, status
GET /api/a11y — Accessibility scan results per repo. Query: repo
GET /api/stuck-watch — Live stuck-watch snapshot (proxies upstream cluster check)
GET /api/stuck-patterns — List active stuck patterns. Query: includeResolved=true for all
POST /api/stuck-patterns/{fingerprint}/resolve — Mark a stuck pattern as resolved (writes memory)
GET /api/stuck-patterns/brain-section — Generate the ## Known stuck patterns markdown for BRAIN.md
GET /api/agent-logs — List recent agent log entries across all agents. Query: count
GET /api/agent-logs/{agentId} — Tail recent log entries for one agent. Query: count
POST /api/agent-logs — Append a batch of log entries from an agent (push from pod)
GET /api/agent-logs/live — SSE stream of live agent log entries
GET /api/self-improvement/signatures — Watcher signature states (rc#947): occurrences, OpenIssue, clean-tick counter
POST /api/admin/issues/force-done — Operator force-close an issue's read-model state to Done (admin)
POST /api/admin/overrides — Record an operator override event (audit trail)
GET /api/admin/overrides — List recent operator overrides for audit
POST /api/planner/trigger — Dispatch a planner task (planner agent stream)
Note: The conductor's in-cluster API endpoint. Reachable only from inside the cluster — exact host/port intentionally not surfaced publicly.
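
As a rough illustration of how a client might address the endpoints listed above, the sketch below only assembles request URLs and query strings; it makes no network calls. The base URL is a placeholder (the real in-cluster host and port are intentionally not published) and `conductor_url` is a hypothetical helper, not part of rig-conductor. Query parameter names are taken from the endpoint list.

```python
from urllib.parse import urlencode

# Placeholder only: the real in-cluster host/port is intentionally not published.
BASE_URL = "http://conductor.example.internal"


def conductor_url(path: str, **query: object) -> str:
    """Build a URL for one of the endpoints listed above (hypothetical helper)."""
    # Skip parameters that were passed as None so they are left out of the URL.
    params = {key: value for key, value in query.items() if value is not None}
    suffix = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{suffix}"


# Parameter names (agentId, days, state) come from the endpoint list.
print(conductor_url("/api/assignments/next", agentId="dev-e-node"))
print(conductor_url("/api/costs/summary", days=7))
print(conductor_url("/api/issues", state="stuck"))
```

Keeping URL construction separate from the HTTP call makes it easy to check the query shape before anything is sent from inside the cluster.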

rig-conductor Dashboard (the built-in cost/activity UI)¶

Type: html-dashboard
Source: dashecorp/rig-conductor (src/ConductorE.Api/Dashboard.html)
Visibility: cluster-internal-only
Note: 42 KB single-page HTML dashboard titled "Engineering Rig — Control Plane". Has Costs, Issues, Agents, and Streams tabs, driven by the /api/costs/*, /api/usage, /api/issues, and /api/streams/* endpoints. No separate Grafana/Starlight dashboard is needed: this one already renders per-agent, per-issue, and per-day cost.

Memory MCP (Postgres + pgvector)¶

Type: mcp-server
Package: @dashecorp/rig-memory-mcp
Tools:
read_memories — Query prior memory by topic/repo/scope with vector similarity
write_memory — Persist a new memory with scope/kind/importance/tags
mark_used — Increment hit_count on a memory that informed a decision

Discord agent channels (notifications)¶

Type: discord
Channels: #dev-e, #review-e, #ibuild-e, #admin
Note: Agents post thread updates here; humans watch for stuck / pending state.

Repos¶

Live from gh api /orgs/dashecorp/repos merged with facts/repos.yaml annotations. Archived repos are dropped automatically.

Repo	Purpose	Language	Depends on	AGENTS.md
`rig-gitops`	GitOps manifests (Flux HelmReleases, Kustomize bases) and the canonical AGENTS.md shared by every rig repo via `@dashecorp/rig-gitops/AGENTS	shell	—	compiled
`rig-agent-runtime`	The AI agent runtime (Node) — one image that deploys as Dev-E, Review-E, or iBuild-E depending on character file + environment. Handles prom	javascript	rig-memory-mcp, rig-conductor	imports-rig-gitops
`rig-memory-mcp`	MCP server backing persistent agent memory with Postgres + pgvector. Exposes `read_memories` / `write_memory` / `mark_used` tools consumed b	javascript	postgres-pgvector	claude-md
`rig-conductor`	Event store + dispatch service (C# + Marten + Postgres). Receives PR/issue events, assigns work, tracks turns/cost/stuck state, serves the `	csharp	postgres, pgvector	imports-rig-gitops
`rig-manager`	Agent runtime, caged executor, verb schemas, and the reproducible container image build. The build produces a deterministic SHA-256 digest t	dockerfile	—	none
`rig-docs`	Research, proposals, user-stories, and rig-wide reference (Astro Starlight). This repo — you're reading its BRAIN.md. Deploys to research.ri	astro	—	hand
`rig-tools`	Shell scripts, Git hooks, and workflow sync for AI-assisted development. Developer tooling, not deployed. The one repo without an AGENTS.md	shell	—	none
`infra`	OpenTofu/Terraform for GitHub org settings, Cloudflare (DNS, Pages, tunnels), GCP (k3s cluster on a GCE VM (invotek-k3s) hosting the rig), a	hcl	—	imports-rig-gitops

Per-repo doc index (token-efficient discovery)¶

Before cloning a repo to find docs, consult this list to decide which docs are relevant to your issue. Then fetch raw markdown for only the relevant ones:

gh api repos/dashecorp/<repo>/contents/docs/<file>.md --header 'Accept: application/vnd.github.raw'

Auto-derived per compile via gh api /repos/<r>/contents/docs. Repos without a docs/ dir are omitted.
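
The fetch pattern above can be wrapped so that an agent builds the command from a repo name and a doc filename taken from the index that follows. This is a minimal sketch: `raw_doc_command` is a hypothetical helper, and it only constructs the argument list rather than running `gh`.

```python
import shlex


def raw_doc_command(repo: str, filename: str) -> list[str]:
    """Return the argv for fetching one doc as raw markdown (hypothetical helper)."""
    return [
        "gh", "api",
        f"repos/dashecorp/{repo}/contents/docs/{filename}",
        "--header", "Accept: application/vnd.github.raw",
    ]


argv = raw_doc_command("rig-conductor", "event-store.md")
# shlex.join quotes the header value so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, capture_output=True, text=True)` would execute it, assuming `gh` is installed and authenticated.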

rig-gitops — architecture-current.md, architecture-proposed-v2.md, architecture-proposed.md, documentation-standard.md, onboarding.md, research-multi-agent-platforms.md, review-e-bootstrap.md, sops.md
rig-agent-runtime — architecture.md, configuration.md, dashboard.md, deployment.md, discord-setup.md, heartbeat.md, index.md, memory.md, messaging.md, observability.md, quickstart.md, usage-tracking.md
rig-memory-mcp — api.md
rig-conductor — api.md, architecture.md, deployment.md, event-store.md, index.md, principles.md
rig-manager — container.md, reproducible-build.md
rig-tools — agent-workflow.md

Agents (deployment instances)¶

Dev-E — writes code¶

Runtime: dashecorp/rig-agent-runtime
Deployed in: k3s cluster on GCE VM (invotek-k3s, invotek-github-infra)
Manifest: dashecorp/rig-gitops/apps/dev-e/
Variants:
node: apps/dev-e/rig-agent-helmrelease.yaml
python: apps/dev-e/python-helmrelease.yaml
dotnet: apps/dev-e/dotnet-helmrelease.yaml
Character: baked into HelmRelease values
Triggers: signal:dev-e-node/-python/-dotnet LIST + assignments:dev-e STREAM
Notes: Stream-consumed via Valkey, NOT REST polling — there is no issue.assigned poll loop. Each variant's KEDA ScaledObject (apps/dev-e/scaledobject.yaml = node, dev-e-python-scaledobject.yaml, dev-e-dotnet-scaledobject.yaml) watches its signal: Redis LIST at valkey-primary.rig-conductor.svc.cluster.local:6379 to scale 0→1; the agent then drains its assignment from the assignments:dev-e Redis STREAM.
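
The key naming in the notes above can be summarised in a few lines. This is a sketch under the assumption that all three variants drain the same `assignments:dev-e` stream, as the Triggers line reads; `dev_e_keys` is a hypothetical helper and does not talk to Valkey.

```python
def dev_e_keys(variant: str) -> dict[str, str]:
    """Valkey key names for one Dev-E variant, following the notes above."""
    if variant not in {"node", "python", "dotnet"}:
        raise ValueError(f"unknown Dev-E variant: {variant}")
    return {
        # KEDA watches this LIST to scale the variant's pod from 0 to 1.
        "signal_list": f"signal:dev-e-{variant}",
        # Once running, the agent drains its work from this STREAM.
        "assignment_stream": "assignments:dev-e",
    }


for variant in ("node", "python", "dotnet"):
    print(variant, dev_e_keys(variant))
```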

Review-E — reviews PRs¶

Runtime: dashecorp/rig-agent-runtime
Deployed in: k3s cluster on GCE VM (invotek-k3s, invotek-github-infra)
Manifest: dashecorp/rig-gitops/apps/review-e/rig-agent-helmrelease.yaml
Triggers: signal:review-e LIST + assignments:review-e STREAM
Discord: #review-e
Notes: DISPATCH: stream-consumed via Valkey + KEDA, NOT cron/REST polling. The ScaledObject watches the signal:review-e Redis LIST at valkey-primary.rig-conductor.svc.cluster.local:6379 to scale 0→1; the work payload is drained from the assignments:review-e STREAM. The cron "*/5 * * * *" and the agent-bot search_filter (author:app/dev-e-bot author:app/ibuild-e-bot -reviewed-by:app/review-e-bot) are preserved as DEAD CODE in the helmrelease (cron.enabled: false). Verbatim: "this cron prompt is no longer invoked. Review-E is stream-consumed via assignments:review-e (KEDA-scaled) ... unreachable in the running pod."
ROUTING: review is OPT-IN for non-agent authors. Agent-bot PRs (dev-e-bot / ibuild-e-bot / dependabot) auto-route. Human/operator-authored PRs do NOT auto-route; to request Review-E on an operator PR, apply the needs-review label (the working opt-in since 2026-06-12; see rig-conductor docs/2026-06-12-operator-review-opt-in-label.md and ReviewRoutingPolicy.cs). The legacy opt-in (requesting the review-e-dashecorp reviewer) is RETIRED: that machine user no longer exists, so GitHub returns 422; the App is now review-e-bot.
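
The routing rule described above reduces to a small predicate. The sketch below is a paraphrase in Python, not the actual ReviewRoutingPolicy.cs; in particular, the exact author login strings GitHub reports for the bots (for example with a `[bot]` suffix) are an assumption here.

```python
# Author names as written in the notes above; real GitHub logins may differ.
AGENT_BOTS = {"dev-e-bot", "ibuild-e-bot", "dependabot"}
OPT_IN_LABEL = "needs-review"


def routes_to_review_e(author: str, labels: set[str]) -> bool:
    """True when a PR should be routed to Review-E under the stated rule."""
    # Agent-bot PRs auto-route; anyone else must opt in with the label.
    return author in AGENT_BOTS or OPT_IN_LABEL in labels


print(routes_to_review_e("dev-e-bot", set()))
print(routes_to_review_e("some-operator", set()))
print(routes_to_review_e("some-operator", {"needs-review"}))
```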

iBuild-E — macOS / iOS builds¶

Runtime: dashecorp/rig-agent-runtime
Deployed in: Mac Mini (Oslo, on the operator's Tailnet)
Manifest: not-in-cluster
Discord: #ibuild-e
Notes: Apple Silicon host, Xcode + App Store Connect. Auto-reauth cron refreshes OAuth every 5 min. Separate from the GCE-hosted agents because iOS builds require macOS.

Planner-E — plans sprints, manages backlog, assigns issues to agents¶

Runtime: dashecorp/rig-agent-runtime
Deployed in: k3s cluster on GCE VM (invotek-k3s, invotek-github-infra)
Manifest: dashecorp/rig-gitops/apps/rig-planner/
Triggers: signal:rig-planner LIST + assignments:rig-planner STREAM
Discord: #planner
Notes: GitHub App rig-planner-bot (App ID 3546083) handles GitHub issue intake. KEDA scales 0→1 on signal:rig-planner (Redis LIST); also reads assignments:rig-planner (Redis STREAM). Provider: claude-cli + claude-sonnet-4-6. Persona reference: /whitepaper/planner/.

Primary flows¶

PR lifecycle in dashecorp (orchestrator-owned — DO NOT copy legacy personal-org workflow files)¶

Trigger: Any PR opened in a dashecorp-org repo (dashe-*, rig-*, infra, etc.)

GitHub — Fires webhook to POST rig-conductor /api/webhook/github
rig-conductor — Normalizes the PR event, enforces gates (issue-link rule, labels), assigns review
Review-E — Consumes its review assignment from the assignments:review-e Valkey stream (woken by KEDA scaling 0→1 on the signal:review-e list), reviews, posts approval or CHANGES_REQUESTED
rig-conductor — On approval + green CI + no unresolved threads + no manual-merge label, calls POST /api/merge to merge server-side

Rules:
- Do NOT copy the operator's per-repo .github/workflows/request-review.yml or auto-merge.yml from legacy personal-org repos into dashecorp repos. Those files are the legacy pattern from before rig-conductor. The conductor endpoints above own this lifecycle for dashecorp. If a dashecorp repo isn't getting reviewed or merged, the fix is to configure the GitHub webhook, not to add a workflow file.
- The operator's personal-org repos still use the per-repo workflow pattern because they predate rig-conductor's scope. That pattern stays until those repos are archived post-migration.

Complete when: conductor emits PR_MERGED event and downstream consumers (CF Pages, iBuild-E, etc.) react
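
The merge condition in the last step of this flow (approval, green CI, no unresolved threads, no manual-merge label) can be read as a single boolean check. The sketch below is illustrative only: `PullRequestState` and `ready_to_merge` are hypothetical names, and the literal label string `manual-merge` is an assumption drawn from the wording of that step.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequestState:
    """Hypothetical snapshot of the facts the merge gate looks at."""
    approved: bool
    ci_green: bool
    unresolved_threads: int
    labels: set[str] = field(default_factory=set)


def ready_to_merge(pr: PullRequestState) -> bool:
    """All four conditions from the flow step must hold at once."""
    return (
        pr.approved
        and pr.ci_green
        and pr.unresolved_threads == 0
        and "manual-merge" not in pr.labels  # assumed label name
    )


print(ready_to_merge(PullRequestState(approved=True, ci_green=True, unresolved_threads=0)))
print(ready_to_merge(PullRequestState(True, True, 0, {"manual-merge"})))
```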

Epic to merged work¶

Trigger: Human opens a user-story GitHub issue in dashecorp/rig-docs

rig-conductor — Scans open issues, classifies, dispatches to appropriate agent
Dev-E — Reads issue + relevant research; authors research / proposal / code PR
Review-E (stream-consumed via assignments:review-e, KEDA-scaled) — Finds PR, reviews against AGENTS.md + memory, requests changes or approves
Human — Merges (or Review-E's approval satisfies branch protection; auto-merge fires)
Cloudflare Pages — Redeploys research.rig.dashecorp.com and docs.rig.dashecorp.com

Complete when: issue closed via `Closes

rig-conductor self-deploy (post-merge image rollout via Flux)¶

Trigger: A PR merges to dashecorp/rig-conductor main (the PR_MERGED step in the PR lifecycle flow)

GitHub Actions (.github/workflows/publish-image.yml — job publish) — Builds the container and pushes :latest + :sha-<commit> to GHCR and Artifact Registry (europe-north1-docker.pkg.dev/invotek-github-infra/dashecorp/rig-conductor); 3-attempt retry on transient registry / WIF / BuildKit errors
GitHub Actions (job update-gitops, needs publish) — sed-bumps the image tag in deploy/k8s/deployment.yaml to :sha-<commit>, opens a chore: pin rig-conductor image to sha-<sha> PR, validates the diff is exactly the one-line image bump, then admin-merges it
Flux source-controller (GitRepository rig-conductor) — Polls dashecorp/rig-conductor main every 5m and picks up the new deployment.yaml
Flux kustomize-controller (Kustomization rig-conductor-api) — Reconciles every 10m from path: deploy (prune true); the new image (imagePullPolicy Always) rolls the pod

Rules:
- paths-ignore: deploy/k8s/deployment.yaml on the workflow stops the pin commit from re-triggering Publish Image (loop guard). The publish build+push is the real success signal: a pin PR that fails because a newer pin already landed is benign supersession (rc#1532), not a failure.
- Worst-case propagation is ~15m (5m GitRepository interval + 10m Kustomization interval), not 5m.
- Verify a deploy from inside the cluster: kubectl -n rig-conductor exec deploy/rig-conductor-api -- curl -s localhost:8080/api/version (returns the git SHA) and the same for /healthz/deep (200 = Marten + Valkey + deps healthy). Both endpoints are cluster-internal and already listed in surfaces.yaml.

Complete when: the rig-conductor-api pod runs the new sha-<commit> image and /healthz/deep passes
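
The ~15m worst case quoted in the rules is just the two polling intervals added together: the pin commit can land right after a GitRepository poll, and the new revision can arrive right after a Kustomization reconcile. A small worked calculation, using only the intervals stated in this flow:

```python
from datetime import timedelta

# Intervals as stated in the flow steps above.
GIT_REPOSITORY_INTERVAL = timedelta(minutes=5)
KUSTOMIZATION_INTERVAL = timedelta(minutes=10)

# Worst case: each controller has just finished a cycle when the change arrives.
worst_case = GIT_REPOSITORY_INTERVAL + KUSTOMIZATION_INTERVAL
print(worst_case)  # 0:15:00
```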

Research and proposal authoring¶

Trigger: An Epic needs investigation before implementation

author dated research/YYYY-MM-DD-slug.md with user_story frontmatter
author proposals/YYYY-MM-DD-slug.md with source_research frontmatter
user_story file gets research_docs and proposal fields pointing back
RelatedDocs component auto-renders the graph; no manual cross-linking

Rules:
- bidirectional links required
- schema enforced in src/content.config.ts
- CI rejects PRs missing required fields
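
To make the bidirectional-link rule concrete, the sketch below checks a set of frontmatter records in both directions. It is a paraphrase for illustration, not the check that CI actually runs: `missing_backlinks` is a hypothetical helper, and the sample paths are borrowed from this page's path examples without claiming those docs really link to each other.

```python
def missing_backlinks(docs: dict[str, dict]) -> list[str]:
    """Return one message per link that is not mirrored on the other side."""
    problems = []
    for path, meta in docs.items():
        # Forward: a research/proposal doc names a story; the story must list it.
        story = meta.get("user_story")
        if story is not None:
            story_meta = docs.get(story, {})
            if path.startswith("research/") and path not in story_meta.get("research_docs", []):
                problems.append(f"{story} does not list {path} in research_docs")
            if path.startswith("proposals/") and story_meta.get("proposal") != path:
                problems.append(f"{story} does not point to {path} as proposal")
        # Reverse: a story names research/proposal docs; each must point back.
        targets = list(meta.get("research_docs", []))
        if "proposal" in meta:
            targets.append(meta["proposal"])
        for target in targets:
            if docs.get(target, {}).get("user_story") != path:
                problems.append(f"{target} does not name {path} as user_story")
    return problems


STORY = "user-stories/2026-04-18-docs-memory-strategy"
RESEARCH = "research/2026-04-18-docs-tools-evaluation"
PROPOSAL = "proposals/2026-04-18-docs-tooling-decision"

sample = {
    STORY: {"research_docs": [RESEARCH], "proposal": PROPOSAL},
    RESEARCH: {"user_story": STORY},
    PROPOSAL: {},  # illustrative gap: the proposal forgot its user_story field
}
for problem in missing_backlinks(sample):
    print(problem)
```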

Cold-start agent session¶

Trigger: Fresh agent with blank memory receives an Epic or task

WebFetch https://research.rig.dashecorp.com/brain/ (or raw BRAIN.md)
Parse facts/repos.yaml equivalent in BRAIN.md — learn repo manifest
Parse facts/surfaces.yaml equivalent — learn URLs and endpoints
WebFetch https://research.rig.dashecorp.com/llms.txt for topic index
WebFetch relevant research/proposal docs directly via raw URL
For the target repo, fetch its AGENTS.md (compiled or imports-rig-gitops)
read_memories scoped to repo + topic via Memory MCP
Begin work with full context in ~15 KB total

Token budget: ~15 KB read, leaves 200K+ for actual work on Opus

Docs-memory promotion (weekly Lint)¶

Trigger: Weekly scheduled Lint job

Scan Memory MCP for rows with importance >= 4 AND hit_count >= 5
For each candidate, check if docs already cover the topic (BM25 sim)
If not covered, propose a docs PR with the memory content promoted
Human approves PR, merge triggers redeploy

Status: not-yet-built (design in research/2026-04-18-docs-memory-drift-lint)
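
Since this flow is not built yet, the following is only a sketch of its first step: selecting memories that meet the stated thresholds (importance >= 4 and hit_count >= 5). The `Memory` class and its fields are hypothetical stand-ins for Memory MCP rows, and the BM25 coverage check from the second step is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical stand-in for one Memory MCP row."""
    topic: str
    importance: int
    hit_count: int


def promotion_candidates(memories: list[Memory]) -> list[Memory]:
    """Keep rows that meet both thresholds named in the flow."""
    return [m for m in memories if m.importance >= 4 and m.hit_count >= 5]


rows = [
    Memory("flux-propagation", importance=5, hit_count=9),
    Memory("one-off-typo", importance=2, hit_count=12),
    Memory("rarely-used", importance=5, hit_count=1),
]
print([m.topic for m in promotion_candidates(rows)])  # ['flux-propagation']
```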

Diagram-as-code authoring¶

Trigger: A research / proposal / user-story needs a diagram
Rule: Mermaid source inline in fenced code block. No PNG or SVG ever committed.
Rendering: remark-mermaid plugin wraps in <figure> with <pre class=mermaid> and <details> source; mermaid.js renders client-side; source preserved post-render for agent readers.

Frontmatter schema (for authoring rig-docs content)¶

type (optional): one of research | proposal | decision | postmortem | reference | user-story | runbook
audience (optional): one of human | agent | both — not a free-form array
Required: title, description
Optional linkage fields (paths are relative to src/content/docs/, no leading slash, no .md or .mdx extension):
type — See type enum above.
subtype — See subtype enum above (whitepapers only).
audience — See audience enum above.
created — ISO date string YYYY-MM-DD.
updated — ISO date string YYYY-MM-DD.
topic — Short slug grouping related docs.
source_refs — Array of URLs (external sources supporting this doc).
supersedes — Path to doc this replaces (no leading slash, no .md extension).
superseded_by — Path to newer doc that replaces this (same format).
user_story — (research/proposal only) Path to the user story this supports.
research_docs — (user-story only) Array of research doc paths this story spawned.
proposal — (user-story only) Path to the proposal answering this story.
source_research — (proposal only) Array of research paths this proposal synthesises.
github_issue — (user-story only) Full GitHub issue URL. Omit the field entirely if there is no issue — do NOT use empty string.
whitepaper — (user-story, optional) Slug of the single whitepaper this story primarily supports. Matches the whitepaper filename without extension (e.g. "safety", "memory", "observability"). Used by the Starlight sidebar to roll up story counts next to each whitepaper link at build time.
whitepapers — (user-story, optional) List form of whitepaper: — use when a story supports more than one whitepaper (e.g. a domain paper AND a synthesis paper). Format whitepapers: [a, b] or a block list. A story tagged for multiple papers counts on each paper's sidebar badge and appears in each paper's page-level Related list. Accepts both inline and block-list YAML. Mutually exclusive in spirit with whitepaper:, but supplying both is tolerated (values are merged and deduped).

Path examples: user-stories/2026-04-18-docs-memory-strategy, research/2026-04-18-docs-tools-evaluation, proposals/2026-04-18-docs-tooling-decision, decisions/2026-04-18-docs-tooling-decision.

Omit a field entirely when it has no value — do not use empty string.
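
The rules in this section can be approximated by a small checker. The real schema is enforced in src/content.config.ts; the Python below is a hedged paraphrase of the prose rules only (required fields, the two enums, path format, no empty strings, and the merge-and-dedupe behaviour of whitepaper / whitepapers). Both function names are hypothetical.

```python
TYPES = {"research", "proposal", "decision", "postmortem", "reference", "user-story", "runbook"}
AUDIENCES = {"human", "agent", "both"}
SINGLE_PATH_FIELDS = ("supersedes", "superseded_by", "user_story", "proposal")
LIST_PATH_FIELDS = ("research_docs", "source_research")


def check_frontmatter(meta: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    for name in ("title", "description"):
        if name not in meta:
            errors.append(f"missing required field: {name}")
    for name, value in meta.items():
        if value == "":
            errors.append(f"{name}: omit the field rather than using an empty string")
    for name, allowed in (("type", TYPES), ("audience", AUDIENCES)):
        value = meta.get(name)
        # isinstance guard: a list such as ["human"] is rejected, not crashed on.
        if name in meta and (not isinstance(value, str) or value not in allowed):
            errors.append(f"{name}: expected one of {sorted(allowed)}")
    paths = [meta[name] for name in SINGLE_PATH_FIELDS if name in meta]
    for name in LIST_PATH_FIELDS:
        paths.extend(meta.get(name, []))
    for path in paths:
        if path.startswith("/") or path.endswith((".md", ".mdx")):
            errors.append(f"path must have no leading slash or extension: {path}")
    return errors


def whitepaper_slugs(meta: dict) -> list[str]:
    """Merge whitepaper and whitepapers, dropping duplicates but keeping order."""
    slugs = [meta["whitepaper"]] if "whitepaper" in meta else []
    slugs.extend(meta.get("whitepapers", []))
    return list(dict.fromkeys(slugs))


good = {"title": "t", "description": "d", "type": "research",
        "user_story": "user-stories/2026-04-18-docs-memory-strategy"}
print(check_frontmatter(good))  # []
print(whitepaper_slugs({"whitepaper": "safety", "whitepapers": ["memory", "safety"]}))
```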

Whitepapers (private — catalog only)¶

These whitepapers live at dashecorp/rig-gitops/docs/whitepaper/*.md (private repo — requires gh auth to fetch). BRAIN.md surfaces their titles + 1-line summaries so agents know what exists. Full content must be fetched with: gh api /repos/dashecorp/rig-gitops/contents/docs/whitepaper/<file> --jq .download_url | xargs curl -sL.

Whitepaper index (index.md) — Entry point listing all whitepaper sections and their companion docs.
MVP scope (mvp-scope.md) — What the rig does in the minimum viable release. Gatekeeper for "is this in scope?"
Design principles (principles.md) — First principles (measurement precedes trust; honest gaps; provider portability).
Trust model (trust-model.md) — Who can approve what, which gates exist, human-in-the-loop rules.
Safety (safety.md) — Dangerous-command guards, sandboxing, blast-radius containment.
Security (security.md) — Secrets handling, attestation, audit trail, SOPS+age.
Agent secrets broker (agent-secrets-broker.md) — Capability-based secret lifecycle broker for LLM agents. Agents operate on opaque references; the broker handles plaintext across Bitwarden, GitHub, SOPS, k8s, and Cloudflare — plaintext never enters a prompt, tool argument, or log line. Covers tool surface (mint/store/deploy/rotate/ retire/verify/list/generate_and_deploy), destination ref grammar (gh:, gh-env:, sops:, k8s:, cf-worker:, bw:), policy model with hardware-key override, and append-only audit schema. Complementary to security.md (supply-chain: Sigstore/SLSA/Kyverno); covers the runtime secret-lifecycle layer.
Provider portability (provider-portability.md) — Multi-runtime (Claude Code, Codex CLI, Gemini CLI) via OTel GenAI conventions. Swap runtime without changing backend.
Observability — OTel, Langfuse, Prometheus, SLOs (observability.md) — Self-hosted Langfuse (agent traces) + Grafana Cloud (infra) + local Prometheus (SLO gates) hybrid. Native OTel via CLAUDE_CODE_ENABLE_TELEMETRY=1. OTel Collector runs per-cluster, routes LLM traces to Langfuse, infra to managed. Per implementation-status: OTel Collector "Partial" (deployed for rig-conductor, agents not yet emitting), Langfuse "Planned", cost dashboard "Partial" (TokenUsageProjection exists, no LiteLLM proxy yet).
Cost framework (cost-framework.md) — Budget policy, per-model rate tables, cost attribution strategy. Companion to observability.
Self-healing (self-healing.md) — Automatic recovery loops, StaleHeartbeatService, escalation severity routing.
Memory architecture (memory.md) — Memory MCP scope, importance/hit_count model, promotion-to-docs threshold design.
Quality and evaluation (quality-and-evaluation.md) — How the rig evaluates its own output. Judge-agent pattern, fixed rubrics.
Drift detection (drift-detection.md) — Schema drift, docs drift, infra drift — detection thresholds and response.
Development process (development-process.md) — Issue → Epic → research → proposal → PR lifecycle, agent-human gates.
Example first story (example-first-story.md) — Worked walkthrough of one Epic end-to-end.
Glossary (glossary.md) — Rig-specific terminology (Epic, proposal, rig-conductor, Review-E, etc).
Known limitations (limitations.md) — Honest catalog of what the rig can't do today.
Implementation status (implementation-status.md) — Single source of truth for deployed vs planned per capability. 78 tracked across 11 domains; 21 deployed/partial (27%), 44 planned/deferred (56%). Every capability named in the whitepapers gets a row with status + whitepaper section + ticket/evidence.
Tool choices (ADRs) (tool-choices.md) — Decision records for tooling. Includes rejection list with rationale.

Most agents should start with: the /implementation/ dashboard (structured per-capability status — see summary below) and whichever domain-specific whitepaper matches the Epic.

Capability status (38 in registry · full dashboard)¶

shipped:15 · partial:7 · planned:15 · deferred:0 (registry seed — full migration tracked in rig-docs#124)

Top blockers: default-deny-egress (dashecorp/rig-docs#57)

Multi-tenancy¶

Shared control plane + per-tenant siloed data plane. A single tenant_id is resolved ONCE at a trusted edge (GitHub installation_id for webhooks, Cloudflare Access identity for the dashboard, the conductor-issued session token for agents) and threaded immutably as a Marten event header — never asserted by the LLM, never read from a request body or a tool argument ("the LLM is the threat model, not the guard").

Boundary: The database IS the tenant boundary. Marten master-table tenancy (a rig_control registry) routes each tenant to its own Postgres database (rig_t_<tenant>_evt); read models stay tenant-naive (no WHERE tenant_id) and isolation is enforced at the per-request connection, not a row filter.

Active tenants:
- invotek (type: B2B): Tenant-0 / historical lineage. Runs on its legacy database (renamed to rig_t_invotek_evt at the PR-5 cutover). Post-handover (rig-docs#324) githubOrg=null: invotek is verifiably NOT a GitHub org or user (both /orgs/invotek and /users/invotek return 404), so it owns no GitHub namespace. It retains pinned ownership of the 6 rig-internal repos (rig-conductor, rig-agent-runtime, rig-gitops, rig-docs, rig-memory-mcp, rig-tools) via explicit RepoTenant rows, plus all pre-cutover dashe-* event/memory history (drain-don't-migrate).
- dashecorp (type: B2B): Org-default tenant for the dashecorp GitHub org. First-party / Model B (operator-owned, no external DPA counterparty). Owns the 8 dashe-* app / site repos (Cuti-E, Count-E, Drink-E, Fast-E, Heart-E, Nutri-E, Reward-E, dashe-website), pinned via RepoTenant during the suspended provisioning window, then caught by the org-default after activation (pins remain as belt-and-braces). New data-plane DB rig_t_dashecorp_evt; starts FRESH for dashe-* (pre-cutover history stays in invotek's DB). Activation gated on rig-conductor#1926 / #1476; cutover follows the pin-first / activate-last sequence so the interim window can only fail-to-null/skip, never mis-route.
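
The pin-then-org-default behaviour described above can be sketched as a two-step lookup. This is an illustration only, not the TenantMatch implementation: the dictionaries stand in for RepoTenant rows and the org-default registry, only two of the six pinned rig repos are shown, and the database name pattern is inferred from the two names given in this section (rig_t_invotek_evt, rig_t_dashecorp_evt).

```python
from typing import Optional

# Stand-in for explicit RepoTenant rows (subset of the pinned rig-internal repos).
REPO_PINS = {
    "dashecorp/rig-conductor": "invotek",
    "dashecorp/rig-docs": "invotek",
}
# Stand-in for org-default tenants, keyed by GitHub org.
ORG_DEFAULTS = {"dashecorp": "dashecorp"}


def resolve_tenant(repo_full_name: str) -> Optional[str]:
    """Explicit pin wins; otherwise fall back to the org default; else None."""
    if repo_full_name in REPO_PINS:
        return REPO_PINS[repo_full_name]
    org = repo_full_name.split("/", 1)[0]
    # No pin and no org default resolves to None (skip), never a guess.
    return ORG_DEFAULTS.get(org)


def data_plane_db(tenant: str) -> str:
    """Database name pattern inferred from the two names in this section."""
    return f"rig_t_{tenant}_evt"


for repo in ("dashecorp/rig-docs", "dashecorp/Cuti-E", "someone/else"):
    tenant = resolve_tenant(repo)
    print(repo, tenant, data_plane_db(tenant) if tenant else None)
```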

Target (gated, 4-tenant lock):
- dashecorp: type B2B, model B (hosted), first-party. Org-default host; active per rig-docs#324 (activation gated on rc#1926/#1476).
- invotek: type B2B, model B (hosted), first-party. Tenant-0; legacy rig_conductor DB pre-PR-5 cutover; githubOrg=null post-handover (rig-docs#324).
- stigjohnny: type B2C, model A (BYO), first-party. Consumer-facing persona; B2C activation gate (consent + age-gate + rc#1496 subject-level erasure) blocks Status=active until rc#1496 ships.
- run-the-docs: type B2B, model A (BYO), first-party. Docs/content pipeline; subjects = contributors.

Type axis status: ratified-design / not-yet-enforced-in-code (operator-ratified 2026-06-20, per the durable reference at /reference/tenant-types/ and the named follow-up of rig-docs#314). The type values above are policy metadata only; the code-enforced Tenant.Type field is tracked under dashecorp/rig-conductor#1476.

Human gates:
- Review (ReviewRoutingPolicy): a PR routes to Review-E iff authored by a known bot (dev-e-bot/ibuild-e-bot) or dependabot, OR it carries the needs-review operator opt-in label.
- Merge (MergeGate, event-driven): merges only on review-approved + CI-passed + not-blocked; agents never self-merge (fallback is a manual operator squash).
- Fail-closed write: RequireTenant throws UnattributedTenantException on a blank/invalid tenant, so an unattributed write lands in no tenant's DB; a CI scope-guard blocks new unattributed write paths. Dashboard READS stay lenient (coalesce to invotek).
- Onboarding: TenantOnboardingGate refuses activating an external tenant until its data-plane DB + erasure prerequisites exist (invotek exempt; preserves the GDPR no-window invariant).
- Schema fence: env-gated (MARTEN_TENANT_SCHEMA_LOCKED). Armed only at the PR-5 cutover window, then the boot assert checks EVERY registered tenant's schema and fails closed on drift; default-off and behavior-neutral until armed.
- GDPR: dashecorp/rig-conductor#1486 is the launch blocker. The DPA SIGNATURE is EXTERNAL-tenant-only: a first-party (operator/Invotek-owned) tenant, which all four targets are, has no external counterparty to sign a DPA with and needs NO operator signature to launch. First-party launches are gated instead on the technical controller duties toward the third-party contributors in those repos (Art.17 erasure (shipped), sub-processor disclosure, EU residency) plus the DPO minimal-retained-field-set opinion (#1486 §3.4). Only an EXTERNAL tenant additionally requires a signed DPA.
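
The fail-closed write gate and the lenient dashboard read can be contrasted in a few lines. This sketch mirrors the names used above (RequireTenant, UnattributedTenantException) but is a Python paraphrase, not the C# code; what counts as an "invalid" tenant is assumed here to mean "not in a known-tenant set".

```python
from typing import Optional

KNOWN_TENANTS = {"invotek", "dashecorp"}  # assumed registry for illustration


class UnattributedTenantException(Exception):
    """Raised when a write arrives without a valid tenant."""


def require_tenant(tenant_id: Optional[str]) -> str:
    """Writes fail closed: a blank or unknown tenant raises instead of defaulting."""
    cleaned = (tenant_id or "").strip()
    if cleaned not in KNOWN_TENANTS:
        raise UnattributedTenantException(f"no valid tenant for write: {tenant_id!r}")
    return cleaned


def tenant_for_dashboard_read(tenant_id: Optional[str]) -> str:
    """Reads stay lenient: a missing tenant coalesces to invotek."""
    try:
        return require_tenant(tenant_id)
    except UnattributedTenantException:
        return "invotek"


print(tenant_for_dashboard_read(None))       # invotek
print(tenant_for_dashboard_read("dashecorp"))  # dashecorp
```

The asymmetry is the point of the design described above: a wrong default on a read shows stale or empty data, whereas a wrong default on a write would put data in the wrong tenant's database.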

Built (merged + deployed): DB-per-tenant (rc#1515), the fail-closed write boundary (rc#1608), tenant resolution (TenantMatch), per-tenant dispatch (rc#1481) + per-installation GitHub tokens (rc#1665), the cross-tenant isolation gate (rc#1617), the onboarding gate (rc#1614), per-tenant Discord notification routing (rc#1643/#1661/#1668), the PR-5 schema-fence mechanism (rc#1685), and DB-per-tenant pgvector memory (#1478, rig-memory-mcp#24).

Pending (gated): secrets-broker tenant-prefixed refs (#1479, partner-gated), the PR-5 cutover execution, the dashecorp org-default activation per rig-docs#324 (pin-first / activate-last sequence; gated on rig-conductor#1926/#1476), and the #1486 GDPR pack (launch blocker). invotek runs as the sole active tenant until the dashecorp activation step lands.

Launch blocker: dashecorp/rig-conductor#1486. Canonical operating guide: https://research.rig.dashecorp.com/proposals/multi-tenancy/.

rig-conductor event types (POST /api/events)¶

All events from dashecorp/rig-conductor/src/ConductorE.Core/UseCases/SubmitEvent.cs MapToEvent switch. Names only here — fetch /events.md for full field schemas (no auth required).

Pipeline (issue → PR → merge → deploy): ISSUE_APPROVED, ISSUE_ASSIGNED, ISSUE_UNASSIGNED, WORK_STARTED, BRANCH_CREATED, PR_CREATED, CI_PASSED, CI_FAILED, REVIEW_ASSIGNED, REVIEW_PASSED, REVIEW_DISPUTED, HUMAN_GATE_TRIGGERED, HUMAN_GATE_REMINDER, MERGED, MERGE_GATE_WAITING, MERGE_GATE_MERGED, MERGE_GATE_TIMEOUT, MAIN_CI_STARTED, MAIN_CI_PASSED, MAIN_CI_FAILED, DEPLOYED_STAGING, DEPLOYED_PRODUCTION, SMOKE_PASSED, SMOKE_FAILED, BUILD_FAILED, VERIFIED, ISSUE_DONE, ESCALATED, MILESTONE_COMPLETE, DUPLICATE_PR_CLOSED

Direct PR path (no issue): PR_OPENED, PR_REVIEW_ASSIGNED, PR_REVIEW_APPROVED, PR_REVIEW_REJECTED

Agent lifecycle: AGENT_STARTED, HEARTBEAT, AGENT_STUCK

CLI sessions: CLI_STARTED, CLI_PROGRESS, CLI_COMPLETED

Observability (cost + tooling): TOKEN_USAGE, TOOL_USED

Memory MCP: MEMORY_WRITE, MEMORY_READ, MEMORY_HIT_USED
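
For a client that wants to check an event name before posting it, the groups above can be held as a lookup table. This is a sketch only: the category keys (`pipeline`, `direct_pr`, and so on) and the helper name are invented for this example, and the authoritative list remains the MapToEvent switch and /events.md.

```python
# Event names copied from the lists above; category keys are illustrative labels.
EVENT_CATEGORIES = {
    "pipeline": [
        "ISSUE_APPROVED", "ISSUE_ASSIGNED", "ISSUE_UNASSIGNED", "WORK_STARTED",
        "BRANCH_CREATED", "PR_CREATED", "CI_PASSED", "CI_FAILED",
        "REVIEW_ASSIGNED", "REVIEW_PASSED", "REVIEW_DISPUTED",
        "HUMAN_GATE_TRIGGERED", "HUMAN_GATE_REMINDER", "MERGED",
        "MERGE_GATE_WAITING", "MERGE_GATE_MERGED", "MERGE_GATE_TIMEOUT",
        "MAIN_CI_STARTED", "MAIN_CI_PASSED", "MAIN_CI_FAILED",
        "DEPLOYED_STAGING", "DEPLOYED_PRODUCTION", "SMOKE_PASSED",
        "SMOKE_FAILED", "BUILD_FAILED", "VERIFIED", "ISSUE_DONE",
        "ESCALATED", "MILESTONE_COMPLETE", "DUPLICATE_PR_CLOSED",
    ],
    "direct_pr": [
        "PR_OPENED", "PR_REVIEW_ASSIGNED", "PR_REVIEW_APPROVED", "PR_REVIEW_REJECTED",
    ],
    "agent_lifecycle": ["AGENT_STARTED", "HEARTBEAT", "AGENT_STUCK"],
    "cli_sessions": ["CLI_STARTED", "CLI_PROGRESS", "CLI_COMPLETED"],
    "observability": ["TOKEN_USAGE", "TOOL_USED"],
    "memory_mcp": ["MEMORY_WRITE", "MEMORY_READ", "MEMORY_HIT_USED"],
}

# Reverse index: event name -> category.
CATEGORY_BY_EVENT = {
    name: category
    for category, names in EVENT_CATEGORIES.items()
    for name in names
}


def category_of(event_type):
    """Return the category for a known event name, or None if it is not listed."""
    return CATEGORY_BY_EVENT.get(event_type)
```

A caller could reject any name for which `category_of` returns `None` before building a request, which catches typos without a round trip to the API.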

Known gaps (rig backlog)¶

Cold-start agents should see these so they don't re-discover what's already identified. Each gap links to prior_art — existing stubs, research, or PRs that have already touched it. When a gap is being worked, linked_user_story points to the user story; when closed, the entry is removed from facts/backlog.yaml.

[observability] Cost tracking mostly deployed — LiteLLM proxy + external access are the remaining gaps¶

DO NOT propose "build a cost pipeline" — most is shipped. Shipped: (1) data — TokenUsageProjection + CostProjection consume TOKEN_USAGE + CLI_COMPLETED into Marten/Postgres read models; (2) API — GET /api/usage and /api/costs/{issue,summary,daily} (cluster-internal, query by agent/repo/date — see Published surfaces); (3) dashboard — Dashboard.html (~42 KB SPA) Costs tab at / and /dashboard. Remaining: (a) LiteLLM proxy not deployed (blocks hard budget enforcement / agent kill-switch); (b) /dashboard is cluster-internal only — no external read-only view (port-forward or CF tunnel today); (c) no Discord alert on cost-threshold breach. Rough spend ~$5-15/day fleet-wide.

Prior art:

- rig-conductor cost endpoints and Dashboard.html: dashecorp/rig-conductor src/ConductorE.Api/
- TokenUsageProjection + CostProjection source: dashecorp/rig-conductor src/ConductorE.Api/Adapters/MartenProjections.cs
- TOKEN_USAGE + CLI_COMPLETED events defined and emitted (see /events.md)
- Cost framework design: rig-gitops/docs/whitepaper/cost-framework.md (private)
- Observability whitepaper: rig-gitops/docs/whitepaper/observability.md (private; summary in facts/whitepapers.yaml)
- LiteLLM proxy not yet deployed; this blocks hard budget enforcement

Status: mostly-deployed

[observability] OTel collector deployed for rig-conductor only — agents not yet emitting¶

OpenTelemetry Collector is "Partial": deployed for rig-conductor; agent pods (Dev-E, Review-E, iBuild-E) have not yet enabled native OTel via CLAUDE_CODE_ENABLE_TELEMETRY=1. Langfuse (self-hosted) and Grafana Cloud ingest are both "Planned". Full design in the observability whitepaper.

Prior art:

- Observability whitepaper: rig-gitops/docs/whitepaper/observability.md (private; summary in facts/whitepapers.yaml)
- Implementation status: whitepaper/implementation-status.md marks OTel Collector 'Partial', Langfuse 'Planned'
- rig-memory-mcp/events.js FUTURE comment: migrate to OTel GenAI spans
- Env vars to enable native OTel: CLAUDE_CODE_ENABLE_TELEMETRY=1 + OTEL_EXPORTER_OTLP_ENDPOINT pointed at the in-cluster collector
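
As a rough illustration of the last item, the two variables can be layered onto an agent's environment as shown below. The helper name and the collector address in the usage line are placeholders made up for this sketch; the real in-cluster endpoint is not stated here.

```python
def otel_env(collector_endpoint, base_env=None):
    """Return a copy of base_env with the two telemetry variables set.

    The input mapping is left untouched so callers can reuse it.
    """
    env = dict(base_env or {})
    env["CLAUDE_CODE_ENABLE_TELEMETRY"] = "1"
    env["OTEL_EXPORTER_OTLP_ENDPOINT"] = collector_endpoint
    return env


# Usage with a hypothetical collector address (4317 is the conventional OTLP gRPC port).
agent_env = otel_env("http://otel-collector.example.svc:4317", {"AGENT": "dev-e"})
```

In the rig itself these values would more likely be set declaratively in each agent's HelmRelease than computed in code; the function only shows which keys are involved.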

Status: partial

[docs-memory] Docs-memory drift lint not implemented¶

Weekly LLM-as-judge pass that promotes memory→docs (when importance≥4 AND hit_count≥5), flags stale research, catches orphan docs. Designed but no runtime built.
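
The promotion threshold is simple enough to state as a predicate. The sketch below assumes a memory record exposes `importance` and `hit_count` as integers; the `Memory` class and function name are invented for illustration, and no runtime for this lint exists yet.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical memory record carrying the two fields the rule reads."""
    importance: int
    hit_count: int


def should_promote(memory: Memory) -> bool:
    """Promote memory to docs only when BOTH thresholds are met."""
    return memory.importance >= 4 and memory.hit_count >= 5
```

Because the rule is a conjunction, a highly rated memory that is rarely read, or a frequently read but low-importance one, stays in memory and is not promoted.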

Prior art:

- Full design in research/2026-04-18-docs-memory-drift-lint
- Parent user story: user-stories/2026-04-18-docs-memory-strategy
- Principles synthesis: research/2026-04-18-docs-vs-memory-principles

Linked user story: user-stories/2026-04-18-docs-memory-strategy

Status: open

[docs-surfaces] Two docs surfaces with overlapping scope¶

docs.rig.dashecorp.com (MkDocs aggregation from rig-gitops/docs-site/) and research.rig.dashecorp.com (Starlight research hub from dashecorp/rig-docs). Both host rig docs; boundaries not formalised. Agents currently learn this empirically. Eventually unify or formalise the split.

Prior art:

- MkDocs site built by dashecorp/rig-gitops/scripts/build-docs.sh
- Starlight site defined in dashecorp/rig-docs/ (this repo)
- Docs tooling decision: decisions/2026-04-18-docs-tooling-decision (picked Starlight for research hub; MkDocs kept for aggregation)

Status: open

[agents] ATL-E retired, no active coordinator agent¶

ATL-E (a legacy personal-org atl-agent repo) was previously deployed as a k3s CronJob on a personal host and handled handoff-stall Discord notifications. As of ~2026-03-26 it is no longer deployed (not present in the operator's personal-org cluster GitOps manifests). The repo still exists but is dormant. If an Epic needs a coordinator/team-lead role, decide whether to redeploy ATL-E or build a replacement.

Prior art:

- Dormant personal-org atl-agent repo (last push 2026-03-26)
- Operator's personal-org cluster GitOps repo: no atl-agent ArgoCD manifest

Status: open

[cleanup] Plane residue — uninstall GitHub App + archive workspace¶

Plane was retired 2026-04-18 but the makeplane GitHub App is still installed on the dashecorp org, and the Plane workspace at app.plane.so is still alive (token revoked). Manual UI action needed.

Prior art:

- Retraction decision: decisions/2026-04-18-docs-tooling-decision (What retires section)
- Retirement commit: dashecorp/infra PR #74

Status: open

Architecture at a glance¶

flowchart LR
  H[Human]

  subgraph Code["Code repos"]
    RD[rig-docs]
    RG[rig-gitops]
    RAR[rig-agent-runtime]
    CE_R[rig-conductor]
    RMM_R[rig-memory-mcp]
    RT[rig-tools]
    INF[infra]
  end

  subgraph Deployed["Deployed services + agents"]
    direction TB
    CE[rig-conductor svc]
    RMM[rig-memory-mcp svc]
    DE[Dev-E pod]
    RE[Review-E cron]
    IB[iBuild-E — Mac Mini]
  end

  subgraph Publish["Published surfaces"]
    direction TB
    S1[research.rig.dashecorp.com<br/>Astro Starlight]
    S2[docs.rig.dashecorp.com<br/>MkDocs aggregator]
    CFP[Cloudflare Pages]
  end

  %% Authoring + dispatch
  H -->|user-story issue| RD
  RD -->|dispatch| CE
  CE -->|assign issue| DE
  CE -->|assign PR review| RE
  CE -->|assign iOS build| IB
  DE -->|author PR| RD
  RD -->|PR opens| RE
  RE -->|approve / request changes| RD
  RD -->|merge| CFP
  CFP -->|publish| S1
  RG -->|docs aggregation| S2

  %% MCP + memory
  DE -->|tool use| RMM
  RE -->|tool use| RMM
  IB -->|tool use| RMM
  RMM_R -.implements.-> RMM

  %% Flux GitOps
  RG -->|Flux deploys| CE
  RG -->|Flux deploys| RMM
  RG -->|Flux deploys| DE
  RG -->|Flux deploys| RE

  %% Runtime image used by all agent deployments
  RAR -.image.-> DE
  RAR -.image.-> RE
  RAR -.image.-> IB
  CE_R -.image.-> CE

  %% Per-repo docs/ feeding into the MkDocs aggregator
  RG -.docs/.-> S2
  RAR -.docs/.-> S2
  CE_R -.docs/.-> S2
  RMM_R -.docs/.-> S2
  RT -.docs/.-> S2

  %% Infra — outside the loop but manages everything above
  INF -.OpenTofu.-> CFP

Legend: solid arrows are runtime flows (dispatch, tool calls, deploys). Dashed arrows are source-of relationships — "this repo's image powers that pod" or "this repo's docs/ feeds that site". Every rig repo from facts/repos.yaml is represented.

Conventions (rig-wide)¶

Docs are markdown with YAML frontmatter. Required fields: title, description, type, audience, created/updated, topic. See AGENTS.md in this repo.
Bidirectional linkage. User story ↔ research ↔ proposal → decision via research_docs, proposal, user_story, source_research, supersedes/superseded_by. RelatedDocs component renders the graph.
Diagrams as code. Mermaid source inline in markdown. No PNG or SVG committed. Source preserved post-render via <details> blocks.
Per-repo CLAUDE.md auto-loads when Claude Code starts a session in that repo's cwd (Claude Code reads CLAUDE.md, not AGENTS.md — cross-vendor standard is AGENTS.md but the loader is CLAUDE.md). Same-repo local @AGENTS.md imports work; cross-repo @owner/repo/file does not fetch from GitHub (filesystem-only, max 5 hops).
Rig-wide agent instructions live in TWO places: (1) each running agent's HelmRelease character.personality prompt (authoritative for Dev-E, Review-E in-cluster), (2) each repo's root CLAUDE.md (authoritative for interactive sessions). Both include the BRAIN.md fetch at session start.
Closes #N required in PR bodies. Review-E blocks on this.
Memory MCP scope: operational / ephemeral state only. Durable knowledge goes to rig-docs.
Default to a two-PR split for feature work >500 LOC. large-pr-ok is reserved for migrations, codemods, dependency bumps, and generated code — not feature work that decomposes into policy + adapter. A/B-validated 2026-05-18: same code shipped as a labelled single PR got zero code-level feedback; the disciplined split caught 3 real bugs. Rig-side enforcement in rar#492; full decision tree in research/2026-05-18-pr-size-and-large-pr-ok-semantics.
Behavior PRs ship their doc updates in the same PR. Per-file convention: when src/<X>.{cs,js,ts,go,py,...} changes, docs/<X>.md (if it exists) updates alongside. Rig-side enforcement in rar#497 (detectDocMismatches surfaces a warning in the size-gate review body).
Three-layer drift-prevention playbook. When the operator catches the orchestrator drifting on a discipline, and the drift is recurring, structurally observable, and carries a measurable cost: ship an L1 memory rule + L2 rig-side enforcement at the trigger point + an L3 durable artifact. Three instances were codified the week of 2026-05-18 (PR-split shortcut, doc-staleness, main-guard rig-internal dispatch). Meta-playbook in research/2026-05-18-three-layer-drift-prevention-playbook.
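
Three of the conventions above are mechanical enough to sketch as checks: the Closes #N requirement, the >500 LOC split default, and the per-file docs rule. The code below is a rough illustration under stated assumptions, not the enforcement shipped in rar#492 or rar#497: all function names are hypothetical, and the path mapping assumes `src/<X>.<ext>` pairs with `docs/<X>.md` exactly as the convention is worded.

```python
import re
from pathlib import PurePosixPath

# Matches the literal form the convention names, e.g. "Closes #123".
CLOSES_RE = re.compile(r"\bCloses #\d+\b")


def has_closes_reference(pr_body: str) -> bool:
    """True when the PR body contains a 'Closes #N' reference."""
    return CLOSES_RE.search(pr_body) is not None


def should_split(loc_changed: int, labels: set) -> bool:
    """Default to a two-PR split above 500 LOC unless large-pr-ok is set."""
    return loc_changed > 500 and "large-pr-ok" not in labels


def stale_docs(changed_files: list, existing_docs: set) -> list:
    """List docs/<X>.md files that exist but were not updated with src/<X>.<ext>."""
    changed = set(changed_files)
    stale = []
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) < 2 or parts[0] != "src":
            continue  # the convention only covers files under src/
        doc = str(PurePosixPath("docs", *parts[1:]).with_suffix(".md"))
        if doc in existing_docs and doc not in changed:
            stale.append(doc)
    return stale
```

For example, a PR touching `src/gate.ts` in a repo that has `docs/gate.md` would be flagged unless `docs/gate.md` is in the same change set; source files with no matching doc are ignored, matching the "if it exists" wording.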

Token-efficient cold start¶

When you pick up a new Epic with blank memory, the cheapest order of operations:

1. Fetch this file (https://research.rig.dashecorp.com/BRAIN.md, public, no auth): ~27 KB.
2. Fetch /llms.txt for the research hub topic index: ~2 KB.
3. Identify 1-3 relevant research / proposal docs and fetch them raw: ~5-15 KB.
4. Fetch the target repo's AGENTS.md (each repo's is ≤8 KB): ~5 KB.
5. Call read_memories from Memory MCP, scoped to repo + topic: ~2 KB.

Total cold-start context: ~41-51 KB (the sum of the estimates above). Leaves the rest of the budget for actual work.

When this file needs updating¶

Manual fields that live in facts/*.yaml — update when the matching reality changes:

facts/repos.yaml — annotations only (purpose, depends_on, used_by, agents_md, docs_surface). The repo list itself is auto-derived from gh api on every compile. Adding a new annotation, or updating an existing one, happens here.
facts/surfaces.yaml — URLs, API endpoints, MCP tools. Update when an endpoint changes or a new surface is published.
facts/agents.yaml — agent deployment instances. Compile validates each manifest: it checks that the path exists on GitHub and warns on drift (this is how the ATL-E retirement was caught).
facts/flows.yaml — documented rig processes. Update after retrospectives.
facts/schema.yaml — mirrors the Zod schema in src/content.config.ts. Keep in sync manually when the schema changes.
facts/events.yaml — rig-conductor event types. Keep in sync with MapToEvent in the C# source.
facts/backlog.yaml — known gaps. Add when identified; remove when closed.

Then run npm run brain. CI (build workflow) runs brain:check and fails on drift.