Architecture¶
How rig-agent-runtime turns a character.json into a running Discord agent.
Runtime vs Agent vs Tool APIs¶
rig-agent-runtime has three distinct layers. Understanding this separation is key to deploying your own agents.
graph TB
subgraph "Layer 1: Runtime (shared)"
RT["rig-agent-runtime<br/>ghcr.io/stig-johnny/rig-agent-runtime<br/><i>Generic engine — same image for every agent</i>"]
end
subgraph "Layer 2: Agent Config (per agent)"
CE["character.json + values.yaml<br/><i>Personality, tools, Discord channel, model</i>"]
CB["character.json + values.yaml<br/><i>Different personality, tools, channel, model</i>"]
end
subgraph "Layer 3: Tool APIs (per agent, optional)"
AE["Example-E API<br/>ghcr.io/stig-johnny/example-e-api<br/><i>Quotes + facts endpoints</i>"]
AB["Your Backend API<br/><i>Any HTTP service in any language</i>"]
end
RT --- CE
RT --- CB
CE -.->|"HTTP tool calls"| AE
CB -.->|"HTTP tool calls"| AB
Layer 1: Runtime (this repo)¶
rig-agent-runtime is a generic engine. It handles Discord connectivity, the Claude agent loop, memory, and the dashboard. It reads a character.json at startup to know who it is and what it can do. The runtime image (ghcr.io/stig-johnny/rig-agent-runtime) is shared by all agents, so you never need to rebuild it for a new agent.
Layer 2: Agent Configuration (your config)¶
Each agent is defined by a character.json file and a values.yaml for Helm. This is where you set the agent's personality, which Discord channels it listens on, which tools it can call, and which Claude model to use. Deploying a new agent means creating a new Helm release of the same chart with different values — no code changes required.
Layer 3: Tool APIs (your backend, optional)¶
Tool APIs are separate services that the agent calls via HTTP. They are not part of the runtime — they are independent applications you build and deploy yourself. An agent with no tools still works (it just has conversations without calling APIs). When you define tools in character.json, the runtime converts them into Claude tool definitions, and Claude decides when to call them.
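As a rough illustration of that conversion, the sketch below maps tool entries to the `name` / `description` / `input_schema` shape that the Anthropic Messages API expects for tools. The layout of each tool entry (`name`, `description`, `method`, `url`, `parameters`, `required`) and the `http://example-e-api` host are assumptions made for this example, not the runtime's actual character.json schema.

```javascript
// Hypothetical character.json tool entries; the field names are assumptions.
const character = {
  tools: [
    {
      name: "random_quote",
      description: "Fetch a random quote",
      method: "GET",
      url: "http://example-e-api/quotes/random",
      parameters: {
        category: { type: "string", description: "Optional category" },
      },
    },
  ],
};

// Map each configured tool to a Claude tool definition.
// Only name, description and input_schema go to the model;
// url and method stay in the runtime for dispatch.
function toClaudeTools(char) {
  return (char.tools ?? []).map((tool) => ({
    name: tool.name,
    description: tool.description,
    input_schema: {
      type: "object",
      properties: tool.parameters ?? {},
      required: tool.required ?? [],
    },
  }));
}

const claudeTools = toClaudeTools(character);
console.log(JSON.stringify(claudeTools, null, 2));
```

An agent with no `tools` key yields an empty list, which matches the "conversation only" case described above.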
Example: Two agents on one cluster¶
Namespace: example-e            Namespace: atl-e
┌──────────────────────────┐    ┌──────────────────────────┐
│ Example-E pod            │    │ ATL-E pod (Deployment)   │
│ image: rig-agent-runtime │    │ image: rig-agent-runtime │
│ config: example-e char   │    │ config: atl-e char       │
│ channel: #example-e      │    │ channel: #admin          │
│ mode: single             │    │ mode: split + cron       │
└──────────┬───────────────┘    └──────────┬───────────────┘
           │ HTTP                          │ MCP (stdio)
┌──────────▼───────────────┐    ┌──────────▼───────────────┐
│ example-e-api            │    │ GitHub MCP Server        │
│ (/quotes/random,         │    │ (list PRs, reviews,      │
│  /facts/random)          │    │  check runs, issues)     │
└──────────────────────────┘    └──────────────────────────┘
                                + CronJob every hour
                                + Webhook receiver
                                + Kanban board
All agents run the exact same runtime image. The only differences are the character config, which tools they use (HTTP or MCP), and the deployment mode.
Deployment Modes¶
Single-process mode¶
index.js runs everything in one process: Discord gateway, agent loop, memory, dashboard.
graph TB
subgraph "Single Process (index.js)"
CL[Character Loader<br/>character.js]
DG[Discord Gateway<br/>discord.js]
AL[Agent Loop<br/>agent.js]
TD[Tool Dispatcher<br/>agent.js]
MS[Memory Store<br/>memory.js]
UT[Usage Tracker<br/>usage.js]
DB[Dashboard<br/>dashboard/]
end
CF[character.json<br/>ConfigMap] --> CL
DC[Discord] <--> DG
DG --> AL
AL --> TD
AL <--> MS
AL --> UT
TD --> API[Tool APIs<br/>HTTP + MCP]
AL <--> Claude[Claude API]
DB <--> WS[WebSocket Clients]
DB --> WH[Webhook<br/>Receiver]
UT --> DB
Split mode (gateway + workers)¶
In production, the system runs as separate processes connected by Redis Streams.
- gateway.js (1 replica) -- connects to Discord, publishes messages to the rig-agent-runtime:messages Redis Stream
- worker.js (N replicas) -- consumes messages via a Redis consumer group, runs the agent loop, sends replies directly via Discord REST API
graph LR
DC[Discord] <--> GW[gateway.js<br/>1 replica]
WH[Webhooks] -->|POST /webhook| GW
GW -->|XADD| RS[Redis Stream<br/>rig-agent-runtime:messages]
RS -->|XREADGROUP| W1[worker.js<br/>replica 1]
RS -->|XREADGROUP| W2[worker.js<br/>replica 2]
W1 -->|REST API| DC
W2 -->|REST API| DC
W1 <--> CL[Claude API]
W2 <--> CL
W1 <--> MEM[Memory<br/>Postgres]
W2 <--> MEM
W1 --> MCP[MCP Servers]
W2 --> MCP
Key details of split mode:
- Redis consumer group (workers) ensures each message is delivered to exactly one worker
- Redis SETNX lock (lock:<stream-id>, 300s TTL) prevents duplicate processing if a message is redelivered
- Workers send replies directly via Discord REST API -- there is no reply stream back through the gateway
- Gateway handles thread creation and typing indicators before publishing
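The interplay between redelivery and the lock can be modelled in a few lines. This is an in-memory stand-in for the Redis `SET key value NX EX 300` pattern, not the worker's actual code; `LockTable` and `tryClaim` are names invented for the example, and time is passed in explicitly so the behaviour is easy to follow.

```javascript
// In-memory model of SET <key> <value> NX EX <ttl>: the write succeeds only
// if the key is absent or its TTL has already elapsed.
class LockTable {
  constructor() {
    this.expiries = new Map(); // key -> expiry time in ms
  }

  setNx(key, ttlSeconds, nowMs) {
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > nowMs) return false; // still held
    this.expiries.set(key, nowMs + ttlSeconds * 1000);
    return true;
  }
}

const LOCK_TTL_SECONDS = 300; // TTL from the list above

// Returns true if this worker should process the stream entry.
function tryClaim(locks, streamId, nowMs) {
  return locks.setNx(`lock:${streamId}`, LOCK_TTL_SECONDS, nowMs);
}

const locks = new LockTable();
const t0 = 1_000_000;
const id = "1700000000000-0"; // example stream entry ID
const first = tryClaim(locks, id, t0); // first delivery
const redelivered = tryClaim(locks, id, t0 + 5_000); // redelivery while locked
const afterTtl = tryClaim(locks, id, t0 + 301_000); // lock has expired
console.log({ first, redelivered, afterTtl });
// { first: true, redelivered: false, afterTtl: true }
```

The TTL bounds how long a crashed worker can block an entry: once it elapses, a redelivered message can be claimed again.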
Cron mode (scheduled one-shot)¶
run-once.js runs the agent loop once with a predefined prompt, posts results to a Discord webhook, and exits. Deployed as a Kubernetes CronJob.
- Can run alongside single or split mode (same character, same database)
- No Discord bot connection needed -- output goes to a webhook
- K8s handles scheduling, retries, and concurrency
Use cron.enabled: true in Helm values to add a CronJob alongside the Discord bot.
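The final step of a one-shot run, posting to a Discord webhook, might look like the sketch below. It is not the contents of run-once.js; `buildWebhookPayload` and `postToWebhook` are illustrative names, and the code assumes Node 18+ for the global `fetch`. Discord caps a message's `content` at 2000 characters, so the sketch truncates longer output.

```javascript
const DISCORD_CONTENT_LIMIT = 2000; // Discord's cap on message content

// Build the JSON body for a webhook post, truncating oversized text.
function buildWebhookPayload(text) {
  const content =
    text.length > DISCORD_CONTENT_LIMIT
      ? text.slice(0, DISCORD_CONTENT_LIMIT - 3) + "..."
      : text;
  return { content };
}

async function postToWebhook(webhookUrl, text) {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildWebhookPayload(text)),
  });
  if (!res.ok) {
    // Throwing lets the caller exit non-zero so the CronJob's retry policy applies.
    throw new Error(`Webhook post failed with HTTP ${res.status}`);
  }
}

// Example payloads (no network call is made here).
const shortPayload = buildWebhookPayload("Hourly summary: nothing to report.");
const longPayload = buildWebhookPayload("x".repeat(2500));
console.log(shortPayload.content, longPayload.content.length);
```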
Startup Sequence (single-process)¶
sequenceDiagram
participant R as Runtime
participant CL as Character Loader
participant MS as Memory Store
participant DG as Discord Gateway
participant DB as Dashboard
R->>CL: Load CHARACTER_FILE
CL->>CL: Validate required fields
CL->>CL: Apply defaults
R->>MS: Connect to Postgres (or init in-memory)
R->>DG: Login with DISCORD_BOT_TOKEN
DG->>DG: Register messageCreate handler
R->>DB: Start HTTP + WebSocket server
Note over R: Agent is ready
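The first three steps of the sequence (load, validate, apply defaults) can be sketched as follows. The required field names and default values are assumptions chosen for illustration; character.js defines the real ones.

```javascript
// Assumed for illustration; the runtime's real required fields may differ.
const REQUIRED_FIELDS = ["name", "personality"];

// Fresh defaults per call so characters never share mutable arrays.
const makeDefaults = () => ({ lore: [], tools: [], style: {} });

function loadCharacter(jsonText) {
  const parsed = JSON.parse(jsonText);
  const missing = REQUIRED_FIELDS.filter((field) => !(field in parsed));
  if (missing.length > 0) {
    // Failing here stops startup before any Discord login is attempted.
    throw new Error(`character.json is missing: ${missing.join(", ")}`);
  }
  return { ...makeDefaults(), ...parsed };
}

const character = loadCharacter('{"name": "Example-E", "personality": "Cheerful"}');
console.log(character);
```

In the runtime the JSON text would come from the file named by CHARACTER_FILE; it is passed in as a string here to keep the example self-contained.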
Message Processing¶
When a Discord message arrives, the runtime processes it through these stages:
flowchart TD
MSG[Discord messageCreate] --> FILTER{Channel match?}
FILTER -->|No| DROP[Ignore]
FILTER -->|Yes| BOT{From allowed bot<br/>or human?}
BOT -->|No| DROP
BOT -->|Yes| LOAD[Load conversation<br/>history from memory]
LOAD --> BUILD[Build system prompt:<br/>personality + lore +<br/>user facts + style]
BUILD --> CALL[Call Claude API<br/>with tools]
CALL --> TOOL{Tool use<br/>response?}
TOOL -->|Yes| EXEC[Execute HTTP call<br/>to tool API]
EXEC --> RESULT[Return result<br/>to Claude]
RESULT --> CALL
TOOL -->|No| TEXT[Extract text response]
TEXT --> SAVE[Save messages<br/>to memory]
SAVE --> REPLY[Post reply<br/>in Discord thread]
In split mode, the gateway handles filtering and thread creation, then publishes to Redis. The worker handles everything from LOAD onward and sends the reply via Discord REST API.
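The two filter decisions at the top of the flowchart can be expressed as a single predicate. The config keys `channels` and `allowedBots` are assumptions for the example, and the message object is a plain stand-in rather than a real discord.js instance.

```javascript
// Decide whether a message should enter the agent loop.
function shouldHandle(message, character) {
  // Channel match?
  if (!character.channels.includes(message.channelId)) return false;
  // Humans always pass; bots only if explicitly allowed.
  if (!message.author.bot) return true;
  return character.allowedBots.includes(message.author.id);
}

// Hypothetical config and messages.
const character = { channels: ["chan-1"], allowedBots: ["bot-42"] };
const fromHuman = { channelId: "chan-1", author: { id: "user-1", bot: false } };
const fromAllowedBot = { channelId: "chan-1", author: { id: "bot-42", bot: true } };
const fromOtherBot = { channelId: "chan-1", author: { id: "bot-99", bot: true } };
const wrongChannel = { channelId: "chan-2", author: { id: "user-1", bot: false } };

console.log(
  shouldHandle(fromHuman, character),
  shouldHandle(fromAllowedBot, character),
  shouldHandle(fromOtherBot, character),
  shouldHandle(wrongChannel, character)
);
// true true false false
```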
Key Design Decisions¶
Tool Calling via HTTP¶
Tools are HTTP endpoints, not code plugins. This means:
- Agents can call any REST API without runtime changes
- Tool definitions are pure configuration (no code deployment)
- APIs can be written in any language
- Tools are independently scalable Kubernetes services
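Because a tool is only configuration, dispatch reduces to building an HTTP request from the tool entry and the arguments Claude supplied. The sketch below shows one way this could work; the tool entry fields (`method`, `url`) and the convention of sending GET arguments as query parameters are assumptions, not the runtime's documented behaviour.

```javascript
// Hypothetical tool config entry; field names and host are assumptions.
const quoteTool = {
  name: "random_quote",
  method: "GET",
  url: "http://example-e-api/quotes/random",
};

// Turn a tool entry plus model-supplied input into fetch() arguments.
function buildToolRequest(tool, input) {
  const url = new URL(tool.url);
  const options = { method: tool.method, headers: {} };
  if (tool.method === "GET") {
    // GET tools carry their arguments in the query string.
    for (const [key, value] of Object.entries(input)) {
      url.searchParams.set(key, String(value));
    }
  } else {
    // Other methods send the arguments as a JSON body.
    options.headers["Content-Type"] = "application/json";
    options.body = JSON.stringify(input);
  }
  return { url: url.toString(), options };
}

const request = buildToolRequest(quoteTool, { category: "science" });
console.log(request.url);
// http://example-e-api/quotes/random?category=science
```

The result can be passed straight to `fetch(request.url, request.options)`; the tool API on the other end can be written in any language.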
Character as Configuration¶
The entire agent personality and behavior is defined in character.json:
- No agent-specific code in the runtime
- Multiple agents share the same runtime image
- Character changes deploy via ConfigMap update (no image rebuild)
- Version control and review for personality changes
Memory Layers¶
The memory system has three layers:
| Layer | Scope | Retention | Purpose |
|---|---|---|---|
| Conversations | Per thread | Configurable (default 30d) | Context for ongoing conversations |
| Facts | Per user | Indefinite | Learned preferences and patterns |
| Patterns | Per entity (e.g., merchant) | Indefinite | Auto-approval confidence scores |
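To make the scoping and retention columns concrete, here is a minimal in-memory sketch of the first two layers. `InMemoryStore` and its method names are invented for this example and say nothing about how memory.js is actually written; the Patterns layer is omitted.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

class InMemoryStore {
  constructor(retentionDays = 30) {
    this.retentionMs = retentionDays * DAY_MS;
    this.conversations = new Map(); // threadId -> [{ role, content, at }]
    this.facts = new Map(); // userId -> Map of fact key -> value
  }

  saveMessage(threadId, role, content, atMs) {
    const thread = this.conversations.get(threadId) ?? [];
    thread.push({ role, content, at: atMs });
    this.conversations.set(threadId, thread);
  }

  // Conversation context is scoped to one thread and bounded by retention.
  history(threadId, nowMs) {
    const cutoff = nowMs - this.retentionMs;
    return (this.conversations.get(threadId) ?? []).filter((m) => m.at >= cutoff);
  }

  // Facts are scoped to a user and never expire.
  rememberFact(userId, key, value) {
    const userFacts = this.facts.get(userId) ?? new Map();
    userFacts.set(key, value);
    this.facts.set(userId, userFacts);
  }

  factsFor(userId) {
    return Object.fromEntries(this.facts.get(userId) ?? []);
  }
}

const store = new InMemoryStore();
const now = Date.UTC(2026, 0, 31);
store.saveMessage("thread-1", "user", "old message", now - 45 * DAY_MS);
store.saveMessage("thread-1", "user", "recent message", now - 1 * DAY_MS);
store.rememberFact("user-1", "language", "Norwegian");
console.log(store.history("thread-1", now).length, store.factsFor("user-1"));
```

With the default 30-day retention, only the recent message is returned as context, while the user fact remains available regardless of age.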
Agent Loop Constraints¶
- Maximum 5 tool calls per message (prevents runaway loops)
- Each tool call is an independent HTTP request
- The agent loop is synchronous per message (no parallel tool calls)
- Failed tool calls return error text to Claude (does not crash the loop)
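These constraints fit in a short loop. The sketch below injects the model and tool functions so it can run without network access; the response shape (`type`, `name`, `input`, `text`) is simplified and is not the Anthropic API's actual format. What the runtime does once the limit is reached is not stated here, so the sketch simply stops with a fixed message.

```javascript
const MAX_TOOL_CALLS = 5; // limit from the list above

// callModel(messages) resolves to { type: "text", text } or
// { type: "tool_use", name, input }. runTool(name, input) resolves to a string.
async function runAgentLoop(messages, callModel, runTool) {
  let toolCalls = 0;
  while (true) {
    const response = await callModel(messages);
    if (response.type !== "tool_use") return response.text;
    if (toolCalls >= MAX_TOOL_CALLS) {
      return "Tool call limit reached."; // assumed behaviour at the limit
    }
    toolCalls += 1;
    let result;
    try {
      // One tool call at a time: each is awaited before the next model call.
      result = await runTool(response.name, response.input);
    } catch (err) {
      // A failed tool call becomes text for the model instead of crashing the loop.
      result = `Tool error: ${err.message}`;
    }
    messages.push({ role: "tool_result", name: response.name, content: result });
  }
}

// Demo with stubs: a model that always asks for a tool, and a tool that always fails.
let toolRuns = 0;
const alwaysTool = async () => ({ type: "tool_use", name: "random_quote", input: {} });
const failingTool = async () => {
  toolRuns += 1;
  throw new Error("HTTP 503");
};

const demo = runAgentLoop([], alwaysTool, failingTool).then((reply) => {
  console.log(reply, toolRuns); // Tool call limit reached. 5
  return reply;
});
```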
Provider chain + quota handling¶
createAgent in src/agent.js builds a providers Map from character.llm.provider + character.llm.providers. agent.process() delegates to processWithProviders which iterates the chain on fallback-eligible errors:
[active, ...rest] → try each in order → success returns; failure with fallbackEligible=true continues; all-failed throws the last error
The codex provider has an additional fail-fast gate at process() entry: when _quotaExhaustedUntil is in the future (set by a prior 429 usage_limit_reached), process() throws in <10ms without spawning the CLI subprocess. The thrown error sets the standard fallback userMessage, so the chain iteration routes the work to claude immediately.
The heartbeat exposes a rolled-up degraded flag plus degradedReasons array (e.g. codex_quota_exhausted_until_<iso>) so the conductor's dispatchers and dashboards can distinguish "alive and ready" from "alive but degraded."
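The chain iteration and the fail-fast gate described above can be sketched together. This is an illustration under assumptions, not the code in src/agent.js: `makeQuotaGatedProvider` is a made-up helper, providers are plain objects with a `process` method, and errors signal eligibility with a `fallbackEligible` property.

```javascript
// Try each provider in order; continue only on fallback-eligible errors.
async function processWithProviders(chain, input) {
  let lastError;
  for (const provider of chain) {
    try {
      return await provider.process(input);
    } catch (err) {
      lastError = err;
      if (!err.fallbackEligible) throw err; // not eligible: stop iterating
    }
  }
  throw lastError ?? new Error("No providers configured");
}

// Wrap an expensive run() with a gate that fails fast while quota is exhausted.
function makeQuotaGatedProvider(name, run, now = Date.now) {
  const provider = {
    name,
    _quotaExhaustedUntil: 0,
    async process(input) {
      if (provider._quotaExhaustedUntil > now()) {
        // Skip the expensive call entirely and hand over to the next provider.
        const err = new Error(`${name} quota exhausted`);
        err.fallbackEligible = true;
        throw err;
      }
      return run(input);
    },
  };
  return provider;
}

// Demo: codex is gated, so the work is routed to claude without running codex.
let codexRuns = 0;
const codex = makeQuotaGatedProvider("codex", async () => {
  codexRuns += 1;
  return "codex result";
});
codex._quotaExhaustedUntil = Date.now() + 60_000; // as if a 429 had been seen
const claude = { name: "claude", process: async () => "claude result" };

const demo = processWithProviders([codex, claude], "review PR").then((result) => {
  console.log(result, codexRuns); // claude result 0
  return result;
});
```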
End-to-end flow when codex hits 429:
- Codex CLI parses usage_limit_reached → handleUsageLimitError sets _quotaExhaustedUntil + emits AGENT_QUOTA_REPORTED
- Next codex.process() call → fail-fast gate throws in <10ms
- processWithProviders catches the fallback-eligible error → tries claude-cli
- claude delivers the review
- Heartbeat carries degraded: true, degradedReasons: ['codex_quota_exhausted_until_<iso>']
- When the conductor's reconciler detects quota recovery (<80%), any stuck PRs from before fallback was wired get re-dispatched via the third recovery path (rig-conductor#944)
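The degraded portion of the heartbeat in that flow could be derived from provider state roughly as follows. `buildHeartbeat` is a hypothetical helper; only the `degraded` and `degradedReasons` fields and the reason format come from the text above, and any other heartbeat fields are left out.

```javascript
// Roll provider quota state up into the degraded fields of a heartbeat.
function buildHeartbeat(providers, nowMs) {
  const degradedReasons = [];
  for (const provider of providers) {
    const until = provider._quotaExhaustedUntil ?? 0;
    if (until > nowMs) {
      const iso = new Date(until).toISOString();
      degradedReasons.push(`${provider.name}_quota_exhausted_until_${iso}`);
    }
  }
  return { degraded: degradedReasons.length > 0, degradedReasons };
}

const nowMs = Date.UTC(2026, 4, 18, 11, 0, 0);
const providers = [
  { name: "codex", _quotaExhaustedUntil: Date.UTC(2026, 4, 18, 12, 0, 0) },
  { name: "claude" },
];
const heartbeat = buildHeartbeat(providers, nowMs);
console.log(heartbeat);
// { degraded: true,
//   degradedReasons: [ 'codex_quota_exhausted_until_2026-05-18T12:00:00.000Z' ] }
```

Once the exhaustion timestamp is in the past, the same call returns `degraded: false` with an empty reasons array.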
Per-file references:
- Fail-fast policy: docs/2026-05-18-codex-fail-fast.md
- Chain iteration: docs/2026-05-18-provider-chain-iteration.md
- Degraded heartbeat: docs/heartbeat.md (degraded field)
- Reconciler quota recovery (conductor side): dashecorp/rig-conductor docs/2026-05-18-quota-recovery-reconciliation.md
Claude CLI Session Resumption¶
The stream consumer stores { "session:owner/repo#N": sessionId } in Valkey
when a Claude CLI run completes, so a follow-up dispatch (e.g. iterate on
review feedback) can --resume <id> with the prior context.
| Failure mode | Source | Recovery |
|---|---|---|
| Pod restart wipes ~/.claude/ session files | KEDA scale, OOM, deploy | Detect No conversation found with session ID: … on stderr → drop stored session ID, retry with no --resume on next attempt. No backoff (this is our bug, not upstream's). |
| Session expired server-side | TTL > 7 days | Same path. |
| Generic CLI error after some turns | API 5xx, tool timeout | Existing 60s/120s API backoff; the session ID is preserved. |
The detection lives in src/agent/providers/claude-cli.js#detectSessionNotFound
and propagates as AgentProviderError({ sessionNotFound: true }). The stream
consumer's catch block in src/stream-consumer.js clears the Valkey key and
continues the retry loop without --resume. A warning is logged so operators
can correlate cost spikes with restart-induced fresh runs.
See dashecorp/rig-agent-runtime#164. Pairs with #162 (SIGTERM hook, which
emits ISSUE_UNASSIGNED on graceful shutdown so the new pod starts fresh
from dispatch instead of retrying a resume).
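The recovery path for a missing session can be sketched as below. This is a guess at the shape of the logic, not a copy of the real `detectSessionNotFound` or of the stream consumer: a `Map` stands in for Valkey, `nextAttemptArgs` is an invented helper, and the issue number in the key is an example value.

```javascript
// Matches the stderr message quoted in the table above.
const SESSION_NOT_FOUND = /No conversation found with session ID/;

function detectSessionNotFound(stderr) {
  return SESSION_NOT_FOUND.test(stderr);
}

// Decide the CLI arguments for the next attempt.
// sessionStore is a Map standing in for Valkey.
function nextAttemptArgs(sessionStore, key, lastStderr) {
  if (lastStderr && detectSessionNotFound(lastStderr)) {
    sessionStore.delete(key); // stale session: drop it and start fresh
    console.warn(`Session for ${key} not found; retrying without --resume`);
  }
  const sessionId = sessionStore.get(key);
  return sessionId ? ["--resume", sessionId] : [];
}

const key = "session:owner/repo#12"; // example key
const sessions = new Map([[key, "abc123"]]);
const resumeArgs = nextAttemptArgs(sessions, key, null);
const freshArgs = nextAttemptArgs(
  sessions,
  key,
  "Error: No conversation found with session ID: abc123"
);
console.log(resumeArgs, freshArgs);
// [ '--resume', 'abc123' ] []
```

Other CLI errors leave the stored session ID untouched, which matches the last row of the table.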
System Prompt Structure¶
buildSystemPrompt(character) in src/agent/shared.js assembles the prompt every agent receives. Sections in order:
| Section | Source | Always present |
|---|---|---|
| Personality | character.personality | ✅ |
| Background knowledge | character.lore[] | ✅ |
| Style | character.style (language, tone, format) | ✅ |
| Rig rules (all agents) | hardcoded in shared.js | ✅ |
| Available tools | hardcoded | Only if tools configured |
| Rules (tool-use) | hardcoded | Only if tools configured |
Rig rules are universal engineering hygiene baked into the base prompt so every character inherits them without per-agent repetition:
- Every PR MUST include Closes #N (or Closes owner/repo#N for cross-repo) in the PR body.
- Every PR that adds a feature, changes behavior, or fixes a bug MUST include doc updates in the same PR.
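The section order in the table can be illustrated with a simplified assembler. This is a sketch only: the heading text, the wording of the tool-use rule, and the tool entry fields are assumptions, and the real `buildSystemPrompt` in src/agent/shared.js may format things differently.

```javascript
// Paraphrased from the rules listed above; the exact wording in shared.js may differ.
const RIG_RULES = [
  "Every PR MUST include Closes #N in the PR body.",
  "Behavior changes MUST include doc updates in the same PR.",
];

const bullets = (items) => items.map((item) => `- ${item}`).join("\n");

function buildSystemPrompt(character) {
  const style = character.style ?? {};
  // Sections that are always present, in table order.
  const sections = [
    `# Personality\n${character.personality}`,
    `# Background knowledge\n${bullets(character.lore ?? [])}`,
    `# Style\nLanguage: ${style.language}\nTone: ${style.tone}\nFormat: ${style.format}`,
    `# Rig rules\n${bullets(RIG_RULES)}`,
  ];
  // Tool sections are appended only when tools are configured.
  const tools = character.tools ?? [];
  if (tools.length > 0) {
    sections.push(
      `# Available tools\n${bullets(tools.map((t) => `${t.name}: ${t.description}`))}`
    );
    sections.push("# Rules\nCall a tool only when it is needed to answer.");
  }
  return sections.join("\n\n");
}

const prompt = buildSystemPrompt({
  personality: "Cheerful and concise.",
  lore: ["Lives in the #example-e channel"],
  style: { language: "English", tone: "friendly", format: "short paragraphs" },
  tools: [{ name: "random_quote", description: "Fetch a random quote" }],
});
console.log(prompt);
```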
Base Image¶
Dockerfile.base builds ghcr.io/dashecorp/rig-agent-runtime:base — the shared foundation for all agent stack images.
| Package | Source | Install path |
|---|---|---|
| @anthropic-ai/claude-code | npmjs.org | /usr/local/lib/node_modules/@anthropic-ai/claude-code |
| @openai/codex | npmjs.org | /usr/local/lib/node_modules/@openai/codex |
| @dashecorp/rig-memory-mcp | GitHub Packages (npm.pkg.github.com) | /usr/local/lib/node_modules/@dashecorp/rig-memory-mcp |
@dashecorp/rig-memory-mcp provides the memory MCP server.
The package is installed at build time via a Docker build secret (GITHUB_TOKEN). The token is never stored in image layers.
Stack Images¶
Stack-specific images (Dockerfile.node, Dockerfile.dotnet, Dockerfile.python) are thin layers on top of :base. KEDA uses the rig.stack label to route work to a stack-tagged pod.
| Image | Adds on top of base | Purpose |
|---|---|---|
| :node | typescript, tsx, jest, vitest, eslint, prettier (global npm) | Node/TS repos |
| :dotnet | nothing (label only) | C#/dotnet repos; .NET 10 SDK is in base (#153) |
| :python | python3, pip, pytest, black, ruff (apt + pip) | Python repos |
Anything universally needed (Node 22, .NET 10 SDK, claude-cli, codex, gh, jq, git) lives in Dockerfile.base exactly once. Re-installing the same toolchain in a stack image wastes bandwidth and risks layer conflicts (e.g. a duplicate ln -s failing with "File exists" — which is what broke the dotnet build before this change).
Baked Brain (Tier 1 Offline-first)¶
Both Dockerfile and Dockerfile.base bake the rig brain into the image via a multi-stage build. The brain lands at /app/brain/ and is exposed via RIG_BRAIN_DIR.
| Property | Value |
|---|---|
| Path | /app/brain/ |
| Env var | RIG_BRAIN_DIR=/app/brain |
| Files | entry, repos, agents, surfaces, flows, events, donts, endpoints |
| Freshness | Per image release (tied to image version) |
| Build failure | Noisy — any missing file aborts the build |
This replaces runtime WebFetch calls to research.rig.dashecorp.com, which are blocked by the Block-AI-Bots CF rule. See docs/brain-offline.md for full details.
File Structure¶
rig-agent-runtime/
  src/                        # Layer 1: Runtime (generic engine)
    index.js                  # Single-process entry point
    gateway.js                # Split mode: Discord gateway
    worker.js                 # Split mode: agent workers
    character.js              # Loads and validates character.json
    agent.js                  # Agent loop, tool dispatch, prompt building
    memory.js                 # Postgres + in-memory storage
    usage.js                  # Token counting and cost calculation
    dashboard/
      server.js               # HTTP server + WebSocket
      index.html              # Dashboard UI
  charts/
    rig-agent-runtime/        # Helm chart (deploy any agent)
  Dockerfile                  # Builds the runtime image
  package.json
  examples/
    example-e/                # Layer 2+3: Complete agent example
      character.json          # Agent config (personality, tools, channel)
      values.yaml             # Helm values for K8s deployment
      api/                    # Tool API backend (separate service)
        server.js             # Quotes + facts HTTP server
        Dockerfile            # Builds the tool API image
      k8s/                    # K8s manifests for the tool API
        api-deployment.yaml   # Deployment + Service
        namespace.yaml        # Agent namespace