Deployment¶

rig-agent-runtime supports two deployment modes via its Helm chart (charts/rig-agent-runtime/).

Deployment Modes¶

Single-process mode¶

One pod runs index.js, which handles the Discord gateway, agent loop, and dashboard. Simple to operate, suitable for low traffic.

Gateway + worker mode (production)¶

Set mode: split and enable Redis. The gateway pod connects to Discord and publishes messages to a Redis Stream. Worker pods consume messages via a consumer group and send replies directly through the Discord REST API. A Redis SETNX lock prevents duplicate processing.
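
The flow above maps onto a handful of standard Redis commands. The sketch below is illustrative only: the stream name agent:messages, the group name workers, the lock:msg: key prefix, the field names, and the 300-second lock expiry are all assumptions, not the runtime's actual names. It uses SET with NX and EX, the atomic equivalent of SETNX followed by an expiry.

```shell
# Hypothetical names; the runtime's real stream, group, and key layout may differ.
STREAM="agent:messages"
GROUP="workers"

# Thin wrapper so the Redis CLI binary can be swapped out (REDIS_CLI is an assumption).
rds() { "${REDIS_CLI:-redis-cli}" "$@"; }

# One-time setup: create the consumer group (and the stream, if it is missing).
create_group() { rds XGROUP CREATE "$STREAM" "$GROUP" '$' MKSTREAM; }

# Gateway side: append one Discord message to the stream.
publish_message() { rds XADD "$STREAM" '*' discord_message_id "$1" content "$2"; }

# Worker side: read the next undelivered entry for the consumer named in $1.
read_next() { rds XREADGROUP GROUP "$GROUP" "$1" COUNT 1 STREAMS "$STREAM" '>'; }

# Worker side: take the per-message lock ($1 = message ID, $2 = worker name).
# Redis replies OK only to the first caller; later callers get a nil reply
# until the key expires.
claim_message() { rds SET "lock:msg:$1" "$2" NX EX 300; }

# Worker side: acknowledge a stream entry (by its stream entry ID) once handled.
ack_message() { rds XACK "$STREAM" "$GROUP" "$1"; }
```

A worker would call read_next, then claim_message with the Discord message ID, and process the message only when the reply is OK.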

Helm Chart¶

The Helm chart is the primary deployment method. It lives in charts/rig-agent-runtime/.
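
The chart can be installed from a checkout of the repository with a standard helm upgrade --install. A minimal sketch, assuming the namespace and secret described in the Namespace and Secrets sections already exist; the deploy_agent helper and the my-agent release name are hypothetical, not part of the chart:

```shell
# Hypothetical helper: install or upgrade one agent release from a repo checkout.
#   $1 = release name
#   $2 = path to a values file
#   $3 = namespace (optional, defaults to rig-agent-runtime)
deploy_agent() {
  helm upgrade --install "$1" charts/rig-agent-runtime \
    --namespace "${3:-rig-agent-runtime}" \
    --values "$2"
}
```

For example, deploy_agent my-agent values.yaml installs the release on the first run and applies later values changes on subsequent runs.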

Single-process values.yaml¶

mode: single

image:
  repository: ghcr.io/stig-johnny/rig-agent-runtime
  tag: latest

secrets:
  existingSecret: my-agent-secrets

character:
  name: My Agent
  bio: "A helpful assistant"
  # ... full character.json contents

dashboard:
  enabled: true
  port: 3000

resources:
  requests:
    memory: 64Mi
    cpu: 50m
  limits:
    memory: 256Mi
    cpu: 500m

Gateway + worker values.yaml (production)¶

mode: split

image:
  repository: ghcr.io/stig-johnny/rig-agent-runtime
  tag: latest  # in production, pin to a git SHA (see ArgoCD)

secrets:
  existingSecret: my-agent-secrets

redis:
  enabled: true
  deploy: true
  storage: 1Gi
  storageClass: nfs-csi

workers:
  replicas: 2

character:
  name: My Agent
  bio: "A helpful assistant"
  # ... full character.json contents

dashboard:
  enabled: true
  port: 3000

tunnel:
  enabled: true
  tokenSecretName: cloudflared-rig-agent-runtime-token
  hostname: my-agent.example.com

Namespace¶

All resources deploy to the rig-agent-runtime namespace:

kubectl create namespace rig-agent-runtime

Secrets¶

Create secrets manually before deploying. The chart references them via secrets.existingSecret:

# Agent secrets (Discord token + Anthropic/OpenAI key + optional DB URL)
kubectl create secret generic my-agent-secrets \
  -n rig-agent-runtime \
  --from-literal=discord-bot-token=<token> \
  --from-literal=anthropic-api-key=<key> \
  --from-literal=openai-api-key=<key> \
  --from-literal=database-url=<url>

# GHCR pull secret (for private images)
kubectl create secret docker-registry ghcr-pull-secret \
  -n rig-agent-runtime \
  --docker-server=ghcr.io \
  --docker-username=<user> \
  --docker-password=<pat>

# Cloudflare Tunnel token (if using tunnel)
kubectl create secret generic cloudflared-rig-agent-runtime-token \
  -n rig-agent-runtime \
  --from-literal=token=<tunnel-token>

Do not use SealedSecrets. Secrets are created manually and referenced by name.

For llm.provider: codex-cli, there are two auth patterns:

Persistent host: run codex login on the machine and preserve the Codex auth directory. If you set llm.authMode: device-auth, the runtime can trigger codex login --device-auth, post the browser URL/code through progress updates, and wait for a human to complete login.
Kubernetes/ephemeral pod: prefer OPENAI_API_KEY because interactive ChatGPT login is brittle across pod restarts.

If you must use ChatGPT login in Kubernetes, preserve the Codex home directory and mount it at the real runtime path:

codexHomePersistence:
  enabled: true
  size: 1Gi

extraEnv:
  - name: CODEX_HOME
    value: /home/node/.codex

This only works reliably if:

the pod uses a stable runtime home
the PVC is mounted at the actual Codex home path
the workload stays single-replica
Discord/webhook progress messages are enabled so operators can complete device-auth

Shared agent home (sharedClaudeVolume)¶

For multi-pod or multi-agent setups (dev-e, review-e, etc.), you can mount a single pre-existing PVC, typically ReadWriteMany (Filestore/NFS), into every agent so that ~/.claude/projects (and optionally ~/.codex) survives pod restarts and is shared across all agent variants.

The chart only consumes the PVC; provision it out-of-band (see the matching dashecorp/infra Filestore ticket).

Field	Type	Default	Notes
`sharedClaudeVolume.enabled`	bool	`false`	Off by default — opt in per release.
`sharedClaudeVolume.pvcName`	string	`""`	Required when enabled. Existing PVC in the release namespace.
`sharedClaudeVolume.mountPath`	string	`/home/agent/.claude`	Read-write mount on the agent container.

Example HelmRelease values:

sharedClaudeVolume:
  enabled: true
  pvcName: rig-shared-claude
  mountPath: /home/agent/.claude/projects

When enabled, the volume and mount render on the single-mode StatefulSet, the split-mode gateway and worker Deployments, and the optional CronJob. With enabled: true and an empty pvcName, helm template and helm install fail with the error "sharedClaudeVolume.pvcName is required when sharedClaudeVolume.enabled is true", so the chart never produces a half-configured manifest.
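
Before enabling the volume on a live release, the wiring can be checked offline with helm template. A minimal sketch, assuming a checkout of the chart and an existing values file; the render_shared_volume helper name is hypothetical:

```shell
# Hypothetical helper: render the chart locally with the shared volume enabled.
#   $1 = release name
#   $2 = path to a values file
#   $3 = name of the pre-provisioned PVC
render_shared_volume() {
  helm template "$1" charts/rig-agent-runtime \
    --values "$2" \
    --set sharedClaudeVolume.enabled=true \
    --set sharedClaudeVolume.pvcName="$3"
}
```

Piping the output of render_shared_volume my-agent values.yaml rig-shared-claude through grep rig-shared-claude should show where the claim is referenced in the rendered manifests.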

Heartbeat Overview¶

When heartbeat.url and heartbeat.agentId are configured, rig-agent-runtime sends a HEARTBEAT payload to rig-conductor at every heartbeat interval.

The payload includes:

status, currentIssue, currentRepo
activeProvider
availableProviders
providers[] with health for each configured AI service
integrations[] with runtime integration status

The integration snapshot covers the things rig-conductor needs for operator visibility, including:

Discord connectivity
Conductor heartbeat target configuration
MCP server connectivity snapshot
Webhook configuration
GitHub auth configuration

rig-conductor projects this into /api/agents, which becomes the single overview of which agents are online, which AI service each agent is actively using, and which integrations are healthy or degraded.
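
To check what rig-conductor currently reports, the endpoint can be queried directly. A minimal sketch, assuming /api/agents answers a plain HTTP GET and that the conductor base URL (a placeholder here) is reachable from where you run it; the response schema is not described in this section, so the output is left unparsed:

```shell
# Hypothetical helper: fetch the agent overview from rig-conductor.
#   $1 = conductor base URL (placeholder, e.g. http://rig-conductor.example.internal)
list_agents() {
  # ${1%/} drops one trailing slash so the path is not doubled.
  curl --fail --silent --show-error "${1%/}/api/agents"
}
```

With --fail, curl exits non-zero on an HTTP error status, which makes the helper usable in a readiness check or a script.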

ArgoCD¶

ArgoCD syncs the Helm chart directly from the rig-agent-runtime repo:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rig-agent-runtime
  namespace: argocd
spec:
  project: default
  sources:
    - repoURL: https://github.com/Stig-Johnny/rig-agent-runtime.git
      targetRevision: HEAD
      path: charts/rig-agent-runtime
      helm:
        valueFiles:
          - $values/deploy/my-agent/values.yaml
    - repoURL: https://github.com/your-org/your-config-repo.git
      targetRevision: HEAD
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: rig-agent-runtime
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Image tags use git SHAs. CI builds push new images to GHCR and then open a chore-pin issue in this repo (labeled agent-ready), which rig-conductor routes to Dev-E. Dev-E then opens a PR in dashecorp/rig-gitops that updates the HelmReleases. See Deploy-pin pattern for the full flow and the rationale for using a self-issue instead of a cross-repo PAT.

Cloudflare Tunnel¶

The dashboard is exposed via Cloudflare Tunnel (no public LoadBalancer or Ingress). Enable it in values:

tunnel:
  enabled: true
  tokenSecretName: cloudflared-rig-agent-runtime-token
  hostname: my-agent.example.com

This deploys a cloudflared sidecar that routes traffic from the public hostname to the dashboard service inside the cluster.

Cron Mode (Scheduled Agents)¶

Add a CronJob alongside the Discord bot by setting cron.enabled: true. The CronJob runs run-once.js, which executes the prompt from character.cron.prompt and posts the results to a Discord webhook.

mode: single  # Discord bot as usual

cron:
  enabled: true
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3

# Extra env vars for MCP servers (e.g., GitHub token)
extraEnv:
  - name: GITHUB_PERSONAL_ACCESS_TOKEN
    valueFrom:
      secretKeyRef:
        name: my-agent-secrets
        key: github-token

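The CronJob shares the same character, secrets, and database as the Deployment. Add discord-webhook-url to your secret. Because kubectl create secret fails when the secret already exists, delete the existing secret first and recreate it with the full set of keys: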

kubectl create secret generic my-agent-secrets \
  -n rig-agent-runtime \
  --from-literal=discord-bot-token=<token> \
  --from-literal=anthropic-api-key=<key> \
  --from-literal=openai-api-key=<key> \
  --from-literal=discord-webhook-url=<webhook-url> \
  --from-literal=github-token=<pat>

You can also run cron-only (no Discord bot) by omitting the discord-bot-token.
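
To test the cron prompt without waiting for the next schedule tick, a one-off Job can be created from the CronJob. A minimal sketch; the CronJob name depends on how the chart names its resources, so it is passed in rather than assumed, and run_cron_now is a hypothetical helper:

```shell
# Hypothetical helper: start a one-off Job from an existing CronJob.
#   $1 = CronJob name (look it up with: kubectl get cronjobs -n <namespace>)
#   $2 = name for the new Job
#   $3 = namespace (optional, defaults to rig-agent-runtime)
run_cron_now() {
  kubectl create job "$2" \
    --from="cronjob/$1" \
    --namespace "${3:-rig-agent-runtime}"
}
```

After run_cron_now <cronjob-name> manual-run-1, the output can be followed with kubectl logs job/manual-run-1 -n rig-agent-runtime --follow.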

For Codex device-auth in cron / one-shot mode, that webhook is also where operator alerts go:

login required
login already in progress
cooldown / rate-limit
login complete
selected provider failed

Adding a New Agent¶

Create a new character.json for the agent
Create a new values file referencing a new existingSecret
Deploy as a separate Helm release in the same or different namespace

Each agent runs independently with its own Discord connection, memory, and optional worker pool.

Resource Guidelines¶

Agent Load	CPU Request	Memory Request	CPU Limit	Memory Limit
Low (< 10 messages/day)	50m	64Mi	500m	256Mi
Medium (10-100 messages/day)	100m	128Mi	1000m	512Mi
High (> 100 messages/day)	200m	256Mi	2000m	1Gi

Agents are I/O-bound: most of their time is spent waiting on HTTP calls to the Claude API (network I/O, not compute), so CPU requests can stay low.
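
The tiers in the table translate directly into the chart's resources values. A minimal sketch that turns a tier name into helm --set flags; the resource_flags helper and the low/medium/high tier names are hypothetical conveniences, while the numbers are copied from the table:

```shell
# Hypothetical helper: print helm --set flags for one row of the table above.
#   $1 = tier name: low, medium, or high
resource_flags() {
  case "$1" in
    low)    set -- 50m  64Mi  500m  256Mi ;;
    medium) set -- 100m 128Mi 1000m 512Mi ;;
    high)   set -- 200m 256Mi 2000m 1Gi ;;
    *)      echo "unknown tier: $1" >&2; return 1 ;;
  esac
  # After "set --": $1 = CPU request, $2 = memory request, $3 = CPU limit, $4 = memory limit.
  printf '%s\n' "--set resources.requests.cpu=$1 --set resources.requests.memory=$2 --set resources.limits.cpu=$3 --set resources.limits.memory=$4"
}
```

The output is meant to be appended unquoted to a helm command so that it splits into separate arguments, for example helm upgrade --install my-agent charts/rig-agent-runtime -n rig-agent-runtime -f values.yaml $(resource_flags medium).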