Heartbeat

The heartbeat module (src/heartbeat.js) periodically POSTs agent status to rig-conductor, providing visibility into agent health, workload, and error rates.

Configuration

Add a heartbeat block to character.json:

{
  "heartbeat": {
    "url": "http://rig-conductor-api:8080/api/events",
    "intervalSeconds": 60,
    "agentId": "dev-e"
  }
}

| Field | Required | Description |
| --- | --- | --- |
| url | Yes | Endpoint to POST heartbeat payloads |
| agentId | Yes | Stable agent identifier |
| intervalSeconds | No | Interval between beats in seconds (default: 60) |
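
The rules in the table above can be sketched as a small validation step. This is an illustration only: `resolveHeartbeatConfig` is a hypothetical helper, not a function exported by src/heartbeat.js, and treating a missing heartbeat block as "heartbeat disabled" is an assumption.

```javascript
// Hypothetical helper; the real module may validate its config differently.
const DEFAULT_INTERVAL_SECONDS = 60; // documented default

function resolveHeartbeatConfig(character) {
  const hb = character && character.heartbeat;
  // Assumption: no heartbeat block means the heartbeat is simply not started.
  if (!hb) return null;

  // url and agentId are the two required fields.
  for (const field of ["url", "agentId"]) {
    if (typeof hb[field] !== "string" || hb[field].length === 0) {
      throw new Error(`heartbeat.${field} is required`);
    }
  }

  // intervalSeconds is optional and falls back to the default.
  const intervalSeconds = hb.intervalSeconds ?? DEFAULT_INTERVAL_SECONDS;
  if (!Number.isFinite(intervalSeconds) || intervalSeconds <= 0) {
    throw new Error("heartbeat.intervalSeconds must be a positive number");
  }

  return { url: hb.url, agentId: hb.agentId, intervalSeconds };
}
```

Passing the parsed contents of character.json returns the three fields with the default interval filled in, or throws if a required field is missing.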

Payload

Each heartbeat POST sends:

{
  "type": "HEARTBEAT",
  "agentId": "dev-e",
  "status": "idle",
  "currentIssue": null,
  "currentRepo": null,
  "uptimeSeconds": 3600,
  "memoryMB": 128,
  "activeProvider": "claude-cli",
  "availableProviders": ["claude-cli"],
  "providers": [...],
  "integrations": [...],
  "errorCount": 0,
  "startup": "Startup: 1.2s",
  "degraded": false,
  "degradedReasons": []
}

| Field | Description |
| --- | --- |
| status | Current agent status (idle, working, etc.) |
| errorCount | Unhandled errors since last successful work completion |
| uptimeSeconds | Process uptime in seconds |
| memoryMB | RSS memory usage in MB |
| activeProvider | Currently active LLM provider |
| availableProviders | All configured providers |
| providers | Provider health details (auth status, active flag) |
| integrations | Integration health (Discord, Conductor, MCP, GitHub, webhooks) |
| startup | First beat only: time from module load to first heartbeat |
| degraded | Rolled-up alive-but-degraded flag (rar#482 / rc#959 gap #2). true when the pod is processing fine but its active provider is in a known-degraded state (e.g., codex 429 quota lockout). |
| degradedReasons | Array of specific reasons. Format: <provider>_<condition>_until_<iso> for time-bound conditions (e.g., codex_quota_exhausted_until_2026-05-18T15:00:00.000Z). Empty array when degraded=false. |
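
A consumer may want to split a time-bound degradedReasons entry back into its parts. The sketch below is one possible way to do that. Both function names are hypothetical, and it assumes the provider name contains no underscore (true for codex, but not guaranteed by the format).

```javascript
// Hypothetical helpers for the <provider>_<condition>_until_<iso> format.
function formatDegradedReason(provider, condition, until) {
  // Date#toISOString always yields UTC with millisecond precision.
  return `${provider}_${condition}_until_${until.toISOString()}`;
}

function parseDegradedReason(reason) {
  const marker = "_until_";
  const at = reason.lastIndexOf(marker);
  if (at === -1) return { raw: reason, until: null }; // not time-bound

  const until = new Date(reason.slice(at + marker.length));
  if (Number.isNaN(until.getTime())) return { raw: reason, until: null };

  // Assumption: the provider is everything before the first underscore.
  const head = reason.slice(0, at);
  const sep = head.indexOf("_");
  return {
    raw: reason,
    provider: sep === -1 ? head : head.slice(0, sep),
    condition: sep === -1 ? "" : head.slice(sep + 1),
    until,
  };
}
```

Entries that do not match the time-bound format are returned with `until: null`, so a caller can still display the raw string.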

Error Tracking

errorCount tracks unhandled errors per work cycle:

  • Incremented when agent.process() throws in any runtime
  • Reset to 0 after each successful work completion
  • Persists across heartbeat intervals — rig-conductor sees the running total until reset
Work succeeds → errorCount = 0
Work fails    → errorCount++ (now 1)
Work fails    → errorCount++ (now 2)
Work succeeds → errorCount = 0
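
The counting rules above can be modelled with a small in-memory counter. This is a sketch of the described behaviour, not the module's code: `createErrorCounter` and `trackedProcess` are hypothetical names, and `work` stands in for a call to agent.process().

```javascript
// Hypothetical model of the errorCount rules described above.
function createErrorCounter() {
  let errorCount = 0;
  return {
    incrementError() { errorCount += 1; },
    resetErrors() { errorCount = 0; },
    get errorCount() { return errorCount; },
  };
}

// Runs one unit of work and applies the reset/increment rules.
async function trackedProcess(counter, work) {
  try {
    const result = await work();
    counter.resetErrors(); // success clears the running total
    return result;
  } catch (err) {
    counter.incrementError(); // failure adds one and is rethrown
    throw err;
  }
}
```

Because the counter lives outside the heartbeat timer, every beat sent between two work cycles reports the same running total.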

API

startHeartbeat(character, options?) returns a controller:

| Method | Description |
| --- | --- |
| setStatus(s, issue, repo) | Update status, current issue, and current repo |
| incrementError() | Increment errorCount by 1 |
| resetErrors() | Reset errorCount to 0 |
| stop() | Stop the heartbeat timer |
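
One way a runtime might use the controller is to bracket each unit of work with setStatus calls. The wrapper below is a hedged sketch: `withWorkingStatus` is a hypothetical name, `controller` is assumed to be the object returned by startHeartbeat(character), and resetting issue and repo to null when idle mirrors the sample payload rather than any documented requirement.

```javascript
// Hypothetical wrapper around a controller returned by startHeartbeat().
async function withWorkingStatus(controller, issue, repo, work) {
  // Report what the agent is about to work on.
  controller.setStatus("working", issue, repo);
  try {
    return await work();
  } finally {
    // Always return to idle, whether the work succeeded or threw.
    controller.setStatus("idle", null, null);
  }
}
```

Using `finally` keeps the reported status from sticking at working if the work throws. On shutdown the runtime would call controller.stop() to clear the timer.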

Runtime Support

Heartbeat is active in all three runtimes:

| Runtime | File | Heartbeat | Error tracking |
| --- | --- | --- | --- |
| Single-process | src/index.js | Yes | Yes |
| Messaging v2 | src/index-v2.js | Yes | Yes |
| Worker | src/worker.js | Yes | Yes |

TDZ Safety

messagesProcessed is declared before any await in src/index.js (before createMemory() and connectMcpServers()). The first heartbeat tick fires synchronously inside startHeartbeat, before the module's async initialization suspends, so declaring the counter early keeps that tick from reading it inside its temporal dead zone (TDZ), which would throw a ReferenceError (#313).
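
The hazard can be reproduced in isolation. In the sketch below, `startTicker` is a hypothetical stand-in for startHeartbeat that runs its first tick synchronously. The real module's internals are not shown in this document, so treat this as an illustration of the language rule only.

```javascript
// Hypothetical stand-in: the first tick runs synchronously, like the
// first heartbeat tick described above.
function startTicker(getStats) {
  getStats();
}

function unsafeInit() {
  // The callback reads messagesProcessed before its `let` has executed,
  // so the binding is still in its temporal dead zone.
  startTicker(() => messagesProcessed);
  let messagesProcessed = 0;
  return messagesProcessed;
}

function safeInit() {
  // Declaring first means the binding is initialized when the tick runs.
  let messagesProcessed = 0;
  startTicker(() => messagesProcessed);
  return messagesProcessed;
}
```

Calling `unsafeInit()` throws a ReferenceError, while `safeInit()` completes normally. Only the order of the declaration differs.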