Heartbeat
The heartbeat module (src/heartbeat.js) periodically POSTs agent status to rig-conductor, providing visibility into agent health, workload, and error rates.
Configuration
Add a heartbeat block to character.json:
{
  "heartbeat": {
    "url": "http://rig-conductor-api:8080/api/events",
    "intervalSeconds": 60,
    "agentId": "dev-e"
  }
}
| Field | Required | Description |
|---|---|---|
| url | Yes | Endpoint to POST heartbeat payloads |
| agentId | Yes | Stable agent identifier |
| intervalSeconds | No | Interval between beats, in seconds (default: 60) |
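The required and optional fields above can be illustrated with a small sketch. `resolveHeartbeatConfig` is a hypothetical helper invented for this example; it is not documented as part of src/heartbeat.js, and the real module may validate the block differently.

```javascript
// Hypothetical helper: validates the "heartbeat" block of character.json.
// It only mirrors the table above; the real module may behave differently.
function resolveHeartbeatConfig(character) {
  const hb = (character && character.heartbeat) || {};

  // url and agentId are required.
  for (const field of ["url", "agentId"]) {
    if (typeof hb[field] !== "string" || hb[field].length === 0) {
      throw new Error(`heartbeat.${field} is required`);
    }
  }

  // intervalSeconds is optional and defaults to 60.
  const intervalSeconds = hb.intervalSeconds ?? 60;
  if (!Number.isFinite(intervalSeconds) || intervalSeconds <= 0) {
    throw new Error("heartbeat.intervalSeconds must be a positive number");
  }

  return { url: hb.url, agentId: hb.agentId, intervalSeconds };
}

const config = resolveHeartbeatConfig({
  heartbeat: { url: "http://rig-conductor-api:8080/api/events", agentId: "dev-e" },
});
console.log(config.intervalSeconds); // 60
```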
Payload
Each heartbeat POST sends:
{
  "type": "HEARTBEAT",
  "agentId": "dev-e",
  "status": "idle",
  "currentIssue": null,
  "currentRepo": null,
  "uptimeSeconds": 3600,
  "memoryMB": 128,
  "activeProvider": "claude-cli",
  "availableProviders": ["claude-cli"],
  "providers": [...],
  "integrations": [...],
  "errorCount": 0,
  "startup": "Startup: 1.2s",
  "degraded": false,
  "degradedReasons": []
}
| Field | Description |
|---|---|
| status | Current agent status (idle, working, etc.) |
| errorCount | Unhandled errors since last successful work completion |
| uptimeSeconds | Process uptime in seconds |
| memoryMB | RSS memory usage in MB |
| activeProvider | Currently active LLM provider |
| availableProviders | All configured providers |
| providers | Provider health details (auth status, active flag) |
| integrations | Integration health (Discord, Conductor, MCP, GitHub, webhooks) |
| startup | First beat only: time from module load to first heartbeat |
| degraded | Rolled-up alive-but-degraded flag (rar#482 / rc#959 gap #2). true when the pod is processing fine but its active provider is in a known-degraded state (e.g., codex 429 quota lockout). |
| degradedReasons | Array of specific reasons. Format: <provider>_<condition>_until_<iso> for time-bound conditions (e.g., codex_quota_exhausted_until_2026-05-18T15:00:00.000Z). Empty array when degraded=false. |
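A consumer of the payload may want to split a degradedReasons entry back into its parts. The sketch below is one way to do that; `parseDegradedReason` is a hypothetical name, and the code assumes that provider names contain no underscore and that reasons without an `_until_` suffix are not time-bound. Neither assumption is confirmed by the format description above.

```javascript
// Hypothetical parser for entries shaped like <provider>_<condition>_until_<iso>.
// Assumes the provider name itself contains no underscore.
function parseDegradedReason(reason) {
  const marker = "_until_";
  const idx = reason.lastIndexOf(marker);

  // Everything before "_until_" is "<provider>_<condition>".
  const head = idx === -1 ? reason : reason.slice(0, idx);
  const untilRaw = idx === -1 ? null : reason.slice(idx + marker.length);

  const sep = head.indexOf("_");
  const provider = sep === -1 ? head : head.slice(0, sep);
  const condition = sep === -1 ? null : head.slice(sep + 1);

  // Treat an unparseable timestamp as "no known end time".
  const parsedDate = untilRaw ? new Date(untilRaw) : null;
  const until = parsedDate && !Number.isNaN(parsedDate.getTime()) ? parsedDate : null;

  return { provider, condition, until };
}

const parsed = parseDegradedReason("codex_quota_exhausted_until_2026-05-18T15:00:00.000Z");
console.log(parsed.provider);            // codex
console.log(parsed.condition);           // quota_exhausted
console.log(parsed.until.toISOString()); // 2026-05-18T15:00:00.000Z
```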
Error Tracking
errorCount tracks unhandled errors per work cycle:
- Incremented when agent.process() throws in any runtime
- Reset to 0 after each successful work completion
- Persists across heartbeat intervals; rig-conductor sees the running total until reset
Work succeeds → errorCount = 0
Work fails → errorCount++ (now 1)
Work fails → errorCount++ (now 2)
Work succeeds → errorCount = 0
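The sequence above can be sketched as a wrapper around a work cycle. `createErrorCounter` and `runWorkCycle` are hypothetical names used only for illustration, and the stub agent stands in for the real one; how each runtime actually wires agent.process() to the counter is not shown in this document.

```javascript
// Stand-in for the heartbeat controller; only the two error methods are modeled.
function createErrorCounter() {
  let errorCount = 0;
  return {
    incrementError() { errorCount += 1; },
    resetErrors() { errorCount = 0; },
    get errorCount() { return errorCount; },
  };
}

// Runs one work cycle and updates the counter as the list above describes.
async function runWorkCycle(agent, counter, message) {
  try {
    const result = await agent.process(message);
    counter.resetErrors(); // success clears the running total
    return result;
  } catch (err) {
    counter.incrementError(); // failure adds one; the total persists until reset
    throw err;
  }
}

// Replays the succeed, fail, fail, succeed sequence with a stub agent.
async function demo() {
  const counter = createErrorCounter();
  const agent = {
    async process(ok) {
      if (!ok) throw new Error("work failed");
      return "done";
    },
  };
  const seen = [];
  for (const ok of [true, false, false, true]) {
    await runWorkCycle(agent, counter, ok).catch(() => {});
    seen.push(counter.errorCount);
  }
  return seen; // [0, 1, 2, 0]
}

demo().then((seen) => console.log(seen));
```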
API
startHeartbeat(character, options?) returns a controller:
| Method | Description |
|---|---|
| setStatus(s, issue, repo) | Update status, current issue and repo |
| incrementError() | Increment errorCount by 1 |
| resetErrors() | Reset errorCount to 0 |
| stop() | Stop the heartbeat timer |
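To show how a controller with these four methods might fit together, here is a minimal stand-in, not the actual src/heartbeat.js implementation. `startHeartbeatSketch` and its `send` option are invented for this example so that the sketch runs without a network; the real module POSTs to the configured url, and its options are not described here. The issue and repo values in the usage lines are placeholders.

```javascript
// Hypothetical stand-in for startHeartbeat. The `send` option is an assumption
// of this sketch; it replaces the real HTTP POST so the example runs offline.
function startHeartbeatSketch(character, options = {}) {
  const { agentId, intervalSeconds = 60 } = character.heartbeat;
  const send = options.send ?? ((payload) => console.log(JSON.stringify(payload)));
  const state = { status: "idle", currentIssue: null, currentRepo: null, errorCount: 0 };

  // Builds a snapshot of the current state plus process metrics.
  const beat = () => send({
    type: "HEARTBEAT",
    agentId,
    ...state,
    uptimeSeconds: Math.floor(process.uptime()),
    memoryMB: Math.round(process.memoryUsage().rss / (1024 * 1024)),
  });

  beat(); // first beat fires immediately, then on every interval
  const timer = setInterval(beat, intervalSeconds * 1000);

  return {
    setStatus(status, issue = null, repo = null) {
      Object.assign(state, { status, currentIssue: issue, currentRepo: repo });
    },
    incrementError() { state.errorCount += 1; },
    resetErrors() { state.errorCount = 0; },
    stop() { clearInterval(timer); },
  };
}

const sent = [];
const hb = startHeartbeatSketch(
  { heartbeat: { url: "http://rig-conductor-api:8080/api/events", agentId: "dev-e" } },
  { send: (payload) => sent.push(payload) },
);
hb.setStatus("working", "issue-1", "example/repo"); // placeholder issue and repo
hb.incrementError();
hb.stop(); // clears the timer so the process can exit
console.log(sent[0].status); // idle
```

Because each beat copies the state at send time, later calls to setStatus or incrementError only show up in the next beat.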
Runtime Support
Heartbeat is active in all three runtimes:
| Runtime | File | Heartbeat | Error tracking |
|---|---|---|---|
| Single-process | src/index.js | ✅ | ✅ |
| Messaging v2 | src/index-v2.js | ✅ | ✅ |
| Worker | src/worker.js | ✅ | ✅ |
TDZ Safety
messagesProcessed is declared before any await in src/index.js (before
createMemory() and connectMcpServers()). This prevents a temporal dead zone
(TDZ) ReferenceError on the first heartbeat tick, which fires synchronously
inside startHeartbeat before the module's async initialization suspends (#313).
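The hazard can be reproduced in isolation. In the sketch below, `startWithImmediateTick` is a hypothetical stand-in for a function that invokes its tick callback synchronously; it is not the real startHeartbeat. Reading a `let` binding from that callback before its declaration has run throws a ReferenceError, while declaring it first does not.

```javascript
// Hypothetical stand-in: invokes the tick callback synchronously, like a first beat.
function startWithImmediateTick(onTick) {
  onTick();
}

function brokenInit() {
  // The callback runs before the `let` below is evaluated,
  // so the binding is still in its temporal dead zone.
  startWithImmediateTick(() => messagesProcessed);
  let messagesProcessed = 0;
  return messagesProcessed;
}

function safeInit() {
  let messagesProcessed = 0; // declared before anything can read it
  let seen;
  startWithImmediateTick(() => { seen = messagesProcessed; });
  return seen;
}

let tdzError = null;
try {
  brokenInit();
} catch (err) {
  tdzError = err;
}
console.log(tdzError instanceof ReferenceError); // true
console.log(safeInit()); // 0
```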