API Reference

Base URL: http://rig-conductor-api:8080 (internal ClusterIP)

Endpoints

Health

GET /health
{"status": "healthy", "timestamp": "2026-04-02T06:56:08Z"}

Deep Health (rc#1188)

GET /healthz/deep

Returns the status of every dependent system the conductor needs to function. Critical deps (Valkey, Marten, GitHub API) drive the readiness verdict; non-critical deps (Discord) can soft-degrade overall but cannot trip the 503.

Status-code mapping:

| Overall | HTTP | Meaning |
|---|---|---|
| Ok | 200 | All critical deps reachable; non-critical deps Ok |
| Degraded | 200 | Reachable but slow, rate-limited, or a non-critical dep is non-Ok; pod stays in the load balancer |
| Unreachable | 503 | At least one critical dep is unreachable; readiness probe should pull the pod |
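
The mapping above can be sketched as two small functions. This is an illustrative reading of the stated rules, not the conductor's actual code; the function names `overall_status` and `http_status` are hypothetical, and the dependency dictionaries are assumed to carry the `status` and `critical` fields shown in the populated response.

```python
from typing import Iterable, Mapping


def overall_status(dependencies: Iterable[Mapping]) -> str:
    """Fold per-dependency results into one verdict (assumed rule, see lead-in)."""
    deps = list(dependencies)
    # Only a critical dependency can make the verdict Unreachable.
    if any(d["critical"] and d["status"] == "Unreachable" for d in deps):
        return "Unreachable"
    # Any other non-Ok result, critical or not, soft-degrades the verdict.
    if any(d["status"] != "Ok" for d in deps):
        return "Degraded"
    return "Ok"


def http_status(overall: str) -> int:
    """Only Unreachable maps to 503; Ok and Degraded both return 200."""
    return 503 if overall == "Unreachable" else 200
```

With an empty dependency list the sketch returns Ok, which is consistent with the PR-B1 baseline response.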

Post-merge baseline (PR-B1 only):

{
  "overall": "Ok",
  "dependencies": [],
  "checkedAt": "2026-05-19T13:42:00Z"
}

Populated response (after PR-B2 wires production checkers):

{
  "overall": "Degraded",
  "dependencies": [
    { "name": "valkey",  "status": "Ok",         "critical": true,  "reason": null,         "latencyMs": 3,    "lastCheckAt": "..." },
    { "name": "marten",  "status": "Ok",         "critical": true,  "reason": null,         "latencyMs": 12,   "lastCheckAt": "..." },
    { "name": "github",  "status": "Degraded",   "critical": true,  "reason": "http 429",   "latencyMs": 88,   "lastCheckAt": "..." },
    { "name": "discord", "status": "Ok",         "critical": false, "reason": null,         "latencyMs": 41,   "lastCheckAt": "..." }
  ],
  "checkedAt": "2026-05-19T13:42:00Z"
}

reason is null when status is Ok and a short string otherwise ("high ping latency", "http 429", "ping failed: RedisConnectionException", etc.). latencyMs is the per-checker round-trip time. The orchestrator runs checkers in parallel with a hard 2-second timeout per checker; a checker that times out is reported with status "Degraded" and reason "timeout".
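
The parallel-with-timeout behaviour could look roughly like the following asyncio sketch. It is not the conductor's orchestrator (which is a .NET service); `run_one`, `run_all`, and the two stand-in checkers are hypothetical names used only to illustrate how a per-checker timeout can be turned into a Degraded result.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List

Checker = Callable[[], Awaitable[dict]]


async def run_one(name: str, checker: Checker, timeout_s: float) -> dict:
    """Run a single checker, converting a timeout into a Degraded result."""
    started = time.monotonic()
    try:
        result = await asyncio.wait_for(checker(), timeout=timeout_s)
    except asyncio.TimeoutError:
        result = {"status": "Degraded", "reason": "timeout"}
    latency_ms = int((time.monotonic() - started) * 1000)
    return {"name": name, **result, "latencyMs": latency_ms}


async def run_all(checkers: Dict[str, Checker], timeout_s: float = 2.0) -> List[dict]:
    """Run every checker concurrently; results keep the input order."""
    return await asyncio.gather(
        *(run_one(name, check, timeout_s) for name, check in checkers.items())
    )


# Stand-in checkers for demonstration only.
async def fast_checker() -> dict:
    return {"status": "Ok", "reason": None}


async def hung_checker() -> dict:
    await asyncio.sleep(10)  # never finishes within the timeout
    return {"status": "Ok", "reason": None}
```

Because the checkers run concurrently, the total wall-clock time is bounded by the slowest checker (at most the timeout), not by the sum of all checkers.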

Rollout slices:

  • PR-B1 (this slice): port + orchestrator + /healthz/deep endpoint; empty deps array
  • PR-B2 (rc#1204): four production checkers (Valkey / Marten / GitHub API / Discord) + DI
  • PR-D: Kubernetes readiness + liveness probes switched from /health to /healthz/deep + decision doc

See rc#1188 for the overall design and rc#1173 for the Valkey silent-degrade incident that motivated it.

Submit Event

POST /api/events

Submit any rig event. The stream ID is derived from repo#issueNumber for issue events, or agentId for heartbeats.

Request:

{
  "type": "ISSUE_APPROVED",
  "repo": "dashecorp/rig-conductor",
  "issueNumber": 42,
  "title": "feat: Add health check endpoint",
  "priority": "normal",
  "dependsOn": []
}

Response (200):

{
  "streamId": "dashecorp/rig-conductor#42",
  "type": "ISSUE_APPROVED",
  "timestamp": "2026-04-02T06:56:22Z"
}

Error (400):

{"error": "Unknown event type: INVALID"}

See Event Store for all event types and their fields.
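
The stream ID rule described in this section can be sketched as follows. The function name `derive_stream_id` is hypothetical, and deciding by field presence is an assumption: the source does not say how the conductor classifies events that carry both an agentId and a repo/issueNumber pair.

```python
def derive_stream_id(event: dict) -> str:
    """Derive a stream ID from an event payload (assumed field-based rule)."""
    # Issue events carry repo + issueNumber and map to "repo#issueNumber".
    if "repo" in event and "issueNumber" in event:
        return f'{event["repo"]}#{event["issueNumber"]}'
    # Heartbeats are keyed by the agent that sent them.
    if "agentId" in event:
        return event["agentId"]
    raise ValueError("event has neither repo/issueNumber nor agentId")
```

For the request shown earlier in this section, the sketch yields the same streamId as the documented response.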

Get Issues

GET /api/issues
GET /api/issues?state=queued

Returns all tracked issues, optionally filtered by state.

States: queued, assigned, in_progress, in_review, deploying, done, failed

Get Issue Trace (rc#951)

GET /api/issues/trace?repo={repo}&issueNumber={n}

Returns the ordered Marten event history for one issue stream, plus the current IssueStatus.State from the projection. Events are sorted by Marten's global Sequence, which is authoritative; timestamps are not used for ordering because they can collide or arrive out of order under clock skew.

Returns 404 Not Found when the stream has zero events (the issue never landed in conductor).

{
  "issueId": "dashecorp/rig-conductor#244",
  "currentState": "in_progress",
  "eventCount": 28,
  "events": [
    {"at": "2026-05-14T07:26:17Z", "type": "work_started", "data": {...}, "sequence": 364467},
    {"at": "2026-05-14T07:26:46Z", "type": "merged",       "data": {...}, "sequence": 364470}
  ]
}

Replaces the prior psql-into-Marten + log-tail workflow (~10 min per triage cycle). Also feeds the rc#947 SelfImprovementService a stable per-issue event-sequence API.
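
A client that re-sorts or merges trace events should key on sequence as well. The sketch below uses made-up sample data with two events sharing one timestamp; `in_sequence_order` is a hypothetical helper name.

```python
def in_sequence_order(events: list) -> list:
    """Order events by Marten's global sequence, ignoring timestamps."""
    return sorted(events, key=lambda e: e["sequence"])


# Two events with an identical timestamp (made-up data): "at" cannot
# tell them apart, while "sequence" still gives a total order.
events = [
    {"at": "2026-05-14T07:26:17Z", "type": "merged", "sequence": 364470},
    {"at": "2026-05-14T07:26:17Z", "type": "work_started", "sequence": 364467},
]
ordered = in_sequence_order(events)
print([e["type"] for e in ordered])  # ['work_started', 'merged']
```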

Get Priority Queue

GET /api/queue

Returns unassigned issues sorted by priority (critical > high > normal > oldest).
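
One plausible reading of this ordering is "priority first, then oldest first within the same priority". The sketch below implements that reading; it is an assumption, not a confirmed description of the endpoint. The `queuedAt` field and the `queue_order` name are hypothetical and do not appear in the documented responses.

```python
# Lower rank sorts first.
PRIORITY_RANK = {"critical": 0, "high": 1, "normal": 2}


def queue_order(issues: list) -> list:
    """Sort by priority rank, then by age (oldest first) as a tiebreaker."""
    # ISO-8601 UTC timestamps in the same format sort correctly as strings.
    return sorted(issues, key=lambda i: (PRIORITY_RANK[i["priority"]], i["queuedAt"]))
```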

Get Next Assignment

GET /api/assignments/next?agentId=dev-e-1

Returns the top-priority unassigned issue, or 204 No Content if nothing available.

Response (200):

{
  "streamId": "dashecorp/rig-conductor#42",
  "issue": {
    "number": 42,
    "repo": "dashecorp/rig-conductor",
    "title": "feat: Add health check endpoint",
    "milestone": null
  },
  "priority": "normal",
  "attempt": 1
}
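
An agent polling this endpoint has to treat 204 as "nothing to do" rather than as an error. A minimal client sketch, assuming the base URL from the top of this page; `parse_next_assignment` and `fetch_next_assignment` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def parse_next_assignment(status: int, body: bytes) -> Optional[dict]:
    """Return the assignment for a 200, or None for a 204 (nothing to do)."""
    if status == 204:
        return None
    if status == 200:
        return json.loads(body)
    raise RuntimeError(f"unexpected status {status}")


def fetch_next_assignment(base_url: str, agent_id: str) -> Optional[dict]:
    """Call the endpoint once; needs network access to the conductor."""
    query = urllib.parse.urlencode({"agentId": agent_id})
    url = f"{base_url}/api/assignments/next?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_next_assignment(resp.status, resp.read())
```

Keeping the status handling in a separate pure function makes it testable without a running conductor.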

Get Agent Status

GET /api/agents

Returns status of all known agents. The status and currentIssue/currentRepo fields are updated by both heartbeat events and assignment events (ISSUE_ASSIGNED, WORK_STARTED) so the dashboard reflects live work immediately without waiting for a heartbeat cycle.

Each item includes:

  • isOnline — computed from recent heartbeat freshness (last heartbeat within 2 minutes)
  • status: idle, working, or stuck. Set to working when ISSUE_ASSIGNED or WORK_STARTED is dispatched; cleared to idle on ISSUE_DONE, ISSUE_CANCELLED, or next idle heartbeat.
  • currentIssue — issue number the agent is currently working on (null if idle)
  • currentRepo — repo of the current issue (null if idle)
  • activeProvider
  • availableProviders
  • providers[] — provider health snapshot
  • integrations[] — integration health snapshot
  • lastModel — last LLM model the agent reported via TOKEN_USAGE (e.g. claude-opus-4-7). null until the agent emits its first TOKEN_USAGE event after deploy. Added in #530 (rc#529) so the dashboard can show which agents have switched models after a config change.
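
The isOnline freshness rule can be illustrated with a small helper. This is a sketch of the stated 2-minute window, not the server's code; `is_online` and `parse_ts` are hypothetical names, and treating exactly 2 minutes as still online is an assumption.

```python
from datetime import datetime, timedelta, timezone

ONLINE_WINDOW = timedelta(minutes=2)


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as 2026-04-02T09:14:22Z."""
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_online(last_heartbeat: str, now: datetime) -> bool:
    """True when the last heartbeat is no older than the online window."""
    return now - parse_ts(last_heartbeat) <= ONLINE_WINDOW
```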

Response:

[
  {
    "id": "dev-e-1",
    "status": "working",
    "isOnline": true,
    "currentIssue": 136,
    "currentRepo": "dashecorp/rig-conductor",
    "activeProvider": "claude",
    "availableProviders": ["claude", "codex"],
    "providers": [
      { "name": "claude", "status": "ready", "details": "Claude Code auth configured", "active": true },
      { "name": "codex", "status": "authenticated", "details": "logged in using ChatGPT", "active": false }
    ],
    "integrations": [
      { "name": "github", "status": "ready", "details": "GitHub App configured" },
      { "name": "discord-webhook", "status": "ready", "details": "operator webhook configured" }
    ],
    "lastModel": "claude-opus-4-7",
    "lastHeartbeat": "2026-04-02T09:14:22Z",
    "issuesCompleted": 12,
    "issuesFailed": 1
  }
]

lastModel is null for agents that have not yet emitted a TOKEN_USAGE event after deploy.

Get Execution Logs

GET /api/execution-logs?limit=50&status=running

Returns recent execution log summaries (without full step/log payload). Used by the dashboard overview.

Query parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | int | 50 | Max results to return |
| status | string | (all) | Filter by status. Must be one of: running, completed, failed, stuck. Case-insensitive: RUNNING, Running, and running are all accepted and treated identically. |

Status filter validation: If status is provided and is not one of the allowed values, the endpoint returns 400 Bad Request:

{"error": "Invalid status 'xyz'. Allowed values: running, completed, failed, stuck."}

Note: cancelled is not a valid status — the domain uses stuck to represent issues that have stalled. Sending ?status=cancelled returns 400.
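
The validation rules above can be mirrored client-side to avoid a round trip. The sketch below reproduces the documented error message; `validate_status` is a hypothetical name and the (status code, value) return shape is an illustration choice.

```python
ALLOWED_STATUSES = ("running", "completed", "failed", "stuck")


def validate_status(raw):
    """Return (http_status, normalized_status_or_error_body)."""
    if raw is None:
        return 200, None  # no filter: all statuses
    normalized = raw.lower()  # the filter is case-insensitive
    if normalized in ALLOWED_STATUSES:
        return 200, normalized
    allowed = ", ".join(ALLOWED_STATUSES)
    return 400, {"error": f"Invalid status '{raw}'. Allowed values: {allowed}."}
```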

Response (200 OK):

[
  {
    "id": "uuid",
    "repo": "dashecorp/rig-conductor",
    "issueNumber": 42,
    "prNumber": 99,
    "agentId": "dev-e-1",
    "status": "completed",
    "startedAt": "2026-04-23T10:00:00Z",
    "completedAt": "2026-04-23T10:12:00Z",
    "durationSeconds": 720,
    "totalCostUsd": 0.18,
    "totalTurns": 24,
    "model": "claude-sonnet-4-5",
    "stepCount": 7,
    "logCount": 130
  }
]

Get Event Stream

GET /api/events/stream?id=dashecorp/rig-conductor%2342

Returns all events for a specific stream. The id parameter must be URL-encoded (use %23 for #).
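
In Python, `urllib.parse.quote` handles the encoding. A minimal sketch, where `stream_url` is a hypothetical helper name:

```python
from urllib.parse import quote


def stream_url(base_url: str, stream_id: str) -> str:
    """Build the event-stream URL, percent-encoding '#' as %23."""
    # safe="/" keeps the slash in the repo slug readable, as in the example URL.
    return f"{base_url}/api/events/stream?id={quote(stream_id, safe='/')}"
```

An unencoded # would be treated as the start of the URL fragment and never reach the server, which is why the encoding is mandatory.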

Response:

[
  {
    "id": "uuid",
    "type": "IssueApproved",
    "data": { "repo": "...", "issueNumber": 42, "title": "..." },
    "timestamp": "2026-04-02T06:56:22Z"
  }
]

Token Usage

GET /api/usage
GET /api/usage?agentId=dev-e-node
GET /api/usage?repo=dashecorp%2Frig-conductor
GET /api/usage?days=7

Returns per-agent token usage totals including cache token counts.

Query parameters:

| Param | Default | Description |
|---|---|---|
| agentId | (all) | Filter to a single agent |
| repo | (all repos) | Filter to a specific repo |
| days | (all time) | Rolling window in days. Required when comparing with /api/costs/summary; without it the endpoint returns all-time projection totals, which are naturally larger than any windowed cost summary. |

Response:

[
  {
    "agentId": "dev-e-node",
    "totalInputTokens": 1200000,
    "totalOutputTokens": 480000,
    "totalCacheReadTokens": 5600000,
    "totalCacheCreationTokens": 920000,
    "totalCostUsd": 39.591,
    "byRepo": [
      {
        "repo": "dashecorp/rig-conductor",
        "inputTokens": 800000,
        "outputTokens": 300000,
        "cacheReadTokens": 3200000,
        "cacheCreationTokens": 600000,
        "costUsd": 25.4
      }
    ]
  }
]

Cost Summary

GET /api/costs/summary
GET /api/costs/summary?days=7

Returns agent-level cost breakdown by category for the rolling window. Uses the same raw-event source as /api/usage?days=N, so both endpoints agree for the same window.

Cost computation (fixed in #148):

  • Events with zero tokens (inputTokens=outputTokens=cacheReadTokens=cacheCreationTokens=0) contribute $0, regardless of any costUsd value in the event. This prevents phantom idle costs.
  • Events with cache tokens (cacheReadTokens > 0 || cacheCreationTokens > 0) are recomputed using the Anthropic pricing table in AnthropicPricing rather than trusting the agent-reported costUsd, which historically excluded cache costs.
  • Legacy events (no cache token fields) continue to use the agent-reported costUsd.
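
The three rules can be sketched as one function. This is an illustration under stated assumptions, not the AnthropicPricing implementation: the rates below are placeholder numbers (USD per million tokens) chosen for easy arithmetic and are not real prices, and `effective_cost` is a hypothetical name. Events that include cache fields set to zero but have non-zero input/output tokens are not covered by the rules above; the sketch falls back to the reported costUsd for them.

```python
# Placeholder USD-per-million-token rates, NOT real prices.
EXAMPLE_RATES = {"input": 1.0, "output": 2.0, "cacheRead": 0.5, "cacheCreation": 1.5}


def effective_cost(event: dict, rates: dict = EXAMPLE_RATES) -> float:
    """Apply the zero-token, cache-recompute, and legacy rules in order."""
    tokens = {
        "input": event.get("inputTokens", 0),
        "output": event.get("outputTokens", 0),
        "cacheRead": event.get("cacheReadTokens", 0),
        "cacheCreation": event.get("cacheCreationTokens", 0),
    }
    # Rule 1: zero tokens always cost $0, whatever costUsd says.
    if not any(tokens.values()):
        return 0.0
    # Rule 2: cache tokens present, so recompute from the pricing table.
    if tokens["cacheRead"] > 0 or tokens["cacheCreation"] > 0:
        return sum(tokens[k] * rates[k] for k in tokens) / 1_000_000
    # Rule 3: legacy events keep the agent-reported cost.
    return float(event.get("costUsd", 0.0))
```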

TOKEN_USAGE event now accepts optional cache token fields:

{
  "type": "TOKEN_USAGE",
  "agentId": "dev-e-node",
  "repo": "dashecorp/rig-conductor",
  "issueNumber": 148,
  "model": "claude-sonnet-4-5",
  "inputTokens": 10,
  "outputTokens": 4994,
  "cacheReadTokens": 160855,
  "cacheCreationTokens": 28927,
  "costUsd": 0.242194,
  "category": "work"
}

See cost attribution for the full pricing model and fix details.

Daily Costs

GET /api/costs/daily?days=7

Returns a per-agent, per-day breakdown for the rolling window.

Response:

{
  "period": "7d",
  "entries": [
    { "date": "2026-04-23", "agentId": "dev-e-node", "costUsd": 12.30 }
  ]
}

Stream Status

GET /api/streams/status

Returns per-agent Valkey stream status. The primary field to watch is lag inside each consumer-group entry — it represents messages the consumer group has not yet delivered, i.e. the real backlog. xlen (total stream length) is included for diagnostics only; it does not decrease when messages are acknowledged and will accumulate until the stream is explicitly trimmed.

Response:

{
  "dev-e-dotnet": {
    "xlen": 17,
    "groups": {
      "agents": { "lag": 0, "pending": 0 }
    }
  },
  "dev-e-node": {
    "xlen": 27,
    "groups": {
      "agents": { "lag": 2, "pending": 1 }
    }
  }
}
  • lag — unread entries waiting to be delivered to the group (the real queue backlog)
  • pending — entries delivered but not yet acknowledged (in-flight)
  • xlen — total stream length (misleading as a backlog metric; stays high until trim)

When no consumer group has been created yet (stream never consumed), groups is empty and xlen reflects the raw message count.
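
A monitoring script would therefore read lag and ignore xlen. A minimal sketch that sums lag across consumer groups for each agent, given a response shaped like the one above; `backlog_by_agent` is a hypothetical name.

```python
def backlog_by_agent(status: dict) -> dict:
    """Map each agent to its total undelivered backlog (sum of group lag)."""
    return {
        agent: sum(group["lag"] for group in info["groups"].values())
        for agent, info in status.items()
    }
```

An agent whose groups object is empty gets a backlog of 0 here, even if xlen is non-zero, because no consumer group exists yet to fall behind.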

Force-Done (Epic override)

POST /api/admin/issues/force-done?repo=owner%2Frepo&issueNumber=N

Operator escape hatch for a parent epic that was blocked by the epic-completion guard (rc#459). Use when the team has deliberately abandoned remaining sub-issues and wants to declare the epic done without waiting for all subs to reach done/production.

Auth: No token required. Relies on cluster-network trust (internal ClusterIP only — the same model used by all /api/admin/* endpoints). Do not expose via public ingress.

Query params:

| Param | Type | Required | Description |
|---|---|---|---|
| repo | string | | Repository slug, e.g. dashecorp/rig-conductor (URL-encode the /) |
| issueNumber | int | | GitHub issue number of the parent epic |

Responses:

| Status | Body | Meaning |
|---|---|---|
| 200 | {"forcedDone": true, "previousState": "deploying", "repo": "...", "issueNumber": N} | Transition emitted |
| 200 | {"alreadyDone": true, "state": "done", ...} | Epic was already done/production; no-op |
| 404 | {"error": "Issue ... not found in conductor"} | Issue not tracked in conductor |
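
Since both success cases return 200, a caller has to inspect the body to tell them apart. The sketch below builds the request URL with the repo slash encoded and classifies a response; `force_done_url` and `describe_result` are hypothetical names, and the returned labels are illustrative only.

```python
import json
from urllib.parse import urlencode


def force_done_url(base_url: str, repo: str, issue_number: int) -> str:
    """Build the force-done URL; urlencode turns '/' into %2F."""
    query = urlencode({"repo": repo, "issueNumber": issue_number})
    return f"{base_url}/api/admin/issues/force-done?{query}"


def describe_result(status: int, body: str) -> str:
    """Classify a response using the status code and the documented body flags."""
    payload = json.loads(body)
    if status == 404:
        return "not tracked"
    if payload.get("alreadyDone"):
        return "already done"
    if payload.get("forcedDone"):
        return "forced done"
    return "unexpected response"
```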

Example:

curl -X POST \
  "http://rig-conductor-api:8080/api/admin/issues/force-done?repo=dashecorp%2Frig-conductor&issueNumber=433"

Architecture

POST /api/events → SubmitEvent (Use Case) → IEventStore (Port) → MartenEventStore (Adapter) → PostgreSQL

GET /api/queue   → IIssueQuery (Port) → MartenIssueQuery (Adapter) → PostgreSQL (Marten projection)

Clean Architecture: endpoints delegate to use cases/ports, never touch Marten directly.
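
The port/adapter split can be illustrated in a few lines. This is a language-neutral sketch in Python, not the conductor's code: `EventStore` stands in for the IEventStore port, `InMemoryEventStore` is a hypothetical adapter that replaces MartenEventStore for the example, and the use case only ever sees the port.

```python
from dataclasses import dataclass, field
from typing import Protocol


class EventStore(Protocol):
    """Port: the only thing the use case knows about persistence."""

    def append(self, stream_id: str, event: dict) -> None: ...


@dataclass
class InMemoryEventStore:
    """Adapter: a stand-in for a Marten-backed implementation."""

    streams: dict = field(default_factory=dict)

    def append(self, stream_id: str, event: dict) -> None:
        self.streams.setdefault(stream_id, []).append(event)


@dataclass
class SubmitEvent:
    """Use case: depends on the port, never on a concrete store."""

    store: EventStore

    def execute(self, event: dict) -> str:
        stream_id = f'{event["repo"]}#{event["issueNumber"]}'
        self.store.append(stream_id, event)
        return stream_id
```

Swapping the adapter (for tests, or for a different database) requires no change to the use case or to the endpoint that calls it.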