Cost Attribution

rig-conductor tracks per-agent, per-repo LLM costs by accumulating TOKEN_USAGE events emitted by agents after each API call.

Endpoints

Endpoint	Time scope	Source
`GET /api/usage` (no `days`)	All time	Marten projection
`GET /api/usage?days=N`	Rolling N days	Raw events
`GET /api/costs/summary?days=N`	Rolling N days	Raw events
`GET /api/costs/daily?days=N`	Rolling N days	Raw events

Rule: to compare /api/usage with /api/costs/summary, always supply the same ?days=N to both. Without it, /api/usage returns all-time totals, which are at least as large as any windowed summary.
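One way to follow this rule is to derive both URLs from a single days value. The following is a minimal sketch; the base URL and the helper name comparable_urls are hypothetical and not part of rig-conductor.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the real conductor host.
BASE_URL = "http://localhost:8080"


def comparable_urls(days: int, base_url: str = BASE_URL) -> tuple[str, str]:
    """Return (/api/usage, /api/costs/summary) URLs sharing one ?days=N window."""
    # Assumption: a non-positive window is treated as a caller error here.
    if days <= 0:
        raise ValueError("days must be a positive integer")
    query = urlencode({"days": days})
    return (
        f"{base_url}/api/usage?{query}",
        f"{base_url}/api/costs/summary?{query}",
    )


usage_url, summary_url = comparable_urls(7)
print(usage_url)
print(summary_url)
```

Because both URLs come from the same query string, the two responses cover the same window and can be compared directly.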

Multi-tenancy (rc#1460)

Cost is attributed per tenant from each event's server-resolved tenant_id header (the multi-tenancy keystone, rc#1459). Tenant #0 is invotek.

?tenantId= filter: optional on GET /api/costs/summary, /api/costs/issue, /api/costs/daily, and /api/usage. It restricts results to events whose display tenant (the header value, coalesced as described in the next item) equals the given value, compared case-insensitively. Omitting it reproduces the pre-1460 behavior byte for byte. /public/cost-summary stays tenant-blind so that tenant structure never leaks externally.
Backfill display: a pre-keystone (header-less) TOKEN_USAGE event coalesces to invotek for display, so ?tenantId=invotek includes legacy dashecorp spend. This reuses MartenEventStore.ReadTenantId.
All-time per-tenant usage: GET /api/usage with a tenantId and no days scans raw events rather than the stored TokenUsageProjection, which is tenant-blind in P0. The no-tenant all-time path still uses the fast projection. ⚠️ Perf: the per-tenant path is an O(N) type-filtered scan of all token_usage events with no event-type index, the same shape as the existing windowed summary/daily scans. That is fine at P0's modest volume, but prefer a ?days= window for large ranges. Phase 1 gives the projection a tenant dimension so that all-time per-tenant reads stay on the fast path.
tenant=unknown is NOT a queryable tenant. Genuinely unattributable spend surfaces via the alarm below, not via ?tenantId=unknown, which avoids implying that "unknown" is a real tenant.
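The coalesce-then-filter behavior can be sketched as follows. This is an illustration only: the function names are hypothetical, "acme" is a made-up tenant, and treating a whitespace-only header as blank is an assumption.

```python
from typing import Optional

DEFAULT_TENANT = "invotek"  # tenant #0, used to display header-less events


def is_unknown(header: Optional[str]) -> bool:
    """True when the tenant_id header is absent or blank (assumed: whitespace counts as blank)."""
    return header is None or header.strip() == ""


def display_tenant(header: Optional[str]) -> str:
    """Coalesce a missing header to the default tenant for display."""
    return DEFAULT_TENANT if is_unknown(header) else header


def matches_filter(header: Optional[str], tenant_id: Optional[str]) -> bool:
    """Apply the optional ?tenantId= filter, case-insensitively."""
    if tenant_id is None:
        return True  # no filter supplied: every event is included
    return display_tenant(header).casefold() == tenant_id.casefold()


print(matches_filter(None, "INVOTEK"))   # header-less event, filter on invotek
print(matches_filter("acme", "invotek"))  # hypothetical tenant "acme"
print(matches_filter(None, "unknown"))   # "unknown" never matches a display tenant
```

Since a header-less event always displays as invotek, no event ever has "unknown" as its display tenant, which is why ?tenantId=unknown cannot select anything.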

tenant=unknown alarm (`TenantUnknownCostAlerter`)

A background service posts to Discord #admin (DISCORD_ADMIN_WEBHOOK_URL) when a new TOKEN_USAGE event has real cost but no attributable tenant, i.e. its tenant_id header is absent or empty (TenantAttribution.IsUnknown). This is the inverse of the display coalesce: a literal "invotek" header is attributed and never alarms; only an absent or blank header does. The alarm guards against unattributable spend silently becoming a margin leak.

Skip-to-tip on cold start: HeadersEnabled was only turned on at rc#1459, so every pre-keystone event is header-less. The alerter therefore starts at the current event tip and only ever alarms on new (post-keystone) unattributed events. Post-keystone, every append stamps the header, so an absent header indicates a real bug or bypass.
Gating and dedupe: the alarm fires only when EffectiveCost > 0 (zero-cost idle events never alarm) and is deduped per event Sequence (Valkey 24h key plus an in-memory fallback). It is a no-op when the webhook environment variable is unset.
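The gating rules above can be modeled in a few lines. This is a hypothetical in-memory sketch, not the TenantUnknownCostAlerter implementation: the class name, field names, and the plain set standing in for the Valkey key are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnknownTenantAlerter:
    """Hypothetical model of the alarm's gating rules."""

    tip: int                    # event sequence at cold start; older events are skipped
    webhook_url: Optional[str]  # None models an unset DISCORD_ADMIN_WEBHOOK_URL
    seen: set = field(default_factory=set)  # stands in for the 24h dedupe key

    def should_alert(
        self, sequence: int, tenant_header: Optional[str], effective_cost: float
    ) -> bool:
        if not self.webhook_url:
            return False  # no-op without a webhook
        if sequence <= self.tip:
            return False  # skip-to-tip: pre-existing events never alarm
        if effective_cost <= 0:
            return False  # zero-cost idle events never alarm
        if tenant_header is not None and tenant_header.strip():
            return False  # attributed, including a literal "invotek"
        if sequence in self.seen:
            return False  # already alerted for this event
        self.seen.add(sequence)
        return True


alerter = UnknownTenantAlerter(tip=100, webhook_url="https://example.invalid/hook")
print(alerter.should_alert(101, None, 0.50))       # new, costly, header-less
print(alerter.should_alert(101, None, 0.50))       # same sequence again
print(alerter.should_alert(102, "invotek", 0.50))  # attributed event
```

The checks are ordered so that the dedupe set only records events that would otherwise have alarmed.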

Cost Formula

For each TOKEN_USAGE event, the effective cost is computed by MartenCostQuery.EffectiveCost:

if all four token counts (input, output, cacheRead, cacheCreation) == 0:
    cost = $0          # no LLM call occurred

elif cacheReadTokens > 0 OR cacheCreationTokens > 0:
    cost = AnthropicPricing.ComputeCost(model, input, output, cacheRead, cacheCreate)

else:
    cost = event.CostUsd   # trust agent-reported value (backward compat)
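
The three branches above can be written as a small runnable function. This is a sketch rather than the MartenCostQuery.EffectiveCost source: the event is modeled as a plain dict using the JSON field names, and compute_cost is a caller-supplied stand-in for AnthropicPricing.ComputeCost.

```python
from typing import Callable


def effective_cost(
    event: dict,
    compute_cost: Callable[[str, int, int, int, int], float],
) -> float:
    """Pick the cost for one TOKEN_USAGE event using the three-branch rule."""
    inp = event.get("inputTokens", 0)
    out = event.get("outputTokens", 0)
    cache_read = event.get("cacheReadTokens", 0)
    cache_create = event.get("cacheCreationTokens", 0)

    if inp == 0 and out == 0 and cache_read == 0 and cache_create == 0:
        return 0.0  # no LLM call occurred; any reported costUsd is ignored
    if cache_read > 0 or cache_create > 0:
        # Cache tokens present: recompute from the price table.
        return compute_cost(event["model"], inp, out, cache_read, cache_create)
    return event.get("costUsd", 0.0)  # trust the agent-reported value


# A stub pricing function, used only to show which branch is taken.
stub = lambda model, i, o, r, c: 9.99
print(effective_cost({"model": "m", "inputTokens": 10, "outputTokens": 5, "costUsd": 0.5}, stub))
```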

Anthropic Pricing Table

Maintained in ConductorE.Core/UseCases/AnthropicPricing.cs. Prices are in USD per million tokens:

Model family	Input	Output	Cache read	Cache create
claude-opus-4-5	$15.00	$75.00	$1.50	$18.75
claude-sonnet-4-5 / claude-3-5-sonnet	$3.00	$15.00	$0.30	$3.75
claude-haiku-4-5 / claude-3-5-haiku	$0.80	$4.00	$0.08	$1.00
claude-3-opus	$15.00	$75.00	$1.50	$18.75
claude-3-haiku	$0.25	$1.25	$0.03	$0.30

Unknown models fall back to claude-sonnet-4-5 pricing.
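
To make the arithmetic concrete, here is a sketch of a table-driven cost computation using the prices listed above. It is not the AnthropicPricing.cs source: the Python names are hypothetical, and the matching rule (longest family name that prefixes the model id) is an assumption about how Resolve behaves.

```python
# (input, output, cache read, cache create) in USD per million tokens,
# copied from the pricing table.
PRICES = {
    "claude-opus-4-5": (15.00, 75.00, 1.50, 18.75),
    "claude-sonnet-4-5": (3.00, 15.00, 0.30, 3.75),
    "claude-3-5-sonnet": (3.00, 15.00, 0.30, 3.75),
    "claude-haiku-4-5": (0.80, 4.00, 0.08, 1.00),
    "claude-3-5-haiku": (0.80, 4.00, 0.08, 1.00),
    "claude-3-opus": (15.00, 75.00, 1.50, 18.75),
    "claude-3-haiku": (0.25, 1.25, 0.03, 0.30),
}
FALLBACK = "claude-sonnet-4-5"  # unknown models use this family's prices


def resolve(model: str) -> tuple:
    """Assumed rule: the longest family name that is a prefix of the model id."""
    matches = [family for family in PRICES if model.startswith(family)]
    return PRICES[max(matches, key=len)] if matches else PRICES[FALLBACK]


def compute_cost(model, input_tokens, output_tokens, cache_read, cache_create):
    """Sum the four token classes at their per-million rates."""
    inp, out, read, create = resolve(model)
    return (
        input_tokens * inp
        + output_tokens * out
        + cache_read * read
        + cache_create * create
    ) / 1_000_000


# 1M input + 100k output + 2M cache-read tokens on claude-sonnet-4-5:
# 3.00 + 1.50 + 0.60 = 5.10 USD.
print(round(compute_cost("claude-sonnet-4-5", 1_000_000, 100_000, 2_000_000, 0), 2))
```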

TOKEN_USAGE Event Fields

{
  "type": "TOKEN_USAGE",
  "agentId": "dev-e-dotnet",
  "repo": "dashecorp/rig-conductor",
  "issueNumber": 148,
  "model": "claude-sonnet-4-5",
  "inputTokens": 10,
  "outputTokens": 4994,
  "cacheReadTokens": 160855,
  "cacheCreationTokens": 28927,
  "costUsd": 0.242194,
  "category": "work"
}

cacheReadTokens and cacheCreationTokens are optional (default 0). When either is greater than zero, the conductor recomputes cost from the price table, overriding costUsd.

category is one of "work" (default), "idle", or "overhead". It controls which bucket the cost appears in on /api/costs/summary.
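
The defaults described above can be illustrated with a small parser. This is a sketch under assumptions: the TokenUsage type and parse_token_usage helper are hypothetical, and rejecting an unrecognized category is a choice made for this example, not documented conductor behavior.

```python
import json
from dataclasses import dataclass

VALID_CATEGORIES = {"work", "idle", "overhead"}


@dataclass(frozen=True)
class TokenUsage:
    agent_id: str
    repo: str
    model: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int
    cache_creation_tokens: int
    cost_usd: float
    category: str


def parse_token_usage(raw: str) -> TokenUsage:
    """Parse a TOKEN_USAGE payload, applying the documented defaults."""
    data = json.loads(raw)
    if data.get("type") != "TOKEN_USAGE":
        raise ValueError("not a TOKEN_USAGE event")
    category = data.get("category", "work")  # "work" is the default bucket
    if category not in VALID_CATEGORIES:
        # Assumption: this sketch rejects unknown categories outright.
        raise ValueError(f"unknown category: {category!r}")
    return TokenUsage(
        agent_id=data["agentId"],
        repo=data["repo"],
        model=data["model"],
        input_tokens=data.get("inputTokens", 0),
        output_tokens=data.get("outputTokens", 0),
        cache_read_tokens=data.get("cacheReadTokens", 0),       # optional, default 0
        cache_creation_tokens=data.get("cacheCreationTokens", 0),  # optional, default 0
        cost_usd=data.get("costUsd", 0.0),
        category=category,
    )


event = parse_token_usage(
    '{"type": "TOKEN_USAGE", "agentId": "dev-e-dotnet", "repo": "dashecorp/rig-conductor",'
    ' "model": "claude-sonnet-4-5", "inputTokens": 10, "outputTokens": 4994, "costUsd": 0.242194}'
)
print(event.category, event.cache_read_tokens)
```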

Three Bugs Fixed in #148

Bug 1: Endpoint disagreement

Before: /api/usage read from an all-time Marten projection; /api/costs/summary queried raw events filtered by the days window. Same agent, different totals.

After: /api/usage?days=N queries the same raw event stream as /api/costs/summary?days=N. Both produce identical totals for the same window.

Bug 2: Cache tokens ignored

Before: The agent-reported costUsd excluded cache token costs. A review with 160k cache-read tokens was reported as $0.24 when the true cost was ~$0.56.

After: When cacheReadTokens > 0 || cacheCreationTokens > 0, the conductor recomputes cost using AnthropicPricing.ComputeCost, which adds cache pricing on top of input/output.

Bug 3: Phantom idle cost

Before: TOKEN_USAGE events with inputTokens=0, outputTokens=0 but non-zero costUsd (heartbeat overhead attributed to agents) contributed to idleCostUsd.

After: Any event where all four token counts are zero contributes $0, regardless of the reported costUsd.

Adding a New Model

Edit AnthropicPricing.cs and add an entry to the Prices dictionary:

["claude-new-model-20270101"] = new(inputPer1M, outputPer1M, cacheReadPer1M, cacheCreatePer1M),

The Resolve method will also match on prefix, so "claude-new-model" is automatically covered.

Sizing cost levers: the stats endpoint

GET /api/execution-logs/stats (see API Reference) is the read-side instrument for the Review-E cost-reduction decision. It returns the turn distribution (p50/p90/p95/max, nearest-rank) and cost/token sums for an agent over a window.

The load-bearing use is sizing --max-turns: a CLI-subprocess agent re-sends its transcript plus BRAIN.md on every turn, so cost compounds with turn count. Set the cap just above the observed p95 of real completed runs — high enough not to truncate legitimate work, low enough to bound a runaway. Read the p95 from this endpoint rather than guessing; a typo'd status or non-positive days returns 400 precisely so a zero p95 can't be mistaken for "low turn usage" and drive the cap down.
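
For reference, the nearest-rank percentile and the "just above p95" sizing rule can be sketched as below. The function names and the headroom value are hypothetical; the endpoint's own implementation may differ in detail.

```python
def nearest_rank(values: list[int], percentile: int) -> int:
    """Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order."""
    if not values:
        # Fail loudly rather than return 0, so an empty window cannot be
        # mistaken for "low turn usage".
        raise ValueError("no runs in window")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    rank = -(-percentile * len(ordered) // 100)  # ceiling division in integer math
    return ordered[rank - 1]


def suggested_max_turns(turns: list[int], headroom: int = 5) -> int:
    """Cap just above the observed p95; headroom is an illustrative choice."""
    return nearest_rank(turns, 95) + headroom


turns = list(range(1, 21))  # 20 completed runs using 1..20 turns
print(nearest_rank(turns, 50), nearest_rank(turns, 95), suggested_max_turns(turns))
```

Integer ceiling division avoids floating-point rounding at rank boundaries, which matters when the rank lands exactly on a whole number.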