Agent Secrets Broker — Autonomous Secret Lifecycle for LLM Agents¶

TL;DR

LLM agents that handle secrets via prompt, tool argument, or log entry are compromised by design: prompt injection, transcript storage, and log shipping all become exfiltration vectors. The secrets broker applies capability-based mediation (Hardy 1988; Miller 2006): the LLM is the planner and operates only on opaque references (bw:item/prod-db-password); the broker is the courier and handles the plaintext. The agent never sees the bytes. This document specifies the tool surface, destination grammar, policy model, audit schema, and a 4-week implementation plan. It does not duplicate security.md (supply chain: Sigstore/SLSA/Kyverno); it covers the complementary runtime-lifecycle layer.

Motivation¶

Anthropic's April 2026 third-party-tool policy change metered subscription-OAuth usage, pushing more agentic work onto hybrid local/cloud inference where the rig cannot assume that transcript storage is under Dashecorp's control. Three failure modes drove this design:

Failure mode	Mechanism	Risk
Prompt-in-plaintext	Agent receives `API_KEY=sk-abc123` in tool output; key is logged to transcript	Full compromise on transcript exfil
Tool-argument leak	`deploy_secret(value="sk-abc123")` appears in structured tool call log	Log aggregation → attacker
Rotation paralysis	No agent-driven rotation path; secrets age indefinitely	Long exposure window on compromise

The broker pattern eliminates all three: the agent requests operations by reference, the broker executes them against the backing store, and plaintext never crosses the LLM boundary.

Complementary scope: security.md covers the supply-chain layer (Sigstore, SLSA, Kyverno admission). This whitepaper covers the runtime lifecycle layer — what happens after a container is admitted and needs a secret to function.

The threat model¶

graph TB
    classDef threat fill:#ffcccc,color:#000
    classDef defense fill:#c8e6c9,color:#000
    classDef neutral fill:#e3f2fd,color:#000

    LLM[LLM Agent]:::neutral
    BROKER[Secrets Broker]:::defense
    BW[Bitwarden]:::neutral
    GH[GitHub Secrets]:::neutral
    SOPS[SOPS / age]:::neutral
    K8S[Kubernetes Secrets]:::neutral
    CF[Cloudflare Worker Secrets]:::neutral

    T1[Prompt injection exfil]:::threat
    T2[Transcript storage leak]:::threat
    T3[Log line leak]:::threat
    T4[Over-privileged rotation]:::threat
    T5[Unaudited secret access]:::threat
    T6[Hardware key bypass]:::threat

    D1[Reference-only API — no plaintext crosses LLM boundary]:::defense
    D2[Transcript-safe tool surface — values never in args or output]:::defense
    D3[Structured log sanitisation — broker logs ref not value]:::defense
    D4[Policy model — per-secret rotation scope + rate limits]:::defense
    D5[Append-only audit log — SQLite + R2 mirror]:::defense
    D6[hardware_key_required flag — policy blocks software-only rotation]:::defense

    T1 -.->|blocked by| D1
    T2 -.->|blocked by| D2
    T3 -.->|blocked by| D3
    T4 -.->|blocked by| D4
    T5 -.->|blocked by| D5
    T6 -.->|blocked by| D6

    LLM -->|ref only| BROKER
    BROKER -->|plaintext| BW
    BROKER -->|plaintext| GH
    BROKER -->|plaintext| SOPS
    BROKER -->|plaintext| K8S
    BROKER -->|plaintext| CF

The LLM → Broker boundary is the invariant: the arrow carries only references and operation names. Plaintext flows only within the broker process and onward to backing stores over authenticated, encrypted channels.

Secret-kind taxonomy¶

Not all secrets are equal. The broker distinguishes automatable secrets from human-bootstrap secrets:

Kind	Examples	Automatable?	Notes
`api-key`	Anthropic, GitHub PAT, Cloudflare token	Yes (generate + deploy)	Provider must support programmatic issuance
`symmetric-key`	SOPS age recipient, AES-256 data key	Yes	`generate_and_deploy` flow
`db-password`	Postgres service account	Yes	Must rotate with zero downtime (dual-write period)
`jwt-signing-key`	RS256/ES256 private key	Yes	Key rotation requires public-key republish
`tls-cert`	Cluster internal CA	Delegated to cert-manager	Broker tracks ref; cert-manager issues
`oauth-client-secret`	GitHub App, Google OAuth	Human-bootstrap	Provider issues interactively; agent stores result
`hardware-backed`	YubiKey PIV, HSM-resident	Human-bootstrap always	`hardware_key_required: true` policy flag; broker refuses software rotation
`biometric`	Touch ID, passkeys	Human-bootstrap always	Never enters the broker at any stage

Automatable secrets complete the full mint → store → deploy → rotate → retire cycle without human intervention.

Human-bootstrap secrets require a human to perform initial issuance; the broker takes over for storage, deployment, and lifecycle tracking once the human has deposited the value via an authenticated, out-of-band channel (never via agent prompt).

Destination reference grammar¶

The broker uses a URI-like reference grammar for all secret locations. References are the only values that cross the LLM boundary.

<scheme>:<path>[?<params>]

Scheme	Backing store	Example
`bw:`	Bitwarden (personal or org vault)	`bw:item/prod-db-password`
`gh:`	GitHub repository secret	`gh:dashecorp/rig-conductor/PROD_DB_PASSWORD`
`gh-env:`	GitHub environment secret	`gh-env:dashecorp/rig-conductor/production/PROD_DB_PASSWORD`
`sops:`	SOPS-encrypted file at path, key name	`sops:apps/rig-conductor/secrets.sops.yaml#DB_PASSWORD`
`k8s:`	Kubernetes Secret, namespace/name/key	`k8s:rig-conductor/prod-secrets#db-password`
`cf-worker:`	Cloudflare Worker secret	`cf-worker:rig-conductor-api/DB_PASSWORD`

Refs are stable across rotations — the broker updates the backing store value; callers holding the ref do not need to change.

Resolution rules:
- Refs are validated against the policy registry on every operation.
- Unknown or malformed refs are rejected before any backing store call.
- Cross-destination copy (e.g., mint to bw:, deploy to gh: and k8s:) is a single atomic broker operation, not two agent calls.
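The "reject before any backing store call" rule can be illustrated with a small parser. This is a minimal sketch, not the W1 parser: `SecretRef`, `parse_ref`, and the regular expression are hypothetical names, and the placement of the `#key` fragment after the optional `?params` follows URI convention as an assumption, since the grammar line above does not say where the fragment goes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Schemes taken from the destination table above.
KNOWN_SCHEMES = {"bw", "gh", "gh-env", "sops", "k8s", "cf-worker"}

# <scheme>:<path>[?<params>][#<key>]  (fragment position is an assumption)
_REF_RE = re.compile(
    r"(?P<scheme>[a-z][a-z0-9-]*):"
    r"(?P<path>[^#?\s]+)"
    r"(?:\?(?P<params>[^#\s]*))?"
    r"(?:#(?P<key>\S+))?"
)


@dataclass(frozen=True)
class SecretRef:
    scheme: str
    path: str
    params: Optional[str] = None
    key: Optional[str] = None


def parse_ref(raw: str) -> SecretRef:
    """Parse a ref or raise ValueError. No backing store is contacted."""
    match = _REF_RE.fullmatch(raw)
    if match is None:
        raise ValueError(f"malformed ref: {raw!r}")
    if match["scheme"] not in KNOWN_SCHEMES:
        raise ValueError(f"unknown scheme: {match['scheme']!r}")
    return SecretRef(match["scheme"], match["path"], match["params"], match["key"])


ref = parse_ref("k8s:rig-conductor/prod-secrets#db-password")
print(ref.scheme, ref.path, ref.key)
```

Policy-registry lookup would follow a successful parse; it is left out here because the registry interface is not specified in this document.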

Multi-tenancy (Phase 1): tenant-prefixed refs¶

For multi-tenant operation the grammar carries a tenant segment immediately after the scheme, preserving the full single-tenant coordinate after it (purely additive — nothing is dropped). This form was ratified 2026-06-09 and is frozen forever (it is persisted in every ref and in the append-only WORM audit log): rig-conductor docs/2026-06-09-multi-tenancy-secret-ref-grammar-ratification.md (status: accepted), realizing #1479.

Scheme	Frozen tenant form
`gh:`	`gh:tenant-<id>/<owner>/<repo>/<name>`
`gh-env:`	`gh-env:tenant-<id>/<owner>/<repo>/<env>/<name>`
`sops:`	`sops:tenant-<id>/<path>#<key>`
`k8s:`	`k8s:tenant-<id>/<ns>/<secret>#<key>`
`cf-worker:`	`cf-worker:tenant-<id>/<worker>/<name>`
`bw:`	`bw:tenant-<id>/<collection>/<item>#<field>`

<id> is a canonical tenant_id (^[a-z][a-z0-9]{1,19}$, reserved-token blocklist), validated by the existing TenantId value object — no new id grammar.
The literal segment is exactly tenant-. extract_tenant(ref) applies four rules: the literal tenant- prefix is required (otherwise reject), a coordinate is required after the tenant token, the id is compared in normalized form, and every failure is fail-closed. A ref with no tenant- prefix, no coordinate after the tenant token, or an invalid id is rejected; there is no implicit default for external sessions.
The W1 ref parser is written against exactly these forms.
Control-plane lane: an enumerated first-segment set (rig-control, rig_control, cert-manager, flux-system, kube-system, the pg*/kube* reserved-prefix rule, and the full TenantId reserved set) is addressable only by the control session; any tenant session that addresses it receives a 403.
Tenant source = the session, never the ref: the broker derives the bound tenant from the conductor-issued, tenant-bound agent session token / mTLS principal and only ever compares the ref's tenant segment to it. The guard: extract_tenant(ref) != session.bound_tenant → 403, reject_reason=tenant_mismatch, audited. A multi-destination op with one foreign-tenant entry in dst_refs[] returns 403 with no result body (whole-call atomic — never a partial success / skip).
Tenant-0 (invotek) legacy: unprefixed refs are tolerated only when session.bound_tenant == invotek, behind ALLOW_UNPREFIXED_INVOTEK_REFS (default OFF), ending in hard-reject once invotek's refs migrate to tenant-invotek/…. The tolerance is a property of the bound tenant, never derived from the ref.

The single-tenant table above remains the form for control-plane refs and for tenant-0 until its migration completes.
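The tenant guard described above can be sketched as follows. This is an illustration under stated assumptions, not the ratified implementation: `TenantRejected` and `check_refs` are hypothetical names, only the `tenant_mismatch` reason string comes from this document, the reserved-token blocklist is omitted because its contents are not listed here, and "normalized compare" is approximated by accepting only ids that already match the lowercase pattern.

```python
import re

TENANT_PREFIX = "tenant-"
TENANT_ID_RE = re.compile(r"[a-z][a-z0-9]{1,19}")  # pattern quoted above


class TenantRejected(Exception):
    """Any ref problem that the broker would answer with a 403."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


def extract_tenant(ref: str) -> str:
    """Return the tenant id of a tenant-prefixed ref, or fail closed."""
    _scheme, colon, rest = ref.partition(":")
    if not colon or not rest.startswith(TENANT_PREFIX):
        raise TenantRejected("missing_tenant_prefix")  # hypothetical reason
    tenant_token, slash, coordinate = rest.partition("/")
    if not slash or not coordinate:
        raise TenantRejected("missing_coordinate")  # hypothetical reason
    tenant_id = tenant_token[len(TENANT_PREFIX):]
    if TENANT_ID_RE.fullmatch(tenant_id) is None:
        raise TenantRejected("invalid_tenant_id")  # hypothetical reason
    return tenant_id


def check_refs(refs, bound_tenant: str) -> None:
    """Whole-call atomic: a single foreign or invalid ref rejects the call."""
    for ref in refs:
        if extract_tenant(ref) != bound_tenant:
            raise TenantRejected("tenant_mismatch")


check_refs(["gh:tenant-acme/o/r/NAME", "bw:tenant-acme/col/item#field"], "acme")
```

Note that `bound_tenant` is passed in from the session; nothing in the sketch derives it from the ref.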

Tool surface¶

The broker exposes eight tools to the LLM. None accept or return plaintext. All operations are synchronous unless noted.

Tool	Args	Returns	Effect
`mint`	`kind`, `ref`, `policy_ref`	`{ref, created_at}`	Generate new secret value, store at `ref`, register policy
`store`	`ref`, `policy_ref`	`{ref, stored_at}`	Deposit a value the human has provided out-of-band; agent provides only the ref
`deploy`	`src_ref`, `dst_refs[]`	`{deployed_to[], skipped[]}`	Copy from source ref to one or more destinations
`rotate`	`ref`, `strategy`	`{ref, rotated_at, old_version}`	Generate new value, dual-write if `strategy=zero-downtime`, retire old
`retire`	`ref`	`{ref, retired_at}`	Revoke and delete from all destinations; purge backing store
`verify`	`ref`	`{valid: bool, destinations[]}`	Check that the ref exists, is not expired, and all declared destinations are in sync
`list`	`filter`	`{refs[]}`	List refs matching filter; returns refs only, never values
`generate_and_deploy`	`kind`, `dst_refs[]`, `policy_ref`	`{ref, deployed_to[]}`	Mint + deploy in one call; common shorthand for new-secret flows

Tool call examples (what the LLM sees)¶

// Agent calls generate_and_deploy for a new Cloudflare token
{
  "tool": "generate_and_deploy",
  "args": {
    "kind": "api-key",
    "dst_refs": ["cf-worker:rig-conductor-api/CF_API_TOKEN", "bw:item/rig-cf-api-token"],
    "policy_ref": "policy:cloudflare-api-key-standard"
  }
}

// Broker returns — no plaintext
{
  "ref": "bw:item/rig-cf-api-token",
  "deployed_to": ["cf-worker:rig-conductor-api/CF_API_TOKEN", "bw:item/rig-cf-api-token"],
  "deployed_at": "2026-04-22T10:15:00Z"
}

The broker's logs record ref and operation, never value.
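One way to keep plaintext out of both tool arguments and logs is to accept only the argument names from the tool table and to build log records from ref-bearing fields alone. The sketch below assumes that approach; `validate_call` and `log_record` are hypothetical names, and whether every listed argument is mandatory is not stated in this document, so only unexpected names are rejected.

```python
# Allowed argument names per tool, copied from the tool table above.
TOOL_ARGS = {
    "mint": {"kind", "ref", "policy_ref"},
    "store": {"ref", "policy_ref"},
    "deploy": {"src_ref", "dst_refs"},
    "rotate": {"ref", "strategy"},
    "retire": {"ref"},
    "verify": {"ref"},
    "list": {"filter"},
    "generate_and_deploy": {"kind", "dst_refs", "policy_ref"},
}

# Fields that carry refs and are therefore safe to log.
REF_ARGS = ("ref", "src_ref", "dst_refs", "policy_ref")


def validate_call(tool: str, args: dict) -> None:
    """Reject unknown tools and any argument name outside the table."""
    if tool not in TOOL_ARGS:
        raise ValueError(f"unknown tool: {tool}")
    unexpected = set(args) - TOOL_ARGS[tool]
    if unexpected:
        # Report the offending names only; their values are never echoed.
        raise ValueError(f"unexpected args for {tool}: {sorted(unexpected)}")


def log_record(tool: str, args: dict) -> dict:
    """Build a log entry that contains the operation and refs, nothing else."""
    validate_call(tool, args)
    return {"operation": tool, **{k: args[k] for k in REF_ARGS if k in args}}


print(log_record("rotate", {"ref": "bw:item/prod-db-password", "strategy": "zero-downtime"}))
```

A call shaped like the tool-argument leak from the Motivation table, with an extra `value` argument, fails validation before anything is logged.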

What the tool surface deliberately omits¶

read — no tool to retrieve a plaintext value. Backing stores expose their native fetch path directly to the consuming process (e.g., Kubernetes mounts the Secret as a volume; the agent pod reads the file). The broker is not in the read path.
patch — no partial update. Rotation replaces atomically.
impersonate — no tool to operate as a different principal. The broker's identity is fixed per deployment.

Policy model¶

Each ref is bound to a policy entry at creation time. Policies are stored in policy/secrets/<name>.yaml in the rig-gitops repo, version-controlled, and loaded into the broker at startup.

# policy/secrets/cloudflare-api-key-standard.yaml
apiVersion: secrets.rig.dashecorp.com/v1
kind: SecretPolicy
metadata:
  name: cloudflare-api-key-standard
spec:
  kind: api-key
  max_age_days: 90          # broker emits rotation alert after 90 days
  auto_rotate: true         # broker schedules rotation without agent prompt
  rotation_strategy: immediate  # no dual-write needed; Cloudflare invalidates old instantly
  rate_limit:
    max_rotations_per_day: 3    # prevent runaway rotation loops
  allowed_destinations:
    - cf-worker:*               # wildcards allowed within scheme
    - bw:item/*
  hardware_key_required: false  # software rotation is permitted

---
# policy/secrets/prod-tls-ca.yaml — hardware-backed example
apiVersion: secrets.rig.dashecorp.com/v1
kind: SecretPolicy
metadata:
  name: prod-tls-ca
spec:
  kind: tls-cert
  max_age_days: 365
  auto_rotate: false
  hardware_key_required: true   # broker REFUSES software rotation; emits HumanRequired event
  human_escalation_channel: "#admin"
  allowed_destinations:
    - k8s:cert-manager/*

Hardware-key override: when hardware_key_required: true, the broker:
1. Refuses any rotate or mint call for that ref.
2. Emits a HumanRequired event to rig-conductor, which routes to #admin.
3. Accepts a store call once the human has performed the hardware-backed issuance out-of-band.
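The policy checks described in this section can be sketched as one evaluation function. This is a minimal illustration, assuming glob-style wildcard matching via Python's `fnmatchcase` (in which `*` also matches `/`); the document does not define the wildcard semantics. `SecretPolicy` mirrors only the YAML fields used here, `evaluate` is a hypothetical name, and the `destination_not_allowed` reason string is invented for the example. The `hardware_key_required` and `rate_limit` reasons appear elsewhere in this document.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Sequence, Tuple


@dataclass
class SecretPolicy:
    name: str
    allowed_destinations: List[str]
    hardware_key_required: bool = False
    max_rotations_per_day: Optional[int] = None


def evaluate(policy: SecretPolicy, operation: str, dst_refs: Sequence[str],
             rotations_today: int = 0) -> Tuple[str, Optional[str]]:
    """Return (outcome, reject_reason) using the audit log's outcome values."""
    # Hardware-backed secrets: refuse software rotate/mint and escalate.
    if policy.hardware_key_required and operation in {"rotate", "mint"}:
        return ("escalated", "hardware_key_required")
    # Every destination must match at least one allowed pattern.
    for ref in dst_refs:
        if not any(fnmatchcase(ref, pat) for pat in policy.allowed_destinations):
            return ("rejected", "destination_not_allowed")  # hypothetical reason
    # Rate limit guards against runaway rotation loops.
    limit = policy.max_rotations_per_day
    if operation == "rotate" and limit is not None and rotations_today >= limit:
        return ("rejected", "rate_limit")
    return ("ok", None)


cf = SecretPolicy("cloudflare-api-key-standard", ["cf-worker:*", "bw:item/*"],
                  max_rotations_per_day=3)
print(evaluate(cf, "deploy", ["cf-worker:rig-conductor-api/CF_API_TOKEN"]))
```

A `store` call against a hardware-backed policy passes the first check by design, which matches step 3 of the override.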

Policy changes go through a PR; changes to policies covering T3 secrets require a human co-sign per trust-model.md.

Audit schema¶

Every broker operation is appended to an immutable audit log. No update or delete is possible on the log itself.

SQLite schema (primary, local to broker pod)¶

CREATE TABLE audit_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    ts          TEXT    NOT NULL,          -- ISO 8601, microseconds
    agent_id    TEXT    NOT NULL,          -- e.g. "dev-e"
    operation   TEXT    NOT NULL,          -- mint|store|deploy|rotate|retire|verify|list|generate_and_deploy
    ref         TEXT    NOT NULL,          -- the target ref (never the value)
    dst_refs    TEXT,                      -- JSON array for deploy/generate_and_deploy
    policy_ref  TEXT,
    outcome     TEXT    NOT NULL,          -- ok|rejected|escalated
    reject_reason TEXT,                    -- populated on rejected/escalated
    duration_ms INTEGER NOT NULL,
    CONSTRAINT no_update CHECK (TRUE)      -- no-op marker: a CHECK cannot block UPDATE/DELETE; append-only is enforced at the application layer
) STRICT;

CREATE INDEX idx_audit_ref_ts ON audit_log (ref, ts);
CREATE INDEX idx_audit_agent_ts ON audit_log (agent_id, ts);

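Append-only enforcement: the broker process exposes no SQL connection to external callers, and the broker code has no DELETE or UPDATE paths against audit_log. The database is opened with PRAGMA journal_mode=WAL so that readers are not blocked while rows are appended; WAL mode does not by itself prevent modification.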
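As a possible defense-in-depth addition (not part of the schema above), SQLite triggers can make the database itself reject UPDATE and DELETE on the audit table. The sketch below assumes that approach; the trigger names and `open_audit_db` are hypothetical, and `STRICT` and the no-op constraint are left out of the table definition for brevity. Anyone with write access to the database file could still drop the triggers, so this complements rather than replaces the application-layer rule and the R2 mirror.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    ts            TEXT    NOT NULL,
    agent_id      TEXT    NOT NULL,
    operation     TEXT    NOT NULL,
    ref           TEXT    NOT NULL,
    dst_refs      TEXT,
    policy_ref    TEXT,
    outcome       TEXT    NOT NULL,
    reject_reason TEXT,
    duration_ms   INTEGER NOT NULL
);
-- Abort any attempt to modify or remove an existing row.
CREATE TRIGGER IF NOT EXISTS audit_log_no_update
BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER IF NOT EXISTS audit_log_no_delete
BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the audit database and install the append-only triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_audit_db()
conn.execute(
    "INSERT INTO audit_log (ts, agent_id, operation, ref, outcome, duration_ms) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("2026-04-22T10:15:00.000000Z", "dev-e", "rotate", "bw:item/prod-db-password", "ok", 42),
)
conn.commit()
try:
    conn.execute("DELETE FROM audit_log")
except sqlite3.DatabaseError as err:
    print("rejected:", err)
```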

R2 mirror (durable, cross-region)¶

Every row is streamed to Cloudflare R2 in NDJSON format within 30 seconds of append. The R2 bucket has:

Object Lock (WORM) — objects are immutable for 7 years (configurable per compliance requirement).
Public access: disabled — audit reads require a signed URL issued by the broker's read-only audit endpoint.
Replication: standard R2 cross-region replication.

Mirror lag alert: if the broker's R2 flush lag exceeds 60 seconds, an alert on the audit_mirror_lag_seconds Prometheus metric fires; the broker keeps serving requests and logs a warning. The SQLite log remains authoritative until the mirror catches up.
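The mirror's serialization and lag measurement can be sketched briefly. The definition of lag used here, the age of the oldest row not yet confirmed in R2, is an assumption; this document names the metric but not its formula. `to_ndjson` and `mirror_lag_seconds` are hypothetical names, and the upload to R2 itself is out of scope for the sketch.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Optional

LAG_ALERT_SECONDS = 60  # alert threshold stated above


def to_ndjson(rows: Iterable[dict]) -> str:
    """Serialize audit rows as NDJSON: one compact JSON object per line."""
    return "".join(
        json.dumps(row, separators=(",", ":"), sort_keys=True) + "\n" for row in rows
    )


def mirror_lag_seconds(oldest_unflushed_ts: Optional[str], now: datetime) -> float:
    """Age of the oldest unflushed row in seconds; 0.0 when fully caught up."""
    if oldest_unflushed_ts is None:
        return 0.0
    # Accept the trailing 'Z' used by the audit log's UTC timestamps.
    oldest = datetime.fromisoformat(oldest_unflushed_ts.replace("Z", "+00:00"))
    return max(0.0, (now - oldest).total_seconds())


rows = [
    {"id": 1, "operation": "rotate", "ref": "bw:item/prod-db-password", "outcome": "ok"},
    {"id": 2, "operation": "verify", "ref": "bw:item/prod-db-password", "outcome": "ok"},
]
print(to_ndjson(rows), end="")
now = datetime(2026, 4, 22, 10, 16, 30, tzinfo=timezone.utc)
lag = mirror_lag_seconds("2026-04-22T10:15:00.000000Z", now)
print(lag, lag > LAG_ALERT_SECONDS)
```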

Querying the audit log¶

-- All rotations for a ref in the last 30 days
SELECT ts, agent_id, outcome, reject_reason
FROM audit_log
WHERE ref = 'bw:item/prod-db-password'
  AND operation = 'rotate'
  AND ts > strftime('%Y-%m-%dT%H:%M:%S', 'now', '-30 days')  -- same 'T'-separated UTC format as ts, so the text comparison is correct
ORDER BY ts DESC;

-- Rejected operations (policy violations)
SELECT ts, agent_id, ref, operation, reject_reason
FROM audit_log
WHERE outcome = 'rejected'
ORDER BY ts DESC
LIMIT 100;

User stories¶

US-1 — New deployment, new API key¶

As Dev-E provisioning a new Cloudflare Worker, I want to mint a new Cloudflare API token and deploy it to both the Worker and Bitwarden in one call, so that I never handle the token value and can hand the ref to the next deployment step.

Acceptance criteria:
- generate_and_deploy(kind="api-key", dst_refs=["cf-worker:worker-name/CF_API_TOKEN", "bw:item/worker-name-cf-token"], policy_ref="policy:cloudflare-api-key-standard") succeeds.
- Broker creates a token via Cloudflare API, stores it, and returns {ref, deployed_to} with no plaintext.
- Audit log records the operation with outcome=ok.
- The Worker can call its bound API using the new token within 5 seconds of the call returning.

US-2 — Scheduled rotation¶

As the rig's rotation scheduler, I want to rotate all secrets that exceed their max_age_days policy threshold, so that secret age never exceeds policy limits without human involvement.

Acceptance criteria:
- A cron job calls list(filter={overdue_rotation: true}), then rotate(ref, strategy) for each returned ref.
- Rotations complete for all auto_rotate: true secrets without agent prompting.
- Secrets with hardware_key_required: true emit HumanRequired events instead of rotating.
- Rate limit (max_rotations_per_day) is enforced: excess calls return {outcome: rejected, reject_reason: "rate_limit"}.
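The cron-driven pass in US-2 can be sketched as a loop over the two tool calls. This is an illustration only: `run_rotation_pass` and `StubBroker` are hypothetical, the stub stands in for a real broker client, and treating a response without an `outcome` field as a success is an assumption based on the `rotate` return shape in the tool table.

```python
from collections import Counter


def run_rotation_pass(broker, strategy_for):
    """Rotate every overdue ref once and summarize the outcomes."""
    outcomes = Counter()
    needs_human = []
    for ref in broker.list({"overdue_rotation": True})["refs"]:
        result = broker.rotate(ref, strategy_for(ref))
        outcome = result.get("outcome", "ok")  # assumption: success omits outcome
        outcomes[outcome] += 1
        if outcome == "escalated":
            # The broker has already emitted HumanRequired; just track the ref.
            needs_human.append(ref)
    return {"outcomes": dict(outcomes), "needs_human": needs_human}


class StubBroker:
    """In-memory stand-in so the loop can run without a real broker."""

    def __init__(self, responses):
        self._responses = responses

    def list(self, query):
        return {"refs": [*self._responses]}

    def rotate(self, ref, strategy):
        return self._responses[ref]


stub = StubBroker({
    "bw:item/a": {"ref": "bw:item/a", "rotated_at": "2026-04-22T10:15:00Z", "old_version": "v1"},
    "k8s:cert-manager/internal-ca#tls.key": {"outcome": "escalated",
                                              "reject_reason": "hardware_key_required"},
    "bw:item/b": {"outcome": "rejected", "reject_reason": "rate_limit"},
})
summary = run_rotation_pass(stub, lambda ref: "immediate")
print(summary)
```

The loop continues past rejected and escalated refs so that one rate-limited secret does not block the rest of the pass.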

US-3 — Zero-downtime database password rotation¶

As the rig rotating the Postgres service-account password, I want the broker to use a dual-write strategy so no live connection is dropped, so that rig-conductor's connection pool continues without disruption.

Acceptance criteria:
- rotate(ref="k8s:rig-conductor/prod-secrets#db-password", strategy="zero-downtime") executes the sequence: generate new value → deploy new value alongside old → wait for connection drain (configurable, default 30s) → retire old value.
- No 5xx errors from rig-conductor during the rotation window.
- Audit log records old_version ref alongside the new rotation event.
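The zero-downtime sequence above can be sketched against a versioned store. Everything here is illustrative: `rotate_zero_downtime`, `InMemoryStore`, and its three methods are hypothetical, and how a real backing store keeps two values valid at once is store-specific and not covered. The sleep function is injectable so the drain wait can be observed without actually waiting.

```python
import secrets
import time
from datetime import datetime, timezone


class InMemoryStore:
    """Toy versioned store: ref -> {version number: value}."""

    def __init__(self):
        self.versions = {}

    def current_version(self, ref):
        return max(self.versions[ref])

    def add_version(self, ref, value):
        versions = self.versions.setdefault(ref, {})
        version = max(versions, default=0) + 1
        versions[version] = value
        return version

    def retire_version(self, ref, version):
        del self.versions[ref][version]


def rotate_zero_downtime(store, ref, drain_seconds=30.0, sleep=time.sleep):
    """generate -> deploy alongside old -> wait for drain -> retire old."""
    old_version = store.current_version(ref)
    new_value = secrets.token_urlsafe(32)  # plaintext stays inside this function
    store.add_version(ref, new_value)      # old and new are both valid from here
    sleep(drain_seconds)                   # let existing connections drain
    store.retire_version(ref, old_version)
    # The result carries refs and metadata only, never the value.
    return {
        "ref": ref,
        "rotated_at": datetime.now(timezone.utc).isoformat(),
        "old_version": old_version,
    }


store = InMemoryStore()
ref = "k8s:rig-conductor/prod-secrets#db-password"
store.add_version(ref, "initial-value")
seen = []
result = rotate_zero_downtime(
    store, ref, sleep=lambda s: seen.append((s, sorted(store.versions[ref])))
)
print(result["ref"], result["old_version"], seen)
```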

US-4 — Hardware-backed secret, human bootstrap¶

As a human operator provisioning the cluster's internal CA, I want to use my YubiKey to sign the CA key and deposit the result via the broker's out-of-band store endpoint, so that the broker tracks the cert lifecycle but the key material never passes through software-only paths.

Acceptance criteria:
- store(ref="k8s:cert-manager/internal-ca#tls.key", policy_ref="policy:prod-tls-ca") accepts the human's deposit.
- The broker verifies the calling principal is human (via OIDC, not agent identity) before accepting the store call.
- Any subsequent agent call to rotate(ref=...) is refused with outcome=escalated and reject_reason=hardware_key_required.
- #admin receives a Discord notification with the ref and instructions.

US-5 — Secret retirement after service decommission¶

As Dev-E decommissioning a deprecated microservice, I want to retire all secrets associated with the service ref pattern, so that orphaned credentials cannot be abused after the service is removed.

Acceptance criteria:
- list(filter={prefix: "k8s:legacy-service/"}) returns all refs for the service.
- retire(ref) for each ref revokes the credential at the backing store (Cloudflare API, GitHub API, SOPS file update) and records retired_at in the audit log.
- After retirement, verify(ref) returns {valid: false} for each retired ref.
- Retired refs remain in the audit log permanently (they are never deleted).

US-6 — Cross-destination sync verification¶

As the rig's weekly integrity check job, I want to verify that all refs are present and in sync across all declared destinations, so that drift (e.g., a k8s secret manually overwritten) is surfaced before it causes an incident.

Acceptance criteria:
- verify(ref) checks existence and hash-match across all destinations in the policy's allowed_destinations.
- Mismatched destinations return {valid: false, destinations: [{ref, status: "drift"}]}.
- Drift events are recorded in the audit log and fire a Prometheus alert secrets_destination_drift_total.
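The hash-match check can be sketched as a digest comparison that runs entirely inside the broker. This is a minimal illustration: `verify_destinations` and the `fetch` callable are hypothetical, SHA-256 is an assumed choice of hash, and only the `drift` status comes from this document (`in_sync` and `missing` are invented for the example). Destinations are passed explicitly as concrete refs. Stores that cannot return a value after it is written, such as GitHub repository secrets, would need a digest recorded at deploy time instead of a read-back.

```python
import hashlib
import hmac


def verify_destinations(src_ref, dst_refs, fetch):
    """Compare each destination's value digest with the source's digest.

    fetch(ref) returns the value as bytes, or None if the ref is absent.
    Only refs and statuses leave this function, never values or digests.
    """
    source = fetch(src_ref)
    if source is None:
        raise LookupError(f"source ref not found: {src_ref}")
    expected = hashlib.sha256(source).digest()
    report = []
    for ref in dst_refs:
        value = fetch(ref)
        if value is None:
            status = "missing"
        elif hmac.compare_digest(hashlib.sha256(value).digest(), expected):
            status = "in_sync"
        else:
            status = "drift"
        report.append({"ref": ref, "status": status})
    return {
        "valid": all(entry["status"] == "in_sync" for entry in report),
        "destinations": report,
    }


values = {
    "bw:item/x": b"current",
    "k8s:ns/secret#key": b"current",
    "sops:apps/x/secrets.sops.yaml#KEY": b"stale",
}
report = verify_destinations(
    "bw:item/x",
    ["k8s:ns/secret#key", "sops:apps/x/secrets.sops.yaml#KEY", "k8s:ns/other#key"],
    values.get,
)
print(report)
```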

Implementation plan — 4 weeks¶

Week	Deliverable	Key tasks
W1	Core broker service	Go/Rust binary; ref parser; policy loader; in-memory operation dispatch; SQLite audit schema; unit tests for all 8 tools
W2	Backing store adapters	Bitwarden SDK adapter; GitHub Secrets API adapter; SOPS file adapter; Kubernetes Secret adapter; Cloudflare API adapter; integration tests per adapter
W3	Policy engine + R2 audit mirror	Policy YAML loader + validator; `hardware_key_required` escalation path; rate-limit enforcement; R2 NDJSON flush; WORM bucket config; `secrets_destination_drift_total` metric
W4	Rotation scheduler + hardening	Cron-triggered rotation loop; zero-downtime dual-write strategy; Cilium egress policy for broker pod; OIDC-based `store` human-principal verification; end-to-end smoke test; docs

Not in scope for v1: multi-cluster federation, secret sharing across Dashecorp org boundaries, dynamic Vault/OpenBao integration (see security.md trigger list for when that changes), and agent-to-agent secret delegation.

Deployment topology¶

graph LR
    AGENT[LLM Agent Pod\ndev-e namespace]
    BROKER[Secrets Broker Pod\nsecrets-broker namespace]
    SQLITE[(SQLite WAL\nPVC - local)]
    R2[(Cloudflare R2\nWORM bucket)]
    BW[Bitwarden]
    GH[GitHub API]
    SOPSREPO[SOPS git repo]
    K8SAPI[Kubernetes API]
    CF[Cloudflare API]

    AGENT -->|mTLS, ref only| BROKER
    BROKER --> SQLITE
    BROKER -->|async flush| R2
    BROKER -->|HTTPS| BW
    BROKER -->|HTTPS| GH
    BROKER -->|HTTPS| SOPSREPO
    BROKER -->|in-cluster| K8SAPI
    BROKER -->|HTTPS| CF

The broker pod runs in a dedicated secrets-broker namespace with its own Cilium egress policy covering only the five backing-store endpoints. Agent pods reach the broker via mTLS (cert-manager Certificate). No agent pod has direct egress to any backing store.

Residual risks (honest assessment)¶

The broker pattern significantly raises the bar but does not eliminate all risk.

Risk	Likelihood	Severity	Mitigation	Residual
Broker process memory scrape	Low	Critical	Run broker as non-root, no ptrace, pod security baseline; encrypt in-memory value buffers	Low — requires node compromise
Broker pod compromise via supply chain	Low	Critical	Broker image signed + SLSA L3; Kyverno admission required; see security.md	Low — in-depth chain
OIDC token replay for `store` endpoint	Medium	High	Short-lived Fulcio certs (10 min); Rekor log checked for replay	Low after mitigation
Policy misconfiguration (too-permissive `allowed_destinations`)	Medium	High	Policy PRs require human review; CI schema-validates policy YAML	Medium — human error in policy authoring
R2 mirror outage	Low	Medium	SQLite remains authoritative; broker continues; mirror lag alert fires; manual re-sync on recovery	Low — audit continuity preserved
Backing store API rate limits blocking rotation	Medium	Medium	Rate-limit policy per secret; rotation scheduler backs off exponentially	Low after mitigation
LLM hallucinates a valid-looking ref for a secret it shouldn't access	Medium	High	Ref registry validates every ref against policy on every call; hallucinated refs fail at validation	Low — gated by registry
Dual-write window for zero-downtime rotation	Low	Medium	Window is configurable; minimum 30s; connection draining is monitored	Low — narrow window

Structural gap: the broker cannot protect against a compromised backing store (e.g., a Bitwarden breach). This is out-of-scope for the broker layer and addressed by backing-store selection, credential isolation, and key hierarchies. See limitations.md for the rig's general stance on out-of-scope mitigations.
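The exponential backoff named as the mitigation for backing-store rate limits in the table above could look like the following. The base delay, the cap, and the use of full jitter are all assumptions; this document states only that the scheduler backs off exponentially. `backoff_ceiling` and `backoff_delay` are hypothetical names.

```python
import random


def backoff_ceiling(attempt: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Upper bound for retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0, rng=random) -> float:
    """Full jitter: pick a delay uniformly between 0 and the ceiling."""
    return rng.uniform(0.0, backoff_ceiling(attempt, base, cap))


print([backoff_ceiling(i) for i in range(5)])
```

Jitter spreads retries from many overdue secrets so they do not hit the provider API at the same instant.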

Relationship to existing security controls¶

The broker integrates with existing controls rather than replacing them:

Existing control	How the broker uses it
Cilium L7 egress	Broker namespace has its own per-backing-store allowlist; agent pods have no direct backing-store egress
SOPS + age	Broker is the authorized mutator for SOPS files; no other process writes to encrypted manifests
Kyverno admission	Broker image must pass standard signed-image policy before admission
Gitsign on agent commits	Policy YAML changes committed by agents are signed; broker reloads policy only from verified git refs
Trust model tiers	Policies for T3 secrets (auth, payment credentials) set `hardware_key_required: true`, so their rotation always escalates to a human
Audit log (observability.md)	Broker audit events feed the same observability stack; `secrets_rotation_total` and `secrets_destination_drift_total` are added to the cost dashboard