Agent Secrets Broker — Autonomous Secret Lifecycle for LLM Agents
TL;DR
LLM agents that handle secrets via prompt, tool argument, or log entry are compromised by design: prompt injection, transcript storage, and log shipping all become exfiltration vectors. The secrets broker applies capability-based mediation (Hardy 1988; Miller 2006): the LLM is the planner that operates on opaque references (bw:item/prod-db-password); the broker is the courier that handles plaintext. The agent never sees the bytes. This document specifies the tool surface, destination grammar, policy model, audit schema, and a 4-week implementation plan. It does not duplicate security.md (supply chain: Sigstore/SLSA/Kyverno); it covers the complementary runtime-lifecycle layer.
Motivation
Anthropic's April 2026 third-party-tool policy change metered subscription-OAuth usage, pushing more agentic work onto hybrid local/cloud inference where the rig cannot assume that transcript storage is under Dashecorp's control. Three failure modes drove this design:
| Failure mode | Mechanism | Risk |
|---|---|---|
| Prompt-in-plaintext | Agent receives API_KEY=sk-abc123 in tool output; key is logged to transcript | Full compromise on transcript exfil |
| Tool-argument leak | deploy_secret(value="sk-abc123") appears in structured tool call log | Log aggregation → attacker |
| Rotation paralysis | No agent-driven rotation path; secrets age indefinitely | Long exposure window on compromise |
The broker pattern eliminates all three: the agent requests operations by reference, the broker executes them against the backing store, and plaintext never crosses the LLM boundary.
Complementary scope: security.md covers the supply-chain layer (Sigstore, SLSA, Kyverno admission). This whitepaper covers the runtime lifecycle layer — what happens after a container is admitted and needs a secret to function.
The threat model
graph TB
classDef threat fill:#ffcccc,color:#000
classDef defense fill:#c8e6c9,color:#000
classDef neutral fill:#e3f2fd,color:#000
LLM[LLM Agent]:::neutral
BROKER[Secrets Broker]:::defense
BW[Bitwarden]:::neutral
GH[GitHub Secrets]:::neutral
SOPS[SOPS / age]:::neutral
K8S[Kubernetes Secrets]:::neutral
CF[Cloudflare Worker Secrets]:::neutral
T1[Prompt injection exfil]:::threat
T2[Transcript storage leak]:::threat
T3[Log line leak]:::threat
T4[Over-privileged rotation]:::threat
T5[Unaudited secret access]:::threat
T6[Hardware key bypass]:::threat
D1[Reference-only API — no plaintext crosses LLM boundary]:::defense
D2[Transcript-safe tool surface — values never in args or output]:::defense
D3[Structured log sanitisation — broker logs ref not value]:::defense
D4[Policy model — per-secret rotation scope + rate limits]:::defense
D5[Append-only audit log — SQLite + R2 mirror]:::defense
D6[hardware_key_required flag — policy blocks software-only rotation]:::defense
T1 -.->|blocked by| D1
T2 -.->|blocked by| D2
T3 -.->|blocked by| D3
T4 -.->|blocked by| D4
T5 -.->|blocked by| D5
T6 -.->|blocked by| D6
LLM -->|ref only| BROKER
BROKER -->|plaintext| BW
BROKER -->|plaintext| GH
BROKER -->|plaintext| SOPS
BROKER -->|plaintext| K8S
BROKER -->|plaintext| CF
The LLM → Broker boundary is the invariant: the arrow carries only references and operation names. Plaintext flows only within the broker process and onward to backing stores over authenticated, encrypted channels.
Secret-kind taxonomy
Not all secrets are equal. The broker distinguishes automatable secrets from human-bootstrap secrets:
| Kind | Examples | Automatable? | Notes |
|---|---|---|---|
| api-key | Anthropic, GitHub PAT, Cloudflare token | Yes (generate + deploy) | Provider must support programmatic issuance |
| symmetric-key | SOPS age recipient, AES-256 data key | Yes | generate_and_deploy flow |
| db-password | Postgres service account | Yes | Must rotate with zero-downtime (dual-write period) |
| jwt-signing-key | RS256/ES256 private key | Yes | Key rotation requires public-key republish |
| tls-cert | Cluster internal CA | Delegated to cert-manager | Broker tracks ref; cert-manager issues |
| oauth-client-secret | GitHub App, Google OAuth | Human-bootstrap | Provider issues interactively; agent stores result |
| hardware-backed | YubiKey PIV, HSM-resident | Human-bootstrap always | hardware_key_required: true policy flag; broker refuses software rotation |
| biometric | Touch ID, passkeys | Human-bootstrap always | Never enters the broker at any stage |
Automatable secrets complete the full mint → store → deploy → rotate → retire cycle without human intervention.
Human-bootstrap secrets require a human to perform initial issuance; the broker takes over for storage, deployment, and lifecycle tracking once the human has deposited the value via an authenticated, out-of-band channel (never via agent prompt).
Destination reference grammar
The broker uses a URI-like reference grammar for all secret locations. References are the only values that cross the LLM boundary.
| Scheme | Backing store | Example |
|---|---|---|
| bw: | Bitwarden (personal or org vault) | bw:item/prod-db-password |
| gh: | GitHub repository secret | gh:dashecorp/rig-conductor/PROD_DB_PASSWORD |
| gh-env: | GitHub environment secret | gh-env:dashecorp/rig-conductor/production/PROD_DB_PASSWORD |
| sops: | SOPS-encrypted file at path, key name | sops:apps/rig-conductor/secrets.sops.yaml#DB_PASSWORD |
| k8s: | Kubernetes Secret, namespace/name/key | k8s:rig-conductor/prod-secrets#db-password |
| cf-worker: | Cloudflare Worker secret | cf-worker:rig-conductor-api/DB_PASSWORD |
Refs are stable across rotations — the broker updates the backing store value; callers holding the ref do not need to change.
Resolution rules:
- Refs are validated against the policy registry on every operation.
- Unknown or malformed refs are rejected before any backing store call.
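- Cross-destination copy (e.g., mint to bw:, deploy to gh: and k8s:) is a single broker operation, not a sequence of agent calls; the result reports which destinations were written (deployed_to[]) and which were skipped (skipped[]).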
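The resolution rules above can be sketched as a small parser that rejects unknown or malformed refs before any backing-store call. This is a minimal illustration, not the broker's implementation: the per-scheme patterns are assumptions inferred from the example column of the table, and `Ref` and `parse_ref` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Schemes come from the grammar table. The path patterns are assumptions
# inferred from the examples, not a normative grammar.
_PATTERNS = {
    "bw": re.compile(r"item/[\w.-]+"),
    "gh": re.compile(r"[\w.-]+/[\w.-]+/[A-Z0-9_]+"),
    "gh-env": re.compile(r"[\w.-]+/[\w.-]+/[\w.-]+/[A-Z0-9_]+"),
    "sops": re.compile(r"[\w./-]+#[\w.-]+"),
    "k8s": re.compile(r"[\w.-]+/[\w.-]+#[\w.-]+"),
    "cf-worker": re.compile(r"[\w.-]+/[A-Z0-9_]+"),
}


@dataclass(frozen=True)
class Ref:
    scheme: str
    path: str


def parse_ref(raw: str) -> Ref:
    """Split a ref into scheme and path, or raise ValueError.

    Error messages never echo the input: a malformed "ref" may be a
    plaintext secret that was pasted in by mistake.
    """
    scheme, sep, path = raw.partition(":")
    pattern = _PATTERNS.get(scheme)
    if not sep or pattern is None:
        raise ValueError("unknown ref scheme")
    if not pattern.fullmatch(path):
        raise ValueError(f"malformed path for scheme {scheme}")
    return Ref(scheme, path)
```

A policy-registry lookup would follow a successful parse; it is omitted here because the registry interface is not specified in this document.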
Tool surface
The broker exposes eight tools to the LLM. None accept or return plaintext. All operations are synchronous unless noted.
| Tool | Args | Returns | Effect |
|---|---|---|---|
| mint | kind, ref, policy_ref | {ref, created_at} | Generate new secret value, store at ref, register policy |
| store | ref, policy_ref | {ref, stored_at} | Deposit a value the human has provided out-of-band; agent provides only the ref |
| deploy | src_ref, dst_refs[] | {deployed_to[], skipped[]} | Copy from source ref to one or more destinations |
| rotate | ref, strategy | {ref, rotated_at, old_version} | Generate new value, dual-write if strategy=zero-downtime, retire old |
| retire | ref | {ref, retired_at} | Revoke and delete from all destinations; purge backing store |
| verify | ref | {valid: bool, destinations[]} | Check that the ref exists, is not expired, and all declared destinations are in sync |
| list | filter | {refs[]} | List refs matching filter; returns refs only, never values |
| generate_and_deploy | kind, dst_refs[], policy_ref | {ref, deployed_to[]} | Mint + deploy in one call; common shorthand for new-secret flows |
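One way to hold the "no plaintext in arguments" line mechanically is to reject any call whose argument names are not in the table above. The sketch below is illustrative only: `TOOL_ARGS` is transcribed from the table, `check_tool_call` is a hypothetical helper, and whether every argument is mandatory is not stated in this document, so only unexpected arguments are rejected.

```python
# Argument names per tool, transcribed from the tool-surface table.
TOOL_ARGS = {
    "mint": {"kind", "ref", "policy_ref"},
    "store": {"ref", "policy_ref"},
    "deploy": {"src_ref", "dst_refs"},
    "rotate": {"ref", "strategy"},
    "retire": {"ref"},
    "verify": {"ref"},
    "list": {"filter"},
    "generate_and_deploy": {"kind", "dst_refs", "policy_ref"},
}


def check_tool_call(tool, args):
    """Raise ValueError unless the call names a known tool and only declared args.

    Messages name the offending keys, never their values, so a secret
    passed as an argument is not echoed back into the transcript.
    """
    allowed = TOOL_ARGS.get(tool)
    if allowed is None:
        raise ValueError("unknown tool")
    unexpected = sorted(set(args) - allowed)
    if unexpected:
        raise ValueError(f"unexpected argument(s) for {tool}: {unexpected}")
```

With this check in front of the dispatcher, a call such as deploy_secret(value="sk-abc123") from the Motivation table fails as an unknown tool, and a value argument on mint is rejected by name.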
Tool call examples (what the LLM sees)
// Agent calls generate_and_deploy for a new Cloudflare token
{
"tool": "generate_and_deploy",
"args": {
"kind": "api-key",
"dst_refs": ["cf-worker:rig-conductor-api/CF_API_TOKEN", "bw:item/rig-cf-api-token"],
"policy_ref": "policy:cloudflare-api-key-standard"
}
}
// Broker returns — no plaintext
{
"ref": "bw:item/rig-cf-api-token",
"deployed_to": ["cf-worker:rig-conductor-api/CF_API_TOKEN", "bw:item/rig-cf-api-token"],
"deployed_at": "2026-04-22T10:15:00Z"
}
The broker's logs record ref and operation, never value.
What the tool surface deliberately omits
- read — no tool to retrieve a plaintext value. Backing stores expose their native fetch path directly to the consuming process (e.g., Kubernetes mounts the Secret as a volume; the agent pod reads the file). The broker is not in the read path.
- patch — no partial update. Rotation replaces atomically.
- impersonate — no tool to operate as a different principal. The broker's identity is fixed per deployment.
Policy model
Each ref is bound to a policy entry at creation time. Policies are stored in policy/secrets/<name>.yaml in the rig-gitops repo, version-controlled, and loaded into the broker at startup.
# policy/secrets/cloudflare-api-key-standard.yaml
apiVersion: secrets.rig.dashecorp.com/v1
kind: SecretPolicy
metadata:
name: cloudflare-api-key-standard
spec:
kind: api-key
max_age_days: 90 # broker emits rotation alert after 90 days
auto_rotate: true # broker schedules rotation without agent prompt
rotation_strategy: immediate # no dual-write needed; Cloudflare invalidates old instantly
rate_limit:
max_rotations_per_day: 3 # prevent runaway rotation loops
allowed_destinations:
- cf-worker:* # wildcards allowed within scheme
- bw:item/*
hardware_key_required: false # software rotation is permitted
---
# policy/secrets/prod-tls-ca.yaml — hardware-backed example
apiVersion: secrets.rig.dashecorp.com/v1
kind: SecretPolicy
metadata:
name: prod-tls-ca
spec:
kind: tls-cert
max_age_days: 365
auto_rotate: false
hardware_key_required: true # broker REFUSES software rotation; emits HumanRequired event
human_escalation_channel: "#admin"
allowed_destinations:
- k8s:cert-manager/*
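The allowed_destinations check in these policies could be sketched with shell-style matching. This assumes that `*` matches any run of characters, including `/`, within the scheme; the document says only that wildcards are allowed within a scheme, so the exact semantics and the helper name `destination_allowed` are assumptions.

```python
from fnmatch import fnmatchcase


def destination_allowed(dst_ref, allowed_destinations):
    """Return True if dst_ref matches at least one allowed pattern.

    fnmatchcase is case-sensitive on every platform; its "*" also spans
    "/" characters, which is an assumed reading of the policy wildcard.
    """
    return any(fnmatchcase(dst_ref, pattern) for pattern in allowed_destinations)


# Patterns from the cloudflare-api-key-standard policy above.
cloudflare_policy = ["cf-worker:*", "bw:item/*"]
print(destination_allowed("cf-worker:rig-conductor-api/CF_API_TOKEN", cloudflare_policy))
print(destination_allowed("gh:dashecorp/rig-conductor/CF_API_TOKEN", cloudflare_policy))
```

Under these assumptions the first call is allowed and the second is not, because no pattern in that policy covers the gh: scheme.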
Hardware-key override: when hardware_key_required: true, the broker:
1. Refuses any rotate or mint call for that ref.
2. Emits a HumanRequired event to rig-conductor, which routes to #admin.
3. Accepts a store call once the human has performed the hardware-backed issuance out-of-band.
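The three steps above reduce to a small decision function. This is a sketch under stated assumptions: the outcome strings follow the audit schema (ok, rejected, escalated), `caller_is_human` stands in for the OIDC principal check described in US-4, and the outcome for a non-human store call is not specified in this document, so "rejected" is an assumption.

```python
def hardware_key_outcome(operation, hardware_key_required, caller_is_human):
    """Return the audit outcome for an operation on a ref.

    Only the hardware-key rule is modelled; other policy checks
    (rate limits, allowed destinations) are out of scope here.
    """
    if not hardware_key_required:
        return "ok"
    if operation in ("rotate", "mint"):
        # Steps 1 and 2: refuse, and hand off to a human via HumanRequired.
        return "escalated"
    if operation == "store":
        # Step 3: accept only a deposit made by a human principal.
        return "ok" if caller_is_human else "rejected"
    return "ok"
```

Keeping the rule in one pure function makes it easy to unit-test separately from event emission to rig-conductor.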
Policy changes go through a PR; changes to policies covering T3 secrets require a human co-sign per trust-model.md.
Audit schema
Every broker operation is appended to an immutable audit log. No update or delete is possible on the log itself.
SQLite schema (primary, local to broker pod)
CREATE TABLE audit_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ts TEXT NOT NULL, -- ISO 8601, microseconds
agent_id TEXT NOT NULL, -- e.g. "dev-e"
operation TEXT NOT NULL, -- mint|store|deploy|rotate|retire|verify|list|generate_and_deploy
ref TEXT NOT NULL, -- the target ref (never the value)
dst_refs TEXT, -- JSON array for deploy/generate_and_deploy
policy_ref TEXT,
outcome TEXT NOT NULL, -- ok|rejected|escalated
reject_reason TEXT, -- populated on rejected/escalated
duration_ms INTEGER NOT NULL
-- append-only is enforced at the application layer; a CHECK constraint cannot block UPDATE or DELETE
) STRICT;
CREATE INDEX idx_audit_ref_ts ON audit_log (ref, ts);
CREATE INDEX idx_audit_agent_ts ON audit_log (agent_id, ts);
Append-only enforcement: the broker is the only process that opens the database (with PRAGMA journal_mode=WAL), and it exposes no SQL connection to external callers. The broker's code contains no UPDATE or DELETE statements against audit_log, so rows can only be inserted.
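Application-layer enforcement could be backed by SQLite triggers as defence in depth. The sketch below is not part of the specified schema: the trigger names are hypothetical, the table is reduced to a few columns, and STRICT is omitted so the example runs on older SQLite builds. A process with DDL access could still drop the triggers, so they supplement the closed connection and do not replace it.

```python
import sqlite3

# Reduced audit_log plus two hypothetical triggers that abort any
# UPDATE or DELETE, leaving INSERT as the only way to change the table.
SCHEMA = """
CREATE TABLE audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    ts        TEXT NOT NULL,
    operation TEXT NOT NULL,
    ref       TEXT NOT NULL,
    outcome   TEXT NOT NULL
);
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_audit_db(path=":memory:"):
    """Open the audit database in WAL mode and create the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # no effect on :memory: databases
    conn.executescript(SCHEMA)
    return conn


def append_event(conn, ts, operation, ref, outcome):
    """Insert one audit row; the ref is recorded, never a secret value."""
    conn.execute(
        "INSERT INTO audit_log (ts, operation, ref, outcome) VALUES (?, ?, ?, ?)",
        (ts, operation, ref, outcome),
    )
    conn.commit()
```

With the triggers in place, an UPDATE or DELETE issued through any connection raises a sqlite3 error and the affected rows are left unchanged.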
R2 mirror (durable, cross-region)
Every row is streamed to Cloudflare R2 in NDJSON format within 30 seconds of append. The R2 bucket has:
- Object Lock (WORM) — objects are immutable for 7 years (configurable per compliance requirement).
- Public access: disabled — audit reads require a signed URL issued by the broker's read-only audit endpoint.
- Replication: standard R2 cross-region replication.
Mirror lag alert: if the broker's R2 flush lag exceeds 60 seconds, an alert on the audit_mirror_lag_seconds Prometheus metric fires; the broker keeps serving requests and logs a warning. The SQLite log remains authoritative until the mirror catches up.
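The NDJSON encoding and the lag figure behind the alert can be sketched as two pure functions. The function names are hypothetical, the upload to R2 is deliberately left out, and the lag definition (age of the oldest row not yet flushed) is an assumption, since this document does not define how lag is measured.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def to_ndjson(rows):
    """Serialise audit rows (dicts) as newline-delimited JSON, one row per line."""
    return "".join(
        json.dumps(row, sort_keys=True, separators=(",", ":")) + "\n" for row in rows
    )


def mirror_lag_seconds(oldest_unflushed_ts: Optional[str], now: datetime) -> float:
    """Age in seconds of the oldest row not yet mirrored; 0.0 if none is pending.

    Timestamps are assumed to be ISO 8601 in UTC with a trailing "Z".
    """
    if oldest_unflushed_ts is None:
        return 0.0
    oldest = datetime.fromisoformat(oldest_unflushed_ts.replace("Z", "+00:00"))
    return (now - oldest).total_seconds()


now = datetime(2026, 4, 22, 10, 16, 30, tzinfo=timezone.utc)
lag = mirror_lag_seconds("2026-04-22T10:15:00.000000Z", now)
print(lag, lag > 60)  # 90.0 True: over the 60-second alert threshold
```

Exporting this value as audit_mirror_lag_seconds would let the alert rule compare it against the 60-second threshold.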
Querying the audit log
-- All rotations for a ref in the last 30 days
SELECT ts, agent_id, outcome, reject_reason
FROM audit_log
WHERE ref = 'bw:item/prod-db-password'
AND operation = 'rotate'
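AND ts > strftime('%Y-%m-%dT%H:%M:%S', 'now', '-30 days') -- same ISO 8601 'T' layout as ts (UTC assumed); datetime() emits a space separator and would compare incorrectly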
ORDER BY ts DESC;
-- Rejected operations (policy violations)
SELECT ts, agent_id, ref, operation, reject_reason
FROM audit_log
WHERE outcome = 'rejected'
ORDER BY ts DESC
LIMIT 100;
User stories
US-1 — New deployment, new API key
As Dev-E provisioning a new Cloudflare Worker, I want to mint a new Cloudflare API token and deploy it to both the Worker and Bitwarden in one call, so that I never handle the token value and can hand the ref to the next deployment step.
Acceptance criteria:
- generate_and_deploy(kind="api-key", dst_refs=["cf-worker:worker-name/CF_API_TOKEN", "bw:item/worker-name-cf-token"], policy_ref="policy:cloudflare-api-key-standard") succeeds.
- Broker creates a token via Cloudflare API, stores it, and returns {ref, deployed_to} with no plaintext.
- Audit log records the operation with outcome=ok.
- The Worker can call its bound API using the new token within 5 seconds of the call returning.
US-2 — Scheduled rotation
As the rig's rotation scheduler,
I want to rotate all secrets that exceed their max_age_days policy threshold,
so that secret age never exceeds policy limits without human involvement.
Acceptance criteria:
- A cron job calls list(filter={overdue_rotation: true}), then rotate(ref, strategy) for each returned ref.
- Rotations complete for all auto_rotate: true secrets without agent prompting.
- Secrets with hardware_key_required: true emit HumanRequired events instead of rotating.
- Rate limit (max_rotations_per_day) is enforced: excess calls return {outcome: rejected, reject_reason: "rate_limit"}.
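The acceptance criteria above imply a per-ref decision on every scheduler pass, sketched below. `rotation_pass` is a hypothetical helper that only decides outcomes and does not perform the rotation. Counting rotations per calendar day and skipping refs with auto_rotate: false are both assumptions; the policy dicts mirror the YAML layout in the Policy model section.

```python
from collections import Counter


def rotation_pass(overdue_refs, policies, rotations_today):
    """Decide an outcome for each overdue ref.

    overdue_refs    -- refs returned by list(filter={overdue_rotation: true})
    policies        -- ref -> policy spec dict (as loaded from the policy YAML)
    rotations_today -- Counter of rotations already done today; updated in place
    Returns a list of (ref, outcome, reject_reason) tuples.
    """
    results = []
    for ref in overdue_refs:
        spec = policies[ref]
        if spec.get("hardware_key_required"):
            results.append((ref, "escalated", "hardware_key_required"))
        elif not spec.get("auto_rotate"):
            continue  # assumption: left for a human or an explicit agent call
        elif rotations_today[ref] >= spec["rate_limit"]["max_rotations_per_day"]:
            results.append((ref, "rejected", "rate_limit"))
        else:
            rotations_today[ref] += 1
            results.append((ref, "ok", None))
    return results


policies = {
    "bw:item/rig-cf-api-token": {
        "auto_rotate": True,
        "hardware_key_required": False,
        "rate_limit": {"max_rotations_per_day": 3},
    },
    "k8s:cert-manager/internal-ca#tls.key": {
        "auto_rotate": False,
        "hardware_key_required": True,
    },
}
print(rotation_pass(list(policies), policies, Counter()))
```

The hardware-key check runs first so that such refs always escalate, even when their policy carries no rate_limit block.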
US-3 — Zero-downtime database password rotation
As the rig rotating the Postgres service-account password, I want the broker to use a dual-write strategy so no live connection is dropped, so that rig-conductor's connection pool continues without disruption.
Acceptance criteria:
- rotate(ref="k8s:rig-conductor/prod-secrets#db-password", strategy="zero-downtime") executes the sequence: generate new value → deploy new value alongside old → wait for connection drain (configurable, default 30s) → retire old value.
- No 5xx errors from rig-conductor during the rotation window.
- Audit log records old_version ref alongside the new rotation event.
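The dual-write sequence in the first criterion can be illustrated with an in-memory stand-in for a backing store. `InMemoryStore` and `rotate_zero_downtime` are hypothetical; a real adapter would have to create the new credential in Postgres and update the Kubernetes Secret, which this sketch does not attempt. The `sleep` parameter is injectable so the drain wait can be skipped in tests.

```python
import secrets
import time


class InMemoryStore:
    """Hypothetical adapter that can hold several live versions per ref."""

    def __init__(self):
        self._live = {}  # ref -> list of (version, value), oldest first

    def add_version(self, ref, value):
        versions = self._live.setdefault(ref, [])
        version = versions[-1][0] + 1 if versions else 1
        versions.append((version, value))
        return version

    def retire_version(self, ref, version):
        self._live[ref] = [v for v in self._live[ref] if v[0] != version]

    def live_versions(self, ref):
        return [version for version, _ in self._live.get(ref, [])]


def rotate_zero_downtime(store, ref, drain_seconds=30.0, sleep=time.sleep):
    """Generate, deploy alongside old, wait for drain, retire old."""
    old_version = store.live_versions(ref)[-1]
    store.add_version(ref, secrets.token_urlsafe(32))  # old and new both valid
    sleep(drain_seconds)                               # connection drain window
    store.retire_version(ref, old_version)
    return {"ref": ref, "old_version": old_version}    # no plaintext returned
```

The return value carries only the ref and the retired version number, matching the rule that tool results never contain secret values.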
US-4 — Hardware-backed secret, human bootstrap
As a human operator provisioning the cluster's internal CA, I want to use my YubiKey to sign the CA key and deposit the result via the broker's out-of-band store endpoint, so that the broker tracks the cert lifecycle but the key material never passes through software-only paths.
Acceptance criteria:
- store(ref="k8s:cert-manager/internal-ca#tls.key", policy_ref="policy:prod-tls-ca") accepts the human's deposit.
- The broker verifies the calling principal is human (via OIDC, not agent identity) before accepting the store call.
- Any subsequent agent call to rotate(ref=...) is refused and recorded with outcome=escalated and reject_reason=hardware_key_required.
- #admin receives a Discord notification with the ref and instructions.
US-5 — Secret retirement after service decommission
As Dev-E decommissioning a deprecated microservice, I want to retire all secrets associated with the service ref pattern, so that orphaned credentials cannot be abused after the service is removed.
Acceptance criteria:
- list(filter={prefix: "k8s:legacy-service/"}) returns all refs for the service.
- retire(ref) for each ref revokes the credential at the backing store (Cloudflare API, GitHub API, SOPS file update) and records retired_at in the audit log.
- After retirement, verify(ref) returns {valid: false} for each retired ref.
- Retired refs remain in the audit log permanently (they are never deleted).
US-6 — Cross-destination sync verification
As the rig's weekly integrity check job, I want to verify that all refs are present and in sync across all declared destinations, so that drift (e.g., a k8s secret manually overwritten) is surfaced before it causes an incident.
Acceptance criteria:
- verify(ref) checks existence and hash-match across all destinations declared for the ref, each of which must fall within the policy's allowed_destinations.
- Mismatched destinations return {valid: false, destinations: [{ref, status: "drift"}]}.
- Drift events are recorded in the audit log and increment the Prometheus counter secrets_destination_drift_total, which drives an alert.
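The hash-match check could be sketched as a comparison of keyed digests. Several assumptions apply: each adapter is taken to supply a digest for its destination (some stores do not return secret values once written, so the digest may have to be recorded at deploy time), HMAC-SHA-256 is an illustrative choice rather than something this document specifies, and the statuses "in_sync" and "missing" are invented alongside the documented "drift".

```python
import hashlib
import hmac


def keyed_digest(value: bytes, key: bytes) -> str:
    """HMAC-SHA-256 of a secret value, so stored digests reveal nothing useful."""
    return hmac.new(key, value, hashlib.sha256).hexdigest()


def verify_destinations(expected_digest, destination_digests):
    """Compare each destination's digest with the expected one.

    destination_digests maps a destination ref to its digest, or to None
    when the destination does not hold the secret at all.
    """
    report = []
    for dst_ref, found in destination_digests.items():
        if found is None:
            status = "missing"
        elif hmac.compare_digest(found, expected_digest):
            status = "in_sync"
        else:
            status = "drift"
        report.append({"ref": dst_ref, "status": status})
    return {
        "valid": all(entry["status"] == "in_sync" for entry in report),
        "destinations": report,
    }
```

Only digests and refs appear in the result, so the report can be returned to the agent and written to the audit log without exposing a value.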
Implementation plan — 4 weeks
| Week | Deliverable | Key tasks |
|---|---|---|
| W1 | Core broker service | Go/Rust binary; ref parser; policy loader; in-memory operation dispatch; SQLite audit schema; unit tests for all 8 tools |
| W2 | Backing store adapters | Bitwarden SDK adapter; GitHub Secrets API adapter; SOPS file adapter; Kubernetes Secret adapter; Cloudflare API adapter; integration tests per adapter |
| W3 | Policy engine + R2 audit mirror | Policy YAML loader + validator; hardware_key_required escalation path; rate-limit enforcement; R2 NDJSON flush; WORM bucket config; secrets_destination_drift_total metric |
| W4 | Rotation scheduler + hardening | Cron-triggered rotation loop; zero-downtime dual-write strategy; Cilium egress policy for broker pod; OIDC-based store human-principal verification; end-to-end smoke test; docs |
Not in scope for v1: multi-cluster federation, secret sharing across Dashecorp org boundaries, dynamic Vault/OpenBao integration (see security.md trigger list for when that changes), and agent-to-agent secret delegation.
Deployment topology
graph LR
AGENT[LLM Agent Pod\ndev-e namespace]
BROKER[Secrets Broker Pod\nsecrets-broker namespace]
SQLITE[(SQLite WAL\nPVC - local)]
R2[(Cloudflare R2\nWORM bucket)]
BW[Bitwarden]
GH[GitHub API]
SOPSREPO[SOPS git repo]
K8SAPI[Kubernetes API]
CF[Cloudflare API]
AGENT -->|mTLS, ref only| BROKER
BROKER --> SQLITE
BROKER -->|async flush| R2
BROKER -->|HTTPS| BW
BROKER -->|HTTPS| GH
BROKER -->|HTTPS| SOPSREPO
BROKER -->|in-cluster| K8SAPI
BROKER -->|HTTPS| CF
The broker pod runs in a dedicated secrets-broker namespace with its own Cilium egress policy covering only the five backing-store endpoints and the R2 audit bucket. Agent pods reach the broker via mTLS (cert-manager Certificate). No agent pod has direct egress to any backing store.
Residual risks (honest assessment)
The broker pattern significantly raises the bar but does not eliminate all risk.
| Risk | Likelihood | Severity | Mitigation | Residual |
|---|---|---|---|---|
| Broker process memory scrape | Low | Critical | Run broker as non-root, no ptrace, pod security baseline; encrypt in-memory value buffers | Low — requires node compromise |
| Broker pod compromise via supply chain | Low | Critical | Broker image signed + SLSA L3; Kyverno admission required; see security.md | Low — defense in depth across the supply chain |
| OIDC token replay for store endpoint | Medium | High | Short-lived Fulcio certs (10 min); Rekor log checked for replay | Low after mitigation |
| Policy misconfiguration (too-permissive allowed_destinations) | Medium | High | Policy PRs require human review; CI schema-validates policy YAML | Medium — human error in policy authoring |
| R2 mirror outage | Low | Medium | SQLite remains authoritative; broker continues; mirror lag alert fires; manual re-sync on recovery | Low — audit continuity preserved |
| Backing store API rate limits blocking rotation | Medium | Medium | Rate-limit policy per secret; rotation scheduler backs off exponentially | Low after mitigation |
| LLM hallucinates a valid-looking ref for a secret it shouldn't access | Medium | High | Ref registry validates every ref against policy on every call; hallucinated refs fail at validation | Low — gated by registry |
| Dual-write window for zero-downtime rotation | Low | Medium | Window is configurable; minimum 30s; connection draining is monitored | Low — narrow window |
Structural gap: the broker cannot protect against a compromised backing store (e.g., a Bitwarden breach). This is out-of-scope for the broker layer and addressed by backing-store selection, credential isolation, and key hierarchies. See limitations.md for the rig's general stance on out-of-scope mitigations.
Relationship to existing security controls
The broker integrates with existing controls rather than replacing them:
| Existing control | How the broker uses it |
|---|---|
| Cilium L7 egress | Broker namespace has its own per-backing-store allowlist; agent pods have no direct backing-store egress |
| SOPS + age | Broker is the authorized mutator for SOPS files; no other process writes to encrypted manifests |
| Kyverno admission | Broker image must pass standard signed-image policy before admission |
| Gitsign on agent commits | Policy YAML changes committed by agents are signed; broker reloads policy only from verified git refs |
| Trust model tiers | Secret rotation of T3 secrets (auth, payment credentials) is hardware_key_required: true by policy |
| Audit log (observability.md) | Broker audit events feed the same observability stack; secrets_rotation_total, secrets_destination_drift_total added to the cost dashboard |
See also
- index.md — whitepaper master
- security.md — supply-chain layer (Sigstore, SLSA, Kyverno); complementary, not duplicated here
- trust-model.md — tier classification for T3 secret operations; hardware-key secrets are always T3
- safety.md — prompt-injection defenses that motivate the reference-only LLM boundary
- observability.md — how broker metrics surface in the cost dashboard
- limitations.md — what the broker does not cover (backing-store breaches, biometric secrets)
- docs/sops.md — SOPS operational mechanics; the broker delegates git-encrypted secret mutations here