SCP Security Architecture¶
Version: 0.1 (in progress; updated as each domain is implemented)
Owner: Ohana Consulting LLC
Audience: Internal engineering reference; basis for customer security review
Overview¶
SCP governs AI agent behavior in regulated environments. Its security model has seven domains, implemented in order of deployment criticality:
| # | Domain | Status | Section |
|---|---|---|---|
| 1 | Audit Log Integrity | ✅ Complete | §1 |
| 2 | Agent API Key Lifecycle | ✅ Complete | §2 |
| 3 | Secrets Management | ✅ Complete | §3 |
| 4 | Transport Security (TLS) | ✅ Complete | §4 |
| 5 | Telemetry Beacon Security | ✅ Complete | §5 |
| 6 | License Key Enforcement | ✅ Complete | §6 |
| 7 | Admin API Authentication | ✅ Complete | §7 |
1. Audit Log Integrity¶
Purpose¶
The audit log is the foundation of the SCP governance claim. Every agent context request is recorded. If the log can be altered, the compliance story collapses. The log must be tamper-evident and independently verifiable.
Threat Model¶
| Threat | Mitigation |
|---|---|
| Direct SQL UPDATE/DELETE against audit table | Immutability trigger raises exception on any DML modification |
| Table owner bypassing the trigger via DDL | RLS with FORCE, which applies even to the table owner for DML (superusers still bypass RLS; see the production note under Database Roles) |
| Silent record insertion/deletion between records | Hash chain — any gap or reorder breaks the chain |
| Application bug writing empty/corrupt hashes | CHECK constraints reject empty record_hash / previous_hash |
Database Design¶
Table: agent_context_requests
Primary audit table. Records every context request — successful or rejected.
| Column | Type | Notes |
|---|---|---|
| request_id | TEXT PK | UUID per request |
| agent_id | TEXT FK | Requesting agent |
| task_type | TEXT | Intent requested |
| request_params | JSONB | Scoping parameters |
| intent_valid | BOOLEAN | Whether intent was allowed |
| validation_errors | JSONB | Populated if intent rejected |
| context_bundle_ids | JSONB | Bundles included in response |
| context_scd_ids | JSONB | SCDs included in response |
| processing_time_ms | INTEGER | Latency |
| requested_at | TIMESTAMPTZ | Wall clock at insert |
| chain_sequence | BIGSERIAL | Strict insertion order for chain traversal |
| previous_hash | TEXT | Hash of prior record (or genesis hash) |
| record_hash | TEXT | SHA256 of this record's content + previous_hash |
Constraints:
- CHECK (previous_hash != '') — empty string not permitted
- CHECK (record_hash != '') — empty string not permitted
- Unique index on chain_sequence
- Records that predate hash chain implementation carry sentinel value LEGACY_PRE_AUDIT
Table: audit_chain_genesis
Single row per deployment. Stores the genesis hash — the previous_hash value for the first audit record inserted after deployment initialization.
| Column | Type | Notes |
|---|---|---|
| id | SERIAL PK | |
| genesis_hash | TEXT | SHA256("scp-genesis:" + deployment_id + initialized_at) |
| initialized_at | TIMESTAMPTZ | Set once at deployment init |
Database Roles¶
| Role | Permissions | Used by |
|---|---|---|
| scp_app | INSERT on agent_context_requests; full access to agents, agent_api_keys | Application service |
| scp_audit | SELECT on agent_context_requests, audit_chain_genesis | Audit/verify endpoint, read-only queries |
Production note: The application must connect to PostgreSQL as scp_app, not postgres. The postgres superuser role bypasses row-level security even when FORCE ROW LEVEL SECURITY is set, which leaves the trigger as the only remaining layer.
Immutability Enforcement¶
Two independent layers:
Layer 1 — Trigger (audit_immutable)
BEFORE UPDATE OR DELETE ON agent_context_requests
→ RAISE EXCEPTION 'Audit records are immutable'
Layer 2 — Row Level Security
ALTER TABLE agent_context_requests FORCE ROW LEVEL SECURITY;
-- Policies: SELECT (all), INSERT (all)
-- No UPDATE or DELETE policy → denied
FORCE causes RLS to apply even to the table owner role. It is bypassed only by superusers and by roles with the BYPASSRLS attribute (not granted to scp_app).
Both layers must be independently defeated for a modification to succeed.
Hash Chain Algorithm¶
Every new audit record is inserted with:
record_content = chain_sequence
|| request_id
|| agent_id
|| task_type
|| str(intent_valid)
|| requested_at.isoformat()
record_hash = SHA256(record_content + previous_hash)
Where previous_hash is:
- The record_hash of the record with chain_sequence = N-1 for all N > 1
- The genesis hash (from audit_chain_genesis) for the first record
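The computation above can be sketched in Python as follows. This is a minimal illustration, not the code in control-plane/shared/audit.py: the function name compute_record_hash is hypothetical, and the choices of plain concatenation with no delimiter, UTF-8 encoding, and a hex digest are assumptions.

```python
import hashlib
from datetime import datetime, timezone


def compute_record_hash(
    chain_sequence: int,
    request_id: str,
    agent_id: str,
    task_type: str,
    intent_valid: bool,
    requested_at: datetime,
    previous_hash: str,
) -> str:
    """Return SHA-256 over the record content followed by previous_hash."""
    # Assumption: fields are concatenated in the documented order, no delimiter.
    record_content = "".join(
        [
            str(chain_sequence),
            request_id,
            agent_id,
            task_type,
            str(intent_valid),
            requested_at.isoformat(),
        ]
    )
    return hashlib.sha256((record_content + previous_hash).encode("utf-8")).hexdigest()


# The first record chains from the genesis hash, the second from the first.
genesis = hashlib.sha256(b"scp-genesis:example").hexdigest()  # example value only
ts = datetime(2026, 4, 18, 14, 0, tzinfo=timezone.utc)
h1 = compute_record_hash(1, "req-1", "agent-a", "summarize", True, ts, genesis)
h2 = compute_record_hash(2, "req-2", "agent-a", "summarize", False, ts, h1)
```

Because each hash covers the previous one, changing any field of an earlier record changes every hash after it.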
Race condition prevention: Inserts acquire a PostgreSQL transaction-level advisory lock (pg_advisory_xact_lock) keyed to the audit table before reading the previous hash and inserting. This serializes concurrent writes without requiring SERIALIZABLE isolation on the entire connection.
Chain Verification (GET /audit/verify)¶
The verify endpoint walks the entire chain and recomputes every hash:
GET /audit/verify
Authorization: requires scp_audit role or admin API key
Response:
{
"status": "ok" | "tampered",
"record_count": 1042,
"legacy_record_count": 15, // records with LEGACY_PRE_AUDIT sentinel
"first_chained_sequence": 16,
"last_verified_at": "2026-04-18T14:00:00Z",
"tampered_at_sequence": null // populated if status == "tampered"
}
Algorithm:
1. Load genesis hash from audit_chain_genesis
2. Select all records with record_hash != 'LEGACY_PRE_AUDIT' ordered by chain_sequence ASC
3. For each record: recompute record_hash, compare with stored value
4. Return first sequence number where mismatch is detected, or "ok" if chain is intact
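The verification steps can be sketched as below. The record tuple layout and the names verify_chain and sha256_hex are hypothetical. The sketch also checks that each record's previous_hash links to the prior record, which is an assumption about how gaps and reorders are detected; records are assumed to arrive ordered by chain_sequence ascending with legacy records already filtered out.

```python
import hashlib
from typing import Iterable, Optional, Tuple

# Each record: (chain_sequence, record_content, previous_hash, record_hash)
Record = Tuple[int, str, str, str]


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_chain(genesis_hash: str, records: Iterable[Record]) -> Optional[int]:
    """Return the first chain_sequence that fails verification, or None if intact."""
    expected_previous = genesis_hash
    for sequence, content, previous_hash, record_hash in records:
        # Linkage check: a deleted or reordered record breaks this.
        if previous_hash != expected_previous:
            return sequence
        # Content check: an edited record no longer matches its stored hash.
        if sha256_hex(content + previous_hash) != record_hash:
            return sequence
        expected_previous = record_hash
    return None


# Build a three-record example chain starting at sequence 16.
genesis = sha256_hex("scp-genesis:example")  # example value only
chain = []
prev = genesis
for seq in (16, 17, 18):
    content = f"{seq}req-{seq}agent-asummarizeTrue2026-04-18T14:00:00+00:00"
    digest = sha256_hex(content + prev)
    chain.append((seq, content, prev, digest))
    prev = digest

intact = verify_chain(genesis, chain)

# Tamper with the content of record 17 without updating its stored hash.
tampered = list(chain)
tampered[1] = (17, chain[1][1].replace("agent-a", "agent-b"), chain[1][2], chain[1][3])
broken_at = verify_chain(genesis, tampered)
```

In this example intact is None and broken_at is 17; removing record 17 entirely is reported at sequence 18, because that record's previous_hash no longer matches the hash before it.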
Files¶
| File | Purpose |
|---|---|
| control-plane/database/schemas/agent_registry.sql | Table definition, trigger, RLS, roles |
| control-plane/database/migrations/001_audit_integrity.sql | Migration for existing databases |
| control-plane/api_server/routers/audit.py | GET /audit/verify endpoint |
| control-plane/shared/audit.py | Hash computation, chain write, genesis init, chain verify |
| control-plane/api_server/main.py | Calls initialize_genesis() at startup |
| control-plane/context_service/orchestrator.py | Calls write_audit_record() on every context request |
2. Agent API Key Lifecycle¶
Purpose¶
Every agent authenticates with an API key. A compromised key with no revocation path is an unacceptable risk in a regulated deployment. Keys must expire, must be revocable immediately, and the authoritative status must be enforced on every request.
Threat Model¶
| Threat | Mitigation |
|---|---|
| Compromised key used indefinitely | Configurable expiry (default: 90 days); enforced on every auth check |
| No path to invalidate a leaked key | Explicit revocation: DELETE /api/agents/{id}/api-keys/{key_id} |
| Expired key still accepted if expiry not re-checked | Auth query filters on status = 'active'; expiry also checked in-process |
| Keys orphaned when agent is revoked | Agent revocation cascades to all active keys (status = 'revoked') |
Schema Changes¶
agent_api_keys additions (Migration 002):
| Column | Type | Notes |
|---|---|---|
| status | TEXT | active \| revoked \| expired; authoritative; CHECK constraint |
| revoked_by | TEXT | Actor who performed revocation, or system:expiry-job / system:agent-revoked |
is_active is retained for backward compatibility. status is authoritative — auth queries filter on status = 'active'.
Authentication Query¶
Every context request passes through the orchestrator's _authenticate method, which queries:
SELECT k.key_hash, k.expires_at, k.agent_id, a.status as agent_status
FROM agent_api_keys k
JOIN agents a ON k.agent_id = a.agent_id
WHERE k.key_prefix = $1 AND k.status = 'active'
After the DB check, expires_at is also validated in-process as a defense-in-depth layer against any DB-level inconsistency.
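A minimal sketch of the in-process check that follows the query. The function name key_is_usable is hypothetical, and treating 'active' as the only acceptable agent status is an assumption; the comparison mirrors the expires_at < NOW() condition used by the expiry job and assumes timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def key_is_usable(
    expires_at: Optional[datetime],
    agent_status: str,
    now: Optional[datetime] = None,
) -> bool:
    """Re-check a row the auth query returned before accepting the key."""
    now = now or datetime.now(timezone.utc)
    # Assumption: only agents whose status is 'active' may authenticate.
    if agent_status != "active":
        return False
    # A key with no expires_at never expires; otherwise reject once it is past.
    if expires_at is not None and expires_at < now:
        return False
    return True


now = datetime(2026, 4, 18, 14, 0, tzinfo=timezone.utc)
still_valid = key_is_usable(now + timedelta(days=1), "active", now)
stale = key_is_usable(now - timedelta(seconds=1), "active", now)
```

The second call covers the window in which a key has passed its expiry but the hourly job has not yet flipped its status to expired.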
Key Lifecycle Events¶
| Event | Trigger | status after | revoked_by |
|---|---|---|---|
| Issue | POST /api/agents/{id}/api-keys | active | n/a |
| Revoke | DELETE /api/agents/{id}/api-keys/{key_id} | revoked | caller-supplied revoked_by param |
| Agent revoked | DELETE /api/agents/{id} | revoked | system:agent-revoked |
| Expiry | Hourly background job | expired | system:expiry-job |
The revoked_at timestamp and revoked_by columns on agent_api_keys are the audit record for key lifecycle events. Extension of the hash chain to cover key events is a v0.4 item.
Background Expiry Job¶
Started as an asyncio task at API server startup. Runs every 3600 seconds:
UPDATE agent_api_keys
SET is_active = FALSE, status = 'expired', revoked_by = 'system:expiry-job'
WHERE status = 'active'
AND expires_at IS NOT NULL
AND expires_at < NOW()
Logs count of keys expired per run.
Files¶
| File | Purpose |
|---|---|
| control-plane/database/schemas/agent_registry.sql | status, revoked_by columns; status index |
| control-plane/database/migrations/002_api_key_lifecycle.sql | Migration for existing databases |
| control-plane/shared/models/agents.py | APIKey model updated with status, revoked_by, revoked_at |
| control-plane/api_server/routers/agents.py | Key create/revoke/list; agent revoke cascade |
| control-plane/context_service/orchestrator.py | Auth query updated to status = 'active' |
| control-plane/api_server/main.py | Hourly expiry background job |
3. Secrets Management¶
Purpose¶
SCP requires a database password and a JWT signing secret. These must not be stored in source control, config files, or Docker images. The application must fail loudly at startup if secrets are missing or weak — silent degradation to an insecure default is not acceptable.
Threat Model¶
| Threat | Mitigation |
|---|---|
| Insecure default accepted silently | jwt_secret validator rejects known-weak strings and enforces minimum 32 chars |
| Empty password accepted | db_password validator rejects empty string |
| Weak DB password in production | Validator emits warning if db_password == 'postgres' |
| JWT_SECRET missing from container environment | docker-compose uses :? syntax — compose startup fails with a clear error if not set |
| Secrets in source control | .env in .gitignore; verified never committed |
Secret Requirements¶
| Secret | Minimum | How to generate |
|---|---|---|
| JWT_SECRET | 32 chars, not a known-weak value | python -c "import secrets; print(secrets.token_hex(32))" |
| DB_PASSWORD | Non-empty; 24+ chars recommended for production | python -c "import secrets; print(secrets.token_urlsafe(24))" |
| DEPLOYMENT_ID | UUID, stable per deployment | python -c "import uuid; print(uuid.uuid4())" |
Startup Validation¶
settings.py uses Pydantic field_validator decorators:
- jwt_secret_strength: rejects known-weak strings (change-me-in-production, etc.) and values under 32 characters. Startup raises ValidationError with a generation hint.
- db_password_production_check: rejects empty string; emits warnings.warn if value is 'postgres'.
Both fields have no default — pydantic-settings raises ValidationError at startup if they are absent from the environment.
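The checks these validators perform can be illustrated with plain Python functions. This is a sketch of the rules only, not the Pydantic validators in settings.py: the function names, the contents of the weak-value list, and the substring match are assumptions.

```python
import warnings

# Assumption: illustrative marker list; the real set of known-weak strings may differ.
KNOWN_WEAK_MARKERS = ("change-me-in-production",)


def check_jwt_secret(value: str) -> str:
    """Reject known-weak or short JWT secrets."""
    lowered = value.lower()
    if any(marker in lowered for marker in KNOWN_WEAK_MARKERS):
        raise ValueError(
            "JWT_SECRET is a known-weak value; generate one with "
            'python -c "import secrets; print(secrets.token_hex(32))"'
        )
    if len(value) < 32:
        raise ValueError("JWT_SECRET must be at least 32 characters")
    return value


def check_db_password(value: str) -> str:
    """Reject an empty password and warn on the stock 'postgres' value."""
    if value == "":
        raise ValueError("DB_PASSWORD must not be empty")
    if value == "postgres":
        warnings.warn("DB_PASSWORD is 'postgres'; do not use this in production")
    return value
```

Inside a Pydantic model, a ValueError raised from a field validator surfaces as a ValidationError, which is what stops startup.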
Docker Compose¶
All services use the :? expansion syntax for required secrets:
DB_PASSWORD: ${DB_PASSWORD:?DB_PASSWORD must be set in .env}
JWT_SECRET: ${JWT_SECRET:?JWT_SECRET must be set in .env}
docker compose up fails immediately with a clear message if these are not set in the shell environment or .env file.
Production Secret Injection¶
| Environment | Mechanism |
|---|---|
| Azure | Azure Key Vault via managed identity |
| AWS (Marketplace) | SSM Parameter Store (documented in docs/deployment/aws-setup.md) |
| Local / dev | .env file — explicitly excluded from git |
Local .env Update Required¶
After this change, the existing local .env value JWT_SECRET=change-me-in-production-please-use-strong-secret will be rejected at startup. Generate a new value:
python -c "import secrets; print(secrets.token_hex(32))"
Update control-plane/.env with the output before starting the server.
Files¶
| File | Purpose |
|---|---|
| control-plane/shared/config/settings.py | Validators, no-default fields |
| control-plane/.env.example | Updated with generation hints, removed stale vars |
| control-plane/docker-compose.yaml | :? required-env syntax on all services |
4. Transport Security¶
Purpose¶
All traffic between agents and SCP must be encrypted in transit. Plaintext connections to any external-facing endpoint are not acceptable in a regulated deployment.
Threat Model¶
| Threat | Mitigation |
|---|---|
| API traffic intercepted in transit | Caddy terminates TLS on port 443; all agent traffic goes HTTPS |
| Context webhook traffic intercepted | Caddy proxies context:8003 on port 8443 with TLS |
| Direct plaintext access to app ports | Firewall/security group blocks 8000–8004 from external networks in production |
| Postgres connection unencrypted | DB_SSL_MODE=require enforced via asyncpg ssl=True |
Architecture¶
Agent → HTTPS :443 → Caddy → HTTP api:8000 (internal Docker network)
Agent → HTTPS :8443 → Caddy → HTTP context:8003 (internal Docker network)
TLS termination is at Caddy. Internal service-to-service traffic stays on the scp-network Docker bridge and does not leave the host.
gRPC (port 8002) is proxied at the load balancer layer in production (AWS ALB with TLS termination, as configured in the CloudFormation stack). It is not routed through Caddy in the current setup.
Caddy Configuration¶
Dev (CADDY_DOMAIN=localhost):
- tls internal — Caddy generates a self-signed CA and cert on first run
- Run caddy trust once to install the CA in the system trust store
- Cert is stored in the caddy_data Docker volume (persists across restarts)
Production (CADDY_DOMAIN=scp.your-domain.com):
- Remove tls internal from Caddyfile — Caddy provisions a Let's Encrypt cert via ACME automatically
- Requires port 80 accessible for HTTP-01 challenge (or configure DNS-01 for internal deployments)
- Cert renewal is automatic
Database SSL¶
DB_SSL_MODE setting controls asyncpg's ssl parameter:
| DB_SSL_MODE | asyncpg ssl | Use case |
|---|---|---|
| disable | False | Local dev / docker-compose without server certs |
| require | True | Any network-exposed deployment |
For production, the PostgreSQL server must have SSL enabled (configured at the infrastructure level — RDS, or manually on EC2 with ssl = on in postgresql.conf).
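The mapping in the table can be sketched as a small helper. The function name asyncpg_ssl_argument is hypothetical, and rejecting unrecognized values instead of falling back to plaintext is an assumption about the desired behavior, not a description of connection.py.

```python
def asyncpg_ssl_argument(db_ssl_mode: str) -> bool:
    """Translate DB_SSL_MODE into the boolean passed as asyncpg's ssl argument."""
    mapping = {"disable": False, "require": True}
    normalized = db_ssl_mode.strip().lower()
    if normalized not in mapping:
        # Assumption: fail loudly on a typo so it cannot silently disable TLS.
        raise ValueError(f"Unsupported DB_SSL_MODE: {db_ssl_mode!r}")
    return mapping[normalized]


dev_ssl = asyncpg_ssl_argument("disable")
prod_ssl = asyncpg_ssl_argument("require")
```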
Redis¶
Redis TLS is not enabled in the docker-compose setup. For production:
- Use a Redis instance with TLS enabled (ElastiCache with TLS, or Redis with tls-cert-file configured)
- Update REDIS_HOST and REDIS_PORT to point to the TLS endpoint
- asyncio-redis connections will need the ssl=True parameter (context service change required)
This is documented as a production-deployment step rather than a docker-compose change.
Environment Variables¶
| Variable | Default | Production |
|---|---|---|
| DB_SSL_MODE | disable | require |
| CADDY_DOMAIN | localhost | your real hostname |
| CADDY_CONTEXT_PORT | 8443 | 8443 (or as appropriate) |
Files¶
| File | Purpose |
|---|---|
| control-plane/Caddyfile | Caddy config: TLS termination and reverse proxy |
| control-plane/docker-compose.yaml | Caddy service, caddy_data/caddy_config volumes |
| control-plane/shared/config/settings.py | db_ssl_mode setting |
| control-plane/database/connection.py | Passes ssl=True/False to asyncpg based on db_ssl_mode |
| control-plane/.env.example | DB_SSL_MODE, CADDY_DOMAIN, CADDY_CONTEXT_PORT documented |
5. Telemetry Beacon Security¶
Purpose¶
SCP phones home daily to report agent counts for usage-based billing. The beacon payload must be:
- Minimal: no PII, no agent names, no SCD content
- Fixed: the exact set of transmitted fields is defined in code and inspectable by customers
- Authenticated: the vendor can verify the payload came from a legitimate deployment
Threat Model¶
| Threat | Mitigation |
|---|---|
| Telemetry transmitting PII or sensitive data | BeaconPayload is a fixed Pydantic model — only 6 fields, all non-PII |
| Customer unable to audit what is sent | GET /telemetry/log exposes the exact payload for every report |
| Vendor receives spoofed or tampered beacons | HMAC-SHA256 signature on canonical payload JSON (X-SCP-Signature header) |
| License key transmitted in plaintext | Only SHA256(license_key) is sent — not the key itself |
Beacon Payload¶
The BeaconPayload model defines the complete set of transmitted fields. Nothing else is sent.
| Field | Type | Notes |
|---|---|---|
| deployment_id | string | UUID; no PII |
| license_key_hash | string | SHA256(license_key), not the key itself |
| agent_count | int | Total registered agents |
| active_agent_count | int | Active agents |
| reported_at | string | ISO-8601 UTC timestamp |
| scp_version | string | SCP version string |
Payload Signing¶
If TELEMETRY_SIGNING_SECRET is configured:
- Serialize BeaconPayload to canonical JSON (sort_keys=True, compact separators)
- Compute HMAC-SHA256(signing_secret, canonical_json)
- Attach as X-SCP-Signature: sha256=<hex> request header
The signing secret must match the key registered with Ohana at license activation. Without a configured secret, reports are sent unsigned (beacon is still sent; vendor accepts but cannot verify authenticity).
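The signing steps can be sketched with the standard library. The function names, the example payload values, and the UTF-8 encoding of the secret are assumptions; the real logic lives in _sign_payload() in reporter.py and may differ in detail.

```python
import hashlib
import hmac
import json


def canonical_json(payload: dict) -> bytes:
    """Serialize with sorted keys and compact separators so signing is deterministic."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_payload(payload: dict, signing_secret: str) -> str:
    """Return the X-SCP-Signature header value for a payload."""
    digest = hmac.new(
        signing_secret.encode("utf-8"), canonical_json(payload), hashlib.sha256
    ).hexdigest()
    return f"sha256={digest}"


def verify_signature(payload: dict, signing_secret: str, header_value: str) -> bool:
    """Receiver-side check, using a constant-time comparison on bytes."""
    expected = sign_payload(payload, signing_secret)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


# Example values only; field names follow the BeaconPayload table.
payload = {
    "deployment_id": "00000000-0000-0000-0000-000000000000",
    "license_key_hash": hashlib.sha256(b"example-license-key").hexdigest(),
    "agent_count": 12,
    "active_agent_count": 9,
    "reported_at": "2026-04-18T14:00:00Z",
    "scp_version": "0.1",
}
secret = "example-signing-secret"
signature = sign_payload(payload, secret)
```

Sorting keys and fixing the separators means sender and receiver hash identical bytes regardless of how each side happens to order the fields.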
Customer Inspection (GET /telemetry/log)¶
GET /telemetry/log?limit=30
Returns the last N beacon log entries — one per daily report — showing:
- The exact BeaconPayload that was (or will be) transmitted
- Whether the report was signed
- Transmission status and any error messages
Customers can verify at any time that no unexpected data leaves the deployment.
Files¶
| File | Purpose |
|---|---|
| control-plane/shared/models/telemetry.py | BeaconPayload, TelemetryLogEntry models |
| control-plane/shared/config/settings.py | telemetry_signing_secret, license_key settings |
| control-plane/telemetry_service/reporter.py | _build_beacon(), _sign_payload(), signed HTTP send |
| control-plane/telemetry_service/main.py | GET /telemetry/log endpoint |
| control-plane/.env.example | TELEMETRY_SIGNING_SECRET, LICENSE_KEY documented |
6. License Key Enforcement¶
Purpose¶
Agent count limits are only meaningful if they can't be bypassed. License keys encode the customer's entitlements and SCP enforces them — at startup and on every agent registration attempt.
Threat Model¶
| Threat | Mitigation |
|---|---|
| Customer exceeds licensed agent count | Cap enforced on POST /api/agents — returns HTTP 402 at cap |
| License key forged or tampered | RS256 signature verified against embedded Ohana public key |
| SCP started with expired license | Startup aborted with clear error message |
| Public key rotation would otherwise require redeployment | LICENSE_PUBLIC_KEY setting overrides embedded key |
License Key Format¶
License keys are RS256-signed JWTs issued by Ohana at customer activation. SCP holds the Ohana public key (embedded at build time) to verify signatures — the private key never leaves Ohana.
JWT Claims:
| Claim | Type | Description |
|---|---|---|
| sub | string | Customer identifier |
| deployment_id | string | Licensed deployment UUID |
| agent_cap | int | Max non-revoked agents (0 = unlimited) |
| tier | string | "standard" \| "regulated" |
| iss | string | Must be "ohana-scp" |
| iat | int | Issued-at (Unix timestamp) |
| exp | int | Expiry (Unix timestamp) |
Startup Validation¶
On every API server startup:
- If LICENSE_KEY is not set: log warning, continue in uncapped mode (dev only)
- If set: validate RS256 signature against LICENSE_PUBLIC_KEY or embedded key
- Verify iss == "ohana-scp" and required claims are present
- If expired: abort startup with a clear error message
- If expiring in < 30 days: log warning each startup, continue
Validated claims are stored in a process-level singleton (get_license_claims()).
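The claim checks that follow signature verification can be sketched as below. RS256 verification itself is not shown and must happen first, using a JWT library. The names check_license_claims and LicenseError are hypothetical, and treating all seven listed claims as required is an assumption.

```python
import time
from typing import Any, Mapping, Optional

# Assumption: every claim in the table above is required.
REQUIRED_CLAIMS = ("sub", "deployment_id", "agent_cap", "tier", "iss", "iat", "exp")
EXPIRY_WARNING_SECONDS = 30 * 24 * 3600


class LicenseError(Exception):
    """Raised when startup should abort because the license is unusable."""


def check_license_claims(
    claims: Mapping[str, Any], now: Optional[float] = None
) -> Optional[str]:
    """Return a warning message if the license expires soon, else None.

    Raises LicenseError for missing claims, a wrong issuer, or an expired license.
    """
    now = time.time() if now is None else now
    missing = [name for name in REQUIRED_CLAIMS if name not in claims]
    if missing:
        raise LicenseError(f"License is missing claims: {', '.join(missing)}")
    if claims["iss"] != "ohana-scp":
        raise LicenseError("License issuer must be 'ohana-scp'")
    expires_at = int(claims["exp"])
    if expires_at <= now:
        raise LicenseError("License has expired")
    if expires_at - now < EXPIRY_WARNING_SECONDS:
        return "License expires in under 30 days"
    return None


# Example claims only; a real license arrives as a signed JWT.
now = 1_800_000_000
claims = {
    "sub": "customer-123",
    "deployment_id": "00000000-0000-0000-0000-000000000000",
    "agent_cap": 25,
    "tier": "regulated",
    "iss": "ohana-scp",
    "iat": now - 1000,
    "exp": now + 10 * 24 * 3600,
}
warning = check_license_claims(claims, now=now)
```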
Agent Cap Enforcement¶
Every POST /api/agents request:
- Reads agent_cap from the cached LicenseClaims (zero overhead: no DB query for the license itself)
- If agent_cap > 0: counts current non-revoked agents
- If current_count >= agent_cap: returns HTTP 402 with a clear upgrade message
Revoked agents do not count toward the cap — agents that are decommissioned free up capacity.
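The cap decision reduces to a small predicate, sketched here with a hypothetical function name. The caller is assumed to supply the count of non-revoked agents from the database and to translate a False result into the HTTP 402 response.

```python
def registration_allowed(agent_cap: int, current_non_revoked: int) -> bool:
    """Decide whether one more agent may be registered under the license."""
    # agent_cap == 0 means unlimited, so no count is needed.
    if agent_cap == 0:
        return True
    # At or above the cap, the request is refused.
    return current_non_revoked < agent_cap


below_cap = registration_allowed(agent_cap=25, current_non_revoked=24)
at_cap = registration_allowed(agent_cap=25, current_non_revoked=25)
```

With a cap of 25, the 25th registration succeeds and the 26th is refused until an existing agent is revoked.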
Files¶
| File | Purpose |
|---|---|
| control-plane/shared/license.py | validate_license(), check_license_at_startup(), LicenseClaims, cap singleton |
| control-plane/shared/config/settings.py | license_key, license_public_key settings |
| control-plane/api_server/main.py | License validation at startup, set_license_claims() |
| control-plane/api_server/routers/agents.py | Agent cap enforcement in create_agent |
| control-plane/.env.example | LICENSE_KEY, LICENSE_PUBLIC_KEY documented |
7. Admin API Authentication¶
Purpose¶
Bundle management, agent registration, and audit admin reads must be authenticated. An unauthenticated admin API allows any network-adjacent actor to register rogue agents, publish arbitrary bundles, revoke legitimate credentials, or read the full audit log.
Threat Model¶
| Threat | Mitigation |
|---|---|
| Unauthenticated agent registration (rogue agent injection) | All POST /api/agents requests require valid X-Admin-Key |
| Unauthenticated bundle publish (malicious governance context) | All bundle registry writes require valid X-Admin-Key |
| Audit log exfiltration via stream or verify endpoints | /audit/requests, /audit/stream, /audit/verify, /audit/chain/{id} require X-Admin-Key |
| Missing or unconfigured admin key silently open | Server returns HTTP 503 if ADMIN_API_KEY is not set — no silent open mode |
| Timing attack on key comparison | secrets.compare_digest used for all comparisons |
Scope¶
Admin-gated endpoints (require X-Admin-Key):
- All bundle registry operations (/api/bundles/*)
- All agent registry operations (/api/agents/*, including API key management)
- Audit admin reads: GET /api/audit/verify, /api/audit/requests, /api/audit/stream, /api/audit/chain/{id}
Agent-gated endpoints (require X-API-Key, unchanged):
- Context requests: POST /api/context
- Output logging: POST /api/audit/output, GET /api/audit/output/{id}
Implementation¶
The require_admin_key FastAPI dependency (api_server/dependencies.py) is applied at the router level for agents.py and bundles.py, and as an explicit dependency on the four audit admin endpoints. Every request to a gated endpoint triggers the dependency before any handler logic runs.
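import secrets
from typing import Annotated
from fastapi import Header, HTTPException

async def require_admin_key(
    x_admin_key: Annotated[str, Header()] = "",
) -> None:
    settings = get_settings()  # project settings accessor
    # Fail closed: an unset ADMIN_API_KEY never means "open".
    if not settings.admin_api_key:
        raise HTTPException(503, "Admin API not configured")
    # Constant-time comparison; bytes avoid a TypeError on non-ASCII header input.
    if not secrets.compare_digest(x_admin_key.encode(), settings.admin_api_key.encode()):
        raise HTTPException(401, "Invalid admin API key")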
Key Requirements¶
| Setting | Requirement | How to generate |
|---|---|---|
| ADMIN_API_KEY | Non-empty; 24+ random bytes recommended | python -c "import secrets; print('admin_' + secrets.token_hex(24))" |
Production Secret Injection¶
| Environment | Mechanism |
|---|---|
| Azure | Azure Key Vault via managed identity |
| AWS (Marketplace) | SSM Parameter Store |
| Local / dev | .env file — excluded from git |
Files¶
| File | Purpose |
|---|---|
| control-plane/api_server/dependencies.py | require_admin_key dependency |
| control-plane/api_server/routers/agents.py | router = APIRouter(dependencies=[Depends(require_admin_key)]) |
| control-plane/api_server/routers/bundles.py | Same |
| control-plane/api_server/routers/audit.py | Explicit dependency on four admin endpoints |
| control-plane/shared/config/settings.py | admin_api_key setting |
| control-plane/.env.pilot.template | ADMIN_API_KEY documented as [MUST CHANGE] |
Appendix: Migration History¶
| Migration | Description | Date |
|---|---|---|
| 001_audit_integrity.sql | Hash chain, immutability trigger, RLS, roles | 2026-04-18 |
| 002_api_key_lifecycle.sql | API key status column (active/revoked/expired), revoked_by | 2026-04-18 |
| 003_output_logging.sql | agent_output_logs table: output-side audit trail linked to context requests | 2026-04-18 |
| 004_chain_correlation.sql | chain_id and parent_request_id on agent_context_requests: multi-agent chain tracing | 2026-04-18 |