PRD 5 of 8

Agent Governance & Trust Framework

Runtime oversight for autonomous agents — YAML identity manifests with trust levels 1-5, 3-tier tool classification, append-only audit logging, constitutional contracts with monotonically decreasing privileges, memory governance, and trust-mediated delegation with 6 sequential gates.

Governance Framework Architecture

1. Problem Statement

Autonomous AI agents can read files, write code, execute commands, access external services, and delegate to other agents. Without runtime governance, there's no enforcement boundary between what an agent can do and what it should do. "Trust but verify" doesn't work when agents operate faster than humans can review.

The governance framework provides enforcement at the speed of execution — policy evaluated in milliseconds, not minutes. It's implemented in two layers: the governance plugin (Python, SQLite audit bus, policy engine, trust broker) and 7 governance MCP tools in the memory system (constitutional contracts, compliance dashboards, guardrail proofs, data sovereignty).

2. Architecture Overview

Identity Manifests

Who is this agent? Trust 1-5, data classification, permitted tools/delegations

Policy Engine

3-tier tool classification × conductor tier matrix

Audit Bus

SQLite WAL, 19+ event types, sync/async emission

Memory Governor

Content classification, ceiling enforcement, provenance

Trust Broker

6-gate delegation mediation, delegation tokens

Constitutional Layer

7 MCP tools for contracts, monitoring, proofs, sovereignty

SessionStart → Load manifests, init audit session, register agents
PreToolUse  → Policy engine evaluates tool call against manifest + constitutional observer checks drift
PostToolUse → Audit bus logs outcome, memory governor checks content classification
Stop        → Deregister agents, flush constitutional assessments, archive session

3. Key Components

3.1 Agent Identity Manifests (YAML)

agent_id: code-reviewer
trust_level: 3              # 1 (minimal) to 5 (full)
data_classification: internal  # public | internal | confidential | restricted
permitted_tools: [Read, Glob, Grep, Bash]
permitted_delegations: ["qa-*", "builder"]  # fnmatch patterns
max_autonomy_depth: 2
max_delegation_count: 5
human_required: false

4-tier resolution:
Static YAML + parent → ceiling enforcement (intersection)
Static only → authoritative
Parent only → derived restrictive (trust-1)
Neither → default restrictive (trust=1, no tools, human_required=true)

Manifests are SHA-256 hashed for tamper evidence.
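The resolution order above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: the `Manifest` dataclass, the `resolve` and `apply_ceiling` helpers, and the canonical-JSON hashing scheme are all assumptions introduced here. The sketch also reads "trust-1" in the parent-only case as trust level 1, which the source does not spell out, and it models only a subset of the manifest fields.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional

# Ordering taken from the manifest comment: public < internal < confidential < restricted.
CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]


@dataclass(frozen=True)
class Manifest:
    """Hypothetical subset of the manifest fields; defaults are the restrictive ones."""
    agent_id: str
    trust_level: int = 1
    data_classification: str = "public"
    permitted_tools: tuple = ()
    human_required: bool = True


def manifest_hash(manifest: Manifest) -> str:
    """SHA-256 over a canonical JSON form (the canonicalization is an assumption)."""
    canonical = json.dumps(asdict(manifest), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_ceiling(child: Manifest, parent: Manifest) -> Manifest:
    """Intersect the child with its parent so no field exceeds the parent's."""
    return replace(
        child,
        trust_level=min(child.trust_level, parent.trust_level),
        data_classification=min(
            child.data_classification,
            parent.data_classification,
            key=CLASSIFICATION_ORDER.index,
        ),
        permitted_tools=tuple(
            t for t in child.permitted_tools if t in parent.permitted_tools
        ),
        human_required=child.human_required or parent.human_required,
    )


def resolve(agent_id: str,
            static: Optional[Manifest],
            parent: Optional[Manifest]) -> Manifest:
    """Walk the four tiers in order and return the effective manifest."""
    if static is not None and parent is not None:
        return apply_ceiling(static, parent)      # tier 1: intersection
    if static is not None:
        return static                             # tier 2: authoritative
    if parent is not None:
        # Tier 3, assumption: trust level 1, other fields capped by the parent.
        return replace(parent, agent_id=agent_id, trust_level=1)
    return Manifest(agent_id=agent_id)            # tier 4: default restrictive
```

Storing the output of `manifest_hash` next to each audit or memory record would let a reviewer detect a manifest that changed after the fact.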

3.2 Tool Classification Tiers

How Governance Actually Feels in Practice

Most daily work sees zero approval prompts and under 200ms of total governance overhead. Human approval is required only when an elevated tool is called at the MAJOR conductor tier.

Tier | Tools | Overhead
Exempt | Read, Glob, Grep, TaskList, TaskGet, AskUserQuestion | Zero (async log only)
Standard | Edit, Write, Task, Bash, WebFetch, WebSearch | ~50ms auto-allow + audit
Elevated | memory_store, memory_forget, NotebookEdit, all MCP tools | ~100ms or human gate at MAJOR

Conductor Tier | Exempt | Standard | Elevated
TRIVIAL | Allow | Allow | Allow
MINOR | Allow | Allow | Allow
STANDARD | Allow | Allow (audited) | Allow (audited)
MAJOR | Allow | Allow (audited) | Human Gate
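The two tables combine into a single lookup. The sketch below is one way to encode them, assuming hypothetical names (`ToolTier`, `Decision`, `classify_tool`, `evaluate`); the tool lists and matrix cells are copied from the tables, and unknown tools fall through to the elevated tier as REQ-GOV-004 requires.

```python
from enum import Enum


class ToolTier(Enum):
    EXEMPT = "exempt"
    STANDARD = "standard"
    ELEVATED = "elevated"


class Decision(Enum):
    ALLOW = "allow"
    ALLOW_AUDITED = "allow_audited"
    HUMAN_GATE = "human_gate"


EXEMPT_TOOLS = {"Read", "Glob", "Grep", "TaskList", "TaskGet", "AskUserQuestion"}
STANDARD_TOOLS = {"Edit", "Write", "Task", "Bash", "WebFetch", "WebSearch"}

# Rows are conductor tiers, columns are tool tiers, as in the matrix above.
MATRIX = {
    "TRIVIAL": {t: Decision.ALLOW for t in ToolTier},
    "MINOR": {t: Decision.ALLOW for t in ToolTier},
    "STANDARD": {
        ToolTier.EXEMPT: Decision.ALLOW,
        ToolTier.STANDARD: Decision.ALLOW_AUDITED,
        ToolTier.ELEVATED: Decision.ALLOW_AUDITED,
    },
    "MAJOR": {
        ToolTier.EXEMPT: Decision.ALLOW,
        ToolTier.STANDARD: Decision.ALLOW_AUDITED,
        ToolTier.ELEVATED: Decision.HUMAN_GATE,
    },
}


def classify_tool(tool_name: str) -> ToolTier:
    """Anything not explicitly listed (MCP tools, unknown tools) is elevated."""
    if tool_name in EXEMPT_TOOLS:
        return ToolTier.EXEMPT
    if tool_name in STANDARD_TOOLS:
        return ToolTier.STANDARD
    return ToolTier.ELEVATED


def evaluate(tool_name: str, conductor_tier: str) -> Decision:
    """Look up the decision for one tool call at a given conductor tier."""
    try:
        row = MATRIX[conductor_tier]
    except KeyError:
        raise ValueError(f"unknown conductor tier: {conductor_tier!r}") from None
    return row[classify_tool(tool_name)]
```

Keeping the matrix as data rather than branching logic means a policy change is a one-cell edit that is easy to review and audit.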

3.3 Constitutional Layer (7 MCP Tools)

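Beyond the governance plugin's policy engine, 7 MCP tools in the memory system add runtime constitutional governance: constitutional contracts, drift monitoring, guardrail proofs, data sovereignty, governance reporting, gap analysis, and compliance dashboards.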

The constitutional observer hook (PreToolUse, <2s, no LLM) checks every tool call for drift and feeds assessments to the compliance layer.

3.4 Audit Bus

SQLite WAL, 19+ event types, hybrid sync/async emission. Sync for denials/threats (with alerting), async for allows (bounded queue, 256 max, daemon worker). JSON-lines buffer fallback. 90-day retention with JSONL archival.
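A rough sketch of the hybrid emission path, using only the Python standard library. The `AuditBus` class, its table schema, the fallback file name, and the event type names in the usage note (other than CONTEXT_PRESSURE, which the source mentions) are assumptions for illustration. Retention, archival, and alerting are left out.

```python
import json
import queue
import sqlite3
import threading
import time


class AuditBus:
    """Hypothetical audit bus: sync writes for denials, queued writes for allows."""

    def __init__(self, db_path: str, max_queue: int = 256,
                 fallback_path: str = "audit_buffer.jsonl"):
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, ts REAL NOT NULL, "
            "event_type TEXT NOT NULL, payload TEXT NOT NULL)"
        )
        self._conn.commit()
        self._lock = threading.Lock()          # serializes access to the connection
        self._fallback_path = fallback_path
        self._queue = queue.Queue(maxsize=max_queue)
        threading.Thread(target=self._drain, daemon=True).start()

    def _write(self, ts: float, event_type: str, payload: dict) -> None:
        with self._lock:
            self._conn.execute(
                "INSERT INTO events (ts, event_type, payload) VALUES (?, ?, ?)",
                (ts, event_type, json.dumps(payload)),
            )
            self._conn.commit()

    def _drain(self) -> None:
        """Daemon worker: persist queued events one at a time."""
        while True:
            ts, event_type, payload = self._queue.get()
            try:
                self._write(ts, event_type, payload)
            finally:
                self._queue.task_done()

    def emit(self, event_type: str, payload: dict, sync: bool = False) -> None:
        ts = time.time()
        if sync:                               # denials and threats: write before returning
            self._write(ts, event_type, payload)
            return
        try:
            self._queue.put_nowait((ts, event_type, payload))
        except queue.Full:                     # queue saturated: append to the JSONL buffer
            record = {"ts": ts, "event_type": event_type, "payload": payload}
            with open(self._fallback_path, "a", encoding="utf-8") as fh:
                fh.write(json.dumps(record) + "\n")

    def flush(self) -> None:
        """Block until every queued event has been written."""
        self._queue.join()

    def count(self) -> int:
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

A caller would use `bus.emit("TOOL_DENIED", {...}, sync=True)` for a denial and `bus.emit("CONTEXT_PRESSURE", {...})` for a routine event, then call `bus.flush()` at session end so nothing queued is lost when the daemon thread exits.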

3.5 Trust Broker — 6 Sequential Gates

# | Gate | Check
1 | Breadth | max_delegation_count not exceeded?
2 | Depth | max_autonomy_depth > 0?
3 | Classification | Target classification ≤ source?
4 | Trust | Target trust ≤ source? (no escalation)
5 | Targets | fnmatch pattern match on permitted_delegations?
6 | Registration | Manifest registered in TTL-3600s session registry
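The gates run in order and the first failure ends the evaluation. The following sketch shows that flow under several assumptions: `AgentState`, `DelegationResult`, and `mediate` are hypothetical names, the registry is modeled as a plain dict of registration timestamps, and the 24-character hex token is generated with `secrets.token_hex(12)`, which the source does not specify.

```python
import fnmatch
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]
REGISTRY_TTL_SECONDS = 3600


@dataclass
class AgentState:
    """Hypothetical runtime view of a manifest plus a delegation counter."""
    agent_id: str
    trust_level: int
    data_classification: str
    permitted_delegations: List[str] = field(default_factory=list)
    max_autonomy_depth: int = 0
    max_delegation_count: int = 0
    delegations_made: int = 0


@dataclass
class DelegationResult:
    allowed: bool
    failed_gate: Optional[str] = None
    token: Optional[str] = None


def mediate(source: AgentState, target: AgentState,
            registry: Dict[str, float],
            now: Optional[float] = None) -> DelegationResult:
    """Run the six gates in order; stop at the first one that fails."""
    now = time.time() if now is None else now
    rank = CLASSIFICATION_ORDER.index
    registered_at = registry.get(target.agent_id)
    gates = [
        ("breadth", source.delegations_made < source.max_delegation_count),
        ("depth", source.max_autonomy_depth > 0),
        ("classification",
         rank(target.data_classification) <= rank(source.data_classification)),
        ("trust", target.trust_level <= source.trust_level),
        ("targets", any(fnmatch.fnmatchcase(target.agent_id, pattern)
                        for pattern in source.permitted_delegations)),
        ("registration", registered_at is not None
         and now - registered_at <= REGISTRY_TTL_SECONDS),
    ]
    for name, passed in gates:
        if not passed:
            return DelegationResult(allowed=False, failed_gate=name)
    source.delegations_made += 1
    # 12 random bytes render as 24 hex characters.
    return DelegationResult(allowed=True, token=secrets.token_hex(12))
```

Returning the name of the failed gate gives the audit bus a precise denial reason, and the token can be attached to every event the delegated agent emits for forensic tracing.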

3.6 Administrative Commands

/governance-audit

Query events, export JSONL

/governance-status

Health dashboard + metrics

/governance-review

Approve/deny confidential writes

4. Requirements

REQ-GOV-001 YAML agent identity manifests with SHA-256 integrity, trust_level 1-5, data_classification, permitted_tools, permitted_delegations, max_autonomy_depth, max_delegation_count, human_required.
REQ-GOV-002 4-tier manifest resolution with parent ceiling enforcement (intersection, never escalation).
REQ-GOV-003 3-tier tool risk classification (exempt/standard/elevated) × conductor tier matrix. Human gate only at MAJOR + elevated.
REQ-GOV-004 Unknown tools default to elevated tier (fail toward scrutiny).
REQ-GOV-005 Append-only audit bus: SQLite WAL, 19+ event types, sync for denials, async for allows, JSON-lines buffer fallback.
REQ-GOV-006 Constitutional contracts with monotonically decreasing privileges in delegation chains.
REQ-GOV-007 Constitutional observer (PreToolUse, <2s) checking every tool call for scope/target/destructive drift.
REQ-GOV-008 Content classification via regex: restricted (hard block), confidential (queue), internal, public. Ceiling enforcement on memory writes.
REQ-GOV-009 Provenance tagging: source agent_id + manifest_hash on every memory write.
REQ-GOV-010 6-gate delegation mediation with 24-char hex delegation tokens for forensic tracing.
REQ-GOV-011 ManifestRegistry with file locking and TTL-3600s purging.
REQ-GOV-012 Cryptographic guardrail proofs: Ed25519 signatures with Merkle batching.
REQ-GOV-013 Data sovereignty: 8 jurisdictions, GDPR cascading deletion, jurisdiction-filtered recall.
REQ-GOV-014 Multi-framework compliance: ISO 42001, EU AI Act, OWASP Agentic Top 10 with scoring dashboards.
REQ-GOV-015 3 administrative commands (/governance-audit, /governance-status, /governance-review).
REQ-GOV-016 90-day event retention with JSONL archival before purge.
REQ-GOV-017 Governance hooks wired via plugin system (SessionStart, PreToolUse, PostToolUse) for runtime enforcement.
REQ-GOV-018 Typical workflow overhead: <200ms total governance, zero approval prompts for TRIVIAL/MINOR/STANDARD tiers.
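
REQ-GOV-008 and REQ-GOV-009 describe the memory governor's write path, which can be sketched as follows. Everything specific here is an assumption: the regex patterns are placeholders rather than the framework's real rule set, `classify` and `govern_write` are hypothetical names, and the order in which the ceiling check and the confidential queue apply is a guess the source does not settle.

```python
import re
from typing import Dict

CLASSIFICATION_ORDER = ["public", "internal", "confidential", "restricted"]

# Placeholder patterns, checked from most to least sensitive.
PATTERNS = [
    ("restricted", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
    ("confidential", re.compile(r"\b(?:password|api[_-]?key|secret)\b\s*[:=]", re.I)),
    ("internal", re.compile(r"\binternal[- ]only\b", re.I)),
]


def classify(text: str) -> str:
    """Return the most sensitive label whose pattern matches, else 'public'."""
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "public"


def govern_write(text: str, agent_id: str, manifest_hash: str,
                 ceiling: str) -> Dict[str, object]:
    """Decide what happens to one memory write and attach provenance."""
    label = classify(text)
    rank = CLASSIFICATION_ORDER.index
    if label == "restricted":
        action = "block"            # hard block, regardless of the agent's ceiling
    elif rank(label) > rank(ceiling):
        action = "block"            # content exceeds the agent's data_classification
    elif label == "confidential":
        action = "queue"            # held for human review
    else:
        action = "store"
    return {
        "action": action,
        "classification": label,
        "provenance": {"agent_id": agent_id, "manifest_hash": manifest_hash},
    }
```

Queued items would be the ones surfaced by /governance-review, and the provenance block is what lets a later audit tie a stored memory back to the exact manifest version that wrote it.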

5. Prompt to Build It

Build an agent governance framework for Claude Code:

1. IDENTITY MANIFESTS (YAML): agent_id, trust_level 1-5,
   data_classification, permitted_tools/delegations, autonomy depth/count,
   SHA-256 hashing, 4-tier resolution with parent ceiling

2. POLICY ENGINE: 3-tier tool classification (exempt/standard/elevated)
   × conductor tier matrix. Human gate only at MAJOR+elevated.
   Unknown tools → elevated. Typical overhead: <200ms, zero prompts.

3. AUDIT BUS: SQLite WAL, 19+ event types, sync for denials,
   async for allows (bounded queue 256), JSON-lines buffer fallback,
   90-day retention + JSONL archival

4. CONSTITUTIONAL LAYER (7 MCP tools):
   - constitutional_contract: monotonically decreasing delegation privileges
   - constitutional_monitor: real-time drift detection
   - guardrail_proof: Ed25519 + Merkle attestation
   - data_sovereignty: jurisdiction tagging, GDPR cascading delete
   - governance_report/gap_analysis: ISO 42001 evidence
   - compliance_dashboard: ISO 42001 + EU AI Act + OWASP scoring
   Plus constitutional_observer hook (<2s PreToolUse, no LLM)

5. MEMORY GOVERNOR: regex content classification
   (restricted=block, confidential=queue), ceiling enforcement,
   provenance tagging (agent_id + manifest_hash)

6. TRUST BROKER: 6 sequential gates (breadth, depth, classification,
   trust, targets, registration), 24-char hex delegation tokens,
   ManifestRegistry with file locking + TTL purge

7. COMMANDS: /governance-audit, /governance-status, /governance-review

Build as Claude Code plugin (Python) + 7 MCP tools in memory server.

6. Design Decisions

Two Governance Layers

The governance plugin (Python, SQLite) handles runtime policy enforcement. The memory system's 7 MCP tools handle constitutional contracts, compliance, and cryptographic proofs. Neither alone is sufficient — together they cover both enforcement and evidence.

3-Tier over Binary Allow/Deny

Exempt tools add zero overhead. Standard tools add imperceptible ~50ms. Only elevated tools trigger full evaluation. This keeps governance invisible for routine work while maintaining scrutiny where it matters.

Constitutional Observer over Post-Hoc

Real-time drift detection on every tool call (<2s, no LLM) catches problems as they happen. Buffered flags flush to vector storage at session end for trend analysis. The compliance dashboard reads these for scoring.

Parent Ceiling over Additive

Child agents receive the intersection of their own permissions and their parent's. A trust-3 parent cannot create a trust-5 child. Because an intersection never exceeds either input, privileges can only stay the same or shrink at each hop, which rules out escalation through delegation chains.

7. Integration Points

→ Plugin Ecosystem

Governance operates as a plugin using SessionStart (manifest loading), PreToolUse (policy enforcement), PostToolUse (audit logging). The two-layer hook system ensures reliable execution.

→ Multi-Agent Orchestration

Conductor's tier classification feeds the policy engine's tier matrix. NHI lifecycle events emit to the audit bus. The state schema's governance block links manifests to workflows.

→ Memory System

7 governance MCP tools operate within the memory server. The memory governor controls what enters vector storage. Constitutional assessments are stored in a dedicated Qdrant collection.

→ Context Guard

Context pressure events become CONTEXT_PRESSURE audit entries. Constitutional observer flags decisions made under context pressure for governance review.