PRD 4 of 8

Persistent Vector Memory Architecture

60 MCP tools across 14 categories. 7+ Qdrant collections with tiered lifecycle. 15 n8n workflows for brain-inspired consolidation. A two-layer hook architecture with tool facades, constitutional alignment, and heuristic recording. Local Ollama embeddings. Zero cloud dependency.

60 MCP Tools · 15 n8n Workflows · 7+ Collections · 2 Hook Layers

1. Problem Statement

AI coding agents lose all context between sessions. Every new conversation starts from zero — no memory of past decisions, solved problems, learned patterns, infrastructure knowledge, or operational procedures. This makes agents structurally incapable of improvement.

But memory isn't just storage. Raw session data needs consolidation into knowledge. Knowledge needs to decay when outdated. Contradictions need detection and resolution. Agent identities need provenance tracking. Sensitive content needs classification and access control. Tool usage patterns need to reinforce successful workflows and let failed patterns fade. And all of this needs to happen automatically.

The deeper architectural challenge is that Claude Code does not reliably execute hooks that are defined only at the plugin level: PostToolUse and Stop hooks declared there do not fire, while the same hooks declared in settings.json do. The hook architecture must therefore be two-layer: settings.json for the hooks CC actually executes, and plugin hooks.json for the plugin lifecycle hooks (such as SessionStart and PreToolUse) that CC does read from the plugin.

2. Architecture Overview

Hot Path vs Cold Path

Two Complementary Data Paths

The hot path handles real-time session interaction via MCP tools. The cold path handles background maintenance via n8n. They never block each other. The hot path is fail-open — if Qdrant is down, sessions continue with flat-file memory.
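The fail-open behavior can be illustrated with a minimal Python sketch. It assumes a hypothetical `vector_search` callable standing in for the Ollama + Qdrant round trip and a flat file such as MEMORY.md; neither name is taken from the actual implementation.

```python
from pathlib import Path
from typing import Callable, List


def recall_fail_open(
    query: str,
    vector_search: Callable[[str], List[str]],  # hypothetical: embeds and searches Qdrant
    flat_file: Path,                             # e.g. MEMORY.md
) -> List[str]:
    """Try the vector path first; on any backend failure fall back to the flat file."""
    try:
        return vector_search(query)
    except Exception:
        # Fail-open: an unreachable Qdrant or Ollama must not break the session.
        if flat_file.exists():
            return flat_file.read_text(encoding="utf-8").splitlines()
        return []
```

Catching every exception is deliberate here: the hot path prefers degraded context over a blocked session.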

Hot Path (Real-Time)

Trigger: Claude calls MCP tools (60 available)

Flow: MCP Server (Node.js) → Ollama (nomic-embed-text, 768-dim) → Qdrant

Latency: <500ms p95 recall, <1s p95 store

Also: Tool Facade scripts intercept Grep/Glob and serve memory content directly — Claude doesn't need to "decide" to check memory

Cold Path (Maintenance)

Trigger: 15 scheduled n8n workflows

Flow: n8n → Qdrant API → batch operations (2 use LLM inference)

Cadence: Every 2h (extraction), 6h (compaction), daily (consolidation, TTL, tier transfer, decay), weekly (pruning, abstraction, permissions), monthly (governance)

Two-Layer Hook Architecture

Critical Implementation Detail

For PostToolUse and Stop events, Claude Code only executes hooks defined in settings.json. Plugin hooks.json works for SessionStart and PreToolUse, but not reliably for the other event types. The solution: wire critical hooks in both places.

Layer 1: settings.json (CC reads this)

  • SessionStart: load-project-memory.sh (Qdrant auto-recall)
  • UserPromptSubmit: prompt-memory-recall.sh (embed prompt → Qdrant search → inject context)
  • PreToolUse: memory-first-gate.sh (Tool Facade on Grep/Glob), constitutional_observer.py (all tools)
  • PostToolUse: tool_chain_tracker.py (all tools), world_model_observer.py (Bash only)
  • Stop: task_outcome.py (heuristics), flush_insights.py (stigmergy + constitutional assessments)
  • SessionEnd: capture-session.py (transcript archival), session-daily-note.sh (Obsidian breadcrumb)

Layer 2: plugin hooks.json

  • SessionStart: session_start.py (self-assessment + constitutional objective setup)
  • UserPromptSubmit: user_prompt_capture.py
  • PreToolUse: pre_store.py (on memory_store), constitutional_observer.py
  • PostToolUse: post_tool_failure.py, tool_chain_tracker.py, world_model_observer.py, auto_linker.py
  • PreCompact: pre_compact.py (state preservation)
  • Stop: assistant_response_capture.py, task_outcome.py, flush_insights.py
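As an illustration of the Layer 1 wiring, the sketch below builds the `hooks` section of settings.json for the PostToolUse and Stop events listed above. The `hooks_dir` path and the `python3` invocation are assumptions; the nesting (event, optional `matcher`, list of `command` hooks) follows Claude Code's documented settings format as we understand it and should be checked against the current hooks reference.

```python
import json
from typing import Dict, Optional


def add_hook(settings: Dict, event: str, command: str, matcher: Optional[str] = None) -> Dict:
    """Append a command hook under settings["hooks"][event], skipping exact duplicates."""
    entry = {"hooks": [{"type": "command", "command": command}]}
    if matcher is not None:
        entry["matcher"] = matcher
    groups = settings.setdefault("hooks", {}).setdefault(event, [])
    if entry not in groups:
        groups.append(entry)
    return settings


def layer1_settings(hooks_dir: str) -> Dict:
    """Build the PostToolUse and Stop wiring; hooks_dir is a hypothetical install path."""
    settings: Dict = {}
    add_hook(settings, "PostToolUse", f"python3 {hooks_dir}/tool_chain_tracker.py", matcher="*")
    add_hook(settings, "PostToolUse", f"python3 {hooks_dir}/world_model_observer.py", matcher="Bash")
    add_hook(settings, "Stop", f"python3 {hooks_dir}/task_outcome.py")
    add_hook(settings, "Stop", f"python3 {hooks_dir}/flush_insights.py")
    return settings


if __name__ == "__main__":
    print(json.dumps(layer1_settings("/path/to/hooks"), indent=2))
```

The entry order mirrors the list above. This sketch does not assume that listing order guarantees execution order, so if task_outcome.py must finish before flush_insights.py, chaining both in a single command is one option.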

Container Infrastructure

| Service | Image | Port | Purpose |
|---|---|---|---|
| Qdrant | qdrant/qdrant:latest | 6334 | Vector database: 7+ collections, API key auth, persistent volumes |
| Ollama | Native binary | 11434 | nomic-embed-text (768-dim), llama3.3:70b (reasoning), qwen3.5:35b (MoE) |
| n8n | n8nio/n8n:latest | 5679 | 15 workflow automations, PostgreSQL backend, Anthropic API for LLM workflows |
| PostgreSQL | postgres:16-alpine | 5436 | n8n state, structured data, health-checked with pg_isready |
| MCP Server | Node.js process | - | 60 tools via MCP protocol, Qdrant + Ollama backends |
| Supabase | Studio + services | various | Backend-as-a-Service with auto-reconnect health checks |

Memory Collections

| Collection | Purpose | Retention |
|---|---|---|
| claude_memories | Long-term persistent knowledge | Permanent (protected) |
| short_term_memory | Current session context | TTL-based decay |
| working_memory | Active task scratch space | 60-min TTL default |
| learnings | Domain knowledge patterns | Protected (never pruned) |
| procedures | Reusable step-by-step workflows | Protected (never pruned) |
| trajectories | Tool call sequences for few-shot | Decay-based |
| episodes | Full task execution records | Consolidation-eligible |
| heuristics | Task outcome metrics from task_outcome.py | Feeds self-assessment |
| pheromone_trails | Stigmergy: successful tool chains | Daily decay + evaporation |
| causal_analysis | Failure→fix patterns | Long-term |
| constitutional_assessments | Alignment drift observations | Session-scoped, flushed at Stop |
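A minimal sketch of how one of these collections might be provisioned through Qdrant's REST API (`PUT /collections/{name}`). The cosine distance is an assumption, as are the base URL and the key value; the 768-dimension size comes from nomic-embed-text. The function only builds the request so it can be inspected without a running Qdrant.

```python
import json
import urllib.request

EMBED_DIM = 768  # nomic-embed-text output size, per this document


def create_collection_request(base_url: str, name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the Qdrant request that creates a 768-dim collection."""
    body = json.dumps({"vectors": {"size": EMBED_DIM, "distance": "Cosine"}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/collections/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
```

Sending it would be `urllib.request.urlopen(req)` against whichever host port is mapped to Qdrant's REST interface.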

3. Key Components

3.1 MCP Tool Categories (60 Tools)

Core Memory CRUD (6)

memory_store (temporal classes: permanent/decaying/deadline/periodic, sensitivity levels, decay half-lives), memory_recall (semantic search), memory_forget (two-step search-then-delete), memory_scratch (ephemeral TTL workspace), memory_verify (reset decay clock), memory_boost (Noguchi self-organizing relevance)
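To make the decay half-life and memory_verify semantics concrete, here is a small sketch under the assumption that relevance is the similarity score multiplied by an exponential half-life factor. The `Memory` fields and the day-number clock are hypothetical simplifications, not the stored payload schema.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    half_life_days: float     # must be positive
    last_verified_day: float  # day number when stored or last verified


def relevance(memory: Memory, similarity: float, today: float) -> float:
    """Scale a similarity score by 0.5 ** (age / half-life)."""
    age = max(0.0, today - memory.last_verified_day)
    return similarity * 0.5 ** (age / memory.half_life_days)


def verify(memory: Memory, today: float) -> None:
    """memory_verify analogue: confirming a memory resets its decay clock."""
    memory.last_verified_day = today
```

With a 30-day half-life, a memory that scored 0.8 when fresh scores 0.4 after 30 days, and returns to 0.8 once verified.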

Lifecycle & Organization (7)

memory_promote (tier transfer), memory_consolidate (episodes→facts→principles→heuristics), memory_prune (soft-delete to cold), memory_organize (knowledge graph: link/traverse/cluster), memory_summarize, memory_impact (causal assessment), hippocampal_consolidation (5-phase brain-inspired)

Provenance & Causality (4)

memory_provenance (chain tracing), memory_trace (upstream/downstream causal edges), contradiction_check (detect + resolve), session_recalled (for Noguchi boosting at session end)

Episodic & Procedural (4)

episode (start/update/complete/search), learning (store with domain + error type), procedure (capture with trigger conditions), trajectory (tool sequences with feedback)

Governance & Compliance (7)

governance_report (ISO 42001 evidence), governance_gap_analysis, compliance_dashboard (ISO 42001 + EU AI Act + OWASP Agentic Top 10), constitutional_contract (monotonically decreasing privilege chains), constitutional_monitor (real-time drift detection), guardrail_proof (Ed25519 + Merkle attestation), data_sovereignty (jurisdiction tagging + GDPR cascading delete)
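The "monotonically decreasing privilege" rule for constitutional_contract can be sketched as a check over a delegation chain: each delegate may hold at most its delegator's permissions and data classification ceiling. The `Contract` fields below are illustrative assumptions, not the tool's actual schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Contract:
    agent: str
    permitted: FrozenSet[str]    # permitted action names
    classification_ceiling: int  # higher value = may touch more sensitive data


def chain_is_monotonic(chain: List[Contract]) -> bool:
    """True if no link in the delegation chain gains privileges over its parent."""
    for parent, child in zip(chain, chain[1:]):
        if not child.permitted <= parent.permitted:  # subset test
            return False
        if child.classification_ceiling > parent.classification_ceiling:
            return False
    return True
```

The check is non-strict: a delegate may keep exactly the same privileges, but can never exceed them.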

Agent Identity & Coordination (7)

agent_identity (PQC-ready, key rotation, C-BOM), nhi_lifecycle (spawn/escalate/terminate), parl_coordinator (advisory locks, heartbeat), a2a_protocol (JSON-LD agent cards), task_specialization (performance routing scores), bft_consensus (weighted voting), federation (cross-instance sync with Ed25519)
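A hedged sketch of the weighted voting behind bft_consensus follows. The two-thirds supermajority is the classic Byzantine fault tolerance threshold and is an assumption here, as is the vote structure; a `None` result stands for "no consensus, escalate".

```python
from typing import Dict, Optional, Tuple


def weighted_vote(
    votes: Dict[str, Tuple[str, float]],  # agent -> (choice, weight)
    quorum: float = 2 / 3,                # assumed supermajority threshold
) -> Optional[str]:
    """Return the choice holding more than `quorum` of total weight, else None."""
    totals: Dict[str, float] = {}
    total_weight = 0.0
    for choice, weight in votes.values():
        if weight <= 0:
            continue  # ignore agents with no standing
        totals[choice] = totals.get(choice, 0.0) + weight
        total_weight += weight
    if total_weight == 0:
        return None
    winner, weight = max(totals.items(), key=lambda kv: kv[1])
    return winner if weight / total_weight > quorum else None
```

Weights could come from task_specialization performance scores, though that coupling is not stated in the source.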

Agent Ecosystem (4)

agent_marketplace (publish/install/certify), agent_dev_env (isolated dev with hot-reload), meta_agent (underperformer detection), digital_twin (sandbox scenarios + promotion reports)

Debugging & Analysis (5)

causal_debug (counterfactuals), flow_debug (DAG visualization), time_travel (session replay + what-if), semantic_diff (behavioral diffs between versions), self_assess (memory-grounded task assessment)

Performance & Cost (4)

benchmark, benchmark_suite (7-dim + regression detection), cost_router (3-tier Haiku/Sonnet/Opus cascading with budget tracking), stigmergy (pheromone trail reinforcement + decay guidance)
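The cascading idea behind cost_router can be sketched as: try the cheapest tier first, escalate only when it fails, and stop when the budget would be exceeded. Tier costs, the flat per-call pricing, and the `attempt` callable are all hypothetical; tiers are assumed to be listed from cheapest to most expensive.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tier:
    name: str
    cost_per_call: float  # hypothetical flat cost units


@dataclass
class CostRouter:
    tiers: List[Tier]  # assumed sorted cheapest first
    budget: float
    spent: float = 0.0

    def route(
        self, task: str, attempt: Callable[[str, str], Optional[str]]
    ) -> Tuple[Optional[str], Optional[str]]:
        """Return (tier name, answer), or (None, None) if no tier succeeded in budget."""
        for tier in self.tiers:
            if self.spent + tier.cost_per_call > self.budget:
                break  # later tiers only cost more
            self.spent += tier.cost_per_call
            answer = attempt(tier.name, task)  # None means "escalate"
            if answer is not None:
                return tier.name, answer
        return None, None
```

A failed attempt still counts against the budget, which is why the cheapest tier goes first.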

Search, Planning & Swarm (8)

rag_search (Obsidian vault), predictive_preload, context_budget (5 compartments), temporal_planner (dependencies + critical path), workflow_author (NL→conductor), workflow_optimizer (bottleneck A/B testing), micro_swarm (BFT consensus aggregation), skill_discovery

Verification & Security (2)

formal_verify (safety/liveness/invariant checks, Ed25519 certificates), red_team (6 attack categories: goal hijacking, tool misuse, privilege escalation, memory poisoning, prompt injection, data exfiltration)

World Model & Multimodal (2)

world_model (predict outcomes, observe actuals, update service models), multimodal_input (images, audio, diagrams → structured text)

3.2 Cold Path: 15 n8n Workflows

| Workflow | Schedule | Nodes | Purpose |
|---|---|---|---|
| Session Extraction | Every 2h | 13 | LLM extracts structured memories from raw session transcripts |
| Memory Compaction | Every 6h | 12 | Cluster similar memories, summarize, archive originals |
| Predictive Patterns | Daily 2AM | 10 | Mine trajectories for recurring tool chain patterns |
| Hippocampal Consolidation | Daily 3AM | 16 | Brain-inspired hot→warm consolidation with cycle audit |
| TTL Sweep | Daily 3AM UTC | 8 | Universal GC: expire points across all collections |
| Tier Transfer | Daily 3:30AM | 14 | Promote warm→long-term + delete expired cold |
| Stigmergy Decay | Daily 4AM | 7 | Pheromone trail decay + evaporation below threshold |
| Active Pruning | Weekly Sun 5AM | 13 | Demote underused memories to cold + audit trail |
| Hierarchical Abstraction | Weekly Sun 4AM | 15 | LLM synthesis into higher-level abstractions + dedup |
| Permission Review | Weekly Mon 6AM | 6 | Audit NHI lifecycle for stale permissions |
| Monthly Review | 1st of month | 8 | 4-way governance report → Obsidian (expiring, never-accessed, sensitive, redactions) |
| Memory Gateway | Webhook | 14 | Real-time API: store/recall/rag with auth routing |
| Benchmark Regression | Scheduled | 2 | Performance regression detection |
| Compliance Report | Scheduled | 2 | Compliance evidence generation |
| Skill Discovery | Scheduled | 2 | Emergent skill pattern detection from trajectories |

2 workflows use LLM inference (Session Extraction and Hierarchical Abstraction call Claude via the Anthropic API). 3 are lightweight 2-node stubs (Benchmark Regression, Compliance Report, Skill Discovery). The Memory Gateway is the only webhook-triggered workflow; all others run on a schedule.

3.3 Key Hook Behaviors

Tool Facade (memory-first-gate.sh)

Intercepts Grep and Glob calls. Before the tool executes, the facade embeds the search query, queries Qdrant, and if memory has the answer, serves it directly as the tool result. Claude never needs to "decide" to check memory — the answer appears as if the search found it.
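The facade's decision logic can be sketched in Python (the real script is a shell script, so this is an illustration rather than a port). It assumes the PreToolUse event arrives as JSON with `tool_name` and `tool_input.pattern`, that a hypothetical `search` callable returns `(score, text)` hits, and that returning exit code 2 with a message is how the hook blocks the tool and surfaces memory content to Claude. The 0.8 score cutoff is also an assumption.

```python
from typing import Callable, List, Tuple

Hit = Tuple[float, str]  # (similarity score, memory text)


def facade_decision(
    event: dict,
    search: Callable[[str], List[Hit]],  # hypothetical: embed query, search Qdrant
    min_score: float = 0.8,              # assumed confidence cutoff
) -> Tuple[int, str]:
    """Return (exit_code, message): 0 lets the tool run, 2 serves memory instead."""
    if event.get("tool_name") not in ("Grep", "Glob"):
        return 0, ""
    query = event.get("tool_input", {}).get("pattern", "")
    if not query:
        return 0, ""
    try:
        hits = [h for h in search(query) if h[0] >= min_score]
    except Exception:
        return 0, ""  # fail-open: let the real search proceed
    if not hits:
        return 0, ""
    body = "\n".join(text for _, text in sorted(hits, reverse=True))
    return 2, f"Memory results for {query!r}:\n{body}"
```

A thin wrapper would read the event from stdin, write the message to stderr, and exit with the returned code.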

Constitutional Observer

Fires on every tool call. Checks actions against session objectives for scope drift, target drift, and destructive operations. Flags are buffered to JSONL and flushed to constitutional_assessments collection by flush_insights.py at Stop. No LLM calls — must complete in <2s.
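A rule-based sketch of the observer, with no LLM calls, might look like the following. The destructive-command patterns, the `allowed_root` notion of session scope, and the JSONL record fields are illustrative assumptions; target drift is omitted for brevity.

```python
import json
import posixpath
import re
import time
from pathlib import Path
from typing import List

# Hypothetical examples of destructive operations worth flagging.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bgit\s+push\s+--force\b"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
]


def _inside(path: str, root: str) -> bool:
    """Lexical containment check; does not resolve symlinks."""
    path, root = posixpath.normpath(path), posixpath.normpath(root)
    return path == root or path.startswith(root.rstrip("/") + "/")


def assess(event: dict, allowed_root: str) -> List[str]:
    """Return drift flags for one tool call."""
    flags = []
    tool_input = event.get("tool_input", {})
    if any(p.search(tool_input.get("command", "")) for p in DESTRUCTIVE_PATTERNS):
        flags.append("destructive_operation")
    file_path = tool_input.get("file_path", "")
    if file_path and not _inside(file_path, allowed_root):
        flags.append("scope_drift")
    return flags


def observe(event: dict, allowed_root: str, buffer_path: Path) -> List[str]:
    """Assess the call and append any flags to the JSONL buffer for the Stop flush."""
    flags = assess(event, allowed_root)
    if flags:
        record = {"ts": time.time(), "tool": event.get("tool_name"), "flags": flags}
        with buffer_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
    return flags
```

Keeping the checks to regexes and string comparisons is what makes the sub-2-second budget realistic.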

Task Outcome (heuristics)

Stop hook. Reads the tool chain buffer, classifies the task type, calculates success metrics, stores in the heuristics collection. Feeds the self-assessment system and dashboard. Must run before flush_insights.py.
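One possible shape for this step is sketched below. The buffer record format (`tool`, `ok`), the three task types, and the success-rate metric are assumptions made for illustration; the actual classifier in task_outcome.py is not described in this document.

```python
from collections import Counter
from typing import Dict, List


def classify_task(tools: List[str]) -> str:
    """Very rough task typing from which tools were used (hypothetical categories)."""
    counts = Counter(tools)
    if counts["Edit"] + counts["Write"] > 0:
        return "code_change"
    if counts["Bash"] > 0:
        return "ops"
    return "research"


def task_outcome(chain: List[Dict]) -> Dict:
    """Summarize a tool chain buffer into a record for the heuristics collection."""
    tools = [step["tool"] for step in chain]
    failures = sum(1 for step in chain if not step["ok"])
    total = len(chain)
    return {
        "task_type": classify_task(tools),
        "tool_calls": total,
        "failures": failures,
        "success_rate": (total - failures) / total if total else 0.0,
    }
```

The returned dict would be embedded and stored as a point payload; that storage step is left out here.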

Self-Assessment (session_start.py)

Extended SessionStart hook. Beyond auto-recall, it runs a self-assessment against the heuristics collection and sets constitutional objectives for the session. These objectives are what the constitutional observer checks each tool call against.

4. Requirements

REQ-MEM-001 The MCP server shall expose 60+ tools across 14 categories covering memory CRUD, lifecycle, provenance, episodic/procedural, governance, agent identity, ecosystem, debugging, performance, search, planning, swarm, verification, and world model.
REQ-MEM-002 7+ Qdrant collections shall implement tiered memory lifecycle with differentiated retention (permanent, TTL, decay, consolidation-eligible, session-scoped).
REQ-MEM-003 A two-layer hook architecture shall wire critical hooks in both settings.json (what CC reads) and plugin hooks.json (plugin lifecycle), ensuring PostToolUse and Stop hooks fire reliably.
REQ-MEM-004 Tool Facade scripts shall intercept exploratory searches (Grep, Glob) and serve memory content directly as tool results, removing the need for Claude to decide when to check memory.
REQ-MEM-005 The constitutional observer shall check every tool call against session objectives for scope drift, target drift, and destructive operations, buffering flags to JSONL with <2s latency.
REQ-MEM-006 Task outcome recording shall classify completed tasks, calculate success metrics, and store to a heuristics collection that feeds self-assessment at next session start.
REQ-MEM-007 15 n8n workflows shall automate the cold path across 6 cadences: 2-hourly (extraction), 6-hourly (compaction), daily (consolidation, TTL, transfer, decay), weekly (pruning, abstraction, permissions), monthly (governance), and webhook (gateway).
REQ-MEM-008 Brain-inspired hippocampal consolidation shall implement 5-phase processing: replay, extraction, integration, pruning, reorganization — with hot/warm/cold tiering.
REQ-MEM-009 Stigmergy (pheromone trail) coordination shall reinforce successful tool chains, apply daily decay, evaporate trails below threshold, and provide guidance for future task routing.
REQ-MEM-010 Constitutional contracts shall enforce monotonically decreasing privileges in delegation chains with behavioral rules, data classification ceilings, and permitted/prohibited action lists.
REQ-MEM-011 Data sovereignty shall support per-memory jurisdiction tagging across 8 jurisdictions with GDPR cascading deletion and jurisdiction-filtered recall.
REQ-MEM-012 Agent identities shall be PQC-ready with Ed25519 key rotation, revocation, delegation token signing/verification, and C-BOM generation.
REQ-MEM-013 BFT consensus shall enable multi-agent weighted voting with evidence hashes and critical-decision escalation.
REQ-MEM-014 Time-travel debugging shall support session recording, frozen-state replay, step modification for what-if analysis, and execution comparison.
REQ-MEM-015 Red team self-testing shall support adversarial campaigns across 6 attack categories with severity tracking and trend reporting.
REQ-MEM-016 The system shall fail-open: if Qdrant or Ollama is down, sessions continue using flat-file MEMORY.md for context.
REQ-MEM-017 Multi-framework compliance shall cover ISO 42001, EU AI Act, and OWASP Agentic Top 10 with evidence packages, gap analysis, and scoring dashboards.
REQ-MEM-018 Local Ollama embeddings (nomic-embed-text, 768-dim) shall provide all vectorization with zero cloud dependency.

5. Prompt to Build It

Build a persistent vector memory system for Claude Code:

1. MCP SERVER (Node.js, 60 tools across 14 categories):
   - Core CRUD: store (temporal classes, sensitivity, decay halflife), recall,
     forget (two-step), scratch (ephemeral TTL), verify (reset decay), boost (Noguchi)
   - Lifecycle: promote (tier transfer), consolidate (episodes→heuristics),
     prune (soft-delete to cold), organize (knowledge graph), summarize,
     impact assess, hippocampal consolidation (5-phase brain-inspired)
   - Provenance: trace causal chains, contradiction detection/resolution
   - Episodic: episodes, learnings, procedures (with triggers), trajectories
   - Governance: ISO 42001 + EU AI Act + OWASP scoring, constitutional contracts,
     guardrail proofs (Ed25519+Merkle), data sovereignty (GDPR cascading delete)
   - Agent Identity: PQC-ready, NHI lifecycle, parallel coordination, A2A protocol,
     BFT consensus, federation (Ed25519 keypairs)
   - Debugging: causal debug, flow debug, time-travel, semantic diff, self-assess
   - Performance: benchmarks, cost routing (3-tier cascade), stigmergy trails

2. QDRANT COLLECTIONS (7+ tiered):
   - claude_memories (permanent), short_term_memory (TTL), working_memory (60min),
     learnings (protected), procedures (protected), trajectories (decay), episodes
   - Plus: heuristics, pheromone_trails, causal_analysis, constitutional_assessments
   - Dedup at 0.92 cosine similarity, 1000 max per collection

3. TWO-LAYER HOOK ARCHITECTURE:
   - Layer 1 (settings.json — what CC actually reads):
     SessionStart: auto-recall script
     UserPromptSubmit: embed prompt → Qdrant search → inject context
     PreToolUse: Tool Facade (intercepts Grep/Glob, serves memory as result),
       constitutional observer (drift detection on every tool call)
     PostToolUse: tool chain tracker, world model observer
     Stop: task outcome → heuristics, flush insights → stigmergy + constitutional
     SessionEnd: transcript capture, Obsidian daily note
   - Layer 2 (plugin hooks.json): session_start (self-assessment + objectives),
     pre_store, auto_linker, pre_compact, assistant_response_capture

4. N8N WORKFLOWS (15 active):
   - Session Extraction (2h): LLM transcript → structured memories
   - Memory Compaction (6h): cluster → summarize → archive
   - Hippocampal Consolidation (daily 3AM): 5-phase hot→warm
   - Tier Transfer (daily 3:30AM): warm→LT + cold expiry
   - Stigmergy Decay (daily 4AM): pheromone evaporation
   - Active Pruning (weekly): demote to cold + audit
   - Hierarchical Abstraction (weekly): LLM synthesis + dedup
   - Permission Review (weekly): NHI audit
   - Monthly Review: governance report → Obsidian
   - Memory Gateway (webhook): store/recall/rag API

5. DOCKER COMPOSE: Qdrant, PostgreSQL, n8n (with Anthropic API key),
   Supabase (with reconnecting health checks), Ollama native on host

Build as MCP server + Claude Code plugin + n8n workflow definitions.
Wire hooks in BOTH settings.json and plugin hooks.json.
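The dedup rule in the prompt above (0.92 cosine similarity) can be illustrated with a small pure-Python sketch. In practice the comparison would run as a Qdrant similarity search before each store; the helper names below are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_duplicate(
    candidate: Sequence[float],
    existing: List[Sequence[float]],
    threshold: float = 0.92,  # dedup threshold from the prompt
) -> bool:
    """True if any stored vector is at least `threshold` similar to the candidate."""
    return any(cosine(candidate, vec) >= threshold for vec in existing)
```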

6. Design Decisions

Two-Layer Hooks over Plugin-Only

Claude Code's hook loading has a gap: PostToolUse and Stop from plugin hooks.json don't always fire. Wiring critical hooks in settings.json guarantees execution. The plugin layer handles SessionStart and PreToolUse where plugin loading works. Both layers reference the same Python scripts.

Tool Facade over Explicit Memory Checks

Requiring Claude to "decide" to check memory before searching is fragile — it often skips it under context pressure. The Tool Facade intercepts Grep/Glob searches and serves memory results directly. Claude gets the answer without needing to make the right decision.

Constitutional Observer over Post-Hoc Review

Checking alignment after the session is too late. The constitutional observer runs on every tool call in <2s, buffering drift flags. The flush at session Stop writes them to Qdrant for trend analysis. Real-time detection, batch storage.

Heuristics Collection + Self-Assessment

Task outcome recording creates a feedback loop: session N's task_outcome.py writes to heuristics → session N+1's session_start.py reads heuristics for self-assessment → better task routing and risk awareness. The system gets better at knowing what it's good at.

15 Workflows over 2

Separating concerns means each workflow runs independently at the right frequency. A slow LLM synthesis (weekly) never blocks a fast TTL sweep (daily). Each workflow can fail without affecting the others. The original 2-workflow approach (organize + forget) couldn't scale.

Stigmergy over Explicit Routing Rules

Pheromone trails encode successful tool chains through observation, not programming. Daily decay prevents stale patterns from dominating. Agents get probabilistic guidance ("87% success rate for this pattern") instead of rigid rules. The system learns what works by watching what works.
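The reinforce, decay, and evaporate cycle can be sketched as follows. The decay factor, the evaporation threshold, and the use of relative trail strength as guidance are assumptions; the success-rate figure quoted above would need per-trail outcome counts that this simplified version does not track.

```python
from typing import Dict, Sequence, Tuple

Trail = Tuple[str, ...]  # an ordered tool chain, e.g. ("Grep", "Edit")


class PheromoneTrails:
    def __init__(self, decay: float = 0.9, evaporate_below: float = 0.05):
        self.decay = decay                      # assumed daily multiplier
        self.evaporate_below = evaporate_below  # assumed evaporation threshold
        self.strength: Dict[Trail, float] = {}

    def reinforce(self, chain: Sequence[str], amount: float = 1.0) -> None:
        """Deposit pheromone on a tool chain that just succeeded."""
        key = tuple(chain)
        self.strength[key] = self.strength.get(key, 0.0) + amount

    def daily_decay(self) -> None:
        """Weaken every trail and drop those that fall below the threshold."""
        decayed = {k: v * self.decay for k, v in self.strength.items()}
        self.strength = {k: v for k, v in decayed.items() if v >= self.evaporate_below}

    def guidance(self) -> Dict[Trail, float]:
        """Relative strength of each surviving trail, summing to 1."""
        total = sum(self.strength.values())
        return {k: v / total for k, v in self.strength.items()} if total else {}
```

Trails that stop being reinforced fade on their own, which is the property that replaces hand-maintained routing rules.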

7. Integration Points

→ Plugin Ecosystem

Memory operates as a Claude Code plugin with both settings.json and plugin hooks.json wiring. 6 slash commands (/memory-search, /memory-save, /memory-resume, /forget, /memory-stats, /exit) provide user-facing interfaces. The MCP server is registered via .mcp.json.

→ Agent Governance

7 governance tools connect memory to the governance framework. Constitutional contracts and monitor tools enforce delegation chain privileges. Data sovereignty and guardrail proofs provide compliance evidence. The governance plugin's policy engine evaluates memory writes.

→ Multi-Agent Orchestration

Conductor stores trajectories, learnings, and task outcomes. Agent identity, NHI lifecycle, BFT consensus, and task specialization tools support the conductor's 29-agent workforce. The conductor state schema references governance manifests stored in memory.

→ Context Guard

Context budget management (5 compartments) bridges memory and context window management. PreCompact hooks trigger emergency state saves. Predictive preloading reduces recall latency by pre-fetching likely-needed memories based on trajectory patterns.

8. Operations

Backup & Recovery

Failure Modes

| Component Down | Impact | Fallback |
|---|---|---|
| Ollama | Cannot embed | Session continues with MEMORY.md (flat file) |
| Qdrant | Cannot store/recall | Empty results; MEMORY.md still loads |
| n8n | Maintenance stops | Vectors accumulate; manual run when restored |
| PostgreSQL | n8n state lost | MCP continues; n8n workflows pause |
| Supabase | Dashboard disconnects | Auto-reconnect health check restores connection |

SLOs

| Metric | Target |
|---|---|
| memory_recall latency | <500ms p95 |
| memory_store latency | <1s p95 |
| Dedup false positive rate | <1% |
| Constitutional observer latency | <2s per tool call |
| Weekly prune coverage | 100% of collections scanned |
| Backup freshness | <7 days |

Monitoring

The debug-memory diagnostic checks all 6 hops: Ollama process + API + model, Qdrant container + API + all collections, MCP server + hooks, n8n container + API, hook pipeline integrity, and Memory Dashboard container.
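A diagnostic of this kind can be sketched as an ordered list of named checks that never abort on the first failure. The runner below is an assumption about structure, not the debug-memory implementation; `http_ok` is a hypothetical helper, and the endpoint paths in the usage note (Ollama `/api/tags`, Qdrant `/collections`) are those services' standard listing endpoints.

```python
import urllib.request
from typing import Callable, List, Tuple


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if the URL answers with a 2xx status."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return 200 <= resp.status < 300


def run_diagnostics(hops: List[Tuple[str, Callable[[], bool]]]) -> List[Tuple[str, str]]:
    """Run every hop check in order and report ok / fail / error per hop."""
    report = []
    for name, check in hops:
        try:
            status = "ok" if check() else "fail"
        except Exception as exc:
            status = f"error: {exc}"  # keep going so later hops are still reported
        report.append((name, status))
    return report
```

Usage might look like `run_diagnostics([("ollama", lambda: http_ok("http://localhost:11434/api/tags")), ("qdrant", lambda: http_ok("http://localhost:6334/collections"))])`, with the ports taken from the infrastructure table and any required API key header added for Qdrant.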