PRD 3 of 8

Context Management & Auto-Memory

Active context window monitoring with 4-tier escalation, velocity-based prediction, a two-layer memory architecture (file-based hot + vector deep), and automatic state preservation before compression.

Context Management Architecture

1. Problem Statement

Claude Code operates within a finite context window. During long sessions — multi-file refactors, complex debugging, multi-agent workflows — the context fills. When it does, the system compresses conversation history, destroying in-progress plans, architectural decisions, and task state. This manifests as lost plans, repeated mistakes, broken workflows, and session amnesia.

The memory system exposes a context_budget MCP tool with 5 compartments (active_task, project_background, operator_preferences, safety_constraints, ambient_knowledge), but that tool manages what is in context, not when context is about to run out. The context guard monitors the window itself, predicting exhaustion and triggering graduated responses.

2. Architecture Overview

Three layers of context management:

Layer 1: CLAUDE.md Hierarchy (Static)

Level       Path                                  Scope
Global      ~/.claude/CLAUDE.md                   All projects: universal rules, preferences, standards
Project     ~/.claude/projects/{path}/CLAUDE.md   Per-project: conventions, architecture notes
Repository  {repo}/.claude/CLAUDE.md              Git-tracked: shared team configuration

Rules marked (ENFORCED) or (CRITICAL) survive compaction because they are reloaded from disk rather than from conversation history.
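The cascade and the marker rule can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual loader: the function names are hypothetical, and it assumes one rule per line with the marker appearing literally in that line.

```python
from pathlib import Path

MARKERS = ("(ENFORCED)", "(CRITICAL)")  # marker strings taken from the text above


def load_cascade(paths):
    """Read CLAUDE.md files in cascade order (global, project, repository).

    Missing files are skipped. Returns a list of (path, text) pairs.
    """
    loaded = []
    for p in map(Path, paths):
        if p.is_file():
            loaded.append((p, p.read_text(encoding="utf-8")))
    return loaded


def surviving_rules(loaded):
    """Collect marked lines, i.e. the rules to re-inject after compaction.

    Assumption: one rule per line. The real file format may differ.
    """
    rules = []
    for _path, text in loaded:
        for line in text.splitlines():
            if any(marker in line for marker in MARKERS):
                rules.append(line.strip())
    return rules
```

Because the rules are re-read from the files each time, nothing here depends on what the conversation history still contains.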

Layer 2: Auto-Memory (File-Based Hot Context)

A MEMORY.md index (~200 lines, loaded every session) points to individual topic files of four types: user, feedback, project, and reference. Claude manages these files automatically. Together they form the file-based hot layer of a two-layer design; the vector deep layer is queried on demand via MCP tools.
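As an illustration of how an index could be derived from topic files, the sketch below reads the frontmatter fields listed in the build prompt (name, description, type) and emits one index line per file. The index line format, the function names, and the tiny `key: value` parser are assumptions made for this example; a real implementation would likely use a proper YAML parser.

```python
from pathlib import Path

TYPES = {"user", "feedback", "project", "reference"}  # types named in the text
MAX_INDEX_LINES = 200  # "~200 lines" in the text; treated here as a hard cap


def parse_frontmatter(text):
    """Parse flat `key: value` frontmatter between leading '---' lines.

    A deliberately tiny parser, not a YAML implementation.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_index(topic_dir):
    """Return index lines, one per topic file that declares a known type."""
    entries = []
    for path in sorted(Path(topic_dir).glob("*.md")):
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        if meta.get("type") in TYPES:
            description = meta.get("description", "")
            entries.append(f"- [{meta['type']}] {path.name}: {description}")
    return entries[:MAX_INDEX_LINES]
```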

Layer 3: Context Guard Plugin (Active Monitoring)

v3.0.0 plugin with PostToolUse hooks (wired in settings.json) tracking token consumption. Velocity analysis predicts exhaustion. 4-tier escalation triggers graduated responses.

4-Tier Escalation

Tier  Name          Threshold  Response
L1    Warning       30 tokens  Informational
L2    Advisory      15 tokens  Save state
L3    Yellow Alert  7 tokens   Force save
L4    Critical      3 tokens   Emergency
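The tiers above map naturally onto a small classification function. The sketch below assumes the thresholds describe remaining context budget and that a tier fires when the remaining amount is at or below its threshold; both the comparison direction and the names are assumptions, since the text gives only the four values.

```python
from enum import IntEnum


class Tier(IntEnum):
    NONE = 0
    WARNING = 1       # L1
    ADVISORY = 2      # L2
    YELLOW_ALERT = 3  # L3
    CRITICAL = 4      # L4


# Threshold values copied from the tiers above. The unit is assumed to be
# "remaining context budget", in whatever unit the guard measures.
THRESHOLDS = {
    Tier.WARNING: 30,
    Tier.ADVISORY: 15,
    Tier.YELLOW_ALERT: 7,
    Tier.CRITICAL: 3,
}


def classify(remaining, thresholds=THRESHOLDS):
    """Return the most severe tier whose threshold `remaining` has reached."""
    level = Tier.NONE
    for tier, limit in thresholds.items():
        if remaining <= limit and tier > level:
            level = tier
    return level
```

Passing `thresholds` as a parameter keeps the levels configurable, as REQ-CTX-002 asks.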

3. Key Components

3.1 Velocity Prediction

A 3-sample sliding window estimates the consumption rate. If the rate exceeds 5 tokens/operation, escalation triggers early, before the static thresholds would react.
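A minimal sketch of the sliding-window idea, assuming the rate is the mean of the last three per-operation measurements and that the early trigger only fires once the window is full. The class and method names are hypothetical.

```python
from collections import deque


class VelocityTracker:
    """Sliding-window estimate of consumption per operation."""

    def __init__(self, window=3, fast_rate=5.0):
        self.samples = deque(maxlen=window)  # most recent per-operation amounts
        self.fast_rate = fast_rate           # early-escalation trigger from the text

    def record(self, consumed):
        """Record how much context one operation consumed."""
        self.samples.append(consumed)

    def velocity(self):
        """Mean consumption per operation over the window (0.0 if empty)."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def is_fast(self):
        """True once the window is full and the mean rate exceeds the trigger."""
        full = len(self.samples) == self.samples.maxlen
        return full and self.velocity() > self.fast_rate

    def operations_until_exhaustion(self, remaining):
        """Predicted operations left at the current rate, or None if the rate is zero."""
        v = self.velocity()
        return remaining / v if v > 0 else None
```

For example, after recording 2, 4, and 12 the mean is 6 per operation, so `is_fast()` is true and a remaining budget of 30 is predicted to last 5 more operations.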

3.2 Dynamic Thresholds

A rolling window of the last 10 measurements adapts alert levels to conversation patterns. Hard minimum: an absolute floor of 16 tokens.
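The text does not say how the window adapts the levels, so the following is only one possible reading. It scales each base threshold by how heavy recent operations were relative to a hypothetical nominal rate, and it applies the 16-token floor to the Warning tier alone so that the four tiers stay distinct. The scaling rule, `NOMINAL_RATE`, and the floor's scope are all assumptions.

```python
from collections import deque

BASE = {"warning": 30, "advisory": 15, "yellow_alert": 7, "critical": 3}
FLOOR = 16          # absolute floor from the text
NOMINAL_RATE = 5.0  # hypothetical "typical" consumption per operation


class DynamicThresholds:
    def __init__(self, window=10):
        self.measurements = deque(maxlen=window)  # last 10 per-operation amounts

    def record(self, consumed):
        self.measurements.append(consumed)

    def current(self):
        """Scale the base thresholds by recent consumption relative to nominal."""
        if self.measurements:
            mean = sum(self.measurements) / len(self.measurements)
            scale = mean / NOMINAL_RATE
        else:
            scale = 1.0
        scaled = {name: base * scale for name, base in BASE.items()}
        # Assumption: the floor applies to the earliest (Warning) tier only.
        scaled["warning"] = max(scaled["warning"], FLOOR)
        return scaled
```

Under this reading, a burst of heavy operations raises every threshold so alerts arrive sooner, while a quiet conversation cannot push the first alert below the floor.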

3.3 Context Budget MCP Tool

The memory system's context_budget tool manages 5 compartments with pinning, eviction, and token estimation. This bridges the gap between what's in context (budget) and when context runs out (guard).
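To make pinning, eviction, and token estimation concrete, here is a small in-memory sketch. It is not the MCP tool's interface: the class, the eviction order, and the 4-characters-per-token estimate are assumptions chosen for illustration. Only the five compartment names come from the text.

```python
from dataclasses import dataclass, field

COMPARTMENTS = (
    "active_task",
    "project_background",
    "operator_preferences",
    "safety_constraints",
    "ambient_knowledge",
)


def estimate_tokens(text):
    """Rough estimate of about 4 characters per token (a heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)


@dataclass
class Item:
    text: str
    pinned: bool = False


@dataclass
class ContextBudget:
    limit: int  # total token budget across all compartments
    items: dict = field(default_factory=lambda: {c: [] for c in COMPARTMENTS})

    def add(self, compartment, text, pinned=False):
        self.items[compartment].append(Item(text, pinned))

    def used(self):
        return sum(estimate_tokens(i.text) for c in COMPARTMENTS for i in self.items[c])

    def evict(self, order=("ambient_knowledge", "project_background", "active_task")):
        """Drop unpinned items, oldest first, in the given order until within the limit."""
        evicted = []
        for compartment in order:
            bucket = self.items[compartment]
            i = 0
            while self.used() > self.limit and i < len(bucket):
                if bucket[i].pinned:
                    i += 1  # pinned items are never evicted
                else:
                    evicted.append(bucket.pop(i).text)
        return evicted
```

In this sketch safety_constraints and operator_preferences are left out of the default eviction order, so their contents are only ever removed explicitly.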

3.4 State-Save Verification

A PreCompact hook verifies that all critical state has been persisted: plans, tasks, decisions, and workflow state (conductor-state.json). Compression is not allowed to proceed until verification passes.
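A verification step of this kind might look like the sketch below. Only conductor-state.json is named in the text; the other file names, the JSON format for every file, and the function name are hypothetical.

```python
import json
from pathlib import Path

# conductor-state.json is named in the text; the other names are hypothetical.
REQUIRED_FILES = ("plans.json", "tasks.json", "decisions.json", "conductor-state.json")


def verify_state(state_dir, required=REQUIRED_FILES):
    """Return a list of problems; an empty list means verification passed."""
    problems = []
    for name in required:
        path = Path(state_dir) / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"invalid JSON: {name}")
    return problems
```

A hook script could call `verify_state` and report a non-empty result as a failure in whatever way the hook runtime expects.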

4. Requirements

REQ-CTX-001 Token-precise context monitoring via PostToolUse hooks (wired in settings.json for reliable firing).
REQ-CTX-002 4-tier escalation at configurable thresholds: Warning (30), Advisory (15), Yellow Alert (7), Critical (3).
REQ-CTX-003 3-sample velocity prediction with >5 tokens/operation early escalation trigger.
REQ-CTX-004 Dynamic thresholds from rolling 10-measurement window with 16-token absolute floor.
REQ-CTX-005 PreCompact hook shall persist all critical state (plans, tasks, workflow) before allowing compression.
REQ-CTX-006 CLAUDE.md cascade: global → project → repository with (ENFORCED)/(CRITICAL) markers surviving compaction.
REQ-CTX-007 Auto-memory: MEMORY.md index + topic files organized by type (user, feedback, project, reference).
REQ-CTX-008 Two-layer memory: file-based hot (loaded every session) + vector deep (queried on demand via 60 MCP tools).
REQ-CTX-009 Context budget tool with 5 compartments: active_task, project_background, operator_preferences, safety_constraints, ambient_knowledge.
REQ-CTX-010 Session resumption from saved state including auto-memory, workflow checkpoints, and task progress.
REQ-CTX-011 Context health telemetry (level, velocity, predicted exhaustion) available to orchestrator for adaptive routing.
REQ-CTX-012 SessionEnd hook shall write session breadcrumb to Obsidian daily note for cross-session continuity.
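As one example from the list above, REQ-CTX-012 could be satisfied by something as small as the following sketch. It assumes daily notes live directly in the vault directory and are named YYYY-MM-DD.md, and the breadcrumb line format is invented for illustration; a real vault may use a different folder or naming pattern.

```python
from datetime import datetime
from pathlib import Path


def write_breadcrumb(vault_dir, summary, now=None):
    """Append a one-line session breadcrumb to the daily note and return its path.

    Assumptions: daily notes sit directly in `vault_dir` and are named
    YYYY-MM-DD.md; adjust to the vault's own daily-note settings.
    """
    now = now or datetime.now()
    note = Path(vault_dir) / f"{now.date().isoformat()}.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    with note.open("a", encoding="utf-8") as fh:  # append, never overwrite the note
        fh.write(f"- {now:%H:%M} session ended: {summary}\n")
    return note
```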

5. Prompt to Build It

Build a context management system for Claude Code:

1. CONTEXT GUARD PLUGIN (v3.0.0):
   - PostToolUse hook (in settings.json!) tracking token consumption
   - 4-tier escalation: Warning(30), Advisory(15), Yellow Alert(7), Critical(3)
   - 3-sample velocity prediction with early trigger at >5 tokens/op
   - Dynamic thresholds from rolling 10-measurement window
   - Hard minimum: 16 tokens absolute floor

2. AUTO-MEMORY:
   - MEMORY.md index (~200 lines) loaded every session
   - Topic files with YAML frontmatter (name, description, type)
   - Types: user, feedback, project, reference
   - Two-layer: file-based hot + vector deep (via 60 MCP tools)

3. CLAUDE.MD HIERARCHY:
   - Global → Project → Repository cascade
   - (ENFORCED)/(CRITICAL) markers survive compaction

4. CONTEXT BUDGET (MCP tool):
   - 5 compartments with pin/evict/estimate operations
   - Bridges "what's in context" with "when it runs out"

5. STATE PRESERVATION:
   - PreCompact hook saves plans, tasks, decisions, workflow state
   - SessionEnd writes Obsidian daily note breadcrumb

Build as a Claude Code plugin with hooks wired in settings.json.

6. Design Decisions

PostToolUse in settings.json

Context monitoring depends on PostToolUse firing reliably. Declaring the hook in the plugin's hooks.json alone is not sufficient; it must be wired in settings.json. This is the same two-layer insight described in the plugin PRD.
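The wiring itself is a small JSON fragment. The sketch below builds it in Python, following the hooks layout of Claude Code's settings.json as best understood here (an event name mapping to matcher entries, each holding command hooks); the schema should be checked against the current hooks documentation, and the script path is hypothetical.

```python
import json


def add_post_tool_use_hook(settings, command, matcher="*"):
    """Return a copy of `settings` with a PostToolUse command hook appended."""
    merged = json.loads(json.dumps(settings))  # cheap deep copy of JSON-compatible data
    entries = merged.setdefault("hooks", {}).setdefault("PostToolUse", [])
    entries.append({
        "matcher": matcher,  # "*" is assumed to match every tool
        "hooks": [{"type": "command", "command": command}],
    })
    return merged


# Hypothetical script path, for illustration only.
settings = add_post_tool_use_hook({}, "python3 ~/.claude/hooks/context_guard.py")
print(json.dumps(settings, indent=2))
```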

Velocity over Static Thresholds

Static thresholds react after the fact. Velocity catches rapid consumption before it hits the threshold, enabling preemptive saves during fast-paced workflows.

File-Based + Vector Two-Layer

Files survive infrastructure failures: if Qdrant is down, MEMORY.md still loads. Vector search provides deep semantic recall. The two layers are complementary, not competing.
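The failure-tolerance argument can be illustrated with a lookup that prefers the deep layer and falls back to the file. Here `vector_search` is a hypothetical callable standing in for the vector-backed MCP tools and is assumed to raise on connection failure; the substring match over MEMORY.md is a placeholder for whatever the hot layer actually does.

```python
from pathlib import Path


def recall(query, memory_md, vector_search):
    """Return (source, results), preferring the deep layer, falling back to the file.

    `vector_search` is a hypothetical callable, not a real client API.
    """
    try:
        hits = vector_search(query)
        if hits:
            return "vector", hits
    except (ConnectionError, TimeoutError):
        pass  # deep layer unavailable; the file-based layer still works
    text = Path(memory_md).read_text(encoding="utf-8")
    matches = [ln for ln in text.splitlines() if query.lower() in ln.lower()]
    return "file", matches
```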

5-Compartment Budget

The context_budget MCP tool partitions the window into managed compartments with priorities. This prevents any single domain from crowding out others.

7. Integration Points

→ Memory System

File-based auto-memory is the hot layer. Vector memory (60 MCP tools, 7+ collections) is the deep layer. Context guard triggers saves before compaction. The context_budget tool manages what fills the window.

→ Multi-Agent Orchestration

Context pressure signals feed the conductor. The 60% budget rule and max 3 specs/session constraint in the conductor's context-management skill directly depend on context guard telemetry.

→ Agent Governance

Context pressure events are emitted as CONTEXT_PRESSURE audit events on the governance bus. The constitutional observer flags context-pressured decisions for review.

→ Plugin Ecosystem

Context guard is a reference implementation of a monitoring plugin using both hook layers (settings.json for PostToolUse, plugin hooks.json for PreCompact).