Active context window monitoring with 4-tier escalation, velocity-based prediction, a two-layer memory architecture (file-based hot + vector deep), and automatic state preservation before compression.

Claude Code operates within a finite context window. During long sessions — multi-file refactors, complex debugging, multi-agent workflows — the context fills. When it does, the system compresses conversation history, destroying in-progress plans, architectural decisions, and task state. This manifests as lost plans, repeated mistakes, broken workflows, and session amnesia.
The memory system's MCP tools provide the context_budget tool with 5 compartments (active_task, project_background, operator_preferences, safety_constraints, ambient_knowledge), but that manages what's in context — not when context is about to run out. The context guard monitors the window itself, predicting exhaustion and triggering graduated responses.
Three layers of context management:
| Level | Path | Scope |
|---|---|---|
| Global | ~/.claude/CLAUDE.md | All projects — universal rules, preferences, standards |
| Project | ~/.claude/projects/{path}/CLAUDE.md | Per-project — conventions, architecture notes |
| Repository | {repo}/.claude/CLAUDE.md | Git-tracked — shared team configuration |
Rules marked (ENFORCED) or (CRITICAL) survive compaction — reloaded from disk, not conversation.
MEMORY.md index (~200 lines, loaded every session) with individual topic files (user, feedback, project, reference types). Claude manages these automatically. Two-layer: file-based hot (loaded every session) + vector deep (queried via MCP tools on demand).
v3.0.0 plugin with PostToolUse hooks (wired in settings.json) tracking token consumption. Velocity analysis predicts exhaustion. 4-tier escalation triggers graduated responses.
Warning
30 tokens. Informational.
Advisory
15 tokens. Save state.
Yellow Alert
7 tokens. Force save.
Critical
3 tokens. Emergency.
3-sample sliding window analyzes consumption rate. If >5 tokens/operation, escalation triggers earlier — before static thresholds would react.
Rolling window of last 10 measurements adapts alert levels to conversation patterns. Hard minimum: 16 tokens absolute floor.
The memory system's context_budget tool manages 5 compartments with pinning, eviction, and token estimation. This bridges the gap between what's in context (budget) and when context runs out (guard).
PreCompact hook verifies all critical state persisted: plans, tasks, decisions, workflow state (conductor-state.json). Won't allow compression until verification passes.
Build a context management system for Claude Code:
1. CONTEXT GUARD PLUGIN (v3.0.0):
- PostToolUse hook (in settings.json!) tracking token consumption
- 4-tier escalation: Warning(30), Advisory(15), Yellow Alert(7), Critical(3)
- 3-sample velocity prediction with early trigger at >5 tokens/op
- Dynamic thresholds from rolling 10-measurement window
- Hard minimum: 16 tokens absolute floor
2. AUTO-MEMORY:
- MEMORY.md index (~200 lines) loaded every session
- Topic files with YAML frontmatter (name, description, type)
- Types: user, feedback, project, reference
- Two-layer: file-based hot + vector deep (via 60 MCP tools)
3. CLAUDE.MD HIERARCHY:
- Global → Project → Repository cascade
- (ENFORCED)/(CRITICAL) markers survive compaction
4. CONTEXT BUDGET (MCP tool):
- 5 compartments with pin/evict/estimate operations
- Bridges "what's in context" with "when it runs out"
5. STATE PRESERVATION:
- PreCompact hook saves plans, tasks, decisions, workflow state
- SessionEnd writes Obsidian daily note breadcrumb
Build as a Claude Code plugin with hooks wired in settings.json.Context monitoring depends on PostToolUse firing reliably. Plugin hooks.json alone won't do it — settings.json is required. This is the same two-layer insight from the plugin PRD.
Static thresholds react after the fact. Velocity catches rapid consumption before it hits the threshold, enabling preemptive saves during fast-paced workflows.
Files survive infrastructure failures. If Qdrant is down, MEMORY.md still loads. Vector provides deep semantic search. Complementary, not competing.
The context_budget MCP tool partitions the window into managed compartments with priorities. This prevents any single domain from crowding out others.
File-based auto-memory is the hot layer. Vector memory (60 MCP tools, 7+ collections) is the deep layer. Context guard triggers saves before compaction. The context_budget tool manages what fills the window.
Context pressure signals feed the conductor. The 60% budget rule and max 3 specs/session constraint in the conductor's context-management skill directly depend on context guard telemetry.
Context pressure events are CONTEXT_PRESSURE audit events in the governance bus. Constitutional observer flags context-pressured decisions for review.
Context guard is a reference implementation of a monitoring plugin using both hook layers (settings.json for PostToolUse, plugin hooks.json for PreCompact).