Claude Code's 7-Layer Memory System: A Complete Re-architecture of 'Context Management'
200K tokens of context not enough? Claude Code solves it with a 7-layer 'memory waterfall'—from dumping tool results to disk all the way to background 'dream'-style consolidation. Each layer operates on a 'lowest cost first' principle, intercepting work before it reaches a more expensive layer. What's truly valuable isn't the number of features, but that almost every decision revolves around preserving the prompt cache.
The context window is only so big: a single grep can take 100KB, a cat can take 50KB. Read a few files, run a few commands, and that 200K token budget is gone in an instant.
Claude Code's solution isn't to just increase the window—it built a 7-layer memory system, ranging from sub-millisecond lightweight cleanup to background 'dream'-like long-term integration. Each layer tries its best to prevent the next layer from triggering. The cheap layers block first, the expensive ones serve as a safety net.
**Tool Result Storage (Layer 1)**
Outputs exceeding a threshold are dumped straight to disk; only a 2KB preview plus a label remains in context. The key is ContentReplacementState, which 'freezes' the record of which results were replaced, keeping the prompt prefix byte-for-byte identical across API calls so the cache can still hit.
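A minimal sketch of the idea, assuming a size threshold and an in-memory dict standing in for the on-disk store (the names `store_tool_result`, `DUMP_THRESHOLD`, and the label format are illustrative, not Claude Code's actual API):

```python
# Hypothetical sketch of Layer 1: large tool outputs are spilled out of
# context, leaving only a short preview plus a stable label behind.
import hashlib

PREVIEW_BYTES = 2048          # ~2KB preview kept in context (per the text)
DUMP_THRESHOLD = 10 * 1024    # assumed spill threshold

def store_tool_result(output: str, store: dict) -> str:
    data = output.encode("utf-8")
    if len(data) <= DUMP_THRESHOLD:
        return output  # small results stay inline
    label = "tool-result-" + hashlib.sha256(data).hexdigest()[:12]
    store[label] = output  # stands in for a file written to disk
    preview = data[:PREVIEW_BYTES].decode("utf-8", errors="ignore")
    # The replacement is deterministic: the same output always yields the
    # same label and preview, so the prompt prefix stays byte-identical
    # across API calls and the server-side cache can still hit.
    return f"[{label}]\n{preview}\n…(truncated, {len(data)} bytes total)"
```

The deterministic replacement is the point: if the substituted text varied between calls, the frozen prefix would change and the cache would miss.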
**Micro-compression (Layer 2)**
Old tool results are cleaned up before each round of calls. Three mechanisms: time-triggered (inactivity >60 minutes), cache_edits (server-side deletions and modifications that don't invalidate the local cache), and the native context_management API. It only targets results from tools like FileRead, Grep, and Bash; thinking blocks and user messages remain untouched.
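The time-triggered path can be sketched like this; the message shape, field names, and tool allow-list are assumptions based on the description above, not the real data model:

```python
# Illustrative sketch of Layer 2's time-triggered cleanup: only stale
# results from "cheap" tools are collapsed; everything else passes through.
COMPRESSIBLE_TOOLS = {"FileRead", "Grep", "Bash"}
INACTIVITY_SECS = 60 * 60  # >60 minutes of inactivity triggers cleanup

def micro_compress(messages: list[dict], now: float) -> list[dict]:
    out = []
    for m in messages:
        is_stale_tool = (
            m.get("role") == "tool"
            and m.get("tool") in COMPRESSIBLE_TOOLS
            and now - m.get("ts", now) > INACTIVITY_SECS
        )
        if is_stale_tool:
            out.append({**m, "content": "[cleared by micro-compression]"})
        else:
            out.append(m)  # thinking blocks and user messages are untouched
    return out
```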
**Session Memory (Layer 3)**
Continuously maintains structured Markdown notes under ~/.claude/projects/.../session-memory/.md. When a threshold is crossed, it forks a sub-agent to extract a summary; when compaction is actually needed, the ready-made notes are injected directly—saving one summarizer API call.
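The pre-computation is the trick: the expensive summarization happens ahead of time, so compaction itself is cheap. A sketch under assumed names (the threshold value and the `summarize` callback stand in for the forked sub-agent):

```python
# Illustrative sketch of Layer 3: notes are refreshed in the background once
# a threshold is crossed, so compaction can inject them without a fresh call.
NOTE_THRESHOLD = 100_000  # assumed token threshold, not the real value

class SessionMemory:
    def __init__(self, summarize):
        self.summarize = summarize  # stands in for the forked sub-agent
        self.notes = ""

    def maybe_refresh(self, context_tokens: int, transcript: str) -> None:
        if context_tokens > NOTE_THRESHOLD:
            self.notes = self.summarize(transcript)

    def notes_for_compaction(self) -> str:
        # Ready-made notes are injected directly; no summarizer call here.
        return self.notes
```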
**Full Compression (Layer 4)**
This is the last line of defense. It forks a dedicated summarizer that produces a 9-section structured summary, drafting in a scratch area that is stripped from the final output to improve quality. After compression, key context—recent files, skills, plans—is re-injected, and a CompactBoundaryMessage marker is inserted as a breakpoint. A circuit breaker trips after 3 consecutive failures, preventing infinite retry loops from wasting money.
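The circuit breaker is the simplest piece to illustrate. A minimal sketch (class and method names are hypothetical; only the "3 consecutive failures" rule comes from the text):

```python
# Minimal circuit-breaker sketch: after three consecutive summarizer
# failures, full compression stops retrying; one success resets the count.
class CompactionBreaker:
    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def allow(self) -> bool:
        return self.failures < self.max_failures

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1
```

Counting only *consecutive* failures matters: a transient error shouldn't permanently disable the last line of defense, so any success re-arms the breaker.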
**Automatic Memory Extraction (Layer 5)**
Runs after each full query cycle, extracting cross-session persistent knowledge into the project's memory/ directory. Four memory types: feedback, workflows, environment preferences, tech stack. A mutex mechanism ensures memories already written by the main agent aren't duplicated by the background process. MEMORY.md serves as an index, hard-capped at 200 lines / 25KB.
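A hard cap like 200 lines / 25KB is easy to enforce mechanically. A sketch, assuming whole lines are dropped from the end when either limit is exceeded (the trimming policy is my assumption; the limits come from the text):

```python
# Sketch of the MEMORY.md hard cap: 200 lines / 25KB (limits from the text).
MAX_LINES = 200
MAX_BYTES = 25 * 1024

def enforce_memory_cap(text: str) -> str:
    lines = text.splitlines()[:MAX_LINES]           # drop lines past the cap
    capped = "\n".join(lines)
    while len(capped.encode("utf-8")) > MAX_BYTES:  # then trim whole lines
        lines.pop()
        capped = "\n".join(lines)
    return capped
```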
**Dream System (Layer 6)**
Background cross-session integration, analogous to human memory consolidation during sleep. Four phases: Orient → Gather → Consolidate → Prune. Read-only tool restrictions (ls, grep, cat), file locks prevent concurrency. Users can monitor and terminate via the UI.
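The shape of that loop can be sketched as follows; the phase names and read-only tool list come from the text, while the function names and the callback are illustrative:

```python
# Hedged sketch of the dream loop: four phases run in order, and a tool
# allow-list enforces read-only access throughout.
READ_ONLY_TOOLS = {"ls", "grep", "cat"}
PHASES = ["Orient", "Gather", "Consolidate", "Prune"]

def check_tool(tool: str) -> bool:
    # Any tool outside the allow-list is rejected during a dream run.
    return tool in READ_ONLY_TOOLS

def run_dream(run_phase) -> list[str]:
    completed = []
    for phase in PHASES:
        run_phase(phase)  # each phase may only use read-only tools
        completed.append(phase)
    return completed
```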
**Cross-Agent Communication (Layer 7)**
All background operations run on forked agents: each fork clones mutable state while sharing the prompt-cache prefix. The Agent tool's multi-mode spawning and SendMessage handle memory, compression, approval, and communication.
**Underlying Design Philosophy**
1. **Layered Defense + Lowest Cost First**: Each layer strives to prevent the next from triggering, forming an efficient waterfall.
2. **Extreme Prompt Cache Optimization**: ContentReplacementState freezing, renderedSystemPrompt passthrough, cache_edits, fork prefix byte consistency—nearly every decision revolves around not breaking the cache.
3. **Balance of Isolation and Sharing**: Forking sub-agents prevents state pollution while maximizing cache reuse.
4. **Fault Tolerance & Control**: Circuit breakers, mutex protection, GrowthBook feature flags allow remote disabling of any module, graceful degradation.
5. **Observability**: From UI progress pills to detailed logs, full transparency for background tasks.


For most people, talking about context management means 'how big is the window' or 'how good is the compression algorithm.' Claude Code takes a different path: it focuses not on compression itself but on making compression happen less often, or not at all. Tool results go straight to disk and stay out of context whenever possible; anything micro-compression can clean up never waits for session memory; anything session memory can handle never reaches the full summarizer.
Put simply, the core of this system isn't 'memory' at all—it's **cache protection**. All the fancy mechanisms—ContentReplacementState freezing, cache_edits, fork prefix byte consistency—point to the same goal: not letting that ~1-hour-TTL server-side prompt cache expire. On a miss, re-processing a 200K-token prefix costs roughly 200 times more than serving it from cache. That's the real cost driver.
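Taking the article's ~200x figure at face value, the arithmetic is straightforward. A back-of-envelope sketch (the dollar rate below is a made-up placeholder, not a real API price; only the ratio and the 200K-token prefix come from the text):

```python
# Back-of-envelope cost of a cache miss vs. hit on a 200K-token prefix,
# using the article's ~200x miss/hit ratio. Prices are placeholders.
MISS_COST_PER_MTOK = 3.00                       # hypothetical $/1M uncached tokens
HIT_COST_PER_MTOK = MISS_COST_PER_MTOK / 200    # per the article's ~200x claim

def prefix_cost(tokens: int, cache_hit: bool) -> float:
    rate = HIT_COST_PER_MTOK if cache_hit else MISS_COST_PER_MTOK
    return tokens / 1_000_000 * rate

ratio = prefix_cost(200_000, cache_hit=False) / prefix_cost(200_000, cache_hit=True)
```

At any per-token price, the ratio is what dominates: every layer that keeps the prefix byte-stable is avoiding the expensive branch of this function.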
For an AI coding tool, features are certainly important. But what Claude Code demonstrates is this: true engineering strength isn't about how much you can do, but about knowing where to save.
Published: 2026-04-01 13:35