Wink Pings

Claude Code's KV Cache Invalidation Issue: A Hidden Performance Trap

Users found that Claude Code forces full prompt processing on every request because the KV cache is never reused. The cause is a billing header embedded in the system message, and a one-line setting fixes it.

Users recently ran into a subtle performance problem with Claude Code: every request triggered full prompt processing, which means the KV cache from the previous request was never reused. The logs showed why: Claude Code adds a billing header to the system message of every request.

## Problem Discovery

Users found the following system message content in the logs:

```
text:"x-anthropic-billing-header: cc_version=2.1.39.c39; cc_entrypoint=cli; cch=56445;",
type:"text"
```

The values in this header change with every request, and the template renders it as text in the system prompt, causing the entire prompt to be reprocessed each time.
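To make the effect concrete, here is a minimal sketch that compares two system prompts and measures how much of their leading text is identical. It assumes, purely for illustration, that the header sits at the very start of the system prompt and that only the `cch` value differs; the second `cch` value and the instruction text are made up, and characters stand in for tokens.

```python
from itertools import takewhile


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that two prompts have in common."""
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))


# Header format copied from the log excerpt; everything else is hypothetical.
HEADER = "x-anthropic-billing-header: cc_version=2.1.39.c39; cc_entrypoint=cli; cch={};\n"
INSTRUCTIONS = "You are a coding assistant. " * 50  # stand-in for a long system prompt

first = HEADER.format(56445) + INSTRUCTIONS
second = HEADER.format(90210) + INSTRUCTIONS  # made-up value for the next request

reusable = shared_prefix_len(first, second)
print(f"{reusable} of {len(first)} characters form a reusable prefix")
```

Although the two prompts are almost entirely identical, the shared prefix ends inside the header, so everything after it would have to be processed again.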

## Solution

A related GitHub issue offers a simple fix: set an environment variable in Claude Code's `settings.json` file:

```json
{
  "env": {
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
  }
}
```

Concretely, add the `CLAUDE_CODE_ATTRIBUTION_HEADER` entry to the `"env"` section of `~/.claude/settings.json`.
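If the file already contains other settings, the entry has to be merged in rather than pasted over them. The following is a small sketch of one way to do that in Python; the helper name `disable_attribution_header` is hypothetical, and it assumes the file is plain JSON with an optional `"env"` object.

```python
import json
from pathlib import Path


def disable_attribution_header(settings_path: Path) -> dict:
    """Set CLAUDE_CODE_ATTRIBUTION_HEADER to "0", keeping all other settings."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8")
        if text.strip():
            settings = json.loads(text)

    # Create the "env" section if it is missing, then add the variable.
    settings.setdefault("env", {})["CLAUDE_CODE_ATTRIBUTION_HEADER"] = "0"

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (edits the real settings file, so it is left commented out):
# disable_attribution_header(Path.home() / ".claude" / "settings.json")
```

Back up the existing file before running something like this, since the rewrite normalizes the file's formatting.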

## Effect Verification

After applying this modification, the billing header information is no longer added to the system prompt, and the KV cache becomes effective again. Multiple users have confirmed that this solution works.

Some developers have pointed out that the latest version has more bugs on Windows and recommend using version 2.1.7 for a more stable experience.

## Technical Background

KV (key-value) caching is an important mechanism for speeding up inference in large language models. Cached keys and values can only be reused for the part of the prompt that is identical to an earlier request, counted from the first token. Once the content changes, the model must recompute everything from that point onward, and a change near the start of the prompt means recomputing the entire sequence, which significantly increases computational overhead. Keeping the prompt prefix stable lets the cache do its job and improves response speed.
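The cumulative cost can be illustrated with a toy model. This is only a sketch of prefix reuse, not how any particular inference server implements its cache: `ToyPrefixCache` is a hypothetical class that remembers the previous request and counts how many leading tokens match, and the token lists are invented stand-ins for a long system prompt and a short conversation.

```python
class ToyPrefixCache:
    """Remembers the previous token sequence and reuses its matching prefix."""

    def __init__(self) -> None:
        self.cached: list[str] = []

    def process(self, tokens: list[str]) -> tuple[int, int]:
        """Return (reused, computed) token counts for one request."""
        reused = 0
        for old, new in zip(self.cached, tokens):
            if old != new:
                break
            reused += 1
        self.cached = list(tokens)
        return reused, len(tokens) - reused


def total_computed(requests: list[list[str]]) -> int:
    """Sum the tokens that had to be computed across a series of requests."""
    cache = ToyPrefixCache()
    return sum(cache.process(tokens)[1] for tokens in requests)


system = ["sys"] * 1000  # stand-in for a long, fixed system prompt
stable, changing, history = [], [], []
for i in range(5):
    history = history + ["user", f"msg{i}"]  # the conversation grows each turn
    stable.append(system + history)
    changing.append([f"hdr{i}"] + system + history)  # first token differs every time

print("stable prompt:  ", total_computed(stable), "tokens computed")
print("changing header:", total_computed(changing), "tokens computed")
```

In this toy setup the stable prompt pays for the system prompt once and then only for new turns, while the changing first token forces every request to be computed from scratch.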

Although this issue has a simple solution, if left unnoticed, it can silently consume significant computational resources. For users who frequently use Claude Code for code development, the performance improvement from this optimization is quite substantial.

Published: 2026-02-14 09:33