Wink Pings

Anthropic Reveals How It Keeps Claude Working Autonomously for 6 Hours: Lay the Foundation First, Then Run the Loop

Anthropic has described how it tackles the core problem of long-running agents: sustaining work across context windows. It found that failures stem from two patterns: trying to finish everything in one go, and declaring victory too early. Its fix is a two-stage design with an initialization agent and a coding agent, paired with `feature_list.json`, incremental progress tracking, and Puppeteer-based testing. The community has since distilled this into an actionable engineering framework with 7 files and 5 steps. Build the harness first, then run the loop; otherwise the loop is running on thin air.

Anthropic has publicly shared how it gets Claude to work continuously for 6 hours without human intervention. The approach relies not on a longer context window but on a deliberately engineered harness and loop.

First, the key points, as summarized in ArchiveExplorer's tweet:

- 0:00 - Two failure patterns that kill all long agent runs

- 0:09 - The feature_list.json format + the "no deleting tests" rule

- 0:16 - Incremental progress + why they use Puppeteer MCP for testing

- 0:22 - The fixed session protocol every new agent launch follows

- 0:28 - Verbatim: a transcript of a typical session

- 0:32 - Failure patterns → solutions cheat sheet

## The Real Problem With Long-Running Agents

Anthropic states it plainly in its official blog post *Effective harnesses for long-running agents*: even with context compaction, Opus 4.5 still falls short when it is given only a high-level task such as "clone claude.ai" and left to run across multiple context windows.

Failures fall into two patterns:

1. **Trying to finish everything in one go** — the agent crams too much work into a single context window, runs out of space halfway through, and leaves behind half-finished work and chaos. The next session can only guess what happened previously, and wastes massive amounts of time restoring basic functionality.

2. **Declaring victory too early** — when the agent joins the project in a later session, it sees some existing functionality and incorrectly assumes the project is done, stopping work entirely.

At their core, both patterns stem from the same problem: the agent doesn't pass clear state and progress between sessions.

## Two-Stage Solution: Initialization Agent + Coding Agent

Anthropic's approach splits the work between two distinct roles:

- **Initialization agent**: Only runs once, for the first execution. It's responsible for setting up the environment — writing an `init.sh` script, a `claude-progress.txt` progress file, a `feature_list.json` (containing over 200 features, all marked as "failing"), and making the first git commit.

- **Coding agent**: Runs the same prompt in every subsequent session. It's instructed to only make incremental progress: read the progress file and git log, pick the highest-priority unimplemented feature, build it, update `feature_list.json` once tests pass, and commit the change.
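The initialization step can be pictured as a small scaffolding routine. The following is a minimal Python sketch, not Anthropic's implementation: the function name `scaffold_project`, the placeholder contents of `init.sh`, and the exact fields of each feature entry are assumptions made for illustration, and the first git commit is left out so the example runs without git.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def scaffold_project(root: Path, feature_descriptions: List[str]) -> None:
    """Write the three files the initialization agent leaves for later sessions."""
    root.mkdir(parents=True, exist_ok=True)

    # init.sh: how a later session starts the dev server (placeholder body).
    init_script = root / "init.sh"
    init_script.write_text("#!/usr/bin/env bash\n# start the development server here\n")
    init_script.chmod(0o755)

    # claude-progress.txt: free-form notes handed from one session to the next.
    (root / "claude-progress.txt").write_text("Session 0: environment initialized.\n")

    # feature_list.json: every feature starts out failing.
    features = [
        {"description": text, "steps": [], "passes": False}
        for text in feature_descriptions
    ]
    (root / "feature_list.json").write_text(json.dumps(features, indent=2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scaffold_project(Path(tmp), ["User can send a message"])
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Every later session then starts from files on disk rather than from whatever the previous context window happened to remember.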

Key design detail: `feature_list.json` is stored as JSON, a format models are far less likely to corrupt by accident, and each feature entry includes a description, a list of steps, and a `passes` status field. The agent is explicitly told that deleting or editing existing tests is unacceptable, because doing so can lead to missing or buggy functionality.
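One way to picture the "only flip `passes`" discipline is a check that compares the feature list before and after a session. The sketch below is illustrative only: the helper names and the sample entries are hypothetical, and the blog post enforces the rule through prompt instructions rather than through code like this.

```python
import copy
from typing import Dict, List

Feature = Dict[str, object]


def without_status(feature: Feature) -> Feature:
    """Everything in an entry except its `passes` flag."""
    return {key: value for key, value in feature.items() if key != "passes"}


def mark_passing(features: List[Feature], index: int) -> List[Feature]:
    """Return a copy of the list with one feature's `passes` flag set to True."""
    updated = copy.deepcopy(features)
    updated[index]["passes"] = True
    return updated


def only_status_changed(before: List[Feature], after: List[Feature]) -> bool:
    """True if no entry was added, removed, or edited apart from `passes`."""
    if len(before) != len(after):
        return False
    return all(without_status(b) == without_status(a) for b, a in zip(before, after))


if __name__ == "__main__":
    features = [
        {"description": "User can send a message",
         "steps": ["open the app", "type a message", "press send"],
         "passes": False},
        {"description": "User can start a new chat",
         "steps": ["click the new chat button"],
         "passes": False},
    ]
    updated = mark_passing(features, 0)
    print(only_status_changed(features, updated))      # a legitimate update
    print(only_status_changed(features, updated[1:]))  # an entry was deleted
```

A check of this kind could run before each commit, so that a session which dropped or rewrote an entry is caught immediately.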

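![Code editor screenshot showing the contents of .claude/rules/core-rules.md, with rules covering investigation and accuracy, scope discipline, and verification and safety](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHL5U4TbWgAArdOa%3Fformat%3Djpg%26name%3Dlarge)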

## Incremental Progress + Automated Testing

At the start of every session, the agent follows a fixed protocol:

1. Run `pwd` to confirm the working directory

2. Read the git log and progress file

3. Read `feature_list.json` and select the highest-priority unfinished feature

4. Run `init.sh` to start the development server

5. Run end-to-end tests with Puppeteer MCP to confirm core functionality works correctly

6. Implement the new feature, self-test, update progress, and commit to git
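The startup part of this protocol can be sketched as an ordered list of commands plus a feature picker. This is a hedged Python illustration, not the actual harness: the specific shell commands, the `run` callback, and the assumption that list order encodes priority are all hypothetical, and the Puppeteer end-to-end check (step 5) is omitted because it needs a live browser tool.

```python
import json
from typing import Callable, Dict, List, Optional

# Steps 1 to 4 of the protocol, in order (the exact commands are assumptions).
STARTUP_COMMANDS = [
    "pwd",
    "git log --oneline -20",
    "cat claude-progress.txt",
    "cat feature_list.json",
    "./init.sh",
]


def next_feature(features: List[Dict]) -> Optional[Dict]:
    """First entry that is still failing; list order is treated as priority."""
    for feature in features:
        if not feature["passes"]:
            return feature
    return None


def start_session(run: Callable[[str], str]) -> Optional[Dict]:
    """Run the startup commands in order, then pick the feature to work on."""
    outputs = {command: run(command) for command in STARTUP_COMMANDS}
    features = json.loads(outputs["cat feature_list.json"])
    return next_feature(features)


if __name__ == "__main__":
    fake_list = json.dumps([
        {"description": "User can send a message", "passes": True},
        {"description": "User can start a new chat", "passes": False},
    ])

    def fake_run(command: str) -> str:
        # Stand-in for a real shell call, so the sketch runs anywhere.
        return fake_list if command == "cat feature_list.json" else ""

    print(start_session(fake_run)["description"])
```

Passing the command runner in as a parameter keeps the sketch testable; a real session would execute these steps through the agent's own shell tool.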

Anthropic found that after explicitly requiring the agent to run tests with a browser automation tool (operating just like a human user would), functional completeness improved dramatically. There are still limitations, however: Claude can't see native browser alert popups, so functionality relying on these alerts is prone to bugs.

## Community Breakdown: 7 Files + 5 Steps

Anthropic's blog focuses heavily on concepts, while a community article *Loop and Harness engineering: 7 files, 5 steps* lays out a production-ready engineering framework. Its core thesis: **Most people struggle with their loops, but the real problem is they never built a solid harness.**

### What are harness and loop?

- **Harness** is the `.claude/` folder. It defines rules, permissions, tools, subagents, skills, and memory. It doesn't change between sessions.

- **Loop** is the process that runs inside the harness: goal → action → verification → write memory → continue or stop.

Without a harness, the loop has to guess every time. Guessing leads to invented files, invented commands, and passing meaningless tests.
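The loop itself is small enough to write down. Below is a minimal Python sketch of goal → action → verification → write memory → continue or stop; the `act` and `verify` callables and the in-memory list standing in for a memory file are assumptions made for illustration.

```python
from typing import Callable, List


def run_loop(
    goal: str,
    act: Callable[[str, List[str]], str],
    verify: Callable[[str, str], bool],
    memory: List[str],
    max_iterations: int = 5,
) -> bool:
    """Repeat act -> verify -> write memory until verification passes."""
    for iteration in range(1, max_iterations + 1):
        result = act(goal, memory)       # action
        passed = verify(goal, result)    # verification
        status = "ok" if passed else "failed"
        memory.append(f"iteration {iteration}: {status}: {result}")  # write memory
        if passed:
            return True                  # stop
    return False                         # iteration budget exhausted


if __name__ == "__main__":
    notes: List[str] = []
    done = run_loop(
        goal="produce draft 3",
        act=lambda goal, memory: f"draft {len(memory) + 1}",
        verify=lambda goal, result: result == "draft 3",
        memory=notes,
    )
    print(done, len(notes))
```

In this toy run the action reads memory to decide what to do next, which is the role the harness plays for a real agent: it gives each iteration something concrete to read instead of something to guess.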

### The 7 Files

1. **CLAUDE.md** — Project structure, commands, and prohibited actions. Keep it under 300 lines. The article cites the paper *Less Context, Better Agents*, which reports that task completion rate drops from 91.6% to 71% when the context grows too large.

2. **settings.json** — Allow/deny lists for permissions and hook registration. First thing to add: an allow list of read-only commands, so you aren't asked to confirm permissions every time the agent runs `ls`.

3. **hooks** — Deterministic scripts triggered by tool events. The first essential hook: a `PostToolUse` hook that matches the `Edit|Write` tools and runs prettier automatically.

4. **subagents** — Markdown files with YAML frontmatter stored in `.claude/agents/`. The main agent calls them via the Task tool, and they run in a fresh context. A common use case: a verifier that checks if a diff meets the task goal.

5. **skills** — A folder in `.claude/skills/` that contains a SKILL.md for each skill. Loaded on demand: only the name and description are loaded at session start, and the full text is only loaded when triggered by a matching task.

6. **MCP** — `.mcp.json` declares external tool servers. Three rules: only add what you need for the current job, prefer official servers, don't install five "just in case".

7. **state and memory** — A `MEMORY.md` index plus a vault directory. Memory stores cross-session preferences and decisions, the vault stores immutable specifications.
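As a rough illustration of items 2 and 3, the Python sketch below assembles a `settings.json` with a read-only allow list and a `PostToolUse` formatting hook. The key names and rule syntax reflect one reading of the Claude Code settings schema and should be checked against the current documentation; the specific allowed commands and the prettier command line are assumptions, not taken from the article.

```python
import json
from typing import Dict


def build_settings() -> Dict:
    """Assemble a settings dictionary (schema details are assumptions)."""
    return {
        "permissions": {
            # Read-only commands the agent may run without a confirmation prompt.
            "allow": [
                "Bash(ls:*)",
                "Bash(cat:*)",
                "Bash(git log:*)",
                "Bash(git status:*)",
            ],
            "deny": [],
        },
        "hooks": {
            "PostToolUse": [
                {
                    # Fire after the Edit or Write tool has touched a file.
                    "matcher": "Edit|Write",
                    "hooks": [
                        {
                            "type": "command",
                            # Assumed: the hook reads tool input as JSON on stdin.
                            "command": "jq -r '.tool_input.file_path' | xargs npx prettier --write",
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Print rather than write, so nothing on disk is overwritten by accident.
    print(json.dumps(build_settings(), indent=2))
```

Starting with a short allow list and a single hook keeps the file easy to audit; further entries can be added as real friction shows up.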

![Screenshot of a Git repository file listing showing multiple skill folders under the .claude/skills/ directory](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHL5WWGsW4AAxnSd%3Fformat%3Djpg%26name%3Dlarge)

### The 5 Loop Steps

1. **Goal spec** — `PROMPT.md` stored on disk, re-read by the loop on every iteration. Without this file, the agent drifts off-task after just three iterations.

2. **Plan → Act → Verify** — The three-step minimal loop. Verification must run in an independent context (a subagent), otherwise the main agent will always agree with its own work.

3. **Sub-agent fan-out** — When a task branches into multiple independent sub-tasks, dispatch subagents in parallel, then aggregate the results. Internal Anthropic testing shows multi-agent setups improve performance by 90.2% over single-agent setups.

4. **Scheduler and persistence** — Use cron, launchd, or other simple timers to trigger the loop. The scheduler must be dumber than the agent — if the scheduler tries to do its own reasoning, it will fail silently for days.

5. **Failure modes** — Three common failures: confident garbage (missing verification), context rot (accuracy collapses after exceeding 200K tokens), and the Ralph Wiggum loop (repeating the same work over and over because state wasn't saved).
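The fan-out step (step 3) follows a familiar scatter-gather pattern. Here is a minimal Python sketch under stated assumptions: `worker` is a plain function standing in for a subagent call, and threads stand in for whatever mechanism actually dispatches subagents; in a real harness the main agent would hand these sub-tasks to subagents itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(
    subtasks: List[str],
    worker: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run independent sub-tasks in parallel and collect results by sub-task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        results = list(pool.map(worker, subtasks))
    return dict(zip(subtasks, results))


if __name__ == "__main__":
    def review(subtask: str) -> str:
        # Hypothetical stand-in for a subagent working in a fresh context.
        return f"reviewed {subtask}"

    summary = fan_out(["auth module", "billing module", "search module"], review)
    for subtask, result in summary.items():
        print(f"{subtask}: {result}")
```

The pattern only pays off when the sub-tasks really are independent; work that shares state is better left in a single sequential loop.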

![Chart comparing a single-context loop (1.48M tokens, 71% completion rate) with a prune-and-summarize loop (553K tokens, 91.6% completion rate)](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FHL5ZlSkXgAA09wS%3Fformat%3Dpng%26name%3Dlarge)

## Something You Can Try Tonight

From your project root, list the contents of your `.claude/` folder:

```
ls -la .claude/
```

Count the files. If you have fewer than 7, you know what to add.
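The count can also be automated. The Python sketch below reports which of the seven components are missing from a project; the paths for subagents, skills, and `.mcp.json` come from the list above, while the locations assumed for `CLAUDE.md`, `settings.json`, the hooks directory, and `MEMORY.md` are guesses that may not match your layout.

```python
from pathlib import Path
from typing import List

# Component name -> path relative to the project root (several are assumptions).
EXPECTED_COMPONENTS = {
    "CLAUDE.md": "CLAUDE.md",
    "settings.json": ".claude/settings.json",
    "hooks": ".claude/hooks",
    "subagents": ".claude/agents",
    "skills": ".claude/skills",
    "MCP": ".mcp.json",
    "state and memory": "MEMORY.md",
}


def missing_components(root: Path) -> List[str]:
    """Names of the harness components that do not exist under `root`."""
    return [
        name
        for name, relative_path in EXPECTED_COMPONENTS.items()
        if not (root / relative_path).exists()
    ]


if __name__ == "__main__":
    missing = missing_components(Path("."))
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All seven components are present.")
```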

Then do one more thing: open one of the repos below and clone it.

- Anthropic official quickstart: [github.com/anthropics/claude-quickstarts/tree/main/autonomous-coding](https://github.com/anthropics/claude-quickstarts/tree/main/autonomous-coding)

- Community minimal configuration reference: [centminmod/my-claude-code-setup](https://github.com/centminmod/my-claude-code-setup)

- Subagent template library: [wshobson/agents](https://github.com/wshobson/agents) (37K stars)

- Adversarial verifier: [moonrunnerkc/swarm-orchestrator](https://github.com/moonrunnerkc/swarm-orchestrator)

**The harness is your foundation. Without it, every loop runs on thin air.**

Published: 2026-07-05 14:27