Anthropic Reveals How It Keeps Claude Working Autonomously for 6 Hours: Lay the Foundation First, Then Run the Loop
Anthropic has solved the core problem of long-running agents: sustained work across context windows. They found failures stem from two patterns: trying to finish everything in one go, and declaring victory too early. The solution is a two-stage design with an initialization agent and a coding agent, paired with feature_list.json, incremental progress tracking, and Puppeteer-based testing. The community has further broken this down into an actionable engineering framework with 7 files and 5 steps. Build the harness first, then run the loop — otherwise your cycle is running on thin air.
Anthropic has publicly shared how they get Claude to work continuously for 6 hours without human intervention. It doesn't rely on a longer context window, but on an engineered design of harness and loop.
First, here are the key points they summarized (from ArchiveExplorer's tweet):
- 0:00 - Two failure patterns that kill all long agent runs
- 0:09 - The feature_list.json format + the "no deleting tests" rule
- 0:16 - Incremental progress + why they use Puppeteer MCP for testing
- 0:22 - The fixed session protocol every new agent launch follows
- 0:28 - Verbatim: a transcript of a typical session
- 0:32 - Failure patterns → solutions cheat sheet
## The Real Problem With Long-Running Agents
Anthropic states it plainly in their official blog post *Effective harnesses for long-running agents*: even with context compression, if you give Opus 4.5 a high-level task like "clone claude.ai" and ask it to run across multiple context windows, it still fails.
Failures fall into two patterns:
1. **Trying to finish everything in one go** — the agent crams too much work into a single context window, runs out of space halfway through, and leaves behind half-finished work and chaos. The next session can only guess what happened previously, and wastes massive amounts of time restoring basic functionality.
2. **Declaring victory too early** — when the agent joins the project in a later session, it sees some existing functionality and incorrectly assumes the project is done, stopping work entirely.
At their core, both patterns stem from the same problem: the agent doesn't pass clear state and progress between sessions.
## Two-Stage Solution: Initialization Agent + Coding Agent
Anthropic's approach splits the work between two distinct roles:
- **Initialization agent**: Only runs once, for the first execution. It's responsible for setting up the environment — writing an `init.sh` script, a `claude-progress.txt` progress file, a `feature_list.json` (containing over 200 features, all marked as "failing"), and making the first git commit.
- **Coding agent**: Runs the same prompt in every subsequent session. It's instructed to only make incremental progress: read the progress file and git log, pick the highest-priority unimplemented feature, build it, update `feature_list.json` once tests pass, and commit the change.
Key design detail: `feature_list.json` uses JSON format (which models are far less likely to corrupt by accident), and each feature entry includes a description, implementation steps, and a `passes` status field. The agent is explicitly instructed: "Deleting or editing existing tests is unacceptable, as this can lead to missing functionality or bugs."

## Incremental Progress + Automated Testing
At the start of every session, the agent follows a fixed protocol:
1. Run `pwd` to confirm the working directory
2. Read the git log and progress file
3. Read `feature_list.json` and select the highest-priority unfinished feature
4. Run `init.sh` to start the development server
5. Run end-to-end tests with Puppeteer MCP to confirm core functionality works correctly
6. Implement the new feature, self-test, update progress, and commit to git
Anthropic found that after explicitly requiring the agent to run tests with a browser automation tool (operating just like a human user would), functional completeness improved dramatically. There are still limitations, however: Claude can't see native browser alert popups, so functionality relying on these alerts is prone to bugs.
## Community Breakdown: 7 Files + 5 Steps
Anthropic's blog focuses heavily on concepts, while a community article *Loop and Harness engineering: 7 files, 5 steps* lays out a production-ready engineering framework. Its core thesis: **Most people struggle with their loops, but the real problem is they never built a solid harness.**
### What are harness and loop?
- **Harness** is the `.claude/` folder. It defines rules, permissions, tools, subagents, skills, and memory. It doesn't change between sessions.
- **Loop** is the process that runs inside the harness: goal → action → verification → write memory → continue or stop.
Without a harness, the loop has to guess every time. Guessing leads to invented files, invented commands, and passing meaningless tests.
### The 7 Files
1. **CLAUDE.md** — Project structure, commands, and prohibited actions. Keep it under 300 lines. Citing the paper *Less Context, Better Agents*: when context becomes too large, task completion rate drops from 91.6% to 71%.
2. **settings.json** — Allow/deny lists for permissions and hook registration. First thing to add: a read-only allow array to avoid permission confirmation popups every time you run `ls`.
3. **hooks** — Deterministic scripts triggered by tool events. The first essential hook: match `PostToolUse` for `Edit|Write` commands, and run prettier automatically.
4. **subagents** — Markdown files with YAML frontmatter stored in `.claude/agents/`. The main agent calls them via the Task tool, and they run in a fresh context. A common use case: a verifier that checks if a diff meets the task goal.
5. **skills** — A folder in `.claude/skills/` that contains a SKILL.md for each skill. Loaded on demand: only the name and description are loaded at session start, and the full text is only loaded when triggered by a matching task.
6. **MCP** — `.mcp.json` declares external tool servers. Three rules: only add what you need for the current job, prefer official servers, don't install five "just in case".
7. **state and memory** — A `MEMORY.md` index plus a vault directory. Memory stores cross-session preferences and decisions, the vault stores immutable specifications.

### The 5 Loop Steps
1. **Goal spec** — `PROMPT.md` stored on disk, re-read by the loop on every iteration. Without this file, the agent drifts off-task after just three iterations.
2. **Plan → Act → Verify** — The three-step minimal loop. Verification must run in an independent context (a subagent), otherwise the main agent will always agree with its own work.
3. **Sub-agent fan-out** — When a task branches into multiple independent sub-tasks, dispatch subagents in parallel, then aggregate the results. Internal Anthropic testing shows multi-agent setups improve performance by +90.2% over single-agent setups.
4. **Scheduler and persistence** — Use cron, launchd, or other simple timers to trigger the loop. The scheduler must be dumber than the agent — if the scheduler tries to do its own reasoning, it will fail silently for days.
5. **Failure modes** — Three common failures: confident garbage (missing verification), context rot (accuracy collapses after exceeding 200K tokens), and the Ralph Wiggum loop (repeating the same work over and over because state wasn't saved).

## Something You Can Try Tonight
Open your `.claude/` folder and run:
```
ls -la .claude/
```
Count the files. If you have fewer than 7, you know what to add.
Then do one more thing: open the corresponding repo and clone it.
- Anthropic official quickstart: [github.com/anthropics/claude-quickstarts/tree/main/autonomous-coding](https://github.com/anthropics/claude-quickstarts/tree/main/autonomous-coding)
- Community minimal configuration reference: [centminmod/my-claude-code-setup](https://github.com/centminmod/my-claude-code-setup)
- Subagent template library: [wshobson/agents](https://github.com/wshobson/agents) (37K stars)
- Adversarial verifier: [moonrunnerkc/swarm-orchestrator](https://github.com/moonrunnerkc/swarm-orchestrator)
**The harness is your foundation. Without it, every loop runs on thin air.**
发布时间: 2026-07-05 14:27