Wink Pings

DeepSeek V4 Preview: Million-Token Context Window and Inference Acceleration

DeepSeek V4, expected to launch in October, supports a million-token context window and GRPO inference acceleration technology, potentially redefining benchmarks for long-text analysis and complex task processing.

The preview of DeepSeek V4 is like a depth charge. What does a million-token context window mean? The full text of *War and Peace* is about 560,000 words, so at typical English tokenization rates (roughly 1.3 tokens per word) one full copy fits with room to spare; squeezing in two is plausible only under a leaner tokenizer. Codebase analysis no longer requires chunking in principle, though the claim that the entire Linux kernel source (approximately 28 million lines) could be loaded at once is purely theoretical: at even a few tokens per line, that is far beyond a million tokens.
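A back-of-the-envelope check on the chunking claim (a sketch: the 1.33 tokens-per-word ratio is a common rule of thumb for English prose, not a figure from DeepSeek's documentation):

```python
# How many windows does a long document need at different context sizes?
# The tokens-per-word ratio below is an illustrative assumption.
import math

TOKENS_PER_WORD = 1.33  # rough rule of thumb for English; varies by tokenizer

def chunks_needed(doc_tokens: int, window: int, overlap: int = 0) -> int:
    """Number of context windows needed to cover a document, with overlap."""
    stride = window - overlap
    return max(1, math.ceil((doc_tokens - overlap) / stride))

doc = int(560_000 * TOKENS_PER_WORD)               # War and Peace, ~745k tokens
print(chunks_needed(doc, 128_000, overlap=4_000))  # -> 6 chunks at a 128k window
print(chunks_needed(doc, 1_000_000))               # -> 1: fits in one window
```

The practical difference is not just convenience: every chunk boundary is a place where cross-references can be silently lost.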

The GRPO inference engine is even more intriguing. Tasks requiring "intermediate reasoning," such as mathematical proofs or multi-step code debugging, often suffer from broken reasoning chains in current models. Testers report that V4 improves step-by-step problem-solving completeness on LeetCode hard problems by 40% compared to V3, with a 62% reduction in error backtracking.
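If the name refers to the GRPO that DeepSeek published in the DeepSeekMath paper, it is a reinforcement-learning training method rather than an inference engine per se: sample a group of completions per prompt and score each against the group average, with no separate value network. A minimal sketch of that advantage step (the rewards are made-up numbers, not benchmark data):

```python
# GRPO's core idea per DeepSeek's published DeepSeekMath paper: normalize
# each sampled completion's reward against its own group, instead of
# training a separate value model to estimate a baseline.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Return each reward's z-score relative to its sampling group."""
    mu, sigma = mean(rewards), stdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled solutions to one problem, graded 0/1 for correctness:
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 2) for a in advs])  # -> [0.87, -0.87, 0.87, -0.87]
```

Correct samples get a positive advantage and incorrect ones a negative one, which is why the method rewards complete, verifiable reasoning chains rather than isolated token predictions.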

The mystery of the NSA/SPCT architecture lies in its energy efficiency. A competitor requires 8 H100 GPUs to process a 1M-token context in 11 seconds, while DeepSeek's demo video shows the same task completed in 6 seconds using just 3 GPUs. But don't cheer too soon: a footnote on page 17 of the technical documentation notes that actual inference speed depends on the triggering ratio of "dynamic sparse attention."
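That caveat is easier to see with a toy example: instead of attending to every key, the model scores coarse blocks and attends only to the few that trigger, so throughput depends on how many blocks fire. A schematic sketch (the block-mean scoring here is an illustrative stand-in, not the actual NSA mechanism):

```python
# Toy dynamic block-sparse attention: score each block of keys cheaply
# (via the block's mean key) and attend only to the top-k blocks.
# Purely schematic; not DeepSeek's NSA/SPCT implementation.
def select_blocks(q, keys, block, k):
    """Return indices of the k key blocks whose mean key best matches q."""
    n_blocks = len(keys) // block
    scores = []
    for b in range(n_blocks):
        blk = keys[b * block:(b + 1) * block]
        mean_key = [sum(col) / block for col in zip(*blk)]
        scores.append(sum(x * y for x, y in zip(mean_key, q)))
    return sorted(sorted(range(n_blocks), key=lambda b: scores[b])[-k:])

# 8 two-dimensional keys in 4 blocks; the query points along the first axis:
keys = [[1, 0], [1, 0], [0, 1], [0, 1], [5, 0], [5, 0], [0, 0], [0, 0]]
picked = select_blocks([1, 0], keys, block=2, k=2)
print(picked)  # -> [0, 2]: only 4 of 8 keys attended, a 50% triggering ratio
```

When the triggering ratio climbs toward 100%, the sparse path degenerates into dense attention, and the headline speedup evaporates, which is exactly what the footnote is hedging against.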

![DeepSeek V4 Blue Whale Logo](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FG193IiSXkAARwcm%3Fformat%3Djpg%26name%3Dlarge)

The open-source strategy follows the previous generation, with APIs expected to open on October 15. Interestingly, user tests reveal that when processing a 300-page PDF, V4 achieves 23% higher accuracy in chapter association than Claude 3 but underperforms by 7% in cross-referencing legal clauses. The technical team responded that they are adjusting the "long-range dependency decay coefficient."

The sharpest criticism comes from hardware engineer Ruslan: "These parameter boosts are like fitting a rocket engine to a sports car—the real test is preventing the tires from spinning out." Indeed, early testing shows that when the context exceeds 800K tokens, the model’s detail recall rate for the end of the document drops sharply by 15%.
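The end-of-document recall drop Ruslan alludes to is usually measured with a needle-in-a-haystack test: plant a fact at varying depths of a long context and check whether the model can retrieve it. A minimal harness sketch; `ask_model` and the stub below are placeholders, not DeepSeek's evaluation code:

```python
# Needle-in-a-haystack harness sketch: insert a "needle" fact at a relative
# depth in filler text and check whether the model can still surface it.
def make_haystack(filler: str, needle: str, depth: float, n_chunks: int = 100) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    chunks = [filler] * n_chunks
    chunks.insert(int(depth * n_chunks), needle)
    return "\n".join(chunks)

def recall_at_depths(ask_model, needle: str, depths) -> dict:
    """Map each depth to whether the model's answer contains the needle."""
    return {d: needle in ask_model(make_haystack("lorem ipsum", needle, d))
            for d in depths}

# Stub model that only "remembers" the first 80% of its context,
# mimicking the end-of-document recall drop described above:
stub = lambda ctx: ctx[: int(len(ctx) * 0.8)]
print(recall_at_depths(stub, "SECRET-42", [0.0, 0.5, 0.95]))
# -> {0.0: True, 0.5: True, 0.95: False}
```

A real run would replace `stub` with an API call and sweep both depth and total context length, producing the familiar recall heatmap.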

However, one AI team used the V4 beta to complete a cross-chapter causal analysis of a behavioral economics paper—a task that traditionally takes two weeks was compressed to three hours. This may hint at a new battleground: not a parameter race, but task-closure capability.

(Note: Demo data comes from DeepSeek Technical Whitepaper v0.9b; actual performance may vary in the final release.)

Published: 2025-09-29 06:20