Wink Pings

DeepSeek-V3.2 Released: Sparse Attention Enables Constant Inference Speed, Prices Drop by 50%-75%

DeepSeek-V3.2 implements a sparse attention mechanism to achieve constant decoding speed, with input and output prices dropping to $0.28/M and $0.42/M tokens respectively. Performance remains comparable to V3.1 while costs fall sharply.

DeepSeek has just released version V3.2, bringing significant changes in both pricing and performance.

In terms of pricing, input tokens have decreased from $0.56 per million to $0.28 per million, while output tokens have dropped from $1.68 per million to $0.42 per million, representing a reduction of 50%-75%. More importantly, performance remains essentially consistent with V3.1.
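A quick back-of-the-envelope check on those figures:

```python
# Price comparison per million tokens, using the numbers quoted above.
old = {"input": 0.56, "output": 1.68}   # V3.1 ($/M tokens)
new = {"input": 0.28, "output": 0.42}   # V3.2 ($/M tokens)

for kind in ("input", "output"):
    drop = 1 - new[kind] / old[kind]
    print(f"{kind}: ${old[kind]:.2f} -> ${new[kind]:.2f} ({drop:.0%} cheaper)")

# input: $0.56 -> $0.28 (50% cheaper)
# output: $1.68 -> $0.42 (75% cheaper)
```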

The core technical breakthrough is the DeepSeek Sparse Attention mechanism. According to the technical report, instead of attending to the full context, each query computes attention over only the top-k previous tokens, with an index selector dynamically scoring which tokens deserve focus. Unlike traditional linear attention models, which compress the entire history into a fixed-size state, this approach keeps the full cache and selects from it, so the information the model can retain still grows with sequence length.
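As a rough illustration of the idea (a toy sketch, not DeepSeek's actual kernel; the low-rank scorer here merely stands in for the paper's index selector), one decoding step with top-k selection might look like this:

```python
import torch
import torch.nn.functional as F

def topk_sparse_decode_step(q, k_cache, v_cache, iq, ik, top_k=2048):
    """Toy top-k sparse attention for a single decoding step.

    q:        (d,)   query for the current token
    k_cache:  (T, d) cached keys;  v_cache: (T, d) cached values
    iq, ik:   (r,) and (T, r) low-rank "indexer" projections, used only
              to score which past tokens are worth attending to
    """
    T = k_cache.size(0)
    # Cheap scoring pass: still O(T), but in a small dimension r << d.
    scores = ik @ iq                                   # (T,)
    sel = torch.topk(scores, k=min(top_k, T)).indices  # chosen positions
    # Exact attention restricted to the selected tokens: the per-step
    # cost is O(top_k * d) no matter how large T grows.
    logits = (k_cache[sel] @ q) / q.size(0) ** 0.5     # (top_k,)
    weights = F.softmax(logits, dim=-1)
    return weights @ v_cache[sel]                      # (d,)
```

Because top_k is fixed, the attention cost per generated token stops scaling with context length, which is where the "constant decoding speed" framing comes from.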

The architecture diagram can be found in the technical report: [DeepSeek_V3_2.pdf](https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/raw/main/DeepSeek_V3_2.pdf)

Interestingly, DeepSeek implemented sparsification directly on top of V3.1's weights, rather than retraining a model from scratch as many had assumed would be necessary. If this retrofit approach proves generalizable, it could have significant implications for the entire industry.

Community discussions have focused on the implementation details of the sparse attention. Some researchers have pointed out that attention maps themselves are compressible, but efficiently utilizing this property presents technical challenges. DeepSeek's solution combines three techniques: sliding windows, compression, and selective attention.
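To make those three ingredients concrete, here is a hypothetical mask builder for a single query position. It is a conceptual sketch of the combination being discussed, not code from the release, and the block-representative rule in step 2 is an arbitrary simplification:

```python
import torch

def sparse_attention_mask(T, q_pos, scores, window=128, block=64, top_k=16):
    """Boolean mask over past positions that query q_pos may attend to,
    combining a sliding window, block-level (compressed) coverage, and
    score-based selection. Illustrative only."""
    pos = torch.arange(T)
    past = pos <= q_pos                       # causal constraint
    mask = torch.zeros(T, dtype=torch.bool)
    # 1. Sliding window: always keep the most recent tokens.
    mask |= past & (pos > q_pos - window)
    # 2. Compression: keep one representative token per block
    #    (here, simply each block's first token).
    mask |= past & (pos % block == 0)
    # 3. Selection: add the top_k highest-scoring past tokens.
    sel = torch.where(past, scores, torch.full_like(scores, float("-inf")))
    mask[torch.topk(sel, k=min(top_k, int(past.sum()))).indices] = True
    return mask

# Example: which of 4096 cached tokens the last query would attend to.
mask = sparse_attention_mask(4096, 4095, torch.randn(4096))
print(int(mask.sum()), "of 4096 positions kept")
```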

The model is currently available on HuggingFace, with the technical report and code both open-sourced. For applications requiring long sequence processing, this version deserves special attention.
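For a quick hands-on test, DeepSeek's API is OpenAI-compatible, so a minimal smoke test needs only the standard client (the key below is a placeholder, and the endpoint and model name should be verified against the official docs):

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",    # placeholder: use your own key
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-chat",              # the chat endpoint serving V3.2
    messages=[{"role": "user",
               "content": "Explain sparse attention in one sentence."}],
)
print(resp.choices[0].message.content)
```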

Published: 2025-09-29 18:04