Wink Pings

When Small Models Start Disrupting Large Models: Why NVIDIA's SLM Framework Matters

NVIDIA's latest research argues that lightweight Small Language Models (SLMs) tailored to specific tasks can surpass Large Language Models (LLMs) in efficiency, cost, and controllability, potentially marking a turning point in AI agent architectures.

Using a 175-billion-parameter GPT-3 to handle a simple API call task is like opening a beer bottle with a nuclear warhead. NVIDIA's paper finally lays it bare: the most significant AI transformation in 2025 may come from the uprising of small models (SLMs).

![Image 1](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FG0E0O5tbsAAuoDe%3Fformat%3Dpng%26name%3Dlarge)

The absurd reality of current AI agents is that no matter how simple the task, they default to calling GPT-4-level large models. Researchers crunched the numbers: roughly 70% of everyday agent tasks (document summarization, data extraction, template generation) don't require LLMs' "reasoning superpowers" at all. SLMs are not only sufficient but superior.

Some counterintuitive examples:

- The 6.7B-parameter Toolformer outperforms GPT-3 in API tool calls

- The 7B-parameter DeepSeek-R1-Distill surpasses Claude 3.5 in reasoning tasks

- Models like Phi-3 have shown that task performance doesn't scale in lockstep with parameter count

![Image 2](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FG0E0QtrboAAbi15%3Fformat%3Dpng%26name%3Dlarge)

SLMs' killer advantages:

1. 10-30x lower cost

2. 80% reduction in response latency

3. Energy consumption as low as 5% of LLMs

4. Deployable on consumer-grade hardware

5. More stable output formats (JSON/XML/code)
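Advantage 5 matters because agent pipelines consume model output programmatically, so malformed output breaks the chain. A minimal sketch of machine-checking a model's structured output before passing it downstream (the schema fields and rejection logic are illustrative assumptions, not from the paper):

```python
import json

REQUIRED_KEYS = {"name", "amount"}  # hypothetical schema for an extraction task

def parse_or_reject(raw: str):
    """Return the parsed dict if the model emitted valid JSON with the
    required keys; otherwise return None so the caller can retry."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and REQUIRED_KEYS <= data.keys():
        return data
    return None

print(parse_or_reject('{"name": "ACME", "amount": 42}'))  # accepted
print(parse_or_reject("Sure! Here is the JSON: ..."))     # rejected -> None
```

A fine-tuned SLM that reliably passes a check like this is cheaper to run in a loop than an LLM that occasionally wraps its JSON in chatty prose.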

![Image 3](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FG0E0RkPaQAEUDs3%3Fformat%3Dpng%26name%3Dlarge)

The industry's hesitation to shift is pragmatic:

- Sunk costs in existing LLM infrastructure

- Evaluation standards overly biased toward general capabilities

- Small models lack media visibility

But case studies like MetaGPT show that 60%-70% of LLM calls can be replaced outright by SLMs. The correct architecture for future agent systems should be: use SLMs for standardized tasks and call LLMs only when genuine creativity is needed, just as you wouldn't hire a Nobel laureate for every math problem.
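The SLM-first routing pattern described above can be sketched in a few lines. Everything here is an illustrative assumption (the keyword heuristic, the model-tier names), not the paper's actual method; a production router would classify tasks with a learned model rather than keywords:

```python
# Hypothetical SLM-first router: send standardized tasks to a cheap small
# model, escalate open-ended ones to a large model.
ROUTINE_KEYWORDS = ("summarize", "extract", "classify", "format", "template")

def route(task: str) -> str:
    """Pick a model tier for a task with a simple keyword heuristic."""
    if any(kw in task.lower() for kw in ROUTINE_KEYWORDS):
        return "slm"  # standardized task: local small model
    return "llm"      # open-ended task: fall back to a large model

for task in [
    "Summarize this support ticket",
    "Extract invoice fields as JSON",
    "Draft an original product strategy memo",
]:
    print(f"{task} -> {route(task)}")
```

The design point is the default direction: the router assumes the small model unless the task proves it needs more, inverting today's LLM-by-default habit.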

Paper link: [arxiv.org/abs/2506.02153](https://arxiv.org/abs/2506.02153)

Published: 2025-09-05 18:13