Wink Pings

How Can a 2.6B Model Running on a Phone Surpass GPT-4?

An open-source model with just 2.6 billion parameters has outperformed GPT-4 in multiple benchmark tests, marking a fundamental shift in AI training methods.

When ChatGPT upgraded from GPT-3.5 to GPT-4, many felt the shock of a technological leap. But now a model with only 2.6B parameters, small enough to run locally on a phone, has surpassed the original GPT-4 in key tests.

![Benchmarking LFM2-2.6B-Exp vs. GPT-4](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FG9C2z-6XUAAGTFu%3Fformat%3Djpg%26name%3Dlarge)

Liquid AI's LFM2-2.6B-Exp model was trained with pure reinforcement learning. Test data shows this small model excels at instruction-following tasks such as IFEval (strict instruction adherence), Multi-IF (complex instructions), and IFBench (format accuracy), and it remains competitive even on high-difficulty tasks like GPQA (doctoral-level reasoning) and AIME25 (Olympiad-level math).
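Liquid AI has not published its training recipe in detail, so purely as a rough illustration of the core idea behind "pure RL" post-training, here is a minimal REINFORCE-style policy-gradient update. The function, model, and reward signal are placeholder assumptions, not Liquid AI's actual method:

```python
import torch
import torch.nn.functional as F

def reinforce_step(model, optimizer, prompt_ids, gen_ids, reward):
    """One REINFORCE-style update: scale the log-probability of a
    sampled completion by a scalar reward (e.g., a pass/fail check
    on instruction compliance) and take a gradient step."""
    seq = torch.cat([prompt_ids, gen_ids], dim=-1)           # [B, P+G]
    logits = model(seq).logits                               # [B, P+G, V]
    # Logits at position t predict token t+1, so this slice lines up
    # the model's predictions with the generated tokens.
    gen_logits = logits[:, prompt_ids.size(-1) - 1 : -1, :]  # [B, G, V]
    logp = F.log_softmax(gen_logits, dim=-1)
    token_logp = logp.gather(-1, gen_ids.unsqueeze(-1)).squeeze(-1)
    # Reward-weighted negative log-likelihood of the sampled sequence.
    loss = -(reward * token_logp.sum(dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, production RL pipelines layer variance reduction, KL penalties against a reference model, and batched rollouts on top of this basic update, but the core mechanism is the same: reinforce completions that score well on a verifiable reward.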

![Model Performance Comparison Chart](https://wink.run/image?url=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FG9BDxHdW4AA9GeF%3Fformat%3Dpng%26name%3Dlarge)

Notably, the model's IFBench score surpasses that of DeepSeek R1-0528, a model roughly 263 times larger in parameter count. Developers comment that this shows parameter count is becoming a vanity metric, while intelligence density is the new key measure.

Looking at the timeline, it took only about three years from GPT-4's release to a phone-runnable model with comparable capabilities on these tests. This pace raises a question: have we been so fixated on scaling up models that we ignored improvements in training efficiency?

Developers who have tried the model report fast response times, making it suitable for integration into local AI applications. However, some argue that while benchmark victories are important, real-world deployment requires considering engineering factors like inference speed and accuracy loss.
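For readers who want to try local integration themselves, a minimal inference sketch using Hugging Face transformers might look like the following. Note that the checkpoint id below is an assumption for illustration; check Liquid AI's Hugging Face page for the actual published name:

```python
# Minimal local-inference sketch; model id is hypothetical, verify before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-2.6B"  # assumed repo id, not confirmed by the article
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize this in one sentence: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

At 2.6B parameters, a checkpoint of this size can plausibly fit on recent phones or laptops once quantized, which is what makes the reported response speeds relevant for on-device applications.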

Regardless, this small model's performance does challenge the "bigger is better" assumption. When 2.6B parameters can match or even exceed performance that once required trillions, the direction of AI development may be quietly shifting.

Published: 2025-12-26 05:27