Gemini 3.5 Flash: Faster than its peers and up 9 points, but usage cost rises 5.5x and the hallucination rate still stands at 61%
Google launched its new-generation Gemini 3.5 Flash large language model on May 19, 2026. Independent third-party tests show that its composite intelligence score rose 9 points over the previous generation, its output speed reaches 284 tokens per second, its multimodal capabilities lead the industry, and its Agent task performance has largely closed the earlier gap. However, a 3x increase in model pricing combined with higher token consumption has pushed the overall usage cost to 5.5 times that of the previous generation. Its hallucination rate dropped from 92% to 61% but remains high, and the industry is sharply divided over its cost-performance ratio and its behavior in long-workflow tasks.
On May 19, 2026, Google DeepMind launched Gemini 3.5 Flash, the newest addition to the Gemini model family. Third-party AI evaluation firm Artificial Analysis obtained early access to the model and released full independent test data covering performance, cost, and speed.
Previously, the Gemini Flash series has always positioned itself as faster and more affordable at the same performance level, serving as a budget alternative to the Pro series. This release of 3.5 Flash has broken away from this original positioning in both performance and pricing.
### Core Performance Improvements
In Artificial Analysis' composite intelligence index test, Gemini 3.5 Flash scored 55 points, a 9-point improvement over the previous generation 3 Flash. It ranked 7th among all tested models, surpassing popular models including Grok 4.3 (53 points) and Claude Sonnet 4.6 (52 points).

Speed is one of the biggest highlights of this release. The model's output speed reaches 284 tokens per second, 70% faster than the previous generation. It ranked 2nd among all tested inference models, only slightly slower than the open-source gpt-oss-120B, and nearly 3 times faster than models such as GPT-5.4 mini and Claude Sonnet. Combined with its 55-point intelligence score, this places it on the Pareto-optimal frontier of the speed-intelligence tradeoff, alongside Gemini 3.1 Pro and 3.1 Flash Lite in the first tier of "fast and strong" models.
Agent capability was the most obvious shortcoming of earlier models, and it is where this iteration improves the most. In the GDPval-AA benchmark, which measures performance on complex real-world tasks, Gemini 3.5 Flash achieved an Elo rating of 1656. This is far higher than the previous generation 3 Flash's 1204, and it even outperforms the higher-positioned Gemini 3.1 Pro (1314), trailing GPT-5.4's 1674 only slightly. The result addresses the long-standing weakness of the Gemini series in Agent tasks.
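
To give a rough sense of what these rating gaps mean, the sketch below converts Elo differences into expected head-to-head scores. It assumes GDPval-AA ratings follow the standard logistic Elo model with a 400-point scale, which the article does not confirm; the function name `elo_expected_score` and the `GDPVAL_ELO` dictionary are illustrative names, not part of any published tooling.

```python
# Elo ratings on GDPval-AA as quoted in the article.
GDPVAL_ELO = {
    "Gemini 3.5 Flash": 1656,
    "Gemini 3 Flash": 1204,
    "Gemini 3.1 Pro": 1314,
    "GPT-5.4": 1674,
}


def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    ASSUMPTION: the benchmark uses the conventional base-10, 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


subject = "Gemini 3.5 Flash"
for opponent, rating in GDPVAL_ELO.items():
    if opponent == subject:
        continue
    expected = elo_expected_score(GDPVAL_ELO[subject], rating)
    print(f"{subject} vs {opponent}: expected score {expected:.2f}")
```

Under that assumption, the 452-point lead over 3 Flash corresponds to an expected score of roughly 0.93, while the 18-point deficit against GPT-5.4 is close to an even matchup.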

It is worth noting that Gemini 3.5 Flash completes Agent tasks with an average of 49 turns, higher than models such as Claude Opus 4.7 (45 turns) and GPT-5.4 (40 turns). More turns mean higher token consumption, which directly pushes up usage costs.
Multimodal capability continues Google's traditional advantage. In the MMMU-Pro benchmark, which measures multimodal visual reasoning ability, Gemini 3.5 Flash scored 84 points, the highest among all tested models to date, 2 points higher than the second-place Gemini 3.1 Pro. Unlike many cutting-edge models that only support image input, it supports four input modalities: text, image, audio, and video, covering more multimodal usage scenarios.

Improvement on the hallucination problem is another point of widespread attention. In AA's knowledge and hallucination benchmark, Gemini 3.5 Flash's hallucination rate dropped from the previous generation's 92% to 61%, a decline of 31 percentage points and one of the largest improvements among all tested models. However, some industry practitioners point out that a 61% hallucination rate means nearly 2 out of every 3 responses contain factual errors, and that any Agent workflow built on this model must prioritize validation logic over performance improvements.

### The Most Controversial Cost Issue
In terms of pricing, Gemini 3.5 Flash's official pricing is $1.50 per 1 million input tokens and $9.00 per 1 million output tokens, a direct 3x increase from the previous generation 3 Flash's $0.50/$3.00. While Google offers a 90% discount on cached inputs, it charges an additional hourly cache storage fee, and not all usage scenarios can qualify for the discount.
Combined with the increased input token consumption brought by more Agent task turns, the total cost to run the full Artificial Analysis intelligence index test reached $1552, which is 5.5 times that of the previous generation 3 Flash, and even 75% more expensive than the higher-positioned Gemini 3.1 Pro.
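
The gap between the 3x price increase and the 5.5x cost increase can be worked out with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not Artificial Analysis' methodology: it ignores cache discounts and storage fees, and because input and output prices both rose by the same factor, the "implied token multiplier" is really a price-weighted token volume ratio. The helper `request_cost` and all variable names are illustrative.

```python
# List prices in USD per 1 million tokens, as quoted in the article.
OLD_PRICE = {"input": 0.50, "output": 3.00}   # Gemini 3 Flash
NEW_PRICE = {"input": 1.50, "output": 9.00}   # Gemini 3.5 Flash

BENCHMARK_COST_USD = 1552.0      # cost to run the full intelligence index
TOTAL_COST_MULTIPLIER = 5.5      # versus Gemini 3 Flash
PREMIUM_OVER_31_PRO = 0.75       # "75% more expensive" than Gemini 3.1 Pro


def request_cost(input_tokens: int, output_tokens: int, price: dict) -> float:
    """Uncached cost in USD of one request at the given per-million prices."""
    return (input_tokens / 1e6) * price["input"] + (output_tokens / 1e6) * price["output"]


# Both prices rose by the same factor, so one multiplier describes the price change.
price_multiplier = NEW_PRICE["input"] / OLD_PRICE["input"]

# Whatever the price increase does not explain must come from token volume.
implied_token_multiplier = TOTAL_COST_MULTIPLIER / price_multiplier

# Benchmark costs of the other models implied by the article's ratios.
implied_3_flash_cost = BENCHMARK_COST_USD / TOTAL_COST_MULTIPLIER
implied_31_pro_cost = BENCHMARK_COST_USD / (1.0 + PREMIUM_OVER_31_PRO)

print(f"price multiplier:          {price_multiplier:.1f}x")
print(f"implied token multiplier:  {implied_token_multiplier:.2f}x")
print(f"implied 3 Flash run cost:  ${implied_3_flash_cost:.0f}")
print(f"implied 3.1 Pro run cost:  ${implied_31_pro_cost:.0f}")
```

Read this way, roughly 1.8x more price-weighted tokens on top of the 3x price increase accounts for the 5.5x total.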
A performance-price comparison chart shared by an industry practitioner shows that Gemini 3.5 Flash is now more expensive than top-tier cutting-edge models such as GPT-5.5, while its composite performance still lags behind.

Many practitioners noted in comments that the 5x cost increase is the most critical decision factor. For speed-sensitive scenarios that can absorb the higher cost, the model is highly competitive. For general day-to-day inference, however, the price hike costs it its original position as the "default budget model", leaving it as an option for specific scenarios within a routing strategy.
### Other Key Parameters
Gemini 3.5 Flash retains the previous generation's 1 million token context window (equivalent to approximately 750,000 words of text), which remains among the top tier in the industry. However, its time-to-first-token is 17.75 seconds, far higher than the median of 2.72 seconds for models in the same price range. This means that despite its fast output speed, the wait before it starts responding is very long, making it unsuitable for conversational scenarios that require quick replies. The model is a closed-source inference model, and Google has not publicly disclosed its parameter scale. API access is now open.
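
The interaction between a long time-to-first-token and a high output speed can be made concrete with a simple latency model: total time is the initial wait plus the number of output tokens divided by the output speed. The sketch below uses the article's figures for Gemini 3.5 Flash; the comparison model is hypothetical, pairing the quoted 2.72-second median wait with an assumed 100 tokens per second, a speed the article does not report.

```python
def total_latency(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Seconds until the last token arrives: initial wait plus streaming time."""
    return ttft_s + output_tokens / tokens_per_s


def break_even_tokens(ttft_a: float, speed_a: float, ttft_b: float, speed_b: float) -> float:
    """Output length at which model A (slow start, fast stream) catches up with B.

    Solves ttft_a + n / speed_a == ttft_b + n / speed_b for n.
    Requires speed_a > speed_b and ttft_a > ttft_b.
    """
    return (ttft_a - ttft_b) / (1.0 / speed_b - 1.0 / speed_a)


# Figures quoted in the article for Gemini 3.5 Flash.
FLASH_TTFT_S = 17.75
FLASH_SPEED = 284.0

# HYPOTHETICAL comparison model: median TTFT from the article, assumed speed.
PEER_TTFT_S = 2.72
PEER_SPEED = 100.0  # assumption for illustration only

for n in (200, 1000, 5000):
    flash = total_latency(FLASH_TTFT_S, FLASH_SPEED, n)
    peer = total_latency(PEER_TTFT_S, PEER_SPEED, n)
    print(f"{n:>5} tokens: Flash {flash:6.2f}s, hypothetical peer {peer:6.2f}s")

n_star = break_even_tokens(FLASH_TTFT_S, FLASH_SPEED, PEER_TTFT_S, PEER_SPEED)
print(f"break-even at about {n_star:.0f} output tokens")
```

With these assumed numbers the faster stream only pays off for outputs longer than about 2,300 tokens, which is consistent with the article's point that the model suits long generation better than short conversational replies.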
### Industry Division
The industry is currently clearly divided in its evaluation of this model:
- The positive view holds that the combination of "high intelligence + extreme speed" is one of a kind on the current market, suitable for scenarios requiring fast generation of long text and for medium-complexity Agent tasks, and that its multimodal advantage is also very prominent;
- The skeptical view makes three points: models such as DeepSeek V4 Pro and MiMo V2.5 Pro offer performance close to Gemini 3.5 Flash at a much lower price, which leaves Gemini 3.5 Flash with poor cost-performance; a 61% hallucination rate is still too high to support high-reliability Agent workflows; and all current tests cover short-chain tasks, so performance degradation in long multi-step Agent scenarios has not yet been verified.
Judging from this round of pricing and positioning adjustments, Google has shifted the Flash series from an "affordable, high-volume entry-level model" to a "speed-first mainstay model", and the high cost-performance that was once the series' biggest advantage no longer exists.
Published: 2026-05-20 06:55