IBM Granite 4.0: Squeezing AI into Browsers with a Hybrid Architecture
IBM has released the next generation of its open-source large language models, Granite 4.0. Featuring a hybrid Mamba-2/Transformer architecture, the family's 3.4B-parameter Micro model can run directly in a browser.
IBM has squeezed its new model into the browser.
The company has just joined Hugging Face Enterprise and open-sourced the Granite 4.0 series, a model family built on a hybrid Mamba-2/Transformer architecture that significantly reduces memory usage with minimal loss in accuracy.

The most surprising part is the 3.4B-parameter "Micro" version: it can run locally in the browser via WebGPU, with no server required. Using it is as simple as opening a webpage.
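For the curious, here is a minimal sketch of what that looks like with the Transformers.js library, which can run ONNX-exported models on WebGPU. The repo id and the q4 quantization setting below are assumptions for illustration; check Hugging Face for the actual browser-ready export of the Micro model.

```typescript
// Minimal in-browser sketch with Transformers.js (v3+). The repo id is an
// assumption; a WebGPU-ready ONNX export of Granite 4.0 Micro is presumed to exist.
import { pipeline } from "@huggingface/transformers";

// Download the model once (the browser caches it), then run inference on WebGPU.
const generator = await pipeline(
  "text-generation",
  "onnx-community/granite-4.0-micro-ONNX-web", // illustrative repo id
  { device: "webgpu", dtype: "q4" }            // 4-bit weights keep the download small
);

const messages = [
  { role: "user", content: "Summarize this contract clause in one sentence: ..." },
];

const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
```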
Someone is already testing it.

This model series targets enterprise applications: tool calling, document analysis, and retrieval-augmented generation (RAG). In scenarios where data must stay local, being able to run the model in the browser becomes crucial.
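As a rough sketch of the tool-calling use case, the snippet below assumes the model is served locally behind an OpenAI-compatible endpoint (for example via a self-hosted inference server), so data never leaves your machine. The endpoint URL, model name, and the get_order_status tool are all hypothetical.

```typescript
// Hypothetical tool-calling request against a locally hosted,
// OpenAI-compatible server; URL, model name, and tool are illustrative.
const response = await fetch("http://localhost:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "ibm-granite/granite-4.0-micro",
    messages: [
      { role: "user", content: "What is the current status of order #4519?" },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "get_order_status", // hypothetical tool
          description: "Look up an order in the internal order system",
          parameters: {
            type: "object",
            properties: { order_id: { type: "string" } },
            required: ["order_id"],
          },
        },
      },
    ],
  }),
});

const data = await response.json();
// If the model decides to call the tool, the structured call appears here.
console.log(data.choices[0].message.tool_calls);
```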
Now, developers have used the NexaSDK to run Granite-4.0-Micro on a Qualcomm NPU with just one line of code. Switching between GPU and CPU is also straightforward.
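NexaSDK's exact command isn't reproduced here, but sticking with the browser example above, switching backends really can be a one-line change: the sketch below picks WebGPU when the browser supports it and falls back to the WASM (CPU) backend otherwise. The repo id is again illustrative.

```typescript
import { pipeline } from "@huggingface/transformers";

// Choose the backend at runtime: WebGPU when the browser exposes it,
// otherwise the WASM (CPU) backend. Repo id is illustrative.
const generator = await pipeline(
  "text-generation",
  "onnx-community/granite-4.0-micro-ONNX-web",
  { device: "gpu" in navigator ? "webgpu" : "wasm", dtype: "q4" }
);
```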
The model series includes various sizes from 3B to 32B, all open-sourced under the Apache 2.0 license. IBM claims these models are ISO/IEC 42001 certified, which should give enterprise users peace of mind.
The hybrid architecture could be a turning point. A traditional Transformer's memory footprint grows with context length because it caches attention keys and values for every token, while the Mamba architecture processes long sequences with a fixed-size state. Combining the two maintains performance while lowering the hardware barrier to entry.
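A back-of-envelope comparison makes the point. Per layer, an attention block has to cache keys and values for every past token, while a Mamba-2 block keeps a fixed-size state. With L the context length, b the bytes per element, and the other symbols standing in for head and state dimensions (illustrative, not Granite's exact figures):

```latex
% Rough per-layer memory comparison; symbols are illustrative.
\[
\underbrace{M_{\mathrm{KV}} \;\approx\; 2 \, n_{\mathrm{kv\ heads}} \, d_{\mathrm{head}} \, L \, b}_{\text{attention: grows linearly with context length } L}
\qquad \text{vs.} \qquad
\underbrace{M_{\mathrm{state}} \;\approx\; n_{\mathrm{heads}} \, d_{\mathrm{head}} \, d_{\mathrm{state}} \, b}_{\text{Mamba-2: constant in } L}
\]
```

The first term keeps growing with every token you feed in; the second does not, which is where the memory savings on long documents come from.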
You can now find all the models on Hugging Face, and there are also online demos to try them out. For small teams, being able to use enterprise-grade AI tools without building complex infrastructure is a real game-changer.
Published: 2025-10-03 01:13