IBM Granite 4.0: Squeezing AI into Browsers with a Hybrid Architecture
IBM has released the next generation of its open-source large language models, Granite 4.0. Featuring a hybrid Mamba-2/Transformer architecture, the family's 3.4B-parameter Micro model can run directly in a browser.
IBM has squeezed its new model into the browser.
The company has just joined Hugging Face Enterprise and open-sourced the Granite 4.0 series, a model family built on a hybrid Mamba-2/Transformer architecture that significantly reduces memory usage with minimal loss in accuracy.

The most surprising part is the 3.4B-parameter "Micro" version: it can run locally in the browser via WebGPU, with no server required. Using it is as simple as opening a webpage.
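For the curious, here is a minimal sketch of what that looks like with the Transformers.js library, which can run ONNX-exported models on WebGPU. The repo id and the q4 quantization setting below are assumptions for illustration; check Hugging Face for the actual browser-ready export of the Micro model.

```typescript
// Minimal in-browser sketch with Transformers.js (v3+). The repo id is an
// assumption; a WebGPU-ready ONNX export of Granite 4.0 Micro is presumed to exist.
import { pipeline } from "@huggingface/transformers";

// Download the model once (the browser caches it), then run inference on WebGPU.
const generator = await pipeline(
  "text-generation",
  "onnx-community/granite-4.0-micro-ONNX-web", // illustrative repo id
  { device: "webgpu", dtype: "q4" }            // 4-bit weights keep the download small
);

const messages = [
  { role: "user", content: "Summarize this contract clause in one sentence: ..." },
];

const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
```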
Someone is already testing it.

This model series targets enterprise applications: tool calling, document analysis, and retrieval-augmented generation (RAG). In scenarios where data must stay local, being able to run the model in the browser becomes crucial.
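As a rough sketch of the tool-calling use case, the snippet below assumes the model is served locally behind an OpenAI-compatible endpoint (for example via a self-hosted inference server), so data never leaves your machine. The endpoint URL, model name, and the get_order_status tool are all hypothetical.

```typescript
// Hypothetical tool-calling request against a locally hosted,
// OpenAI-compatible server; URL, model name, and tool are illustrative.
const response = await fetch("http://localhost:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "ibm-granite/granite-4.0-micro",
    messages: [
      { role: "user", content: "What is the current status of order #4519?" },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "get_order_status", // hypothetical tool
          description: "Look up an order in the internal order system",
          parameters: {
            type: "object",
            properties: { order_id: { type: "string" } },
            required: ["order_id"],
          },
        },
      },
    ],
  }),
});

const data = await response.json();
// If the model decides to call the tool, the structured call appears here.
console.log(data.choices[0].message.tool_calls);
```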
Now, developers have used the NexaSDK to run Granite-4.0-Micro on a Qualcomm NPU with just one line of code. Switching between GPU and CPU is also straightforward.
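NexaSDK's exact command isn't reproduced here, but sticking with the browser example above, switching backends really can be a one-line change: the sketch below picks WebGPU when the browser supports it and falls back to the WASM (CPU) backend otherwise. The repo id is again illustrative.

```typescript
import { pipeline } from "@huggingface/transformers";

// Choose the backend at runtime: WebGPU when the browser exposes it,
// otherwise the WASM (CPU) backend. Repo id is illustrative.
const generator = await pipeline(
  "text-generation",
  "onnx-community/granite-4.0-micro-ONNX-web",
  { device: "gpu" in navigator ? "webgpu" : "wasm", dtype: "q4" }
);
```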
The model series includes various sizes from 3B to 32B, all open-sourced under the Apache 2.0 license. IBM claims these models are ISO/IEC 42001 certified, which should give enterprise users peace of mind.
The hybrid architecture could be a turning point. A traditional Transformer's memory footprint grows with context length because it caches attention keys and values for every token, while the Mamba architecture processes long sequences with a fixed-size state. Combining the two maintains performance while lowering the hardware barrier to entry.
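A back-of-envelope comparison makes the point. Per layer, an attention block has to cache keys and values for every past token, while a Mamba-2 block keeps a fixed-size state. With L the context length, b the bytes per element, and the other symbols standing in for head and state dimensions (illustrative, not Granite's exact figures):

```latex
% Rough per-layer memory comparison; symbols are illustrative.
\[
\underbrace{M_{\mathrm{KV}} \;\approx\; 2 \, n_{\mathrm{kv\ heads}} \, d_{\mathrm{head}} \, L \, b}_{\text{attention: grows linearly with context length } L}
\qquad \text{vs.} \qquad
\underbrace{M_{\mathrm{state}} \;\approx\; n_{\mathrm{heads}} \, d_{\mathrm{head}} \, d_{\mathrm{state}} \, b}_{\text{Mamba-2: constant in } L}
\]
```

The first term keeps growing with every token you feed in; the second does not, which is where the memory savings on long documents come from.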
You can now find all the models on Hugging Face, and there are also online demos to try them out. For small teams, being able to use enterprise-grade AI tools without building complex infrastructure is a real game-changer.
Published: 2025-10-03 01:13