Is Qwen3 Omni's Real-time Speech Feature Available Now?
Qwen3 Omni claims to support real-time voice conversation, but the community has found deployment challenging, with inference engines like vLLM not yet fully supporting audio output.

When Qwen3 Omni was released, one of its most exciting features was real-time speech interaction. The official documentation claimed you could speak to it continuously and it would answer back in speech.
However, when people actually tried to use it, they discovered it wasn't that simple. There are no detailed tutorials on GitHub, and many people in the community are asking the same question: how do I get this feature running?
User SOCSChamp mentioned that while the model has been downloaded extensively, no one has successfully implemented true voice-to-voice conversation yet. Everyone is waiting for inference engines like vLLM to support audio output.
At the moment, vLLM's main branch has merged support for the Qwen3 Omni "Thinker" path, which handles multimodal inputs (images, video, audio), but its OpenAI-compatible server does not yet produce audio output. In practice, you can interact with the model through text, but to hear a response you still need to connect your own TTS (text-to-speech) system.
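To make the current state concrete, here is a minimal sketch of what does work today: sending an audio question to a vLLM OpenAI-compatible server and getting a text answer back, with a placeholder for the external TTS step. The endpoint, model name, and the audio_url content format are assumptions based on vLLM's usual multimodal conventions, so check the vLLM documentation for the version you are running.

```python
# Minimal sketch, assuming a vLLM OpenAI-compatible server is already running, e.g.
#   vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct
# The base_url, model name, and "audio_url" content part are assumptions; verify
# them against the vLLM docs for your version.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            # Audio goes in as input; the reply comes back as text only.
            {"type": "audio_url", "audio_url": {"url": "https://example.com/question.wav"}},
            {"type": "text", "text": "Please answer the question asked in the audio clip."},
        ],
    }],
)
reply_text = response.choices[0].message.content
print(reply_text)


def synthesize_speech(text: str, out_path: str = "reply.wav") -> str:
    """Placeholder: plug in whatever TTS engine you use to voice the reply."""
    raise NotImplementedError("connect your own TTS system here")
```

In other words, the "speech out" half of the loop is still yours to provide until the engine exposes the model's own audio output.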
Some users have tried running the official notebook example across four RTX 4090 cards, but inference was far too slow for real-time interaction.
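For reference, that notebook path goes through Transformers directly, loading the full Thinker+Talker model and decoding both text and a waveform. The sketch below follows the model card as I read it; the class names, the checkpoint name, and the generate() return values are assumptions, so treat it as an outline rather than a recipe, and expect it to be heavy even when spread across several GPUs.

```python
# Rough outline of the full Thinker+Talker route via Transformers, spread across
# available GPUs with device_map="auto". Class names and generate() details are
# taken from the model card as best I recall; verify them before relying on this.
import soundfile as sf
import torch
from transformers import Qwen3OmniMoeForConditionalGeneration, Qwen3OmniMoeProcessor

model_id = "Qwen/Qwen3-Omni-30B-A3B-Instruct"  # assumed checkpoint name
model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = Qwen3OmniMoeProcessor.from_pretrained(model_id)

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "Introduce yourself in one sentence."}]}
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, return_tensors="pt", padding=True).to(model.device)

# The Talker is expected to return a waveform alongside the text tokens (assumption).
text_ids, audio = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
if audio is not None:
    sf.write("reply.wav", audio.reshape(-1).float().cpu().numpy(), samplerate=24000)
```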
So if you're expecting a fully integrated, end-to-end voice conversation model, Qwen3 Omni doesn't quite deliver that yet. In practice it behaves more like a powerful multimodal model with text output, and its speech side still depends on extra toolchains and on pending inference-engine updates.
True real-time speech may still be a while away.
Published: 2025-10-21 12:21