NVIDIA LongLive: Real-time Interactive Long Video Generation, 240 Seconds on a Single H100
NVIDIA and its partners have released LongLive, a text-to-video system that moves past the 5-10 second clip length typical of current models. It generates videos of up to 240 seconds on a single H100 GPU and lets users switch prompts mid-generation while maintaining visual continuity. Key techniques include KV recaching, streaming long tuning, and short-window attention.
NVIDIA and its collaborators have just unveiled LongLive, a text-to-video system designed to tackle both long-form and interactive video generation.
Current models typically output only 5 to 10 second clips, whereas LongLive can generate videos up to 240 seconds long on a single H100 GPU. It keeps visuals smooth and stays responsive even when the prompt is switched mid-generation.
It combines several key technologies:
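- **KV Recaching**: Refreshes the cached key/value states with the new prompt, enabling smooth transitions between prompts
- **Streaming Long Tuning**: Trains on long sequences the same way the model streams at inference, so quality holds up over ultra-long generation
- **Short-Window Attention + Frame Sink**: Each frame attends only to a short window of recent frames plus a few initial "sink" frames, balancing speed and long-range consistency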
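To make the attention pattern concrete, here is a minimal sketch of a frame-level attention mask combining a short window with sink frames. It assumes block-causal attention at frame granularity (tokens within a frame see each other, and no frame sees the future) and treats the first `num_sink` frames as the sink. The function names `visible_frames` and `frame_sink_window_mask` and the parameters `window` and `num_sink` are hypothetical illustrations, not LongLive's actual code or API.

```python
# Hypothetical sketch of short-window attention with a frame sink.
# Not LongLive's implementation; names and parameters are illustrative.


def visible_frames(query_frame, window, num_sink):
    """Frames a query frame may attend to: the sink frames plus the
    last `window` frames up to and including itself (never the future)."""
    if window < 1:
        raise ValueError("window must include at least the current frame")
    # Sink frames: the first `num_sink` frames, clipped so they are never in the future.
    sinks = set(range(min(num_sink, query_frame + 1)))
    # Short window: the most recent `window` frames, ending at the query frame.
    recent = set(range(max(0, query_frame - window + 1), query_frame + 1))
    return sorted(sinks | recent)


def frame_sink_window_mask(num_frames, tokens_per_frame, window, num_sink):
    """Token-level boolean mask: mask[q][k] is True if query token q
    may attend to key token k."""
    n = num_frames * tokens_per_frame
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        allowed = set(visible_frames(q // tokens_per_frame, window, num_sink))
        for k in range(n):
            # A key is visible if its frame is a sink frame or in the window.
            mask[q][k] = (k // tokens_per_frame) in allowed
    return mask


if __name__ == "__main__":
    # Frame 10 with a 3-frame window and 1 sink frame sees frames 0, 8, 9, 10.
    print(visible_frames(10, 3, 1))
```

Running the script prints `[0, 8, 9, 10]`. Under these assumptions, each frame attends to at most `window + num_sink` frames regardless of video length, so per-frame attention cost stays bounded while the sink frames keep a fixed anchor back to the start of the video. In a streaming setting, the same rule could be applied by keeping only those frames' keys and values in the cache rather than building a dense mask.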
In benchmark tests, baseline models run at less than 1 frame per second, while LongLive sustains over 20 frames per second with high-quality output.
Paper link: https://arxiv.org/abs/2509.22622
HuggingFace model: https://huggingface.co/Efficient-Large-Model/LongLive-1.3B
Video demonstration: https://youtu.be/caDE6f54pvA
Published: 2025-09-29 22:03