StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant

  • 2025-05-08 18:57:40
  • Haibo Wang, Bo Feng, Zhengfeng Lai, Mingze Xu, Shiyu Li, Weifeng Ge, Afshin Dehghan, Meng Cao, Ping Huang

Abstract

We present StreamBridge, a simple yet effective framework that seamlessly transforms offline Video-LLMs into streaming-capable models. It addresses two fundamental challenges in adapting existing models into online scenarios: (1) limited capability for multi-turn real-time understanding, and (2) lack of proactive response mechanisms. Specifically, StreamBridge incorporates (1) a memory buffer combined with a round-decayed compression strategy, supporting long-context multi-turn interactions, and (2) a decoupled, lightweight activation model that can be effortlessly integrated into existing Video-LLMs, enabling continuous proactive responses. To further support StreamBridge, we construct Stream-IT, a large-scale dataset tailored for streaming video understanding, featuring interleaved video-text sequences and diverse instruction formats. Extensive experiments show that StreamBridge significantly improves the streaming understanding capabilities of offline Video-LLMs across various tasks, outperforming even proprietary models such as GPT-4o and Gemini 1.5 Pro. Simultaneously, it achieves competitive or superior performance on standard video understanding benchmarks.

 
