Abstract
Interactive Generative Video (IGV) has emerged as a crucial technology in response to the growing demand for high-quality, interactive video content across various domains. In this paper, we define IGV as a technology that combines generative capabilities to produce diverse high-quality video content with interactive features that enable user engagement through control signals and responsive feedback. We survey the current landscape of IGV applications, focusing on three major domains: 1) gaming, where IGV enables infinite exploration in virtual worlds; 2) embodied AI, where IGV serves as a physics-aware environment synthesizer for training agents in multimodal interaction with dynamically evolving scenes; and 3) autonomous driving, where IGV provides closed-loop simulation capabilities for safety-critical testing and validation. To guide future development, we propose a comprehensive framework that decomposes an ideal IGV system into five essential modules: Generation, Control, Memory, Dynamics, and Intelligence. Furthermore, we systematically analyze the technical challenges and future directions in realizing each component for an ideal IGV system, such as achieving real-time generation, enabling open-domain control, maintaining long-term coherence, simulating accurate physics, and integrating causal reasoning. We believe that this systematic analysis will facilitate future research and development in the field of IGV, ultimately advancing the technology toward more sophisticated and practical applications.