Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Abstract

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and shows potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this "world simulator". Then, we describe in detail the applications and potential impact of Sora in multiple industries, ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed to widely deploy Sora, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting the productivity and creativity of video generation.
