Abstract
Since their advent, Multimodal Large Language Models (MLLMs) have made a significant impact across a wide range of real-world applications, particularly in Autonomous Driving (AD). Their ability to process complex visual data and reason about intricate driving scenarios has paved the way for a new paradigm in end-to-end AD systems. However, progress in developing end-to-end models for AD has been slow, as existing fine-tuning methods demand substantial resources, including extensive computational power, large-scale datasets, and significant funding. Drawing inspiration from recent advancements in inference computing, we propose OpenEMMA, an open-source end-to-end framework based on MLLMs. By incorporating the Chain-of-Thought reasoning process, OpenEMMA achieves significant improvements compared to the baseline when leveraging a diverse range of MLLMs. Furthermore, OpenEMMA demonstrates effectiveness, generalizability, and robustness across a variety of challenging driving scenarios, offering a more efficient and effective approach to autonomous driving. We release all the code at https://github.com/taco-group/OpenEMMA.