How Real is CARLA's Dynamic Vision Sensor? A Study on the Sim-to-Real Gap in Traffic Object Detection

Abstract

Event cameras are gaining traction in traffic monitoring applications due to their low latency, high temporal resolution, and energy efficiency, which make them well-suited for real-time object detection at traffic intersections. However, the development of robust event-based detection models is hindered by the limited availability of annotated real-world datasets. To address this, several simulation tools have been developed to generate synthetic event data. Among these, the CARLA driving simulator includes a built-in dynamic vision sensor (DVS) module that emulates event camera output. Despite its potential, the sim-to-real gap for event-based object detection remains insufficiently studied. In this work, we present a systematic evaluation of this gap by training a recurrent vision transformer model exclusively on synthetic data generated using CARLA's DVS and testing it on varying combinations of synthetic and real-world event streams. Our experiments show that models trained solely on synthetic data perform well on synthetic-heavy test sets but suffer significant performance degradation as the proportion of real-world data increases. In contrast, models trained on real-world data demonstrate stronger generalization across domains. This study offers the first quantitative analysis of the sim-to-real gap in event-based object detection using CARLA's DVS. Our findings highlight limitations in current DVS simulation fidelity and underscore the need for improved domain adaptation techniques in neuromorphic vision for traffic monitoring.
