Abstract
Neuromorphic event cameras have significant advantages over traditional cameras in temporal resolution, power efficiency, and dynamic range. However, event cameras output asynchronous, sparse, and irregular events, which are not compatible with mainstream computer vision and deep learning methods. Various methods have been proposed to address this issue, but at the cost of long preprocessing procedures, loss of temporal resolution, or incompatibility with massively parallel computation. Inspired by the great success of word to vector (word2vec), we summarize the similarities between words and events, then propose the first event to vector (event2vec) representation. We validate event2vec on classification of the ASL-DVS dataset, where it shows better parameter efficiency, accuracy, and speed than previous graph-, image-, and voxel-based representations. Beyond task performance, the most attractive advantage of event2vec is that it aligns events to the domain of natural language processing, which opens a promising path toward integrating events into large language and multimodal models. Our code, models, and training logs are available at https://github.com/fangwei123456/event2vec.