Abstract
Temporal object detection has attracted significant attention, but most popular detection methods cannot leverage the rich temporal information in video or robotic vision. Although many algorithms have been developed for the video detection task, real-time online approaches are often lacking. In this paper, based on the attention mechanism and convolutional long short-term memory (ConvLSTM), we propose a temporal single-shot detector (TSSD) for robotic vision. Distinct from previous methods, we aim to temporally integrate the pyramidal feature hierarchy using ConvLSTM, and design a novel structure comprising a high-level ConvLSTM unit and a low-level one (HL-LSTM) for multi-scale feature maps. Moreover, we develop a new temporal analysis unit, namely ConvLSTM-based attention and attention-based ConvLSTM (A&CL), in which the ConvLSTM-based attention is tailored for background suppression and scale suppression, while the attention-based ConvLSTM temporally integrates attention-aware features. Finally, our method is evaluated on the ImageNet VID dataset. Extensive comparisons of detection capability confirm the superiority of the proposed approach: the developed TSSD is fast and achieves considerably improved mean average precision. As a temporal, real-time, and online detector, TSSD is applicable to a robot's intelligent perception.