Abstract
Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which preserves and transmits only the most critical information, stand at two ends of the scale. The former pursues full fidelity for human perception, while the latter pursues compactness and efficiency for machine vision. Recent efforts in emerging trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and in the MPEG-7 compact feature descriptor standards, i.e. Compact Descriptors for Visual Search and Compact Descriptors for Video Analysis, have promoted sustained and fast development in their respective directions. In this paper, building on recent advances in AI technology, e.g. prediction and generation models, we explore a new area, Video Coding for Machines (VCM), arising from emerging MPEG standardization efforts. Towards collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. In line with Digital Retina, a rising instance of the Analyze then Compress paradigm, we first give the definition, formulation, and paradigm of VCM. We also systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides academic and industrial evidence for realizing the collaborative compression of video and feature streams in a broad range of AI applications. Finally, we propose potential VCM solutions, and preliminary results demonstrate performance and efficiency gains. Future directions are discussed as well.