Learning to Segment Rigid Motions from Two Frames

Abstract

Appearance-based detectors achieve remarkable performance on common scenes, but tend to fail in scenarios that lack training data. Geometric motion segmentation algorithms, by contrast, generalize to novel scenes, but have yet to achieve performance comparable to appearance-based ones, due to noisy motion estimates and degenerate motion configurations. To combine the best of both worlds, we propose a modular network whose architecture is motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field. It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations. Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel. The inferred rigid motions lead to a significant improvement in depth and scene flow estimation. At the time of submission, our method ranked 1st on the KITTI scene flow leaderboard, outperforming the best published method (scene flow error: 4.89% vs 6.31%).
