3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-scale 3D Point Clouds

Abstract

Semantic parsing of large-scale 3D point clouds is an important researchtopic in computer vision and remote sensing fields. Most existing approachesutilize hand-crafted features for each modality independently and combine themin a heuristic manner. They often fail to consider the consistency andcomplementary information among features adequately, which makes them difficultto capture high-level semantic structures. The features learned by most of thecurrent deep learning methods can obtain high-quality image classificationresults. However, these methods are hard to be applied to recognize 3D pointclouds due to unorganized distribution and various point density of data. Inthis paper, we propose a 3DCNN-DQN-RNN method which fuses the 3D convolutionalneural network (CNN), Deep Q-Network (DQN) and Residual recurrent neuralnetwork (RNN) for an efficient semantic parsing of large-scale 3D point clouds.In our method, an eye window under control of the 3D CNN and DQN can localizeand segment the points of the object class efficiently. The 3D CNN and ResidualRNN further extract robust and discriminative features of the points in the eyewindow, and thus greatly enhance the parsing accuracy of large-scale pointclouds. Our method provides an automatic process that maps the raw data to theclassification results. It also integrates object localization, segmentationand classification into one framework. Experimental results demonstrate thatthe proposed method outperforms the state-of-the-art point cloud classificationmethods.

Quick Read (beta)

loading the full paper ...