P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting

Abstract

Pre-training large models on large-scale datasets has become a crucial topic in deep learning. Pre-trained models with high representation ability and transferability have achieved great success and dominate many downstream tasks in natural language processing and 2D vision. However, it is non-trivial to extend this pretraining-tuning paradigm to 3D vision, given that 3D training data are limited and relatively inconvenient to collect. In this paper, we provide a new perspective on this problem by leveraging pre-trained 2D knowledge in the 3D domain: we tune pre-trained image models with the novel Point-to-Pixel prompting for point cloud analysis at a minor parameter cost. Following the principle of prompt engineering, we transform point clouds into colorful images with geometry-preserved projection and geometry-aware coloring to adapt to pre-trained image models, whose weights are kept frozen during the end-to-end optimization of point cloud analysis tasks. We conduct extensive experiments to demonstrate that, in cooperation with our proposed Point-to-Pixel Prompting, a better pre-trained image model leads to consistently better performance in 3D vision. Benefiting from the prosperous development of the image pre-training field, our method attains 89.3% accuracy on the hardest setting of ScanObjectNN, surpassing conventional point cloud models with far fewer trainable parameters. Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Segmentation. Code is available at https://github.com/wangzy22/P2P
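
To make the point-to-image idea concrete, the following is a minimal sketch of turning a point cloud into a small RGB image. It is not the authors' implementation: the function name `project_points`, the orthographic projection, the single `yaw` view angle, and the fixed depth-to-color mapping are all assumptions chosen for illustration. In particular, the hand-written coloring here only stands in for the geometry-aware coloring mentioned in the abstract, whose actual form is not specified in this section.

```python
import math


def project_points(points, size=8, yaw=0.0):
    """Orthographically project 3D points onto a size x size RGB image.

    Hypothetical illustration only: each pixel keeps the nearest point
    (smallest depth) and is colored from red (near) to blue (far).
    """
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Rotate around the vertical (y) axis to pick a viewing direction.
    rotated = [
        (cos_y * x + sin_y * z, y, -sin_y * x + cos_y * z)
        for x, y, z in points
    ]

    min_x, max_x = min(p[0] for p in rotated), max(p[0] for p in rotated)
    min_y, max_y = min(p[1] for p in rotated), max(p[1] for p in rotated)
    min_z, max_z = min(p[2] for p in rotated), max(p[2] for p in rotated)
    span_z = (max_z - min_z) or 1.0

    def to_pixel(value, lo, hi):
        # Map a coordinate in [lo, hi] to an integer index in [0, size - 1].
        span = (hi - lo) or 1.0
        return min(size - 1, int((value - lo) / span * size))

    image = [[(0, 0, 0)] * size for _ in range(size)]
    depth = [[math.inf] * size for _ in range(size)]

    for x, y, z in rotated:
        col = to_pixel(x, min_x, max_x)
        row = size - 1 - to_pixel(y, min_y, max_y)  # image rows grow downward
        if z < depth[row][col]:
            depth[row][col] = z
            t = (z - min_z) / span_z  # normalized depth in [0, 1]
            image[row][col] = (int(255 * (1 - t)), 0, int(255 * t))
    return image


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 1.0, 0.5)]
    for row in project_points(cloud, size=4):
        print(row)
```

In a full pipeline, an image like this would be passed to a pre-trained image model whose weights stay frozen, and only the small point-to-pixel components would be trained; that part is omitted here because it depends on the chosen image backbone.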
