Abstract
The size of vision models has grown exponentially over the last few years, especially after the emergence of the Vision Transformer. This has motivated the development of parameter-efficient tuning methods, such as learning adapter layers or visual prompt tokens, which allow a tiny portion of model parameters to be trained while the vast majority, obtained from pre-training, are frozen. However, designing a proper tuning method is non-trivial: one might need to try out a lengthy list of design choices, not to mention that each downstream dataset often requires a custom design. In this paper, we view the existing parameter-efficient tuning methods as "prompt modules" and propose Neural prOmpt seArcH (NOAH), a novel approach that learns, for large vision models, the optimal design of prompt modules through a neural architecture search algorithm, specifically for each downstream dataset. By conducting extensive experiments on over 20 vision datasets, we demonstrate that NOAH (i) is superior to individual prompt modules, (ii) has a good few-shot learning ability, and (iii) is domain-generalizable. The code and models are available at https://github.com/Davidzhangyuanhan/NOAH.