Abstract
Few-shot image classifiers are designed to recognize and classify new data with minimal supervision and limited data, but they often rely on spurious correlations between classes and spurious attributes, a tendency known as spurious bias. Spurious correlations commonly hold in certain samples, and few-shot classifiers can suffer from the spurious bias they induce. There is currently no automatic benchmarking system for assessing the robustness of few-shot classifiers against spurious bias. In this paper, we propose a systematic and rigorous benchmark framework, termed FewSTAB, to fairly demonstrate and quantify varied degrees of robustness of few-shot classifiers to spurious bias. FewSTAB creates few-shot evaluation tasks with biased attributes so that classifiers relying on those attributes for predictions perform poorly. To construct these tasks, we propose attribute-based sample selection strategies based on a pre-trained vision-language model, eliminating the need for manual dataset curation. This allows FewSTAB to automatically benchmark spurious bias using any existing test data. FewSTAB offers evaluation results in a new dimension along with a new design guideline for building robust classifiers. Moreover, it can benchmark spurious bias in varied degrees and enable designs for varied degrees of robustness. Its effectiveness is demonstrated through experiments on ten few-shot learning methods across three datasets. We hope our framework can inspire new designs of robust few-shot classifiers. Our code is available at https://github.com/gtzheng/FewSTAB.