Abstract
Error slice discovery associates structured patterns with model errors. Existing methods discover error slices by clustering error-prone samples with similar patterns or by assigning discrete attributes to each sample for post-hoc analysis. While these methods aim for interpretability and easier mitigation through reweighting or rebalancing, they may not capture the full complexity of error patterns because attributes can be incomplete or missing. In contrast to these approaches, this paper uses the reasoning capabilities of a Large Language Model (LLM) to analyze complex error patterns and generate testable hypotheses. This paper proposes LADDER: Language Driven slice Discovery and Error Rectification. It first projects the model's representation into a language-aligned feature space (e.g., CLIP) to preserve semantics in the original model feature space. This ensures the accurate retrieval of sentences that highlight the model's errors. Next, the LLM uses the sentences to generate hypotheses that identify error slices. Finally, we mitigate the errors by fine-tuning the classification head on a group-balanced dataset created using the hypotheses. Our entire method does not require any attribute annotation, either explicitly or through external tagging models. We validate our method on five image classification datasets. The code is available at https://github.com/batmanlab/Ladder.
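The projection and retrieval steps can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the projection is taken to be a linear least-squares map, and sentences are ranked by their similarity to the difference between the mean projected embeddings of correctly classified and misclassified samples. The function names, the toy embeddings, and the example sentences are all hypothetical and are not taken from the released code.

```python
import numpy as np


def fit_linear_projection(model_feats, aligned_feats):
    """Least-squares map W such that model_feats @ W approximates aligned_feats.

    Assumption: a linear projection is used here purely for illustration.
    """
    W, *_ = np.linalg.lstsq(model_feats, aligned_feats, rcond=None)
    return W


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def retrieve_error_sentences(projected, is_correct, sentence_embs, sentences, k=1):
    """Return the k sentences most aligned with (mean correct - mean misclassified).

    Assumption: this difference-of-means ranking is one plausible retrieval rule.
    """
    is_correct = np.asarray(is_correct, dtype=bool)
    proj = l2_normalize(np.asarray(projected, dtype=float))
    direction = proj[is_correct].mean(axis=0) - proj[~is_correct].mean(axis=0)
    scores = l2_normalize(np.asarray(sentence_embs, dtype=float)) @ direction
    top = np.argsort(-scores)[:k]
    return [sentences[i] for i in top]


# Toy data (hypothetical): three-dimensional "language-aligned" embeddings.
aligned = np.array([
    [1.0, 0.0, 0.1],   # correctly classified samples
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],   # misclassified samples
    [0.1, 0.9, 0.0],
])
mixing = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
model_feats = aligned @ mixing            # stand-in for classifier features
is_correct = np.array([True, True, False, False])

W = fit_linear_projection(model_feats, aligned)
projected = model_feats @ W

sentences = ["a waterbird over water", "a waterbird over land", "a close-up photo"]
sentence_embs = np.eye(3)                 # stand-in for text-encoder outputs

print(retrieve_error_sentences(projected, is_correct, sentence_embs, sentences, k=1))
```

In this sketch, the retrieved sentences describe what is present in correctly classified samples but absent from misclassified ones; they would then be passed to the LLM as input for hypothesis generation.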