FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

Abstract

Modern Language Models (LMs) are capable of following long and complex instructions that enable a large and diverse set of user requests. While Information Retrieval (IR) models use these LMs as the backbone of their architectures, virtually none of them allow users to provide detailed instructions alongside queries, which limits their ability to satisfy complex information needs. In this work, we study the use of instructions in IR systems. First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions. FollowIR repurposes detailed instructions (also known as narratives) that were developed for professional assessors to evaluate retrieval systems. In particular, we build our benchmark from three collections curated for shared tasks at the Text REtrieval Conference (TREC). These collections contain hundreds to thousands of labeled documents per query, making them suitable for our exploration. Through this process, we can measure how well IR models follow instructions using a new pairwise evaluation framework. Our results indicate that existing retrieval models fail to correctly use instructions, treating them as a source of basic keywords and struggling to understand long-form information. However, we show that it is possible for IR models to learn to follow complex instructions: our new FollowIR-7B model shows significant improvements after fine-tuning on our training set.
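
The abstract names a pairwise evaluation framework but does not give its scoring rule. As an illustration only, the sketch below assumes one plausible reciprocal-rank formulation: for each document that becomes non-relevant when the instruction is modified, compare its 1-based rank under the original instruction with its rank under the modified one. A score above zero means the document moved down (the model followed the change); a score below zero means it moved up. The function names `pairwise_rank_score` and `pairwise_eval` and the input layout are hypothetical, not taken from the paper.

```python
from statistics import mean


def pairwise_rank_score(rank_original: int, rank_modified: int) -> float:
    """Score one document that the modified instruction made non-relevant.

    Assumed formulation (hypothetical): ranks are 1-based; the result lies
    in (-1, 1), positive when the document drops in the ranking.
    """
    if rank_original < 1 or rank_modified < 1:
        raise ValueError("ranks must be 1-based positive integers")
    rr_original = 1.0 / rank_original
    rr_modified = 1.0 / rank_modified
    if rank_original > rank_modified:
        # Document moved UP despite becoming non-relevant: negative score.
        return rr_original / rr_modified - 1.0
    # Document moved down (or stayed put): reward scales with the drop.
    return 1.0 - rr_modified / rr_original


def pairwise_eval(queries: dict[str, list[tuple[int, int]]]) -> float:
    """Average per-document scores within each query, then across queries.

    `queries` maps a query id to (rank_original, rank_modified) pairs for
    the documents whose label flipped; queries with no pairs are skipped.
    Raises statistics.StatisticsError if no query has any pairs.
    """
    per_query = [
        mean(pairwise_rank_score(orig, mod) for orig, mod in pairs)
        for pairs in queries.values()
        if pairs
    ]
    return mean(per_query)


if __name__ == "__main__":
    # q1: document drops from rank 1 to 4; q2: document drops from 2 to 4.
    print(pairwise_eval({"q1": [(1, 4)], "q2": [(2, 4)]}))
```

Averaging per query before averaging across queries keeps queries with many flipped documents from dominating the overall score. This is a design choice of the sketch, not a claim about the authors' implementation.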
