Abstract
Online learning to rank is a sequential decision-making problem where in eachround the learning agent chooses a list of items and receives feedback in theform of clicks from the user. Many sample-efficient algorithms have beenproposed for this problem that assume a specific click model connectingrankings and user behavior. We propose a generalized click model thatencompasses many existing models, including the position-based and cascademodels. Our generalization motivates a novel online learning algorithm based ontopological sort, which we call TopRank. TopRank is (a) more natural thanexisting algorithms, (b) has stronger regret guarantees than existingalgorithms with comparable generality, (c) has a more insightful proof thatleaves the door open to many generalizations, (d) outperforms existingalgorithms empirically.