Fully Gap-Dependent Bounds for Multinomial Logit Bandit

  • 2020-11-19 17:52:12
  • Jiaqi Yang
  • 0


We study the multinomial logit (MNL) bandit problem, where at each time step,the seller offers an assortment of size at most $K$ from a pool of $N$ items,and the buyer purchases an item from the assortment according to a MNL choicemodel. The objective is to learn the model parameters and maximize the expectedrevenue. We present (i) an algorithm that identifies the optimal assortment$S^*$ within $\widetilde{O}(\sum_{i = 1}^N \Delta_i^{-2})$ time steps with highprobability, and (ii) an algorithm that incurs $O(\sum_{i \notin S^*}K\Delta_i^{-1} \log T)$ regret in $T$ time steps. To our knowledge, ouralgorithms are the first to achieve gap-dependent bounds that fully depends onthe suboptimality gaps of all items. Our technical contributions include analgorithmic framework that relates the MNL-bandit problem to a variant of thetop-$K$ arm identification problem in multi-armed bandits, a generalizedepoch-based offering procedure, and a layer-based adaptive estimationprocedure.


Quick Read (beta)

loading the full paper ...