Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems

Abstract

We use a novel modification of Multi-Armed Bandits to create a new model forrecommendation systems. We model the recommendation system as a bandit seekingto maximize reward by pulling on arms with unknown rewards. The catch howeveris that this bandit can only access these arms through an unreliableintermediate that has some level of autonomy while choosing its arms. Forexample, in a streaming website the user has a lot of autonomy while choosingcontent they want to watch. The streaming sites can use targeted advertising asa means to bias opinions of these users. Here the streaming site is the banditaiming to maximize reward and the user is the unreliable intermediate. We modelthe intermediate as accessing states via a Markov chain. The bandit is allowedto perturb this Markov chain. We prove fundamental theorems for this settingafter which we show a close-to-optimal Explore-Commit algorithm.

Quick Read (beta)

loading the full paper ...