Exploration and Incentives in Reinforcement Learning

Abstract

How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously. To the best of our knowledge, this is the first work to consider mechanism design in a stateful, reinforcement learning setting.
