Abstract
Efficient exploration is an unsolved problem in Reinforcement Learning. Weintroduce Model-Based Active eXploration (MAX), an algorithm that activelyexplores the environment. It minimizes data required to comprehensively modelthe environment by planning to observe novel events, instead of merely reactingto novelty encountered by chance. Non-stationarity induced by traditionalexploration bonus techniques is avoided by constructing fresh explorationpolicies only at time of action. In semi-random toy environments where directedexploration is critical to make progress, our algorithm is at least an order ofmagnitude more efficient than strong baselines.
Quick Read (beta)
loading the full paper ...