Abstract
Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.