Exploration in Deep Reinforcement Learning: A Comprehensive Survey

  • 2021-09-14 13:16:33
  • Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Jianye Hao, Zhaopeng Meng, Peng Liu

Abstract

Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, such as game AI, autonomous vehicles, robotics, and finance. However, DRL and deep MARL agents are widely known to be sample-inefficient: millions of interactions are usually needed even for relatively simple game settings, which hinders their wide application in real-world industrial scenarios. One key bottleneck behind this is the well-known exploration problem, i.e., how to efficiently explore unknown environments and collect informative experiences that benefit policy learning the most.

In this paper, we conduct a comprehensive survey of existing exploration methods in DRL and deep MARL to provide understanding of, and insights into, the critical problems and their solutions. We first identify several key challenges to achieving efficient exploration, which most exploration methods aim to address. Then we provide a systematic survey of existing approaches by classifying them into two major categories: uncertainty-oriented exploration and intrinsic motivation-oriented exploration. The essence of uncertainty-oriented exploration is to leverage the quantification of epistemic and aleatoric uncertainty to derive efficient exploration. By contrast, intrinsic motivation-oriented exploration methods usually incorporate various forms of reward-agnostic information as intrinsic exploration guidance. Beyond these two main branches, we also summarize other exploration methods that adopt sophisticated techniques but are difficult to classify into either category. In addition, we provide a comprehensive empirical comparison of exploration methods for DRL on a set of commonly used benchmarks. Finally, we summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
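
The two categories named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the survey: the ensemble-based optimistic (UCB-style) action rule and the 1/sqrt(N(s)) count bonus are two common textbook instances chosen here for illustration, and the names `ucb_action` and `CountBonus` and the `beta` coefficients are hypothetical. The uncertainty-oriented part only models epistemic uncertainty, approximated by disagreement across an ensemble of Q-value estimates; aleatoric uncertainty is not covered.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def ucb_action(ensemble_q, beta=1.0):
    """Uncertainty-oriented exploration (illustrative sketch, hypothetical name).

    ensemble_q: list of K lists, each holding one ensemble member's Q-value
    estimates for every action. The spread across members serves as a proxy
    for epistemic uncertainty; the agent acts optimistically on mean + beta * std.
    """
    num_actions = len(ensemble_q[0])
    scores = []
    for a in range(num_actions):
        estimates = [q[a] for q in ensemble_q]
        scores.append(mean(estimates) + beta * pstdev(estimates))
    # Pick the action with the highest optimistic score (first one on ties).
    return max(range(num_actions), key=lambda a: scores[a])


class CountBonus:
    """Intrinsic motivation-oriented exploration (illustrative sketch, hypothetical name).

    Adds a reward-agnostic bonus beta / sqrt(N(s)) that shrinks as a state is
    visited more often; the extrinsic reward itself is left unchanged.
    Assumes states are hashable (e.g. discretized observations).
    """

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def shaped_reward(self, state, extrinsic_reward):
        self.counts[state] += 1
        bonus = self.beta / math.sqrt(self.counts[state])
        return extrinsic_reward + bonus
```

With `beta=0` the UCB rule reduces to greedy selection on the ensemble mean, while the count bonus decays toward zero for frequently visited states, so the extrinsic reward gradually dominates as training proceeds.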

 
