Ecological Reinforcement Learning

Abstract

Much of the current work on reinforcement learning studies episodic settings, where the agent is reset between trials to an initial state distribution, often with well-shaped reward functions. Non-episodic settings, where the agent must learn through continuous interaction with the world without resets and receives only delayed and sparse reward signals, are substantially more difficult, but arguably more realistic, since real-world environments do not present the learner with a convenient "reset mechanism" or easy reward shaping. In this paper, rather than studying algorithmic improvements that can address such non-episodic and sparse reward settings, we study the kinds of environment properties that can make learning under such conditions easier. Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable. We first discuss what we term "environment shaping": modifications to the environment that provide an alternative to reward shaping and may be easier to implement. We then discuss an even simpler property that we refer to as "dynamism," which describes the degree to which the environment changes independently of the agent's actions and can be measured by environment transition entropy. Surprisingly, we find that even this property can substantially alleviate the challenges associated with non-episodic RL in sparse reward settings. We provide an empirical evaluation on a set of new tasks focused on non-episodic learning with sparse rewards. Through this study, we hope to shift the focus of the community towards analyzing how properties of the environment can affect learning and the ultimate type of behavior that is learned via RL.
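
To make the notion of transition entropy concrete, the sketch below shows one plausible way to estimate it from logged experience. It is an illustration only and not necessarily the measure used in the paper: it assumes a small discrete environment whose transitions are available as (state, action, next_state) tuples, and the function name `transition_entropy` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_entropy(transitions):
    """Estimate the average entropy (in nats) of next_state given (state, action).

    `transitions` is an iterable of (state, action, next_state) tuples with
    hashable entries. This is an assumed, tabular estimator for illustration.
    """
    # Count how often each next state follows each (state, action) pair.
    counts = defaultdict(Counter)
    for state, action, next_state in transitions:
        counts[(state, action)][next_state] += 1

    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for next_counts in counts.values():
        n = sum(next_counts.values())
        # Empirical entropy of the next-state distribution for this pair.
        h = -sum((k / n) * math.log(k / n) for k in next_counts.values())
        # Weight each pair by how often it was visited.
        entropy += (n / total) * h
    return entropy


# Hypothetical toy data: the agent always takes "noop" in state "s0".
static_env = [("s0", "noop", "s0")] * 4
dynamic_env = [("s0", "noop", "s0"), ("s0", "noop", "s1")] * 2

static_h = transition_entropy(static_env)
dynamic_h = transition_entropy(dynamic_env)
```

Under this estimator, an environment that never changes when the agent does nothing has zero entropy, while one that moves to a different state half of the time under the same action has an entropy of ln 2 nats. Holding the action fixed isolates change that is independent of the agent's choices, which is the sense of "dynamism" described in the abstract.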
