Generalized Reinforcement Learning: Experience Particles, Action Operator, Reinforcement Field, Memory Association, and Decision Concepts

Abstract

Learning a control policy that involves time-varying and evolving system dynamics often poses a great challenge to mainstream reinforcement learning algorithms. In most standard methods, actions are assumed to be a rigid, fixed set of choices that are sequentially applied to the state space in a predefined manner. Consequently, without resorting to substantial re-learning, the learned policy lacks the ability to adapt to variations in the action set and in the actions' "behavioral" outcomes. In addition, the standard action representation and the action-induced state transition mechanism inherently limit how reinforcement learning can be applied in complex, real-world applications. This is primarily due to the intractability of the resulting large state space and the lack of a facility to generalize the learned policy to the unknown part of the state space. This paper proposes a Bayesian-flavored generalized reinforcement learning framework. It first establishes the notion of a parametric action model to better cope with uncertainty and fluid action behaviors. It then introduces the notion of a reinforcement field, a physics-inspired construct established through "polarized experience particles" maintained in the learning agent's working memory. These particles effectively encode the dynamic learning experience, which evolves over time in a self-organizing way. On top of the reinforcement field, we further generalize the policy learning process to incorporate high-level decision concepts by treating the past memory as having an implicit graph structure. In this graph, past memory instances (or particles) are interconnected through a defined similarity between decisions, so that the "associative memory" principle can be applied to augment the learning agent's world model.
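The abstract describes the reinforcement field and memory association only at a high level. The sketch below illustrates one possible reading of those ideas and is not the paper's formulation. It assumes three things. First, each experience particle is a point in a joint space of state features and action parameters, carrying a signed weight whose sign acts as its "polarity." Second, the field value at a point is a Gaussian-kernel superposition of the particle weights. Third, "associative" links join particles whose kernel similarity exceeds a threshold. The names `ExperienceParticle`, `field_value`, `greedy_action`, and `association_graph`, together with the RBF kernel, `length_scale`, and `threshold`, are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ExperienceParticle:
    # Hypothetical particle: a point in the augmented state-action space
    # (state features followed by action parameters) with a signed weight.
    # A positive weight marks a rewarding experience, a negative one a penalizing one.
    z: Tuple[float, ...]
    weight: float


def rbf_kernel(x: Sequence[float], y: Sequence[float], length_scale: float = 1.0) -> float:
    # Assumed similarity measure: Gaussian (RBF) kernel on squared Euclidean distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def field_value(z: Sequence[float], memory: List[ExperienceParticle],
                length_scale: float = 1.0) -> float:
    # Field at z: kernel-weighted sum of all particle weights in working memory.
    return sum(p.weight * rbf_kernel(z, p.z, length_scale) for p in memory)


def greedy_action(state: Sequence[float], candidates: List[Tuple[float, ...]],
                  memory: List[ExperienceParticle], length_scale: float = 1.0) -> Tuple[float, ...]:
    # Parametric actions: each candidate is a parameter vector, so the candidate
    # set can change at decision time without rebuilding the stored particles.
    return max(candidates,
               key=lambda a: field_value(tuple(state) + tuple(a), memory, length_scale))


def association_graph(memory: List[ExperienceParticle], threshold: float = 0.5,
                      length_scale: float = 1.0) -> List[Tuple[int, int]]:
    # Hypothetical associative-memory graph: an undirected edge (i, j) links two
    # particles whose kernel similarity is at least the threshold.
    edges = []
    for i in range(len(memory)):
        for j in range(i + 1, len(memory)):
            if rbf_kernel(memory[i].z, memory[j].z, length_scale) >= threshold:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    # One state feature plus one action parameter per particle.
    memory = [
        ExperienceParticle((0.0, 1.0), 1.0),   # action +1 was rewarded here
        ExperienceParticle((0.0, -1.0), -1.0), # action -1 was penalized here
        ExperienceParticle((0.2, 1.0), 0.8),   # a similar rewarded experience
    ]
    print(greedy_action((0.0,), [(1.0,), (-1.0,)], memory))
    print(association_graph(memory))
```

With this toy memory, the field favors the action parameter `1.0` in state `0.0`, and only the two nearby rewarded particles (indices 0 and 2) are linked. A kernel superposition was chosen here because it generalizes smoothly to unvisited state-action points, which matches the abstract's motivation. The paper may define the field and the similarity differently.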
