Dynamic Reinforcement Learning for Actors

Abstract

Dynamic Reinforcement Learning (Dynamic RL), proposed in this paper, directly controls system dynamics, instead of the actor (action-generating neural network) outputs at each moment, bringing about a major qualitative shift in reinforcement learning (RL) from static to dynamic. The actor is initially designed to generate chaotic dynamics through the loop with its environment, enabling the agent to perform flexible and deterministic exploration. Dynamic RL controls global system dynamics using a local index called "sensitivity," which indicates how much the input neighborhood contracts or expands into the corresponding output neighborhood through each neuron's processing. While sensitivity adjustment learning (SAL) prevents excessive convergence of the dynamics, sensitivity-controlled reinforcement learning (SRL) adjusts them: to converge more to improve reproducibility around better state transitions with positive TD error, and to diverge more to enhance exploration around worse transitions with negative TD error. Dynamic RL was applied only to the actor in an Actor-Critic RL architecture, while applying it to the critic remains a challenge. It was tested on two dynamic tasks and functioned effectively without external exploration noise or backward computation through time. Moreover, it exhibited excellent adaptability to new environments, although some problems remain. Drawing parallels between 'exploration' and 'thinking,' the author hypothesizes that "exploration grows into thinking through learning" and believes this RL could be a key technique for the emergence of thinking, including inspiration that cannot be reconstructed from massive existing text data. Finally, despite being presumptuous, the author presents the argument that this research should not proceed due to its potentially fatal risks, aiming to encourage discussion.
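
To make the "sensitivity" index more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single tanh neuron and assumes sensitivity can be measured as the Euclidean norm of the gradient of the neuron's output with respect to its input vector; the abstract does not give the formula, so this definition, the function names `neuron_sensitivity` and `srl_sensitivity_push`, and the example weights are all hypothetical. Under that assumption, a value below 1 means the input neighborhood contracts and a value above 1 means it expands. The second function only encodes the sign convention stated in the abstract for SRL.

```python
import math


def neuron_sensitivity(weights, inputs, bias=0.0):
    """Assumed sensitivity of one tanh neuron: |f'(u)| * ||w||.

    This is the norm of d(output)/d(input) for output = tanh(w . x + b).
    The definition is an illustrative assumption, not taken from the paper.
    """
    u = sum(w * x for w, x in zip(weights, inputs)) + bias
    slope = 1.0 - math.tanh(u) ** 2  # derivative of tanh at u
    weight_norm = math.sqrt(sum(w * w for w in weights))
    return slope * weight_norm


def srl_sensitivity_push(td_error):
    """Direction in which SRL would push sensitivity, per the abstract.

    Positive TD error -> -1 (lower sensitivity, converge more).
    Negative TD error -> +1 (raise sensitivity, diverge more).
    Zero TD error     ->  0 (no push).
    """
    if td_error > 0:
        return -1
    if td_error < 0:
        return 1
    return 0


if __name__ == "__main__":
    # Unsaturated neuron: slope is 1, so sensitivity equals ||w||.
    print(neuron_sensitivity([3.0, 4.0], [0.0, 0.0]))
    # Saturated neuron: slope is near 0, so the neighborhood contracts.
    print(neuron_sensitivity([3.0, 4.0], [1.0, 1.0]))
    print(srl_sensitivity_push(0.5), srl_sensitivity_push(-0.5))
```

The sketch is purely local: each value depends only on one neuron's weights and current input, which matches the abstract's description of a local index being used to steer global dynamics.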
