HTMRL: Biologically Plausible Reinforcement Learning with Hierarchical Temporal Memory

Abstract

Building Reinforcement Learning (RL) algorithms that are able to adapt to continuously evolving tasks is an open research challenge. One technology that is known to handle such non-stationary input patterns well is Hierarchical Temporal Memory (HTM), a general and biologically plausible computational model of the human neocortex. As the RL paradigm is inspired by human learning, HTM is a natural framework for an RL algorithm that supports non-stationary environments. In this paper, we present HTMRL, the first strictly HTM-based RL algorithm. We empirically and statistically show that HTMRL scales to many states and actions, and demonstrate that HTM's ability to adapt to changing patterns extends to RL. Specifically, HTMRL performs well on a 10-armed bandit after 750 steps, but needs only a third of that to adapt when the bandit suddenly shuffles its arms. HTMRL is the first iteration of a novel RL approach, with the potential to be extended into a capable algorithm for Meta-RL.
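To make the "shuffled arms" setting concrete, the following is a minimal sketch of a non-stationary 10-armed bandit, under assumptions not stated in the abstract: each arm pays a Gaussian reward around a fixed mean drawn from N(0, 1), and "shuffling" permutes those means across arms. The agent shown is a plain epsilon-greedy learner with a constant step size, a standard non-HTM baseline included only to illustrate what adapting after a shuffle means; it is not HTMRL. The names `ShufflingBandit`, `shuffle_arms`, and `run` are hypothetical.

```python
import random


class ShufflingBandit:
    """Hypothetical k-armed bandit whose arm-to-mean mapping can be permuted.

    Assumption: each arm's reward is Gaussian (std 1) around a fixed mean
    drawn from N(0, 1); the paper's actual reward distribution may differ.
    """

    def __init__(self, k=10, seed=0):
        self.rng = random.Random(seed)
        self.means = [self.rng.gauss(0.0, 1.0) for _ in range(k)]

    def pull(self, arm):
        # Noisy reward centred on the chosen arm's mean.
        return self.rng.gauss(self.means[arm], 1.0)

    def shuffle_arms(self):
        # Same set of reward means, reassigned to different arms.
        self.rng.shuffle(self.means)

    def best_arm(self):
        return max(range(len(self.means)), key=lambda a: self.means[a])


def run(env, steps, q, epsilon=0.1, alpha=0.1, rng=None):
    """Epsilon-greedy with a constant step size (a non-HTM baseline).

    Updates the value estimates in q in place and returns the fraction of
    steps on which the currently best arm was chosen.
    """
    rng = rng or random.Random(1)
    best = env.best_arm()
    hits = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(q))  # explore
        else:
            arm = max(range(len(q)), key=lambda a: q[a])  # exploit
        reward = env.pull(arm)
        # A constant alpha weights recent rewards more heavily, so the
        # estimates can track a change in which arm is best.
        q[arm] += alpha * (reward - q[arm])
        hits += arm == best
    return hits / steps


if __name__ == "__main__":
    env = ShufflingBandit(k=10, seed=0)
    q = [0.0] * 10
    print("before shuffle:", run(env, 750, q))
    env.shuffle_arms()  # the reward structure changes abruptly
    print("after shuffle:", run(env, 250, q))
```

The 750 and 250 step counts simply mirror the figures quoted in the abstract; the sketch makes no claim about how HTMRL or this baseline actually perform at those horizons.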
