Abstract
Reinforcement Learning formalises an embodied agent's interaction with the environment through observations, rewards and actions. But where do the actions come from? Actions are often considered to represent something external, such as the movement of a limb, a chess piece, or more generally, the output of an actuator. In this work we explore and formalize a contrasting view, namely that actions are best thought of as the output of a sequence of internal choices with respect to an action model. This view is particularly well-suited for leveraging the recent advances in large sequence models as prior knowledge for multi-task reinforcement learning problems. Our main contribution in this work is to show how to augment the standard MDP formalism with a sequential notion of internal action using information-theoretic techniques, and that this leads to self-consistent definitions of both internal and external action value functions.