Abstract
Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.
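
The two-component decomposition described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `internal_policy`, `to_action`, and `act` are hypothetical, the action embeddings are fixed at random rather than learned, and a nearest-neighbor lookup stands in for the component that transforms a representation into an actual action.

```python
import math
import random

# Hypothetical toy setup: a large, finite action set where each action has a
# low-dimensional embedding. Here the embeddings are random placeholders;
# in the setting the abstract describes, they would be learned.
random.seed(0)
NUM_ACTIONS = 1000
EMBED_DIM = 2
action_embeddings = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(NUM_ACTIONS)
]


def internal_policy(state, noise_scale=0.1):
    """First component: act in the low-dimensional representation space.

    Assumption: a fixed nonlinear map plus Gaussian noise stands in for a
    learned stochastic policy.
    """
    mean = [math.tanh(sum(state) * (i + 1)) for i in range(EMBED_DIM)]
    return [m + random.gauss(0.0, noise_scale) for m in mean]


def to_action(representation):
    """Second component: transform a representation into an actual action.

    Assumption: pick the action whose embedding is closest in squared
    Euclidean distance to the given representation.
    """
    def sq_dist(index):
        embedding = action_embeddings[index]
        return sum((x - y) ** 2 for x, y in zip(embedding, representation))

    return min(range(NUM_ACTIONS), key=sq_dist)


def act(state):
    """Overall policy: compose the two components."""
    return to_action(internal_policy(state))
```

Under these assumptions, actions with nearby embeddings are selected by nearby points in the representation space, which is the sense in which experience with one action can inform the choice of similar actions.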