Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep Reinforcement Learning

Abstract

Reinforcement learning (RL) is a powerful tool for finding optimal policies in sequential decision processes. However, deep RL methods have two weaknesses: collecting the amount of agent experience required for practical RL problems is prohibitively expensive, and the learned policies generalize poorly to tasks outside the training data distribution. To mitigate these issues, we introduce automaton distillation, a form of neuro-symbolic transfer learning in which Q-value estimates from a teacher are distilled into a low-dimensional representation in the form of an automaton. We then propose methods for generating Q-value estimates in which symbolic information is extracted from a teacher's Deep Q-Network (DQN). The resulting Q-value estimates are used to bootstrap learning in discrete and continuous target environments via modified DQN and Twin Delayed Deep Deterministic Policy Gradient (TD3) loss functions, respectively. We demonstrate that automaton distillation decreases the time required to find optimal policies for various decision tasks in new environments, even when the target environment differs in structure from the source environment.
