Abstract
In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals expressed as propositional logic formulas over a set of high-level events. A subgoal automaton also contains two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding. A state-of-the-art inductive logic programming system is used to learn a subgoal automaton that covers the traces of high-level events observed by the RL agent. When the currently exploited automaton does not correctly recognize a trace, the automaton learner induces a new automaton that covers that trace. The interleaving process guarantees the induction of automata with the minimum number of states, and applies a symmetry breaking mechanism to shrink the search space whilst remaining complete. We evaluate ISA in several grid-world and continuous state space problems using different RL algorithms that leverage the automaton structures. We provide an in-depth empirical analysis of the performance of the automaton learning process in terms of the traces, the symmetry breaking mechanism, and specific restrictions imposed on the final learnable automaton. For each class of RL problem, we show that the learned automata can be successfully exploited to learn policies that reach the goal, achieving an average reward comparable to the case where automata are not learned but handcrafted and given beforehand.
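To make the notion of a subgoal automaton concrete, the following is a minimal illustrative sketch in Python, not the implementation used in the paper. It assumes that each edge formula is a conjunction of literals over high-level events, and that a trace is a sequence of sets of observed events. All names (`Conjunction`, `SubgoalAutomaton`, `is_counterexample`), the event names `coffee`, `office` and `obstacle`, and the outcome labels are hypothetical and chosen only for illustration. The function `is_counterexample` mirrors the condition under which a new automaton would need to be induced: the current automaton does not correctly recognize an observed trace.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Conjunction:
    """Simplified edge formula: events that must hold and events that must not."""
    required: frozenset = frozenset()
    forbidden: frozenset = frozenset()

    def holds(self, observed):
        return self.required <= observed and not (self.forbidden & observed)


@dataclass
class SubgoalAutomaton:
    initial: str
    accepting: str   # task completed successfully
    rejecting: str   # task finished without succeeding
    # state -> list of (formula, next_state); assumed mutually exclusive per state
    edges: dict = field(default_factory=dict)

    def step(self, state, observed):
        for formula, next_state in self.edges.get(state, []):
            if formula.holds(observed):
                return next_state
        return state  # no edge satisfied: stay in the current state

    def run(self, trace):
        state = self.initial
        for observed in trace:
            state = self.step(state, frozenset(observed))
        return state


def is_counterexample(automaton, trace, outcome):
    """True if the automaton's final state disagrees with the episode outcome.

    outcome is one of the assumed labels 'goal', 'dead_end' or 'incomplete'.
    """
    final = automaton.run(trace)
    expected = {"goal": automaton.accepting, "dead_end": automaton.rejecting}
    if outcome in expected:
        return final != expected[outcome]
    # An unfinished episode must not end in either special state.
    return final in (automaton.accepting, automaton.rejecting)


# Hypothetical task: observe 'coffee', then 'office', never hitting 'obstacle'.
hit = Conjunction(required=frozenset({"obstacle"}))
example = SubgoalAutomaton(
    initial="u0", accepting="u_acc", rejecting="u_rej",
    edges={
        "u0": [(Conjunction(frozenset({"coffee"}), frozenset({"obstacle"})), "u1"),
               (hit, "u_rej")],
        "u1": [(Conjunction(frozenset({"office"}), frozenset({"obstacle"})), "u_acc"),
               (hit, "u_rej")],
    },
)
```

In this sketch, a goal trace that the automaton fails to accept, or a dead-end trace that it fails to reject, would be handed to the automaton learner as a counterexample. The induction step itself, which relies on an inductive logic programming system, is not modeled here.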