Abstract
It is notoriously difficult to control the behavior of reinforcement learning agents. Agents often learn to exploit the environment or reward signal and need to be retrained multiple times. The multi-objective reinforcement learning (MORL) framework separates a reward function into several objectives. An ideal MORL agent learns to generalize to novel combinations of objectives, allowing for better control of an agent's behavior without requiring retraining. Many MORL approaches use a weight vector to parameterize the importance of each objective. However, this approach suffers from a lack of expressiveness and interpretability. We propose using propositional logic to specify the importance of multiple objectives. Because predicates in this logic correspond directly to objectives, specifications are inherently more interpretable. Additionally, the set of specifications that can be expressed with formal languages is a superset of what can be expressed by weight vectors. In this paper, we define a formal language based on propositional logic with quantitative semantics. We encode logical specifications using a recurrent neural network and show that MORL agents parameterized by these encodings are able to generalize to novel specifications over objectives and achieve performance comparable to single-objective baselines.
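To make the idea of quantitative semantics concrete, the following minimal sketch evaluates a propositional specification over per-objective scores. The abstract does not state the semantics used, so the choices here are assumptions for illustration only: conjunction as `min`, disjunction as `max`, and negation as sign flip, a convention common in quantitative logics. The class names (`Obj`, `Not`, `And`, `Or`), the function `evaluate`, and the objective names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Obj:
    """Atomic proposition tied to one objective (hypothetical name)."""
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Spec"


@dataclass(frozen=True)
class And:
    left: "Spec"
    right: "Spec"


@dataclass(frozen=True)
class Or:
    left: "Spec"
    right: "Spec"


Spec = Union[Obj, Not, And, Or]


def evaluate(spec: Spec, values: Dict[str, float]) -> float:
    """Return a real-valued satisfaction score for `spec`.

    Assumed semantics (not taken from the paper):
    And -> min, Or -> max, Not -> negation.
    """
    if isinstance(spec, Obj):
        return values[spec.name]
    if isinstance(spec, Not):
        return -evaluate(spec.arg, values)
    if isinstance(spec, And):
        return min(evaluate(spec.left, values), evaluate(spec.right, values))
    if isinstance(spec, Or):
        return max(evaluate(spec.left, values), evaluate(spec.right, values))
    raise TypeError(f"unsupported specification node: {spec!r}")


# Illustrative per-objective scores; the objective names are made up.
values = {"reach_goal": 0.8, "avoid_hazard": 0.3, "collect_item": 0.5}

# "reach the goal AND (avoid the hazard OR collect the item)"
spec = And(Obj("reach_goal"), Or(Obj("avoid_hazard"), Obj("collect_item")))
score = evaluate(spec, values)
```

Under these assumed semantics, a specification such as "A or B" rewards whichever objective is better satisfied, a choice between objectives that a single fixed weight vector does not express directly.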