Logic-based Reward Shaping for Multi-Agent Reinforcement Learning

Abstract

Reinforcement learning (RL) relies heavily on exploration to learn from its environment and maximize observed rewards. Therefore, it is essential to design a reward function that guarantees optimal learning from the received experience. Previous work has combined automata- and logic-based reward shaping with environment assumptions to provide an automatic mechanism to synthesize the reward function based on the task. However, there is limited work on how to expand logic-based reward shaping to Multi-Agent Reinforcement Learning (MARL). The environment will need to consider the joint state in order to keep track of other agents if the task requires cooperation, thus suffering from the curse of dimensionality with respect to the number of agents. This project explores how logic-based reward shaping for MARL can be designed for different scenarios and tasks. We present a novel method for semi-centralized logic-based MARL reward shaping that is scalable in the number of agents and evaluate it in multiple scenarios.
