Credit-cognisant reinforcement learning for multi-agent cooperation

Abstract

Traditional multi-agent reinforcement learning (MARL) algorithms, such as independent Q-learning, struggle in partially observable scenarios and in settings where agents must develop delicate action sequences. This is often because the reward for a good action only becomes available after other agents have taken theirs, so that earlier action is not credited accordingly. Recurrent neural networks have proven to be a viable strategy for these types of problems, yielding a significant performance increase over other methods. In this paper, we explore a different approach and focus on the experiences used to update the action-value functions of each agent. We introduce the concept of credit-cognisant rewards (CCRs), which allow an agent to perceive the effect its actions had on the environment as well as on its co-agents. We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning as well as deep recurrent Q-learning. We evaluate the performance of CCRs applied to deep reinforcement learning techniques on a simplified version of the popular card game Hanabi.
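
The reward construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an "action sequence" is one round in which each agent acts once, and that the credit-cognisant reward is the plain sum of the rewards received by all agents in that round. The `Experience` record and the `credit_cognisant_rewards` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Experience:
    """One transition as seen by a single agent (hypothetical record layout)."""
    agent_id: int
    observation: tuple
    action: int
    reward: float
    next_observation: tuple
    done: bool


def credit_cognisant_rewards(sequence: List[Experience]) -> List[Experience]:
    """Return copies of the experiences in one action sequence, each with its
    reward replaced by the total reward received by all agents in that sequence.

    Assumption: summation is used to combine the agents' rewards.
    The input experiences are left unchanged.
    """
    total = sum(exp.reward for exp in sequence)
    return [replace(exp, reward=total) for exp in sequence]


# Hypothetical round of three agents: agent 0 gives a hint, agent 1 plays a
# correct card (reward 1.0), agent 2 discards. Only agent 1 is rewarded directly.
round_ = [
    Experience(0, (0,), 2, 0.0, (1,), False),
    Experience(1, (1,), 0, 1.0, (2,), False),
    Experience(2, (2,), 3, 0.0, (3,), False),
]
ccr_round = credit_cognisant_rewards(round_)
print([exp.reward for exp in ccr_round])  # [1.0, 1.0, 1.0]
```

Each rewritten experience would then go into its own agent's replay buffer, so the agent whose hint enabled a co-agent's successful play also sees that reward when its action-value function is updated. Summation is only one plausible way to include the co-agents' rewards.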
