Abstract
Multi-agent reinforcement learning (MARL) extends (single-agent) reinforcement learning (RL) by introducing additional agents and (potentially) partial observability of the environment. Consequently, algorithms for solving MARL problems incorporate various extensions beyond traditional RL methods, such as a learned communication protocol between cooperative agents that enables exchange of private information or adaptive modeling of opponents in competitive settings. One popular algorithmic construct is a memory mechanism such that an agent's decisions can depend not only upon the current state but also upon the history of observed states and actions. In this paper, we study how a memory mechanism can be useful in environments with different properties, such as observability, internality and presence of a communication channel. Using both prior work and new experiments, we show that a memory mechanism is helpful when learning agents need to model other agents and/or when communication is constrained in some way; however, we must be cautious of agents achieving effective memoryfulness through other means.