Abstract
The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. In many real-world applications, the agents can only acquire a partial view of the world. However, in realistic settings, one or more agents that show arbitrarily faulty or malicious behavior may suffice to make current coordination mechanisms fail. In this paper, we study a practical scenario that considers the security issues arising in the presence of agents with arbitrarily faulty or malicious behavior. Under these circumstances, learning an optimal policy becomes particularly challenging, even in the unrealistic case that an agent's policy can be made conditional upon all other agents' observations. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) algorithm which selects correct and relevant information for each agent at every time-step. The multi-head attention mechanism enables the agents to learn effective communication policies through experience, concurrently with their action policies. Empirical results show that FT-Attn beats previous state-of-the-art methods in some complex environments and can adapt to various kinds of noisy environments without tuning the complexity of the algorithm. Furthermore, FT-Attn can effectively deal with the complex situation where an agent needs to access multiple agents' correct observations at the same time.
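
To make the idea of attending over other agents' observations concrete, the following is a minimal sketch of generic multi-head scaled dot-product attention, not the FT-Attn implementation itself. All names (`attention_head`, `multi_head_attention`, `random_matrix`), the dimensions, and the observation values are hypothetical and chosen only for illustration. The projection matrices are random and untrained, so the sketch shows only the data flow; in a trained system the weights would presumably be learned so that faulty observations receive low attention.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(own_obs, other_obs, w_q, w_k, w_v):
    """One head: the agent's own observation forms the query,
    the other agents' observations form the keys and values."""
    query = matvec(w_q, own_obs)
    keys = [matvec(w_k, obs) for obs in other_obs]
    values = [matvec(w_v, obs) for obs in other_obs]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one entry per value dimension.
    mixed = [sum(w * v[i] for w, v in zip(weights, values))
             for i in range(len(values[0]))]
    return weights, mixed


def multi_head_attention(own_obs, other_obs, heads):
    """Run every head and concatenate the per-head outputs."""
    all_weights, concatenated = [], []
    for w_q, w_k, w_v in heads:
        weights, mixed = attention_head(own_obs, other_obs, w_q, w_k, w_v)
        all_weights.append(weights)
        concatenated.extend(mixed)
    return all_weights, concatenated


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
obs_dim, head_dim, num_heads = 4, 3, 2  # assumed sizes, not from the paper
heads = [tuple(random_matrix(rng, head_dim, obs_dim) for _ in range(3))
         for _ in range(num_heads)]

own_obs = [0.5, -0.2, 0.1, 0.9]
other_obs = [
    [0.4, -0.1, 0.0, 1.0],
    [0.6, -0.3, 0.2, 0.8],
    [9.0, -7.0, 5.0, -8.0],  # made-up outlier standing in for a faulty agent
]

head_weights, message = multi_head_attention(own_obs, other_obs, heads)
```

Here `head_weights` holds one attention distribution per head over the three other agents, and `message` is the concatenated summary that an agent's policy could consume alongside its own observation. Using several heads lets different heads focus on different agents, which is one plausible way to combine several agents' observations at once.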