Learning Altruistic Behaviours in Reinforcement Learning without External Rewards

Abstract

Can artificial agents learn to assist others in achieving their goals without knowing what those goals are? Generic reinforcement learning agents could be trained to behave altruistically towards others by rewarding them for altruistic behaviour, i.e., rewarding them for benefiting other agents in a given situation. Such an approach assumes that other agents' goals are known so that the altruistic agent can cooperate in achieving those goals. However, explicit knowledge of other agents' goals is often difficult to acquire. Even assuming such knowledge to be given, training of altruistic agents would require manually tuned external rewards for each new environment. Thus, it is beneficial to develop agents that do not depend on external supervision and can learn altruistic behaviour in a task-agnostic manner. Assuming that other agents rationally pursue their goals, we hypothesize that giving them more choices will allow them to pursue those goals better. Some concrete examples include opening a door for others or safeguarding them to pursue their objectives without interference. We formalize this concept and propose an altruistic agent that learns to increase the choices another agent has by maximizing the number of states that the other agent can reach in its future. We evaluate our approach on three different multi-agent environments where another agent's success depends on the altruistic agent's behaviour. Finally, we show that our unsupervised agents can perform comparably to agents explicitly trained to work cooperatively. In some cases, our agents can even outperform the supervised ones.
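The idea of counting the states another agent can reach can be illustrated with a toy sketch. This is not the paper's implementation: the grid layouts, the function name `reachable_states`, the fixed horizon, and the use of breadth-first search are all assumptions made for illustration. It compares how many cells a second agent could reach within a few moves when a door is closed versus when it has been opened for it.

```python
from collections import deque

def reachable_states(grid, start, horizon):
    """Count distinct cells reachable from `start` in at most `horizon` moves.

    Hypothetical helper: `grid` is a list of strings where '#' is a wall
    and '.' is free space. Movement is 4-connected.
    """
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (r, c), depth = frontier.popleft()
        if depth == horizon:
            continue  # do not expand beyond the planning horizon
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                frontier.append(((nr, nc), depth + 1))
    return len(seen)

# Two rooms separated by a wall; the middle wall cell acts as a door.
DOOR_CLOSED = ["..#..",
               "..#..",
               "..#.."]
DOOR_OPEN = ["..#..",
             ".....",
             "..#.."]

other_agent_position = (1, 0)  # assumed start cell of the assisted agent
closed_count = reachable_states(DOOR_CLOSED, other_agent_position, horizon=4)
open_count = reachable_states(DOOR_OPEN, other_agent_position, horizon=4)
print(closed_count, open_count)
```

In this sketch, an altruistic agent scored by the other agent's reachable-state count would prefer the action that opens the door, because the count is higher with the door open, without needing to know where the other agent wants to go.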
