Adversarially Guided Self-Play for Adopting Social Conventions

  • 2020-01-16 18:51:42
  • Mycal Tucker, Yilun Zhou, Julie Shah

Abstract

Robotic agents must adopt existing social conventions in order to be effective teammates. These social conventions, such as driving on the right or left side of the road, are arbitrary choices among optimal policies, but all agents on a successful team must use the same convention. Prior work has identified a method of combining self-play with paired input-output data gathered from existing agents in order to learn their social convention without interacting with them. We build upon this work by introducing a technique called Adversarial Self-Play (ASP) that uses adversarial training to shape the space of possible learned policies and substantially improves learning efficiency. ASP only requires the addition of unpaired data: a dataset of outputs produced by the social convention without associated inputs. Theoretical analysis reveals how ASP shapes the policy space and the circumstances (when behaviors are clustered or exhibit some other structure) under which it offers the greatest benefits. Empirical results across three domains confirm ASP's advantages: it produces models that more closely match the desired social convention when given as few as two paired datapoints.

 
