Improving Fictitious Play Reinforcement Learning with Expanding Models

Abstract

Fictitious play with reinforcement learning is a general and effective framework for zero-sum games. However, with current deep neural network models, implementing fictitious play faces crucial challenges. Neural network training uses gradient descent to update all connection weights, so a model tends to forget old opponents after it is trained to beat new ones. Existing approaches often maintain a pool of historical policy models to avoid this forgetting. However, in stochastic games, learning to beat a pool, i.e., a wide distribution over policy models, either consumes many samples or, with a limited number of samples, is insufficient to exploit all of the models. In this paper, we propose a learning process with neural fictitious play that alleviates these issues. We train a single model as our policy model, consisting of sub-models and a selector. Every time the model faces a new opponent, it is expanded by adding a new sub-model, and only the new sub-model is updated instead of the whole model. At the same time, the selector is updated to mix the new sub-model with the previous ones at the state level, so that the model is maintained as a behavior strategy instead of a wide distribution over policy models. Experiments on Kuhn poker, a grid-world Treasure Hunting game, and Mini-RTS environments show that the proposed approach alleviates the forgetting problem and consequently improves the learning efficiency and robustness of neural fictitious play.
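The abstract does not specify the network architecture or the reinforcement learning algorithm, so the following is only a minimal tabular sketch of the expanding-model idea under stated assumptions. Each sub-model is assumed to be a table of per-state action logits, the selector is assumed to hold per-state logits over sub-models, and the behavior strategy at state s is taken to be the mixture sum_k w_k(s) * pi_k(a|s), which is one plausible reading of "mix at the state level". Training uses a plain REINFORCE-style step on log p(a|s), chosen here for illustration rather than taken from the paper. The class `ExpandingPolicy` and its methods are hypothetical names.

```python
import math
import random
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ExpandingPolicy:
    """Hypothetical tabular sketch: frozen sub-models mixed by a per-state selector."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.sub_models = []               # each: dict state -> list of action logits
        self.selector = defaultdict(list)  # state -> selector logits, one per sub-model

    def expand(self):
        """Add a fresh sub-model for a new opponent; earlier sub-models become frozen."""
        self.sub_models.append({})

    def _zeros(self):
        return [0.0] * self.num_actions

    def _selector_logits(self, state):
        logits = self.selector[state]
        # Pad with zeros for sub-models added since this state was last seen.
        while len(logits) < len(self.sub_models):
            logits.append(0.0)
        return logits

    def action_probs(self, state):
        """Behavior strategy at `state`: sum_k w_k(state) * pi_k(. | state)."""
        weights = softmax(self._selector_logits(state))
        probs = self._zeros()
        for w, sub in zip(weights, self.sub_models):
            for a, p in enumerate(softmax(sub.get(state, self._zeros()))):
                probs[a] += w * p
        return probs

    def act(self, state, rng=random):
        """Sample an action from the mixed behavior strategy."""
        return rng.choices(range(self.num_actions), weights=self.action_probs(state))[0]

    def update(self, state, action, advantage, lr=0.1):
        """REINFORCE-style ascent on log p(action | state).

        Only the selector and the newest sub-model are modified; all older
        sub-models are read but never written, so they stay frozen.
        """
        z = self._selector_logits(state)  # same list object stored in self.selector
        weights = softmax(z)
        pis = [softmax(sub.get(state, self._zeros())) for sub in self.sub_models]
        p = sum(w * pi[action] for w, pi in zip(weights, pis))
        # Selector gradient: d log p / d z_j = w_j * (pi_j(a) - p) / p
        for j in range(len(z)):
            z[j] += lr * advantage * weights[j] * (pis[j][action] - p) / p
        # Newest sub-model gradient: w_K * pi_K(a) * (1[a == b] - pi_K(b)) / p
        theta = self.sub_models[-1].setdefault(state, self._zeros())
        pi_new = pis[-1]
        scale = lr * advantage * weights[-1] * pi_new[action] / p
        for b in range(self.num_actions):
            theta[b] += scale * ((1.0 if b == action else 0.0) - pi_new[b])
```

In use, one would call `expand()` whenever fictitious play produces a new opponent, then call `update(...)` on the new experience. Because older tables are never written, whatever they learned against earlier opponents survives; adaptation happens only through the new sub-model and the selector weights.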
