There is renewed interest in simulating language emergence among deep neuralagents that communicate to jointly solve a task, spurred by the practical aimto develop language-enabled interactive AIs, as well as by theoreticalquestions about the evolution of human language. However, optimizing deeparchitectures connected by a discrete communication channel (such as that inwhich language emerges) is technically challenging. We introduce EGG, a toolkitthat greatly simplifies the implementation of emergent-language communicationgames. EGG's modular design provides a set of building blocks that the user cancombine to create new games, easily navigating the optimization andarchitecture space. We hope that the tool will lower the technical barrier, andencourage researchers from various backgrounds to do original work in thisexciting area.
Quick Read (beta)
EGG: a toolkit for research on Emergence of lanGuage in Games
There is renewed interest in simulating language emergence among deep neural agents that communicate to jointly solve a task, spurred by the practical aim to develop language-enabled interactive AIs, as well as by theoretical questions about the evolution of human language. However, optimizing deep architectures connected by a discrete communication channel (such as that in which language emerges) is technically challenging. We introduce EGG, a toolkit that greatly simplifies the implementation of emergent-language communication games. EGG’s modular design provides a set of building blocks that the user can combine to create new games, easily navigating the optimization and architecture space. We hope that the tool will lower the technical barrier, and encourage researchers from various backgrounds to do original work in this exciting area.
Marco Baroni Facebook AI / ICREA [email protected]
1 Introduction00footnotetext: A screencast demonstration of EGG is available at https://vimeo.com/345470060
Studying the languages that emerge when neural agents interact with each other recently became a vibrant area of research (Havrylov and Titov, 2017; Lazaridou et al., 2016, 2018; Kottur et al., 2017; Bouchacourt and Baroni, 2018; Lowe et al., 2019). Interest in this scenario is fueled by the hypothesis that the ability to interact through a human-like language is a prerequisite for genuine AI (Mikolov et al., 2016; Chevalier-Boisvert et al., 2019). Furthermore, such simulations might lead to a better understanding of both standard NLP models (Chaabouni et al., 2019b) and the evolution of human language itself (Kirby, 2002).
For all its promise, research in this domain is technically very challenging, due to the discrete nature of communication. The latter prevents the use of conventional optimization methods, requiring either Reinforcement Learning algorithms (e.g., REINFORCE; Williams 1992) or the Gumbel-Softmax relaxation (Maddison et al., 2016; Jang et al., 2016). The technical challenge might be particularly daunting for researchers whose expertise is not in machine learning, but in fields such as linguistics and cognitive science, that could very profitably contribute to this interdisciplinary research area.
To lower the starting barrier and encourage high-level research in this domain, we introduce the EGG (Emergence of lanGuage in Games) toolkit. EGG aims at
Providing reliable building bricks for quick prototyping;
Serving as a library of pre-implemented games;
Providing tools for analyzing the emergent languages.
Notable features of EGG include: (a) Primitives for implementing single-symbol or variable-length communication (with vanilla RNNs (Elman, 1990), GRUs (Cho et al., 2014), LSTMs (Hochreiter and Schmidhuber, 1997));11 1 EGG also provides an experimental support of Transformers (Vaswani et al., 2017). (b) Training with optimization of the communication channel through REINFORCE or Gumbel-Softmax relaxation via a common interface; (c) Simplified configuration of the general components, such as check-pointing, optimization, Tensorboard support,22 2 https://www.tensorflow.org/tensorboard etc.; (d) A simple CUDA-aware command-line tool for hyperparameter grid-search.
2 EGG’s architecture
In the first iteration of EGG, we concentrate on a simple class of games, involving a single, unidirectional (Sender Receiver) message. In turn, messages can be either single-symbol or multi-symbol variable-length sequences. Our motivation for starting with this setup is two-fold. First, it corresponds to classic signaling games (Lewis, 1969), it already covers a large portion of the literature (e.g., 5 out of 6 relevant studies mentioned in Introduction) and it allows exploring many interesting research questions. Second, it constitutes a natural first step for further development; in particular, the majority of components should remain useful in multi-directional, multi-step setups.
2.1 Design principles
As different training methods and architectures are used in the literature, our primary goal is to provide EGG users with the ability to easily navigate the space of common design choices.
Building up on this idea, EGG makes switching between Gumbel-Softmax relaxation-based and REINFORCE-based training effortless, through the simple choice of a different wrapper. Similarly, one can switch between one-symbol communication and variable-length messages with little changes in the code.33 3 This also proved to be a convenient debugging mechanism, as single-symbol communication is typically simpler to train.
We aim to maintain EGG minimalist and “hackable” by encapsulating the user-implemented agent architectures, the Reinforce/GS agent wrappers and the game logic into PyTorch modules. The user can easily replace any part.
Finally, since virtually any machine-learning experiment has common pieces, such as setting the random seeds, configuring the optimizer, model check-pointing, etc., EGG pre-implements many of them, reducing the necessary amount of boilerplate code to the minimum.
2.2 EGG design
EGG, in its first iteration, operates over the following entities. Firstly, there are two distinct agent roles: Sender and Receiver. Sender and Receiver are connected via a one-directional communication channel from the former to the latter, that has to produce the game-specific output.
The next crucial entity is Game. It encapsulates the agents and orchestrates the game scenario: feeding the data to the agents, transmitting the messages, and getting the output of Receiver. Figure 1 illustrates EGG’s game flow in a specific example. Game applies a user-provided loss function, which might depend on the outputs of Receiver, the message transmitted, and the data. The value of the loss is minimized by a fourth entity, Trainer. Trainer also controls model checkpointing, early stopping, etc.
The Trainer and Game modules are pre-implemented in EGG. In a typical scenario, the communication method (single or multiple symbol messages) will be implemented by EGG-provided wrappers. As a result, what is left for the user to implement consists of: (a) the data stream, (b) core (non-communication-related) parts of the agents, (c) the loss. The data interface that is expected by Trainer is an instance of the standard PyTorch data loader utils.data.DataLoader.
To implement Sender, the user must define a module that consumes the data and outputs a vector. On Receiver’s side, the user has to implement a module that takes an input consisting of a message embedding and possibly further data, and generates Receiver’s output.
Section 4 below provides examples of how to implement agents, choose communication and optimization type, and train a game.
3 Optimizing the communication channel in EGG
EGG supports two widely adopted strategies for learning with a discrete channel, Gumbel-Softmax relaxation (used, e.g., by Havrylov and Titov (2017)) and REINFORCE (used, e.g., by Lazaridou et al. (2016)). Below, we briefly review both of them.
Gumbel-Softmax relaxation is based on the Gumbel-Softmax (GS) (aka Concrete) distribution (Maddison et al., 2016; Jang et al., 2016), that allows to approximate one-hot samples from a Categorical distribution. At the same time, GS admits reparametrization, hence allows backpropagation-based training. Suppose that Sender produces a distribution over the vocabulary, with th symbol having probability . To obtain a sample from a corresponding Gumbel-Softmax distribution, we take i.i.d. samples from the Gumbel(0, 1) distribution and obtain the vector with components :
where is the temperature hyperparameter, which controls the degree of relaxation. We treat as a relaxed symbol representation. In the case of single-symbol communication, is passed to Receiver. In case of variable-length messages, this symbol’s embedding is also fed into a RNN cell to generate the message.
As a result, if Receiver and the game loss are differentiable w.r.t. their inputs, we can get gradients of all game parameters, including those of Sender, via conventional backpropagation.
REINFORCE (Williams and Peng, 1991) is a standard Reinforcement Learning algorithm. Assume that both agents are stochastic: Sender samples a message , and Receiver samples its output . Let us fix a pair of inputs, , , and the ground-truth output . Then, using the log-gradient “trick”, the gradient of the expectation of the loss w.r.t. the vector of agents’ parameters is:
where specifies the joint probability distribution over the agents’ outputs.
The gradient estimate is found by sampling messages and outputs. A standard trick to reduce variance of the estimator in Eq. 2 is to subtract an action-independent baseline from the optimized loss (Williams, 1992). EGG uses the running mean baseline.
Importantly, the estimator in Eq. 2 allows us to optimize agents even if the loss is not differentiable (e.g., 0/1 loss). However, if the loss is differentiable and Receiver is differentiable and deterministic, this can be leveraged by a “hybrid” approach: the gradient of Receiver’s parameters can be found by backpropagation, while Sender is optimized with REINFORCE. This approach, a special case of gradient estimation using stochastic computation graphs as proposed by Schulman et al. (2015), is also supported in EGG.
4 Implementing a game
In this Section we walk through the main steps to build a communication game in EGG. We illustrate them through a MNIST (LeCun et al., 1998) communication-based autoencoding task: Sender observes an image and sends a message to Receiver. In turn, Receiver tries to reconstruct the image. We only cover here the core aspects of the implementation, ignoring standard pre- and post-processing steps, such as data loading. The full implementation can be found in an online tutorial.44 4 https://colab.research.google.com/github/facebookresearch/EGG/blob/master/tutorials/EGG%20walkthrough%20with%20a%20MNIST%20autoencoder.ipynb
We start by implementing the agents’ architectures, as shown in Figure 2. Sender is passed an input image to be processed by its pre-trained vision module, and returns its output after a linear transformation. The way Sender’s output will be interpreted depends on the type of communication to be used (discussed below). Receiver gets an input from Sender and returns an image-sized output with pixels valued in . Again, depending on the type of channel employed, the Receiver input will have a different semantics.
In the case of one-symbol communication, Sender’s output is passed through a softmax layer and its output is interpreted as the probabilities of sending a particular symbol. Hence, the output dimensionality defines the size of the vocabulary. In the case of variable-length messages, Sender’s output specifies the initial hidden state of an RNN cell. This cell is then “unrolled” to generate a message, until the end-of-sequence symbol (eos) is produced or maximum length is reached. Receiver’s input is an embedding of the message: either the embedding of the single-symbol message, or the last hidden state of the RNN cell that corresponds to the eos symbol.
Once Sender and Receiver are defined, the user wraps them into EGG-implemented wrappers which determine the communication and optimization scenarios. Importantly, the actual user-specified Sender and Receiver architectures can be agnostic to whether single-symbol or variable-length communication is used; and to whether Gumbel-Softmax relaxation- or REINFORCE-based training is performed. In Figure 3 we illustrate different communication/training scenarios: (a) single-symbol communication, trained with Gumbel-Softmax relaxation, (b) single-symbol communication, trained with REINFORCE, (c) variable-length communication, trained with Gumbel-Softmax relaxation, (d) variable-length communication, trained with REINFORCE.
Once the Game instance is defined, everything is ready for training. That is, the user has to pass the game instance to core.Trainer, as shown in Figure 4.
We report some results obtained with the code we just described. We used the following parameters. The vision module is a pre-trained LeNet-1 (LeCun et al., 1990) instance, the maximal message length is set to 2, the communication between the agents is done through LSTM units with hidden-size 20, vocabulary size is 10. The agents are trained with REINFORCE for 15 epochs with batch size of 32, and the loss is per-pixel cross-entropy.
In Figure 5 we illustrate the language that emerges in this setup. To do this, we enumerate all possible 100 two-symbol messages and input them to Receiver. We report all images that Receiver produces. The eos symbol is fixed to be 0, hence if the first symbol is 0 then the second symbol is ignored (top row of Figure 5).
Note that the first symbol tends to denote digit identity: . In contrast, the second symbol is either ignored () or specifies the style of the produced digit (). The second symbol has the most striking effect with , where encodes the rotation angle of the digit 1.
5 Some pre-implemented games
EGG contains implementations of several games. They (a) illustrate how EGG can be used to explore interesting research questions, (b) provide reference usage patterns and building blocks, (c) serve as means to ensure reproducibility of studies reported in the literature. For example, EGG incorporates an implementation of the signaling game of Lazaridou et al. (2016) and Bouchacourt and Baroni (2018). It contains code that was recently used to study the communicative efficiency of artificial LSTM-based agents (Chaabouni et al., 2019a) and the information-minimization properties of emergent discrete codes (Kharitonov et al., 2019).55 5 A small illustration can be run in Google Colaboratory: https://colab.research.google.com/github/facebookresearch/EGG/blob/master/egg/zoo/language_bottleneck/mnist-style-transfer-via-bottleneck.ipynb. Finally, EGG provides a pre-implemented game that allows to train agents entirely via the command line and external input/output files, without having to write a single line of Python code. We hope this will lower the learning curve for those who want to experiment with language emergence without previous coding experience.
6 Conclusion and future work
We introduced EGG, a toolkit for research on emergence of language in games. We outlined its main features design principles. Next, we briefly reviewed how training with a discrete communication channel is performed. Finally, we walked through the main steps for implementing a MNIST autoencoding game using EGG.
We intend to extend EGG in the following directions. First, we want to provide support for multi-direction and multi-step communicative scenarios. Second, we want to add more advanced tooling for analyzing the properties of the emergent languages (such as compositionality; Andreas 2019). We will also continue to enlarge the set of pre-implemented games, to build a library of reference implementations.
We are grateful to Roberto Dessì for his contributions to the EGG codebase and to Serhii Havrylov for sharing his code with us.
- Andreas (2019) Jacob Andreas. 2019. Measuring compositionality in representation learning. In Proceedings of ICLR, New Orleans, LA. Published online: https://openreview.net/group?id=ICLR.cc/2019/conference.
- Bouchacourt and Baroni (2018) Diane Bouchacourt and Marco Baroni. 2018. How agents see things: On visual representations in an emergent language game. In EMNLP.
- Chaabouni et al. (2019a) Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, and Marco Baroni. 2019a. Anti-efficient encoding in emergent communication. arXiv preprint arXiv:1905.12561.
- Chaabouni et al. (2019b) Rahma Chaabouni, Eugene Kharitonov, Alessandro Lazaric, Emmanuel Dupoux, and Marco Baroni. 2019b. Word-order biases in deep-agent emergent communication. In Proceedings of ACL.
- Chevalier-Boisvert et al. (2019) Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, and Yoshua Bengio. 2019. BabyAI: First steps towards grounded language learning with a human in the loop. In International Conference on Learning Representations.
- Cho et al. (2014) Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
- Elman (1990) Jeffrey Elman. 1990. Finding structure in time. Cognitive Science, 14:179–211.
- Havrylov and Titov (2017) Serhii Havrylov and Ivan Titov. 2017. Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. In NIPS.
- Hochreiter and Schmidhuber (1997) Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.
- Jang et al. (2016) Eric Jang, Shixiang Gu, and Ben Poole. 2016. Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144.
- Kharitonov et al. (2019) Eugene Kharitonov, Rahma Chaabouni, Diane Bouchacourt, and Marco Baroni. 2019. Information minimization in emergent languages. arXiv preprint arXiv:1905.13687.
- Kirby (2002) Simon Kirby. 2002. Natural language from artificial life. Artificial life, 8(2):185–215.
- Kottur et al. (2017) Satwik Kottur, José MF Moura, Stefan Lee, and Dhruv Batra. 2017. Natural language does not emerge’naturally’in multi-agent dialog. arXiv preprint arXiv:1706.08502.
- Lazaridou et al. (2018) Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, and Stephen Clark. 2018. Emergence of linguistic communication from referential games with symbolic and pixel input. arXiv preprint arXiv:1804.03984.
- Lazaridou et al. (2016) Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni. 2016. Multi-agent cooperation and the emergence of (natural) language. arXiv preprint arXiv:1612.07182.
- LeCun et al. (1990) Yann LeCun, Bernhard E Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne E Hubbard, and Lawrence D Jackel. 1990. Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems, pages 396–404.
- LeCun et al. (1998) Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
- Lewis (1969) David Lewis. 1969. Convention. Harvard University Press, Cambridge, MA.
- Lowe et al. (2019) Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, and Yann Dauphin. 2019. On the pitfalls of measuring emergent communication. arXiv preprint arXiv:1903.05168.
- Maddison et al. (2016) Chris J Maddison, Andriy Mnih, and Yee Whye Teh. 2016. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712.
- Mikolov et al. (2016) Tomas Mikolov, Armand Joulin, and Marco Baroni. 2016. A roadmap towards machine intelligence. In International Conference on Intelligent Text Processing and Computational Linguistics, pages 29–61. Springer.
- Paszke et al. (2017) Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch. In NIPS-W.
- Schulman et al. (2015) John Schulman, Nicolas Heess, Theophane Weber, and Pieter Abbeel. 2015. Gradient estimation using stochastic computation graphs. In NIPS.
- Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
- Williams (1992) Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256.
- Williams and Peng (1991) Ronald J Williams and Jing Peng. 1991. Function optimization using connectionist reinforcement learning algorithms. Connection Science, 3(3):241–268.