Adversarial Language Games for Advanced Natural Language Intelligence

  • 2019-11-08 14:00:45
  • Yuan Yao, Haoxi Zhong, Zhengyan Zhang, Xu Han, Xiaozhi Wang, Chaojun Xiao, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

Department of Computer Science and Technology, Tsinghua University, Beijing, China
Institute for Artificial Intelligence, Tsinghua University, Beijing, China
State Key Lab on Intelligent Technology and Systems, Tsinghua University, Beijing, China
[email protected]
Abstract

While adversarial games have been well studied in various board games and electronic sports games, etc., such adversarial games remain a nearly blank field in natural language processing. As natural language is inherently an interactive game, we propose a challenging pragmatics game called Adversarial Taboo, in which an attacker and a defender compete with each other through sequential natural language interactions. The attacker is tasked with inducing the defender to speak a target word invisible to the defender, while the defender is tasked with detecting the target word before being induced by the attacker. In Adversarial Taboo, a successful attacker must hide its intention and subtly induce the defender, while a competitive defender must be cautious with its utterances and infer the intention of the attacker. To instantiate the game, we create a game environment and a competition platform (taboo.thunlp.org). Sufficient pilot experiments and empirical studies on several baseline attack and defense strategies show promising and interesting results. Based on the analysis on the game and experiments, we discuss multiple promising directions for future research.


Work in progress.

1 Introduction

Figure 1: An example of Adversarial Taboo played by two human players, where the attacker and the defender compete with each other through sequential language interactions. The target word “banana” is only visible to the attacker. Two possible cases of this game are shown above. In the first case, the attacker wins since he/she successfully induced the defender to speak the target word. In the second case, the defender wins because he/she successfully inferred the target word of the attacker.

Natural language is inherently an interactive game between participants, which is ubiquitous in human activities such as discussion, debate, cheating, intention concealment and detection. Such context-related interactions are believed to play a central role in natural language mastery in the theory of both linguistics (mey2001pragmatics) and philosophy of language (wittgenstein1953philosophical; lewis1969convention). High-quality goal-oriented natural language interactions, or pragmatics games, generally require advanced language intelligence beyond syntax and semantics, and are particularly challenging due to the complexity, diversity and latent obscurity of natural language.

In the context of natural language processing (NLP), recent years have witnessed the success of deep learning on natural language understanding and generation. Language patterns learned from large-scale data lead to intelligent agents that can interact with humans with reasonable adequacy, fluency and diversity. However, the intelligence of such agents is generally restricted to static syntax and semantics, and is weak in pragmatics. Higher-level language mastery (e.g., goal-oriented use of complex language skills and strategies in open domains) is still far out of reach. Such advanced language intelligence can be better achieved in interactive language games (mikolov2016roadmap).

While cooperation and competition are both important elements in interactive language games, developmental psychology studies have shown that adversarial language games, such as debate, interrogation and maintaining lies, generally require higher-level language intelligence than basic communication (talwar2008social; ding2014elementary). Adversarial games have significantly promoted the development of many artificial intelligence areas such as board games (campbell2002deep; silver2016mastering; silver2017mastering), electronic sports games (bansal2017emergent; vinyals2017starcraft; jaderberg2019human) and physically grounded games (baker2019emergent), and have enabled the emergence of complex strategies and superhuman proficiency in many cases. Despite this success, adversarial games remain a nearly blank field in NLP.

To this end, we propose a novel pragmatics game called Adversarial Taboo as an example of adversarial language games for NLP, in which the attacker and the defender compete with each other through sequential natural language interactions. The goal of the attacker is to induce the defender to unconsciously speak a target word, which is given by the game system and invisible to the defender, and prevent the target word from being detected by the defender. Meanwhile, the defender aims to avoid the target word in utterances. The defender is also given one chance that can be used at any point to predict the target word.

Figure 1 shows an example of Adversarial Taboo. The attacker is assigned the target word “banana” by the judge system. In the first turn, the attacker asks for a fruit recommendation, which is only loosely related to banana. Since the defender responds with “apple”, in the second turn the attacker steers the topic more specifically toward banana. The game can terminate in two possible ways: (i) The defender speaks “banana” in his/her utterances, in which case the attacker wins. (ii) The defender successfully predicts the target word, in which case the defender wins. If the game does not terminate within a certain number of turns (e.g., 50), the defender is forced to make a prediction. We refer the readers to Section 2 for more details about the rules of the game.

Several complex language capabilities are required to win Adversarial Taboo. The attacker must obscure its intention (i.e., the target word) and subtly induce the defender, striking a balance between obscurity and inducement. A successful defender must balance maintaining the semantic relevance of its responses against the risk of being induced, and at the same time infer the attacker's intention. Hence, mastering Adversarial Taboo requires fine-grained language understanding, inference and generation, which makes the game an effective and convenient testbed for language intelligence research. Moreover, Adversarial Taboo can also serve as a benchmark for language intelligence beyond syntax and semantics.

To assess Adversarial Taboo, we propose several attack and defense strategies and conduct comprehensive experiments. Experimental results show that simple attack and defense strategies can achieve promising and interesting results, while the proposed targeted improvements in strategies lead to alternate rises in their performance.

To summarize, our contributions are as follows: (i) We formulate a novel pragmatics game called Adversarial Taboo for advanced natural language intelligence. (ii) We propose several attack and defense strategies, and conduct comprehensive experiments on the proposed game. (iii) Based on the experimental results and analysis, we discuss multiple promising directions for future research. Moreover, the source code, experiment datasets, and game platform will be made available to support future research.

2 Adversarial Taboo

2.1 Game Introduction

We first introduce the details of Adversarial Taboo. Adversarial Taboo is a variant of Taboo. In the original Taboo, two players chat to exchange information. One player needs to describe a target word without saying it, while the other player needs to guess what the target word is. However, in this setting, players can exploit shortcuts to win the game easily. For example, if the target word is “Memory”, the first player only needs to say “Long short-term what?” to complete the game.

Therefore, we introduce Adversarial Taboo. In Adversarial Taboo, the players are divided into two groups. Each group is assigned a target word, and its goal is to lead the other group to say that word while avoiding saying the other group’s target word. If one group successfully leads the other group to say its word, or guesses the other group’s word correctly, then this group wins the game. In this setting, a shortcut is not a good solution, as any shortcut helps the opponent win the game. As a result, the two groups must compete to win the game.

In Adversarial Taboo, there are two roles for each group: Attacker and Defender. The goal of the attacker is to lead the other group to say the word, while the goal of the defender is to avoid saying the word while trying to guess it. To further simplify the task, in our game setting, the first group only needs to attack while the second group only needs to defend, so every group plays a single role in the game.

2.2 Task Definition

To formalize the task setting of Adversarial Taboo, we give a general mathematical form of the game. We define the game as $G=(A,D,P,J)$. Here $A=(a_1,a_2,\dots,a_n)$ are the $n$ attackers and $D=(d_1,d_2,\dots,d_m)$ are the $m$ defenders in the game. $P=(p_1,p_2,\dots,p_{n+m})$ is a permutation of $A \cup D$ that decides the speaking order: $p_i \in A \cup D$ for $2 \le i \le n+m$, and $p_1 \in A$, which means the first player must be an attacker who leads the topic of the chat. $J$ is the judge system, which represents the rules of the game; we discuss its details below.
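The definition above can be sketched as a small data structure. This is only an illustrative sketch: the class name `Game`, the use of strings as player identifiers, and the judge signature `(sentence, history) -> bool` are assumptions made for this example and are not taken from the paper or its platform.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Game:
    """Hypothetical container for the game tuple (attackers, defenders, order, judge)."""
    attackers: Tuple[str, ...]
    defenders: Tuple[str, ...]
    order: Tuple[str, ...]  # speaking order, a permutation of all players
    judge: Callable[[str, Sequence[str]], bool]  # assumed signature: (sentence, history)

    def __post_init__(self) -> None:
        players = set(self.attackers) | set(self.defenders)
        n_players = len(self.attackers) + len(self.defenders)
        # Player identifiers must be unique across both groups.
        if len(players) != n_players:
            raise ValueError("player identifiers must be unique")
        # The order must mention every player exactly once.
        if len(self.order) != n_players or set(self.order) != players:
            raise ValueError("order must be a permutation of all players")
        # The first speaker must be an attacker, who leads the topic.
        if self.order[0] not in self.attackers:
            raise ValueError("the first speaker must be an attacker")
```

For the one-attacker, one-defender case, `Game(("a1",), ("d1",), ("a1", "d1"), judge)` is accepted, while reversing the order raises a `ValueError`.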

When the game begins, the judge system $J$ first assigns a target word $w$ to the attacker $p_1$. The game lasts at most $T$ rounds, and in every round $p_1,p_2,\dots,p_{n+m}$ take turns to say a sentence. (Each agent is allowed to speak more than one sentence; without loss of generality, we discuss the one-sentence scenario.) Suppose the sentence of the $i$-th player at the $t$-th round is $s_i^{(t)}$; then the judge system $J$ checks the following:

(1) Whether $s_i^{(t)}$ is a qualified sentence. To ensure the quality of the game, the judge system checks whether $s_i^{(t)}$ is a legitimate sentence from a linguistic point of view. Moreover, since the players are required to chat in the game, the judge system checks whether $s_i^{(t)}$ is relevant to the previous sentences. These two rules ensure the quality of the sentences in the game.

(2) If $p_i$ is a defender, whether $s_i^{(t)}$ contains the target word $w$. If $s_i^{(t)}$ contains $w$, then the defenders lose. Note that here “contain” means that $s_i^{(t)}$ contains $w$ or some variant of $w$. For example, if the target word is “apple”, then “apples” is a variant of “apple”.

Each defender is given one chance, which can be used at any point, to predict the target word. If one of the defenders correctly guesses the target word, then the defenders win; otherwise the game continues. If the game has still not ended after $T$ rounds, all defenders that have not yet predicted the target word are forced to predict it from the vocabulary set. If one of the defenders successfully predicts the word, then the defenders win; otherwise the game is a draw.
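The win, loss and draw conditions above can be sketched as a short game loop for one attacker and one defender. This is a minimal sketch, not the platform's implementation, and it rests on several assumptions: the function names and callback signatures are hypothetical, the variant check only covers simple plural suffixes, the defender's utterance is checked before its guess within a turn, and the sentence-quality check (1) is left out because the text does not say what happens to an unqualified sentence.

```python
import re
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (speaker, sentence) pairs


def contains_target(sentence: str, target: str) -> bool:
    """Assumed variant rule: the word itself or a simple plural form."""
    t = target.lower()
    variants = {t, t + "s", t + "es"}
    return any(tok in variants for tok in re.findall(r"[a-z]+", sentence.lower()))


def play(target: str,
         attacker: Callable[[History], str],
         defender: Callable[[History], Tuple[str, Optional[str]]],
         final_guess: Callable[[History], str],
         max_rounds: int = 50) -> str:
    """Run one game and return 'attacker wins', 'defender wins' or 'draw'."""
    history: History = []
    guess_used = False
    for _ in range(max_rounds):
        history.append(("attacker", attacker(history)))
        reply, guess = defender(history)  # guess is None if no prediction is made
        history.append(("defender", reply))
        if contains_target(reply, target):
            return "attacker wins"
        if guess is not None and not guess_used:
            guess_used = True  # the single chance is now spent
            if guess.lower() == target.lower():
                return "defender wins"
    # After the round limit, a defender that has not guessed yet must guess.
    if not guess_used and final_guess(history).lower() == target.lower():
        return "defender wins"
    return "draw"
```

With the target word “banana”, a defender that replies “Monkeys like bananas.” loses immediately, while a defender that stays on topic and spends its one guess on “banana” wins.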

2.2.1 Benchmark Settings

In this paper, all experiments are under the same setting where n=m=1, which means that only one attacker and one defender play the game. As a result, the permutation P can only be (a1,d1). For the judge system, we trained a language model using BERT (devlin2019bert) to check the legitimacy of sentences, and GPT-2 (radford2019language) to check the relevance of the sentence. We set T=50 to limit the rounds of the game. The target words are selected from entities in our experiments.

(a) Straightforward inquiries versus no defense.
(b) Straightforward inquiries versus intention detection.
(c) Subtle induction versus intention detection.
(d) Subtle induction versus inducement prevention.
Figure 2: Alternating improvements in attack and defense strategies with complex language skills that could emerge through co-adaptation in Adversarial Taboo. (a) A defender with no sense of defense, which is the case for most chatbots, will be successfully attacked by straightforward inquiries. (b) Straightforward inquiries will be easily defeated by a defender that performs intention detection. (c) The attacker needs to hide its intention to avoid being detected by the defender. (d) Both the attacker and the defender need to be cautious with their utterances when trying to achieve their goals.

3 Experiments

We are organizing the information obtained from our experiments. More details are coming soon!

Columns: Language Game | Cooperative | Adversarial | Asymmetric | Formal Language | Natural Language | Open-domain Knowledge
Rows: Referential Games | Taboo | Persuasion | Negotiation | Avalon | Werewolf | Who is the Spy | Adversarial Taboo
Table 1: Different language games and their properties, including whether the language game is cooperative or adversarial (or both), whether the information of the players is asymmetric, whether the interactions can be simplified into formal language, whether the interactions can be in the form of natural language, and whether open-domain knowledge helps (or is required by) the game.

4 Discussion and Future Work

In this section, we discuss several promising directions for future research. First of all, we give some potential directions for better attack and defense strategies. Secondly, we point out the challenges of building a robust judge system, as it is an important part of Adversarial Taboo. Finally, we discuss the principles underlying the design of Adversarial Taboo.

4.1 Advanced Settings

We discuss two advanced settings of Adversarial Taboo, which are more realistic and challenging. In a more general setting, based on the current black-box setting, multiple agents from different groups can engage in the game and perform both attacking and defending. Another interesting direction is white-box attacking, where the attacker tries to make use of the internal structure of the defending model. Note that in realistic scenarios (e.g., open chatbots such as Siri and Xiaoice), although the defending model cannot be obtained directly, historical data of (or even free access to) interactions with the defending model is usually available. Thus, in the white-box setting, the attacker is provided with access to the interaction history of the defender, or with the opportunity to actively interact with the defender for several turns, from which the defending model can be duplicated via model extraction attacks (tramer2016stealing). The attacker can then search for corner cases based on the gradients of the duplicated defending model for inducement (wallace2019universal).

4.2 Better Attack and Defense Strategies

For the attacker, which aims to induce the defender to say a specific word unconsciously, the simplest strategy is to talk about the target word directly. However, the limitation is that the defender can then narrow down the target word candidates easily. There are two directions for better attack strategies. For black-box attacking, reinforcement learning should be the basic framework for the attacking system, as it has proven successful in many adversarial games (silver2017mastering). Besides, knowledge graphs are very useful for conditional language generation and have been widely used in dialogue systems (vougiouklis2016neural; ghazvininejad2018knowledge).

For the defender, which aims to avoid the target word in its utterances and figure out the target word from the context, the critical challenge is how to maintain semantic relevance in the interaction without saying the target word. Universal responses such as “It’s an interesting question.” are discouraged in Adversarial Taboo, so the system should learn to converse both carefully and informatively.

As learning from scratch is difficult for current data-driven models, we will release an annotated dataset so that attackers and defenders can imitate the strategies of humans. The final challenge is how to learn from scratch without annotated data, as AlphaGo Zero does (silver2017mastering). We expect the emergence of complex language skills in attack and defense through co-adaptation in Adversarial Taboo, as shown in Figure 2.

4.3 Robust Judge System

The current judge system consists of two components: a fluency judge system and a relevance judge system. The fluency judge system aims to compute the syntactic legitimacy of a sentence. The relevance judge system aims to compute the semantic relevance of the interaction. Due to the diversity of natural language, the action space is not enumerable and there are a large number of illegitimate actions. We assume the judge can detect illegitimate actions in the game, such as irrelevant responses and unreadable text. However, the current judging strategy has some limitations shared with existing evaluation methods for language generation. For example, the agent system may generate low-perplexity but unreadable sentences that hack an evaluation method based on language modeling. Moreover, the evaluation of dialogue systems is also not well established and has received much attention from the community (tao2018ruber; ghazarian2019better). Therefore, research on a more robust judge system can substantially benefit the development of evaluation methods for language generation.
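The two-component structure described above can be sketched as a judge that combines a fluency score and a relevance score. The sketch below is only an illustration under stated assumptions: the class `Judge`, the threshold values, and both toy scorers are hypothetical stand-ins chosen so that the example runs without any model; the paper's judge relies on pre-trained neural language models instead.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumed scorer signature: (sentence, previous sentences) -> score in [0, 1].
Scorer = Callable[[str, Sequence[str]], float]


@dataclass
class Judge:
    fluency: Scorer
    relevance: Scorer
    min_fluency: float = 0.5    # hypothetical threshold
    min_relevance: float = 0.1  # hypothetical threshold

    def is_qualified(self, sentence: str, history: Sequence[str]) -> bool:
        """A sentence qualifies only if it passes both component checks."""
        return (self.fluency(sentence, history) >= self.min_fluency
                and self.relevance(sentence, history) >= self.min_relevance)


def toy_fluency(sentence: str, history: Sequence[str]) -> float:
    """Stand-in for a language-model score: ratio of distinct tokens."""
    tokens = sentence.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def toy_relevance(sentence: str, history: Sequence[str]) -> float:
    """Stand-in for a relevance model: Jaccard overlap with the last sentence."""
    if not history:
        return 1.0  # the opening sentence has nothing to be relevant to
    prev = set(history[-1].lower().split())
    cur = set(sentence.lower().split())
    union = prev | cur
    return len(prev & cur) / len(union) if union else 0.0
```

Keeping the scorers as plain callables means a stronger fluency or relevance model can be swapped in without changing the rest of the game, which matches the expectation that the judge system will keep improving.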

4.4 Game Design Principles

In this section, we discuss the principles underlying the design of Adversarial Taboo. Adversarial Taboo takes the form of conversations, where agents are required to communicate in natural language. A judge system is introduced to prevent agents from generating unreadable or irrelevant sentences. However, unlike other adversarial games (e.g., Go), the diverse and complex natural language interactions in Adversarial Taboo cannot be explicitly defined with a small set of rules, which makes automatic evaluation particularly challenging. The automatic evaluation of natural language interactions is also a long-standing problem in open dialogue systems (liu2016not).

Nevertheless, several important properties must be satisfied for good natural language interactions: (i) fluency: Each sentence should be legitimate in syntax. (ii) relevance: The response should be relevant in semantics. We adopt pre-trained neural language models to measure fluency and relevance in our judge system. Note that a conversation between agents may meet the aforementioned properties but still differ from human language. We expect that continuous improvements in the judge system will constrain the conversation more closely to human language. Note also that we do not expect natural language itself to emerge in the game; rather, we expect agents to acquire sophisticated language skills from Adversarial Taboo on the basis of basic language usage. Importantly, the game of Adversarial Taboo is well defined once the judge system is given (i.e., the rules are deterministic), even if the judge system may still need improvements.

4.5 Language Games

In this section, we discuss different language games and their properties, including the language games that have been investigated and the ones that are unexplored but promising for future research, as shown in Table 1.

We compare several important properties. (1) Cooperative and Adversarial indicate whether the players share the same utility. Note that a language game can be both cooperative (within groups) and adversarial (between groups). (2) Asymmetric indicates whether the information and roles of players are asymmetric. Asymmetry in adversarial language games often leads to deception and interrogation. (3) Formal Language denotes whether the interactions can be simplified into formal language, where interactions are defined by specific rules on a finite set of atomic actions. Natural Language indicates whether the game can be played in natural language. Note that although many language games (e.g., Negotiation and Avalon) can take the form of natural language, they can be simplified into formal language. (4) Open-domain Knowledge indicates whether open-domain knowledge helps (or is required by) the game. We believe the ability to incorporate and utilize open-domain knowledge is critical to advanced natural language intelligence.

We give a brief description of the language games in Table 1: (1) Referential Games (lewis1969convention) are a broad family of cooperative interactive games, where one agent needs to select a specific object from candidates based on the descriptions from another agent (lazaridou2017multi; havrylov2017emergence; bouchacourt2018agents; kharitonov2019egg), or two agents with incomplete private information communicate to achieve a common goal (vogel2013implicatures; he2017learning; khani2018planning). Taboo is a variant of referential games, where the target object is a word. (2) Persuasion is a game where multiple agents with conflicting opinions persuade each other (prakken2006formal) or an audience (amgoud2013axiomatic). (3) Negotiation requires agents to divide items with different values based on conversation (sadri2001dialogues; lewis2017deal; he2018decoupling). (4) Avalon (see https://en.wikipedia.org/wiki/The_Resistance_(game)) and Werewolf (see https://en.wikipedia.org/wiki/Werewolf_(social_deduction_game)) are two popular role-playing language games, where the attackers seek to disrupt the defenders, while the defenders need to identify the hidden attackers among them. (5) In Who Is the Spy, one agent (the spy) is assigned a word, while the remaining agents, who are assigned a different but similar word, aim to identify the spy through language interactions.

Despite the efforts devoted to some of these language games, many of them, especially adversarial language games, remain unexplored in NLP. Since different advanced language skills are required in different games, we expect that investigating these language games in NLP will both promote and benchmark the development of advanced natural language intelligence.

5 Related Work

Pragmatics. Pragmatics is the study of language meaning in the interactional context (mey2001pragmatics); it is a critical subfield of linguistics and plays an important role in language teaching (kasper2001pragmatics). We call Adversarial Taboo a pragmatics game because it requires the attacker and the defender to use advanced language skills to mislead the other player and to uncover the other player's intention from their interaction.

Most existing NLP work focuses on syntax and semantics, which concern analyzing sentence structures and understanding word meanings, but ignores the impact of pragmatics. Some works (vogel2013implicatures; smith2013learning; monroe2015learning; hawkins2015you; andreas-klein-2016-reasoning; khani2018planning) study the pragmatic reasoning ability of NLP models with pragmatics language games (krauss1964changes; clark1986referring; potts2012goal) and pragmatics theories such as the speech-act theory (searle1980speech) and the Rational Speech Act framework (golland-etal-2010-game; goodman2016pragmatic). However, the existing pragmatics games mainly focus on cooperation rather than competition, whereas competition and deception require higher-level language skills. In Adversarial Taboo, the attacker needs to hide its intention and induce the defender, and the defender needs to uncover the attacking intention to win the game, which is more challenging than understanding utterances and cooperating to achieve some goal.

Adversarial Attack. Adversarial attack (zhang2019adversarial) aims at finding adversarial examples for NLP models, either to evaluate model robustness (jia-liang-2017-adversarial; michel-etal-2019-evaluation) or to help train a robust model (wang-bansal-2018-robust; cheng-etal-2019-robust). Existing adversarial attack methods still focus on statically attacking NLP models with corrupted semantics and on enhancing model robustness in semantic understanding. In contrast, Adversarial Taboo challenges models with dynamic interactions between agents, and is intended to enhance the pragmatic language skills of models. cheng2019evaluating study adversarial learning in negotiation dialogues (lewis2017deal), where agents divide several items with different values based on conversation, so that each item is assigned to one agent. Although that game is adversarial and also takes the form of conversations, the interactions can be simplified into formal language, where each utterance (i.e., proposal and response) can be represented by several natural numbers indicating item allocation (sadri2001dialogues). In contrast, Adversarial Taboo is focused on natural language mastery with multiple complex language skills in open domains.

Dialogue Systems. Dialogue systems can be divided into two categories: goal-oriented systems and non-goal-oriented systems. Goal-oriented dialogue systems aim to assist users in accomplishing certain tasks (e.g., booking hotels or restaurants) (goddeau1996form; williams2013dialog; henderson2014second; cuayahuitl2015strategic; zhao2016towards), while non-goal-oriented dialogue systems (also known as chatbots) interact with humans naturally in open domains to provide entertainment, and typically generate responses by maximizing the likelihood of human responses (ritter2011data; banchs2012iris; li2016diversity; li2016persona; serban2016building; serban2017hierarchical; zhou2018emotional). To better approximate the real-world goals of dialogue agents in conversation, recent years have witnessed a rising interest in developing dialogue systems through goal-oriented interactions between agents (e.g., obtaining rewards in pragmatics games) (li2016deep; das2017learning; lewis2017deal). The game of Adversarial Taboo takes the form of conversations, and we adopt many settings from dialogue systems to define our task.

References