Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning

  • 2019-10-31 17:01:24
  • Arvind Neelakantan, Semih Yavuz, Sharan Narang, Vishaal Prasad, Ben Goodrich, Daniel Duckworth, Chinnadhurai Sankar, Xifeng Yan
  • 14

Abstract

Task-oriented dialog presents a difficult challenge encompassing multiple problems including multi-turn language understanding and generation, knowledge retrieval and reasoning, and action prediction. Modern dialog systems typically begin by converting conversation history to a symbolic object referred to as belief state by using supervised learning. The belief state is then used to reason on an external knowledge source whose result along with the conversation history is used in action prediction and response generation tasks independently. Such a pipeline of individually optimized components not only makes the development process cumbersome but also makes it non-trivial to leverage session-level user reinforcement signals. In this paper, we develop Neural Assistant: a single neural network model that takes conversation history and an external knowledge source as input and jointly produces both text response and action to be taken by the system as output. The model learns to reason on the provided knowledge source with weak supervision signal coming from the text generation and the action prediction tasks, hence removing the need for belief state annotations. In the MultiWOZ dataset, we study the effect of distant supervision, and the size of knowledge base on model performance. We find that the Neural Assistant without belief states is able to incorporate external knowledge information achieving higher factual accuracy scores compared to Transformer. In settings comparable to reported baseline systems, Neural Assistant when provided with oracle belief state significantly improves language generation performance.

 


Arvind Neelakantan, Google [1]
Semih Yavuz, Salesforce [1] [2]
Sharan Narang, Google [1]
Vishaal Prasad, Google
Ben Goodrich, Google [2]
Daniel Duckworth, Google
Chinnadhurai Sankar, Montreal Institute for Learning Algorithms [2]
Xifeng Yan, UC Santa Barbara

[1] Equal contribution.
[2] Work done when all authors were at Google.

1 Introduction

User: Hello, please find me a restaurant?
Response: Sure, what type of food are you looking for?
User: I feel like eating mediterranean food.
Response: There are two restaurants in the moderate price range, la mimosa and shiraz restaurant. Do you have a preference?
User: No.
Response: La Mimosa is in the south part of town in the moderate price range.
User: Sounds good. Please book a table on Monday at 19:30 for 4 people.
Response: I have booked you a table for 4 at La Mimosa on Monday at 19:30. Your reference number is VRZRPRCM. Is there anything else I can help you with?
Action:   restaurant-book(people=4, time=19:30, day=monday)
User: No, thanks. Have a good day.
Response: You’re welcome. Have a great day!
Figure 1: Example conversation to book a restaurant with the trained Neural Assistant model. In addition to the conversation history, the model attends to a knowledge base containing 5K triples to incorporate external knowledge.

Task-oriented dialog systems have been a longstanding goal of artificial intelligence. A single turn of the task-oriented dialog problem can be formalized as mapping conversation history containing both user and system utterances to system action and system text response (we ignore speech-to-text and text-to-speech components in this work). Along with the conversation history, task-oriented systems have access to an external knowledge source (knowledge bases, documents, etc.) relevant to the task that is being completed. For example, a knowledge base containing restaurants and their attributes is typically provided to a food ordering dialog system.

Dialog systems, from the early rule-based expert systems Weizenbaum (1966) to present-day commercially available virtual assistants like Apple Siri, Amazon Alexa, and Google Assistant, rely on a pipeline containing many components. Having such a pipeline seems unavoidable given that task-oriented dialog encompasses multiple problems including multi-turn language understanding and generation, knowledge retrieval and reasoning, and action prediction. Dialog systems typically begin by converting conversation history to belief state by using supervised learning Henderson et al. (2013); Rastogi et al. (2017); Mrkšić et al. (2017); Wen et al. (2017). The belief state is then used to reason on an external knowledge source whose result, along with the conversation history, is used in action prediction and response generation tasks independently. However, relying on a pipeline of individually optimized components makes these systems hard to scale. Moreover, the success of consumer-facing systems relies on efficient incorporation of user reinforcement signals, which is non-trivial for a pipeline system.

End-to-end deep learning methods have recently enjoyed much success over pipeline systems in many tasks such as image recognition, speech recognition, and machine translation Lecun et al. (2015). Such methods have been applied to task-oriented dialog only in a limited way. For example, Rojas-Barahona et al. (2017) use a separate deep neural network trained independently for every individual component. Bordes et al. (2017) attend to a small knowledge base but do not have a generative model for text response generation. A major difficulty has been efficiently incorporating external (structured or unstructured) knowledge into action prediction and text response generation models. In this paper, we develop Neural Assistant: a single neural network model that takes conversation history and an external knowledge source as input and jointly produces both text response and action to be taken by the system as output. The model learns to reason on the provided knowledge source with weak supervision signal coming from the text generation and the action prediction tasks, hence removing the need for belief state annotations.

We evaluate our approach on the MultiWOZ dataset Budzianowski et al. (2018). The dataset contains approximately 10,000 multi-turn dialogs between users and wizards. Along with conversations, the dataset contains both belief state and dialog act (or semantic parse) annotations. We only predict belief state annotations that correspond to action prediction, and we remove from the dataset the belief state annotations that are used for accessing the knowledge base. We do not use the dialog act annotations in our study. Figures 1, 4 and 5 show example conversations with the Neural Assistant model. We study the effect of distant supervision and of knowledge base size on model performance. We find that the Neural Assistant without belief states is able to incorporate external knowledge information, achieving higher factual accuracy scores compared to Transformer. In settings comparable to reported baseline systems, Neural Assistant when provided with oracle belief state significantly improves language generation performance. Even with a weakly labeled knowledge base, our system comes very close to the quality of the baseline belief state system.

2 Neural Assistant

Figure 2: We formulate the task-oriented dialog problem as taking conversation history along with a relevant knowledge base (KB) as input, and generating system action and the assistant’s next turn text response as output. Here we show two examples of the expected Assistant response. In the first turn no system action is taken, but in the second turn a system action is taken as all the necessary information is available. Note that only some of the triples provided in the KB are relevant to the conversation.

We formulate the task-oriented dialog problem as taking conversation history along with a relevant knowledge base (KB) as input, and generating system action and the assistant’s next turn text response as output (Figure 2). For example, the conversation history could contain a single turn of user utterance "find me an inexpensive Italian restaurant in San Francisco," and one possible next turn assistant response could be "how about The Great Italian?" Here, the external knowledge required to generate the output would be present in the provided KB. A common way to store such facts is in triple format, e.g. in this case the KB could contain (The Great Italian, type, restaurant), (The Great Italian, cuisine, Italian), (The Great Italian, price, cheap) and so on. Given the above two utterances, the user might say "sounds good, can you book a table for 4 at 7pm?", for which the assistant performs a system action book_table(name=The Great Italian, num_seats=4, time=7pm), and generates a text response "Done!"

Neural Assistant learns to directly map the conversation history and KB to next system action and text response without any intermediate symbolic states or intermediate supervision signals. We first begin by introducing notation, then we describe the model architecture and the training objective.
Conversation History consists of alternating user and assistant turns. Let $((u_1,a_1),(u_2,a_2),\ldots,(u_U,a_U))$ denote the conversation history containing $U$ turns, each consisting of a user utterance ($u_i$) and an assistant utterance ($a_i$). The user and assistant turns each contain a variable number of word tokens.
Knowledge Base: We assume the external knowledge required to solve the task is provided. While it is possible to leverage both structured and unstructured knowledge in our framework, in this work we consider external knowledge in the form of a structured KB containing a list of triples. Let $K=((e_{11},r_1,e_{12}),(e_{21},r_2,e_{22}),\ldots,(e_{M1},r_M,e_{M2}))$ be the list of $M$ triples in the provided KB.
Output consists of both the system action and text response.
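To make the input and output formats concrete, the following is a minimal sketch of how one training example might be serialized as text. The delimiter strings (`<user>`, `<assistant>`, `<action>`, `<response>`), the `none` placeholder for turns without an action, and the ordering of action before response are all illustrative assumptions; the text above only states that the output consists of both the system action and the text response.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical delimiter tokens, chosen for illustration only.
USER_TAG = "<user>"
ASSISTANT_TAG = "<assistant>"
ACTION_TAG = "<action>"
RESPONSE_TAG = "<response>"


def format_history(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Flatten (user, assistant) turns into one delimited input string."""
    parts = []
    for user, assistant in turns:
        parts.append(f"{USER_TAG} {user}")
        if assistant is not None:  # the latest user turn has no reply yet
            parts.append(f"{ASSISTANT_TAG} {assistant}")
    return " ".join(parts)


def format_target(action_name: Optional[str], slots: Dict[str, str], response: str) -> str:
    """Serialize the system action and the text response as one output string."""
    if action_name is None:
        action = "none"  # assumed placeholder for "no system action"
    else:
        args = ", ".join(f"{key}={value}" for key, value in slots.items())
        action = f"{action_name}({args})"
    return f"{ACTION_TAG} {action} {RESPONSE_TAG} {response}"


history = format_history([
    ("find me an inexpensive Italian restaurant in San Francisco", "how about The Great Italian?"),
    ("sounds good, can you book a table for 4 at 7pm?", None),
])
target = format_target(
    "book_table",
    {"name": "The Great Italian", "num_seats": "4", "time": "7pm"},
    "Done!",
)
```

Treating both sides as plain strings means a single sequence model can be trained on them without a separate action classifier.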

2.1 Model and Training Objective

Figure 3: Neural Assistant model with attention to provided knowledge base. Transformer encoder consumes the conversation history (containing alternating user and assistant turns). The Transformer decoder generates the output sequence after performing decoder attention on the encoded conversation history and the knowledge base. Note that only some of the triples provided in the KB are relevant to the conversation and the model has to learn to pick them from weak supervision signal.

Neural Assistant is an extension of the Transformer Vaswani et al. (2017) encoder-decoder model. Our model additionally attends to the provided KB to incorporate external knowledge. We encode the knowledge triples separately (in parallel) and the decoder attends to the triples in addition to the input conversation history.

The Transformer encoder is used to consume the input conversation history. Let $x=(x_1,x_2,\ldots,x_P)$ be the concatenated conversation history (both assistant and user turns separated by delimiters) containing $P$ tokens. Then the encoder produces $P$ hidden states $h_1,h_2,\ldots,h_P$ after word embedding lookup and multiple self-attention layers. We represent each KB fact as an average of the word embeddings of the tokenized triple. We denote the representations of the triples $K=((e_{11},r_1,e_{12}),(e_{21},r_2,e_{22}),\ldots,(e_{M1},r_M,e_{M2}))$ by $v_1,v_2,\ldots,v_M$.
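The triple representation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the toy embedding table, and the choice to skip out-of-vocabulary tokens are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def tokenize(text: str) -> List[str]:
    # Hypothetical lowercase whitespace tokenizer, used only for this sketch.
    return text.lower().split()


def embed_triple(triple: Triple, table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the word embeddings of every token in (entity, relation, entity)."""
    tokens = [tok for part in triple for tok in tokenize(part)]
    vectors = [table[tok] for tok in tokens if tok in table]  # skip unknown tokens (assumption)
    if not vectors:
        return [0.0] * dim
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]


# Toy 2-dimensional embedding table (made-up values).
table = {
    "la": [1.0, 0.0],
    "mimosa": [0.0, 1.0],
    "food": [1.0, 1.0],
    "mediterranean": [2.0, 2.0],
}
v = embed_triple(("La Mimosa", "food", "mediterranean"), table, dim=2)
```

Because each triple is embedded independently of the others, all $M$ representations can be computed in parallel.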

The Transformer decoder, which contains both self-attention and encoder-decoder attention layers, generates the output sequence consisting of both the system action and text response one token at a time, left-to-right. We tokenize the system action with the text tokenizer and generate a concatenated version of system action and text response as one long sequence. While the encoder-decoder attention layers in Transformer Vaswani et al. (2017) only attend to the input (conversation history), we modify the Transformer decoder so that it attends to both the encoder hidden states of the conversation history and the representations of the fact triples (Figure 3). So, the decoder attention heads attend to the set $[h_1,\ldots,h_P,v_1,\ldots,v_M]$. In previous work with Transformer, the decoder attends only to $[h_1,\ldots,h_P]$.
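As a rough illustration of attending to the joint memory of history states and triple vectors, here is a single-head scaled dot-product attention over the concatenated list. It is a simplified sketch: keys and values are the raw memory vectors, there are no learned projections or multiple heads, and all numbers are made up.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one decoder query over all memory slots."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, slot)) / scale for slot in memory]
    weights = softmax(scores)
    context = [
        sum(w * slot[d] for w, slot in zip(weights, memory))
        for d in range(len(query))
    ]
    return weights, context


history_states = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for h_1..h_P
triple_vectors = [[1.0, 1.0]]              # stand-ins for v_1..v_M
memory = history_states + triple_vectors   # the joint set [h_1..h_P, v_1..v_M]
weights, context = attend([1.0, 1.0], memory)
```

The only change relative to a standard encoder-decoder attention in this sketch is the concatenation on the `memory` line; the softmax then normalizes over history tokens and KB triples together.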

Let $y=(y_1,y_2,\ldots,y_T)$ denote the target sequence. We model the target sequence distribution as

$$P_{\text{gen}}(y \mid x, K) = \prod_{t=1}^{T} p_\theta(y_t \mid y_{1:t-1}, x, K). \tag{1}$$

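Given a training set of $N$ examples $((x^1,K^1,y^1),(x^2,K^2,y^2),\ldots,(x^N,K^N,y^N))$, the objective function to be maximized is given by

$$\mathcal{L}_{\text{gen}}(\theta) = \sum_{i=1}^{N} \sum_{t=1}^{T_i} \log p_\theta(y_t^i \mid y_{1:t-1}^i, x^i, K^i). \tag{2}$$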

We use teacher-forcing (Williams and Zipser, 1989) where the model conditions on ground-truth previous tokens in the output and ground-truth previous assistant turns in the conversation history.

2.2 Distant Supervision

We adopt a technique called distant supervision Mintz et al. (2009), widely used in knowledge base construction research. At train time, we (weakly) label a fact $(e_1,r,e_2)$ in the KB as positive if some word in its entities $e_1$ or $e_2$ is present in the ground-truth target sequence. This weak supervision signal could potentially guide the decoder attention to the KB described above.
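The weak labeling rule can be sketched as a word-overlap check between a triple's two entities and the target sequence. Lowercasing and whitespace tokenization are assumptions made for this sketch; the text above does not specify tokenization or any stop-word handling.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def weak_labels(kb: List[Triple], target: str) -> List[int]:
    """Label a triple 1 if any word of its head or tail entity occurs in the target."""
    target_words = set(target.lower().split())
    labels = []
    for head, _relation, tail in kb:
        entity_words = set(head.lower().split()) | set(tail.lower().split())
        labels.append(1 if entity_words & target_words else 0)
    return labels


kb = [
    ("La Mimosa", "pricerange", "moderate"),
    ("Pizza Hut Cherry Hinton", "area", "south"),
]
target = "la mimosa is in the moderate price range"
labels = weak_labels(kb, target)
```

Matching on single words is deliberately permissive, which is why the resulting labels are noisy: a triple can be marked positive because of one shared common word.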

The distant supervision objective to be maximized is given by

$$\mathcal{L}_{d}(\theta) = \sum_{i=1}^{N} \sum_{m=1}^{M_i} \Big( \log q_m \, \mathbb{1}[y_m = 1] + \log(1-q_m) \, \mathbb{1}[y_m = 0] \Big) \tag{3}$$

where $q_m$ is the attention probability of the $m$-th triple, and $y_m$ is an indicator variable that is set to 1 if some word in the entities of the triple is present in the ground-truth target sequence and 0 otherwise. The model now maximizes an interpolation of the two objective functions in Equation 2 and Equation 3, given by

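$$\mathcal{L}_{\text{final}}(\theta) = \alpha \, \mathcal{L}_{\text{gen}}(\theta) + (1-\alpha) \, \mathcal{L}_{d}(\theta), \tag{4}$$

where $\alpha \in [0,1]$ is a weighting term tuned on the development set.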
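A small numeric sketch of Equations 3 and 4 for a single example may help. The attention probabilities, labels, generation log-likelihood, and the clamping constant `eps` are illustrative assumptions; in particular, how the attention probability of a triple is read off the decoder is not specified in this section.

```python
import math
from typing import List


def distant_supervision_term(attention: List[float], labels: List[int], eps: float = 1e-9) -> float:
    """Equation 3 for one example: log q_m for positives, log(1 - q_m) for negatives."""
    total = 0.0
    for q, y in zip(attention, labels):
        q = min(max(q, eps), 1.0 - eps)  # clamp to avoid log(0); an added safeguard
        total += math.log(q) if y == 1 else math.log(1.0 - q)
    return total


def final_objective(gen_log_likelihood: float, ds_term: float, alpha: float) -> float:
    """Equation 4: interpolate the generation and distant supervision objectives."""
    return alpha * gen_log_likelihood + (1.0 - alpha) * ds_term


ds = distant_supervision_term([0.5, 0.5], [1, 0])
value = final_objective(gen_log_likelihood=-2.0, ds_term=ds, alpha=0.5)
```

Setting `alpha=1.0` recovers the pure generation objective of Equation 2, which corresponds to training without distant supervision.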

3 Related Work

In past work, dialog systems have generally relied on pipeline systems Singh et al. (2002); Levin and Pieraccini (2000). Deep learning has been applied to task-oriented dialog in many recent studies Henderson et al. (2013); Wen et al. (2015); Williams et al. (2017); Rastogi et al. (2017); Mrkšić et al. (2017); Wen et al. (2017); Bordes et al. (2017). One line of work has been on using deep learning to predict belief states using supervised learning Henderson et al. (2013); Rastogi et al. (2017). The other line of work makes use of pipelines consisting of many components each represented as a neural network trained independently Wen et al. (2015); Mrkšić et al. (2017); Wen et al. (2017); Eric and Manning (2017).

The line of work closest to ours is the use of memory networks Weston et al. (2015); Sukhbaatar et al. (2015) for task-oriented dialog Bordes et al. (2017); Pere and Liu (2017); Henderson et al. (2017). While all these works incorporate an external knowledge source directly into text response generation, they do not employ a generative model for response generation, and instead rely on selecting a response from a list of candidate responses, which is impractical in real-world settings. More recently, Wu et al. (2019) use a generative model instead of a text classification model, but they, along with previous work Bordes et al. (2017); Pere and Liu (2017); Henderson et al. (2017), work with much smaller knowledge bases where, unlike in our case, full softmax attention over the knowledge base is computationally feasible. Also, they do not generate the text response and system action jointly in a single model.

Other kinds of dialog tasks have also been tackled by deep learning. This line of work has predominantly been in the chit-chat setting where generative deep learning models are used to generate text responses Vinyals and Le (2015); Serban et al. (2017); Li et al. (2016). More recent work has extended this line of work to language based negotiation games Lewis et al. (2017) and dialog systems with persona Zhang et al. (2018).

4 Experiments

We evaluate our method on the MultiWOZ Budzianowski et al. (2018) dataset. The dataset contains close to 8,000 training examples and 1,000 examples in each of the validation and test sets. We report results on the test set in the tables below. The dataset includes an associated knowledge base containing 28,483 triples. To evaluate the performance of different methods, we use F-1 score for action prediction (Action F-1) and BLEU score for text response generation. Apart from BLEU score, which primarily measures fluency, we also report Entity F-1 score, which is an approximate metric for the factualness of the text response. We get the list of entities mentioned in the ground-truth response and compare it to the list of entities in the model prediction. We use exact string match to get the list of entities. Our models are implemented in the Tensor2Tensor Vaswani et al. (2018) framework. All models are trained for 50k steps. Due to the small size of the dataset, we use the tiny Transformer hyper-parameter setting in Tensor2Tensor. Unless otherwise stated, the Neural Assistant is trained without the distant supervision objective.
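The Entity F-1 metric described above can be sketched as follows. This is one plausible reading rather than the exact evaluation script: it assumes a list of known entity names, finds them by case-insensitive substring match, scores a single response pair, and returns 1.0 when neither response mentions any entity. How scores are aggregated over the test set is not specified in the text.

```python
from typing import Iterable, Set


def find_entities(text: str, entity_names: Iterable[str]) -> Set[str]:
    """Return the known entity names that appear verbatim in the text."""
    lowered = text.lower()
    return {name for name in entity_names if name.lower() in lowered}


def entity_f1(reference: str, prediction: str, entity_names: Iterable[str]) -> float:
    names = list(entity_names)
    gold = find_entities(reference, names)
    pred = find_entities(prediction, names)
    if not gold and not pred:
        return 1.0  # assumption: nothing to get wrong
    overlap = len(gold & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


names = ["la mimosa", "shiraz restaurant"]
score = entity_f1(
    "la mimosa and shiraz restaurant are both moderate",
    "la mimosa is in the south",
    names,
)
```

In this toy case the prediction mentions one of the two reference entities, so precision is 1, recall is 0.5, and the F-1 score is 2/3.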

Figures 1, 4 and 5 show example conversations in which the Neural Assistant model completes a task in real time. Note that the model is trained at turn level, where the dialog history fed into the model as input consists of the previous ground-truth turns of the dialog example. The model is not exposed at training time to text responses it generated in previous turns. However, in the conversations in the figures, the text responses generated by the model itself are used as the assistant's side of the dialog history and fed as input to the model for generating text responses and actions in the following turns of the dialog.

4.1 Results

First, we benchmark the Transformer model on belief state prediction and text generation problems to compare with the results reported in Budzianowski et al. (2018). The Transformer baseline models only take the conversation history as input. They skip the KB and do not use oracle belief state annotations. The text generation results are in Table 1. We also treat belief state prediction as a sequence-to-sequence problem and achieve a 72.9 F-1 score on belief state prediction, which is significantly higher than the 63.8 F-1 score of the baseline system.

Next, we report results on the Neural Assistant model. We evaluate our framework in increasingly harder settings by gradually increasing the size of the external knowledge source to be incorporated by the model. To begin with, as done in Budzianowski et al. (2018), we include oracle belief state annotations, which reduces the size of the KB to be considered for a given input to fewer than 10 triples. As shown in Table 1, the Neural Assistant model achieves a BLEU score of 25.71, significantly higher than the baseline system Budzianowski et al. (2018), which gets an 18.9 BLEU score. Since the oracle belief states are provided to the model, we do not evaluate the Entity F-1 and Action F-1 scores for this setting. Then we make the setting slightly harder: the model consumes only weakly labeled positive triples from distant supervision (Section 2.2). Here, the size of the KB to be considered is around 50 triples per example. Even with a weakly labeled knowledge base, our system comes very close to the quality of the baseline belief state system.

| Model | BLEU | Action F-1 | Entity F-1 |
|---|---|---|---|
| System with Oracle Belief State, Budzianowski et al. (2018) | 18.9 | N/A | N/A |
| Transformer | 14.1 | 90.0 | 40.0 |
| Neural Assistant (oracle triples) | 25.71 | N/A | N/A |
| Neural Assistant (weakly labeled positive triples) | 17.9 | 90.8 | 90.9 |

Table 1: Comparison of Neural Assistant with other baselines.

4.2 Neural Assistant with Large Knowledge Base

Now, we study the extent to which Neural Assistant models can handle large KBs. We get the set of weakly labeled positive triples for every example and fill up the rest of the KB with randomly sampled negative examples, both at train and test time. The goal of this experiment is to study the effect of KB size on Neural Assistant performance. Another way to look at this experiment is as a study of the extent to which our model can tolerate the errors of a retrieval system. The performance of Neural Assistant on different KB sizes is shown in Table 2. The BLEU and Entity F-1 scores for Neural Assistant decrease as the KB size increases. The model is able to incorporate external knowledge effectively as long as the KB size is 2000 triples or smaller. Beyond that, the Entity F-1 score degrades quite rapidly. We also study the effect of distant supervision, discussed in Section 2.2, as an additional training objective on Neural Assistant performance. Our experiments show that in some cases, but not all, distant supervision helps the model achieve better performance, particularly a higher Entity F-1 score. Finally, we report results from using the entire KB at test time with a model that is trained with 5,000 triples at train time without distant supervision. In this setting, the Entity F-1 score is quite low, indicating that the model is not able to select the relevant entities from the knowledge base at test time. The model cannot consume the entire KB at train time as it runs out of memory on ML accelerators.
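The per-example KB construction used in this experiment can be sketched as below. The function name, the toy triples, the truncation when there are more positives than slots, and the final shuffle are assumptions made for illustration; the text only states that the weakly labeled positives are kept and the rest of the KB is filled with randomly sampled negatives.

```python
import random
from typing import List, Tuple

Triple = Tuple[str, str, str]


def build_example_kb(
    positives: List[Triple],
    full_kb: List[Triple],
    kb_size: int,
    rng: random.Random,
) -> List[Triple]:
    """Keep the weakly labeled positives and pad with random negatives up to kb_size."""
    positive_set = set(positives)
    kept = positives[:kb_size]  # truncate if there are more positives than slots (assumption)
    candidates = [t for t in full_kb if t not in positive_set]
    n_negatives = min(kb_size - len(kept), len(candidates))
    kb = kept + rng.sample(candidates, n_negatives)
    rng.shuffle(kb)  # avoid positives always appearing first (assumption)
    return kb


# Toy KB with made-up triples.
full_kb = [(f"restaurant {i}", "area", "south") for i in range(6)]
positives = full_kb[:2]
example_kb = build_example_kb(positives, full_kb, kb_size=4, rng=random.Random(0))
```

Growing `kb_size` while holding the positives fixed lowers the fraction of relevant triples, which mimics a retrieval system with decreasing precision.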

| Size of KB | BLEU (without DS) | Action F-1 (without DS) | Entity F-1 (without DS) | BLEU (with DS) | Action F-1 (with DS) | Entity F-1 (with DS) |
|---|---|---|---|---|---|---|
| 100 | 16.7 | 87.6 | 74.0 | 17.0 | 87.2 | 74.1 |
| 2000 | 15.5 | 87.9 | 65.0 | 15.4 | 86.3 | 65.5 |
| 5000 | 14.3 | 86.2 | 59.2 | 14.4 | 85.7 | 59.0 |
| 8000 | 13.9 | 85.7 | 55.5 | 14.0 | 88.6 | 54.9 |
| 12000 | 14.3 | 85.5 | 50.3 | 13.7 | 87.7 | 50.2 |
| 13000 | 13.7 | 88.1 | 48.0 | 13.8 | 86.7 | 49.2 |
| 28483 (at test) | 13.5 | 90.2 | 42.9 | - | - | - |

Table 2: We compare the performance of Neural Assistant with and without distant supervision (DS, Section 2.2). The performance of the Neural Assistant model drops as the number of negative examples (Size of KB) increases. Neural Assistant is able to incorporate external knowledge information especially when the size of KB is not too large.

5 Conclusion

We introduced Neural Assistant, a single neural network model that takes conversation history and an external knowledge source as input and jointly produces both text response and action to be taken by the system as output. The model learns to reason on the provided knowledge source with weak supervision signal coming from the text generation and the action prediction tasks, thereby removing the need for belief state annotations. Future work involves developing techniques that can effectively handle much larger KBs.

User: Hi there, can you find me a place to stay?
Response: Sure, do you have a price range in mind?
User: I prefer a place with moderate price.
Response: There are several in the moderate price range. Do you have a preference?
User: I would prefer it to be in the south.
Response: I have two guesthouses in the south in the moderate price range. Would you like to book one of them?
User: Yes, please.
Response: I’d be happy to book you a room at aylesbray lodge guest house, and bridge guest house. Which one do you prefer?
User: bridge guest house sounds good.
Response: I can book that for you. How many people will be staying and how many nights?
User: 4 people will be staying for 2 nights starting on this friday.
Response: Booking was successful. Reference number is : GBHPGZH.
Action:   hotel-book(stay=2,people=4,day=friday)
User: That was quick, thank you Neural Assistant!
Response: You’re welcome. Is there anything else I can help you with?
User: No, thanks. Have good week!
Response: You’re welcome. Have a great day.
Figure 4: Example conversation to book a hotel with the trained Neural Assistant model. In addition to the conversation history, the model attends to a knowledge base containing 5K triples to incorporate external knowledge.

User: Hello, can you find me a hotel to stay?
Response: Sure, do you have a price range in mind?
User: Yes, somewhere in the moderate price range.
Response: There are several in the moderate price range. Do you have a preference on area?
User: Yes, how about somewhere in the north?
Response: I have two options for you. The Ashley Hotel and Lovell Lodge. Would you like to book one of these?
User: Yes.
Response: What day would you like to start your stay?
User: Can you book it for 1 people starting on Thursday for 2 night?
Response: Booking was successful. Reference number is : FKQVSY9Y.
Action:   hotel-book(stay=2,people=1,day=thursday)
User: Thank you!
Response: You’re welcome. Is there anything else I can help you with?
User: Can you also find me a nice restaurant?
Response: Sure, do you have a food preference?
User: Yes, I prefer Italian food.
Response: There are two Italian restaurants in the south area. One is in the moderate price range and one is in the centre. Do you have a preference?
User: I prefer the one in moderate price range.
Response: Pizza Hut Cherry Hinton is in the south part of town and is in the moderate price range. Would you like to book a table?
User: Yes, please book a table for 1 person at 20:30.
Response: I’m sorry, that time is not available. Would you like to try another time or day?
Action:   restaurant-book(people=1,time=20:30,day=thursday)
User: Try 19:30.
Response: I was able to book you at Pizza Hut Cherry Hinton. Your reference number is FG1KWP5. Is there anything else I can help you with?
User: No, thanks. Have a good night.
Response: Thank you for using our service. Have a great day.
Figure 5: Example conversation to book a hotel and a restaurant in a single conversation with the trained Neural Assistant model. In addition to the conversation history, the model attends to a knowledge base containing 5K triples to incorporate external knowledge.

References

  • A. Bordes, Y. Boureau, and J. Weston (2017) Learning end-to-end goal-oriented dialog. ICLR. Cited by: §1, §3, §3.
  • P. Budzianowski, T. Wen, B. Tseng, I. Casanueva, S. Ultes, O. Ramadan, and M. Gasic (2018) MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. EMNLP. Cited by: §1, §4.1, §4.1, Table 1, §4.
  • M. Eric and C. Manning (2017) A copy-augmented sequence-to-sequence architecture gives good performance on task-oriented dialogue. EACL. Cited by: §3.
  • M. Henderson, B. Thomson, and S. Young (2013) Deep neural network approach for the dialog state tracking challenge. SIGDIAL. Cited by: §1, §3.
  • M. Henderson, B. Thomson, and S. Young (2017) Query-reduction networks for question answering. Cited by: §3.
  • Y. Lecun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature. Cited by: §1.
  • E. Levin and R. Pieraccini (2000) A stochastic model of computer-human interaction for learning dialogue strategies. IEEE Transactions on Speech and Audio Processing. Cited by: §3.
  • M. Lewis, D. Yarats, Y. N. Dauphin, D. Parikh, and D. Batra (2017) Deal or no deal? end-to-end learning for negotiation dialogues. EMNLP. Cited by: §3.
  • J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan (2016) A diversity-promoting objective function for neural conversation models. NAACL. Cited by: §3.
  • M. Mintz, S. Bills, R. Snow, and D. Jurafsky (2009) Distant supervision for relation extraction without labeled data. ACL. Cited by: §2.2.
  • N. Mrkšić, D. Ó Séaghdha, T. Wen, B. Thomson, and S. Young (2017) Neural belief tracker: data-driven dialogue state tracking. ACL. Cited by: §1, §3.
  • J. Pere and F. Liu (2017) Gated end-to-end memory networks. ACL. Cited by: §3.
  • A. Rastogi, D. Hakkani-Tur, and L. Heck (2017) Scalable multi-domain dialogue state tracking. Proceedings of IEEE ASRU. Cited by: §1, §3.
  • L. M. Rojas-Barahona, M. Gasic, N. Mrksic, P. Su, S. Ultes, T. Wen, S. J. Young, and D. Vandyke (2017) A network-based end-to-end trainable task-oriented dialogue system. EACL. Cited by: §1.
  • I. V. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. Courville, and Y. Bengio (2017) A hierarchical latent variable encoder-decoder model for generating dialogues. AAAI. Cited by: §3.
  • S. Singh, D. Litman, M. Kearns, and M. Walker (2002) Optimizing dialogue management with reinforcement learning: experiments with the NJFun system. Journal of Artificial Intelligence Research. Cited by: §3.
  • S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus (2015) End-to-end memory networks. NeurIPS. Cited by: §3.
  • A. Vaswani, S. Bengio, E. Brevdo, F. Chollet, A. N. Gomez, S. Gouws, L. Jones, Ł. Kaiser, N. Kalchbrenner, N. Parmar, R. Sepassi, N. Shazeer, and J. Uszkoreit (2018) Tensor2Tensor for neural machine translation. CoRR. Cited by: §4.
  • A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. NeurIPS. Cited by: §2.1, §2.1.
  • O. Vinyals and Q. V. Le (2015) A neural conversational model. CoRR. Cited by: §3.
  • J. Weizenbaum (1966) ELIZA a computer program for the study of natural language communication between man and machine. Computation Linguistics. Cited by: §1.
  • T. Wen, M. Gasic, N. Mrkšić, P. Su, D. Vandyke, and S. Young (2015) Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. EMNLP. Cited by: §3.
  • T. Wen, Y. Miao, P. Blunsom, and S. J. Young (2017) Latent intention dialogue models. ICML. Cited by: §1, §3.
  • J. Weston, S. Chopra, and A. Bordes (2015) Memory networks. ICLR. Cited by: §3.
  • J. D. Williams, K. Asadi, and G. Zweig (2017) Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning. ACL. Cited by: §3.
  • R. J. Williams and D. Zipser (1989) A learning algorithm for continually running fully recurrent neural networks. Neural computation. Cited by: §2.1.
  • C. Wu, R. Socher, and C. Xiong (2019) Global-to-local memory pointer networks for task-oriented dialogue. ICLR. Cited by: §3.
  • S. Zhang, E. Dinan, J. Urbanek, A. Szlam, D. Kiela, and J. Weston (2018) Personalizing dialogue agents: I have a dog, do you have pets too?. ACL. Cited by: §3.