Building an Application Independent Natural Language Interface

  • 2019-10-30 18:57:28
  • Sahisnu Mazumder, Bing Liu, Shuai Wang, Sepideh Esmaeilpour
  • 11

Abstract

Traditional approaches to building natural language (NL) interfaces typicallyuse a semantic parser to parse the user command and convert it to a logicalform, which is then translated to an executable action in an application.However, it is still challenging for a semantic parser to correctly parsenatural language. For a different domain, the parser may need to be retrainedor tuned, and a new translator also needs to be written to convert the logicalforms to executable actions. In this work, we propose a novel and applicationindependent approach to building NL interfaces that does not need a semanticparser or a translator. It is based on natural language to natural languagematching and learning, where the representation of each action and each usercommand are both in natural language. To perform a user intended action, thesystem only needs to match the user command with the correct actionrepresentation, and then execute the corresponding action. The system alsointeractively learns new (paraphrased) commands for actions to expand theaction representations over time. Our experimental results show theeffectiveness of the proposed approach.

 

Quick Read (beta)

Building an Application Independent Natural Language Interface

Sahisnu Mazumder,  Bing Liu,  Shuai Wang,  Sepideh Esmaeilpour
Department of Computer Science, University of Illinois at Chicago, USA
[email protected],[email protected]
[email protected],[email protected]

Abstract

Traditional approaches to building natural language (NL) interfaces typically use a semantic parser to parse the user command and convert it to a logical form, which is then translated to an executable action in an application. However, it is still challenging for a semantic parser to correctly parse natural language. For a different domain, the parser may need to be retrained or tuned, and a new translator also needs to be written to convert the logical forms to executable actions. In this work, we propose a novel and application independent approach to building NL interfaces that does not need a semantic parser or a translator. It is based on natural language to natural language matching and learning, where the representation of each action and each user command are both in natural language. To perform a user intended action, the system only needs to match the user command with the correct action representation, and then execute the corresponding action. The system also interactively learns new (paraphrased) commands for actions to expand the action representations over time. Our experimental results show the effectiveness of the proposed approach.

Building an Application Independent Natural Language Interface


Sahisnu Mazumder,  Bing Liu,  Shuai Wang,  Sepideh Esmaeilpour Department of Computer Science, University of Illinois at Chicago, USA [email protected],[email protected] [email protected],[email protected]

1 Introduction

Existing techniques (Zelle and Mooney, 1996; Artzi and Zettlemoyer, 2013; Andreas and Klein, 2015; Zettlemoyer and Collins, 2012; Li and Rafiei, 2018) for building natural language interfaces (NLIs) often use a semantic parser to parse the natural language (NL) command from the user and convert it to a logical form (LF) and then, translate LF into an executable action in the application. This approach has several limitations: (1) it is still very challenging for a semantic parser to correctly parse natural language. (2) For different applications, the parser may need to be retrained/tuned with an application domain corpus. (3) For each application, a different translator is needed to convert the logic forms into executable actions. (4) Due to the last two limitations, it is difficult to build an application-independent system.

This work proposes a novel approach to NLI design, called Natural Language to Natural Language (NL2NL). This approach works in the following setting: Let the underlying application has a finite set of executable actions/functions that the user can use to accomplish his/her goal. These actions or functions for end users are commonly defined by the application as Application Programming Interfaces (APIs). Most applications work in this setting as providing APIs is a standard approach. For each API, the proposed approach attaches a natural language representation of it, which is a set of one or more API seed commands (ASCs) written in natural language (e.g., by the application developer) just like a natural language command from the user to invoke the API. The only difference is that the objects to be acted upon in each ASC are replaced with variables, which indicate the arguments of the API. These variables will be instantiated to actual objects/arguments in a real user command. Let’s see an example.

Consider an application like Microsoft Paint and an API like drawCircle(X1,X2) (drawing a circle having color X1 at coordinate X2). One ASC for this API can be “draw a X1 circle at X2”. Clearly, X1 and X2 are arguments of the API and represented as variables in the ASC. When the user gives a natural language command, the system simply matches the command with one of the ASCs and in doing so, also instantiates the objects (i.e., API arguments referred in the user command) for the associated API to be executed. For example, the user command “draw a blue circle at (20, 40)” can be grounded to the aforementioned ASC, where the grounded API arguments are X1=blue’ and X2=(20, 40). Since both the user command and the ASCs are written in natural language, we call this approach natural language to natural language (NL2NL).

Since the user may use many different language expressions to express the same command, the matching process is still challenging. For example, given the ASC (‘draw a X1 circle at X2’), the user may say “insert a circle with color blue at (20, 40).” The matching algorithm may not be able to match them. Also, if the user says “move the circle with color blue to (20, 40)”, it should be grounded/matched to a different API [e.g., moveCircle(X1, X2)]. As the developer has to provide API seed commands (ASCs) for the APIs of the underlying application, it can be hard for the developer to write all possible paraphrased commands for a given API. This affects the coverage of the proposed NL2NL system in grounding user commands. To deal with this issue, we also propose an interactive learning mechanism to interact with the end user in natural language to learn new (paraphrased) commands and convert them to new ASCs and add them to the existing set of ASCs (the set enlarges) so that when similar commands are issued by this or other users in the future, the NL2NL approach can handle it. This enables continuous learning of new ASCs from users to improve the subsequent grounding performance.

The proposed system based on NL2NL is called CML (Command Matching and Learning). CML has three key advantages: (1) Due to the NL2NL matching, we no longer need a semantic parser to convert the user command to a logical form (or action formalism) or to write a program to translate the logical form to an API call. This makes the design of natural language interfaces much simpler and quick because ASCs are written in natural language (NL) just like user commands, and can be easily written by the application developer for the APIs of their application. (2) Again, as ASCs are written in NL, the matching algorithm of CML can be used for any application and thus is application-independent. CML simply maps a user command to a correct ASC, and the system executes the API attached to the ASC. (3) CML also learns new ASCs in the process of being used to make it more powerful. To our knowledge, no existing approach is NL2NL, application independent, or able to learn after deployment.

We evaluate the proposed system CML using two representative applications: (1) Blocks-World, and (2) Webpage Design. Experimental results show its effectiveness.

2 Related Work

Many existing works have been done on building natural language interfaces (NLI) for various applications. For robot navigation, (Artzi and Zettlemoyer, 2013) proposed a weakly supervised learning method to train semantic parsers for grounding navigation instructions. (Tellex et al., 2011) proposed a related method also for navigation and mobile manipulation. (Janner et al., 2018) worked on understanding spatial references in NL for NLIs. (Guu et al., 2017) learns a reinforcement learning (RL) based semantic parser with indirect supervision. (Andreas and Klein, 2015) gave a sequence-prediction based model for following NL instructions. (Fried et al., 2017) proposed an unified pragmatic model for instruction following. Other prominent works include NLI for scrutable robots (Garcia et al., 2018) and dialog agents for mobile robots to understand instructions through semantic parsing (Thomason et al., 2015).

In NLI for databases, (Zelle and Mooney, 1996) used inductive logic programming to construct an NLI for database querying. (Zettlemoyer and Collins, 2007) learns a weighted combinatory categorial grammar (CCG) for flight database queries. (Berant et al., 2013; Yih et al., 2015) proposed a semantic parser for question-answering using a knowledge base. Other prominent works on database querying are (Baik et al., 2019; Xiong and Sun, 2019; Neelakantan et al., 2016; Li et al., 2019; Ferré, 2017; Liang, 2016; Zhong et al., 2017) and data exploration and visual analysis (Setlur et al., 2016; Utama et al., 2018; Lawrence and Riezler, 2016; Gao et al., 2015). More information can be found in (Li and Rafiei, 2018).

For webpages and GUIs, (Branavan et al., 2010) proposed a RL-based solution for mapping high-level instructions to API calls. (Su et al., 2017) proposed an end-to-end approach to designing NLI for web APIs. A similar approach is also used in (Pasupat and Liang, 2015) for performing computations on web tables, (Soh, 2017; Pasupat et al., 2018) designed an NLI for submitting web forms and interacting with webpages and (Lin et al., 2018) proposed an NLI for Bash commands.

Besides these, there are works on selecting correct objects referenced in utterances (Frank and Goodman, 2012; Golland et al., 2010; Smith et al., 2013; Celikyilmaz et al., 2014), learning language games (Wang et al., 2016) and discovering commands in multimodal interfaces (Srinivasan et al., 2019).

All these approaches differ substantially from our work as they are based on learning semantic parsers of various kinds or end-to-end models with labeled examples. CML is mainly based on natural language to natural language (NL2NL) matching. Due to NL2NL matching, our approach presents an application-independent solution to NLI, in the sense that the application developer does not need to collect application-specific training examples to learn a parser and our matcher can work with any application. We only require the developer to write ASCs in natural language for their APIs, which is much easier to do than collecting labeled data and training a parser or an end-to-end model. Furthermore, CML can learn new ASCs in the usage process (after the system is deployed) to make it more and more powerful.

3 Proposed Approach

The proposed approach CML consists of four parts: (1) an ASC (API seed command) specification, (2) a utility constraint marker, (3) a command grounding module, and (4) an interactive ASC learner. ASC specification enables the application developer to specify a set of ASCs for each API of their application. The utility constraint marker identifies some sub-expressions in each ASC that utility ASCs should not be applied to. The command grounding module grounds a user command to an action ASC (both are in natural language) for an API call. The ASC learner learns new ASCs from the user to make the system more powerful. An natural language interface (NLI) is then built with no application specific programming or data collection required.

3.1 ASC Specification

Since ASCs are written in natural language, they can be specified by the application developer (without knowing how CML works). To achieve the goal of application-independent NLI design, the proposed CML has to automatically read and understand the ASCs for each application. We define an ASC specification for developers that can be followed to easily write ASCs for their applications.

The ASC specification consists of three parts: (1) properties and value domains of the application, (2) action ASC specification and (3) utility ASC specification. Let the set of actions that can be performed in the application be 𝒜. Each action ai𝒜 causes a change in the state of the (instantiated) objects in the application, specified by the object properties and their values. For example, in the Microsoft Paint application, a circle drawn on the editor is an example of instantiated object and it can have properties like color, name, shape, etc, with their values like the color of the circle being red. And examples of actions are: draw a circle, change the shape of the circle to make it a square, etc. Each of these actions is performed by an unique API call, referred to as an action API. The developer defines one or more action ASCs [in (paraphrased) natural language] for each action API, where the arguments of the API are variables in the corresponding action ASC (see the Introduction section).

Besides action APIs, there are also Utility APIs, which are used to retrieve information about the properties of the instantiated object and their values from the current application state and are used as helper functions to 𝒜. The concept of Utility API is explained as follows.

Often a natural language command from the user involves one or more referential expressions to instantiated objects, which need to be resolved to interpret the user command. A referential expression is a phrase used to refer an object indirectly by a property value. For example, in the command: “move the blue block to the left of the cube”, ‘blue block”, “cube”, “left of the cube” are referential expressions, where ‘blue block” and “cube” refers to some blocks having color “blue” and shape “cube” respectively. The phrase “left of the cube” refers to some location in the application space, where the blue block is to be moved to.

Since there may not be any action API that consider such indirect object references as a part of its arguments, the aforementioned user command cannot be directly grounded to any of the action APIs. In such cases, utility APIs are used to resolve the referential expressions by identifying the referred objects, having a given property and a value. Thus, unlike action APIs, a utility API returns one or more values as output (e.g., object ids) to the API call which is used by an action API and/or other utility APIs to unambiguously ground the user command. Similar to action ASCs, the developer also defines one or more utility ASCs [in (paraphrased) natural language] for each utility API, where the arguments of a utility API becomes variables in the corresponding utility ASC.

Table 1: Properties and their domains for Blocks-World
Property Domain
color (object)
’red’, ’green’, ’orange’,
’blue’, ’yellow’
shape (object)
’triangular’, ’circular’, ’cube’,
’square’, ’rectangular’
location (object) (x, y) coordinate in 2D space
name (object) English alphabets A-Z
direction (action) ’right’, ’left’, ’above’, ’below’
Table 2: Action ASC specifications for Blocks-World and groundable example NL commands from user. (*) denotes that the variable do not take part in command reduction (Utility Constraints), which is automatically detected and marked by CML. (X denotes input)
API Function AID Action ASCs Variable/Argument: Type Example User Commands
Add(X1, X2) 1
add a block at X1;
insert a block at X1
’X1’: ’location’(*) add a block at row 2 and column 3; put a block at (2, 3)
Remove(X1) 2 remove X1 ’X1’: ’block_set’
delete blue block; take away blue
Move(X1, X2) 3
move X1 to X2; shift X1 to X2
’X1’: ’block_set’, ’X2’: ’location’(*)
move blue block to the left of cube; shift green
cube to (4, 5)
MoveByUnits(X1,
X2, X3)
4
move X1 along X2
by X3 units
’X1’: ’block_set’, ’X2’: ’direction’,
’X3’: ’number’
move blue block left by 2 units;  shift green
cube down by 3 units
UpdateColor(X1, X2) 5 change color of X1 to X2; color X1 X2 ’X1’:’block_set’, ’X2’:’color’(*) color A red; change color of B to blue
UpdateShape(X1, X2) 6 change shape of X1 to X2 ’X1’:’block_set’, ’X2’:’shape’(*)
set the shape of A to cube; make B square
Rename(X1, X2) 7 rename block X1 to X2 ’X1’: ’block_set’, ’X2’:’name’(*) Name the block at (4, 5) as C; rename A to D
Table 3: Utility ASC specifications for Blocks-World and groundable example sub-expressions (O denotes output, X denotes input)
API Function AID Utility ASC Argument: Type Example sub-expressions
GetBlocksbyColor(X1) 8 color/X1 X1: ’color’, O1: ’block_set’ blue block
GetBlocksbyShape(X1) 9 shape/X1 X1: ’shape’, O1: ’block_set’ square block; cube
GetBlocksbyName(X1) 10 name/X1 X1: ’name’, O1: ’block_set’ block A
GetBlocksbyLocation(X1) 11 location/X1 X1: ’location’, O1: ’block_set’ block at row 2 and column 3; block at (2, 3)
GetLocation(X1, X2) 12
get location
along X1 of X2
’X1’: ’direction’, ’X2’: ’block_set’,
’O1’: ’location’
at the left of blue block;
below block B

3.2 Blocks-World Specification: An Example

We now give an example specification for a Blocks-World application, which is about arranging different blocks or tiles on a 2D grid or adding them to form a goal arrangement. Note that, our CML system is not concerned with the goal or how to achieve it, which is the responsibility of the user. CML only matches each user command utterance to an action ASC for the associated action API to be executed in the environment, with the help of utility ASCs.

In blocks-world, the objects are basically tiles of different shapes, colors and names and the action APIs are designed to manipulate these objects in the 2D space. The utility APIs retrieve values of the properties like shape, color and name, etc., of an instantiated tile. We use the term block and tile interchangeably on-wards.

3.2.1 Properties and Domains

Table 1 shows different properties of an object or action (specified in the bracket next to the property name). The domain column in Table 1 lists the domain of each property. For example, “color” is a property of a tile, which can take one of five possible values {‘red’, ‘green’, ‘orange’, ‘blue’, ‘yellow’}. Similarly, shape denotes the shape of a tile and can be one of five categories: {‘triangular’, ‘circular’, ‘cube’, ‘square’, ‘rectangular’}. The name of a block is denoted by any letter from A-Z. The location of an object is specified by a 2D co-ordinate (x, y) and so on. Besides these, ‘direction’ is a property with domain {‘left’, ‘right’, ‘above’, ‘below’} needed to execute an action for moving an object in a given direction.

3.2.2 ASC Specification

Table 2 shows the action ASCs and Table 3 shows the utility ASCs and also, some example user commands and sub-expressions (explained later) that can fire them. Note that, all ASCs are written in natural language and there is no fixed format. Here, the argument type ‘block_set’ denotes a set of unique (instantiated) block object ids. We define seven action ASCs and five utility ASCs to ground a user command for Blocks-World. The action ASCs are: adding a block at location (x, y) [AID 1], removing a block [AID 2], moving blocks [AID 3-4], changing properties of a block [AID 5-7]. Utility ASCs either retrieve the block_set for a given input property value [AID 8-11] or returns a location object (x, y) in a given direction of a block [AID 12]. The arguments marked (*) in Table 2 denote the utility constraints automatically identified by CML for restricting the use of utility ASCs, which we discuss next.

3.3 Utility Constraint Marker

For some actions, some variables (arguments) in the user command should not be reduced as it can result in incorrect subsequent grounding. As utility ASCs are used to resolve sub-expressions in a user command, if a sub-expression (a sub-sequence of user command, defined in next subsection) matches both partially with an action ASC and fully with a utility ASC, it creates ambiguity (see below). A sub-expression is said to be matched with an ASC, if the sequence of argument types of the variables appearing in the sub-expression from left to right also appears in the ASC. And by full match, we mean the number of argument types matched is equal to the number of argument types present in the ASC. We mark each such argument appearing in the action ASC with a * for the action ASC indicating a utility constraint. Note, if a sub-expression matches fully with an action ASC, there is no ambiguity as the corresponding action API will be automatically fired.

Let’s have an example: Consider the ASC with AID 1 in Table 2 for adding a block at location X1. Here, the argument X1:location [marked (*)] should not be resolved using utility ASC of AID 11 (getting the block(s) at the location). Otherwise, it will mislead the grounding by reducing a user command like “add a block at (2, 3)” to “add a block at block_set/X1”. Similar problems arise when CML attempts to resolve arguments like color (X2) in ASC with AID 5, shape (X2) in ASC with AID 6, etc., using utility ASCs with AID 8 and AID 9 respectively, while grounding user commands like “color block A to red” and “make the blue block square” respectively. we use * to mark these action ASC arguments to indicate no reduction should be applied.

Formally, let useq=type(X1u),,type(XNu) be the sequence of variable types in a utility ASC u from left to right, where Xiu is the ith variable in u. Similarly, let aseq be the sequence of variable types (from left to right) in an action ASC a. Let Mseq be the longest common sub-sequence of useq and aseq. Then if |Mseq|=|useq|, all variables corresponding to the argument types in Mseq should be marked with (*) to indicate utility constraints for action ASC a.

3.4 Command Grounding Module of CML

Given a natural language command C from the user and the ASC specification S for an application, the command grounding module (CGM) returns a grounded ASC tuple A^, consisting of one action ASC and a set of utility ASCs. They together tell the system what action (API call) to perform. If the grounding is not possible, is returned.

CGM consists of two primary modules: a tagging and rephrasing module () and an ASC matching module [or simply Matcher] (). tags the input user command C and then, repharses the tagged C to get a command C that is lexically closer to the ASCs. Here, a tag is an argument type of an action ASC. For example, in command ”move the cube to (2,3)”, the phrase “cube” is tagged as “shape” and (2, 3) is tagged as “location” for blocks-world. This work uses a dictionary lookup based and regular expression-based tagger for 11 1 As most NLI applications have finite domains of objects and properties, we found simple lookup based tagging works well. In complex scenarios, can be learned through user feedback or provided by the developer.. While reading the specification S, forms a tagset by enumerating all argument types in action ASCs and also, builds a vocabulary V consisting of all the words in the ASCs. Given C, either maps individual words or phrases in C into one of the tags, or repharses them by replacing synonym words/phrases from V using WordNet (Miller et al., 1990) and ConceptNet (Speer et al., 2017).

Given the rephrased and tagged user command C and the set 𝒯 of (action or utility) ASCs for an application, Matcher computes a match score s(t, C) for each t𝒯 and returns the top ranked ASC t^=argmaxt𝒯s(t,C). Any paraphrasing model can be used as . This work uses information retrieval (IR) based unsupervised matching models for (we compare different types of unsupervised IR matching models in the Experiment section).

3.4.1 Working of Command Grounding Module (CGM)

Algorithm 1 shows the iterative grounding process of a user command C by CGM. We again use the Blocks-World application (see Figure 1) to explain the algorithm on-wards.

{algorithm}

[tb] Iterative Command Grounding

Input: C: natural language command issued by user;

𝒯: ASC store;

: Command tagger and rephraser;

: ASC Matcher;

Output: A^: predicted AID set for grounding C;

\[email protected]@algorithmic

[1] \STATEC Semantic_Tagging_Rephrasing(C, ) \COMMENTC is the tagged and rephrased user command \STATESP Enumerate_Filter_Subexpressions(C, 𝒯, ) \COMMENTSP is the list of enumerated and filtered sub-expr. in C \STATEA^;j 0

\WHILE

TRUE \STATEAset GetCandidateASCs(𝒯, C, “action”) \IFAset= \IFSP=  or  j>|SP|-1 \RETURN \ELSE\STATEUset GetCandidateASCs(𝒯, SPj, “utility”) \COMMENTSPj is the jth sub-expression in SP \IFUset \STATEur GetTopRankedASC(Uset, SPj, ) \STATEA^A^{ur} \STATEC ReduceCommand(C, SPj, ur.output) \STATESP Enumerate_Filter_Subexpressions
(C, 𝒯, )

\STATE

j 0 \ENDIF\STATEjj+1 \ENDIF\ELSE\STATEak GetTopRankedASC(Aset, C, ) \RETURNA^{ak} \ENDIF\ENDWHILE

CGM first performs tagging and rephrasing of C using (Line 1). For example, the user command C in Figure 1 is tagged to ”relocate the color/X1 block to the direction/X2 of name/X3” [where, X1=”blue”, X2=”left”, X3=”D”] and rephrased command is C0 in Figure 1 (C0 is C in Line 1 of Algorithm 1 at the moment), where “relocate” is replaced with synonym “move” found in vocabulary V.

Next, CGM enumerates and extracts the list of sub-expressions (SP) present in C (Line 2) to assist compositional grounding of C using utility ASCs in subsequent steps of the algorithm. A sub-expression of length-m is defined as a substring of C involving m consecutive variables (with types) and intermediate words linking them. For example, given C0 in Figure 1, the list of all sub-expressions: SP = [ “color/X1”, “direction/X2”, “name/X3”, “color/X1 block to the direction/X2”, “direction/X2 of name/X3”], where the first three in SP are length-1 and last two are length-2 sub-expressions. Length-3 sub-expression is the full command C0 and is discarded from SP as it does not take part in matching the utility ASCs, only matched with action ASCs.

Here we also filter out those sub-expressions that are marked with (*) (with utility constraints) as they should not be reduced by utility ASCs. For this, CGM first uses Matcher to identify the top-ranked action ASC semantically close to C (e.g., AID 5 for the user command “color block A to red”) and then, delete all sub-expressions involving such arguments (e.g., “color/X2”) from SP [Line 2].

With the resulting SP, the ASC grounding for C [Lines 4-24] happens as follows:

First, a candidate set Aset of action ASCs is retrieved from the set of action ASCs such that for any aAset, C and a has an identical set of variables and their types. If Aset, finds and returns the top ranked action ASC ak [Lines 20-21] for C. In Figure 1, none of the action ASCs are matched directly for C0 and so, Aset=.

Figure 1: An example of working of CGM on a user command for Blocks-World. Here, AID denotes ASC ID (see Tables 2 and 3).

If Aset=, Matcher works as follows: If SP= or all sub-expressions in SP have been checked, it returns indicating grounding of C is not possible [Lines 7-8]. Otherwise, it selects the sub-expressions from SP one by one and then, reduces C further by resolving bindings for the variables present in that sub-expression [Lines 10-18]: Given the jth sub-expression SPj, Matcher first selects a candidate set Uset of utility ASCs from 𝒯 such that for any uUset, SPj and u has identical set of variables and their types [Line 10]. Note, while matching the variables with utility ASCs, we rename the variables in SPj in order to avoid error in matching due to different variable names. For example, variable X3 in sub-expression “name/X3” [see C1 in Figure 1] is renamed as X1 [i.e.“name/X1”] (not shown), so that it can match with AID 10, while reducing C1 to C2. Similarly, X2 and O1 [in C2 of Figure 1] in sub-expression “direction/X2 of block_set/O1” are renamed as X1 and X2 [i.e., “direction/X1 of block_set/X2”] so that it can match with AID 12.

If Uset=, Matcher cannot perform reduction of C for SPj and only j is incremented [Line 18]. Otherwise, returns the top ranked utility ASC ur for SPj [Line 12]. Next, C is reduced by replacing SPj with ”type(O1)/O1” [the output variable and its type corresponding to ur] (Line 14). For example, given SPj =“name/X3” [see C1 in Figure 1], utility AID 10 gets matched and C1 gets reduced to C2, where “O1” is the output variable and type(O1)= “block_set” [i.e., set of block ids]. As a part of reduction, the variables of C obtained after replacement are also renamed [i.e. from block_set/O1 in C2 in Figure 1 to block_set/X3 (not shown)] and aliases are recorded, so that in the next iteration, it can be matched with the action ASCs [in Lines 5 and 21]. After reduction, we again enumerate and filter SP using the new (reduced) C and set j to 0 (Line 15-16) similar to Line 2. CGM also stores the values of the output variables in a buffer obtained by executing the functions of the matched utility ASCs for subsequent grounding. In Figure 1, C3 matches with action API of AID 3 and the iterative grounding completes here. The final grounded AID list returned by CGM for the example in Figure 1 is [8,10,12,3].

3.5 ASC Learner of CML

Usually the developer can provide only a small set of action ASCs for each action API. But the users may come up with many different paraphrased commands for which CML cannot find significant match in the current ASC set provided by the developer, which results in low recall. To deal with this issue, we enable CML to learn new ASCs from users through interactions, as discussed below.

Given a user command C and CML has grounded C into some action ASC/API or the grounding has failed (i.e., Line 8 in Algorithm-1 is executed), it asks user to verify whether the grounding output is correct or not. In this step, the verification question asked to the user is formulated based on fixed natural language (NL) templates (e.g. “Am I correct?[yes/No]” or “Do you agree?[yes/No]”) and expects a yes/no answer from the user. If the user says “yes” or remains silent, the system assumes its prediction is correct and requests next command to be grounded. If the user says “No” (i.e., prediction is incorrect), CML shows a top-k list of action ASCs (ranked based on match score by ) in natural language (NL, for easy understanding) one for each action API for the user to choose the correct one from (see below). As the number of action APIs in our application is small, we choose k to be the total number of action APIs in the experiments. Thus, the user can choose the correct ASC by scrolling down the list of k options in one go.

The ranked list of action ASCs in NL are generated as follows. Given a user command C [say, “put a block to the left of A”], CML tags C with and iteratively reduces the command to get a final reduced command C [in this case, “put a block to location/X1”, where X1=(2, 3) is detected while resolving the reference “left of A” using utility ASCs]. Given the final reduced command C, CML replaces the variables in C with the corresponding value to get a NL command equivalent to the original user command C. In the example, the reduced NL command will be “put a block to (2,3)”. Such a NL command is generated for each action ASC, provided there is at least one match in argument values.

If the user chooses one from the ranked list, the learner asks for verifying the correctness of the detected argument values. If the user confirms, ASC learner gathers ground truth action API along with a new ASC and add it into the action ASC set. If the user thinks none of the commands are correct or the intended action is not there in the rank list, user enters a rephrased/rectified version of the command (in a text box), which is again utilized to generate a new rank list of action ASCs. This process repeats until the user chooses an option from the rank list to provide the ground truth or maximum m attempts are made (in which case, the learner learns no new ASC for the current interaction process).

4 Experiments

We evaluate CML on two representative applications: (1) Block-World and (2) Webpage Design.22 2 In terms of user commands, many applications are similar to our experimental applications, e.g., building graphical user interfaces, robot navigation, drone control, etc. This initial work does not handle compositions of actions which is required by some complex tasks such as database queries. We plan to do it next. To create the test data for each application, we showed the supported API functions of each application to five users (graduate students, who are unaware of the working of CML) and asked them to write commands to play with the application to gather the evaluation data (Table 4). We also asked them to write down some commands that are not groundable to any of the action APIs. We randomly shuffle the list of commands and run CML to ground them one by one. Next, we asked the same users to mark the correctness of the grounding results. Based on this, we compute the accuracy of various versions of CML. In computing accuracy, we consider the true action API of non-groundable user commands as AID 0, indicating that it cannot be grounded to any of the action ASCs (or APIs) present in the specification. If CML can detect it is non-groundable, it is considered correct. ASC specifications for the Blocks-World applications were given in Tables 2 and 3. The ASC specifications for Webpage Design are given in the Supplementary material. We will release the code and the datasets after paper acceptance.

Table 4: Dataset statistics. Here, m-UC denotes the number of user commands that needs m utility ASCs for grounding. ”Non-Groundable” denotes the number of commands for which no grounding exists for the given API set.
0-UC 1-UC 2-UC > 2-UC
Non-
Groundable
Total
Blocks-World
15 160 20 76 42 313
Webpage Design
13 146 13 20 14 206

4.1 Compared Models

As there is no existing work using the NL2NL approach for NLI design, we compare various versions of the CML model for evaluation. Note that we don’t compare with existing parsing and/or end-to-end methods as they need application training data and/or are thus not application independent.

(1) CML-jaccard : This version of CML uses the Jaccard similarity as the scoring function of Matcher .

(2) CML-vsm : This version uses tf-idf based vector space model for where all ASCs associated with each API are regarded as one document and the user command as query. It is slightly poor if ASCs are treated individually.

(3) CML-emb: This version uses word embedding based matching model for . Given an ASC t and a user command (after tagging) C, we retrieve the pre-trained word embedding vectors for each word in C (t) and average them to get the vector representation of C (t) as vI (vt). Next, we use cosine similarity as the scoring function to measure the similarity between vI and vt. We use 50D Glove embeddings for evaluation.

(4) CML-vsm (-R): Variant of CML-vsm, where the rephrasing of words in the input command [Line 1, algorithm 1] is disabled.

(5) CML-vsm (-U): Variant of CML-vsm, where the use of utility ASCs while grounding [i.e., Lines 7-19, algorithm 1] is disabled.

For (4) and (5), although we only discuss CML-vsm variant in next subsection, we also compared these variants for CML-emb and CML-jaccard as well and found poorer results. Note also for all CML variants mentioned above, we disabled the use of ASC Learner for learning new ASCs.

(6) CML-Y + ASC Learner: This is the version of CML-Y (where Y is jaccard, vsm, or emb), where we allow CML-Y to learn new ASCs through user interactions. Since the emb version is relatively poorer (discussed next), we chose to compare the jaccard and vsm variants of ASC Learner.

Table 5: Overall accuracy comparison of CML variants
Compared Models Blocks-World
Webpage Design
CML-jaccard 67.73 80.58
CML-vsm 67.41 81.06
CML-emb 67.73 77.18
CML-vsm (-R) 58.86 67.82
CML-vsm (-U) 15.97 13.10
CML-jaccard +
ASC Learner 68.69 83.98
CML-vsm +
ASC Learner 67.73 83.01

4.2 Results and Analysis

Table 5 shows the accuracy comparison of CML variants on Blocks-World and Webpage Design. CML-jaccard and CML-vsm perform better overall. The drop in performance for CML-emb in Webpage Design is primarily due to the out of vocabulary words (no pre-trained embedding is available) in user commands. The performance of CML-vsm(-R) drops significantly, which shows that rephrasing helps substantially in command reduction and grounding. CML-vsm(-U) performs the worst among all variants which shows the importance of user command reduction using utility ASCs.

As CML-jaccard and CML-vsm perform the best overall for both applications, we also compare the performance of ASC Learning variants of these two CML versions in Table 5. We can see that ASC learning clearly improves the performance for both applications. It is very important to note that these improvements are gained from the existing datasets, which do not have many similar commands to the newly learned ASCs. In practice, if similar commands are repeated by many users, the improvement will grow substantially. The performance improvement for Webpage Design is more than for Blocks-World, which can be explained as follows. For Blocks-World, the arguments and their types in API and ASC specifications (see Tables 2 and 3) are quite distinguishable from each other. Thus, correctly identifying arguments and their values in user commands plays a major role in the success of command grounding. Learning of new action ASCs does not make significant impact here. But, for Webpage Design specifications (see Supplementary), action ASCs for APIs 8, 9 and 10 have exactly the same arguments and types, but they differ significantly in action intents. Thus, learning new ASCs helps greatly.

To investigate the effect of ASC learning further, we evaluate CML-jaccard and CML-jaccard + ASC Learner (CML-vsm and CML-vsm + ASC Learner) on user commands that are only groundable to any of the action APIs with AIDs 8, 9 and 10, as shown in Table 6. Here, we observe almost 8% improvement in accuracy for ASC Leaner variants of CML, which justifies the explanation.

In Table 7, we compare CML-vsm over various command types (listed in Table 4). Here, we see that, for 0-UC and 1-UC, CML-vsm performs significantly better than for 2-UC, >2-UC and NOG (note, for NOG user commands, the true AID is considered as 0) as these commands are harder to ground due to the requirement of multiple (recursive) reduction steps using utility ASCs.

Table 6: Accuracy improvement of CML ASC Learner variants on user commands groundable to action APIs with AIDs 8, 9 and 10 in Webpage Design (see Supplementary)
Compared Models
Webpage Design
CML-jaccard 82.92
CML-vsm 78.04
CML-jaccard + ASC Learner 90.24
CML-vsm + ASC Learner 85.36
Table 7: Accuracy for various command categories. Here, NOG denotes “Non-groundable”.
Application 0-UC 1-UC 2-UC >2-UC NOG
Blocks-World 53.33 85.0 40.0 59.21 33.33
Webpage
Design
100 84.93 69.23 55.00 71.42

Error Analysis. Overall, we can see that CML variants perform better for Webpage Design than for Blocks-World. On analyzing the datasets, we found that due to the flexibility of the Blocks-World application, the user commands for it are more ambiguous and less specific compared to those in Webpage Design, where users use more application-specific terminologies, which helps grounding. Apart from that, we identified some common grounding errors: (1) Argument detection and rephrasing: failures of CML in detecting argument values in user command or rephrasing due to lack of semantic knowledge in . E.g., given the command “get the ring shaped block out”, CML cannot map the phrase ‘get out’ to ‘remove’ and ‘ring shaped’ to shape ’circular’, which cause failure in grounding. (2) Ambiguous semantics of words: In user command “write down A on red cube”, word “down” is tagged as “direction”, whereas “write down” is a single phrase referring to “rename” action. (3) Wrong ASC matching: In user command “move the green block to first row”, first row is not a coordinate and thus not groundable. However, due to a small similarity with action ASC of AID 2 [see Table 2], the command is wrongly reduced to that. (4) implicitly parameterized language expressions: Considering the user command “Enlarge paragraph 1”, the phrase “enlarge” is implicitly parameterized, which is equivalent to “set font size to large” and is difficult to ground to action ASC with AID 7 for Webpage Design (see Supplementary).

5 Conclusion

This paper proposed an natural language to natural language (NL2NL) approach to building natural language interfaces and a system CML, which are very different from traditional approaches. Due to the NL2NL approach, CML is application independent except that the ASCs (API seed commands) need to be specified by the application developer. Our evaluation using two applications show that CML is effective and highly promising. In our future work, apart from improving CML’s accuracy, we will study how the composition of actions can be handled as well for more complex applications (e.g., database querying) and how to learn referential expressions through dialogues.

Acknowledgements

This work was partially supported by a research gift from Northrop Grumman Corporation.

References

  • J. Andreas and D. Klein (2015) Alignment-based compositional semantics for instruction following. arXiv preprint. Cited by: §1, §2.
  • Y. Artzi and L. Zettlemoyer (2013) Weakly supervised learning of semantic parsers for mapping instructions to actions. TACL. Cited by: §1, §2.
  • C. Baik, H. Jagadish, and Y. Li (2019) Bridging the semantic gap with sql query logs in natural language interfaces to databases. In ICDE, Cited by: §2.
  • J. Berant, A. Chou, R. Frostig, and P. Liang (2013) Semantic parsing on freebase from question-answer pairs. In EMNLP, Cited by: §2.
  • S. Branavan, L. S. Zettlemoyer, and R. Barzilay (2010) Reading between the lines: learning to map high-level instructions to commands. In ACL, Cited by: §2.
  • A. Celikyilmaz, Z. Feizollahi, D. Hakkani-Tur, and R. Sarikaya (2014) Resolving referring expressions in conversational dialogs for natural user interfaces. In EMNLP, Cited by: §2.
  • S. Ferré (2017) Sparklis: an expressive query builder for sparql endpoints with guidance in natural language. Semantic Web. Cited by: §2.
  • M. C. Frank and N. D. Goodman (2012) Predicting pragmatic reasoning in language games. Science. Cited by: §2.
  • D. Fried, J. Andreas, and D. Klein (2017) Unified pragmatic models for generating and following instructions. arXiv preprint arXiv:1711.04987. Cited by: §2.
  • T. Gao, M. Dontcheva, E. Adar, Z. Liu, and K. G. Karahalios (2015) Datatone: managing ambiguity in natural language interfaces for data visualization. In UIST, Cited by: §2.
  • F. J. C. Garcia, D. A. Robb, X. Liu, A. Laskov, P. Patron, and H. Hastie (2018) Explain yourself: a natural language interface for scrutable autonomous robots. arXiv preprint. Cited by: §2.
  • D. Golland, P. Liang, and D. Klein (2010) A game-theoretic approach to generating spatial descriptions. In EMNLP, Cited by: §2.
  • K. Guu, P. Pasupat, E. Z. Liu, and P. Liang (2017) From language to programs: bridging reinforcement learning and maximum marginal likelihood. arXiv preprint. Cited by: §2.
  • M. Janner, K. Narasimhan, and R. Barzilay (2018) Representation learning for grounded spatial reasoning. TACL. Cited by: §2.
  • C. Lawrence and S. Riezler (2016) Nlmaps: a natural language interface to query openstreetmap. In COLING, Cited by: §2.
  • J. Li, W. Wang, W. Ku, Y. Tian, and H. Wang (2019) SpatialNLI: a spatial domain natural language interface to databases using spatial comprehension. arXiv preprint. Cited by: §2.
  • Y. Li and D. Rafiei (2018) Natural language data management and interfaces. , Morgan & Claypool Publishers. External Links: Link, Document Cited by: §1, §2.
  • P. Liang (2016) Learning executable semantic parsers for natural language understanding. arXiv preprint. Cited by: §2.
  • X. V. Lin, C. Wang, L. Zettlemoyer, and M. D. Ernst (2018) NL2Bash: a corpus and semantic parser for natural language interface to the linux operating system. arXiv preprint. Cited by: §2.
  • G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller (1990) Introduction to wordnet: an on-line lexical database. International journal of lexicography. Cited by: §3.4.
  • A. Neelakantan, Q. V. Le, M. Abadi, A. McCallum, and D. Amodei (2016) Learning a natural language interface with neural programmer. arXiv preprint arXiv:1611.08945. Cited by: §2.
  • P. Pasupat, T. Jiang, E. Liu, K. Guu, and P. Liang (2018) Mapping natural language commands to web elements. arXiv preprint. Cited by: §2.
  • P. Pasupat and P. Liang (2015) Compositional semantic parsing on semi-structured tables. arXiv preprint. Cited by: §2.
  • V. Setlur, S. E. Battersby, M. Tory, R. Gossweiler, and A. X. Chang (2016) Eviza: a natural language interface for visual analysis. In UIST, Cited by: §2.
  • N. J. Smith, N. Goodman, and M. Frank (2013) Learning and using language via recursive pragmatic reasoning about other agents. In NIPS, Cited by: §2.
  • K. Soh (2017) TagUI: rpa / cli tool for automating user interactions.. In https://github.com/ kelaberetiv/TagUI, Cited by: §2.
  • R. Speer, J. Chin, and C. Havasi (2017) Conceptnet 5.5: an open multilingual graph of general knowledge. In AAAI, Cited by: §3.4.
  • A. Srinivasan, M. Dontcheva, E. Adar, and S. Walker (2019) Discovering natural language commands in multimodal interfaces. In IUI, Cited by: §2.
  • Y. Su, A. H. Awadallah, M. Khabsa, P. Pantel, M. Gamon, and M. Encarnacion (2017) Building natural language interfaces to web apis. In CIKM, Cited by: §2.
  • S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee, S. Teller, and N. Roy (2011) Understanding natural language commands for robotic navigation and mobile manipulation. In AAAI, Cited by: §2.
  • J. Thomason, S. Zhang, R. J. Mooney, and P. Stone (2015) Learning to interpret natural language commands through human-robot dialog. In IJCAI, Cited by: §2.
  • P. Utama, N. Weir, F. Basik, C. Binnig, U. Cetintemel, B. Hättasch, A. Ilkhechi, S. Ramaswamy, and A. Usta (2018) An end-to-end neural natural language interface for databases. arXiv preprint arXiv:1804.00401. Cited by: §2.
  • S. I. Wang, P. Liang, and C. D. Manning (2016) Learning language games through interaction. arXiv preprint. Cited by: §2.
  • H. Xiong and R. Sun (2019) Transferable natural language interface to structured queries aided by adversarial generation. In ICSC, Cited by: §2.
  • W. Yih, M. Chang, X. He, and J. Gao (2015) Semantic parsing via staged query graph generation: question answering with knowledge base. In ACL-IJCNLP, Vol. , pp. . Cited by: §2.
  • J. M. Zelle and R. J. Mooney (1996) Learning to parse database queries using inductive logic programing. In AAAI, Cited by: §1, §2.
  • L. Zettlemoyer and M. Collins (2007) Online learning of relaxed ccg grammars for parsing to logical form. In EMNLP, Cited by: §2.
  • L. S. Zettlemoyer and M. Collins (2012) Learning to map sentences to logical form: structured classification with probabilistic categorial grammars. arXiv preprint. Cited by: §1.
  • V. Zhong, C. Xiong, and R. Socher (2017) Seq2sql: generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103. Cited by: §2.

Appendix A Webpage Design

The goal of the Webpage Design application is to provide functions to the end user to help them design a web page. Our current environment deals with various html objects like image, title, paragraph, button. Action functions (APIs) are designed to manipulate these objects in the 2D space.

A.1 Properties and Domains

Table 1 shows different properties of an object or action (specified in the bracket next to the name of the property in the property column). The domain column in Table 1 lists the value domain of each property. E.g., “color” is a property of an html object, which can take five values {’red’, ’green’, ’brown’, ’blue’, ’black’}. Similarly, type denotes html object type and can be one of four categories: {’image’, ’button’, ’title’, ’paragraph’}. The text written on an element like title or paragraph is denoted by double quote (“ ”) in the command. The name of an element is denoted by its type followed by an id number (like title 1, image 2, etc.) and/or some string followed by image extension for image objects (like myphoto.jpg) in the command. The location of an object is specified by an 2D co-ordinate (x, y) and so on. Besides these, ‘direction’ is a application-specific property with domain {‘left’, ‘right’, ‘above’, ‘below’} needed to execute an action for moving an object in a given direction.

Table 8: Properties and their domains for Webpage Design
Property Domain
color (object)
’red’, ’green’, ’brown’,
’blue’, ’black’
type (object)
’image’, ’button’, ’title’,
’paragraph’
location (object) (x, y) coordinate in 2D space
name (object)
object’s type followed by a
number, (like image 1, title 2
etc.) or xyz.jpeg, xyz.png,
where xyz is a string
text (object) string in quote ( ” …. ”)
font_size (object)
’small’, ’medium’, ’large’
graphics_size (object)
’height’, ’width’
direction (action) ’right’, ’left’, ’above’, ’below’
Table 9: Action ASC specifications for Webpage Design and groundable example commands from user. (*) denotes that the variable do not take part in command reduction (Utility Constraint), which is automatically identified by CML.
API Function AID Action ASCs Variable/Argument: Type Example User Commands
Add(X1, X2) 1
add X1 at
location X2
’X1’: ’type’(*),
’X2’: ’location’(*)
add a title at (20, 30);
add an image at (30, 40)
Write(X1, X2) 2 write text X1 on X2
’X1’: ’text’(*),
’X2’: ’element_set’
write ”My Home Page” on title 1
Remove(X1) 3 remove X1 ’X1’: ’element_set’
delete title 1
remove image photo.png
Move(X1, X2) 4
move X1 to
location X2
’X1’: ’element_set’,
’X2’: ’location’(*)
move title 1 to (20, 30)
MoveByUnits(X1,
X2, X3)
5
move X1
along X2 by X3
units
’X1’: ’element_set’,
’X2’: ’direction’,
’X3’: ’number’
move image 1 left by 10 units
UpdateColor(X1, X2) 6
set color of
X1 to X2
’X1’: ’element_set’,
’X2’: ’color’(*)
color paragraph 1 as red;
change color of title 1 to blue
UpdateFont(X1, X2) 7
set font size of
X1 to X2
’X1’: ’element_set’,
’X2’: ’font_size’(*)
make title 1 large
SetGraphicsSize(X1,
X2, X3)
8
set the X1 of X2
to X3
’X1’: ’graphics_size’(*),
’X2’: ’element_set’,
’X3’: ’number’(*)
set the height of image 1 to 30
set the width of paragraph 1
to 40
IncreaseSize(X1, X2, X3) 9
increase the X1 of
X2 by X3 units
’X1’: ’graphics_size’(*),
’X2’: ’element_set’,
’X3’: ’number’(*)
increase the height of image 1
by 10 units
DecreaseSize(X1, X2, X3) 10
decrease the X1 of
X2 by X3 units
’X1’: ’graphics_size’(*),
’X2’: ’element_set’,
’X3’: ’number’(*)
reduce the width of image 1
by 5 units
Table 10: Utility ASC specifications for Webpage Design and groundable example command sub-expressions (O for output).
Function AID Utility ASCs Argument: Type Example sub-expressions
GetElementbyLocation(X1) 11 location/X1
X1: ’location’,
O1: ’element_set’
element at (20, 30)
GetElementbyType(X1) 12 X1
X1: ’type’,
O1: ’element_set’
get all titles;
get all paragraphs
GetElementbyFont(X1) 13 X1
X1: ’font_size’,
O1: ’element_set’
elements having
size large
GetElementby
GraphicsSize(X1, X2)
14 X1 of X2
X1: ’graphics_size’,
X2: ’number’,
O1: ’element_set’
elements having
height of 20
GetElementbyColor(X1) 15 X1
X1: ’color’,
O1: ’element_set’
red element;
blue title
GetElementbyText(X1) 16 X1
X1: ’text’,
O1: ’element_set’
element with text
”welcome!”
GetLocation(X1, X2) 17
get location
along X1 of X2
X1: ’direction’,
X2: ’element_set’,
O1: ’location’
location at the left of
image 2; location
below title 1
GetElementbyName(X1) 18 X1
X1: ’name’,
O1: ’element_set’
title 1; image 2;
profile.jpeg

A.2 ASC Specification

Table 2 shows the action ASC (API seed command) and Table 3 shows the utility ASC specifications for Webpage Design. Some example user commands or sub-expressions that can fire them are also given. Here, the argument type ‘element_set’ denotes a set of unique html (instantiated) object ids. We define 10 action ASCs and 8 utility ASCs (for their respective APIs) to ground a user command. The action ASCs are: adding an html object at location (x,y) [AID 1], writing some text on an element like title or paragraph [AID 2], removing an element [AID 3], moving elements [AID 4-5], changing properties of an object [AID 6-10]. Utility ASCs either retrieve the element set having a given property and value [AID 11-16, 18] or return a location object (x, y) in a given direction of an object [AID 17].