From Abstractions to "Natural Languages" for Coordinating Planning Agents

  • 2020-03-18 02:16:19
  • Yu Zhang, Li Wang
  • 0


Despite significant advancements in developing autonomous agents,communication between them often relies on a set of pre-specified symbols for agiven domain. In this paper, we investigate the automatic construction of thesesymbols from abstractions to form "natural languages" for such agents. Thefocus of this initial investigation is on a task planning setting where oneagent (the speaker) directly communicates a "plan sketch" to another agent (thelistener) to achieve coordination. In contrast to prior work, we view languageformation as a fundamental requirement for resolving miscoordination. This viewenables us to "compute" a language from the ground up by mapping physicalstates to symbols, thus reverse engineering the function of languages.Languages that arise from this process are only approximately expressive of theactual plans, meaning that they specify abstractions over the plan space, whichis not only theoretically appealing as it provides the desired flexibility tothe listener to choose its plan during execution, but also practically usefulsince it both reduces the communication cost of the speaker and computationalcost of the listener. We formulate this language construction problem and showthat it is NEXP-complete. An approximate solution is then developed to relatethis problem to task planning problems that have efficient off-the-shelfsolutions. Finally, we discuss a multi-agent path-finding domain in ourevaluation to provide a comprehensive set of results to illustrate the benefitsof the constructed languages and their applications.


Quick Read (beta)

From Abstractions to “Natural Languages”
for Coordinating Planning Agents

Yu Zhang and Li Wang    \authorblockNYu Zhang \authorblockA Arizona State University
Email: [email protected]
   \authorblockNLi Wang \authorblockABeijing Institute of Technology
Email: [email protected]

Despite significant advancements in developing autonomous agents, communication between them often relies on a set of pre-specified symbols for a given domain. In this paper, we investigate the automatic construction of these symbols from abstractions to form “natural languages” for such agents. The focus of this initial investigation is on a task planning setting where one agent (the speaker) directly communicates a “plan sketch” to another agent (the listener) to achieve coordination. In contrast to prior work, we view language formation as a fundamental requirement for resolving miscoordination. This view enables us to “compute” a language from the ground up by mapping physical states to symbols, thus reverse engineering the function of languages. Languages that arise from this process are only approximately expressive of the actual plans, meaning that they specify abstractions over the plan space, which is not only theoretically appealing as it provides the desired flexibility to the listener to choose its plan during execution, but also practically useful since it both reduces the communication cost of the speaker and computational cost of the listener. We formulate this language construction problem and show that it is NEXP-complete. An approximate solution is then developed to relate this problem to task planning problems that have efficient off-the-shelf solutions. Finally, we discuss a multi-agent path-finding domain in our evaluation to provide a comprehensive set of results to illustrate the benefits of the constructed languages and their applications.

I Introduction

Many real-world tasks require the coordination of multiple agents. A key consideration in facilitating such coordination is the adoption of a communication language. While humans are endowed with the gift of natural languages, autonomous agents must often rely on a set of symbols that are pre-specified by experts of a given domain [4, 44, 14, 31]. This approach, however, is neither practical nor efficient. An alternative approach is to implement an interface layer, often in the form of a natural language generation and understanding framework [41, 27], so that agents can convert between semantic meanings and natural language sentences. It comes with the additional benefit of better human interpretability and hence is desirable when there are human agents in the loop. On the other hand, it involves significant technical challenges and is itself an active research area [15, 39, 30, 19].

In this paper, we set out to investigate the problem of automatic language construction for autonomous agents. The focus of this initial study is on a task planning setting where two agents must communicate to coordinate their behaviors before execution. In particular, one agent (the speaker) directly communicates a “plan sketch” to another agent (the listener) to achieve coordination, which captures the speaker’s preferences for the task plan. In contrast to prior work, we view language construction as a fundamental requirement for resolving miscoordination, which enables us to compute a language from the ground up by mapping physical states to symbols, thus reverse engineering the function of languages. In particular, our language construction process is driven by the fundamental question of when agents are required to coordinate. It is observed that the presence of coordination conflicts introduces the need for communication and, in turn, languages. This effectively converts the language construction problem to the problem of searching for a symbolic system that resolves those conflicts. More importantly, the answer to the question demarcates situations where coordination is required so that languages become a necessity, from situations where it is not essential but still used such as for providing redundancy. Furthermore, languages that arise from this process are guaranteed to resolve coordination conflicts, and at the same time are only approximately expressive of the actual plans, meaning that they specify abstractions over the plan space. This feature is not only theoretically appealing as it provides the desired flexibility to the listener to accommodate preferences of its own while respecting those of the speaker but also practically useful since it both reduces the communication cost of the speaker and computational cost of the listener, as we will show.

Our work addresses the above question by answering specifically how a language naturally arises from a problem domain that requires coordination. First, we define the notion of required coordination (RC) to identify the situations where coordination is required.11 1 The notion of required coordination may appear similar to others discussed before [10, 46], where different problems are often involved. The work of  [43] is particularly relevant but the focus there has been on applications of the constructed languages without providing a formal or complete treatment. Our goal then becomes that of searching for a symbolic system that resolves these RCs. Observing that a symbolic system is nothing but an abstraction of the domain, we base our language construction method on a general temporal-state abstraction. While finding a language that resolves RC is not difficult, the challenge lies in identifying the ones that also provide more flexibility to the listener. This can be achieved by optimizing the domain abstraction and our focus here is on minimizing a language in terms of its word set, which is generally aligned with language design principles [7, 8] and often represents a desirable language property [35]. We show that searching for such a language is NEXP-complete. In light of the complexity, we develop an approximate solution that addresses it by solving a set of planning problems, which have efficient off-the-shelf solutions. For evaluation, we discuss a multi-agent path-finding domain to provide a complete set of results to illustrate the benefits of the constructed languages and their applications.

The contribution here is three-fold. First, we provide a new perspective on and a formal treatment of language construction for autonomous agents as an optimization problem based on the notion of required coordination. We establish its theoretical foundation in a task planning setting with two agents. Second, we provide an approximate solution to a formulation of this problem that minimizes the word set by relating it to planning problems. Third, we provide a comprehensive evaluation of the languages generated by our method in a commonly used domain to demonstrate its many benefits.

II Related Work

The emergence and evolution of languages have been studied extensively in evolutionary and computational linguistics [34, 9, 6]. Most of the work there has considered the questions in the context of evolutionary games [36], and often in an iterated learning setting [23]. Steedman’s work [33] is particularly inspiring in which he suggests a connection between natural languages and a hidden planning language that preexists in mind, although it falls short of establishing their computational connections. From this perspective, our work may be most connected to a prior study that considers the incentive of communication in a game-theoretical framework [2]. In contrast, our work is focused on a cooperative setting where languages naturally arise out of the need for coordination. While we draw inspiration from these prior works, it is our intention here to address the language construction problem from a fundamentally different computational view.

A language represents a structured symbolic system that maps symbols to semantic meanings grounded in the environment [20, 36, 40, 39, 19]. In this work, we take the path to reverse engineer this process by considering the mapping from physical states to symbols for language construction. These symbols are used to specify an abstraction over the plan space such that the expressed plans do not introduce coordination conflicts. The idea of applying abstractions to planning problems is hardly new, and has been well studied in both path planning and task planning methods [17, 22, 26, 12]. The notion of temporal-state abstraction used in our language construction process is similar to options in semi-MDP and LTL expressions in temporal logic [38, 42, 11]. However, abstractions in these prior works have mostly been discussed in the planning context to reduce the computational cost [1, 24, 25], instead of in connection with the fundamental requirement of coordination. Similarly, conflict-oriented methods have been studied in multi-agent planning problems [32, 45] but never from a language construction perspective.

Our problem may also appear similar to the task of learning or planning to communicate. Along this line, there exists prior work on developing communication schemes to maximize the team performance, where it is considered either as a planning or learning problem [37, 18, 28, 3]. For example, a communication scheme can arise from a Dec-POMDP framework where communicating actions are modeled specifically [18], which essentially augment the observation function. The set of symbols in these works are assumed to be given a priori and the focus there has been on learning [28] or optimizing [18] the association between those symbols and semantic meanings to improve a reward metric. Instead, we construct symbols from physical states from the ground up as the need for coordination arises in the given domain. In this way, the languages constructed from our method neither suffer from “overfitting” nor “underfitting” of those symbols.

III Background

The utility of a language is for a speaker to convey important information to the listeners. The ability to use languages is known to exist in many natural species [21], each having evolved a language of its own. In this paper, we intend to study how a language may be constructed for a set of autonomous agents from a task planning setting. Next, we present our problem settings along with the simplifications made in this work, followed by a motivating example.

Fig. 1: A motivating example in a multi-agent path-finding domain for illustrating coordination conflicts with three rooms 1, 2, and 3, and two agents, A and B. The agents are constrained to stay in different rooms and not switch locations at the same step. In this example, the task is for the two agents to switch their locations. Without coordination, the agents may choose the joint plans (and their actions) differently, thus leading to coordination conflicts. The first steps for an example of such plans are shown in their thought bubbles, respectively. A possible language for resolving such conflicts could be one that can express clockwise or counterclockwise movements. In this case, for example, agent B may communicate a “plan sketch” to A to “request” A to move in the clockwise direction.

III-A Problem Settings

  • Two rational agents in a cooperative task planning setting. Each agent knows about the other’s capabilities, the environment, and the joint initial and goal states.

  • Both agents must commit to a task plan before execution.

  • The plan preferences of each agent are private and may change arbitrarily. However, both agents can communicate information using a shared language to convey their preferences to the other agent. Both agents understand about the language.

  • It is always in each agent’s interest to not only respect preferences of its own but also the other’s if possible. More specifically, the speaker communicates a “plan sketch” (i.e., an abstract plan) that captures its plan preferences (as a set of plans), which are to be respected by the listener if at all possible (i.e., there exists an overlap between the agents’ preferences).

The first condition means that each agent may independently anticipate a joint plan for both agents, and only optimal plans are considered. Given a cooperative setting, the agents are tasked only to optimize the team performance. In this work, we use makespan as the team performance metric. The second condition applies to many situations when it is difficult (if not outright infeasible) to change plans after execution starts. In such a case, without communication, the two agents may anticipate different joint plans due to their individual preferences and end up with coordination conflicts (i.e., failing to reach the goal state after the execution from both agents completes). Coordination is achieved by one of the agents communicating its preferences as a path sketch to the other agent. The listener is expected to respect the speaker’s preferences if they overlap with the preferences of its own. Hence, the goal of a language in our problem settings is to resolve conflicts while providing as much flexibility as possible when providing a plan sketch in various situations.

III-B Motivating Example

Consider a cooperative and distributed path-finding domain in Fig. 1 with two agents A and B with three rooms 1, 2 and 3. A task in this domain is for the agents to start from an initial state and reach a goal state, where a state specifies locations for both agents. The initial and goal states may specify any locations of the agents in the environment. Both agents share the same optimization criterion for the team (i.e., minimizing the makespan of the task). Each agent can move to an adjacent room or stay where it is in the next time step but they are not allowed to stay in the same room or switch locations at the same time step to avoid collisions. Without coordination, the agents may choose their actions arbitrarily according to any joint optimal plan that is aligned with their preferences, which can lead to coordination conflicts (see Fig. 1).

In such a case, communication is needed for the agents to coordinate their actions due to the different preferences. For example, agent A may prefer to first move and then wait while B may prefer the other way around. Such preferences may also be changed arbitrarily although we assume that they do not change within a negotiation cycle (i.e., from when the speaker sends a message to when it receives an acknowledgment from the listener about whether or not a commitment can be made). The utility of a language is to implement such communication under all tasks (i.e., all initial and goal state pairs). In this example, a language may be one that enables the agents to express something like “clockwise” or “counterclockwise” movements. This happens to be expressible by a language computed by our approximate solution to the language construction problem. As we will discuss later, the symbols in the constructed language for this example form an abstraction over the joint state. The resulting abstract states (i.e., symbols) are automatically constructed and can be used to specify the location of either agent A or B regardless of the other agent. In this way, a clockwise movement can be expressed by specifying the locations of the agent in the desired sequence (i.e., the symbols are compositional in our languages).

IV Problem Formulation

IV-A Basics for Task Planning

We model the domain and environment based on the STRIPS model [13] M=(P,A), where P is the set of propositional state variables, and A is the set of agent joint actions. Each joint action aA (which specifies the individual action of each agent) is associated with a set of preconditions, pre(a)P, add effects, add(a)P, and delete effects, del(a)P. We assume that every joint action has a cost of 1 and the optimality criterion is to minimize the makespan. The agents must work together to achieve tasks in the form of state pairs (I,G)S×S, where S=2P is the state space, and I and G are the initial and goal states. Note that this deviates from the STRIPS model where G can be a partial state. We adopt a more strict goal specification here to simplify the discussion. A discussion on the removal of this restriction is delayed to future work. Each such state pair introduces a planning problem O=(P,A,I,G).

Given any task as a planning problem O=(P,A,I,G), each agent may choose a joint plan based on its individual preferences. However, as seen in the motivating example in Fig. 1, problems may occur when this joint plan is chosen differently by each agent. In such cases, communication is required to coordinate them. The necessity of coordination is tied to the notion of required coordination, which is captured in the definition below:

Definition 1 (Required Coordination (RC)).

Given any task (I,G) that results in a planning problem for a domain, a required coordination is the condition for two plans π1Π and π2Π (π1π2) such that π1A,π2BΠ or (non-exclusive) π2A,π1BΠ, where π1=π1A,π1B,π2=π2A,π2B and Π is the set of joint (optimal) plans for the task.

Fig. 2: Three different joint plans (with individual plans also shown) for the task in our motivating example: the first two plans introduce no RC: πA from π1 (top) can be switched with its counterpart from π2 without affecting the task completion. Both π1 and π2 however introduce RC with π3.

A plan with a superscript (e.g., πA) denotes the individual plan (the sequence formed by individual actions) for the respective agent and πA,πB combines the individual plans πA and πB back into a joint plan. Intuitively, an RC condition defines a situation for the existence of two different plans for a given task, such that a task failure will occur (i.e., goal not reached) after the execution completes had the agents chosen them differently. In such cases, we also refer to that the two plans introduce RC. Notice that two different plans do not necessarily introduce RC. See Fig. 2 for an example.

Proposition 1.

Given a planning domain model M=(P,A), no coordination is necessary when no RC is present for any problem O=(P,A,I,G), where (I,G)T and T denotes the set of all possible tasks in the domain.

Given any problem O=(P,A,I,G), if RC is not present, we consider the following two cases: 1) There exist none or a single joint plan: in this case, either the agents will both identify that the task is unsolvable or choose the same plan. No coordination is necessary. 2) There exist multiple optimal plans but none of them pair-wisely introduce RC. In such a case, both agents can arbitrarily choose a plan, which is guaranteed to be conflict-free based on Def. 1.

In Fig. 2, π1 and π2 do not introduce RC. Hence, had these two plans being the only plans that the agents can choose from, no coordination will be necessary for the given task. This observation already hints on the language construction. In particular, a language for this example will be required to separate (π1,π2) from π3, i.e., expressing them differently.

IV-B Coordination Language

Next, we define language in our problem. Following a widely held view [20, 36], we consider languages as specifying abstractions that relate symbols to physical (joint) states in the environment. More specifically, a sentence of a language specifies a constraint that expresses a temporal-state abstraction:

Definition 2 (Temporal-State Constraint (TSC)).

A temporal-state constraint specifies a constraint in the form of ζ=S1SiSn, where Si2S. A plan π satisfies a constraint ζ if and only if there exists a set of monotonically increasing positive integers j0,j1,j2jn with j0=0 and jn=|πS|, such that πS[jk-1+1:jk]Sk (nk1), πS[jk-1]Sk (nk>1) and πS[jk+1]Sk (n>k1), where πS is the state sequence that corresponds to π.

Fig. 3: An illustration of a TSC. The top part of the figure shows the abstract states associated with the TSC and the bottom shows a plan as a state sequence that satisfies it. A TSC that is satisfied by a plan, in general, may not be unique.

Brackets are used to index into πS and the symbol : indicates a range of indices (inclusive); πS[x:y] thus returns a set. An illustration of a TSC is provided in Fig. 3. Alternatively, we also refer to that a TSC expresses a plan if the plan satisfies the constraint. It is worth noting that the abstraction used in TSC may be considered as a strict generalization of options in semi-MDPs [38], which specify only the initial and final states. Instead, a TSC specifies a sequence of abstract states that must be visited by a plan in order. While this distinction may not be important for planning purposes, since a TSC can be constructed from options, it could affect the construction of a language significantly since TSC allows for more flexible expressions. Similarly, LTL and CTL [42, 11] may also be used to specify TSCs. A fundamental difference is that our focus here is not on planning where the abstractions and their associated physical states are given a priori, and instead on how the abstractions naturally arise from the requirement of coordination. Next, we formally define language using TSCs.

Definition 3 (Language).

A language for a domain M=(P,A) is a pair L=(W,O) where W is the set of words and O the set of compositional operators that can be applied to the words. Each word wW is a TSC.

Consider the natural language word zigzag that is defined as “have or move along in a zigzag course”, which specifies a temporal-state constraint on the movement of an agent. The language operators are applied to the words (which are also referred to as basic TSCs) in a language to introduce more complex TSCs (referred to as sentences. For example, a concatenation operator may be used to express “move along in a zigzag course and then go forward” when used to connect zigzag and forward. Any given language thus introduces a set of TSCs for the domain.

Not all languages are of interest to the agents. To be useful, a language must at least be able to resolve RCs. Such languages are referred to as coordination languages:

Definition 4 (Coordination Language).

A language is a coordination language for a domain if the following conditions hold: given any (optimal) plan π for any task, 1) a sentence of the language expresses it; 2) for any sentence of the language that expresses it, the set of all plans for the task that are expressed by the sentence must introduce no RC.

More intuitively, since a sentence may correspond to multiple plans, a coordination language ensures that the set of plans that are expressed do not introduce RC. In such a case, the flexibility of the listener to choose a plan, as long as it is constrained by a sentence that expresses it, is guaranteed to not introduce coordination conflicts. However, a coordination language may be created too stringently, such as when it expresses only a single plan for any task. Such languages are undesirable since it significantly restricts the listener’s flexibility. Hence, we further prefer languages that are more flexible, or approximately expressive of plans. One way to build in more flexibility is to minimize the number of words in a coordination language. We will focus our following discussion on constructing coordination languages that have a minimal word set. To simplify the formal analysis in this work, we will hold the set of operators fixed and simple (i.e., with only concatenation), and further assume state abstractions only (i.e., words are all state abstractions or w2S for all w𝒲). Discussions for removing these simplifications will be delayed to future work.

Theorem 1.

The decision problem, DMinW=(P,A,T,O,K), to determine whether a coordination language exists with word size |W|=K, under O={.} (concatenation) with state abstractions is NEXP-complete.


First, we prove that the problem can be solved by a Non-deterministic Turning Machine (NTM) in exponential time. This machine can guess the 𝒲 by enumerating all possible word set with K words, which can be done in exponential time. For each candidate word set, checking whether it forms a coordination language with 𝒪={.} will not use more than an exponential amount of time, which involves computing the set of optimal plans for any given initial and goal states and validating that the language resolves all RCs subject to the conditions in Def. 4.

Next, we show that the decision problem DMinW can be reduced from the TILING problem, which has the form of D=(L,H,V,N) and is known to be NEXP-complete [5]. In a TILING problem, we are given a board of size N×N, a set of tile types L={tile0tilek}, and a set of horizontal and vertical compatibility relations, H,VL×L. A tiling is a mapping f:{1N}×{1N}L and is consistent if and only if f(0,0)=tile0 and each pair of the horizontally or vertically adjacent cells satisfies one of the horizontal or vertical compatibility relations, respectively. The TILING problem can be reduced to a DMinW problem that involves a modified two-agent path-finding domain in a grid-world of size N×N. For any given task, the agents always start in the same location and must move to the same goal location while minimizing the makespan. They can move horizontally or vertically to an adjacent cell and choose a label from L to label the cell that it moves into. The location (0,0) is labeled with tile0 initially and other locations are unlabeled. A failure will be resulted when they choose a different location to move into, different labels for the same next location, or that the move and labeling (referred to as a transition henceforth) does not correspond to any relation in H or V. The labels will not be reset after a task is completed and are carried over to the following tasks. Each state of the domain corresponds to a possible transition from an environment state to the next.

It can now be shown that the decision problem DMinW has a solution of size |H|+|V| if and only if the problem D has a consistent tiling. In the first direction, if TILING has a consistent tiling, we can set the |H|+|V| words according to H and V, each specifying a horizontal or vertical compatibility relation that corresponds to a set of legitimate transitions. This set of words clearly can be used to coordinate the agents’ transitions in the desired manner since every word specifies a single valid transition given the current environment state (i.e., agent locations and labels). On the other hand, if there does not exist a tiling that is consistent, there must exist two adjacent locations that could not satisfy any relation in H or V when all the locations are labeled. In such a situation, a failure will be resulted if all the plans for a given task require the agents to move along those two locations, no matter how we choose the word set. In other words, no coordination languages could have existed. ∎

V Approximating DMinW

We have shown that the class of DMinW problems is generally NEXP-complete. Hence, solving the problems exactly is computationally infeasible. Next, we focus on developing an approximate solution for DMinW. First, we formally define state abstraction used in DMinW.

Definition 5 (State Abstraction).

A state abstraction of a state space S is a mapping f:2S{1M} where M is the number of abstract states. Where there is no ambiguity, M is also used to denote the set of abstract states.

Note that the same state may appear in multiple abstract states under this definition. When viewing the TSC in Fig. 3 as a sentence that is constructed by concatenating words of state abstractions, it may also be considered as an illustration of the relationship between abstract states and physical states. It follows that the set of words for DMinW is a state abstraction. We further define a perfect state abstraction as a state abstraction that corresponds to a partition: every physical state belongs to one and only one abstract state. This means that f now becomes a surjective function for the mapping of S{1M} such that f(s) returns the abstract state for s.

Fig. 4: An illustration of how the approximation method works in Theorem 2 with two plans that introduce RC (top and bottom) for the same task. When {I, s1, s2} (or {s3, s4, s5} are pair-wisely separated into different abstract states, any TSCs for expressing these two plans will necessarily be different since they would have to represent s1 and s2 (or s3 and s4) as different abstract states with the preceding (or trailing) abstract states shared, thus resolving the RC.
Lemma 1.

The set of words (i.e.,W) of any coordination language for DMinW may be replaced by a perfect state abstraction, resulting in another coordination language.


This is easy to see since we can construct a perfect state abstraction from a state abstraction by constructing additional abstract states for the overlapping states, and for states not covered by the original set of words. Then, for any TSC that may be constructed by the original state abstraction, a more stringent TSC can be constructed using the perfect state abstraction, which does not affect the requirements of a coordination language (see Def. 4). This also means that this new language can still express any optimal plan for any task. Hence, the conclusion holds. ∎

Notice, however, that the replacement may reduce the flexibility when using the language since it now contains more words and tends to be more stringent. The benefit of using perfect state abstractions is that the TSC (or sentence) constructed to express any plan is unique since any physical state is also uniquely mapped to an abstract state.

Corollary 1.

The decision problem of DMinW with perfect state abstractions, denoted as DMinW+, is NEXP-complete.

The proof in Theorem 1 applies also to perfect state abstractions so this is a direct result. In spite of belonging to the same complexity class, problems with perfect state abstractions are easier to analyze, which leads to an approximate solution as we will discuss next.

For any (I,G)𝒯 of a domain, a set of optimal plans can be constructed and denoted as Π. For any plan pair (π1,π2) in Π that introduces RC, we denote the first and last states where they differ in the plans as d=(α,β) (α and β could be the same); α and β each denotes a pair of states from π1 and π2, respectively. Furthermore, we use α* (and similarly for β*) to denote the three pairs of nodes that are formed pair-wisely by α and the previous node that is shared by the plans. See Fig. 4 for an illustration of α* and β* with two plans. We denote the set of such (α*,β*)’s for all (I,G)𝒯 as 𝒟.

Theorem 2.

A language that satisfies the following condition with perfect state abstractions is a coordination language: (α*,β*)D, i f(α*[i][1])f(α*[i][2]) OR i f(β*[i][1])f(β*[i][2]).

Essentially, the condition above requires the three states that are associated with either α* or β* between any two plans (that introduce RC) to be pair-wisely separated in the state abstraction. Intuitively, this enables a language to always distinguish between the two plans so that the associated RC can be resolved. Next, we more formally prove this result.


First, recall that a coordination language must not express any two plans that introduce RC. In other words, all plan pairs that introduce RC must always be expressed separately by such a language. We show next that a language that satisfies the condition above always resolves RC for any given plan pair π1,π2 for any (I,G)𝒯. Given π1,π2, we can construct a unique TSC ζ1 and ζ2, respectively, given a perfect abstraction. When the above condition is satisfied, we know that ζ1 and ζ2 must differ by at least one abstract state and hence π1,π2 will never be expressed by the same TSC (see Fig. 4 for additional explanations). As a result, the RC will always be resolved by the language. Since this holds for any plan pair for any (I,G)𝒯, we can conclude that the language is a coordination language for the domain. ∎

Fig. 5: One possible set of all state pairs (i.e., edges) to be separated as discussed in Theorem 2 for our motivating example. Each node corresponds to a state with the first and second numbers representing the locations of A and B, respectively. The blue clusters present a candidate abstraction.

The condition described above in Theorem 2 is only a sufficient condition for a coordination language–there may exist a coordination language that does not satisfy it. As a result, a language that satisfies it may not be the exact solution for DMinW+. In other words, it establishes an approximate solution. The positive side is that this solution only requires solving a set of (PSPACE-complete) planning problems (of size |𝒯| to be precise), which have efficient existing solutions. For our motivating example, a perfect abstraction produced is presented in Fig. 5. Note that this example is a special case where the longest plan length is 2, implying that α and β will always be the same and hence the solution returned by our approximate method is in fact the exact solution for DMinW+! This is further backed up by the result in Table I. Readers may have observed that the abstraction in Fig. 5 corresponds to specifying the location of agent B, in which case agent A’s location no longer matters. Using this abstraction, the agents can specify clockwise or counterclockwise movements (of B) by composing the words in the desired sequences, as we initially discussed (see Fig. 1). Another note is that the perfect abstraction may not be unique. For example, A and B have a symmetrical relationship in our motivating example so that a similar abstraction for A also exists. Finally, it is worth noting that the semantic meanings (physical states) associated with the abstraction (abstract states) in this example are drawn from hindsight; the construction is automatic based solely on the coordination requirements of the domain.

Advantages of MinW

A precondition for using a MinW as a language in our setting is that the two parties both maintain the language specifications (e.g., 𝒲 only with DMinW+), which requires knowledge alignment a priori, or learning a language [16]. The advantages of using a MinW is at least three-fold (to show next): 1) Using a MinW resolves all RC. 2) Using a MinW reduces the communication cost of the speaker while providing more flexibility to the listener. 3) Using a MinW reduces the planning effort of the listener.

VI Evaluation Results

env. size |𝒯| |S| M time (s) M time (s)
GW #1 2 × 2 132 12 3 26.2 8 0.1
GW #2 2 × 3 870 30 3600.0 13 0.9
GW #3 2 × 4 3080 56 3600.0 17 19.5
GW #4 3 × 3 5112 72 3600.0 23 86.2
env. size |𝒯| |S| Manhattan(I,G) M time (s)
GWC #1 3 × 3 3080 56 1 12 92.9
GWC #1 3 × 3 380 56 4 3 1.4
GWC #2 3 × 4 636 90 5 4 11.5
GWC #3 3 × 5 956 132 6 3 132.5
GWC #4 4 × 4 956 132 6 4 140.3
TABLE I: Performance of our language construction method for DMinW+ in a multi-agent path-finding domain.

We focus our evaluation on a multi-agent path-finding domain that is similar to the motivating example to illustrate the complete set of features of the constructed languages. Applying our method to other domains should not be much different and similar results are expected. In our evaluation domain, the agents operate in a grid-world environment and can move to any adjacent cells or stay in the same cell in the next time step. Similarly, the agents cannot stay in the same cell or switch locations at the same time step. The optimization criterion is to minimize the makespan of the joint plan. While seemingly simple, this domain is in fact challenging for coordination since the agents can freely choose their actions according to any joint optimal plan as long as they end up in their respective goal locations. This means that an agent may choose to wander around in any preferred way or stay at the goal or any intermediate locations while waiting for the other agent to reach its goal. Hence, the domain in fact represents a challenging setting for language construction. The results were obtained on a MacBook Pro with Intel Core i7 CPU 2.80GHz with 16GB of RAM using a Java implementation.

VI-A Language Construction

For the first set of results, we evaluate the performance of our language construction method for DMinW+, using the approximate solution described in Theorem 2, and compare it with a brute-force method that computes the exact solution for DMinW+. The brute-force method checks through all possible abstractions and returns the one with the smallest (word) size that is a coordination language. It always starts with the smallest size abstraction and gradually increases the size. To further reduce computation, in addition to following Theorem 2, we considered only α* and implemented a greedy solution for determining the smallest abstraction that satisfies the condition in Theorem 2, i.e., adding any remaining states to an abstract state whenever possible until nothing more can be added. Table I presents the results, where the top part corresponds to standard grid-worlds and the bottom part corresponds to grid-worlds with a non-traversable center (GWC)–an agent may only go around the center.

We can see that our method is effective in finding an abstraction of the language–it can largely compress the state space even as an approximate method (denoted by M). The exact solution failed to find a solution in under an hour in almost all cases other than the smallest case. In most cases, our approximate solution returns a set of words that is significantly smaller than the set of joint actions (i.e., with size 25), which represents one of the most stringent languages. This effect is seen more clearly in the results at the bottom where we restrict the set of tasks (i.e., 𝒯) to be of at least some length by requiring Manhattan(I,G) for either one of the agents to be equal or above that length). This is reasonable in situations when we care less about coordination in tasks having short spans.

VI-B Communication Savings

To illustrate the benefit of communication savings for the speaker, we compare our method with a baseline approach that communicates the entire plan. The cost of communication is determined by the length of a plan, and in our method the length of a TSC. We chose the setting for GWC #3 in Table I since the plans are longer and hence the effect would be clearer. The result is presented in Fig. 6. The average communication saving is 33.3% among the 784 plans or 27.3% among all the 956 plans, which is rather impressive given this simple setting.

Fig. 6: Comparison of communication cost between using plans vs. TSCs for GWC #3 in Table I, where the optimal (state) plan length is always constrained to be 7. The X-axis shows the problems and the Y-axis the communication costs. Out of the 956 problems, we are showing the first 100 out of the 784 problems where the communication cost differs.
Fig. 7: Comparison of plan flexibility between using plans vs. TSCs for GWC #3 in Table I. The X-axis shows the problems and the Y-axis the number of plans that are available for the listener to choose from after receiving a TSC. Out of the 956 problems, we are showing the first 100 out of the 774 problems where the plan flexibility differs.
Fig. 8: Comparison of planning between using A* directly with Manhattan distance vs. using the same A* with a given TSC for GWC #3 in Table I. The X-axis shows the problems and the Y-axis the number of states that are expanded in each case. Out of the 956 problems, we are showing the first 100 out of the 950 problems where the number differs.

VI-C Increased Flexibility

To verify that using TSC increases the flexibility of the listener, using the same setting as above, we compute the number of plans that are available to the listener after receiving a TSC from the speaker. The speaker always chooses the first plan that is found and then converts it to a TSC according to the language. The result is presented in Fig. 7. The increase in flexibility is notable: the average number of plans available is 14.7 among the 774 plans or 12.1 among all the 956 plans. (compared to 1 with the baseline approach)!

VI-D Reduced Planning

In this experiment, we test how much computation is saved for the listener had it had to generate its own plan. For this evaluation, we implemented an A* search with Manhattan distance as the heuristic. Using the same A* algorithm, we modified it to also consider a given TSC (sent by the speaker). More specifically, when expanding a node, the modified algorithm would not consider its neighbors that are not aligned with the TSC. The result is presented in Fig. 8. Out of the 956 problems, the average node reduction is 1.6-fold among all the 956 plans. It is clear that computation is still required by the listener but reduced substantially.

VI-E Coordination Game

In this experiment, we test the benefits of the flexibility provided to the listener in a game setting. In this game, given a random task, the speaker first chooses its optimal plan randomly. At the same time, the listener randomly chooses k plans as its preferences out of all the possible plans for the task. The speaker then acts as a proposer that either communicates its plan or the TSC to the listener. While respecting the speaker’s preferences, if there exists a plan that the listener can choose from that is also in its own preferences, the listener accepts the proposal and the game ends. Otherwise, the game continues from the beginning (the speaker re-picks its plan and listener re-picks its preferences) until the proposal of the speaker is eventually accepted. We are interested to see how many iterations (i.e., negotiations) the agents play this game under the settings where plans and TSCs are communicated by the speaker, respectively. We run this game on GWC # 1 described in Table I. The results are averaged over 100 runs as k is changed from 1 to 5, and are shown in Fig. 9. We can see that in both settings the number of iterations decreases as k increases, which is expected. However, these numbers remain relatively consistent and much smaller for the TSC case which clearly shows the benefit of the flexibility enabled by TSCs.

Fig. 9: Comparison of the number of iterations played by the speaker and listener between communicating plans and communicating TSCs for GWC #1 in Table I. The X-axis shows the k value and the Y-axis the number of iterations played until the speaker’s proposal is accepted by the listener. The results are averaged over 100 runs.

VII Conclusions and Discussions

In this paper, we provided a preliminary and formal treatment of the language construction problem for autonomous agents from a new computational perspective. The focus of this initial investigation was on a task planning setting where one agent directly communicates an abstract plan to the other agent to achieve coordination. We viewed language construction as a fundamental requirement for resolving miscoordination. Such a view enables us to compute a language from the ground up by mapping symbols to physical states. Furthermore, we introduced the notion of coordination language and discussed its theoretical properties. Our later discussion was focused on one form of coordination languages that minimize the word set with a given set of operators, or MinW. It was shown that even such a restricted form is NEXP-complete. An approximate solution was then developed to convert this NEXP-complete problem into a set of planning problems. Finally, evaluations of our method were provided.

This preliminary discussion herein opens up plentiful opportunities for future investigation. For example, discussions for removing some of the assumptions, such as plan optimality, goal state specification, cooperative setting, plan commitment, and the requirements of a coordination language in Def. 4, would significantly improve the applicability of the constructed languages. For example, instead of requiring no RC to be introduced in Def. 4, we may require only that a significant portion of the plans do not introduce RC by defining a robustness measure similar to that used in [17, 29]. Another interesting direction is to address the evolution of languages in new environments with additional (and potentially heterogeneous) agents. Although we can naively extend our approach to handle such cases, the computation would scale exponentially. Along this line, methods may be explored to enable online and incremental learning. Better approximate solutions are also desirable. Finally, it is worth noting that although it is not the intention of this work, the proposed computational view on language construction from coordination requirements could shed new light on the origin of languages [33].


  • [1] D. Abel, D. E. Hershkowitz, and M. L. Littman (2017) Near optimal behavior via approximate state abstraction. arXiv preprint arXiv:1701.04113. Cited by: §II.
  • [2] N. Allott (2006) Game theory and communication. In Game Theory and Pragmatics, pp. 123–152. External Links: ISBN 978-0-230-28589-7, Document, Link Cited by: §II.
  • [3] J. Andreas, A. Dragan, and D. Klein (2017) Translating neuralese. arXiv preprint arXiv:1704.06960. Cited by: §II.
  • [4] M. Barbuceanu and M. S. Fox (1995) COOL: a language for describing coordination in multi agent systems.. In ICMAS, pp. 17–24. Cited by: §I.
  • [5] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein (2002) The complexity of decentralized control of markov decision processes. Mathematics of operations research 27 (4), pp. 819–840. Cited by: §IV-B.
  • [6] A. Cangelosi and D. Parisi (2012) Simulating the evolution of language. Springer Science & Business Media. Cited by: §II.
  • [7] N. Chomsky (2005) Three factors in language design. Linguistic inquiry 36 (1), pp. 1–22. Cited by: §I.
  • [8] N. Chomsky (2014) The minimalist program. MIT press. Cited by: §I.
  • [9] M. H. Christiansen and S. Kirby (2003) Language evolution. OUP Oxford. Cited by: §II.
  • [10] W. Cushing, S. Kambhampati, and D. S. Weld (2007) When is temporal planning really temporal?. In Proceedings of the 20th international joint conference on Artifical intelligence, pp. 1852–1859. Cited by: footnote 1.
  • [11] E. A. Emerson (1990) Temporal and modal logic. In Formal Models and Semantics, pp. 995–1072. Cited by: §II, §IV-B.
  • [12] K. Erol, J. Hendler, and D. S. Nau (1994) HTN planning: complexity and expressivity. In AAAI, pp. 1123–1128. Cited by: §II.
  • [13] R. E. Fikes and N. J. Nilsson (1971) Strips: a new approach to the application of theorem proving to problem solving. Artificial Intelligence 2 (3), pp. 189 – 208. External Links: ISSN 0004-3702, Document, Link Cited by: §IV-A.
  • [14] T. Finin, R. Fritzson, D. McKay, and R. McEntire (1994) KQML as an agent communication language. In CIKM, pp. 456–463. Cited by: §I.
  • [15] J. Gauthier and I. Mordatch (2016) A paradigm for situated and goal-driven language learning. CoRR abs/1610.03585. External Links: Link, 1610.03585 Cited by: §I.
  • [16] D. Gentner and L. L. Namy (2006) Analogical processes in language learning. Current Directions in Psychological Science 15 (6), pp. 297–301. Cited by: §V.
  • [17] M. L. Ginsberg (1995) Approximate planning. Artificial Intelligence 76 (1), pp. 89 – 123. Note: Planning and Scheduling External Links: ISSN 0004-3702, Document, Link Cited by: §II, §VII.
  • [18] C. V. Goldman and S. Zilberstein (2003) Optimizing information exchange in cooperative multi-agent systems. In AAMAS, New York, NY, USA, pp. 137–144. External Links: ISBN 1-58113-683-8, Link, Document Cited by: §II.
  • [19] Z. Gong and Y. Zhang (2018) Temporal spatial inverse semantics for robots communicating with humans. In ICRA, Cited by: §I, §II.
  • [20] S. Harnad (1990) The symbol grounding problem. Physica D: Nonlinear Phenomena 42 (1-3), pp. 335–346. Cited by: §II, §IV-B.
  • [21] M. D. Hauser, N. Chomsky, and W. T. Fitch (2002) The faculty of language: what is it, who has it, and how did it evolve?. science 298 (5598), pp. 1569–1579. Cited by: §III.
  • [22] S. Kambhampati, C. A. Knoblock, and Q. Yang (1995) Planning as refinement search: a unified framework for evaluating design tradeoffs in partial-order planning. Artificial Intelligence 76 (1-2), pp. 167–238. Cited by: §II.
  • [23] S. Kirby, T. Griffiths, and K. Smith (2014) Iterated learning and the evolution of language. Current opinion in neurobiology 28, pp. 108–114. Cited by: §II.
  • [24] G. Konidaris, L. Kaelbling, and T. Lozano-Perez (2014) Constructing symbolic representations for high-level planning. In AAAI, Cited by: §II.
  • [25] G. Konidaris, S. Kuindersma, R. Grupen, and A. Barto (2012) Robot learning from demonstration by constructing skill trees. IJRR 31 (3), pp. 360–375. Cited by: §II.
  • [26] J. Kvarnström and P. Doherty (2000) TALplanner: a temporal logic based forward chaining planner. Annals of mathematics and Artificial Intelligence 30 (1-4), pp. 119–169. Cited by: §II.
  • [27] C. Matuszek, E. Herbst, L. Zettlemoyer, and D. Fox (2013) Learning to parse natural language commands to a robot control system. In Experimental Robotics, pp. 403–415. Cited by: §I.
  • [28] I. Mordatch and P. Abbeel (2018) Emergence of grounded compositional language in multi-agent populations. In AAAI, Cited by: §II.
  • [29] T. A. Nguyen, S. Kambhampati, and M. Do (2013) Synthesizing robust plans under incomplete domain models. In NIPS, pp. 2472–2480. Cited by: §VII.
  • [30] R. Paul, J. Arkin, N. Roy, and T. M Howard (2016) Efficient grounding of abstract spatial concepts for natural language interaction with robot manipulators. Cited by: §I.
  • [31] S. Poslad (2007) Specifying protocols for multi-agent systems interaction. TAAS 2 (4), pp. 15. Cited by: §I.
  • [32] G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant (2015) Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence 219, pp. 40–66. Cited by: §II.
  • [33] M. Steedman (2002) Plans, affordances, and combinatory grammar. Linguistics and Philosophy 25 (5-6), pp. 723–753. Cited by: §II, §VII.
  • [34] L. Steels (2003) Evolving grounded communication for robots. Trends in cognitive sciences 7 (7), pp. 308–312. Cited by: §II.
  • [35] L. Steels (2011) Modeling the cultural evolution of language. Physics of life reviews 8 (4), pp. 339–356. Cited by: §I.
  • [36] L. Steels (2012) Grounding language through evolutionary language games. In Language Grounding in Robots, pp. 1–22. Cited by: §II, §II, §IV-B.
  • [37] S. Sukhbaatar, R. Fergus, et al. (2016) Learning multiagent communication with backpropagation. In NIPS, pp. 2244–2252. Cited by: §II.
  • [38] R. S. Sutton, D. Precup, and S. Singh (1999) Between mdps and semi-mdps: a framework for temporal abstraction in reinforcement learning. AIJ 112 (1), pp. 181 – 211. External Links: ISSN 0004-3702, Document, Link Cited by: §II, §IV-B.
  • [39] S. Tellex, R. Knepper, A. Li, D. Rus, and N. Roy (2014-07) Asking for help using inverse semantics. In RSS, Berkeley, USA. External Links: Document Cited by: §I, §II.
  • [40] S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee, S. Teller, and N. Roy (2011) Approaching the symbol grounding problem with probabilistic graphical models. AI magazine 32 (4), pp. 64–76. Cited by: §II.
  • [41] S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee, S. Teller, and N. Roy (2011) Understanding natural language commands for robotic navigation and mobile manipulation. In AAAI, Cited by: §I.
  • [42] M. Y. Vardi (1996) An automata-theoretic approach to linear temporal logic. In Logics for concurrency, pp. 238–266. Cited by: §II, §IV-B.
  • [43] L. Wang and Q. Guo (2019) Coordination of multiple autonomous agents using naturally generated languages in task planning. Applied Sciences 9 (17), pp. 3571. Cited by: footnote 1.
  • [44] P. Xuan, V. Lesser, and S. Zilberstein (2001) Communication decisions in multi-agent cooperation: model and experiments. In AAMAS, pp. 616–623. Cited by: §I.
  • [45] Y. Zhang, K. Kim, and G. Fainekos (2016) Discof: cooperative pathfinding in distributed systems with limited sensing and communication range. In Distributed Autonomous Robotic Systems, pp. 325–340. Cited by: §II.
  • [46] Y. Zhang, S. Sreedharan, and S. Kambhampati (2016) A formal analysis of required cooperation in multi-agent planning. In Twenty-Sixth International Conference on Automated Planning and Scheduling, Cited by: footnote 1.