Abstract
Large Language Models can carry out human-like conversations in diverse settings, responding to user requests for tasks and knowledge. However, existing conversational agents implemented with LLMs often struggle with hallucination, following instructions with conditional logic, and integrating knowledge from different sources. These shortcomings compromise the agents' effectiveness, rendering them unsuitable for deployment. To address these challenges, we introduce Genie, a programmable framework for creating knowledge-intensive task-oriented conversational agents. Genie can handle involved interactions and answer complex queries. Unlike LLMs, it delivers reliable, grounded responses through advanced dialogue state management and supports controllable agent policies via its declarative specification, Genie Worksheet. This is achieved through an algorithmic runtime system that implements the developer-supplied policy, limiting LLMs to (1) parsing user input using a succinct conversational history, and (2) generating responses according to supplied context. Agents built with Genie outperform SOTA methods on complex logic dialogue datasets. We conducted a user study with 62 participants on three real-life applications: restaurant reservations with Yelp, as well as ticket submission and course enrollment for university students. Genie agents with GPT-4 Turbo outperformed the GPT-4 Turbo agents with function calling, improving goal completion rates from 21.8% to 82.8% across the three tasks.
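To make the division of labor concrete, the following is a minimal sketch of a runtime in which a deterministic loop applies a declarative policy and the LLM is confined to two steps: parsing and response generation. It is an illustration under stated assumptions, not Genie's actual API. The names `Worksheet`, `Rule`, `parse_user_input`, `generate_response`, and `run_turn` are hypothetical, and the two LLM calls are replaced by toy stand-ins so the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Worksheet:
    """Hypothetical dialogue state: named fields the agent must fill."""
    fields: Dict[str, Optional[str]] = field(default_factory=dict)

    def missing(self) -> List[str]:
        # Fields that have not been filled yet, in declaration order.
        return [name for name, value in self.fields.items() if value is None]


@dataclass
class Rule:
    """Hypothetical declarative policy rule: if `condition` holds, emit `action`."""
    condition: Callable[[Worksheet], bool]
    action: Callable[[Worksheet], str]


def parse_user_input(utterance: str, history: List[str]) -> Dict[str, str]:
    """Stand-in for LLM step (1). Assumption: a toy `key=value` parser
    replaces the model; `history` is accepted but unused here."""
    updates = {}
    for token in utterance.split():
        if "=" in token:
            key, value = token.split("=", 1)
            updates[key] = value
    return updates


def generate_response(agent_acts: List[str]) -> str:
    """Stand-in for LLM step (2). Assumption: the agent acts are simply
    joined instead of being verbalized by a model."""
    return " ".join(agent_acts)


def run_turn(ws: Worksheet, policy: List[Rule], utterance: str,
             history: List[str]) -> str:
    # Step (1): parse the utterance into state updates.
    for key, value in parse_user_input(utterance, history).items():
        if key in ws.fields:  # ignore fields the worksheet does not declare
            ws.fields[key] = value
    # The policy is applied by ordinary code, not by the LLM.
    acts = [rule.action(ws) for rule in policy if rule.condition(ws)]
    # Step (2): generate a reply from the supplied context only.
    reply = generate_response(acts)
    history.extend([utterance, reply])
    return reply


def make_demo_policy() -> List[Rule]:
    # Ask for the first missing field; confirm once everything is filled.
    return [
        Rule(lambda ws: bool(ws.missing()), lambda ws: f"ASK({ws.missing()[0]})"),
        Rule(lambda ws: not ws.missing(), lambda ws: "CONFIRM(booking)"),
    ]
```

In this sketch, the conditional logic ("ask until all fields are filled, then confirm") lives entirely in `make_demo_policy` and `run_turn`, so swapping the stand-ins for real model calls would not change which action the agent takes on a given state.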