Abstract
The common-sense reasoning abilities and vast general knowledge of Large Language Models (LLMs) make them a natural fit for interpreting user requests in a Smart Home assistant context. LLMs, however, lack specific knowledge about the user and their home, which limits their potential impact. SAGE (Smart Home Agent with Grounded Execution) overcomes these and other limitations by using a scheme in which a user request triggers an LLM-controlled sequence of discrete actions. These actions can be used to retrieve information, interact with the user, or manipulate device states. SAGE controls this process through a dynamically constructed tree of LLM prompts, which help it decide which action to take next, whether an action was successful, and when to terminate the process. The SAGE action set augments an LLM's capabilities to support some of the most critical requirements for a Smart Home assistant. These include: flexible and scalable user preference management ("is my team playing tonight?"), access to any smart device's full functionality without device-specific code via API reading ("turn down the screen brightness on my dryer"), persistent device state monitoring ("remind me to throw out the milk when I open the fridge"), natural device references using only a photo of the room ("turn on the light on the dresser"), and more. We introduce a benchmark of 50 new and challenging smart home tasks where SAGE achieves a 75% success rate, significantly outperforming existing LLM-enabled baselines (30% success rate).
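To make the idea of an LLM-controlled sequence of discrete actions concrete, the following is a minimal sketch of such a control loop. It is not SAGE's implementation: the names `run_agent`, `AgentState`, `scripted_policy`, `get_preference`, and `query_schedule` are hypothetical, the LLM is replaced by a hard-coded policy so the example runs offline, and the returned strings are made-up placeholder data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical action signature: takes an argument string, returns an observation.
Action = Callable[[str], str]


@dataclass
class AgentState:
    request: str
    # Each entry records (action name, argument, observation).
    history: List[Tuple[str, str, str]] = field(default_factory=list)


def run_agent(
    request: str,
    actions: Dict[str, Action],
    choose_next: Callable[[AgentState], Tuple[str, str]],
    max_steps: int = 10,
) -> Tuple[str, AgentState]:
    """Repeatedly ask the policy for the next action until it says 'finish'."""
    state = AgentState(request)
    for _ in range(max_steps):
        name, arg = choose_next(state)  # in a real system this would be an LLM call
        if name == "finish":
            return arg, state
        observation = actions[name](arg)
        state.history.append((name, arg, observation))
    return "step limit reached", state


def scripted_policy(state: AgentState) -> Tuple[str, str]:
    """Toy stand-in for the LLM: look up a preference, query with it, then stop."""
    if not state.history:
        return ("get_preference", "favorite team")
    if len(state.history) == 1:
        team = state.history[0][2]
        return ("query_schedule", team)
    return ("finish", state.history[-1][2])


# Placeholder actions returning made-up data (assumptions, not real services).
actions: Dict[str, Action] = {
    "get_preference": lambda arg: "Example FC",
    "query_schedule": lambda arg: f"{arg} play tonight at 19:00",
}

answer, final_state = run_agent("is my team playing tonight?", actions, scripted_policy)
print(answer)
```

In this sketch the policy decides both which action to take next and when to terminate; the `max_steps` bound is an added safeguard so that a policy that never emits `finish` cannot loop indefinitely.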