Foundation models pretrained on diverse data at scale have demonstratedextraordinary capabilities in a wide range of vision and language tasks. Whensuch models are deployed in real world environments, they inevitably interfacewith other entities and agents. For example, language models are often used tointeract with human beings through dialogue, and visual perception models areused to autonomously navigate neighborhood streets. In response to thesedevelopments, new paradigms are emerging for training foundation models tointeract with other agents and perform long-term reasoning. These paradigmsleverage the existence of ever-larger datasets curated for multimodal,multitask, and generalist interaction. Research at the intersection offoundation models and decision making holds tremendous promise for creatingpowerful new systems that can interact effectively across a diverse range ofapplications such as dialogue, autonomous driving, healthcare, education, androbotics. In this manuscript, we examine the scope of foundation models fordecision making, and provide conceptual tools and technical background forunderstanding the problem space and exploring new research directions. Wereview recent approaches that ground foundation models in practical decisionmaking applications through a variety of methods such as prompting, conditionalgenerative modeling, planning, optimal control, and reinforcement learning, anddiscuss common challenges and open problems in the field.