We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions and the objects (and their affordances) and characters (and their previous actions) present within it, allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relates to agents that can talk and act successfully.