Abstract
Large Language Models (LLMs) are capable of performing complex scheduling in a multi-agent system and can coordinate these agents to complete sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community lacks sufficient benchmarks for building general multi-agent collaboration infrastructure that encompasses both LLM and human-NPC collaboration. In this work, we propose a novel infrastructure, MindAgent, to evaluate emergent planning and coordination capabilities for gaming interaction. In particular, our infrastructure leverages existing gaming frameworks to i) require understanding of the coordinator for a multi-agent system, ii) collaborate with human players via proper instructions without fine-tuning, and iii) establish in-context learning on few-shot prompts with feedback. Furthermore, we introduce CUISINEWORLD, a new gaming scenario and related benchmark that tests multi-agent collaboration efficiency and supervises multiple agents playing the game simultaneously. We conduct comprehensive evaluations with a new automatic metric, CoS, for calculating collaboration efficiency. Finally, our infrastructure can be deployed in real-world gaming scenarios in a customized VR version of CUISINEWORLD and adapted to the existing broader Minecraft gaming domain. We hope our findings on LLMs and the new infrastructure for general-purpose scheduling and coordination can help shed light on how such skills can be obtained by learning from large language corpora.