Abstract
We introduce a novel reinforcement learning framework for LLM agents named AGILE (AGent that Interacts and Learns from Environments), designed to perform complex conversational tasks with users, leveraging LLMs, memory, tools, and interactions with experts. The agent possesses capabilities beyond conversation, including reflection, tool usage, and expert consultation. We formulate the construction of such an LLM agent as a reinforcement learning (RL) problem, in which the LLM serves as the policy model. We fine-tune the LLM using labeled data of actions and the PPO algorithm. We focus on question answering and release a dataset for agents called ProductQA, comprising challenging questions in online shopping. Our extensive experiments on ProductQA, MedMCQA and HotPotQA show that AGILE agents based on 7B and 13B LLMs trained with PPO can outperform GPT-4 agents. Our ablation study highlights the indispensability of memory, tools, consultation, reflection, and reinforcement learning in achieving the agent's strong performance. Datasets and code are available at https://github.com/bytarnish/AGILE.