Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents

Abstract

Role-Playing Agent (RPA) is an increasingly popular type of LLM Agent that simulates human-like behaviors in a variety of tasks. However, evaluating RPAs is challenging due to diverse task requirements and agent designs. This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPA by systematically reviewing 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics from existing literature. Based on these findings, we present an RPA evaluation design guideline to help researchers develop more systematic and consistent evaluation methods.
