OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

Abstract

Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating "believable" human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPeRA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions. OPeRA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We developed both an online questionnaire and a custom browser plugin to gather this dataset with high fidelity. Using OPeRA, we establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale given a persona and an <observation, action, rationale> history. This dataset lays the groundwork for future research into LLM agents that aim to act as personalized digital twins for humans.
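To make the benchmark task concrete, the following minimal sketch frames next-action prediction as a data structure plus a scoring loop: each session is split into one prediction instance per step, where the model sees the persona, all earlier <observation, action, rationale> triples, and the current observation, and must produce the next action. Everything named here is a hypothetical illustration, not OPeRA's released schema or the paper's evaluation protocol: the classes `Step`, `Session`, and `Example`, the plain-string encoding of observations and actions, the choice to include the current observation in the input, and the exact-match accuracy metric are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# NOTE: hypothetical schema for illustration only, not the dataset's actual format.


@dataclass
class Step:
    """One <observation, action, rationale> triple from a shopping session."""
    observation: str  # assumed: a simplified text view of the current page
    action: str       # assumed: a string such as "click(add_to_cart)"
    rationale: str    # the participant's self-reported just-in-time reason


@dataclass
class Session:
    """A single participant's session together with their persona."""
    persona: str
    steps: List[Step] = field(default_factory=list)


@dataclass
class Example:
    """One next-action prediction instance derived from a session."""
    persona: str
    history: List[Step]    # all steps before the current one
    observation: str       # the observation at the current step (assumed input)
    target_action: str
    target_rationale: str  # kept for rationale prediction; not scored below


def make_examples(session: Session) -> List[Example]:
    """Split a session into one prediction instance per step."""
    return [
        Example(
            persona=session.persona,
            history=session.steps[:t],
            observation=step.observation,
            target_action=step.action,
            target_rationale=step.rationale,
        )
        for t, step in enumerate(session.steps)
    ]


# A predictor maps an instance to a predicted action string (e.g. an LLM call).
Predictor = Callable[[Example], str]


def next_action_accuracy(sessions: List[Session], predict: Predictor) -> float:
    """Exact-match accuracy of predicted next actions (assumed metric)."""
    examples = [ex for s in sessions for ex in make_examples(s)]
    if not examples:
        return 0.0
    correct = sum(predict(ex).strip() == ex.target_action for ex in examples)
    return correct / len(examples)


# Usage: score a trivial baseline on a toy two-step session.
def always_add_to_cart(ex: Example) -> str:
    """A constant baseline that ignores persona and history."""
    return "click(add_to_cart)"


demo = Session(
    persona="Budget-conscious student",
    steps=[
        Step("search results for 'usb-c cable'", "click(product_2)", "cheapest option"),
        Step("product page for product_2", "click(add_to_cart)", "reviews look good"),
    ],
)
print(next_action_accuracy([demo], always_add_to_cart))  # prints 0.5
```

Splitting every session into per-step instances lets the same scoring loop compare an LLM predictor against simple baselines; a persona-conditioned model would read `ex.persona` and `ex.history` instead of ignoring them, and scoring the predicted rationale would need a separate, likely non-exact-match, comparison.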
