OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

  • 2025-06-16 18:32:08
  • Ziyi Wang, Yuxuan Lu, Wenbo Li, Amirali Amini, Bo Sun, Yakov Bart, Weimin Lyu, Jiri Gesi, Tian Wang, Jing Huang, Yu Su, Upol Ehsan, Malihe Alikhani, Toby Jia-Jun Li, Lydia Chilton, Dakuo Wang

Abstract

Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating "believable" human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPERA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions. OPERA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We developed both an online questionnaire and a custom browser plugin to gather this dataset with high fidelity. Using OPERA, we establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale given a persona and <observation, action, rationale> history. This dataset lays the groundwork for future research into LLM agents that aim to act as personalized digital twins for humans.

 
