Abstract
Reinforcement Learning (RL) agents often exhibit learning behaviors that are not intuitively interpretable by human observers, which can result in suboptimal feedback in collaborative teaching settings. Yet, how humans perceive and interpret RL agents' learning behavior is largely unknown. In a bottom-up approach with two experiments, this work provides a data-driven understanding of the factors underlying human observers' understanding of the agent's learning process. A novel, observation-based paradigm to directly assess human inferences about agent learning was developed. In an exploratory interview study (\textit{N}=9), we identify four core themes in human interpretations: Agent Goals, Knowledge, Decision Making, and Learning Mechanisms. A second confirmatory study (\textit{N}=34) applied an expanded version of the paradigm across two tasks (navigation/manipulation) and two RL algorithms (tabular/function approximation). Analyses of 816 responses confirmed the reliability of the paradigm and refined the thematic framework, revealing how these themes evolve over time and interrelate. Our findings provide a human-centered understanding of how people make sense of agent learning, offering actionable insights for designing interpretable RL systems and improving transparency in Human-Robot Interaction.