Abstract
With the advancement of large language models (LLMs), an increasing number of student models have leveraged LLMs to analyze textual artifacts generated by students to understand and evaluate their learning. These student models typically employ pre-trained LLMs to vectorize text inputs into embeddings and then use the embeddings to train models to detect the presence or absence of a construct of interest. However, how reliable and robust are these models at processing language with different levels of complexity? In the context of learning, where students may have different language backgrounds and various levels of writing skill, it is critical to examine the robustness of such models to ensure that they work equally well for text with varying levels of language complexity. A limited number of research studies show that the use of language can indeed impact the performance of LLMs. As such, in the current study, we examined the robustness of several LLM-based student models that detect student self-regulated learning (SRL) in math problem-solving. Specifically, we compared how the performance of these models varies between texts with high and low lexical, syntactic, and semantic complexity, as measured by three linguistic measures.