Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

  • 2024-11-27 10:59:10
  • Beatriz Borges, Negar Foroutan, Deniz Bayazit, Anna Sotnikova, Syrielle Montariol, Tanya Nazaretzky, Mohammadreza Banaei, Alireza Sakhaeirad, Philippe Servant, Seyed Parsa Neshaei, Jibril Frej, Angelika Romanou, Gail Weiss, Sepideh Mamooler, Zeming Chen, Simin Fan, Silin Gao, Mete Ismayilzada, Debjit Paul, Alexandre Schöpfer, Andrej Janchevski, Anja Tiede, Clarence Linden, Emanuele Troiani, Francesco Salvi, Freya Behrens, Giacomo Orsi, Giovanni Piccioli, Hadrien Sevel, Louis Coulon, Manuela Pineros-Rodriguez, Marin Bonnassies, Pierre Hellich, Puck van Gerwen, Sankalp Gambhir, Solal Pirelli, Thomas Blanchard, Timothée Callens, Toni Abi Aoun, Yannick Calvino Alonso, Yuri Cho, Alberto Chiappa, Antonio Sclocchi, Étienne Bruno, Florian Hofhammer, Gabriel Pescia, Geovani Rizk, Leello Dadi, Lucas

Abstract

AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability, the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. When grouping courses in our dataset by degree program, these systems already pass non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.

 
