Abstract
AI assistants are increasingly being used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes. We conceptualize these challenges through the lens of vulnerability: the potential for university assessments and learning outcomes to be impacted by student use of generative AI. We investigate the potential scale of this vulnerability by measuring the degree to which AI assistants can complete assessment questions in standard university-level STEM courses. Specifically, we compile a novel dataset of textual assessment questions from 50 courses at EPFL and evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer these questions. We use eight prompting strategies to produce responses and find that GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer under at least one prompting strategy for 85.1% of questions. When we group the courses in our dataset by degree program, we find that these systems already pass the non-project assessments of large numbers of core courses in various degree programs, posing risks to higher education accreditation that will be amplified as these models improve. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
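The two headline figures measure different things: an average accuracy across prompting strategies, and the share of questions that at least one strategy answers correctly. The Python sketch below computes both from a boolean correctness matrix. It is a minimal illustration, not the evaluation code used in this study. The function name `accuracy_metrics`, the matrix layout, and the toy data are hypothetical, and the sketch assumes the average is taken over all question-strategy pairs, which is one plausible reading of "average".

```python
from typing import Sequence


def accuracy_metrics(correct: Sequence[Sequence[bool]]) -> tuple[float, float]:
    """Compute two accuracy summaries from a correctness matrix.

    correct[q][s] is True if prompting strategy s answered question q correctly
    (hypothetical layout). Returns:
      - mean accuracy over all question-strategy pairs (assumed meaning of "average")
      - fraction of questions answered correctly by at least one strategy
    """
    # Reject empty input and questions with no strategy results.
    if not correct or not all(correct):
        raise ValueError("need at least one question and one strategy per question")
    n_pairs = sum(len(row) for row in correct)
    mean_acc = sum(sum(row) for row in correct) / n_pairs
    any_acc = sum(any(row) for row in correct) / len(correct)
    return mean_acc, any_acc


# Toy data (hypothetical): 4 questions x 3 prompting strategies.
toy = [
    [True, True, False],
    [False, False, True],
    [False, False, False],
    [True, True, True],
]

if __name__ == "__main__":
    mean_acc, any_acc = accuracy_metrics(toy)
    print(f"mean: {mean_acc:.3f}, at least one: {any_acc:.3f}")
```

When every question is attempted with the same set of strategies (eight, in this study), the "at least one strategy" figure can never be lower than the mean accuracy, which is consistent with 85.1% exceeding 65.8%.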