Mossad: Defeating Software Plagiarism Detection

Abstract

Automatic software plagiarism detection tools are widely used in educational settings to ensure that submitted work was not copied. These tools have grown in use together with the rise in enrollments in computer science programs and the widespread availability of code on-line. Educators rely on the robustness of plagiarism detection tools; the working assumption is that the effort required to evade detection is as high as that required to actually do the assigned work. This paper shows this is not the case. It presents an entirely automatic program transformation approach, Mossad, that defeats popular software plagiarism detection tools. Mossad comprises a framework that couples techniques inspired by genetic programming with domain-specific knowledge to effectively undermine plagiarism detectors. Mossad is effective at defeating four plagiarism detectors, including Moss and JPlag. Mossad is both fast and effective: it can, in minutes, generate modified versions of programs that are likely to escape detection. More insidiously, because of its non-deterministic approach, Mossad can, from a single program, generate dozens of variants, which are classified as no more suspicious than legitimate assignments. A detailed study of Mossad across a corpus of real student assignments demonstrates its efficacy at evading detection. A user study shows that graduate student assistants consistently rate Mossad-generated code as just as readable as authentic student code. This work motivates the need for both research on more robust plagiarism detection tools and greater integration of naturally plagiarism-resistant methodologies like code review into computer science education.
