Abstract
Large language models (LLMs) have revolutionized code generation, automating programming with remarkable efficiency. However, these advancements challenge programming skills, ethics, and assessment integrity, making the detection of LLM-generated code essential for maintaining accountability and standards. While there has been some research on this problem, it generally lacks domain coverage and robustness, and covers only a small number of programming languages. To this end, we propose a framework capable of distinguishing between human- and LLM-written code across multiple programming languages, code generators, and domains. We use a large-scale dataset drawn from renowned platforms and LLM-based code generators, apply rigorous data quality checks and feature engineering, and comparatively evaluate traditional machine learning models, pre-trained language models (PLMs), and LLMs for code detection. We also evaluate out-of-domain scenarios, such as detecting the authorship and hybrid authorship of generated code and generalizing to unseen models, domains, and programming languages. Our extensive experiments show that our framework effectively distinguishes human- from LLM-written code and sets a new benchmark for this task.