Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

Abstract

Large language models (LLMs) have advanced to a point that even humans have difficulty discerning whether a text was generated by another human, or by a computer. However, knowing whether a text was produced by human or artificial intelligence (AI) is important to determining its trustworthiness, and has applications in many domains including detecting fraud and academic dishonesty, as well as combating the spread of misinformation and political propaganda. The task of AI-generated text (AIGT) detection is therefore both very challenging, and highly critical. In this survey, we summarize state-of-the-art approaches to AIGT detection, including watermarking, statistical and stylistic analysis, and machine learning classification. We also provide information about existing datasets for this task. Synthesizing the research findings, we aim to provide insight into the salient factors that combine to determine how "detectable" AIGT is under different scenarios, and to make practical recommendations for future work towards this significant technical and societal challenge.
