Abstract
This paper presents a novel approach for distinguishing ChatGPT-generated from human-written text using language models. To this end, we first collected and released a pre-processed dataset named OpenGPTText, which consists of rephrased content generated using ChatGPT. We then designed, implemented, and trained two different models for text classification, using the Robustly Optimized BERT Pretraining Approach (RoBERTa) and the Text-to-Text Transfer Transformer (T5), respectively. Our models achieved an accuracy of over 97% on the test dataset, as evaluated through various metrics. Furthermore, we conducted an interpretability study to showcase our model's ability to extract and differentiate key features between human-written and ChatGPT-generated text. Our findings provide important insights into the effective use of language models to detect generated text.