Abstract
Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal semantic control over topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics within a provided text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: for example, it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also more interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. TopicGPT can be further extended to hierarchical topical modeling, enabling users to explore topics at various levels of granularity. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling.
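The abstract reports a harmonic mean purity score without defining it. As a rough illustration only, the sketch below assumes the metric is the harmonic mean of purity and inverse purity computed between predicted topic assignments and human-annotated labels; the function names and the toy data are hypothetical and are not taken from the TopicGPT codebase.

```python
from collections import Counter


def purity(pred, gold):
    """Fraction of documents that belong to the majority gold label
    of their predicted topic. Assumes non-empty, equal-length inputs."""
    by_topic = {}
    for p, g in zip(pred, gold):
        # Count gold labels within each predicted topic.
        by_topic.setdefault(p, Counter())[g] += 1
    return sum(max(c.values()) for c in by_topic.values()) / len(pred)


def harmonic_mean_purity(pred, gold):
    """Harmonic mean of purity and inverse purity (assumed definition)."""
    p = purity(pred, gold)
    # Inverse purity: swap the roles of predicted topics and gold labels.
    ip = purity(gold, pred)
    return 2 * p * ip / (p + ip)


# Toy example: four documents, two predicted topics, two gold labels.
pred_topics = ["a", "a", "b", "b"]
gold_labels = ["x", "x", "x", "y"]
score = harmonic_mean_purity(pred_topics, gold_labels)
```

Under this assumed definition, a score near 1 means predicted topics and human labels partition the documents almost identically, while splitting or merging gold categories lowers one of the two purity terms and therefore the harmonic mean.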