Abstract
As text generation has become a core capability of modern Large Language Models (LLMs), it underpins a wide range of downstream applications. However, most existing LLMs rely on autoregressive (AR) generation, producing one token at a time based on previously generated context, which limits generation speed because the process is inherently sequential. To address this challenge, an increasing number of researchers have begun exploring parallel text generation, a broad class of techniques aimed at breaking the token-by-token generation bottleneck and improving inference efficiency. Despite growing interest, there remains a lack of comprehensive analysis of which specific techniques constitute parallel text generation and how they improve inference performance. To bridge this gap, we present a systematic survey of parallel text generation methods. We categorize existing approaches into AR-based and Non-AR-based paradigms, and provide a detailed examination of the core techniques within each category. Following this taxonomy, we assess their theoretical trade-offs in terms of speed, quality, and efficiency, and examine their potential for combination and comparison with alternative acceleration strategies. Finally, based on our findings, we highlight recent advancements, identify open challenges, and outline promising directions for future research in parallel text generation. We have also created a GitHub repository for indexing relevant papers and open resources, available at https://github.com/zhanglingzhe0820/Awesome-Parallel-Text-Generation.