Abstract
Simulating human profiles by instilling personas into large language models (LLMs) is rapidly transforming research in agentic behavioral simulation, LLM personalization, and human-AI alignment. However, most existing synthetic personas remain shallow and simplistic, capturing minimal attributes and failing to reflect the rich complexity and diversity of real human identities. We introduce DEEPPERSONA, a scalable generative engine for synthesizing narrative-complete synthetic personas through a two-stage, taxonomy-guided method. First, we algorithmically construct the largest-ever human-attribute taxonomy, comprising hundreds of hierarchically organized attributes, by mining thousands of real user-ChatGPT conversations. Second, we progressively sample attributes from this taxonomy, conditionally generating coherent and realistic personas that average hundreds of structured attributes and roughly 1 MB of narrative text, two orders of magnitude deeper than prior work. Intrinsic evaluations confirm significant improvements in attribute diversity (32 percent higher coverage) and profile uniqueness (44 percent greater) compared to state-of-the-art baselines. Extrinsically, our personas enhance GPT-4.1-mini's personalized question answering accuracy by 11.6 percent on average across ten metrics and substantially narrow (by 31.7 percent) the gap between simulated LLM citizens and authentic human responses in social surveys. Our generated national citizens reduce the performance gap on the Big Five personality test by 17 percent relative to LLM-simulated citizens. DEEPPERSONA thus provides a rigorous, scalable, and privacy-free platform for high-fidelity human simulation and personalized AI research.