Abstract
Generative artificial intelligence (AI), in particular large language models (LLMs), is poised to drive transformative economic change. LLMs are pre-trained on vast text data to learn general language patterns, but a subsequent post-training phase is critical to align them for specific real-world tasks. Reinforcement learning (RL) is the leading post-training technique, yet its economic impact remains largely underexplored and unquantified. We examine this question through the lens of the first deployment of an RL-trained LLM for generative advertising on Facebook. Integrated into Meta's Text Generation feature, our model, "AdLlama," powers an AI tool that helps advertisers create new variations of human-written ad text. To train this model, we introduce reinforcement learning with performance feedback (RLPF), a post-training method that uses historical ad performance data as a reward signal. In a large-scale 10-week A/B test on Facebook spanning nearly 35,000 advertisers and 640,000 ad variations, we find that AdLlama improves click-through rates by 6.7% (p=0.0296) compared to a supervised imitation model trained on curated ads. This represents a substantial improvement in advertiser return on investment on Facebook. We also find that advertisers who used AdLlama generated more ad variations, indicating higher satisfaction with the model's outputs. To our knowledge, this is the largest study to date on the use of generative AI in an ecologically valid setting, offering an important data point quantifying the tangible impact of RL post-training. Furthermore, the results show that RLPF is a promising and generalizable approach for metric-driven post-training that bridges the gap between highly capable language models and tangible outcomes.