Abstract
Instruction-tuning large language models (LLMs) reduces the diversity of their outputs, which has implications for many tasks, particularly creative tasks. This paper investigates the ``diversity gap'' for a writing-prompt narrative generation task. This gap emerges as measured by current diversity metrics for various open-weight and open-source LLMs. The results show significant decreases in diversity due to instruction-tuning. We explore the diversity loss at each fine-tuning stage for the OLMo and OLMo 2 models to further understand how output diversity is affected. The results indicate that direct preference optimization (DPO) has the most substantial impact on diversity. Motivated by these findings, we present a new decoding strategy, conformative decoding, which guides an instruct model using its more diverse base model to reintroduce output diversity. We show that conformative decoding typically increases diversity and even maintains or improves quality.