Abstract
Controlling the generation of large language models (LLMs) remains a central challenge to ensure their safe and reliable deployment. While prompt engineering and finetuning are common approaches, recent work has explored latent steering, a lightweight technique that alters LLM internal activations to guide generation. However, subsequent studies revealed latent steering's effectiveness to be limited, often underperforming simple instruction prompting. To address this limitation, we first establish a benchmark across diverse behaviors for standardized evaluation of steering techniques. Building on insights from this benchmark, we introduce Instruction Attention Boosting (InstABoost), a latent steering method that boosts the strength of instruction prompting by altering the model's attention during generation. InstABoost combines the strengths of existing approaches and is theoretically supported by prior work that suggests that in-context rule following in transformer-based models can be controlled by manipulating attention on instructions. Empirically, InstABoost demonstrates superior control success compared to both traditional prompting and latent steering.