Parameter Efficient Reinforcement Learning from Human Feedback

Abstract

While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language and Vision-Language Models (LLMs and VLMs) with human preferences, its computational cost and complexity hamper its wider adoption. To alleviate some of the computational burden of fine-tuning, parameter-efficient methods such as LoRA were introduced. In this work, we empirically evaluate the setup of Parameter Efficient Reinforcement Learning from Human Feedback (PE-RLHF), which leverages LoRA fine-tuning for both reward modeling and reinforcement learning. We benchmark the PE-RLHF setup on six diverse datasets spanning summarization, harmless/helpful response generation, UI automation, and visual question answering, measuring both the effectiveness of the trained models and the training resources they require. Our findings show, for the first time, that PE-RLHF achieves performance comparable to RLHF while significantly reducing training time (up to 90% faster for reward models and 30% faster for RL) and memory footprint (up to 50% reduction for reward models and 27% for RL). We provide comprehensive ablations across LoRA ranks and model sizes for both reward modeling and reinforcement learning. By mitigating the computational burden associated with RLHF, we push for a broader adoption of PE-RLHF as an alignment technique for LLMs and VLMs.
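As a rough illustration of why LoRA shrinks the number of trainable parameters (and with it gradient and optimizer-state memory), here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It assumes the standard LoRA formulation, where a frozen weight W is augmented by a trainable low-rank update (alpha / r) * B A with B initialized to zero; the names `LoRALinear`, `lora_params`, and `full_params` are hypothetical and are not the authors' implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def full_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters of a rank-r adapter: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight; never updated
        self.rank = rank
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the adapter is a no-op at step 0.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        """Apply the layer to x, a list of length d_in treated as a column vector."""
        col = [[v] for v in x]
        base = matmul(self.weight, col)
        delta = matmul(self.B, matmul(self.A, col))
        return [base[i][0] + self.scale * delta[i][0] for i in range(len(base))]

    def trainable_params(self):
        return lora_params(len(self.B), len(self.A[0]), self.rank)
```

For a single 4096 x 4096 projection, `full_params(4096, 4096)` is 16,777,216 while `lora_params(4096, 4096, 8)` is 65,536, roughly 0.39% of the original. This per-layer ratio does not translate directly into the end-to-end time and memory savings reported above, since the frozen weights and activations still have to be held in memory during training.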
