HyperTuning: Toward Adapting Large Language Models without Back-propagation

  • 2022-11-22 18:52:25
  • Jason Phang, Yi Mao, Pengcheng He, Weizhu Chen
  • 34

Abstract

Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization. We propose HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model. We demonstrate a simple setup for hypertuning with HyperT5, a T5-based hypermodel that produces soft prefixes or LoRA parameters for a frozen T5 model from few-shot examples. We train HyperT5 in two stages: first, hyperpretraining with a modified conditional language modeling objective that trains a hypermodel to generate parameters; second, multi-task fine-tuning (MTF) on a large number of diverse language tasks. We evaluate HyperT5 on the P3, MetaICL and Super-NaturalInstructions datasets, and show that it can effectively generate parameters for unseen tasks. Moreover, we show that using hypermodel-generated parameters as initializations for further parameter-efficient fine-tuning improves performance. HyperTuning can thus be a flexible and efficient way to leverage large language models for diverse downstream applications.
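
The central idea, generating adapter parameters for a frozen model with a forward pass instead of back-propagation, can be illustrated with a toy sketch. This is not the authors' HyperT5 implementation. In the paper the hypermodel is itself a T5 model. Here it is replaced, purely as an assumption for illustration, by a single random linear layer (`ToyHypermodel`, a hypothetical name) that mean-pools encodings of few-shot examples and emits the two LoRA factors A and B for one frozen linear layer. The helper `lora_forward` (also hypothetical) applies the usual low-rank update y = W x + scale * B (A x), leaving W untouched.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def reshape(flat: Vector, rows: int, cols: int) -> Matrix:
    """Reshape a flat list into a rows x cols matrix (row-major)."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


class ToyHypermodel:
    """Hypothetical stand-in for a hypermodel (NOT HyperT5).

    Maps a pooled encoding of few-shot examples to LoRA factors
    A (rank x d_in) and B (d_out x rank) using one random linear layer.
    """

    def __init__(self, d_enc: int, d_in: int, d_out: int, rank: int, seed: int = 0):
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        n_params = rank * d_in + d_out * rank
        rng = random.Random(seed)
        # Assumed weights; a real hypermodel would learn these during training.
        self.proj = [[rng.gauss(0.0, 0.1) for _ in range(d_enc)]
                     for _ in range(n_params)]

    def generate(self, example_encodings: List[Vector]) -> Tuple[Matrix, Matrix]:
        # Mean-pool the few-shot example encodings into one task vector.
        n = len(example_encodings)
        task_vec = [sum(col) / n for col in zip(*example_encodings)]
        # A single forward pass produces all adapter parameters; no gradients.
        flat = matvec(self.proj, task_vec)
        split = self.rank * self.d_in
        a = reshape(flat[:split], self.rank, self.d_in)
        b = reshape(flat[split:], self.d_out, self.rank)
        return a, b


def lora_forward(w_frozen: Matrix, a: Matrix, b: Matrix,
                 x: Vector, scale: float = 1.0) -> Vector:
    """Frozen layer plus generated low-rank update: y = W x + scale * B (A x)."""
    base = matvec(w_frozen, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base, delta)]


# Usage: adapt a frozen 2x3 layer from two (made-up) few-shot example encodings.
frozen_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hyper = ToyHypermodel(d_enc=4, d_in=3, d_out=2, rank=1)
few_shot = [[0.5, 1.0, -0.2, 0.3], [0.1, 0.9, 0.0, 0.4]]
lora_a, lora_b = hyper.generate(few_shot)
print(lora_forward(frozen_w, lora_a, lora_b, [1.0, 2.0, 3.0]))
```

The downstream weights `frozen_w` are never modified; only the small factors change from task to task. The abstract also reports that generated parameters work well as initializations for further parameter-efficient fine-tuning. In this sketch, that would correspond to starting gradient-based tuning of `lora_a` and `lora_b` from the hypermodel's output rather than from a random or zero initialization.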

 
