Abstract
It is notoriously difficult to control the behavior of artificial neural networks such as generative neural language models. We recast the problem of controlling natural language generation as that of learning to interface with a pretrained language model, just as Application Programming Interfaces (APIs) control the behavior of programs by altering hyperparameters. In this new paradigm, a specialized neural network (called a Neural Programming Interface or NPI) learns to interface with a pretrained language model by manipulating the hidden activations of the pretrained model to produce desired outputs. Importantly, no permanent changes are made to the weights of the original model, allowing us to re-purpose pretrained models for new tasks without overwriting any aspect of the language model. We also contribute a new data set construction algorithm and a GAN-inspired loss function that allow us to train NPI models to control outputs of autoregressive transformers. In experiments against other state-of-the-art approaches, we demonstrate the efficacy of our methods using OpenAI's GPT-2 model, successfully controlling noun selection, topic aversion, offensive speech filtering, and other aspects of language while largely maintaining the controlled model's fluency under deterministic settings.
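The core idea of steering a frozen model through its hidden activations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-layer toy network, its weights `W1` and `W2`, and the hand-written `toy_npi` function are hypothetical stand-ins, not the GPT-2 setup, the trained NPI, or the loss function described above.

```python
import copy

# Hypothetical toy "pretrained model": two fixed linear layers with a ReLU.
# These weights stand in for a frozen language model; they are not from the paper.
W1 = [[0.5, -0.2], [0.1, 0.8]]
W2 = [[1.0, -1.0], [0.3, 0.7]]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    """Elementwise rectified linear unit."""
    return [max(0.0, v) for v in x]


def forward(x, perturb=None):
    """Run the frozen model; optionally let a controller edit the hidden activations."""
    hidden = relu(matvec(W1, x))
    if perturb is not None:
        # The controller only returns an additive shift; W1 and W2 are never written to.
        delta = perturb(hidden)
        hidden = [h + d for h, d in zip(hidden, delta)]
    return matvec(W2, hidden)


def toy_npi(hidden):
    """Stand-in for a trained NPI: a shift that moves every hidden unit to 0.5."""
    return [0.5 - h for h in hidden]


weights_before = copy.deepcopy((W1, W2))
x = [1.0, 2.0]
plain = forward(x)                      # output of the unmodified model
steered = forward(x, perturb=toy_npi)   # output with perturbed activations
weights_unchanged = (W1, W2) == weights_before
```

Here `steered` differs from `plain` even though `weights_unchanged` is true, which is the property the abstract emphasizes: the behavior changes while the original model's weights stay intact. In the actual method the perturbation would come from a learned network rather than a fixed rule.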