Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One

Abstract

We propose to reinterpret a standard discriminative classifier of p(y|x) as an energy based model for the joint distribution p(x,y). In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x|y). Within this framework, standard discriminative architectures may be used and the model can also be trained on unlabeled data. We demonstrate that energy based training of the joint distribution improves calibration, robustness, and out-of-distribution detection while also enabling our models to generate samples rivaling the quality of recent GAN approaches. We improve upon recently proposed techniques for scaling up the training of energy based models and present an approach which adds little overhead compared to standard classification training. Our approach is the first to achieve performance rivaling the state-of-the-art in both generative and discriminative learning within one hybrid model.
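
The reinterpretation described in the abstract can be illustrated with a small sketch. It assumes the classifier outputs a vector of logits f(x), that the joint energy is E(x,y) = -f(x)[y], and that log p(x) equals the log-sum-exp of the logits up to an unknown constant. The function names and the example logits are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def class_probabilities(logits):
    # Standard softmax: p(y|x) = exp(f(x)[y]) / sum_y' exp(f(x)[y']).
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits]

def unnormalized_log_px(logits):
    # Assumed form: log p(x) = logsumexp(f(x)) - log Z, with Z unknown.
    # Only the first term is returned, so the value is unnormalized.
    return logsumexp(logits)

def joint_energy(logits, y):
    # Assumed form: E(x, y) = -f(x)[y], so p(x, y) is proportional to exp(f(x)[y]).
    return -logits[y]

# Hypothetical logits for a single input x over three classes.
logits = [2.0, 0.5, -1.0]
probs = class_probabilities(logits)

# Adding a constant to every logit leaves p(y|x) unchanged,
# but shifts the unnormalized log p(x) by that same constant.
shifted = [v + 3.0 for v in logits]
shifted_probs = class_probabilities(shifted)
log_px_shift = unnormalized_log_px(shifted) - unnormalized_log_px(logits)
```

Under these assumptions, the softmax ignores a uniform shift of the logits while the unnormalized density of x does not, which is the extra degree of freedom a joint energy based reading of the classifier puts to use.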
