On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation

Abstract

We investigate the sensitivity of the Fréchet Inception Distance (FID) score to inconsistent and often incorrect implementations across different image processing libraries. The FID score is widely used to evaluate generative models, but each FID implementation uses a different low-level image processing pipeline. Image resizing functions in commonly used deep learning libraries often introduce aliasing artifacts. We observe that numerous subtle choices need to be made for FID calculation, and that a lack of consistency in these choices can lead to vastly different FID scores. In particular, we show that the following choices are significant: (1) which image resizing library to use, (2) which interpolation kernel to use, and (3) which encoding to use when representing images. We additionally outline numerous common pitfalls that should be avoided and provide recommendations for computing the FID score accurately. We provide an easy-to-use, optimized implementation of our proposed recommendations in the accompanying code.
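As a minimal illustration of the aliasing issue mentioned above, the following NumPy sketch downsamples a one-pixel stripe pattern in two ways: by plain subsampling with no prefilter, and by averaging each block before subsampling. This is not the accompanying implementation; the function names (`make_stripes`, `downsample_naive`, `downsample_box`) are hypothetical, and the box filter is only a stand-in for a properly antialiased resize.

```python
import numpy as np

def make_stripes(size=64):
    """Vertical stripes alternating 0 and 255 every pixel (hypothetical test image)."""
    row = np.tile(np.array([0.0, 255.0]), size // 2)
    return np.tile(row, (size, 1))

def downsample_naive(img, factor):
    """Keep every `factor`-th pixel with no prefiltering; prone to aliasing."""
    return img[::factor, ::factor]

def downsample_box(img, factor):
    """Average each factor x factor block, a simple antialiasing prefilter."""
    h, w = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

img = make_stripes(64)
naive = downsample_naive(img, 4)  # samples only even columns, so only the dark stripes
box = downsample_box(img, 4)      # each block mixes dark and bright stripes equally

print(img.mean(), naive.mean(), box.mean())
```

On this input the subsampled image is uniformly black, while the box-filtered image preserves the original mean intensity. A feature extractor would see two very different images, which is the kind of discrepancy that can propagate into the FID score.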
