Abstract
Negative flips are errors introduced in a classification system when a legacy model is replaced with a new one. Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy using model distillation, or use ensembles, which multiply inference cost prohibitively. We present a method to train a classification system that achieves paragon performance in both error rate and NFR, at the inference cost of a single model. Our method introduces a generalized distillation objective, Logit Difference Inhibition (LDI), that penalizes changes in the logits between the new and old model, without forcing them to coincide as in ordinary distillation. LDI affords the model flexibility to reduce error rate along with NFR. The method uses a homogeneous ensemble as the reference model for LDI, hence the name Ensemble LDI, or ELODI. The reference model can then be substituted with a single model at inference time. The method leverages the observation that negative flips are typically not close to the decision boundary, but often exhibit large deviations in the distance among their logits, which are reduced by ELODI.
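
To make the NFR metric concrete, the following is a minimal sketch, assuming NFR is measured as the fraction of all test samples that the old model classifies correctly and the new model misclassifies. The function name `negative_flip_rate` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def negative_flip_rate(y_true, old_pred, new_pred):
    """Fraction of samples the old model got right and the new model gets wrong.

    Assumption: NFR is normalized by the total number of samples.
    """
    if not (len(y_true) == len(old_pred) == len(new_pred)):
        raise ValueError("all inputs must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # A negative flip: old prediction correct, new prediction incorrect.
    flips = sum(
        1
        for t, o, n in zip(y_true, old_pred, new_pred)
        if o == t and n != t
    )
    return flips / len(y_true)


def error_rate(y_true, pred):
    """Plain misclassification rate, for comparison with NFR."""
    return sum(1 for t, p in zip(y_true, pred) if p != t) / len(y_true)


# Toy data (hypothetical): both models make exactly one error,
# but on different samples, so the update still introduces a negative flip.
y_true = [0, 1, 2, 1, 0]
old_pred = [0, 1, 1, 1, 0]  # wrong only on sample 2
new_pred = [0, 2, 2, 1, 0]  # wrong only on sample 1

nfr = negative_flip_rate(y_true, old_pred, new_pred)
old_err = error_rate(y_true, old_pred)
new_err = error_rate(y_true, new_pred)
print(f"old error={old_err:.2f}, new error={new_err:.2f}, NFR={nfr:.2f}")
```

In this toy case the two models have the same error rate, yet the NFR is nonzero, which is why error rate and NFR are reported as separate quantities.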