Abstract
Post-hoc recalibration methods are widely used to ensure that classifiers provide faithful probability estimates. We argue that parametric recalibration functions based on logistic regression can be motivated from a simple theoretical setting for both binary and multiclass classification. This insight motivates the use of more expressive calibration methods beyond standard temperature scaling. For multiclass calibration, however, a key challenge lies in the increasing number of parameters introduced by more complex models, often coupled with limited calibration data, which can lead to overfitting. Through extensive experiments, we demonstrate that the resulting bias-variance tradeoff can be effectively managed by structured regularization, robust preprocessing, and efficient optimization. The resulting methods lead to substantial gains over existing logistic-based calibration techniques. We provide efficient and easy-to-use open-source implementations of our methods, making them an attractive alternative to common temperature, vector, and matrix scaling implementations.