A Tale Of Two Long Tails

Abstract

As machine learning models are increasingly employed to assist humandecision-makers, it becomes critical to communicate the uncertainty associatedwith these model predictions. However, the majority of work on uncertainty hasfocused on traditional probabilistic or ranking approaches - where the modelassigns low probabilities or scores to uncertain examples. While this captureswhat examples are challenging for the model, it does not capture the underlyingsource of the uncertainty. In this work, we seek to identify examples the modelis uncertain about and characterize the source of said uncertainty. We explorethe benefits of designing a targeted intervention - targeted data augmentationof the examples where the model is uncertain over the course of training. Weinvestigate whether the rate of learning in the presence of additionalinformation differs between atypical and noisy examples? Our results show thatthis is indeed the case, suggesting that well-designed interventions over thecourse of training can be an effective way to characterize and distinguishbetween different sources of uncertainty.

Quick Read (beta)

loading the full paper ...