A persistent challenge in the practice of medicine (and machine learning) is the disagreement of highly trained human experts on data instances, such as patient image scans. We study the application of machine learning to predict which instances are likely to give rise to maximal expert disagreement. This task requires us to develop predictors on datasets with noisy and scarce labels. Our central methodological finding is that direct prediction of a scalar uncertainty score performs better than the two-step process of (i) training a classifier and (ii) using the classifier outputs to derive an uncertainty score. This holds both in a synthetic setting whose parameters we can control and in a paradigmatic healthcare application involving multiple labels from medical domain experts. We evaluate these direct uncertainty models on a gold-standard adjudicated set, where they accurately predict when an individual expert will disagree with an unknown ground truth. We explore the consequences of using these predictors to identify the need for a medical second opinion, and in a machine learning data curation application.
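The two approaches differ mainly in what the model is trained to output. The sketch below contrasts the training targets. It assumes that each instance carries integer class labels from several experts, that the direct model regresses on a binary "any disagreement" target, and that the two-step pipeline scores uncertainty by the entropy of the classifier's predicted distribution. The function names and both scoring choices are illustrative assumptions, not necessarily the exact definitions used in this work.

```python
import math
from typing import List, Sequence


def disagreement_target(expert_labels: Sequence[int]) -> float:
    """Direct approach (assumed target): 1.0 if the experts disagree, else 0.0.

    A direct uncertainty model would be trained to regress this scalar
    from the input (e.g. an image) without first predicting class labels.
    """
    return 0.0 if len(set(expert_labels)) <= 1 else 1.0


def empirical_distribution(expert_labels: Sequence[int], num_classes: int) -> List[float]:
    """Two-step approach, step (i): soft label a classifier could be trained on."""
    counts = [0] * num_classes
    for y in expert_labels:
        counts[y] += 1
    n = len(expert_labels)
    return [c / n for c in counts]


def two_step_uncertainty(class_probs: Sequence[float]) -> float:
    """Two-step approach, step (ii): derive a scalar score from classifier output.

    Entropy is used here as one common (assumed) choice of uncertainty score.
    """
    return -sum(p * math.log(p) for p in class_probs if p > 0)


if __name__ == "__main__":
    labels = [0, 1, 1]  # hypothetical grades from three experts for one scan
    print(disagreement_target(labels))  # target for the direct model
    probs = empirical_distribution(labels, num_classes=2)
    print(two_step_uncertainty(probs))  # score derived from (ideal) classifier output
```

In the direct approach, the scalar target is learned end to end. In the two-step approach, the uncertainty score depends on how well the classifier's probabilities are calibrated, which can be hard to achieve when labels are noisy and scarce.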