The ability to quickly and accurately identify covariate shift at test time is a critical and often overlooked component of safe machine learning systems deployed in high-risk domains. While methods exist for detecting when predictions should not be made on out-of-distribution test examples, identifying distribution-level differences between training and test time can help determine when a model should be removed from the deployment setting and retrained. In this work, we define harmful covariate shift (HCS) as a change in distribution that may weaken the generalization of a predictive model. To detect HCS, we use the discordance between an ensemble of classifiers trained to agree on training data and disagree on test data. We derive a loss function for training this ensemble and show that the disagreement rate and entropy represent powerful discriminative statistics for HCS. Empirically, we demonstrate the ability of our method to detect harmful covariate shift with statistical certainty on a variety of high-dimensional datasets. Across numerous domains and modalities, we show state-of-the-art performance compared to existing methods, particularly when the number of observed test samples is small.
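
To make the two statistics concrete, the following is a minimal Python sketch of one plausible way to compute a disagreement rate and an ensemble entropy from model outputs. The abstract does not give their exact definitions, so the function names, the "any member disagrees with a base model" rule, and the use of the averaged predictive distribution are all assumptions made for illustration, not the method derived in this work.

```python
import math
from typing import Sequence


def disagreement_rate(
    base_preds: Sequence[int],
    ensemble_preds: Sequence[Sequence[int]],
) -> float:
    """Fraction of samples on which at least one ensemble member predicts a
    different label than the base model (assumed definition)."""
    n = len(base_preds)
    disagreements = sum(
        1
        for i in range(n)
        if any(member[i] != base_preds[i] for member in ensemble_preds)
    )
    return disagreements / n


def mean_ensemble_entropy(
    member_probs: Sequence[Sequence[Sequence[float]]],
) -> float:
    """Mean Shannon entropy (in nats) of the member-averaged class
    distribution; member_probs[m][i][c] is the probability that member m
    assigns to class c on sample i (assumed layout)."""
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    total = 0.0
    for i in range(n_samples):
        # Average the members' predicted distributions for sample i.
        avg = [
            sum(member_probs[m][i][c] for m in range(n_members)) / n_members
            for c in range(n_classes)
        ]
        # Terms with zero probability contribute zero entropy.
        total += -sum(p * math.log(p) for p in avg if p > 0.0)
    return total / n_samples


# Toy inputs: hypothetical labels and probabilities, not data from the paper.
base = [0, 1, 1, 0]
ensemble = [[0, 1, 0, 0], [0, 1, 1, 1]]
probs = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[1.0, 0.0], [0.5, 0.5]],
]
rate = disagreement_rate(base, ensemble)
entropy = mean_ensemble_entropy(probs)
```

Under these assumptions, a detector would compare such statistics computed on a batch of unlabeled test samples against the values obtained on held-out in-distribution data, with larger values suggesting a shift.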