We develop and rigorously evaluate a deep-learning-based system that can accurately classify skin conditions while detecting rare conditions for which there is not enough data available for training a confident classifier. We frame this task as an out-of-distribution (OOD) detection problem. Our novel approach, hierarchical outlier detection (HOD), assigns multiple abstention classes for each training outlier class and jointly performs a coarse classification of inliers vs. outliers, along with fine-grained classification of the individual classes. We demonstrate the effectiveness of the HOD loss in conjunction with modern representation learning approaches (BiT, SimCLR, MICLe) and explore different ensembling strategies for further improving the results. We perform an extensive subgroup analysis over conditions of varying risk levels and different skin types to investigate how the OOD detection performance changes over each subgroup and demonstrate the gains of our framework in comparison to baselines. Finally, we introduce a cost metric to approximate downstream clinical impact. We use this cost metric to compare the proposed method against a baseline system, thereby making a stronger case for the overall system effectiveness in a real-world deployment scenario.
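To make the joint coarse and fine-grained objective concrete, the following is a minimal sketch of one plausible reading of an HOD-style loss, not the exact formulation used in this work. It assumes that the coarse inlier and outlier probabilities are obtained by summing the fine-grained softmax probabilities within each group, and that the two cross-entropy terms are combined additively. The function names, the `coarse_weight` parameter and its default value, and the class ordering (inliers first, abstention classes last) are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hod_loss(logits, label, num_inlier, coarse_weight=0.1):
    """Fine-grained cross-entropy plus a weighted coarse inlier-vs-outlier term.

    Assumption: indices 0..num_inlier-1 are inlier classes and the remaining
    indices are abstention (training outlier) classes. coarse_weight is an
    illustrative value, not one taken from the source.
    """
    eps = 1e-12  # guards against log(0) when a probability underflows
    probs = softmax(logits)

    # Fine-grained term: standard cross-entropy over all individual classes.
    fine = -math.log(max(probs[label], eps))

    # Coarse term: group probabilities are sums of fine-grained probabilities.
    p_inlier = sum(probs[:num_inlier])
    p_outlier = sum(probs[num_inlier:])
    p_coarse = p_inlier if label < num_inlier else p_outlier
    coarse = -math.log(max(p_coarse, eps))

    return fine + coarse_weight * coarse


def ood_score(logits, num_inlier):
    """Total probability mass on abstention classes; higher means more OOD."""
    return sum(softmax(logits)[num_inlier:])
```

Under these assumptions, a test-time input would be flagged as out-of-distribution when `ood_score` exceeds a threshold chosen on validation data, and otherwise assigned the highest-probability inlier class.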