Fully convolutional neural networks (FCNs), and in particular U-Nets, have achieved state-of-the-art results in semantic segmentation for numerous medical imaging applications. Moreover, batch normalization and Dice loss have been used successfully to stabilize and accelerate training. However, these networks are poorly calibrated, i.e., they tend to produce overconfident predictions for both correct and erroneous classifications, which makes them unreliable and hard to interpret. In this paper, we study predictive uncertainty estimation in FCNs for medical image segmentation. We make the following contributions: 1) we systematically compare cross-entropy loss with Dice loss in terms of the segmentation quality and uncertainty estimation of FCNs; 2) we propose model ensembling for confidence calibration of FCNs trained with batch normalization and Dice loss; 3) we assess the ability of calibrated FCNs to predict the segmentation quality of structures and to detect out-of-distribution test examples. We conduct extensive experiments across three medical image segmentation applications (brain, heart, and prostate) to evaluate our contributions. The results of this study offer considerable insight into predictive uncertainty estimation and out-of-distribution detection in medical image segmentation and provide practical recipes for confidence calibration. Moreover, we consistently demonstrate that model ensembling improves confidence calibration.
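The abstract contrasts cross-entropy with Dice loss and proposes model ensembling for calibration. Below is a minimal NumPy sketch of those ingredients. It rests on our own assumptions and is not the authors' implementation. Per-image softmax outputs have shape (C, H, W). The soft Dice loss is averaged over classes, which is one common variant. The ensemble prediction is a uniform average of the members' softmax probabilities. Per-pixel predictive entropy serves as the uncertainty map. All function names are hypothetical, and random logits stand in for trained FCNs.

```python
import numpy as np

# Hypothetical helpers for illustration; shapes assume (C, H, W) per image.


def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_entropy(probs, onehot, eps=1e-7):
    """Mean per-pixel negative log-likelihood of the true class."""
    return float(-np.mean(np.sum(onehot * np.log(probs + eps), axis=0)))


def soft_dice_loss(probs, onehot, eps=1e-7):
    """1 minus the soft Dice coefficient, averaged over classes (assumed variant)."""
    spatial = tuple(range(1, probs.ndim))
    intersection = np.sum(probs * onehot, axis=spatial)
    denom = np.sum(probs, axis=spatial) + np.sum(onehot, axis=spatial)
    dice = (2.0 * intersection + eps) / (denom + eps)
    return float(1.0 - dice.mean())


def ensemble_predict(member_logits):
    """Uniformly average the softmax outputs of independently trained models."""
    probs = np.stack([softmax(logits, axis=0) for logits in member_logits])
    return probs.mean(axis=0)


def predictive_entropy(probs, eps=1e-12):
    """Per-pixel entropy of a predictive distribution (assumed uncertainty map)."""
    return -np.sum(probs * np.log(probs + eps), axis=0)


# Toy usage: random logits stand in for five trained FCNs (hypothetical data).
rng = np.random.default_rng(0)
C, H, W = 3, 4, 4
labels = rng.integers(0, C, size=(H, W))
onehot = np.eye(C)[labels].transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
members = [rng.normal(size=(C, H, W)) for _ in range(5)]

ens_probs = ensemble_predict(members)
ens_entropy = predictive_entropy(ens_probs)
mean_member_entropy = np.mean(
    [predictive_entropy(softmax(m, axis=0)) for m in members], axis=0
)
print("CE:", cross_entropy(ens_probs, onehot))
print("Dice loss:", soft_dice_loss(ens_probs, onehot))
```

Because entropy is concave, the ensemble's per-pixel entropy is never lower than the members' average entropy. This offers one intuition for why averaging tempers overconfident predictions. Whether it actually improves calibration on real data is an empirical question.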