Abstract
In industrial settings, surface defects on steel can significantly compromise its service life and elevate potential safety risks. Traditional defect detection methods predominantly rely on manual inspection, which suffers from low efficiency and high costs. Although automated defect detection approaches based on Convolutional Neural Networks (e.g., Mask R-CNN) have advanced rapidly, their reliability remains challenged by data annotation uncertainties during deep model training and by overfitting. These limitations may lead to detection deviations when processing new test samples, rendering automated detection processes unreliable. To address this challenge, we first evaluate the detection model's practical performance on calibration data that satisfies the independent and identically distributed (i.i.d.) condition with the test data. Specifically, we define a loss function for each calibration sample to quantify detection error rates, such as the complement of the recall rate or the false discovery rate. Subsequently, we derive a statistically rigorous threshold based on a user-defined risk level to identify high-probability defective pixels in test images, thereby constructing prediction sets (e.g., defect regions). This methodology ensures that the expected error rate (mean error rate) on the test set remains strictly bounded by the predefined risk level. Additionally, we observe a negative correlation between the average prediction set size and the risk level on the test set, establishing a statistically rigorous metric for assessing detection model uncertainty. Furthermore, our study demonstrates robust and efficient control over the expected test set error rate across varying calibration-to-test partitioning ratios, validating the method's adaptability and operational effectiveness.
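The calibration step described above can be illustrated with a minimal sketch. It assumes a conformal-risk-control style rule, which is one standard way to turn a user-defined risk level into a pixel-probability threshold; the abstract does not spell out the exact procedure, so this may differ from the authors' implementation. The function names `false_negative_rate` and `calibrate_threshold`, the threshold grid, and the use of the complement of recall as a loss bounded by 1 are all assumptions made for illustration.

```python
import numpy as np

def false_negative_rate(prob, mask, threshold):
    """Complement of recall for the prediction set {pixels with prob >= threshold}."""
    positives = mask.astype(bool)
    if positives.sum() == 0:
        return 0.0  # no defective pixels in this image, so none can be missed
    predicted = prob >= threshold
    return 1.0 - (predicted & positives).sum() / positives.sum()

def calibrate_threshold(probs, masks, alpha, grid=None):
    """Largest grid threshold whose adjusted calibration risk is <= alpha.

    probs, masks: per-image arrays of pixel probabilities and ground-truth
    defect masks from the calibration set (assumed exchangeable with test data).
    alpha: user-defined risk level.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)  # hypothetical candidate thresholds
    n = len(probs)
    best = 0.0  # threshold 0 selects every pixel, so the loss is exactly 0
    for t in grid:  # ascending: the loss is non-decreasing in the threshold
        risk = np.mean([false_negative_rate(p, m, t)
                        for p, m in zip(probs, masks)])
        # Finite-sample correction: (n * risk + B) / (n + 1) with loss bound B = 1.
        if (n * risk + 1.0) / (n + 1) <= alpha:
            best = t
        else:
            break  # monotonicity: no larger threshold can satisfy the bound
    return float(best)
```

At test time, the prediction set for an image would be the pixels whose predicted probability is at least the returned threshold. A larger risk level permits a higher threshold and therefore smaller prediction sets, which is consistent with the negative correlation noted above. The false discovery rate is not monotone in the threshold, so it would need a different treatment than this sketch provides.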