Abstract
How to obtain a model with both good interpretability and good performance has always been an important research topic. In this paper, we propose rectified decision trees (ReDT), a knowledge distillation based rectification of decision trees with high interpretability, small model size, and empirical soundness. Specifically, we extend the impurity calculation and the pure ending condition of the classical decision tree so that soft labels generated by a well-trained teacher model can be used in both the training and the prediction process. To acquire the soft labels, we propose a new method based on multiple cross-validation that reduces the effects of randomness and overfitting. These approaches ensure that ReDT retains excellent interpretability and, in terms of compression, even uses fewer nodes than the decision tree while achieving relatively good performance. Besides, in contrast to traditional knowledge distillation, back propagation of the student model is not necessarily required in ReDT, which makes it an attempt at a new knowledge distillation approach. Extensive experiments are conducted, which demonstrate the superiority of ReDT in interpretability, compression, and empirical soundness.
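
To make the idea of an impurity computed from soft labels concrete, the following is a minimal sketch, not the exact formulation used by ReDT. It assumes that a node's class distribution is the mean of its samples' soft labels, that impurity is the Gini index of that mean, and that a node counts as pure when all samples share the same most likely class. The function names `node_distribution`, `soft_gini`, and `is_pure` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def node_distribution(soft_labels: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-sample soft labels into one class distribution (assumption)."""
    n = len(soft_labels)
    k = len(soft_labels[0])
    return [sum(row[c] for row in soft_labels) / n for c in range(k)]


def soft_gini(soft_labels: Sequence[Sequence[float]]) -> float:
    """Gini index of the averaged soft-label distribution at a node."""
    p = node_distribution(soft_labels)
    return 1.0 - sum(q * q for q in p)


def is_pure(soft_labels: Sequence[Sequence[float]]) -> bool:
    """Assumed ending condition: every sample has the same most likely class."""
    top = {max(range(len(row)), key=row.__getitem__) for row in soft_labels}
    return len(top) == 1


# With one-hot (hard) labels this reduces to the classical Gini index.
hard = [[1.0, 0.0], [0.0, 1.0]]
# Teacher-style soft labels: both samples lean toward class 0.
soft = [[0.9, 0.1], [0.8, 0.2]]
```

With one-hot inputs the sketch reproduces the classical Gini impurity, so it can be read as an extension of the hard-label case; with soft inputs, a node can be pure under the assumed ending condition while still having nonzero impurity.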