Learning Optimal Decision Trees from Large Datasets

Abstract

Inferring a decision tree from a given dataset is one of the classic problemsin machine learning. This problem consists of buildings, from a labelleddataset, a tree such that each node corresponds to a class and a path betweenthe tree root and a leaf corresponds to a conjunction of features to besatisfied in this class. Following the principle of parsimony, we want to infera minimal tree consistent with the dataset. Unfortunately, inferring an optimaldecision tree is known to be NP-complete for several definitions of optimality.Hence, the majority of existing approaches relies on heuristics, and as for thefew exact inference approaches, they do not work on large data sets. In thispaper, we propose a novel approach for inferring a decision tree of a minimumdepth based on the incremental generation of Boolean formula. The experimentalresults indicate that it scales sufficiently well and the time it takes to rungrows slowly with the size of dataset.

Quick Read (beta)

loading the full paper ...