On Explaining Machine Learning Models by Evolving Crucial and Compact Features

Abstract

Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been proven to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bound or minimized though this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) Of small-enough number, to enable visualization of the behavior of the ML model; (2) Of small-enough size, to enable interpretability of the features themselves; (3) Of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for four ML algorithms, including support vector machines and random forest. Our results on 20 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.
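To make the idea of constructing a few compact features concrete, the following is a minimal illustrative sketch and not the authors' implementation. It uses plain random search (the baseline named in the abstract) rather than any of the GP algorithms, and it makes several assumptions that do not come from the paper: a function set of `+`, `-`, `*`, a size limit of 7 nodes per feature, a nearest-centroid classifier standing in for the ML algorithm, training accuracy as the fitness, and a synthetic toy dataset. All function names are hypothetical.

```python
import random

# Illustrative function set (an assumption; the paper's function set may differ).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(n_inputs, max_depth, rng):
    """Sample a random expression tree of bounded depth over the input features."""
    if max_depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_inputs))  # leaf: one original feature
    op = rng.choice(sorted(OPS))
    return (op, random_tree(n_inputs, max_depth - 1, rng),
            random_tree(n_inputs, max_depth - 1, rng))

def evaluate(tree, row):
    """Compute the constructed feature's value for one data row."""
    if tree[0] == "x":
        return row[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))

def size(tree):
    """Number of nodes in the tree, used as the compactness measure."""
    return 1 if tree[0] == "x" else 1 + size(tree[1]) + size(tree[2])

def to_string(tree):
    """Render the tree as a readable expression."""
    if tree[0] == "x":
        return f"x{tree[1]}"
    return f"({to_string(tree[1])} {tree[0]} {to_string(tree[2])})"

def nearest_centroid_accuracy(points, labels):
    """Training accuracy of a nearest-centroid classifier (stand-in ML model)."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    centroids = {y: tuple(sum(c) / len(ps) for c in zip(*ps))
                 for y, ps in groups.items()}
    def predict(p):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(p, centroids[y])))
    return sum(predict(p) == y for p, y in zip(points, labels)) / len(labels)

def random_search(X, y, n_features=2, max_depth=3, max_size=7, iters=300, seed=0):
    """Keep the best set of n_features compact trees found by random sampling."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(iters):
        trees = [random_tree(len(X[0]), max_depth, rng) for _ in range(n_features)]
        if any(size(t) > max_size for t in trees):
            continue  # enforce the size bound on every constructed feature
        points = [tuple(evaluate(t, row) for t in trees) for row in X]
        score = nearest_centroid_accuracy(points, y)
        if score > best_score:
            best, best_score = trees, score
    return best, best_score

# Toy data (synthetic): the label depends on the sign of x0 * x1.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [int(row[0] * row[1] > 0) for row in X]

best, score = random_search(X, y)
print([to_string(t) for t in best], round(score, 3))
```

Because only two features are constructed, each data point maps to a pair of values that can be drawn in a 2D scatter plot, and the size bound keeps each printed expression short enough to read directly.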
