Technical Report: A Stratification Approach to Partial Dependence for Codependent Variables

  • 2019-09-19 17:22:36
  • Terence Parr, James D. Wilson

Abstract

Model interpretability is important to machine learning practitioners, and a key component of interpretation is the characterization of partial dependence of the response variable on any subset of features used in the model. The two most common strategies for assessing partial dependence suffer from a number of critical weaknesses. In the first strategy, linear regression model coefficients describe how a unit change in an explanatory variable changes the response, while holding other variables constant. But linear regression is inapplicable for high dimensional (p>n) data sets and is often insufficient to capture the relationship between explanatory variables and the response. In the second strategy, Partial Dependence (PD) plots and Individual Conditional Expectation (ICE) plots give biased results for the common situation of codependent variables, and they rely on fitted models provided by the user. When the supplied model is a poor choice due to systematic bias or overfitting, PD/ICE plots provide little (if any) useful information. To address these issues, we introduce a new strategy, called StratPD, that does not depend on a user's fitted model, provides accurate results in the presence of codependent variables, and is applicable to high dimensional settings. The strategy works by stratifying a data set into groups of observations that are similar, except in the variable of interest, through the use of a decision tree. Any fluctuations of the response variable within a group are likely due to the variable of interest. We apply StratPD to a collection of simulations and case studies to show that StratPD is a fast, reliable, and robust method for assessing partial dependence with clear advantages over state-of-the-art methods.
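
The stratification idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (sse, stratify, slope, stratified_slope), the min_leaf parameter, the toy data, and the choice to summarize each group by a single least-squares slope are all assumptions made for illustration. A small regression tree partitions the observations using every feature except the variable of interest, and the response is then compared against the variable of interest within each group.

```python
import random

def sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def stratify(X, y, idx, others, min_leaf=5):
    """Tiny regression tree (hypothetical helper): recursively split the
    row indices in idx on the features in others, choosing the split that
    minimizes the squared error of y. Returns a list of leaves."""
    best = None
    for f in others:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, left, right)
    if best is None:  # no valid split left: this group is a leaf
        return [idx]
    return (stratify(X, y, best[1], others, min_leaf)
            + stratify(X, y, best[2], others, min_leaf))

def slope(xs, ys):
    """Least-squares slope of ys on xs, or None if xs has no spread."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    return sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sxx

def stratified_slope(X, y, j, min_leaf=5):
    """Size-weighted average of the within-leaf slopes of y on feature j."""
    others = [f for f in range(len(X[0])) if f != j]
    leaves = stratify(X, y, list(range(len(X))), others, min_leaf)
    total, weight = 0.0, 0
    for leaf in leaves:
        s = slope([X[i][j] for i in leaf], [y[i] for i in leaf])
        if s is not None:
            total += s * len(leaf)
            weight += len(leaf)
    return total / weight

# Toy data (assumed): feature 0 is codependent on feature 1, and the
# true effect of feature 0 on y is a slope of 3.
rng = random.Random(0)
X, y = [], []
for c in range(10):
    for _ in range(40):
        xj = c + rng.uniform(0, 2)
        X.append([xj, float(c)])
        y.append(3.0 * xj + 10.0 * c)

naive = slope([row[0] for row in X], y)  # ignores the other feature
strat = stratified_slope(X, y, j=0)      # compares within groups only
print(naive, strat)
```

In this toy setting the tree isolates each value of the second feature, so the within-group slope reflects only the variable of interest, whereas the unstratified slope also absorbs the effect of the codependent feature. Fitting one line per group is a deliberate simplification; the abstract does not specify how within-group fluctuations are combined.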

 
