In the covariate shift learning scenario, the training and test covariate distributions differ, so a predictor's average losses over the training and test distributions also differ. In this work, we explore the potential of extreme dimension reduction, i.e. reduction to very low dimensions, to improve the performance of importance weighting methods for handling covariate shift. These methods fail in high dimensions, both because the train/test covariate divergence can be large and because the requisite density ratios cannot be estimated accurately. We first formulate and solve a problem that optimizes, over linear subspaces, a combination of each subspace's predictive utility and the train/test divergence within it. Applying it to simulated and real data, we show that extreme dimension reduction sometimes helps but not always, because dimension reduction introduces a bias.
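The following sketch shows what importance weighting after extreme dimension reduction looks like in a toy setting. It is not the estimator or subspace-selection procedure from this work. It assumes a synthetic 20-dimensional Gaussian shift confined to one coordinate, a hypothetical fixed predictor f(x) = 0 with squared loss on target y = x_0, a fixed 1-D projection direction instead of an optimized subspace, and a density ratio estimated by fitting a 1-D Gaussian to the projected covariates of each domain (`projected_density_ratio` and `gaussian_logpdf` are illustrative helpers).

```python
import numpy as np


def gaussian_logpdf(z, mean, var):
    # Log-density of a 1-D normal distribution, evaluated elementwise on z.
    return -0.5 * (np.log(2.0 * np.pi * var) + (z - mean) ** 2 / var)


def projected_density_ratio(X_train, X_test, direction):
    """Estimate p_test(z) / p_train(z) at each training point, with z = X @ u.

    Illustrative assumption: the projected covariate is modeled as a 1-D
    Gaussian in each domain. This is a stand-in estimator, not the paper's.
    """
    u = direction / np.linalg.norm(direction)   # unit vector spanning the 1-D subspace
    z_tr, z_te = X_train @ u, X_test @ u
    log_ratio = (gaussian_logpdf(z_tr, z_te.mean(), z_te.var())
                 - gaussian_logpdf(z_tr, z_tr.mean(), z_tr.var()))
    weights = np.exp(log_ratio)
    return weights / weights.mean()             # self-normalize so the weights average to 1


rng = np.random.default_rng(0)
d, n = 20, 5000
X_train = rng.normal(size=(n, d))
X_test = rng.normal(size=(n, d))
X_test[:, 0] += 1.0                             # covariate shift along coordinate 0 only

# Hypothetical fixed predictor f(x) = 0 with squared loss on target y = x_0.
loss_train = X_train[:, 0] ** 2
loss_test = X_test[:, 0] ** 2

e0, e1 = np.eye(d)[0], np.eye(d)[1]
w_shift = projected_density_ratio(X_train, X_test, e0)   # subspace containing the shift
w_flat = projected_density_ratio(X_train, X_test, e1)    # subspace missing the shift

train_avg = loss_train.mean()                   # unweighted training risk
test_avg = loss_test.mean()                     # actual test risk (unavailable in practice)
iw_avg = np.mean(w_shift * loss_train)          # importance-weighted estimate of test risk
print(f"train {train_avg:.2f}  test {test_avg:.2f}  weighted {iw_avg:.2f}")
print(f"weight std: shifted dir {w_shift.std():.2f}, unshifted dir {w_flat.std():.2f}")
```

Projecting onto the coordinate that carries the shift yields weights that move the training-risk estimate toward the test risk. Projecting onto an unshifted coordinate yields nearly uniform weights and leaves the estimate near the training risk. In this toy case, that is one simple way a poorly chosen subspace can fail to correct for the shift.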