Abstract
Domain generalization is the problem of machine learning when the training data and the test data come from different data domains. We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions, and those data distributions may even have different supports. In our model, the training data given to a learning algorithm consists of multiple datasets, each from a single domain drawn in turn from the meta-distribution. We study this model in three different problem settings (a multi-domain Massart noise setting, a decision tree multi-dataset setting, and a feature selection setting) and find that computationally efficient, polynomial-sample domain generalization is possible in each. Experiments demonstrate that our feature selection algorithm indeed ignores spurious correlations and improves generalization.