Learn to Expect the Unexpected: Probably Approximately Correct Domain Generalization

  • 2020-02-13 17:37:53
  • Vikas K. Garg, Adam Kalai, Katrina Ligett, Zhiwei Steven Wu
Domain generalization is the problem of machine learning when the training data and the test data come from different data domains. We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions, and those data distributions may even have different supports. In our model, the training data given to a learning algorithm consists of multiple datasets, each from a single domain drawn in turn from the meta-distribution. We study this model in three different problem settings (a multi-domain Massart noise setting, a decision tree multi-dataset setting, and a feature selection setting) and find that computationally efficient, polynomial-sample domain generalization is possible in each. Experiments demonstrate that our feature selection algorithm indeed ignores spurious correlations and improves generalization.
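
The sampling model described above can be illustrated with a small sketch. This is not the paper's code: the toy meta-distribution, the labeling rule, and the names `sample_domain`, `sample_dataset`, and `sample_training_data` are all hypothetical, chosen only to show multiple datasets being drawn, each from a single domain that is itself drawn from a meta-distribution, with domains that have disjoint supports.

```python
import random


def sample_domain(rng):
    """Draw one domain from a toy meta-distribution (hypothetical).

    A domain is identified by a shift; its inputs lie in [shift, shift + 1),
    so different domains have different (here, disjoint) supports.
    """
    return rng.choice([0.0, 10.0, 20.0])


def sample_dataset(shift, n, rng):
    """Draw n labeled examples from the single domain given by `shift`."""
    data = []
    for _ in range(n):
        u = rng.random()       # position within the domain, in [0, 1)
        x = shift + u          # observed input
        y = int(u > 0.5)       # toy label rule (an assumption, not from the paper)
        data.append((x, y))
    return data


def sample_training_data(num_domains, n_per_domain, seed=0):
    """Training data: one dataset per domain, domains drawn i.i.d."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(num_domains):
        shift = sample_domain(rng)
        datasets.append(sample_dataset(shift, n_per_domain, rng))
    return datasets


if __name__ == "__main__":
    for i, dataset in enumerate(sample_training_data(3, 4)):
        print(i, dataset)
```

At test time, under this model, a fresh domain would be drawn from the same meta-distribution, so a learner is judged on domains it may never have seen during training.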


