How much data is sufficient to learn high-performing algorithms?

Abstract

Algorithms for scientific analysis typically have tunable parameters thatsignificantly influence computational efficiency and solution quality. If aparameter setting leads to strong algorithmic performance on average over a setof typical problem instances, that parameter setting---ideally---will performwell in the future. However, if the set of typical problem instances is small,average performance will not generalize to future performance. This raises thequestion: how large should this set be? We answer this question for anyalgorithm satisfying an easy-to-describe, ubiquitous property: its performanceis a piecewise-structured function of its parameters. We are the first toprovide a unified sample complexity framework for algorithm parameterconfiguration; prior research followed case-by-case analyses. We presentapplications from diverse domains, including biology, political science, andeconomics.

Quick Read (beta)

loading the full paper ...