### Abstract

We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and multimodal data (i.e., combinations of different types of measurements). Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of generalized Christoffel functions and show how these can be used to optimize the sampling measures. We prove that this leads to near-optimal sample complexity in various important cases. This paper focuses on applications in scientific computing, where active learning is often desirable, since it is usually expensive to generate data. We demonstrate the efficacy of our framework for gradient-augmented learning with polynomials, Magnetic Resonance Imaging (MRI) using generative models, and adaptive sampling for solving PDEs using Physics-Informed Neural Networks (PINNs).