Distribution Learning with Valid Outputs Beyond the Worst-Case

Abstract

Generative models at times produce "invalid" outputs, such as images with generation artifacts and unnatural sounds. Validity-constrained distribution learning attempts to address this problem by requiring that the learned distribution have a provably small fraction of its mass in invalid parts of space -- something which standard loss minimization does not always ensure. To this end, a learner in this model can guide the learning via "validity queries", which allow it to ascertain the validity of individual examples. Prior work on this problem takes a worst-case stance, showing that proper learning requires an exponential number of validity queries, and demonstrating an improper algorithm which -- while generating guarantees in a wide range of settings -- makes an atypical polynomial number of validity queries. In this work, we take a first step towards characterizing regimes where guaranteeing validity is easier than in the worst case. We show that when the data distribution lies in the model class and the log-loss is minimized, the number of samples required to ensure validity has a weak dependence on the validity requirement. Additionally, we show that when the validity region belongs to a VC-class, a limited number of validity queries are often sufficient.
