Error Scaling Laws for Kernel Classification under Source and Capacity Conditions

Abstract

We consider the problem of kernel classification. While worst-case bounds on the decay rate of the prediction error with the number of samples are known for some classifiers, they often fail to accurately describe the learning curves of real data sets. In this work, we consider the important class of data sets satisfying the standard source and capacity conditions, comprising a number of real data sets as we show numerically. Under the Gaussian design, we derive the decay rates for the misclassification (prediction) error as a function of the source and capacity coefficients. We do so for two standard kernel classification settings, namely margin-maximizing Support Vector Machines (SVM) and ridge classification, and contrast the two methods. We find that our rates tightly describe the learning curves for this class of data sets, and are also observed on real data. Our results can also be seen as an explicit prediction of the exponents of a scaling law for kernel classification that is accurate on some real data sets.
