Abstract
In classical canonical correlation analysis (CCA), the goal is to determine the linear transformations of two random vectors into two new random variables that are most strongly correlated. Canonical variables are pairs of these new random variables, while canonical correlations are the correlations between these pairs. In this paper, we propose and study two generalizations of this classical method: (1) Instead of two random vectors, we study more complex data structures that appear in important applications. In these structures, there are $L$ features, each described by $p_l$ scalars, $1 \le l \le L$. We observe $n$ such objects over $T$ time points. We derive a suitable analog of CCA for such data. Our approach relies on embeddings into Reproducing Kernel Hilbert Spaces, and covers several related data structures as well. (2) We develop an analogous approach for multidimensional random processes. In this case, the experimental units are multivariate continuous, square-integrable functions over a given interval. These functions are modeled as elements of a Hilbert space, and we accordingly define multiple functional canonical correlation analysis (MFCCA). We justify our approaches by their application to two data sets and by suitable large sample theory. We derive consistency rates for the related transformation and correlation estimators, and show that it is possible to relax two common assumptions: the compactness of the underlying cross-covariance operators and the independence of the data.
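
For orientation, the classical problem that both generalizations start from can be written as follows. This is the standard textbook formulation, not notation taken from the paper: the random vectors $X \in \mathbb{R}^{p}$ and $Y \in \mathbb{R}^{q}$, the weight vectors $a$ and $b$, and the covariance blocks $\Sigma_{XX}$, $\Sigma_{YY}$, $\Sigma_{XY}$ are symbols assumed here for illustration, with $\Sigma_{XX}$ and $\Sigma_{YY}$ taken to be invertible.

\[
\rho_1 \;=\; \max_{a \in \mathbb{R}^{p},\; b \in \mathbb{R}^{q}} \operatorname{Corr}\bigl(a^{\top} X,\; b^{\top} Y\bigr)
\;=\; \max_{a,\, b} \frac{a^{\top} \Sigma_{XY}\, b}{\sqrt{a^{\top} \Sigma_{XX}\, a}\;\sqrt{b^{\top} \Sigma_{YY}\, b}} .
\]

Under these assumptions, the maximizing pair $(a^{\top} X, b^{\top} Y)$ is the first pair of canonical variables and $\rho_1$ is the first canonical correlation. Later pairs solve the same problem subject to being uncorrelated with the earlier canonical variables, and the canonical correlations coincide with the singular values of $\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}$.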