Abstract
We introduce a scalable framework for regressing multivariate distributions onto multivariate distributions, motivated by the application of inferring cell-cell communication from population-scale single-cell data. The observed data consist of pairs of multivariate distributions for ligands from one cell type and corresponding receptors from another. For each ordered pair $e=(l,r)$ of cell types $(l \neq r)$ and each sample $i = 1, \ldots, n$, we observe a pair of distributions $(F_{ei}, G_{ei})$ of gene expressions for ligands and receptors of cell types $l$ and $r$, respectively. The aim is to set up a regression of receptor distributions $G_{ei}$ given ligand distributions $F_{ei}$. A key challenge is that these distributions reside in distinct spaces of differing dimensions. We formulate the regression of multivariate densities on multivariate densities using a generalized Bayes framework with the sliced Wasserstein distance between fitted and observed distributions. Finally, we use inference under such regressions to define a directed graph for cell-cell communications.
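To make the discrepancy measure concrete, the following is a minimal sketch of a Monte Carlo estimate of the sliced Wasserstein distance between two empirical samples in the same space (for instance, a fitted and an observed receptor sample). It is an illustration under stated assumptions, not the estimator used in this work: the function name `sliced_wasserstein` and the parameters `n_projections`, `p`, and `seed` are hypothetical, and both samples are assumed to have the same number of points so that each one-dimensional Wasserstein distance reduces to a comparison of sorted projections.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo sliced p-Wasserstein distance between two samples.

    x, y: arrays of shape (n, d) with the same n and d (an assumption of
    this sketch; unequal sample sizes would need quantile interpolation).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must both have shape (n, d)")

    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Draw random directions uniformly on the unit sphere in R^d.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both samples onto every direction: shape (n, n_projections).
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)

    # For equal-size samples, the 1D W_p^p is the mean of |sorted differences|^p.
    # Averaging over directions gives the Monte Carlo estimate of SW_p^p.
    sw_p = np.mean(np.abs(x_proj - y_proj) ** p)
    return float(sw_p ** (1.0 / p))


# Hypothetical usage: compare two synthetic 5-dimensional samples.
_rng = np.random.default_rng(1)
observed = _rng.normal(size=(300, 5))
fitted = _rng.normal(loc=0.5, size=(300, 5))
distance = sliced_wasserstein(fitted, observed)
```

Because each projection is one-dimensional, the cost is dominated by sorting, which is what makes this kind of distance attractive when many pairs of multivariate distributions must be compared.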