Leveraging Perfect Multimodal Alignment and Gaussian Assumptions for Cross-modal Transfer

  • 2025-03-19 16:51:17
  • Abhi Kamboj, Minh N. Do


Multimodal alignment aims to construct a joint latent vector space where two modalities representing the same concept map to the same vector. We formulate this as an inverse problem and show that under certain conditions perfect alignment can be achieved. We then address a specific application of alignment referred to as cross-modal transfer. Unsupervised cross-modal transfer aims to leverage a model trained with one modality to perform inference on another modality, without any labeled fine-tuning on the new modality. Assuming that semantic classes are represented as a mixture of Gaussians in the latent space, we show how cross-modal transfer can be performed by projecting the data points from the representation space onto different subspaces representing each modality. Our experiments on synthetic multimodal Gaussian data verify the effectiveness of our perfect alignment and cross-modal transfer method. We hope these findings inspire further exploration of the applications of perfect alignment and the use of Gaussian models for cross-modal learning.
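
As a rough illustration of the setting described in the abstract, the sketch below generates synthetic Gaussian-mixture latents, observes them through two linear "modalities", aligns the second modality to the first by solving a least-squares inverse problem on unlabeled pairs, and reuses a classifier trained only on the first modality. This is not the authors' algorithm: the linear, noise-free views (`A1`, `A2`), the least-squares map `M`, the nearest-class-mean classifier, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: latent dim, observed dim per modality, number of classes.
d, n1, n2, k = 4, 6, 5, 3
n_per_class = 200

# Semantic classes modeled as a mixture of Gaussians in the latent space.
means = rng.normal(scale=5.0, size=(k, d))
labels = np.repeat(np.arange(k), n_per_class)
z = means[labels] + rng.normal(size=(labels.size, d))

# Assumption: each modality is a noise-free linear view of the latent vector.
A1 = rng.normal(size=(n1, d))
A2 = rng.normal(size=(n2, d))
x1 = z @ A1.T
x2 = z @ A2.T

# Paired samples for alignment (labels unused here) and held-out test samples.
perm = rng.permutation(labels.size)
fit, test = perm[:300], perm[300:]

# Alignment as a linear inverse problem: find M such that x2 @ M ~= x1.
# rcond truncates the near-zero singular value caused by n2 > d.
M, *_ = np.linalg.lstsq(x2[fit], x1[fit], rcond=1e-10)
align_err = np.abs(x2[test] @ M - x1[test]).max()

# Classifier trained with modality-1 data and labels only: nearest class mean.
centroids = np.stack(
    [x1[fit][labels[fit] == c].mean(axis=0) for c in range(k)]
)

def predict(x):
    """Assign each row of x to the closest modality-1 class centroid."""
    sq_dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return sq_dist.argmin(axis=1)

# Same classifier, evaluated on modality 1 and on aligned modality 2.
acc_src = (predict(x1[test]) == labels[test]).mean()
acc_transfer = (predict(x2[test] @ M) == labels[test]).mean()

print(f"max alignment error: {align_err:.2e}")
print(f"accuracy on modality 1: {acc_src:.3f}")
print(f"accuracy on aligned modality 2: {acc_transfer:.3f}")
```

Because both views are exact linear functions of the same latent vector in this toy setup, the least-squares map reproduces modality 1 from modality 2 up to numerical precision, so the classifier transfers without any labels from the second modality. With noisy or nonlinear views the alignment would only be approximate.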

