Neuro-Inspired Hierarchical Multimodal Learning

  • 2024-04-23 18:57:33
  • Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

Abstract

Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.
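
The balance described above matches the general form of an information bottleneck Lagrangian. As a rough sketch only, with notation assumed here rather than taken from the paper: let X_0 be the prime modality, X_1 a detector modality, B the latent state, and \beta > 0 a trade-off weight. One level of such an objective could then be written as:

```latex
% Assumed notation (not from the abstract):
%   X_0   : prime (input) modality
%   X_1   : a remaining "detector" modality
%   B     : latent state produced from X_0
%   \beta : trade-off weight, \beta > 0
\[
  \min_{p(B \mid X_0)} \; I(X_0; B) \;-\; \beta \, I(B; X_1)
\]
```

The first term penalizes information the latent state keeps about the input (compression), and the second rewards information it keeps about the detector modality (relevance). With further modalities, a term of the same form could plausibly be applied at each level of the hierarchy, though the abstract does not spell out the exact objective.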

 
