Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning

Abstract

Detecting Mild Cognitive Impairment (MCI) from picture descriptions is critical yet challenging, especially in multilingual and multi-picture settings. Prior work has primarily focused on English speakers describing a single picture (e.g., the 'Cookie Theft'). The TAUKDIAL-2024 challenge expands this scope by introducing multilingual speakers and multiple pictures, which presents new challenges in analyzing picture-dependent content. To address these challenges, we propose a framework with three components: (1) enhancing discriminative representation learning via supervised contrastive learning, (2) incorporating the image modality rather than relying solely on the speech and text modalities, and (3) applying a Product of Experts (PoE) strategy to mitigate spurious correlations and overfitting. Our framework improves MCI detection performance, raising Unweighted Average Recall (UAR) by 7.1 percentage points (from 68.1% to 75.2%) and F1 score by 2.9 percentage points (from 80.6% to 83.5%) compared to the text unimodal baseline. Notably, the contrastive learning component yields greater gains for the text modality than for the speech modality. These results highlight our framework's effectiveness in multilingual and multi-picture MCI detection.
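For readers unfamiliar with the headline metric, UAR is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `unweighted_average_recall` and the toy labels "MCI" and "NC" are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (UAR).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = defaultdict(int)  # number of true samples per class
    hits = defaultdict(int)     # number of correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        support[truth] += 1
        if truth == pred:
            hits[truth] += 1

    # Recall is computed per class, then averaged without class weighting.
    recalls = [hits[label] / support[label] for label in support]
    return sum(recalls) / len(recalls)


# Toy, made-up labels: 6 "MCI" samples and 4 "NC" samples.
y_true = ["MCI"] * 6 + ["NC"] * 4
# All MCI samples are predicted correctly; half of the NC samples are not.
y_pred = ["MCI"] * 6 + ["MCI", "MCI", "NC", "NC"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On this toy data the recalls are 1.0 for "MCI" and 0.5 for "NC", so UAR is lower than plain accuracy: the metric penalizes a model that favors the larger class, which is why it is commonly reported on imbalanced detection tasks.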
