Abstract
Multi-modal deep metric learning is crucial for effectively capturing diverse representations in tasks such as face verification, fine-grained object recognition, and product search. Traditional approaches to metric learning, whether based on distance or margin metrics, primarily emphasize class separation, often overlooking the intra-class distribution essential for multi-modal feature learning. In this context, we propose a novel loss function called Density-Aware Adaptive Margin Loss (DAAL), which preserves the density distribution of embeddings while encouraging the formation of adaptive sub-clusters within each class. By employing an adaptive line strategy, DAAL not only enhances intra-class variance but also ensures robust inter-class separation, facilitating effective multi-modal representation. Comprehensive experiments on benchmark fine-grained datasets demonstrate the superior performance of DAAL, underscoring its potential in advancing retrieval applications and multi-modal deep metric learning.