HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Abstract

Partially Relevant Video Retrieval (PRVR) addresses the critical challenge of matching untrimmed videos with text queries describing only partial content. Existing methods suffer from geometric distortion in Euclidean space that sometimes misrepresents the intrinsic hierarchical structure of videos and overlooks certain hierarchical semantics, ultimately leading to suboptimal temporal modeling. To address this issue, we propose the first hyperbolic modeling framework for PRVR, namely HLFormer, which leverages hyperbolic space learning to compensate for the suboptimal hierarchical modeling capabilities of Euclidean space. Specifically, HLFormer integrates the Lorentz Attention Block and Euclidean Attention Block to encode video embeddings in hybrid spaces, using the Mean-Guided Adaptive Interaction Module to dynamically fuse features. Additionally, we introduce a Partial Order Preservation Loss to enforce the "text < video" hierarchy through Lorentzian cone constraints. This approach further enhances cross-modal matching by reinforcing partial relevance between video content and text queries. Extensive experiments show that HLFormer outperforms state-of-the-art methods. Code is released at https://github.com/lijun2005/ICCV25-HLFormer.
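
To give a concrete sense of the hyperbolic (Lorentz) geometry that the abstract refers to, the following is a minimal sketch of the standard Lorentz model: lifting a Euclidean vector onto the hyperboloid with the exponential map at the origin and measuring geodesic distance. It is not taken from the HLFormer code; the function names `lorentz_inner`, `exp_map_origin`, and `lorentz_distance`, and the curvature parameter `k`, are illustrative assumptions.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum of products of spatial parts."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def exp_map_origin(v, k=1.0):
    """Lift a Euclidean vector v onto the hyperboloid of curvature -k.

    The result x has one extra leading (time) coordinate and satisfies
    lorentz_inner(x, x) == -1/k up to floating-point error.
    """
    norm = math.sqrt(sum(a * a for a in v))
    sqrt_k = math.sqrt(k)
    if norm < 1e-12:
        # The zero vector maps to the hyperboloid origin (1/sqrt(k), 0, ..., 0).
        return [1.0 / sqrt_k] + [0.0] * len(v)
    scale = math.sinh(sqrt_k * norm) / (sqrt_k * norm)
    return [math.cosh(sqrt_k * norm) / sqrt_k] + [scale * a for a in v]


def lorentz_distance(x, y, k=1.0):
    """Geodesic distance between two points on the hyperboloid."""
    # Clamp to the domain of acosh to guard against rounding error.
    arg = max(1.0, -k * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(k)


origin = exp_map_origin([0.0, 0.0])
point = exp_map_origin([0.3, 0.4])
print(lorentz_distance(origin, point))
```

In this model the distance from the origin to a lifted point equals the Euclidean norm of the original vector, while distances between points far from the origin grow much faster than in Euclidean space. That property is what makes hyperbolic space a natural fit for tree-like hierarchies.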
