HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals

  • 2025-07-17 17:37:24
  • Guimin Hu, Daniel Hershcovich, Hasti Seifi

Abstract

Haptic signals, from smartphone vibrations to virtual reality touch feedback, can effectively convey information and enhance realism, but designing signals that resonate meaningfully with users is challenging. To facilitate this, we introduce a multimodal dataset and task of matching user descriptions to vibration haptic signals, and highlight two primary challenges: (1) the lack of large haptic vibration datasets annotated with textual descriptions, as collecting haptic descriptions is time-consuming, and (2) the limited capability of existing tasks and models to describe vibration signals in text. To advance this area, we create HapticCap, the first fully human-annotated haptic-captioned dataset, containing 92,070 haptic-text pairs for user descriptions of sensory, emotional, and associative attributes of vibrations. Based on HapticCap, we propose the haptic-caption retrieval task and present the results of this task from a supervised contrastive learning framework that brings together text representations within specific categories and vibrations. Overall, the combination of the language model T5 and the audio model AST yields the best performance in the haptic-caption retrieval task, especially when separately trained for each description category.
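
To make the retrieval setup concrete, the following is a minimal sketch of contrastive alignment between vibration and caption embeddings, followed by nearest-neighbor retrieval. It is not the authors' implementation: it uses a simplified symmetric InfoNCE-style loss with one positive caption per vibration rather than the supervised contrastive objective named in the abstract, and the embeddings are random placeholders standing in for the outputs of a text encoder such as T5 and an audio encoder such as AST. The function names, the temperature value, and the embedding size are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(haptic_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (illustrative simplification).

    Row i of haptic_emb and row i of text_emb are assumed to be a matched
    haptic-caption pair; every other row in the batch acts as a negative.
    The temperature value is an assumption, not taken from the paper.
    """
    h = l2_normalize(haptic_emb)
    t = l2_normalize(text_emb)
    logits = h @ t.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(len(h))
    loss_h2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # haptic -> text
    loss_t2h = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> haptic
    return 0.5 * (loss_h2t + loss_t2h)


def retrieve(query_emb, candidate_embs, k=3):
    """Return indices of the k candidates most similar to the query."""
    q = l2_normalize(query_emb[None, :])
    c = l2_normalize(candidate_embs)
    scores = (q @ c.T)[0]
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder embeddings; real ones would come from trained encoders.
    haptic = rng.normal(size=(4, 8))
    captions = haptic + 0.1 * rng.normal(size=(4, 8))  # noisy matched pairs
    print("loss:", contrastive_loss(haptic, captions))
    print("top captions for vibration 0:", retrieve(haptic[0], captions))
```

Under this sketch, training separately for each description category (sensory, emotional, associative) would amount to computing the loss on batches drawn from one category at a time, though how the paper implements that separation is not specified in the abstract.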

 
