A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization

  • 2024-10-30 18:57:13
  • Bin Wu, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura

Abstract

The marmoset, a highly vocal primate, has become a popular animal model for studying social-communicative behavior and its underlying mechanisms. In the study of vocal communication, it is vital to know the caller identities, call contents, and vocal exchanges. Previous work achieved a CNN-based joint model for call segmentation, classification, and caller identification of marmoset vocalizations. However, the CNN has limitations in modeling long-range acoustic patterns. The Transformer architecture, which has been shown to outperform CNNs, uses a self-attention mechanism that efficiently segregates information in parallel over long distances and captures the global structure of marmoset vocalization. We propose using the Transformer to jointly segment and classify marmoset calls and identify the caller of each vocalization.

 
