jina-embeddings-v3: Multilingual Embeddings With Task LoRA

  • 2024-09-17 07:42:20
  • Saba Sturua, Isabelle Mohr, Mohammad Kalim Akram, Michael Günther, Bo Wang, Markus Krimmel, Feng Wang, Georgios Mastrapas, Andreas Koukounas, Nan Wang, Han Xiao

Abstract

We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Additionally, Matryoshka Representation Learning is integrated into the training process, allowing flexible truncation of embedding dimensions without compromising performance. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks.
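
To make the truncation idea concrete, the following is a minimal sketch of how a Matryoshka-style embedding might be shortened on the consumer side: keep a prefix of the vector and re-normalize it before computing cosine similarity. The function names `truncate_embedding` and `cosine` and the toy 8-dimensional vectors are assumptions made for illustration; they are not part of the model's API, and the numbers are not real model outputs. The sketch only shows the slicing step; it is the Matryoshka training objective described in the abstract, not this code, that makes the leading dimensions informative on their own.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize the result."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize for an all-zero prefix
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical toy vectors standing in for real embeddings.
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.02, 0.01, 0.01]
doc = [0.8, 0.2, 0.25, 0.1, 0.01, 0.03, 0.02, 0.00]

full = cosine(query, doc)
short = cosine(truncate_embedding(query, 4), truncate_embedding(doc, 4))
print(f"full: {full:.4f}  truncated to 4 dims: {short:.4f}")
```

Shorter vectors reduce storage and similarity-search cost, so in practice the truncated size would be chosen by measuring retrieval quality at each candidate dimension.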

 
