jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Abstract

We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Additionally, Matryoshka Representation Learning is integrated into the training process, allowing flexible truncation of embedding dimensions without compromising performance. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks.
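Matryoshka Representation Learning trains the model so that the leading dimensions of an embedding carry most of its information, which is what makes truncation usable downstream. The following is a minimal sketch of how a consumer of such embeddings might truncate them. It assumes that truncation means keeping the first `k` components and re-normalizing to unit length before computing cosine similarity. The function names `truncate_embedding` and `cosine` and the toy vectors are hypothetical illustrations, not part of any jina-embeddings-v3 API.

```python
import math


def truncate_embedding(vec, k):
    """Keep the first k dimensions of a Matryoshka-style embedding and
    L2-normalize the result so cosine similarity reduces to a dot product.
    (Assumption: truncation = prefix slice + re-normalization.)"""
    if not 0 < k <= len(vec):
        raise ValueError(f"k must be in 1..{len(vec)}, got {k}")
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical 8-dimensional vectors standing in for real model outputs.
    query = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01]
    doc = [0.55, 0.45, 0.42, 0.28, 0.25, 0.05, 0.1, 0.02]
    # Compare similarity at the full size and at two truncated sizes.
    for k in (8, 4, 2):
        q, d = truncate_embedding(query, k), truncate_embedding(doc, k)
        print(k, round(cosine(q, d), 4))
```

Re-normalizing after slicing matters because the prefix of a unit vector is generally shorter than unit length; without it, dot products at different `k` would not be comparable cosine similarities.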
