TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification

  • 2025-06-02 08:43:02
  • Yindu Su, Huike Zou, Lin Sun, Ting Zhang, Haiyang Yang, Liyu Chen, David Lo, Qingheng Zhang, Shuguang Han, Jufeng Chen

Abstract

Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendation, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial deployment. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Further, it has been successfully deployed on the real-world e-commerce platform Xianyu, processing millions of product listings daily with frequently updated, large-scale attribute taxonomies. We release the code to facilitate reproducibility and future research at https://github.com/SuYindu/TACLR.
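
The retrieval formulation described above can be illustrated with a minimal sketch. This is not the released TACLR code: the function names `cosine` and `retrieve_value`, the hand-written toy vectors, and the fixed per-call `threshold` are all assumptions made for illustration. In particular, the abstract does not say how the dynamic thresholds are computed, so a plain numeric cutoff stands in for them here. The sketch only shows the general idea of scoring candidate values against a product embedding and returning no value when nothing is similar enough.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_value(profile_vec, candidates, threshold):
    """Return the candidate value most similar to the product profile.

    `candidates` maps each normalized value name to its embedding.
    Returns None when the best similarity is below `threshold`
    (a stand-in for the paper's dynamic threshold, which is not
    specified in the abstract).
    """
    best_value, best_score = None, -1.0
    for value, vec in candidates.items():
        score = cosine(profile_vec, vec)
        if score > best_score:
            best_value, best_score = value, score
    return best_value if best_score >= threshold else None


# Hypothetical toy embeddings; a real system would obtain these from a
# trained encoder.
profile = [0.9, 0.1, 0.0]
color_candidates = {"red": [1.0, 0.0, 0.0], "blue": [0.0, 1.0, 0.0]}
material_candidates = {"leather": [0.0, 0.0, 1.0], "cotton": [0.0, 0.6, 0.8]}

color = retrieve_value(profile, color_candidates, threshold=0.5)
material = retrieve_value(profile, material_candidates, threshold=0.5)
print(color, material)
```

In this toy setup the profile is close to one color candidate and far from every material candidate, so the second call yields no value. Because outputs are drawn from a fixed candidate set, they are normalized by construction, and new values can be supported by adding entries to the candidate dictionary.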
