Hard vs. Noise: Resolving Hard-Noisy Sample Confusion in Recommender Systems via Large Language Models

Abstract

Implicit feedback, widely used to train recommender systems, is unavoidably noisy due to factors such as misclicks and position bias. Previous studies have attempted to identify noisy samples through their divergent data patterns, such as higher loss values, and to mitigate their influence through sample dropping or reweighting. However, we observe that noisy samples and hard samples display similar patterns, which leads to a hard-noisy confusion issue. Such confusion is problematic because hard samples are vital for modeling user preferences. To solve this problem, we propose the LLMHNI framework, which leverages two auxiliary user-item relevance signals generated by Large Language Models (LLMs) to differentiate hard and noisy samples. First, LLMHNI obtains user-item semantic relevance from LLM-encoded embeddings and uses it in negative sampling to select hard negatives while filtering out noisy false negatives. An objective alignment strategy is proposed to project the LLM-encoded embeddings, originally learned for general language tasks, into a representation space optimized for user-item relevance modeling. Second, LLMHNI exploits LLM-inferred logical relevance within user-item interactions to identify hard and noisy samples. These LLM-inferred interactions are integrated into the interaction graph and guide denoising through cross-graph contrastive alignment. To eliminate the impact of unreliable interactions induced by LLM hallucination, we propose a graph contrastive learning strategy that aligns representations from randomly edge-dropped views to suppress unreliable edges. Empirical results demonstrate that LLMHNI significantly improves denoising and recommendation performance.
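
To make the negative-sampling idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the selection rule, so the sketch assumes cosine similarity between user and item embeddings as the semantic relevance score, and assumes a fixed cutoff above which an un-interacted item is treated as a likely false negative. The names `cosine`, `select_hard_negatives`, and `fn_threshold` are hypothetical, and the toy vectors stand in for LLM-encoded embeddings before any objective alignment.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_hard_negatives(user_emb, item_embs, candidates, k=1, fn_threshold=0.9):
    """Pick k hard negatives among un-interacted candidate items.

    Assumption (not from the paper): a candidate whose similarity to the user
    reaches fn_threshold is treated as a likely false negative and skipped.
    The k most similar remaining candidates are returned as hard negatives.
    """
    scored = [(cosine(user_emb, item_embs[i]), i) for i in candidates]
    kept = [(s, i) for s, i in scored if s < fn_threshold]
    # Highest similarity first; ties broken by item id for determinism.
    kept.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in kept[:k]]

# Toy 2-d embeddings (hypothetical stand-ins for LLM-encoded vectors).
user = [1.0, 0.0]
items = {
    "a": [1.0, 0.05],  # almost identical to the user: likely false negative
    "b": [1.0, 1.0],   # moderately similar: hard negative
    "c": [-1.0, 0.0],  # dissimilar: easy negative
}
hard = select_hard_negatives(user, items, ["a", "b", "c"], k=1)
print(hard)
```

In this toy setup item `a` is filtered out as a probable false negative, and item `b` is preferred over the easy negative `c`. A fixed threshold is only one possible design; a rank-based or learned cutoff would fit the same description.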
