🎯 Top Personalized Recommendations
Alibaba
Why we think this paper is great for you:
This paper explores dense retrieval in e-commerce search engines using LLMs and embedding models, directly aligning with your interest in search and deep learning for information retrieval. It offers insights into multi-objective reinforcement learning for optimizing semantic retrieval.
Abstract
Dense retrieval, as the core component of e-commerce search engines, maps user queries and items into a unified semantic space through pre-trained embedding models to enable large-scale real-time semantic retrieval. Although LLMs are gradually replacing traditional BERT architectures for embedding, their training paradigms still adhere to BERT-style supervised fine-tuning and hard negative mining strategies. This approach relies on complex offline hard negative sample construction pipelines, which constrain model iteration efficiency and hinder the evolutionary potential of semantic representation capabilities. In addition, existing multi-task learning frameworks face the seesaw effect when simultaneously optimizing semantic relevance and non-relevance objectives. In this paper, we propose Retrieval-GRPO, a multi-objective reinforcement learning-based dense retrieval framework designed to address these challenges. The method eliminates offline hard negative sample construction by dynamically retrieving Top-K candidate products for each query during training, while introducing a relevance LLM as a reward model to generate real-time feedback. Specifically, the retrieval model dynamically optimizes embedding representations through reinforcement learning, with reward signals combining LLM-generated relevance scores, product quality scores, and multi-way exclusivity metrics to achieve multi-objective user preference alignment and real-time error correction. This mechanism not only removes dependency on hard negatives but also mitigates the seesaw effect through collaborative multi-objective optimization, significantly enhancing the model's semantic generalization capability for complex long-tail queries. Extensive offline and online experiments validate the effectiveness of Retrieval-GRPO, which has been deployed on China's largest e-commerce platform.
AI Summary
- The framework effectively mitigates the "seesaw effect" in multi-objective optimization by fusing LLM-generated relevance scores, product quality metrics, and multi-way exclusivity into a single, unified reward signal. [3]
- The multi-objective reward design, particularly the exclusivity reward, demonstrably reduces overlap with traditional inverted-index retrieval, providing incremental value and diversifying search results. [3]
- The quality and precision of the reward model are paramount; using a high-capacity LLM (TaoSR1) as the reward model yields substantial performance gains compared to simpler models, highlighting the importance of accurate reward calibration. [3]
- Multi-Objective Reward Fusion: A mechanism within Retrieval-GRPO where the reward signal for reinforcement learning is composed of weighted contributions from semantic relevance (LLM-generated), item quality (historical data), and exclusivity (overlap with other retrieval channels). [3]
- Retrieval-GRPO eliminates the need for complex, offline hard negative sample mining by dynamically retrieving top-k candidates during training, significantly accelerating model iteration cycles. [2]
- By leveraging a powerful LLM (TaoSR1) as a real-time reward model, Retrieval-GRPO enables dynamic error correction and user preference alignment, leading to superior semantic generalization for complex long-tail queries. [2]
- A two-stage training approach, starting with Supervised Fine-Tuning (SFT) followed by Retrieval-GRPO, is crucial for optimal performance, as the SFT phase provides foundational discriminative capabilities before tackling challenging samples. [2]
- Retrieval-GRPO: A multi-objective reinforcement learning framework for dense retrieval that dynamically optimizes embedding representations by using real-time rewards from an LLM-based relevance model, item quality scores, and exclusivity metrics, eliminating offline hard negative mining. [2]
- TaoSR1: An advanced internal 42B-MoE LLM, trained through multi-stage RL, used as the reward model within Retrieval-GRPO to provide real-time, precise relevance scores for query-item candidates. [2]
- Global Negative Sampling: An enhancement to InfoNCE loss in the SFT phase where negative samples are randomly selected from the entire product pool in addition to standard in-batch negatives, ensuring broader exposure of items during training. [2]
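The multi-objective reward fusion described above can be sketched in a few lines. This is an illustrative assumption, not the paper's code: the weights, function names, and the binary exclusivity signal (1 when an item is not already returned by other retrieval channels, e.g. the inverted index) are all hypothetical.

```python
# Hypothetical sketch of Retrieval-GRPO-style reward fusion: each Top-K
# candidate's reward is a weighted sum of LLM relevance, item quality,
# and exclusivity. Weights and names are illustrative assumptions.

def fused_reward(relevance: float, quality: float, exclusivity: float,
                 w_rel: float = 0.6, w_qual: float = 0.2,
                 w_excl: float = 0.2) -> float:
    """Combine per-candidate signals into a single scalar reward."""
    return w_rel * relevance + w_qual * quality + w_excl * exclusivity

def candidate_rewards(candidates, other_channel_ids):
    """Score retrieved candidates; exclusivity rewards items not already
    returned by other (e.g. inverted-index) retrieval channels.

    candidates: list of (item_id, llm_relevance, quality_score) tuples.
    """
    rewards = []
    for item_id, llm_relevance, quality_score in candidates:
        exclusivity = 0.0 if item_id in other_channel_ids else 1.0
        rewards.append(fused_reward(llm_relevance, quality_score, exclusivity))
    return rewards
```

In an RL loop these scalar rewards would then drive the GRPO policy update of the embedding model; that part is omitted here.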
Amazon.com, Inc.
Why we think this paper is great for you:
You will find this paper highly relevant as it focuses on re-ranking for efficient product search, leveraging LLMs for query decomposition. This directly addresses your interests in search, ranking, and deep learning applications.
Abstract
Search queries with superlatives (e.g., best, most popular) require comparing candidates across multiple dimensions, demanding linguistic understanding and domain knowledge. We show that LLMs can uncover latent intent behind these expressions in e-commerce queries through a framework that extracts structured interpretations or hints. Our approach decomposes queries into attribute-value hints generated concurrently with retrieval, enabling efficient integration into the ranking pipeline. Our method improves search performance by 10.9 points in MAP and ranking by 5.9 points in MRR over baselines. Since direct LLM-based reranking faces prohibitive latency, we develop an efficient approach transferring superlative interpretations to lightweight models. Our findings provide insights into how superlative semantics can be represented and transferred between models, advancing linguistic interpretation in retrieval systems while addressing practical deployment constraints.
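One way the attribute-value hints could feed a lightweight re-ranker is a simple score boost for candidates matching more hints. This is a hedged sketch of the general idea, not the paper's method; the hint format, weights, and function names are assumptions.

```python
# Illustrative sketch: a superlative query like "best waterproof boots"
# yields LLM-extracted hints such as {"waterproof": "true"}; candidates
# whose attributes match more hints get a score boost. All names and
# weights are assumptions, not the paper's implementation.

def rerank_with_hints(candidates, hints, base_weight=1.0, hint_weight=0.5):
    """candidates: list of (item_id, base_score, attributes_dict)."""
    def score(entry):
        _, base, attrs = entry
        matched = sum(1 for k, v in hints.items() if attrs.get(k) == v)
        return base_weight * base + hint_weight * matched
    return sorted(candidates, key=score, reverse=True)
```

Because the hints are generated concurrently with retrieval, this boost adds essentially no latency to the ranking pass, in contrast to invoking an LLM per candidate.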
Alibaba
Why we think this paper is great for you:
This paper on an asynchronous inference framework for cost-effective pre-ranking in industrial recommendation systems is a great match. It combines deep neural networks with retrieval and ranking strategies, which aligns well with your interests.
Abstract
In industrial recommendation systems, pre-ranking models based on deep neural networks (DNNs) commonly adopt a sequential execution framework: feature fetching and model forward computation are triggered only after receiving candidates from the upstream retrieval stage. This design introduces inherent bottlenecks, including redundant computations of identical users/items and increased latency due to strictly sequential operations, which jointly constrain the model's capacity and system efficiency. To address these limitations, we propose the Asynchronous Inference Framework (AIF), a cost-effective computational architecture that decouples interaction-independent components, those operating within a single user or item, from real-time prediction. AIF reorganizes the model inference process by performing user-side computations in parallel with the retrieval stage and conducting item-side computations in a nearline manner. This means that interaction-independent components are calculated just once and completed before the real-time prediction phase of the pre-ranking stage. As a result, AIF enhances computational efficiency and reduces latency, freeing up resources to significantly improve the feature set and model architecture of interaction-independent components. Moreover, we delve into model design within the AIF framework, employing approximated methods for interaction-dependent components in online real-time predictions. By co-designing both the framework and the model, our solution achieves notable performance gains without significantly increasing computational and latency costs. This has enabled the successful deployment of AIF in the Taobao display advertising system.
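The decoupling the abstract describes can be sketched as follows, under assumptions: the user tower runs in parallel with retrieval, item representations are computed nearline once and cached, and the real-time phase reduces to a cheap interaction (here a dot product). All names and the toy models are hypothetical.

```python
# Hedged sketch of asynchronous inference: interaction-independent
# user-side work overlaps the retrieval stage, item-side work is cached
# nearline, and real-time pre-ranking only runs a cheap interaction.

import concurrent.futures

ITEM_CACHE = {}  # nearline-computed item representations, keyed by id

def user_tower(user_feats):   # heavy user-side model (stub)
    return [f * 2.0 for f in user_feats]

def retrieve(user_feats):     # upstream retrieval stage (stub)
    return ["item_a", "item_b"]

def item_vector(item_id):     # nearline: compute once, then reuse
    if item_id not in ITEM_CACHE:
        ITEM_CACHE[item_id] = [1.0, -1.0] if item_id == "item_a" else [0.5, 0.5]
    return ITEM_CACHE[item_id]

def prerank(user_feats):
    # Run the user-side computation in parallel with retrieval.
    with concurrent.futures.ThreadPoolExecutor() as ex:
        u_future = ex.submit(user_tower, user_feats)
        candidates = retrieve(user_feats)
        u_vec = u_future.result()
    # Real-time phase: only the cheap interaction remains per candidate.
    return {c: sum(a * b for a, b in zip(u_vec, item_vector(c)))
            for c in candidates}
```

The paper additionally approximates interaction-dependent components online; this sketch only shows the precompute-and-overlap structure.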
University of Science and
Why we think this paper is great for you:
This work on user modeling and LLM personalization directly speaks to your interest in personalization and deep learning. It explores how to achieve personalized outputs by considering inter-user differences.
Abstract
Large Language Models (LLMs) are increasingly integrated into users' daily lives, driving a growing demand for personalized outputs. Prior work has primarily leveraged a user's own history, often overlooking inter-user differences that are critical for effective personalization. While recent methods have attempted to model such differences, their feature extraction processes typically rely on fixed dimensions and quick, intuitive inference (System-1 thinking), limiting both the coverage and granularity of captured user differences. To address these limitations, we propose Difference-aware Reasoning Personalization (DRP), a framework that reconstructs the difference extraction mechanism by leveraging inference scaling to enhance LLM personalization. DRP autonomously identifies relevant difference feature dimensions and generates structured definitions and descriptions, enabling slow, deliberate reasoning (System-2 thinking) over user differences. Experiments on personalized review generation demonstrate that DRP consistently outperforms baseline methods across multiple metrics.
Microsoft Research
Why we think this paper is great for you:
This paper delves into multi-vector retrieval with token importance, using BERT for fine-grained interactions and efficient scoring. It's a strong fit for your interest in information retrieval and deep learning techniques.
Abstract
ColBERT introduced a late interaction mechanism that independently encodes queries and documents using BERT, and computes similarity via fine-grained interactions over token-level vector representations. This design enables expressive matching while allowing efficient computation of scores, as the multi-vector document representations can be pre-computed offline. ColBERT models distance using a Chamfer-style function: for each query token, it selects the closest document token and sums these distances across all query tokens.
In our work, we explore enhancements to the Chamfer distance function by computing a weighted sum over query token contributions, where weights reflect the token importance. Empirically, we show that this simple extension, requiring only token-weight training while keeping the multi-vector representations fixed, further enhances the expressiveness of the late interaction multi-vector mechanism. In particular, on the BEIR benchmark, our method achieves an average improvement of 1.28% in Recall@10 in the zero-shot setting using IDF-based weights, and 3.66% through few-shot fine-tuning.
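The weighted late-interaction score described above can be written compactly: for each query token take its best-matching document token (MaxSim), then sum these maxima weighted by a per-token importance weight such as IDF. A minimal sketch, with plain lists standing in for the learned multi-vector representations:

```python
# Minimal sketch of weighted Chamfer / late-interaction scoring: each
# query token contributes its best dot-product match against the
# document's token vectors, scaled by a learned or IDF-based weight.

def weighted_chamfer(query_vecs, query_weights, doc_vecs):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(w * max(dot(q, d) for d in doc_vecs)
               for q, w in zip(query_vecs, query_weights))
```

With all weights equal to 1 this reduces to the standard ColBERT MaxSim sum, which is why only the token weights need training while the representations stay fixed.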
Sorbonne Université
Why we think this paper is great for you:
You'll find this paper on multi-table retrieval through iterative search highly relevant, as it addresses challenging information retrieval tasks over complex data sources. It focuses on semantic relevance and structural coherence in search.
Abstract
Open-domain question answering over datalakes requires retrieving and composing information from multiple tables, a challenging subtask that demands semantic relevance and structural coherence (e.g., joinability). While exact optimization methods like Mixed-Integer Programming (MIP) can ensure coherence, their computational complexity is often prohibitive. Conversely, simpler greedy heuristics that optimize for query coverage alone often fail to find these coherent, joinable sets. This paper frames multi-table retrieval as an iterative search process, arguing this approach offers advantages in scalability, interpretability, and flexibility. We propose a general framework and a concrete instantiation: a fast, effective Greedy Join-Aware Retrieval algorithm that holistically balances relevance, coverage, and joinability. Experiments across 5 NL2SQL benchmarks demonstrate that our iterative method achieves competitive retrieval performance compared to the MIP-based approach while being 4-400x faster depending on the benchmark and search space settings. This work highlights the potential of iterative heuristics for practical, scalable, and composition-aware retrieval.
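An iterative, join-aware greedy selection in the spirit of the abstract might look like the sketch below: at each step, pick the table with the best combined gain in relevance, query-term coverage, and joinability with the tables already selected. The scoring functions, weights, and table schema are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch of greedy join-aware multi-table retrieval: balance
# relevance, coverage of query terms, and joinability (shared join keys
# with already-selected tables). All weights and fields are assumptions.

def greedy_join_aware(tables, k, w_rel=1.0, w_cov=1.0, w_join=1.0):
    """tables: list of dicts with 'id', 'relevance', 'terms', 'join_keys'."""
    selected, covered = [], set()
    for _ in range(min(k, len(tables))):
        best, best_gain = None, float("-inf")
        for t in tables:
            if t in selected:
                continue
            coverage_gain = len(set(t["terms"]) - covered)
            joinable = (not selected or
                        any(set(t["join_keys"]) & set(s["join_keys"])
                            for s in selected))
            gain = (w_rel * t["relevance"] + w_cov * coverage_gain
                    + w_join * (1.0 if joinable else 0.0))
            if gain > best_gain:
                best, best_gain = t, gain
        selected.append(best)
        covered |= set(best["terms"])
    return [t["id"] for t in selected]
```

Each step is linear in the number of candidate tables, which is the scalability advantage claimed over exact MIP-based selection.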
Wuhan University
Why we think this paper is great for you:
This paper introduces a novel collaborative filtering method for recommender systems, utilizing local similarities among users. It offers valuable insights into personalization and leveraging user behavior data.
Abstract
To leverage user behavior data from the Internet more effectively in recommender systems, this paper proposes a novel collaborative filtering (CF) method called Local Collaborative Filtering (LCF). LCF utilizes local similarities among users and integrates their data using the law of large numbers (LLN), thereby improving the utilization of user behavior data. Experiments are conducted on the Steam game dataset, and the results of LCF align with real-world needs.
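The abstract gives few details, but the core idea — predict from a local neighborhood of similar users and rely on averaging to smooth noise — can be sketched speculatively. The similarity measure, neighborhood size, and data layout below are all assumptions, not the paper's method.

```python
# Speculative sketch of local collaborative filtering: predict a user's
# rating for an item by averaging over only the locally most similar
# users who rated it, letting averaging (law of large numbers) reduce
# noise. Similarity metric and neighborhood size k are assumptions.

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def lcf_predict(target_profile, neighbors, item, k=2):
    """neighbors: list of (profile_vector, {item: rating}) pairs."""
    rated = [(cosine(target_profile, p), r[item])
             for p, r in neighbors if item in r]
    top = sorted(rated, reverse=True)[:k]  # local neighborhood only
    if not top:
        return None
    return sum(rating for _, rating in top) / len(top)
```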