🎯 Top Personalized Recommendations
Why we think this paper is great for you:
This paper explores a novel approach to document retrieval using advanced deep learning models, directly aligning with your interest in cutting-edge retrieval techniques. It offers insights into how generative models can enhance information access.
Abstract
Generative retrieval (GR) re-frames document retrieval as a sequence-based document identifier (DocID) generation task, memorizing documents with model parameters and enabling end-to-end retrieval without explicit indexing. Existing GR methods are based on auto-regressive generative models, i.e., token generation is performed from left to right. However, such auto-regressive methods suffer from: (1) a mismatch between DocID generation and natural language generation, e.g., an incorrect DocID token generated in an early (leftmost) step leads to an entirely erroneous retrieval; and (2) a failure to dynamically balance the trade-off between retrieval efficiency and accuracy, which is crucial for practical applications. To address these limitations, we propose generative document retrieval with diffusion language models, dubbed DiffuGR. It models DocID generation as a discrete diffusion process: during training, DocIDs are corrupted through a stochastic masking process, and a diffusion language model is learned to recover them under a retrieval-aware objective. For inference, DiffuGR attempts to generate DocID tokens in parallel and refines them through a controllable number of denoising steps. In contrast to conventional left-to-right auto-regressive decoding, DiffuGR provides a novel mechanism to first generate the more confident DocID tokens and then refine the generation through diffusion-based denoising. Moreover, DiffuGR offers explicit runtime control over the quality-latency trade-off. Extensive experiments on benchmark retrieval datasets show that DiffuGR is competitive with strong auto-regressive generative retrievers, while offering flexible speed-accuracy trade-offs through variable denoising budgets. Overall, our results indicate that non-autoregressive diffusion models are a practical and effective alternative for generative document retrieval.
AI Summary
- DiffuGR introduces a novel generative retrieval paradigm by modeling DocID generation as a discrete diffusion process, enabling non-autoregressive, parallel token generation and iterative refinement. [3]
- The diffusion-based approach allows for explicit runtime control over the quality-latency trade-off in document retrieval by dynamically adjusting the number of denoising steps. [3]
- DiffuGR achieves competitive retrieval performance, particularly in Recall@1 and MRR@10, against strong auto-regressive generative retrievers on benchmark datasets like NQ320K and MS MARCO. [3]
- Confidence-guided denoising strategies (e.g., MaskGIT plus, Top-k margin, Entropy) significantly outperform random denoising, highlighting the importance of token-generation scheduling for improving DiffuGR's accuracy and stability. [3]
- DiffuGR proposes 'pseudo beam search' strategies, including query augmentation and leveraging intermediate denoising states, to approximate top-k generation in a non-autoregressive framework, partially mitigating the lack of native beam search. [3]
- Generative Retrieval (GR): A paradigm that re-frames document retrieval as a sequence-based document identifier (DocID) generation task, where a generative model directly produces DocIDs without explicit indexing. [3]
- DocID (Document Identifier): A unique sequence of tokens assigned to each document, which a generative sequence-to-sequence model learns to produce in response to a query. [3]
- The paper demonstrates that diffusion language models, even at comparable scales, can achieve better retrieval accuracy for generative retrieval tasks than state-of-the-art auto-regressive LLMs like Qwen2.5-7B. [2]
- Linguistic DocIDs generally yield better performance than learnable DocIDs for DiffuGR, suggesting that directly leveraging existing semantic spaces is more effective than requiring vocabulary expansion and semantic remapping. [2]
- Discrete Diffusion Process (for DocID generation): A process where DocIDs are corrupted via stochastic masking during training, and a diffusion language model learns to recover them; during inference, DocID tokens are generated in parallel and refined through a controllable number of denoising steps. [1]
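The denoising loop described in the summary can be illustrated with a small sketch. This is a toy simulation under assumed names (`toy_scores`, `denoise`, a position-based confidence score), not the paper's implementation: all DocID positions start masked, and each step commits the most confident proposals first, MaskGIT-style, so the step count is the runtime quality-latency knob.

```python
import math

MASK = "<mask>"

def toy_scores(seq, target):
    # Hypothetical stand-in for the diffusion language model: for each
    # still-masked position, propose the target token together with a
    # confidence score (here a toy position-based score, not a trained one).
    return [(i, target[i], 1.0 / (1 + i))
            for i, tok in enumerate(seq) if tok == MASK]

def denoise(target, steps):
    # Iterative parallel denoising: every step commits the most confident
    # proposals first (confidence-guided scheduling), so fewer steps means
    # lower latency at the cost of less refinement.
    seq = [MASK] * len(target)
    per_step = math.ceil(len(target) / steps)
    for _ in range(steps):
        proposals = sorted(toy_scores(seq, target), key=lambda p: -p[2])
        for i, tok, _ in proposals[:per_step]:
            seq[i] = tok
        if MASK not in seq:
            break
    return seq
```

With `steps=1` all tokens are committed in one parallel pass; larger budgets spread commitments over several refinement rounds, mirroring the variable denoising budgets the abstract describes.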
Why we think this paper is great for you:
This paper directly investigates the impact and value of personalized recommendation systems, which is highly relevant to your focus on tailoring experiences for users. You will find its analysis of recommendation-induced utility particularly insightful.
Abstract
Personalized recommendation systems shape much of user choice online, yet their targeted nature makes it challenging to separate the value of recommendations from that of the underlying goods. We build a discrete choice model that embeds recommendation-induced utility, low-rank heterogeneity, and flexible state dependence and apply the model to viewership data at Netflix. We exploit idiosyncratic variation introduced by the recommendation algorithm to identify and separately value these components as well as to recover model-free diversion ratios that we can use to validate our structural model. We use the model to evaluate counterfactuals that quantify the incremental engagement generated by personalized recommendations. First, we show that replacing the current recommender system with a matrix factorization or popularity-based algorithm would lead to 4% and 12% reductions in engagement, respectively, and decreased consumption diversity. Second, most of the consumption increase from recommendations comes from effective targeting, not mechanical exposure, with the largest gains for mid-popularity goods (as opposed to broadly appealing or very niche goods).
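The idea of "recommendation-induced utility" in a discrete choice model can be sketched with a minimal multinomial logit. This is a generic illustration, not the paper's structural model: each option's utility is its base utility plus a hypothetical additive term when the recommender surfaced it, and choice probabilities are the softmax of utilities.

```python
import math

def choice_probs(base_utils, rec_utility, recommended):
    # Multinomial logit sketch: utility = base utility, plus a hypothetical
    # recommendation-induced term for options the recommender surfaced.
    utils = [u + (rec_utility if i in recommended else 0.0)
             for i, u in enumerate(base_utils)]
    # Softmax over utilities gives the choice probabilities.
    exps = [math.exp(u) for u in utils]
    z = sum(exps)
    return [e / z for e in exps]
```

Raising `rec_utility` shifts probability mass toward recommended options, which is the kind of engagement lift the counterfactuals in the abstract quantify.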
Why we think this paper is great for you:
This work combines ranking with privacy considerations in dynamic data-driven domains like recommender systems, aligning well with your interests in both areas. It provides a practical perspective on building robust and ethical ranking solutions.
Abstract
Multiple-Criteria Decision Making (MCDM) is a sub-discipline of Operations Research that helps decision-makers in choosing, ranking, or sorting alternatives based on conflicting criteria. Over time, its application has been expanded into dynamic and data-driven domains, such as recommender systems. In these contexts, the availability and handling of personal and sensitive data can play a critical role in the decision-making process. Despite this increased reliance on sensitive data, the integration of privacy mechanisms with MCDM methods is underdeveloped. This paper introduces an integrated approach that combines MCDM outranking methods with Differential Privacy (DP), safeguarding the privacy of individual contributions in ranking problems. This approach relies on a pre-processing step to aggregate multiple user evaluations into a comprehensive performance matrix. The evaluation results show a strong to very strong statistical correlation between the true rankings and their anonymized counterparts, while maintaining robust privacy guarantees under the chosen privacy parameters.
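The pre-processing step the abstract mentions can be sketched as follows. This is a minimal illustration under assumed names (`private_performance_matrix`, `laplace_sample`) and an assumed mechanism: per-user evaluations are averaged into a performance matrix, and the Laplace mechanism perturbs each aggregated cell, with the sensitivity of a bounded mean over n users being the value range divided by n.

```python
import math
import random

def laplace_sample(scale, rng):
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_performance_matrix(evaluations, epsilon, value_range, rng):
    # evaluations: one (alternatives x criteria) matrix per user, with
    # entries bounded within value_range. Averaging is the aggregation
    # step; Laplace noise with scale sensitivity/epsilon then makes each
    # cell epsilon-differentially private.
    n = len(evaluations)
    sensitivity = value_range / n  # sensitivity of a bounded mean
    rows, cols = len(evaluations[0]), len(evaluations[0][0])
    return [[sum(e[r][c] for e in evaluations) / n
             + laplace_sample(sensitivity / epsilon, rng)
             for c in range(cols)] for r in range(rows)]
```

An outranking method would then be run on the noisy matrix; the correlation reported in the abstract compares the resulting ranking against the one computed on the true matrix.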
Why we think this paper is great for you:
This paper focuses on practical retrieval systems for document question-answering, which is highly relevant to your interest in finding information efficiently. It addresses real-world challenges in deploying secure and accurate QA systems.
Abstract
Organizations handling sensitive documents face a critical dilemma: adopt cloud-based AI systems that offer powerful question-answering capabilities but compromise data privacy, or maintain local processing that ensures security but delivers poor accuracy. We present a question-answering system that resolves this trade-off by combining semantic understanding with keyword precision, operating entirely on local infrastructure without internet access. Our approach demonstrates that organizations can achieve competitive accuracy on complex queries across legal, scientific, and conversational documents while keeping all data on their machines. By balancing two complementary retrieval strategies and using consumer-grade hardware acceleration, the system delivers reliable answers with minimal errors, letting banks, hospitals, and law firms adopt conversational document AI without transmitting proprietary information to external providers. This work establishes that privacy and performance need not be mutually exclusive in enterprise AI deployment.
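Balancing the two complementary retrieval strategies can be sketched as a late score fusion. This is a generic illustration, not the paper's method: keyword scores (e.g. BM25-style) and semantic scores (e.g. embedding cosine similarity) are each min-max normalized, then mixed with a hypothetical weight `alpha`.

```python
def normalize(scores):
    # Min-max normalize so the two retrievers' scores are comparable.
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def hybrid_rank(keyword_scores, semantic_scores, alpha=0.5):
    # Late fusion: weighted sum of normalized keyword and semantic scores.
    # alpha is a hypothetical mixing weight; the paper's exact fusion rule
    # is not specified in the abstract.
    kw, sem = normalize(keyword_scores), normalize(semantic_scores)
    fused = [alpha * k + (1 - alpha) * s for k, s in zip(kw, sem)]
    return sorted(range(len(fused)), key=lambda i: -fused[i])
```

Setting `alpha=1.0` recovers a pure keyword ranking and `alpha=0.0` a pure semantic one, so the weight exposes the precision-vs-understanding balance the abstract describes.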
Why we think this paper is great for you:
This work addresses personalization by focusing on user preferences in creative writing, offering insights into how deep learning models can adapt to individual tastes. It provides a unique dataset for exploring personalized content generation.
Abstract
People have different creative writing preferences, and large language models (LLMs) for these tasks can benefit from adapting to each user's preferences. However, these models are often trained over a dataset that considers varying personal tastes as a monolith. To facilitate developing personalized creative writing LLMs, we introduce LiteraryTaste, a dataset of reading preferences from 60 people, where each person: 1) self-reported their reading habits and tastes (stated preference), and 2) annotated their preferences over 100 pairs of short creative writing texts (revealed preference). With our dataset, we found that: 1) people diverge on creative writing preferences, 2) finetuning a transformer encoder could achieve 75.8% and 67.7% accuracy when modeling personal and collective revealed preferences, and 3) stated preferences had limited utility in modeling revealed preferences. With an LLM-driven interpretability pipeline, we analyzed how people's preferences vary. We hope our work serves as a cornerstone for personalizing creative writing technologies.
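The pairwise accuracy metric reported in the abstract can be made concrete with a small sketch. This is a generic evaluation helper under an assumed name (`pairwise_accuracy`), not the paper's pipeline: given a scoring model and annotated pairs, a prediction is correct when the preferred text scores higher.

```python
def pairwise_accuracy(score, annotated_pairs):
    # annotated_pairs: (preferred_text, other_text) tuples, mirroring
    # revealed-preference annotations over pairs of short texts.
    # A scoring model is correct when it ranks the preferred text higher.
    correct = sum(score(pref) > score(other)
                  for pref, other in annotated_pairs)
    return correct / len(annotated_pairs)
```

For example, `pairwise_accuracy(model_score, pairs)` with a finetuned encoder's scalar output as `model_score` yields the 75.8% / 67.7% style numbers the abstract reports for personal and collective preferences.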
Northeastern University
Why we think this paper is great for you:
This paper provides a fundamental analysis of ranking algorithms in general graphs, which is a core concept within your area of interest. It offers a theoretical foundation for understanding various ranking methodologies.
Abstract
We provide a simple combinatorial analysis of the Ranking algorithm, originally introduced in the seminal work by Karp, Vazirani, and Vazirani [KVV90], demonstrating that it achieves a $(1/2 + c)$-approximate matching for general graphs for $c \geq 0.005$.
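One common formulation of the Ranking algorithm on general graphs can be sketched as follows. This is an illustrative sketch, not the paper's analysis: draw a uniformly random permutation of the vertices, then greedily match each vertex, in rank order, to its lowest-ranked unmatched neighbor; the paper shows this beats the plain 1/2 greedy guarantee by a constant.

```python
import random

def ranking_matching(adj, rng):
    # Ranking (Karp-Vazirani-Vazirani): random vertex permutation, then
    # greedy matching where each vertex takes its lowest-ranked
    # unmatched neighbor.
    order = list(adj)
    rng.shuffle(order)
    rank = {v: i for i, v in enumerate(order)}
    match = {}
    for v in order:
        if v in match:
            continue
        free = [u for u in adj[v] if u not in match]
        if free:
            u = min(free, key=rank.get)  # lowest-ranked free neighbor
            match[v], match[u] = u, v
    return match
```

Because each pass produces a maximal matching, any single run is already a 1/2-approximation; the paper's contribution is a combinatorial argument that the expectation over random permutations is at least (1/2 + c) of the maximum matching.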
ENSTA Paris
Why we think this paper is great for you:
While focusing on deep learning, this paper's emphasis on uncertainty quantification can be valuable for building more reliable and trustworthy systems in your domain. Understanding prediction confidence is crucial for robust applications.
Abstract
Deep Neural Networks (DNNs) have demonstrated remarkable performance across various domains, including computer vision and natural language processing. However, they often struggle to accurately quantify the uncertainty of their predictions, limiting their broader adoption in critical real-world applications. Uncertainty Quantification (UQ) for Deep Learning seeks to address this challenge by providing methods to improve the reliability of uncertainty estimates. Although numerous techniques have been proposed, a unified tool offering a seamless workflow to evaluate and integrate these methods remains lacking. To bridge this gap, we introduce Torch-Uncertainty, a PyTorch and Lightning-based framework designed to streamline DNN training and evaluation with UQ techniques and metrics. In this paper, we outline the foundational principles of our library and present comprehensive experimental results that benchmark a diverse set of UQ methods across classification, segmentation, and regression tasks. Our library is available at https://github.com/ENSTA-U2IS-AI/Torch-Uncertainty
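One widely used UQ technique the library benchmarks, Monte Carlo dropout, can be illustrated with a toy sketch. This is a generic stdlib illustration, not the Torch-Uncertainty API: repeat a stochastic forward pass of a toy linear model with weights randomly dropped, and treat the spread of the sampled outputs as the uncertainty estimate.

```python
import random
import statistics

def mc_dropout_predict(weights, x, p_drop, n_samples, rng):
    # Toy Monte Carlo dropout: each sample drops weights independently
    # with probability p_drop, rescales (inverted dropout), and records
    # the prediction; mean is the estimate, stdev the uncertainty.
    preds = []
    for _ in range(n_samples):
        kept = [w for w in weights if rng.random() >= p_drop]
        scale = 1.0 / (1.0 - p_drop)  # inverted-dropout rescaling
        preds.append(scale * sum(w * x for w in kept))
    return statistics.mean(preds), statistics.stdev(preds)
```

With dropout disabled the prediction is deterministic and the reported uncertainty collapses to zero; with dropout enabled, inputs whose prediction depends heavily on a few weights show a larger spread, which is the reliability signal UQ methods aim to expose.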