Seoul National University
Abstract
We study leveraging adaptive retrieval to ensure that sufficient "bridge" documents are retrieved for reasoning-intensive retrieval. Bridge documents are those that contribute to the reasoning process yet are not directly relevant to the initial query. While existing reasoning-based reranker pipelines attempt to surface these documents during ranking, they suffer from bounded recall. Naively integrating adaptive retrieval into these pipelines, however, often propagates planning errors. To address this, we propose REPAIR, a framework that bridges this gap by repurposing reasoning plans as dense feedback signals for adaptive retrieval. Our key distinction is enabling mid-course correction during reranking through selective adaptive retrieval, retrieving documents that support the pivotal plan. Experimental results on reasoning-intensive retrieval and complex QA tasks demonstrate that our method outperforms existing baselines by 5.6 percentage points.
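The mid-course-correction loop is easy to picture in pseudocode. A minimal sketch, assuming hypothetical `corpus_retrieve`, `rerank`, and `supports` callables; REPAIR's actual plan-derived feedback signals are not reproduced here:

```python
# Schematic of plan-guided selective adaptive retrieval (hypothetical
# interface; the paper's actual feedback signals may differ).

def plan_guided_rerank(query, plan_steps, corpus_retrieve, rerank, supports, k=10):
    """Rerank, then retrieve extra documents for plan steps that the
    current top-k fails to support (the missing "bridge" documents)."""
    docs = corpus_retrieve(query, k)              # initial retrieval
    ranked = rerank(query, docs)[:k]
    for step in plan_steps:                       # mid-course correction
        if not any(supports(doc, step) for doc in ranked):
            docs += corpus_retrieve(step, k)      # targeted retrieval for
            ranked = rerank(query, docs)[:k]      # the unsupported step
    return ranked
```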
Why are we recommending this paper?
Due to your Interest in Information Retrieval
This paper directly addresses the challenge of retrieval in reasoning tasks, a core interest for this user. The focus on 'bridge' documents for reasoning aligns with the need to identify relevant supporting information effectively.
University of Edinburgh
Abstract
Using large language models (LLMs) to predict relevance judgments has shown promising results. Most studies treat this task as a distinct research line, e.g., focusing on prompt design for predicting relevance labels given a query and passage. However, predicting relevance judgments is essentially a form of relevance prediction, a problem extensively studied in tasks such as re-ranking. Despite this potential overlap, little research has explored reusing or adapting established re-ranking methods to predict relevance judgments, leading to potential resource waste and redundant development. To bridge this gap, we reproduce re-rankers in a re-ranker-as-relevance-judge setup. We design two adaptation strategies: (i) using binary tokens (e.g., "true" and "false") generated by a re-ranker as direct judgments, and (ii) converting continuous re-ranking scores into binary labels via thresholding. We perform extensive experiments on TREC-DL 2019 to 2023 with 8 re-rankers from 3 families, ranging from 220M to 32B parameters, and analyse the evaluation bias exhibited by re-ranker-based judges. Results show that re-ranker-based relevance judges, under both strategies, can outperform UMBRELA, a state-of-the-art LLM-based relevance judge, in around 40% to 50% of the cases; they also exhibit strong self-preference towards their own and same-family re-rankers, as well as cross-family bias.
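Both adaptation strategies are simple enough to sketch. A minimal illustration, assuming a hypothetical `reranker` object that exposes a generated judgment token and a continuous score (not the paper's code):

```python
# Two ways to repurpose a re-ranker as a relevance judge
# (hypothetical `reranker` API; names are illustrative).

def judge_by_token(reranker, query, passage):
    """Strategy (i): take the generated binary token as the judgment."""
    token = reranker.generate_token(query, passage)   # "true" or "false"
    return 1 if token == "true" else 0

def judge_by_threshold(reranker, query, passage, threshold=0.5):
    """Strategy (ii): threshold the continuous re-ranking score."""
    return 1 if reranker.score(query, passage) >= threshold else 0
```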
Why are we recommending this paper?
Due to your Interest in Ranking
Using LLMs to judge relevance is a key area of exploration in personalization and information retrieval. This research offers a valuable approach to improving ranking systems, directly addressing the user's interests.
Northeastern University
Abstract
We study the problem of personalization in large language models (LLMs). Prior work predominantly represents user preferences as implicit, model-specific vectors or parameters, yielding opaque "black-box" profiles that are difficult to interpret and transfer across models and tasks. In contrast, we advocate natural language as a universal, model- and task-agnostic interface for preference representation. This formulation leads to interpretable and reusable preference descriptions, while naturally supporting continual evolution as new interactions are observed. To learn such representations, we introduce a two-stage training framework that combines supervised fine-tuning on high-quality synthesized data with reinforcement learning to optimize long-term utility and cross-task transferability. Based on this framework, we develop AlignXplore+, a universal preference reasoning model that generates textual preference summaries. Experiments on nine benchmarks show that our 8B model achieves state-of-the-art performance -- outperforming substantially larger open-source models -- while exhibiting strong transferability across tasks, model families, and interaction formats.
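Because the profile is plain text, the interface is trivially model- and task-agnostic. A purely illustrative sketch of that interface (function names and prompts are ours, not the paper's; AlignXplore+ itself is a trained preference-reasoning model):

```python
# Natural-language preferences as a universal interface (illustrative).

def summarize_preferences(llm, interactions):
    """Distill observed interactions into a textual preference profile."""
    history = "\n".join(interactions)
    return llm(f"Summarize this user's preferences in plain English:\n{history}")

def personalized_answer(llm, profile, task_input):
    """Any downstream model or task can consume the same textual profile."""
    return llm(f"User preferences: {profile}\n\nTask: {task_input}")
```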
Why are we recommending this paper?
Due to your Interest in Personalization
The paper's exploration of LLM personalization and the concept of transferable profiles is highly relevant. This work tackles the core problem of adapting models to user preferences, aligning with the user's interest in personalization.
Renmin University of China
AI Insights
- SmartSearch is a framework designed to optimize the quality of intermediate search queries through two key mechanisms: Process rewards and Query refinement. [3]
- The framework uses a three-stage curriculum learning framework that guides the agent through a progression from imitation and alignment to generalization. [3]
- Experiments across four challenging benchmarks demonstrate that SmartSearch consistently surpasses existing baselines, with further quantitative analyses confirming significant gains in both search efficiency and query quality. [3]
- Process rewards: Fine-grained supervision for the quality of each query through Dual-Level Credit Assessment. [3]
- Query refinement: Promoting the optimization of query generation by selectively refining low-quality queries and regenerating subsequent search rounds from these refined points. [3]
- The paper assumes that the teacher model is accurate, which may not always be the case in real-world scenarios. [3]
- The approach relies on a three-stage curriculum learning framework, which may not be suitable for all applications or domains. [3]
- SmartSearch is a robust framework that can effectively optimize intermediate search queries, achieving significant gains in both search efficiency and query quality. [2]
Abstract
Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems by incorporating information retrieval capabilities. Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked. As a result, the generated queries often remain inaccurate, leading to unexpected retrieval results and ultimately limiting search agents' overall effectiveness. To mitigate this issue, we introduce SmartSearch, a framework built upon two key mechanisms: (1) Process rewards, which provide fine-grained supervision for the quality of each intermediate search query through Dual-Level Credit Assessment. (2) Query refinement, which promotes the optimization of query generation by selectively refining low-quality search queries and regenerating subsequent search rounds based on these refinements. To enable the search agent to progressively internalize the ability to improve query quality under the guidance of process rewards, we design a three-stage curriculum learning framework. This framework guides the agent through a progression from imitation, to alignment, and ultimately to generalization. Experimental results show that SmartSearch consistently surpasses existing baselines, and additional quantitative analyses further confirm its significant gains in both search efficiency and query quality. The code is available at https://github.com/MYVAE/SmartSearch.
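The query-refinement mechanism lends itself to a short schematic. A hedged sketch, assuming per-query process rewards are available as scalar scores and a hypothetical `agent` interface (not the released SmartSearch code):

```python
# Selective query refinement with regeneration (illustrative sketch).

def run_search_agent(agent, question, process_reward,
                     min_quality=0.5, max_rounds=5):
    trajectory = []                                  # list of (query, results)
    while len(trajectory) < max_rounds and not agent.done(trajectory):
        query = agent.next_query(question, trajectory)
        if process_reward(query, trajectory) < min_quality:
            # Selectively refine the low-quality query; subsequent rounds
            # are then regenerated conditioned on this refined point.
            query = agent.refine(query, trajectory)
        trajectory.append((query, agent.search(query)))
    return agent.answer(question, trajectory)
```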
Why are we recommending this paper?
Due to your Interest in Search
This research focuses on LLM-based search agents, a rapidly developing area within information retrieval. The query refinement aspect directly relates to improving search quality and relevance, a key interest for the user.
Meta
Abstract
Recent years have witnessed the success of sequential modeling, generative recommenders, and large language models for recommendation. Though the scaling law has been validated for sequential models, computation becomes inefficient in real-world applications like recommendation, because transformer cost grows non-linearly (quadratically) with sequence length. To improve the efficiency of sequential models, we introduce a novel approach to sequential recommendation that leverages personalization techniques to enhance efficiency and performance. Our method compresses long user interaction histories into learnable tokens, which are then combined with recent interactions to generate recommendations. This approach significantly reduces computational costs while maintaining high recommendation accuracy. Our method can be applied to existing transformer-based recommendation models, e.g., HSTU and HLLM. Extensive experiments on multiple sequential models demonstrate its versatility and effectiveness. Source code is available at https://github.com/facebookresearch/PerSRec.
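The compression step maps naturally onto cross-attention from a small set of learnable tokens to the long history. A minimal PyTorch sketch, with sizes and layer choices that are illustrative rather than PerSRec's actual architecture:

```python
import torch
import torch.nn as nn

class HistoryCompressor(nn.Module):
    """Compress a long interaction history into a few learnable tokens
    that are then concatenated with recent interactions (illustrative)."""

    def __init__(self, d_model=64, n_tokens=8, n_heads=4):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(n_tokens, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, long_history):                 # (B, L_long, d_model)
        B = long_history.size(0)
        queries = self.tokens.unsqueeze(0).expand(B, -1, -1)
        compressed, _ = self.attn(queries, long_history, long_history)
        return compressed                            # (B, n_tokens, d_model)
```

Feeding [compressed ; recent] into the downstream transformer means attention cost grows with n_tokens + L_recent rather than with the full history length.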
Why are we recommending this paper?
Due to your Interest in Personalization
The paper's investigation into sequential modeling and LLMs for recommendation is a strong fit. This work addresses the challenge of modeling long-term user interests, a critical component of personalization.
University College London
Abstract
Recent advances in Knowledge Editing (KE), particularly Rank-One Model Editing (ROME), show superior efficiency over fine-tuning and in-context learning for updating single-hop facts in transformers. However, these methods face significant challenges when applied to multi-hop reasoning tasks requiring knowledge chaining. In this work, we study the effect of editing knowledge with ROME at different layer depths and identify three key failure modes. First, the "hopping-too-late" problem occurs as later layers lack access to necessary intermediate representations. Second, generalization ability deteriorates sharply when editing later layers. Third, the model overfits to edited knowledge, incorrectly prioritizing edited-hop answers regardless of context. To mitigate the issues of "hopping-too-late" and generalization decay, we propose Redundant Editing, a simple yet effective strategy that enhances multi-hop reasoning. Our experiments demonstrate that this approach can improve accuracy on 2-hop questions by at least 15.5 percentage points, representing a 96% increase over the previous single-edit strategy, while trading off some specificity and language naturalness.
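The abstract does not spell out Redundant Editing, but the name and the layer-depth analysis suggest applying the same rank-one update at several layer depths so that intermediate hops can still read the edited fact. A purely schematic sketch under that assumption (the `rome_edit` helper is hypothetical):

```python
# Our reading of "Redundant Editing": repeat the same ROME-style
# rank-one edit at multiple (earlier) layers to avoid "hopping-too-late".
# `rome_edit(model, layer, fact)` is a hypothetical helper, not the
# paper's code.

def redundant_edit(model, fact, layers=(4, 6, 8)):
    for layer in layers:
        rome_edit(model, layer, fact)
    return model
```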
Why are we recommending this paper?
Due to your Interest in Ranking
Technical University of Munich
Abstract
Search engines are commonly used for online political information seeking. Yet, it remains unclear how search query suggestions for political searches, which reflect the latent interests of internet users, vary across countries and over time. We provide a systematic analysis of Google search engine query suggestions for European and national politicians. Using an original dataset of search query suggestions for European politicians collected in ten countries, we find that query suggestions are less stable over time in politicians' countries of origin, when the politicians hold a supranational role, and for female politicians. Moreover, query suggestions for political leaders and male politicians are more similar across countries. We conclude by discussing possible future directions for studying online information seeking about European politicians.
Why are we recommending this paper?
Due to your Interest in Search
University of Essex
Abstract
I propose a novel framework that integrates stochastic differential equations (SDEs) with deep generative models to improve uncertainty quantification in machine learning applications involving structured and temporal data. This approach, termed Stochastic Latent Differential Inference (SLDI), embeds an Itô SDE in the latent space of a variational autoencoder, allowing for flexible, continuous-time modeling of uncertainty while preserving a principled mathematical foundation. The drift and diffusion terms of the SDE are parameterized by neural networks, enabling data-driven inference and generalizing classical time series models to handle irregular sampling and complex dynamic structure.
A central theoretical contribution is the co-parameterization of the adjoint state with a dedicated neural network, forming a coupled forward-backward system that captures not only latent evolution but also gradient dynamics. I introduce a pathwise-regularized adjoint loss and analyze variance-reduced gradient flows through the lens of stochastic calculus, offering new tools for improving training stability in deep latent SDEs. My paper unifies and extends variational inference, continuous-time generative modeling, and control-theoretic optimization, providing a rigorous foundation for future developments in stochastic probabilistic machine learning.
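In symbols, the latent dynamics take the standard neural-SDE form (notation ours, chosen to match the description above):

```latex
% Latent Itô SDE in the VAE latent space: z_t is the latent state,
% W_t a Wiener process, and f_\theta, g_\phi the neural drift and
% diffusion networks.
\[
  \mathrm{d}z_t = f_\theta(z_t, t)\,\mathrm{d}t + g_\phi(z_t, t)\,\mathrm{d}W_t
\]
```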
Why are we recommending this paper?
Due to your Interest in Deep Learning
Zurich University of Applied Sciences ZHAW
Abstract
Modern radio telescope surveys, capable of detecting billions of galaxies in wide-field surveys, have made manual morphological classification impracticable. This applies in particular when the Square Kilometre Array Observatory (SKAO) becomes operational in 2027, which is expected to close an important gap in our understanding of the Epoch of Reionization (EoR) and other areas of astrophysics. To this end, foreground objects, contaminants of the 21-cm signal, need to be identified and subtracted. Source finding and identification is thus an important albeit challenging task. We investigate the ability of AI and deep learning (DL) methods that have been previously trained on other data domains to localize and classify radio galaxies with minimal changes to their architectures. Various well-known pretrained neural network architectures for image classification and object detection are trained and fine-tuned, and their performance is evaluated on a public radio galaxy dataset derived from the Radio Galaxy Zoo. A comparison between convolutional neural network (CNN)- and transformer-based algorithms is performed. The best-performing architecture is systematically optimized and an uncertainty estimation is performed by means of an ensemble analysis. Radio source classification performance nearly comparable to the current leading customized models can be obtained with existing standard pretrained DL architectures, without modifying or increasing the complexity of the model architectures, but rather by adapting the data: combining various transformations on replicated image channels. Using an ensemble of models further improves performance to over 90% accuracy, on par with top-performing models in the literature. The results can be transferred to other survey data, e.g. from the Murchison Wide-field Array (MWA), and in the future be used to study the EoR with the SKAO.
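The data-adaptation trick is straightforward to illustrate: replicate the single-channel radio map into the three input channels an RGB-pretrained backbone expects, applying a different transformation to each copy. A minimal sketch; the particular transforms are illustrative, not those reported in the paper:

```python
import numpy as np

def to_three_channels(radio_map):
    """Adapt a single-channel radio image for an RGB-pretrained backbone
    by stacking differently transformed copies (illustrative transforms)."""
    lo, hi = radio_map.min(), radio_map.max()
    x = (radio_map - lo) / (hi - lo + 1e-8)    # normalize to [0, 1]
    return np.stack([x,                        # raw intensities
                     np.log1p(x),              # compress bright cores
                     np.sqrt(x)],              # emphasize faint structure
                    axis=0)                    # (3, H, W)
```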
Why are we recommending this paper?
Due to your Interest in Deep Learning
OriginAI
Abstract
Dense retrieval models are typically fine-tuned with contrastive learning objectives that require binary relevance judgments, even though relevance is inherently graded. We analyze how graded relevance scores and the threshold used to convert them into binary labels affect multilingual dense retrieval. Using a multilingual dataset with LLM-annotated relevance scores, we examine monolingual, multilingual mixture, and cross-lingual retrieval scenarios. Our findings show that the optimal threshold varies systematically across languages and tasks, often reflecting differences in resource level. A well-chosen threshold can improve effectiveness, reduce the amount of fine-tuning data required, and mitigate annotation noise, whereas a poorly chosen one can degrade performance. We argue that graded relevance is a valuable but underutilized signal for dense retrieval, and that threshold calibration should be treated as a principled component of the fine-tuning pipeline.
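Threshold calibration, as framed here, amounts to a small per-language sweep over candidate cutoffs. A hedged sketch; the evaluation interface and the ordinal score grid are assumptions, not the paper's setup:

```python
# Per-language threshold calibration for graded relevance (schematic).

def calibrate_threshold(graded_pairs, train_and_eval, grid=(1, 2, 3)):
    """graded_pairs: [(query, passage, graded_score), ...] on an ordinal
    scale; pick the cutoff whose binary labels maximize dev effectiveness."""
    best_t, best_score = None, float("-inf")
    for t in grid:
        labels = [(q, p, int(s >= t)) for q, p, s in graded_pairs]
        score = train_and_eval(labels)    # fine-tune, then evaluate on dev
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```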
Why are we recommending this paper?
Due to your Interest in Information Retrieval