🎯 Top Personalized Recommendations
University of Illinois
Why we think this paper is great for you:
This paper directly addresses advancements in reranking algorithms for document retrieval, which is highly relevant to improving search quality. You will find its exploration of LLM-based approaches for reasoning-intensive queries particularly insightful.
Abstract
Reranking algorithms have made progress in improving document retrieval
quality by efficiently aggregating relevance judgments generated by large
language models (LLMs). However, identifying relevant documents for queries
that require in-depth reasoning remains a major challenge. Reasoning-intensive
queries often exhibit multifaceted information needs and nuanced
interpretations, rendering document relevance inherently context dependent. To
address this, we propose contextual relevance, which we define as the
probability that a document is relevant to a given query, marginalized over the
distribution of different reranking contexts it may appear in (i.e., the set of
candidate documents it is ranked alongside and the order in which the documents
are presented to a reranking model). While prior works have studied methods to
mitigate the positional bias LLMs exhibit by accounting for the ordering of
documents, we empirically find that the composition of these batches also
plays an important role in reranking performance. To efficiently estimate
contextual relevance, we propose TS-SetRank, a sampling-based,
uncertainty-aware reranking algorithm. Empirically, TS-SetRank improves nDCG@10
over retrieval and reranking baselines by 15-25% on BRIGHT and 6-21% on BEIR,
highlighting the importance of modeling relevance as context-dependent.
AI Summary
- The proposed "contextual relevance" framework models document relevance as a probability marginalized over diverse reranking contexts, challenging the traditional deterministic and context-independent assumptions. [3]
- TS-SetRank, a two-phase Bayesian reranking algorithm combining uniform and adaptive Thompson sampling, efficiently estimates contextual relevance and significantly improves nDCG@10 (15-25% on BRIGHT, 6-21% on BEIR) over baselines. [3]
- Deterministic reranking algorithms like Heapify underperform due to their reliance on potentially noisy pairwise comparisons, highlighting the need for methods that aggregate judgments across diverse contexts. [3]
- Uniform sampling, while effective in the long run by averaging judgments, exhibits diminishing returns and converges after approximately 300 inference calls, suggesting a practical limit to non-adaptive exploration. [3]
- TS-SetRank (Thompson Sampling for Setwise Reranking): A two-phase Bayesian reranking algorithm that first samples document batches uniformly to collect unbiased relevance feedback and then adaptively constructs batches using Thompson sampling to efficiently estimate contextual relevance (a code sketch follows this list). [3]
- Document relevance for reasoning-intensive queries is context-dependent, influenced by both the composition and ordering of documents within an LLM processing batch. [2]
- Empirical analysis reveals that contextual factors (primarily document order within a batch) account for a substantial portion (25-45%) of the variability in LLM-based relevance judgments, beyond intrinsic model stochasticity. [2]
- TS-SetRank demonstrates superior performance under smaller inference budgets compared to uniform sampling, indicating its effectiveness in adaptively allocating resources to promising candidates earlier. [2]
- Contextual Relevance: The probability that a document is judged relevant to a given query, marginalized over the distribution of different reranking contexts it may appear in (i.e., the set of candidate documents it is ranked alongside and the order in which the documents are presented to a reranking model). [2]
- Setwise Prompting Approach: An LLM-based reranking method where smaller subsets or batches of documents are presented to an LLM, which generates per-document binary relevance judgments, subsequently aggregated to form final rankings. [2]
- Formally, the contextual relevance of document $d_i$ for query $q$ is $\theta_{i,q} = \mathbb{E}_S[\Pr(d_i \text{ is judged relevant} \mid q, S)]$, where the expectation is taken over the distribution of batches $S$. [1]
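For intuition, here is a minimal Python sketch of the two-phase procedure described above. It assumes a black-box judge(query, batch) that returns one binary judgment per document (e.g., parsed from an LLM's setwise prompt); the priors, budgets, and batch size are illustrative choices, not the paper's implementation.

```python
import random

def ts_setrank(query, docs, judge, batch_size=5,
               uniform_calls=20, adaptive_calls=30, seed=0):
    """Illustrative two-phase Thompson sampling for setwise reranking.

    `judge(query, batch)` is an assumed black-box returning one binary
    relevance judgment per document in `batch`. Each document keeps a
    Beta posterior over its contextual relevance: the probability it is
    judged relevant, averaged over the batches (contexts) it appears in.
    """
    rng = random.Random(seed)
    alpha = {d: 1.0 for d in docs}  # Beta prior pseudo-counts (relevant)
    beta = {d: 1.0 for d in docs}   # Beta prior pseudo-counts (not relevant)

    def update(batch):
        for doc, is_rel in zip(batch, judge(query, batch)):
            if is_rel:
                alpha[doc] += 1
            else:
                beta[doc] += 1

    # Phase 1: uniform random batches give unbiased feedback across contexts.
    for _ in range(uniform_calls):
        update(rng.sample(docs, batch_size))

    # Phase 2: Thompson sampling focuses inference calls on promising docs.
    for _ in range(adaptive_calls):
        theta = {d: rng.betavariate(alpha[d], beta[d]) for d in docs}
        batch = sorted(docs, key=theta.get, reverse=True)[:batch_size]
        rng.shuffle(batch)  # randomize order to average out positional bias
        update(batch)

    # Rank by posterior mean estimate of contextual relevance.
    return sorted(docs, key=lambda d: alpha[d] / (alpha[d] + beta[d]),
                  reverse=True)
```

The uniform phase seeds unbiased estimates across many contexts; the adaptive phase then spends the remaining inference budget on the candidates whose posteriors look most promising, matching the budget behavior the summary bullets describe.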
Taras Shevchenko National University of Kyiv
Why we think this paper is great for you:
This work delves into critical evaluation metrics like Mean Average Precision, essential for assessing the quality of ranking algorithms in information retrieval and recommender systems. It offers valuable insights into the foundations of effective system assessment.
Abstract
Recommender systems and information retrieval platforms rely on ranking
algorithms to present the most relevant items to users, thereby improving
engagement and satisfaction. Assessing the quality of these rankings requires
reliable evaluation metrics. Among them, Mean Average Precision at cutoff k
(MAP@k) is widely used, as it accounts for both the relevance of items and
their positions in the list.
In this paper, we derive the expectation and variance of Average Precision at k
(AP@k), which serve as baselines for MAP@k. We cover two widely used evaluation
models: offline and online. The expectation establishes the baseline,
indicating the level of MAP@k that can be achieved by pure chance, while the
variance complements this baseline by quantifying the extent of random
fluctuations, enabling a more reliable interpretation of observed scores.
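A quick way to sanity-check such chance baselines is Monte Carlo simulation: shuffle the ranking, compute AP@k, and repeat. The sketch below covers the offline setting with binary relevance and assumes the min(k, R) normalization (conventions vary); it illustrates the baseline idea rather than reproducing the paper's closed-form derivations.

```python
import numpy as np

def ap_at_k(ranking, relevant, k):
    """AP@k for binary relevance, normalized by min(k, |relevant|)
    (one common offline convention; others divide by |relevant|)."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranking[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(k, len(relevant)) if relevant else 0.0

def chance_baseline(n_items, n_relevant, k, trials=20_000, seed=0):
    """Monte Carlo estimate of E[AP@k] and Var[AP@k] for a random ranking."""
    rng = np.random.default_rng(seed)
    relevant = set(range(n_relevant))
    items = np.arange(n_items)
    samples = np.empty(trials)
    for t in range(trials):
        rng.shuffle(items)
        samples[t] = ap_at_k(items, relevant, k)
    return samples.mean(), samples.var()

mean, var = chance_baseline(n_items=100, n_relevant=10, k=10)
print(f"chance-level AP@10 ~ {mean:.4f}, variance ~ {var:.6f}")
```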
TOBB University of Economics and Technology
Why we think this paper is great for you:
This framework for optimizing Retrieval-Augmented Generation methods, encompassing retrieval and ranking, offers a comprehensive approach to building robust information systems. You'll appreciate its focus on end-to-end architecture search for better performance.
Abstract
Retrieval-Augmented Generation (RAG) quality depends on many interacting
choices across retrieval, ranking, augmentation, prompting, and generation, so
optimizing modules in isolation is brittle. We introduce RAGSmith, a modular
framework that treats RAG design as an end-to-end architecture search over nine
technique families and 46,080 feasible pipeline configurations. A genetic
search optimizes a scalar objective that jointly aggregates retrieval metrics
(recall@k, mAP, nDCG, MRR) and generation metrics (LLM-Judge and semantic
similarity). We evaluate on six Wikipedia-derived domains (Mathematics, Law,
Finance, Medicine, Defense Industry, Computer Science), each with 100 questions
spanning factual, interpretation, and long-answer types. RAGSmith finds
configurations that consistently outperform the naive RAG baseline by +3.8% on
average (range +1.2% to +6.9% across domains), with gains of up to +12.5% in
retrieval and +7.5% in generation. The search typically explores about 0.2% of
the space (around 100 candidates) and discovers a robust backbone --
vector retrieval plus post-generation reflection/revision -- augmented by
domain-dependent choices in expansion, reranking, augmentation, and prompt
reordering; passage compression is never selected. Improvement magnitude
correlates with question type, with larger gains on factual/long-answer mixes
than interpretation-heavy sets. These results provide practical, domain-aware
guidance for assembling effective RAG systems and demonstrate the utility of
evolutionary search for full-pipeline optimization.
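As a hedged illustration of the search loop, here is a minimal genetic algorithm over a toy categorical pipeline space. The families and options are placeholders (not RAGSmith's nine families or its 46,080 configurations), and evaluate stands in for running a candidate pipeline and collapsing its retrieval and generation metrics into one scalar.

```python
import random

# Placeholder technique families; RAGSmith's nine families and their
# options differ, and feasibility constraints are omitted here.
SPACE = {
    "retriever": ["bm25", "dense", "hybrid"],
    "expansion": ["none", "query_rewrite"],
    "reranker": ["none", "cross_encoder"],
    "prompting": ["plain", "reordered"],
    "reflection": ["off", "on"],
}

def random_config(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def mutate(cfg, rng, rate=0.2):
    return {k: (rng.choice(v) if rng.random() < rate else cfg[k])
            for k, v in SPACE.items()}

def crossover(a, b, rng):
    return {k: (a if rng.random() < 0.5 else b)[k] for k in SPACE}

def genetic_search(evaluate, generations=10, pop_size=10, elite=2, seed=0):
    """Evolve configs against a scalar objective; `evaluate(config)` is
    assumed to run the pipeline and aggregate retrieval metrics
    (recall@k, mAP, nDCG, MRR) and generation metrics into one float.
    (A real implementation would cache scores instead of re-evaluating.)"""
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]
        pop = ranked[:elite] + [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
    return max(pop, key=evaluate)

# Demo with a made-up objective that rewards hybrid retrieval + reflection.
demo = lambda c: (c["retriever"] == "hybrid") + (c["reflection"] == "on")
print(genetic_search(demo))
```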
Brown University
Why we think this paper is great for you:
As a flexible toolkit for dense retrieval, this paper provides practical tools and efficient data management features crucial for conducting research experiments in information retrieval. It simplifies the process of exploring advanced retrieval techniques.
Abstract
We introduce Trove, an easy-to-use open-source retrieval toolkit that
simplifies research experiments without sacrificing flexibility or speed. For
the first time, we introduce efficient data management features that load and
process (filter, select, transform, and combine) retrieval datasets on the fly,
with just a few lines of code. This gives users the flexibility to easily
experiment with different dataset configurations without the need to compute
and store multiple copies of large datasets. Trove is highly customizable: in
addition to many built-in options, it allows users to freely modify existing
components or replace them entirely with user-defined objects. It also provides
a low-code and unified pipeline for evaluation and hard negative mining, which
supports multi-node execution without any code changes. Trove's data management
features reduce memory consumption by a factor of 2.6. Moreover, Trove's
easy-to-use inference pipeline incurs no overhead, and inference times decrease
linearly with the number of available nodes. Most importantly, we demonstrate
how Trove simplifies retrieval experiments and allows for arbitrary
customizations, thus facilitating exploratory research.
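The on-the-fly data-management idea generalizes beyond any one toolkit. The sketch below is explicitly not Trove's API; it only illustrates lazy per-record filtering and transformation, which avoids storing modified copies of large datasets.

```python
from typing import Callable, Iterable, Iterator, Optional

# NOT Trove's API: a generic sketch of lazy, on-the-fly dataset
# processing, where filter/transform steps run per record instead of
# materializing a modified copy of a large dataset on disk.
Step = Callable[[dict], Optional[dict]]

def lazy_pipeline(records: Iterable[dict], steps: list[Step]) -> Iterator[dict]:
    for rec in records:
        for step in steps:
            rec = step(rec)
            if rec is None:   # a step returning None drops the record
                break
        else:
            yield rec

# Example: keep long passages and drop an unused field, with no copies stored.
keep_long = lambda r: r if len(r["text"]) > 100 else None
strip_meta = lambda r: {k: v for k, v in r.items() if k != "meta"}

corpus = ({"text": "x" * (60 * i), "meta": i} for i in range(5))
for doc in lazy_pipeline(corpus, [keep_long, strip_meta]):
    print(len(doc["text"]))   # 120, 180, 240
```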
Johns Hopkins University
Why we think this paper is great for you:
This research investigates mobile personalization by simulating user personas, offering a deeper understanding of how personalized experiences are delivered. It directly aligns with your focus on tailoring content and services to individuals.
Abstract
Mobile applications increasingly rely on sensor data to infer user context
and deliver personalized experiences. Yet the mechanisms behind this
personalization remain opaque to users and researchers alike. This paper
presents a sandbox system that uses sensor spoofing and persona simulation to
audit and visualize how mobile apps respond to inferred behaviors. Rather than
treating spoofing as adversarial, we demonstrate its use as a tool for
behavioral transparency and user empowerment. Our system injects multi-sensor
profiles - generated from structured, lifestyle-based personas - into Android
devices in real time, enabling users to observe app responses to contexts such
as high activity, location shifts, or time-of-day changes. With automated
screenshot capture and GPT-4 Vision-based UI summarization, our pipeline helps
document subtle personalization cues. Preliminary findings show measurable app
adaptations across fitness, e-commerce, and everyday service apps such as
weather and navigation. We offer this toolkit as a foundation for
privacy-enhancing technologies and user-facing transparency interventions.
Johns Hopkins University
Why we think this paper is great for you:
This paper explores personalized decision-making models, highlighting the unique processes that shape individual choices. Its insights into utility optimization and textualized reasoning will be valuable for understanding personalization.
Abstract
Decision-making models for individuals, particularly in high-stakes scenarios
like vaccine uptake, often diverge from population optimal predictions. This
gap arises from the uniqueness of the individual decision-making process,
shaped by numerical attributes (e.g., cost, time) and linguistic influences
(e.g., personal preferences and constraints). Building on Utility Theory
and leveraging the textual-reasoning capabilities of Large Language Models
(LLMs), this paper proposes an Adaptive Textual-symbolic Human-centric
Reasoning framework (ATHENA) to address the problem of optimal information
integration.
ATHENA uniquely integrates two stages: first, it discovers robust, group-level
symbolic utility functions via LLM-augmented symbolic discovery; second, it
implements individual-level semantic adaptation, creating personalized semantic
templates guided by the optimal utility to model personalized choices.
Validated on real-world travel mode and vaccine choice tasks, ATHENA
consistently outperforms utility-based, machine learning, and other LLM-based
models, lifting the F1 score by at least 6.5% over the strongest cutting-edge
models. Further, ablation studies confirm that both stages of ATHENA are
critical and complementary, as removing either clearly degrades overall
predictive performance. By organically integrating symbolic utility modeling
and semantic adaptation, ATHENA provides a new scheme for modeling
human-centric decisions. The project page can be found at
https://yibozh.github.io/Athena.
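To ground the utility-theoretic vocabulary, here is a toy multinomial-logit choice model with a per-individual adjustment term standing in for ATHENA's semantic adaptation stage. All attributes, weights, and adjustments are invented for illustration and are not the paper's discovered utility functions.

```python
import math

def group_utility(option, weights):
    """Stage 1 stand-in: a group-level symbolic utility over numeric
    attributes, e.g. U = -(w_cost * cost + w_time * time). The real
    functional form would come from LLM-augmented symbolic discovery."""
    return -(weights["cost"] * option["cost"] + weights["time"] * option["time"])

def choice_probs(options, weights, personal_adjust=None):
    """Multinomial-logit choice over utilities; `personal_adjust` is a
    toy stand-in for Stage 2's individual-level semantic adaptation."""
    utils = {}
    for name, attrs in options.items():
        u = group_utility(attrs, weights)
        if personal_adjust:
            u += personal_adjust.get(name, 0.0)
        utils[name] = u
    z = sum(math.exp(u) for u in utils.values())
    return {name: math.exp(u) / z for name, u in utils.items()}

options = {"bus": {"cost": 2.0, "time": 40.0},
           "car": {"cost": 8.0, "time": 15.0}}
weights = {"cost": 0.3, "time": 0.05}
# An individual whose text says "I avoid driving downtown" might induce
# a negative adjustment on the car option.
print(choice_probs(options, weights, personal_adjust={"car": -0.5}))
```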
City St George's, University of London
Why we think this paper is great for you:
This paper explores the foundational semantics of deep learning, a core technology underpinning many advanced information retrieval and personalization systems. It offers a deeper theoretical perspective on the AI methods you utilize.
Abstract
Artificial Intelligence (AI) is a powerful new language of science as
evidenced by recent Nobel Prizes in chemistry and physics that recognized
contributions to AI applied to those areas. Yet, this new language lacks
semantics, which makes AI's scientific discoveries unsatisfactory at best. If
it is not only to uncover new facts but also to improve our understanding of
the world, AI-based science requires formalization through a framework capable
of translating insight into comprehensible scientific knowledge. In this paper,
we
argue that logic offers an adequate framework. In particular, we use logic in a
neurosymbolic framework to offer a much-needed semantics for deep learning, the
neural network-based technology of current AI. Deep learning and neurosymbolic
AI lack a general set of conditions to ensure that desirable properties are
satisfied. Instead, there is a plethora of encoding and knowledge extraction
approaches designed for particular cases. To rectify this, we introduced a
framework for semantic encoding, making explicit the mapping between neural
networks and logic, and characterizing the common ingredients of the various
existing approaches. In this paper, we describe succinctly and exemplify how
logical semantics and neural networks are linked through this framework, we
review some of the most prominent approaches and techniques developed for
neural encoding and knowledge extraction, provide a formal definition of our
framework, and discuss some of the difficulties of identifying a semantic
encoding in practice in light of analogous problems in the philosophy of mind.
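As a textbook-style illustration of what a mapping between logic and neural networks can look like (this is the classic encoding such frameworks generalize, not the paper's formal definition), a propositional rule can be compiled into a threshold unit whose behavior matches the rule's truth table:

```python
def threshold_unit(inputs, weights, bias):
    """A binary neuron: fires (1) iff the weighted input sum plus bias
    is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# Encode the rule  C <- A AND B  into weights and a bias: the unit can
# cross the threshold only when both antecedents are true.
def rule_c(a, b):
    return threshold_unit([a, b], weights=[1.0, 1.0], bias=-1.5)

# The encoding is faithful ("semantic") if the network's input-output
# behavior reproduces the rule's truth table.
for a in (0, 1):
    for b in (0, 1):
        assert rule_c(a, b) == int(a and b)
print("unit agrees with the truth table of A AND B")
```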