Hi!

Your personalized paper recommendations for 24 to 28 November, 2025.

🎯 Top Personalized Recommendations

What Drives Cross-lingual Ranking? Retrieval Approaches with Multilingual Language Models

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
Cross-lingual information retrieval (CLIR) enables access to multilingual knowledge but remains challenging due to disparities in resources, scripts, and weak cross-lingual semantic alignment in embedding models. Existing pipelines often rely on translation and monolingual retrieval heuristics, which add computational overhead and noise, degrading performance. This work systematically evaluates four intervention types, namely document translation, multilingual dense retrieval with pretrained encoders, contrastive learning at word, phrase, and query-document levels, and cross-encoder re-ranking, across three benchmark datasets. We find that dense retrieval models trained specifically for CLIR consistently outperform lexical matching methods and derive little benefit from document translation. Contrastive learning mitigates language biases and yields substantial improvements for encoders with weak initial alignment, and re-ranking can be effective, but depends on the quality of the cross-encoder training data. Although high-resource languages still dominate overall performance, gains over lexical and document-translated baselines are most pronounced for low-resource and cross-script pairs. These findings indicate that cross-lingual search systems should prioritise semantic multilingual embeddings and targeted learning-based alignment over translation-based pipelines, particularly for cross-script and under-resourced languages.

Why we think this paper is great for you:
This paper directly addresses the complexities of cross-lingual information retrieval and ranking, which is highly relevant to your focus on effective search strategies. It explores how multilingual language models can significantly enhance retrieval performance.

STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models

Rate paper: 👍 👎 ♥ Save

Abstract
Ranking models have become an important part of modern personalized recommendation systems. However, significant challenges persist in handling high-cardinality, heterogeneous, and sparse feature spaces, particularly regarding model scalability and efficiency. We identify two key bottlenecks: (i) Representation Bottleneck: Driven by the high cardinality and dynamic nature of features, model capacity is forced into sparse-activated embedding layers, leading to low-rank representations. This, in turn, triggers phenomena like "One-Epoch" and "Interaction-Collapse," ultimately hindering model scalability.(ii) Computational Bottleneck: Integrating all heterogeneous features into a unified model triggers an explosion in the number of feature tokens, rendering traditional attention mechanisms computationally demanding and susceptible to attention dispersion. To dismantle these barriers, we introduce STORE, a unified and scalable token-based ranking framework built upon three core innovations: (1) Semantic Tokenization fundamentally tackles feature heterogeneity and sparsity by decomposing high-cardinality sparse features into a compact set of stable semantic tokens; and (2) Orthogonal Rotation Transformation is employed to rotate the subspace spanned by low-cardinality static features, which facilitates more efficient and effective feature interactions; and (3) Efficient attention that filters low-contributing tokens to improve computional efficiency while preserving model accuracy. Across extensive offline experiments and online A/B tests, our framework consistently improves prediction accuracy(online CTR by 2.71%, AUC by 1.195%) and training effeciency (1.84 throughput).

Why we think this paper is great for you:
This paper tackles crucial challenges in scaling up ranking models for personalized recommendation systems, offering insights into efficiency and handling complex feature spaces. It directly aligns with your interest in building robust and scalable ranking solutions.

Generative Early Stage Ranking

Rate paper: 👍 👎 ♥ Save

AI Summary

Across offline and online experiments, GESR consistently delivered improvements in topline, engagement, and consumption metrics while preserving training and inference efficiencies. [3]
The GESR paradigm has shown promising results in improving ESR performance, demonstrating its potential to reshape the design practices within large-scale recommendation systems. [3]
The Generative Early Stage Ranking (GESR) paradigm addresses the gap between effectiveness and efficiency in industry Early Stage Ranking models. [2]
Early Stage Ranking (ESR): A type of recommendation system that predicts user behavior at the early stages of interaction. [1]

Abstract
Large-scale recommendations commonly adopt a multi-stage cascading ranking system paradigm to balance effectiveness and efficiency. Early Stage Ranking (ESR) systems utilize the "user-item decoupling" approach, where independently learned user and item representations are only combined at the final layer. While efficient, this design is limited in effectiveness, as it struggles to capture fine-grained user-item affinities and cross-signals. To address these, we propose the Generative Early Stage Ranking (GESR) paradigm, introducing the Mixture of Attention (MoA) module which leverages diverse attention mechanisms to bridge the effectiveness gap: the Hard Matching Attention (HMA) module encodes explicit cross-signals by computing raw match counts between user and item features; the Target-Aware Self Attention module generates target-aware user representations conditioned on the item, enabling more personalized learning; and the Cross Attention modules facilitate early and more enriched interactions between user-item features. MoA's specialized attention encodings are further refined in the final layer through a Multi-Logit Parameterized Gating (MLPG) module, which integrates the newly learned embeddings via gating and produces secondary logits that are fused with the primary logit. To address the efficiency and latency challenges, we have introduced a comprehensive suite of optimization techniques. These span from custom kernels that maximize the capabilities of the latest hardware to efficient serving solutions powered by caching mechanisms. The proposed GESR paradigm has shown substantial improvements in topline metrics, engagement, and consumption tasks, as validated by both offline and online experiments. To the best of our knowledge, this marks the first successful deployment of full target-aware attention sequence modeling within an ESR stage at such a scale.

Why we think this paper is great for you:
You will find this paper particularly interesting as it delves into generative early stage ranking systems for large-scale recommendations. It provides valuable insights into optimizing the initial stages of ranking paradigms.

Look It Up: Analysing Internal Web Search Capabilities of Modern LLMs

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Summary

The study focuses on analyzing internal web search capabilities of modern LLMs. [2]
The paper evaluates two closed-source frontier models, GPT-5-mini and Claude Haiku 4.5, with native web search capabilities enabled. [1]

Abstract
Modern large language models integrate web search to provide real-time answers, yet it remains unclear whether they are efficiently calibrated to use search when it is actually needed. We introduce a benchmark evaluating both the necessity and effectiveness of web access across commercial models with no access to internal states or parameters. The dataset includes a static split of 783 temporally anchored questions answerable from pre-cutoff knowledge, aimed at testing whether models invoke search based on low internal confidence, and a dynamic split of 288 post-cutoff queries designed to test whether models recognise when search is required and retrieve updated information. Web access substantially improves static accuracy for GPT-5-mini and Claude Haiku 4.5, though confidence calibration worsens. On dynamic queries, both models frequently invoke search yet remain below 70 percent accuracy due to weak query formulation. Costs per accuracy-improving call remain low, but returns diminish once initial retrieval fails. Selective invocation helps, but models become overconfident and inconsistent after search. Overall, built-in web search meaningfully improves factual accuracy and can be invoked selectively, yet models remain overconfident, skip retrieval when it is essential, and falter once initial search queries underperform. Taken together, internal web search works better as a good low-latency verification layer than a reliable analytical tool, with clear room for improvement.

Why we think this paper is great for you:
This paper offers a direct investigation into how modern language models utilize web search, providing insights into their information retrieval capabilities. It's highly relevant to understanding the practical applications of search within deep learning models.

PRInTS: Reward Modeling for Long-Horizon Information Seeking

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Summary

PRINTS is a generative Process Reward Model (PRM) for long-horizon information seeking that unifies information gain scoring with recursive trajectory summarization. [3]
PRINTS enhances the information-seeking abilities of agents, showcasing its high versatility and ability to push the performance of frontier information-seeking agents beyond their original performance. [3]
Generative PRM: A type of PRM that generates new trajectories based on the current state and action. [3]
Information gain scoring: A method for evaluating the quality of a trajectory by measuring the amount of information gained from each step. [3]
PRINTS is a powerful tool for enhancing the information-seeking abilities of agents, and its ability to push the performance of frontier information-seeking agents beyond their original performance makes it a valuable addition to the field. [3]
The model's versatility and ability to work with different types of agents make it a useful tool for researchers and practitioners alike. [3]
The model requires a large amount of data to train effectively, which can be a limitation for smaller datasets. [3]
The model is trained through an alternating schedule of supervised fine-tuning for summarization and reinforcement learning for scoring. [2]
Process Reward Model (PRM): A type of reward model that is used in reinforcement learning to guide an agent's behavior. [1]

Abstract
Information-seeking is a core capability for AI agents, requiring them to gather and reason over tool-generated information across long trajectories. However, such multi-step information-seeking tasks remain challenging for agents backed by language models. While process reward models (PRMs) can guide agents by ranking candidate steps at test-time, existing PRMs, designed for short reasoning with binary judgment, cannot capture richer dimensions of information-seeking steps, such as tool interactions and reasoning over tool outputs, nor handle the rapidly growing context in long-horizon tasks. To address these limitations, we introduce PRInTS, a generative PRM trained with dual capabilities: (1) dense scoring based on the PRM's reasoning across multiple step quality dimensions (e.g., interpretation of tool outputs, tool call informativeness) and (2) trajectory summarization that compresses the growing context while preserving essential information for step evaluation. Extensive evaluations across FRAMES, GAIA (levels 1-3), and WebWalkerQA (easy-hard) benchmarks on multiple models, along with ablations, reveal that best-of-n sampling with PRInTS enhances information-seeking abilities of open-source models as well as specialized agents, matching or surpassing the performance of frontier models with a much smaller backbone agent and outperforming other strong reward modeling baselines.

Why we think this paper is great for you:
This paper explores advanced techniques for long-horizon information seeking, which is highly pertinent to your interest in developing sophisticated information retrieval agents. It delves into how AI agents can better navigate complex information gathering tasks.

Online-PVLM: Advancing Personalized VLMs with Online Concept Learning

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Summary

The dataset is called PVIT (Personalized Visual Instruction Tuning) and consists of 100,000 image-concept pairs with corresponding questions and answers. [3]
It also mentions recent advances in visual language understanding tasks such as image captioning, visual question answering, and visual reasoning. [3]
The method uses a large dataset of images with multiple concepts and corresponding questions to test the computer's ability to identify and describe these concepts accurately. [3]
The paper proposes a new dataset and evaluation framework for visual language models (VLMs) that can understand and describe concepts in images. [2]

Abstract
Personalized Visual Language Models (VLMs) are gaining increasing attention for their formidable ability in user-specific concepts aligned interactions (e.g., identifying a user's bike). Existing methods typically require the learning of separate embeddings for each new concept, which fails to support real-time adaptation during testing. This limitation becomes particularly pronounced in large-scale scenarios, where efficient retrieval of concept embeddings is not achievable. To alleviate this gap, we propose Online-PVLM, a framework for online concept learning by leveraging hyperbolic representations. Our approach makes a train-free paradigm for concept embeddings generation at test time, making the use of personalized VLMs both scalable and efficient. In addition, we develop OP-Eval, a comprehensive and large-scale benchmark comprising 1,292 concepts and over 30K high-quality instances with diverse question types, designed to rigorously assess online concept learning in realistic scenarios. Extensive experiments demonstrate the state-of-the-art performance of our proposed framework. Our source code and dataset will be made available.

Why we think this paper is great for you:
This paper focuses on advancing personalized visual language models through online concept learning, which directly aligns with your interest in tailoring systems to individual user preferences. It explores innovative ways to achieve user-specific interactions.

The 2nd Workshop on Human-Centered Recommender Systems

Rate paper: 👍 👎 ♥ Save

Abstract
Recommender systems shape how people discover information, form opinions, and connect with society. Yet, as their influence grows, traditional metrics, e.g., accuracy, clicks, and engagement, no longer capture what truly matters to humans. The workshop on Human-Centered Recommender Systems (HCRS) calls for a paradigm shift from optimizing engagement toward designing systems that truly understand, involve, and benefit people. It brings together researchers in recommender systems, human-computer interaction, AI safety, and social computing to explore how human values, e.g., trust, safety, fairness, transparency, and well-being, can be integrated into recommendation processes. Centered around three thematic axes-Human Understanding, Human Involvement, and Human Impact-HCRS features keynotes, panels, and papers covering topics from LLM-based interactive recommenders to societal welfare optimization. By fostering interdisciplinary collaboration, HCRS aims to shape the next decade of responsible and human-aligned recommendation research.

Why we think this paper is great for you:
This workshop summary on human-centered recommender systems is a great match, as it emphasizes the importance of designing ranking and personalization systems that truly matter to users. It encourages a broader perspective beyond traditional metrics.

Deep Learning

How to Use Deep Learning to Identify Sufficient Conditions: A Case Study on Stanley's $e$-Positivity

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
In a study, published in \emph{Nature}, researchers from DeepMind and mathematicians demonstrated a general framework using machine learning to make conjectures in pure mathematics. Their work uses neural networks and attribution techniques to guide human intuition towards making provable conjectures. Here, we build upon this framework to develop a method for identifying sufficient conditions that imply a given mathematical statement. Our approach trains neural networks with a custom loss function that prioritizes high precision. Then uses attribution techniques and exploratory data analysis to make conjectures. As a demonstration, we apply this process to Stanley's problem of $e$-positivity of graphs--a problem that has been at the center of algebraic combinatorics for the past three decades. Guided by AI, we rediscover that one sufficient condition for a graph to be $e$-positive is that it is co-triangle-free, and that the number of claws is the most important factor for $e$-positivity. Based on the most important factors in Saliency Map analysis of neural networks, we suggest that the classification of $e$-positive graphs is more related to continuous graph invariants rather than the discrete ones. Furthermore, using neural networks and exploratory data analysis, we show that the claw-free and claw-contractible-free graphs with $10$ and $11$ vertices are $e$-positive, resolving a conjecture by Dahlberg, Foley, and van Willigenburg.

AI Summary

The authors used a precision-optimized model to identify the top four features that impact e-positivity in graphs. [3]
The model achieved 100% precision on the test set, indicating high reliability for its positive predictions. [3]
The study demonstrates how AI can be used to guide human intuition and advance mathematics by identifying underlying patterns associated with e-positivity. [3]
The authors' approach can be applied to other areas of mathematics where pattern recognition is crucial. [3]
E-positivity: a property of graphs that refers to the existence of certain combinatorial structures, such as cycles or paths. [3]
Chromatic symmetric function: a polynomial invariant of graphs that encodes information about their coloring properties. [3]
The study demonstrates the potential of AI in advancing mathematics by identifying underlying patterns associated with e-positivity. [3]
The precision-optimized model achieved high reliability for its positive predictions, indicating that it can be trusted with high confidence when classifying graphs as e-positive. [3]
The approach used in this study can be applied to other areas of mathematics where pattern recognition is crucial. [3]
Saliency Map analysis: a technique used to identify the most important features or variables in a dataset by computing the average gradient of the model's output with respect to its input features. [2]

Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets

Rate paper: 👍 👎 ♥ Save

Abstract
This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods. We start with the discovery that the set of real-valued function $f$ that can be $ε$-approximated with a binary circuit of size at most $cε^{-γ}$ becomes convex in the `Harder than Monte Carlo' (HTMC) regime, when $γ>2$, allowing for the definition of a HTMC norm on functions. In parallel one can define a complexity measure on the parameters of a ResNets (a weighted $\ell_1$ norm of the parameters), which induce a `ResNet norm' on functions. The HTMC and ResNet norms can then be related by an almost matching sandwich bound. Thus minimizing this ResNet norm is equivalent to finding a circuit that fits the data with an almost minimal number of nodes (within a power of 2 of being optimal). ResNets thus appear as an alternative model for computation of real functions, better adapted to the HTMC regime and its convexity.

AI Summary

The HTMC norm is a measure of the complexity of a function, and it has several useful properties, including compositionality and convexity. [3]
The construction of the ResNet involves two main parts: first, the input is mapped to the weighted binary representations of its surrounding vertices; second, a sorting algorithm is used to recover the simplex that contains the input. [3]
The Lipschitz constant of this network is bounded by cp doutb|C|p=2/3. [3]
The ResNet representation of Tetrakis functions has several useful properties, including compositionality and convexity. [3]
HTMC norm: a measure of the complexity of a function. [3]
H¨ older continuous: a property of a function that implies it can be represented as a sum of Tetrakis functions. [3]
Tetrakis function: a type of function that is both HTMC computable and H¨ older continuous. [3]
ResNet: a type of neural network that can represent functions that are both HTMC computable and H¨ older continuous. [3]
ResNets can be used to represent functions that are both HTMC computable and H¨ older continuous. [2]

Help us improve your experience!

This project is on its early stages your feedback can be pivotal on the future of the project. Let us know what you think about this week's papers and suggestions!

Give Feedback