Search

UniSearch: Rethinking Search System with a Unified Generative Architecture

Kuaishou Technology

Abstract
Modern search systems play a crucial role in facilitating information acquisition. Traditional search engines typically rely on a cascaded architecture, where results are retrieved through recall, pre-ranking, and ranking stages. The complexity of designing and maintaining multiple modules makes it difficult to achieve holistic performance gains. Recent advances in generative recommendation have motivated the exploration of unified generative search as an alternative. However, existing approaches are not genuinely end-to-end: they typically train an item encoder to tokenize candidates first and then optimize a generator separately, leading to objective inconsistency and limited generalization. To address these limitations, we propose UniSearch, a unified generative search framework for Kuaishou Search. UniSearch replaces the cascaded pipeline with an end-to-end architecture that integrates a Search Generator and a Video Encoder. The Generator produces semantic identifiers of relevant items given a user query, while the Video Encoder learns latent item embeddings and provides their tokenized representations. A unified training framework jointly optimizes both components, enabling mutual enhancement and improving representation quality and generation accuracy. Furthermore, we introduce Search Preference Optimization (SPO), which leverages a reward model and real user feedback to better align generation with user preferences. Extensive experiments on industrial-scale datasets, together with online A/B testing in both short-video and live search scenarios, demonstrate the strong effectiveness and deployment potential of UniSearch. Notably, its deployment in live search yields the largest single-experiment improvement in recent years of our product's history, highlighting its practical value for real-world applications.

AI Insights

GRAM introduces a novel alignment mechanism that bridges generated candidates and query semantics, boosting retrieval accuracy.
The generative component of GRAM produces candidate documents directly from queries, eliminating the need for pre‑built indexes.
Alignment is achieved via a learned similarity scorer that reorders candidates, outperforming traditional BM25 baselines on TREC datasets.
Evaluation on multiple benchmarks (MS MARCO, Natural Questions) shows a 5–7% MAP lift over state‑of‑the‑art generative models.
Training GRAM requires GPU clusters; inference latency is ~50 ms per query, limiting real‑time deployment without optimization.
Recommended reading: “Listwise Generative Retrieval Models via a Sequential Learning Process” (2024) for advanced training strategies.
Core definition: Generative Retrieval – a retrieval paradigm that synthesizes candidate documents conditioned on the query.

A Research Vision for Web Search on Emerging Topics

Abstract
We regularly encounter information on novel, emerging topics for which the body of knowledge is still evolving, which can be linked, for instance, to current events. A primary way to learn more about such topics is through web search. However, information on emerging topics is sparse and evolves dynamically as knowledge grows, making it uncertain and variable in quality and trustworthiness and prone to deliberate or accidental manipulation, misinformation, and bias. In this paper, we outline a research vision towards search systems and interfaces that support effective knowledge acquisition, awareness of the dynamic nature of topics, and responsible opinion formation among people searching the web for information on emerging topics. To realize this vision, we propose three overarching research questions, aimed at understanding the status quo, determining requirements of systems aligned with our vision, and building these systems. For each of the three questions, we highlight relevant literature, including pointers on how they could be addressed. Lastly, we discuss the challenges that will potentially arise in pursuing the proposed vision.

Personalization

Cost-Free Personalization via Information-Geometric Projection in Bayesian Federated Learning

Abstract
Bayesian Federated Learning (BFL) combines uncertainty modeling with decentralized training, enabling the development of personalized and reliable models under data heterogeneity and privacy constraints. Existing approaches typically rely on Markov Chain Monte Carlo (MCMC) sampling or variational inference, often incorporating personalization mechanisms to better adapt to local data distributions. In this work, we propose an information-geometric projection framework for personalization in parametric BFL. By projecting the global model onto a neighborhood of the user's local model, our method enables a tunable trade-off between global generalization and local specialization. Under mild assumptions, we show that this projection step is equivalent to computing a barycenter on the statistical manifold, allowing us to derive closed-form solutions and achieve cost-free personalization. We apply the proposed approach to a variational learning setup using the Improved Variational Online Newton (IVON) optimizer and extend its application to general aggregation schemes in BFL. Empirical evaluations under heterogeneous data distributions confirm that our method effectively balances global and local performance with minimal computational overhead.
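To make the "barycenter with a closed form" idea concrete, here is a minimal sketch for diagonal Gaussians. It assumes the barycenter is taken with respect to the weighted sum of KL(q || global) and KL(q || local), whose minimizer is the normalized geometric mean of the two densities; the abstract does not state which divergence or direction the paper uses, so treat this choice, the function name `gaussian_barycenter`, and the trade-off parameter `lam` as illustrative assumptions.

```python
def gaussian_barycenter(mu_g, var_g, mu_l, var_l, lam):
    """Blend a global and a local diagonal Gaussian, coordinate by coordinate.

    Assumption (not from the abstract): we minimize
        (1 - lam) * KL(q || global) + lam * KL(q || local),
    whose solution is proportional to global^(1 - lam) * local^lam.
    For Gaussians this averages precisions and precision-weighted means.
    lam = 0 returns the global model, lam = 1 the local model.
    """
    mu, var = [], []
    for mg, vg, ml, vl in zip(mu_g, var_g, mu_l, var_l):
        prec = (1.0 - lam) / vg + lam / vl            # blended precision
        mu.append(((1.0 - lam) * mg / vg + lam * ml / vl) / prec)
        var.append(1.0 / prec)
    return mu, var
```

Because the result is a closed-form expression in the two posteriors, no extra training or sampling is needed to move along the global-to-local trade-off, which is one way to read "cost-free" in the abstract.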

A Contextual Bandits Approach for Personalization of Hand Gesture Recognition

Reality Labs, Meta

Abstract
In human-computer interaction applications like hand gesture recognition, supervised learning models are often trained on a large population of users to achieve high task accuracy. However, due to individual variability in sensor signals and user behavior, static models may not provide optimal performance for all users. Personalizing pretrained models via calibration (collecting labeled data from each user) can improve performance but introduces user friction and struggles with limited data. To overcome these issues, we propose a calibrationless longitudinal personalization method: a contextual multi-armed bandit (MAB) algorithm combined with a pretrained neural network for gesture recognition. This reinforcement-learning-style approach enables personalization using binary reward signals, either user-provided or inferred by the system. We validated this method in a user study. Participants wore a surface electromyography (sEMG) device and played multiple rounds of a 2-D navigation game using six hand gestures. In the first session, they completed a baseline round and then a round with our algorithm; in the second session, they played another round with our algorithm. Our approach significantly reduced users' average false negative rate by 0.113 from the initial to the final round, with further decreases between sessions. Average precision also trended upward (by 0.139) from the start to end of a round, continuing in the next session. Notably, some users who could not complete the game with the baseline model succeeded with our contextual MAB model. In summary, our

AI Insights

The algorithm casts each gesture as a bandit arm, updating its policy online with binary success/failure rewards.
It fuses implicit system confidence and explicit user clicks, enabling calibration‑free personalization.
Across two sessions, false negatives fell (p = 0.002) while precision rose by 0.139 per round.
Users who failed the baseline game reached 100% success after one learning round, rescuing edge‑case performers.
The sEMG‑based 2‑D navigation task shows bandits adapt to highly variable muscle signals across individuals.
A limitation is the linear reward‑feature assumption, which may miss complex gesture dynamics.
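The bandit setup described above (one arm per gesture, binary rewards, a linear reward model over context features) can be sketched with disjoint LinUCB. This is a generic textbook algorithm chosen for illustration, not necessarily the one used in the study; the class name `LinUCB`, the exploration parameter `alpha`, and the idea that the context `x` would be an embedding from the pretrained network are all assumptions.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression reward model per arm (gesture)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                                 # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]      # per-arm regularized Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]    # per-arm reward-weighted context sum

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # ridge estimate of arm weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)    # confidence width
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold in one binary reward (1 = success, 0 = failure) for the chosen arm."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In use, each interaction would call `select(x)` to choose a gesture label and then `update(arm, x, reward)` once the binary feedback arrives, so the policy adapts per user without a labeled calibration phase.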

Deep Learning

Towards Interpretable Deep Neural Networks for Tabular Data

Marburg University

Abstract
Tabular data is the foundation of many applications in fields such as finance and healthcare. Although DNNs tailored for tabular data achieve competitive predictive performance, they are blackboxes with little interpretability. We introduce XNNTab, a neural architecture that uses a sparse autoencoder (SAE) to learn a dictionary of monosemantic features within the latent space used for prediction. Using an automated method, we assign human-interpretable semantics to these features. This allows us to represent predictions as linear combinations of semantically meaningful components. Empirical evaluations demonstrate that XNNTab attains performance on par with or exceeding that of state-of-the-art, black-box neural models and classical machine learning approaches while being fully interpretable.

AI Insights

XNNTab’s sparse autoencoder learns monosemantic dictionary features that map to human‑readable rules.
On the ADULT benchmark, these dictionary features are generated by applying data‑driven rules to age, education, and capital gain.
In the CHURN dataset, rule‑derived dictionary features uncover subtle customer‑attrition signals missed by conventional models.
Empirical tests show XNNTab matches or exceeds black‑box DNNs while providing transparent linear explanations.
The approach depends heavily on training‑data quality, so noisy or biased data can distort dictionary semantics.
Future work may automate rule discovery or use transfer learning to broaden applicability across domains.
The subjectivity in rule selection still poses a challenge for reproducibility and generalization.

An Interpretable Deep Learning Model for General Insurance Pricing

UNSW Sydney NSW 2052, AU

Abstract
This paper introduces the Actuarial Neural Additive Model, an inherently interpretable deep learning model for general insurance pricing that offers fully transparent and interpretable results while retaining the strong predictive power of neural networks. This model assigns a dedicated neural network (or subnetwork) to each individual covariate and pairwise interaction term to independently learn its impact on the modeled output while implementing various architectural constraints to allow for essential interpretability (e.g. sparsity) and practical requirements (e.g. smoothness, monotonicity) in insurance applications. The development of our model is grounded in a solid foundation, where we establish a concrete definition of interpretability within the insurance context, complemented by a rigorous mathematical framework. Comparisons in terms of prediction accuracy are made with traditional actuarial and state-of-the-art machine learning methods using both synthetic and real insurance datasets. The results show that the proposed model outperforms other methods in most cases while offering complete transparency in its internal logic, underscoring the strong interpretability and predictive capability.
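The additive structure described in the abstract (one subnetwork per covariate and per pairwise interaction, summed into the output) can be sketched as follows. This is an untrained toy with fixed random weights; the names `make_subnet` and `AdditiveModel` are hypothetical, and the architectural constraints the abstract mentions (sparsity, smoothness, monotonicity) as well as any link function are deliberately left out.

```python
import math
import random

def make_subnet(n_inputs, hidden=4, seed=0):
    """Tiny one-hidden-layer tanh network with fixed random weights (untrained)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]

    def f(*inputs):
        h = [math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
             for row, b in zip(w1, b1)]
        return sum(w * v for w, v in zip(w2, h))

    return f

class AdditiveModel:
    """Prediction = bias + sum of main-effect terms + sum of pairwise terms."""

    def __init__(self, n_features, pairs, bias=0.0):
        self.bias = bias
        self.main = {j: make_subnet(1, seed=j) for j in range(n_features)}
        self.inter = {(j, k): make_subnet(2, seed=100 + i)
                      for i, (j, k) in enumerate(pairs)}

    def contributions(self, x):
        """Per-term contributions; each depends only on its own covariate(s)."""
        terms = {f"x{j}": f(x[j]) for j, f in self.main.items()}
        terms.update({f"x{j}*x{k}": f(x[j], x[k])
                      for (j, k), f in self.inter.items()})
        return terms

    def predict(self, x):
        return self.bias + sum(self.contributions(x).values())
```

The interpretability argument rests on `contributions`: because every term sees only one covariate or one pair, each can be plotted or inspected on its own, and the prediction is exactly their sum plus the bias.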

Information Retrieval

Benchmarking Information Retrieval Models on Complex Retrieval Tasks

University of Massachusetts

Abstract
Large language models (LLMs) are incredible and versatile tools for text-based tasks that have enabled countless, previously unimaginable, applications. Retrieval models, in contrast, have not yet seen such capable general-purpose models emerge. To achieve this goal, retrieval models must be able to perform complex retrieval tasks, where queries contain multiple parts, constraints, or requirements in natural language. These tasks represent a natural progression from the simple, single-aspect queries that are used in the vast majority of existing, commonly used evaluation sets. Complex queries naturally arise as people expect search systems to handle more specific and often ambitious information requests, as is demonstrated by how people use LLM-based information systems. Despite the growing desire for retrieval models to expand their capabilities in complex retrieval tasks, there exist limited resources to assess the ability of retrieval models on a comprehensive set of diverse complex tasks. The few resources that do exist feature a limited scope and often lack realistic settings, making it hard to know the true capabilities of retrieval models on complex real-world retrieval tasks. To address this shortcoming and spur innovation in next-generation retrieval models, we construct a diverse and realistic set of complex retrieval tasks and benchmark a representative set of state-of-the-art retrieval models. Additionally, we explore the impact of LLM-based query expansion and rewriting on retrieval quality. Our results show that even the best models struggle to produce high-quality retrieval results, with the highest average nDCG@10 of only 0.346 and R@100 of only 0.587 across all tasks. Although LLM augmentation can help weaker models, the strongest model has decreased performance across all metrics with all rewriting techniques.
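For readers unfamiliar with the two metrics quoted above, the following sketch computes nDCG@k and recall@k for a single query. It uses the linear-gain form of DCG; the abstract does not say which gain convention the benchmark uses, so that choice and the function names are assumptions.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ids, relevance, k=10):
    """relevance maps doc id -> graded gain; ids missing from it count as 0."""
    gains = [relevance.get(d, 0) for d in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(ranked_ids, relevance, k=100):
    """Fraction of relevant documents that appear in the top k results."""
    relevant = {d for d, g in relevance.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)
```

Both functions score one query; the benchmark-level figures in the abstract are averages of such per-query values across queries and tasks.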

Boosting Data Utilization for Multilingual Dense Retrieval

Beijing Jiaotong University

Abstract
Multilingual dense retrieval aims to retrieve relevant documents across different languages based on a unified retriever model. The challenge lies in aligning representations of different languages in a shared vector space. The common practice is to fine-tune the dense retriever via contrastive learning, whose effectiveness highly relies on the quality of the negative sample and the efficacy of mini-batch data. Different from the existing studies that focus on developing sophisticated model architecture, we propose a method to boost data utilization for multilingual dense retrieval by obtaining high-quality hard negative samples and effective mini-batch data. The extensive experimental results on a multilingual retrieval benchmark, MIRACL, with 16 languages demonstrate the effectiveness of our method by outperforming several existing strong baselines.
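To show why negative quality matters for contrastive fine-tuning, here is a minimal sketch of the standard InfoNCE loss for one query. It illustrates the generic objective only; the abstract does not describe how the paper mines hard negatives or builds mini-batches, and the function names and `temperature` default are assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(query, positive, negatives, temperature=0.05):
    """Negative log-softmax of the positive among positive + negatives."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)                                   # subtract max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```

With query and positive both `[1.0, 0.0]`, an unrelated negative such as `[0.0, 1.0]` yields a loss near zero and therefore almost no gradient, whereas a negative close to the query such as `[0.9, 0.1]` yields a clearly larger loss. That difference is the usual motivation for seeking hard negatives.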

Ranking

Variable Selection Using Relative Importance Rankings

Abstract
Although conceptually related, variable selection and relative importance (RI) analysis have been treated quite differently in the literature. While RI is typically used for post-hoc model explanation, this paper explores its potential for variable ranking and filter-based selection before model creation. Specifically, we anticipate strong performance from the RI measures because they incorporate both direct and combined effects of predictors, addressing a key limitation of marginal correlation that ignores dependencies among predictors. We implement and evaluate the RI-based variable selection methods using general dominance (GD), comprehensive relative importance (CRI), and a newly proposed, computationally efficient variant termed CRI.Z. We first demonstrate how the RI measures more accurately rank the variables than the marginal correlation, especially when there are suppressed or weak predictors. We then show that predictive models built on these rankings are highly competitive, often outperforming state-of-the-art methods such as the lasso and relaxed lasso. The proposed RI-based methods are particularly effective in challenging cases involving clusters of highly correlated predictors, a setting known to cause failures in many benchmark methods. Although lasso methods have dominated the recent literature on variable selection, our study reveals that the RI-based method is a powerful and competitive alternative. We believe these underutilized tools deserve greater attention in statistics and machine learning communities. The code is available at: https://github.com/tien-endotchang/RI-variable-selection.
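As an illustration of how a relative importance measure captures both direct and combined effects, the sketch below computes general dominance by brute force: for each predictor, it averages the incremental R² from adding that predictor to every subset of the others, first within each subset size and then across sizes. The function names are hypothetical, the cost is exponential in the number of predictors, and the CRI and CRI.Z measures from the abstract are not implemented because the abstract does not define them.

```python
from itertools import combinations

import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit (with intercept) on the given predictor columns."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def general_dominance(X, y):
    """Average incremental R^2 per predictor over all subset sizes."""
    p = X.shape[1]
    scores = np.zeros(p)
    for j in range(p):
        others = [c for c in range(p) if c != j]
        for size in range(p):                     # subset sizes 0 .. p-1
            gains = [r_squared(X, y, s + (j,)) - r_squared(X, y, s)
                     for s in combinations(others, size)]
            scores[j] += np.mean(gains) / p
    return scores
```

The scores sum to the R² of the full model, so they can be read as a decomposition of explained variance; ranking predictors by score and keeping the top ones gives a filter-style selection in the spirit of the abstract.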

Improved Approximation Guarantees and Hardness Results for MNL-Driven Product Ranking

Tel Aviv University

Abstract
In this paper, we address open computational questions regarding the market share ranking problem, recently introduced by Derakhshan et al. (2022). Their modelling framework incorporates the extremely popular Multinomial Logit (MNL) choice model, along with a novel search-based consider-then-choose paradigm. In a nutshell, the authors devised a Pandora's-Box-type search model, where different customer segments sequentially screen through a ranked list of products, one position after the other, forming their consideration set by including all products viewed up until terminating their inspection procedure. Subsequently, a purchasing decision out of this set is made based on a joint MNL choice model. Our main contribution consists in devising a polynomial-time approximation scheme for the market share ranking problem, utilizing fresh technical developments and analytical ideas, in conjunction with revising the original insights of Derakhshan et al. (2022). Along the way, we introduce a black-box reduction, mapping general instances of the market share ranking problem into "bounded ratio" instances, showing that this result directly leads to an elegant and easily implementable quasi-PTAS. Finally, to provide a complete computational characterization, we prove that the market share ranking problem is strongly NP-hard.
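The consider-then-choose objective can be illustrated with a heavily simplified evaluator. It assumes each customer segment stops at a fixed depth and buys according to an MNL model with an outside option of weight 1; in the abstract the stopping point comes from a search procedure, so the fixed depths, the function name `market_share`, and the input layout are assumptions, and this toy does not reproduce the structure that makes the real problem hard.

```python
def market_share(ranking, weights, segments):
    """Expected purchase probability of a ranking under a toy model.

    ranking:  product ids in display order.
    weights:  product id -> MNL preference weight (exp of utility);
              the no-purchase option is assumed to have weight 1.
    segments: list of (probability, depth) pairs; a segment considers
              the first `depth` products of the ranking (assumed fixed).
    """
    total = 0.0
    for prob, depth in segments:
        attraction = sum(weights[p] for p in ranking[:depth])
        total += prob * attraction / (1.0 + attraction)   # MNL purchase probability
    return total
```

For example, with weights `{"a": 2.0, "b": 1.0}` and two equally likely segments of depth 1 and 2, the ranking `["a", "b"]` scores 0.5 * 2/3 + 0.5 * 3/4, while `["b", "a"]` scores 0.5 * 1/2 + 0.5 * 3/4, so the order of products changes the market share even in this small case.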

AI Insights

The PTAS uses a layered decomposition of the search tree, limiting consideration depth to O(log n).
A greedy refinement swaps adjacent products to lower expected search cost while keeping MNL consistency.
The reduction scales product utilities to create bounded‑ratio instances, preserving optimality within a constant factor.
Hardness follows from a Partition reduction that encodes budget constraints into the search termination rule.
Synthetic tests show a 15% market‑share gain over baseline heuristics.
The method generalizes to nested logit by substituting the MNL kernel with a nested logit one.
