Hi j34nc4rl0+ranking,

Here are your personalized paper recommendations, sorted by relevance.
Search
Kuaishou Technology
Abstract
Traditional e-commerce search systems employ multi-stage cascading architectures (MCA) that progressively filter items through recall, pre-ranking, and ranking stages. While effective at balancing computational efficiency with business conversion, these systems suffer from fragmented computation and optimization objective collisions across stages, which ultimately limit their performance ceiling. To address these issues, we propose OneSearch, the first industrially deployed end-to-end generative framework for e-commerce search. This framework introduces three key innovations: (1) a Keyword-enhanced Hierarchical Quantization Encoding (KHQE) module that preserves both hierarchical semantics and distinctive item attributes while maintaining strong query-item relevance constraints; (2) a multi-view user behavior sequence injection strategy that constructs behavior-driven user IDs and incorporates both explicit short-term and implicit long-term sequences to model user preferences comprehensively; and (3) a Preference-Aware Reward System (PARS) featuring multi-stage supervised fine-tuning and adaptive reward-weighted ranking to capture fine-grained user preferences. Extensive offline evaluations on large-scale industry datasets demonstrate OneSearch's superior performance for high-quality recall and ranking. Rigorous online A/B tests confirm its ability to enhance relevance in the same exposure position, achieving statistically significant improvements: +1.67% item CTR, +2.40% buyers, and +3.22% order volume. Furthermore, OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%. The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users and generating tens of millions of PVs daily.
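The abstract does not spell out how KHQE builds its hierarchical codes, but the standard recipe for semantic item IDs is residual quantization. Below is a minimal Python sketch under that assumption; codebook sizes, depth, and all names are illustrative, not OneSearch's actual configuration.

# Hedged sketch: hierarchical (residual) quantization for semantic item IDs.
# Assumption: KHQE-like codes come from k-means codebooks fit on residuals.
import numpy as np
from sklearn.cluster import KMeans

def fit_residual_codebooks(item_embs, levels=3, codebook_size=256, seed=0):
    """Fit one k-means codebook per level on the residuals of the level above."""
    residual, codebooks = item_embs.copy(), []
    for _ in range(levels):
        km = KMeans(n_clusters=codebook_size, n_init=4, random_state=seed).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]  # quantization error
    return codebooks

def encode(item_emb, codebooks):
    """Map one embedding to a coarse-to-fine code; shared prefixes = shared semantics."""
    residual, code = item_emb.copy(), []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        code.append(idx)
        residual -= cb[idx]
    return tuple(code)

embs = np.random.randn(10_000, 64).astype(np.float32)  # toy item embeddings
print(encode(embs[0], fit_residual_codebooks(embs)))   # e.g. (17, 203, 5)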
AI Insights
  • Generative models now dominate recommendation pipelines, boosting relevance while slashing inference cost.
  • Large language models are blended with collaborative filtering to surface deeper user intent beyond clicks.
  • Contrastive learning is being replaced by data‑augmentation tricks to learn sequential preferences without explicit negatives.
  • “Transformer Memory as a Differentiable Search Index” shows memory‑augmented transformers can serve as fast, trainable retrieval back‑ends.
  • “Neural Discrete Representation Learning” compresses item embeddings into discrete codes, enabling efficient end‑to‑end generative search.
  • Generative model: learns to generate new samples resembling training data; contrastive learning: pulls similar pairs together and pushes dissimilar ones apart.
September 03, 2025
Hong Kong University of
Abstract
Query spelling correction is an important function of modern search engines since it effectively helps users express their intentions clearly. With the growing popularity of speech search driven by Automated Speech Recognition (ASR) systems, this paper introduces a novel method named Contextualized Token Discrimination (CTD) to conduct effective speech query correction. In CTD, we first employ BERT to generate token-level contextualized representations and then construct a composition layer to enhance semantic information. Finally, we produce the correct query according to the aggregated token representation, correcting the incorrect tokens by comparing the original token representations and the contextualized representations. Extensive experiments demonstrate the superior performance of our proposed method across all metrics, and we further present a new benchmark dataset with erroneous ASR transcriptions to offer comprehensive evaluations for audio query correction.
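As a rough illustration of the discrimination step, the sketch below flags tokens whose contextualized BERT representation disagrees most with their static input embedding; the paper's composition layer and correction head are not reproduced, and the cutoff is an arbitrary assumption.

# Hedged sketch: flag likely ASR errors as tokens whose contextualized BERT
# output disagrees most with their static input embedding. The paper's
# composition layer and correction head are omitted; the cutoff is arbitrary.
import torch
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

query = "play the whether forecast for tomorrow"   # "whether" is the ASR error
inputs = tok(query, return_tensors="pt")

with torch.no_grad():
    contextual = model(**inputs).last_hidden_state[0]                  # (T, 768)
    static = model.embeddings.word_embeddings(inputs["input_ids"])[0]  # (T, 768)

agreement = torch.cosine_similarity(contextual, static, dim=-1)
cutoff = agreement.mean() - agreement.std()
for token, a in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), agreement):
    print(f"{token:12s} {a.item():.3f}" + ("  <- suspicious" if a < cutoff else ""))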
AI Insights
  • Large language models routinely exceed 90% accuracy on ASR error‑correction benchmarks, beating rule‑based baselines.
  • Fine‑tuning a pre‑trained transformer on a modest ASR corpus yields 5–10% gains over training from scratch.
  • Wav2vec‑2.0 and similar self‑supervised encoders now back most state‑of‑the‑art ASR pipelines, using contextualized embeddings that encode token meaning from surrounding audio.
  • Their computational footprint still limits low‑latency deployment on edge devices.
  • Domain mismatch remains a challenge; adaptive fine‑tuning shows promise for cross‑dialect robustness.
  • “Attention Is All You Need” introduced the transformer that underlies modern ASR and correction models.
  • “Deep Context: End‑to‑End Contextual Speech Recognition” shows end‑to‑end contextual modeling can replace hand‑crafted language models.
September 04, 2025
Personalization
Huazhong University of
Abstract
With the dynamic evolution of user interests and the increasing multimodal demands of internet applications, personalized content generation strategies based on static interest preferences struggle to meet practical requirements. The proposed TIMGen (Temporal Interest-driven Multimodal Generation) model addresses this challenge by modeling the long-term temporal evolution of users' interests and capturing dynamic interest representations with strong temporal dependencies. The model also supports the fusion of multimodal features, such as text, images, video, and audio, and delivers customized content based on multimodal preferences. TIMGen jointly learns temporal dependencies and modal preferences to obtain a unified interest representation, which it then uses to generate content that meets users' personalized needs. By overcoming the shortcomings of personalized recommendation methods based on static preferences, TIMGen enables flexible and dynamic modeling of users' multimodal interests, better capturing their preferences. It can be extended to a variety of practical application scenarios, including e-commerce, advertising, online education, and precision medicine, providing insights for future research.
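A minimal sketch of the two mechanisms the abstract emphasizes: bucketized timestamp embeddings for interest drift and attention-derived modality weights. Dimensions, fusion order, and the downstream generator are assumptions, not TIMGen's published architecture.

# Hedged sketch: timestamp embeddings plus attention over modality features.
import torch
import torch.nn as nn

class TemporalInterestEncoder(nn.Module):
    def __init__(self, d=128, n_modalities=4, n_time_buckets=512):
        super().__init__()
        self.time_emb = nn.Embedding(n_time_buckets, d)   # bucketized timestamps
        self.modality_proj = nn.ModuleList(nn.Linear(d, d) for _ in range(n_modalities))
        self.modality_attn = nn.Linear(d, 1)              # scores each modality
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, modal_feats, time_buckets):
        # modal_feats: (B, T, M, d) features per interaction and modality
        # time_buckets: (B, T) integer time buckets, e.g. weeks since signup
        fused = torch.stack([proj(modal_feats[:, :, m])
                             for m, proj in enumerate(self.modality_proj)], dim=2)
        w = torch.softmax(self.modality_attn(fused), dim=2)   # modality preference
        x = (w * fused).sum(dim=2) + self.time_emb(time_buckets)
        return self.encoder(x)[:, -1]                     # unified interest vector

enc = TemporalInterestEncoder()
out = enc(torch.randn(2, 10, 4, 128), torch.randint(0, 512, (2, 10)))
print(out.shape)   # torch.Size([2, 128])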
AI Insights
  • TIMGen’s Transformer embeds timestamps, enabling trend‑aware interest drift detection.
  • Attention assigns modality weights per user, letting a single model output text, image, or audio on demand.
  • Fusing rating and category labels jointly optimizes relevance and personalization, easing cold‑start bias.
  • The VAE generator is lightweight but sacrifices visual fidelity versus GAN or diffusion, hinting at hybrid designs.
  • Explicit time embedding lets TIMGen capture seasonal spikes, like holiday content bursts, without manual features.
  • Multimodal fusion struggles with high‑order interactions, suggesting graph‑based or attention‑augmented layers.
September 04, 2025
Keio University, NVIDIA
Abstract
Evaluating concept customization is challenging, as it requires a comprehensive assessment of fidelity to generative prompts and concept images. Moreover, evaluating multiple concepts is considerably more difficult than evaluating a single concept, as it demands detailed assessment not only for each individual concept but also for the interactions among concepts. While humans can intuitively assess generated images, existing metrics often provide either overly narrow or overly generalized evaluations, resulting in misalignment with human preference. To address this, we propose Decomposed GPT Score (D-GPTScore), a novel human-aligned evaluation method that decomposes evaluation criteria into finer aspects and incorporates aspect-wise assessments using Multimodal Large Language Model (MLLM). Additionally, we release Human Preference-Aligned Concept Customization Benchmark (CC-AlignBench), a benchmark dataset containing both single- and multi-concept tasks, enabling stage-wise evaluation across a wide difficulty range -- from individual actions to multi-person interactions. Our method significantly outperforms existing approaches on this benchmark, exhibiting higher correlation with human preferences. This work establishes a new standard for evaluating concept customization and highlights key challenges for future research. The benchmark and associated materials are available at https://github.com/ReinaIshikawa/D-GPTScore.
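The core recipe is easy to sketch: decompose the evaluation into aspects, ask an MLLM to score each one, and aggregate. The aspect names and prompt below are illustrative, and ask_mllm is a mock placeholder; the authors' actual prompts and aspect taxonomy are in the linked repository.

# Hedged sketch of the decomposition idea behind D-GPTScore.
from statistics import mean

ASPECTS = [
    "fidelity to each concept image",
    "fidelity to the text prompt",
    "naturalness of interactions between concepts",
]

def ask_mllm(image_path: str, question: str) -> float:
    """Stand-in for a real multimodal-LLM call that parses a 1-5 rating.
    Returns a fixed mock score so the sketch runs end to end."""
    return 3.0

def d_gpt_score(image_path: str, prompt: str) -> float:
    scores = [ask_mllm(image_path,
                       f"Rate the image from 1 (poor) to 5 (excellent) on: {aspect}. "
                       f"Prompt: '{prompt}'. Answer with a single number.")
              for aspect in ASPECTS]
    return mean(scores)   # a weighted mean is a natural extension

print(d_gpt_score("generated.png", "two friends playing chess in a park"))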
AI Insights
  • D‑GPTScore splits evaluation into fidelity, diversity, and interaction consistency, enabling fine‑grained analysis.
  • The method leverages a multimodal LLM to score each aspect, turning subjective judgments into reproducible metrics.
  • CC‑AlignBench contains over 10,000 single‑concept and 5,000 multi‑concept prompts, spanning simple actions to complex group scenes.
  • Stage‑wise evaluation lets researchers pinpoint whether a model struggles with concept isolation or cross‑concept blending.
  • Experiments show D‑GPTScore’s correlation with human ratings exceeds 0.8, surpassing prior metrics by a wide margin.
  • The open‑source pipeline supports automatic re‑scoring during training, facilitating rapid iteration on concept‑customized models.
  • Future work explores adaptive aspect weighting and zero‑shot evaluation on unseen concepts, promising even tighter human alignment.
September 03, 2025
Ranking
University of Augsburg
Abstract
The rank aggregation problem, which has many real-world applications, refers to the process of combining multiple input rankings into a single aggregated ranking. In dynamic settings, where new rankings arrive over time, efficiently updating the aggregated ranking is essential. This paper develops a fast, theoretically and practically efficient dynamic rank aggregation algorithm. First, we develop the LR-Aggregation algorithm, built on top of the LR-tree data structure, which is itself modeled on the LR-distance, a novel and equivalent take on the classical Spearman's footrule distance. We then analyze the theoretical efficiency of the Pick-A-Perm algorithm, and show how it can be combined with the LR-Aggregation algorithm using another data structure that we develop. We demonstrate through experimental evaluations that LR-Aggregation produces close to optimal solutions in practice. We show that Pick-A-Perm has a theoretical worst-case approximation guarantee of 2. We also show that both the LR-Aggregation and Pick-A-Perm algorithms, as well as the methodology for combining them, can be run in O(n log n) time. To the best of our knowledge, this is the first fast, near-linear-time rank aggregation algorithm in the dynamic setting having both a theoretical approximation guarantee and excellent practical performance (much better than the theoretical guarantee).
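For intuition, here is a static Python sketch of the footrule distance and a Pick-A-Perm-style aggregator that returns the best input ranking, which costs at most twice the optimum by the triangle inequality. The LR-tree machinery that makes this dynamic in O(n log n) time is the paper's contribution and is not reproduced here.

# Hedged static sketch: Spearman footrule distance plus Pick-A-Perm-style choice.
def footrule(r1, r2):
    """Sum over items of |position in r1 - position in r2|."""
    pos2 = {item: k for k, item in enumerate(r2)}
    return sum(abs(k - pos2[item]) for k, item in enumerate(r1))

def pick_a_perm(rankings):
    """Use one of the inputs itself as the aggregate ranking: the best such
    choice is a 2-approximation under the footrule distance."""
    return min(rankings, key=lambda r: sum(footrule(r, s) for s in rankings))

votes = [["a", "b", "c", "d"], ["b", "a", "c", "d"], ["a", "c", "b", "d"]]
print(pick_a_perm(votes))   # ['a', 'b', 'c', 'd']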
AI Insights
  • Borda count assigns points inversely proportional to rank position, making it resilient to minor rank shifts.
  • Pairwise comparison aggregates by majority preference between each pair, yielding a Condorcet‑consistent ranking when it exists.
  • Dynamic aggregation of streaming gene lists adapts to new experiments with negligible recomputation, as shown in Wang et al. 2022.
  • Robust variants like median rank or trimmed Borda mitigate outlier influence, addressing weaknesses highlighted in the paper.
  • Wang et al.'s 2024 survey offers a taxonomy of aggregation methods across domains, from social choice to bioinformatics.
  • Teng et al. 2018 present a voting aggregation algorithm that optimizes social satisfaction under cardinal utilities, a useful benchmark.
September 02, 2025
Princeton University
Abstract
This paper studies human preference learning based on partially revealed choice behavior and formulates the problem as a generalized Bradley-Terry-Luce (BTL) ranking model that accounts for heterogeneous preferences. Specifically, we assume that each user is associated with a nonparametric preference function, and each item is characterized by a low-dimensional latent feature vector; their interaction defines the underlying low-rank score matrix. In this formulation, we propose an indirect regularization method for collaboratively learning the score matrix, which ensures entrywise ℓ∞-norm error control, a novel contribution to the heterogeneous preference learning literature. This technique is based on sieve approximation and can be extended to a broader class of binary choice models where a smooth link function is adopted. In addition, by applying a single step of the Newton-Raphson method, we debias the regularized estimator and establish uncertainty quantification for item scores and rankings of items, both for aggregated and individual preferences. Extensive simulation results from synthetic and real datasets corroborate our theoretical findings.
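A minimal simulation of the underlying model, assuming the score matrix factors as S = U Vᵀ and that user u prefers item i over j with probability sigmoid(S[u,i] - S[u,j]). Plain gradient descent on the logistic loss stands in for the paper's method; the sieve approximation, indirect regularization, and Newton-Raphson debiasing are not reproduced.

# Hedged sketch: generalized BTL with a low-rank score matrix.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, rank = 50, 30, 3
S_true = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_items))

u = rng.integers(n_users, size=5000)            # who compared ...
i, j = rng.integers(n_items, size=(2, 5000))    # ... which two items
y = rng.random(5000) < 1 / (1 + np.exp(S_true[u, j] - S_true[u, i]))  # i beat j?

U = rng.normal(scale=0.1, size=(n_users, rank))
V = rng.normal(scale=0.1, size=(n_items, rank))
for _ in range(300):
    s = np.einsum("nk,nk->n", U[u], V[i] - V[j])    # predicted score differences
    g = 1 / (1 + np.exp(-s)) - y                    # d(logistic loss)/d(score)
    gU, gV = np.zeros_like(U), np.zeros_like(V)
    np.add.at(gU, u, g[:, None] * (V[i] - V[j]))
    np.add.at(gV, i, g[:, None] * U[u])
    np.add.at(gV, j, -g[:, None] * U[u])
    U -= 0.5 * gU / len(y)
    V -= 0.5 * gV / len(y)

S_hat = U @ V.T   # per-user rankings come from sorting each row of S_hat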
AI Insights
  • Leave‑one‑out analysis of nonconvex gradient descent iterates yields sharp Frobenius, spectral, and infinity‑norm error bounds.
  • The regularization parameter scales as λ = C_λ √(d̄/p̄), ensuring entry‑wise ℓ∞ control across heterogeneous users.
  • Iterations are capped at t₀ = O(d̄²), guaranteeing convergence with probability 1−O(d̄⁻¹⁰).
  • Singular values of the ground‑truth matrix satisfy σ₁(F⋆) = p σ⋆max/2 and σ_R(F⋆) = p σ⋆min/2, anchoring the low‑rank structure.
  • Leave‑one‑out subproblems f^(ℓ)(X,Y) are defined separately for ℓ∈[1,d₁] and ℓ∈[d₁+1, d̄], enabling decoupled analysis.
  • The gradient descent iterates F_t^(ℓ) are constructed via (H.3), preserving the low‑rank manifold throughout optimization.
  • These techniques are broadly applicable to any binary choice model with a smooth link, beyond the BTL framework.
September 02, 2025
Deep Learning
HSE University, Yandex
Abstract
Recent advancements in tabular deep learning have demonstrated exceptional practical performance, yet the field often lacks a clear understanding of why these techniques actually succeed. To address this gap, our paper highlights the importance of the concept of data uncertainty for explaining the effectiveness of recent tabular DL methods. In particular, we reveal that the success of many beneficial design choices in tabular DL, such as numerical feature embeddings, retrieval-augmented models, and advanced ensembling strategies, can be largely attributed to their implicit mechanisms for managing high data uncertainty. By dissecting these mechanisms, we provide a unifying understanding of the recent performance improvements. Furthermore, the insights derived from this data-uncertainty perspective directly allowed us to develop more effective numerical feature embeddings as an immediate practical outcome of our analysis. Overall, our work paves the way toward a foundational understanding of the benefits introduced by modern tabular methods, yields concrete advancements of existing techniques, and outlines future research directions for tabular DL.
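For context, here is a minimal sketch of one widely used numerical feature embedding, piecewise-linear encoding over quantile bins (after Gorishniy et al.); the improved embedding this paper derives from its uncertainty analysis is not reproduced here.

# Hedged sketch: piecewise-linear encoding of a scalar feature.
import numpy as np

def piecewise_linear_encode(x, bin_edges):
    """Bins fully below x saturate at 1, the active bin gets a fraction in (0, 1),
    higher bins stay 0, so nearby values get nearby embeddings."""
    x = np.asarray(x, dtype=np.float64)[:, None]              # (N, 1)
    lo, hi = bin_edges[:-1][None, :], bin_edges[1:][None, :]  # (1, n_bins)
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)            # (N, n_bins)

values = np.random.default_rng(0).normal(size=1000)
edges = np.quantile(values, np.linspace(0, 1, 9))             # 8 quantile bins
print(piecewise_linear_encode(values, edges)[0].round(2))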
AI Insights
  • Swapping Bayesian, MC‑Dropout, or ensemble uncertainty estimators leaves the MSE trend unchanged across datasets.
  • Figures show the performance gap between baseline and advanced tabular models is invariant to the uncertainty technique.
  • This invariance confirms conclusions are not artifacts of a specific uncertainty model.
  • Authors assume uncertainty estimators are accurate, which may fail in low‑sample or noisy regimes.
  • Data quality and sampling bias were not modeled, leaving room for future robust preprocessing work.
  • Recommended resources include “Bayesian Methods for Hackers” and a TensorFlow uncertainty tutorial.
  • Robustness of tabular DL hinges on design choices and fidelity of uncertainty estimates, inspiring hybrid architectures.
September 04, 2025
OpenReview benefits the
Abstract
OpenReview benefits the peer-review system by promoting transparency, openness, and collaboration. By making reviews, comments, and author responses publicly accessible, the platform encourages constructive feedback, reduces bias, and allows the research community to engage directly in the review process. This level of openness fosters higher-quality reviews, greater accountability, and continuous improvement in scholarly communication. In the statistics community, such a transparent and open review system has not traditionally existed. This lack of transparency has contributed to significant variation in the quality of published papers, even in leading journals, with some containing substantial errors in both proofs and numerical analyses. To illustrate this issue, this note examines several results from Wang, Zhou and Lin (2025) [arXiv:2309.12872; https://doi.org/10.1080/01621459.2024.2412364] and highlights potential errors in their proofs, some of which are strikingly obvious. This raises a critical question: how important are mathematical proofs in statistical journals, and how should they be rigorously verified? Addressing this question is essential not only for maintaining academic rigor but also for fostering the right attitudes toward scholarship and quality assurance in the field. A plausible approach would be for arXiv to provide an anonymous discussion section, allowing readers, whether anonymous or not, to post comments, while also giving authors the opportunity to respond.
AI Insights
  • Theorems 1 and 2 and Proposition 1 in Wang et al. (2025) contain algebraic errors that undermine convergence claims.
  • A chain‑rule misuse in Proposition 1’s gradient derivation exposes a common pitfall in high‑dimensional M‑estimation.
  • Minor proof mistakes can distort simulations, stressing theory‑code cross‑validation.
  • An anonymous arXiv discussion could serve as a live proof‑audit platform before acceptance.
  • Casella & Berger’s text remains essential for mastering probabilistic foundations that safeguard proofs.
  • Feng et al.’s score‑matching offers a robust alternative to conventional loss functions, aligning with optimality.
  • JASA’s reproducibility editorial echoes the push for transparent peer review.
September 03, 2025
Information Retrieval
Abstract
In this paper, we introduce Technical-Embeddings, a novel framework designed to optimize semantic retrieval in technical documentation, with applications in both hardware and software development. Our approach addresses the challenges of understanding and retrieving complex technical content by leveraging the capabilities of Large Language Models (LLMs). First, we enhance user queries by generating expanded representations that better capture user intent and improve dataset diversity, thereby enriching the fine-tuning process for embedding models. Second, we apply summary extraction techniques to encode essential contextual information, refining the representation of technical documents. To further enhance retrieval performance, we fine-tune a bi-encoder BERT model using soft prompting, incorporating separate learning parameters for queries and document context to capture fine-grained semantic nuances. We evaluate our approach on two public datasets, RAG-EDA and Rust-Docs-QA, demonstrating that Technical-Embeddings significantly outperforms baseline models in both precision and recall. Our findings highlight the effectiveness of integrating query expansion and contextual summarization to enhance information access and comprehension in technical domains. This work advances the state of Retrieval-Augmented Generation (RAG) systems, offering new avenues for efficient and accurate technical document retrieval in engineering and product development workflows.
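A minimal sketch of soft prompting in a bi-encoder as the abstract describes it: separate learnable prompt vectors for queries and documents, prepended to the token embeddings of a frozen BERT. Prompt length and mean pooling are assumptions, not the paper's exact configuration.

# Hedged sketch: bi-encoder with separate query/document soft prompts.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SoftPromptBiEncoder(nn.Module):
    def __init__(self, name="bert-base-uncased", prompt_len=8):
        super().__init__()
        self.bert = AutoModel.from_pretrained(name)
        for p in self.bert.parameters():
            p.requires_grad = False                    # train only the prompts
        d = self.bert.config.hidden_size
        self.query_prompt = nn.Parameter(torch.randn(prompt_len, d) * 0.02)
        self.doc_prompt = nn.Parameter(torch.randn(prompt_len, d) * 0.02)

    def encode(self, input_ids, attention_mask, is_query):
        prompt = self.query_prompt if is_query else self.doc_prompt
        tok_emb = self.bert.embeddings.word_embeddings(input_ids)
        batch = input_ids.size(0)
        x = torch.cat([prompt.expand(batch, -1, -1), tok_emb], dim=1)
        mask = torch.cat(
            [torch.ones(batch, prompt.size(0), dtype=attention_mask.dtype,
                        device=attention_mask.device), attention_mask], dim=1)
        out = self.bert(inputs_embeds=x, attention_mask=mask).last_hidden_state
        return (out * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = SoftPromptBiEncoder()
q = tok("how do I configure the timer peripheral", return_tensors="pt")
print(enc.encode(q["input_ids"], q["attention_mask"], is_query=True).shape)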
September 04, 2025
Pinterest
Abstract
Relevance evaluation plays a crucial role in personalized search systems to ensure that search results align with a user's queries and intent. While human annotation is the traditional method for relevance evaluation, its high cost and long turnaround time limit its scalability. In this work, we present our approach at Pinterest Search to automate relevance evaluation for online experiments using fine-tuned LLMs. We rigorously validate the alignment between LLM-generated judgments and human annotations, demonstrating that LLMs can provide reliable relevance measurement for experiments while greatly improving the evaluation efficiency. Leveraging LLM-based labeling further unlocks the opportunities to expand the query set, optimize sampling design, and efficiently assess a wider range of search experiences at scale. This approach leads to higher-quality relevance metrics and significantly reduces the Minimum Detectable Effect (MDE) in online experiment measurements.
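The insights below mention post-stratification as the bias correction over LLM labels. Here is a toy sketch of that idea: per-stratum mean relevance is reweighted by each stratum's known traffic share rather than its share of the sampled queries. Strata names and shares are made up for illustration.

# Hedged sketch: post-stratified relevance metric from LLM labels.
from collections import defaultdict

# (stratum, llm_relevance_score in [0, 1]) for sampled query-result pairs
labels = [("head", 0.9), ("head", 0.8), ("torso", 0.7), ("tail", 0.4), ("tail", 0.6)]
traffic_share = {"head": 0.5, "torso": 0.3, "tail": 0.2}  # known from query logs

by_stratum = defaultdict(list)
for stratum, score in labels:
    by_stratum[stratum].append(score)

metric = sum(traffic_share[s] * sum(v) / len(v) for s, v in by_stratum.items())
print(f"post-stratified relevance: {metric:.3f}")  # 0.735 vs naive mean 0.680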
AI Insights
  • Post‑stratification corrects bias in LLM relevance scores, a nuance absent from the abstract.
  • Fine‑tuned RankT5 outperforms vanilla LLMs on Pinterest queries, advancing ranking‑loss research.
  • Expanding the query pool gives a 30% sampling‑efficiency boost, widening experiment coverage.
  • Online experiments validate LLM judgments, linking offline metrics to real‑world impact.
  • Relevance Judgment: scoring result relevance to user intent, formalized with a probabilistic model.
  • Large Language Models: AI that generates language from context, here used for relevance scoring.
  • Recommended reading: “Sampling” by Thompson and Netflix’s sensitivity‑improvement case study.
September 03, 2025