Kuaishou Technology
Abstract
Modern search systems play a crucial role in facilitating information
acquisition. Traditional search engines typically rely on a cascaded
architecture, where results are retrieved through recall, pre-ranking, and
ranking stages. The complexity of designing and maintaining multiple modules
makes it difficult to achieve holistic performance gains. Recent advances in
generative recommendation have motivated the exploration of unified generative
search as an alternative. However, existing approaches are not genuinely
end-to-end: they typically train an item encoder to tokenize candidates first
and then optimize a generator separately, leading to objective inconsistency
and limited generalization. To address these limitations, we propose UniSearch,
a unified generative search framework for Kuaishou Search. UniSearch replaces
the cascaded pipeline with an end-to-end architecture that integrates a Search
Generator and a Video Encoder. The Generator produces semantic identifiers of
relevant items given a user query, while the Video Encoder learns latent item
embeddings and provides their tokenized representations. A unified training
framework jointly optimizes both components, enabling mutual enhancement and
improving representation quality and generation accuracy. Furthermore, we
introduce Search Preference Optimization (SPO), which leverages a reward model
and real user feedback to better align generation with user preferences.
Extensive experiments on industrial-scale datasets, together with online A/B
testing in both short-video and live search scenarios, demonstrate the strong
effectiveness and deployment potential of UniSearch. Notably, its deployment in
live search yielded the largest single-experiment improvement in our product's
recent history, highlighting its practical value for real-world
applications.
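The abstract describes the architecture only at a high level. Below is a minimal, self-contained sketch of how a jointly trained Video Encoder and Search Generator could look: the encoder quantizes latent item embeddings into semantic-ID tokens with a small residual codebook, and a toy GRU decoder learns to generate those tokens from a query embedding, with both losses optimized in a single step. The module names, dimensions, residual-quantization scheme, and GRU decoder are illustrative assumptions; UniSearch's actual models, tokenization, and the SPO alignment stage are not reproduced here.

```python
# Hedged sketch of joint encoder/generator training for generative search.
# Everything below (sizes, single-step loop, loss weights) is an assumption
# for illustration, not the UniSearch implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

CODEBOOK_SIZE = 256   # number of semantic-ID tokens per level (assumption)
NUM_LEVELS = 4        # semantic identifier length per item (assumption)
DIM = 64

class VideoEncoder(nn.Module):
    """Maps item features to a latent embedding and a semantic-ID token sequence
    via residual nearest-neighbour quantization (one codebook per level)."""
    def __init__(self, feat_dim=128, dim=DIM):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(feat_dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.codebooks = nn.ParameterList(
            [nn.Parameter(torch.randn(CODEBOOK_SIZE, dim) * 0.1) for _ in range(NUM_LEVELS)]
        )

    def forward(self, video_feats):
        z = self.proj(video_feats)                      # (B, dim) latent item embedding
        residual, tokens, quantized = z, [], 0.0
        for cb in self.codebooks:
            dist = torch.cdist(residual, cb)            # (B, K) distances to codewords
            idx = dist.argmin(dim=-1)                   # nearest codeword at this level
            code = cb[idx]
            tokens.append(idx)
            quantized = quantized + code
            residual = residual - code.detach()         # quantize what is left over
        tokens = torch.stack(tokens, dim=1)             # (B, NUM_LEVELS) semantic identifier
        z_q = z + (quantized - z).detach()              # straight-through estimator
        # codebook + commitment terms; a full system would add reconstruction or
        # query-item alignment losses on z_q (omitted here)
        commit_loss = F.mse_loss(quantized, z.detach()) + 0.25 * F.mse_loss(z, quantized.detach())
        return z_q, tokens, commit_loss

class SearchGenerator(nn.Module):
    """Autoregressively predicts an item's semantic-ID tokens conditioned on a
    query embedding (toy GRU decoder, an assumption)."""
    def __init__(self, query_dim=DIM, dim=DIM):
        super().__init__()
        self.token_emb = nn.Embedding(CODEBOOK_SIZE + 1, dim)   # +1 for a BOS token
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.q_to_h = nn.Linear(query_dim, dim)
        self.head = nn.Linear(dim, CODEBOOK_SIZE)

    def forward(self, query_emb, target_tokens):
        B = target_tokens.size(0)
        bos = torch.full((B, 1), CODEBOOK_SIZE, dtype=torch.long, device=target_tokens.device)
        inp = torch.cat([bos, target_tokens[:, :-1]], dim=1)    # teacher forcing
        h0 = torch.tanh(self.q_to_h(query_emb)).unsqueeze(0)    # query conditions the decoder
        out, _ = self.rnn(self.token_emb(inp), h0)
        logits = self.head(out)                                 # (B, NUM_LEVELS, K)
        return F.cross_entropy(logits.reshape(-1, CODEBOOK_SIZE), target_tokens.reshape(-1))

# One joint training step: the encoder's tokens serve as the generator's targets,
# so both components are optimized together rather than in separate stages.
encoder, generator = VideoEncoder(), SearchGenerator()
opt = torch.optim.Adam(list(encoder.parameters()) + list(generator.parameters()), lr=1e-3)

video_feats = torch.randn(8, 128)   # toy batch of item features
query_emb = torch.randn(8, DIM)     # toy batch of query embeddings

_, tokens, commit_loss = encoder(video_feats)
gen_loss = generator(query_emb, tokens)
loss = gen_loss + commit_loss
opt.zero_grad(); loss.backward(); opt.step()
```

At serving time the generator would be decoded, for example with beam search constrained to valid semantic identifiers, to retrieve items directly from a query, which is what allows a generative setup of this kind to replace the cascaded recall, pre-ranking, and ranking stages.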
AI Insights
- GRAM introduces a novel alignment mechanism that bridges generated candidates and query semantics, boosting retrieval accuracy.
- The generative component of GRAM produces candidate documents directly from queries, eliminating the need for pre‑built indexes.
- Alignment is achieved via a learned similarity scorer that reorders candidates, outperforming traditional BM25 baselines on TREC datasets.
- Evaluation on multiple benchmarks (MS MARCO, Natural Questions) shows a 5–7% MAP lift over state‑of‑the‑art generative models.
- Training GRAM requires GPU clusters; inference latency is ~50 ms per query, limiting real‑time deployment without optimization.
- Recommended reading: “Listwise Generative Retrieval Models via a Sequential Learning Process” (2024) for advanced training strategies.
- Core definition: Generative Retrieval – a retrieval paradigm that synthesizes candidate documents conditioned on the query; a minimal generate-then-rerank sketch follows below.
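To make the generate-then-rerank pattern in these bullets concrete, here is a small illustrative pipeline built from off-the-shelf components: a seq2seq model proposes candidate texts directly from the query, and a learned cross-encoder scorer reorders them. The specific checkpoints (t5-small, an MS MARCO cross-encoder) and the beam-search settings are assumptions chosen for illustration; GRAM's actual generator, identifier scheme, and alignment scorer are not specified in this digest.

```python
# Hedged sketch of a generate-then-rerank pipeline using generic pretrained models.
# The checkpoints and decoding settings are placeholders, not GRAM's components.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from sentence_transformers import CrossEncoder

query = "what causes aurora borealis"

# 1) Generative step: propose k candidate texts conditioned on the query.
tok = AutoTokenizer.from_pretrained("t5-small")
gen = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tok(query, return_tensors="pt")
outputs = gen.generate(
    **inputs, num_beams=8, num_return_sequences=8, max_new_tokens=48, early_stopping=True
)
candidates = [tok.decode(o, skip_special_tokens=True) for o in outputs]

# 2) Alignment step: a learned similarity scorer reorders the generated candidates.
scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = scorer.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]

for rank, cand in enumerate(reranked, 1):
    print(rank, cand)
```

In a system like the one the bullets describe, the generator would be trained to emit document identifiers or passages tied to an actual corpus, and the scorer would be the alignment model learned alongside it; the sketch substitutes generic pretrained models for both to keep the example runnable.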
Abstract
We regularly encounter information on novel, emerging topics for which the
body of knowledge is still evolving, which can be linked, for instance, to
current events. A primary way to learn more about such topics is through web
search. However, information on emerging topics is sparse and evolves
dynamically as knowledge grows, making it uncertain, variable in quality and
trustworthiness, and prone to deliberate or accidental manipulation,
misinformation, and bias. In this paper, we outline a research vision towards
search systems and interfaces that support effective knowledge acquisition,
awareness of the dynamic nature of topics, and responsible opinion formation
among people searching the web for information on emerging topics. To realize
this vision, we propose three overarching research questions, aimed at
understanding the status quo, determining requirements of systems aligned with
our vision, and building these systems. For each of the three questions, we
highlight relevant literature, including pointers on how each question could be
addressed. Lastly, we discuss the challenges that will potentially arise in
pursuing the proposed vision.