Paid Search
Alibaba Group
Abstract
Achievement. We introduce LORE, a systematic framework for Large Generative Model-based relevance in e-commerce search. Deployed and iterated over three years, LORE achieves a cumulative +27% improvement in online GoodRate metrics. This report shares the valuable experience gained throughout its development lifecycle, spanning data, features, training, evaluation, and deployment. Insight. While existing works apply Chain-of-Thought (CoT) to enhance relevance, they often hit a performance ceiling. We argue this stems from treating relevance as a monolithic task, lacking principled deconstruction. Our key insight is that relevance comprises distinct capabilities: knowledge and reasoning, multi-modal matching, and rule adherence. We contend that a capability-driven decomposition is essential for breaking through current performance bottlenecks. Contributions. LORE provides a complete blueprint for the LLM relevance lifecycle. Key contributions include: (1) A two-stage training paradigm combining progressive CoT synthesis via SFT with human preference alignment via RL. (2) A comprehensive benchmark, RAIR, designed to evaluate these core capabilities. (3) A query frequency-stratified deployment strategy that efficiently transfers offline LLM capabilities to the online system. LORE serves as both a practical solution and a methodological reference for other vertical domains.
AI Summary - The framework selects features to endow the model with necessary perception and reasoning skills spanning both visual and textual modalities. [3]
- The framework incorporates Stock Keeping Unit (SKU) information to capture fine-grained, purchasable variations such as different colors, sizes, or package contents. [3]
- The framework employs a scientific sampling strategy that adequately covers diverse e-commerce data distributions, thereby mitigating potential oversampling or undersampling bias. [3]
- The framework uses a robust data cleaning pipeline to systematically reduce noise and enhance overall dataset quality. [2]
- The framework uses a two-stage discrimination framework, where the model first analyzes the query's intent and attribute requirements, and then extracts relevant item attributes to render a judgment based on established rules. [1]
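The abstract describes the query frequency-stratified deployment only at a high level. One plausible, purely hypothetical reading is a router that serves head queries from relevance judgments pre-computed offline by the large model and falls back to a lightweight online model for tail queries; the sketch below assumes that cache-plus-fallback design, and all names, thresholds, and labels are illustrative rather than LORE's actual interface.

```python
from typing import Callable, Dict, Tuple

def build_relevance_router(
    offline_labels: Dict[Tuple[str, str], str],
    query_freq: Dict[str, int],
    online_model: Callable[[str, str], str],
    head_threshold: int = 1000,
):
    """Serve (query, item) relevance judgments, stratified by query frequency.

    Head queries (frequency >= head_threshold) are assumed to be pre-scored
    offline by the large model and answered from a lookup table; everything
    else falls back to a lightweight online model. This is an illustrative
    assumption, not the paper's implementation.
    """
    def judge(query: str, item: str) -> str:
        if query_freq.get(query, 0) >= head_threshold and (query, item) in offline_labels:
            return offline_labels[(query, item)]   # offline LLM judgment, served from cache
        return online_model(query, item)           # distilled / lightweight online model
    return judge

# Toy usage with stand-in data:
judge = build_relevance_router(
    offline_labels={("iphone 15 case", "sku-123"): "relevant"},
    query_freq={"iphone 15 case": 50_000},
    online_model=lambda q, i: "needs_online_scoring",
)
print(judge("iphone 15 case", "sku-123"))             # served from the offline cache
print(judge("rare collectible figurine", "sku-999"))  # served by the online model
```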
Personalization
The University of Quebec
Abstract
As large language models (LLMs) become increasingly capable of generating persuasive content, understanding their effectiveness across different advertising strategies becomes critical. This paper presents a two-part investigation examining LLM-generated advertising through complementary lenses: (1) personality-based and (2) psychological persuasion principles.
In our first study (n=400), we tested whether LLMs could generate personalized advertisements tailored to specific personality traits (openness and neuroticism) and how their performance compared to human experts. Results showed that LLM-generated ads achieved statistical parity with human-written ads (51.1% vs. 48.9%, p > 0.05), with no significant performance differences for matched personalities.
Building on these insights, our second study (n=800) shifted focus from individual personalization to universal persuasion, testing LLM performance across four foundational psychological principles: authority, consensus, cognition, and scarcity. AI-generated ads significantly outperformed human-created content, achieving a 59.1% preference rate (vs. 40.9%, p < 0.001), with the strongest performance in authority (63.0%) and consensus (62.5%) appeals. Qualitative analysis revealed AI's advantage stems from crafting more sophisticated, aspirational messages and achieving superior visual-narrative coherence. Critically, this quality advantage proved robust: even after applying a 21.2 percentage point detection penalty when participants correctly identified AI-origin, AI ads still outperformed human ads, and 29.4% of participants chose AI content despite knowing its origin. These findings demonstrate LLMs' evolution from parity in personalization to superiority in persuasive storytelling, with significant implications for advertising practice given LLMs' near-zero marginal cost and time requirements compared to human experts.
AI Summary - AI-generated advertisements achieved a dominant preference rate compared to human-created advertisements. [3]
- The performance of AI-generated content varied depending on the persuasion strategy employed, with strong results in Authority and Consensus conditions. [3]
- Identifying an advertisement as AI-generated influenced user preference, resulting in a bias against known AI content. [3]
- The results suggest that AI-generated content can be a viable alternative to traditional advertising methods, particularly in certain persuasion strategies. [3]
- LLM-generated ads can be competitive with human-written ads in terms of user engagement and purchase intent. [2]
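For readers who want to sanity-check the headline comparison, the reported 59.1% preference rate in the second study (n = 800) can be tested against an even split with an exact binomial test. This is a minimal sketch assuming one binary AI-vs-human choice per participant, which is an assumption about the design rather than a detail stated in the abstract.

```python
from scipy.stats import binomtest

n = 800                      # participants in study 2 (from the abstract)
k = round(0.591 * n)         # approximate number of AI-preferred choices
result = binomtest(k, n, p=0.5, alternative="two-sided")
print(f"k = {k}, p-value = {result.pvalue:.2e}")   # consistent with the reported p < 0.001
```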
University of West Florida
Abstract
AIVisor, an agentic retrieval-augmented LLM for student advising, was used to examine how personalization affects system performance across multiple evaluation dimensions. Using twelve authentic advising questions intentionally designed to stress lexical precision, we compared ten personalized and non-personalized system configurations and analyzed outcomes with a Linear Mixed-Effects Model across lexical (BLEU, ROUGE-L), semantic (METEOR, BERTScore), and grounding (RAGAS) metrics. Results showed a consistent trade-off: personalization reliably improved reasoning quality and grounding, yet introduced a significant negative interaction on semantic similarity, driven not by poorer answers but by the limits of current metrics, which penalize meaningful personalized deviations from generic reference texts. This reveals a structural flaw in prevailing LLM evaluation methods, which are ill-suited for assessing user-specific responses. The fully integrated personalized configuration produced the highest overall gains, suggesting that personalization can enhance system effectiveness when evaluated with appropriate multidimensional metrics. Overall, the study demonstrates that personalization produces metric-dependent shifts rather than uniform improvements and provides a methodological foundation for more transparent and robust personalization in agentic AI.
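The Linear Mixed-Effects Model analysis can be sketched with statsmodels: fixed effects for personalization, metric, and their interaction, plus a random intercept per advising question. The synthetic scores, column names, and effect sizes below are placeholders to show the model specification, not AIVisor's data or results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical long-format data: one row per (question, configuration, metric).
rows = []
for q in range(12):                                   # twelve advising questions
    for personalized in (0, 1):
        for metric in ("BLEU", "ROUGE_L", "METEOR", "BERTScore", "RAGAS"):
            base = {"BLEU": 0.3, "ROUGE_L": 0.4, "METEOR": 0.5,
                    "BERTScore": 0.8, "RAGAS": 0.6}[metric]
            score = base + 0.05 * personalized + rng.normal(0, 0.05)
            rows.append({"question": q, "personalized": personalized,
                         "metric": metric, "score": score})
df = pd.DataFrame(rows)

# Fixed effects for personalization, metric, and their interaction;
# a random intercept for each advising question.
model = smf.mixedlm("score ~ personalized * metric", df, groups=df["question"])
print(model.fit().summary())
```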
Direction on Data Science Organizations
UnB
Abstract
Data science initiatives frequently exhibit high failure rates, driven by technical constraints, organizational limitations, and insufficient risk management practices. Challenges such as low data maturity, lack of governance, misalignment between technical and business teams, and the absence of structured mechanisms to address ethical and sociotechnical risks have been widely identified in the literature. In this context, the purpose of this study is to conduct a comparative analysis of the main risk management methodologies applied to data science projects, aiming to identify, classify, and synthesize their similarities, differences, and existing gaps. An integrative literature review was performed using indexed databases and a structured protocol for selection and content analysis. The study examines widely adopted risk management standards (ISO 31000, PMBOK Risk Management, and the NIST RMF), as well as frameworks specific to data science workflows, such as CRISP-DM and the recently proposed DS EthiCo RMF, which incorporates ethical and sociotechnical dimensions into the project life cycle. The findings reveal that traditional approaches provide limited coverage of emerging risks, whereas contemporary models propose multidimensional structures capable of integrating ethical oversight, governance, and continuous monitoring. As a contribution, this work offers theoretical support for the development of hybrid frameworks that balance technical efficiency, organizational alignment, and responsible data practices, while highlighting research gaps that can guide future investigations.
AI Summary - Data Science: The study of extracting insights from large datasets using various techniques, including machine learning, statistics, and visualization. [3]
- An integrative literature review was conducted to analyze the application of risk management frameworks in data science projects. [2]
Waseda University
Abstract
Organizations struggle to share data across departments that have adopted different data analytics platforms. If n datasets must serve m environments, up to n × m replicas can emerge, increasing inconsistency and cost. Traditional warehouses copy data into vendor-specific stores, making cross-platform access difficult. This study proposes the Enterprise Data Science Platform (EDSP), which builds on data lakehouse architecture and follows a Write-Once, Read-Anywhere principle. EDSP enables federated data access for multi-query engine environments, targeting data science workloads with periodic data updates and query response times ranging from seconds to minutes. By providing centralized data management with federated access from multiple query engines to the same data sources, EDSP eliminates data duplication and vendor lock-in inherent in traditional data warehouses. The platform employs a four-layer architecture: Data Preparation, Data Store, Access Interface, and Query Engines. This design enforces separation of concerns and reduces the need for data migration when integrating additional analytical environments. Experimental results demonstrate that major cloud data warehouses and programming environments can directly query EDSP-managed datasets. We implemented and deployed EDSP in production, confirming interoperability across multiple query engines. For data sharing across different analytical environments, EDSP achieves a 33-44% reduction in operational steps compared with conventional approaches requiring data migration. Although query latency may increase by up to a factor of 2.6 compared with native tables, end-to-end completion times remain on the order of seconds, maintaining practical performance for analytical use cases. Based on our production experience, EDSP provides practical design guidelines for addressing the data-silo problem in multi-query engine environments.
AI Summary - The paper proposes the Enterprise Data Science Platform (EDSP), a unified data management architecture grounded in the Write-Once, Read-Anywhere principle, to address data management challenges in multi-query engine environments. [2]
- Write-Once, Read-Anywhere: a principle under which data is written once and read from multiple query engines without replication or duplication.
- The production deployment shows that the Write-Once, Read-Anywhere principle can be realized in practice, offering a practical solution to the long-standing problem of data silos in multi-query engine enterprises.
- Performance validation is limited so far; future work includes validation on TB-scale datasets.
- The paper draws on prior work on data lake architectures and metadata management, highlighting the role of metadata in data sharing across heterogeneous query engines.
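EDSP itself targets cloud warehouses and lakehouse table formats, but the Write-Once, Read-Anywhere idea can be illustrated on a laptop: a single Parquet file is written once and then queried by two different engines without copying it. The sketch below uses pandas and DuckDB purely as stand-in query engines (pandas Parquet I/O assumes pyarrow is installed); it illustrates the principle, not EDSP's four-layer architecture.

```python
import duckdb
import pandas as pd

# Write once: a single Parquet file acts as the shared, engine-neutral copy.
pd.DataFrame({"user_id": [1, 2, 3], "spend": [9.5, 3.0, 12.25]}).to_parquet("events.parquet")

# Read anywhere: two different "engines" query the same file without replicating it.
print(pd.read_parquet("events.parquet")["spend"].sum())                  # pandas
print(duckdb.sql("SELECT sum(spend) FROM 'events.parquet'").fetchall())  # DuckDB SQL
```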
Marketing Channels
ITMO University
Abstract
The future of digital marketing lies in the convergence of human creativity and generative AI, where insight, strategy, and storytelling are co-authored by intelligent systems. We present MindFuse, a brave new explainable generative AI framework designed to act as a strategic partner in the marketing process. Unlike conventional LLM applications that stop at content generation, MindFuse fuses CTR-based content signals with AI-guided co-creation, using large language models to extract, interpret, and iterate on communication narratives grounded in real advertising data. MindFuse operates across the full marketing lifecycle: from distilling content pillars and customer personas from competitor campaigns to recommending in-flight optimizations based on live performance telemetry. It uses attention-based explainability to diagnose ad effectiveness and guide content iteration, while aligning messaging with strategic goals through dynamic narrative construction and storytelling. We introduce a new paradigm in GenAI for marketing, where LLMs not only generate content but reason through it, adapt campaigns in real time, and learn from audience engagement patterns. Our results, validated in agency deployments, demonstrate up to 12 times efficiency gains, setting the stage for future integration with empirical audience data (e.g., GWI, Nielsen) and full-funnel attribution modeling. MindFuse redefines AI not just as a tool, but as a collaborative agent in the creative and strategic fabric of modern marketing.
AI Summary - The paper highlights the importance of AI in advertising and marketing, the role of large language models in transforming the industry, and the continued need for human creativity in storytelling.
- AI co-creation: the process of using AI to generate creative ideas or content, often in collaboration with humans. [3]
- AI is becoming increasingly important in advertising and marketing, but it still requires human input for effective storytelling. [2]
Bidding
Criteo
Abstract
We model a procurement scenario in which two imperfect bidders act simultaneously on behalf of a single buyer, a configuration common in display advertising and referred to as side-by-side bidding, but largely unexplored in theory. We prove that the iterated best-response algorithm converges to an equilibrium under standard distributional assumptions and provide a sufficient condition for uniqueness. Beyond establishing existence and convergence, our analysis provides a tractable numerical method for quantitative studies of side-by-side procurement.
AI Summary - Imperfect bidders exacerbate the winner's curse issue by making noise-influenced bids that can cause overbidding and higher costs for the buyer. [3]
- Imperfect bidders: Bidders that bid a noisy version of their best response to the competition. [3]
- Log-concavity: A function f is log-concave if its logarithm is concave. [3]
- The iterated best-response algorithm converges to an equilibrium under standard distributional assumptions in side-by-side first-price auctions with imperfect bidders. [2]
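The paper's iterated best-response algorithm operates on full bidding strategies; as a deliberately simplified illustration, the sketch below restricts both sellers to constant-markup strategies over Uniform(0, 1) costs in a first-price procurement auction (lowest bid wins) and iterates numerically estimated best responses. The parametric restriction, grid search, and Monte Carlo estimation are simplifications introduced here, and the noise that makes bidders imperfect in the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def best_response_markup(opp_markup, n_samples=100_000, grid=np.linspace(0.0, 1.0, 201)):
    """Approximate the profit-maximizing constant markup against an opponent who
    bids cost + opp_markup, with i.i.d. Uniform(0, 1) costs and lowest bid winning."""
    c_own = rng.uniform(0.0, 1.0, n_samples)
    c_opp = rng.uniform(0.0, 1.0, n_samples)
    best_m, best_profit = 0.0, -np.inf
    for m in grid:
        wins = (c_own + m) < (c_opp + opp_markup)
        profit = np.mean(m * wins)      # expected margin earned when winning
        if profit > best_profit:
            best_m, best_profit = m, profit
    return best_m

# Iterated best response: each seller repeatedly best-responds to the other.
m1, m2 = 0.9, 0.1   # arbitrary starting markups
for step in range(10):
    m1 = best_response_markup(m2)
    m2 = best_response_markup(m1)
    print(f"step {step}: m1 = {m1:.3f}, m2 = {m2:.3f}")   # markups settle near a fixed point
```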
Google
Abstract
Online platforms connect users with relevant products and services using ads. A key challenge is that a user's search query often leaves their true intent ambiguous. Typically, platforms passively predict relevance based on available signals and in some cases offer query refinements. The shift from traditional search to conversational AI provides a new approach. When a user's query is ambiguous, a Large Language Model (LLM) can proactively offer several clarifying follow-up prompts. In this paper we consider the following: what if some of these follow-up prompts can be "sponsored," i.e., selected for their advertising potential? How should these "suggestion slots" be allocated? And how does this new mechanism interact with the traditional ad auction that might follow?
This paper introduces a formal model for designing and analyzing these interactive platforms. We use this model to investigate a critical engineering choice: whether it is better to build an end-to-end pipeline that jointly optimizes the user interaction and the final ad auction, or to decouple them into separate mechanisms, one for the suggestion slots and another for the subsequent ad slot. We show that the VCG mechanism can be adapted to jointly optimize the sponsored suggestion and the ads that follow; while this mechanism is more complex, it achieves outcomes that are efficient and truthful. On the other hand, we prove that the simple-to-implement modular approach suffers from strategic inefficiency: its Price of Anarchy is unbounded.
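As a generic reminder of why a joint mechanism can be made truthful, the sketch below implements the textbook VCG rule: pick the welfare-maximizing outcome and charge each bidder the externality it imposes on the others. The outcome labels and toy values are hypothetical, and the paper's joint mechanism over sponsored suggestions plus the subsequent ad auction is more involved than this generic version.

```python
def vcg(outcomes, values):
    """Textbook VCG: choose the welfare-maximizing outcome, charge externality payments.

    outcomes: list of outcome labels.
    values:   dict bidder -> dict outcome -> value.
    """
    def welfare(outcome, excluded=None):
        return sum(v[outcome] for b, v in values.items() if b != excluded)

    chosen = max(outcomes, key=welfare)
    payments = {}
    for b in values:
        best_without_b = max(welfare(o, excluded=b) for o in outcomes)
        payments[b] = best_without_b - welfare(chosen, excluded=b)
    return chosen, payments

# Toy example: two advertisers value bundles of (suggestion slot, ad slot) differently.
outcomes = ["A_suggestion_B_ad", "B_suggestion_A_ad", "A_both", "B_both"]
values = {
    "A": {"A_suggestion_B_ad": 3, "B_suggestion_A_ad": 4, "A_both": 6, "B_both": 0},
    "B": {"A_suggestion_B_ad": 5, "B_suggestion_A_ad": 2, "A_both": 0, "B_both": 7},
}
print(vcg(outcomes, values))   # -> ('A_suggestion_B_ad', {'A': 2, 'B': 3})
```

Truthfulness follows from the usual VCG argument: each bidder's net utility equals its marginal contribution to reported welfare, so misreporting cannot help.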
Attribution
Hertie School
Abstract
Explanations are a fundamental element of how people make sense of the political world. Citizens routinely ask and answer questions about why events happen, who is responsible, and what could or should be done differently. Yet despite their importance, explanations remain an underdeveloped object of systematic analysis in political science, and existing approaches are fragmented and often issue-specific. I introduce a framework for detecting and parsing explanations in political text. To do this, I train a lightweight causal language model that returns a structured data set of causal claims in the form of cause-effect pairs for downstream analysis. I demonstrate how causal explanations can be studied at scale, and show the method's modest annotation requirements, generalizability, and accuracy relative to human coding.
AI Summary - The study analyzed the relationship between language and causal relationships in news headlines using a dataset of 10,000 news articles. [3]
- The log-odds analysis revealed significant differences in the odds of certain tokens appearing as causes or effects across different sources and locations. [3]
- The study's findings suggest that language plays a crucial role in shaping our understanding of causal relationships in news headlines. [3]
- The systematic bias in confidence between models highlights the importance of considering multiple perspectives when analyzing complex data. [3]
- The results showed that the span model consistently reported higher confidence than the sequence-level model, indicating a systematic bias in confidence. [2]
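To make the downstream analysis concrete, the hedged sketch below shows one way a structured set of cause-effect pairs could be represented and queried, e.g., a smoothed log-odds score for whether a token occurs more often in cause spans than in effect spans. The schema, example claims, and add-alpha smoothing are illustrative assumptions, not the paper's exact dataset or estimator.

```python
from dataclasses import dataclass
from collections import Counter
import math

@dataclass
class CausalClaim:
    # hypothetical schema for one extracted cause-effect pair
    document_id: str
    cause: str
    effect: str

claims = [
    CausalClaim("doc1", "rising prices", "public discontent"),
    CausalClaim("doc2", "new housing policy", "rising prices"),
    CausalClaim("doc3", "prolonged drought", "rising prices"),
]

def log_odds_cause_vs_effect(claims, token, alpha=1.0):
    """Add-alpha smoothed log-odds that a token occurs in cause spans vs effect spans."""
    cause_counts = Counter(t for c in claims for t in c.cause.split())
    effect_counts = Counter(t for c in claims for t in c.effect.split())
    vocab = set(cause_counts) | set(effect_counts)
    p_cause = (cause_counts[token] + alpha) / (sum(cause_counts.values()) + alpha * len(vocab))
    p_effect = (effect_counts[token] + alpha) / (sum(effect_counts.values()) + alpha * len(vocab))
    return math.log(p_cause / p_effect)

print(f"log-odds('prices', cause vs effect) = {log_odds_cause_vs_effect(claims, 'prices'):.2f}")
```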
University of Glasgow
Abstract
As generative models become powerful, concerns around transparency, accountability, and copyright violations have intensified. Understanding how specific training data contributes to a model's output is critical. We introduce a framework for interpreting generative outputs through the automatic construction of ontology-aligned knowledge graphs (KGs). While automatic KG construction from natural text has advanced, extracting structured and ontology-consistent representations from visual content remains challenging due to the richness and multi-object nature of images. Leveraging multimodal large language models (LLMs), our method extracts structured triples from images, aligned with a domain-specific ontology. By comparing the KGs of generated and training images, we can trace potential influences, enabling copyright analysis, dataset transparency, and interpretable AI. We validate our method through experiments on locally trained models via unlearning, and on large-scale models through a style-specific experiment. Our framework supports the development of AI systems that foster human collaboration and creativity and stimulate curiosity.
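The abstract does not specify how the KG of a generated image is compared with the KGs of training images; a simple assumed starting point is set overlap over extracted (subject, predicate, object) triples, which can then rank training images by potential influence. The triples and scores below are purely illustrative.

```python
def triple_jaccard(kg_a, kg_b):
    """Jaccard overlap between two sets of (subject, predicate, object) triples."""
    a, b = set(kg_a), set(kg_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

generated_kg = {
    ("person", "wears", "red_hat"),
    ("person", "stands_on", "beach"),
    ("sun", "above", "sea"),
}
training_kgs = {
    "train_img_001": {("person", "wears", "red_hat"), ("person", "holds", "umbrella")},
    "train_img_002": {("sun", "above", "sea"), ("person", "stands_on", "beach")},
}

# Rank training images by how much their KG overlaps with the generated image's KG.
for name, kg in sorted(training_kgs.items(),
                       key=lambda kv: triple_jaccard(generated_kg, kv[1]),
                       reverse=True):
    print(name, round(triple_jaccard(generated_kg, kg), 3))
```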