Amazon Web Services
Abstract
Error attribution in Large Language Model (LLM) multi-agent systems presents
a significant challenge in debugging and improving collaborative AI systems.
Current approaches to pinpointing agent- and step-level failures in interaction
traces, whether using all-at-once evaluation, step-by-step analysis, or binary
search, fall short when analyzing complex patterns, struggling with both
accuracy and consistency. We present ECHO (Error attribution through Contextual
Hierarchy and Objective consensus analysis), a novel algorithm that combines
hierarchical context representation, objective analysis-based evaluation, and
consensus voting to improve error attribution accuracy. Our approach leverages
a position-based layering of contextual understanding while maintaining
objective evaluation criteria, ultimately reaching conclusions through a
consensus mechanism. Experimental results demonstrate that ECHO outperforms
existing methods across various multi-agent interaction scenarios, showing
particular strength in cases involving subtle reasoning errors and complex
interdependencies. Our findings suggest that combining structured, hierarchical
context representation with consensus-based objective decision-making provides
a more robust framework for error attribution in multi-agent systems.
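As a rough illustration of the consensus step described above, the Python sketch below aggregates per-level attributions by majority vote. The helper `evaluate_trace`, the `context_levels` parameter, and the (agent, step) output format are hypothetical stand-ins, not ECHO's actual interface.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch of consensus-based error attribution.
# `evaluate_trace` stands in for an LLM-backed evaluator that inspects the
# interaction trace at one level of hierarchical context and names the
# (agent, step) it blames; neither the name nor the signature comes from the paper.

def consensus_attribution(
    trace: List[Dict],
    evaluate_trace: Callable[[List[Dict], int], Tuple[str, int]],
    context_levels: List[int],
) -> Tuple[str, int]:
    """Run one evaluation per context level and majority-vote the results."""
    votes = Counter(evaluate_trace(trace, level) for level in context_levels)
    # The (agent, step) pair that most evaluations agreed on wins.
    return votes.most_common(1)[0][0]
```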
AI Insights
- The appendix details ECHO's algorithm, prompts, and code, ensuring full reproducibility.
- It follows the NeurIPS Code of Ethics, transparently noting limits and societal impacts.
- ECHO is benchmarked on Who&When with Anthropic LLMs, and its GitHub repo is public.
- No human subjects were used, so IRB approval and risk disclosure were unnecessary.
- The paper references Ethics Guidelines for NeurIPS, Paperswithcode datasets, and Coursera's Ethics course.
- A noted weakness is the lack of safeguards for responsible release of high-risk models.
- ECHO's consensus voting is formally described, offering a new objective decision framework for hierarchical AI debugging.
Abstract
In this paper, we present the first large-scale study exploring whether
JavaScript code generated by Large Language Models (LLMs) can reveal which
model produced it, enabling reliable authorship attribution and model
fingerprinting. With the rapid rise of AI-generated code, attribution is
playing a critical role in detecting vulnerabilities, flagging malicious
content, and ensuring accountability. While AI-vs-human detection usually
treats AI as a single category, we show that individual LLMs leave unique
stylistic signatures, even among models belonging to the same family or
parameter size. To this end, we introduce LLM-NodeJS, a dataset of 50,000
Node.js back-end programs from 20 large language models. Each has four
transformed variants, yielding 250,000 unique JavaScript samples and two
additional representations (JSIR and AST) for diverse research applications.
Using this dataset, we benchmark traditional machine learning classifiers
against fine-tuned Transformer encoders and introduce CodeT5-JSA, a custom
architecture derived from the 770M-parameter CodeT5 model with its decoder
removed and a modified classification head. It achieves 95.8% accuracy on
five-class attribution, 94.6% on ten-class, and 88.5% on twenty-class tasks,
surpassing other tested models such as BERT, CodeBERT, and Longformer. We
demonstrate that classifiers capture deeper stylistic regularities in program
dataflow and structure, rather than relying on surface-level features. As a
result, attribution remains effective even after mangling, comment removal, and
heavy code transformations. To support open science and reproducibility, we
release the LLM-NodeJS dataset, Google Colab training scripts, and all related
materials on GitHub: https://github.com/LLM-NodeJS-dataset.
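As a rough sketch of the encoder-only setup described above (CodeT5 with its decoder removed and a classification head on top), the snippet below uses Hugging Face's T5EncoderModel. The mean-pooling strategy, dropout rate, and the `Salesforce/codet5-large` checkpoint name are assumptions for illustration, not the paper's exact CodeT5-JSA configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

# Assumed sketch in the spirit of CodeT5-JSA: load only CodeT5's encoder
# (the decoder is never instantiated) and attach a small classification head.
class CodeT5Classifier(nn.Module):
    def __init__(self, num_labels: int, model_name: str = "Salesforce/codet5-large"):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        hidden = self.encoder.config.d_model
        self.head = nn.Sequential(nn.Dropout(0.1), nn.Linear(hidden, num_labels))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool token states over non-padding positions (an assumed pooling choice).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-large")
model = CodeT5Classifier(num_labels=20)  # e.g., the twenty-class attribution task
batch = tokenizer(["const express = require('express');"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 20)
```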