Hi!

Your personalized paper recommendations for 05 to 09 January 2026.
Wuhan University
Abstract
With the rapid development of e-commerce, auto-bidding has become a key asset in optimizing advertising performance under diverse advertiser environments. Current approaches focus on reinforcement learning (RL) and generative models; these efforts imitate offline historical behaviors using complex architectures with expensive hyperparameter tuning, and suboptimal trajectories in the data further exacerbate the difficulty of policy learning. To address these challenges, we propose QGA, a novel Q-value regularized Generative Auto-bidding method. In QGA, we plug a Q-value regularization with a double Q-learning strategy into the Decision Transformer (DT) backbone. This design enables joint optimization of policy imitation and action-value maximization, allowing the learned bidding policy to leverage experience from the dataset while alleviating the adverse impact of suboptimal trajectories. Furthermore, to safely explore the policy space beyond the data distribution, we propose a Q-value guided dual-exploration mechanism in which the DT model is conditioned on multiple return-to-go targets and locally perturbed actions. The entire exploration process is dynamically guided by the Q-value module, which provides a principled evaluation of each candidate action. Experiments on public benchmarks and simulation environments demonstrate that QGA consistently achieves superior or highly competitive results compared to existing alternatives. Notably, in large-scale real-world A/B testing, QGA achieves a 3.27% increase in Ad GMV and a 2.49% improvement in Ad ROI.
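To make the dual-exploration idea concrete, here is a minimal sketch (not the authors' code) of Q-value guided candidate selection; `dt_policy`, `q1`, and `q2` are hypothetical stand-ins for the Decision Transformer policy and the two critics of the double Q-learning module.

```python
import torch

def q_guided_action(dt_policy, q1, q2, state, history,
                    rtg_targets, noise_std=0.05, n_perturb=4):
    """Sketch: propose actions under several return-to-go conditions plus
    local perturbations, then keep the candidate with the highest
    conservative Q-value (min over two critics, as in double Q-learning)."""
    candidates = []
    for rtg in rtg_targets:
        a = dt_policy(history, state, rtg)   # DT conditioned on this RTG
        candidates.append(a)
        for _ in range(n_perturb):           # local exploration around a
            candidates.append(a + noise_std * torch.randn_like(a))
    cand = torch.stack(candidates)                        # (N, act_dim)
    states = state.unsqueeze(0).expand(cand.size(0), -1)  # (N, state_dim)
    q_vals = torch.min(q1(states, cand), q2(states, cand)).squeeze(-1)
    return cand[q_vals.argmax()]
```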
Why are we recommending this paper?
Due to your Interest in Bidding

This paper directly addresses auto-bidding, a key area of interest within paid search optimization. Its use of reinforcement learning and generative models aligns with your focus on data science management and bidding strategies.
China Telecom
Abstract
This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates user emotional states and their underlying causes into the model's internal reasoning process, enabling emotion-aware reasoning to be internalized rather than treated as explicit supervision. The model is trained with a two-stage progressive strategy. The first stage performs speech-text alignment and emotional attribute modeling via self-distillation, while the second stage conducts end-to-end cross-modal joint optimization to ensure consistency between textual and spoken emotional expressions. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance across emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.
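As a concrete illustration of the data construction idea, here is a sketch under our own assumptions (the `<think>` tag format and field names are hypothetical, not from the paper): an IEAT-style sample writes the user's emotion and its cause into the model's reasoning span rather than supervising it as a separate label.

```python
def build_ieat_sample(user_utterance, emotion, cause, reply):
    """Sketch of an IEAT-style training sample: the user's emotional state
    and its inferred cause are placed inside the model's internal reasoning
    span rather than treated as explicit supervision."""
    thinking = (f"The user sounds {emotion} because of {cause}. "
                "Acknowledge this before answering.")
    return {
        "input": user_utterance,
        "target": f"<think>{thinking}</think>{reply}",  # tag format assumed
    }

sample = build_ieat_sample(
    "My order still hasn't arrived!",
    emotion="frustrated",
    cause="a delayed delivery",
    reply="I'm sorry about the delay. Let me check the status for you.",
)
```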
Why are we recommending this paper?
Due to your Interest in Attribution

The focus on emotional attribution within a language model is highly relevant to personalization and CRM optimization. This research could provide valuable insights into understanding and responding to customer sentiment.
Université Paris Cité
Abstract
In Bayesian single-item auctions, a monotone bidding strategy (one that prescribes a higher bid for a higher value type) can be equivalently represented as a partition of the quantile space into consecutive intervals corresponding to increasing bids. Kumar et al. (2024) prove that agile online gradient descent (OGD), when used to update a monotone bidding strategy through its quantile representation, is strategically robust in repeated first-price auctions: when all bidders employ agile OGD in this way, the auctioneer's average revenue per round is at most the revenue of Myerson's optimal auction, regardless of how she adjusts the reserve price over time. In this work, we show that this strategic robustness guarantee is not unique to agile OGD or to the first-price auction: any no-regret learning algorithm, when fed gradient feedback with respect to the quantile representation, is strategically robust, even if the auction format changes every round, provided the format satisfies allocation monotonicity and voluntary participation. In particular, the multiplicative weights update (MWU) algorithm simultaneously achieves the optimal regret guarantee and the best-known strategic robustness guarantee. At a technical level, our results are established via a simple relation that bridges Myerson's auction theory and standard no-regret learning theory. This showcases the potential of translating standard regret guarantees into strategic robustness guarantees for specific games, without explicitly minimizing any form of swap regret.
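For intuition, here is a minimal sketch of an online update on the quantile representation; the breakpoints, projection, and gradient are illustrative placeholders, and plain OGD stands in for any of the no-regret algorithms the paper covers.

```python
import numpy as np

def project(q):
    """Keep breakpoints a valid increasing partition of the quantile space."""
    return np.clip(np.sort(q), 0.0, 1.0)

def ogd_step(q, grad, eta=0.1):
    """One no-regret update on the quantile representation: q[i] is the
    quantile at which the bidder switches from bid level i to level i+1."""
    return project(q - eta * grad)

# toy usage: three bid levels -> two breakpoints in [0, 1]
q = np.array([0.3, 0.7])
grad = np.array([0.05, -0.02])  # gradient of per-round loss w.r.t. q
q = ogd_step(q, grad)
```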
Why are we recommending this paper?
Due to your Interest in Bidding

The work on Bayesian single-item auctions and robust learning is directly applicable to bidding strategies, a core interest of yours. Its focus on optimal policies aligns with your interest in data science management.
Northeastern University
Abstract
We study the problem of personalization in large language models (LLMs). Prior work predominantly represents user preferences as implicit, model-specific vectors or parameters, yielding opaque "black-box" profiles that are difficult to interpret and transfer across models and tasks. In contrast, we advocate natural language as a universal, model- and task-agnostic interface for preference representation. This formulation leads to interpretable and reusable preference descriptions, while naturally supporting continual evolution as new interactions are observed. To learn such representations, we introduce a two-stage training framework that combines supervised fine-tuning on high-quality synthesized data with reinforcement learning to optimize long-term utility and cross-task transferability. Based on this framework, we develop AlignXplore+, a universal preference reasoning model that generates textual preference summaries. Experiments on nine benchmarks show that our 8B model achieves state-of-the-art performance, outperforming substantially larger open-source models, while exhibiting strong transferability across tasks, model families, and interaction formats.
Why are we recommending this paper?
Due to your Interest in Personalization

This paper tackles personalization directly, exploring the use of LLMs for transferable preference profiles. Given your interest in personalization and CRM optimization, this research offers a potentially transformative approach.
Nanjing University of Science and Technology
Abstract
Optimization is fundamental across numerous disciplines, typically following an iterative process of refining an initial solution to enhance performance. This principle is equally critical in prompt engineering, where designing effective prompts for large language models constitutes a complex optimization challenge. A structured optimization approach requires automated or semi-automated procedures to develop improved prompts, thereby reducing manual effort, improving performance, and yielding an interpretable process. However, current prompt optimization methods often induce prompt drift, where new prompts fix prior failures but impair performance on previously successful tasks. Additionally, generating prompts from scratch can compromise interpretability. To address these limitations, this study proposes the Hierarchical Attribution Prompt Optimization (HAPO) framework, which introduces three innovations: (1) a dynamic attribution mechanism targeting error patterns in training data and prompting history, (2) semantic-unit optimization for editing functional prompt segments, and (3) multimodal-friendly progression supporting both end-to-end LLM and LLM-MLLM workflows. Applied in contexts like single/multi-image QA (e.g., OCRV2) and complex task analysis (e.g., BBH), HAPO demonstrates enhanced optimization efficiency, outperforming comparable automated prompt optimization methods and establishing an extensible paradigm for scalable prompt engineering.
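A rough sketch of how such a loop could look (our reading, not the authors' implementation; `evaluate`, `attribute`, and `edit_segment` are hypothetical callbacks): the prompt is kept as named functional segments, failures are attributed to one segment, and an edit is accepted only if previously passing cases still pass, which is the guard against prompt drift.

```python
def optimize_prompt(segments, train_set, evaluate, attribute, edit_segment,
                    rounds=5):
    """HAPO-style loop (sketch): attribute failures to one functional
    segment, edit only that segment, and accept the edit only if every
    previously passing case still passes (the guard against prompt drift)."""
    for _ in range(rounds):
        results = {ex["id"]: evaluate(segments, ex) for ex in train_set}
        passing = {i for i, ok in results.items() if ok}
        failures = [ex for ex in train_set if not results[ex["id"]]]
        if not failures:
            break
        target = attribute(failures, segments)       # segment to blame
        candidate = dict(segments)
        candidate[target] = edit_segment(segments[target], failures)
        new_results = {ex["id"]: evaluate(candidate, ex) for ex in train_set}
        if all(new_results[i] for i in passing):     # no regressions
            segments = candidate
    return segments
```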
Why are we recommending this paper?
Due to your Interest in Attribution

The research on prompt optimization aligns with the user's interest in large language models and their application within marketing channels. This work provides a framework for improving the effectiveness of prompts, a key element in personalization strategies.
University of
Abstract
We introduce a geometric theory of payment channel networks that centers the polytope $W_G$ of feasible wealth distributions; liquidity states $L_G$ project onto $W_G$ via strict circulations. A payment is feasible iff the post-transfer wealth stays in $W_G$. This yields a simple throughput law: if $\zeta$ is the on-chain settlement bandwidth and $\rho$ the expected fraction of infeasible payments, the sustainable off-chain bandwidth satisfies $S = \zeta/\rho$. Feasibility admits a cut-interval view: for any node set $S$, the wealth of $S$ must lie in an interval whose width equals the cut capacity $C(\delta(S))$. Using this, we show how multi-party channels (coinpools / channel factories) expand $W_G$. Modeling a $k$-party channel as a $k$-uniform hyperedge widens every cut in expectation, so $W_G$ grows monotonically with $k$; for single nodes the expected accessible wealth scales linearly with $k/n$. We also analyze depletion. Under linear, asymmetric fees, cost-minimizing flow within a wealth fiber pushes cycles to the boundary, generically depleting channels except for a residual spanning forest. Three mitigation levers follow: (i) symmetric fees per direction, (ii) convex/tiered fees (effective flow control but at odds with source routing without liquidity disclosure), and (iii) coordinated replenishment (choose an optimal circulation within a fiber). Together, these results explain why two-party meshes struggle to scale and why multi-party primitives are more capital-efficient, yielding higher expected payment bandwidth. They also show how fee design and coordination keep operation inside the feasible region, improving reliability.
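To unpack the throughput law: if every infeasible payment must fall back to an on-chain settlement, the settlement rate $S\rho$ cannot exceed $\zeta$, giving $S = \zeta/\rho$ at capacity. A worked example with illustrative numbers (we write $A$ for the node set to avoid clashing with the bandwidth $S$, and the lower endpoint notation is ours):

```latex
% Illustrative numbers, not from the paper.
\[
  S = \frac{\zeta}{\rho}, \qquad
  \zeta = 10\ \text{settlements/s},\quad \rho = 0.02
  \;\Longrightarrow\; S = \frac{10}{0.02} = 500\ \text{payments/s}.
\]
% Cut-interval view: the wealth w(A) of a node set A is confined to an
% interval whose width is the cut capacity C(\delta(A)).
\[
  w(A) \in \left[\, \underline{w}(A),\ \underline{w}(A) + C(\delta(A)) \,\right].
\]
```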
Why are we recommending this paper?
Due to your Interest in Marketing Channels
Meta
Abstract
Recent years have witnessed the success of sequential modeling, generative recommenders, and large language models for recommendation. Although the scaling law has been validated for sequential models, they are computationally inefficient in real-world applications like recommendation due to the non-linear (quadratic) scaling of the transformer model. To improve the efficiency of sequential models, we introduce a novel approach to sequential recommendation that leverages personalization techniques to enhance efficiency and performance. Our method compresses long user interaction histories into learnable tokens, which are then combined with recent interactions to generate recommendations. This approach significantly reduces computational costs while maintaining high recommendation accuracy. Our method can be applied to existing transformer-based recommendation models, e.g., HSTU and HLLM. Extensive experiments on multiple sequential models demonstrate its versatility and effectiveness. Source code is available at https://github.com/facebookresearch/PerSRec.
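A minimal sketch of the compression idea in PyTorch (our illustration; layer sizes and names are assumptions, not from the paper or the released code): a few learnable tokens cross-attend over the long history, and only the short summary-plus-recent sequence reaches the quadratic transformer.

```python
import torch
import torch.nn as nn

class HistoryCompressor(nn.Module):
    """Sketch: learnable tokens cross-attend over the long interaction
    history; the summary tokens are concatenated with recent-item
    embeddings so the transformer sees a short sequence only."""
    def __init__(self, d_model=64, n_tokens=8, n_heads=4):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(n_tokens, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, long_history, recent):
        # long_history: (B, T_long, d), recent: (B, T_recent, d)
        B = long_history.size(0)
        queries = self.tokens.unsqueeze(0).expand(B, -1, -1)
        summary, _ = self.attn(queries, long_history, long_history)
        return torch.cat([summary, recent], dim=1)  # (B, n_tokens+T_recent, d)

# toy usage: 500 past interactions compressed to 8 tokens
comp = HistoryCompressor()
seq = comp(torch.randn(2, 500, 64), torch.randn(2, 20, 64))
```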
Why are we recommending this paper?
Due to your Interest in Personalization
Renmin University of China
AI Insights
  • SmartSearch is a framework designed to optimize the quality of intermediate search queries through two key mechanisms: Process rewards and Query refinement. [3]
  • SmartSearch uses a three-stage curriculum learning framework that guides the agent through a progression from imitation, to alignment, and ultimately to generalization. [3]
  • Experiments across four challenging benchmarks demonstrate that SmartSearch consistently surpasses existing baselines, with further quantitative analyses confirming significant gains in both search efficiency and query quality. [3]
  • Process rewards: Fine-grained supervision for the quality of each query through Dual-Level Credit Assessment. [3]
  • Query refinement: Promoting the optimization of query generation by selectively refining low-quality queries and regenerating subsequent search rounds from these refined points. [3]
  • The paper assumes that the teacher model is accurate, which may not always be the case in real-world scenarios. [3]
  • The approach relies on a three-stage curriculum learning framework, which may not be suitable for all applications or domains. [3]
  • SmartSearch is a robust framework that can effectively optimize intermediate search queries, achieving significant gains in both search efficiency and query quality. [2]
Abstract
Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems by incorporating information retrieval capabilities. Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked. As a result, the generated queries often remain inaccurate, leading to unexpected retrieval results and ultimately limiting search agents' overall effectiveness. To mitigate this issue, we introduce SmartSearch, a framework built upon two key mechanisms: (1) Process rewards, which provide fine-grained supervision for the quality of each intermediate search query through Dual-Level Credit Assessment. (2) Query refinement, which promotes the optimization of query generation by selectively refining low-quality search queries and regenerating subsequent search rounds based on these refinements. To enable the search agent to progressively internalize the ability to improve query quality under the guidance of process rewards, we design a three-stage curriculum learning framework. This framework guides the agent through a progression from imitation, to alignment, and ultimately to generalization. Experimental results show that SmartSearch consistently surpasses existing baselines, and additional quantitative analyses further confirm its significant gains in both search efficiency and query quality. The code is available at https://github.com/MYVAE/SmartSearch.
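For concreteness, a sketch of the selective refinement step (our reading; `score_query`, `refine_query`, and `rerun_from` are hypothetical stand-ins for the process-reward model and the agent's rollout):

```python
def refine_trajectory(rounds, score_query, refine_query, rerun_from,
                      threshold=0.5):
    """SmartSearch-style refinement (sketch): score each intermediate
    query with a process reward, rewrite the first low-quality one, and
    regenerate all subsequent search rounds from that point."""
    for i, rnd in enumerate(rounds):
        if score_query(rnd["query"], rnd["results"]) < threshold:
            better = refine_query(rnd["query"], rnd["results"])
            return rounds[:i] + rerun_from(better)   # regenerate the tail
    return rounds
```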
Why are we recommending this paper?
Due to your Interest in Paid Search
TH Köln University of Applied Sciences
Abstract
Despite their important role in online information search, search query suggestions have not been researched as much as most other aspects of search engines. Although the reasons for this are multi-faceted, the sparseness of context and the limited data basis of up to ten suggestions per search query pose the most significant problem in identifying bias in search query suggestions. The most proven method so far to reduce sparseness and improve the validity of bias identification is to consider suggestions from subsequent searches over time for the same query. This work presents a new, alternative approach to search query bias identification that reaches beyond the visible top-level suggestions to deepen the data basis of bias analyses. We employ recursive algorithm interrogation techniques and create suggestion trees that enable access to more subliminal search query suggestions. Based on these suggestions, we investigate topical group bias in person-related searches in the political domain.
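A minimal sketch of the recursive interrogation idea (our illustration; `fetch_suggestions` is a hypothetical stand-in for a rate-limited suggestion API):

```python
def build_suggestion_tree(query, fetch_suggestions, depth=2, breadth=10):
    """Recursive algorithm interrogation (sketch): re-query the suggestion
    endpoint with each returned suggestion to surface suggestions that
    never appear in the visible top-10 for the original query."""
    node = {"query": query, "children": []}
    if depth > 0:
        for s in fetch_suggestions(query)[:breadth]:
            if s != query:                           # skip trivial self-loops
                node["children"].append(
                    build_suggestion_tree(s, fetch_suggestions,
                                          depth - 1, breadth))
    return node
```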
Why are we recommending this paper?
Due to your Interest in Paid Search

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • Data Science Management
  • Direction on Data Science Organizations
  • customer relationship management (crm) optimization
You can edit or add more interests any time.