Hi!

Your personalized paper recommendations for 19–23 January 2026.
Peking University
AI Insights
  • Further research is needed to explore the potential applications and limitations of this approach. (ML: 0.96)👍👎
  • The paper proposes a novel approach to improve the performance of large language models (LLMs) in online advertising by leveraging reinforcement learning. (ML: 0.94)👍👎
  • The paper presents experimental results that demonstrate the effectiveness of AERO in improving ad click-through rates (CTR) and revenue compared to traditional methods. (ML: 0.90)👍👎
  • LLMs: Large Language Models; RL: Reinforcement Learning; AERO: Auto-Bidding with Evolutionary Reinforcement Optimization. The proposed AERO approach for auto-bidding in online advertising has shown promising results. (ML: 0.88)👍👎
  • AERO uses a combination of evolutionary algorithms and reinforcement learning to optimize bidding strategies in real-time. (ML: 0.84)👍👎
  • The combination of evolutionary algorithms and reinforcement learning can lead to improved performance in real-time bidding. (ML: 0.84)👍👎
  • The authors introduce a new algorithm called AERO, which stands for Auto-Bidding with Evolutionary Reinforcement Optimization. (ML: 0.77)👍👎
Abstract
Under the paradigm of AI-Generated Bidding (AIGB), optimizing the advertiser's cumulative value of winning impressions under budget constraints poses a complex challenge in online advertising. Advertisers often have personalized objectives but limited historical interaction data, resulting in few-shot scenarios where traditional reinforcement learning (RL) methods struggle to perform effectively. Large Language Models (LLMs) offer a promising alternative for AIGB by leveraging their in-context learning capabilities to generalize from limited data. However, they lack the numerical precision required for fine-grained optimization. To address this limitation, we introduce GRPO-Adaptive, an efficient LLM post-training strategy that enhances both reasoning and numerical precision by dynamically updating the reference policy during training. Built upon this foundation, we further propose DARA, a novel dual-phase framework that decomposes the decision-making process into two stages: a few-shot reasoner that generates initial plans via in-context prompting, and a fine-grained optimizer that refines these plans using feedback-driven reasoning. This separation allows DARA to combine LLMs' in-context learning strengths with the precise adaptability required by AIGB tasks. Extensive experiments on both real-world and synthetic data environments demonstrate that our approach consistently outperforms existing baselines in terms of cumulative advertiser value under budget constraints.
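The dual-phase split the abstract describes, a coarse LLM-generated plan followed by numerical refinement, can be sketched as below. Everything here is hypothetical (the function names, the uniform initial plan, the additive feedback update); it illustrates only the division of labor, not the paper's actual method.

```python
# Hypothetical sketch of a dual-phase plan-then-refine loop (names assumed,
# not taken from the paper's code).

def few_shot_reasoner(budget, history):
    """Phase 1: produce a coarse per-step bid plan from limited examples.
    Stands in for an LLM prompted with in-context examples."""
    steps = len(history) or 4
    return [budget / steps] * steps

def fine_grained_optimizer(plan, feedback, lr=0.5):
    """Phase 2: numerically refine the plan using feedback signals
    (e.g., per-step over/under-spend), which an LLM alone handles poorly."""
    return [max(0.0, bid + lr * fb) for bid, fb in zip(plan, feedback)]

plan = few_shot_reasoner(budget=100.0, history=[])
# Feedback says steps 1-2 under-spent, steps 3-4 over-spent.
refined = fine_grained_optimizer(plan, feedback=[2.0, 1.0, -1.0, -2.0])
print(refined)  # [26.0, 25.5, 24.5, 24.0]
```

The point of the separation is that the refinement step is cheap, local arithmetic, so the LLM is never asked to do fine-grained numeric optimization itself.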
Why are we recommending this paper?
Due to your Interest in Marketing Channels

This paper directly addresses budget allocation, a key element of paid search optimization. The use of LLMs for bidding strategies aligns with the user’s interest in data science management and personalization within marketing channels.
LinkedIn
Paper visualization
Rate image: 👍 👎
AI Insights
  • Adaptive heuristics: Heuristics that adapt to changing conditions or environments. (ML: 0.97)👍👎
  • The paper assumes that the true dominant reward and cost functions are known, which may not be realistic in practice. (ML: 0.95)👍👎
  • The authors propose a neural optimization approach using adaptive heuristics for intelligent marketing systems. (ML: 0.93)👍👎
  • The proposed approach can effectively handle complex marketing systems with multiple stakeholders and constraints. (ML: 0.92)👍👎
  • The offline experiment setup involves generating synthetic data with 500 users and 100 items, each assigned to one of five disjoint sets of items. (ML: 0.91)👍👎
  • Neural optimization approach: An approach that uses neural networks to optimize a function subject to constraints. (ML: 0.90)👍👎
  • Multi-stakeholder contextual bandit problem: A problem where the goal is to maximize the cumulative reward while satisfying multiple constraints. (ML: 0.89)👍👎
  • The paper presents a multi-stakeholder contextual bandit problem, where the goal is to maximize the cumulative reward while satisfying multiple constraints. (ML: 0.88)👍👎
  • The offline experiment setup is limited to 500 users and 100 items, which may not generalize well to larger-scale problems. (ML: 0.85)👍👎
  • The offline experiment results demonstrate the effectiveness of the approach in maximizing cumulative reward while satisfying constraints. (ML: 0.81)👍👎
Abstract
We present BanditLP, a scalable multi-stakeholder contextual bandit framework that unifies neural Thompson Sampling for learning objective-specific outcomes with a large-scale linear program for constrained action selection at serving time. The methodology is application-agnostic, compatible with arbitrary neural architectures, and deployable at web scale, with an LP solver capable of handling billions of variables. Experiments on public benchmarks and synthetic data show consistent gains over strong baselines. We apply this approach in LinkedIn's email marketing system and demonstrate a business win, illustrating the value of integrated exploration and constrained optimization in production.
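A minimal sketch of the serve-time pattern the abstract describes: sample per-action rewards Thompson-style, then select actions under a cost constraint. All names, numbers, and the greedy selection (standing in for the paper's large-scale LP) are assumptions for illustration only.

```python
# Toy Thompson-Sampling-plus-constrained-selection loop; posteriors and costs
# are invented, and a greedy pass replaces the paper's LP solver.
import random

random.seed(0)

# Posterior (mean, std) per action, a stand-in for a neural TS model.
posteriors = {"email_a": (0.30, 0.05), "email_b": (0.25, 0.05), "none": (0.0, 0.0)}
costs = {"email_a": 1.0, "email_b": 1.0, "none": 0.0}

def ts_scores():
    """Draw one reward sample per action from its posterior."""
    return {a: random.gauss(mu, sd) for a, (mu, sd) in posteriors.items()}

def constrained_select(users, budget):
    """Pick one action per user, favoring sampled reward, s.t. total cost
    stays within budget (the paper solves this step as a large-scale LP)."""
    chosen, spent = {}, 0.0
    for u in users:
        scores = ts_scores()
        for a in sorted(scores, key=scores.get, reverse=True):
            if spent + costs[a] <= budget:
                chosen[u] = a
                spent += costs[a]
                break
    return chosen

picks = constrained_select(users=["u1", "u2", "u3"], budget=2.0)
print(picks)
```

Because the zero-cost "none" action always fits, every user receives a decision even once the budget is exhausted, which mirrors why a feasible LP solution always exists in this setup.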
Why are we recommending this paper?
Due to your Interest in Personalization

The focus on contextual bandits and large-scale optimization is highly relevant to personalization efforts, particularly in recommendation systems. This approach aligns with the user's interest in CRM optimization and data science management.
University of Waterloo
AI Insights
  • Reasoning effort: The level of cognitive processing required by a model to generate search results, with higher levels requiring more computational resources. (ML: 0.98)👍👎
  • The optimal reranking depth and reasoning effort depend on the specific use case and available resources, which can make it difficult to determine the best approach. (ML: 0.98)👍👎
  • Reranking: The process of reordering search results based on additional information or criteria. (ML: 0.96)👍👎
  • Using large language models like oss-20b and oss-120b can provide better results than smaller models, but may also increase computational costs. (ML: 0.95)👍👎
  • Reranking with deep search agents can be an effective way to improve accuracy at a lower cost than traditional ranking methods. (ML: 0.94)👍👎
  • Deep search agents: AI-powered systems that use complex algorithms to rank search results. (ML: 0.91)👍👎
Abstract
Deep research agents rely on iterative retrieval and reasoning to answer complex queries, but scaling test-time computation raises significant efficiency concerns. We study how to allocate reasoning budget in deep search pipelines, focusing on the role of listwise reranking. Using the BrowseComp-Plus benchmark, we analyze tradeoffs between model scale, reasoning effort, reranking depth, and total token cost via a novel effective token cost (ETC) metric. Our results show that reranking consistently improves retrieval and end-to-end accuracy, and that moderate reranking often yields larger gains than increasing search-time reasoning, achieving comparable accuracy at substantially lower cost. All our code is available at https://github.com/texttron/BrowseComp-Plus.git
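The abstract names an effective token cost (ETC) metric without defining it. One plausible reading, used here purely as an illustration and quite possibly different from the paper's actual definition, is tokens spent per correctly answered query:

```python
# Illustrative accounting only; the paper's actual ETC definition may differ.
def effective_token_cost(total_tokens, num_correct):
    """Tokens spent per correct answer; lower is better."""
    if num_correct == 0:
        return float("inf")
    return total_tokens / num_correct

# Two hypothetical pipelines over the same query set (numbers invented):
deep_reasoning = effective_token_cost(total_tokens=5_000_000, num_correct=60)
with_reranking = effective_token_cost(total_tokens=3_200_000, num_correct=62)
print(deep_reasoning, with_reranking)
```

Under this toy accounting, the cheaper reranking pipeline wins on cost per correct answer, which is the shape of the tradeoff the abstract reports.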
Why are we recommending this paper?
Due to your Interest in Paid Search

This research investigates deep search agents, a growing area of interest for optimizing paid search strategies. The focus on efficient reranking aligns with the need to manage complex data science pipelines.
University of North Texas
Paper visualization
Rate image: 👍 👎
AI Insights
  • The inter-annotator agreement analysis shows that the reliability and consistency of the rationale annotations are high, with Krippendorff's Alpha values ranging from 0.46 to 0.84. (ML: 0.97)👍👎
  • The paper also presents a large-scale annotation of rationales for three benchmark datasets: SST-2, CoLA, and HateXplain. (ML: 0.96)👍👎
  • The paper presents a new model-agnostic explanation method called ExpNet, which outperforms existing methods in terms of F1-score and AUROC on three benchmark datasets. (ML: 0.95)👍👎
  • Metrics used: F1-score, Precision, Recall, and AUROC (Area Under the ROC Curve). ExpNet is a state-of-the-art model-agnostic explanation method that outperforms existing methods in terms of F1-score and AUROC. (ML: 0.91)👍👎
  • ExpNet achieves 13.889 examples/second, making it approximately 70× faster than LIME, 12× faster than SHAP, and 5× faster than relevance propagation methods (LRP, GAE, MGAE). (ML: 0.84)👍👎
  • The computational efficiency of ExpNet is significantly better than other methods, making it suitable for large-scale applications. (ML: 0.75)👍👎
Abstract
Explainable AI (XAI) has become critical as transformer-based models are deployed in high-stakes applications including healthcare, legal systems, and financial services, where opacity hinders trust and accountability. Transformers' self-attention mechanisms have proven valuable for model interpretability, with attention weights successfully used to understand model focus and behavior (Xu et al., 2015; Wiegreffe and Pinter, 2019). However, existing attention-based explanation methods rely on manually defined aggregation strategies and fixed attribution rules (Abnar and Zuidema, 2020a; Chefer et al., 2021), while model-agnostic approaches (LIME, SHAP) treat the model as a black box and incur significant computational costs through input perturbation. We introduce Explanation Network (ExpNet), a lightweight neural network that learns an explicit mapping from transformer attention patterns to token-level importance scores. Unlike prior methods, ExpNet discovers optimal attention feature combinations automatically rather than relying on predetermined rules. We evaluate ExpNet in a challenging cross-task setting and benchmark it against a broad spectrum of model-agnostic methods and attention-based techniques spanning four methodological families.
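As a toy stand-in for the idea of mapping attention patterns to token importance, the sketch below hand-codes a single linear unit over two per-token attention features. The features, weights, and shapes are all invented for illustration; ExpNet itself is a learned network, not this fixed rule.

```python
# Invented stand-in for an attention-to-importance mapping (not ExpNet's code).
def attention_features(attn_per_head):
    """Per-token features: mean and max attention received across heads."""
    return [(sum(col) / len(col), max(col)) for col in zip(*attn_per_head)]

def importance_scores(features, w=(0.7, 0.3), b=0.0):
    """A single linear unit as the simplest possible 'explanation network'."""
    return [w[0] * mean + w[1] * mx + b for mean, mx in features]

# 2 heads x 3 tokens of made-up attention received by each token.
attn = [[0.1, 0.6, 0.3],
        [0.2, 0.5, 0.3]]
scores = importance_scores(attention_features(attn))
print(scores)  # the middle token gets the highest importance
```

The contrast with LIME/SHAP is that this mapping reads attention directly in one forward pass instead of perturbing the input many times, which is where the reported speedups would come from.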
Why are we recommending this paper?
Due to your Interest in Attribution

The paper’s exploration of explainable AI (XAI) using transformer attention patterns is directly relevant to understanding attribution within marketing channels. This addresses the user’s interest in attribution and data science management.
Hong Kong Polytechnic University
AI Insights
  • The task granularity is flexible, and every reasoning chain must start from the raw data or a logically prior step. (ML: 0.97)👍👎
  • The instructions may be too complex or detailed for some users, potentially leading to confusion. (ML: 0.95)👍👎
  • The provided Jupyter Notebook content is a template for generating data science questions based on an answered notebook. (ML: 0.95)👍👎
  • QRA: Question-Reasoning-Answer triplet; JSON: JavaScript Object Notation. Generating high-quality data science questions based on an answered notebook requires careful analysis and adherence to specific guidelines. (ML: 0.94)👍👎
  • The output format requires a valid JSON object with specific keys such as 'data_type', 'domain', 'task_type', 'language', 'question', 'reasoning', 'answer', 'best_score (Optional)', and 'confidence'. (ML: 0.89)👍👎
  • The instructions provide detailed guidelines for generating QRA triplets, including the importance of not mentioning the notebook and ensuring diversity across task types. (ML: 0.79)👍👎
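The required QRA keys quoted in the insights above can be checked mechanically. The example triplet and validator below are hypothetical illustrations, not the benchmark's tooling; the optional 'best_score' key is deliberately left out of the required set.

```python
# Hypothetical QRA (Question-Reasoning-Answer) triplet following the key list
# quoted above; all field values are invented for illustration.
import json

REQUIRED_KEYS = {"data_type", "domain", "task_type", "language",
                 "question", "reasoning", "answer", "confidence"}

qra = {
    "data_type": "structured",
    "domain": "retail",
    "task_type": "data analysis",
    "language": "python",
    "question": "Which product category has the highest mean order value?",
    "reasoning": "Group orders by category, average order value, take the max.",
    "answer": "electronics",
    "confidence": 0.9,
}

def is_valid_qra(obj):
    """Check the triplet round-trips as JSON and carries every required key."""
    return REQUIRED_KEYS <= set(json.loads(json.dumps(obj)))

print(is_valid_qra(qra))  # True
```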
Abstract
Recent LLM-based data agents aim to automate data science tasks ranging from data analysis to deep learning. However, the open-ended nature of real-world data science problems, which often span multiple taxonomies and lack standard answers, poses a significant challenge for evaluation. To address this, we introduce DSAEval, a benchmark comprising 641 real-world data science problems grounded in 285 diverse datasets, covering both structured and unstructured data (e.g., vision and text). DSAEval incorporates three distinctive features: (1) Multimodal Environment Perception, which enables agents to interpret observations from multiple modalities including text and vision; (2) Multi-Query Interactions, which mirror the iterative and cumulative nature of real-world data science projects; and (3) Multi-Dimensional Evaluation, which provides a holistic assessment across reasoning, code, and results. We systematically evaluate 11 advanced agentic LLMs using DSAEval. Our results show that Claude-Sonnet-4.5 achieves the strongest overall performance, GPT-5.2 is the most efficient, and MiMo-V2-Flash is the most cost-effective. We further demonstrate that multimodal perception consistently improves performance on vision-related tasks, with gains ranging from 2.04% to 11.30%. Overall, while current data science agents perform well on structured data and routine data analysis workflows, substantial challenges remain in unstructured domains. Finally, we offer critical insights and outline future research directions to advance the development of data science agents.
Why are we recommending this paper?
Due to your Interest in Data Science Management

The work on evaluating data science agents aligns with the user’s interest in data science management and automation. The focus on real-world problems is particularly valuable for optimizing data science workflows.
University of Las Palmas de Gran Canaria
AI Insights
  • The study uses various statistical models to understand the relationships between different types of references to research papers, such as citations, patents, policy documents, news articles, blogs, social media conversations, Wikipedia references, YouTube views, and readership metrics. (ML: 0.95)👍👎
  • The provided text appears to be a research paper or dissertation on the topic of measuring the impact and attention of research papers. (ML: 0.94)👍👎
  • The text is a complex analysis of how research papers are perceived and used by different audiences. (ML: 0.94)👍👎
  • The goal is to identify which types of references have the most significant impact on a paper's visibility and usage. (ML: 0.86)👍👎
Abstract
This study investigates how YouTube content creators utilize scientific evidence in videos. Log-linear regression examines the influence of alternative communication channels on video creators in Biotechnology, using data from 81,302 papers (2018-2023). This reveals a positive association with news articles and Wikipedia pages, but a negative association with scientific papers, policy documents, and patents. Despite the potential for enriching discussions, science video creators seem to favor materials with wider public attention over influential science, technology, and policy papers. These findings suggest a need for improved dissemination strategies for scientific research. Authors, universities, and journals should consider how their work can be made more accessible and engaging for science communicators on video.
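The study's log-linear setup can be sketched with a one-variable OLS fit on log-transformed counts. The numbers below are invented toy data, not the paper's 81,302-paper dataset, and the single regressor stands in for the paper's fuller model.

```python
# Toy log-linear regression: log(1 + YouTube references) on news mentions.
# Data are invented; only the modeling pattern mirrors the study.
import math

news_mentions = [0, 1, 2, 4, 8]    # hypothetical per-paper news counts
youtube_refs = [1, 2, 3, 6, 12]    # hypothetical per-paper video references

y = [math.log1p(v) for v in youtube_refs]  # log-transform the outcome
x = news_mentions
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Closed-form simple OLS slope and intercept.
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx
print(round(slope, 3))  # positive slope = positive association, as in the paper
```

A positive slope on a log-scale outcome reads as a multiplicative effect: each extra news mention is associated with roughly a `exp(slope)`-fold change in expected video references, which is why the paper's associations are reported directionally.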
Why are we recommending this paper?
Due to your Interest in Attribution
University of Science and Technology of China
AI Insights
  • Machine psychology: A research trend advocating for new data collection methodologies, sophisticated simulation tasks, and human-centered training regimes to align LLMs with human cognition and behavior. (ML: 0.98)👍👎
  • Role-playing and theory-of-mind tasks are commonly used to assess whether models can understand and replicate human behavior. (ML: 0.98)👍👎
  • Further research is needed to improve the performance of human-centric LLMs and address the limitations of current models. (ML: 0.98)👍👎
  • The development of HumanLLM demonstrates a significant step towards bridging the gap between academic and social capabilities in LLMs. (ML: 0.97)👍👎
  • Human-centric LLMs are needed to capture the intricacies of real human behaviors. (ML: 0.96)👍👎
  • The need for systematic evaluation of LLMs' human-like abilities has given rise to specialized benchmarks and datasets. (ML: 0.96)👍👎
  • Current LLMs have limited ability to simulate individual personalities, motivations, and dynamic social contexts. (ML: 0.95)👍👎
  • Human-centric LLMs are necessary for authentic social intelligence in applications that interact directly with humans. (ML: 0.92)👍👎
  • Human-centric LLMs: LLMs that prioritize capturing the intricacies of real human behaviors and structured persona-scenario-behavior data for advanced social simulation. (ML: 0.89)👍👎
Abstract
Motivated by the remarkable progress of large language models (LLMs) in objective tasks like mathematics and coding, there is growing interest in their potential to simulate human behavior--a capability with profound implications for transforming social science research and customer-centric business insights. However, LLMs often lack a nuanced understanding of human cognition and behavior, limiting their effectiveness in social simulation and personalized applications. We posit that this limitation stems from a fundamental misalignment: standard LLM pretraining on vast, uncontextualized web data does not capture the continuous, situated context of an individual's decisions, thoughts, and behaviors over time. To bridge this gap, we introduce HumanLLM, a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome Dataset, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. Through a rigorous, multi-stage pipeline involving data filtering, synthesis, and quality control, we automatically extract over 5.5 million user logs to distill rich profiles, behaviors, and thinking patterns. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences. Comprehensive evaluations demonstrate that HumanLLM achieves superior performance in predicting user actions and inner thoughts, more accurately mimics user writing styles and preferences, and generates more authentic user profiles compared to base models. Furthermore, HumanLLM shows significant gains on out-of-domain social intelligence benchmarks, indicating enhanced generalization.
Why are we recommending this paper?
Due to your Interest in Personalization
Georgia Institute of Technology
Paper visualization
Rate image: 👍 👎
AI Insights
  • A lower bound on θ is derived that guarantees the proposed policy in Section 6.2 has a decision rule of n = 1. (ML: 0.89)👍👎
  • Synergy: The increase in service rate when multiple servers work together on the same task. (ML: 0.89)👍👎
  • Expedite Policy: A policy where all servers work together as a single team at full efficiency. (ML: 0.87)👍👎
  • The expedite policy is shown to be optimal for systems with N stations and M generalist servers, where the synergy among the servers is allowed to be task (station) dependent. (ML: 0.84)👍👎
  • The problem of maximizing the long-run average throughput in a multi-server system with task-dependent synergy is considered. (ML: 0.83)👍👎
  • Generalist Servers: Servers that can perform tasks at any station. (ML: 0.82)👍👎
  • Task-Dependent Synergy: The synergy among servers is allowed to be dependent on the specific task being performed. (ML: 0.82)👍👎
  • Proposed Policy: A dynamic server assignment policy for systems with 2 servers and 2 stations, which achieves the maximum throughput. (ML: 0.81)👍👎
Abstract
We study tandem queueing systems in which servers work more efficiently in teams than on their own and customers are impatient in that they may leave the system while waiting for service. Our goal is to determine the server assignment policy that maximizes the long-run average throughput. We show that when each server is equally skilled at all tasks, the optimal policy has all the servers working together at all times. We also provide a complete characterization of the optimal policy for Markovian systems with two stations and two servers when each server's efficiency may be task dependent. We show that the throughput is maximized under the policy which assigns one server to each station (based on their relative skill at that station) unless station 2 has no work (in which case both servers work at station 1) or the number of customers in the buffer reaches a threshold whose value we characterize (in which case both servers work at station 2). We study how the optimal policy varies with the level of server synergy (including no synergy) and also compare the optimal policy for systems with different customer abandonment rates (including no abandonments). Finally, we investigate the case where the synergy among collaborating servers can be task-dependent and provide numerical results.
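The two-station assignment rule characterized in the abstract reduces to a small decision function, written out below. The threshold is left as an input, since the paper characterizes its value analytically rather than fixing a number.

```python
# The two-server, two-station assignment rule from the abstract, as code.
def assign_servers(station2_jobs, buffer_len, threshold):
    """Return (servers at station 1, servers at station 2) for 2 servers."""
    if station2_jobs == 0:
        return (2, 0)    # station 2 has no work: both servers work station 1
    if buffer_len >= threshold:
        return (0, 2)    # buffer hit the threshold: both servers clear station 2
    return (1, 1)        # otherwise one (skill-matched) server per station

print(assign_servers(station2_jobs=0, buffer_len=3, threshold=5))  # (2, 0)
print(assign_servers(station2_jobs=2, buffer_len=5, threshold=5))  # (0, 2)
print(assign_servers(station2_jobs=2, buffer_len=1, threshold=5))  # (1, 1)
```

With an abandonment-prone buffer, the threshold case is the interesting one: pooling both servers at station 2 drains the buffer before waiting customers leave.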
Why are we recommending this paper?
Due to your Interest in Customer Relationship Management (CRM) Optimization

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • Bidding
  • Direction on Data Science Organizations
You can edit or add more interests any time.