Paid Search

Pay-Per-Search Models are Abstention Models

Cornell University

Abstract
LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to outside-of-boundary questions. In contrast, humans recognize their limitations and can either seek external help for such questions or abstain. In this paper, we introduce MASH (Modeling Abstention via Selective Help-seeking), a training framework that readily extracts abstentions from LLMs. Our key idea is that any external help-seeking by an LLM, i.e. search tool use, can serve as a proxy for abstention if the external help (search) is appropriately penalized while simultaneously rewarding answer accuracy. MASH operationalizes this idea using reinforcement learning with a pay-per-search reward. We run experiments on three knowledge-intensive QA datasets. Our results show that MASH substantially improves upon the selective help-seeking performance of prior efficient search approaches; on multi-hop datasets, MASH improves answer accuracy by 7.6%. Furthermore, MASH demonstrates strong off-the-shelf abstention -- it can distinguish between unanswerable/answerable questions and selectively generate responses for answerable questions -- showcasing behavior analogous to specialized abstention approaches. We emphasize that contrary to prior abstention methods, MASH does not require pre-determining knowledge boundaries to construct training data. Instead, MASH's abstentions are a by-product of training for the auxiliary selective help-seeking task. Overall, we show that MASH training effectively aligns search tool use with parametric knowledge, which can be successfully leveraged for making abstention decisions.

👍 👎 ♥ Save

ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards

Tencent

Abstract
Search agents powered by Large Language Models (LLMs) have demonstrated significant potential in tackling knowledge-intensive tasks. Reinforcement learning (RL) has emerged as a powerful paradigm for training these agents to perform complex, multi-step reasoning. However, prior RL-based methods often rely on sparse or rule-based rewards, which can lead agents to commit to suboptimal or erroneous reasoning paths without the ability to recover. To address these limitations, we propose ReSeek, a novel self-correcting framework for training search agents. Our framework introduces a self-correction mechanism that empowers the agent to dynamically identify and recover from erroneous search paths during an episode. By invoking a special JUDGE action, the agent can judge the information and re-plan its search strategy. To guide this process, we design a dense, instructive process reward function, which decomposes into a correctness reward for retrieving factual information and a utility reward for finding information genuinely useful for the query. Furthermore, to mitigate the risk of data contamination in existing datasets, we introduce FictionalHot, a new and challenging benchmark with recently curated questions requiring complex reasoning. Being intuitively reasonable and practically simple, extensive experiments show that agents trained with ReSeek significantly outperform SOTA baselines in task success rate and path faithfulness.

Bidding

👍 👎 ♥ Save

Joint Bidding on Intraday and Frequency Containment Reserve Markets

Rate this image: 😍 👍 👎

Abstract
As renewable energy integration increases supply variability, battery energy storage systems (BESS) present a viable solution for balancing supply and demand. This paper proposes a novel approach for optimizing battery BESS participation in multiple electricity markets. We develop a joint bidding strategy that combines participation in the primary frequency reserve market with continuous trading in the intraday market, addressing a gap in the extant literature which typically considers these markets in isolation or simplifies the continuous nature of intraday trading. Our approach utilizes a mixed integer linear programming implementation of the rolling intrinsic algorithm for intraday decisions and state of charge recovery, alongside a learned classifier strategy (LCS) that determines optimal capacity allocation between markets. A comprehensive out-of-sample backtest over more than one year of historical German market data validates our approach: The LCS increases overall profits by over 4% compared to the best-performing static strategy and by more than 3% over a naive dynamic benchmark. Crucially, our method closes the gap to a theoretical perfect foresight strategy to just 4%, demonstrating the effectiveness of dynamic, learning-based allocation in a complex, multi-market environment.

👍 👎 ♥ Save

Bidding strategies for energy storage players in 100% renewable electricity market: A game-theoretical approach

Department of Technology

Abstract
The transition to electricity systems powered entirely by renewable energy sources makes energy storage indispensable for balancing intermittency and ensuring reliability. Since RES operate at near-zero marginal cost, storage operators can strongly influence electricity prices and energy security when renewable supply alone cannot meet demand. We develop a Cournot competition model in which storage operators strategically bid quantities to maximize their profits. We propose a MILP model with the big-M method and reformulation using continuous variables, incorporating demand blocks. The strategic bidding game is solved using the diagonalization algorithm, and the social planner's problem is used for benchmarking, cast as a one-shot optimization. The proposed model is applied to Denmark's power system using current data and 2030 renewable projections, capturing both current and future market conditions. Results show that storage operators affect market performance by arbitrage between low-and high-price periods, which can smooth supply-demand imbalances, thereby improving welfare relative to the no-storage case. With limited competition, however, strategic withholding increases prices and reduces welfare, while expanding storage capacity beyond a certain point yields no further gains. As the number of firms increases, competition mitigates distortions, and outcomes converge toward the social planner's benchmark with only two to three strategic players. These findings highlight storage's dual role in both stabilizing markets and creating market power. underscoring the need for market designs that align operators' incentives with social welfare.

Marketing Channels

👍 👎 ♥ Save

Designing Inferable Signaling Schemes for Bayesian Persuasion

University of Texas at A

Abstract
In Bayesian persuasion, an informed sender, who observes a state, commits to a randomized signaling scheme that guides a self-interested receiver's actions. Classical models assume the receiver knows the commitment. We, instead, study the setting where the receiver infers the scheme from repeated interactions. We bound the sender's performance loss relative to the known-commitment case by a term that grows with the signal space size and shrinks as the receiver's optimal actions become more distinct. We then lower bound the samples required for the sender to approximately achieve their known-commitment performance in the inference setting. We show that the sender requires more samples in persuasion compared to the leader in a Stackelberg game, which includes commitment but lacks signaling. Motivated by these bounds, we propose two methods for designing inferable signaling schemes, one being stochastic gradient descent (SGD) on the sender's inference-setting utility, and the other being optimization with a boundedly-rational receiver model. SGD performs best in low-interaction regimes, but modeling the receiver as boundedly-rational and tuning the rationality constant still provides a flexible method for designing inferable schemes. Finally, we apply SGD to a safety alert example and show it to find schemes that have fewer signals and make citizens' optimal actions more distinct compared to the known-commitment case.

👍 👎 ♥ Save

Linear-Size QAC0 Channels: Learning, Testing and Hardness

Nanjing University, China

Abstract
Shallow quantum circuits have attracted increasing attention in recent years, due to the fact that current noisy quantum hardware can only perform faithful quantum computation for a short amount of time. The constant-depth quantum circuits $\mathbf{QAC}^0$, a quantum counterpart of $\mathbf{AC}^0$ circuits, are the polynomial-size and constant-depth quantum circuits composed of only single-qubit unitaries and polynomial-size generalized Toffoli gates. The computational power of $\mathbf{QAC}^0$ has been extensively investigated in recent years. In this paper, we are concerned with $\mathbf{QLC}^0$ circuits, which are linear-size $\mathbf{QAC}^0$ circuits, a quantum counterpart of $\mathbf{LC}^0$. * We show that depth-$d$ $\mathbf{QAC}^0$ circuits working on $n$ input qubits and $a$ ancilla qubits have approximate degree at most $\tilde{O}((n+a)^{1-2^{-d}})$, improving the $\tilde{O}((n+a)^{1-3^{-d}})$ degree upper bound of previous works. Consequently, this directly implies that to compute the parity function, $\mathbf{QAC}^0$ circuits need at least $\tilde{O}(n^{1+2^{-d}})$ circuit size. * We present the first agnostic learning algorithm for $\mathbf{QLC}^0$ channels using subexponential running time and queries. Moreover, we also establish exponential lower bounds on the query complexity of learning $\mathbf{QAC}^0$ channels under both the spectral norm distance of the Choi matrix and the diamond norm distance. * We present a tolerant testing algorithm which determines whether an unknown quantum channel is a $\mathbf{QLC}^0$ channel. This tolerant testing algorithm is based on our agnostic learning algorithm. Our approach leverages low-degree approximations of $\mathbf{QAC}^0$ circuits and Pauli analysis as key technical tools. Collectively, these results advance our understanding of agnostic learning for shallow quantum circuits.

Personalization

👍 👎 ♥ Save

T-POP: Test-Time Personalization with Online Preference Feedback

The Chinese University of

Rate this image: 😍 👍 👎

Abstract
Personalizing large language models (LLMs) to individual user preferences is a critical step beyond generating generically helpful responses. However, current personalization methods are ill-suited for new users, as they typically require either slow, resource-intensive fine-tuning or a substantial amount of pre-existing user data, creating a significant cold-start problem. To address this challenge, we introduce a new paradigm for real-time personalization by learning from online pairwise preference feedback collected during text generation. We propose T-POP (Test-Time Personalization with Online Preference Feedback}), a novel algorithm that synergistically combines test-time alignment with dueling bandits. Without updating the LLM parameters, T-POP steers the decoding process of a frozen LLM by learning a reward function online that captures user preferences. By leveraging dueling bandits, T-POP intelligently queries the user to efficiently balance between exploring their preferences and exploiting the learned knowledge to generate personalized text. Extensive experiments demonstrate that T-POP achieves rapid and data-efficient personalization, significantly outperforming existing baselines and showing consistent improvement with more user interactions.

👍 👎 ♥ Save

Continual Personalization for Diffusion Models

National Taiwan Univerisr

Abstract
Updating diffusion models in an incremental setting would be practical in real-world applications yet computationally challenging. We present a novel learning strategy of Concept Neuron Selection (CNS), a simple yet effective approach to perform personalization in a continual learning scheme. CNS uniquely identifies neurons in diffusion models that are closely related to the target concepts. In order to mitigate catastrophic forgetting problems while preserving zero-shot text-to-image generation ability, CNS finetunes concept neurons in an incremental manner and jointly preserves knowledge learned of previous concepts. Evaluation of real-world datasets demonstrates that CNS achieves state-of-the-art performance with minimal parameter adjustments, outperforming previous methods in both single and multi-concept personalization works. CNS also achieves fusion-free operation, reducing memory storage and processing time for continual personalization.

Direction on Data Science Organizations

👍 👎 ♥ Save

AI in data science education: experiences from the classroom

Wageningen University

Abstract
This study explores the integration of AI, particularly large language models (LLMs) like ChatGPT, into educational settings, focusing on the implications for teaching and learning. Through interviews with course coordinators from data science courses at Wageningen University, this research identifies both the benefits and challenges associated with AI in the classroom. While AI tools can streamline tasks and enhance learning, concerns arise regarding students' overreliance on these technologies, potentially hindering the development of essential cognitive and problem solving skills. The study highlights the importance of responsible AI usage, ethical considerations, and the need for adapting assessment methods to ensure educational outcomes are met. With careful integration, AI can be a valuable asset in education, provided it is used to complement rather than replace fundamental learning processes.

👍 👎 ♥ Save

LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions

Abstract
Recent advances in large language models (LLMs) have enabled a new class of AI agents that automate multiple stages of the data science workflow by integrating planning, tool use, and multimodal reasoning across text, code, tables, and visuals. This survey presents the first comprehensive, lifecycle-aligned taxonomy of data science agents, systematically analyzing and mapping forty-five systems onto the six stages of the end-to-end data science process: business understanding and data acquisition, exploratory analysis and visualization, feature engineering, model building and selection, interpretation and explanation, and deployment and monitoring. In addition to lifecycle coverage, we annotate each agent along five cross-cutting design dimensions: reasoning and planning style, modality integration, tool orchestration depth, learning and alignment methods, and trust, safety, and governance mechanisms. Beyond classification, we provide a critical synthesis of agent capabilities, highlight strengths and limitations at each stage, and review emerging benchmarks and evaluation practices. Our analysis identifies three key trends: most systems emphasize exploratory analysis, visualization, and modeling while neglecting business understanding, deployment, and monitoring; multimodal reasoning and tool orchestration remain unresolved challenges; and over 90% lack explicit trust and safety mechanisms. We conclude by outlining open challenges in alignment stability, explainability, governance, and robust evaluation frameworks, and propose future research directions to guide the development of robust, trustworthy, low-latency, transparent, and broadly accessible data science agents.

Attribution

👍 👎 ♥ Save

Who is responsible? Social Identity, Robot Errors and Blame Attribution

Lund University, Sweden

Abstract
This paper argues that conventional blame practices fall short of capturing the complexity of moral experiences, neglecting power dynamics and discriminatory social practices. It is evident that robots, embodying roles linked to specific social groups, pose a risk of reinforcing stereotypes of how these groups behave or should behave, so they set a normative and descriptive standard. In addition, we argue that faulty robots might create expectations of who is supposed to compensate and repair after their errors, where social groups that are already disadvantaged might be blamed disproportionately if they do not act according to their ascribed roles. This theoretical and empirical gap becomes even more urgent to address as there have been indications of potential carryover effects from Human-Robot Interactions (HRI) to Human-Human Interactions (HHI). We therefore urge roboticists and designers to stay in an ongoing conversation about how social traits are conceptualised and implemented in this technology. We also argue that one solution could be to 'embrace the glitch' and to focus on constructively disrupting practices instead of prioritizing efficiency and smoothness of interaction above everything else. Apart from considering ethical aspects in the design phase of social robots, we see our analysis as a call for more research on the consequences of robot stereotyping and blame attribution.

👍 👎 ♥ Save

Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

Stony Brook University

Abstract
Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval. To test this hypothesis, we conduct controlled experiments that challenge LRMs with misleading cues during reasoning and/or corrupted answers during retrieval. Our results across models and datasets confirm that both mechanisms operate simultaneously, with their relative dominance influenced by multiple factors: problem domains, model scales, and fine-tuning approaches (e.g., reinforcement learning vs. distillation). The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively "hacking" the reward signal and undermining genuine reasoning development. To address this challenge, we introduce FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning. By carefully suppressing retrieval shortcuts during the fine-tuning process, FARL promotes reasoning-dominant behavior and enhances generalizable reasoning capabilities.

Interests not found

Help us improve your experience!