Hi!

Your personalized paper recommendations for 24–28 November 2025.
🎯 Top Personalized Recommendations
AI Summary
  • The field of AI is evolving rapidly, with researchers exploring new models and techniques to improve the efficiency and effectiveness of AI systems, including compound AI systems that combine multiple models and tools to tackle complex tasks. [3]
  • Compound AI systems: a combination of multiple models and tools that work together to achieve complex tasks. [3]
  • Developing efficient and effective AI systems is seen as crucial for achieving human-like intelligence and solving complex problems. [3]
  • A noted limitation is the field's still-limited understanding of human cognition and behavior. [3]
  • There is a growing interest in developing agents that can assist humans in real-world tasks, such as information seeking and question answering. [1]
Abstract
Large language models are powerful generalists, yet solving deep and complex problems such as those of the Humanity's Last Exam (HLE) remains both conceptually challenging and computationally expensive. We show that small orchestrators managing other models and a variety of tools can both push the upper bound of intelligence and improve efficiency in solving difficult agentic tasks. We introduce ToolOrchestra, a method for training small orchestrators that coordinate intelligent tools. ToolOrchestra explicitly uses reinforcement learning with outcome-, efficiency-, and user-preference-aware rewards. Using ToolOrchestra, we produce Orchestrator, an 8B model that achieves higher accuracy at lower cost than previous tool-use agents while aligning with user preferences on which tools are to be used for a given query. On HLE, Orchestrator achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being 2.5x more efficient. On tau2-Bench and FRAMES, Orchestrator surpasses GPT-5 by a wide margin while using only about 30% of the cost. Extensive analysis shows that Orchestrator achieves the best trade-off between performance and cost under multiple metrics, and generalizes robustly to unseen tools. These results demonstrate that composing diverse tools with a lightweight orchestration model is both more efficient and more effective than existing methods, paving the way for practical and scalable tool-augmented reasoning systems.
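The reward design is the heart of ToolOrchestra: outcome, efficiency, and user preference are folded into one training signal. As a rough illustration only (the weights, normalization, and field names below are our assumptions, not the paper's implementation), such a composite reward might look like this:

```python
# Illustrative sketch (not ToolOrchestra's code): combining the three reward
# signals the abstract describes: outcome, efficiency, and user preference.
from dataclasses import dataclass

@dataclass
class Episode:
    answer_correct: bool      # outcome signal from the task verifier
    dollar_cost: float        # total cost of all tool/model calls
    preferred_tool_hits: int  # calls matching the user's stated preferences
    total_tool_calls: int

def orchestration_reward(ep: Episode,
                         w_outcome: float = 1.0,
                         w_efficiency: float = 0.3,
                         w_preference: float = 0.2,
                         cost_budget: float = 1.0) -> float:
    """Scalar reward: correctness, minus normalized cost, plus preference
    match. All weights and the normalization are illustrative assumptions."""
    outcome = 1.0 if ep.answer_correct else 0.0
    efficiency = max(0.0, 1.0 - ep.dollar_cost / cost_budget)
    preference = (ep.preferred_tool_hits / ep.total_tool_calls
                  if ep.total_tool_calls else 0.0)
    return (w_outcome * outcome
            + w_efficiency * efficiency
            + w_preference * preference)

# A correct answer at 40% of budget, with 3 of 5 calls on preferred tools:
print(orchestration_reward(Episode(True, 0.4, 3, 5)))  # 1.3
```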
Why we think this paper is great for you:
This paper directly addresses enhancing problem-solving capabilities by orchestrating LLMs and various tools, which is highly relevant to improving your productivity through advanced AI systems.
AI Summary
  • Glanceability: The ability to quickly scan and understand the content without needing to read it thoroughly. [3]
  • Developers want explanations for bug alerts that are concise, clear, and easy to understand. [2]
  • Inline and embedded options: Displaying explanations directly next to the code or in a pop-up window, allowing for quick checks and easy access to more detailed information. [1]
Abstract
AI-assisted tools support developers in performing cognitively demanding tasks such as bug detection and code readability assessment. Despite the advancements in the technical characteristics of these tools, little is known about how developers mentally model them and how mismatches affect trust, control, and adoption. We conducted six co-design workshops with 58 developers to elicit their mental models about AI-assisted bug detection and readability features. It emerged that developers conceive bug detection tools as "bug detectives", which warn users only in case of critical issues, guaranteeing transparency, actionable feedback, and confidence cues. Readability assessment tools, on the other hand, are envisioned as "quality coaches", which provide contextual, personalized, and progressive guidance. Trust, in both tasks, depends on the clarity of explanations, timing, and user control. A set of design principles for Human-Centered AI in IDEs has been distilled, aiming to balance disruption with support, conciseness with depth, and automation with human agency.
Why we think this paper is great for you:
This paper explores how AI-assisted tools enhance developer productivity by supporting complex tasks, directly aligning with your interest in AI for productivity tools.
AI Summary
  • The first principal component explains more than 26% of the variance in both periods, and is strongly associated with property, plant, and equipment measures. [3]
  • The third principal component is positively related to different credit/debt measures in the two periods, but in both periods it can be used as a proxy for firms' R&D effort and efficiency. [3]
  • Principal Component Analysis (PCA): A statistical technique that transforms a set of correlated variables into a new set of uncorrelated variables called principal components. [3]
  • Scree plot: A graphical tool used to determine the number of principal components to retain based on the proportion of variance explained. [3]
  • The findings have implications for policymakers and managers seeking to understand the drivers of firm performance and develop strategies to improve productivity. [3]
  • The second principal component is positively related to property, plant, and equipment measures, but negatively associated with net profits and earnings after tax. [2]
Abstract
The paper identifies firm features that serve as determinants of total factor productivity through unsupervised learning techniques (principal component analysis, self-organizing maps, clustering). This bottom-up approach can effectively manage the problem of the heterogeneity of the firms and provides new ways to look at firms' standard classifications. Using the large sample provided by the ORBIS database, the analysis covers the years before the outbreak of Covid-19 (2015-2019) and the immediate post-Covid period (2020). It has been shown that in both periods, the main determinants of productivity growth are related to profitability, credit/debt measures, cost and capital efficiency, and the effort and outcome of the R&D activity conducted by the firms. Finally, a linear relationship between determinants and productivity growth has been found.
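For readers unfamiliar with the pipeline, here is a minimal sketch of the PCA step described above, using scikit-learn on synthetic data; the firm indicators and the 80% variance cutoff are illustrative assumptions, not the paper's actual ORBIS feature set or retention rule:

```python
# Hedged sketch of the PCA + scree-style retention step on toy data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy matrix: 500 firms x 6 indicators (profitability, PPE, debt, ...).
X = rng.normal(size=(500, 6))

X_std = StandardScaler().fit_transform(X)  # PCA assumes comparable scales
pca = PCA().fit(X_std)

# Scree-style criterion: keep components until 80% of variance is explained.
cum = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cum, 0.80)) + 1
print(pca.explained_variance_ratio_.round(3), "-> keep", n_keep)

# Loadings link each component back to the original indicators, which is
# how a component gets interpreted as e.g. a "PPE" or "R&D" factor.
print(pca.components_.round(2))
```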
Why we think this paper is great for you:
This paper directly investigates the determinants of total factor productivity at the firm level, offering valuable insights into the economic aspects of productivity.
AI Summary
  • The study's results may not be generalizable to other medical or pharmaceutical assessments due to the specific nature of the exam questions and the models' training data. [3]
  • Clinical pharmacist: A healthcare professional responsible for providing medication therapy management and other clinical services in hospitals or clinics. [3]
  • The study's findings suggest that DeepSeek is a more effective tool for assisting with the Chinese pharmacist licensure examination, but further research is needed to confirm these results and explore their generalizability. [3]
  • The study did not account for potential internal translation errors in ChatGPT-4o, which may have affected its performance. [3]
  • The study compared the performance of two large language models, DeepSeek and ChatGPT-4o, on a Chinese pharmacist licensure examination. [2]
Abstract
Background: As large language models (LLMs) become increasingly integrated into digital health education and assessment workflows, their capabilities in supporting high-stakes, domain-specific certification tasks remain underexplored. In China, the national pharmacist licensure exam serves as a standardized benchmark for evaluating pharmacists' clinical and theoretical competencies. Objective: This study aimed to compare the performance of two LLMs, ChatGPT-4o and DeepSeek-R1, on real questions from the Chinese Pharmacist Licensing Examination (2017-2021), and to discuss the implications of these performance differences for AI-enabled formative evaluation. Methods: A total of 2,306 multiple-choice (text-only) questions were compiled from official exams, training materials, and public databases. Questions containing tables or images were excluded. Each item was input in its original Chinese format, and model responses were evaluated for exact accuracy. Pearson's Chi-squared test was used to compare overall performance, and Fisher's exact test was applied to year-wise multiple-choice accuracy. Results: DeepSeek-R1 outperformed ChatGPT-4o with a significantly higher overall accuracy (90.0% vs. 76.1%, p < 0.001). Unit-level analyses revealed consistent advantages for DeepSeek-R1, particularly in foundational and clinical synthesis modules. While year-by-year multiple-choice performance also favored DeepSeek-R1, this performance gap did not reach statistical significance in any specific unit-year (all p > 0.05). Conclusion: DeepSeek-R1 demonstrated robust alignment with the structural and semantic demands of the pharmacist licensure exam. These findings suggest that domain-specific models warrant further investigation for this context, while also reinforcing the necessity of human oversight in legally and ethically sensitive contexts.
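The statistical comparison is simple to reproduce in outline. The sketch below rebuilds the overall 2x2 table from the reported accuracies (90.0% vs. 76.1% of 2,306 items, rounded to whole counts) and applies the same tests via SciPy; the year-wise counts are hypothetical placeholders, not the study's data:

```python
# Illustrative reproduction of the paper's tests: Pearson's chi-squared for
# overall accuracy, Fisher's exact test for a (hypothetical) year-wise table.
from scipy.stats import chi2_contingency, fisher_exact

n = 2306
deepseek_correct = round(0.900 * n)  # 2075
gpt4o_correct = round(0.761 * n)     # 1755
table = [[deepseek_correct, n - deepseek_correct],
         [gpt4o_correct, n - gpt4o_correct]]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.2e}")  # p < 0.001, matching the abstract

# Fisher's exact test suits the smaller year-wise 2x2 tables; the counts
# here are invented purely to show the call.
odds, p_year = fisher_exact([[45, 5], [40, 10]])
print(f"odds ratio={odds:.2f}, p={p_year:.3f}")
```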
Why we think this paper is great for you:
This paper explores LLMs' capabilities in high-stakes professional tasks, which is relevant to understanding their potential for enhancing productivity in specialized fields.
AI Summary
  • The work proposes an opinion dynamics model that captures both short- and long-term behavioral shifts induced by external policies. [3]
  • The model does not account for network structures and their impact on opinion dynamics. [3]
  • Memory plays a crucial role in shaping individual opinions and behavior. [3]
  • Two policy design strategies under budget constraints are proposed, leveraging the developed model. [2]
  • Opinion dynamics models have been used to study the spread of information and influence in social networks. [1]
Abstract
In this paper, we propose a new framework for the design of incentives aimed at promoting innovation diffusion in social influence networks. In particular, our framework relies on an extension of the Friedkin and Johnsen opinion dynamics model characterizing the effects of (i) short-memory incentives, which have an immediate yet transient impact, and (ii) long-term structural incentives, whose impact persists via an exponentially decaying memory. We propose to design these incentives via a model-predictive control (MPC) scheme over an augmented state that captures the memory in our opinion dynamics model, yielding a convex quadratic program with linear constraints. Our numerical simulations based on data on sustainable mobility habits show the effectiveness of the proposed approach, which balances large-scale adoption and resource allocation.
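As a hedged sketch of the modeling idea (not the paper's exact formulation), the following simulates a Friedkin-Johnsen-style update with an additive transient incentive and an exponentially decaying memory state carrying the structural incentive; all matrices, gains, and the decay rate are illustrative:

```python
# Sketch: Friedkin-Johnsen-style dynamics with two incentive channels.
# x(t+1) = L W x(t) + (I - L) x0 + u_short(t) + m(t),  m(t+1) = rho m(t) + u_long(t)
import numpy as np

rng = np.random.default_rng(1)
n = 5
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)   # row-stochastic influence matrix
lam = 0.8 * np.eye(n)               # susceptibility to social influence
x0 = rng.random(n)                  # innate opinions (the FJ anchor)
rho = 0.9                           # decay rate of the incentive memory

x, m = x0.copy(), np.zeros(n)
for t in range(50):
    u_short = 0.05 * np.ones(n) if t < 5 else np.zeros(n)  # transient push
    u_long = 0.01 * np.ones(n) if t < 5 else np.zeros(n)   # persistent push
    m = rho * m + u_long                                   # decaying memory
    x = lam @ (W @ x) + (np.eye(n) - lam) @ x0 + u_short + m

print(x.round(3))
```

Because the dynamics are linear in the inputs, stacking (x, m) into an augmented state keeps an MPC incentive-design problem with quadratic cost and linear budget constraints convex, consistent with the QP formulation the abstract describes.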
Why we think this paper is great for you:
This paper explores optimal policy design for promoting innovation diffusion, which is a critical economic factor influencing long-term productivity and growth.
AI Summary
  • The tool is designed to promote transparency, reproducibility, and rigor in conversational AI research. [3]
  • Empirical Research: Research that involves collecting data through observation or experimentation. [3]
  • Conversational AI: AI systems that enable humans to interact with machines using natural language. [3]
  • A new tool called Simple Chat has been developed to facilitate the integration of Large Language Models (LLMs) in empirical research. [2]
  • Simple Chat allows researchers to easily integrate LLMs into their studies, making it easier to investigate human-LLM interaction. [1]
Abstract
As large language models (LLMs) become increasingly prevalent, understanding human-LLM interactions is emerging as a central priority in psychological research. Online experiments offer an efficient means to study human-LLM interactions, yet integrating LLMs into established survey platforms remains technically demanding, particularly when aiming for ecologically valid, real-time conversational experiences with strong experimental control. We introduce Simple Chat, an open-source, research-focused chat interface that streamlines LLM integration for platforms such as Qualtrics, oTree, and LimeSurvey, while presenting a unified participant experience across conditions. Simple Chat connects to both commercial providers and open-weights models, supports streaming responses to preserve conversational flow, and offers an administrative interface for fine-grained control of prompts and interface features. By reducing technical barriers, standardizing interfaces, and improving participant experience, Simple Chat helps advance the study of human-LLM interaction. In this article, we outline Simple Chat's key features, provide a step-by-step tutorial, and demonstrate its utility through two illustrative case studies.
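Simple Chat is open-source, so its actual implementation is available upstream; the snippet below is not its code but a generic illustration of the streaming pattern it is described as supporting, assuming an OpenAI-compatible backend via the official Python client (with OPENAI_API_KEY set in the environment):

```python
# Generic streaming-chat sketch (not Simple Chat's implementation): relay
# tokens to the participant as they arrive to preserve conversational flow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def stream_reply(history: list[dict]) -> str:
    """Stream a model reply chunk by chunk, returning the full text."""
    stream = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model choice
        messages=history,
        stream=True,
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        parts.append(delta)
        print(delta, end="", flush=True)  # relay to the participant UI
    return "".join(parts)

reply = stream_reply([{"role": "user", "content": "Hello!"}])
```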
Why we think this paper is great for you:
This paper discusses the integration of LLMs into research experiments, which could indirectly inform how LLMs are deployed in various tools and applications.