Papers from 6 to 10 October 2025

Here are your personalized paper recommendations, sorted by relevance.
Economics of Productivity
Ankara Sosyal Bilimler Üniversitesi
Abstract
This study presents a computational and theoretical framework inspired by thermodynamic principles to analyze the dynamics of economic inflation within adiabatic and non-adiabatic systems. In a framework referred to as developmental symmetry, inflation is formulated as a scalar field evolving through continuity equations, drawing an analogy with the Raychaudhuri equation in gravitational dynamics. The results show that adiabatic systems fail to reach equilibrium, while non-adiabatic systems can evolve toward stable states over time. The model successfully reproduces observed inflationary regimes, from hyperinflation to stable low-inflation phases, with characteristic transition periods of about a decade. These results indicate that production continuity and controlled monetary flow are crucial for achieving stability in complex economic systems, linking thermodynamic balance to macroeconomic equilibrium.
AI Insights
  • The model maps inflation to a scalar field obeying Raychaudhuri‑type equations, mirroring relativistic cosmology dynamics.
  • Adiabatic economies never equilibrate, while non‑adiabatic flows converge to stable states after ~10‑year transitions.
  • Production continuity and regulated monetary outflows act as thermodynamic entropy controls driving macro‑equilibrium.
  • The framework predicts hyperinflation as a finite‑time singularity, matching empirical signatures of past crises.
  • Statistical‑mechanics tools (e.g., partition functions) could extend the scalar‑field model to wealth distributions.
  • Future research must test scalability to heterogeneous agents and sensitivity to policy shock parameters.
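To make the adiabatic vs. non-adiabatic contrast concrete, here is a minimal numerical sketch. The equation is a hypothetical stand-in, not the paper's actual scalar-field model: a quadratic Raychaudhuri-like self-focusing term plus an optional dissipative pull toward a target rate, with all constants invented.

```python
# Toy model (assumed form): d(pi)/dt = k*pi**2 - gamma*(pi - pi_star).
# gamma = 0 is the "adiabatic" case: pure self-focusing, which blows up in
# finite time (a hyperinflation-style singularity). gamma > 0 adds the
# dissipative "controlled monetary flow" that pulls pi toward pi_star.
def evolve(pi0, gamma, pi_star=0.02, k=1.0, dt=0.01, years=30.0, cap=1e6):
    pi, t = pi0, 0.0
    while t < years:
        pi += dt * (k * pi**2 - gamma * (pi - pi_star))
        t += dt
        if abs(pi) > cap:                   # finite-time singularity reached
            return float("inf"), round(t, 2)
    return pi, round(t, 2)

print(evolve(pi0=0.5, gamma=0.0))  # adiabatic: diverges near t ~ 2, no equilibrium
print(evolve(pi0=0.5, gamma=1.0))  # non-adiabatic: settles at ~0.0204, stable
```

Even this caricature reproduces the qualitative claim: without dissipation the trajectory hits a finite-time singularity, while a sufficiently strong dissipative coupling relaxes to a low, stable rate.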
Lund University, Sweden
Abstract
The observation of an excess of ttbar production in the threshold region, by CMS and ATLAS, has been interpreted as a toponium contribution, i.e. from below-threshold ttbar virtual states. The news here is the nontrivial experimental extraction of such a signal, not its existence as such. Indeed, already 35+ years ago an NRQCD Green's function approach was used to model the above- and below-threshold production of ttbar pairs in pp/ppbar collisions. The relevant cross section equations from that study are now (re-)implemented in the Pythia 8 event generator. While the above-threshold part is straightforward, the physical interpretation and modelling of below-threshold events is nontrivial, and a final prescription is cross-checked against two simpler ones. Cross sections and some event properties are presented.
AI Insights
  • Integrated cross sections for E < 0 are 5.70, 6.27, 6.76 pb in the three benchmark scenarios.
  • Pythia 8.316 implements full Breit‑Wigner smearing for realistic resonance shapes.
  • This study replaces Coulomb models with NRQCD Green’s functions to capture virtual toponium dynamics.
  • The code is publicly available but may require user modifications for specific analyses.
  • It assumes unpolarized, uncorrelated top quarks at production, a simplification that future work could relax.
  • For deeper context, see “Quantum Field Theory for the Gifted Amateur” and the 2025 review on pseudoscalar excess.
  • Pythia’s site and arXiv host the latest threshold‑physics tools.
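For a rough feel of how width smearing populates the below-threshold region, the toy Monte Carlo below samples each top mass from a Breit-Wigner line shape and classifies pairs by E = m(ttbar) - 2*m_t. This is not Pythia's implementation: the generator applies NRQCD Green's-function weights and kinematic suppression that a symmetric toy ignores, and the mass and width values are just round PDG-like numbers assumed here.

```python
import numpy as np

rng = np.random.default_rng(1)
M_TOP, GAMMA_TOP = 172.5, 1.42     # GeV; assumed, roughly PDG values

def sample_bw(n):
    """Non-relativistic Breit-Wigner (Cauchy) masses, truncated at +-5*Gamma."""
    m = M_TOP + 0.5 * GAMMA_TOP * rng.standard_cauchy(n)
    return m[np.abs(m - M_TOP) < 5 * GAMMA_TOP]

m1, m2 = sample_bw(200_000), sample_bw(200_000)
n = min(len(m1), len(m2))
E = m1[:n] + m2[:n] - 2 * M_TOP    # pair energy relative to nominal threshold

print(f"below-threshold fraction (E < 0): {np.mean(E < 0):.3f}")  # ~0.5 by symmetry
print(f"within 2 GeV below threshold:     {np.mean((E > -2) & (E < 0)):.3f}")
```

In the real calculation the E < 0 events are not simply half of the sample; they are reweighted by the Green's function, which is exactly the nontrivial modelling choice the paper cross-checks against simpler prescriptions.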
AI for Productivity Tools
Abstract
Generative AI solutions like GitHub Copilot have been shown to increase the productivity of software developers. Yet prior work remains unclear on the quality of code produced and the challenges of maintaining it in software projects. If quality declines as volume grows, experienced developers face increased workloads reviewing and reworking code from less-experienced contributors. We analyze developer activity in Open Source Software (OSS) projects following the introduction of GitHub Copilot. We find that productivity indeed increases. However, the increase in productivity is primarily driven by less-experienced (peripheral) developers. We also find that code written after the adoption of AI requires more rework. Importantly, the added rework burden falls on the more experienced (core) developers, who review 6.5% more code after Copilot's introduction, but show a 19% drop in their original code productivity. More broadly, this finding raises caution that productivity gains of AI may mask the growing burden of maintenance on a shrinking pool of experts.
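The core comparison behind these numbers is simple in shape: output and rework rates for core vs. peripheral developers, before and after Copilot's introduction. The sketch below shows that shape on a synthetic commit table; the schema, the rework flag, and the use of the Copilot technical-preview launch date as the cutoff are all assumptions, not the authors' pipeline.

```python
import pandas as pd

# Synthetic stand-in for a mined commit history (schema is assumed).
commits = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-10", "2021-03-02", "2021-09-15",
                            "2021-11-20", "2022-02-01", "2022-04-12"]),
    "author_tier": ["core", "peripheral", "peripheral",
                    "core", "peripheral", "core"],
    "lines_added": [120, 40, 200, 90, 260, 70],
    "is_rework":   [False, False, True, True, True, True],
})

ADOPTION = pd.Timestamp("2021-06-29")   # Copilot technical-preview launch
commits["period"] = (commits["date"] >= ADOPTION).map({False: "pre", True: "post"})

summary = (commits
           .groupby(["author_tier", "period"])
           .agg(output=("lines_added", "sum"),
                rework_share=("is_rework", "mean")))
print(summary)
```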
Yixin AI
Abstract
Tool-augmented language models have demonstrated strong capabilities, but their reliance on live API access creates scalability and reliability challenges during training and deployment. We propose MTR, a simulation-first training framework for tool-augmented reasoning. Instead of relying on live APIs, MTR learns from complete ReAct traces with schema-validated, simulated observations. Our approach operates through a multi-agent architecture where a ToolMaker generates task-specific, OpenAI-compatible tool interfaces, an AutoAgent produces structured think-act-observe sequences, and a ToolActor simulates realistic responses. Training proceeds in two stages: Stage-1 Supervised Fine-Tuning (SFT) teaches 'trace grammar' from complete reasoning sequences; Stage-2 Group Relative Policy Optimization (GRPO) optimizes strategy with a composite trace reward that balances answer correctness and internal consistency. Across four multi-hop QA benchmarks (HotpotQA, MuSiQue, 2WikiMultiHopQA, Bamboogle), MTR attains Exact Match (EM) scores competitive with live-API systems and excels on reasoning-intensive tasks, suggesting that effective tool reasoning can be learned from structured traces without live interactions.
AI Insights
  • The ToolMaker’s output follows a power‑law distribution, mirroring natural tool ecosystems.
  • Functional Intelligence is defined as the agent’s ability to craft task‑specific, usable tools.
  • Semantic Intelligence captures the agent’s grasp of diverse domain semantics.
  • Contextual awareness lets the ToolMaker scale tool complexity to match domain demands.
  • The simulation‑first framework can be repurposed for any multi‑hop reasoning domain.
  • Weaknesses surface when generated tools miss subtle task constraints, highlighting a need for better validation.
  • Recommended reading: “Attention Is All You Need” for transformer insights and “Deep Learning” for foundational theory.
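As a concrete sketch of what a composite trace reward might look like, the function below combines an exact-match answer term with a crude trace-grammar check. The weights and the consistency heuristic are assumptions; the paper's actual reward terms may differ.

```python
import re

def trace_reward(trace: str, prediction: str, gold: str,
                 w_answer: float = 0.7, w_consistency: float = 0.3) -> float:
    # Answer-correctness term: exact match after light normalization.
    norm = lambda s: re.sub(r"\W+", " ", s.lower()).strip()
    answer = float(norm(prediction) == norm(gold))

    # Internal-consistency term: every Act is preceded by a Think and
    # followed by an Observe (a crude proxy for valid trace grammar).
    steps = re.findall(r"\b(Think|Act|Observe)\b", trace)
    ok = bool(steps) and all(
        0 < i < len(steps) - 1
        and steps[i - 1] == "Think" and steps[i + 1] == "Observe"
        for i, s in enumerate(steps) if s == "Act")
    return w_answer * answer + w_consistency * float(ok)

trace = "Think: need capital. Act: search('France') Observe: Paris. Think: done."
print(trace_reward(trace, "paris", "Paris"))   # -> 1.0
```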
LLMs for Productivity
Pennsylvania State University
Abstract
In this paper, we report our experience evaluating several LLMs on their ability to understand a process model in an interactive, conversational style, find syntactical and logical errors in it, and reason with it in depth through a natural language (NL) interface. Our findings show that a vanilla, untrained LLM like ChatGPT (model o3) in a zero-shot setting is effective at understanding BPMN process models from images and answering queries about them intelligently at the syntactic, logical, and semantic levels of depth. Further, different LLMs vary in performance in terms of their accuracy and effectiveness. Nevertheless, our empirical analysis shows that LLMs can play a valuable role as assistants for business process designers and users. We also study the LLM's "thought process" and ability to perform deeper reasoning in the context of process analysis and optimization. We find that the LLMs seem to exhibit anthropomorphic properties.
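The zero-shot setup is easy to reproduce in principle: send the diagram image plus a natural-language question to a vision-capable model. Below is a minimal sketch with the OpenAI Python SDK; the model name, file name, and prompt wording are assumptions, not the authors' exact protocol.

```python
import base64
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

with open("process_model.png", "rb") as f:      # hypothetical BPMN diagram
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="o3",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "This image shows a BPMN process model. List any "
                     "syntactical errors (e.g., unmatched gateways) and "
                     "logical errors (e.g., deadlocks) you can find."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```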
Abstract
As the demand for comprehensive evaluations of diverse model capabilities steadily increases, benchmark suites have correspondingly grown significantly in scale. Despite notable advances in redundancy reduction and subset-level performance prediction, a systematic framework that effectively integrates these methods to ensure both prediction accuracy and ranking consistency remains largely elusive. In this paper, we first perform a sample-level analysis of benchmark redundancy and identify several highly similar samples that can be eliminated. In addition, we frame benchmark compression as an optimization problem aimed at score reconstruction. Building on these, we propose EssenceBench, a coarse-to-fine framework using an iterative Genetic Algorithm (GA) that combines fitness-based subset search with attribution-based sample search. Compared to previous methods, our approach yields superior compression results with lower reconstruction error and markedly higher efficiency. In particular, on the HellaSwag benchmark (10K samples), our method preserves the rankings of all models to within a 5% shift using 25x fewer samples, and preserves 95% of rankings within a 5% shift using only 200x fewer samples.
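The fitness-based stage can be pictured as a plain genetic algorithm over sample subsets, scored by how well a subset reconstructs each model's full-benchmark score. The toy below uses synthetic scores and invented GA hyperparameters; EssenceBench's attribution-based search and coarse-to-fine loop are not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((30, 1000))        # 30 models x 1000 samples (synthetic)
full = scores.mean(axis=1)             # per-model full-benchmark score
K = 40                                 # compressed subset size (25x fewer)

def fitness(subset):
    """Negative reconstruction error of per-model scores from the subset."""
    return -np.abs(scores[:, subset].mean(axis=1) - full).mean()

def mutate(subset):
    child = subset.copy()
    i = rng.integers(K)                # swap one sample for an unused one
    child[i] = rng.choice(np.setdiff1d(np.arange(scores.shape[1]), child))
    return child

pop = [rng.choice(scores.shape[1], size=K, replace=False) for _ in range(32)]
for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:16] + [mutate(p) for p in pop[:16]]   # elitism + mutation

best = max(pop, key=fitness)
print(f"reconstruction error with {K} samples: {-fitness(best):.4f}")
```

Ranking consistency would then be checked by comparing the model ordering under `scores[:, best].mean(axis=1)` against the ordering under `full`, e.g. with a rank correlation.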