Papers from 13 to 17 October, 2025

Here are the personalized paper recommendations, sorted by relevance.
Economics of Productivity
University of Glasgow
Abstract
Do industrial "superstars" help others up or crowd them out? We examine the relationship between the spillovers of superstar firms (those with the top market share in their industry) and productivity dynamics in Indonesia. Using data on Indonesian manufacturing firms from 2001 to 2015, we find that exposure to superstars raises both the productivity level and the productivity growth of non-superstar firms through horizontal (within a sector-province) and vertical (across sectors) channels. When we distinguish by ownership, foreign superstars consistently encourage productivity except through the horizontal channel, whereas domestic superstars generate positive spillovers through both horizontal and vertical linkages, indicating that positive externalities are not driven solely by foreign firms. Furthermore, although overall productivity growth was positive over 2001-2015, the negative component of that growth is driven mainly by within-group reallocation, evidence of misallocation among surviving firms, notably domestic superstars. Although Indonesian superstar firms are more efficient in their operations, their relatively modest growth rates suggest potential stagnation, plausibly attributable to limited innovation activity or a slow pace of adopting new technologies.
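The within-group reallocation and misallocation terms referenced above come from an Olley-Pakes-style decomposition of aggregate productivity; the paper's "dynamic OP" variant additionally separates survivor, entrant, and exiter contributions. A minimal sketch of the static version, assuming firm-level TFP estimates and market shares in a pandas DataFrame with hypothetical column names:

```python
# Minimal sketch of the static Olley-Pakes productivity decomposition.
# Column names (tfp, market_share) are illustrative assumptions, not the
# paper's variable names; the dynamic variant used in the paper further
# splits growth into survivor, entrant, and exiter terms.
import pandas as pd

def op_decomposition(df: pd.DataFrame) -> dict:
    """Split share-weighted aggregate TFP into an unweighted mean and a
    covariance (reallocation) term for one industry-year cross-section."""
    share = df["market_share"] / df["market_share"].sum()    # normalize shares
    tfp_bar = df["tfp"].mean()                                # unweighted mean TFP
    cov_term = ((share - share.mean()) * (df["tfp"] - tfp_bar)).sum()
    return {
        "aggregate_tfp": (share * df["tfp"]).sum(),  # equals tfp_bar + cov_term
        "unweighted_mean": tfp_bar,
        "reallocation_cov": cov_term,                # >0: shares favor productive firms
    }

# Example with made-up numbers: three firms in one sector-province cell.
firms = pd.DataFrame({"tfp": [1.0, 1.2, 0.8], "market_share": [0.5, 0.3, 0.2]})
print(op_decomposition(firms))
```

A shrinking covariance term over time is the kind of within-group reallocation pattern the abstract reads as misallocation among surviving firms.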
AI Insights
  • Dynamic OP decomposition reveals plant upgrades drive most TFP gains among surviving firms.
  • Reallocation within surviving firms dampens growth, while entry–exit reallocation boosts it.
  • Foreign‑owned superstars lift TFP, yet domestic superstars depress it through misallocation.
  • Exporting firms exhibit lower TFP than non‑exporters, while larger firms outperform smaller ones.
  • The study builds on Amiti et al. (2024) and Krugman (1995) to link trade frictions with productivity spillovers.
  • A “superstar” is defined as the firm with the top market share in its industry (per the abstract); these firms are also more efficient than their peers.
  • “TFP” stands for total factor productivity, the efficiency measure central to the analysis.
Zhejiang University of ZF
Abstract
We quantify the impact of Generative Artificial Intelligence (GenAI) on firm productivity through a series of large-scale randomized field experiments involving millions of users and products at a leading cross-border online retail platform. Over six months in 2023-2024, GenAI-based enhancements were integrated into seven consumer-facing business workflows. We find that GenAI adoption significantly increases sales, with treatment effects ranging from 0% to 16.3%, depending on GenAI's marginal contribution relative to existing firm practices. Because inputs and prices were held constant across experimental arms, these gains map directly into total factor productivity improvements. Across the four GenAI applications with positive effects, the implied annual incremental value is approximately $5 per consumer, an economically meaningful impact given the retailer's scale and the early stage of GenAI adoption. The primary mechanism operates through higher conversion rates, consistent with GenAI reducing frictions in the marketplace and improving consumer experience. We also document substantial heterogeneity: smaller and newer sellers, as well as less experienced consumers, exhibit disproportionately larger gains. Our findings provide novel, large-scale causal evidence on the productivity effects of GenAI in online retail, highlighting both its immediate value and broader potential.
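Because prices and inputs were held fixed across arms, the reported treatment effects reduce to differences in outcomes between randomized groups, with conversion rate as the main channel. A minimal sketch of that readout; the arm sizes and conversion counts below are made-up illustrations, not the paper's data:

```python
# Minimal sketch of reading out a two-arm experiment's conversion lift.
from math import sqrt

def conversion_lift(conv_t: int, n_t: int, conv_c: int, n_c: int, z: float = 1.96):
    """Relative lift in conversion rate (treatment vs. control) with a
    normal-approximation 95% confidence interval on the absolute difference."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    diff = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return {
        "relative_lift_pct": 100 * diff / p_c,
        "abs_diff_ci": (diff - z * se, diff + z * se),
    }

# Hypothetical arms with ~1M users each and a 3% relative lift in conversion.
print(conversion_lift(conv_t=30_900, n_t=1_000_000, conv_c=30_000, n_c=1_000_000))
```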
AI Insights
  • Marketing tactics boosted sales, yet their potency shifts across product categories, with niche items often reaping the most.
  • High‑concentration markets amplify marketing gains, while low‑concentration segments see muted effects.
  • Premium‑priced goods tend to respond more strongly to promotional pushes than budget alternatives.
  • Tail products—those with modest annual sales—experience disproportionately larger conversion lifts than high‑volume head items.
  • Surprisingly, the Google Advertising Titles experiment delivered no measurable uplift, hinting at diminishing returns for certain ad formats.
  • The study’s short‑term focus leaves open questions about the durability of these marketing‑induced sales spikes.
  • For deeper insight, consult Kotler & Keller’s “Marketing Management” and Cialdini’s “Influence” to unpack the psychological drivers behind these findings.
AI for Productivity Tools
Abstract
AI-powered web agents have the potential to automate repetitive tasks, such as form filling, information retrieval, and scheduling, but they struggle to reliably execute these tasks without human intervention, requiring users to provide detailed guidance during every run. We address this limitation by automatically synthesizing reusable workflows from an agent's successful and failed attempts. These workflows incorporate execution guards that help agents detect and fix errors while keeping users informed of progress and issues. Our approach enables agents to successfully complete repetitive tasks of the same type with minimal intervention, increasing success rates from 24.2% to 70.1% across fifteen tasks. To evaluate this approach, we conducted a study with nine users and found that our agent helped them complete web tasks with a higher success rate and less guidance than two baseline methods, while also letting them easily monitor agent behavior and understand failures.
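An execution-guarded workflow of the kind described could pair each action with a postcondition check, a bounded retry, and a user-visible description. The sketch below is only illustrative: the step names, guard logic, and agent interface are assumptions, not the paper's actual workflow representation.

```python
# Illustrative sketch of a reusable web-automation workflow with execution
# guards; structure and names are assumptions for illustration only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    action: Callable[[], None]   # e.g. fill a form field, click submit
    guard: Callable[[], bool]    # postcondition checked after the action
    describe: str                # shown to the user for monitoring
    max_retries: int = 2

def run_workflow(steps: list[Step]) -> bool:
    """Execute steps in order; a failed guard triggers a retry, and repeated
    failure surfaces the step description so the user can intervene."""
    for step in steps:
        for attempt in range(step.max_retries + 1):
            step.action()
            if step.guard():
                print(f"ok: {step.describe}")
                break
            print(f"retrying ({attempt + 1}): {step.describe}")
        else:
            print(f"needs attention: {step.describe}")
            return False
    return True
```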
LLMs for Productivity
Abstract
We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). To causally isolate data quality, we run controlled experiments on real Twitter/X corpora, constructing junk and reversely controlled datasets via two orthogonal operationalizations: M1 (engagement degree) and M2 (semantic quality), with matched token scale and training operations across conditions. Relative to the control group, continual pre-training of 4 LLMs on the junk dataset causes non-trivial declines (Hedges' g > 0.3) in reasoning, long-context understanding, and safety, and inflates "dark traits" (e.g., psychopathy, narcissism). Gradual mixtures of junk and control data also yield dose-response cognitive decay: for example, under M1, ARC-Challenge with chain of thought drops 74.9 → 57.2 and RULER-CWE 84.4 → 52.3 as the junk ratio rises from 0% to 100%. Error forensics reveal several key insights. First, we identify thought-skipping as the primary lesion: models increasingly truncate or skip reasoning chains, explaining most of the error growth. Second, partial but incomplete healing is observed: scaling instruction tuning and clean-data pre-training improve the declined cognition yet cannot restore baseline capability, suggesting persistent representational drift rather than format mismatch. Finally, we find that a tweet's popularity, a non-semantic metric, is a better indicator of the Brain Rot effect than its length in M1. Together, the results provide significant, multi-perspective evidence that data quality is a causal driver of LLM capability decay, reframing curation for continual pre-training as a training-time safety problem and motivating routine "cognitive health checks" for deployed LLMs.
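The effect sizes quoted above are Hedges' g, Cohen's d with a small-sample correction. A worked sketch of the computation; the two score lists are made-up benchmark accuracies, not the paper's results:

```python
# Hedges' g: standardized mean difference with small-sample correction.
from statistics import mean, stdev

def hedges_g(a: list[float], b: list[float]) -> float:
    na, nb = len(a), len(b)
    s_pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                / (na + nb - 2)) ** 0.5
    d = (mean(a) - mean(b)) / s_pooled           # Cohen's d
    return d * (1 - 3 / (4 * (na + nb) - 9))     # small-sample correction

control = [74.2, 75.1, 73.8, 74.9]   # e.g. scores after clean pre-training (made up)
junk = [72.0, 70.5, 71.8, 69.9]      # e.g. scores after junk pre-training (made up)
print(hedges_g(control, junk))        # g > 0.3 counts as a non-trivial decline here
```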
Abstract
Recent LLM agents have made great use of chain-of-thought reasoning and function calling. As their capabilities grow, an important question arises: can this software act not only as a smart problem-solving tool, but as an entity in its own right that can plan, design immediate tasks, and reason toward broader, more ambiguous goals? To study this question, we adopt an open-ended experimental setting in which we augment a pretrained LLM agent with the ability to generate its own tasks, accumulate knowledge, and interact extensively with its environment. We study the resulting open-ended agent qualitatively. It can reliably follow complex multi-step instructions, store and reuse information across runs, and propose and solve its own tasks, though it remains sensitive to prompt design, prone to repetitive task generation, and unable to form self-representations. These findings illustrate both the promise and the current limits of adapting pretrained LLMs toward open-endedness, and they point to future directions for training agents to manage memory, explore productively, and pursue abstract long-term goals.
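The loop described, an agent that proposes its own tasks, attempts them, and accumulates knowledge across runs, might be skeletonized as below. This is only a sketch: the llm callable, the environment interface, and the JSON memory format are assumptions, not the paper's implementation.

```python
# Illustrative skeleton of an open-ended agent loop: self-proposed tasks,
# environment interaction, and persistent memory across runs.
import json
from pathlib import Path

MEMORY = Path("agent_memory.json")  # hypothetical persistence location

def load_memory() -> list[dict]:
    return json.loads(MEMORY.read_text()) if MEMORY.exists() else []

def save_memory(entries: list[dict]) -> None:
    MEMORY.write_text(json.dumps(entries, indent=2))

def open_ended_loop(llm, environment, iterations: int = 5) -> None:
    memory = load_memory()
    for _ in range(iterations):
        # 1. Propose a task, conditioned on accumulated knowledge.
        task = llm(f"Given past notes {memory[-5:]}, propose one next task.")
        # 2. Attempt it in the environment (tool calls, function calling, etc.).
        outcome = environment.attempt(task)
        # 3. Record the result so later runs can reuse it.
        memory.append({"task": task, "outcome": outcome})
    save_memory(memory)
```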