AI for Productivity Tools

The AI Data Scientist

Department of Machine Learning, MBZUAI, Abu Dhabi, UAE

Abstract
Imagine decision-makers uploading data and, within minutes, receiving clear, actionable insights delivered straight to their fingertips. That is the promise of the AI Data Scientist, an autonomous Agent powered by large language models (LLMs) that closes the gap between evidence and action. Rather than simply writing code or responding to prompts, it reasons through questions, tests ideas, and delivers end-to-end insights at a pace far beyond traditional workflows. Guided by the scientific tenet of the hypothesis, this Agent uncovers explanatory patterns in data, evaluates their statistical significance, and uses them to inform predictive modeling. It then translates these results into recommendations that are both rigorous and accessible. At the core of the AI Data Scientist is a team of specialized LLM Subagents, each responsible for a distinct task such as data cleaning, statistical testing, validation, and plain-language communication. These Subagents write their own code, reason about causality, and identify when additional data is needed to support sound conclusions. Together, they achieve in minutes what might otherwise take days or weeks, enabling a new kind of interaction that makes deep data science both accessible and actionable.

August 25, 2025

Operating advanced scientific instruments with AI agents that learn on the job

Abstract
Advanced scientific user facilities, such as next generation X-ray light sources and self-driving laboratories, are revolutionizing scientific discovery by automating routine tasks and enabling rapid experimentation and characterization. However, these facilities must continuously evolve to support new experimental workflows, adapt to diverse user projects, and meet growing demands for more intricate instruments and experiments. This continuous development introduces significant operational complexity, necessitating a focus on usability, reproducibility, and intuitive human-instrument interaction. In this work, we explore the integration of agentic AI, powered by Large Language Models (LLMs), as a transformative tool to achieve this goal. We present our approach to developing a human-in-the-loop pipeline for operating advanced instruments including an X-ray nanoprobe beamline and an autonomous robotic station dedicated to the design and characterization of materials. Specifically, we evaluate the potential of various LLMs as trainable scientific assistants for orchestrating complex, multi-task workflows that also involve multimodal data, optimizing their performance through optional human input and iterative learning. We demonstrate the ability of AI agents to bridge the gap between advanced automation and user-friendly operation, paving the way for more adaptable and intelligent scientific facilities.

August 27, 2025

Economics of Productivity

Worker Quality, Matching and Productivity Slowdown

Abstract
Measured aggregate productivity and the income share of top earners are strongly and positively correlated in the Canadian data. The productivity slowdown since the early 2000s was accompanied by a flattening income share of top earners. Motivated by these facts, we study the role of firms' top-paid workers and worker matching in accounting for the slowdown of measured total factor productivity. We first estimate total factor productivity for Canadian firms over the period 2003-2015, taking into account the assortative matching between top workers and non-top workers. Measured total factor productivity consists of the Hicks-neutral technology and the quality of top workers. Our estimation suggests that measured aggregate total factor productivity declined from 2003 to 2015, in line with the estimate of the statistical agency. The decline of measured productivity is entirely accounted for by the declining quality of top workers, while the Hicks-neutral technology improved. Both the within-firm changes and the cross-firm reallocation of top-worker quality are important in contributing to the decline of overall top-worker quality. We also discuss possible causes of declines in the quality of top workers, e.g., the emigration of top talent as studied in recent literature.

August 30, 2025

Across Time and (Product) Space: A Capability-Centric Model of Relatedness and Economic Complexity

Abstract
Economic complexity - a group of dimensionality-reduction methods that apply network science to trade data - represented a paradigm shift in development economics towards materializing the once-intangible concept of capabilities as inferrable and quantifiable. Measures such as the Economic Complexity Index (ECI) and the Product Space have proven their worth as robust estimators of an economy's subsequent growth; less obvious, however, is how they have come to be so. Despite ECI drawing its micro-foundations from a combinatorial model of capabilities, where a set of homogeneous capabilities combine to form products and the economies which can produce them, such a model is consistent with neither the fact that distinct product classes draw on distinct capabilities, nor the interrelations between different products in the Product Space which so much of economic complexity is based upon. In this paper, we extend the combinatorial model of economic complexity through two innovations: an underlying network which governs the relatedness between capabilities, and a production function which trades the original binary specialization function for a fine-grained, product-level output function. Using country-product trade data across 216 countries, 5000 products and two decades, we show that this model is able to accurately replicate both the characteristic topology of the Product Space and the complexity distribution of countries' export baskets. In particular, the model bridges the gap between the ECI and capabilities by transforming measures of economic complexity into direct measures of the capabilities held by an economy - a transformation shown to both improve the informativeness of the Economic Complexity Index in predicting economic growth and enable an interpretation of economic complexity as a proxy for productive structure in the form of capability substitutability.

August 29, 2025

LLMs for Productivity

Beyond Benchmark: LLMs Evaluation with an Anthropomorphic and Value-oriented Roadmap

Department of Networks, China Mobile Communications Group Co., Ltd.

Abstract
For Large Language Models (LLMs), a disconnect persists between benchmark performance and real-world utility. Current evaluation frameworks remain fragmented, prioritizing technical metrics while neglecting holistic assessment for deployment. This survey introduces an anthropomorphic evaluation paradigm through the lens of human intelligence, proposing a novel three-dimensional taxonomy: Intelligence Quotient (IQ)-General Intelligence for foundational capacity, Emotional Quotient (EQ)-Alignment Ability for value-based interactions, and Professional Quotient (PQ)-Professional Expertise for specialized proficiency. For practical value, we pioneer a Value-oriented Evaluation (VQ) framework assessing economic viability, social impact, ethical alignment, and environmental sustainability. Our modular architecture integrates six components with an implementation roadmap. Through analysis of 200+ benchmarks, we identify key challenges including dynamic assessment needs and interpretability gaps. The survey provides actionable guidance for developing LLMs that are technically proficient, contextually relevant, and ethically sound. We maintain a curated repository of open-source evaluation resources at: https://github.com/onejune2018/Awesome-LLM-Eval.

August 26, 2025

It's All About In-Context Learning! Teaching Extremely Low-Resource Languages to LLMs

Department of Computer Science, University of Sheffield, UK

Abstract
Extremely low-resource languages, especially those written in rare scripts, as shown in Figure 1, remain largely unsupported by large language models (LLMs). This is due in part to compounding factors such as the lack of training data. This paper delivers the first comprehensive analysis of whether LLMs can acquire such languages purely via in-context learning (ICL), with or without auxiliary alignment signals, and how these methods compare to parameter-efficient fine-tuning (PEFT). We systematically evaluate 20 under-represented languages across three state-of-the-art multilingual LLMs. Our findings highlight the limitation of PEFT when both a language and its script are extremely under-represented by the LLM. In contrast, zero-shot ICL with language alignment is impressively effective on extremely low-resource languages, while few-shot ICL or PEFT is more beneficial for languages relatively better represented by LLMs. For LLM practitioners working on extremely low-resource languages, we summarise guidelines grounded in our results on adapting LLMs to low-resource languages, e.g., avoiding fine-tuning a multilingual model on languages of unseen scripts.

August 26, 2025
