Department of Machine Learning, MBZUAI, Abu Dhabi, UAE
Abstract
Imagine decision-makers uploading data and, within minutes, receiving clear,
actionable insights delivered straight to their fingertips. That is the promise
of the AI Data Scientist, an autonomous Agent powered by large language models
(LLMs) that closes the gap between evidence and action. Rather than simply
writing code or responding to prompts, it reasons through questions, tests
ideas, and delivers end-to-end insights at a pace far beyond traditional
workflows. Guided by the scientific tenet of the hypothesis, this Agent
uncovers explanatory patterns in data, evaluates their statistical
significance, and uses them to inform predictive modeling. It then translates
these results into recommendations that are both rigorous and accessible. At
the core of the AI Data Scientist is a team of specialized LLM Subagents, each
responsible for a distinct task such as data cleaning, statistical testing,
validation, and plain-language communication. These Subagents write their own
code, reason about causality, and identify when additional data is needed to
support sound conclusions. Together, they achieve in minutes what might
otherwise take days or weeks, enabling a new kind of interaction that makes
deep data science both accessible and actionable.
AWorld Team
Abstract
The learning from practice paradigm is crucial for developing capable Agentic
AI systems, yet it is severely hampered by inefficient experience generation, a
bottleneck especially pronounced in complex benchmarks like GAIA. To address
this, we introduce AWorld, an open-source system engineered for large-scale
agent-environment interaction. By distributing tasks across a cluster, AWorld
accelerates experience collection by 14.6x compared to standard single-node,
sequential execution. This critical speedup makes extensive reinforcement
learning practical and scalable. Leveraging this capability, we trained a
Qwen3-32B-based agent that significantly outperforms its base model, increasing
its overall GAIA accuracy from 21.59% to 32.23%. On the benchmark's most
challenging levels, our agent achieves a score of 16.33%, surpassing the
performance of leading proprietary models. Our open-source system and resulting
agent provide a practical blueprint for a complete agentic AI training
pipeline, from efficient interaction to demonstrable model improvement.