Abstract
Recent advances in large language models (LLMs) have enabled a new class of
AI agents that automate multiple stages of the data science workflow by
integrating planning, tool use, and multimodal reasoning across text, code,
tables, and visuals. This survey presents the first comprehensive,
lifecycle-aligned taxonomy of data science agents, systematically analyzing and
mapping forty-five systems onto the six stages of the end-to-end data science
process: business understanding and data acquisition, exploratory analysis and
visualization, feature engineering, model building and selection,
interpretation and explanation, and deployment and monitoring. In addition to
lifecycle coverage, we annotate each agent along five cross-cutting design
dimensions: reasoning and planning style, modality integration, tool
orchestration depth, learning and alignment methods, and trust, safety, and
governance mechanisms. Beyond classification, we provide a critical synthesis
of agent capabilities, highlight strengths and limitations at each stage, and
review emerging benchmarks and evaluation practices. Our analysis identifies
three key trends: most systems emphasize exploratory analysis, visualization,
and modeling while neglecting business understanding, deployment, and
monitoring; multimodal reasoning and tool orchestration remain unresolved
challenges; and over 90% lack explicit trust and safety mechanisms. We conclude
by outlining open challenges in alignment stability, explainability,
governance, and robust evaluation frameworks, and propose future research
directions to guide the development of robust, trustworthy, low-latency,
transparent, and broadly accessible data science agents.
Texas A&M University, USA
Abstract
Modern engineering increasingly relies on vast datasets generated by
experiments and simulations, driving a growing demand for efficient, reliable,
and broadly applicable modeling strategies. There is also heightened interest
in developing data-driven approaches, particularly neural network models, for
effective prediction and analysis of scientific datasets. Traditional
data-driven methods frequently involve extensive manual intervention, limiting
their ability to scale effectively and generalize to diverse applications. In
this study, we propose an innovative pipeline utilizing Large Language Model
(LLM) agents to automate data-driven modeling and analysis, with a particular
emphasis on regression tasks. We evaluate two LLM-agent frameworks: a
multi-agent system featuring specialized collaborative agents, and a
single-agent system based on the Reasoning and Acting (ReAct) paradigm. Both
frameworks autonomously handle data preprocessing, neural network development,
training, hyperparameter optimization, and uncertainty quantification (UQ). We
validate our approach using a critical heat flux (CHF) prediction benchmark,
involving approximately 25,000 experimental data points from the OECD/NEA
benchmark dataset. Results indicate that our LLM-agent-developed model
surpasses traditional CHF lookup tables and delivers predictive accuracy and UQ
on par with state-of-the-art Bayesian optimized deep neural network models
developed by human experts. These outcomes underscore the significant potential
of LLM-based agents to automate complex engineering modeling tasks, greatly
reducing human workload while meeting or exceeding existing standards of
predictive performance.