Data-Driven Methods and AI in Engineering Design: A Systematic Literature Review Focusing on Challenges and Opportunities

Rate paper: 👍 👎 ♥ Save

AI Summary

Data-Driven Methods (DDMs): approaches that utilize data to inform design decisions. [2]
The use of data-driven methods in mechanical engineering design is increasing, with a focus on system integration and validation. [1]

Abstract
The increasing availability of data and advancements in computational intelligence have accelerated the adoption of data-driven methods (DDMs) in product development. However, their integration into product development remains fragmented. This fragmentation stems from uncertainty, particularly the lack of clarity on what types of DDMs to use and when to employ them across the product development lifecycle. To address this, a necessary first step is to investigate the usage of DDM in engineering design by identifying which methods are being used, at which development stages, and for what application. This paper presents a PRISMA systematic literature review. The V-model as a product development framework was adopted and simplified into four stages: system design, system implementation, system integration, and validation. A structured search across Scopus, Web of Science, and IEEE Xplore (2014--2024) retrieved 1{,}689 records. After screening, 114 publications underwent full-text analysis. Findings show that machine learning (ML) and statistical methods dominate current practice, whereas deep learning (DL), though still less common, exhibits a clear upward trend in adoption. Additionally, supervised learning, clustering, regression analysis, and surrogate modeling are prevalent in design, implementation, and integration system stages but contributions to validation remain limited. Key challenges in existing applications include limited model interpretability, poor cross-stage traceability, and insufficient validation under real-world conditions. Additionally, it highlights key limitations and opportunities such as the need for interpretable hybrid models. This review is a first step toward design-stage guidelines; a follow-up synthesis should map computer science algorithms to engineering design problems and activities.

Why we think this paper is great for you:
This paper directly addresses the integration of AI and data-driven methods into engineering design, offering insights into challenges and opportunities. It provides valuable context for managing innovation and strategic adoption of these technologies.

AI4X Roadmap: Artificial Intelligence for the advancement of scientific pursuit and its future directions

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Summary

Explainable Artificial Intelligence (XAI) is rapidly evolving, moving beyond post-hoc methods towards a more comprehensive mechanistic understanding of AI model behavior and decision-making process. [3]
Embedding causal reasoning into interpretability pipelines through counterfactuals or mechanistic priors provides the necessary structure to distinguish genuine drivers from spurious correlations. [3]
Explainable Artificial Intelligence (XAI): AI that can explain its own decision-making process and provide insights into how it arrived at a particular conclusion. [3]
Causality: The relationship between cause and effect, essential for understanding the underlying mechanisms of complex systems. [3]
Interpretability: The ability to understand and interpret the results of an algorithm or model. [3]
XAI has the potential to contribute to trustworthy and discovery-driven science by providing insights into AI decision-making processes. [3]
Causality plays a decisive role in XAI, without causal grounding even stable and reproducible explanations risk being superficially plausible while scientifically misleading. [2]
Interpretability results that are accurate but misaligned with domain concepts may fail to resonate with experts, rendering them ineffective in real scientific workflows. [1]

Abstract
Artificial intelligence and machine learning are reshaping how we approach scientific discovery, not by replacing established methods but by extending what researchers can probe, predict, and design. In this roadmap we provide a forward-looking view of AI-enabled science across biology, chemistry, climate science, mathematics, materials science, physics, self-driving laboratories and unconventional computing. Several shared themes emerge: the need for diverse and trustworthy data, transferable electronic-structure and interatomic models, AI systems integrated into end-to-end scientific workflows that connect simulations to experiments and generative systems grounded in synthesisability rather than purely idealised phases. Across domains, we highlight how large foundation models, active learning and self-driving laboratories can close loops between prediction and validation while maintaining reproducibility and physical interpretability. Taken together, these perspectives outline where AI-enabled science stands today, identify bottlenecks in data, methods and infrastructure, and chart concrete directions for building AI systems that are not only more powerful but also more transparent and capable of accelerating discovery in complex real-world environments.

Why we think this paper is great for you:
This roadmap offers a forward-looking perspective on AI's role in scientific discovery and its future directions. It will help you strategically plan for AI integration within data science and engineering initiatives.

MorphingDB: A Task-Centric AI-Native DBMS for Model Management and Inference

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Summary

It provides advanced pipeline techniques to streamline the execution of complex inference workflows. [3]
The system employs a dependency analysis algorithm based on the DAG to optimize the execution flow. [3]
The system is evaluated using nine widely-used open datasets and compared with several open-source AI-native databases. [3]
DAG (Directed Acyclic Graph): A graph data structure consisting of nodes and directed edges, used to represent dependencies between operators in MorphingDB. [3]
Dependency analysis algorithm: An algorithm used by MorphingDB to analyze the dependencies between operators in a DAG and optimize the execution flow. [3]
Batch size: The number of data points processed together in a single inference operation, which affects the trade-off between throughput and latency. [3]
MorphingDB is a task-centric AI-native DBMS that supports model management and inference. [2]
MorphingDB is a powerful AI-native DBMS that supports model management and inference tasks within the database itself. [1]

Abstract
The increasing demand for deep neural inference within database environments has driven the emergence of AI-native DBMSs. However, existing solutions either rely on model-centric designs requiring developers to manually select, configure, and maintain models, resulting in high development overhead, or adopt task-centric AutoML approaches with high computational costs and poor DBMS integration. We present MorphingDB, a task-centric AI-native DBMS that automates model storage, selection, and inference within PostgreSQL. To enable flexible, I/O-efficient storage of deep learning models, we first introduce specialized schemas and multi-dimensional tensor data types to support BLOB-based all-in-one and decoupled model storage. Then we design a transfer learning framework for model selection in two phases, which builds a transferability subspace via offline embedding of historical tasks and employs online projection through feature-aware mapping for real-time tasks. To further optimize inference throughput, we propose pre-embedding with vectoring sharing to eliminate redundant computations and DAG-based batch pipelines with cost-aware scheduling to minimize the inference time. Implemented as a PostgreSQL extension with LibTorch, MorphingDB outperforms AI-native DBMSs (EvaDB, Madlib, GaussML) and AutoML platforms (AutoGluon, AutoKeras, AutoSklearn) across nine public datasets, encompassing series, NLP, and image tasks. Our evaluation demonstrates a robust balance among accuracy, resource consumption, and time cost in model selection and significant gains in throughput and resource efficiency.

Why we think this paper is great for you:
Focusing on AI-native database management systems and model management, this paper is highly relevant to optimizing infrastructure for AI and data science operations. It provides insights into the technical backbone supporting advanced data applications.

The DataSquad Experiment: Lessons for Preparing Data and Computer Scientists for Work

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Summary

The DataSquad program has been highly effective in providing students with practical experience and skills in data science, software engineering, and project management. [3]
The program's emphasis on teamwork, communication, and client interaction has helped students develop valuable soft skills. [3]
The DataSquad environment is highly encouraging, with 100% of alumni reporting that they felt encouraged to participate in the program. [3]
Statistical Analysis: Collecting, exploring and presenting large amounts of data to discover underlying patterns and trends Database Design/Cloud Systems: Designing a safe place to capture your data (in SQL or other), working with data capture or management tools like Qualtrics or Google Forms Coding, Software Engineering: Using programming languages, such as Python, R, etc., and utilizing file management tools like Git Project Management/Planning: Organizing tasks, managing time, and coordinating resources to achieve goals Effective Teamwork: Collaborating well with others, supporting teammates, and achieving shared objectives [3]
Many students experienced multiple roles during their tenure, gaining breadth across the program's offerings. [2]

Abstract
The DataSquad at Carleton College addresses a common problem at small liberal arts colleges: limited capacity for data services and few opportunities for students to gain practical experience with data and software development. Academic Technologist Paula Lackie designed the program as a work-study position that trains undergraduates through structured peer mentorship and real client projects. Students tackle data problems of increasing complexity-from basic data analysis to software development-while learning FAIR data principles and open science practices. The model's core components (peer mentorship structure, project-based learning, and communication training) make it adaptable to other institutions. UCLA and other colleges have adopted the model using openly shared materials through "DataSquad International." This paper describes the program's implementation at Carleton College and examines how structured peer mentorship can simultaneously improve institutional data services and provide students with professional skills and confidence.

Why we think this paper is great for you:
This paper provides practical lessons on preparing data and computer scientists for real-world work, which is crucial for developing effective tech teams. It offers valuable perspectives on fostering practical experience within your teams.

CLIMATEAGENT: Multi-Agent Orchestration for Complex Climate Data Science Workflows

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
Climate science demands automated workflows to transform comprehensive questions into data-driven statements across massive, heterogeneous datasets. However, generic LLM agents and static scripting pipelines lack climate-specific context and flexibility, thus, perform poorly in practice. We present ClimateAgent, an autonomous multi-agent framework that orchestrates end-to-end climate data analytic workflows. ClimateAgent decomposes user questions into executable sub-tasks coordinated by an Orchestrate-Agent and a Plan-Agent; acquires data via specialized Data-Agents that dynamically introspect APIs to synthesize robust download scripts; and completes analysis and reporting with a Coding-Agent that generates Python code, visualizations, and a final report with a built-in self-correction loop. To enable systematic evaluation, we introduce Climate-Agent-Bench-85, a benchmark of 85 real-world tasks spanning atmospheric rivers, drought, extreme precipitation, heat waves, sea surface temperature, and tropical cyclones. On Climate-Agent-Bench-85, ClimateAgent achieves 100% task completion and a report quality score of 8.32, outperforming GitHub-Copilot (6.27) and a GPT-5 baseline (3.26). These results demonstrate that our multi-agent orchestration with dynamic API awareness and self-correcting execution substantially advances reliable, end-to-end automation for climate science analytic tasks.

Why we think this paper is great for you:
Exploring multi-agent orchestration for complex data science workflows, this paper offers insights into managing intricate data processes. It is relevant for optimizing and automating advanced data science engineering tasks.

Regularity as Structural Amplifier, Not Trap: A Causal and Archetype-Based Analysis of Dropout in a Constrained Engineering Curriculum

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
Engineering programmes, particularly in Latin America, are often governed by rigid curricula and strict regularity rules that are claimed to create a Regularity Trap for capable students. This study tests that causal hypothesis using the CAPIRE framework, a leakage-aware pipeline that integrates curriculum topology and causal estimation. Using longitudinal data from 1,343 civil engineering students in Argentina, we formalize academic lag (accumulated friction) as a treatment and academic velocity as an ability proxy. A manual LinearDML estimator is employed to assess the average (ATE) and conditional (CATE) causal effects of lag on subsequent dropout, controlling for macro shocks (strikes, inflation). Results confirm that academic lag significantly increases dropout risk overall (ATE = 0.0167, p < 0.0001). However, the effect decreases sharply for high-velocity (high-ability) students, contradicting the universal Trap hypothesis. Archetype analysis (UMAP/DBSCAN) shows that friction disproportionately harms trajectories already characterized by high initial friction and unstable progression. 8 We conclude that regularity rules function as a Structural Amplifier of pre-existing vulnerability rather than a universal trap. This has direct implications for engineering curriculum design, demanding targeted slack allocation and intervention policies to reduce friction at core basic-cycle courses

Why we think this paper is great for you:
While focused on engineering curricula, this paper's analysis of structural rules and student outcomes might offer indirect insights into organizational design and team dynamics. Its direct relevance to AI, data science, or tech team management is limited.

Interests not found

Help us improve your experience!