Beijing Institute of Tech
Abstract
A growing trend in modern data analysis is the integration of data management
with learning, guided by accuracy, latency, and cost requirements. In practice,
applications draw data of different formats from many sources. In the
meanwhile, the objectives and budgets change over time. Existing systems handle
these applications across databases, analysis libraries, and tuning services.
Such fragmentation leads to complex user interaction, limited adaptability,
suboptimal performance, and poor extensibility across components. To address
these challenges, we present Aixel, a unified, adaptive, and extensible system
for AI-powered data analysis. The system organizes work across four layers:
application, task, model, and data. The task layer provides a declarative
interface to capture user intent, which is parsed into an executable operator
plan. An optimizer compiles and schedules this plan to meet specified goals in
accuracy, latency, and cost. The task layer coordinates the execution of data
and model operators, with built-in support for reuse and caching to improve
efficiency. The model layer offers versioned storage for index, metadata,
tensors, and model artifacts. It supports adaptive construction, task-aligned
drift detection, and safe updates that reuse shared components. The data layer
provides unified data management capabilities, including indexing,
constraint-aware discovery, task-aligned selection, and comprehensive feature
management. With the above designed layers, Aixel delivers a user friendly,
adaptive, efficient, and extensible system.
MIT, Cornell University
Abstract
Cutting-edge research in Artificial Intelligence (AI) requires considerable
resources, including Graphics Processing Units (GPUs), data, and human
resources. In this paper, we evaluate of the relationship between these
resources and the scientific advancement of foundation models (FM). We reviewed
6517 FM papers published between 2022 to 2024, and surveyed 229 first-authors
to the impact of computing resources on scientific output. We find that
increased computing is correlated with national funding allocations and
citations, but our findings don't observe the strong correlations with research
environment (academic or industrial), domain, or study methodology. We advise
that individuals and institutions focus on creating shared and affordable
computing opportunities to lower the entry barrier for under-resourced
researchers. These steps can help expand participation in FM research, foster
diversity of ideas and contributors, and sustain innovation and progress in AI.
The data will be available at: https://mit-calc.csail.mit.edu/
AI Insights - Only 12âŻ% of 6,517 papers disclosed GPU specs, revealing a major transparency gap.
- Dataset cost reporting was sparse and often underestimated, obscuring true financial burden.
- Human laborâannotators and researchersâwas underreported, masking effort behind models.
- The 122âresponse survey had a low rate, hinting at bias and the need for broader participation.
- Authors should follow the âComputational Resources for Machine Learningâ guide and âReproducibility in Machine Learningâ book for standardized reporting.
- Key resources: arXiv survey 2203.00001 and Kaggle datasets for benchmarking.
- Computational resources = hardware/software for ML; reproducibility = ability to replicate results with same methods and data.