Beijing Institute of Tech
Abstract
A growing trend in modern data analysis is the integration of data management
with learning, guided by accuracy, latency, and cost requirements. In practice,
applications draw data in different formats from many sources, while
objectives and budgets change over time. Existing systems handle
these applications across databases, analysis libraries, and tuning services.
Such fragmentation leads to complex user interaction, limited adaptability,
suboptimal performance, and poor extensibility across components. To address
these challenges, we present Aixel, a unified, adaptive, and extensible system
for AI-powered data analysis. The system organizes work across four layers:
application, task, model, and data. The task layer provides a declarative
interface to capture user intent, which is parsed into an executable operator
plan. An optimizer compiles and schedules this plan to meet specified goals in
accuracy, latency, and cost. The task layer coordinates the execution of data
and model operators, with built-in support for reuse and caching to improve
efficiency. The model layer offers versioned storage for indexes, metadata,
tensors, and model artifacts. It supports adaptive construction, task-aligned
drift detection, and safe updates that reuse shared components. The data layer
provides unified data management capabilities, including indexing,
constraint-aware discovery, task-aligned selection, and comprehensive feature
management. Together, these layers make Aixel a user-friendly, adaptive,
efficient, and extensible system.
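To make the declarative-task-to-operator-plan idea concrete, here is a minimal, purely illustrative sketch. None of these function names, spec keys, or operator names come from Aixel itself; they are hypothetical stand-ins showing how a user-intent spec might be parsed into an operator plan and scheduled against a latency budget.

```python
# Hypothetical sketch: illustrates parsing a declarative task spec into an
# operator plan and scheduling it under a latency budget. All names here
# (parse_task, schedule, spec keys) are assumptions, not Aixel's actual API.

def parse_task(spec):
    """Turn a declarative task spec into a linear operator plan."""
    plan = [("discover", spec["source"]), ("select", spec["target"])]
    if spec.get("features"):
        plan.append(("featurize", tuple(spec["features"])))
    plan.append(("train", spec["model"]))
    return plan

def schedule(plan, budget):
    """Toy optimizer: give each operator an equal share of the latency budget."""
    share = budget["latency_ms"] / len(plan)
    return [(op, arg, share) for op, arg in plan]

spec = {"source": "sales.csv", "target": "revenue",
        "features": ["region", "month"], "model": "gbdt"}
plan = schedule(parse_task(spec), {"latency_ms": 800.0})
```

A real optimizer would of course weigh accuracy and cost goals as well, and could reorder or cache operators; the sketch only shows the shape of the compilation step.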
Abstract
In Machine Learning (ML), a regression algorithm aims to minimize a loss
function based on data. An assessment method in this context seeks to quantify
the discrepancy between the optimal response for an input-output system and the
estimate produced by a learned predictive model (the student). Evaluating the
quality of a learned regressor remains challenging without access to the true
data-generating mechanism, as no data-driven assessment method can ensure the
achievability of global optimality. This work introduces the Information
Teacher, a novel data-driven framework for evaluating regression algorithms
with formal performance guarantees for assessing global optimality. Our
approach builds on estimating the Shannon mutual information (MI) between the
input variables and the residuals, and it applies to a broad class of additive
noise models. Through numerical experiments, we confirm that the Information
Teacher can detect global optimality, which coincides with zero estimation
error relative to the true model (inaccessible in practice). The framework
thus serves as a surrogate for the ground-truth assessment loss and offers a
principled alternative to conventional empirical performance metrics.
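The core intuition can be sketched in a few lines: under an additive noise model, a globally optimal regressor leaves residuals that are independent of the inputs, so their mutual information with the inputs is (near) zero, while a misspecified regressor leaves input-dependent residuals with positive MI. The following is a minimal sketch, assuming a simple plug-in histogram MI estimator rather than the paper's actual estimator, on synthetic data with a known ground truth.

```python
import numpy as np

def hist_mi(x, r, bins=16):
    # Plug-in (histogram) estimate of Shannon mutual information I(X; R), in nats.
    pxy, _, _ = np.histogram2d(x, r, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of r
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 20000)
y = x**2 + 0.1 * rng.normal(size=x.size)   # additive-noise ground truth

# Optimal student: the true regression function; residuals are pure noise.
r_good = y - x**2
# Misspecified student: best linear fit; residuals still depend on x.
a, b = np.polyfit(x, y, 1)
r_bad = y - (a * x + b)

mi_good = hist_mi(x, r_good)   # near zero (up to estimator bias)
mi_bad = hist_mi(x, r_bad)     # clearly positive
```

Here `mi_bad` comes out much larger than `mi_good`, which is the signal the Information Teacher exploits; the paper's contribution lies in turning this into a data-driven test with formal guarantees, which this toy estimator does not provide.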