🎯 Top Personalized Recommendations
University of Pennsylvania
AI Summary - The proposed assessments are designed to measure student learning outcomes more accurately than traditional multiple-choice tests. [3]
- The paper discusses the challenges posed by large language models (LLMs) in education, particularly in assessing student learning and academic integrity. [2]
- The authors propose a framework for designing such assessments, which includes elements of task complexity, interactivity, and feedback. [1]
- It highlights the need for more authentic assessments that reflect real-world problems and require critical thinking and problem-solving skills. [0]
Abstract
The rapid adoption of generative AI has undermined traditional modular assessments in computing education, creating a disconnect between academic evaluation and industry practice. This paper presents a theoretically grounded framework for designing AI-resilient assessments, supported by formal analysis and multi-year empirical validation.
We make three contributions. First, we establish two theoretical results: (1) assessments composed of interconnected problems, where outputs feed into subsequent stages, are more AI-resilient than modular assessments because current language models struggle with sustained multi-step reasoning and context; and (2) semi-structured problems with deterministic success criteria provide more reliable measures of student competency than fully open-ended projects, which allow AI systems to default to familiar solution patterns. These results challenge common policy and institutional guidance that promotes open-ended assessments as the primary safeguard for academic integrity.
Second, we validate these results using data from four university data science courses (N = 138). While students achieve near-perfect scores on AI-assisted modular homework, performance drops by roughly 30 percentage points on proctored exams, indicating substantial AI score inflation. Interconnected projects remain strongly correlated with modular assessments, suggesting they measure the same underlying skills while resisting AI misuse. Proctored exams show weaker alignment, implying they may assess test-taking ability rather than intended learning outcomes.
Third, we translate these findings into a practical assessment design framework. The proposed approach enables educators to create assessments that promote integrative thinking, reflect real-world AI-augmented workflows, and naturally resist trivial delegation to generative AI, thereby helping restore academic integrity.
Why we think this paper is great for you:
This paper directly addresses the need for robust assessments, aligning with the user's interest in machine learning resilience and testing. It offers a framework for designing evaluations that can withstand the challenges posed by AI, a key area of concern.
Tsinghua University
AI Summary - IRTest effectively reduces the surrogate-to-real gap with relatively few tests. [2]
Abstract
Testing and evaluating decision-making agents remains challenging due to unknown system architectures, limited access to internal states, and the vastness of high-dimensional scenario spaces. Existing testing approaches often rely on surrogate models of decision-making agents to generate large-scale scenario libraries; however, discrepancies between surrogate models and real decision-making agents significantly limit their generalizability and practical applicability. To address this challenge, this paper proposes intelligent resilience testing (IRTest), a unified online adaptive testing framework designed to rapidly adjust to diverse decision-making agents. IRTest initializes with an offline-trained surrogate prediction model and progressively reduces the surrogate-to-real gap during testing through two complementary adaptation mechanisms: (i) online neural fine-tuning in data-rich regimes, and (ii) lightweight importance-sampling-based weighting correction in data-limited regimes. A Bayesian optimization strategy, equipped with bias-corrected acquisition functions, guides scenario generation to balance exploration and exploitation in complex testing spaces. Extensive experiments across varying levels of task complexity and system heterogeneity demonstrate that IRTest consistently improves failure-discovery efficiency, testing robustness, and cross-system generalizability. These results highlight the potential of IRTest as a practical solution for scalable, adaptive, and resilient testing of decision-making agents.
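To make the data-limited adaptation mechanism concrete, here is a minimal, hypothetical Python sketch of an importance-style weighting correction: a cheap offline surrogate's failure predictions are rescaled using a small number of real-agent rollouts. All function names and the one-dimensional scenario space are invented for illustration; IRTest's actual mechanism also involves neural fine-tuning and Bayesian-optimization-guided scenario generation.

    import numpy as np

    rng = np.random.default_rng(0)

    def surrogate_failure_prob(x):
        """Offline-trained surrogate: a crude logistic score over a 1-D scenario space."""
        return 1.0 / (1.0 + np.exp(-(x - 2.0)))

    def real_rollout(x):
        """Expensive real-agent test; ground truth the surrogate only approximates."""
        return rng.random() < 1.0 / (1.0 + np.exp(-(x - 2.5)))

    # Data-limited regime: only a few dozen real rollouts are affordable.
    probe_x = rng.uniform(0.0, 5.0, size=30)
    real_fail = np.array([real_rollout(x) for x in probe_x], dtype=float)
    surr_p = surrogate_failure_prob(probe_x)

    # Lightweight multiplicative correction: ratio of observed to predicted
    # failure mass on the probes, with the output clipped into [0, 1].
    ratio = (real_fail.mean() + 1e-6) / (surr_p.mean() + 1e-6)

    def corrected_failure_prob(x):
        return np.clip(surrogate_failure_prob(x) * ratio, 0.0, 1.0)

    for x in np.linspace(0.0, 5.0, 6):
        print(f"x={x:.1f}  surrogate={surrogate_failure_prob(x):.2f}  "
              f"corrected={corrected_failure_prob(x):.2f}")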
Why we think this paper is great for you:
The focus on testing decision-making agents resonates with the user's interest in machine learning resilience and fault tolerance. The approach of using surrogate models is a common technique for evaluating complex systems, directly relevant to their interests.
Jheronimus Academy of Data Science
AI Summary - CORE business logic: The core functionality of the system, which is not part of the PORTS or ADAPTERS. [3]
- The OCEANGUARD tool is an extensible Machine Learning System (MLES) that aims to analyze and detect anomalies across multiple types of data from the maritime domain. [2]
- The authors faced two major challenges during development: generality, related to defining PORTS that are specific and dependency-agnostic, and separation of concerns, related to defining ADAPTERS that are distinct and logic-thin. [1]
Abstract
ML-Enabled Systems (MLES) are inherently complex since they require multiple components to achieve their business goal. This experience report showcases the software architecture reusability techniques applied while building Ocean Guard, an MLES for anomaly detection in the maritime domain. In particular, it highlights the challenges and lessons learned to reuse the Ports and Adapters pattern to support building multiple microservices from a single codebase. This experience report hopes to inspire software engineers, machine learning engineers, and data scientists to apply the Hexagonal Architecture pattern to build their MLES.
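For readers unfamiliar with the pattern, below is a minimal sketch, with invented class names and no relation to Ocean Guard's actual codebase, of the Hexagonal (Ports and Adapters) structure the report applies: the CORE depends only on an abstract PORT, and each microservice supplies its own logic-thin ADAPTER.

    from abc import ABC, abstractmethod
    from typing import Iterable

    class DataSourcePort(ABC):
        """PORT: a specific, dependency-agnostic contract seen by the CORE."""
        @abstractmethod
        def readings(self) -> Iterable[float]: ...

    class AnomalyDetector:
        """CORE business logic: depends only on the port, never on a concrete source."""
        def __init__(self, source: DataSourcePort, threshold: float = 1.5):
            self.source, self.threshold = source, threshold

        def anomalies(self) -> list:
            xs = list(self.source.readings())
            mean = sum(xs) / len(xs)
            std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5 or 1.0
            return [x for x in xs if abs(x - mean) / std > self.threshold]

    class InMemoryAdapter(DataSourcePort):
        """ADAPTER: logic-thin; a real microservice might wrap an AIS feed or a database."""
        def __init__(self, xs): self.xs = xs
        def readings(self): return self.xs

    detector = AnomalyDetector(InMemoryAdapter([1.0, 1.1, 0.9, 1.0, 42.0]))
    print(detector.anomalies())  # -> [42.0]

Swapping the in-memory adapter for one backed by a message queue or database leaves the detector untouched, which is the reusability payoff the report describes.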
Why we think this paper is great for you:
This paper's exploration of MLOps and reusability aligns strongly with the user's interest in MLOps, data science development environments, and building robust systems. The microservices architecture is a key component of modern MLOps.
Universidad del Pacífico
AI Summary - The text is a mathematical derivation related to Doubly Robust Estimation (DRE) in statistics. [3]
- It discusses the concept of Neyman orthogonality, empirical scores, and condition numbers in the context of DML estimators. [3]
- Doubly robust estimation is a way to estimate model parameters that combines two different estimation methods, and the text covers how the method works and some of its properties. [3]
Abstract
Standard Double Machine Learning (DML; Chernozhukov et al., 2018) confidence intervals can exhibit substantial finite-sample coverage distortions when the underlying score equations are ill-conditioned, even if nuisance functions are estimated with state-of-the-art methods. Focusing on the partially linear regression (PLR) model, we show that a simple, easily computed condition number for the orthogonal score, denoted $\kappa_{\mathrm{DML}} := 1/|J_\theta|$, largely determines when DML inference is reliable. Our first result derives a nonasymptotic, Berry-Esseen-type bound showing that the coverage error of the usual DML t-statistic is of order $n^{-1/2} + \sqrt{n}\,r_n$, where $r_n$ is the standard DML remainder term summarizing nuisance estimation error. Our second result provides a refined linearization in which both estimation error and confidence interval length scale as $\kappa_{\mathrm{DML}}/\sqrt{n} + \kappa_{\mathrm{DML}}\,r_n$, so that ill-conditioning directly inflates both variance and bias. These expansions yield three conditioning regimes (well-conditioned, moderately ill-conditioned, and severely ill-conditioned) and imply that informative, shrinking confidence sets require $\kappa_{\mathrm{DML}} = o_p(\sqrt{n})$ and $\kappa_{\mathrm{DML}}\,r_n \to 0$. We conduct Monte Carlo experiments across overlap levels, nuisance learners (OLS, Lasso, random forests), and both low- and high-dimensional ($p > n$) designs. Across these designs, $\kappa_{\mathrm{DML}}$ is highly predictive of finite-sample performance: well-conditioned designs with $\kappa_{\mathrm{DML}} < 1$ deliver near-nominal coverage with short intervals, whereas severely ill-conditioned designs can exhibit large bias and coverage around 40% for nominal 95% intervals, despite flexible nuisance fitting. We propose reporting $\kappa_{\mathrm{DML}}$ alongside DML estimates as a routine diagnostic of score conditioning, in direct analogy to condition-number checks and weak-instrument diagnostics in IV settings.
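As an illustration of how cheap the proposed diagnostic is to compute, the hypothetical sketch below (assuming the PLR model $Y = \theta D + g(X) + \varepsilon$ with cross-fitted linear nuisances; not the authors' code) estimates $J_\theta$ from the residualized treatment and reports $\kappa_{\mathrm{DML}} = 1/|J_\theta|$. The weak residual variation in $D$ is chosen deliberately so that $\kappa_{\mathrm{DML}}$ flags ill-conditioning.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(1)
    n, theta = 2000, 1.0
    X = rng.normal(size=(n, 5))
    # Treatment D is almost fully explained by X: little residual variation,
    # so J_theta = E[(D - m(X))^2] is small and kappa_DML is large.
    D = X @ np.array([0.5, -0.3, 0.2, 0.0, 0.1]) + 0.2 * rng.normal(size=n)
    Y = theta * D + X @ np.array([1.0, 1.0, -1.0, 0.5, 0.0]) + rng.normal(size=n)

    D_res, Y_res = np.empty(n), np.empty(n)
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        D_res[test] = D[test] - LinearRegression().fit(X[train], D[train]).predict(X[test])
        Y_res[test] = Y[test] - LinearRegression().fit(X[train], Y[train]).predict(X[test])

    J_theta = np.mean(D_res ** 2)      # score Jacobian (up to sign) in the PLR model
    kappa_dml = 1.0 / abs(J_theta)
    theta_hat = np.sum(D_res * Y_res) / np.sum(D_res ** 2)
    print(f"theta_hat = {theta_hat:.3f}, kappa_DML = {kappa_dml:.1f}")  # large kappa -> distrust the CI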
Why we think this paper is great for you:
The focus on finite-sample failures and condition numbers directly addresses the user's interest in machine learning resilience and diagnostics. Understanding these issues is crucial for building reliable ML systems.
Indian Institute of Technology
Abstract
We present a neural network-based framework for solving the quantitative group testing (QGT) problem that achieves both high decoding accuracy and structural verifiability. In QGT, the objective is to identify a small subset of defective items among $N$ candidates using only $M \ll N$ pooled tests, each reporting the number of defectives in the tested subset. We train a multi-layer perceptron to map noisy measurement vectors to binary defect indicators, achieving accurate and robust recovery even under sparse, bounded perturbations. Beyond accuracy, we show that the trained network implicitly learns the underlying pooling structure that links items to tests, allowing this structure to be recovered directly from the network's Jacobian. This indicates that the model does not merely memorize training patterns but internalizes the true combinatorial relationships governing QGT. Our findings reveal that standard feedforward architectures can learn verifiable inverse mappings in structured combinatorial recovery problems.
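A minimal sketch of both ideas, with illustrative sizes and a finite-difference Jacobian standing in for whatever the authors use, might look as follows: train an MLP to decode pooled counts, then check whether the decoder's input-output sensitivities line up with the pooling matrix $A$.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    N, M, k = 20, 10, 2                                   # items, tests, defectives
    A = (rng.random((M, N)) < 0.3).astype(float)          # pooling design: tests x items

    def sample(n):
        X = np.zeros((n, N))
        for row in X:
            row[rng.choice(N, size=k, replace=False)] = 1.0
        return X @ A.T, X                                 # measurements y = A x, targets x

    Y_tr, X_tr = sample(4000)
    mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=400,
                       random_state=0).fit(Y_tr, X_tr)

    Y_te, X_te = sample(200)
    print("bitwise accuracy:", ((mlp.predict(Y_te) > 0.5) == X_te).mean())

    # Finite-difference Jacobian d x_hat / d y at one point; compare its pattern to A.
    y0, eps = Y_te[:1], 1e-3
    J = np.stack([(mlp.predict(y0 + e) - mlp.predict(y0))[0] / eps
                  for e in np.eye(M) * eps])              # shape (M, N)
    print("corr(Jacobian, A):", np.corrcoef(J.ravel(), A.ravel())[0, 1])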
Why we think this paper is great for you:
This paper's exploration of verifiable deep quantitative group testing aligns with the user's interest in machine learning testing and fault tolerance. The concept of structural verifiability is a key element in ensuring system robustness.
INSA Lyon
AI Summary - TriHaRd has high resilience against attacks, including those that slow down or speed up the clock, and reduces the attack power compared to Triad. [3]
- TriHaRd is a protocol that provides trusted time to TEE-based systems by verifying the TEE clock's consistency with peers in a Byzantine-resilient manner. [2]
- TEE (Trusted Execution Environment): A secure environment within a system where sensitive data can be processed without compromising security. [1]
Abstract
Accurately measuring time passing is critical for many applications. However, in Trusted Execution Environments (TEEs) such as Intel SGX, the time source is outside the Trusted Computing Base: a malicious host can manipulate the TEE's notion of time, jumping in time or affecting perceived time speed. Previous work (Triad) proposes protocols for TEEs to maintain a trustworthy time source by building a cluster of TEEs that collaborate with each other and with a remote Time Authority to maintain a continuous notion of passing time. However, such approaches still allow an attacker to control the operating system and arbitrarily manipulate their own TEE's perceived clock speed. An attacker can even propagate faster passage of time to honest machines participating in Triad's trusted time protocol, causing them to skip to timestamps arbitrarily far in the future. We propose TriHaRd, a TEE trusted time protocol achieving high resilience against clock speed and offset manipulations, notably through Byzantine-resilient clock updates and consistency checks. We empirically show that TriHaRd mitigates known attacks against Triad.
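As a flavor of what a Byzantine-resilient clock consistency check can look like (a hypothetical sketch, not TriHaRd's actual protocol), a TEE can trim the f most extreme peer reports on each side before comparing its local clock against the remainder:

    from statistics import median

    def clock_is_consistent(local_ts: float, peer_ts: list, f: int, tol: float) -> bool:
        """Tolerate up to f Byzantine peers by trimming the f highest and f lowest reports."""
        if len(peer_ts) <= 2 * f:
            raise ValueError("need more than 2f peer reports")
        trimmed = sorted(peer_ts)[f:len(peer_ts) - f]
        return abs(local_ts - median(trimmed)) <= tol

    honest = [100.0, 100.2, 99.9, 100.1, 100.0]
    attacked = honest + [500.0, 500.0]        # two peers report a fast-forwarded clock
    print(clock_is_consistent(100.05, attacked, f=2, tol=1.0))  # True: attack trimmed away
    print(clock_is_consistent(480.0, attacked, f=2, tol=1.0))   # False: jumped clock rejected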
Why we think this paper is great for you:
Given the user's interest in machine learning resilience, this paper's focus on time measurement in Trusted Execution Environments (TEEs) is highly relevant. TEEs are increasingly important for secure and reliable ML deployments.
University of Lisbon
AI Summary - The system aims to help players analyze and understand their gameplay data. [3]
- They developed a prototype that incorporates various visualization techniques, including network analysis, time-series plots, and scatterplots. [3]
- Visual analytics: The use of interactive visualizations to support analytical reasoning and decision-making. [3]
- Co-creation: A process where users collaborate with designers to create a product or service that meets their needs. [3]
- The system's effectiveness is attributed to its ability to provide actionable insights through interactive visualizations. [3]
- The paper discusses the design of a visual analytics system for Magic: The Gathering, a popular trading card game. [2]
Abstract
This paper presents the initial stages of a design study aimed at developing a dashboard to visualize gameplay data of the Commander format from Magic: The Gathering. We conducted a user-task analysis to identify requirements for a data visualization dashboard tailored to the Commander format. Afterwards, we proposed a design for the dashboard leveraging visualizations to address players' needs and pain points for typical data analysis tasks in the context domain. Then, we followed up with a structured user test to evaluate players' comprehension and preferences of data visualizations. Results show that players prioritize contextually relevant, outcome-driven metrics over peripheral ones, and that canonical charts like heatmaps and line charts support higher comprehension than complex ones such as scatterplots or icicle plots. Our findings also highlight the importance of localized views, user customization, and progressive disclosure, emphasizing that adaptability and contextual relevance are as essential as accuracy in effective dashboard design. Our study contributes practical design guidelines for data visualization in gaming contexts and highlights broader implications for engagement-driven dashboards.
Why we think this paper is great for you:
The paper's focus on visualizing gameplay data from Magic: The Gathering aligns with the user's interest in data science development tools and potentially data-driven insights within a complex system.
Machine Learning Infrastructure
Universidad de Guanajuato
Abstract
This document reports the sequence of practices and methodologies implemented during the Big Data course. It details the workflow beginning with the processing of the Epsilon dataset through group and individual strategies, followed by text analysis and classification with RestMex and movie feature analysis with IMDb. Finally, it describes the technical implementation of a distributed computing cluster with Apache Spark on Linux using Scala.
AI Summary - In the big data era, data completeness can be as important as algorithm sophistication. [3]
- Keywords: big data analytics, distributed computing, scalability, algorithm sophistication, data completeness. The chronological progression demonstrates that mastering big data requires a systematic approach. [3]
- The choice between local and distributed architectures is not merely about computational resources, but about the quality and completeness of the data available to the model. [2]
Texas
Abstract
Distributed machine learning systems require strong privacy guarantees, verifiable compliance, and scalable deployment across heterogeneous and multi-cloud environments. This work introduces a cloud-native privacy-preserving architecture that integrates federated learning, differential privacy, zero-knowledge compliance proofs, and adaptive governance powered by reinforcement learning. The framework supports secure model training and inference without centralizing sensitive data, while enabling cryptographically verifiable policy enforcement across institutions and cloud platforms. A full prototype deployed across hybrid Kubernetes clusters demonstrates reduced membership-inference risk, consistent enforcement of formal privacy budgets, and stable model performance under differential privacy. Experimental evaluation across multi-institution workloads shows that the architecture maintains utility with minimal overhead while providing continuous, risk-aware governance. The proposed framework establishes a practical foundation for deploying trustworthy and compliant distributed machine learning systems at scale.
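A minimal sketch of two of the building blocks named in the abstract, federated averaging plus differential privacy via clipped, noised updates, is shown below; all names are invented, and the cryptographic and governance layers are well beyond this toy.

    import numpy as np

    rng = np.random.default_rng(0)

    def dp_federated_round(client_updates, clip_norm=1.0, noise_mult=0.5):
        """Clip each client's update, average, then add Gaussian noise calibrated
        to the clipping norm (a standard DP-SGD-style recipe)."""
        clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
                   for u in client_updates]
        avg = np.mean(clipped, axis=0)
        sigma = noise_mult * clip_norm / len(client_updates)
        return avg + rng.normal(0.0, sigma, size=avg.shape)

    # Ten institutions each contribute a local model delta; raw data never moves.
    updates = [rng.normal(0.0, 0.3, size=4) for _ in range(10)]
    print(dp_federated_round(updates))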
Machine Learning Operations
KU Leuven
Abstract
Decision making often occurs in the presence of incomplete information, leading to the under- or overestimation of risk. Leveraging the observable information to learn the complete information is called nowcasting. In practice, incomplete information is often a consequence of reporting or observation delays. In this paper, we propose an expectation-maximisation (EM) framework for nowcasting that uses machine learning techniques to model both the occurrence as well as the reporting process of events. We allow for the inclusion of covariate information specific to the occurrence and reporting periods as well as characteristics related to the entity for which events occurred. We demonstrate how the maximisation step and the information flow between EM iterations can be tailored to leverage the predictive power of neural networks and (extreme) gradient boosting machines (XGBoost). With simulation experiments, we show that we can effectively model both the occurrence and reporting of events when dealing with high-dimensional covariate information. In the presence of non-linear effects, we show that our methodology outperforms existing EM-based nowcasting frameworks that use generalised linear models in the maximisation step. Finally, we apply the framework to the reporting of Argentinian Covid-19 cases, where the XGBoost-based approach again is most performant.
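Stripped of covariates and ML learners, the EM idea can be sketched as follows, as hypothetical code for the homogeneous Poisson-Multinomial case: the E-step imputes not-yet-reported counts, and the M-step re-estimates the occurrence rate and delay distribution in closed form.

    import numpy as np

    rng = np.random.default_rng(0)
    T, max_delay = 30, 4
    true_lam, true_p = 50.0, np.array([0.4, 0.3, 0.2, 0.1])

    occurred = rng.poisson(true_lam, size=T)
    reported = np.array([rng.multinomial(n, true_p) for n in occurred])  # counts by delay
    now = T - 1                                                          # reporting cutoff
    observed = np.array([[reported[t, d] if t + d <= now else 0.0
                          for d in range(max_delay)] for t in range(T)])

    lam, p = observed.sum(axis=1).mean(), np.full(max_delay, 1.0 / max_delay)
    for _ in range(50):
        # E-step: expected not-yet-reported counts under the current (lam, p)
        filled = observed.copy()
        for t in range(T):
            for d in range(max_delay):
                if t + d > now:
                    filled[t, d] = lam * p[d]
        # M-step: closed-form MLE updates for the Poisson rate and delay probabilities
        lam = filled.sum(axis=1).mean()
        p = filled.sum(axis=0) / filled.sum()

    print(f"nowcast lambda = {lam:.1f} (true {true_lam}), delay probs = {np.round(p, 2)}")

The paper's contribution is to replace the closed-form M-step with neural networks or XGBoost so that occurrence and reporting can depend on high-dimensional covariates.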
AI Summary - The authors generate simulated data with different specifications for the coefficient vectors, including both linear and non-linear effects. [3]
- The authors' findings have implications for various applications, including epidemiology and insurance. [3]
- The authors do not provide evidence to support the Poisson-Multinomial assumption in real-world data. [3]
- The paper presents a simulation study on event occurrence and reporting patterns using a Poisson-Multinomial model. [2]
- The simulation study relies heavily on the assumption that the data is generated from a Poisson-Multinomial distribution. [1]
University of California
Abstract
We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This includes various forms of stochastic gradient descent and importance sampling, but also extends to weighting distributions with arbitrary continuous values, thereby providing a unified framework to analyze the impact of various kinds of noise on the training trajectory. We characterize the implicit regularization induced through the random weighting, connect it with weighted linear regression, and derive non-asymptotic bounds for convergence in first and second moments. Leveraging geometric moment contraction, we also investigate the stationary distribution induced by the added noise. Based on these results, we discuss how specific choices of weighting distribution influence both the underlying optimization problem and statistical properties of the resulting estimator, as well as some examples for which weightings that lead to fast convergence cause bad statistical performance.
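The object under study is easy to simulate; this hypothetical sketch runs gradient descent on a linear regression loss with fresh i.i.d. random weights on the data points at each step, where constant weights recover plain gradient descent and scaled Bernoulli weights mimic subsampled SGD:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    beta_true = rng.normal(size=d)
    y = X @ beta_true + 0.1 * rng.normal(size=n)

    def weighted_gd(weight_sampler, lr=0.05, steps=500):
        beta = np.zeros(d)
        for _ in range(steps):
            w = weight_sampler(n)                      # fresh random weights each step
            beta -= lr * X.T @ (w * (X @ beta - y)) / n
        return beta

    for name, sampler in [("constant", lambda m: np.ones(m)),
                          ("exponential", lambda m: rng.exponential(1.0, m)),
                          ("Bernoulli (SGD-like)", lambda m: rng.binomial(1, 0.1, m) / 0.1)]:
        err = np.linalg.norm(weighted_gd(sampler) - beta_true)
        print(f"{name:20s} ||beta_hat - beta*|| = {err:.3f}")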
Machine Learning Deployment
Stanford
Abstract
Post-deployment monitoring of artificial intelligence (AI) systems in health care is essential to ensure their safety, quality, and sustained benefit, and to support governance decisions about which systems to update, modify, or decommission. Motivated by these needs, we developed a framework for monitoring deployed AI systems grounded in the mandate to take specific actions when they fail to behave as intended. This framework, which is now actively used at Stanford Health Care, is organized around three complementary principles: system integrity, performance, and impact. System integrity monitoring focuses on maximizing system uptime, detecting runtime errors, and identifying when changes to the surrounding IT ecosystem have unintended effects. Performance monitoring focuses on maintaining accurate system behavior in the face of changing health care practices (and thus input data) over time. Impact monitoring assesses whether a deployed system continues to have value in the form of benefit to clinicians and patients. Drawing on examples of deployed AI systems at our academic medical center, we provide practical guidance for creating monitoring plans based on these principles that specify which metrics to measure, when those metrics should be reviewed, who is responsible for acting when metrics change, and what concrete follow-up actions should be taken, for both traditional and generative AI. We also discuss challenges to implementing this framework, including the effort and cost of monitoring for health systems with limited resources and the difficulty of incorporating data-driven monitoring practices into complex organizations where conflicting priorities and definitions of success often coexist. This framework offers a practical template and starting point for health systems seeking to ensure that AI deployments remain safe and effective over time.
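One way to operationalize the three principles, shown here as a purely illustrative sketch with invented metric names and thresholds, is to encode each monitoring check with its owner and the follow-up action that fires when it trips:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class MonitoringCheck:
        principle: str            # "integrity" | "performance" | "impact"
        metric: str
        owner: str
        trigger: Callable[[float], bool]
        action: str

    plan = [
        MonitoringCheck("integrity", "daily_runtime_error_rate", "ml-platform",
                        lambda v: v > 0.01, "page on-call, roll back model"),
        MonitoringCheck("performance", "auroc_rolling_30d", "clinical-ml-team",
                        lambda v: v < 0.80, "retrain and shadow-deploy"),
        MonitoringCheck("impact", "alerts_acted_on_fraction", "clinical-governance",
                        lambda v: v < 0.25, "review with clinicians; consider decommission"),
    ]

    latest = {"daily_runtime_error_rate": 0.002, "auroc_rolling_30d": 0.77,
              "alerts_acted_on_fraction": 0.4}
    for check in plan:
        if check.trigger(latest[check.metric]):
            print(f"[{check.principle}] {check.metric} tripped -> {check.action} ({check.owner})")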
AI Summary - Traditional and generative AI systems require unique monitoring considerations for deployment in clinical systems. [3]
- Performance monitoring: Evaluates the longitudinal accuracy and quality of AI system outputs to detect drift. [3]
- Impact monitoring: Verifies if the AI system produces sustained benefits to patients, health system staff, or health system finances over time. [3]
- The framework is applicable to both traditional and generative AI systems and can be tailored to specific use cases and deployments. [3]
- Post-deployment AI monitoring is crucial for ensuring the safety and effectiveness of AI systems in healthcare. [2]
Model Monitoring
Université Paris Cité
Abstract
In this article, we propose a generic screening method for selecting explanatory variables correlated with the response variable Y. We make no assumptions about the existence of a model that could link Y with a subset of explanatory variables, nor about the distribution of the variables. Our procedure can therefore be described as "model-free" and can be applied in a wide range of situations. In order to obtain precise theoretical guarantees (Sure Screening Property and control of the False Positive Rate), we establish a Berry-Esseen type inequality for the studentized statistic of the slope estimator. We illustrate our selection procedure using two simulated examples and a real data set.
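The procedure is simple to sketch: compute the studentized marginal slope statistic for every candidate variable and keep those that clear a threshold. The code below is hypothetical, with an illustrative Bonferroni-style cut; the paper's Berry-Esseen inequality is what makes such a threshold rigorous.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, p = 300, 50
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)   # only X_0 and X_3 matter

    def studentized_slope(x, y):
        """t-statistic of the slope in the simple regression of y on x."""
        x_c, y_c = x - x.mean(), y - y.mean()
        slope = (x_c @ y_c) / (x_c @ x_c)
        resid = y_c - slope * x_c
        se = np.sqrt((resid @ resid) / (len(x) - 2) / (x_c @ x_c))
        return slope / se

    t_stats = np.array([studentized_slope(X[:, j], y) for j in range(p)])
    cut = stats.norm.ppf(1 - 0.05 / (2 * p))                 # Bonferroni-style threshold
    print("selected variables:", np.where(np.abs(t_stats) > cut)[0])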
AI Summary - The final bound obtained is a function of several parameters related to the distributions of X and Y, including their means, variances, and higher moments. [3]
- Berry-Esseen inequality: A mathematical inequality that provides an upper bound for the distance between the distribution of a sum of independent random variables and the normal distribution. [3]
- The derivation is complex and requires careful manipulation of inequalities and algebraic expressions. [3]
- The problem statement involves deriving an upper bound for the probability of a certain event related to random variables X and Y, with specific conditions on their distributions and parameters. [2]
- The solution involves applying various mathematical techniques to derive bounds for different terms in the expression, ultimately leading to the final bound obtained. [1]
National University of Singapore
Abstract
Relational Database Management Systems (RDBMS) manage complex, interrelated data and support a broad spectrum of analytical tasks. With the growing demand for predictive analytics, the deep integration of machine learning (ML) into RDBMS has become critical. However, a fundamental challenge hinders this evolution: conventional ML models are static and task-specific, whereas RDBMS environments are dynamic and must support diverse analytical queries. Each analytical task entails constructing a bespoke pipeline from scratch, which incurs significant development overhead and hence limits wide adoption of ML in analytics.
We present NeurIDA, an autonomous end-to-end system for in-database analytics that dynamically "tweaks" the best available base model to better serve a given analytical task. In particular, we propose a novel paradigm of dynamic in-database modeling to pre-train a composable base model architecture over the relational data. Upon receiving a task, NeurIDA formulates the task and data profile to dynamically select and configure relevant components from the pool of base models and shared model components for prediction. For a friendly user experience, NeurIDA supports natural language queries; it interprets user intent to construct structured task profiles, and generates analytical reports with dedicated LLM agents. By design, NeurIDA combines ease of use with effective and efficient in-database AI analytics. An extensive experimental study shows that NeurIDA consistently delivers up to a 12% improvement in AUC-ROC and a 25% relative reduction in MAE across ten tasks on five real-world datasets. The source code is available at https://github.com/Zrealshadow/NeurIDA
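As a rough, entirely invented illustration of the dynamic in-database modeling paradigm (not NeurIDA's actual selection logic), one can picture component selection as filtering a pool of pre-trained parts against a task profile:

    from dataclasses import dataclass

    @dataclass
    class Component:
        name: str
        task_types: set          # tasks this component supports
        tables: set              # relations it was pre-trained on (empty = any)

    pool = [
        Component("tuple_encoder_orders", {"classification", "regression"}, {"orders"}),
        Component("graph_encoder_users", {"classification"}, {"users", "orders"}),
        Component("regression_head", {"regression"}, set()),
        Component("classification_head", {"classification"}, set()),
    ]

    def configure(task_type: str, tables: set) -> list:
        """Pick every pooled component compatible with the task and its relations."""
        return [c for c in pool
                if task_type in c.task_types and (not c.tables or c.tables & tables)]

    profile = {"task_type": "classification", "tables": {"users", "orders"}}
    print([c.name for c in configure(profile["task_type"], profile["tables"])])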
AI Summary - NeurIDA outperforms all baselines across various tasks and datasets. [3]
- NeurIDA achieves state-of-the-art results in classification (AUC-ROC) and regression (MAE) metrics. [3]
- The effectiveness of NeurIDA is demonstrated through its ability to improve upon existing models, including those with automated learning approaches. [3]
- AUC-ROC: Area Under the Receiver Operating Characteristic Curve, a metric used for classification tasks to evaluate model performance. [3]
- MAE: Mean Absolute Error, a metric used for regression tasks to evaluate model performance. [3]
- Early Stopping: A technique used in training neural networks where the training process is stopped when the model's performance on the validation set starts to degrade. [3]
- The effectiveness of NeurIDA can be attributed to its ability to incorporate relational structure into tuple representations, leading to improved model performance. [3]
- The paper does not provide an extensive analysis of the computational complexity of NeurIDA. [3]
- The evaluation metrics used are limited to AUC-ROC and MAE, which may not capture other aspects of model performance. [3]