Hi!

Your personalized paper recommendations for 15 to 19 December, 2025.

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Peking University

Rate paper: 👍 👎 ♥ Save

AI Insights

It organizes operators along multiple orthogonal categorization dimensions, including modality, core vs. [3]
PyTorch: An open-source machine learning library for Python. [3]
domain-specific, and functional categories. [2]
DataFlow is a unified data preparation framework that supports end-to-end LLM data preparation workflows. [1]

Abstract
The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc scripts and loosely specified workflows, which lack principled abstractions, hinder reproducibility, and offer limited support for model-in-the-loop data generation. To address these challenges, we present DataFlow, a unified and extensible LLM-driven data preparation framework. DataFlow is designed with system-level abstractions that enable modular, reusable, and composable data transformations, and provides a PyTorch-style pipeline construction API for building debuggable and optimizable dataflows. The framework consists of nearly 200 reusable operators and six domain-general pipelines spanning text, mathematical reasoning, code, Text-to-SQL, agentic RAG, and large-scale knowledge extraction. To further improve usability, we introduce DataFlow-Agent, which automatically translates natural-language specifications into executable pipelines via operator synthesis, pipeline planning, and iterative verification. Across six representative use cases, DataFlow consistently improves downstream LLM performance. Our math, code, and text pipelines outperform curated human datasets and specialized synthetic baselines, achieving up to +3\% execution accuracy in Text-to-SQL over SynSQL, +7\% average improvements on code benchmarks, and 1--3 point gains on MATH, GSM8K, and AIME. Moreover, a unified 10K-sample dataset produced by DataFlow enables base models to surpass counterparts trained on 1M Infinity-Instruct data. These results demonstrate that DataFlow provides a practical and high-performance substrate for reliable, reproducible, and scalable LLM data preparation, and establishes a system-level foundation for future data-centric AI development.

Why we are recommending this paper?
Due to your Interest in: Data Science Development Tools

This paper directly addresses the need for robust data preparation pipelines, a critical component of MLOps and data science development environments. Given the increasing reliance on LLMs, this framework offers a valuable solution for automating and streamlining data workflows, aligning with the user's interests.

A Data Annotation Requirements Representation and Specification (DARS)

University of Gothenburg

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
With the rise of AI-enabled cyber-physical systems, data annotation has become a critical yet often overlooked process in the development of these intelligent information systems. Existing work in requirements engineering (RE) has explored how requirements for AI systems and their data can be represented. However, related interviews with industry professionals show that data annotations and their related requirements introduce distinct challenges, indicating a need for annotation-specific requirement representations. We propose the Data Annotation Requirements Representation and Specification (DARS), including an Annotation Negotiation Card to align stakeholders on objectives and constraints, and a Scenario-Based Annotation Specification to express atomic and verifiable data annotation requirements. We evaluate DARS with an automotive perception case related to an ongoing project, and a mapping against 18 real-world data annotation error types. The results suggest that DARS mitigates root causes of completeness, accuracy, and consistency annotation errors. By integrating DARS into RE, this work improves the reliability of safety-critical systems using data annotations and demonstrates how engineering frameworks must evolve for data-dependent components of today's intelligent information systems.

Why we are recommending this paper?
Due to your Interest in: Data Science Development Tools

This work tackles the crucial aspect of data annotation, a key element in Machine Learning Validation and Model Monitoring. The paper’s focus on requirements engineering provides a structured approach to ensure data quality, directly supporting the user’s interests in robust ML systems.

From Risk to Resilience: Towards Assessing and Mitigating the Risk of Data Reconstruction Attacks in Federated Learning

Beijing Jiaotong Universt

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
Data Reconstruction Attacks (DRA) pose a significant threat to Federated Learning (FL) systems by enabling adversaries to infer sensitive training data from local clients. Despite extensive research, the question of how to characterize and assess the risk of DRAs in FL systems remains unresolved due to the lack of a theoretically-grounded risk quantification framework. In this work, we address this gap by introducing Invertibility Loss (InvLoss) to quantify the maximum achievable effectiveness of DRAs for a given data instance and FL model. We derive a tight and computable upper bound for InvLoss and explore its implications from three perspectives. First, we show that DRA risk is governed by the spectral properties of the Jacobian matrix of exchanged model updates or feature embeddings, providing a unified explanation for the effectiveness of defense methods. Second, we develop InvRE, an InvLoss-based DRA risk estimator that offers attack method-agnostic, comprehensive risk evaluation across data instances and model architectures. Third, we propose two adaptive noise perturbation defenses that enhance FL privacy without harming classification accuracy. Extensive experiments on real-world datasets validate our framework, demonstrating its potential for systematic DRA risk evaluation and mitigation in FL systems.

Why we are recommending this paper?
Due to your Interest in: Machine Learning Resilience

This paper addresses a significant security concern – data reconstruction attacks – within Federated Learning, a growing area of interest. Understanding and mitigating these risks is essential for building resilient MLOps and robust data infrastructure.

Empirical Likelihood Meets Prediction-Powered Inference

Nankai University

Rate paper: 👍 👎 ♥ Save

AI Insights

The method, called EPI, uses empirical likelihood to choose weights on the labeled sample, reconciling supervised and prediction-based information while keeping the target defined by the original estimating equation. [3]
It also provides a flexible framework for incorporating auxiliary moment conditions built from predictions, which can be used to improve estimation performance. [3]
The paper assumes that the prediction model is externally provided, which may not be realistic in practice. [3]
The paper builds on previous work in prediction-powered inference, including Angelopoulos et al. [3]
It also discusses connections to other areas of research, such as semi-supervised learning, high-dimensional regression, and distributional functionals. [3]
The method, called EPI, uses empirical likelihood to choose weights on the labeled sample, reconciling supervised and prediction-based information while keeping the target defined by the original estimating equation. [3]
The method uses empirical likelihood to choose weights on the labeled sample, reconciling supervised and prediction-based information while keeping the target defined by the original estimating equation. [3]
The paper proposes an empirical likelihood framework for prediction-powered inference, which combines supervised estimating equations with auxiliary moment conditions built from predictions. [2]

Abstract
We study inference with a small labeled sample, a large unlabeled sample, and high-quality predictions from an external model. We link prediction-powered inference with empirical likelihood by stacking supervised estimating equations based on labeled outcomes with auxiliary moment conditions built from predictions, and then optimizing empirical likelihood under these joint constraints. The resulting empirical likelihood-based prediction-powered inference (EPI) estimator is asymptotically normal, has asymptotic variance no larger than the fully supervised estimator, and attains the semiparametric efficiency bound when the auxiliary functions span the predictable component of the supervised score. For hypothesis testing and confidence sets, empirical likelihood ratio statistics admit chi-squared-type limiting distributions. As a by-product, the empirical likelihood weights induce a calibrated empirical distribution that integrates supervised and prediction-based information, enabling estimation and uncertainty quantification for general functionals beyond parameters defined by estimating equations. We present two practical implementations: one based on basis expansions in the predictions and covariates, and one that learns an approximately optimal auxiliary function by cross-fitting. In simulations and applications, EPI reduces mean squared error and shortens confidence intervals while maintaining nominal coverage.

Why we are recommending this paper?
Due to your Interest in: Machine Learning Infrastructure

This research explores inference techniques using predictions, a core concept in online inference and model monitoring. The approach aligns with the user's interest in leveraging external models for improved data analysis and validation.

Machine learning discovers new champion codes

OpenAI

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

They then trained a two-stage neural network with a pre-training stage on a larger dataset and a training stage on the full dataset, achieving good performance on predicting minimum Hamming distance. [3]
The authors used machine learning to predict the minimum Hamming distance of generalised toric codes over finite fields F7 and F8. [2]
They generated a dataset of 100,000 codes for each field using Magma, with the number of lattice points in [0,m]2 ranging from 5 to 31 for F7 and 4 to 17 for F8. [1]

Abstract
Linear error-correcting codes form the mathematical backbone of modern digital communication and storage systems, but identifying champion linear codes (linear codes achieving or exceeding the best known minimum Hamming distance) remains challenging. By training a transformer to predict the minimum Hamming distance of a class of linear codes and pairing it with a genetic algorithm over the search space, we develop a novel method for discovering champion codes. This model effectively reduces the search space of linear codes needed to achieve champion codes. Our results present the use of this method in the study and construction of error-correcting codes, applicable to codes such as generalised toric, Reed-Muller, Bose-Chaudhuri-Hocquenghem, algebrogeometric, and potentially quantum codes.

Why we are recommending this paper?
Due to your Interest in: Machine Learning Testing

This paper investigates the use of machine learning to optimize code, a fundamental aspect of data science development environments. The focus on identifying 'champion codes' is directly relevant to improving the efficiency and reliability of data science tools and workflows.

Dual Language Models: Balancing Training Efficiency and Overfitting Resilience

University of Oslo

Rate paper: 👍 👎 ♥ Save

Abstract
This paper combines autoregressive and masked-diffusion training objectives without any architectural modifications, resulting in flexible language models that outperform single-objective models. Autoregressive modeling has been a popular approach, partly because of its training efficiency; however, that comes at the cost of sensitivity to overfitting. On the other hand, masked-diffusion models are less efficient to train while being more resilient to overfitting. In this work, we demonstrate that dual-objective training achieves the best of both worlds. To derive the optimal ratio between both objectives, we train and evaluate 50 language models under varying levels of data repetition. We show that it is optimal to combine both objectives under all evaluated settings and that the optimal ratio is similar whether targeting autoregressive or masked-diffusion downstream performance.

AI Insights

Scaling laws for neural language models Muon: An optimizer for hidden layers in neural networks OLMES: A standard for language model evaluations Discrete diffusion modeling by estimating the ratios of the data distribution Computational linguistics Language models Neural networks Optimization methods Scaling laws The field of computational linguistics is rapidly advancing with new techniques and tools being developed to improve language model performance. [3]

Why we are recommending this paper?
Due to your Interest in: Machine Learning Resilience

Can AI Generate more Comprehensive Test Scenarios? Review on Automated Driving Systems Test Scenario Generation Methods

Graz University of Technn

Rate paper: 👍 👎 ♥ Save

Abstract
Ensuring the safety and reliability of Automated Driving Systems (ADS) remains a critical challenge, as traditional verification methods such as large-scale on-road testing are prohibitively costly and time-consuming.To address this,scenario-based testing has emerged as a scalable and efficient alternative,yet existing surveys provide only partial coverage of recent methodological and technological advances.This review systematically analyzes 31 primary studies,and 10 surveys identified through a comprehensive search spanning 2015~2025;however,the in-depth methodological synthesis and comparative evaluation focus primarily on recent frameworks(2023~2025),reflecting the surge of Artificial Intelligent(AI)-assisted and multimodal approaches in this period.Traditional approaches rely on expert knowledge,ontologies,and naturalistic driving or accident data,while recent developments leverage generative models,including large language models,generative adversarial networks,diffusion models,and reinforcement learning frameworks,to synthesize diverse and safety-critical scenarios.Our synthesis identifies three persistent gaps:the absence of standardized evaluation metrics,limited integration of ethical and human factors,and insufficient coverage of multimodal and Operational Design Domain (ODD)-specific scenarios.To address these challenges,this review contributes a refined taxonomy that incorporates multimodal extensions,an ethical and safety checklist for responsible scenario design,and an ODD coverage map with a scenario-difficulty schema to enable transparent benchmarking.Collectively,these contributions provide methodological clarity for researchers and practical guidance for industry,supporting reproducible evaluation and accelerating the safe deployment of higher-level ADS.

AI Insights

GANs: Generative Adversarial Networks, a class of deep learning algorithms used for unsupervised learning and generating new data samples. [3]
Diffusion models: A type of generative model that gradually adds noise to data and then denoises it to generate samples. [3]
Vision Language Models (VLMs) have emerged as a crucial technology in advancing Autonomous Driving System (ADS) testing, particularly in scenario generation. [2]
Diffusion models have demonstrated exceptional capabilities in producing high-quality and diverse outputs, overcoming limitations in traditional scenario generation. [1]

Why we are recommending this paper?
Due to your Interest in: Machine Learning Testing

Aligning Academia with Industry: An Empirical Study of Industrial Needs and Academic Capabilities in AI-Driven Software Engineering

Beihang University

Rate paper: 👍 👎 ♥ Save

Abstract
The rapid advancement of large language models (LLMs) is fundamentally reshaping software engineering (SE), driving a paradigm shift in both academic research and industrial practice. While top-tier SE venues continue to show sustained or emerging focus on areas like automated testing and program repair, with researchers worldwide reporting continuous performance gains, the alignment of these academic advances with real industrial needs remains unclear. To bridge this gap, we first conduct a systematic analysis of 1,367 papers published in FSE, ASE, and ICSE between 2022 and 2025, identifying key research topics, commonly used benchmarks, industrial relevance, and open-source availability. We then carry out an empirical survey across 17 organizations, collecting 282 responses on six prominent topics, i.e., program analysis, automated testing, code generation/completion, issue resolution, pre-trained code models, and dependency management, through structured questionnaires. By contrasting academic capabilities with industrial feedback, we derive seven critical implications, highlighting under-addressed challenges in software requirements and architecture, the reliability and explainability of intelligent SE approaches, input assumptions in academic research, practical evaluation tensions, and ethical considerations. This study aims to refocus academic attention on these important yet under-explored problems and to guide future SE research toward greater industrial impact.

AI Insights

Stand-alone function-level benchmarks assess the generation of self-contained code snippets, while repository-level benchmarks examine code completion in the context of larger software projects. [3]
AI-driven software engineering: The use of artificial intelligence and machine learning techniques to improve the development, testing, and maintenance of software systems. [3]
Program analysis: The process of analyzing source code to identify errors, optimize performance, or understand the behavior of a program. [3]
Code generation and completion: The process of generating new code snippets or completing partially written code based on input from users or other sources. [3]
Despite integration with intelligent technologies, the research paradigm remains rooted in traditional techniques like static and dynamic analysis. [2]
The research field of AI-driven software engineering is highly diverse, with studies often pursuing different objectives even when using the same benchmarks. [1]

Why we are recommending this paper?
Due to your Interest in: Data Science Development Environment and Productivity

Open Source Software and Data for Human Service Development: A Case Study on Predicting Housing Instability

University at Buffalo S

Rate paper: 👍 👎 ♥ Save

Abstract
Open-source data and tools are lauded as essential for replicable and usable social science, though little is known about their use in resource constrained human service provision. This paper examines the challenges and opportunities of open-source tools and data in human service development by using both to forecast failure to pay eviction filings in Bronx County, NY. We use zip code level data from the Housing Data Coalition, the American Community Survey 5-year estimates, and DeepMaps Model of the Labor Force to forecast rates through July 2021. We employ multilevel (MLM) and exponential smoothing (ETS) models using the R project for Statistical Computing, an oft used open-source statistical software. We compare our results to what happened during the same period, to illustrate the efficacy of the open-source tools and techniques employed. We argue open-source data and software may facilitate rapid analysis of public data - a much-needed ability in human service intervention development under increasingly constrained resources - but find public data are limited by the information they reliably capture, limiting their utility by a non-trivial margin of error. The manuscript concludes by considering lessons for human service organizations with limited analytical resources and a vested interest in low-resourced communities.

AI Insights

Open-source data: Data that are publicly available and can be accessed by anyone without restrictions. [3]
The use of open source data and tools can facilitate targeted intervention development by accounting for a greater variety of demographic characteristics. [3]
Relying on multiple sources of open data can impact its veracity. [3]
The study contributes to the literature on open-source data and tools in human service research by demonstrating their use in a real-world application. [3]
The authors argue that using open source data and tools can facilitate targeted intervention development by accounting for demographic characteristics. [3]
The study used open-source data and tools to forecast eviction filings in Bronx County, NY, during the COVID-19 pandemic. [2]

Why we are recommending this paper?
Due to your Interest in: Data Science Development Environment and Productivity

Distributed inference for heterogeneous mixture models using multi-site data

University of Missouri

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
Mixture models postulate the overall population as a mixture of finite subpopulations with unobserved membership. Fitting mixture models usually requires large sample sizes and combining data from multiple sites can be beneficial. However, sharing individual participant data across sites is often less feasible due to various types of practical constraints, such as data privacy concerns. Moreover, substantial heterogeneity may exist across sites, and locally identified latent classes may not be comparable across sites. We propose a unified modeling framework where a common definition of the latent classes is shared across sites and heterogeneous mixing proportions of latent classes are allowed to account for between-site heterogeneity. To fit the heterogeneous mixture model on multi-site data, we propose a novel distributed Expectation-Maximization (EM) algorithm where at each iteration a density ratio tilted surrogate Q function is constructed to approximate the standard Q function of the EM algorithm as if the data from multiple sites could be pooled together. Theoretical analysis shows that our estimator achieves the same contraction property as the estimators derived from the EM algorithm based on the pooled data.

Why we are recommending this paper?
Due to your Interest in: Online inference

XTC, A Research Platform for Optimizing AI Workload Operators

Inria

Rate paper: 👍 👎 ♥ Save

Abstract
Achieving high efficiency on AI operators demands precise control over computation and data movement. However, existing scheduling languages are locked into specific compiler ecosystems, preventing fair comparison, reuse, and evaluation across frameworks. No unified interface currently decouples scheduling specification from code generation and measurement. We introduce XTC, a platform that unifies scheduling and performance evaluation across compilers. With its common API and reproducible measurement framework, XTC enables portable experimentation and accelerates research on optimization strategies.

AI Insights

It decouples scheduling from code generation, enabling fair comparison, reproducible measurement, and rapid prototyping of optimization strategies. [3]
TVM: an automated end-to-end optimizing compiler for deep learning Ansor: generating high-performance tensor programs for deep learning Aidge: a framework for building and optimizing compilers MLIR: A compiler infrastructure for the end of Moore's law XTC is a valuable tool for researchers and developers working on compiler frameworks. [3]
XTC is a research platform for experimenting with scheduling and performance optimization across compiler frameworks. [2]

Why we are recommending this paper?
Due to your Interest in: Machine Learning Operations

Towards Fine-Tuning-Based Site Calibration for Knowledge-Guided Machine Learning: A Summary of Results

University of Minnesota

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
Accurate and cost-effective quantification of the agroecosystem carbon cycle at decision-relevant scales is essential for climate mitigation and sustainable agriculture. However, both transfer learning and the exploitation of spatial variability in this field are challenging, as they involve heterogeneous data and complex cross-scale dependencies. Conventional approaches often rely on location-independent parameterizations and independent training, underutilizing transfer learning and spatial heterogeneity in the inputs, and limiting their applicability in regions with substantial variability. We propose FTBSC-KGML (Fine-Tuning-Based Site Calibration-Knowledge-Guided Machine Learning), a pretraining- and fine-tuning-based, spatial-variability-aware, and knowledge-guided machine learning framework that augments KGML-ag with a pretraining-fine-tuning process and site-specific parameters. Using a pretraining-fine-tuning process with remote-sensing GPP, climate, and soil covariates collected across multiple midwestern sites, FTBSC-KGML estimates land emissions while leveraging transfer learning and spatial heterogeneity. A key component is a spatial-heterogeneity-aware transfer-learning scheme, which is a globally pretrained model that is fine-tuned at each state or site to learn place-aware representations, thereby improving local accuracy under limited data without sacrificing interpretability. Empirically, FTBSC-KGML achieves lower validation error and greater consistency in explanatory power than a purely global model, thereby better capturing spatial variability across states. This work extends the prior SDSA-KGML framework.

Why we are recommending this paper?
Due to your Interest in: Machine Learning Deployment

Estimating Program Participation with Partial Validation

Davidson College

Rate paper: 👍 👎 ♥ Save

Abstract
This paper considers the estimation of binary choice models when survey responses are possibly misclassified but one of the response category can be validated. Partial validation may occur when survey questions about participation include follow-up questions on that particular response category. In this case, we show that the initial two-sided misclassification problem can be transformed into a one-sided one, based on the partially validated responses. Using the updated responses naively for estimation does not solve or mitigate the misclassification bias, and we derive the ensuing asymptotic bias under general conditions. We then show how the partially validated responses can be used to construct a model for participation and propose consistent and asymptotically normal estimators that overcome misclassification error. Monte Carlo simulations are provided to demonstrate the finite sample performance of the proposed and selected existing methods. We provide an empirical illustration on the determinants of health insurance coverage in Ghana. We discuss implications for the design of survey questionnaires that allow researchers to overcome misclassification biases without recourse to relatively costly and often imperfect validation data.

AI Insights

The PO MLE and PPO MLE estimators are consistent for the true parameters in the presence of covariate-dependent misclassification. [3]
Partial validation and appropriate misclassification models can address the estimation challenges posed by misclassified binary variables. [2]

Why we are recommending this paper?
Due to your Interest in: Machine Learning Validation

ModelTables: A Corpus of Tables about Models

University of Waterloo

Rate paper: 👍 👎 ♥ Save

Abstract
We present ModelTables, a benchmark of tables in Model Lakes that captures the structured semantics of performance and configuration tables often overlooked by text only retrieval. The corpus is built from Hugging Face model cards, GitHub READMEs, and referenced papers, linking each table to its surrounding model and publication context. Compared with open data lake tables, model tables are smaller yet exhibit denser inter table relationships, reflecting tightly coupled model and benchmark evolution. The current release covers over 60K models and 90K tables. To evaluate model and table relatedness, we construct a multi source ground truth using three complementary signals: (1) paper citation links, (2) explicit model card links and inheritance, and (3) shared training datasets. We present one extensive empirical use case for the benchmark which is table search. We compare canonical Data Lake search operators (unionable, joinable, keyword) and Information Retrieval baselines (dense, sparse, hybrid retrieval) on this benchmark. Union based semantic table retrieval attains 54.8 % P@1 overall (54.6 % on citation, 31.3 % on inheritance, 30.6 % on shared dataset signals); table based dense retrieval reaches 66.5 % P@1, and metadata hybrid retrieval achieves 54.1 %. This evaluation indicates clear room for developing better table search methods. By releasing ModelTables and its creation protocol, we provide the first large scale benchmark of structured data describing AI model. Our use case of table discovery in Model Lakes, provides intuition and evidence for developing more accurate semantic retrieval, structured comparison, and principled organization of structured model knowledge. Source code, data, and other artifacts have been made available at https://github.com/RJMillerLab/ModelTables.

AI Insights

The benchmark includes three types of relatedness: paper relatedness (𝑅paper), model relatedness (𝑀model), and dataset relatedness (𝑊dataset). [3]
The paper relatedness graph has a density of 8.04% after applying influence and intent filters, while the direct citation graphs have a density of around 3-8%. [3]
The Dataset relatedness is also sparse but represents a different semantics of relatedness that should be included depending on the task. [3]
𝑅paper (Paper Relatedness): The ground-truth graph connecting tables based on their association with papers. [3]
The ModelTables benchmark provides a comprehensive evaluation of table search methods on a large-scale dataset of tables related to scientific papers, models, and datasets. [2]
The ModelTables benchmark provides a comprehensive evaluation of table search methods on a large-scale dataset of tables related to scientific papers, models, and datasets. [1]

Why we are recommending this paper?
Due to your Interest in: Model Monitoring

Towards Effective Model Editing for LLM Personalization

Emory University, Amazon

Rate paper: 👍 👎 ♥ Save

Abstract
Personalization is becoming indispensable for LLMs to align with individual user preferences and needs. Yet current approaches are often computationally expensive, data-intensive, susceptible to catastrophic forgetting, and prone to performance degradation in multi-turn interactions or when handling implicit queries. To address these challenges, we conceptualize personalization as a model editing task and introduce Personalization Editing, a framework that applies localized edits guided by clustered preference representations. This design enables precise preference-aligned updates while preserving overall model capabilities. In addition, existing personalization benchmarks frequently rely on persona-based dialogs between LLMs rather than user-LLM interactions, or focus primarily on stylistic imitation while neglecting information-seeking tasks that require accurate recall of user-specific preferences. We introduce User Preference Question Answering (UPQA), a short-answer QA dataset constructed from in-situ user queries with varying levels of difficulty. Unlike prior benchmarks, UPQA directly evaluates a model's ability to recall and apply specific user preferences. Across experimental settings, Personalization Editing achieves higher editing accuracy and greater computational efficiency than fine-tuning, while outperforming prompting-based baselines in multi-turn conversations and implicit preference questions settings.

AI Insights

The paper presents a novel approach to personalization, focusing on the creation of fine-grained persona attributes and their corresponding preference representations. [3]
The evaluation results are based on a limited dataset (200-sample subset of UPQA) and may not generalize well to larger datasets. [3]
The proposed method, UPQA, involves generating structured person-alization data from raw text inputs, which can be used for various downstream applications such as conversational AI systems. [2]
UPQA: Unified Persona-based Question Answering LoRA: Low-Rank Adaptation GRACE: Generating Relevant Attributes for Conversational Editing The proposed UPQA method demonstrates the potential to improve conversational AI systems by providing fine-grained persona attributes and their corresponding preference representations. [1]

Why we are recommending this paper?
Due to your Interest in: Model Monitoring

Machines, AI and the past//future of things

University of the Arts

Rate paper: 👍 👎 ♥ Save

Abstract
This essay explores a techno-artistic experiment that reanimates a 1980s East German typewriter using a contemporary AI language model. Situated at the intersection of media archaeology and speculative design, the project questions dominant narratives of progress by embedding generative AI in an obsolete, tactile interface. Through public exhibitions and aesthetic intervention, we demonstrate how slowness, friction, and material render artificial intelligence not only visible but open to critical inquiry. Drawing on concepts such as zombie media, technostalgia, and speculative design, we argue that reappropriating outdated technologies enables new forms of critical engagement. Erika - the AI-enabled typewriter - functions as both interface and interruption, making space for reflection, irony, and cultural memory. In a moment of accelerated digital abstraction, projects like this foreground the value of deliberate slowness, experiential materiality, and historical depth. We conclude by advocating for a historicist design sensibility that challenges presentism and reorients human-machine interaction toward alternative, perceived futures.

AI Insights

The article discusses a project called Erika that embeds AI in an obsolete device, reframing it as a conversation rather than a tool. [3]
Erika's materiality and historical context evoke histories of control, collectivity, and latency, making it a unique interface for interacting with AI. [3]
The project challenges the trajectory of AI becoming imperceptible and opaque by making visible what has become hidden. [3]
Technostalgia: a nostalgic longing for past technologies, often used to critique the present and imagine alternative futures. [3]
Material friction: the idea that material objects can deepen engagement and foster critical awareness by introducing obstacles or challenges in interaction design. [3]
The next decade of 'things' will not be defined by novelty, but by recognition, with interfaces that slow us down and demand listening. [3]
Technostalgia as critique is presented as an active disruption that reframes AI as contested terrain where form matters and history lingers. [2]

Why we are recommending this paper?
Due to your Interest in: Machine Learning Lifecycle

Interests not found

We did not find any papers that match the below interests. Try other terms also consider if the content exists in arxiv.org.

MLOps
Fault tolerance

You can edit or add more interests any time.

Help us improve your experience!

This project is on its early stages your feedback can be pivotal on the future of the project. Let us know what you think about this week's papers and suggestions!

Give Feedback