Papers from 13 to 17 October, 2025

Here are your personalized paper recommendations, sorted by relevance.
Personalization
MBZUAI, ByteDance, National
Abstract
Large language models (LLMs) have grown more powerful in language generation, producing fluent text and even imitating personal style. Yet this ability also heightens the risk of identity impersonation. To the best of our knowledge, no prior work has examined personalized machine-generated text (MGT) detection. In this paper, we introduce \dataset, the first benchmark for evaluating detector robustness in personalized settings, built from literary and blog texts paired with their LLM-generated imitations. Our experimental results demonstrate large performance gaps across detectors in personalized settings: some state-of-the-art models suffer significant drops. We attribute this limitation to the feature-inversion trap, where features that are discriminative in general domains become inverted and misleading when applied to personalized text. Based on this finding, we propose \method, a simple and reliable way to predict detector performance changes in personalized settings. \method identifies latent directions corresponding to inverted features and constructs probe datasets that differ primarily along these features to evaluate detector dependence. Our experiments show that \method can accurately predict both the direction and the magnitude of post-transfer changes, showing 85% correlation with the actual performance gaps. We hope that this work will encourage further research on personalized text detection.
AI Insights
  • The benchmark draws from classic Jane Austen and Bernard Shaw excerpts, adding a nostalgic twist to AI research.
  • Long, clause‑laden sentences test detectors’ parsing of complex syntax, a nuance absent from the abstract.
  • The literature review highlights 19th‑century social critique, focusing on marriage and gender roles.
  • Revisiting Pride and Prejudice and Mrs. Warren’s Profession helps grasp the stylistic cues detectors face.
  • A weakness: excerpts may lack broader context, potentially biasing detector evaluation.
  • Inverted features flip formal language meaning, turning subtle cues into misleading signals.
  • Probe datasets along latent feature directions show how tiny stylistic shifts can swing detection accuracy.
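The probe-dataset idea above can be illustrated with a toy sketch. All names here are assumptions for illustration: the latent direction is taken as a simple difference of class means, and the "detector" is a linear scorer, neither of which is claimed to be the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature vectors standing in for human vs. machine-generated texts.
human = rng.normal(0.0, 1.0, size=(200, 8))
machine = rng.normal(0.5, 1.0, size=(200, 8))

# A latent direction along which the classes differ: here the difference of
# class means (a stand-in for the paper's inverted-feature directions).
direction = machine.mean(axis=0) - human.mean(axis=0)
direction /= np.linalg.norm(direction)

def detector_score(x):
    """Hypothetical detector: higher score = 'more machine-like'."""
    return x @ direction  # this detector leans entirely on the probed feature

# Probe pairs: identical samples shifted only along the latent direction.
base = rng.normal(0.0, 1.0, size=(100, 8))
shifted = base + 0.8 * direction

# Dependence = mean score change across probe pairs; a detector that ignores
# the feature would show ~0 here.
dependence = float(np.mean(detector_score(shifted) - detector_score(base)))
print(f"detector dependence on probed feature: {dependence:.2f}")
```

A score shift proportional to the probe magnitude (here 0.8) indicates full reliance on that feature, which is the kind of dependence the probe datasets are built to expose.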
The University of Hong Kong
Abstract
Training student models on synthetic data generated by strong teacher models is a promising way to distill the capabilities of teachers. However, recent studies show that stronger models are not always optimal teachers, revealing a mismatch between teacher outputs and student learnability. To address this issue, we propose PerSyn (Personalized data Synthesis), a novel synthesis strategy that operates under a new "Route then Generate" paradigm to create data tailored to each student model, enabling it to learn more effectively. Specifically, PerSyn first assigns each prompt to its optimal teacher via a query-level router that jointly considers student learnability and teacher response quality. Each teacher then synthesizes data only for its assigned prompts, making the process more efficient than the conventional "Generate then Select" paradigm, where all teachers must generate parallel responses for the entire prompt set before constructing the final dataset. Extensive experiments across different model families and scales demonstrate that PerSyn consistently achieves superior or comparable performance to all baselines in instruct tuning and math reasoning settings. Further analysis verifies the effectiveness of PerSyn and offers extra insights to propel future research.
AI Insights
  • PerSyn’s gains rise with reward‑model strength; Skywork‑Reward‑V2‑Llama‑3.1‑8B beats the 3B version.
  • Even with a weak reward model, PerSyn outperforms the strong CAR baseline.
  • Baselines Strong, Mix, Family‑Strong, and CAR trail PerSyn in instruction‑tuning and math reasoning.
  • Teacher‑assignment tables show Strong and Family‑Strong favor large models; CAR picks a single average teacher.
  • The router jointly optimizes student learnability and teacher response quality—an innovation beyond the abstract.
  • Suggested papers: Letzelter et al. (2025), Zhang et al. (2025), Ponkshe et al. (2025), Li et al. (2025b), Xu et al. (2025b); Magpie‑Zoo (Xu et al., 2025b) is a key dataset.
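The "Route then Generate" paradigm can be sketched in a few lines. The scoring rule below (quality times learnability) and the random scores are assumptions for illustration; PerSyn's actual router is a learned query-level model.

```python
import numpy as np

rng = np.random.default_rng(1)

n_prompts, n_teachers = 6, 3
# Hypothetical per-(prompt, teacher) signals in [0, 1]; in PerSyn these would
# come from the router, not random numbers.
quality = rng.random((n_prompts, n_teachers))       # teacher response quality
learnability = rng.random((n_prompts, n_teachers))  # student learnability

# Route: combine the two signals and pick one teacher per prompt, so each
# teacher only generates for its assigned prompts.
score = quality * learnability  # a simple combination; the paper's rule may differ
assignment = score.argmax(axis=1)

# Generate: each teacher synthesizes only for its assigned prompt subset.
for t in range(n_teachers):
    prompts = np.flatnonzero(assignment == t)
    print(f"teacher {t} generates for prompts {prompts.tolist()}")
```

Compared with "Generate then Select", each prompt is answered once rather than by every teacher, which is where the efficiency gain in the abstract comes from.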
Data Driven CRM
Beijing Institute of Technology
Abstract
A growing trend in modern data analysis is the integration of data management with learning, guided by accuracy, latency, and cost requirements. In practice, applications draw data of different formats from many sources, while objectives and budgets change over time. Existing systems handle these applications across databases, analysis libraries, and tuning services. Such fragmentation leads to complex user interaction, limited adaptability, suboptimal performance, and poor extensibility across components. To address these challenges, we present Aixel, a unified, adaptive, and extensible system for AI-powered data analysis. The system organizes work across four layers: application, task, model, and data. The task layer provides a declarative interface to capture user intent, which is parsed into an executable operator plan. An optimizer compiles and schedules this plan to meet specified goals in accuracy, latency, and cost. The task layer coordinates the execution of data and model operators, with built-in support for reuse and caching to improve efficiency. The model layer offers versioned storage for indexes, metadata, tensors, and model artifacts. It supports adaptive construction, task-aligned drift detection, and safe updates that reuse shared components. The data layer provides unified data management capabilities, including indexing, constraint-aware discovery, task-aligned selection, and comprehensive feature management. With these layers, Aixel delivers a user-friendly, adaptive, efficient, and extensible system.
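The goal-driven plan selection described in the abstract can be sketched as follows. The plan names, numbers, and cheapest-feasible selection rule are illustrative assumptions; Aixel's real optimizer works over compiled operator graphs, not a flat list.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    accuracy: float    # expected task accuracy
    latency_ms: float  # expected latency
    cost: float        # relative execution cost

# Hypothetical candidate operator plans for one declarative task.
candidates = [
    Plan("full-model",   accuracy=0.95, latency_ms=900, cost=1.00),
    Plan("cached-index", accuracy=0.91, latency_ms=120, cost=0.10),
    Plan("sampled-scan", accuracy=0.85, latency_ms=60,  cost=0.05),
]

def choose(plans, min_accuracy, max_latency_ms):
    """Pick the cheapest plan that meets the accuracy and latency goals."""
    feasible = [p for p in plans
                if p.accuracy >= min_accuracy and p.latency_ms <= max_latency_ms]
    return min(feasible, key=lambda p: p.cost) if feasible else None

best = choose(candidates, min_accuracy=0.90, max_latency_ms=500)
print(best.name)  # cached-index: the only plan meeting both goals
```

As the user's accuracy or latency budget changes over time, re-running the same selection over cached plans is what makes the system adaptive without user intervention.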
Abstract
In Machine Learning (ML), a regression algorithm aims to minimize a loss function based on data. An assessment method in this context seeks to quantify the discrepancy between the optimal response for an input-output system and the estimate produced by a learned predictive model (the student). Evaluating the quality of a learned regressor remains challenging without access to the true data-generating mechanism, as no data-driven assessment method can ensure the achievability of global optimality. This work introduces the Information Teacher, a novel data-driven framework for evaluating regression algorithms with formal performance guarantees for assessing global optimality. Our approach builds on estimating the Shannon mutual information (MI) between the input variables and the residuals, and applies to a broad class of additive noise models. Through numerical experiments, we confirm that the Information Teacher is capable of detecting global optimality, which corresponds to zero estimation error with respect to the true model (inaccessible in practice), working as a surrogate for the ground-truth assessment loss and offering a principled alternative to conventional empirical performance metrics.
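The core signal here is that under an additive noise model, a globally optimal regressor leaves residuals that are independent of the inputs, so MI(input, residual) should approach zero. A minimal sketch with a plug-in histogram MI estimator (my own simple estimator, not the paper's; plug-in estimates carry an upward bias):

```python
import numpy as np

rng = np.random.default_rng(2)

def binned_mi(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of x
    py = p.sum(axis=0, keepdims=True)   # marginal of y
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

# Additive-noise ground truth: y = x^2 + noise.
x = rng.normal(size=5000)
y = x**2 + 0.1 * rng.normal(size=5000)

residual_poor = y - 0.0 * x   # poor regressor: residuals still depend on x
residual_true = y - x**2      # true model: residuals are pure noise

mi_poor = binned_mi(x, residual_poor)
mi_true = binned_mi(x, residual_true)
print(f"MI(x, residual) poor model: {mi_poor:.3f}, true model: {mi_true:.3f}")
```

The poor model's residual MI stays large while the true model's drops near zero, which is the surrogate-for-ground-truth behavior the abstract describes.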
CRM Optimization
University of Maryland, F
Abstract
Test-time scaling has endowed Large Language Models (LLMs) with remarkable reasoning capabilities, particularly in mathematical domains, through intermediate chain-of-thought (CoT) reasoning before generating final answers. However, the specific sources and mechanisms underlying these reasoning capabilities remain insufficiently understood. Optimization reasoning, i.e., finding extrema under constraints, is a fundamental abstraction that underpins critical applications in planning, control, resource allocation, and prompt search. To systematically evaluate this capability, we introduce ExtremBench, a benchmark dataset for solving mathematical extremal problems, curated from inequality exercises used for the Chinese Mathematical Olympiad and transformed into 93 standardized extrema-finding problems. We conduct extensive evaluations across various state-of-the-art open-source model families, including Qwen3, GPT-OSS, and DeepSeek. Our results reveal that LLMs' extremal-solving capabilities do not always align with performance on current mathematical benchmarks such as AIME25 and MATH-500: some models show strong general mathematical reasoning but poor extremal-solving skills, and vice versa. This discrepancy highlights a critical gap in current evaluation practices and suggests that existing benchmarks may not comprehensively capture the full spectrum of mathematical reasoning abilities.
AI Insights
  • ExtremBench’s 93 problems span inequalities, calculus, and linear‑algebraic constraints, richer than typical Olympiad tasks.
  • Models strong on AIME or MATH‑500 often fail on ExtremBench, exposing a hidden benchmark gap.
  • Self‑consistency, which boosts chain‑of‑thought accuracy, could bridge the extremal‑reasoning divide.
  • ExtremBench can be coded in Python or MATLAB, yet the authors omitted code, inviting community replication.
  • The evaluated transformer families (Qwen3, GPT-OSS, DeepSeek) can all attempt extremal problems when given suitable reasoning prompts.
  • Heavy training data can overfit models, especially when confronting unseen optimization constraints.
  • Spivak’s Calculus and Strang’s Linear Algebra illuminate the math behind ExtremBench’s challenges.
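Extrema-finding answers of this kind can be sanity-checked numerically. The specific problem below (minimize x + 1/x for x > 0, minimum 2 at x = 1 by AM-GM) is my illustration, not a problem taken from the benchmark:

```python
import numpy as np

# Classic constrained extremal problem: minimize f(x) = x + 1/x for x > 0.
f = lambda x: x + 1.0 / x

xs = np.linspace(0.01, 10.0, 100_000)  # dense grid over the constraint set
numeric_min = float(f(xs).min())

claimed = 2.0  # a model's claimed extremum, to be verified
assert abs(numeric_min - claimed) < 1e-3, "claimed extremum fails numeric check"
print(f"numeric minimum ≈ {numeric_min:.4f}")
```

A grid check like this catches a wrong claimed value but not a wrong proof, so it complements rather than replaces Olympiad-style grading.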
University of Southern,US
Abstract
Simulation-based learning has enabled policies for precise, contact-rich tasks (e.g., robotic assembly) to reach high success rates (~80%) under high levels of observation noise and control error. Although such performance may be sufficient for research applications, it falls short of industry standards and makes policy chaining exceptionally brittle. A key limitation is the high variance in individual policy performance across diverse initial conditions. We introduce Refinery, an effective framework that bridges this performance gap, robustifying policy performance across initial conditions. We propose Bayesian Optimization-guided fine-tuning to improve individual policies, and Gaussian Mixture Model-based sampling during deployment to select initializations that maximize execution success. Using Refinery, we improve mean success rates by 10.98% over state-of-the-art methods in simulation-based learning for robotic assembly, reaching 91.51% in simulation and comparable performance in the real world. Furthermore, we demonstrate that these fine-tuned policies can be chained to accomplish long-horizon, multi-part assembly, successfully assembling up to 8 parts without requiring explicit multi-step training.
AI Insights
  • The authors survey robotic assembly literature, pinpointing robustness gaps.
  • They cite key works such as Residual RL and Offline Meta‑RL for industrial insertion.
  • The paper gives concise definitions of reinforcement learning and Bayesian optimization.
  • Deployment‑time optimization selects favorable initializations via Gaussian mixture sampling.
  • Fine‑tuned policies can be chained without extra training, assembling up to eight parts.
  • The study highlights assumptions about prior knowledge and the need for broader validation.
  • Authors note the framework’s theoretical focus limits immediate applicability to varied real‑world tasks.
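The deployment-time sampling step can be sketched with a fixed Gaussian mixture. The weights, means, and covariances below are illustrative assumptions; in the paper's pipeline such a mixture would be fit to initial conditions where the policy succeeded.

```python
import numpy as np

rng = np.random.default_rng(3)

# A 2-component Gaussian mixture over 2-D initial conditions
# (e.g., part pose offsets), assumed already fit to successful runs.
weights = np.array([0.7, 0.3])
means = np.array([[0.00,  0.00],
                  [0.05, -0.02]])
covs = np.array([np.diag([1e-4, 1e-4]),
                 np.diag([4e-4, 1e-4])])

def sample_initializations(n):
    """Draw n initial conditions from the success-biased mixture."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])

inits = sample_initializations(1000)
print("mean sampled offset:", inits.mean(axis=0).round(3))
```

Sampling initializations from a success-biased distribution, instead of uniformly over the workspace, is what lifts execution success at deployment without touching the policy itself.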

Interests not found

We did not find any papers matching the interests below. Try other search terms, and consider whether the content exists on arxiv.org.
  • MLOps
  • Email Marketing
  • Personalization Platform
You can edit or add more interests any time.

Unsubscribe from these updates