Hi!

Your personalized paper recommendations for 5–9 January 2026.
Northeastern University
Abstract
We study the problem of personalization in large language models (LLMs). Prior work predominantly represents user preferences as implicit, model-specific vectors or parameters, yielding opaque "black-box" profiles that are difficult to interpret and transfer across models and tasks. In contrast, we advocate natural language as a universal, model- and task-agnostic interface for preference representation. This formulation yields interpretable and reusable preference descriptions, while naturally supporting continual evolution as new interactions are observed. To learn such representations, we introduce a two-stage training framework that combines supervised fine-tuning on high-quality synthesized data with reinforcement learning to optimize long-term utility and cross-task transferability. Based on this framework, we develop AlignXplore+, a universal preference reasoning model that generates textual preference summaries. Experiments on nine benchmarks show that our 8B model achieves state-of-the-art performance -- outperforming substantially larger open-source models -- while exhibiting strong transferability across tasks, model families, and interaction formats.
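To make the natural-language interface concrete, here is a minimal sketch of how a textual preference profile could be continually updated and reused across models. `call_llm` and both prompt templates are hypothetical stand-ins, not the paper's actual training setup or prompts.

```python
# Minimal sketch of the natural-language preference interface, assuming a
# generic chat LLM. `call_llm` and both prompts are hypothetical stand-ins,
# not AlignXplore+'s actual templates or training procedure.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; wire up any chat/completions client here."""
    raise NotImplementedError

def update_preference_summary(summary: str, interactions: list[str]) -> str:
    """Continually revise a textual preference profile as interactions arrive."""
    prompt = (
        "Current user preference summary:\n"
        f"{summary or '(none yet)'}\n\n"
        "New interactions:\n"
        + "\n".join(f"- {x}" for x in interactions)
        + "\n\nRewrite the summary so it stays concise, interpretable, and "
        "consistent with all observed interactions."
    )
    return call_llm(prompt)

def personalized_answer(summary: str, query: str) -> str:
    # Because the profile is plain text, any downstream model can condition
    # on it, which is what makes the representation model- and task-agnostic.
    return call_llm(f"User preferences:\n{summary}\n\nTask:\n{query}")
```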
Why are we recommending this paper?
Due to your Interest in Personalization

This paper explores the crucial challenge of transferring personalization across different LLMs, aligning directly with your interest in personalization platforms and data-driven CRM approaches. The focus on interpretable profiles is highly relevant to your interest in understanding and leveraging personalization techniques.
University of Würzburg
Abstract
Large language models (LLMs) have achieved notable performance in code synthesis; however, data-aware augmentation remains a limiting factor, handled via heuristic design or brute-force approaches. We introduce a performance-aware, closed-loop solution in the NNGPT ecosystem of projects that enables LLMs to autonomously engineer optimal transformations by internalizing empirical performance cues. We fine-tune LLMs with Low-Rank Adaptation on a novel repository of more than 6,000 empirically evaluated PyTorch augmentation functions, each annotated solely by downstream model accuracy. Training uses pairwise performance ordering (better-worse transformations), enabling alignment through empirical feedback without reinforcement learning, reward models, or symbolic objectives. This reduces the need for exhaustive search, requiring up to 600x fewer evaluated candidates than brute-force discovery while maintaining competitive peak accuracy and shifting generation from random synthesis to task-aligned design. Ablation studies show that structured Chain-of-Thought prompting introduces syntactic noise and degrades performance, whereas direct prompting ensures stable optimization in performance-critical code tasks. Qualitative and quantitative analyses demonstrate that the model internalizes semantic performance cues rather than memorizing syntax. These results show that LLMs can exhibit task-level reasoning through non-textual feedback loops, bypassing explicit symbolic rewards.
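The pairwise performance-ordering objective can be illustrated with a Bradley-Terry style ranking loss. The sketch below uses a toy linear scorer over feature vectors to stay runnable; the paper instead applies the idea to a LoRA-tuned LLM scoring augmentation code, so the scorer and shapes here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Sketch of pairwise performance ordering as a Bradley-Terry style loss.
# The paper applies this to a LoRA-tuned LLM scoring augmentation functions;
# here a toy linear scorer over feature vectors keeps the example runnable.

def pairwise_ordering_loss(scores_better: torch.Tensor,
                           scores_worse: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(s_better - s_worse): ranks the transformation with higher
    # downstream accuracy above its pair partner, with no reward model or RL.
    return -F.logsigmoid(scores_better - scores_worse).mean()

scorer = torch.nn.Linear(16, 1)   # illustrative stand-in for the LLM's score
better = torch.randn(8, 16)       # features of higher-accuracy augmentations
worse = torch.randn(8, 16)        # features of lower-accuracy augmentations
loss = pairwise_ordering_loss(scorer(better).squeeze(-1),
                              scorer(worse).squeeze(-1))
loss.backward()
```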
Why are we recommending this paper?
Due to your Interest in Data Driven CRM

Given your interest in MLOps and leveraging LLMs, this paper’s approach to performance-aware data transformation offers a valuable perspective on optimizing LLM workflows. The use of LLMs for data augmentation is a key area of interest within your domain.
Meta
Abstract
Recent years have witnessed the success of sequential modeling, generative recommenders, and large language models for recommendation. Although the scaling law has been validated for sequential models, they are computationally inefficient in real-world applications like recommendation, due to the non-linear (quadratic) scaling of the transformer model. To improve the efficiency of sequential models, we introduce a novel approach to sequential recommendation that leverages personalization techniques to enhance efficiency and performance. Our method compresses long user interaction histories into learnable tokens, which are then combined with recent interactions to generate recommendations. This approach significantly reduces computational costs while maintaining high recommendation accuracy. Our method can be applied to existing transformer-based recommendation models, e.g., HSTU and HLLM. Extensive experiments on multiple sequential models demonstrate its versatility and effectiveness. Source code is available at https://github.com/facebookresearch/PerSRec.
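The history-compression idea can be sketched as a small set of learnable tokens that cross-attend to the long history, so the downstream transformer only sees the compressed tokens plus recent interactions. Module names and dimensions below are illustrative, not the PerSRec implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of history compression: learnable tokens cross-attend to the
# long interaction history; only the compressed tokens plus recent items are
# fed to the transformer. Names and sizes are illustrative assumptions.

class HistoryCompressor(nn.Module):
    def __init__(self, d_model: int = 64, n_compress: int = 8, n_heads: int = 4):
        super().__init__()
        self.compress_tokens = nn.Parameter(torch.randn(n_compress, d_model))
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, long_history: torch.Tensor) -> torch.Tensor:
        # long_history: (batch, seq_len, d_model) -> (batch, n_compress, d_model)
        q = self.compress_tokens.unsqueeze(0).expand(long_history.size(0), -1, -1)
        compressed, _ = self.cross_attn(q, long_history, long_history)
        return compressed

# Self-attention cost now scales with (n_compress + n_recent)^2 rather than
# the full history length, which is the claimed efficiency gain.
compressor = HistoryCompressor()
long_hist = torch.randn(2, 1000, 64)   # 1,000 past interactions
recent = torch.randn(2, 20, 64)        # 20 recent interactions
model_input = torch.cat([compressor(long_hist), recent], dim=1)  # (2, 28, 64)
```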
Why are we recommending this paper?
Due to your Interest in Personalization

This paper tackles the challenge of sequential recommendation, a core component of your interest in data-driven CRM and personalization platforms. The exploration of scaling laws and computational efficiency is particularly relevant to your focus on practical applications.
Santa Clara University
Abstract
Automatic prompt optimization reduces manual prompt engineering, but relies on task performance measured on a small, often randomly sampled evaluation subset as its main source of feedback. Yet how to select that evaluation subset is usually treated as an implementation detail. We study evaluation subset selection for prompt optimization from a principled perspective and propose SESS, a submodular evaluation subset selection method. We frame selection as maximizing an objective set function and show that, under mild conditions, it is monotone and submodular, enabling greedy selection with theoretical guarantees. Across GSM8K, MATH, and GPQA-Diamond, submodularly selected evaluation subsets can yield better optimized prompts than random or heuristic baselines.
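Greedy maximization of a monotone submodular set function is the algorithmic core here. The sketch below uses a facility-location objective over example similarities as a stand-in for the paper's set function, which the abstract does not specify.

```python
import numpy as np

# Greedy maximization of a monotone submodular objective, the core of SESS.
# A facility-location objective over an RBF similarity matrix stands in for
# the paper's set function; with monotone submodularity, greedy selection
# enjoys the classic (1 - 1/e) approximation guarantee.

def facility_location_gain(sim: np.ndarray, selected: list[int], cand: int) -> float:
    # Marginal gain of adding `cand`: total increase in each example's best
    # similarity to the selected subset.
    best = sim[:, selected].max(axis=1) if selected else np.zeros(sim.shape[0])
    return float(np.maximum(sim[:, cand] - best, 0.0).sum())

def greedy_select(sim: np.ndarray, k: int) -> list[int]:
    selected: list[int] = []
    for _ in range(k):
        cands = [c for c in range(sim.shape[1]) if c not in selected]
        gains = [facility_location_gain(sim, selected, c) for c in cands]
        selected.append(cands[int(np.argmax(gains))])
    return selected

# Toy usage: pick 10 of 100 candidate evaluation examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))                       # example embeddings
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
sim = np.exp(-d2 / d2.mean())                        # non-negative RBF similarity
subset = greedy_select(sim, k=10)
```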
Why are we recommending this paper?
Due to your Interest in CRM Optimization

This work directly addresses prompt optimization, a key area within automatic prompt engineering and personalization. The focus on evaluation subset selection is a critical component of building effective personalization systems.
Leibniz University Hannover
Abstract
Choosing a suitable ML model is a complex task that can depend on several objectives, e.g., accuracy, model size, fairness, inference time, or energy consumption. In practice, this requires trading off multiple, often competing, objectives through multi-objective optimization (MOO). However, existing MOO methods typically treat all hyperparameters as equally important, overlooking that hyperparameter importance (HPI) can vary significantly depending on the trade-off between objectives. We propose a novel dynamic optimization approach that prioritizes the most influential hyperparameters based on varying objective trade-offs during the search process, which accelerates empirical convergence and leads to better solutions. Building on prior work on HPI for MOO post-analysis, we now integrate HPI, calculated with HyperSHAP, into the optimization. For this, we leverage the objective weightings naturally produced by the MOO algorithm ParEGO and adapt the configuration space by fixing the unimportant hyperparameters, allowing the search to focus on the important ones. Finally, we validate our method with diverse tasks from PyMOO and YAHPO-Gym. Empirical results demonstrate improvements in convergence speed and Pareto front quality compared to baselines.
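The key mechanism, shrinking the configuration space by fixing hyperparameters that are unimportant under the current objective weighting, can be sketched as follows. The importance estimate is a crude correlation stand-in for HyperSHAP, and all names and sizes are illustrative.

```python
import numpy as np

# Sketch of the space-shrinking step: scalarize objectives with a ParEGO-style
# weighting, estimate per-hyperparameter importance, and fix the unimportant
# dimensions to the incumbent. |correlation| is a crude stand-in for HyperSHAP.

def parego_scalarize(F: np.ndarray, w: np.ndarray, rho: float = 0.05) -> np.ndarray:
    # Augmented Tchebycheff scalarization used by ParEGO.
    return (w * F).max(axis=1) + rho * (w * F).sum(axis=1)

def importance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Per-dimension importance proxy: |correlation| with the scalarized cost.
    return np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

def shrink_space(X, F, w, incumbent, keep=2):
    # Fix all but the `keep` most important hyperparameters to the incumbent.
    imp = importance(X, parego_scalarize(F, w))
    return {int(j): float(incumbent[j]) for j in np.argsort(imp)[:-keep]}

# Toy usage: 50 evaluated configs, 4 hyperparameters, 2 objectives.
rng = np.random.default_rng(1)
X, F_obj = rng.uniform(size=(50, 4)), rng.uniform(size=(50, 2))
w = rng.dirichlet(np.ones(2))
incumbent = X[np.argmin(parego_scalarize(F_obj, w))]
print(shrink_space(X, F_obj, w, incumbent))  # dims fixed to incumbent values
```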
Why are we recommending this paper?
Due to your Interest in CRM Optimization

This paper’s exploration of multi-objective optimization aligns with your interest in personalized optimization strategies. The focus on efficient model selection is a valuable contribution to the broader field of MLOps and personalization.
Hasso Plattner Institute for Digital Engineering, University of Potsdam
Abstract
Large foundation models (LFMs) transform healthcare AI in prevention, diagnostics, and treatment. However, whether LFMs can provide truly personalized treatment recommendations remains an open question. Recent research has revealed multiple challenges for personalization, including the fundamental generalizability paradox: models achieving high accuracy in one clinical study perform at chance level in others, demonstrating that personalization and external validity exist in tension. This exemplifies broader contradictions in AI-driven healthcare: the privacy-performance paradox, scale-specificity paradox, and the automation-empathy paradox. As another challenge, the degree of causal understanding required for personalized recommendations, as opposed to mere predictive capacities of LFMs, remains an open question. N-of-1 trials -- crossover self-experiments and the gold standard for individual causal inference in personalized medicine -- resolve these tensions by providing within-person causal evidence while preserving privacy through local experimentation. This paper argues that, despite their impressive capabilities, LFMs cannot replace N-of-1 trials. We argue that LFMs and N-of-1 trials are complementary: LFMs excel at rapid hypothesis generation from population patterns using multimodal data, while N-of-1 trials excel at causal validation for a given individual. We propose a hybrid framework that combines the strengths of both to enable personalization and navigate the identified paradoxes: LFMs generate ranked intervention candidates with uncertainty estimates, which trigger subsequent N-of-1 trials. Clarifying the boundary between prediction and causation and explicitly addressing the paradoxical tensions are essential for responsible AI integration in personalized medicine.
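The proposed hybrid loop can be sketched as: the LFM proposes ranked intervention candidates with uncertainty estimates, and an N-of-1 crossover trial supplies the within-person causal check. `propose_interventions`, `run_trial`, and the acceptance rule below are hypothetical placeholders, not the paper's protocol.

```python
import statistics

# Sketch of the hybrid LFM + N-of-1 loop described in the abstract. All
# function names and the acceptance criterion are hypothetical placeholders.

def propose_interventions(patient_record: str) -> list[tuple[str, float]]:
    """Hypothetical LFM call returning (intervention, uncertainty) pairs."""
    raise NotImplementedError

def n_of_1_effect(outcomes_a: list[float], outcomes_b: list[float]) -> float:
    # Within-person effect estimate from alternating treatment (A) and
    # control (B) periods of a crossover self-experiment.
    return statistics.mean(outcomes_a) - statistics.mean(outcomes_b)

def hybrid_loop(patient_record: str, run_trial) -> str | None:
    # Try candidates from most to least confident; accept the first whose
    # individual causal effect is positive (a placeholder criterion; a real
    # trial would apply a proper uncertainty-aware statistical test).
    for intervention, _uncertainty in sorted(propose_interventions(patient_record),
                                             key=lambda pair: pair[1]):
        if n_of_1_effect(*run_trial(intervention)) > 0:
            return intervention
    return None
```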
Why are we recommending this paper?
Due to your Interest in Personalization Platform
Rochester Institute of Technology
AI Insights
  • Phishing detection systems can benefit from human-machine collaboration, which combines the complementary strengths of human cognitive reasoning and interpretable models to improve both detection strategies and interpretability. [3]
  • Interpretable models: models that provide transparent and understandable explanations for their predictions or decisions, such as decision trees or linear regression. [3]
  • Machine learning: a subfield of artificial intelligence that involves training algorithms to learn from data and make predictions or decisions. [2]
  • Limitations: the study focused on interpretable models rather than state-of-the-art deep learning architectures, which may underrepresent achievable predictive accuracy, and the human annotation dataset was small and synthetic (20 emails with balanced phishing and legitimate examples). [3]
  • Future research should integrate larger, more diverse, real-world phishing datasets, compare human reasoning with state-of-the-art transformer architectures, and incorporate behavioral confidence measures to better understand decision certainty. [1]
Abstract
Identifying deceptive content like phishing emails demands sophisticated cognitive processes that combine pattern recognition, confidence assessment, and contextual analysis. This research examines how human cognition and machine learning models work together to distinguish phishing emails from legitimate ones. We employed three interpretable algorithms (Logistic Regression, Decision Trees, and Random Forests), training them on both TF-IDF features and semantic embeddings, then compared their predictions against human evaluations that captured confidence ratings and linguistic observations. Our results show that machine learning models achieve good accuracy, but their confidence levels vary significantly. Human evaluators, on the other hand, draw on a greater variety of linguistic cues and maintain more consistent confidence. We also found that while language proficiency has minimal effect on detection performance, age does. These findings offer helpful direction for creating transparent AI systems that complement human cognitive functions, ultimately improving human-AI cooperation in challenging content analysis tasks.
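One arm of the study's setup, an interpretable classifier on TF-IDF features, can be reproduced in a few lines with scikit-learn. The toy emails below are illustrative, not the paper's dataset; the study additionally uses Decision Trees, Random Forests, and semantic embeddings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Minimal sketch of one study arm: Logistic Regression on TF-IDF features.
# The toy emails are illustrative placeholders, not the paper's dataset.
emails = [
    "Verify your account now or it will be suspended",  # phishing-like
    "Attached are the meeting notes from Tuesday",      # legitimate-like
    "Click this link to claim your prize immediately",
    "Lunch at noon? The usual place works for me",
]
labels = [1, 0, 1, 0]  # 1 = phishing, 0 = legitimate

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(emails, labels)

# Interpretability: per-token weights show which cues drive the prediction,
# which is what the paper compares against human linguistic observations.
vec = clf.named_steps["tfidfvectorizer"]
lr = clf.named_steps["logisticregression"]
weights = sorted(zip(vec.get_feature_names_out(), lr.coef_[0]),
                 key=lambda t: t[1], reverse=True)
print(weights[:5])  # strongest phishing-leaning tokens
```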
Why are we recommending this paper?
Due to your Interest in Email Marketing

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • MLOps
You can edit or add more interests any time.