Personalization Platform

Towards On-Device Personalization: Cloud-device Collaborative Data Augmentation for Efficient On-device Language Model

Abstract
With the advancement of large language models (LLMs), significant progress has been achieved in various Natural Language Processing (NLP) tasks. However, existing LLMs still face two major challenges that hinder their broader adoption: (1) their responses tend to be generic and lack personalization tailored to individual users, and (2) they rely heavily on cloud infrastructure due to intensive computational requirements, leading to stable network dependency and response delay. Recent research has predominantly focused on either developing cloud-based personalized LLMs or exploring the on-device deployment of general-purpose LLMs. However, few studies have addressed both limitations simultaneously by investigating personalized on-device language models. To bridge this gap, we propose CDCDA-PLM, a framework for deploying personalized on-device language models on user devices with support from a powerful cloud-based LLM. Specifically, CDCDA-PLM leverages the server-side LLM's strong generalization capabilities to augment users' limited personal data, mitigating the issue of data scarcity. Using both real and synthetic data, A personalized on-device language models (LMs) is fine-tuned via parameter-efficient fine-tuning (PEFT) modules and deployed on users' local devices, enabling them to process queries without depending on cloud-based LLMs. This approach eliminates reliance on network stability and ensures high response speeds. Experimental results across six tasks in a widely used personalization benchmark demonstrate the effectiveness of CDCDA-PLM.

August 29, 2025

♥Save to Reading List

A Survey of Affective Recommender Systems: Modeling Attitudes, Emotions, and Moods for Personalization

Tonmoy Hasan

Abstract
Affective Recommender Systems are an emerging class of intelligent systems that aim to enhance personalization by aligning recommendations with users' affective states. Reflecting a growing interest, a number of surveys have been published in this area, however they lack an organizing taxonomy grounded in psychology and they often study only specific types of affective states or application domains. This survey addresses these limitations by providing a comprehensive, systematic review of affective recommender systems across diverse domains. Drawing from Scherer's typology of affective states, we introduce a classification scheme that organizes systems into four main categories: attitude aware, emotion aware, mood aware, and hybrid. We further document affective signal extraction techniques, system architectures, and application areas, highlighting key trends, limitations, and open challenges. As future research directions, we emphasize hybrid models that leverage multiple types of affective states across different modalities, the development of large-scale affect-aware datasets, and the need to replace the folk vocabulary of affective states with a more precise terminology grounded in cognitive and social psychology. Through its systematic review of existing research and challenges, this survey aims to serve as a comprehensive reference and a useful guide for advancing academic research and industry applications in affect-driven personalization.

August 27, 2025

♥Save to Reading List

Data Driven CRM

Enabling Content Management Systems as an Information Source in Model-driven Projects

Internet Interdisciplinary Institute (IN3), Universitat Oberta de Catalunya (UOC), Barcelona, Spain

Abstract
Content Management Systems (CMSs) are the most popular tool when it comes to create and publish content across the web. Recently, CMSs have evolved, becoming \emph{headless}. Content served by a \emph{headless CMS} aims to be consumed by other applications and services through REST APIs rather than by human users through a web browser. This evolution has enabled CMSs to become a notorious source of content to be used in a variety of contexts beyond pure web navigation. As such, CMS have become an important component of many information systems. Unfortunately, we still lack the tools to properly discover and manage the information stored in a CMS, often highly customized to the needs of a specific domain. Currently, this is mostly a time-consuming and error-prone manual process. In this paper, we propose a model-based framework to facilitate the integration of headless CMSs in software development processes. Our framework is able to discover and explicitly represent the information schema behind the CMS. This facilitates designing the interaction between the CMS model and other components consuming that information. These interactions are then generated as part of a middleware library that offers platform-agnostic access to the CMS to all the client applications. The complete framework is open-source and available online.

August 27, 2025

♥Save to Reading List

The AI Data Scientist

Department of Machine Learning, MBZUAI, Abu Dhabi, UAE

Abstract
Imagine decision-makers uploading data and, within minutes, receiving clear, actionable insights delivered straight to their fingertips. That is the promise of the AI Data Scientist, an autonomous Agent powered by large language models (LLMs) that closes the gap between evidence and action. Rather than simply writing code or responding to prompts, it reasons through questions, tests ideas, and delivers end-to-end insights at a pace far beyond traditional workflows. Guided by the scientific tenet of the hypothesis, this Agent uncovers explanatory patterns in data, evaluates their statistical significance, and uses them to inform predictive modeling. It then translates these results into recommendations that are both rigorous and accessible. At the core of the AI Data Scientist is a team of specialized LLM Subagents, each responsible for a distinct task such as data cleaning, statistical testing, validation, and plain-language communication. These Subagents write their own code, reason about causality, and identify when additional data is needed to support sound conclusions. Together, they achieve in minutes what might otherwise take days or weeks, enabling a new kind of interaction that makes deep data science both accessible and actionable.

August 25, 2025

♥Save to Reading List

CRM Optimization

Toward real-time optimization through model reduction and model discrepancy sensitivities

Abstract
Optimization problems arise in a range of scenarios, from optimal control to model parameter estimation. In many applications, such as the development of digital twins, it is essential to solve these optimization problems within wall-clock-time limitations. However, this is often unattainable for complex systems, such as those modeled by nonlinear partial differential equations. One strategy for mitigating this issue is to construct a reduced-order model (ROM) that enables more rapid optimization. In particular, the use of nonintrusive ROMs -- those that do not require access to the full-order model at evaluation time -- is popular because they facilitate optimization solutions can be computed within the wall-clock-time requirements. However, the optimization solution will be unreliable if the iterates move outside the ROM training data. This article proposes the use of hyper-differential sensitivity analysis with respect to model discrepancy (HDSA-MD) as a computationally efficient tool to augment ROM-constrained optimization and improve its reliability. The proposed approach consists of two phases: (i) an offline phase where several full-order model evaluations are computed to train the ROM, and (ii) an online phase where a ROM-constrained optimization problem is solved, $N=\mathcal{O}(1)$ full-order model evaluations are computed, and HDSA-MD is used to enhance the optimization solution using the full-order model data. Numerical results are demonstrated for two examples, atmospheric contaminant control and wildfire ignition location estimation, in which a ROM is trained offline using inaccurate atmospheric data. The HDSA-MD update yields a significant improvement in the ROM-constrained optimization solution using only one full-order model evaluation online with corrected atmospheric data.

August 29, 2025

♥Save to Reading List

Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning

Abstract
Supervised Fine-Tuning (SFT) Large Language Models (LLM) fundamentally rely on high-quality training data. While data selection and data synthesis are two common strategies to improve data quality, existing approaches often face limitations in static dataset curation that fail to adapt to evolving model capabilities. In this paper, we introduce Middo, a self-evolving Model-informed dynamic data optimization framework that uses model-aware data selection and context-preserving data refinement. Unlike conventional one-off filtering/synthesis methods, our framework establishes a closed-loop optimization system: (1) A self-referential diagnostic module proactively identifies suboptimal samples through tri-axial model signals - loss patterns (complexity), embedding cluster dynamics (diversity), and self-alignment scores (quality); (2) An adaptive optimization engine then transforms suboptimal samples into pedagogically valuable training points while preserving semantic integrity; (3) This optimization process continuously evolves with model capability through dynamic learning principles. Experiments on multiple benchmarks demonstrate that our \method consistently enhances the quality of seed data and boosts LLM's performance with improving accuracy by 7.15% on average while maintaining the original dataset scale. This work establishes a new paradigm for sustainable LLM training through dynamic human-AI co-evolution of data and models. Our datasets, models, and code are coming soon.

August 29, 2025

♥Save to Reading List

Email Marketing

Using item recommendations and LLMs in marketing email titles

Mercari Inc.

Abstract
E-commerce marketplaces make use of a number of marketing channels like emails, push notifications, etc. to reach their users and stimulate purchases. Personalized emails especially are a popular touch point for marketers to inform users of latest items in stock, especially for those who stopped visiting the marketplace. Such emails contain personalized recommendations tailored to each user's interests, enticing users to buy relevant items. A common limitation of these emails is that the primary entry point, the title of the email, tends to follow fixed templates, failing to inspire enough interest in the contents. In this work, we explore the potential of large language models (LLMs) for generating thematic titles that reflect the personalized content of the emails. We perform offline simulations and conduct online experiments on the order of millions of users, finding our techniques useful in improving the engagement between customers and our emails. We highlight key findings and learnings as we productionize the safe and automated generation of email titles for millions of users.

August 27, 2025

♥Save to Reading List

Interests not found

Help us improve your experience!