Amirali Rayegan, Tim Menzies
Abstract
Efficient, interpretable optimization is a critical but underexplored
challenge in software engineering, where practitioners routinely face vast
configuration spaces and costly, error-prone labeling processes. This paper
introduces EZR, a novel and modular framework for multi-objective optimization
that unifies active sampling, learning, and explanation within a single,
lightweight pipeline. Departing from conventional wisdom, our Maximum Clarity
Heuristic demonstrates that using less (but more informative) data can yield
optimization models that are both effective and deeply understandable. EZR
employs an active learning strategy based on Naive Bayes sampling to
efficiently identify high-quality configurations with a fraction of the labels
required by fully supervised approaches. It then distills optimization logic
into concise decision trees, offering transparent, actionable explanations for
both global and local decision-making. Extensive experiments across 60
real-world datasets establish that EZR reliably achieves over 90% of the
best-known optimization performance in most cases, while providing clear,
cohort-based rationales that surpass standard attribution-based explainable AI
(XAI) methods (LIME, SHAP, BreakDown) in clarity and utility. These results
endorse "less but better"; it is both possible and often preferable to use
fewer (but more informative) examples to generate label-efficient optimization
and explanations in software systems. To support transparency and
reproducibility, all code and experimental materials are publicly available at
https://github.com/amiiralii/Minimal-Data-Maximum-Clarity.
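To make the pipeline concrete, the sketch below shows the shape of the loop the abstract describes: label a few configurations, split them into "best" and "rest" halves, fit a Naive Bayes model, and spend each further costly label on the candidate most likely to be "best", before distilling the labeled data into a shallow decision tree. This is a minimal illustration, not the authors' EZR implementation (see the repository above for that); the GaussianNB stand-in, the acquisition rule, and all names (active_optimize, label_fn, budget, warmup) are assumptions made for this sketch.

import random
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeRegressor, export_text

def active_optimize(X, label_fn, budget=30, warmup=4, seed=0):
    """X: candidate configurations (list of numeric feature vectors).
    label_fn: the costly oracle; returns a score to MINIMIZE."""
    rng = random.Random(seed)
    pool = list(range(len(X)))
    rng.shuffle(pool)
    labeled = {}
    for _ in range(warmup):                       # a few random labels to start
        i = pool.pop()
        labeled[i] = label_fn(X[i])
    while pool and len(labeled) < budget:
        ranked = sorted(labeled, key=labeled.get) # lowest (best) scores first
        best = set(ranked[: len(ranked) // 2])    # split into "best" vs "rest"
        nb = GaussianNB().fit([X[i] for i in labeled],
                              [int(i in best) for i in labeled])
        probs = nb.predict_proba([X[i] for i in pool])[:, 1]
        j = max(range(len(pool)), key=lambda p: probs[p])
        i = pool.pop(j)                           # label the most promising candidate
        labeled[i] = label_fn(X[i])
    tree = DecisionTreeRegressor(max_depth=3)     # distill into a small, readable tree
    tree.fit([X[i] for i in labeled], [labeled[i] for i in labeled])
    best_i = min(labeled, key=labeled.get)
    return X[best_i], export_text(tree)

Here each call to label_fn models the costly labeling step, so budget caps the total number of labels spent; the string returned by export_text is the kind of concise, global rationale the abstract contrasts with attribution-based XAI output.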
AI Insights
- They use SHAP values as a feature‑selection compass, steering models toward the most informative inputs.
- The paper reviews XAI methods—SHAP, LIME, TPE—highlighting strengths and blind spots.
- It exposes gaps in current XAI tools, urging research into lighter, sharper explanations for software.
- Explainability is framed as essential, not optional, for trustworthy software engineering.
- A curated reading list appears, from The Book of Why to LLM‑driven active‑learning papers.
- The authors note limited empirical evidence, reminding us that theory must meet data.
- Definitions are clear: XAI demystifies AI; interpretability lets us see model decisions.
Abstract
In this paper, we study the class $\mathtt{cstPP}$ of operations
$\mathtt{op}: \mathbb{N}^k\to\mathbb{N}$, of any fixed arity $k\ge 1$,
satisfying the following property: for each fixed integer $d\ge 1$, there
exists an algorithm for a RAM machine which, for any input integer $N\ge 2$,
- pre-computes some tables in $O(N)$ time,
- then reads $k$ operands $x_1,\ldots,x_k < N^d$ and computes
$\mathtt{op}(x_1,\ldots,x_k)$ in constant time, using the precomputed tables.
We prove that the class $\mathtt{cstPP}$ is not modified if the preprocessing
time $O(N)$ is extended to $O(N^c)$, for any fixed $c>1$, or conversely, is
reduced to $N^{\varepsilon}$, for any positive $\varepsilon<1$ (provided the
set of primitive operations includes $+$, $\mathtt{div}$ and $\mathtt{mod}$). To
complete the picture, we demonstrate that the $\mathtt{cstPP}$ class
degenerates if the preprocessing time reduces to $N^{o(1)}$.
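As a concrete illustration of the pattern that defines $\mathtt{cstPP}$ (my example, assumed rather than taken from the paper): spend $O(N)$ time filling a table for all values below $N$, then answer each query on larger operands in constant time using only $+$, $\mathtt{div}$ and $\mathtt{mod}$. The sketch below takes $\mathtt{op}$ to be popcount and $d=2$, with $N$ a power of two so that div and mod by $N$ split an operand below $N^2$ exactly into two in-table halves.

def make_popcount(N):
    """Illustrative example. Assumes N is a power of two. After O(N)
    preprocessing, answers popcount(x) for any 0 <= x < N**2 in O(1)."""
    table = [0] * N
    for i in range(1, N):                  # O(N) table build
        table[i] = table[i // 2] + (i % 2)
    def popcount(x):                       # constant time: two lookups, one add
        return table[x // N] + table[x % N]
    return popcount

pc = make_popcount(1 << 16)                # one linear-time preprocessing pass
assert pc(0b10110101) == 5                 # operand below N
assert pc((1 << 30) + 1) == 2              # operand up to N**2, still O(1)

The same shape scales to any fixed $d$: an operand below $N^d$ splits into $d$ base-$N$ digits via $d-1$ div/mod pairs, so the per-query cost stays constant once $d$ is fixed, while the table build remains the $O(N)$ preprocessing the abstract allows.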