Papers from 13 to 17 October 2025

Here are your personalized paper recommendations, sorted by relevance.
Paid Search
Shanghai Jiao Tong University
Abstract
Accurately modeling query-item relevance drives e-commerce ranking, yet long-tail, knowledge-heavy, and fast-evolving queries exceed parametric LLM coverage. External context (reviews, attribute encyclopedias, UGC) can help but is noisy, and single-pass latency and cost forbid any clean-then-summarize step. The model must, per query, judge relevance and decide whether to use, partially use, or ignore the context. DyKnow-RAG is a dynamic noisy-RAG framework built on Group Relative Policy Optimization. It trains two rollout groups (no external context vs a single retrieved chunk) and applies posterior-driven inter-group advantage scaling that adaptively reweights their contributions by the per-query correctness gap. This teaches when to trust retrieval versus fall back to parametric knowledge, without process labels, value networks, or extra inference passes, preserving single-pass, single-chunk deployment under production latency. Training combines: (1) supervised initialization with a structured rationale that explicitly records the context-usage decision; (2) an RL pool prioritized by SFT uncertainty to focus where context choice is most consequential; and (3) an optional lightweight DPO warm start to stabilize with-context calibration. Under a unified retrieval/index and fixed latency budget, DyKnow-RAG outperforms SFT, DPO, and vanilla GRPO in offline tests, and delivers consistent lifts on GSB, Query Goodrate, and Item Goodrate in Taobao A/B testing. It is deployed in Taobao's production relevance system, serving live traffic. To our knowledge, it is among the first single-pass RAG solutions for e-commerce relevance, turning noisy external signals into reliable gains without added online complexity.
Bidding
IIT Bombay
Abstract
Firms (businesses, service providers, entertainment organizations, political parties, etc.) advertise on social networks to draw people's attention and raise awareness of their brands. The competitive nature of these engagements gives rise to a game in which the firms must decide how to distribute their budget over the agents on a network to maximize their brand's awareness. The firms (players) therefore need to optimize how much budget to put on the vertices of the network so that the spread improves via direct marketing (advertisements or free promotional offers) and indirect marketing (word-of-mouth). We propose a two-timescale model of decisions in which communication between the vertices happens on a faster timescale and the firms' strategy updates happen on a slower timescale. We show that under fairly standard conditions, the best-response dynamics of the firms converge to a pure-strategy Nash equilibrium. However, such equilibria can be far from socially optimal. We characterize contest success functions, with examples for the designers of such contests (e.g., regulators, social network providers), under which the Nash equilibrium becomes unique and social-welfare maximizing. Our experiments show that, for realistic scenarios, such contest success functions perform fairly well.
AI Insights
  • A two‑phase generator first samples realistic demographics, then builds a graph with edge weights mirroring true influence.
  • Influence strengths use a Bernoulli likelihood; adoption thresholds follow a General Threshold model, achieving 95–96% accuracy.
  • The method stays stable across densities; larger populations reduce the variance seen in 2,000‑node tests.
  • Unlike standard diffusion models, it preserves real‑world social tie heterogeneity for faithful brand‑awareness simulations.
  • Synthetic graphs reproduce observed product holdings, validating the approach for marketing‑strategy testing.
  • The framework can be extended with temporal dynamics or multi‑product interactions while remaining computationally tractable.
  • Key references: “Network Science”, “Influence Maximization in Social Networks”, and foundational Bernoulli/Threshold papers.
University of Warsaw, Tsu
Abstract
We study a two-player model of conflict with multiple battlefields -- the novel element is that each of the players has their own network of spillovers so that resources allocated to one battle can be utilized in winning neighboring battles. There exists a unique equilibrium in which the relative probability of a player winning a battle is the product of the ratio of the centrality of the battlefield in the two respective competing networks and the ratio of the relative cost of efforts of the two players. We study the design of networks and characterize networks that maximize total efforts and maximize total utility. Finally, we characterize the equilibrium of a game in which players choose both networks and efforts in the battles.
AI Insights
  • The model admits a continuum of Nash equilibria even when all I + ρᔹ matrices are nonsingular, revealing rich strategic diversity.
  • In the B = {1, 2, 3} example, player 1’s spillover network yields effective efforts y₁ = 4/9, y₂ = 1/4, while battlefield 3 receives zero from player 2, illustrating asymmetric influence.
  • The equilibrium strategy is valid only when Ÿ < ÎŒ₃ÂČ < 9/2, a subtle constraint that shapes the feasible effort profiles.
  • Each distinct value of ÎŒ₂ generates a different equilibrium, underscoring the model’s sensitivity to marginal effort adjustments.
  • Core definitions: Conflict Function maps effort to win probability; Spillover Structure captures cross‑battlefield influence.
  • Recommended reading: “The Theory of Games and Economic Behavior” and Aumann & Maschler’s survey on repeated games with incomplete information.
  • Caveats: reliance on Tullock CSF, exclusion of coalitions or side payments, and fixed spillover structures limit real‑world applicability.
Customer Relationship Management (CRM) Optimization
School of Computing, SriL
Abstract
In online retail, customer acquisition typically incurs higher costs than customer retention, motivating firms to invest in churn analytics. However, many contemporary churn models operate as opaque black boxes, limiting insight into the determinants of attrition, the timing of retention opportunities, and the identification of high-risk customer segments. Accordingly, the emphasis should shift from prediction alone to the design of personalized retention strategies grounded in interpretable evidence. This study advances a three-component framework that integrates explainable AI to quantify feature contributions, survival analysis to model time-to-event churn risk, and RFM profiling to segment customers by transactional behaviour. In combination, these methods enable the attribution of churn drivers, estimation of intervention windows, and prioritization of segments for targeted actions, thereby supporting strategies that reduce attrition and strengthen customer loyalty.
AI Insights
  • SHAP values pinpoint the exact contribution of each RFM feature to churn risk, enabling targeted incentive design.
  • Random Forest, Gradient Boosting, and XGBoost models achieve >85% AUC on the Kaggle e‑commerce churn dataset, outperforming baseline logistic regression.
  • Survival analysis estimates a median churn time of 42 days for high‑risk segments, guiding optimal timing for retention offers.
  • RFM clustering reveals a “Gold” cohort with 3× higher lifetime value, suggesting a priority for loyalty programs.
  • The study’s cross‑validation scheme mitigates overfitting, yet future work should incorporate concept‑drift detection for evolving customer behavior.
  • Recommended reading: “Explainable AI: Interpreting, Explaining and Visualizing Deep Learning” for deeper insight into SHAP visualizations.
  • Definition: Explainable AI (XAI) refers to methods that render machine‑learning decisions transparent to domain experts.
Marketing Channels
Independent Researcher
Abstract
Marketing Mix Modeling (MMM) is a statistical technique used to estimate the impact of marketing activities on business outcomes such as sales, revenue, or customer visits. Traditional MMM approaches often rely on linear regression or Bayesian hierarchical models that assume independence between marketing channels and struggle to capture complex temporal dynamics and non-linear saturation effects [Hanssens, 2005; Ng, 2021]. DeepCausalMMM is a Python package that addresses these limitations by combining deep learning, causal inference, and advanced marketing science. The package uses Gated Recurrent Units (GRUs) to automatically learn temporal patterns such as adstock (carryover effects) and lag, while simultaneously learning statistical dependencies and potential causal structures between marketing channels through Directed Acyclic Graph (DAG) learning [Zheng et al., 2018; Gong et al., 2024]. Additionally, it implements Hill equation-based saturation curves to model diminishing returns and optimize budget allocation. Key innovations include: (1) a data-driven design where hyperparameters and transformations (e.g., adstock decay, saturation curves) are learned or estimated from data with sensible defaults, rather than requiring fixed heuristics or manual specification; (2) multi-region modeling with both shared and region-specific parameters; (3) robust statistical methods including Huber loss and advanced regularization; (4) comprehensive response curve analysis for understanding channel saturation; and (5) an extensive visualization suite with 14+ interactive dashboards for business insights.
AI Insights
  • Demonstrated predictive accuracy on 190 regions, 109 weeks, 13 channels, and 7 controls in anonymized real‑world data.
  • Employs Huber loss and elastic‑net regularization to guard against outliers and over‑fitting.
  • DAG structure is learned via the continuous NOTEARS optimization, revealing causal links between media.
  • Multi‑region modeling shares global parameters while allowing region‑specific adjustments for local effects.
  • Requires GPU‑accelerated training; small‑scale setups may struggle with memory demands.
  • Comparable to open‑source tools like Robyn, LightweightMMM, PyMC‑Marketing, and CausalMMM, but adds causal inference.
  • Recommended reading: Hanssens et al. “Market Response Models” and Li et al. “Deep Causal Models Survey”.
Tongji University, Stanford
Abstract
Understanding human intent is a complex, high-level task for large language models (LLMs), requiring analytical reasoning, contextual interpretation, dynamic information aggregation, and decision-making under uncertainty. Real-world public discussions, such as consumer product discussions, are rarely linear, and they rarely involve a single user. Instead, they are characterized by interwoven and often conflicting perspectives, divergent concerns, goals, and emotional tendencies, as well as implicit assumptions and background knowledge about usage scenarios. To accurately understand such explicit public intent, an LLM must go beyond parsing individual sentences; it must integrate multi-source signals, reason over inconsistencies, and adapt to evolving discourse, much as experts in fields like politics, economics, or finance approach complex, uncertain environments. Despite the importance of this capability, no large-scale benchmark currently exists for evaluating LLMs on real-world human intent understanding, primarily due to the challenges of collecting real-world public discussion data and constructing a robust evaluation pipeline. To bridge this gap, we introduce \bench, the first dynamic, live evaluation benchmark specifically designed for intent understanding, particularly in the consumer domain. \bench is the largest and most diverse benchmark of its kind, supporting real-time updates while preventing data contamination through an automated curation pipeline.
AI Insights
  • Google Nest Smart Speakers excel in multi‑room audio but users report intermittent connectivity glitches.
  • Voice‑recognition accuracy drops in noisy environments, prompting users to tweak sensitivity settings.
  • Local server integration can reduce latency and shield against cloud‑service outages.
  • Long‑term support concerns drive hesitation to expand the Nest ecosystem beyond core devices.
  • Smart Home Automation leverages APIs to sync Nest speakers with lighting and thermostats.
  • A recent study quantifies Nest’s audio fidelity against competing brands, revealing a 3‑dB advantage in bass response.
  • For deeper analysis, consult “An Analysis of Voice Recognition Accuracy in Google Nest Smart Speakers” (IEEE 2023).
Personalization
MBZUAI, ByteDance, National
Abstract
Large language models (LLMs) have grown more powerful in language generation, producing fluent text and even imitating personal style. Yet, this ability also heightens the risk of identity impersonation. To the best of our knowledge, no prior work has examined personalized machine-generated text (MGT) detection. In this paper, we introduce \dataset, the first benchmark for evaluating detector robustness in personalized settings, built from literary and blog texts paired with their LLM-generated imitations. Our experimental results demonstrate large performance gaps across detectors in personalized settings: some state-of-the-art models suffer significant drops. We attribute this limitation to the “feature-inversion trap”, where features that are discriminative in general domains become inverted and misleading when applied to personalized text. Based on this finding, we propose \method, a simple and reliable way to predict detector performance changes in personalized settings. \method identifies latent directions corresponding to inverted features and constructs probe datasets that differ primarily along these features to evaluate detector dependence. Our experiments show that \method can accurately predict both the direction and the magnitude of post-transfer changes, showing 85% correlation with the actual performance gaps. We hope that this work will encourage further research on personalized text detection.
AI Insights
  • The benchmark draws from classic Jane Austen and Bernard Shaw excerpts, adding a nostalgic twist to AI research.
  • Long, clause‑laden sentences test detectors’ parsing of complex syntax, a nuance absent from the abstract.
  • The literature review highlights 19th‑century social critique, focusing on marriage and gender roles.
  • Revisiting Pride and Prejudice and Mrs. Warren’s Profession helps grasp the stylistic cues detectors face.
  • A weakness: excerpts may lack broader context, potentially biasing detector evaluation.
  • Inverted features flip formal language meaning, turning subtle cues into misleading signals.
  • Probe datasets along latent feature directions show how tiny stylistic shifts can swing detection accuracy.
The University of Hong Kong
Abstract
Training student models on synthetic data generated by strong teacher models is a promising way to distill the capabilities of teachers. However, recent studies show that stronger models are not always optimal teachers, revealing a mismatch between teacher outputs and student learnability. To address this issue, we propose PerSyn (Personalized data Synthesis), a novel synthesis strategy that operates under a new “Route then Generate” paradigm to create data tailored to each student model, enabling it to learn more effectively. Specifically, PerSyn first assigns each prompt to its optimal teacher via a query-level router that jointly considers student learnability and teacher response quality. Each teacher then synthesizes data only for its assigned prompts, making the process more efficient than the conventional “Generate then Select” paradigm, where all teachers must generate parallel responses for the entire prompt set before constructing the final dataset. Extensive experiments across different model families and scales demonstrate that PerSyn consistently achieves superior or comparable performance to all baselines in instruct tuning and math reasoning settings. Further analysis verifies the effectiveness of PerSyn and offers extra insights to propel future research.
AI Insights
  • PerSyn’s gains rise with reward‑model strength; Skywork‑Reward‑V2‑Llama‑3.1‑8B beats the 3B version.
  • Even with a weak reward model, PerSyn outperforms the strong CAR baseline.
  • Baselines Strong, Mix, Family‑Strong, and CAR trail PerSyn in instruction‑tuning and math reasoning.
  • Teacher‑assignment tables show Strong and Family‑Strong favor large models; CAR picks a single average teacher.
  • The router jointly optimizes student learnability and teacher response quality—an innovation beyond the abstract.
  • Suggested papers: Letzelter et al. (2025), Zhang et al. (2025), Ponkshe et al. (2025), Li et al. (2025b), Xu et al. (2025b); Magpie‑Zoo (Xu et al., 2025b) is a key dataset.
Data Science Management
Beijing Institute of Technology
Abstract
A growing trend in modern data analysis is the integration of data management with learning, guided by accuracy, latency, and cost requirements. In practice, applications draw data of different formats from many sources, while objectives and budgets change over time. Existing systems handle these applications across databases, analysis libraries, and tuning services. Such fragmentation leads to complex user interaction, limited adaptability, suboptimal performance, and poor extensibility across components. To address these challenges, we present Aixel, a unified, adaptive, and extensible system for AI-powered data analysis. The system organizes work across four layers: application, task, model, and data. The task layer provides a declarative interface to capture user intent, which is parsed into an executable operator plan. An optimizer compiles and schedules this plan to meet specified goals in accuracy, latency, and cost. The task layer coordinates the execution of data and model operators, with built-in support for reuse and caching to improve efficiency. The model layer offers versioned storage for indexes, metadata, tensors, and model artifacts. It supports adaptive construction, task-aligned drift detection, and safe updates that reuse shared components. The data layer provides unified data management capabilities, including indexing, constraint-aware discovery, task-aligned selection, and comprehensive feature management. With these layers, Aixel delivers a user-friendly, adaptive, efficient, and extensible system.
Attribution
Abstract
Attributing authorship in the era of large language models (LLMs) is increasingly challenging as machine-generated prose rivals human writing. We benchmark two complementary attribution mechanisms, fixed Style Embeddings and an instruction-tuned LLM judge (GPT-4o), on the Human AI Parallel Corpus, an open dataset of 600 balanced instances spanning six domains (academic, news, fiction, blogs, spoken transcripts, and TV/movie scripts). Each instance contains a human prompt with both a gold continuation and an LLM-generated continuation from either GPT-4o or LLaMA-70B-Instruct. The Style Embedding baseline achieves stronger aggregate accuracy on GPT continuations (82% vs. 68%). The LLM judge is slightly better than the style embeddings on LLaMA continuations (85% vs. 81%), but the difference is not statistically significant. Crucially, the LLM judge significantly outperforms in fiction and academic prose, indicating semantic sensitivity, whereas embeddings dominate in spoken and scripted dialogue, reflecting structural strengths. These complementary patterns highlight attribution as a multidimensional problem requiring hybrid strategies. To support reproducibility, we provide code on GitHub and derived data on Hugging Face under the MIT license. This open framework provides a reproducible benchmark for attribution quality assessment in AI-generated content, along with a review of related literature influencing this work.
University College London
Abstract
Current training data attribution (TDA) methods treat the influence one sample has on another as static, but neural networks learn in distinct stages that exhibit changing patterns of influence. In this work, we introduce a framework for stagewise data attribution grounded in singular learning theory. We predict that influence can change non-monotonically, including sign flips and sharp peaks at developmental transitions. We first validate these predictions analytically and empirically in a toy model, showing that dynamic shifts in influence directly map to the model's progressive learning of a semantic hierarchy. Finally, we demonstrate these phenomena at scale in language models, where token-level influence changes align with known developmental stages.

Interests not found

We did not find any papers matching the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • Direction on Data Science Organizations