🎯 Top Personalized Recommendations
TurinTech AI
AI Summary - The system's ability to discover non-obvious optimizations through semantic mutations demonstrates the value of evolutionary approaches for natural language components. [3]
- The mixed results also highlight that automated optimization is not universally beneficial; practitioners should assess their agents' baseline quality and task characteristics before investing in optimization efforts. [3]
- Artemis is a framework that uses evolutionary techniques to optimize the performance of agents, particularly those with clear performance metrics and room for improvement. [3]
- Artemis is a practical framework for automated agent optimization. [2]
- Evolutionary prompt engineering: a method of optimizing the performance of agents by modifying their input prompts. [1]
Abstract
Agentic AI systems built on large language models (LLMs) offer significant potential for automating complex workflows, from software development to customer support. However, LLM agents often underperform due to suboptimal configurations: poorly tuned prompts, tool descriptions, and parameters that typically require weeks of manual refinement. Existing optimization methods are either too complex for general use or treat components in isolation, missing critical interdependencies.
We present ARTEMIS, a no-code evolutionary optimization platform that jointly optimizes agent configurations through semantically-aware genetic operators. Given only a benchmark script and natural language goals, ARTEMIS automatically discovers configurable components, extracts performance signals from execution logs, and evolves configurations without requiring architectural modifications.
We evaluate ARTEMIS on four representative agent systems: the ALE Agent for competitive programming on the AtCoder Heuristic Contest, achieving a 13.6% improvement in acceptance rate; the Mini-SWE Agent for code optimization on SWE-Perf, with a statistically significant 10.1% performance gain; and the CrewAI Agent for cost and mathematical reasoning on Math Odyssey, achieving a statistically significant 36.9% reduction in the number of tokens required for evaluation. We also evaluate the MathTales-Teacher Agent, powered by a smaller open-source model (Qwen2.5-7B), on GSM8K primary-level mathematics problems, achieving a 22% accuracy improvement and demonstrating that ARTEMIS can optimize agents based on both commercial and local models.
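To make the evolutionary approach concrete, here is a minimal sketch of a configuration-evolution loop in the spirit of ARTEMIS. The `llm_mutate` and `run_benchmark` helpers are hypothetical stand-ins for the platform's semantic mutation operator and benchmark harness, not its actual API.

```python
import random

def llm_mutate(prompt_text: str) -> str:
    """Hypothetical 'semantic mutation': in practice an LLM would rewrite
    the prompt while preserving intent; stubbed here with a trivial edit."""
    return prompt_text + " Think step by step."

def run_benchmark(config: dict) -> float:
    """Hypothetical benchmark harness returning a fitness score in [0, 1];
    ARTEMIS extracts such signals from execution logs."""
    return random.random()  # placeholder fitness

def evolve(seed: dict, generations: int = 5, pop_size: int = 8) -> dict:
    population = [dict(seed) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=run_benchmark, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        for parent in parents:
            child = dict(parent)
            # Jointly mutate a prompt and a numeric parameter, since the
            # point of joint optimization is capturing interdependencies.
            child["system_prompt"] = llm_mutate(child["system_prompt"])
            child["temperature"] = min(1.0, max(0.0,
                child["temperature"] + random.uniform(-0.1, 0.1)))
            children.append(child)
        population = parents + children  # elitism plus offspring
    return max(population, key=run_benchmark)

best = evolve({"system_prompt": "You are a coding agent.", "temperature": 0.7})
print(best["system_prompt"], best["temperature"])
```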
Why we think this paper is great for you:
This paper directly addresses the optimization of LLM-based agents, a key area for improving performance in automated workflows – a core interest for the user. Understanding how to tune these agents will be valuable for enhancing personalization platform capabilities.
Jheronimus Academy of Data Science
AI Summary - CORE business logic: The core functionality of the system, which is not part of the PORTS or ADAPTERS. [3]
- Ocean Guard is an extensible ML-Enabled System (MLES) that aims to analyze and detect anomalies across multiple types of data from the maritime domain. [2]
- The authors faced two major challenges during development: generality, related to defining PORTS that are specific and dependency-agnostic, and separation of concerns, related to defining ADAPTERS that are distinct and logic-thin. [1]
Abstract
ML-Enabled Systems (MLES) are inherently complex since they require multiple components to achieve their business goal. This experience report showcases the software architecture reusability techniques applied while building Ocean Guard, an MLES for anomaly detection in the maritime domain. In particular, it highlights the challenges and lessons learned in reusing the Ports and Adapters pattern to support building multiple microservices from a single codebase. This experience report hopes to inspire software engineers, machine learning engineers, and data scientists to apply the Hexagonal Architecture pattern to build their MLES.
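For readers unfamiliar with the pattern, the sketch below shows the Ports and Adapters shape in miniature: the core depends only on a port it owns, and a logic-thin adapter binds that port to one concrete data source. All names are invented for illustration and are not taken from Ocean Guard.

```python
from abc import ABC, abstractmethod

# PORT: a specific, dependency-agnostic interface owned by the core.
class VesselDataPort(ABC):
    @abstractmethod
    def fetch_positions(self, vessel_id: str) -> list[tuple[float, float]]:
        ...

# CORE business logic: depends only on the port, never on a concrete source.
class AnomalyDetector:
    def __init__(self, data: VesselDataPort):
        self.data = data

    def is_anomalous(self, vessel_id: str) -> bool:
        positions = self.data.fetch_positions(vessel_id)
        # Toy rule: flag any implausibly large jump between position reports.
        return any(abs(a[0] - b[0]) + abs(a[1] - b[1]) > 1.0
                   for a, b in zip(positions, positions[1:]))

# ADAPTER: a distinct, logic-thin binding of the port to one data source;
# swapping in a database or message-queue adapter leaves the core untouched.
class InMemoryVesselData(VesselDataPort):
    def __init__(self, records: dict[str, list[tuple[float, float]]]):
        self.records = records

    def fetch_positions(self, vessel_id: str) -> list[tuple[float, float]]:
        return self.records.get(vessel_id, [])

detector = AnomalyDetector(InMemoryVesselData({"v1": [(0.0, 0.0), (5.0, 5.0)]}))
print(detector.is_anomalous("v1"))  # True: implausible jump between reports
```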
Why we think this paper is great for you:
This paper explores MLOps, a critical area for deploying and managing ML models within a broader system – a direct match to the user's interest in MLOps.
University of North Carolina
AI Summary - Phishing email detection models are vulnerable to adversarial attacks and require robustness against such threats. [3]
- Large language models (LLMs) such as GPT-4o, Claude Sonnet 4, and Grok-3 can be used for phishing email detection. [3]
- Zero-shot learning: A type of machine learning where a model performs a task without being given any task-specific training examples. [3]
- Few-shot learning: A type of machine learning where a model is given only a small number of examples of a task yet still performs well on new inputs. [3]
- Zero-shot and few-shot learning strategies can improve multilingual performance in LLMs. [2]
Abstract
Email phishing is one of the most prevalent and globally consequential vectors of cyber intrusion. As systems increasingly deploy Large Language Model (LLM) applications, they face evolving phishing email threats that exploit their fundamental architectures. Current LLMs require substantial hardening before deployment in email security systems, particularly against coordinated multi-vector attacks that exploit architectural vulnerabilities. This paper proposes LLMPEA, an LLM-based framework to detect phishing email attacks across multiple attack vectors, including prompt injection, text refinement, and multilingual attacks. We evaluate three frontier LLMs (GPT-4o, Claude Sonnet 4, and Grok-3) and a comprehensive prompting design to assess their feasibility, robustness, and limitations against phishing email attacks. Our empirical analysis reveals that LLMs can detect phishing emails with over 90% accuracy, while we also highlight that LLM-based phishing email detection systems can be exploited by adversarial attacks, prompt injection, and multilingual attacks. Our findings provide critical insights for LLM-based phishing detection in real-world settings where attackers exploit multiple vulnerabilities in combination.
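To make the zero-shot versus few-shot distinction concrete, here is a minimal prompt-construction sketch; the example emails and the `call_llm` stub are illustrative assumptions, not LLMPEA's actual prompts or framework.

```python
FEW_SHOT_EXAMPLES = [
    ("Your account is locked! Click http://bit.ly/x to verify now.", "phishing"),
    ("Minutes from Tuesday's design review are attached.", "legitimate"),
]

def build_prompt(email_text: str, few_shot: bool = False) -> str:
    prompt = "Classify the following email as 'phishing' or 'legitimate'.\n"
    if few_shot:
        # Few-shot: prepend a handful of labeled examples to guide the model.
        for text, label in FEW_SHOT_EXAMPLES:
            prompt += f"Email: {text}\nLabel: {label}\n"
    # Zero-shot: no examples; the model relies on pretrained knowledge alone.
    prompt += f"Email: {email_text}\nLabel:"
    return prompt

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion client; returns the model's label."""
    raise NotImplementedError("wire up an LLM client here")

print(build_prompt("Urgent: confirm your password at http://example.com.",
                   few_shot=True))
```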
Why we think this paper is great for you:
Given the user's interest in email marketing and personalization, this paper's focus on using LLMs to combat phishing aligns with the need to protect user data and improve personalization platform security.
Changan University
AI Summary - User-generated content (UGC) quality is an endogenous outcome driven by user heterogeneity and strategic advertising decisions. [3]
- The model shows that platforms can compete by adjusting advertising intensity, but doing so also alters the perceived content quality, influencing user distribution and platform profitability. [2]
Abstract
This paper develops a theoretical model of platform competition where user-generated content (UGC) quality arises endogenously from the composition of the user base. Users differ in their relative preferences for content quality and network size, and platforms compete by choosing advertising intensity, which affects user utility through perceived quality. We characterize equilibrium platform choice, identifying conditions under which equilibria are stable. The model captures how platforms' strategic decisions shape user allocation and market outcomes, including coexistence and dominance scenarios. We consider two types of equilibria in advertising levels: Nash equilibria and Stackelberg equilibria, and discuss the industry and policy implications of our results.
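The equilibrium concepts are easy to illustrate numerically. The sketch below finds a symmetric Nash equilibrium of a toy two-platform advertising game by best-response iteration; the profit function, in which heavier advertising lowers perceived quality and hence user share, is our own stand-in and not the paper's model.

```python
import numpy as np

GRID = np.linspace(0.01, 0.99, 99)  # candidate advertising intensities

def profit(a_own: float, a_rival: float) -> float:
    """Toy profit: ad revenue grows with intensity times user share, but
    ads lower perceived quality (q = 1 - a) and therefore user share."""
    q_own, q_rival = 1 - a_own, 1 - a_rival
    share = q_own / (q_own + q_rival)
    return a_own * share

def best_response(a_rival: float) -> float:
    return GRID[int(np.argmax([profit(a, a_rival) for a in GRID]))]

# Best-response iteration converges to the symmetric Nash equilibrium here;
# a Stackelberg leader would instead optimize while anticipating the
# follower's best response.
a1 = a2 = 0.5
for _ in range(50):
    a1, a2 = best_response(a2), best_response(a1)
print(f"Nash advertising intensities: a1 = {a1:.2f}, a2 = {a2:.2f}")
```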
Why we think this paper is great for you:
The paper's exploration of platform competition and user-generated content is relevant to understanding how personalization platforms operate and compete in the market.
Karlsruhe Institute of Technology
AI Summary - Artificial Intelligence (AI)-based knowledge extraction: The use of AI algorithms to extract relevant information from existing designs or databases, enabling automatic generation of new design proposals. [3]
- The integration of AI and DFM has the potential to revolutionize product development by improving efficiency and reducing costs. [3]
- The paper discusses the integration of artificial intelligence (AI) and design for manufacturing (DFM) to improve product development efficiency and reduce production costs. [2]
Abstract
The growing adoption of Industrial Internet of Things (IIoT) technologies enables automated, real-time collection of manufacturing process data, unlocking new opportunities for data-driven product development. Current data-driven methods are generally applied within specific domains, such as design or manufacturing, with limited exploration of integrating design features and manufacturing process data. Since design decisions significantly affect manufacturing outcomes, such as error rates, energy consumption, and processing times, the lack of such integration restricts the potential for data-driven product design improvements. This paper presents a data-driven approach to mapping and analyzing the relationship between design features and manufacturing process data. A comprehensive system architecture is developed to ensure continuous data collection and integration. The linkage between design features and manufacturing process data serves as the basis for developing a machine learning model that enables automated design improvement suggestions. By integrating manufacturing process data with sustainability metrics, this approach opens new possibilities for sustainable product development.
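A compressed example of the design-to-manufacturing mapping the paper describes: fit a model from design features to a manufacturing outcome, then query it to compare candidate designs. The feature names and data are invented for illustration; the paper's pipeline draws on real IIoT process data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Invented design features: [wall_thickness_mm, hole_count, hardness_hrc]
X = rng.uniform([1.0, 0.0, 20.0], [10.0, 12.0, 60.0], size=(500, 3))
# Synthetic outcome: error rate rises with thin walls and many holes.
y = 0.05 / X[:, 0] + 0.01 * X[:, 1] + rng.normal(0.0, 0.01, 500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Query the mapping: which candidate design predicts a lower error rate?
candidates = np.array([[2.0, 8.0, 40.0], [6.0, 3.0, 40.0]])
print(model.predict(candidates))  # thicker, simpler design should score lower
```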
Why we think this paper is great for you:
This paper's focus on data-driven product development and leveraging IIoT data aligns with the user's interest in data-driven CRM optimization and sustainable product development.
Maastricht University
AI Summary - The paper proposes a set of principles for fair benchmarking of quantum optimization algorithms, emphasizing end-to-end workflows, transparency in tuning and reporting, problem diversity, and avoidance of speculative claims. [3]
- The authors introduce the concept of algorithm class awareness, which ensures comparisons are restricted to families of algorithms with similar objectives and trade-offs. [3]
- A particular challenge arises with hybrid quantum-classical systems, where benchmarking must include the full workflow: number of circuit evaluations, efficiency of classical routines, and quantum-classical communication overhead. [3]
- Algorithm class awareness: Ensures comparisons are restricted to families of algorithms with similar objectives and trade-offs. [3]
- End-to-end workflows: Evaluates performance across the entire workflow, including problem encoding, quantum execution, and classical post-processing. [3]
- Fair benchmarking is not just a technical necessity but vital for building trust between quantum researchers, industry stakeholders, and the public. [3]
- Robust benchmarking will guide practitioners in deciding when and for which problems quantum acceleration truly matters as quantum computing transitions from laboratory experiments to practical applications. [3]
- A framework for practical benchmarking protocols is outlined, incorporating multiple evaluation criteria such as solution quality relative to baselines, workflow-level timing, robustness across instances, and energy consumption. [2]
Abstract
Quantum optimisation is emerging as a promising approach alongside classical heuristics and specialised hardware, yet its performance is often difficult to assess fairly. Traditional benchmarking methods, rooted in digital complexity theory, do not directly capture the continuous dynamics, probabilistic outcomes, and workflow overheads of quantum and hybrid systems. This paper proposes principles and protocols for fair benchmarking of quantum optimisation, emphasising end-to-end workflows, transparency in tuning and reporting, problem diversity, and avoidance of speculative claims. By extending lessons from classical benchmarking and incorporating application-driven and energy-aware metrics, we outline a framework that enables practitioners to evaluate quantum methods responsibly, ensuring reproducibility, comparability, and trust in reported results.
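Several of these principles (baseline-relative solution quality, workflow-level timing, robustness across instances) can be sketched as a small harness; energy metrics are omitted here. The solvers below are classical stand-ins, and a quantum workflow would be timed the same way, end to end.

```python
import random
import statistics
import time

def random_baseline(weights):
    """Baseline: a uniformly random cut of a weighted MaxCut instance."""
    return [random.randint(0, 1) for _ in weights]

def greedy_heuristic(weights):
    """Stand-in 'advanced' solver; a quantum workflow (encoding, circuit
    evaluations, post-processing) would be timed the same way."""
    n = len(weights)
    side = [0] * n
    for i in range(n):
        gain = sum(weights[i][j] for j in range(n) if j != i and side[j] == 0)
        loss = sum(weights[i][j] for j in range(n) if j != i and side[j] == 1)
        side[i] = 1 if gain >= loss else 0
    return side

def cut_value(weights, side):
    n = len(weights)
    return sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
               if side[i] != side[j])

def benchmark(solver, instances):
    """Protocol: wall-clock the whole solver call, score quality relative
    to a random baseline, and report mean and spread across instances."""
    ratios, times = [], []
    for w in instances:
        start = time.perf_counter()
        side = solver(w)
        times.append(time.perf_counter() - start)
        base = cut_value(w, random_baseline(w)) or 1.0
        ratios.append(cut_value(w, side) / base)
    return statistics.mean(ratios), statistics.stdev(ratios), sum(times)

random.seed(0)
instances = [[[random.random() for _ in range(12)] for _ in range(12)]
             for _ in range(10)]
print(benchmark(greedy_heuristic, instances))
```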
Why we think this paper is great for you:
The paper's investigation into fair benchmarking of optimization methods is relevant to the user's interest in improving the performance of data-driven systems and personalization platforms.