🎯 Top Personalized Recommendations
TurinTech AI
AI Summary - The system's ability to discover non-obvious optimizations through semantic mutations demonstrates the value of evolutionary approaches for natural language components. [3]
- The mixed results also highlight that automated optimization is not universally beneficial; practitioners should assess their agents' baseline quality and task characteristics before investing in optimization efforts. [3]
- Artemis is a framework that uses evolutionary techniques to optimize the performance of agents, particularly those with clear performance metrics and room for improvement. [3]
- Artemis is a practical framework for automated agent optimization. [2]
- Evolutionary prompt engineering: a method of optimizing the performance of agents by modifying their input prompts. [1]
Abstract
Agentic AI systems built on large language models (LLMs) offer significant potential for automating complex workflows, from software development to customer support. However, LLM agents often underperform due to suboptimal configurations: poorly tuned prompts, tool descriptions, and parameters that typically require weeks of manual refinement. Existing optimization methods are either too complex for general use or treat components in isolation, missing critical interdependencies.
We present ARTEMIS, a no-code evolutionary optimization platform that jointly optimizes agent configurations through semantically-aware genetic operators. Given only a benchmark script and natural language goals, ARTEMIS automatically discovers configurable components, extracts performance signals from execution logs, and evolves configurations without requiring architectural modifications.
We evaluate ARTEMIS on four representative agent systems: the ALE Agent for competitive programming on the AtCoder Heuristic Contest, achieving a 13.6% improvement in acceptance rate; the Mini-SWE Agent for code optimization on SWE-Perf, with a statistically significant 10.1% performance gain; and the CrewAI Agent for cost and mathematical reasoning on Math Odyssey, achieving a statistically significant 36.9% reduction in the number of tokens required for evaluation. We also evaluate the MathTales-Teacher Agent, powered by a smaller open-source model (Qwen2.5-7B), on GSM8K primary-level mathematics problems, achieving a 22% accuracy improvement and demonstrating that ARTEMIS can optimize agents based on both commercial and local models.
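To make the evolutionary approach concrete, here is a minimal sketch of a configuration-evolution loop in the spirit of ARTEMIS. The `llm_mutate` and `run_benchmark` helpers are hypothetical stand-ins for the platform's semantic mutation operator and benchmark harness, not its actual API.

```python
import random

def llm_mutate(prompt_text: str) -> str:
    """Hypothetical 'semantic mutation': in practice an LLM would rewrite
    the prompt while preserving intent; stubbed here with a trivial edit."""
    return prompt_text + " Think step by step."

def run_benchmark(config: dict) -> float:
    """Hypothetical benchmark harness returning a fitness score in [0, 1];
    ARTEMIS extracts such signals from execution logs."""
    return random.random()  # placeholder fitness

def evolve(seed: dict, generations: int = 5, pop_size: int = 8) -> dict:
    population = [dict(seed) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=run_benchmark, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = []
        for parent in parents:
            child = dict(parent)
            # Jointly mutate a prompt and a numeric parameter, since the
            # point of joint optimization is capturing interdependencies.
            child["system_prompt"] = llm_mutate(child["system_prompt"])
            child["temperature"] = min(1.0, max(0.0,
                child["temperature"] + random.uniform(-0.1, 0.1)))
            children.append(child)
        population = parents + children  # elitism plus offspring
    return max(population, key=run_benchmark)

best = evolve({"system_prompt": "You are a coding agent.", "temperature": 0.7})
print(best["system_prompt"], best["temperature"])
```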
Why we think this paper is great for you:
This paper directly addresses the optimization of LLM-based agents, a key area for improving performance in automated workflows – a core interest for the user. Understanding how to tune these agents will be valuable for enhancing personalization platform capabilities.
Jheronimus Academy of Data Science
AI Summary - CORE business logic: The core functionality of the system, which is not part of the PORTS or ADAPTERS. [3]
- Ocean Guard is an extensible ML-Enabled System (MLES) that aims to analyze and detect anomalies across multiple types of data from the maritime domain. [2]
- The authors faced two major challenges during development: generality, related to defining PORTS that are specific and dependency-agnostic, and separation of concerns, related to defining ADAPTERS that are distinct and logic-thin. [1]
Abstract
ML-Enabled Systems (MLES) are inherently complex since they require multiple components to achieve their business goal. This experience report showcases the software architecture reusability techniques applied while building Ocean Guard, an MLES for anomaly detection in the maritime domain. In particular, it highlights the challenges and lessons learned in reusing the Ports and Adapters pattern to support building multiple microservices from a single codebase. This experience report hopes to inspire software engineers, machine learning engineers, and data scientists to apply the Hexagonal Architecture pattern to build their MLES.
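For readers unfamiliar with the pattern, the sketch below shows the Ports and Adapters shape in miniature: the core depends only on a port it owns, and a logic-thin adapter binds that port to one concrete data source. All names are invented for illustration and are not taken from Ocean Guard.

```python
from abc import ABC, abstractmethod

# PORT: a specific, dependency-agnostic interface owned by the core.
class VesselDataPort(ABC):
    @abstractmethod
    def fetch_positions(self, vessel_id: str) -> list[tuple[float, float]]:
        ...

# CORE business logic: depends only on the port, never on a concrete source.
class AnomalyDetector:
    def __init__(self, data: VesselDataPort):
        self.data = data

    def is_anomalous(self, vessel_id: str) -> bool:
        positions = self.data.fetch_positions(vessel_id)
        # Toy rule: flag any implausibly large jump between position reports.
        return any(abs(a[0] - b[0]) + abs(a[1] - b[1]) > 1.0
                   for a, b in zip(positions, positions[1:]))

# ADAPTER: a distinct, logic-thin binding of the port to one data source;
# swapping in a database or message-queue adapter leaves the core untouched.
class InMemoryVesselData(VesselDataPort):
    def __init__(self, records: dict[str, list[tuple[float, float]]]):
        self.records = records

    def fetch_positions(self, vessel_id: str) -> list[tuple[float, float]]:
        return self.records.get(vessel_id, [])

detector = AnomalyDetector(InMemoryVesselData({"v1": [(0.0, 0.0), (5.0, 5.0)]}))
print(detector.is_anomalous("v1"))  # True: implausible jump between reports
```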
Why we think this paper is great for you:
This paper explores MLOps, a critical area for deploying and managing ML models within a broader system – a direct match to the user's interest in MLOps.
University of North Carolina
AI Summary - Phishing email detection models are vulnerable to adversarial attacks and require robustness against such threats. [3]
- Large language models (LLMs) such as GPT-4o, Claude Sonnet 4, and Grok-3 can be used for phishing email detection. [3]
- Zero-shot learning: A type of machine learning where a model performs a task without being given any task-specific training examples. [3]
- Few-shot learning: A type of machine learning where a model is given only a small number of examples of a task yet still performs well on new inputs. [3]
- Zero-shot and few-shot learning strategies can improve multilingual performance in LLMs. [2]
Abstract
Email phishing is one of the most prevalent and globally consequential vectors of cyber intrusion. As systems increasingly deploy Large Language Model (LLM) applications, they face evolving phishing email threats that exploit their fundamental architectures. Current LLMs require substantial hardening before deployment in email security systems, particularly against coordinated multi-vector attacks that exploit architectural vulnerabilities. This paper proposes LLMPEA, an LLM-based framework to detect phishing email attacks across multiple attack vectors, including prompt injection, text refinement, and multilingual attacks. We evaluate three frontier LLMs (GPT-4o, Claude Sonnet 4, and Grok-3) and a comprehensive prompting design to assess their feasibility, robustness, and limitations against phishing email attacks. Our empirical analysis reveals that LLMs can detect phishing emails with over 90% accuracy, while we also highlight that LLM-based phishing email detection systems can be exploited by adversarial attacks, prompt injection, and multilingual attacks. Our findings provide critical insights for LLM-based phishing detection in real-world settings where attackers exploit multiple vulnerabilities in combination.
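To make the zero-shot versus few-shot distinction concrete, here is a minimal prompt-construction sketch; the example emails and the `call_llm` stub are illustrative assumptions, not LLMPEA's actual prompts or framework.

```python
FEW_SHOT_EXAMPLES = [
    ("Your account is locked! Click http://bit.ly/x to verify now.", "phishing"),
    ("Minutes from Tuesday's design review are attached.", "legitimate"),
]

def build_prompt(email_text: str, few_shot: bool = False) -> str:
    prompt = "Classify the following email as 'phishing' or 'legitimate'.\n"
    if few_shot:
        # Few-shot: prepend a handful of labeled examples to guide the model.
        for text, label in FEW_SHOT_EXAMPLES:
            prompt += f"Email: {text}\nLabel: {label}\n"
    # Zero-shot: no examples; the model relies on pretrained knowledge alone.
    prompt += f"Email: {email_text}\nLabel:"
    return prompt

def call_llm(prompt: str) -> str:
    """Stub for any chat-completion client; returns the model's label."""
    raise NotImplementedError("wire up an LLM client here")

print(build_prompt("Urgent: confirm your password at http://example.com.",
                   few_shot=True))
```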
Why we think this paper is great for you:
Given the user's interest in email marketing and personalization, this paper's focus on using LLMs to combat phishing aligns with the need to protect user data and improve personalization platform security.
Changan University
AI Summary - User-generated content (UGC) quality is an endogenous outcome driven by user heterogeneity and strategic advertising decisions. [3]
- The model shows that platforms can compete by adjusting advertising intensity, but doing so also alters the perceived content quality, influencing user distribution and platform profitability. [2]
Abstract
This paper develops a theoretical model of platform competition where user-generated content (UGC) quality arises endogenously from the composition of the user base. Users differ in their relative preferences for content quality and network size, and platforms compete by choosing advertising intensity, which affects user utility through perceived quality. We characterize equilibrium platform choice, identifying conditions under which equilibria are stable. The model captures how platforms' strategic decisions shape user allocation and market outcomes, including coexistence and dominance scenarios. We consider two types of equilibria in advertising levels: Nash equilibria and Stackelberg equilibria, and discuss the industry and policy implications of our results.
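The equilibrium concepts are easy to illustrate numerically. The sketch below finds a symmetric Nash equilibrium of a toy two-platform advertising game by best-response iteration; the profit function, in which heavier advertising lowers perceived quality and hence user share, is our own stand-in and not the paper's model.

```python
import numpy as np

GRID = np.linspace(0.01, 0.99, 99)  # candidate advertising intensities

def profit(a_own: float, a_rival: float) -> float:
    """Toy profit: ad revenue grows with intensity times user share, but
    ads lower perceived quality (q = 1 - a) and therefore user share."""
    q_own, q_rival = 1 - a_own, 1 - a_rival
    share = q_own / (q_own + q_rival)
    return a_own * share

def best_response(a_rival: float) -> float:
    return GRID[int(np.argmax([profit(a, a_rival) for a in GRID]))]

# Best-response iteration converges to the symmetric Nash equilibrium here;
# a Stackelberg leader would instead optimize while anticipating the
# follower's best response.
a1 = a2 = 0.5
for _ in range(50):
    a1, a2 = best_response(a2), best_response(a1)
print(f"Nash advertising intensities: a1 = {a1:.2f}, a2 = {a2:.2f}")
```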
Why we think this paper is great for you:
The paper's exploration of platform competition and user-generated content is relevant to understanding how personalization platforms operate and compete in the market.
Karlsruhe Institute of Technology
AI Summary - Artificial Intelligence (AI)-based knowledge extraction: The use of AI algorithms to extract relevant information from existing designs or databases, enabling automatic generation of new design proposals. [3]
- The integration of AI and DFM has the potential to revolutionize product development by improving efficiency and reducing costs. [3]
- The paper discusses the integration of artificial intelligence (AI) and design for manufacturing (DFM) to improve product development efficiency and reduce production costs. [2]
Abstract
The growing adoption of Industrial Internet of Things (IIoT) technologies enables automated, real-time collection of manufacturing process data, unlocking new opportunities for data-driven product development. Current data-driven methods are generally applied within specific domains, such as design or manufacturing, with limited exploration of integrating design features and manufacturing process data. Since design decisions significantly affect manufacturing outcomes, such as error rates, energy consumption, and processing times, the lack of such integration restricts the potential for data-driven product design improvements. This paper presents a data-driven approach to mapping and analyzing the relationship between design features and manufacturing process data. A comprehensive system architecture is developed to ensure continuous data collection and integration. The linkage between design features and manufacturing process data serves as the basis for developing a machine learning model that enables automated design improvement suggestions. By integrating manufacturing process data with sustainability metrics, this approach opens new possibilities for sustainable product development.
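A compressed example of the design-to-manufacturing mapping the paper describes: fit a model from design features to a manufacturing outcome, then query it to compare candidate designs. The feature names and data are invented for illustration; the paper's pipeline draws on real IIoT process data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Invented design features: [wall_thickness_mm, hole_count, hardness_hrc]
X = rng.uniform([1.0, 0.0, 20.0], [10.0, 12.0, 60.0], size=(500, 3))
# Synthetic outcome: error rate rises with thin walls and many holes.
y = 0.05 / X[:, 0] + 0.01 * X[:, 1] + rng.normal(0.0, 0.01, 500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Query the mapping: which candidate design predicts a lower error rate?
candidates = np.array([[2.0, 8.0, 40.0], [6.0, 3.0, 40.0]])
print(model.predict(candidates))  # thicker, simpler design should score lower
```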
Why we think this paper is great for you:
This paper's focus on data-driven product development and leveraging IIoT data aligns with the user's interest in data-driven CRM optimization and sustainable product development.
Maastricht University
AI Summary - The paper proposes a set of principles for fair benchmarking of quantum optimization algorithms, emphasizing end-to-end workflows, transparency in tuning and reporting, problem diversity, and avoidance of speculative claims. [3]
- The authors introduce the concept of algorithm class awareness, which ensures comparisons are restricted to families of algorithms with similar objectives and trade-offs. [3]
- A particular challenge arises with hybrid quantum-classical systems, where benchmarking must include the full workflow: number of circuit evaluations, efficiency of classical routines, and quantum-classical communication overhead. [3]
- Algorithm class awareness: Ensures comparisons are restricted to families of algorithms with similar objectives and trade-offs. [3]
- End-to-end workflows: Evaluates performance across the entire workflow, including problem encoding, quantum execution, and classical post-processing. [3]
- Fair benchmarking is not just a technical necessity but vital for building trust between quantum researchers, industry stakeholders, and the public. [3]
- Robust benchmarking will guide practitioners in deciding when and for which problems quantum acceleration truly matters as quantum computing transitions from laboratory experiments to practical applications. [3]
- A framework for practical benchmarking protocols is outlined, incorporating multiple evaluation criteria such as solution quality relative to baselines, workflow-level timing, robustness across instances, and energy consumption. [2]
Abstract
Quantum optimisation is emerging as a promising approach alongside classical heuristics and specialised hardware, yet its performance is often difficult to assess fairly. Traditional benchmarking methods, rooted in digital complexity theory, do not directly capture the continuous dynamics, probabilistic outcomes, and workflow overheads of quantum and hybrid systems. This paper proposes principles and protocols for fair benchmarking of quantum optimisation, emphasising end-to-end workflows, transparency in tuning and reporting, problem diversity, and avoidance of speculative claims. By extending lessons from classical benchmarking and incorporating application-driven and energy-aware metrics, we outline a framework that enables practitioners to evaluate quantum methods responsibly, ensuring reproducibility, comparability, and trust in reported results.
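Several of these principles (baseline-relative solution quality, workflow-level timing, robustness across instances) can be sketched as a small harness; energy metrics are omitted here. The solvers below are classical stand-ins, and a quantum workflow would be timed the same way, end to end.

```python
import random
import statistics
import time

def random_baseline(weights):
    """Baseline: a uniformly random cut of a weighted MaxCut instance."""
    return [random.randint(0, 1) for _ in weights]

def greedy_heuristic(weights):
    """Stand-in 'advanced' solver; a quantum workflow (encoding, circuit
    evaluations, post-processing) would be timed the same way."""
    n = len(weights)
    side = [0] * n
    for i in range(n):
        gain = sum(weights[i][j] for j in range(n) if j != i and side[j] == 0)
        loss = sum(weights[i][j] for j in range(n) if j != i and side[j] == 1)
        side[i] = 1 if gain >= loss else 0
    return side

def cut_value(weights, side):
    n = len(weights)
    return sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
               if side[i] != side[j])

def benchmark(solver, instances):
    """Protocol: wall-clock the whole solver call, score quality relative
    to a random baseline, and report mean and spread across instances."""
    ratios, times = [], []
    for w in instances:
        start = time.perf_counter()
        side = solver(w)
        times.append(time.perf_counter() - start)
        base = cut_value(w, random_baseline(w)) or 1.0
        ratios.append(cut_value(w, side) / base)
    return statistics.mean(ratios), statistics.stdev(ratios), sum(times)

random.seed(0)
instances = [[[random.random() for _ in range(12)] for _ in range(12)]
             for _ in range(10)]
print(benchmark(greedy_heuristic, instances))
```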
Why we think this paper is great for you:
The paper's investigation into fair benchmarking of optimization methods is relevant to the user's interest in improving the performance of data-driven systems and personalization platforms.