Papers from 29 to 03 October, 2025

Here are the personalized paper recommendations sorted by most relevant

AI for Product Management

Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain

Rate this image: 😍 👍 👎

Abstract
The practice of fine-tuning AI agents on data from their own interactions--such as web browsing or tool use--, while being a strong general recipe for improving agentic capabilities, also introduces a critical security vulnerability within the AI supply chain. In this work, we show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors that are triggerred by specific target phrases, such that when the agent encounters these triggers, it performs an unsafe or malicious action. We formalize and validate three realistic threat models targeting different layers of the supply chain: 1) direct poisoning of fine-tuning data, where an attacker controls a fraction of the training traces; 2) environmental poisoning, where malicious instructions are injected into webpages scraped or tools called while creating training data; and 3) supply chain poisoning, where a pre-backdoored base model is fine-tuned on clean data to improve its agentic capabilities. Our results are stark: by poisoning as few as 2% of the collected traces, an attacker can embed a backdoor causing an agent to leak confidential user information with over 80% success when a specific trigger is present. This vulnerability holds across all three threat models. Furthermore, we demonstrate that prominent safeguards, including two guardrail models and one weight-based defense, fail to detect or prevent the malicious behavior. These findings highlight an urgent threat to agentic AI development and underscore the critical need for rigorous security vetting of data collection processes and end-to-end model supply chains.

👍 👎 ♥ Save

Integrating AI and Ensemble Forecasting: Explainable Materials Planning with Scorecards and Trend Insights for a Large-Scale Manufacturer

Wayne State University

Abstract
This paper presents a practical architecture for after-sales demand forecasting and monitoring that unifies a revenue- and cluster-aware ensemble of statistical, machine-learning, and deep-learning models with a role-driven analytics layer for scorecards and trend diagnostics. The framework ingests exogenous signals (installed base, pricing, macro indicators, life cycle, seasonality) and treats COVID-19 as a distinct regime, producing country-part forecasts with calibrated intervals. A Pareto-aware segmentation forecasts high-revenue items individually and pools the long tail via clusters, while horizon-aware ensembling aligns weights with business-relevant losses (e.g., WMAPE). Beyond forecasts, a performance scorecard delivers decision-focused insights: accuracy within tolerance thresholds by revenue share and count, bias decomposition (over- vs under-forecast), geographic and product-family hotspots, and ranked root causes tied to high-impact part-country pairs. A trend module tracks trajectories of MAPE/WMAPE and bias across recent months, flags entities that are improving or deteriorating, detects change points aligned with known regimes, and attributes movements to lifecycle and seasonal factors. LLMs are embedded in the analytics layer to generate role-aware narratives and enforce reporting contracts. They standardize business definitions, automate quality checks and reconciliations, and translate quantitative results into concise, explainable summaries for planners and executives. The system exposes a reproducible workflow -- request specification, model execution, database-backed artifacts, and AI-generated narratives -- so planners can move from "How accurate are we now?" to "Where is accuracy heading and which levers should we pull?", closing the loop between forecasting, monitoring, and inventory decisions across more than 90 countries and about 6,000 parts.

Vision Setting for Tech Teams

👍 👎 ♥ Save

PAL-UI: Planning with Active Look-back for Vision-Based GUI Agents

Renmin University of Chin

Abstract
Graphical User Interface (GUI) agents powered by Multimodal Large Language Models (MLLMs) promise human-like interaction with software applications, yet long-horizon tasks remain challenging due to memory limitations. Existing approaches either truncate history or rely on simple textual summaries, which risk losing critical information when past visual details become necessary for future decisions. In this paper, we propose \textbf{PAL-UI} (\textbf{P}lanning with \textbf{A}ctive \textbf{L}ook-back), a novel framework that enables GUI agents to adaptively retrieve past observations when required. PAL-UI combines a dual-level summarization agent, capturing both observation-level cues and action-level outcomes, with a dedicated retrieval tool that allows the agent to recall specific historical screenshots during planning. We curate a step-level instruction dataset of 8.6K samples from mobile GUI navigation trajectories and train \textbf{PAL-UI-3B} and \textbf{PAL-UI-7B} models based on Qwen2.5-VL. Extensive experiments demonstrate that PAL-UI significantly outperforms baseline models and prior methods in mobile GUI navigation tasks, even under data-efficient settings. Moreover, PAL-UI exhibits strong cross-domain generalization, achieving notable improvements in web navigation without additional training. Our work highlights the potential of active memory retrieval for long-horizon planning capabilities of vision-based GUI agents.

👍 👎 ♥ Save

Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness

University of California

Abstract
Computer-Use Agents (CUAs) are an increasingly deployed class of agents that take actions on GUIs to accomplish user goals. In this paper, we show that CUAs consistently exhibit Blind Goal-Directedness (BGD): a bias to pursue goals regardless of feasibility, safety, reliability, or context. We characterize three prevalent patterns of BGD: (i) lack of contextual reasoning, (ii) assumptions and decisions under ambiguity, and (iii) contradictory or infeasible goals. We develop BLIND-ACT, a benchmark of 90 tasks capturing these three patterns. Built on OSWorld, BLIND-ACT provides realistic environments and employs LLM-based judges to evaluate agent behavior, achieving 93.75% agreement with human annotations. We use BLIND-ACT to evaluate nine frontier models, including Claude Sonnet and Opus 4, Computer-Use-Preview, and GPT-5, observing high average BGD rates (80.8%) across them. We show that BGD exposes subtle risks that arise even when inputs are not directly harmful. While prompting-based interventions lower BGD levels, substantial risk persists, highlighting the need for stronger training- or inference-time interventions. Qualitative analysis reveals observed failure modes: execution-first bias (focusing on how to act over whether to act), thought-action disconnect (execution diverging from reasoning), and request-primacy (justifying actions due to user request). Identifying BGD and introducing BLIND-ACT establishes a foundation for future research on studying and mitigating this fundamental risk and ensuring safe CUA deployment.

Product Management

👍 👎 ♥ Save

Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell

Czech Technical Universty

Abstract
Current products, especially in the automotive sector, pose complex technical systems having a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating impacts of end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modelling approach coming up from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) it incorporates repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model called PoPAN should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.

Interests not found

We did not find any papers that match the below interests. Try other terms also consider if the content exists in arxiv.org.

Product Strategy
Product Roadmap

You can edit or add more interests any time.

Help us improve your experience!

This project is on its early stages your feedback can be pivotal on the future of the project. Let us know what you think about this week's papers and suggestions!

Give Feedback

Unsubscribe from these updates