Harbin Institute of Technology
Abstract
Large language models (LLMs) have demonstrated strong capabilities in
language understanding and reasoning, yet they remain limited when tackling
real-world tasks that require up-to-date knowledge, precise operations, or
specialized tool use. To address this, we propose Tool-R1, a reinforcement
learning framework that enables LLMs to perform general, compositional, and
multi-step tool use by generating executable Python code. Tool-R1 supports
integration of user-defined tools and standard libraries, with variable sharing
across steps to construct coherent workflows. An outcome-based reward function,
combining LLM-based answer judgment and code execution success, guides policy
optimization. To improve training efficiency, we maintain a dynamic sample
queue to cache and reuse high-quality trajectories, reducing the overhead of
costly online sampling. Experiments on the GAIA benchmark show that Tool-R1
substantially improves both accuracy and robustness, achieving about a 10% gain
over strong baselines, with larger improvements on complex multi-step tasks.
These results highlight the potential of Tool-R1 for enabling reliable and
efficient tool-augmented reasoning in real-world applications. Our code will be
available at https://github.com/YBYBZhang/Tool-R1.
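
The abstract names two concrete mechanisms: an outcome-based reward that blends an LLM answer judgment with code execution success, and a dynamic queue that caches high-reward trajectories for reuse. Below is a minimal sketch of both, assuming a weighting `ALPHA`, a caller-supplied `judge` callable, and a lowest-reward-first eviction policy; these are illustrative assumptions, not the paper's actual implementation.

```python
import heapq
import itertools

# Hypothetical weighting between answer correctness and execution success;
# the paper combines both signals, but this exact formula is an assumption.
ALPHA = 0.8

def execution_reward(code: str) -> float:
    """Return 1.0 if the generated Python code runs without raising, else 0.0."""
    try:
        exec(code, {})  # isolated namespace; a real system would sandbox this
        return 1.0
    except Exception:
        return 0.0

def outcome_reward(code: str, answer: str, reference: str, judge) -> float:
    """Blend an LLM-based answer judgment with code execution success."""
    correctness = judge(answer, reference)  # assumed to return a score in [0, 1]
    return ALPHA * correctness + (1 - ALPHA) * execution_reward(code)

class SampleQueue:
    """Dynamic queue caching high-reward trajectories for reuse,
    evicting the lowest-reward entries first (an assumed policy)."""
    def __init__(self, capacity: int = 512):
        self.capacity = capacity
        self._heap = []                    # min-heap keyed on reward
        self._counter = itertools.count()  # tie-breaker for equal rewards

    def push(self, reward: float, trajectory: dict) -> None:
        heapq.heappush(self._heap, (reward, next(self._counter), trajectory))
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)      # drop the weakest trajectory

    def sample_best(self, k: int) -> list:
        return [t for _, _, t in heapq.nlargest(k, self._heap)]
```

Caching trajectories this way trades memory for fewer expensive online rollouts, which matches the training-efficiency motivation given in the abstract.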
Minerva CQ, California, USA
Abstract
Despite advances in AI for contact centers, customer experience (CX)
continues to suffer from high average handling time (AHT), low first-call
resolution, and poor customer satisfaction (CSAT). A key driver is the
cognitive load on agents, who must navigate fragmented systems, troubleshoot
manually, and frequently place customers on hold. Existing AI-powered
agent-assist tools are often reactive, driven by static rules, simple prompting,
or retrieval-augmented generation (RAG), without deeper contextual reasoning. We
introduce Agentic AI: goal-driven, autonomous, tool-using systems that
proactively support agents in real time. Unlike conventional approaches,
Agentic AI identifies customer intent, triggers modular workflows, maintains
evolving context, and adapts dynamically to conversation state. This paper
presents a case study of Minerva CQ, a real-time Agent Assist product deployed
in voice-based customer support. Minerva CQ integrates real-time transcription,
intent and sentiment detection, entity recognition, contextual retrieval,
dynamic customer profiling, and partial conversational summaries, enabling
proactive workflows and continuous context-building. In live production,
Minerva CQ acts as an AI co-pilot, delivering measurable improvements in agent
efficiency and customer experience across multiple deployments.
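
As a rough sketch of the per-utterance loop the abstract describes, transcription feeds intent, sentiment, and entity extraction, which in turn condition selective retrieval and a rolling summary. Every name here (`assist_turn`, `nlu`, `retriever`, `summarizer`) is a hypothetical stand-in, not Minerva CQ's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class CallContext:
    """Evolving state for one live call: profile, entities, rolling summary."""
    profile: dict = field(default_factory=dict)
    entities: dict = field(default_factory=dict)
    summary: str = ""

def assist_turn(utterance: str, ctx: CallContext, nlu, retriever, summarizer):
    """Process one transcribed customer utterance and return agent guidance."""
    intent = nlu.detect_intent(utterance)        # e.g. "billing_dispute"
    sentiment = nlu.detect_sentiment(utterance)  # e.g. "frustrated"
    ctx.entities.update(nlu.extract_entities(utterance))

    # Selective retrieval: query with intent + entities rather than raw text,
    # so the agent sees a few relevant documents instead of a RAG dump.
    docs = retriever.search(intent=intent, entities=ctx.entities, top_k=3)

    # Partial summarization keeps a rolling context across the whole call.
    ctx.summary = summarizer.update(ctx.summary, utterance)

    return {"intent": intent, "sentiment": sentiment,
            "suggested_docs": docs, "context": ctx.summary}
```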
AI Insights
- Minerva CQ’s agentic AI auto‑triggers tools mid‑conversation, turning calls into dynamic workflows.
- Real‑time summarization keeps a rolling context, letting agents skip manual KB lookups.
- Intent, sentiment, and entities are first‑class, cutting cognitive load and AHT by up to 30%.
- Selective retrieval pulls only the most relevant docs, preventing the overload that plagues RAG assistants.
- Deployments show a 15% lift in first‑call resolution, evidence that proactive co‑piloting beats reactive rules.
- The study spotlights a shift to “context‑aware co‑piloting,” where AI shapes outcomes in real time.
- For deeper dives, read “Generative AI at scale: The productivity effects of real‑world agent assist” and EMNLP 2023 Industry Track.