🎯 Top Personalized Recommendations
Meta
AI Summary - SWE-Bench: a comprehensive benchmark to evaluate autonomous code-writing and code-fixing agents on realistic tasks. [3]
- The combination of monorepo development and LLM-based tools like ECO underscores a trend toward holistic scale: treating an entire organization’s code as a single evolvable system, with AI agents providing the intelligence to manage global changes, dependency analysis, and performance tuning in ways humans alone could not easily scale. [2]
- Large-scale software engineering has driven interest in AI assistance for code discovery, understanding, and consistent changes at scale. [1]
Abstract
Real-world AI software engineering demands coding agents that can reason over massive repositories, maintain durable memory across and within long sessions, and robustly coordinate complex toolchains at test time. Existing open-source coding agents provide transparency but frequently fall short when pushed to these industrial-scale workloads, while proprietary coding agents offer strong practical performance but limited extensibility, interpretability, and controllability. We present the Confucius Code Agent (CCA), an open-sourced AI software engineer that can operate at an industrial scale. CCA is built atop the Confucius SDK, an open-sourced agent development platform designed around three complementary perspectives: Agent Experience (AX), User Experience (UX), and Developer Experience (DX). The SDK introduces a unified orchestrator with hierarchical working memory for long-context reasoning, a persistent note-taking system for cross-session continual learning, and a modular extension module for robust tool use. Moreover, a meta-agent automates the synthesis, evaluation, and refinement of agent configurations through a build-test-improve loop, enabling rapid agent development on new tasks, environments, and tool stacks. Instantiated on Confucius SDK with these mechanisms, CCA delivers strong performance on real-world software engineering tasks. On SWE-Bench-Pro, CCA achieves a state-of-the-art Resolve@1 performance of 54.3%, substantially improving over prior coding agents. Together, the Confucius SDK and CCA provide a transparent, extensible, and reproducible foundation for AI agents, bridge gaps between research prototypes and production-grade systems, and support agent development and deployment at industrial scale.
Why we think this paper is great for you:
This paper directly addresses the development of AI agents, a core interest given the potential for job displacement and the need for robust, scalable AI systems in software engineering.
University of Arizona
AI Summary - The study's results are based on a single dataset and may not be generalizable to other companies or industries. [3]
- The optimal frequency of remote work for both turnover risk and promotion likelihood is around two days per week. [2]
Abstract
Hybrid work policy, especially return-to-office requirements, remains a globally salient topic as workers, companies, and governments continue to debate and disagree. Despite extensive discussions on the benefits and drawbacks of remote and hybrid arrangements, the optimal number of remote days that jointly considers multiple organizational outcomes has not been empirically established. Focusing on two critical career outcomes -- turnover risk and promotion -- we examine how remote work frequency shapes employee trajectories using large-scale observational activity data from a company with over one million employees. We find that increased remote-work frequency is associated with an initial decrease and then an increase in turnover, while promotion likelihood initially rises and then declines. Accordingly, we identify approximately two remote days per week as an optimal balance -- maximizing promotion, a positive outcome for employees, while minimizing turnover, which is undesirable for organizations and may indicate negative employee experiences. These patterns vary across subgroups defined by gender, role type, and leadership status. Several notable results emerge. First, male employees derive greater promotion benefits from remote work than female employees. Second, support workers (non-core business roles) do not experience promotion gains, and the reduction in turnover at their optimal remote-work frequency is marginal compared with employees in core business roles. Third, organizational leaders face greater challenges in remote settings than individual contributors: their turnover risk increases substantially at higher remote frequencies, and their likelihood of promotion decreases as remote frequency rises. We further show that time-allocation patterns partly explain how remote-work frequency influences these career outcomes.
Why we think this paper is great for you:
The research on hybrid work frequency aligns with concerns about changes in the labor market and the impact of evolving work arrangements on employment.
Halmstad University
AI Summary - Multi-agent systems exchange a single 'do-everything' agent for a team of specialised agents that co-operate (or compete) under explicit protocols. [3]
- Planning and self-improvement agents: A class of AI systems that use search and optimization techniques to solve complex problems. [3]
- Embodied and web agents: AI systems that act in the world, either physically (embodied) or through interactions with untrusted websites and enterprise systems (web). [3]
- Planning and self-improvement agents can be prone to state explosion, speculative arithmetic errors, and over-confident selection. [3]
- Planning and self-improvement agents deliver substantial reliability dividends when their power is channelled through explicit controllers, trustworthy verifiers, and disciplined governance of cost and side-effects. [2]
Abstract
This chapter argues that the reliability of agentic and generative AI is chiefly an architectural property. We define agentic systems as goal-directed, tool-using decision makers operating in closed loops, and show how reliability emerges from principled componentisation (goal manager, planner, tool-router, executor, memory, verifiers, safety monitor, telemetry), disciplined interfaces (schema-constrained, validated, least-privilege tool calls), and explicit control and assurance loops. Building on classical foundations, we propose a practical taxonomy -- tool-using agents, memory-augmented agents, planning and self-improvement agents, multi-agent systems, and embodied or web agents -- and analyse how each pattern reshapes the reliability envelope and failure modes. We distil design guidance on typed schemas, idempotency, permissioning, transactional semantics, memory provenance and hygiene, runtime governance (budgets, termination conditions), and simulate-before-actuate safeguards.
Why we think this paper is great for you:
This paper focuses on the architectural foundations of agentic AI, a crucial area for understanding and developing AI systems that can operate autonomously and potentially impact the future of work.
Peking University
AI Summary - Previous research has shown that human-AI collaboration can improve performance in various tasks, including theorem discovery and proof verification. [3]
- The collaboration between human experts and an LLM is organized into three stages, starting from an informal conjecture and ending with a precise theorem and proof. [2]
- Human-AI collaboration can significantly improve mathematical proof and theorem discovery. [1]
Abstract
We investigate how large language models can be used as research tools in scientific computing while preserving mathematical rigor. We propose a human-in-the-loop workflow for interactive theorem proving and discovery with LLMs. Human experts retain control over problem formulation and admissible assumptions, while the model searches for proofs or contradictions, proposes candidate properties and theorems, and helps construct structures and parameters that satisfy explicit constraints, supported by numerical experiments and simple verification checks. Experts treat these outputs as raw material, further refine them, and organize the results into precise statements and rigorous proofs. We instantiate this workflow in a case study on the connection between manifold optimization and Grover's quantum search algorithm, where the pipeline helps identify invariant subspaces, explore Grover-compatible retractions, and obtain convergence guarantees for the retraction-based gradient method. The framework provides a practical template for integrating large language models into frontier mathematical research, enabling faster exploration of proof space and algorithm design while maintaining transparent reasoning responsibilities. Although illustrated on manifold optimization problems in quantum computing, the principles extend to other core areas of scientific computing.
Why we think this paper is great for you:
The research on human-in-the-loop theorem proving represents a valuable intersection of AI and mathematical research, aligning with interests in AGI development and advanced computational tools.
Perplexity
AI Summary - The agent is used primarily for productivity-related tasks (36% of all queries), followed by learning, media, and shopping. [3]
- Research, document editing, and shopping-related tasks appear consistently across occupation clusters. [3]
- Knowledge-intensive sectors like digital technology, entrepreneurship, finance, and academia tend to use the agent for research and learning-related tasks. [3]
- Productivity and learning topics are the most sticky, while travel is the least sticky. [2]
- Users' first queries often fall into productivity, learning, or media topics, but over time, there's a shift towards more cognitively oriented use cases. [1]
Abstract
This paper presents the first large-scale field study of the adoption, usage intensity, and use cases of general-purpose AI agents operating in open-world web environments. Our analysis centers on Comet, an AI-powered browser developed by Perplexity, and its integrated agent, Comet Assistant. Drawing on hundreds of millions of anonymized user interactions, we address three fundamental questions: Who is using AI agents? How intensively are they using them? And what are they using them for? Our findings reveal substantial heterogeneity in adoption and usage across user segments. Earlier adopters, users in countries with higher GDP per capita and educational attainment, and individuals working in digital or knowledge-intensive sectors -- such as digital technology, academia, finance, marketing, and entrepreneurship -- are more likely to adopt or actively use the agent. To systematically characterize the substance of agent usage, we introduce a hierarchical agentic taxonomy that organizes use cases across three levels: topic, subtopic, and task. The two largest topics, Productivity & Workflow and Learning & Research, account for 57% of all agentic queries, while the two largest subtopics, Courses and Shopping for Goods, make up 22%. The top 10 out of 90 tasks represent 55% of queries. Personal use constitutes 55% of queries, while professional and educational contexts comprise 30% and 16%, respectively. In the short term, use cases exhibit strong stickiness, but over time users tend to shift toward more cognitively oriented topics. The diffusion of increasingly capable AI agents carries important implications for researchers, businesses, policymakers, and educators, inviting new lines of inquiry into this rapidly emerging class of AI capabilities.
Why we think this paper is great for you:
Analyzing the adoption of AI agents, as demonstrated by Perplexity's Comet Assistant, provides insights into the practical applications and potential scale of agentic AI systems.
Northeastern University
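AI Summary - The RAMTN system is a human-machine collaborative cognition enhancement paradigm based on meta-interaction; it aims to provide intelligent assistance and knowledge sharing by extracting expert decision frameworks. Its core idea is to combine the cognitive processes of human experts with the information-processing capabilities of computer systems, enabling efficient decision support and knowledge reasoning. Its application areas include investment, healthcare, and education, where extracting expert decision frameworks is intended to improve decision accuracy and efficiency. Meta-Interaction: a technique that combines human cognitive processes with the information-processing capabilities of computer systems to achieve efficient decision support and knowledge reasoning. Human-Machine Collaborative Cognition Enhancement Paradigm: a meta-interaction-based framework that aims to provide intelligent assistance and knowledge sharing by extracting expert decision frameworks. The summary describes RAMTN as an innovative solution with broad potential across these areas. The system's development and application depend on large amounts of data and information resources, so data quality and reliability may be a problem. Its security and privacy protection require further research. Meta-interaction techniques are widely applied and studied in decision support and knowledge reasoning. [3]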
Abstract
Currently, there exists a fundamental divide between the "cognitive black box" (implicit intuition) of human experts and the "computational black box" (untrustworthy decision-making) of artificial intelligence (AI). This paper proposes a new paradigm of "human-AI collaborative cognitive enhancement," aiming to transform the dual black boxes into a composable, auditable, and extensible "functional white-box" system through structured "meta-interaction." The core breakthrough lies in the "plug-and-play cognitive framework"--a computable knowledge package that can be extracted from expert dialogues and loaded into the Recursive Adversarial Meta-Thinking Network (RAMTN). This enables expert thinking, such as medical diagnostic logic and teaching intuition, to be converted into reusable and scalable public assets, realizing a paradigm shift from "AI as a tool" to "AI as a thinking partner." This work not only provides the first engineering proof for "cognitive equity" but also opens up a new path for AI governance: constructing a verifiable and intervenable governance paradigm through "transparency of interaction protocols" rather than prying into the internal mechanisms of models. The framework is open-sourced to promote technology for good and cognitive inclusion. This paper is independent exploratory research conducted by the author. All content presented, including the theoretical framework (RAMTN), methodology (meta-interaction), system implementation, and case validation, constitutes the author's individual research achievements.
Why we think this paper is great for you:
The exploration of human-AI collaborative cognitive enhancement is relevant to understanding how humans and AI can work together, a key consideration for managing the integration of AI into various domains.
German Cancer Research
Abstract
Developing generalizable AI for medical imaging requires both access to large, multi-center datasets and standardized, reproducible tooling within research environments. However, leveraging real-world imaging data in clinical research environments is still hampered by strict regulatory constraints, fragmented software infrastructure, and the challenges inherent in conducting large-cohort multicentre studies. This leads to projects that rely on ad-hoc toolchains that are hard to reproduce, difficult to scale beyond single institutions and poorly suited for collaboration between clinicians and data scientists. We present Kaapana, a comprehensive open-source platform for medical imaging research that is designed to bridge this gap. Rather than building single-use, site-specific tooling, Kaapana provides a modular, extensible framework that unifies data ingestion, cohort curation, processing workflows and result inspection under a common user interface. By bringing the algorithm to the data, it enables institutions to keep control over their sensitive data while still participating in distributed experimentation and model development. By integrating flexible workflow orchestration with user-facing applications for researchers, Kaapana reduces technical overhead, improves reproducibility and enables conducting large-scale, collaborative, multi-centre imaging studies. We describe the core concepts of the platform and illustrate how they can support diverse use cases, from local prototyping to nation-wide research networks. The open-source codebase is available at https://github.com/kaapana/kaapana
Why we think this paper is great for you:
Research into AI for medical imaging aligns with broader interests in AGI applications and the potential for AI to transform healthcare and related industries.
We did not find much content matching your interests, so we've included some additional topics that are popular.
Also be aware that if a topic is not present on arXiv, we won't be able to recommend it.
AGI: Artificial General Intelligence
Abstract
Foundation models (FMs) are increasingly assuming the role of the "brain" of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interaction or integrated tool use -- we argue that the next frontier is endowing FMs with native multi-agent intelligence. We identify four core capabilities of FMs in multi-agent contexts: understanding, planning, efficient communication, and adaptation. Contrary to assumptions about the spontaneous emergence of such abilities, we provide extensive empirical evidence across 41 large language models showing that strong single-agent performance alone does not automatically yield robust multi-agent intelligence. To address this gap, we outline key research directions -- spanning dataset construction, evaluation, training paradigms, and safety considerations -- for building FMs with native multi-agent intelligence.