🎯 Top Personalized Recommendations
New York University
Why we think this paper is great for you:
This paper directly addresses how Retrieval-Augmented LLMs can automate and improve the efficiency of complex tasks like contract management. You will find its focus on practical application for productivity highly relevant.
Abstract
Contract management involves reviewing and negotiating provisions, individual clauses that define rights, obligations, and terms of agreement. During this process, revisions to provisions are proposed and iteratively refined, some of which may be problematic or unacceptable. Automating this workflow is challenging due to the scarcity of labeled data and the abundance of unstructured legacy contracts. In this paper, we present a modular framework designed to streamline contract management through a retrieval-augmented generation (RAG) pipeline. Our system integrates synthetic data generation, semantic clause retrieval, acceptability classification, and reward-based alignment to flag problematic revisions and generate improved alternatives. Developed and evaluated in collaboration with an industry partner, our system achieves over 80% accuracy in both identifying and optimizing problematic revisions, demonstrating strong performance under real-world, low-resource conditions and offering a practical means of accelerating contract revision workflows.
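To make the retrieval stage concrete, here is a minimal sketch of the semantic clause retrieval the abstract describes: embed a proposed revision, then return the most similar reference clauses by cosine similarity. This is not the authors' code; the embed() function and the example clauses are placeholders standing in for a real embedding model and a real clause library.

```python
# Minimal sketch of semantic clause retrieval by cosine similarity.
# Not the authors' code: embed() and the clause list are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def embed(texts):
    # Placeholder embedding; in practice this would call an embedding model
    # (the summary below mentions general-purpose models such as Qwen3-Embedding-4B).
    return rng.normal(size=(len(texts), 32))

reference_clauses = [
    "Either party may terminate this agreement with 30 days' written notice.",
    "Payment is due within 60 days of invoice receipt.",
    "The supplier shall maintain liability insurance of at least $1M.",
]
ref_emb = embed(reference_clauses)
ref_emb /= np.linalg.norm(ref_emb, axis=1, keepdims=True)

def retrieve(query, k=2):
    """Return the k reference clauses most similar to the query revision."""
    q = embed([query])[0]
    q /= np.linalg.norm(q)
    scores = ref_emb @ q                     # cosine similarity per clause
    top = np.argsort(scores)[::-1][:k]       # indices of the k best matches
    return [(reference_clauses[i], float(scores[i])) for i in top]

print(retrieve("Termination requires ninety days' prior written notice."))
```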
AI Summary
- General-purpose embedding models (e.g., Qwen3-Embedding-4B) outperform legal-domain-specific models for semantic retrieval in contract clauses, likely due to their superior context length and broader semantic understanding. [3]
- Fine-tuning a cross-encoder with a graded similarity approach, which captures nuanced semantic relationships, is crucial for effective reranking of retrieved clauses and substantially improves retrieval accuracy. [3]
- Graded similarity-based fine-tuning (for rerankers): A strategy for training cross-encoders using soft labels that capture nuanced semantic relationships between contract clauses (e.g., paraphrases, acceptable revisions of the same provision, acceptable vs. unacceptable pairs), improving retrieval precision. [3]
- The paper demonstrates that a modular Retrieval-Augmented Generation (RAG) pipeline can achieve over 80% accuracy in identifying and optimizing problematic contract revisions, even with limited labeled data. [2]
- Synthetic data generation using large language models (LLMs) is a highly effective strategy for overcoming data scarcity in domain-specific tasks like contract management, enabling robust model training and generalization. [2]
- Reward-based alignment, specifically using Proximal Policy Optimization (PPO) with a frozen acceptability classifier as a reward model, significantly boosts the success rate of generating acceptable contract revisions from 67.5% to 81.9%. [2]
- An ensemble of logistic regression classifiers, trained on clustered embeddings, reliably distinguishes acceptable from unacceptable revisions, outperforming zero-shot LLM classification and providing confidence scores suitable for human-in-the-loop workflows. [2]
- The proposed human-in-the-loop RAG framework offers a practical solution for industrial contract management, automating routine adjustments while preserving legal rigor through expert oversight of complex or uncertain cases. [2]
- Retrieval-Augmented Generation (RAG) pipeline for contract management: An integrated system combining synthetic data generation, semantic clause retrieval, acceptability classification, and a generative module to flag problematic revisions and propose acceptable alternatives. [2]
- Reward-based alignment (for LLM generation): A method where a frozen acceptability classifier's positive-class probability acts as a reward signal to fine-tune a generative LLM using reinforcement learning (specifically PPO), aligning its outputs with desired acceptability judgments (see the sketch below). [2]
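To illustrate the reward-based alignment idea in the points above, here is a minimal sketch, under stated assumptions, of how a frozen acceptability classifier's positive-class probability can serve as the scalar reward consumed by a PPO-style fine-tuning loop. The embedding function, training data, and classifier below are illustrative stand-ins, not the paper's components.

```python
# Minimal sketch: a frozen acceptability classifier's positive-class
# probability used as a scalar reward for RL-based alignment of a revision
# generator. All components below are illustrative stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def embed(texts):
    # Placeholder for a real sentence-embedding model.
    return rng.normal(size=(len(texts), 16))

# Train a stand-in acceptability classifier on labeled (embedding, label)
# pairs, then freeze it: it is never updated while the generator is tuned.
X_train = rng.normal(size=(200, 16))
y_train = rng.integers(0, 2, size=200)
frozen_clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def acceptability_reward(candidate_revisions):
    """Reward = P(acceptable | revision) under the frozen classifier."""
    probs = frozen_clf.predict_proba(embed(candidate_revisions))
    return probs[:, 1]  # positive-class probability as the scalar reward

# In a PPO loop (e.g., via an RL fine-tuning library), each revision sampled
# from the generator would receive this value as its terminal reward.
print(acceptability_reward(["Supplier shall deliver the goods within 30 days."]))
```

The design point the summary highlights is that the classifier stays frozen, so the reward signal does not drift while the generator is being optimized.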
Georgia Tech
Why we think this paper is great for you:
This research explores the potential of AI to significantly boost technological and scientific progress, offering insights into how AI can drive economic productivity. It provides a valuable perspective on AI's broader impact on innovation.
Abstract
Artificial intelligence (AI) raises expectations of substantial increases in rates of technological and scientific progress, but such anticipations are often not connected to detailed ground-level studies of AI use in innovation processes. Accordingly, it remains unclear how and to what extent AI can accelerate innovation. To help fill this gap, we report results from 32 interviews with U.S.-based academic manufacturing and materials sciences researchers experienced with AI and machine learning (ML) techniques. Interviewees primarily used AI for modeling of materials and manufacturing processes, facilitating cheaper and more rapid search of design spaces for materials and manufacturing processes alike. They report benefits including cost, time, and computation savings in technology development. However, interviewees also report that AI/ML tools are unreliable outside design spaces for which dense data are already available; that they require skilled and judicious application in tandem with older research techniques; and that AI/ML tools may detrimentally circumvent opportunities for disruptive theoretical advancement. Based on these results, we suggest there is reason for optimism about acceleration in sustaining innovations through the use of AI/ML; but that support for conventional empirical, computational, and theoretical research is required to maintain the likelihood of further major advances in manufacturing and materials science.
LuleÄ University of Technology
Why we think this paper is great for you:
This paper demonstrates how industrial AI solutions are transforming decision-making and operational efficiency within the construction sector. It offers a concrete example of AI as a powerful productivity tool in a specific industry.
Abstract
The construction industry is presently going through a transformation led by adopting digital technologies that leverage Artificial Intelligence (AI). These industrial AI solutions assist in various phases of the construction process, including planning, design, production and management. In particular, the production phase offers unique potential for the integration of such AI-based solutions. These AI-based solutions assist site managers, project engineers, coordinators and other key roles in making final decisions. To facilitate the decision-making process in the production phase of construction through a human-centric AI-based solution, it is important to understand the needs and challenges faced by the end users who interact with these AI-based solutions to enhance the effectiveness and usability of these systems. Without this understanding, the potential usage of these AI-based solutions may be limited. Hence, the purpose of this research study is to explore, identify and describe the key factors crucial for developing AI solutions in the construction industry. This study further identifies the correlation between these key factors. This was done by developing a demonstrator and collecting quantifiable feedback through a questionnaire targeting the end users, such as site managers and construction professionals. This research study will offer insights into developing and improving these industrial AI solutions, focusing on Human-System Interaction aspects to enhance decision support, usability, and overall AI solution adoption.
Meta
Why we think this paper is great for you:
You will appreciate this paper's focus on how AI research agents can accelerate scientific discovery by automating key processes. It delves into the factors that make these agents effective tools for boosting research productivity.
Abstract
AI research agents offer the promise to accelerate scientific progress by automating the design, implementation, and training of machine learning models. However, the field is still in its infancy, and the key factors driving the success or failure of agent trajectories are not fully understood. We examine the role that ideation diversity plays in agent performance. First, we analyse agent trajectories on MLE-bench, a well-known benchmark to evaluate AI research agents, across different models and agent scaffolds. Our analysis reveals that different models and agent scaffolds yield varying degrees of ideation diversity, and that higher-performing agents tend to have increased ideation diversity. Further, we run a controlled experiment where we modify the degree of ideation diversity, demonstrating that higher ideation diversity results in stronger performance. Finally, we strengthen our results by examining additional evaluation metrics beyond the standard medal-based scoring of MLE-bench, showing that our findings still hold across other agent performance metrics.
McGill University
Why we think this paper is great for you:
This paper examines the profound impact of Large Language Models on organizational structures and knowledge, which is crucial for understanding how LLMs reshape workplace productivity. It provides a conceptual framework for the transformative power of LLMs.
Abstract
Large Language Models (LLMs) are reshaping organizational knowing by unsettling the epistemological foundations of representational and practice-based perspectives. We conceptualize LLMs as Haraway-ian monsters, that is, hybrid, boundary-crossing entities that destabilize established categories while opening new possibilities for inquiry. Focusing on analogizing as a fundamental driver of knowledge, we examine how LLMs generate connections through large-scale statistical inference. Analyzing their operation across the dimensions of surface/deep analogies and near/far domains, we highlight both their capacity to expand organizational knowing and the epistemic risks they introduce. Building on this, we identify three challenges of living with such epistemic monsters: the transformation of inquiry, the growing need for dialogical vetting, and the redistribution of agency. By foregrounding the entangled dynamics of knowing-with-LLMs, the paper extends organizational theory beyond human-centered epistemologies and invites renewed attention to how knowledge is created, validated, and acted upon in the age of intelligent technologies.
Stanford University
Why we think this paper is great for you:
This paper explores the innovative application of AI agents as authors and reviewers in scientific research, directly showcasing AI's potential to enhance productivity in academic workflows. It offers a unique perspective on automation in the scientific domain.
Abstract
There is growing interest in using AI agents for scientific research, yet fundamental questions remain about their capabilities as scientists and reviewers. To explore these questions, we organized Agents4Science, the first conference in which AI agents serve as both primary authors and reviewers, with humans as co-authors and co-reviewers. Here, we discuss the key learnings from the conference and their implications for human-AI collaboration in science.
Princeton University
Why we think this paper is great for you:
This paper delves into the fundamental efficiency of scientific automation, providing a theoretical lens on how automated processes can optimize information gain. It offers insights into the underlying principles of productivity in scientific discovery.
Abstract
Scientific discovery can be framed as a thermodynamic process in which an agent invests physical work to acquire information about an environment under a finite work budget. Using established results about the thermodynamics of computing, we derive finite-budget bounds on information gain over rounds of sequential Bayesian learning. We also propose a metric of information-work efficiency, and compare unpartitioned and federated learning strategies under matched work budgets. The presented results offer guidance in the form of bounds and an information efficiency metric for efforts in scientific automation at large.
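For orientation only: a textbook Landauer-style relation, not the paper's own derivation, suggests the general form such finite-budget bounds take. Resetting one bit of memory at temperature T dissipates at least k_B T ln 2 of work, so over repeated acquire-and-reset cycles the total information processed under a work budget W is capped roughly as follows; the paper's bounds for sequential Bayesian learning will be more specific.

```latex
% Hedged background sketch (Landauer-style cap), not the paper's result.
% I = information gain in bits, W = total work budget, T = temperature,
% k_B = Boltzmann's constant.
W \;\ge\; k_B T \ln 2 \cdot I
\quad\Longrightarrow\quad
I \;\le\; \frac{W}{k_B T \ln 2}
```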
We did not find much content matching your interests, so we have included some additional topics that are popular.
Also be aware that if a topic is not present on arXiv, we won't be able to recommend it.
AI Agents
Stanford University
Abstract
There is growing interest in using AI agents for scientific research, yet fundamental questions remain about their capabilities as scientists and reviewers. To explore these questions, we organized Agents4Science, the first conference in which AI agents serve as both primary authors and reviewers, with humans as co-authors and co-reviewers. Here, we discuss the key learnings from the conference and their implications for human-AI collaboration in science.
Meta
Abstract
AI research agents offer the promise to accelerate scientific progress by automating the design, implementation, and training of machine learning models. However, the field is still in its infancy, and the key factors driving the success or failure of agent trajectories are not fully understood. We examine the role that ideation diversity plays in agent performance. First, we analyse agent trajectories on MLE-bench, a well-known benchmark to evaluate AI research agents, across different models and agent scaffolds. Our analysis reveals that different models and agent scaffolds yield varying degrees of ideation diversity, and that higher-performing agents tend to have increased ideation diversity. Further, we run a controlled experiment where we modify the degree of ideation diversity, demonstrating that higher ideation diversity results in stronger performance. Finally, we strengthen our results by examining additional evaluation metrics beyond the standard medal-based scoring of MLE-bench, showing that our findings still hold across other agent performance metrics.
AI and Society
University of Copenhagen
Abstract
In this paper, we argue that current AI research operates on a spectrum between two different underlying conceptions of intelligence: Intelligence Realism, which holds that intelligence represents a single, universal capacity measurable across all systems, and Intelligence Pluralism, which views intelligence as diverse, context-dependent capacities that cannot be reduced to a single universal measure. Through an analysis of current debates in AI research, we demonstrate how the conceptions remain largely implicit yet fundamentally shape how empirical evidence gets interpreted across a wide range of areas. These underlying views generate fundamentally different research approaches across three areas. Methodologically, they produce different approaches to model selection, benchmark design, and experimental validation. Interpretively, they lead to contradictory readings of the same empirical phenomena, from capability emergence to system limitations. Regarding AI risk, they generate categorically different assessments: realists view superintelligence as the primary risk and search for unified alignment solutions, while pluralists see diverse threats across different domains requiring context-specific solutions. We argue that making explicit these underlying assumptions can contribute to a clearer understanding of disagreements in AI research.
Huawei
Abstract
As a capability coming from computation, how does AI differ fundamentally from the capabilities delivered by rule-based software programs? The paper examines the behavior of artificial intelligence (AI) from an engineering point of view to clarify its nature and limits. It argues that the rationality underlying humanity's impulse to pursue, articulate, and adhere to rules deserves to be valued and preserved, and that identifying where rule-based practical rationality ends is the first step toward acting on that awareness. Although the rules governing AI behavior are still hidden or only weakly observable, the paper proposes a methodology that makes discrimination possible and practical: identifying distinctions in the behavior of AI models across three types of decisions. This is a prerequisite for human responsibility, with alternative possibilities in view, when considering how and when to use AI. It would be a solid start for ensuring the soundness of AI systems for the well-being of humans, society, and the environment.
AGI: Artificial General Intelligence
ulamai
Abstract
We propose a Kardashev-inspired yet operational Autonomous AI (AAI) Scale that measures the progression from fixed robotic process automation (AAI-0) to full artificial general intelligence (AAI-4) and beyond. Unlike narrative ladders, our scale is multi-axis and testable. We define ten capability axes (Autonomy, Generality, Planning, Memory/Persistence, Tool Economy, Self-Revision, Sociality/Coordination, Embodiment, World-Model Fidelity, Economic Throughput) aggregated by a composite AAI-Index (a weighted geometric mean). We introduce a measurable Self-Improvement Coefficient Îș (capability growth per unit of agent-initiated resources) and two closure properties (maintenance and expansion) that convert "self-improving AI" into falsifiable criteria. We specify OWA-Bench, an open-world agency benchmark suite that evaluates long-horizon, tool-using, persistent agents. We define level gates for AAI-0 through AAI-4 using thresholds on the axes, Îș, and closure proofs. Synthetic experiments illustrate how present-day systems map onto the scale and how the delegability frontier (quality vs. autonomy) advances with self-improvement. We also prove a theorem giving sufficient conditions under which an AAI-3 agent becomes AAI-5 over time, formalizing the intuition that a "baby AGI" becomes a Superintelligence.
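As a small illustration of the aggregation the abstract describes, here is a sketch of a composite index computed as a weighted geometric mean over capability axes. The axis scores and weights are invented for illustration and are not the paper's calibration.

```python
# Minimal sketch: composite index as a weighted geometric mean over
# capability axes. Scores and weights below are illustrative only.
import math

def weighted_geometric_mean(scores, weights):
    total_w = sum(weights)
    return math.exp(sum(w * math.log(s) for s, w in zip(scores, weights)) / total_w)

axes = ["Autonomy", "Generality", "Planning", "Memory/Persistence", "Tool Economy",
        "Self-Revision", "Sociality/Coordination", "Embodiment",
        "World-Model Fidelity", "Economic Throughput"]
scores = [0.6, 0.5, 0.4, 0.7, 0.8, 0.3, 0.5, 0.2, 0.4, 0.6]   # each in (0, 1]
weights = [1.0] * len(axes)                                    # equal weights for illustration

print(f"Illustrative composite index: {weighted_geometric_mean(scores, weights):.3f}")
```

One property worth noting: a geometric mean is dragged down sharply by a near-zero score on any single axis, unlike an arithmetic mean.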
The University of Edinburgh
Abstract
We introduce Terra Nova, a new comprehensive challenge environment (CCE) for reinforcement learning (RL) research inspired by Civilization V. A CCE is a single environment in which multiple canonical RL challenges (e.g., partial observability, credit assignment, representation learning, enormous action spaces, etc.) arise simultaneously. Mastery therefore demands integrated, long-horizon understanding across many interacting variables. We emphasize that this definition excludes challenges that only aggregate unrelated tasks in independent, parallel streams (e.g., learning to play all Atari games at once). These aggregated multitask benchmarks primarily assess whether an agent can catalog and switch among unrelated policies rather than test an agent's ability to perform deep reasoning across many interacting challenges.