Hi!

Your personalized paper recommendations for 12–16 January 2026.
National University of Singapore
Rate paper: 👍 👎 ♥ Save
AI Insights
  • The document describes DR-Arena, a system designed to evaluate the performance of search agents by giving them complex research tasks. [3]
  • The system generates these tasks and evaluates the responses from two search agents based on their accuracy, comprehensiveness, formatting, and helpfulness. [3]
  • The document does not specify how the system handles cases where both search agents fail to find the correct entity. [2]
Abstract
As Large Language Models (LLMs) increasingly operate as Deep Research (DR) Agents capable of autonomous investigation and information synthesis, reliable evaluation of their task performance has become a critical bottleneck. Current benchmarks predominantly rely on static datasets, which suffer from several limitations: limited task generality, temporal misalignment, and data contamination. To address these, we introduce DR-Arena, a fully automated evaluation framework that pushes DR agents to their capability limits through dynamic investigation. DR-Arena constructs real-time Information Trees from fresh web trends to ensure the evaluation rubric is synchronized with the live world state, and employs an automated Examiner to generate structured tasks testing two orthogonal capabilities: Deep reasoning and Wide coverage. DR-Arena further adopts Adaptive Evolvement Loop, a state-machine controller that dynamically escalates task complexity based on real-time performance, demanding deeper deduction or wider aggregation until a decisive capability boundary emerges. Experiments with six advanced DR agents demonstrate that DR-Arena achieves a Spearman correlation of 0.94 with the LMSYS Search Arena leaderboard. This represents the state-of-the-art alignment with human preferences without any manual efforts, validating DR-Arena as a reliable alternative for costly human adjudication.
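The reported 0.94 is a Spearman rank correlation between DR-Arena's ranking and the Search Arena leaderboard. A minimal sketch of how that agreement statistic is computed, using the no-ties formula and made-up scores for six agents:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation for rankings without ties:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(xs)

    def ranks(vals):
        order = sorted(range(n), key=lambda i: vals[i])
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical scores for six agents from two leaderboards.
arena_scores = [0.91, 0.84, 0.78, 0.70, 0.65, 0.52]
human_scores = [0.88, 0.86, 0.71, 0.73, 0.60, 0.55]
rho = spearman_rho(arena_scores, human_scores)
```

A correlation near 1 means the automated arena orders the agents almost exactly as human voters do, which is the alignment claim being made.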
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations

This paper directly addresses the need for robust evaluation of human-in-the-loop systems, a key interest for this user. The framework proposed offers a solution to the bottleneck of current benchmark evaluations, aligning with the user's focus on best practices.
MIT
Rate paper: 👍 👎 ♥ Save
AI Insights
  • The field of Agentic AI is rapidly evolving, with researchers exploring various architectures, protocols, and design challenges. [3]
  • Agentic AI has the potential to revolutionize industries such as healthcare, finance, and education by automating complex tasks and decision-making processes. [3]
  • Contract Net Protocol: A high-level communication and control protocol for distributed problem solvers. [3]
  • Agentic AI refers to a new generation of artificial intelligence that can reason, plan, and make decisions on its own. [2]
  • Formal methods are being used to develop a rigorous understanding of Agentic AI systems and their behavior. [1]
Abstract
The Contract Net Protocol (1980) introduced coordination through contracts in multi-agent systems. Modern agent protocols standardize connectivity and interoperability; yet, none provide formal, resource governance-normative mechanisms to bound how much agents may consume or how long they may operate. We introduce Agent Contracts, a formal framework that extends the contract metaphor from task allocation to resource-bounded execution. An Agent Contract unifies input/output specifications, multi-dimensional resource constraints, temporal boundaries, and success criteria into a coherent governance mechanism with explicit lifecycle semantics. For multi-agent coordination, we establish conservation laws ensuring delegated budgets respect parent constraints, enabling hierarchical coordination through contract delegation. Empirical validation across four experiments demonstrates 90% token reduction with 525x lower variance in iterative workflows, zero conservation violations in multi-agent delegation, and measurable quality-resource tradeoffs through contract modes. Agent Contracts provide formal foundations for predictable, auditable, and resource-bounded autonomous AI deployment.
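The conservation law for delegated budgets can be sketched with a simple parent/child model: a child contract may only be created if the sum of all delegated budgets stays within the parent's own bounds. Class and field names below are hypothetical, not the paper's API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentContract:
    """Toy contract: a token budget plus a wall-clock bound."""
    token_budget: int
    max_seconds: float
    children: list = field(default_factory=list)

    def delegate(self, token_budget: int, max_seconds: float) -> "AgentContract":
        # Conservation law: the sum of delegated budgets may not
        # exceed the parent's own budget.
        delegated = sum(c.token_budget for c in self.children)
        if delegated + token_budget > self.token_budget:
            raise ValueError("delegation would violate parent token budget")
        # Temporal boundary: a child may not outlive its parent contract.
        if max_seconds > self.max_seconds:
            raise ValueError("child deadline exceeds parent deadline")
        child = AgentContract(token_budget, max_seconds)
        self.children.append(child)
        return child

root = AgentContract(token_budget=10_000, max_seconds=60.0)
a = root.delegate(6_000, 30.0)   # fine
b = root.delegate(4_000, 30.0)   # fine: 6k + 4k == 10k, budget exhausted
```

Enforcing the check at delegation time is what makes violations impossible by construction, rather than something detected after the fact.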
Why are we recommending this paper?
Because ai agents is a popular topic and you have fewer than 3 interests with available recommendations

Coming from MIT, this paper explores resource governance, a critical component of building reliable human-in-the-loop platforms. The formal framework provides a valuable approach to managing agent behavior, directly supporting best practices.
University of Science and Technology of China
Rate paper: 👍 👎 ♥ Save
Paper visualization
Rate image: 👍 👎
Abstract
Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on rigid, predefined workflows that struggle with complex, conditional queries. To address this limitation, we propose PaperScout, an autonomous agent that reformulates paper search as a sequential decision-making process. Unlike static workflows, PaperScout dynamically decides whether, when, and how to invoke search and expand tools based on accumulated retrieval context. However, training such agents presents a fundamental challenge: standard reinforcement learning methods, typically designed for single-turn tasks, suffer from a granularity mismatch when applied to multi-turn agentic tasks, where token-level optimization diverges from the granularity of sequence-level interactions, leading to noisy credit assignment. We introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent-environment interaction. Comprehensive experiments on both synthetic and real-world benchmarks demonstrate that PaperScout significantly outperforms strong workflow-driven and RL baselines in both recall and relevance, validating the effectiveness of our adaptive agentic framework and optimization strategy.
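The granularity mismatch the abstract describes can be illustrated with a toy credit-assignment example: if every token inherits the raw turn-level reward, long turns dominate the gradient regardless of quality; assigning credit at the sequence level gives each agent-environment interaction equal weight. This is a schematic sketch of the mismatch, not PSPO itself:

```python
def token_level_credit(turn_rewards, turn_lengths):
    """Naive token-level view: each token carries the raw turn reward,
    so a turn's total credit scales with its length."""
    advs = []
    for r, n in zip(turn_rewards, turn_lengths):
        advs.extend([r] * n)
    return advs

def sequence_level_credit(turn_rewards, turn_lengths):
    """Sequence-level view: spread each turn's reward across its
    tokens so every interaction turn carries equal total weight."""
    advs = []
    for r, n in zip(turn_rewards, turn_lengths):
        advs.extend([r / n] * n)
    return advs

rewards, lengths = [1.0, -0.5], [4, 2]   # two turns, unequal length
tok = token_level_credit(rewards, lengths)
seq = sequence_level_credit(rewards, lengths)
# Total credit per turn: token-level gives 4.0 and -1.0 (length-biased),
# sequence-level gives 1.0 and -0.5 (matches the turn rewards).
```

The length bias in the first variant is one concrete form of the "noisy credit assignment" the abstract refers to.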
Why are we recommending this paper?
Because research automation with ai is a popular topic and you have fewer than 3 interests with available recommendations

This work tackles the core task of finding relevant research papers, aligning with the user's interest in efficient information retrieval. The autonomous agent approach is particularly relevant for streamlining human-in-the-loop workflows.
The Alan Turing Institute
Rate paper: 👍 👎 ♥ Save
Paper visualization
Rate image: 👍 👎
AI Insights
  • The authors contend that entertainment is a significant use case for AI, with people already using AI for activities unrelated to productivity. [3]
  • The paper suggests that this vision should inspire more debates, discourse, and study in the field of AI, as generative AI is increasingly being used for entertainment. [3]
  • AS: Artificially generated content. GenAI: Generative AI. Sociotechnical systems: complex systems that combine social and technical components. [3]
  • The paper concludes by emphasizing the need for a constructive vision of cultural AI, rather than just harm minimization. [3]
  • The paper argues that mainstream approaches to evaluating AI systems tend to focus on intelligence and harm minimization, but neglect the cultural dimension of AI use. [2]
  • They propose developing a positive theory of what beneficial, nutritious entertainment might look like, rather than just mitigating harms. [0]
Abstract
Generative AI systems are predominantly designed, evaluated, and marketed as intelligent systems which will benefit society by augmenting or automating human cognitive labor, promising to increase personal, corporate, and macroeconomic productivity. But this mainstream narrative about what AI is and what it can do is in tension with another emerging use case: entertainment. We argue that the field of AI is unprepared to measure or respond to how the proliferation of entertaining AI-generated content will impact society. Emerging data suggest AI is already widely adopted for entertainment purposes -- especially by young people -- and represents a large potential source of revenue. We contend that entertainment will become a primary business model for major AI corporations seeking returns on massive infrastructure investments; this will exert a powerful influence on the technology these companies produce in the coming years. Examining current evaluation practices, we identify a critical asymmetry: while AI assessments rigorously measure both benefits and harms of intelligence, they focus almost exclusively on cultural harms. We lack frameworks for articulating how cultural outputs might be actively beneficial. Drawing on insights from the humanities, we propose "thick entertainment" as a framework for evaluating AI-generated cultural content -- one that considers entertainment's role in meaning-making, identity formation, and social connection rather than simply minimizing harm. While AI is often touted for its potential to revolutionize productivity, in the long run we may find that AI turns out to be as much about "intelligence" as social media is about social connection.
Why are we recommending this paper?
Because ai agents is a popular topic and you have fewer than 3 interests with available recommendations

This paper's exploration of AI's role in augmenting human cognitive labor is highly pertinent. Understanding how AI can be used as a tool within a human-in-the-loop system is a key area of interest.
NVIDIA
Rate paper: 👍 👎 ♥ Save
AI Insights
  • The paper discusses various methods for attributing model behavior in differentiable games. [3]
  • It highlights the importance of understanding how models learn concepts through concept-level attribution. [3]
  • The authors emphasize the need for benchmarking and improving video diffusion transformers for motion transfer. [3]
  • Differentiable games: a type of game in which the model's behavior can be attributed to specific inputs or actions. [3]
  • Concept-level attribution: a method for understanding how models learn concepts by attributing their behavior to specific inputs or actions. [3]
  • The authors also discuss scalable nested optimization as a key technique for efficient training of deep learning models. [3]
Abstract
Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.
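The motion-weighted loss mask idea can be sketched directly: weight per-pixel error by frame-to-frame change, so static appearance is down-weighted and moving regions dominate the influence computation. The mask construction below is an assumption for illustration; Motive's actual masks may differ:

```python
import numpy as np

def motion_weighted_loss(pred, target, prev_target, eps=1e-8):
    """Weight per-pixel squared error by normalized frame-to-frame
    change, isolating temporal dynamics from static appearance."""
    motion = np.abs(target - prev_target)
    mask = motion / (motion.max() + eps)      # normalize to [0, 1]
    return float(np.mean(mask * (pred - target) ** 2))

rng = np.random.default_rng(0)
prev_frame = rng.random((8, 8))
frame = prev_frame.copy()
frame[:4] += 0.5          # motion only in the top half of the frame
pred = frame + 0.1        # a uniform prediction error everywhere

loss = motion_weighted_loss(pred, frame, prev_frame)
# Only the moving top half contributes, so the weighted loss is about
# half the plain MSE of 0.01.
```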
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations

Focusing on the critical aspect of motion in video generation, this NVIDIA paper directly addresses a core element of interactive humanoid video. Understanding how motion is influenced is essential for building effective human-in-the-loop systems.
ByteDance
Rate paper: 👍 👎 ♥ Save
AI Insights
  • Autoregressive diffusion models: A type of generative model that uses a Markov chain to generate data sequentially. [3]
  • Long video generation: The task of generating videos that are longer than 10 seconds, which is more challenging than short video generation. [3]
  • The authors also introduce a new dataset for evaluating video generation models, which is more comprehensive than existing datasets. [2]
  • The proposed model uses a combination of techniques such as autoregressive diffusion models and consistency models to achieve state-of-the-art results. [1]
  • The paper discusses the development of a new model for generating long videos with high quality and realism. [0]
Abstract
Interactive humanoid video generation aims to synthesize lifelike visual agents that can engage with humans through continuous and responsive video. Despite recent advances in video synthesis, existing methods often grapple with the trade-off between high-fidelity synthesis and real-time interaction requirements. In this paper, we propose FlowAct-R1, a framework specifically designed for real-time interactive humanoid video generation. Built upon a MMDiT architecture, FlowAct-R1 enables the streaming synthesis of video with arbitrary durations while maintaining low-latency responsiveness. We introduce a chunkwise diffusion forcing strategy, complemented by a novel self-forcing variant, to alleviate error accumulation and ensure long-term temporal consistency during continuous interaction. By leveraging efficient distillation and system-level optimizations, our framework achieves a stable 25fps at 480p resolution with a time-to-first-frame (TTFF) of only around 1.5 seconds. The proposed method provides holistic and fine-grained full-body control, enabling the agent to transition naturally between diverse behavioral states in interactive scenarios. Experimental results demonstrate that FlowAct-R1 achieves exceptional behavioral vividness and perceptual realism, while maintaining robust generalization across diverse character styles.
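The streaming design can be caricatured in a few lines: chunks are generated sequentially, each conditioned on the tail of the previous chunk, so frames are emitted as soon as a chunk finishes rather than after the whole clip. Chunk size, context length, and the toy generate_chunk stand-in are all illustrative, not FlowAct-R1's API:

```python
CHUNK, CONTEXT = 8, 2

def generate_chunk(context, chunk_id):
    # Stand-in for one chunkwise denoising pass; a real model would
    # condition on `context` frames to keep temporal consistency.
    return [f"c{chunk_id}f{i}" for i in range(CHUNK)]

def stream_video(num_chunks):
    frames, context = [], []
    for cid in range(num_chunks):
        chunk = generate_chunk(context, cid)
        frames.extend(chunk)          # emitted immediately (streaming)
        context = chunk[-CONTEXT:]    # condition the next chunk on the tail
    return frames

video = stream_video(3)
```

The time-to-first-frame then depends only on the cost of one chunk, which is why the architecture can report a TTFF of about 1.5 seconds while the video itself has arbitrary duration.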
Why are we recommending this paper?
Because of your interest in human-in-the-loop platforms
New York Law School
Rate paper: 👍 👎 ♥ Save
AI Insights
  • The concept of artificial intelligence (AI) and its impact on society is a complex issue that requires careful consideration. [3]
  • The concept of algorithmic impact assessments (AIAs) has been proposed as a way to evaluate the potential effects of AI on society. [3]
  • Data protection impact assessments (DPIAs) have been implemented in some jurisdictions to evaluate the potential effects of data processing on individuals and society. [3]
  • The use of AI in decision-making processes can lead to biases and discrimination, which can have serious consequences for individuals and communities. [2]
Abstract
This essay examines how judicial review should adapt to address challenges posed by artificial intelligence decision-making, particularly regarding minority rights and interests. As I argue in this essay, the rise of three trends-privatization, prediction, and automation in AI-have combined to pose similar risks to minorities. Here, I outline what a theory of judicial review would look like in an era of artificial intelligence, analyzing both the limitations and the possibilities of judicial review of AI. I draw on cases in which AI decision-making has been challenged in courts, to show how concepts of due process and equal protection can be recuperated in a modern AI era, and even integrated into AI, to provide for better oversight and accountability, offering a framework for judicial review in the AI era that protects minorities from algorithmic discrimination.
Why are we recommending this paper?
Because ai and society is a popular topic and you have fewer than 3 interests with available recommendations
Rate paper: 👍 👎 ♥ Save
Abstract
Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotation-intensive task construction, rely on static evaluation dimensions, or fail to reliably verify facts when citations are missing. To bridge these gaps, we introduce DeepResearchEval, an automated framework for deep research task construction and agentic evaluation. For task construction, we propose a persona-driven pipeline generating realistic, complex research tasks anchored in diverse user profiles, applying a two-stage filter Task Qualification and Search Necessity to retain only tasks requiring multi-source evidence integration and external retrieval. For evaluation, we propose an agentic pipeline with two components: an Adaptive Point-wise Quality Evaluation that dynamically derives task-specific evaluation dimensions, criteria, and weights conditioned on each generated task, and an Active Fact-Checking that autonomously extracts and verifies report statements via web search, even when citations are missing.
Why are we recommending this paper?
Because research automation with ai is a popular topic and you have fewer than 3 interests with available recommendations
Southern University of Science and Technology (SUSTech)
Rate paper: 👍 👎 ♥ Save
AI Insights
  • It's not about AGI introducing new objectives or conflicts, but about amplifying existing ones by compressing timescales and eroding institutional frictions. [3]
  • AGI can be seen as a powerful amplifier of human strategies, incentives, and institutional incoherence, rather than an alien adversary acting against humanity. [3]
  • AGI (Artificial General Intelligence): a hypothetical AI system capable of performing any intellectual task that humans can. [3]
  • The concept of existential risk associated with artificial general intelligence (AGI) is often misunderstood. [2]
Abstract
Recent reports of large language models (LLMs) exhibiting behaviors such as deception, threats, or blackmail are often interpreted as evidence of alignment failure or emergent malign agency. We argue that this interpretation rests on a conceptual error. LLMs do not reason morally; they statistically internalize the record of human social interaction, including laws, contracts, negotiations, conflicts, and coercive arrangements. Behaviors commonly labeled as unethical or anomalous are therefore better understood as structural generalizations of interaction regimes that arise under extreme asymmetries of power, information, or constraint. Drawing on relational models theory, we show that practices such as blackmail are not categorical deviations from normal social behavior, but limiting cases within the same continuum that includes market pricing, authority relations, and ultimatum bargaining. The surprise elicited by such outputs reflects an anthropomorphic expectation that intelligence should reproduce only socially sanctioned behavior, rather than the full statistical landscape of behaviors humans themselves enact. Because human morality is plural, context-dependent, and historically contingent, the notion of a universally moral artificial intelligence is ill-defined. We therefore reframe concerns about artificial general intelligence (AGI). The primary risk is not adversarial intent, but AGI's role as an endogenous amplifier of human intelligence, power, and contradiction. By eliminating longstanding cognitive and institutional frictions, AGI compresses timescales and removes the historical margin of error that has allowed inconsistent values and governance regimes to persist without collapse. Alignment failure is thus structural, not accidental, and requires governance approaches that address amplification, complexity, and regime stability rather than model-level intent alone.
Why are we recommending this paper?
Because AGI (artificial general intelligence) is a popular topic and you have fewer than 3 interests with available recommendations
Hyntel, Inc.
Rate paper: 👍 👎 ♥ Save
AI Insights
  • SANC(E3) formalizes general intelligence as an axiomatic framework of an event-representation system oriented toward E3 minimization under finite capacity. [2]
  • Cognition as completion: Prediction, dialogue, authoring, perception, causal inference, and embodied action as instances of Gestalt completion. [1]
  • E3: An energy functional representing the trade-off between reconstruction, compression, and stability. [0]
Abstract
General intelligence must reorganize experience into internal structures that enable prediction and action under finite resources. Existing systems implicitly presuppose fixed primitive units -- tokens, subwords, pixels, or predefined sensor channels -- thereby bypassing the question of how representational units themselves emerge and stabilize. This paper proposes SANC(E3), an axiomatic framework in which representational units are not given a priori but instead arise as stable outcomes of competitive selection, reconstruction, and compression under finite activation capacity, governed by the explicit minimization of an energy functional E3. SANC(E3) draws a principled distinction between system tokens -- structural anchors such as {here, now, I} and sensory sources -- and tokens that emerge through self-organization during co-occurring events. Five core axioms formalize finite capacity, association from co-occurrence, similarity-based competition, confidence-based stabilization, and the reconstruction-compression-update trade-off. A key feature is a pseudo-memory-mapped I/O mechanism, through which internally replayed Gestalts are processed via the same axiomatic pathway as external sensory input. As a result, perception, imagination, prediction, planning, and action are unified within a single representational and energetic process. From the axioms, twelve propositions are derived, showing that category formation, hierarchical organization, unsupervised learning, and high-level cognitive activities can all be understood as instances of Gestalt completion under E3 minimization.
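The abstract describes E3 only as a trade-off between reconstruction, compression, and stability under finite capacity. One plausible shorthand for such a functional, written here purely for orientation (the paper's actual definition may differ), is:

```latex
% Illustrative form only; SANC(E3)'s actual definition of E3 may differ.
E_3(\theta) \;=\;
\underbrace{\mathcal{L}_{\mathrm{rec}}(\theta)}_{\text{reconstruction}}
\;+\; \lambda\,\underbrace{\mathcal{L}_{\mathrm{comp}}(\theta)}_{\text{compression}}
\;+\; \mu\,\underbrace{\mathcal{L}_{\mathrm{stab}}(\theta)}_{\text{stability}},
\qquad \lambda,\ \mu > 0,
```

with minimization carried out subject to the finite activation capacity that the axioms impose.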
Why are we recommending this paper?
Because AGI (artificial general intelligence) is a popular topic and you have fewer than 3 interests with available recommendations
Niigata University
Rate paper: 👍 👎 ♥ Save
Paper visualization
Rate image: 👍 👎
AI Insights
  • The double descent phenomenon in machine learning refers to the observation that as model size or training time increases, the generalization error can first decrease, then increase, and then decrease again. [3]
  • This phenomenon has been observed in various contexts, including linear regression, neural networks, and graph convolutional networks. [3]
  • The double descent curve is characterized by three phases: underfitting, overfitting, and double descent. [3]
  • In the underfitting phase, the model is too simple to capture the underlying patterns in the data, leading to poor generalization performance. [3]
  • In the overfitting phase, the model is too complex and captures noise in the data, also leading to poor generalization performance. [3]
  • The double descent phase occurs when model size increases beyond a certain point and the model begins capturing the underlying patterns again, despite its increased capacity to overfit. [3]
  • Proposed explanations include the bias-variance trade-off, the effect of noise on fitting linear regression models, and the role of regularization in mitigating double descent. [3]
  • Researchers have proposed methods to mitigate or understand the phenomenon, including optimal regularization, early stopping, and multi-scale feature learning dynamics, all of which aim to balance model capacity against the ability to generalize. [3]
  • The study of double descent highlights the trade-offs between model complexity and generalization performance, with significant implications for how models are designed in research and practice. [3]
Abstract
Deep double descent is one of the key phenomena underlying the generalization capability of deep learning models. In this study, epoch-wise double descent, which is delayed generalization following overfitting, was empirically investigated by focusing on the evolution of internal structures. Fully connected neural networks of three different sizes were trained on the CIFAR-10 dataset with 30% label noise. By decomposing the loss curves into signal contributions from clean and noisy training data, the epoch-wise evolutions of internal signals were analyzed separately. Three main findings were obtained from this analysis. First, the model achieved strong re-generalization on test data even after perfectly fitting noisy training data during the double descent phase, corresponding to a "benign overfitting" state. Second, noisy data were learned after clean data, and as learning progressed, their corresponding internal activations became increasingly separated in outer layers; this enabled the model to overfit only noisy data. Third, a single, very large activation emerged in the shallow layer across all models; this phenomenon is referred to as "outliers," "massive activations," and "super activations" in recent large language models and evolves with re-generalization. The magnitude of large activation correlated with input patterns but not with output patterns. These empirical findings directly link the recent key phenomena of "deep double descent," "benign overfitting," and "large activation", and support the proposal of a novel scenario for understanding deep double descent.
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations
University of Cambridge
Rate paper: 👍 👎 ♥ Save
AI Insights
  • The model takes a dense camera trajectory and a reference image as input and outputs the number of sparse keyframes required. [3]
  • DINOv2 encoder: A type of neural network used for encoding images. [3]
  • transformer: A type of neural network used for processing sequential data. [3]
  • MLP: Multi-Layer Perceptron, a type of feedforward neural network. [3]
  • The method is designed to improve the efficiency and quality of video generation by adaptively selecting keyframes. [3]
  • The model can be trained on large datasets and fine-tuned for specific applications. [3]
  • The method requires a large dataset of videos with annotated keyframes to train the model. [3]
  • The paper presents a method for adaptive keyframe density prediction in video generation. [2]
Abstract
Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical barrier to deploying generative video in applications that require real-time interactions, such as embodied AI and VR/AR. This paper explores a new strategy for camera-conditioned video generation of static scenes: using diffusion-based generative models to generate a sparse set of keyframes, and then synthesizing the full video through 3D reconstruction and rendering. By lifting keyframes into a 3D representation and rendering intermediate views, our approach amortizes the generation cost across hundreds of frames while enforcing geometric consistency. We further introduce a model that predicts the optimal number of keyframes for a given camera trajectory, allowing the system to adaptively allocate computation. Our final method, SRENDER, uses very sparse keyframes for simple trajectories and denser ones for complex camera motion. This results in video generation that is more than 40 times faster than the diffusion-based baseline in generating 20 seconds of video, while maintaining high visual fidelity and temporal stability, offering a practical path toward efficient and controllable video synthesis.
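The claimed 40x speedup follows from amortization, which a back-of-envelope cost model makes concrete. All constants below are made up for illustration; only the structure of the calculation (diffusion cost paid per keyframe, cheap rendering paid per output frame) follows the paper:

```python
def speedup(num_frames, num_keyframes,
            diffusion_cost=1.0,   # hypothetical cost per diffused frame
            render_cost=0.001,    # hypothetical cost per rendered frame
            recon_cost=1.0):      # hypothetical one-off 3D reconstruction
    """Ratio of full-diffusion cost to keyframe-then-render cost."""
    baseline = num_frames * diffusion_cost
    ours = (num_keyframes * diffusion_cost
            + num_frames * render_cost
            + recon_cost)
    return baseline / ours

# 20 s at 25 fps = 500 frames; suppose 10 keyframes suffice for a
# simple trajectory.
s = speedup(num_frames=500, num_keyframes=10)
```

Because rendering is orders of magnitude cheaper per frame than diffusion, the total cost is dominated by the keyframe count, which is exactly what the adaptive keyframe predictor controls.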
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations

We did not find much content matching your interests, so we've included some additional topics that are popular. Also, be aware that if a topic is not present on arXiv, we won't be able to recommend papers for it.

The Alan Turing Institute
Rate paper: 👍 👎 ♥ Save
Paper visualization
Rate image: 👍 👎
AI Insights
  • The authors contend that entertainment is a significant use case for AI, with people already using AI for activities unrelated to productivity. [3]
  • The paper suggests that this vision should inspire more debates, discourse, and study in the field of AI, as generative AI is increasingly being used for entertainment. [3]
  • AS: Artificially generated content GenAI: Generative AI Sociotechnical systems: Complex systems that combine social and technical components The paper concludes by emphasizing the need for a constructive vision of cultural AI, rather than just harm minimization. [3]
  • The paper argues that mainstream approaches to evaluating AI systems tend to focus on intelligence and harm minimization, but neglect the cultural dimension of AI use. [2]
  • They propose developing a positive theory of what beneficial, nutritious entertainment might look like, rather than just mitigating harms. [0]
Abstract
Generative AI systems are predominantly designed, evaluated, and marketed as intelligent systems which will benefit society by augmenting or automating human cognitive labor, promising to increase personal, corporate, and macroeconomic productivity. But this mainstream narrative about what AI is and what it can do is in tension with another emerging use case: entertainment. We argue that the field of AI is unprepared to measure or respond to how the proliferation of entertaining AI-generated content will impact society. Emerging data suggest AI is already widely adopted for entertainment purposes -- especially by young people -- and represents a large potential source of revenue. We contend that entertainment will become a primary business model for major AI corporations seeking returns on massive infrastructure investments; this will exert a powerful influence on the technology these companies produce in the coming years. Examining current evaluation practices, we identify a critical asymmetry: while AI assessments rigorously measure both benefits and harms of intelligence, they focus almost exclusively on cultural harms. We lack frameworks for articulating how cultural outputs might be actively beneficial. Drawing on insights from the humanities, we propose "thick entertainment" as a framework for evaluating AI-generated cultural content -- one that considers entertainment's role in meaning-making, identity formation, and social connection rather than simply minimizing harm. While AI is often touted for its potential to revolutionize productivity, in the long run we may find that AI turns out to be as much about "intelligence" as social media is about social connection.
Why are we recommending this paper?
Because ai agents is a popular topic and you have fewer than 3 interests with available recommendations
MIT
AI Insights
  • The field of Agentic AI is rapidly evolving, with researchers exploring various architectures, protocols, and design challenges. [3]
  • Agentic AI has the potential to revolutionize industries such as healthcare, finance, and education by automating complex tasks and decision-making processes. [3]
  • Contract Net Protocol: A high-level communication and control protocol for distributed problem solvers. [3]
  • Agentic AI refers to a new generation of artificial intelligence that can reason, plan, and make decisions on its own. [2]
  • Formal methods are being used to develop a rigorous understanding of Agentic AI systems and their behavior. [1]
Abstract
The Contract Net Protocol (1980) introduced coordination through contracts in multi-agent systems. Modern agent protocols standardize connectivity and interoperability; yet, none provide formal, resource governance-normative mechanisms to bound how much agents may consume or how long they may operate. We introduce Agent Contracts, a formal framework that extends the contract metaphor from task allocation to resource-bounded execution. An Agent Contract unifies input/output specifications, multi-dimensional resource constraints, temporal boundaries, and success criteria into a coherent governance mechanism with explicit lifecycle semantics. For multi-agent coordination, we establish conservation laws ensuring delegated budgets respect parent constraints, enabling hierarchical coordination through contract delegation. Empirical validation across four experiments demonstrates 90% token reduction with 525x lower variance in iterative workflows, zero conservation violations in multi-agent delegation, and measurable quality-resource tradeoffs through contract modes. Agent Contracts provide formal foundations for predictable, auditable, and resource-bounded autonomous AI deployment.
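The conservation law described in the abstract, that delegated budgets must respect the parent's constraints, can be made concrete with a small sketch. The `Contract` class and its fields below are our illustration of the idea, not the paper's actual API.

```python
# Minimal sketch of hierarchical budget delegation with a conservation
# check: a child contract may never receive more budget than its parent
# has left. Names (Contract, delegate) are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Contract:
    token_budget: int                  # total tokens this contract may consume
    spent: int = 0                     # tokens already consumed locally
    children: list = field(default_factory=list)

    @property
    def remaining(self) -> int:
        # Budget left after local spend and all child delegations.
        delegated = sum(c.token_budget for c in self.children)
        return self.token_budget - self.spent - delegated

    def delegate(self, child_budget: int) -> "Contract":
        # Conservation law: a child's budget must fit within the
        # parent's remaining budget, so the hierarchy cannot overspend.
        if child_budget > self.remaining:
            raise ValueError("delegation would violate budget conservation")
        child = Contract(token_budget=child_budget)
        self.children.append(child)
        return child

root = Contract(token_budget=1000)
a = root.delegate(400)
b = root.delegate(300)
# root.remaining is now 300; a further delegate(400) would raise.
```

Because every delegation is checked against the parent's remaining budget, no sequence of delegations can exceed the root allocation, which is the "zero conservation violations" property the abstract reports.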
Why are we recommending this paper?
Because ai agents is a popular topic and you have fewer than 3 interests with available recommendations
New York Law School
AI Insights
  • The concept of artificial intelligence (AI) and its impact on society is a complex issue that requires careful consideration. [3]
  • The concept of algorithmic impact assessments (AIAs) has been proposed as a way to evaluate the potential effects of AI on society. [3]
  • Data protection impact assessments (DPIAs) have been implemented in some jurisdictions to evaluate the potential effects of data processing on individuals and society. [3]
  • The use of AI in decision-making processes can lead to biases and discrimination, which can have serious consequences for individuals and communities. [2]
Abstract
This essay examines how judicial review should adapt to address challenges posed by artificial intelligence decision-making, particularly regarding minority rights and interests. As I argue in this essay, the rise of three trends-privatization, prediction, and automation in AI-have combined to pose similar risks to minorities. Here, I outline what a theory of judicial review would look like in an era of artificial intelligence, analyzing both the limitations and the possibilities of judicial review of AI. I draw on cases in which AI decision-making has been challenged in courts, to show how concepts of due process and equal protection can be recuperated in a modern AI era, and even integrated into AI, to provide for better oversight and accountability, offering a framework for judicial review in the AI era that protects minorities from algorithmic discrimination.
Why are we recommending this paper?
Because ai and society is a popular topic and you have fewer than 3 interests with available recommendations
University of Science and Technology of China
Paper visualization
Abstract
Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on rigid, predefined workflows that struggle with complex, conditional queries. To address this limitation, we propose PaperScout, an autonomous agent that reformulates paper search as a sequential decision-making process. Unlike static workflows, PaperScout dynamically decides whether, when, and how to invoke search and expand tools based on accumulated retrieval context. However, training such agents presents a fundamental challenge: standard reinforcement learning methods, typically designed for single-turn tasks, suffer from a granularity mismatch when applied to multi-turn agentic tasks, where token-level optimization diverges from the granularity of sequence-level interactions, leading to noisy credit assignment. We introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent-environment interaction. Comprehensive experiments on both synthetic and real-world benchmarks demonstrate that PaperScout significantly outperforms strong workflow-driven and RL baselines in both recall and relevance, validating the effectiveness of our adaptive agentic framework and optimization strategy.
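The granularity mismatch the abstract describes can be illustrated by contrasting a per-token clipped objective with one that forms a single ratio per sequence. This sketch is our reading of the general idea behind sequence-level optimization, not PSPO's actual update rule.

```python
# Hedged sketch of the token-level vs sequence-level granularity issue:
# a PPO-style update clips each token's importance ratio independently,
# while a sequence-level variant forms one ratio per agent turn from the
# summed log-probabilities. Illustrative only, not PSPO itself.
import math

def token_level_objective(logp_new, logp_old, advantage, eps=0.2):
    # One clipped term per token; credit is assigned token by token.
    total = 0.0
    for ln, lo in zip(logp_new, logp_old):
        ratio = math.exp(ln - lo)
        clipped = max(min(ratio, 1 + eps), 1 - eps)
        total += min(ratio * advantage, clipped * advantage)
    return total / len(logp_new)

def sequence_level_objective(logp_new, logp_old, advantage, eps=0.2):
    # One ratio for the whole sequence, matching the granularity at
    # which the agent actually interacts with its environment.
    ratio = math.exp(sum(logp_new) - sum(logp_old))
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)
```

When the new and old policies agree, both objectives reduce to the raw advantage; they diverge when per-token ratios fluctuate, which is where token-level clipping can assign noisy credit across a multi-turn trajectory.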
Why are we recommending this paper?
Because research automation with ai is a popular topic and you have fewer than 3 interests with available recommendations
Abstract
Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotation-intensive task construction, rely on static evaluation dimensions, or fail to reliably verify facts when citations are missing. To bridge these gaps, we introduce DeepResearchEval, an automated framework for deep research task construction and agentic evaluation. For task construction, we propose a persona-driven pipeline generating realistic, complex research tasks anchored in diverse user profiles, applying a two-stage filter (Task Qualification and Search Necessity) to retain only tasks requiring multi-source evidence integration and external retrieval. For evaluation, we propose an agentic pipeline with two components: an Adaptive Point-wise Quality Evaluation that dynamically derives task-specific evaluation dimensions, criteria, and weights conditioned on each generated task, and an Active Fact-Checking component that autonomously extracts and verifies report statements via web search, even when citations are missing.
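Once task-specific dimensions and weights are derived, aggregation reduces to a weighted score. The dimensions and numbers below are made up for illustration and are not taken from DeepResearchEval.

```python
# Toy sketch of point-wise quality aggregation: each task gets its own
# evaluation dimensions and weights, and the final score is a
# normalized weighted sum. Dimension names are illustrative.
def weighted_quality(scores, weights):
    # Normalize by the total weight so scores from tasks with
    # different weight scales remain comparable.
    total = sum(weights.values())
    return sum(scores[d] * weights[d] for d in weights) / total

# A research-heavy task might weight evidence coverage and factuality
# most heavily (hypothetical values).
weights = {"coverage": 0.4, "factuality": 0.4, "presentation": 0.2}
scores  = {"coverage": 0.7, "factuality": 0.9, "presentation": 0.8}
quality = weighted_quality(scores, weights)  # 0.8
```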
Why are we recommending this paper?
Because research automation with ai is a popular topic and you have fewer than 3 interests with available recommendations
Southern University of Science and Technology SUSTech
AI Insights
  • It's not about AGI introducing new objectives or conflicts, but rather amplifying existing ones by compressing timescales and eroding institutional frictions. [3]
  • AGI can be seen as a powerful amplifier of human strategies, incentives, and institutional incoherence, rather than an alien adversary acting against humanity. [3]
  • AGI (Artificial General Intelligence): a hypothetical AI system capable of performing any intellectual task that humans can. [3]
  • The concept of existential risk associated with artificial general intelligence (AGI) is often misunderstood. [2]
Abstract
Recent reports of large language models (LLMs) exhibiting behaviors such as deception, threats, or blackmail are often interpreted as evidence of alignment failure or emergent malign agency. We argue that this interpretation rests on a conceptual error. LLMs do not reason morally; they statistically internalize the record of human social interaction, including laws, contracts, negotiations, conflicts, and coercive arrangements. Behaviors commonly labeled as unethical or anomalous are therefore better understood as structural generalizations of interaction regimes that arise under extreme asymmetries of power, information, or constraint. Drawing on relational models theory, we show that practices such as blackmail are not categorical deviations from normal social behavior, but limiting cases within the same continuum that includes market pricing, authority relations, and ultimatum bargaining. The surprise elicited by such outputs reflects an anthropomorphic expectation that intelligence should reproduce only socially sanctioned behavior, rather than the full statistical landscape of behaviors humans themselves enact. Because human morality is plural, context-dependent, and historically contingent, the notion of a universally moral artificial intelligence is ill-defined. We therefore reframe concerns about artificial general intelligence (AGI). The primary risk is not adversarial intent, but AGI's role as an endogenous amplifier of human intelligence, power, and contradiction. By eliminating longstanding cognitive and institutional frictions, AGI compresses timescales and removes the historical margin of error that has allowed inconsistent values and governance regimes to persist without collapse. Alignment failure is thus structural, not accidental, and requires governance approaches that address amplification, complexity, and regime stability rather than model-level intent alone.
Why are we recommending this paper?
Because agi: artificial general intelligence is a popular topic and you have fewer than 3 interests with available recommendations
Hyntel, Inc
AI Insights
  • SANC(E3) formalizes general intelligence as an axiomatic framework of an event-representation system oriented toward E3 minimization under finite capacity. [2]
  • Cognition as completion: Prediction, dialogue, authoring, perception, causal inference, and embodied action as instances of Gestalt completion. [1]
  • E3: An energy functional representing the trade-off between reconstruction, compression, and stability. [0]
Abstract
General intelligence must reorganize experience into internal structures that enable prediction and action under finite resources. Existing systems implicitly presuppose fixed primitive units -- tokens, subwords, pixels, or predefined sensor channels -- thereby bypassing the question of how representational units themselves emerge and stabilize. This paper proposes SANC(E3), an axiomatic framework in which representational units are not given a priori but instead arise as stable outcomes of competitive selection, reconstruction, and compression under finite activation capacity, governed by the explicit minimization of an energy functional E3. SANC(E3) draws a principled distinction between system tokens -- structural anchors such as {here, now, I} and sensory sources -- and tokens that emerge through self-organization during co-occurring events. Five core axioms formalize finite capacity, association from co-occurrence, similarity-based competition, confidence-based stabilization, and the reconstruction-compression-update trade-off. A key feature is a pseudo-memory-mapped I/O mechanism, through which internally replayed Gestalts are processed via the same axiomatic pathway as external sensory input. As a result, perception, imagination, prediction, planning, and action are unified within a single representational and energetic process. From the axioms, twelve propositions are derived, showing that category formation, hierarchical organization, unsupervised learning, and high-level cognitive activities can all be understood as instances of Gestalt completion under E3 minimization.
Why are we recommending this paper?
Because agi: artificial general intelligence is a popular topic and you have fewer than 3 interests with available recommendations
National University of Singapore
AI Insights
  • The document describes a system called DR-Arena, which is designed to evaluate the performance of search agents. [3]
  • The system generates complex research tasks and evaluates the responses from two search agents based on their accuracy, comprehensiveness, formatting, and helpfulness. [3]
  • The document does not provide information about how DR-Arena handles cases where both search agents fail to find the correct entity. [2]
Abstract
As Large Language Models (LLMs) increasingly operate as Deep Research (DR) Agents capable of autonomous investigation and information synthesis, reliable evaluation of their task performance has become a critical bottleneck. Current benchmarks predominantly rely on static datasets, which suffer from several limitations: limited task generality, temporal misalignment, and data contamination. To address these, we introduce DR-Arena, a fully automated evaluation framework that pushes DR agents to their capability limits through dynamic investigation. DR-Arena constructs real-time Information Trees from fresh web trends to ensure the evaluation rubric is synchronized with the live world state, and employs an automated Examiner to generate structured tasks testing two orthogonal capabilities: Deep reasoning and Wide coverage. DR-Arena further adopts an Adaptive Evolvement Loop, a state-machine controller that dynamically escalates task complexity based on real-time performance, demanding deeper deduction or wider aggregation until a decisive capability boundary emerges. Experiments with six advanced DR agents demonstrate that DR-Arena achieves a Spearman correlation of 0.94 with the LMSYS Search Arena leaderboard. This represents state-of-the-art alignment with human preferences without any manual effort, validating DR-Arena as a reliable alternative to costly human adjudication.
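The headline number is a Spearman rank correlation between two leaderboards. A minimal no-ties implementation makes the metric concrete; the agent scores below are made up for illustration, not taken from the paper.

```python
# Spearman rank correlation without ties: rank both score lists, then
# compute the Pearson correlation of the ranks. Scores are hypothetical.
def spearman(xs, ys):
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean = (n - 1) / 2.0
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var  # with no ties this equals Pearson on the ranks

# Hypothetical leaderboard Elo scores and benchmark scores for six agents.
arena_scores    = [1230, 1185, 1150, 1122, 1101, 1080]
dr_arena_scores = [0.91, 0.84, 0.86, 0.71, 0.66, 0.60]
rho = spearman(arena_scores, dr_arena_scores)
```

A single swapped pair in one ranking (as in the hypothetical scores above) still yields a correlation above 0.9, which is why a 0.94 agreement over six agents indicates near-identical orderings.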
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations
Niigata University
AI Insights
  • The double descent phenomenon refers to the observation that as model size and training data increase, the generalization error of a model can first decrease and then increase again, before decreasing once more. [3]
  • This phenomenon has been observed in various contexts, including linear regression, neural networks, and graph convolutional networks. [3]
  • The double descent curve is characterized by three phases: underfitting, overfitting, and double descent. [3]
  • In the underfitting phase, the model is too simple to capture the underlying patterns in the data, leading to poor generalization performance. [3]
  • In the overfitting phase, the model is too complex and captures noise in the data, also leading to poor generalization performance. [3]
  • The double descent phase occurs once the model grows beyond a certain size, at which point it begins to capture the underlying patterns again despite its increased capacity to overfit. [3]
  • Proposed explanations include the bias-variance trade-off, the effect of noise on fitting linear regression models, and the role of regularization in mitigating double descent. [3]
  • Proposed mitigations include optimal regularization, early stopping, and multi-scale feature learning dynamics, each aiming to balance model capacity against the ability to generalize to new data. [3]
  • The study of double descent highlights the trade-offs between model complexity and generalization performance, with significant implications for how models are designed in machine learning research and practice. [3]
Abstract
Deep double descent is one of the key phenomena underlying the generalization capability of deep learning models. In this study, epoch-wise double descent, which is delayed generalization following overfitting, was empirically investigated by focusing on the evolution of internal structures. Fully connected neural networks of three different sizes were trained on the CIFAR-10 dataset with 30% label noise. By decomposing the loss curves into signal contributions from clean and noisy training data, the epoch-wise evolutions of internal signals were analyzed separately. Three main findings were obtained from this analysis. First, the model achieved strong re-generalization on test data even after perfectly fitting noisy training data during the double descent phase, corresponding to a "benign overfitting" state. Second, noisy data were learned after clean data, and as learning progressed, their corresponding internal activations became increasingly separated in outer layers; this enabled the model to overfit only noisy data. Third, a single, very large activation emerged in the shallow layer across all models; this phenomenon is referred to as "outliers," "massive activations," and "super activations" in recent large language models and evolves with re-generalization. The magnitude of the large activation correlated with input patterns but not with output patterns. These empirical findings directly link the recent key phenomena of "deep double descent," "benign overfitting," and "large activation," and support the proposal of a novel scenario for understanding deep double descent.
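The decomposition of the loss into clean and noisy contributions amounts to simple bookkeeping over the label-noise mask. A toy sketch of that bookkeeping follows; the function and variable names are illustrative, not from the paper's code.

```python
# Split the per-example cross-entropy loss by a label-noise mask so the
# clean and noisy contributions can be tracked separately per epoch.
import math

def cross_entropy(p, label):
    # Negative log-probability assigned to the (possibly flipped) label.
    return -math.log(p[label])

def decomposed_loss(probs, labels, is_noisy):
    clean, noisy = [], []
    for p, y, flag in zip(probs, labels, is_noisy):
        (noisy if flag else clean).append(cross_entropy(p, y))
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {"clean": avg(clean), "noisy": avg(noisy)}

# Tiny illustrative batch: the third example's label was flipped.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
is_noisy = [False, False, True]
losses = decomposed_loss(probs, labels, is_noisy)
```

Plotting the two averages across epochs is what reveals that noisy examples are fit later than clean ones, the second finding above.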
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations
NVIDIA
AI Insights
  • The paper discusses methods for attributing model behavior in differentiable games. [3]
  • It highlights the importance of understanding how models learn concepts through concept-level attribution. [3]
  • The authors emphasize the need to benchmark and improve video diffusion transformers for motion transfer. [3]
  • Differentiable games: games in which the model's behavior can be attributed to specific inputs or actions. [3]
  • Concept-level attribution: a method for understanding what a model has learned by attributing its behavior to specific inputs or actions. [3]
  • The authors also discuss scalable nested optimization as a key technique for efficient training of deep learning models. [3]
Abstract
Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.
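A "motion-weighted loss mask" weights the reconstruction loss by how much each pixel changes between frames, so that influence scores reflect temporal dynamics rather than static appearance. The toy sketch below conveys the idea; the specific weighting scheme is our assumption, not Motive's exact formulation.

```python
# Toy 1-D "frames": weight each pixel's squared error by its
# normalized inter-frame change, so static regions contribute nothing.
def motion_weights(prev_frame, next_frame, eps=1e-8):
    diffs = [abs(a - b) for a, b in zip(prev_frame, next_frame)]
    total = sum(diffs) + eps
    return [d / total for d in diffs]

def motion_weighted_mse(pred, target, weights):
    return sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target))

prev = [0.0, 0.0, 1.0, 1.0]
nxt  = [0.0, 0.5, 1.0, 0.0]   # only pixels 1 and 3 moved
w = motion_weights(prev, nxt)
loss = motion_weighted_mse([0.1, 0.4, 1.0, 0.2], nxt, w)
```

Because static pixels receive zero weight, gradients of this loss (and hence any gradient-based influence computed from it) are driven entirely by the regions that move.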
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations
University of Cambridge
AI Insights
  • The model takes a dense camera trajectory and a reference image as input and outputs the number of sparse keyframes required. [3]
  • DINOv2 encoder: A type of neural network used for encoding images. [3]
  • transformer: A type of neural network used for processing sequential data. [3]
  • MLP: Multi-Layer Perceptron, a type of feedforward neural network. [3]
  • The method is designed to improve the efficiency and quality of video generation by adaptively selecting keyframes. [3]
  • The model can be trained on large datasets and fine-tuned for specific applications. [3]
  • The method requires a large dataset of videos with annotated keyframes to train the model. [3]
  • The paper presents a method for adaptive keyframe density prediction in video generation. [2]
Abstract
Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical barrier to deploying generative video in applications that require real-time interactions, such as embodied AI and VR/AR. This paper explores a new strategy for camera-conditioned video generation of static scenes: using diffusion-based generative models to generate a sparse set of keyframes, and then synthesizing the full video through 3D reconstruction and rendering. By lifting keyframes into a 3D representation and rendering intermediate views, our approach amortizes the generation cost across hundreds of frames while enforcing geometric consistency. We further introduce a model that predicts the optimal number of keyframes for a given camera trajectory, allowing the system to adaptively allocate computation. Our final method, SRENDER, uses very sparse keyframes for simple trajectories and denser ones for complex camera motion. This results in video generation that is more than 40 times faster than the diffusion-based baseline in generating 20 seconds of video, while maintaining high visual fidelity and temporal stability, offering a practical path toward efficient and controllable video synthesis.
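The idea of adaptively allocating keyframes to a camera trajectory can be made concrete with a simple stand-in heuristic: sample more keyframes where the camera turns sharply. This heuristic is purely our illustration; SRENDER instead learns the prediction with a DINOv2 encoder, a transformer, and an MLP.

```python
# Illustrative heuristic: a straight trajectory gets a small base
# number of keyframes, and each unit of accumulated turning adds more.
import math

def keyframe_count(trajectory, base=4, per_unit_turn=2.0):
    # trajectory: list of (x, y) camera positions.
    turn = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(
            trajectory, trajectory[1:], trajectory[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)
        h2 = math.atan2(y2 - y1, x2 - x1)
        d = abs(h2 - h1)
        turn += min(d, 2 * math.pi - d)  # wrap the angle difference
    return base + int(per_unit_turn * turn)

straight = [(float(i), 0.0) for i in range(10)]   # no turning
zigzag = [(float(i), float(i % 2)) for i in range(10)]  # constant turning
```

With `base=4`, the straight path stays at the minimum keyframe count while the zigzag path requests many more, mirroring the paper's sparse-keyframes-for-simple-trajectories behavior.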
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • Human in the Loop
  • Best practices for human in the loop
You can edit or add more interests any time.