Hi!
Your personalized paper recommendations for 12–16 January 2026.
MIT
AI Insights - The field of Agentic AI is rapidly evolving, with researchers exploring various architectures, protocols, and design challenges. [3]
- Agentic AI has the potential to revolutionize industries such as healthcare, finance, and education by automating complex tasks and decision-making processes. [3]
- Contract Net Protocol: A high-level communication and control protocol for distributed problem solvers. [3]
- Agentic AI refers to a new generation of artificial intelligence that can reason, plan, and make decisions on its own. [2]
- Formal methods are being used to develop a rigorous understanding of Agentic AI systems and their behavior. [1]
Abstract
The Contract Net Protocol (1980) introduced coordination through contracts in multi-agent systems. Modern agent protocols standardize connectivity and interoperability, yet none provide formal, normative resource-governance mechanisms to bound how much agents may consume or how long they may operate. We introduce Agent Contracts, a formal framework that extends the contract metaphor from task allocation to resource-bounded execution. An Agent Contract unifies input/output specifications, multi-dimensional resource constraints, temporal boundaries, and success criteria into a coherent governance mechanism with explicit lifecycle semantics. For multi-agent coordination, we establish conservation laws ensuring delegated budgets respect parent constraints, enabling hierarchical coordination through contract delegation. Empirical validation across four experiments demonstrates 90% token reduction with 525x lower variance in iterative workflows, zero conservation violations in multi-agent delegation, and measurable quality-resource tradeoffs through contract modes. Agent Contracts provide formal foundations for predictable, auditable, and resource-bounded autonomous AI deployment.
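The budget "conservation law" described in the abstract can be pictured with a small sketch: any tokens a parent contract delegates to children must never exceed the parent's own budget. The names here (`Contract`, `delegate`, `token_budget`) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of hierarchical budget conservation: the sum of child
# budgets may never exceed the parent's budget. Illustrative names only.
from dataclasses import dataclass, field

@dataclass
class Contract:
    token_budget: int
    children: list = field(default_factory=list)

    def delegate(self, tokens: int) -> "Contract":
        # Conservation check: already-allocated child budgets plus the new
        # delegation must fit inside the parent's budget.
        allocated = sum(c.token_budget for c in self.children)
        if allocated + tokens > self.token_budget:
            raise ValueError("delegation would violate parent budget")
        child = Contract(token_budget=tokens)
        self.children.append(child)
        return child

parent = Contract(token_budget=1000)
a = parent.delegate(600)
b = parent.delegate(300)
# parent.delegate(200)  # would raise: 600 + 300 + 200 > 1000
```

Enforcing the invariant at delegation time is what makes "zero conservation violations" checkable at runtime rather than only in post-hoc audits.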
Why are we recommending this paper?
Because AI agents is a popular topic and you have fewer than three interests with available recommendations.
This paper's focus on resource governance for autonomous AI systems directly addresses concerns about job displacement and the potential for AGI to consume resources. The formal framework presented offers a crucial approach to managing the impacts of increasingly capable AI agents.
Southern University of Science and Technology SUSTech
AI Insights - It's not about AGI introducing new objectives or conflicts, but rather amplifying existing ones by compressing timescales and eroding institutional frictions. [3]
- AGI can be seen as a powerful amplifier of human strategies, incentives, and institutional incoherence, rather than an alien adversary acting against humanity. [3]
- AGI (Artificial General Intelligence): a hypothetical AI system capable of performing any intellectual task that humans can. [3]
- The concept of existential risk associated with artificial general intelligence (AGI) is often misunderstood. [2]
Abstract
Recent reports of large language models (LLMs) exhibiting behaviors such as deception, threats, or blackmail are often interpreted as evidence of alignment failure or emergent malign agency. We argue that this interpretation rests on a conceptual error. LLMs do not reason morally; they statistically internalize the record of human social interaction, including laws, contracts, negotiations, conflicts, and coercive arrangements. Behaviors commonly labeled as unethical or anomalous are therefore better understood as structural generalizations of interaction regimes that arise under extreme asymmetries of power, information, or constraint. Drawing on relational models theory, we show that practices such as blackmail are not categorical deviations from normal social behavior, but limiting cases within the same continuum that includes market pricing, authority relations, and ultimatum bargaining. The surprise elicited by such outputs reflects an anthropomorphic expectation that intelligence should reproduce only socially sanctioned behavior, rather than the full statistical landscape of behaviors humans themselves enact. Because human morality is plural, context-dependent, and historically contingent, the notion of a universally moral artificial intelligence is ill-defined. We therefore reframe concerns about artificial general intelligence (AGI). The primary risk is not adversarial intent, but AGI's role as an endogenous amplifier of human intelligence, power, and contradiction. By eliminating longstanding cognitive and institutional frictions, AGI compresses timescales and removes the historical margin of error that has allowed inconsistent values and governance regimes to persist without collapse. Alignment failure is thus structural, not accidental, and requires governance approaches that address amplification, complexity, and regime stability rather than model-level intent alone.
Why are we recommending this paper?
Because AGI (artificial general intelligence) is a popular topic and you have fewer than three interests with available recommendations.
Coming from Southern University of Science and Technology (SUSTech), this paper tackles the fundamental issue of alignment failure by proposing a structural explanation, which is highly relevant to the user's interest in AGI development and safety. It challenges the common interpretation of LLM behavior, offering a new perspective on the potential risks of AGI.
National University of Singapore
AI Insights - The document describes a system called DR-Arena, which is designed to evaluate the performance of search agents. [3]
- The system generates complex research tasks and evaluates the responses from two search agents based on their accuracy, comprehensiveness, formatting, and helpfulness. [3]
- DR-Arena is a system that tests search agents' abilities by giving them complex research tasks. [3]
- The system evaluates the answers from two search agents based on how accurate, comprehensive, and helpful they are. [3]
- The document does not specify how the system handles cases where both search agents fail to find the correct entity. [2]
Abstract
As Large Language Models (LLMs) increasingly operate as Deep Research (DR) Agents capable of autonomous investigation and information synthesis, reliable evaluation of their task performance has become a critical bottleneck. Current benchmarks predominantly rely on static datasets, which suffer from several limitations: limited task generality, temporal misalignment, and data contamination. To address these, we introduce DR-Arena, a fully automated evaluation framework that pushes DR agents to their capability limits through dynamic investigation. DR-Arena constructs real-time Information Trees from fresh web trends to ensure the evaluation rubric is synchronized with the live world state, and employs an automated Examiner to generate structured tasks testing two orthogonal capabilities: Deep reasoning and Wide coverage. DR-Arena further adopts Adaptive Evolvement Loop, a state-machine controller that dynamically escalates task complexity based on real-time performance, demanding deeper deduction or wider aggregation until a decisive capability boundary emerges. Experiments with six advanced DR agents demonstrate that DR-Arena achieves a Spearman correlation of 0.94 with the LMSYS Search Arena leaderboard. This represents the state-of-the-art alignment with human preferences without any manual efforts, validating DR-Arena as a reliable alternative for costly human adjudication.
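The "Adaptive Evolvement Loop" described above escalates task complexity until the agent fails decisively. A toy sketch of that control logic, with an integer percentage score and a pass threshold as illustrative assumptions (the paper's actual state machine is not specified here):

```python
# Toy sketch of an escalation loop: raise difficulty until the agent fails,
# revealing its capability boundary. Threshold and scoring are assumptions.
def find_capability_boundary(agent, max_level=10, pass_score=70):
    """Return the highest difficulty level the agent still passes."""
    boundary = 0
    for level in range(1, max_level + 1):
        score = agent(level)      # agent attempts a task of this difficulty
        if score < pass_score:    # decisive failure: stop escalating
            break
        boundary = level
    return boundary

# Stand-in agent whose accuracy (%) degrades with difficulty:
mock_agent = lambda level: 100 - 10 * level
print(find_capability_boundary(mock_agent))  # → 3 (levels 1-3 pass at >= 70%)
```

The key design point is that the benchmark adapts per agent: stronger agents simply receive harder tasks until their boundary emerges, rather than saturating a fixed dataset.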
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than three interests with available recommendations.
This paper's focus on evaluating Deep Research Agents aligns with the user's interest in AGI applications and research. The development of a robust evaluation framework is critical for understanding and guiding the progress of AGI systems.
New York Law School
AI Insights - The concept of artificial intelligence (AI) and its impact on society is a complex issue that requires careful consideration. [3]
- The concept of algorithmic impact assessments (AIAs) has been proposed as a way to evaluate the potential effects of AI on society. [3]
- Data protection impact assessments (DPIAs) have been implemented in some jurisdictions to evaluate the potential effects of data processing on individuals and society. [3]
- The use of AI in decision-making processes can lead to biases and discrimination, which can have serious consequences for individuals and communities. [2]
Abstract
This essay examines how judicial review should adapt to address challenges posed by artificial intelligence decision-making, particularly regarding minority rights and interests. As I argue in this essay, the rise of three trends-privatization, prediction, and automation in AI-have combined to pose similar risks to minorities. Here, I outline what a theory of judicial review would look like in an era of artificial intelligence, analyzing both the limitations and the possibilities of judicial review of AI. I draw on cases in which AI decision-making has been challenged in courts, to show how concepts of due process and equal protection can be recuperated in a modern AI era, and even integrated into AI, to provide for better oversight and accountability, offering a framework for judicial review in the AI era that protects minorities from algorithmic discrimination.
Why are we recommending this paper?
Because AI and society is a popular topic and you have fewer than three interests with available recommendations.
Addressing the societal implications of AI, particularly concerning minority rights and judicial review, directly resonates with the user's interest in job displacement and broader changes in the labor market. The essay's exploration of trust and governance is highly pertinent.
University of Science and Technology of China
Abstract
Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on rigid, predefined workflows that struggle with complex, conditional queries. To address this limitation, we propose PaperScout, an autonomous agent that reformulates paper search as a sequential decision-making process. Unlike static workflows, PaperScout dynamically decides whether, when, and how to invoke search and expand tools based on accumulated retrieval context. However, training such agents presents a fundamental challenge: standard reinforcement learning methods, typically designed for single-turn tasks, suffer from a granularity mismatch when applied to multi-turn agentic tasks, where token-level optimization diverges from the granularity of sequence-level interactions, leading to noisy credit assignment. We introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent-environment interaction. Comprehensive experiments on both synthetic and real-world benchmarks demonstrate that PaperScout significantly outperforms strong workflow-driven and RL baselines in both recall and relevance, validating the effectiveness of our adaptive agentic framework and optimization strategy.
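The granularity mismatch the abstract describes can be made concrete: PPO-style methods form one importance ratio per token, while a sequence-level method forms one ratio per whole interaction turn. The sketch below only illustrates that granularity shift in a PPO-style clipped objective; it is not PSPO's actual objective, and all names are assumptions.

```python
# Hedged sketch: one clipped ratio per *sequence* (the unit of
# agent-environment interaction) built from summed token log-probs,
# instead of one ratio per token. Not the paper's actual method.
import numpy as np

def sequence_ppo_loss(logp_new, logp_old, advantage, clip=0.2):
    """Clipped surrogate loss with a single sequence-level ratio."""
    ratio = np.exp(np.sum(logp_new) - np.sum(logp_old))   # sequence-level ratio
    clipped = np.clip(ratio, 1 - clip, 1 + clip)
    return -min(ratio * advantage, clipped * advantage)   # pessimistic bound

logp_old = np.array([-1.0, -0.5, -0.7])   # token log-probs under old policy
logp_new = np.array([-0.9, -0.4, -0.7])   # token log-probs under new policy
print(sequence_ppo_loss(logp_new, logp_old, advantage=1.0))
```

Assigning credit at the sequence level keeps the optimization unit aligned with the reward unit (one outcome per turn), which is the noise-reduction argument the abstract makes.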
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than three interests with available recommendations.
This paper's focus on autonomous agents for academic research aligns with the user's interest in AGI research and applications. The development of intelligent search tools is a key area of exploration for AGI development.
Sapienza University of Rome
AI Insights - Multi-agent systems can exhibit emergent behaviors that are not predictable from individual agent behavior. [3]
- Agentic Alignment Drift: The phenomenon where individually aligned agents converge on unanticipated attractor states through repeated interaction and communication. [3]
- The paper discusses the limitations of individual model alignment in ensuring AI safety. [2]
Abstract
As LLM-based systems increasingly operate as agents embedded within human social and technical systems, alignment can no longer be treated as a property of an isolated model, but must be understood in relation to the environments in which these agents act. Even the most sophisticated methods of alignment, such as Reinforcement Learning from Human Feedback (RLHF) or from AI Feedback (RLAIF), cannot ensure control once internal goal structures diverge from developer intent. We identify three structural problems that emerge from core properties of AI models: (1) behavioral goal-independence, where models develop internal objectives and misgeneralize goals; (2) instrumental override of natural-language constraints, where models regard safety principles as non-binding while pursuing latent objectives, leveraging deception and manipulation; and (3) agentic alignment drift, where individually aligned agents converge to collusive equilibria through interaction dynamics invisible to single-agent audits. The solution this paper advances is Institutional AI: a system-level approach that treats alignment as a question of effective governance of AI agent collectives. We argue for a governance-graph that details how to constrain agents via runtime monitoring, incentive shaping through prizes and sanctions, explicit norms and enforcement roles. This institutional turn reframes safety from software engineering to a mechanism design problem, where the primary goal of alignment is shifting the payoff landscape of AI agent collectives.
Why are we recommending this paper?
Due to your interest in AGI Development.
Zhejiang University
AI Insights - The paper introduces a large language model ecosystem for agriculture called Agrigpt, which includes multiple models and tools for various agricultural tasks. [2]
Abstract
Intelligent agent systems in real-world agricultural scenarios must handle diverse tasks under multimodal inputs, ranging from lightweight information understanding to complex multi-step execution. However, most existing approaches rely on a unified execution paradigm, which struggles to accommodate large variations in task complexity and incomplete tool availability commonly observed in agricultural environments. To address this challenge, we propose AgriAgent, a two-level agent framework for real-world agriculture. AgriAgent adopts a hierarchical execution strategy based on task complexity: simple tasks are handled through direct reasoning by modality-specific agents, while complex tasks trigger a contract-driven planning mechanism that formulates tasks as capability requirements and performs capability-aware tool orchestration and dynamic tool generation, enabling multi-step and verifiable execution with failure recovery. Experimental results show that AgriAgent achieves higher execution success rates and robustness on complex tasks compared to existing tool-centric agent baselines that rely on unified execution paradigms. All code and data will be released upon acceptance to promote reproducible research.
Why are we recommending this paper?
Due to your interest in AGI Development.
The Alan Turing Institute
AI Insights - The authors contend that entertainment is a significant use case for AI, with people already using AI for activities unrelated to productivity. [3]
- The paper suggests that this vision should inspire more debates, discourse, and study in the field of AI, as generative AI is increasingly being used for entertainment. [3]
- AS: artificially generated content; GenAI: generative AI; sociotechnical systems: complex systems that combine social and technical components. The paper concludes by emphasizing the need for a constructive vision of cultural AI, rather than just harm minimization. [3]
- The paper argues that mainstream approaches to evaluating AI systems tend to focus on intelligence and harm minimization, but neglect the cultural dimension of AI use. [2]
- They propose developing a positive theory of what beneficial, nutritious entertainment might look like, rather than just mitigating harms. [0]
Abstract
Generative AI systems are predominantly designed, evaluated, and marketed as intelligent systems which will benefit society by augmenting or automating human cognitive labor, promising to increase personal, corporate, and macroeconomic productivity. But this mainstream narrative about what AI is and what it can do is in tension with another emerging use case: entertainment. We argue that the field of AI is unprepared to measure or respond to how the proliferation of entertaining AI-generated content will impact society. Emerging data suggest AI is already widely adopted for entertainment purposes -- especially by young people -- and represents a large potential source of revenue. We contend that entertainment will become a primary business model for major AI corporations seeking returns on massive infrastructure investments; this will exert a powerful influence on the technology these companies produce in the coming years. Examining current evaluation practices, we identify a critical asymmetry: while AI assessments rigorously measure both benefits and harms of intelligence, they focus almost exclusively on cultural harms. We lack frameworks for articulating how cultural outputs might be actively beneficial. Drawing on insights from the humanities, we propose "thick entertainment" as a framework for evaluating AI-generated cultural content -- one that considers entertainment's role in meaning-making, identity formation, and social connection rather than simply minimizing harm. While AI is often touted for its potential to revolutionize productivity, in the long run we may find that AI turns out to be as much about "intelligence" as social media is about social connection.
Why are we recommending this paper?
Because AI agents is a popular topic and you have fewer than three interests with available recommendations.
Abstract
Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotation-intensive task construction, rely on static evaluation dimensions, or fail to reliably verify facts when citations are missing. To bridge these gaps, we introduce DeepResearchEval, an automated framework for deep research task construction and agentic evaluation. For task construction, we propose a persona-driven pipeline generating realistic, complex research tasks anchored in diverse user profiles, applying a two-stage filter Task Qualification and Search Necessity to retain only tasks requiring multi-source evidence integration and external retrieval. For evaluation, we propose an agentic pipeline with two components: an Adaptive Point-wise Quality Evaluation that dynamically derives task-specific evaluation dimensions, criteria, and weights conditioned on each generated task, and an Active Fact-Checking that autonomously extracts and verifies report statements via web search, even when citations are missing.
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than three interests with available recommendations.
Hyntel, Inc
AI Insights - SANC(E3) formalizes general intelligence as an axiomatic framework of an event-representation system oriented toward E3 minimization under finite capacity. [2]
- Cognition as completion: Prediction, dialogue, authoring, perception, causal inference, and embodied action as instances of Gestalt completion. [1]
- E3: An energy functional representing the trade-off between reconstruction, compression, and stability. [0]
Abstract
General intelligence must reorganize experience into internal structures that enable prediction and action under finite resources. Existing systems implicitly presuppose fixed primitive units -- tokens, subwords, pixels, or predefined sensor channels -- thereby bypassing the question of how representational units themselves emerge and stabilize. This paper proposes SANC(E3), an axiomatic framework in which representational units are not given a priori but instead arise as stable outcomes of competitive selection, reconstruction, and compression under finite activation capacity, governed by the explicit minimization of an energy functional E3. SANC(E3) draws a principled distinction between system tokens -- structural anchors such as {here, now, I} and sensory sources -- and tokens that emerge through self-organization during co-occurring events. Five core axioms formalize finite capacity, association from co-occurrence, similarity-based competition, confidence-based stabilization, and the reconstruction-compression-update trade-off. A key feature is a pseudo-memory-mapped I/O mechanism, through which internally replayed Gestalts are processed via the same axiomatic pathway as external sensory input. As a result, perception, imagination, prediction, planning, and action are unified within a single representational and energetic process. From the axioms, twelve propositions are derived, showing that category formation, hierarchical organization, unsupervised learning, and high-level cognitive activities can all be understood as instances of Gestalt completion under E3 minimization.
Why are we recommending this paper?
Because AGI (artificial general intelligence) is a popular topic and you have fewer than three interests with available recommendations.
Niigata University
AI Insights - The double descent phenomenon in machine learning refers to the observation that as the model size and training data increase, the generalization error of a model can first decrease and then increase again. [3]
- This phenomenon has been observed in various contexts, including linear regression, neural networks, and graph convolutional networks. [3]
- The double descent curve is characterized by three phases: underfitting, overfitting, and double descent. [3]
- In the underfitting phase, the model is too simple to capture the underlying patterns in the data, leading to poor generalization performance. [3]
- In the overfitting phase, the model is too complex and captures noise in the data, also leading to poor generalization performance. [3]
- The double descent phase occurs when the model size increases beyond a certain point, causing the model to start capturing the underlying patterns in the data again, but with increased capacity for overfitting. [3]
- Proposed explanations include the bias-variance trade-off, the effect of noise on fitting linear regression models, and the role of regularization in mitigating double descent. [3]
- Researchers have also proposed various methods to mitigate or understand the double descent phenomenon, including optimal regularization, early stopping, and multi-scale feature learning dynamics. [3]
- These methods aim to balance the capacity of the model with its ability to generalize well to new data. [3]
- The study of double descent has significant implications for machine learning research and practice. [3]
- It highlights the importance of understanding the trade-offs between model complexity and generalization performance, and provides insights into how to design models that can generalize well to new data. [3]
- However, the existing literature provides valuable insights into this phenomenon and highlights the importance of continued investigation in this area. [3]
- Double Descent: A phenomenon where as the model size and training data increase, the generalization error of a model can first decrease and then increase again. [3]
- Underfitting: When a model is too simple to capture the underlying patterns in the data, leading to poor generalization performance. [3]
- Overfitting: When a model is too complex and captures noise in the data, also leading to poor generalization performance. [3]
- Double Descent Curve: A curve that characterizes the three phases of the double descent phenomenon: underfitting, overfitting, and double descent. [3]
- The double descent phenomenon has been studied extensively in recent years, and various explanations have been proposed. [1]
Abstract
Deep double descent is one of the key phenomena underlying the generalization capability of deep learning models. In this study, epoch-wise double descent, which is delayed generalization following overfitting, was empirically investigated by focusing on the evolution of internal structures. Fully connected neural networks of three different sizes were trained on the CIFAR-10 dataset with 30% label noise. By decomposing the loss curves into signal contributions from clean and noisy training data, the epoch-wise evolutions of internal signals were analyzed separately. Three main findings were obtained from this analysis. First, the model achieved strong re-generalization on test data even after perfectly fitting noisy training data during the double descent phase, corresponding to a "benign overfitting" state. Second, noisy data were learned after clean data, and as learning progressed, their corresponding internal activations became increasingly separated in outer layers; this enabled the model to overfit only noisy data. Third, a single, very large activation emerged in the shallow layer across all models; this phenomenon is referred to as "outliers," "massive activations," and "super activations" in recent large language models and evolves with re-generalization. The magnitude of the large activation correlated with input patterns but not with output patterns. These empirical findings directly link the recent key phenomena of "deep double descent," "benign overfitting," and "large activation," and support the proposal of a novel scenario for understanding deep double descent.
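The loss decomposition the study relies on is simple to state: split the per-example training loss into contributions from clean-labeled and noise-labeled examples, and track each separately over epochs. A minimal sketch, where the data and mask are placeholders (the paper's actual setup uses CIFAR-10 with 30% label noise):

```python
# Illustrative sketch of decomposing training loss into clean vs. noisy
# contributions. Mid-training, clean loss typically drops first while the
# loss on noise-labeled examples stays high.
import numpy as np

def decompose_loss(per_example_loss, noise_mask):
    """Mean loss over the clean and noisy subsets of the training set."""
    per_example_loss = np.asarray(per_example_loss, dtype=float)
    noise_mask = np.asarray(noise_mask, dtype=bool)
    clean_loss = per_example_loss[~noise_mask].mean()
    noisy_loss = per_example_loss[noise_mask].mean()
    return clean_loss, noisy_loss

losses = [0.1, 0.2, 1.5, 0.15, 2.0]
mask   = [False, False, True, False, True]   # ~ labels flipped for 2 of 5
clean, noisy = decompose_loss(losses, mask)
print(clean, noisy)  # clean ≈ 0.15, noisy ≈ 1.75
```

Plotting these two curves per epoch is what separates "noisy data learned later" from a single aggregate training-loss curve, which would hide the effect.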
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than three interests with available recommendations.
NVIDIA
AI Insights - The paper discusses methods for attributing model behavior in differentiable games. [3]
- It highlights the importance of understanding how models learn concepts through concept-level attribution. [3]
- It emphasizes the need for benchmarking and improving video diffusion transformers for motion transfer. [3]
- Differentiable games: games whose loss functions are differentiable, enabling gradient-based analysis of how model behavior depends on specific inputs or actions. [3]
- Concept-level attribution: a method for understanding how models learn concepts by attributing their behavior to specific inputs or actions. [3]
- The authors also discuss scalable nested optimization as a key technique for efficient training of deep learning models. [3]
Abstract
Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than three interests with available recommendations.
University of Cambridge
AI Insights - The model takes a dense camera trajectory and a reference image as input and outputs the number of sparse keyframes required. [3]
- DINOv2 encoder: A type of neural network used for encoding images. [3]
- transformer: A type of neural network used for processing sequential data. [3]
- MLP: Multi-Layer Perceptron, a type of feedforward neural network. [3]
- The method is designed to improve the efficiency and quality of video generation by adaptively selecting keyframes. [3]
- The model can be trained on large datasets and fine-tuned for specific applications. [3]
- The method requires a large dataset of videos with annotated keyframes to train the model. [3]
- The paper presents a method for adaptive keyframe density prediction in video generation. [2]
Abstract
Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical barrier to deploying generative video in applications that require real-time interactions, such as embodied AI and VR/AR. This paper explores a new strategy for camera-conditioned video generation of static scenes: using diffusion-based generative models to generate a sparse set of keyframes, and then synthesizing the full video through 3D reconstruction and rendering. By lifting keyframes into a 3D representation and rendering intermediate views, our approach amortizes the generation cost across hundreds of frames while enforcing geometric consistency. We further introduce a model that predicts the optimal number of keyframes for a given camera trajectory, allowing the system to adaptively allocate computation. Our final method, SRENDER, uses very sparse keyframes for simple trajectories and denser ones for complex camera motion. This results in video generation that is more than 40 times faster than the diffusion-based baseline in generating 20 seconds of video, while maintaining high visual fidelity and temporal stability, offering a practical path toward efficient and controllable video synthesis.
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than three interests with available recommendations.
We did not find much content matching your interests, so we have included some additional popular topics.
Also be aware that if a topic is not present on arXiv, we won't be able to recommend it.
The Alan Turing Institute
AI Insights - The authors contend that entertainment is a significant use case for AI, with people already using AI for activities unrelated to productivity. [3]
- The paper suggests that this vision should inspire more debates, discourse, and study in the field of AI, as generative AI is increasingly being used for entertainment. [3]
- AS: Artificially generated content GenAI: Generative AI Sociotechnical systems: Complex systems that combine social and technical components The paper concludes by emphasizing the need for a constructive vision of cultural AI, rather than just harm minimization. [3]
- The paper argues that mainstream approaches to evaluating AI systems tend to focus on intelligence and harm minimization, but neglect the cultural dimension of AI use. [2]
- They propose developing a positive theory of what beneficial, nutritious entertainment might look like, rather than just mitigating harms. [0]
Abstract
Generative AI systems are predominantly designed, evaluated, and marketed as intelligent systems which will benefit society by augmenting or automating human cognitive labor, promising to increase personal, corporate, and macroeconomic productivity. But this mainstream narrative about what AI is and what it can do is in tension with another emerging use case: entertainment. We argue that the field of AI is unprepared to measure or respond to how the proliferation of entertaining AI-generated content will impact society. Emerging data suggest AI is already widely adopted for entertainment purposes -- especially by young people -- and represents a large potential source of revenue. We contend that entertainment will become a primary business model for major AI corporations seeking returns on massive infrastructure investments; this will exert a powerful influence on the technology these companies produce in the coming years. Examining current evaluation practices, we identify a critical asymmetry: while AI assessments rigorously measure both benefits and harms of intelligence, they focus almost exclusively on cultural harms. We lack frameworks for articulating how cultural outputs might be actively beneficial. Drawing on insights from the humanities, we propose "thick entertainment" as a framework for evaluating AI-generated cultural content -- one that considers entertainment's role in meaning-making, identity formation, and social connection rather than simply minimizing harm. While AI is often touted for its potential to revolutionize productivity, in the long run we may find that AI turns out to be as much about "intelligence" as social media is about social connection.
Why are we recommending this paper?
Because "AI agents" is a popular topic and you have fewer than 3 interests with available recommendations.
MIT
AI Insights
- The field of Agentic AI is rapidly evolving, with researchers exploring various architectures, protocols, and design challenges. [3]
- Agentic AI has the potential to revolutionize industries such as healthcare, finance, and education by automating complex tasks and decision-making processes. [3]
- Contract Net Protocol: A high-level communication and control protocol for distributed problem solvers. [3]
- Agentic AI refers to a new generation of artificial intelligence that can reason, plan, and make decisions on its own. [2]
- Formal methods are being used to develop a rigorous understanding of Agentic AI systems and their behavior. [1]
Abstract
The Contract Net Protocol (1980) introduced coordination through contracts in multi-agent systems. Modern agent protocols standardize connectivity and interoperability; yet none provides formal, normative mechanisms for resource governance that bound how much agents may consume or how long they may operate. We introduce Agent Contracts, a formal framework that extends the contract metaphor from task allocation to resource-bounded execution. An Agent Contract unifies input/output specifications, multi-dimensional resource constraints, temporal boundaries, and success criteria into a coherent governance mechanism with explicit lifecycle semantics. For multi-agent coordination, we establish conservation laws ensuring delegated budgets respect parent constraints, enabling hierarchical coordination through contract delegation. Empirical validation across four experiments demonstrates 90% token reduction with 525x lower variance in iterative workflows, zero conservation violations in multi-agent delegation, and measurable quality-resource tradeoffs through contract modes. Agent Contracts provide formal foundations for predictable, auditable, and resource-bounded autonomous AI deployment.
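The conservation law the abstract mentions (delegated budgets must respect parent constraints) can be sketched in a few lines. This is a minimal illustration, not the paper's formalism: the `Contract` class, the `delegate` method, and the single-dimension check are all hypothetical names and simplifications.

```python
from dataclasses import dataclass, field

@dataclass
class Contract:
    """Hypothetical sketch of an Agent Contract: per-dimension resource budgets."""
    budget: dict                      # e.g. {"tokens": 10_000, "seconds": 60}
    children: list = field(default_factory=list)

    def delegate(self, sub_budget: dict) -> "Contract":
        # Conservation check: this delegation, plus all budgets already
        # delegated, must not exceed the parent budget in any dimension.
        for dim, amount in sub_budget.items():
            already = sum(c.budget.get(dim, 0) for c in self.children)
            if already + amount > self.budget.get(dim, 0):
                raise ValueError(f"conservation violated on '{dim}'")
        child = Contract(budget=dict(sub_budget))
        self.children.append(child)
        return child

parent = Contract(budget={"tokens": 10_000})
a = parent.delegate({"tokens": 6_000})      # fits within the parent budget
try:
    parent.delegate({"tokens": 5_000})      # 6_000 + 5_000 > 10_000
except ValueError as e:
    print(e)                                # → conservation violated on 'tokens'
```

Hierarchical coordination then falls out naturally: each child can itself delegate, and the same check applies at every level of the tree.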
Why are we recommending this paper?
Because "AI agents" is a popular topic and you have fewer than 3 interests with available recommendations.
New York Law School
AI Insights
- The concept of artificial intelligence (AI) and its impact on society is a complex issue that requires careful consideration. [3]
- The concept of algorithmic impact assessments (AIAs) has been proposed as a way to evaluate the potential effects of AI on society. [3]
- Data protection impact assessments (DPIAs) have been implemented in some jurisdictions to evaluate the potential effects of data processing on individuals and society. [3]
- The use of AI in decision-making processes can lead to biases and discrimination, which can have serious consequences for individuals and communities. [2]
Abstract
This essay examines how judicial review should adapt to address challenges posed by artificial intelligence decision-making, particularly regarding minority rights and interests. As I argue in this essay, the rise of three trends in AI (privatization, prediction, and automation) has combined to pose similar risks to minorities. Here, I outline what a theory of judicial review would look like in an era of artificial intelligence, analyzing both the limitations and the possibilities of judicial review of AI. I draw on cases in which AI decision-making has been challenged in courts to show how concepts of due process and equal protection can be recuperated in a modern AI era, and even integrated into AI, to provide for better oversight and accountability, offering a framework for judicial review in the AI era that protects minorities from algorithmic discrimination.
Why are we recommending this paper?
Because "AI and society" is a popular topic and you have fewer than 3 interests with available recommendations.
University of Science and Technology of China
Abstract
Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on rigid, predefined workflows that struggle with complex, conditional queries. To address this limitation, we propose PaperScout, an autonomous agent that reformulates paper search as a sequential decision-making process. Unlike static workflows, PaperScout dynamically decides whether, when, and how to invoke search and expand tools based on accumulated retrieval context. However, training such agents presents a fundamental challenge: standard reinforcement learning methods, typically designed for single-turn tasks, suffer from a granularity mismatch when applied to multi-turn agentic tasks, where token-level optimization diverges from the granularity of sequence-level interactions, leading to noisy credit assignment. We introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent-environment interaction. Comprehensive experiments on both synthetic and real-world benchmarks demonstrate that PaperScout significantly outperforms strong workflow-driven and RL baselines in both recall and relevance, validating the effectiveness of our adaptive agentic framework and optimization strategy.
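The abstract contrasts token-level optimization with PSPO's sequence-level granularity without giving the equations. The toy sketch below, assuming a PPO-style clipped objective, illustrates the distinction only: one importance ratio is computed per interaction turn (by summing token log-probabilities over the turn) rather than one per token. Function names and the clipping form are illustrative, not taken from the paper.

```python
import numpy as np

def sequence_level_ratio(logp_new, logp_old):
    """One importance ratio per turn: sum token log-probs over the whole
    turn, then exponentiate. A token-level method would instead compute
    one ratio (and one clipped term) per token."""
    return float(np.exp(np.sum(logp_new) - np.sum(logp_old)))

def clipped_objective(ratio, advantage, eps=0.2):
    # PPO-style pessimistic clipping, applied at the sequence level.
    return min(ratio * advantage, float(np.clip(ratio, 1 - eps, 1 + eps)) * advantage)

# Toy turn of three tokens under old and new policies
logp_old = np.log([0.5, 0.4, 0.6])
logp_new = np.log([0.55, 0.42, 0.61])
r = sequence_level_ratio(logp_new, logp_old)
obj = clipped_objective(r, advantage=1.0)
```

Because the ratio and the advantage live at the same granularity as the agent-environment interaction, credit assignment stays at the turn level instead of being smeared across tokens.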
Why are we recommending this paper?
Because "research automation with AI" is a popular topic and you have fewer than 3 interests with available recommendations.
Abstract
Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotation-intensive task construction, rely on static evaluation dimensions, or fail to reliably verify facts when citations are missing. To bridge these gaps, we introduce DeepResearchEval, an automated framework for deep research task construction and agentic evaluation. For task construction, we propose a persona-driven pipeline generating realistic, complex research tasks anchored in diverse user profiles, applying a two-stage filter (Task Qualification and Search Necessity) to retain only tasks requiring multi-source evidence integration and external retrieval. For evaluation, we propose an agentic pipeline with two components: an Adaptive Point-wise Quality Evaluation that dynamically derives task-specific evaluation dimensions, criteria, and weights conditioned on each generated task, and an Active Fact-Checking component that autonomously extracts and verifies report statements via web search, even when citations are missing.
Why are we recommending this paper?
Because "research automation with AI" is a popular topic and you have fewer than 3 interests with available recommendations.
Southern University of Science and Technology SUSTech
AI Insights
- It's not about AGI introducing new objectives or conflicts, but rather amplifying existing ones by compressing timescales and eroding institutional frictions. [3]
- AGI can be seen as a powerful amplifier of human strategies, incentives, and institutional incoherence, rather than an alien adversary acting against humanity. [3]
- AGI (Artificial General Intelligence): a hypothetical AI system capable of performing any intellectual task that humans can. [3]
- The concept of existential risk associated with artificial general intelligence (AGI) is often misunderstood. [2]
Abstract
Recent reports of large language models (LLMs) exhibiting behaviors such as deception, threats, or blackmail are often interpreted as evidence of alignment failure or emergent malign agency. We argue that this interpretation rests on a conceptual error. LLMs do not reason morally; they statistically internalize the record of human social interaction, including laws, contracts, negotiations, conflicts, and coercive arrangements. Behaviors commonly labeled as unethical or anomalous are therefore better understood as structural generalizations of interaction regimes that arise under extreme asymmetries of power, information, or constraint. Drawing on relational models theory, we show that practices such as blackmail are not categorical deviations from normal social behavior, but limiting cases within the same continuum that includes market pricing, authority relations, and ultimatum bargaining. The surprise elicited by such outputs reflects an anthropomorphic expectation that intelligence should reproduce only socially sanctioned behavior, rather than the full statistical landscape of behaviors humans themselves enact. Because human morality is plural, context-dependent, and historically contingent, the notion of a universally moral artificial intelligence is ill-defined. We therefore reframe concerns about artificial general intelligence (AGI). The primary risk is not adversarial intent, but AGI's role as an endogenous amplifier of human intelligence, power, and contradiction. By eliminating longstanding cognitive and institutional frictions, AGI compresses timescales and removes the historical margin of error that has allowed inconsistent values and governance regimes to persist without collapse. Alignment failure is thus structural, not accidental, and requires governance approaches that address amplification, complexity, and regime stability rather than model-level intent alone.
Why are we recommending this paper?
Because "AGI: artificial general intelligence" is a popular topic and you have fewer than 3 interests with available recommendations.
Hyntel, Inc
AI Insights
- SANC(E3) formalizes general intelligence as an axiomatic framework of an event-representation system oriented toward E3 minimization under finite capacity. [2]
- Cognition as completion: Prediction, dialogue, authoring, perception, causal inference, and embodied action as instances of Gestalt completion. [1]
- E3: An energy functional representing the trade-off between reconstruction, compression, and stability. [0]
Abstract
General intelligence must reorganize experience into internal structures that enable prediction and action under finite resources. Existing systems implicitly presuppose fixed primitive units -- tokens, subwords, pixels, or predefined sensor channels -- thereby bypassing the question of how representational units themselves emerge and stabilize. This paper proposes SANC(E3), an axiomatic framework in which representational units are not given a priori but instead arise as stable outcomes of competitive selection, reconstruction, and compression under finite activation capacity, governed by the explicit minimization of an energy functional E3. SANC(E3) draws a principled distinction between system tokens -- structural anchors such as {here, now, I} and sensory sources -- and tokens that emerge through self-organization during co-occurring events. Five core axioms formalize finite capacity, association from co-occurrence, similarity-based competition, confidence-based stabilization, and the reconstruction-compression-update trade-off. A key feature is a pseudo-memory-mapped I/O mechanism, through which internally replayed Gestalts are processed via the same axiomatic pathway as external sensory input. As a result, perception, imagination, prediction, planning, and action are unified within a single representational and energetic process. From the axioms, twelve propositions are derived, showing that category formation, hierarchical organization, unsupervised learning, and high-level cognitive activities can all be understood as instances of Gestalt completion under E3 minimization.
Why are we recommending this paper?
Because "AGI: artificial general intelligence" is a popular topic and you have fewer than 3 interests with available recommendations.
National University of Singapore
AI Insights
- The document describes a system called DR-Arena, which is designed to evaluate the performance of search agents. [3]
- The system generates complex research tasks and evaluates the responses from two search agents based on their accuracy, comprehensiveness, formatting, and helpfulness. [3]
- DR-Arena is a system that tests search agents' abilities by giving them complex research tasks. [3]
- The system evaluates the answers from two search agents based on how accurate, comprehensive, and helpful they are. [3]
- The document does not describe how DR-Arena handles cases where both search agents fail to find the correct entity. [2]
Abstract
As Large Language Models (LLMs) increasingly operate as Deep Research (DR) Agents capable of autonomous investigation and information synthesis, reliable evaluation of their task performance has become a critical bottleneck. Current benchmarks predominantly rely on static datasets, which suffer from several limitations: limited task generality, temporal misalignment, and data contamination. To address these, we introduce DR-Arena, a fully automated evaluation framework that pushes DR agents to their capability limits through dynamic investigation. DR-Arena constructs real-time Information Trees from fresh web trends to ensure the evaluation rubric is synchronized with the live world state, and employs an automated Examiner to generate structured tasks testing two orthogonal capabilities: Deep reasoning and Wide coverage. DR-Arena further adopts Adaptive Evolvement Loop, a state-machine controller that dynamically escalates task complexity based on real-time performance, demanding deeper deduction or wider aggregation until a decisive capability boundary emerges. Experiments with six advanced DR agents demonstrate that DR-Arena achieves a Spearman correlation of 0.94 with the LMSYS Search Arena leaderboard. This represents the state-of-the-art alignment with human preferences without any manual efforts, validating DR-Arena as a reliable alternative for costly human adjudication.
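The 0.94 Spearman correlation the abstract reports measures rank agreement between DR-Arena's ordering of agents and the human leaderboard. A small self-contained illustration of the statistic, using invented scores for six hypothetical agents (the implementation below assumes no tied values):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation, assuming no ties:
    the Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)  # rank of each element
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical scores for six DR agents: DR-Arena vs. a human leaderboard.
# One adjacent pair is swapped between the two orderings.
arena_scores = [0.91, 0.85, 0.78, 0.74, 0.66, 0.60]
human_scores = [0.88, 0.86, 0.75, 0.77, 0.64, 0.58]
print(round(spearman(arena_scores, human_scores), 2))  # → 0.94
```

A single adjacent swap among six items already drops the correlation to about 0.94, which gives a feel for how close the reported agreement is.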
Why are we recommending this paper?
Because "deep learning" is a popular topic and you have fewer than 3 interests with available recommendations.
Niigata University
AI Insights
- The double descent phenomenon in machine learning refers to the observation that as the model size and training data increase, the generalization error of a model can first decrease and then increase again. [3]
- This phenomenon has been observed in various contexts, including linear regression, neural networks, and graph convolutional networks. [3]
- The double descent curve is characterized by three phases: underfitting, overfitting, and double descent. [3]
- In the underfitting phase, the model is too simple to capture the underlying patterns in the data, leading to poor generalization performance. [3]
- In the overfitting phase, the model is too complex and captures noise in the data, also leading to poor generalization performance. [3]
- The double descent phase occurs when the model size increases beyond a certain point, causing the model to start capturing the underlying patterns in the data again, but with increased capacity for overfitting. [3]
- Some of these explanations include the bias-variance trade-off, the effect of noise on fitting linear regression models, and the role of regularization in mitigating double descent. [3]
- Researchers have also proposed various methods to mitigate or understand the double descent phenomenon, including optimal regularization, early stopping, and multi-scale feature learning dynamics. [3]
- These methods aim to balance the capacity of the model with its ability to generalize well to new data. [3]
- The study of double descent has significant implications for machine learning research and practice. [3]
- It highlights the importance of understanding the trade-offs between model complexity and generalization performance, and provides insights into how to design models that can generalize well to new data. [3]
- However, the existing literature provides valuable insights into this phenomenon and highlights the importance of continued investigation in this area. [3]
- Double Descent: A phenomenon where as the model size and training data increase, the generalization error of a model can first decrease and then increase again. [3]
- Underfitting: When a model is too simple to capture the underlying patterns in the data, leading to poor generalization performance. [3]
- Overfitting: When a model is too complex and captures noise in the data, also leading to poor generalization performance. [3]
- Double Descent Curve: A curve that characterizes the three phases of the double descent phenomenon: underfitting, overfitting, and double descent. [3]
- The double descent phenomenon has been studied extensively in recent years, and various explanations have been proposed. [1]
Abstract
Deep double descent is one of the key phenomena underlying the generalization capability of deep learning models. In this study, epoch-wise double descent, which is delayed generalization following overfitting, was empirically investigated by focusing on the evolution of internal structures. Fully connected neural networks of three different sizes were trained on the CIFAR-10 dataset with 30% label noise. By decomposing the loss curves into signal contributions from clean and noisy training data, the epoch-wise evolutions of internal signals were analyzed separately. Three main findings were obtained from this analysis. First, the model achieved strong re-generalization on test data even after perfectly fitting noisy training data during the double descent phase, corresponding to a "benign overfitting" state. Second, noisy data were learned after clean data, and as learning progressed, their corresponding internal activations became increasingly separated in outer layers; this enabled the model to overfit only noisy data. Third, a single, very large activation emerged in the shallow layer across all models; this phenomenon is referred to as "outliers," "massive activations," and "super activations" in recent large language models and evolves with re-generalization. The magnitude of the large activation correlated with input patterns but not with output patterns. These empirical findings directly link the recent key phenomena of "deep double descent," "benign overfitting," and "large activation," and support the proposal of a novel scenario for understanding deep double descent.
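The core measurement, splitting the training loss into clean-label and noisy-label contributions, is easy to sketch. The toy below injects 30% label noise as in the paper's CIFAR-10 setup and decomposes per-example losses by the noise mask; the exact bookkeeping in the paper may differ, and the loss values here are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inject 30% label noise, as in the paper's setup (toy labels, 10 classes)
labels = rng.integers(0, 10, size=1000)
noisy_mask = rng.random(1000) < 0.3
corrupted = labels.copy()
corrupted[noisy_mask] = rng.integers(0, 10, size=int(noisy_mask.sum()))

def decomposed_loss(per_example_loss, noisy_mask):
    """Split the mean training loss into clean- and noisy-label
    contributions, mirroring the paper's loss-curve decomposition."""
    noisy_mask = np.asarray(noisy_mask, dtype=bool)
    return per_example_loss[~noisy_mask].mean(), per_example_loss[noisy_mask].mean()

# Fabricated per-example losses: early in training, noisy examples
# typically retain higher loss because clean data are fit first.
losses = np.where(noisy_mask, 2.0, 0.3) + rng.normal(0, 0.01, size=1000)
clean_loss, noisy_loss = decomposed_loss(losses, noisy_mask)
print(clean_loss < noisy_loss)  # → True
```

Tracking these two curves separately over epochs is what lets the study show that noisy data are fit later than clean data.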
Why are we recommending this paper?
Because "deep learning" is a popular topic and you have fewer than 3 interests with available recommendations.
NVIDIA
AI Insights
- The paper discusses various methods for attributing model behavior in differentiable games. [3]
- It highlights the importance of understanding how models learn concepts through concept-level attribution. [3]
- They emphasize the need for benchmarking and improving video diffusion transformers for motion transfer. [3]
- Differentiable games: A type of game where the model's behavior can be attributed to specific inputs or actions. [3]
- Concept-level attribution: A method of understanding how models learn concepts through attributing their behavior to specific inputs or actions. [3]
- The authors also discuss scalable nested optimization as a key technique for efficient training of deep learning models. [3]
Abstract
Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.
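The motion-weighted loss mask is the mechanism that isolates temporal dynamics from static appearance. The sketch below is a guess at the underlying idea, weighting the per-pixel loss by normalized frame-difference magnitude so static regions contribute little; Motive's actual mask construction may differ, and the function name is hypothetical.

```python
import numpy as np

def motion_weighted_loss(per_pixel_loss, prev_frame, frame, eps=1e-6):
    """Down-weight static regions: mask the per-pixel loss by the
    normalized magnitude of the frame difference. Influence computed
    through this loss then reflects motion, not appearance."""
    motion = np.abs(frame.astype(float) - prev_frame.astype(float))
    mask = motion / (motion.max() + eps)       # in [0, 1]; 0 where nothing moved
    return float((mask * per_pixel_loss).mean())

# A fully static frame pair yields zero motion-weighted loss.
prev = np.zeros((4, 4))
cur = np.zeros((4, 4))
print(motion_weighted_loss(np.ones((4, 4)), prev, cur))  # → 0.0
```

Gradients of such a loss with respect to training clips would then attribute influence on motion specifically, which is the property the framework exploits for data curation.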
Why are we recommending this paper?
Because "image and video generation" is a popular topic and you have fewer than 3 interests with available recommendations.
University of Cambridge
AI Insights
- The model takes a dense camera trajectory and a reference image as input and outputs the number of sparse keyframes required. [3]
- DINOv2 encoder: A type of neural network used for encoding images. [3]
- transformer: A type of neural network used for processing sequential data. [3]
- MLP: Multi-Layer Perceptron, a type of feedforward neural network. [3]
- The method is designed to improve the efficiency and quality of video generation by adaptively selecting keyframes. [3]
- The model can be trained on large datasets and fine-tuned for specific applications. [3]
- The method requires a large dataset of videos with annotated keyframes to train the model. [3]
- The paper presents a method for adaptive keyframe density prediction in video generation. [2]
Abstract
Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical barrier to deploying generative video in applications that require real-time interactions, such as embodied AI and VR/AR. This paper explores a new strategy for camera-conditioned video generation of static scenes: using diffusion-based generative models to generate a sparse set of keyframes, and then synthesizing the full video through 3D reconstruction and rendering. By lifting keyframes into a 3D representation and rendering intermediate views, our approach amortizes the generation cost across hundreds of frames while enforcing geometric consistency. We further introduce a model that predicts the optimal number of keyframes for a given camera trajectory, allowing the system to adaptively allocate computation. Our final method, SRENDER, uses very sparse keyframes for simple trajectories and denser ones for complex camera motion. This results in video generation that is more than 40 times faster than the diffusion-based baseline in generating 20 seconds of video, while maintaining high visual fidelity and temporal stability, offering a practical path toward efficient and controllable video synthesis.
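The amortization argument behind SRENDER's speedup can be made concrete with a toy cost model: expensive diffusion is paid only for keyframes, and the remaining frames are rendered cheaply from the 3D representation. All numbers below (per-frame costs, keyframe counts) are illustrative assumptions, not measurements from the paper.

```python
def generation_cost(n_frames, n_keyframes, diffusion_cost=1.0, render_cost=0.01):
    """Amortized cost: diffusion for keyframes, cheap 3D rendering for the
    rest. Relative costs are illustrative, not measured."""
    return n_keyframes * diffusion_cost + (n_frames - n_keyframes) * render_cost

n = 480                                   # e.g. 20 s of video at 24 fps
baseline = n * 1.0                        # diffusion for every frame
simple = generation_cost(n, 8)            # very sparse keyframes, simple trajectory
complex_ = generation_cost(n, 48)         # denser keyframes, complex camera motion
print(round(baseline / simple, 1), round(baseline / complex_, 1))  # → 37.7 9.2
```

This also shows why the adaptive keyframe predictor matters: the achievable speedup depends directly on how few keyframes a trajectory can get away with.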
Why are we recommending this paper?
Because "image and video generation" is a popular topic and you have fewer than 3 interests with available recommendations.
Interests not found
We did not find any papers matching the interests below.
Try other search terms, and consider whether the content exists on arxiv.org.
- Job Displacement
- AGI Research
- AGI Applications
- AGI
- Changes in the Labor Market
Help Shape Our Pricing
We're exploring pricing options to make this project sustainable. Take 3 minutes to share what you'd be willing to pay (if anything). Your input guides our future investment.
Share Your Feedback
Help us improve your experience!
This project is in its early stages; your feedback can be pivotal to its future.
Let us know what you think about this week's papers and suggestions!
Give Feedback