Hi!
Your personalized paper recommendations for 12–16 January 2026.
MIT
AI Insights
- The field of Agentic AI is rapidly evolving, with researchers exploring various architectures, protocols, and design challenges. [3]
- Agentic AI has the potential to revolutionize industries such as healthcare, finance, and education by automating complex tasks and decision-making processes. [3]
- Contract Net Protocol: A high-level communication and control protocol for distributed problem solvers. [3]
- Agentic AI refers to a new generation of artificial intelligence that can reason, plan, and make decisions on its own. [2]
- Formal methods are being used to develop a rigorous understanding of Agentic AI systems and their behavior. [1]
Abstract
The Contract Net Protocol (1980) introduced coordination through contracts in multi-agent systems. Modern agent protocols standardize connectivity and interoperability; yet none provide formal, normative resource-governance mechanisms to bound how much agents may consume or how long they may operate. We introduce Agent Contracts, a formal framework that extends the contract metaphor from task allocation to resource-bounded execution. An Agent Contract unifies input/output specifications, multi-dimensional resource constraints, temporal boundaries, and success criteria into a coherent governance mechanism with explicit lifecycle semantics. For multi-agent coordination, we establish conservation laws ensuring delegated budgets respect parent constraints, enabling hierarchical coordination through contract delegation. Empirical validation across four experiments demonstrates 90% token reduction with 525x lower variance in iterative workflows, zero conservation violations in multi-agent delegation, and measurable quality-resource tradeoffs through contract modes. Agent Contracts provide formal foundations for predictable, auditable, and resource-bounded autonomous AI deployment.
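The conservation-law idea lends itself to a compact illustration. Below is a minimal Python sketch of budget-conserving delegation, assuming a single resource dimension (tokens) and invented names (`AgentContract`, `delegate`); the paper's actual formalism covers multi-dimensional constraints, temporal boundaries, and full lifecycle semantics.

```python
# Minimal sketch of budget-conserving delegation (hypothetical names; the
# paper's contracts also carry I/O specs, success criteria, and lifecycles).
from dataclasses import dataclass, field

@dataclass
class AgentContract:
    task: str
    token_budget: int        # one resource dimension; the paper is multi-dimensional
    deadline_s: float        # temporal boundary
    children: list = field(default_factory=list)

    def delegate(self, task: str, token_budget: int, deadline_s: float) -> "AgentContract":
        # Conservation law: the sum of child budgets may never exceed the
        # parent's budget, so hierarchical spending stays within the root bound.
        remaining = self.token_budget - sum(c.token_budget for c in self.children)
        if token_budget > remaining or deadline_s > self.deadline_s:
            raise ValueError("delegation would violate parent constraints")
        child = AgentContract(task, token_budget, deadline_s)
        self.children.append(child)
        return child

root = AgentContract("plan a trip", token_budget=10_000, deadline_s=60.0)
flights = root.delegate("search flights", token_budget=4_000, deadline_s=30.0)
hotels = root.delegate("search hotels", token_budget=4_000, deadline_s=30.0)
# root.delegate("sightseeing", 3_000, 30.0) would raise: only 2_000 tokens remain.
```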
Why are we recommending this paper?
Because AI agents is a popular topic and you have fewer than three interests with available recommendations
This paper from MIT directly addresses the need for resource governance in autonomous AI systems, aligning with your interest in productivity tools and AI agents. It offers a formal framework for managing agent behavior, a crucial consideration for building reliable and controlled AI applications.
University of Massachusetts Lowell
AI Insights
- Internal prior: The knowledge and beliefs that a language model has learned from its training data. [3]
- Tool-Augmented Language Models (TALMs) are a type of large language model that can use external tools and APIs to perform tasks. [2]
Abstract
Tool-augmented large language models (LLMs) have powered many applications. However, they are likely to suffer from knowledge conflict. In this paper, we propose a new type of knowledge conflict -- Tool-Memory Conflict (TMC), where the internal parametric knowledge contradicts the external tool knowledge in tool-augmented LLMs. We find that existing LLMs, though powerful, suffer from TMC, especially on STEM-related tasks. We also uncover that under different conditions, tool knowledge and parametric knowledge may be prioritized differently. We then evaluate existing conflict-resolution techniques, including prompting-based and RAG-based methods. Results show that none of these approaches can effectively resolve tool-memory conflicts.
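To make the setup concrete, here is a hedged sketch of how a tool-memory conflict probe could be wired up. `probe_tmc`, `ask_model`, and the toy example are placeholders of ours, not the paper's protocol; a real evaluation would need semantic answer matching rather than string comparison.

```python
# Sketch of a Tool-Memory Conflict probe: compare the model's parametric
# answer with its answer when the tool output is in context.
from typing import Callable

def probe_tmc(ask_model: Callable[[str], str], question: str, tool_answer: str) -> dict:
    memory = ask_model(question)  # parametric knowledge only
    with_tool = ask_model(f"A tool returned: {tool_answer}\nQuestion: {question}")
    norm = str.strip  # stand-in for proper semantic matching
    return {
        "conflict": norm(memory) != norm(tool_answer),
        "followed_tool": norm(with_tool) == norm(tool_answer),
        "memory": memory,
        "with_tool": with_tool,
    }

def stubborn_model(prompt: str) -> str:
    return "299792458 m/s"  # toy model that always answers from memory

print(probe_tmc(stubborn_model, "What is the speed of light?", "299700000 m/s"))
# -> conflict: True, followed_tool: False (parametric knowledge prioritized)
```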
Why are we recommending this paper?
Due to your interest in LLMs for Productivity
Given your interest in LLMs and productivity, this research from the University of Massachusetts Lowell is highly relevant. The paper tackles a critical limitation of tool-augmented LLMs – knowledge conflict – which directly impacts their effectiveness as productivity tools.
Sports Vision, Inc
AI Insights
- The method, called LLM-Pruner, uses a combination of pruning and quantization techniques to reduce the model's size while maintaining its performance. [3]
- Reinforcement learning: A type of machine learning where an agent learns to make decisions based on rewards or penalties. [3]
- Quantization: The process of reducing the precision of model weights to reduce memory usage. [3]
- Further research is needed to explore the potential of reinforcement learning in model compression and to investigate its application in real-world scenarios. [3]
- The method relies on a combination of pruning and quantization techniques, which may not be suitable for all types of models or applications. [3]
- The evaluation is limited to a small set of models and datasets, and further research is needed to explore its generalizability. [3]
- The paper proposes a novel approach to compressing large language models using reinforcement learning. [2]
Abstract
As Large Language Models (LLMs) continue to scale, post-training pruning has emerged as a promising approach to reduce computational costs while preserving performance. Existing methods such as SparseGPT and Wanda achieve high sparsity through layer-wise weight reconstruction or activation-aware magnitude pruning, but rely on uniform or hand-crafted heuristics to determine per-layer sparsity ratios. Moreover, recent work has shown that pruned LLMs suffer from severe factual knowledge degradation, with structured pruning methods experiencing near-total collapse in factual question-answering capabilities. We introduce agent-guided pruning, where a foundation model acts as an adaptive pruning agent to intelligently select which layers to prune at each iteration while preserving critical knowledge pathways. Our method constructs layer-wise sensitivity profiles by combining Wanda-inspired weight-activation metrics with gradient importance scores, normalized as z-scores for model-agnostic comparison. These statistics are processed by an LLM agent equipped with self-reflection capabilities, enabling it to learn from previous pruning outcomes and iteratively refine its strategy. A checkpoint rollback mechanism maintains model quality by reverting when perplexity degradation exceeds a threshold. We evaluate our approach on Qwen3 models (4B and 8B parameters) at approximately 45% sparsity, demonstrating substantial improvements over structured pruning baselines: 56% relative improvement in MMLU accuracy, 19x better factual knowledge retention on FreebaseQA, and 69% lower perplexity degradation. Notably, our framework requires no retraining, operates in a model-agnostic manner, and exhibits effective self-correction with only 2-4 rollbacks across 21-40 iterations, demonstrating that foundation models can effectively guide the compression of other foundation models.
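As a rough illustration of the sensitivity statistics the abstract describes, the sketch below combines a Wanda-style weight-activation score with gradient importance and z-scores them across layers. All names are ours; in the real pipeline these profiles are fed to an LLM agent with self-reflection and a perplexity-triggered checkpoint rollback.

```python
# Sketch of per-layer sensitivity profiles: Wanda-style weight-activation
# score plus gradient importance, z-scored for model-agnostic comparison.
import torch

def layer_stats(weight: torch.Tensor, act_norm: torch.Tensor, grad: torch.Tensor):
    wanda = (weight.abs() * act_norm.unsqueeze(0)).mean()  # |W| scaled by input norms
    grad_imp = (weight * grad).abs().mean()                # first-order importance
    return wanda.item(), grad_imp.item()

def zscore(xs: list[float]) -> list[float]:
    t = torch.tensor(xs)
    return ((t - t.mean()) / (t.std() + 1e-8)).tolist()

# Toy model with four layers of random statistics:
layers = [(torch.randn(64, 64), torch.rand(64), torch.randn(64, 64)) for _ in range(4)]
stats = [layer_stats(w, a, g) for w, a, g in layers]
profile = [zw + zg for zw, zg in zip(zscore([s[0] for s in stats]),
                                     zscore([s[1] for s in stats]))]
# Lower combined score = less sensitive = a safer layer to prune this iteration.
print(profile)
```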
Why are we recommending this paper?
Due to your interest in LLMs for Productivity
This Sports Vision, Inc. paper explores LLM compression, a technique that can significantly improve the efficiency of LLMs, aligning with your interest in productivity and resource optimization. It's a direct approach to making LLMs more effective tools.
University of Science and Technology of China
Abstract
Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on rigid, predefined workflows that struggle with complex, conditional queries. To address this limitation, we propose PaperScout, an autonomous agent that reformulates paper search as a sequential decision-making process. Unlike static workflows, PaperScout dynamically decides whether, when, and how to invoke search and expand tools based on accumulated retrieval context. However, training such agents presents a fundamental challenge: standard reinforcement learning methods, typically designed for single-turn tasks, suffer from a granularity mismatch when applied to multi-turn agentic tasks, where token-level optimization diverges from the granularity of sequence-level interactions, leading to noisy credit assignment. We introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent-environment interaction. Comprehensive experiments on both synthetic and real-world benchmarks demonstrate that PaperScout significantly outperforms strong workflow-driven and RL baselines in both recall and relevance, validating the effectiveness of our adaptive agentic framework and optimization strategy.
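The granularity shift is easy to picture in code: a PPO-style clipped objective in which the importance ratio is computed once per emitted sequence (one agent turn) rather than per token. This is our reading of the abstract, not the authors' implementation.

```python
# Sketch of a sequence-level clipped objective in the spirit of PSPO: the
# ratio is formed from whole-turn log-probabilities, so credit is assigned
# at the granularity the agent actually interacts at.
import torch

def sequence_clip_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                       advantage: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(logp_new.sum() - logp_old.sum())  # one ratio per sequence
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
    return -torch.min(unclipped, clipped)

logp_old = -torch.rand(12)                     # per-token log-probs, rollout policy
logp_new = logp_old + 0.01 * torch.randn(12)   # current policy, slightly updated
print(sequence_clip_loss(logp_new, logp_old, advantage=torch.tensor(0.7)))
```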
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than three interests with available recommendations
This University of Science and Technology of China paper focuses on autonomous paper search, a core component of research productivity. The agent-based approach aligns with your interest in AI for productivity tools and efficient information retrieval.
Hyntel, Inc
AI Insights
- SANC(E3) formalizes general intelligence as an axiomatic framework of an event-representation system oriented toward E3 minimization under finite capacity. [2]
- Cognition as completion: Prediction, dialogue, authoring, perception, causal inference, and embodied action as instances of Gestalt completion. [1]
- E3: An energy functional representing the trade-off between reconstruction, compression, and stability. [0]
Abstract
General intelligence must reorganize experience into internal structures that enable prediction and action under finite resources. Existing systems implicitly presuppose fixed primitive units -- tokens, subwords, pixels, or predefined sensor channels -- thereby bypassing the question of how representational units themselves emerge and stabilize. This paper proposes SANC(E3), an axiomatic framework in which representational units are not given a priori but instead arise as stable outcomes of competitive selection, reconstruction, and compression under finite activation capacity, governed by the explicit minimization of an energy functional E3. SANC(E3) draws a principled distinction between system tokens -- structural anchors such as {here, now, I} and sensory sources -- and tokens that emerge through self-organization during co-occurring events. Five core axioms formalize finite capacity, association from co-occurrence, similarity-based competition, confidence-based stabilization, and the reconstruction-compression-update trade-off. A key feature is a pseudo-memory-mapped I/O mechanism, through which internally replayed Gestalts are processed via the same axiomatic pathway as external sensory input. As a result, perception, imagination, prediction, planning, and action are unified within a single representational and energetic process. From the axioms, twelve propositions are derived, showing that category formation, hierarchical organization, unsupervised learning, and high-level cognitive activities can all be understood as instances of Gestalt completion under E3 minimization.
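The abstract does not spell out E3, but one plausible reading of a reconstruction-compression-stability energy functional is sketched below; the terms and weights are illustrative assumptions, not the paper's definition, with units surviving only when they lower E3 under the finite-capacity constraint.

```latex
% One plausible reading of the E3 trade-off named in the abstract; terms and
% weights are illustrative assumptions, not the paper's definition.
E_3(\theta) =
  \underbrace{\mathcal{L}_{\mathrm{rec}}(\theta)}_{\text{reconstruction}}
  + \lambda_c \underbrace{\mathcal{L}_{\mathrm{comp}}(\theta)}_{\text{compression}}
  + \lambda_s \underbrace{\mathcal{L}_{\mathrm{stab}}(\theta)}_{\text{stability}},
  \qquad \lambda_c, \lambda_s > 0
```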
Why are we recommending this paper?
Because AGI (artificial general intelligence) is a popular topic and you have fewer than three interests with available recommendations
Coming from Hyntel, Inc., this paper delves into the fundamental challenges of general intelligence, which is a key area for AI productivity. The focus on self-organizing networks and resource management is directly relevant to building more capable and efficient AI systems.
Virginia Tech
AI Insights
- The paper discusses the challenges of dataset licensing and attribution in AI research, highlighting the need for more transparent and equitable practices. [3]
- Attribution: The act of acknowledging the source of a dataset or model used in AI research. [3]
- The paper assumes that all datasets are available for use, which may not be the case in practice. [3]
- The authors propose a framework for optimal data selection from multiple sources, which can improve performance scaling and reduce computational costs. [2]
Abstract
We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each stage in the data cycle from inputs to model weights to synthetic outputs refines technical signal but strips economic equity from data generators. We show, by analyzing seventy-three public data deals, that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults - missing provenance, asymmetric bargaining power, and non-dynamic pricing - as the operational machinery of this inequality. In our analysis, we trace these problems along the machine learning value chain and propose an Equitable Data-Value Exchange (EDVEX) Framework to enable a minimal market that benefits all participants. Finally, we outline research directions where our community can make concrete contributions to data deals and contextualize our position with related and orthogonal viewpoints.
Why are we recommending this paper?
Due to your interest in Economics of Productivity
The Alan Turing Institute
AI Insights
- The authors contend that entertainment is a significant use case for AI, with people already using AI for activities unrelated to productivity. [3]
- The paper suggests that this vision should inspire more debates, discourse, and study in the field of AI, as generative AI is increasingly being used for entertainment. [3]
- Glossary: AS (artificially generated content); GenAI (generative AI); sociotechnical systems (complex systems that combine social and technical components). [3]
- The paper concludes by emphasizing the need for a constructive vision of cultural AI, rather than just harm minimization. [3]
- The paper argues that mainstream approaches to evaluating AI systems tend to focus on intelligence and harm minimization, but neglect the cultural dimension of AI use. [2]
- They propose developing a positive theory of what beneficial, nutritious entertainment might look like, rather than just mitigating harms. [0]
Abstract
Generative AI systems are predominantly designed, evaluated, and marketed as intelligent systems which will benefit society by augmenting or automating human cognitive labor, promising to increase personal, corporate, and macroeconomic productivity. But this mainstream narrative about what AI is and what it can do is in tension with another emerging use case: entertainment. We argue that the field of AI is unprepared to measure or respond to how the proliferation of entertaining AI-generated content will impact society. Emerging data suggest AI is already widely adopted for entertainment purposes -- especially by young people -- and represents a large potential source of revenue. We contend that entertainment will become a primary business model for major AI corporations seeking returns on massive infrastructure investments; this will exert a powerful influence on the technology these companies produce in the coming years. Examining current evaluation practices, we identify a critical asymmetry: while AI assessments rigorously measure both benefits and harms of intelligence, they focus almost exclusively on cultural harms. We lack frameworks for articulating how cultural outputs might be actively beneficial. Drawing on insights from the humanities, we propose "thick entertainment" as a framework for evaluating AI-generated cultural content -- one that considers entertainment's role in meaning-making, identity formation, and social connection rather than simply minimizing harm. While AI is often touted for its potential to revolutionize productivity, in the long run we may find that AI turns out to be as much about "intelligence" as social media is about social connection.
Why are we recommending this paper?
Due to your interest in AI for Productivity Tools
New York Law School
AI Insights
- The concept of artificial intelligence (AI) and its impact on society is a complex issue that requires careful consideration. [3]
- The concept of algorithmic impact assessments (AIAs) has been proposed as a way to evaluate the potential effects of AI on society. [3]
- Data protection impact assessments (DPIAs) have been implemented in some jurisdictions to evaluate the potential effects of data processing on individuals and society. [3]
- The use of AI in decision-making processes can lead to biases and discrimination, which can have serious consequences for individuals and communities. [2]
Abstract
This essay examines how judicial review should adapt to address challenges posed by artificial intelligence decision-making, particularly regarding minority rights and interests. As I argue in this essay, the rise of three trends in AI (privatization, prediction, and automation) has combined to pose similar risks to minorities. Here, I outline what a theory of judicial review would look like in an era of artificial intelligence, analyzing both the limitations and the possibilities of judicial review of AI. I draw on cases in which AI decision-making has been challenged in courts to show how concepts of due process and equal protection can be recuperated in a modern AI era, and even integrated into AI, to provide for better oversight and accountability, offering a framework for judicial review in the AI era that protects minorities from algorithmic discrimination.
Why are we recommending this paper?
Because AI and society is a popular topic and you have fewer than three interests with available recommendations
Abstract
Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotation-intensive task construction, rely on static evaluation dimensions, or fail to reliably verify facts when citations are missing. To bridge these gaps, we introduce DeepResearchEval, an automated framework for deep research task construction and agentic evaluation. For task construction, we propose a persona-driven pipeline generating realistic, complex research tasks anchored in diverse user profiles, applying a two-stage filter (Task Qualification and Search Necessity) to retain only tasks requiring multi-source evidence integration and external retrieval. For evaluation, we propose an agentic pipeline with two components: an Adaptive Point-wise Quality Evaluation that dynamically derives task-specific evaluation dimensions, criteria, and weights conditioned on each generated task, and an Active Fact-Checking component that autonomously extracts and verifies report statements via web search, even when citations are missing.
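For intuition, the two-stage filter could look like the following sketch; both judges are LLM calls in the real pipeline, and the function names and signatures here are ours.

```python
# Sketch of the two-stage task filter (Task Qualification, then Search
# Necessity) described in the abstract.
def task_qualification(task: str) -> bool:
    """Stage 1: is this a realistic, well-posed deep research task?"""
    raise NotImplementedError("LLM judge goes here")

def search_necessity(task: str) -> bool:
    """Stage 2: does answering genuinely require external, multi-source retrieval?"""
    raise NotImplementedError("LLM judge goes here")

def filter_tasks(candidates: list[str]) -> list[str]:
    # Only tasks passing both stages survive into the benchmark.
    return [t for t in candidates if task_qualification(t) and search_necessity(t)]
```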
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than three interests with available recommendations
Southern University of Science and Technology SUSTech
AI Insights
- It's not about AGI introducing new objectives or conflicts, but rather amplifying existing ones by compressing timescales and eroding institutional frictions. [3]
- AGI can be seen as a powerful amplifier of human strategies, incentives, and institutional incoherence, rather than an alien adversary acting against humanity. [3]
- AGI (Artificial General Intelligence): a hypothetical AI system capable of performing any intellectual task that humans can. [3]
- The concept of existential risk associated with artificial general intelligence (AGI) is often misunderstood. [2]
Abstract
Recent reports of large language models (LLMs) exhibiting behaviors such as deception, threats, or blackmail are often interpreted as evidence of alignment failure or emergent malign agency. We argue that this interpretation rests on a conceptual error. LLMs do not reason morally; they statistically internalize the record of human social interaction, including laws, contracts, negotiations, conflicts, and coercive arrangements. Behaviors commonly labeled as unethical or anomalous are therefore better understood as structural generalizations of interaction regimes that arise under extreme asymmetries of power, information, or constraint. Drawing on relational models theory, we show that practices such as blackmail are not categorical deviations from normal social behavior, but limiting cases within the same continuum that includes market pricing, authority relations, and ultimatum bargaining. The surprise elicited by such outputs reflects an anthropomorphic expectation that intelligence should reproduce only socially sanctioned behavior, rather than the full statistical landscape of behaviors humans themselves enact. Because human morality is plural, context-dependent, and historically contingent, the notion of a universally moral artificial intelligence is ill-defined. We therefore reframe concerns about artificial general intelligence (AGI). The primary risk is not adversarial intent, but AGI's role as an endogenous amplifier of human intelligence, power, and contradiction. By eliminating longstanding cognitive and institutional frictions, AGI compresses timescales and removes the historical margin of error that has allowed inconsistent values and governance regimes to persist without collapse. Alignment failure is thus structural, not accidental, and requires governance approaches that address amplification, complexity, and regime stability rather than model-level intent alone.
Why are we recommending this paper?
Because AGI (artificial general intelligence) is a popular topic and you have fewer than three interests with available recommendations
National University of Singapore
AI Insights
- The document describes DR-Arena, a system designed to evaluate search agents by giving them complex research tasks. [3]
- Responses from two competing search agents are compared on accuracy, comprehensiveness, formatting, and helpfulness. [3]
- The document does not say how DR-Arena handles cases where both search agents fail to find the correct entity. [2]
Abstract
As Large Language Models (LLMs) increasingly operate as Deep Research (DR) Agents capable of autonomous investigation and information synthesis, reliable evaluation of their task performance has become a critical bottleneck. Current benchmarks predominantly rely on static datasets, which suffer from several limitations: limited task generality, temporal misalignment, and data contamination. To address these, we introduce DR-Arena, a fully automated evaluation framework that pushes DR agents to their capability limits through dynamic investigation. DR-Arena constructs real-time Information Trees from fresh web trends to ensure the evaluation rubric is synchronized with the live world state, and employs an automated Examiner to generate structured tasks testing two orthogonal capabilities: Deep reasoning and Wide coverage. DR-Arena further adopts an Adaptive Evolvement Loop, a state-machine controller that dynamically escalates task complexity based on real-time performance, demanding deeper deduction or wider aggregation until a decisive capability boundary emerges. Experiments with six advanced DR agents demonstrate that DR-Arena achieves a Spearman correlation of 0.94 with the LMSYS Search Arena leaderboard. This represents state-of-the-art alignment with human preferences without any manual effort, validating DR-Arena as a reliable alternative to costly human adjudication.
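The headline agreement figure is a rank correlation; the snippet below shows how such a check is computed with SciPy, using made-up scores for six hypothetical agents rather than the paper's data.

```python
# Rank-agreement check of the kind behind the reported 0.94 Spearman
# correlation; scores are illustrative, not the paper's.
from scipy.stats import spearmanr

dr_arena_scores = [0.81, 0.74, 0.69, 0.66, 0.58, 0.51]     # six DR agents
leaderboard_elo = [1310, 1290, 1255, 1260, 1190, 1150]     # same agents

rho, p = spearmanr(dr_arena_scores, leaderboard_elo)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
# -> rho = 0.94 here: one adjacent pair of agents swaps order between rankings.
```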
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than three interests with available recommendations
Niigata University
AI Insights
- Double descent: as model size or training time increases, generalization error can decrease, rise again, and then decrease a second time. [3]
- The phenomenon has been observed in many settings, including linear regression, neural networks, and graph convolutional networks. [3]
- The double descent curve passes through three phases: underfitting, where the model is too simple to capture the underlying patterns; overfitting, where the model also captures noise in the data; and a second descent, where capacity grows past the interpolation point and generalization recovers. [3]
- Proposed explanations include the bias-variance trade-off, the effect of noise on fitting linear regression models, and the role of regularization in mitigating double descent. [3]
- Proposed mitigations include optimal regularization, early stopping, and multi-scale feature learning dynamics, all of which aim to balance model capacity against generalization. [3]
- The study of double descent highlights the trade-off between model complexity and generalization performance, with direct implications for model design in research and practice. [3]
- The phenomenon has been studied extensively in recent years, and the existing literature, while incomplete, offers valuable insights that motivate continued investigation. [1]
Abstract
Deep double descent is one of the key phenomena underlying the generalization capability of deep learning models. In this study, epoch-wise double descent, which is delayed generalization following overfitting, was empirically investigated by focusing on the evolution of internal structures. Fully connected neural networks of three different sizes were trained on the CIFAR-10 dataset with 30% label noise. By decomposing the loss curves into signal contributions from clean and noisy training data, the epoch-wise evolutions of internal signals were analyzed separately. Three main findings were obtained from this analysis. First, the model achieved strong re-generalization on test data even after perfectly fitting noisy training data during the double descent phase, corresponding to a "benign overfitting" state. Second, noisy data were learned after clean data, and as learning progressed, their corresponding internal activations became increasingly separated in outer layers; this enabled the model to overfit only noisy data. Third, a single, very large activation emerged in the shallow layer across all models; this phenomenon is referred to as "outliers," "massive activations," and "super activations" in recent large language models and evolves with re-generalization. The magnitude of the large activation correlated with input patterns but not with output patterns. These empirical findings directly link the recent key phenomena of "deep double descent," "benign overfitting," and "large activation," and support the proposal of a novel scenario for understanding deep double descent.
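The core measurement technique, decomposing training loss into clean and noise-labeled contributions, is simple to reproduce in outline; the sketch below assumes a classifier and a boolean noise mask, with all names of our choosing.

```python
# Sketch of the loss decomposition: with 30% of labels flipped, track training
# loss separately on clean vs. noise-labeled examples to see when each group
# is fitted. Model and names are placeholders, not the study's code.
import torch
import torch.nn.functional as F

def decomposed_loss(model, x, y, is_noisy):
    losses = F.cross_entropy(model(x), y, reduction="none")  # per-example loss
    return losses[~is_noisy].mean(), losses[is_noisy].mean()

model = torch.nn.Linear(32, 10)                    # stand-in for the paper's networks
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
is_noisy = torch.rand(64) < 0.3                    # mask of label-noised examples
clean_loss, noisy_loss = decomposed_loss(model, x, y, is_noisy)
print(clean_loss.item(), noisy_loss.item())        # noisy loss drops later in training
```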
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than three interests with available recommendations
NVIDIA
AI Insights
- The paper discusses methods for attributing model behavior, including in differentiable games. [3]
- It highlights the importance of understanding how models learn concepts through concept-level attribution. [3]
- It emphasizes the need for benchmarking and improving video diffusion transformers for motion transfer. [3]
- Differentiable games: multi-player optimization problems whose objectives are differentiable, so gradient-based analysis applies. [3]
- Concept-level attribution: explaining model behavior by attributing it to specific inputs or learned concepts. [3]
- The authors also discuss scalable nested optimization for efficient training of deep learning models. [2]
Abstract
Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.
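A hedged sketch of the two ingredients the abstract names: a motion-weighted loss (so gradients reflect temporal dynamics rather than static appearance) and a gradient inner-product influence score, the standard recipe in gradient-based data attribution. Shapes and names are illustrative, not the authors' code.

```python
# Motion-weighted loss + gradient inner-product influence (illustrative).
import torch

def motion_weighted_loss(pred: torch.Tensor, target: torch.Tensor,
                         motion_mask: torch.Tensor) -> torch.Tensor:
    # motion_mask (same shape as the video tensor) up-weights regions with
    # large temporal change, isolating motion from static appearance.
    return ((pred - target) ** 2 * motion_mask).mean()

def influence(train_grad: torch.Tensor, query_grad: torch.Tensor) -> float:
    # Positive: the training clip pushes the model toward the query's motion;
    # negative: it degrades it. Scores like this can rank clips for curation.
    return torch.dot(train_grad.flatten(), query_grad.flatten()).item()
```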
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than three interests with available recommendations
University of Cambridge
AI Insights
- The model takes a dense camera trajectory and a reference image as input and outputs the number of sparse keyframes required. [3]
- DINOv2 encoder: A type of neural network used for encoding images. [3]
- transformer: A type of neural network used for processing sequential data. [3]
- MLP: Multi-Layer Perceptron, a type of feedforward neural network. [3]
- The method is designed to improve the efficiency and quality of video generation by adaptively selecting keyframes. [3]
- The model can be trained on large datasets and fine-tuned for specific applications. [3]
- The method requires a large dataset of videos with annotated keyframes to train the model. [3]
- The paper presents a method for adaptive keyframe density prediction in video generation. [2]
Abstract
Modern video generative models based on diffusion models can produce very realistic clips, but they are computationally inefficient, often requiring minutes of GPU time for just a few seconds of video. This inefficiency poses a critical barrier to deploying generative video in applications that require real-time interactions, such as embodied AI and VR/AR. This paper explores a new strategy for camera-conditioned video generation of static scenes: using diffusion-based generative models to generate a sparse set of keyframes, and then synthesizing the full video through 3D reconstruction and rendering. By lifting keyframes into a 3D representation and rendering intermediate views, our approach amortizes the generation cost across hundreds of frames while enforcing geometric consistency. We further introduce a model that predicts the optimal number of keyframes for a given camera trajectory, allowing the system to adaptively allocate computation. Our final method, SRENDER, uses very sparse keyframes for simple trajectories and denser ones for complex camera motion. This results in video generation that is more than 40 times faster than the diffusion-based baseline in generating 20 seconds of video, while maintaining high visual fidelity and temporal stability, offering a practical path toward efficient and controllable video synthesis.
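The adaptive keyframe budgeting can be caricatured with a trajectory-complexity heuristic; in the paper this is a learned predictor (the insights above mention a DINOv2 encoder and a transformer), so the function below is only a stand-in with arbitrary constants.

```python
# Stand-in for the learned keyframe-count predictor: more camera motion along
# the trajectory -> more diffused keyframes; the remaining frames come from
# 3D reconstruction and rendering.
import numpy as np

def keyframe_count(positions: np.ndarray, k_min: int = 4, k_max: int = 32,
                   gain: float = 8.0) -> int:
    # positions: (T, 3) camera centers along the dense trajectory.
    path_length = np.linalg.norm(np.diff(positions, axis=0), axis=1).sum()
    return int(np.clip(k_min + gain * path_length, k_min, k_max))

t = np.linspace(0, np.pi, 120)
orbit = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
print(keyframe_count(orbit))  # longer / more complex trajectories get more keyframes
```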
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than three interests with available recommendations
We did not find much content matching your interests, so we've included papers from additional popular topics above. Also be aware that if a topic is not present on arXiv, we won't be able to recommend it.
💬 Help Shape Our Pricing
We're exploring pricing options to make this project sustainable. Take 3 minutes to share what you'd be willing to pay (if anything). Your input guides our future investment.
Share Your Feedback
Help us improve your experience!
This project is in its early stages; your feedback can be pivotal to its future.
Let us know what you think about this week's papers and suggestions!
Give Feedback