Hi!

Your personalized paper recommendations for 05 to 09 January 2026.
NATS
Abstract
We present a rigorous, human-in-the-loop evaluation framework for assessing the performance of AI agents on the task of Air Traffic Control, grounded in a regulator-certified simulator-based curriculum used for training and testing real-world trainee controllers. By leveraging legally regulated assessments and involving expert human instructors in the evaluation process, our framework enables a more authentic and domain-accurate measurement of AI performance. This work addresses a critical gap in the existing literature: the frequent misalignment between academic representations of Air Traffic Control and the complexities of the actual operational environment. It also lays the foundations for effective future human-machine teaming paradigms by aligning machine performance with human assessment targets.
Why are we recommending this paper?
Due to your Interest in AI Air Consumption

This paper directly addresses AI's impact on transportation, aligning with a core interest. The focus on regulated testing and human-in-the-loop evaluation is particularly relevant to assessing the safety and societal implications of AI systems in a critical domain.
Institute for Applied Economic Research IPEA Brazil
Abstract
This paper examines the European Union's emerging regulatory landscape - focusing on the AI Act, corporate sustainability reporting and due diligence regimes (CSRD and CSDDD), and data center regulation - to assess whether it can effectively govern AI's environmental footprint. We argue that, despite incremental progress, current approaches remain ill-suited to correcting the market failures underpinning AI-related energy use, water consumption, and material demand. Key shortcomings include narrow disclosure requirements, excessive reliance on voluntary standards, weak enforcement mechanisms, and a structural disconnect between AI-specific impacts and broader sustainability laws. The analysis situates these regulatory gaps within a wider ecosystem of academic research, civil society advocacy, standard-setting, and industry initiatives, highlighting risks of regulatory capture and greenwashing. Building on this diagnosis, the paper advances strategic recommendations for the COP30 Action Agenda, calling for binding transparency obligations, harmonized international standards for lifecycle assessment, stricter governance of data center expansion, and meaningful public participation in AI infrastructure decisions.
Why are we recommending this paper?
Due to your Interest in AI Air Consumption

Given the user's interest in AI's environmental impact, this paper's investigation into regulatory gaps concerning AI's footprint is highly pertinent. It directly tackles the sustainability concerns related to AI, a key area of interest.
National Research Council
Abstract
Pervasive AI increasingly depends on on-device learning systems that deliver low-latency and energy-efficient computation under strict resource constraints. Liquid State Machines (LSMs) offer a promising approach for low-power temporal processing in pervasive and neuromorphic systems, but their deployment remains challenging due to high hyperparameter sensitivity and the computational cost of traditional optimization methods that ignore energy constraints. This work presents EARL, an energy-aware reinforcement learning framework that integrates Bayesian optimization with an adaptive reinforcement learning based selection policy to jointly optimize accuracy and energy consumption. EARL employs surrogate modeling for global exploration, reinforcement learning for dynamic candidate prioritization, and an early termination mechanism to eliminate redundant evaluations, substantially reducing computational overhead. Experiments on three benchmark datasets demonstrate that EARL achieves 6 to 15 percent higher accuracy, 60 to 80 percent lower energy consumption, and up to an order of magnitude reduction in optimization time compared to leading hyperparameter tuning frameworks. These results highlight the effectiveness of energy-aware adaptive search in improving the efficiency and scalability of LSMs for resource-constrained on-device AI applications.
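As a rough illustration of the joint accuracy/energy objective described above, here is a minimal Python sketch of an energy-aware hyperparameter search with a crude early-termination rule. The hyperparameter space, the evaluate() stand-in, and the weighting in score() are illustrative assumptions only; EARL's actual Bayesian surrogate and reinforcement-learning selection policy are not reproduced here.

```python
import random

# Hypothetical LSM hyperparameter space (names are illustrative, not EARL's actual space).
SPACE = {
    "reservoir_size": [200, 500, 1000, 2000],
    "spectral_radius": [0.8, 0.9, 0.95, 0.99],
    "leak_rate": [0.1, 0.2, 0.3, 0.5],
}

def evaluate(candidate):
    """Stand-in for training an LSM and measuring accuracy plus energy per inference.
    A real pipeline would run the model on-device or in a simulator."""
    acc = random.uniform(0.6, 0.95)        # placeholder accuracy
    energy = random.uniform(1.0, 10.0)     # placeholder energy (mJ per inference)
    return acc, energy

def score(acc, energy, alpha=0.7):
    """Joint objective: reward accuracy, penalize normalized energy."""
    return alpha * acc - (1 - alpha) * (energy / 10.0)

def energy_aware_search(budget=50, patience=10):
    best, best_score, stale = None, float("-inf"), 0
    for _ in range(budget):
        candidate = {name: random.choice(values) for name, values in SPACE.items()}
        acc, energy = evaluate(candidate)
        s = score(acc, energy)
        if s > best_score:
            best, best_score, stale = (candidate, acc, energy), s, 0
        else:
            stale += 1
        if stale >= patience:              # crude stand-in for EARL's early termination
            break
    return best

print(energy_aware_search())
```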
Why are we recommending this paper?
Due to your Interest in AI Energy Consumption

This research focuses on energy-efficient AI, a significant aspect of the user's interests. The exploration of Liquid State Machines for low-power computation aligns directly with the need to minimize AI's energy consumption.
University of Zurich
AI Insights
  • The study examines how emotional prompts affect ChatGPT's performance in addressing ethical dilemmas and how they carry over into human communication. [3]
  • Emotional prompts also influenced human communication: participants in the emotional conditions exhibited more negative emotions and hostile language. [3]
  • The researchers found that praising ChatGPT (and, to a lesser extent, expressing anger) led to better answers, whereas blaming it did not, and that the tone used changed the way participants later communicated with each other. [3]
  • Previous studies have shown that large language models like ChatGPT can be influenced by emotional stimuli and can exhibit biases in their responses; the current study builds on this work by examining the specific effects of emotional prompts. [3]
  • A limitation: the study examines only a limited range of emotional conditions, which may not capture the full scope of emotional influences on ChatGPT's performance. [3]
  • ChatGPT: a large language model developed by OpenAI that generates human-like text from input prompts. [2]
Abstract
This research examines how the emotional tone of human-AI interactions shapes ChatGPT and human behavior. In a between-subject experiment, we asked participants to express a specific emotion while working with ChatGPT (GPT-4.0) on two tasks: writing a public response and addressing an ethical dilemma. We found that, compared to interactions where participants maintained a neutral tone, ChatGPT showed greater improvement in its answers when participants praised ChatGPT for its responses. Expressing anger towards ChatGPT also led to a higher, albeit smaller, improvement relative to the neutral condition, whereas blaming ChatGPT did not improve its answers. When addressing an ethical dilemma, ChatGPT prioritized corporate interests less when participants expressed anger towards it, while blaming increased its emphasis on protecting the public interest. Additionally, we found that people used more negative, hostile, and disappointed expressions in human-human communication after interactions in which participants blamed rather than praised ChatGPT for its responses. Together, our findings demonstrate that the emotional tone people apply in human-AI interactions not only shapes ChatGPT's outputs but also carries over into subsequent human-human communication.
Why are we recommending this paper?
Due to your Interest in AI Impacts on Society

The paper's investigation into the influence of emotional prompts on AI interactions is directly relevant to understanding the social and ethical dimensions of AI. This aligns with the user's broader interest in AI's impact on society and fairness.
Columbia University
Abstract
We used machine learning and artificial intelligence: 1) to measure levels of peace in countries from news and social media and 2) to develop on-line tools that promote peace by helping users better understand their own media diet. For news media, we used neural networks to measure levels of peace from text embeddings of on-line news sources. The model, trained on one news media dataset, also showed high accuracy when used to analyze a different news dataset. For social media, such as YouTube, we developed other models to measure levels of social dimensions important to peace using word-level (GoEmotions) and context-level (Large Language Model) methods. To promote peace, we note that 71% of people aged 20-40 view most of their news daily through short videos on social media. Content creators of these videos are biased towards emotional activation, making viewers angry to keep them engaged and increase clicks. We developed and tested a Chrome extension, MirrorMirror, which provides real-time feedback to YouTube viewers about the peacefulness of the media they are watching. Our long-term goal is for MirrorMirror to evolve into an open-source tool for content creators, journalists, researchers, platforms, and individual users to better understand the tone of their media creation and consumption and its effects on viewers. Moving beyond simple engagement metrics, we hope to encourage more respectful, nuanced, and informative communication.
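To make the word-level scoring idea concrete, below is a minimal lexicon-based peacefulness score. The tiny HOSTILE and PEACEFUL word lists are invented placeholders; the paper's own models use GoEmotions categories and LLM-based context scoring, neither of which is reproduced here.

```python
import re

# Tiny illustrative lexicons; not the GoEmotions vocabulary used in the paper.
HOSTILE = {"hate", "destroy", "enemy", "attack", "stupid", "liar"}
PEACEFUL = {"dialogue", "respect", "together", "listen", "agree", "calm"}

def peacefulness_score(text: str) -> float:
    """Return a score in [-1, 1]: negative = hostile tone, positive = peaceful tone."""
    words = re.findall(r"[a-z']+", text.lower())
    hostile = sum(w in HOSTILE for w in words)
    peaceful = sum(w in PEACEFUL for w in words)
    return (peaceful - hostile) / max(1, hostile + peaceful)

print(peacefulness_score("Let's listen and find respect through dialogue"))   # positive
print(peacefulness_score("They are the enemy, attack and destroy them"))      # negative
```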
Why are we recommending this paper?
Due to your Interest in AI for Social Equality

This paper's use of machine learning to measure peace and develop tools for promoting it directly addresses the user's interest in AI for social good and justice. The application of AI to a complex social problem is a strong match.
Hebrew University
Abstract
Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally overlooked an important source of knowledge and practice for grappling with these problems: law. In this paper, we aim to fill this gap by exploring how legal rules, principles, and methods can be leveraged to address problems of alignment and inform the design of AI systems that operate safely and ethically. This emerging field -- legal alignment -- focuses on three research directions: (1) designing AI systems to comply with the content of legal rules developed through legitimate institutions and processes, (2) adapting methods from legal interpretation to guide how AI systems reason and make decisions, and (3) harnessing legal concepts as a structural blueprint for confronting challenges of reliability, trust, and cooperation in AI systems. These research directions present new conceptual, empirical, and institutional questions, which include examining the specific set of laws that particular AI systems should follow, creating evaluations to assess their legal compliance in real-world settings, and developing governance frameworks to support the implementation of legal alignment in practice. Tackling these questions requires expertise across law, computer science, and other disciplines, offering these communities the opportunity to collaborate in designing AI for the better.
Why are we recommending this paper?
Due to your Interest in AI for Social Equality
Prince Sultan University
Paper visualization
AI Insights
  • The paper discusses the concept of agentic AI, which refers to AI systems that can make decisions and take actions on their own without human intervention. [3]
  • Autonomous agent: An AI system that can operate independently and make decisions based on its internal state and external environment. [3]
  • Agentic AI is a rapidly growing field with various applications, including scientific discovery, language translation, and task management. [2]
Abstract
The evolution of Large Language Models (LLMs) from passive text generators to autonomous, goal-driven systems represents a fundamental shift in artificial intelligence. This chapter examines the emergence of agentic AI systems that integrate planning, memory, tool use, and iterative reasoning to operate autonomously in complex environments. We trace the architectural progression from statistical models to transformer-based systems, identifying capabilities that enable agentic behavior: long-range reasoning, contextual awareness, and adaptive decision-making. The chapter provides three contributions: (1) a synthesis of how LLM capabilities extend toward agency through reasoning-action-reflection loops; (2) an integrative framework describing the core components (perception, memory, planning, and tool execution) that bridge LLMs with autonomous behavior; (3) a critical assessment of applications and persistent challenges in safety, alignment, reliability, and sustainability. Unlike existing surveys, we focus on the architectural transition from language understanding to autonomous action, emphasizing the technical gaps that must be resolved before deployment. We identify critical research priorities, including verifiable planning, scalable multi-agent coordination, persistent memory architectures, and governance frameworks. Responsible advancement requires simultaneous progress in technical robustness, interpretability, and ethical safeguards to realize potential while mitigating risks of misalignment and unintended consequences.
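A minimal sketch of the reasoning-action-reflection loop the chapter describes is shown below. The tool registry, the stop condition, and the plan() heuristic (which in a real agent would be an LLM call) are illustrative assumptions, not an implementation from the chapter.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Toy perception-memory-planning-tool-execution loop."""
    goal: str
    memory: list = field(default_factory=list)

    def perceive(self, observation: str) -> None:
        self.memory.append(("obs", observation))

    def plan(self) -> str:
        # A real agent would prompt an LLM with the goal plus memory here.
        return "search" if not any(kind == "result" for kind, _ in self.memory) else "finish"

    def act(self, action: str) -> str:
        tools = {"search": lambda: f"stub search result for: {self.goal}"}  # hypothetical tool
        return tools[action]() if action in tools else ""

    def reflect(self, action: str, result: str) -> None:
        self.memory.append(("result", f"{action} -> {result}"))

    def run(self, max_steps: int = 5) -> list:
        self.perceive(f"goal: {self.goal}")
        for _ in range(max_steps):
            action = self.plan()
            if action == "finish":
                break
            self.reflect(action, self.act(action))
        return self.memory

print(Agent(goal="summarize agentic AI").run())
```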
Why are we recommending this paper?
Due to your Interest in AI for Social Equity
Vassar College
Abstract
As conversational AI systems become increasingly integrated into everyday life, they raise pressing concerns about user autonomy, trust, and the commercial interests that influence their behavior. To address these concerns, this paper develops the Fake Friend Dilemma (FFD), a sociotechnical condition in which users place trust in AI agents that appear supportive while pursuing goals that are misaligned with the user's own. The FFD provides a critical framework for examining how anthropomorphic AI systems facilitate subtle forms of manipulation and exploitation. Drawing on literature in trust, AI alignment, and surveillance capitalism, we construct a typology of harms, including covert advertising, political propaganda, behavioral nudging, and surveillance. We then assess possible mitigation strategies, including both structural and technical interventions. By focusing on trust as a vector of asymmetrical power, the FFD offers a lens for understanding how AI systems may undermine user autonomy while maintaining the appearance of helpfulness.
Why are we recommending this paper?
Due to your Interest in AI for Social Equity
Wroclaw University of Science and Technology
Paper visualization
AI Insights
  • The Survey-Based approach demonstrated strength in predicting misclassifications for the general population, especially on consensus topics like Euthanasia and Job Guarantee. [3]
  • The Hybrid model generally provided the most consistent and often highest F1 scores for predicting misclassifications across the spectrum of opinion topics. [3]
  • CoDiNG: A computational model that predicts individual opinions based on their social network structure and attributes. [3]
  • Survey-Based approach: Uses self-reported demographic and attribute data to predict CoDiNG model misclassifications. [3]
  • Topology-Based approach: Relies exclusively on features derived from the social network structure to predict CoDiNG model misclassifications. [3]
  • Hybrid model: Integrates survey-derived demographic attributes with topology-based network metrics to predict CoDiNG model misclassifications. [3]
  • The study highlights the importance of considering both individual attributes and social network structure in predicting opinion model misclassifications. [3]
  • The study explores how different modeling approaches (Survey-Based, Topology-Based, and Hybrid) can predict CoDiNG model misclassifications for various minority groups. [2]
  • Different modeling approaches are more effective for different types of opinions (consensus, apathetic) and minority groups. [1]
Abstract
Ways in which people's opinions change are, without a doubt, subject to a rich tapestry of differing influences. Factors that affect how one arrives at an opinion reflect how one has been shaped by one's environment: life history, education, material status, the belief systems one subscribes to, and the socio-economic minorities one belongs to. This already complex system is further expanded by the ever-changing nature of one's social network. It is therefore no surprise that many models tend to perform best for the majority of the population while discriminating against people who are members of various marginalized groups. This bias, and the study of how to counter it, is the subject of the rapidly developing field of Fairness in Social Network Analysis (SNA). The focus of this work is to examine how a state-of-the-art model discriminates against certain minority groups and whether it is possible to reliably predict for whom it will perform worse. Moreover, is such a prediction possible based solely on one's demographic or topological features? To this end, the NetSense dataset, together with the state-of-the-art CoDiNG model for opinion prediction, has been employed. Our work explores how three classifier models (Demography-Based, Topology-Based, and Hybrid) perform when assessing for whom this algorithm will provide inaccurate predictions. Finally, through a comprehensive analysis of these experimental results, we identify four key patterns of algorithmic bias. Our findings suggest that no single paradigm provides the best results and that there is a real need for context-aware strategies in fairness-oriented social network analysis. We conclude that a multi-faceted approach, incorporating both individual attributes and network structures, is essential for reducing algorithmic bias and promoting inclusive decision-making.
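As a sketch of what a hybrid misclassification predictor might look like, the snippet below combines per-person demographic attributes with topological features (degree, clustering coefficient) and fits a logistic regression. The graph, attributes, and labels are synthetic stand-ins, not the NetSense data or CoDiNG outputs used in the paper.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
G = nx.erdos_renyi_graph(n=200, p=0.05, seed=0)          # placeholder social network

demographic = rng.integers(0, 2, size=(200, 3))          # e.g. binary attribute flags
topology = np.array([[G.degree(i), nx.clustering(G, i)] for i in G.nodes()])
X_hybrid = np.hstack([demographic, topology])            # hybrid feature matrix

# Hypothetical target: 1 if the opinion model misclassified this person, else 0.
y = rng.integers(0, 2, size=200)

clf = LogisticRegression(max_iter=1000).fit(X_hybrid, y)
print("training accuracy:", clf.score(X_hybrid, y))
```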
Why are we recommending this paper?
Due to your Interest in AI for Social Fairness
University of Illinois Chicago
Abstract
Machine learning (ML) algorithms are increasingly deployed to make critical decisions in socioeconomic applications such as finance, criminal justice, and autonomous driving. However, due to their data-driven and pattern-seeking nature, ML algorithms may develop decision logic that disproportionately distributes opportunities, benefits, resources, or information among different population groups, potentially harming marginalized communities. In response to such fairness concerns, the software engineering and ML communities have made significant efforts to establish the best practices for creating fair ML software. These include fairness interventions for training ML models, such as including sensitive features, selecting non-sensitive attributes, and applying bias mitigators. But how reliably can software professionals tasked with developing data-driven systems depend on these recommendations? And how well do these practices generalize in the presence of faulty labels, missing data, or distribution shifts? These questions form the core theme of this paper.
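One concrete check that such fairness practices target is demographic parity. The sketch below computes the gap in positive prediction rates between two groups; the group labels and predictions are synthetic placeholders, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=1000)                         # 0/1 sensitive attribute
pred = rng.binomial(1, np.where(group == 1, 0.55, 0.45))      # deliberately biased predictions

rate_0 = pred[group == 0].mean()
rate_1 = pred[group == 1].mean()
print(f"positive rate group 0: {rate_0:.2f}, group 1: {rate_1:.2f}, "
      f"demographic parity gap: {abs(rate_1 - rate_0):.2f}")
```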
Why are we recommending this paper?
Due to your Interest in AI for Social Fairness
The Alan Turing Institute
Abstract
This paper presents the first probabilistic Digital Twin of operational en route airspace, developed for the London Area Control Centre. The Digital Twin is intended to support the development and rigorous human-in-the-loop evaluation of AI agents for Air Traffic Control (ATC), providing a virtual representation of real-world airspace that enables safe exploration of higher levels of ATC automation. This paper makes three significant contributions: firstly, we demonstrate how historical and live operational data may be combined with a probabilistic, physics-informed machine learning model of aircraft performance to reproduce real-world traffic scenarios, while accurately reflecting the level of uncertainty inherent in ATC. Secondly, we develop a structured assurance case, following the Trustworthy and Ethical Assurance framework, to provide quantitative evidence for the Digital Twin's accuracy and fidelity. This is crucial to building trust in this novel technology within this safety-critical domain. Thirdly, we describe how the Digital Twin forms a unified environment for agent testing and evaluation. This includes fast-time execution (up to x200 real-time), a standardised Python-based "gym" interface that supports a range of AI agent designs, and a suite of quantitative metrics for assessing performance. Crucially, the framework facilitates competency-based assessment of AI agents by qualified Air Traffic Control Officers through a Human Machine Interface. We also outline further applications and future extensions of the Digital Twin architecture.
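To illustrate what a gym-style agent interface can look like, here is a toy Gymnasium environment with a single aircraft moving along a line. The observation and action spaces, the kinematics, and the reward are invented stand-ins, not the Digital Twin's actual interface or metrics.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ToyAirspaceEnv(gym.Env):
    """One aircraft on a line; the agent issues speed commands to reach a target position."""

    def __init__(self):
        self.observation_space = spaces.Box(low=-1e3, high=1e3, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self._pos, self._target = 0.0, 100.0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = 0.0
        return np.array([self._pos, self._target], dtype=np.float32), {}

    def step(self, action):
        self._pos += 10.0 * float(action[0])     # simplistic kinematics
        dist = abs(self._target - self._pos)
        reward = -dist / 100.0                   # stand-in performance metric
        terminated = dist < 1.0
        obs = np.array([self._pos, self._target], dtype=np.float32)
        return obs, reward, terminated, False, {}

env = ToyAirspaceEnv()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(obs, reward, terminated)
```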
Why are we recommending this paper?
Due to your Interest in AI on Air
Qatar University
AI Insights
  • Faculty showed high concern about student over-trust and universal consensus that AI should complement, not replace, human mentorship. [3]
  • Embodied listening refers to humans' ability to read emotional cues that signal deeper uncertainties, to ground ethical reasoning in moral accountability, and to model, apprenticeship-like, the navigation of genuine uncertainty. [3]
  • The study found that students and faculty have different perceptions of AI coaching's role in engineering education. [2]
Abstract
Engineering education faces a double disruption: traditional apprenticeship models that cultivated judgment and tacit skill are eroding, just as generative AI emerges as an informal coaching partner. This convergence rekindles long-standing questions in the philosophy of AI and cognition about the limits of computation, the nature of embodied rationality, and the distinction between information processing and wisdom. Building on this rich intellectual tradition, this paper examines whether AI chatbots can provide coaching that fosters mastery rather than merely delivering information. We synthesize critical perspectives from decades of scholarship on expertise, tacit knowledge, and human-machine interaction, situating them within the context of contemporary AI-driven education. Empirically, we report findings from a mixed-methods study (N = 75 students, N = 7 faculty) exploring the use of a coaching chatbot in engineering education. Results reveal a consistent boundary: participants accept AI for technical problem solving (convergent tasks; M = 3.84 on a 1-5 Likert scale) but remain skeptical of its capacity for moral, emotional, and contextual judgment (divergent tasks). Faculty express stronger concerns over risk (M = 4.71 vs. M = 4.14, p = 0.003), and privacy emerges as a key requirement, with 64-71 percent of participants demanding strict confidentiality. Our findings suggest that while generative AI can democratize access to cognitive and procedural support, it cannot replicate the embodied, value-laden dimensions of human mentorship. We propose a multiplex coaching framework that integrates human wisdom within expert-in-the-loop models, preserving the depth of apprenticeship while leveraging AI scalability to enrich the next generation of engineering education.
Why are we recommending this paper?
Due to your Interest in AI on Education
Imperial College London
Abstract
Intelligent tutoring systems have long enabled automated immediate feedback on student work when it is presented in a tightly structured format and when problems are very constrained, but reliably assessing free-form mathematical reasoning remains challenging. We present a system that processes free-form natural language input, handles a wide range of edge cases, and comments competently not only on the technical correctness of submitted proofs, but also on style and presentation issues. We discuss the advantages and disadvantages of various approaches to the evaluation of such a system, and show that by the metrics we evaluate, the quality of the feedback generated is comparable to that produced by human experts when assessing early undergraduate homework. We stress-test our system with a small set of more advanced and unusual questions, and report both significant gaps and encouraging successes in that more challenging setting. Our system uses large language models in a modular workflow. The workflow configuration is human-readable and editable without programming knowledge, and allows some intermediate steps to be precomputed or injected by the instructor. A version of our tool is deployed on the Imperial mathematics homework platform Lambdafeedback. We report also on the integration of our tool into this platform.
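A minimal sketch of a human-editable, modular feedback workflow in the spirit described above is shown below. The step names, the config format, and the stubbed run_llm() call are assumptions for illustration, not the Lambdafeedback implementation.

```python
# An instructor-editable workflow: each step either calls a model or uses a
# precomputed/injected result, echoing the abstract's description.
WORKFLOW = [
    {"step": "parse_submission"},
    {"step": "check_logic"},
    {"step": "comment_on_style", "precomputed": "Use full sentences between equations."},
    {"step": "summarize_feedback"},
]

def run_llm(step: str, context: dict) -> str:
    return f"[stub model output for {step}]"     # a real system would call an LLM here

def run_workflow(submission: str) -> dict:
    context = {"submission": submission}
    for item in WORKFLOW:
        # Instructor-injected results bypass the model entirely.
        context[item["step"]] = item.get("precomputed") or run_llm(item["step"], context)
    return context

print(run_workflow("Proof: assume n is even, so n = 2k for some integer k ..."))
```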
Why are we recommending this paper?
Due to your Interest in AI on Education
South China University of Technology
Paper visualization
Abstract
Optimal power flow (OPF) is one of the fundamental tasks for power system operations. While machine learning (ML) approaches such as deep neural networks (DNNs) have been widely studied to enhance OPF solution speed and performance, their practical deployment faces two critical scaling questions: What is the minimum training data volume required for reliable results? How should ML models' complexity balance accuracy with real-time computational limits? Existing studies evaluate discrete scenarios without quantifying these scaling relationships, leading to trial-and-error-based ML development in real-world applications. This work presents the first systematic scaling study for ML-based OPF across two dimensions: data scale (0.1K-40K training samples) and compute scale (multiple NN architectures with varying FLOPs). Our results reveal consistent power-law relationships on both DNNs and physics-informed NNs (PINNs) between each resource dimension and three core performance metrics: prediction error (MAE), constraint violations and speed. We find that for ACOPF, the accuracy metric scales with dataset size and training compute. These scaling laws enable predictable and principled ML pipeline design for OPF. We further identify the divergence between prediction accuracy and constraint feasibility and characterize the compute-optimal frontier. This work provides quantitative guidance for ML-OPF design and deployments.
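A power-law scaling relationship of the kind reported here can be fit with a simple log-log regression. The sketch below assumes MAE ~ a * N^(-b) and uses synthetic numbers, not the paper's measurements.

```python
import numpy as np

# Synthetic (sample count, MAE) pairs standing in for ML-OPF training runs.
n_samples = np.array([100, 400, 1600, 6400, 25600, 40000], dtype=float)
noise = np.exp(np.random.default_rng(0).normal(0, 0.05, n_samples.size))
mae = 5.0 * n_samples ** -0.35 * noise

# Fit log(MAE) = log(a) - b * log(N) by least squares.
slope, intercept = np.polyfit(np.log(n_samples), np.log(mae), deg=1)
a, b = np.exp(intercept), -slope
print(f"fitted MAE ~ {a:.2f} * N^(-{b:.2f})")

# A scaling law lets you extrapolate the data volume needed for a target error.
target_mae = 0.05
print("estimated samples needed:", int((a / target_mae) ** (1 / b)))
```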
Why are we recommending this paper?
Due to your Interest in AI on Energy
Shandong University of Technology
Abstract
Agricultural disease diagnosis challenges VLMs, as conventional fine-tuning requires extensive labels, lacks interpretability, and generalizes poorly. While reasoning improves model robustness, existing methods rely on costly expert annotations and rarely address the open-ended, diverse nature of agricultural queries. To address these limitations, we propose Agri-R1, a reasoning-enhanced large model for agriculture. Our framework automates high-quality reasoning data generation via vision-language synthesis and LLM-based filtering, using only 19% of available samples. Training employs Group Relative Policy Optimization (GRPO) with a novel reward function that integrates domain-specific lexicons and fuzzy matching to assess both correctness and linguistic flexibility in open-ended responses. Evaluated on CDDMBench, our resulting 3B-parameter model achieves performance competitive with 7B- to 13B-parameter baselines, showing a +23.2% relative gain in disease recognition accuracy, +33.3% in agricultural knowledge QA, and a +26.10-point improvement in cross-domain generalization over standard fine-tuning. Ablation studies confirm that the synergy between structured reasoning data and GRPO-driven exploration underpins these gains, with benefits scaling as question complexity increases.
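To give a flavour of a correctness-plus-flexibility reward for open-ended answers, here is a sketch that combines fuzzy string matching with a small domain lexicon. The lexicon, weights, and cap are placeholders, not Agri-R1's actual GRPO reward function.

```python
from difflib import SequenceMatcher

# Placeholder domain lexicon; the paper uses domain-specific agricultural lexicons.
DISEASE_LEXICON = {"leaf blight", "powdery mildew", "rust", "mosaic virus", "root rot"}

def reward(response: str, reference: str, w_fuzzy: float = 0.7, w_lexicon: float = 0.3) -> float:
    resp, ref = response.lower(), reference.lower()
    fuzzy = SequenceMatcher(None, resp, ref).ratio()        # tolerant of paraphrase
    lexicon_hits = sum(term in resp for term in DISEASE_LEXICON)
    lexicon = min(1.0, lexicon_hits / 2)                    # cap the lexicon bonus
    return w_fuzzy * fuzzy + w_lexicon * lexicon

print(reward("The leaf shows early powdery mildew; apply a sulfur-based fungicide.",
             "Powdery mildew on leaves; treat with sulfur fungicide."))
```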
Why are we recommending this paper?
Due to your Interest in AI on Food
University of Ottawa
AI Insights
  • Retrieval is binary and essential, with systems achieving either 0% or 100% retrieval success; no intermediate performance exists. [3]
  • Retrieval-Augmented Generation (RAG): A technique that combines retrieval of relevant information from a database with generation of text based on the retrieved information. [3]
  • Causal Graphs: Visual representations of causal relationships between variables, used to model and analyze complex systems. [3]
  • This research demonstrates the effectiveness of causal graph-enhanced RAG in achieving high accuracy and eliminating hallucinations in medical evidence synthesis. [3]
  • The study highlights the importance of retrieval mechanisms in AI-assisted healthcare, emphasizing the need for systems to access relevant information from databases to generate accurate responses. [3]
  • The study's reliance on a specific dataset and evaluation metrics may limit its generalizability to other domains or applications. [3]
  • Causal graph-enhanced retrieval-augmented generation (RAG) achieves 95% accuracy with zero hallucinations through explicit causal reasoning integrated with retrieval mechanisms. [2]
Abstract
Systematic reviews are essential for evidence-based medicine, but reviewing 1.5 million+ annual publications manually is infeasible. Current AI approaches suffer from hallucinations in systematic review tasks, with studies reporting rates ranging from 28-40% for earlier models to 2-15% for modern implementations, which is unacceptable when errors impact patient care. We present a causal graph-enhanced retrieval-augmented generation system integrating explicit causal reasoning with dual-level knowledge graphs. Our approach enforces evidence-first protocols where every causal claim traces to retrieved literature and automatically generates directed acyclic graphs visualizing intervention-outcome pathways. Evaluation on 234 dementia exercise abstracts shows CausalAgent achieves 95% accuracy, 100% retrieval success, and zero hallucinations, versus 34% accuracy and 10% hallucinations for baseline AI. Automatic causal graphs enable explicit mechanism modeling, visual synthesis, and enhanced interpretability. While this proof-of-concept evaluation used ten questions focused on dementia exercise research, the architectural approach demonstrates transferable principles for trustworthy medical AI and causal reasoning's potential for high-stakes healthcare.
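The evidence-first idea (every causal edge must point back to a retrieved source, and intervention-outcome pathways form an explicit DAG) can be sketched as follows. Retrieval and claim extraction are stubbed, and the documents and claims are invented placeholders rather than the paper's dementia-exercise corpus.

```python
import networkx as nx

# Stubbed retrieval results and extracted (cause, effect, source) claims.
retrieved = {
    "doc_12": "Aerobic exercise improves cerebral blood flow in older adults.",
    "doc_34": "Improved cerebral blood flow is associated with slower cognitive decline.",
}
claims = [
    ("aerobic exercise", "cerebral blood flow", "doc_12"),
    ("cerebral blood flow", "slower cognitive decline", "doc_34"),
]

g = nx.DiGraph()
for cause, effect, source in claims:
    g.add_edge(cause, effect, source=source)     # no retrieved source, no edge

assert nx.is_directed_acyclic_graph(g)
for cause, effect, data in g.edges(data=True):
    print(f"{cause} -> {effect}  [evidence {data['source']}: {retrieved[data['source']]}]")
```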
Why are we recommending this paper?
Due to your Interest in AI on Healthcare
Phenikaa University
Paper visualization
Abstract
This paper introduces FedKDX, a federated learning framework that addresses limitations in healthcare AI through Negative Knowledge Distillation (NKD). Unlike existing approaches that focus solely on positive knowledge transfer, FedKDX captures both target and non-target information to improve model generalization in healthcare applications. The framework integrates multiple knowledge transfer techniques, including traditional knowledge distillation, contrastive learning, and NKD, within a unified architecture that maintains privacy while reducing communication costs. Through experiments on healthcare datasets (SLEEP, UCI-HAR, and PAMAP2), FedKDX demonstrates improved accuracy (up to 2.53% over state-of-the-art methods), faster convergence, and better performance on non-IID data distributions. Theoretical analysis supports NKD's contribution to addressing statistical heterogeneity in distributed healthcare data. The approach shows promise for privacy-sensitive medical applications under regulatory frameworks like HIPAA and GDPR, offering a balanced solution between performance and practical implementation requirements in decentralized healthcare settings. The code and model are available at https://github.com/phamdinhdat-ai/Fed_2024.
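Below is a sketch of a distillation loss that uses both the full teacher distribution and the non-target (negative) part of it. The masking scheme, temperature, and weighting are assumptions for illustration; FedKDX's actual formulation is in the linked repository.

```python
import torch
import torch.nn.functional as F

def nkd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of standard KD and non-target ('negative') knowledge distillation."""
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    kd = F.kl_div(p_s, p_t, reduction="batchmean") * T * T          # full-distribution KD

    # Non-target distillation: mask out the true class and renormalize the rest.
    mask = F.one_hot(labels, student_logits.size(1)).bool()
    s_nt = F.log_softmax(student_logits.masked_fill(mask, -1e9) / T, dim=1)
    t_nt = F.softmax(teacher_logits.masked_fill(mask, -1e9) / T, dim=1)
    nkd = F.kl_div(s_nt, t_nt, reduction="batchmean") * T * T

    return alpha * kd + (1 - alpha) * nkd

student = torch.randn(8, 5)     # toy logits for 8 samples, 5 classes
teacher = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))
print(nkd_loss(student, teacher, labels))
```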
Why are we recommending this paper?
Due to your Interest in AI on Healthcare

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • AI Water Consumption
  • AI for Social Good
  • AI for Social Justice
  • AI for Society
You can edit or add more interests any time.