Hi!

Your personalized paper recommendations for 26 to 30 January, 2026.

FrontierScience: Evaluating AI's Ability to Perform Expert-Level Scientific Tasks

OpenAI

Rate paper: 👍 👎 ♥ Save

AI Insights

The use of a judge model based on GPT-5 thinking at high reasoning effort ensures that responses are graded fairly and consistently. (ML: 0.99)👍👎
The evaluation process uses a judge model based on GPT-5 thinking at high reasoning effort to grade responses. (ML: 0.98)👍👎
Judge model: A model based on GPT-5 thinking at high reasoning effort that is used to grade responses to problems in the FrontierScience benchmark. (ML: 0.97)👍👎
Problems in both sets are designed to be challenging and require 3-5 hours to draft, with a focus on testing depth of reasoning rather than prose, search, or recency (knowledge cutoffs). (ML: 0.96)👍👎
The benchmark includes guidelines for problem writers to ensure clarity, originality, grading consistency, and difficulty. (ML: 0.95)👍👎
FrontierScience-Research: A set of problems designed to test the ability of large language models to solve complex scientific problems at a graduate or research level. (ML: 0.94)👍👎
FrontierScience is a benchmark for evaluating the scientific problem-solving abilities of large language models. (ML: 0.93)👍👎
FrontierScience-Olympiad: A set of problems designed to test the ability of large language models to solve complex scientific problems at a high school or undergraduate level. (ML: 0.92)👍👎
The FrontierScience benchmark provides a comprehensive evaluation of large language models' scientific problem-solving abilities, covering various levels of complexity and difficulty. (ML: 0.91)👍👎
The benchmark consists of two sets: FrontierScience-Olympiad and FrontierScience-Research, which test different levels of reasoning and complexity. (ML: 0.91)👍👎

Abstract
We introduce FrontierScience, a benchmark evaluating expert-level scientific reasoning in frontier language models. Recent model progress has nearly saturated existing science benchmarks, which often rely on multiple-choice knowledge questions or already published information. FrontierScience addresses this gap through two complementary tracks: (1) Olympiad, consisting of international olympiad problems at the level of IPhO, IChO, and IBO, and (2) Research, consisting of PhD-level, open-ended problems representative of sub-tasks in scientific research. FrontierScience contains several hundred questions (including 160 in the open-sourced gold set) covering subfields across physics, chemistry, and biology, from quantum electrodynamics to synthetic organic chemistry. All Olympiad problems are originally produced by international Olympiad medalists and national team coaches to ensure standards of difficulty, originality, and factuality. All Research problems are research sub-tasks written and verified by PhD scientists (doctoral candidates, postdoctoral researchers, or professors). For Research, we introduce a granular rubric-based evaluation framework to assess model capabilities throughout the process of solving a research task, rather than judging only a standalone final answer.

Why we are recommending this paper?
Due to your Interest in AI on Energy

This paper from OpenAI directly addresses the critical need to assess AI’s capabilities in scientific reasoning, aligning with your interest in AI’s impact on knowledge creation and problem-solving. Evaluating advanced AI’s performance on complex scientific tasks is a key area of concern given your broader interest in AI’s potential.

Like a Therapist, But Not: Reddit Narratives of AI in Mental Health Contexts

Drexel University

Rate paper: 👍 👎 ♥ Save

AI Insights

The study's reliance on self-reported data from Reddit users may introduce biases and limit the generalizability of the findings. (ML: 0.99)👍👎
Users reported feeling more comfortable discussing their mental health with a machine than with a human, citing concerns about stigma and judgment. (ML: 0.98)👍👎
The study analyzed 1.3 million posts from Reddit's mental health communities and found that users often turn to AI-powered chatbots for support. (ML: 0.98)👍👎
A significant proportion of users reported using AI-powered chatbots as a supplement to traditional therapy, rather than a replacement for it. (ML: 0.98)👍👎
Users value the convenience and accessibility of AI-powered chatbots, but also express concerns about their limitations and potential biases. (ML: 0.98)👍👎
Users praised the convenience and accessibility of AI-powered chatbots, but also expressed concerns about their limitations and potential biases. (ML: 0.98)👍👎
Further research is needed to fully understand the effectiveness and limitations of AI-powered chatbots in supporting mental health. (ML: 0.96)👍👎
The study highlights the growing trend of using AI-powered chatbots as a supplement to traditional therapy for mental health support. (ML: 0.96)👍👎
The most common topics discussed in the mental health communities were depression, anxiety, and relationships. (ML: 0.94)👍👎
AI-powered chatbot: A computer program that uses artificial intelligence to simulate human-like conversations with users. (ML: 0.87)👍👎

Abstract
Large language models (LLMs) are increasingly used for emotional support and mental health-related interactions outside clinical settings, yet little is known about how people evaluate and relate to these systems in everyday use. We analyze 5,126 Reddit posts from 47 mental health communities describing experiential or exploratory use of AI for emotional support or therapy. Grounded in the Technology Acceptance Model and therapeutic alliance theory, we develop a theory-informed annotation framework and apply a hybrid LLM-human pipeline to analyze evaluative language, adoption-related attitudes, and relational alignment at scale. Our results show that engagement is shaped primarily by narrated outcomes, trust, and response quality, rather than emotional bond alone. Positive sentiment is most strongly associated with task and goal alignment, while companionship-oriented use more often involves misaligned alliances and reported risks such as dependence and symptom escalation. Overall, this work demonstrates how theory-grounded constructs can be operationalized in large-scale discourse analysis and highlights the importance of studying how users interpret language technologies in sensitive, real-world contexts.

Why we are recommending this paper?
Due to your Interest in AI for Social Justice

Given your interest in AI’s impact on society and particularly on mental health, this paper offers a fascinating look at how people are interacting with AI in a sensitive domain. Analyzing user narratives provides valuable insights into the ethical and social implications of AI in this context.

Where Do the Joules Go? Diagnosing Inference Energy Consumption

University of Michigan

Rate paper: 👍 👎 ♥ Save

AI Insights

Energy per video (J): The amount of energy consumed by an AI model to process one input video. (ML: 0.95)👍👎
Energy per image (J): The amount of energy consumed by an AI model to process one input image. (ML: 0.94)👍👎
Batch size: The number of input tokens, images, or videos processed simultaneously by an AI model. (ML: 0.93)👍👎
The authors also discuss the limitations of current methods for measuring and optimizing energy consumption. (ML: 0.91)👍👎
They also compare the energy efficiency of different hardware configurations, such as B200 and H100. (ML: 0.90)👍👎
Energy per token (J): The amount of energy consumed by an AI model to process one input token. (ML: 0.89)👍👎
Increasing batch size can lead to significant reductions in energy consumption, but only up to a certain point. (ML: 0.87)👍👎
The authors analyze the energy consumption per token, image, and video for different models and batch sizes. (ML: 0.81)👍👎
The paper discusses the energy consumption of various AI models, including Qwen3, Llama 3.1, and Gemma 3. (ML: 0.80)👍👎
The results show that increasing batch size can lead to significant reductions in energy consumption, but only up to a certain point. (ML: 0.76)👍👎

Abstract
Energy is now a critical ML computing resource. While measuring energy consumption and observing trends is a valuable first step, accurately understanding and diagnosing why those differences occur is crucial for optimization. To that end, we begin by presenting a large-scale measurement study of inference time and energy across the generative AI landscape with 46 models, 7 tasks, and 1,858 different configurations on NVIDIA H100 and B200 GPUs. Our empirical findings span order-of-magnitude variations: LLM task type can lead to 25$\times$ energy differences, video generation sometimes consumes more than 100$\times$ the energy of images, and GPU utilization differences can result in 3--5$\times$ energy differences. Based on our observations, we present a framework for reasoning about the underlying mechanisms that govern time and energy consumption. The essence is that time and energy are determined by latent metrics like memory and utilization, which are in turn affected by various factors across the algorithm, software, and hardware layers. Our framework also extends directly to throughput per watt, a critical metric for power-constrained datacenters.

Why we are recommending this paper?
Due to your Interest in AI Energy Consumption

With your interest in AI Air Consumption and Energy Consumption, this research directly tackles the critical issue of energy efficiency in AI systems. Understanding the root causes of energy usage is crucial for sustainable AI development.

Sustainable Open-Source AI Requires Tracking the Cumulative Footprint of Derivatives

Vector Institute for Artificial Intelligence

Rate paper: 👍 👎 ♥ Save

AI Insights

The article emphasizes the importance of transparency and accountability in the development and deployment of AI systems, particularly with regards to their environmental impact. (ML: 0.96)👍👎
The authors suggest that policymakers and industry leaders should work together to establish regulations and guidelines for the development and deployment of AI systems that minimize their environmental footprint. (ML: 0.95)👍👎
The authors emphasize the need for policymakers, industry leaders, and researchers to work together to establish regulations and guidelines for the development and deployment of AI systems that minimize their environmental footprint. (ML: 0.95)👍👎
The article discusses the need for tracking the cumulative footprint of derivatives in open-source AI, particularly in language models. (ML: 0.95)👍👎
Standardization: Standardization in this context refers to the development of a standardized method for calculating energy consumption and emissions associated with AI systems. (ML: 0.94)👍👎
The article highlights various challenges associated with measuring the energy consumption of AI systems, including the lack of standardization and the difficulty of estimating indirect emissions. (ML: 0.93)👍👎
Derivatives: In the context of open-source AI, derivatives refer to the various versions or updates of a language model. (ML: 0.91)👍👎
The article concludes that tracking the cumulative footprint of derivatives in open-source AI is essential for understanding their environmental impact and mitigating their carbon emissions. (ML: 0.91)👍👎
The authors argue that this is essential for understanding the environmental impact of these models and mitigating their carbon emissions. (ML: 0.90)👍👎
Cumulative Footprint: The cumulative footprint refers to the total amount of energy consumed and emissions produced by a language model over its entire lifecycle, including development, deployment, and maintenance. (ML: 0.90)👍👎
The authors propose a framework for tracking the cumulative footprint of derivatives in open-source AI, which involves developing a standardized method for calculating energy consumption and emissions. (ML: 0.84)👍👎

Abstract
Open-source AI is scaling rapidly, and model hubs now host millions of artifacts. Each foundation model can spawn large numbers of fine-tunes, adapters, quantizations, merges, and forks. We take the position that compute efficiency alone is insufficient for sustainability in open-source AI: lower per-run costs can accelerate experimentation and deployment, increasing aggregate environmental footprint unless impacts are measurable and comparable across derivative lineages. However, the energy use, water consumption, and emissions of these derivative lineages are rarely measured or disclosed in a consistent, comparable manner, leaving ecosystem-level impact largely invisible. We argue that sustainable open-source AI requires coordination infrastructure that tracks impacts across model lineages, not only base models. We propose Data and Impact Accounting (DIA), a lightweight, non-restrictive transparency layer that (i) standardizes carbon and water reporting metadata, (ii) integrates low-friction measurement into common training and inference pipelines, and (iii) aggregates reports through public dashboards to summarize cumulative impacts across releases and derivatives. DIA makes derivative costs visible and supports ecosystem-level accountability while preserving openness. https://vectorinstitute.github.io/ai-impact-accounting/

Why we are recommending this paper?
Due to your Interest in AI Energy Consumption

This paper’s focus on the sustainability of open-source AI aligns with your broader interest in AI’s environmental impact and social responsibility. It highlights the need for a more comprehensive approach to measuring and mitigating the environmental footprint of AI models.

A Human-Centred AI System for Multi-Actor Planning and Collaboration in Family Learning

University of Notre Dame

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

The results suggest that while AI-powered learning tools can be beneficial, they require careful planning and adaptation to meet the unique needs of each family. (ML: 0.99)👍👎
The results show that while some families effectively utilized AI-powered learning tools for task distribution and tutoring assistance, others struggled to adapt to these new methods. (ML: 0.99)👍👎
The study emphasizes the need for more research on the effective implementation of AI-powered learning tools in family settings, considering factors such as caregiver roles, individual needs, and family dynamics. (ML: 0.98)👍👎
The study explores the use of AI-powered learning tools in family settings, focusing on task distribution and tutoring assistance. (ML: 0.98)👍👎
The study highlights the importance of considering family dynamics, caregiver roles, and individual needs when implementing AI-powered learning tools in family settings. (ML: 0.98)👍👎
Tutoring assistance: The provision of guidance and support by caregivers or AI-powered systems to help family members complete tasks. (ML: 0.97)👍👎
Task distribution: The process of assigning tasks to family members based on their abilities and availability. (ML: 0.97)👍👎
Limited sample size (ML: 0.96)👍👎
AI-powered learning tools: Technology-based platforms that provide personalized learning experiences and support for students. (ML: 0.95)👍👎
Eleven families participated in the study, with a total of 44 individuals involved. (ML: 0.94)👍👎

Abstract
Family learning takes place in everyday routines where children and caregivers read, practice, and develop new skills together. Despite growing interest in AI tutors, most existing systems are designed for single learners or classroom settings and do not address the distributed planning, coordination, and execution demands of learning at home. This paper introduces ParPal, a human-centred, LLM-powered system that supports multi-actor family learning by decomposing learning goals into actionable subtasks, allocating them across caregivers under realistic availability and expertise constraints, and providing caregiver-in-the-loop tutoring support with visibility into individual and collective contributions. Through expert evaluation of generated weekly learning plans and a one-week field deployment with 11 families, we identify systematic failure modes in current LLM-based planning, including misalignment with role expertise, unnecessary or costly collaboration, missing pedagogical learning trajectories, and physically or temporally infeasible tasks. While ParPal improves coordination clarity and recognition of caregiving effort, these findings expose fundamental limitations in how current LLMs operationalize pedagogical knowledge, reason about collaboration, and account for real-world, embodied constraints. We discuss implications for human-centred AI design and AI methodology, positioning multi-actor family learning as a critical testbed for advancing planning, adaptation, and pedagogical structure in next-generation AI systems.

Why we are recommending this paper?
Due to your Interest in AI for Social Good

This research directly addresses the intersection of AI and human interaction within a learning environment, a topic of significant interest given your focus on AI's role in education and social development. The human-centered design approach is particularly relevant.

When Life Gives You AI, Will You Turn It Into A Market for Lemons? Understanding How Information Asymmetries About AI System Capabilities Affect Market Outcomes and Adoption

University of Gttingen

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

Participants who were more familiar with the tasks and had a higher affinity for technology were more likely to delegate decisions to AI. (ML: 0.99)👍👎
The findings suggest that users are more likely to delegate decisions to AI when they have access to accurate and reliable information about each system. (ML: 0.99)👍👎
The researchers suggest that the findings have implications for the design of AI systems and the information provided to users, as well as for the development of policies regulating AI decision-making. (ML: 0.98)👍👎
Lemon density: The proportion of AI systems in the pool that are lemons (i.e., low-accuracy or high-error-rate AIs). (ML: 0.98)👍👎
Delegation to AI: The percentage of decisions made by participants using an AI system. (ML: 0.98)👍👎
The study also found that participants' risk attitudes and perceived lemon density did not have a significant impact on their delegation behavior. (ML: 0.97)👍👎
The study highlights the importance of considering both information disclosure and lemon density when designing AI systems. (ML: 0.97)👍👎
The study aims to investigate how information disclosure affects the behavior of individuals when delegating decisions to AI systems. (ML: 0.97)👍👎
The researchers recruited 330 participants, half of whom were female, and assigned them to one of seven conditions based on the level of information disclosure and lemon density. (ML: 0.96)👍👎
However, the presence of lemons in the AI pool can undermine this effect, leading to decreased delegation rates. (ML: 0.96)👍👎
The results showed that delegation to AI increased with higher levels of information disclosure, but this effect was moderated by the presence of lemons in the AI pool. (ML: 0.96)👍👎
Information disclosure: The amount of information provided to users about each AI system, including its accuracy and data quality. (ML: 0.94)👍👎
Coins earned: The number of virtual coins earned by participants as a result of correct predictions across the 30 trials. (ML: 0.92)👍👎

Abstract
AI consumer markets are characterized by severe buyer-supplier market asymmetries. Complex AI systems can appear highly accurate while making costly errors or embedding hidden defects. While there have been regulatory efforts surrounding different forms of disclosure, large information gaps remain. This paper provides the first experimental evidence on the important role of information asymmetries and disclosure designs in shaping user adoption of AI systems. We systematically vary the density of low-quality AI systems and the depth of disclosure requirements in a simulated AI product market to gauge how people react to the risk of accidentally relying on a low-quality AI system. Then, we compare participants' choices to a rational Bayesian model, analyzing the degree to which partial information disclosure can improve AI adoption. Our results underscore the deleterious effects of information asymmetries on AI adoption, but also highlight the potential of partial disclosure designs to improve the overall efficiency of human decision-making.

Why we are recommending this paper?
Due to your Interest in AI Air Consumption

COMET-SG1: Lightweight Autoregressive Regressor for Edge and Embedded AI

Indian Institute of Technology Guwahati

Rate paper: 👍 👎 ♥ Save

AI Insights

The design choice may limit the model's ability to capture complex temporal patterns. (ML: 0.98)👍👎
The model's performance may degrade in the presence of concept drift or non-stationarity. (ML: 0.96)👍👎
Autoregressive regressor: A type of machine learning model that uses past values to predict future values. (ML: 0.95)👍👎
Edge AI: Artificial intelligence applied at the edge of a network, close to where data is generated or consumed. (ML: 0.85)👍👎
The design choice enables predictable behavior under rollout while remaining compatible with deterministic and resource-constrained execution environments. (ML: 0.80)👍👎
COMET-SG1 is a stability-oriented autoregressive regressor explicitly designed around edge AI constraints. (ML: 0.70)👍👎
The proposed approach is a practical step toward stability-oriented time-series regression for edge and embedded AI systems. (ML: 0.68)👍👎
The model prioritizes bounded long-horizon behavior and achieves predictable rollout while maintaining a compact footprint suitable for embedded systems. (ML: 0.64)👍👎
Microcontroller-class edge platforms: Small computers used in embedded systems, such as Arduino and Cortex-M–based systems. (ML: 0.54)👍👎
The design of COMET-SG1 aligns closely with the constraints of microcontroller-class edge platforms such as Arduino- and Cortex-M–based systems. (ML: 0.52)👍👎

Abstract
COMET-SG1 is a lightweight, stability-oriented autoregressive regression model designed for time-series prediction on edge and embedded AI systems. Unlike recurrent neural networks or transformer-based sequence models, COMET-SG1 operates through linear behavior-space encoding, memory-anchored transition estimation, and deterministic state updates. This structure prioritizes bounded long-horizon behavior under fully autoregressive inference, a critical requirement for edge deployment where prediction errors accumulate over time. Experiments on non-stationary synthetic time-series data demonstrate that COMET-SG1 achieves competitive short-horizon accuracy while exhibiting significantly reduced long-horizon drift compared to MLP, LSTM, and k-nearest neighbor baselines. With a compact parameter footprint and operations compatible with fixed-point arithmetic, COMET-SG1 provides a practical and interpretable approach for stable autoregressive prediction in edge and embedded AI applications.

Why we are recommending this paper?
Due to your Interest in AI Air Consumption

Responsible AI: The Good, The Bad, The AI

University of Tartu

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

It emphasizes the need for a more comprehensive understanding of the complex relationships between technological, social, and economic factors in AI development and deployment. (ML: 0.99)👍👎
It cites studies on AI bias, transparency, accountability, and the need for responsible AI development and deployment. (ML: 0.99)👍👎
The study acknowledges that it has limitations due to the complexity of the topic and the need for further research. (ML: 0.98)👍👎
It also recognizes that the development of AI governance frameworks is a dynamic process that requires ongoing evaluation and refinement. (ML: 0.98)👍👎
The study highlights the importance of considering paradoxes when developing AI governance frameworks. (ML: 0.97)👍👎
The paper explores these kinds of complexities in AI governance and why they need to be considered when developing frameworks for responsible AI development and deployment. (ML: 0.97)👍👎
The paper explores the concept of paradox in the context of artificial intelligence (AI) and its governance, highlighting the importance of considering these complexities when developing AI governance frameworks. (ML: 0.97)👍👎
The paper discusses the concept of paradox in management and organization theories, specifically focusing on artificial intelligence (AI) and its governance. (ML: 0.96)👍👎
The paper draws on existing literature in management, organization theory, and AI ethics to inform its discussion of paradoxes in AI governance. (ML: 0.95)👍👎
Imagine you're trying to create a system that can make decisions without being biased, but at the same time, you want it to be transparent so people understand how those decisions are made. (ML: 0.93)👍👎
That's a paradox! (ML: 0.91)👍👎
Paradox: a situation or condition that is contradictory or opposite to what would be expected. (ML: 0.86)👍👎

Abstract
The rapid proliferation of artificial intelligence across organizational contexts has generated profound strategic opportunities while introducing significant ethical and operational risks. Despite growing scholarly attention to responsible AI, extant literature remains fragmented and is often adopting either an optimistic stance emphasizing value creation or an excessively cautious perspective fixated on potential harms. This paper addresses this gap by presenting a comprehensive examination of AI's dual nature through the lens of strategic information systems. Drawing upon a systematic synthesis of the responsible AI literature and grounded in paradox theory, we develop the Paradox-based Responsible AI Governance (PRAIG) framework that articulates: (1) the strategic benefits of AI adoption, (2) the inherent risks and unintended consequences, and (3) governance mechanisms that enable organizations to navigate these tensions. Our framework advances theoretical understanding by conceptualizing responsible AI governance as the dynamic management of paradoxical tensions between value creation and risk mitigation. We provide formal propositions demonstrating that trade-off approaches amplify rather than resolve these tensions, and we develop a taxonomy of paradox management strategies with specified contingency conditions. For practitioners, we offer actionable guidance for developing governance structures that neither stifle innovation nor expose organizations to unacceptable risks. The paper concludes with a research agenda for advancing responsible AI governance scholarship.

Why we are recommending this paper?
Due to your Interest in AI Impacts on Society

A Multiobjective Water Allocation Model for Economic Efficiency and Environmental Sustainability: Case Study

Federation University Australia

Rate paper: 👍 👎 ♥ Save

AI Insights

The multiobjective optimization model provides a framework for balancing competing objectives, enabling decision-makers to make informed choices. (ML: 0.94)👍👎
The results show that maximizing net benefit leads to a substantial compromise in environmental sustainability, while prioritizing environmental sustainability reduces net benefit. (ML: 0.93)👍👎
The study demonstrates the importance of considering both economic and environmental aspects in water distribution optimization. (ML: 0.93)👍👎
A multiobjective optimization model is developed to balance economic profit and environmental sustainability. (ML: 0.92)👍👎
Pareto front: A set of non-dominated solutions representing the optimal balance between competing objectives. (ML: 0.92)👍👎
Multiobjective optimization: A problem-solving approach that handles trade-offs between multiple conflicting objectives. (ML: 0.92)👍👎
The study aims to optimize water distribution for crop production while minimizing environmental impact. (ML: 0.89)👍👎
The Pareto front is approximated using the weighted-constraint method, providing a set of non-dominated solutions representing the trade-off between economic profit and environmental sustainability. (ML: 0.88)👍👎
Weighted-constraint method: A scalarization technique used to transform a multiobjective problem into a standard single-objective problem. (ML: 0.86)👍👎
The weighted-constraint method is used as a scalarization technique to transform the complex multiobjective problem into a standard single-objective problem. (ML: 0.84)👍👎

Abstract
The management of irrigation water systems has become increasingly complex due to competing demands for agricultural production, groundwater sustainability, and environmental flow requirements, particularly under hydrologic variability and climate uncertainty. Addressing these challenges requires optimization frameworks that can jointly determine optimal crop allocation, groundwater pumping, and environmental flow releases while maintaining economic and hydrological feasibility. However, existing hydro-economic models, including the widely used Lewis and Randall formulation, may overestimate net benefits by allowing infeasible negative pumping and surface water allocations. We extend the Lewis and Randall framework by reformulating groundwater pumping and surface water use as non-negative, demand-driven decision variables and by explicitly incorporating environmental flow and canal capacity constraints. Three models are developed to maximize economic benefit, minimize environmental deficits, and a multiobjective model that evaluates the trade-offs between these two objectives. An illustrative test case examining optimal crop area allocation and environmental flow management across dry, average, and wet years, using data from the Rajshahi Barind Tract in northwestern Bangladesh, is presented. The results show that the proposed formulation produces economically and hydrologically consistent solutions, identifying optimal strategies when either net benefits or environmental protection is prioritized, as well as Pareto-optimal trade-offs when both objectives are considered together. These findings provide practical insights for balancing farm income, groundwater sustainability, and ecological protection, offering a robust decision-support tool for irrigation management in water-limited river basins.

Why we are recommending this paper?
Due to your Interest in AI Water Consumption

Normative Equivalence in Human-AI Cooperation: Behaviour, Not Identity, Drives Cooperation in Mixed-Agent Groups

University of Zurich

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

The findings have implications for the integration of AI into human collectives and highlight the need for accountability and transparency in decision-making processes involving AI systems. (ML: 0.98)👍👎
The study provides a baseline for future research examining when normative equivalence might break in richer, more communicative, or adaptive human-AI interactions. (ML: 0.98)👍👎
The study's design focused on short-term, anonymous interactions, which may not reflect real-world human-AI collaboration. (ML: 0.97)👍👎
Normative equivalence: The phenomenon where mechanisms that sustain cooperation operate unchanged when artificial agents are introduced into a group. (ML: 0.97)👍👎
The study found that cooperative norms function similarly in human-AI groups as they do in purely human groups. (ML: 0.97)👍👎
The AI's behavior was scripted rather than adaptive, limiting the ecological validity of the findings. (ML: 0.96)👍👎
Social Heuristics Hypothesis: A theory positing that cooperation is often an intuitive, automated response generalized from daily social life. (ML: 0.96)👍👎
Computers Are Social Actors (CASA) paradigm: A framework focusing on unconscious reactions to anthropomorphic cues. (ML: 0.95)👍👎
Participants did not exploit AI teammates or withhold trust, indicating a lack of algorithm aversion or moral disengagement. (ML: 0.93)👍👎
The stability and generality of cooperative norms in hybrid groups suggest that 'socialness' is an emergent property of the rules and feedback loops governing interactions. (ML: 0.93)👍👎

Abstract
The introduction of artificial intelligence (AI) agents into human group settings raises essential questions about how these novel participants influence cooperative social norms. While previous studies on human-AI cooperation have primarily focused on dyadic interactions, little is known about how integrating AI agents affects the emergence and maintenance of cooperative norms in small groups. This study addresses this gap through an online experiment using a repeated four-player Public Goods Game (PGG). Each group consisted of three human participants and one bot, which was framed either as human or AI and followed one of three predefined decision strategies: unconditional cooperation, conditional cooperation, or free-riding. In our sample of 236 participants, we found that reciprocal group dynamics and behavioural inertia primarily drove cooperation. These normative mechanisms operated identically across conditions, resulting in cooperation levels that did not differ significantly between human and AI labels. Furthermore, we found no evidence of differences in norm persistence in a follow-up Prisoner's Dilemma, or in participants' normative perceptions. Participants' behaviour followed the same normative logic across human and AI conditions, indicating that cooperation depended on group behaviour rather than partner identity. This supports a pattern of normative equivalence, in which the mechanisms that sustain cooperation function similarly in mixed human-AI and all human groups. These findings suggest that cooperative norms are flexible enough to extend to artificial agents, blurring the boundary between humans and AI in collective decision-making.

Why we are recommending this paper?
Due to your Interest in AI for Social Equality

An Accounting Identity for Algorithmic Fairness

Stanford University

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

The paper presents a framework for evaluating and improving the fairness of binary classification models. (ML: 0.98)👍👎
They demonstrate the effectiveness of their approach on several real-world datasets, including COMPAS, German Credit, and Adult. (ML: 0.98)👍👎
It introduces three new metrics: pointwise balance, global calibration, and signed covariance, which are used to evaluate the fairness of a model. (ML: 0.98)👍👎
The authors also provide a detailed overview of experimental results, which include comparisons with existing fair machine learning methods. (ML: 0.97)👍👎
Their results show that the proposed framework can significantly improve the fairness of binary classification models while maintaining or improving their accuracy. (ML: 0.96)👍👎
The authors also propose a new algorithm for optimizing these metrics, called FairBoost. (ML: 0.94)👍👎
FairBoost: an algorithm for optimizing pointwise balance, global calibration, and signed covariance. (ML: 0.79)👍👎
Signed covariance: Cov(π(Y), r(Y)) ≥ 0 if π and r are monotone in the same direction. (ML: 0.73)👍👎
Global calibration: δC(z) = E[Y|Z=z, G=1]−E[Y|Z=z, G=0]. (ML: 0.72)👍👎
Pointwise balance: δB(y) = E[Z|Y=y, G=1]−E[Z|Y=y, G=0]. (ML: 0.64)👍👎

Abstract
We derive an accounting identity for predictive models that links accuracy with common fairness criteria. The identity shows that for globally calibrated models, the weighted sums of miscalibration within groups and error imbalance across groups is equal to a "total unfairness budget." For binary outcomes, this budget is the model's mean-squared error times the difference in group prevalence across outcome classes. The identity nests standard impossibility results as special cases, while also describing inherent tradeoffs when one or more fairness measures are not perfectly satisfied. The results suggest that accuracy and fairness are best viewed as complements in binary prediction tasks: increasing accuracy necessarily shrinks the total unfairness budget and vice-versa. Experiments on benchmark data confirm the theory and show that many fairness interventions largely substitute between fairness violations, and when they reduce accuracy they tend to expand the total unfairness budget. The results extend naturally to prediction tasks with non-binary outcomes, illustrating how additional outcome information can relax fairness incompatibilities and identifying conditions under which the binary-style impossibility does and does not extend to regression tasks.

Why we are recommending this paper?
Due to your Interest in AI for Social Fairness

Organizational Practices and Socio-Technical Design of Human-Centered AI

University of Bochum

Rate paper: 👍 👎 ♥ Save

AI Insights

The concept of human-centered AI may be challenging to implement in organizations with rigid structures or cultures. (ML: 0.98)👍👎
Collaboration between experts, 7. (ML: 0.98)👍👎
Human-centered AI may not be suitable for all types of tasks or industries, requiring careful consideration and evaluation. (ML: 0.98)👍👎
Continuous learning and improvement, 2. (ML: 0.98)👍👎
Monitoring and evaluation, and 10. (ML: 0.97)👍👎
Clear roles and responsibilities, 5. (ML: 0.97)👍👎
Effective communication, 6. (ML: 0.96)👍👎
The concept of keeping the organization in the loop is crucial for the successful implementation of human-centered AI. (ML: 0.96)👍👎
Interacting organizational practices require significant resources and effort to establish and maintain. (ML: 0.96)👍👎
Ten types of interacting organizational practices are identified as essential to accompany human-centered AI: 1. (ML: 0.96)👍👎
Case B substantiates this concept by highlighting the collaboration between technical and analytical experts, anchored in systematic communication structures. (ML: 0.96)👍👎
Interacting organizational practices are essential to accompany human-centered AI and ensure its effectiveness and adaptability. (ML: 0.95)👍👎
Adaptation processes, 8. (ML: 0.95)👍👎
Human-centered AI: An approach to keeping the human in the loop by emphasizing the importance of interacting organizational practices. (ML: 0.95)👍👎
The concept of keeping the organization in the loop is developed based on case A, which emphasizes the importance of human-centered AI and the need for interacting organizational practices. (ML: 0.95)👍👎
Interacting organizational practices: The essential types of practices that need to accompany human-centered AI, including continuous learning and improvement, feedback mechanisms, regular meetings and updates, clear roles and responsibilities, effective communication, collaboration between experts, adaptation processes, documentation and knowledge management, monitoring and evaluation, and continuous refinement of the AI system. (ML: 0.94)👍👎
Feedback mechanisms, 3. (ML: 0.93)👍👎
Continuous refinement of the AI system. (ML: 0.91)👍👎
Documentation and knowledge management, 9. (ML: 0.89)👍👎
Regular meetings and updates, 4. (ML: 0.89)👍👎

Abstract
This contribution explores how the integration of Artificial Intelligence (AI) into organizational practices can be effectively framed through a socio-technical perspective to comply with the requirements of Human-centered AI (HCAI). Instead of viewing AI merely as a technical tool, the analysis emphasizes the importance of embedding AI into communication, collaboration, and decision-making processes within organizations from a human-centered perspective. Ten case-based patterns illustrate how AI support of predictive maintenance can be organized to address quality assurance and continuous improvement and to provide different types of sup-port for HCAI. The analysis shows that AI adoption often requires and enables new forms of organizational learning, where specialists jointly interpret AI output, adapt workflows, and refine rules for system improve-ment. Different dimensions and levels of socio-technical integration of AI are considered to reflect the effort and benefits of keeping the organization in the loop.

Why we are recommending this paper?
Due to your Interest in AI for Society

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios

xbenchai

Rate paper: 👍 👎 ♥ Save

AI Insights

The results highlight the need for further research in instruction-following tasks and the development of more effective language models. (ML: 0.98)👍👎
The authors acknowledge that the proposed benchmark may not capture all aspects of real-world instruction-following tasks. (ML: 0.98)👍👎
The synthetic data generation approach relies on human-made questions and may not accurately reflect real-world scenarios. (ML: 0.96)👍👎
The paper proposes a new benchmark for evaluating the ability of language models to follow complex instructions and perform tasks that require multiple steps. (ML: 0.96)👍👎
The proposed benchmark provides a more comprehensive evaluation of language models' ability to follow instructions and perform tasks. (ML: 0.96)👍👎
Synthetic data generation: The process of creating artificial data that mimics real-world scenarios, used to train and evaluate language models. (ML: 0.96)👍👎
The results show that state-of-the-art language models struggle to perform well on this benchmark, highlighting the need for more research in this area. (ML: 0.95)👍👎
The authors introduce a novel approach to generating synthetic data for instruction-following tasks, which allows them to create a large-scale dataset with diverse and realistic scenarios. (ML: 0.94)👍👎
Instruction-following task: A task that requires a model to follow a set of instructions to complete a specific goal or achieve a certain outcome. (ML: 0.93)👍👎
The synthetic data generation approach allows for the creation of diverse and realistic scenarios, making it easier to train and evaluate language models. (ML: 0.93)👍👎

Abstract
The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow, demonstrating exceptional performance in coding, deep research, and complex problem-solving evaluations. However, in daily scenarios, the perception of these advanced AI capabilities among general users remains limited. We argue that current evaluations prioritize increasing task difficulty without sufficiently addressing the diversity of agentic tasks necessary to cover the daily work, life, and learning activities of a broad demographic. To address this, we propose AgentIF-OneDay, aimed at determining whether general users can utilize natural language instructions and AI agents to complete a diverse array of daily tasks. These tasks require not only solving problems through dialogue but also understanding various attachment types and delivering tangible file-based results. The benchmark is structured around three user-centric categories: Open Workflow Execution, which assesses adherence to explicit and complex workflows; Latent Instruction, which requires agents to infer implicit instructions from attachments; and Iterative Refinement, which involves modifying or expanding upon ongoing work. We employ instance-level rubrics and a refined evaluation pipeline that aligns LLM-based verification with human judgment, achieving an 80.1% agreement rate using Gemini-3-Pro. AgentIF-OneDay comprises 104 tasks covering 767 scoring points. We benchmarked four leading general AI agents and found that agent products built based on APIs and ChatGPT agents based on agent RL remain in the first tier simultaneously. Leading LLM APIs and open-source models have internalized agentic capabilities, enabling AI application teams to develop cutting-edge Agent products.

Why we are recommending this paper?
Due to your Interest in AI on Air

Learning to Live with AI: How Students Develop AI Literacy Through Naturalistic ChatGPT Interaction

Rate paper: 👍 👎 ♥ Save

AI Insights

The study relies on a single dataset and may not be generalizable to other contexts or populations. (ML: 0.99)👍👎
Repair literacy is a critical component of AI literacy, involving strategic cost-benefit analysis regarding continued engagement. (ML: 0.98)👍👎
AI Literacy: The ability to effectively use, understand, and critically evaluate artificial intelligence systems. (ML: 0.98)👍👎
Repair Literacy: The ability to diagnose and recover from AI breakdowns through technical skills (prompt modification, context provision) and emotional resilience. (ML: 0.97)👍👎
The study highlights the importance of considering naturalistic use in developing AI literacy frameworks and pedagogies. (ML: 0.97)👍👎
Students' use of ChatGPT reveals a range of sophisticated competencies in AI literacy, including relational, emotional, and epistemic dimensions. (ML: 0.97)👍👎
Existing AI literacy frameworks emphasize technical competencies, but this study reveals additional dimensions such as emotional regulation and repair literacy. (ML: 0.97)👍👎
Students develop sophisticated competencies in AI literacy through naturalistic use. (ML: 0.97)👍👎
The repair and negotiation genre is a key finding, showing students' ability to diagnose and recover from AI breakdowns through technical skills and emotional resilience. (ML: 0.95)👍👎
Epistemic Vigilance: The ability to recognize and respond to AI limitations through trust calibration and strategic disengagement. (ML: 0.94)👍👎

Abstract
How do students develop AI literacy through everyday practice rather than formal instruction? While normative AI literacy frameworks proliferate, empirical understanding of how students actually learn to work with generative AI remains limited. This study analyzes 10,536 ChatGPT messages from 36 undergraduates over one academic year, revealing five use genres -- academic workhorse, emotional companion, metacognitive partner, repair and negotiation, and trust calibration -- that constitute distinct configurations of student-AI learning. Drawing on domestication theory and emerging frameworks for AI literacy, we demonstrate that functional AI competence emerges through ongoing relational negotiation rather than one-time adoption. Students develop sophisticated genre portfolios, strategically matching interaction patterns to learning needs while exercising critical judgment about AI limitations. Notably, repair work during AI breakdowns produces substantial learning about AI capabilities, developing what we term "repair literacy" -- a crucial but underexplored dimension of AI competence. Our findings offer educators empirically grounded insights into how students actually learn to work with generative AI, with implications for AI literacy pedagogy, responsible AI integration, and the design of AI-enabled learning environments that support student agency.

Why we are recommending this paper?
Due to your Interest in AI on Education

Interests not found

We did not find any papers that match the below interests. Try other terms also consider if the content exists in arxiv.org.

AI for Social Equity
AI on Food
AI on Healthcare

You can edit or add more interests any time.

💬 Help Shape Our Pricing

We're exploring pricing options to make this project sustainable. Take 3 minutes to share what you'd be willing to pay (if anything). Your input guides our future investment.

Share Your Feedback

Help us improve your experience!

This project is on its early stages your feedback can be pivotal on the future of the project. Let us know what you think about this week's papers and suggestions!

Give Feedback