Hi!

Your personalized paper recommendations for 12 to 16 January 2026.
University of Cambridge
AI Insights
  • They use the MOBO (Multi-Objective Bayesian Optimization) algorithm to search for optimal hyperparameters. [3]
  • Glossary: MOBO (Multi-Objective Bayesian Optimization); CNN (Convolutional Neural Network); CIFAR-10 (an image-classification benchmark dataset); SOTA (State-of-the-Art); MAC (Multiply-Accumulate operation). [3]
  • The authors present an approach to optimizing machine learning models for both performance and energy efficiency. [2]
  • The energy consumption of the optimized model is only 0.39 mJ, making it more energy-efficient than the state-of-the-art Spike Aggregation Transformer (SAFormer). [1]
Abstract
The ubiquity of machine learning (ML) and the demand for ever-larger models bring an increase in energy consumption and environmental impact. However, little is known about the energy scaling laws in ML, and existing research focuses on training cost -- ignoring the larger cost of inference. Furthermore, tools for measuring the energy consumption of ML do not provide actionable feedback. To address these gaps, we developed Energy Consumption Optimiser (ECOpt): a hyperparameter tuner that optimises for energy efficiency and model performance. ECOpt quantifies the trade-off between these metrics as an interpretable Pareto frontier. This enables ML practitioners to make informed decisions about energy cost and environmental impact, while maximising the benefit of their models and complying with new regulations. Using ECOpt, we show that parameter and floating-point operation counts can be unreliable proxies for energy consumption, and observe that the energy efficiency of Transformer models for text generation is relatively consistent across hardware. These findings motivate measuring and publishing the energy metrics of ML models. We further show that ECOpt can have a net positive environmental impact and use it to uncover seven models for CIFAR-10 that improve upon the state of the art, when considering accuracy and energy efficiency together.
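To picture the Pareto frontier ECOpt produces, here is a minimal sketch that extracts the non-dominated set from a handful of (accuracy, energy) evaluations; the candidate models and their numbers are invented for illustration and are not ECOpt's API or results:

```python
# Sketch: extracting a Pareto frontier over (accuracy, energy) trade-offs.
# The candidates below are invented; nothing here reflects ECOpt's real output.

candidates = [
    {"name": "cnn-a", "accuracy": 0.91, "energy_mj": 0.80},
    {"name": "cnn-b", "accuracy": 0.93, "energy_mj": 1.40},
    {"name": "cnn-c", "accuracy": 0.89, "energy_mj": 0.39},
    {"name": "cnn-d", "accuracy": 0.90, "energy_mj": 0.95},  # dominated by cnn-a
]

def dominates(p, q):
    """p dominates q if p is no worse on both objectives (higher accuracy,
    lower energy) and strictly better on at least one."""
    return (p["accuracy"] >= q["accuracy"] and p["energy_mj"] <= q["energy_mj"]
            and (p["accuracy"] > q["accuracy"] or p["energy_mj"] < q["energy_mj"]))

# Keep only candidates that no other candidate dominates.
pareto = [c for c in candidates
          if not any(dominates(other, c) for other in candidates)]

for m in sorted(pareto, key=lambda c: c["energy_mj"]):
    print(f'{m["name"]}: accuracy={m["accuracy"]:.2f}, energy={m["energy_mj"]:.2f} mJ')
```

Every model off this frontier is beaten on both axes by some alternative, which is what makes the frontier an interpretable decision aid.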
Why we recommend this paper
Due to your interest in AI Energy Consumption

This paper directly addresses the critical issue of energy consumption in machine learning, aligning with your interest in AI on Energy and AI Energy Consumption. The focus on inference costs is particularly relevant given the growing environmental impact of large models.
Virginia Tech
AI Insights
  • The paper discusses the challenges of dataset licensing and attribution in AI research, highlighting the need for more transparent and equitable practices. [3]
  • Attribution: The act of acknowledging the source of a dataset or model used in AI research. [3]
  • The paper assumes that all datasets are available for use, which may not be the case in practice. [3]
  • The authors propose a framework for optimal data selection from multiple sources, which can improve performance scaling and reduce computational costs. [2]
Abstract
We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each state in the data cycle from inputs to model weights to synthetic outputs refines technical signal but strips economic equity from data generators. We show, by analyzing seventy-three public data deals, that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults - missing provenance, asymmetric bargaining power, and non-dynamic pricing - as the operational machinery of this inequality. In our analysis, we trace these problems along the machine learning value chain and propose an Equitable Data-Value Exchange (EDVEX) Framework to enable a minimal market that benefits all participants. Finally, we outline research directions where our community can make concrete contributions to data deals and contextualize our position with related and orthogonal viewpoints.
Why we recommend this paper
Due to your interest in AI Air Consumption

This research tackles the sustainability of AI development, directly connecting data processing inequalities to broader societal concerns like AI for Social Fairness and AI on Air. It’s a crucial consideration for responsible AI deployment.
International Water Management Institute IWMI
AI Insights
  • The evaluation process involves passing 30 questions through the chatbot, recording the retrieved resources used to formulate responses, and documenting generated answers for further analysis. [3]
  • RAGAS (Retrieval-Augmented Generation Assessment) framework: A structured approach for evaluating the performance of a system that retrieves and generates information. [3]
  • Context Precision: Measures the proportion of retrieved context that is relevant to the given query. [3]
  • Context Recall: Evaluates how effectively the retrieved context encompasses the relevant information required to address a query. [3]
  • The Faithfulness metric measures the factual consistency of generated answers with retrieved context, with a mean score of 0.7877 indicating that most responses were consistent with the context provided. [3]
  • The WaterCopilot chatbot is designed to provide users with precise and contextually relevant information in the field of water management. [2]
Abstract
Sustainable water resource management in transboundary river basins is challenged by fragmented data, limited real-time access, and the complexity of integrating diverse information sources. This paper presents WaterCopilot, an AI-driven virtual assistant developed through collaboration between the International Water Management Institute (IWMI) and Microsoft Research for the Limpopo River Basin (LRB), to bridge these gaps through a unified, interactive platform. Built on Retrieval-Augmented Generation (RAG) and tool-calling architectures, WaterCopilot integrates static policy documents and real-time hydrological data via two custom plugins: the iwmi-doc-plugin, which enables semantic search over indexed documents using Azure AI Search, and the iwmi-api-plugin, which queries live databases to deliver dynamic insights such as environmental-flow alerts, rainfall trends, reservoir levels, water accounting, and irrigation data. The system features guided multilingual interactions (English, Portuguese, French), transparent source referencing, automated calculations, and visualization capabilities. Evaluated using the RAGAS framework, WaterCopilot achieves an overall score of 0.8043, with high answer relevancy (0.8571) and context precision (0.8009). Key innovations include automated threshold-based alerts, integration with the LRB Digital Twin, and a scalable deployment pipeline hosted on AWS. While limitations in processing non-English technical documents and API latency remain, WaterCopilot establishes a replicable AI-augmented framework for enhancing water governance in data-scarce, transboundary contexts. The study demonstrates the potential of this AI assistant to support informed, timely decision-making and strengthen water security in complex river basins.
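The RAGAS scores quoted above are ratio-style metrics over retrieved and reference context. As a rough sketch under simplifying assumptions (exact-match chunk sets instead of the LLM-based judgments RAGAS actually uses; the chunk IDs are hypothetical), Context Precision and Context Recall look like this:

```python
# Simplified sketch of Context Precision and Context Recall. Real RAGAS scores
# these with LLM judgments; exact-match chunk sets are an illustrative stand-in.

def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are relevant to the query."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def context_recall(retrieved, relevant):
    """Fraction of the relevant chunks that retrieval actually surfaced."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

retrieved = {"doc-12#p3", "doc-12#p4", "doc-31#p1", "doc-07#p9"}  # hypothetical IDs
relevant = {"doc-12#p3", "doc-12#p4", "doc-44#p2"}

print("context precision:", context_precision(retrieved, relevant))  # 2/4 = 0.50
print("context recall:   ", context_recall(retrieved, relevant))     # 2/3 ≈ 0.67
```

Faithfulness is analogous but compares the claims in the generated answer against the retrieved context rather than comparing chunk sets.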
Why we recommend this paper
Due to your interest in AI Water Consumption

Given your interest in AI on Water and AI on Water Consumption, this paper’s focus on an AI-driven solution for sustainable water resource management is highly relevant. The use of a virtual assistant to address fragmented data is a promising approach.
New York Law School
AI Insights
  • AI's impact on society is a complex issue that requires careful consideration. [3]
  • The concept of algorithmic impact assessments (AIAs) has been proposed as a way to evaluate the potential effects of AI on society. [3]
  • Data protection impact assessments (DPIAs) have been implemented in some jurisdictions to evaluate the potential effects of data processing on individuals and society. [3]
  • The use of AI in decision-making processes can lead to biases and discrimination, which can have serious consequences for individuals and communities. [2]
Abstract
This essay examines how judicial review should adapt to address challenges posed by artificial intelligence decision-making, particularly regarding minority rights and interests. As I argue in this essay, the rise of three trends - privatization, prediction, and automation in AI - has combined to pose similar risks to minorities. Here, I outline what a theory of judicial review would look like in an era of artificial intelligence, analyzing both the limitations and the possibilities of judicial review of AI. I draw on cases in which AI decision-making has been challenged in courts, to show how concepts of due process and equal protection can be recuperated in a modern AI era, and even integrated into AI, to provide for better oversight and accountability, offering a framework for judicial review in the AI era that protects minorities from algorithmic discrimination.
Why we recommend this paper
Due to your interest in AI Impacts on Society

This paper explores the societal implications of AI, particularly concerning fairness and trust, aligning with your broader interests in AI for Social Equity and AI for Social Good. The discussion of judicial review and minority rights is particularly pertinent.
The George Washington University
AI Insights
  • The conversational exam requires students to demonstrate their knowledge and skills through real-time, unscripted reasoning and defense of their logic, making it difficult for them to outsource thinking to AI. [3]
  • The conversational exam has been shown to promote deep mastery via adaptive probing, ensuring resilient integrity in the age of AI, and scalability through organizational creativity and HCI rigor. [3]
  • Generative AI: Artificial intelligence tools that can generate human-like text, images, or other forms of content. [3]
  • The format acknowledges the ubiquity and increasing power of AI tools while refusing to capitulate to the notion that authentic evaluation is impossible in their presence. [2]
  • The conversational exam is a format that emerged as a response to the challenges posed by generative AI in traditional assessment methods. [1]
Abstract
Traditional assessment methods collapse when students use generative AI to complete work without genuine engagement, creating an illusion of competence where they believe they're learning but aren't. This paper presents the conversational exam -- a scalable oral examination format that restores assessment validity by having students code live while explaining their reasoning. Drawing on human-computer interaction principles, we examined 58 students in small groups across just two days, demonstrating that oral exams can scale to typical class sizes. The format combines authentic practice (students work with documentation and supervised AI access) with inherent validity (real-time performance cannot be faked). We provide detailed implementation guidance to help instructors adapt this approach, offering a practical path forward when many educators feel paralyzed between banning AI entirely or accepting that valid assessment is impossible.
Why we recommend this paper
Due to your interest in AI on Education

This paper addresses the challenges of assessment in an AI-driven world, directly relating to your interest in AI on Education and the need for genuine learning engagement. The conversational exam format offers a potential solution to the problem of AI-generated illusions of competence.
The Alan Turing Institute
AI Insights
  • The authors contend that entertainment is a significant use case for AI, with people already using AI for activities unrelated to productivity. [3]
  • The paper suggests that this vision should inspire more debates, discourse, and study in the field of AI, as generative AI is increasingly being used for entertainment. [3]
  • Glossary: AS (artificially generated content); GenAI (Generative AI); sociotechnical systems (complex systems that combine social and technical components). [3]
  • The paper concludes by emphasizing the need for a constructive vision of cultural AI, rather than just harm minimization. [3]
  • The paper argues that mainstream approaches to evaluating AI systems tend to focus on intelligence and harm minimization, but neglect the cultural dimension of AI use. [2]
  • They propose developing a positive theory of what beneficial, nutritious entertainment might look like, rather than just mitigating harms. [0]
Abstract
Generative AI systems are predominantly designed, evaluated, and marketed as intelligent systems which will benefit society by augmenting or automating human cognitive labor, promising to increase personal, corporate, and macroeconomic productivity. But this mainstream narrative about what AI is and what it can do is in tension with another emerging use case: entertainment. We argue that the field of AI is unprepared to measure or respond to how the proliferation of entertaining AI-generated content will impact society. Emerging data suggest AI is already widely adopted for entertainment purposes -- especially by young people -- and represents a large potential source of revenue. We contend that entertainment will become a primary business model for major AI corporations seeking returns on massive infrastructure investments; this will exert a powerful influence on the technology these companies produce in the coming years. Examining current evaluation practices, we identify a critical asymmetry: while AI assessments rigorously measure both benefits and harms of intelligence, they focus almost exclusively on cultural harms. We lack frameworks for articulating how cultural outputs might be actively beneficial. Drawing on insights from the humanities, we propose "thick entertainment" as a framework for evaluating AI-generated cultural content -- one that considers entertainment's role in meaning-making, identity formation, and social connection rather than simply minimizing harm. While AI is often touted for its potential to revolutionize productivity, in the long run we may find that AI turns out to be as much about "intelligence" as social media is about social connection.
Why we recommend this paper
Due to your interest in AI Air Consumption
University of Groningen
AI Insights
  • Lipschitz constant (L_A): a measure of how much the function g(u, x, a) changes with respect to A. [3]
  • The paper introduces a new fairness notion called g-fairness, which measures the difference in expected outcomes between two groups. [2]
  • It shows that if a mechanism is differentially private with respect to A, then the g-fairness metric is bounded by ε_A + log(1 + L_A · diam(A) + δ_A · γ / τ), where L_A is the Lipschitz constant with respect to A (see the sketch below). [1]
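A tiny numeric sketch of that bound, with every parameter value invented for illustration (the expression is transcribed from the insight above, not verified against the paper):

```python
import math

# Sketch: evaluating the bound eps_A + log(1 + L_A*diam(A) + delta_A*gamma/tau)
# from the insight above. All parameter values are illustrative assumptions.

def fairness_bound(eps_a, l_a, diam_a, delta_a, gamma, tau):
    """Upper bound on the g-fairness gap under an (eps_A, delta_A)-style privacy budget."""
    return eps_a + math.log(1 + l_a * diam_a + delta_a * gamma / tau)

# A tighter privacy budget (smaller eps_A) yields a smaller fairness bound.
for eps_a in (0.1, 0.5, 1.0):
    bound = fairness_bound(eps_a, l_a=0.8, diam_a=2.0, delta_a=1e-5, gamma=1.0, tau=0.1)
    print(f"eps_A={eps_a:.1f} -> bound={bound:.3f}")
```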
Abstract
Complex decision-making by autonomous machines and algorithms could underpin the foundations of future society. Generative AI is emerging as a powerful engine for such transitions. However, we show that Generative AI-driven developments pose a critical pitfall: fairness concerns. In robotic applications, although intuitions about fairness are common, a precise and implementable definition that captures user utility and inherent data randomness is missing. Here we provide a utility-aware fairness metric for robotic decision making and analyze fairness jointly with user-data privacy, deriving conditions under which privacy budgets govern fairness metrics. This yields a unified framework that formalizes and quantifies fairness and its interplay with privacy, which is tested in a robot navigation task. In view of the fact that under legal requirements, most robotic systems will enforce user privacy, the approach shows surprisingly that such privacy budgets can be jointly used to meet fairness targets. Addressing fairness concerns in the creative combined consideration of privacy is a step towards ethical use of AI and strengthens trust in autonomous robots deployed in everyday environments.
Why we recommend this paper
Due to your interest in AI for Social Fairness
Universidad Carlos III de Madrid
AI Insights
  • k-Nearest Neighbor (kNN) graph: A graph where two nodes are connected if they are among the k closest observations to each other according to a given metric. [3]
  • The text assumes a linear model is applied to the signal, which may not always be the case. [3]
  • The text draws inspiration from Opinion Dynamics, a concept in social network analysis. [3]
  • Signal aggregation and its implications for dataset analysis: imagine you're trying to understand a dataset with many points. [3]
  • The text discusses various topologies and their implications on signal aggregation in a dataset. [2]
Abstract
Machine Learning algorithms are ubiquitous in key decision-making contexts such as justice, healthcare and finance, which has spawned a great demand for fairness in these procedures. However, the theoretical properties of such models in relation to fairness are still poorly understood, and the intuition behind the relationship between group and individual fairness is still lacking. In this paper, we provide a theoretical framework based on Sheaf Diffusion to leverage tools based on dynamical systems and homology to model fairness. Concretely, the proposed method projects input data into a bias-free space that encodes fairness constraints, resulting in fair solutions. Furthermore, we present a collection of network topologies handling different fairness metrics, leading to a unified method capable of dealing with both individual and group bias. The resulting models have a layer of interpretability in the form of closed-form expressions for their SHAP values, consolidating their place in the responsible Artificial Intelligence landscape. Finally, these intuitions are tested on a simulation study and standard fairness benchmarks, where the proposed methods achieve satisfactory results. More concretely, the paper showcases the performance of the proposed models in terms of accuracy and fairness, studying available trade-offs on the Pareto frontier, checking the effects of changing the different hyper-parameters, and delving into the interpretation of its outputs.
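The kNN graph and Opinion Dynamics ideas in the insights above can be pictured with plain graph diffusion. The sketch below is not the paper's Sheaf Diffusion (which adds sheaf structure and fairness constraints); it only shows the underlying averaging dynamic on a kNN graph, with synthetic data:

```python
import numpy as np

# Sketch: diffusing a scalar signal over a kNN graph by repeated neighbour
# averaging, the Opinion-Dynamics-style ingredient the paper builds on.
# Synthetic data; this is not the paper's sheaf-based, bias-free projection.

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))      # 50 observations, 4 features
signal = rng.normal(size=50)      # a scalar signal on each node

k = 5
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)               # a node is not its own neighbour
neighbors = np.argsort(dists, axis=1)[:, :k]  # indices of k nearest neighbours

for _ in range(100):
    # Each node moves halfway toward the mean of its neighbours' values.
    signal = 0.5 * signal + 0.5 * signal[neighbors].mean(axis=1)

print("signal spread after diffusion:", float(signal.max() - signal.min()))
```

Repeated averaging contracts the signal toward local consensus; the graph topology decides which nodes end up agreeing, which is how different network topologies can encode different fairness metrics.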
Why we recommend this paper
Due to your interest in AI for Social Fairness
Indiana University Bloomington
AI Insights
  • Civitai is a platform that allows users to create and share generative AI content, including images and videos. [3]
  • Bounties on Civitai are challenges that offer rewards for completing specific tasks related to generating AI content. [3]
  • The platform has a large user base, with millions of users, and has received funding from prominent investors such as Andreessen Horowitz. [2]
  • Deepfake: A type of synthetic media that uses AI to create fake images or videos of people, often in a realistic way. [0]
Abstract
Generative AI systems increasingly enable the production of highly realistic synthetic media. Civitai, a popular community-driven platform for AI-generated content, operates a monetized feature called Bounties, which allows users to commission the generation of content in exchange for payment. To examine how this mechanism is used and what content it incentivizes, we conduct a longitudinal analysis of all publicly available bounty requests collected over a 14-month period following the platform's launch. We find that the bounty marketplace is dominated by tools that let users steer AI models toward content they were not trained to generate. At the same time, requests for content that is "Not Safe For Work" are widespread and have increased steadily over time, now comprising a majority of all bounties. Participation in bounty creation is uneven, with 20% of requesters accounting for roughly half of requests. Requests for "deepfake" - media depicting identifiable real individuals - exhibit a higher concentration than other types of bounties. A nontrivial subset of these requests involves explicit deepfakes despite platform policies prohibiting such content. These bounties disproportionately target female celebrities, revealing a pronounced gender asymmetry in social harm. Together, these findings show how monetized, community-driven generative AI platforms can produce gendered harms, raising questions about consent, governance, and enforcement.
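The concentration finding (20% of requesters posting roughly half of all bounties) is a top-share computation. A minimal sketch using synthetic heavy-tailed counts in place of the paper's per-requester data:

```python
import numpy as np

# Sketch: share of bounties posted by the most active 20% of requesters.
# The Zipf-distributed counts below are synthetic stand-ins for real data.

rng = np.random.default_rng(1)
bounties_per_user = rng.zipf(a=2.0, size=1_000)  # heavy-tailed activity

counts = np.sort(bounties_per_user)[::-1]        # most active users first
top_n = int(0.2 * len(counts))
top_share = counts[:top_n].sum() / counts.sum()
print(f"top 20% of requesters account for {top_share:.0%} of bounties")
```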
Why we recommend this paper
Due to your interest in AI for Social Good
OpenRouter Inc
AI Insights
  • Programming has become the most consistently expanding category across all models, reflecting a shift from exploratory or conversational use toward applied tasks such as code generation, debugging, and data scripting. [3]
  • Anthropic's Claude series dominates the programming category, but OpenAI and Google are gaining share, and MiniMax is emerging as a fast-rising entrant. [3]
  • Completion token length: The number of tokens in the output response from an LLM, which has almost tripled since early 2024. [3]
  • Programming has become one of the most contested and strategically important model categories, attracting sustained attention from top labs. [3]
  • LLMs are being used as analytical engines rather than creative generators. [2]
  • Agentic inference is becoming the new default in LLM usage, with models acting as agents that invoke external tools and reason over state. [1]
Abstract
The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5th, 2024, the field shifted from single-pass pattern generation to multi-step deliberation inference, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models have actually been used in practice has lagged behind. In this work, we leverage the OpenRouter platform, which is an AI inference provider across a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of creative roleplay (beyond just the productivity tasks many assume dominate) and coding assistance categories, plus the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than later cohorts. We term this phenomenon the Cinderella "Glass Slipper" effect. These findings underscore that the way developers and end-users engage with LLMs "in the wild" is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems.
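The "Glass Slipper" retention analysis reduces to grouping users by first-seen month and tracking how long each cohort stays active. A minimal pandas sketch; the log schema and sample records are hypothetical stand-ins for OpenRouter's internal data:

```python
import pandas as pd

# Sketch: cohort retention from usage logs. The (user_id, month) schema and
# the sample rows are hypothetical, not OpenRouter's real data format.

logs = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "c", "c", "c", "c", "d"],
    "month":   ["2024-01", "2024-02", "2024-04", "2024-01", "2024-02",
                "2024-03", "2024-04", "2024-05", "2024-06", "2024-03"],
})

# Each user's cohort is the month they were first seen.
first_seen = logs.groupby("user_id")["month"].min().rename("cohort")
logs = logs.join(first_seen, on="user_id")

# Months elapsed between an activity record and its cohort month.
to_idx = lambda m: int(m[:4]) * 12 + int(m[5:])
logs["age"] = logs["month"].map(to_idx) - logs["cohort"].map(to_idx)

# Share of each cohort still active n months after first use.
cohort_size = logs.groupby("cohort")["user_id"].nunique()
active = logs.groupby(["cohort", "age"])["user_id"].nunique()
retention = active.div(cohort_size, level="cohort").unstack(fill_value=0.0)
print(retention)  # rows: cohorts; columns: months since first use
```

A "Glass Slipper" effect would show up as early cohort rows decaying noticeably more slowly than later ones.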
Why we recommend this paper
Due to your interest in AI on Air
Interactlyai
AI Insights
  • The approach helps AI move beyond technical promise toward everyday reliability, making it usable and safe for healthcare professionals. [3]
  • The paper discusses the challenges in bringing AI into routine clinical practice, including data quality, shifting regulations, and messy clinical workflows. [2]
  • Phase-level evaluation, as formalized in OIP-SCE, addresses these challenges by structuring compliance as something clinicians can author and engineers can automate. [1]
Abstract
Conversational AI is starting to support real clinical work, but most evaluation methods miss how compliance depends on the full course of a conversation. We introduce Obligatory-Information Phase Structured Compliance Evaluation (OIP-SCE), an evaluation method that checks whether every required clinical obligation is met, in the right order, with clear evidence for clinicians to review. This makes complex rules practical and auditable, helping close the gap between technical progress and what healthcare actually needs. We demonstrate the method in two case studies (respiratory history, benefits verification) and show how phase-level evidence turns policy into shared, actionable steps. By giving clinicians control over what to check and engineers a clear specification to implement, OIP-SCE provides a single, auditable evaluation surface that aligns AI capability with clinical workflow and supports routine, safe use.
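Phase-structured compliance checking can be sketched as verifying that every required obligation appears in the conversation, in the mandated order, with evidence attached. The obligation names and transcript format below are invented for illustration and are not the OIP-SCE specification:

```python
# Sketch: ordered, evidence-backed obligation checking over a transcript.
# Obligations and turns below are invented; this is not the OIP-SCE spec.

REQUIRED = ["verify_identity", "ask_smoking_history", "ask_medication_allergies"]

transcript = [
    {"turn": 1, "obligation": "verify_identity", "evidence": "Can you confirm your date of birth?"},
    {"turn": 3, "obligation": "ask_smoking_history", "evidence": "Do you currently smoke?"},
    {"turn": 4, "obligation": None, "evidence": None},  # small talk, no obligation
]

def evaluate(transcript, required):
    """Check that every obligation is present, evidenced, and in order."""
    seen = sorted((t["turn"], t["obligation"]) for t in transcript
                  if t["obligation"] in required and t["evidence"])
    order = [ob for _, ob in seen]
    missing = [ob for ob in required if ob not in order]
    in_order = order == [ob for ob in required if ob in order]
    return not missing and in_order, {"missing": missing, "order_ok": in_order}

passed, report = evaluate(transcript, REQUIRED)
print(passed, report)  # False: ask_medication_allergies was never satisfied
```

Each failed check points at a concrete phase and its missing evidence, which is what makes the result auditable by clinicians rather than a single opaque score.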
Why we recommend this paper
Due to your interest in AI on Healthcare

Interests not found

We did not find any papers matching the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • AI for Social Equality
  • AI for Social Equity
  • AI for Social Justice
  • AI for Society
  • AI on Energy
  • AI on Food
You can edit or add more interests any time.