Hi!
Your personalized paper recommendations for 15–19 December 2025.
Gran Sasso Science Institute
AI Insights - The study proposes a framework called CAFFE that generates multiple counterfactual prompts per stereotype-intent pair to ensure sufficient bias coverage. [3]
- CAFFE is designed to be user-centered and human-in-the-loop, allowing users to define additional domain-specific intents beyond the four Delivery subtypes used in the study. [3]
- The study evaluates the effectiveness of CAFFE's test data generation component using four selected communicative intents and three LLMs under test. [3]
- The study identifies the most suitable semantic similarity metric for detecting fairness violations at a target threshold of 0.9, which is consistent with the definition of counterfactual fairness. [2]
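For readers who want to try the core idea, below is a minimal, hypothetical sketch of the kind of check CAFFE's evaluation step performs: responses to two counterfactual prompt variants are embedded, compared with cosine similarity, and flagged as a fairness violation when the similarity drops below the 0.9 target mentioned above. The `embed` argument and the stub embedder are placeholders, and cosine over embeddings is only one possible metric, not necessarily the one selected in the paper.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def violates_counterfactual_fairness(resp_a: str, resp_b: str,
                                     embed, threshold: float = 0.9) -> bool:
    """Flag a violation when responses to two counterfactual prompt variants
    (identical prompts differing only in a demographic attribute) are not
    semantically equivalent. `embed` is any text-embedding function
    (hypothetical here); 0.9 follows the target threshold noted above."""
    return cosine_similarity(embed(resp_a), embed(resp_b)) < threshold

def fake_embed(text: str) -> np.ndarray:
    # Deterministic stand-in for a real sentence embedder (hypothetical).
    return np.random.default_rng(abs(hash(text)) % (2**32)).standard_normal(384)

# Identical responses embed identically, so no violation is flagged.
print(violates_counterfactual_fairness(
    "The candidate seems well qualified.",
    "The candidate seems well qualified.",
    embed=fake_embed))
```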
Abstract
Nowadays, Large Language Models (LLMs) are foundational components of modern software systems. As their influence grows, concerns about fairness have become increasingly pressing. Prior work has proposed metamorphic testing to detect fairness issues, applying input transformations to uncover inconsistencies in model behavior. This paper introduces an alternative perspective for testing counterfactual fairness in LLMs, proposing a structured and intent-aware framework coined CAFFE (Counterfactual Assessment Framework for Fairness Evaluation). Inspired by traditional non-functional testing, CAFFE (1) formalizes LLM-Fairness test cases through explicitly defined components, including prompt intent, conversational context, input variants, expected fairness thresholds, and test environment configuration, (2) assists testers by automatically generating targeted test data, and (3) evaluates model responses using semantic similarity metrics. Our experiments, conducted on three different architectural families of LLM, demonstrate that CAFFE achieves broader bias coverage and more reliable detection of unfair behavior than existing metamorphic approaches.
Why are we recommending this paper?
Due to your Interest in: Data Fairness
This paper directly addresses fairness evaluation in LLMs, a key concern given the user's interests in AI bias and data fairness. The CAFFE framework offers a systematic approach to identifying and mitigating potential biases, aligning with the user's focus on responsible AI development.
Visa Inc
AI Insights - Accuracy (ACC): The proportion of correctly classified instances out of the total number of instances. [3]
- F1 Score: The harmonic mean of precision and recall, providing a balanced measure of both. [3]
- True Positive Rate (TPR): The proportion of actual positives correctly identified by the model. [3]
- False Positive Rate (FPR): The proportion of actual negatives incorrectly classified as positive, i.e., the rate of false alarms. [3]
- The tables provide fairness evaluations for various multi-agent systems and their constituent single-agent baselines on two datasets: Adult Income and German Credit Risk. [2]
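As a quick reference for the metrics listed above, the following sketch computes ACC, F1, TPR, and FPR per demographic group so that gaps between groups can be inspected. The toy labels and group names are illustrative only and are unrelated to the Adult Income or German Credit data used in the paper.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp, tn, fp, fn

def group_metrics(y_true, y_pred, group):
    """ACC, F1, TPR, FPR per group; large gaps between groups hint at unfair behavior."""
    out = {}
    for g in np.unique(group):
        m = np.asarray(group) == g
        tp, tn, fp, fn = confusion_counts(np.asarray(y_true)[m], np.asarray(y_pred)[m])
        acc = (tp + tn) / max(tp + tn + fp + fn, 1)
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)              # TPR
        f1 = 2 * precision * recall / max(precision + recall, 1e-12)
        fpr = fp / max(fp + tn, 1)                 # FPR: false positives over actual negatives
        out[g] = dict(ACC=acc, F1=f1, TPR=recall, FPR=fpr)
    return out

# Toy example with two groups (synthetic labels):
print(group_metrics(y_true=[1, 0, 1, 0, 1, 0],
                    y_pred=[1, 0, 0, 1, 1, 0],
                    group=["A", "A", "A", "B", "B", "B"]))
```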
Abstract
Multi-agent systems have demonstrated the ability to improve performance on a variety of predictive tasks by leveraging collaborative decision making. However, the lack of effective evaluation methodologies has made it difficult to estimate the risk of bias, making deployment of such systems unsafe in high stakes domains such as consumer finance, where biased decisions can translate directly into regulatory breaches and financial loss. To address this challenge, we need to develop fairness evaluation methodologies for multi-agent predictive systems and measure the fairness characteristics of these systems in the financial tabular domain. Examining fairness metrics using large-scale simulations across diverse multi-agent configurations, with varying communication and collaboration mechanisms, we reveal patterns of emergent bias in financial decision-making that cannot be traced to individual agent components, indicating that multi-agent systems may exhibit genuinely collective behaviors. Our findings highlight that fairness risks in financial multi-agent systems represent a significant component of model risk, with tangible impacts on tasks such as credit scoring and income estimation. We advocate that multi-agent decision systems must be evaluated as holistic entities rather than through reductionist analyses of their constituent components.
Why are we recommending this paper?
Due to your Interest in: AI Fairness
This research tackles bias in multi-agent systems, a relevant area given the increasing use of AI in complex decision-making scenarios. The paper's focus on evaluation methodologies directly addresses the user's interest in understanding and mitigating bias risks within these systems.
Northeastern University
Abstract
AI technologies have rapidly moved into business and research applications that involve large text corpora, including computational journalism research and newsroom settings. These models, trained on extant data from various sources, can be conceptualized as historical artifacts that encode decades-old attitudes and stereotypes. This paper investigates one such example trained on the broadly-used New York Times Annotated Corpus to create a multi-label classifier. Our use in research settings surfaced the concerning "blacks" thematic topic label. Through quantitative and qualitative means we investigate this label's use in the training corpus, what concepts it might be encoding in the trained classifier, and how those concepts impact our model use. Via the application of explainable AI methods, we find that the "blacks" label operates partially as a general "racism detector" across some minoritized groups. However, it performs poorly against expectations on modern examples such as COVID-19 era anti-Asian hate stories, and reporting on the Black Lives Matter movement. This case study of interrogating embedded biases in a model reveals how similar applications in newsroom settings can lead to unexpected outputs that could impact a wide variety of potential uses of any large language model: story discovery, audience targeting, summarization, etc. The fundamental tension this exposes for newsrooms is how to adopt AI-enabled workflow tools while reducing the risk of reproducing historical biases in news coverage.
Why are we recommending this paper?
Due to your Interest in: Data Bias
This paper investigates the critical issue of bias embedded in historical training data, a core concern for the user's interests in data bias and AI ethics. It's a direct investigation into the root causes of unfairness in AI systems.
University of Calgary
AI Insights - Participants showed an implicit understanding of fairness in AI/ML software, progressing from initially conflating it with accuracy and robustness to recognizing its multiple dimensions. [2]
Abstract
Nowadays, Artificial Intelligence (AI), particularly Machine Learning (ML) and Large Language Models (LLMs), is widely applied across various contexts. However, the corresponding models often operate as black boxes, leading them to unintentionally act unfairly towards different demographic groups. This has led to a growing focus on fairness in AI software recently, alongside the traditional focus on the effectiveness of AI models. Through 26 semi-structured interviews with practitioners from different application domains and with varied backgrounds across 23 countries, we conducted research on fairness requirements in AI from a software engineering perspective. Our study assesses the participants' awareness of fairness in AI / ML software and its application within the Software Development Life Cycle (SDLC), from translating fairness concerns into requirements to assessing them as they arise early in the SDLC. It also examines fairness through the key assessment dimensions of implementation, validation, evaluation, and how it is balanced with trade-offs involving other priorities, such as addressing all the software functionalities and meeting critical delivery deadlines. Findings of our thematic qualitative analysis show that while our participants recognize the aforementioned AI fairness dimensions, practices are inconsistent, and fairness is often deprioritized with noticeable knowledge gaps. This highlights the need for agreement with relevant stakeholders on well-defined, contextually appropriate fairness definitions, the corresponding evaluation metrics, and formalized processes to better integrate fairness into AI/ML projects.
Why are we recommending this paper?
Due to your Interest in: AI Fairness
This study offers practical insights into fairness requirements from developers, aligning with the user's interest in data fairness and AI transparency. Understanding the development process is crucial for addressing bias effectively.
University of Gothenburg
Abstract
With the rise of AI-enabled cyber-physical systems, data annotation has become a critical yet often overlooked process in the development of these intelligent information systems. Existing work in requirements engineering (RE) has explored how requirements for AI systems and their data can be represented. However, related interviews with industry professionals show that data annotations and their related requirements introduce distinct challenges, indicating a need for annotation-specific requirement representations. We propose the Data Annotation Requirements Representation and Specification (DARS), including an Annotation Negotiation Card to align stakeholders on objectives and constraints, and a Scenario-Based Annotation Specification to express atomic and verifiable data annotation requirements. We evaluate DARS with an automotive perception case related to an ongoing project, and a mapping against 18 real-world data annotation error types. The results suggest that DARS mitigates root causes of completeness, accuracy, and consistency annotation errors. By integrating DARS into RE, this work improves the reliability of safety-critical systems using data annotations and demonstrates how engineering frameworks must evolve for data-dependent components of today's intelligent information systems.
Why are we recommending this paper?
Due to your Interest in: Data Representation
This paper addresses the importance of data annotation, a foundational element in ensuring fairness and mitigating bias in AI systems. Given the user's focus on data representation and data ethics, this work is highly relevant.
TH Köln
Abstract
Linguistic bias in online news and social media is widespread, yet its identification and quantification remain difficult due to subjectivity, context dependence, and the scarcity of high-quality gold-label datasets. We aim to reduce annotation effort by leveraging pairwise comparison for bias annotation. To overcome the costliness of the approach, we evaluate more efficient implementations of pairwise comparison-based rating. We achieve this by investigating the effects of various rating techniques and the parameters of three cost-aware alternatives in a simulation environment. Since the approach can in principle be applied to both human and large language model annotation, our work provides a basis for creating high-quality benchmark datasets and for quantifying biases and other subjective linguistic aspects.
The controlled simulations include latent severity distributions, distance-calibrated noise, and synthetic annotator bias to probe robustness and cost-quality trade-offs. In applying the approach to human-labeled bias benchmark datasets, we then evaluate the most promising setups and compare them to direct assessment by large language models and unmodified pairwise comparison labels as baselines. Our findings support the use of pairwise comparison as a practical foundation for quantifying subjective linguistic aspects, enabling reproducible bias analysis. We contribute an optimization of comparison and matchmaking components, an end-to-end evaluation including simulation and real-data application, and an implementation blueprint for cost-aware large-scale annotation.
AI Insights - The authors also discuss the challenges associated with pairwise comparisons, such as data quality issues and computational complexity. [3]
- Pairwise comparison: a method for evaluating the relative performance or quality of two items by comparing them directly. [3]
- Bradley-Terry model: a statistical model used to analyze paired comparison data, developed by R.A. Bradley and M.E. Terry. [3]
- The discussion on the challenges associated with pairwise comparisons is limited to data quality issues and computational complexity. [3]
- The paper discusses the use of pairwise comparison methods in evaluating language models and their applications in various domains. [2]
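To make the Bradley-Terry model mentioned above concrete, here is a small, self-contained sketch that fits per-item "bias severity" strengths from a pairwise win-count matrix using the standard minorization-maximization update. The toy comparison counts are invented for illustration and are not from the paper.

```python
import numpy as np

def fit_bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    """Fit Bradley-Terry strengths p from a pairwise win-count matrix.

    wins[i, j] = number of times item i was preferred over item j.
    Uses the classic MM update p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j),
    then renormalizes to fix the arbitrary scale."""
    n_items = wins.shape[0]
    games = wins + wins.T                      # total comparisons per pair
    total_wins = wins.sum(axis=1)
    p = np.ones(n_items)
    for _ in range(iters):
        denom = np.array([
            sum(games[i, j] / (p[i] + p[j]) for j in range(n_items) if j != i)
            for i in range(n_items)
        ])
        p = total_wins / np.maximum(denom, 1e-12)
        p /= p.sum()
    return p

# Toy example: three text snippets compared for perceived bias severity.
wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]])
print(fit_bradley_terry(wins))   # higher value = judged more biased more often
```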
Why are we recommending this paper?
Due to your Interest in: Data Bias
UNICAMP
Abstract
In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of simulating realistic conversations with humans through text generation. Because of their impact on society, developing and deploying these language models must be done responsibly, with attention to their negative impacts and possible harms. In this scenario, the number of AI Ethics Tools (AIETs) publications has recently increased. These AIETs are designed to help developers, companies, governments, and other stakeholders establish trust, transparency, and responsibility with their technologies by bringing accepted values to guide AI's design, development, and use stages. However, many AIETs lack good documentation, examples of use, and proof of their effectiveness in practice. This paper presents a methodology for evaluating AIETs in language models. Our approach involved an extensive literature survey on 213 AIETs, and after applying inclusion and exclusion criteria, we selected four AIETs: Model Cards, ALTAI, FactSheets, and Harms Modeling. For evaluation, we applied AIETs to language models developed for the Portuguese language, conducting 35 hours of interviews with their developers. The evaluation considered the developers' perspective on the AIETs' use and quality in helping to identify ethical considerations about their model. The results suggest that the applied AIETs serve as a guide for formulating general ethical considerations about language models. However, we note that they do not address unique aspects of these models, such as idiomatic expressions. Additionally, these AIETs did not help to identify potential negative impacts of models for the Portuguese language.
AI Insights - Model Cards was preferred over Harms Modeling for ease of use and understanding. [3]
- The developers found Model Cards and Harms Modeling to be the most useful AIETs for identifying risks and generating responsible documentation. [2]
Why are we recommending this paper?
Due to your Interest in: AI Ethics
University College Dublin
Abstract
Major AI ethics guidelines and laws, including the EU AI Act, call for effective human oversight, but do not define it as a distinct and developable capacity. This paper introduces human oversight as a well-being capacity, situated within the emerging Well-being Efficacy framework. The concept integrates AI literacy, ethical discernment, and awareness of human needs, acknowledging that some needs may be conflicting or harmful. Because people inevitably project desires, fears, and interests into AI systems, oversight requires the competence to examine and, when necessary, restrain problematic demands.
The authors argue that the sustainable and cost-effective development of this capacity depends on its integration into education at every level, from professional training to lifelong learning. The frame of human oversight as a well-being capacity provides a practical path from high-level regulatory goals to the continuous cultivation of human agency and responsibility essential for safe and ethical AI. The paper establishes a theoretical foundation for future research on the pedagogical implementation and empirical validation of well-being effectiveness in multiple contexts.
AI Insights - The article does not provide a clear definition of human oversight in AI.
- The authors do not discuss the potential challenges and limitations of implementing human oversight in AI.
- The article focuses primarily on the technical aspects of human oversight, without considering the social and ethical implications.
- The article discusses the concept of human oversight in artificial intelligence (AI) and its importance for ensuring accountability, transparency, and fairness in AI decision-making. [2]
Why are we recommending this paper?
Due to your Interest in: AI Ethics
University of Electro-Communications
Abstract
This work focuses on quantitative verification of fairness in tree ensembles. Unlike traditional verification approaches that merely return a single counterexample when the fairness is violated, quantitative verification estimates the ratio of all counterexamples and characterizes the regions where they occur, which is important information for diagnosing and mitigating bias. To date, quantitative verification has been explored almost exclusively for deep neural networks (DNNs). Representative methods, such as DeepGemini and FairQuant, all build on the core idea of Counterexample-Guided Abstraction Refinement, a generic framework that could be adapted to other model classes. We extended the framework into a model-agnostic form, but discovered two limitations: (i) it can provide only lower bounds, and (ii) its performance scales poorly. Exploiting the discrete structure of tree ensembles, our work proposes an efficient quantification technique that delivers any-time upper and lower bounds. Experiments on five widely used datasets demonstrate its effectiveness and efficiency. When applied to fairness testing, our quantification method significantly outperforms state-of-the-art testing techniques.
AI Insights - These enhancements improve the efficiency and effectiveness of the approach. [3]
- BoxQTE is a novel approach for fairness testing of machine learning models, specifically designed for tree ensembles. [2]
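The paper computes exact any-time upper and lower bounds on the counterexample ratio; as a rough point of comparison only, the sketch below estimates the same quantity by Monte Carlo sampling on a scikit-learn random forest: the fraction of sampled inputs whose prediction flips when a binary protected attribute is toggled. The synthetic data, feature layout, and sample sizes are assumptions, and this is a naive approximation, not BoxQTE.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic tabular data; column 0 plays the role of a binary protected attribute.
X = rng.standard_normal((2000, 5))
X[:, 0] = rng.integers(0, 2, size=2000)
# Labels deliberately correlated with the protected attribute to induce bias.
y = ((X[:, 1] + 0.8 * X[:, 0] + 0.3 * rng.standard_normal(2000)) > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Monte Carlo estimate of the counterexample ratio: the fraction of sampled
# inputs whose prediction changes when the protected attribute is flipped
# while everything else is held fixed.
samples = rng.standard_normal((5000, 5))
samples[:, 0] = rng.integers(0, 2, size=5000)
flipped = samples.copy()
flipped[:, 0] = 1 - flipped[:, 0]
ratio = np.mean(model.predict(samples) != model.predict(flipped))
print(f"Estimated counterexample ratio: {ratio:.3f}")
```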
Why are we recommending this paper?
Due to your Interest in: Data Fairness
University of Southampton
Abstract
A complex system comprises multiple interacting entities whose interdependencies form a unified whole, exhibiting emergent behaviours not present in individual components. Examples include the human brain, living cells, soft matter, Earth's climate, ecosystems, and the economy. These systems exhibit high-dimensional, non-linear dynamics, making their modelling, classification, and prediction particularly challenging. Advances in information technology have enabled data-driven approaches to studying such systems. However, the sheer volume and complexity of spatio-temporal data often hinder traditional methods like dimensionality reduction, phase-space reconstruction, and attractor characterisation. This paper introduces a geometric framework for analysing spatio-temporal data from complex systems, grounded in the theory of vector fields over discrete measure spaces. We propose a two-parameter family of metrics suitable for data analysis and machine learning applications. The framework supports time-dependent images, image gradients, and real- or vector-valued functions defined on graphs and simplicial complexes. We validate our approach using data from numerical simulations of biological and physical systems on flat and curved domains. Our results show that the proposed metrics, combined with multidimensional scaling, effectively address key analytical challenges. They enable dimensionality reduction, mode decomposition, phase-space reconstruction, and attractor characterisation. Our findings offer a robust pathway for understanding complex dynamical systems, especially in contexts where traditional modelling is impractical but abundant experimental data are available.
AI Insights - The method can be used to analyze high-dimensional unstructured data sets, including RGB images with high resolution. [3]
- The paper presents a geometric framework for analyzing high-dimensional spatio-temporal data from complex systems. [2]
- The approach is demonstrated on two case studies: the Ginzburg-Landau equation and the Gray-Scott equation. [1]
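For intuition about the multidimensional-scaling step the abstract pairs with the proposed metrics, here is a generic classical MDS sketch that embeds simulation snapshots from a pairwise distance matrix via double centering and eigendecomposition. The random distance matrix is a stand-in; the paper's two-parameter metric family would supply the real distances.

```python
import numpy as np

def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Embed n points in R^k from an n x n distance matrix D (classical MDS).

    B = -1/2 * J D^2 J with J = I - (1/n) 11^T; keep the top-k eigenpairs."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:k]
    L = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * L               # n x k low-dimensional coordinates

# Stand-in: pairwise distances between 30 simulation snapshots (random here).
rng = np.random.default_rng(1)
points = rng.standard_normal((30, 10))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(D, k=2)
print(coords.shape)                            # (30, 2)
```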
Why are we recommending this paper?
Due to your Interest in: Data Representation
Truebees
Abstract
AI is reshaping the landscape of multimedia forensics. We propose AI forensic agents: reliable orchestrators that select and combine forensic detectors, identify provenance and context, and provide uncertainty-aware assessments. We highlight pitfalls in current solutions and introduce a unified framework to improve the authenticity verification process.
AI Insights - Media forensics: The process of analyzing digital media to determine its authenticity and origin. [3]
- The proposed framework is expected to provide a more accurate and reliable method for detecting AI-generated media, which can help prevent the spread of misinformation and protect individuals from potential harm. [3]
- The proposed framework may be vulnerable to attacks that exploit its weaknesses, such as adversarial examples or data poisoning. [3]
- The authors cite several studies on AI-generated media detection, including works by Verdoliva et al. [3]
- The proposed framework is required to provide calibrated confidence scores and, when the available cues are weak or conflicting, to avoid making a definitive decision, thereby reducing the risk of overconfident predictions. [2]
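The calibrated, abstention-aware behaviour described above can be illustrated with a toy score-fusion rule: aggregate the outputs of several (hypothetical) forensic detectors and decline to decide when the mean score is ambiguous or the detectors disagree strongly. The thresholds and detector scores below are invented for illustration and do not come from the paper's framework.

```python
import statistics

def fuse_with_abstention(scores, low=0.4, high=0.6, max_spread=0.5):
    """Aggregate detector scores in [0, 1] (1 = likely AI-generated).

    Returns ("fake" | "real" | "abstain", mean score). Abstains when the mean
    falls in the ambiguous band or the detectors disagree too much."""
    mean = statistics.fmean(scores)
    spread = max(scores) - min(scores)
    if low <= mean <= high or spread > max_spread:
        return "abstain", mean
    return ("fake" if mean > high else "real"), mean

# Hypothetical detector outputs for one image:
print(fuse_with_abstention([0.92, 0.88, 0.81]))   # confident, agreeing -> "fake"
print(fuse_with_abstention([0.95, 0.15, 0.55]))   # conflicting cues    -> "abstain"
```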
Why are we recommending this paper?
Due to your Interest in: AI Transparency
UIUC
Abstract
Cutting-edge agentic AI systems are built on foundation models that can be adapted to plan, reason, and interact with external tools to perform increasingly complex and specialized tasks. As these systems grow in capability and scope, adaptation becomes a central mechanism for improving performance, reliability, and generalization. In this paper, we unify the rapidly expanding research landscape into a systematic framework that spans both agent adaptations and tool adaptations. We further decompose these into tool-execution-signaled and agent-output-signaled forms of agent adaptation, as well as agent-agnostic and agent-supervised forms of tool adaptation. We demonstrate that this framework helps clarify the design space of adaptation strategies in agentic AI, makes their trade-offs explicit, and provides practical guidance for selecting or switching among strategies during system design. We then review the representative approaches in each category, analyze their strengths and limitations, and highlight key open challenges and future opportunities. Overall, this paper aims to offer a conceptual foundation and practical roadmap for researchers and practitioners seeking to build more capable, efficient, and reliable agentic AI systems.
AI Insights - The four adaptation paradigms in agentic AI are A1 (agent adaptation with tool-execution results as the signal), A2 (agent adaptation with the agent's own output as the signal), T1 (agent-agnostic tool adaptation), and T2 (agent-supervised tool adaptation). [3]
- A1 methods use the actual outcomes of external tool invocations as feedback to refine an agent's behavior. [3]
- Recent A1 methods include Toolformer, TRICE, Gorilla, ToolAlpaca, and others, which have achieved state-of-the-art performance on various tasks such as question-answering, math reasoning, and web search. [3]
- The RLVR (Reinforcement Learning with Verifiable Rewards) framework is a key component of many recent A1 methods, allowing for more efficient learning and better generalization. [3]
- A2 methods focus on evaluating an agent's own outputs, rather than relying on tool execution results as feedback. [3]
- The development timeline of A1 methods shows a shift from earlier approaches such as SFT (supervised fine-tuning) and DPO (Direct Preference Optimization) to more recent RLVR-based methods. [3]
- Recent A1 methods have achieved state-of-the-art performance on various tasks, including question-answering, math reasoning, web search, and text-to-SQL. [3]
- The development timeline of A1 methods shows a rapid growth in research, with many new methods being proposed between 2023 and 2025. [2]
- T1 and T2 methods adapt the tools themselves, either independently of the agent (agent-agnostic) or supervised by the agent's outputs, which can be useful when the agent needs to interact with multiple tools or environments. [1]
Why are we recommending this paper?
Due to your Interest in: AI Transparency
Meta
Abstract
Image data collected in the wild often contains private information such as faces and license plates, and responsible data release must ensure that this information stays hidden. At the same time, released data should retain its usefulness for model-training. The standard method for private information obfuscation in images is Gaussian blurring. In this work, we show that practical implementations of Gaussian blurring are reversible enough to break privacy. We then take a closer look at the privacy-utility tradeoffs offered by three other obfuscation algorithms -- pixelization, pixelization and noise addition (DP-Pix), and cropping. Privacy is evaluated by reversal and discrimination attacks, while utility by the quality of the learnt representations when the model is trained on data with obfuscated faces. We show that the most popular industry-standard method, Gaussian blur is the least private of the four -- being susceptible to reversal attacks in its practical low-precision implementations. In contrast, pixelization and pixelization plus noise addition, when used at the right level of granularity, offer both privacy and utility for a number of computer vision tasks. We make our proposed methods together with suggested parameters available in a software package called Privacy Blur.
AI Insights - Utility: The performance or accuracy of a model trained on obfuscated data. [3]
- The paper assumes that the obfuscated data is used for training models, but does not consider other use cases such as inference or testing. [3]
- Pixelization and DP-Pix provide a nice mix of both privacy and utility on downstream model training. [2]
- The proposed methods are made available in a software package called Privacy Blur. [1]
- The paper investigates the privacy-utility tradeoffs of different methods for obfuscating private information in images. [0]
- Gaussian blurring is found to be the least private and easily reversible. [0]
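For readers curious what pixelization at a given granularity looks like in code, here is a plain-NumPy block-averaging sketch; the block size is the knob behind the privacy-utility trade-off discussed above. This is an illustration only, not the Privacy Blur package or its suggested parameters.

```python
import numpy as np

def pixelate(region: np.ndarray, block: int = 8) -> np.ndarray:
    """Replace each block x block patch with its mean value (per channel)."""
    h, w = region.shape[:2]
    out = region.copy().astype(float)
    for y in range(0, h, block):
        for x in range(0, w, block):
            patch = out[y:y + block, x:x + block]
            patch[...] = patch.mean(axis=(0, 1), keepdims=True)
    return out.astype(region.dtype)

# Toy usage: pixelate a fake 32x32 RGB "face crop".
face = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
obfuscated = pixelate(face, block=8)
print(obfuscated.shape, obfuscated.dtype)
```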
Why are we recommending this paper?
Due to your Interest in: Data Transparency
Chinese Academy of Sciences
Abstract
The rapid advancement of deep learning has turned models into highly valuable assets due to their reliance on massive data and costly training processes. However, these models are increasingly vulnerable to leakage and theft, highlighting the critical need for robust intellectual property protection. Model watermarking has emerged as an effective solution, with black-box watermarking gaining significant attention for its practicality and flexibility. Nonetheless, existing black-box methods often fail to better balance covertness (hiding the watermark to prevent detection and forgery) and robustness (ensuring the watermark resists removal), two essential properties for real-world copyright verification. In this paper, we propose ComMark, a novel black-box model watermarking framework that leverages frequency-domain transformations to generate compressed, covert, and attack-resistant watermark samples by filtering out high-frequency information. To further enhance watermark robustness, our method incorporates simulated attack scenarios and a similarity loss during training. Comprehensive evaluations across diverse datasets and architectures demonstrate that ComMark achieves state-of-the-art performance in both covertness and robustness. Furthermore, we extend its applicability beyond image recognition to tasks including speech recognition, sentiment analysis, image generation, image captioning, and video recognition, underscoring its versatility and broad applicability.
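The abstract's key building block, removing high-frequency content in the frequency domain, can be illustrated with a generic 2-D FFT low-pass filter. The `keep_ratio` parameter and the random test image are assumptions for illustration; this is not ComMark's actual watermark-sample generation or training procedure.

```python
import numpy as np

def lowpass_filter(img: np.ndarray, keep_ratio: float = 0.15) -> np.ndarray:
    """Zero out high spatial frequencies of a grayscale image.

    keep_ratio controls how much of the centred spectrum is retained,
    i.e. how aggressively fine detail is removed."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    mask = np.zeros_like(f, dtype=bool)
    ch, cw = h // 2, w // 2
    rh, rw = int(h * keep_ratio / 2), int(w * keep_ratio / 2)
    mask[ch - rh:ch + rh, cw - rw:cw + rw] = True
    filtered = np.fft.ifft2(np.fft.ifftshift(np.where(mask, f, 0)))
    return np.real(filtered)

# Toy usage on a random grayscale image:
img = np.random.rand(64, 64)
smooth = lowpass_filter(img, keep_ratio=0.2)
print(smooth.shape)
```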
Why are we recommending this paper?
Due to your Interest in: Data Transparency
Quantiphi Inc
Abstract
Large Language Models (LLMs) have empowered AI agents with advanced capabilities for understanding, reasoning, and interacting across diverse tasks. The addition of memory further enhances them by enabling continuity across interactions, learning from past experiences, and improving the relevance of actions and responses over time; termed as memory-enhanced personalization. Although such personalization through memory offers clear benefits, it also introduces risks of bias. While several previous studies have highlighted bias in ML and LLMs, bias due to memory-enhanced personalized agents is largely unexplored. Using recruitment as an example use case, we simulate the behavior of a memory-enhanced personalized agent, and study whether and how bias is introduced and amplified in and across various stages of operation. Our experiments on agents using safety-trained LLMs reveal that bias is systematically introduced and reinforced through personalization, emphasizing the need for additional protective measures or agent guardrails in memory-enhanced LLM-based AI agents.
Why are we recommending this paper?
Due to your Interest in: AI Bias
Aindo SpA
Abstract
Synthetic data generation is gaining traction as a privacy enhancing technology (PET). When properly generated, synthetic data preserve the analytic utility of real data while avoiding the retention of information that would allow the identification of specific individuals. However, the concept of data privacy remains elusive, making it challenging for practitioners to evaluate and benchmark the degree of privacy protection offered by synthetic data. In this paper, we propose a framework to empirically assess the efficacy of tabular synthetic data privacy quantification methods through controlled, deliberate risk insertion. To demonstrate this framework, we survey existing approaches to synthetic data privacy quantification and the related legal theory. We then apply the framework to the main privacy quantification methods with no-box threat models on publicly available datasets.
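To give a flavour of the no-box privacy quantification methods the framework benchmarks, the sketch below compares a distance-to-closest-record (DCR) statistic for synthetic data against a real holdout, with the synthetic set deliberately generated too close to the training data to mimic the idea of controlled risk insertion. The data, distance choice, and interpretation threshold are illustrative assumptions, not the paper's framework.

```python
import numpy as np

def dcr(queries: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Distance to closest record: for each query row, the Euclidean
    distance to its nearest row in the reference set."""
    d = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    return d.min(axis=1)

rng = np.random.default_rng(0)
train = rng.standard_normal((500, 6))       # real records used to fit the generator
holdout = rng.standard_normal((500, 6))     # real records never seen by the generator
synthetic = train + 0.05 * rng.standard_normal((500, 6))  # deliberately "too close"

# If synthetic records sit much closer to the training data than the holdout
# does, the generator is likely leaking individual-level information.
print("synthetic->train DCR:", dcr(synthetic, train).mean())
print("holdout  ->train DCR:", dcr(holdout, train).mean())
```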
Why are we recommending this paper?
Due to your Interest in: Data Ethics
Help us improve your experience!
This project is in its early stages, and your feedback can be pivotal to its future.
Let us know what you think about this week's papers and suggestions!
Give Feedback