Data Bias

An Empirical Survey of Model Merging Algorithms for Social Bias Mitigation

Mitsubishi Electric

Rate paper: 👍 👎 ♥ Save

Abstract
Large language models (LLMs) are known to inherit and even amplify societal biases present in their pre-training corpora, threatening fairness and social trust. To address this issue, recent work has explored ``editing'' LLM parameters to mitigate social bias with model merging approaches; however, there is no empirical comparison. In this work, we empirically survey seven algorithms: Linear, Karcher Mean, SLERP, NuSLERP, TIES, DELLA, and Nearswap, applying 13 open weight models in the GPT, LLaMA, and Qwen families. We perform a comprehensive evaluation using three bias datasets (BBQ, BOLD, and HONEST) and measure the impact of these techniques on LLM performance in downstream tasks of the SuperGLUE benchmark. We find a trade-off between bias reduction and downstream performance: methods achieving greater bias mitigation degrade accuracy, particularly on tasks requiring reading comprehension and commonsense and causal reasoning. Among the merging algorithms, Linear, SLERP, and Nearswap consistently reduce bias while maintaining overall performance, with SLERP at moderate interpolation weights emerging as the most balanced choice. These results highlight the potential of model merging algorithms for bias mitigation, while indicating that excessive debiasing or inappropriate merging methods may lead to the degradation of important linguistic abilities.

AI Summary

LLM: Large Language Model Stereotypical Bias: The tendency of models to perpetuate societal biases and stereotypes. [3]
SuperGLUE: A benchmark for evaluating the performance of language models on social bias tasks. [3]
BBQ, BOLD, HONEST: Social bias evaluation datasets. [3]
The paper investigates the effect of stereotypical bias on large language models (LLMs) and proposes a method to mitigate this issue. [2]

Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models

KAIST

Rate paper: 👍 👎 ♥ Save

Abstract
Large vision-language model (LVLM) based text-to-image (T2I) systems have become the dominant paradigm in image generation, yet whether they amplify social biases remains insufficiently understood. In this paper, we show that LVLM-based models produce markedly more socially biased images than non-LVLM-based models. We introduce a 1,024 prompt benchmark spanning four levels of linguistic complexity and evaluate demographic bias across multiple attributes in a systematic manner. Our analysis identifies system prompts, the predefined instructions guiding LVLMs, as a primary driver of biased behavior. Through decoded intermediate representations, token-probability diagnostics, and embedding-association analyses, we reveal how system prompts encode demographic priors that propagate into image synthesis. To this end, we propose FairPro, a training-free meta-prompting framework that enables LVLMs to self-audit and construct fairness-aware system prompts at test time. Experiments on two LVLM-based T2I models, SANA and Qwen-Image, show that FairPro substantially reduces demographic bias while preserving text-image alignment. We believe our findings provide deeper insight into the central role of system prompts in bias propagation and offer a practical, deployable approach for building more socially responsible T2I systems.

AI Ethics

The Ethics of Generative AI

TU Delft

Rate paper: 👍 👎 ♥ Save

Abstract
This chapter discusses the ethics of generative AI. It provides a technical primer to show how generative AI affords experiencing technology as if it were human, and this affordance provides a fruitful focus for the philosophical ethics of generative AI. It then shows how generative AI can both aggravate and alleviate familiar ethical concerns in AI ethics, including responsibility, privacy, bias and fairness, and forms of alienation and exploitation. Finally, the chapter examines ethical questions that arise specifically from generative AI's mimetic generativity, such as debates about authorship and credit, the emergence of as-if social relationships with machines, and new forms of influence, persuasion, and manipulation.

AI Summary

Generative AI systems can produce outputs that resemble meaningful human expression, making it natural for users to experience their outputs as if they were intentional or expressive. [3]
The affordance of experience-as-real is a feature that distinguishes generative AI from other forms of machine learning and connects the analysis to existing work in moral psychology and the philosophy of technology. [3]
Generative AI turns abstract philosophical puzzles into urgent design problems. [3]
Affordance: A feature that makes it natural for users to experience a technological system in a certain way. [3]
In this case, the affordance is the ability of generative AI systems to produce outputs that resemble meaningful human expression. [3]
Generative AI: A type of machine learning that enables systems to generate new content, such as text or images, based on patterns and structures learned from large datasets. [3]
The affordance of experience-as-real is a key feature of generative AI systems that has significant ethical implications. [3]
Generative AI raises questions about the value of authorship, the basis of interpersonal relationships, and the nature of permissible influence. [3]
Reflection on generative AI may shed light on aspects of human agency and communication that are often taken for granted. [3]
The article does not provide a comprehensive overview of the current state of research on generative AI. [3]
Ethical evaluation must take account of how users interpret and respond to system behavior, not only of the systems' internal mechanisms. [2]

A Human-centric Framework for Debating the Ethics of AI Consciousness Under Uncertainty

University of California

Rate paper: 👍 👎 ♥ Save

Abstract
As AI systems become increasingly sophisticated, questions about machine consciousness and its ethical implications have moved from fringe speculation to mainstream academic debate. Current ethical frameworks in this domain often implicitly rely on contested functionalist assumptions, prioritize speculative AI welfare over concrete human interests, and lack coherent theoretical foundations. We address these limitations through a structured three-level framework grounded in philosophical uncertainty. At the foundational level, we establish five factual determinations about AI consciousness alongside human-centralism as our meta-ethical stance. These foundations logically entail three operational principles: presumption of no consciousness (placing the burden of proof on consciousness claims), risk prudence (prioritizing human welfare under uncertainty), and transparent reasoning (enabling systematic evaluation and adaptation). At the application level, the third component of our framework, we derive default positions on pressing ethical questions through a transparent logical process where each position can be explicitly traced back to our foundational commitments. Our approach balances philosophical rigor with practical guidance, distinguishes consciousness from anthropomorphism, and creates pathways for responsible evolution as scientific understanding advances, providing a human-centric foundation for navigating these profound ethical challenges.

AI Fairness

The Effect of Enforcing Fairness on Reshaping Explanations in Machine Learning Models

University of Pittsburgh

Rate paper: 👍 👎 ♥ Save

Abstract
Trustworthy machine learning in healthcare requires strong predictive performance, fairness, and explanations. While it is known that improving fairness can affect predictive performance, little is known about how fairness improvements influence explainability, an essential ingredient for clinical trust. Clinicians may hesitate to rely on a model whose explanations shift after fairness constraints are applied. In this study, we examine how enhancing fairness through bias mitigation techniques reshapes Shapley-based feature rankings. We quantify changes in feature importance rankings after applying fairness constraints across three datasets: pediatric urinary tract infection risk, direct anticoagulant bleeding risk, and recidivism risk. We also evaluate multiple model classes on the stability of Shapley-based rankings. We find that increasing model fairness across racial subgroups can significantly alter feature importance rankings, sometimes in different ways across groups. These results highlight the need to jointly consider accuracy, fairness, and explainability in model assessment rather than in isolation.

AI Summary

Applying bias mitigation can substantially alter feature importance rankings, especially in complex, nonlinear models. [3]
Explainability: The ability to understand how a machine learning model makes predictions or decisions. [3]
SHAP (SHapley Additive exPlanations): An algorithm for explaining the output of a machine learning model by assigning a value to each feature for a specific prediction. [3]
The study highlights the complex trade-off between performance, fairness, and explainability in ML. [2]
EOD (Equal Opportunity Demographic): A fairness metric that measures whether a model's predictions are biased towards certain demographic groups. [1]

Data Fairness

Intervention Strategies for Fairness and Efficiency at Autonomous Single-Intersection Traffic Flows

KAUST

Rate paper: 👍 👎 ♥ Save

Abstract
Intersections present significant challenges in traffic management, where ensuring safety and efficiency is essential for effective flow. However, these goals are often achieved at the expense of fairness, which is critical for trustworthiness and long-term sustainability. This paper investigates how the timing of centralized intervention affects the management of autonomous agents at a signal-less, orthogonal intersection, while satisfying safety constraints, evaluating efficiency, and ensuring fairness. A mixed-integer linear programming (MILP) approach is used to optimize agent coordination within a circular control zone centered at the intersection. We introduce the concept of fairness, measured via pairwise reversal counts, and incorporate fairness constraints into the MILP framework. We then study the relationship between fairness and system efficiency and its impact on platoon formation. Finally, simulation studies analyze the effectiveness of early versus late intervention strategies and fairness-aware control, focusing on safe, efficient, and robust management of agents within the control zone.

Data Representation

Input-Output Data-Driven Representation: Non-Minimality and Stability

Seoul National University

Rate paper: 👍 👎 ♥ Save

Abstract
Many recent data-driven control approaches for linear time-invariant systems are based on finite-horizon prediction of output trajectories using input-output data matrices. When applied recursively, this predictor forms a dynamic system representation. This data-driven representation is generally non-minimal, containing latent poles in addition to the system's original poles. In this article, we show that these latent poles are guaranteed to be stable through the use of the Moore-Penrose inverses of the data matrices, regardless of the system's stability and even in the presence of small noise in data. This result obviates the need to eliminate the latent poles through procedures that resort to low-rank approximation in data-driven control and analysis. It is then applied to construct a stabilizable and detectable realization from data, from which we design an output feedback linear quadratic regulator (LQR) controller. Furthermore, we extend this principle to data-driven inversion, enabling asymptotic unknown input estimation for minimum-phase systems.

AI Summary

The proposed method uses the concept of Hankel matrices and their properties to derive an optimal controller that minimizes the LQR cost function. [3]
The results show that the proposed controller outperforms traditional methods in terms of stability and performance. [3]
The paper also discusses the effect of the parameter N on the performance of the controller, showing that larger values of N lead to better results. [3]
Hankel matrix LQR cost function SISO system [3]
The paper presents a novel approach to designing input-output controllers using data-driven methods. [1]

Geometric Data Science

Springer

Rate paper: 👍 👎 ♥ Save

Abstract
This book introduces the new research area of Geometric Data Science, where data can represent any real objects through geometric measurements. The first part of the book focuses on finite point sets. The most important result is a complete and continuous classification of all finite clouds of unordered points under rigid motion in any Euclidean space. The key challenge was to avoid the exponential complexity arising from permutations of the given unordered points. For a fixed dimension of the ambient Euclidean space, the times of all algorithms for the resulting invariants and distance metrics depend polynomially on the number of points. The second part of the book advances a similar classification in the much more difficult case of periodic point sets, which model all periodic crystals at the atomic scale. The most significant result is the hierarchy of invariants from the ultra-fast to complete ones. The key challenge was to resolve the discontinuity of crystal representations that break down under almost any noise. Experimental validation on all major materials databases confirmed the Crystal Isometry Principle: any real periodic crystal has a unique location in a common moduli space of all periodic structures under rigid motion. The resulting moduli space contains all known and not yet discovered periodic crystals and hence continuously extends Mendeleev's table to the full crystal universe.

AI Transparency

Self-Improving AI Agents through Self-Play

ulamai

Rate paper: 👍 👎 ♥ Save

Abstract
We extend the moduli-theoretic framework of psychometric batteries to the domain of dynamical systems. While previous work established the AAI capability score as a static functional on the space of agent representations, this paper formalizes the agent as a flow $ν_r$ parameterized by computational resource $r$, governed by a recursive Generator-Verifier-Updater (GVU) operator. We prove that this operator generates a vector field on the parameter manifold $Θ$, and we identify the coefficient of self-improvement $κ$ as the Lie derivative of the capability functional along this flow. The central contribution of this work is the derivation of the Variance Inequality, a spectral condition that is sufficient (under mild regularity) for the stability of self-improvement. We show that a sufficient condition for $κ> 0$ is that, up to curvature and step-size effects, the combined noise of generation and verification must be small enough. We then apply this formalism to unify the recent literature on Language Self-Play (LSP), Self-Correction, and Synthetic Data bootstrapping. We demonstrate that architectures such as STaR, SPIN, Reflexion, GANs and AlphaZero are specific topological realizations of the GVU operator that satisfy the Variance Inequality through filtration, adversarial discrimination, or grounding in formal systems.

AI Summary

The GVU framework is used to analyze the stability of self-improvement in AI systems. [3]
The Variance Inequality (Theorem 4.1) provides a sufficient condition for stable self-improvement, requiring a high Signal-to-Noise Ratio (SNR) for both the generator and the verifier. [3]
AI slop event at parameter θ AI slop mass and slop regime The paper provides a framework for understanding the stability of self-improvement in AI systems, highlighting the importance of high SNR for both generators and verifiers. [3]
The paper defines AI slop as a region where the internal Verifier ranks outputs among its top fraction, but they actually lie in the bottom fraction of the true battery score. [2]
The paper introduces the Generalized Verifier-Generator Update (GVU) framework, which models the interaction between a generator and its verifier. [1]

InnoGym: Benchmarking the Innovation Potential of AI Agents

Zhejiang University

Rate paper: 👍 👎 ♥ Save

Abstract
LLMs and Agents have achieved impressive progress in code generation, mathematical reasoning, and scientific discovery. However, existing benchmarks primarily measure correctness, overlooking the diversity of methods behind solutions. True innovation depends not only on producing correct answers but also on the originality of the approach. We present InnoGym, the first benchmark and framework designed to systematically evaluate the innovation potential of AI agents. InnoGym introduces two complementary metrics: performance gain, which measures improvement over the best-known solutions, and novelty, which captures methodological differences from prior approaches. The benchmark includes 18 carefully curated tasks from real-world engineering and scientific domains, each standardized through resource filtering, evaluator validation, and solution collection. In addition, we provide iGym, a unified execution environment for reproducible and long-horizon evaluations. Extensive experiments show that while some agents produce novel approaches, their lack of robustness limits performance gains. These results highlight a key gap between creativity and effectiveness, underscoring the need for benchmarks that evaluate both.

AI Summary

The proposed benchmark may not capture all aspects of human intelligence, such as common sense or creativity. [3]
These models are like super-smart computers that can understand and generate human-like text. [3]
The authors want to make sure these models can solve math problems, which is an important part of being intelligent. [3]
The paper discusses the challenges of evaluating large language models (LLMs) and proposes a new benchmark for measuring their performance. [2]

Data Transparency

How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy

Google

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
High quality data is needed to unlock the full potential of AI for end users. However finding new sources of such data is getting harder: most publicly-available human generated data will soon have been used. Additionally, publicly available data often is not representative of users of a particular system -- for example, a research speech dataset of contractors interacting with an AI assistant will likely be more homogeneous, well articulated and self-censored than real world commands that end users will issue. Therefore unlocking high-quality data grounded in real user interactions is of vital interest. However, the direct use of user data comes with significant privacy risks. Differential Privacy (DP) is a well established framework for reasoning about and limiting information leakage, and is a gold standard for protecting user privacy. The focus of this work, \emph{Differentially Private Synthetic data}, refers to synthetic data that preserves the overall trends of source data,, while providing strong privacy guarantees to individuals that contributed to the source dataset. DP synthetic data can unlock the value of datasets that have previously been inaccessible due to privacy concerns and can replace the use of sensitive datasets that previously have only had rudimentary protections like ad-hoc rule-based anonymization. In this paper we explore the full suite of techniques surrounding DP synthetic data, the types of privacy protections they offer and the state-of-the-art for various modalities (image, tabular, text and decentralized). We outline all the components needed in a system that generates DP synthetic data, from sensitive data handling and preparation, to tracking the use and empirical privacy testing. We hope that work will result in increased adoption of DP synthetic data, spur additional research and increase trust in DP synthetic data approaches.

AI Summary

The choice of privacy unit is crucial in DP, and it determines the scope of the privacy guarantee. [3]
Empirical privacy auditing is a valuable tool in calibrating assessments, as carefully constructed audits can measure the effectiveness of attacks that are more realistic than the worse-case assumptions of DP. [3]
Privacy Unit: The scope of the privacy guarantee, which determines how the original data must be prepared for creating its DP synthetic version. [3]
Example-Level Privacy Unit: Neighboring datasets from the DP definition are defined as two datasets that differ in one record/instance/example only. [3]
User-Level Privacy Unit: Neighboring datasets in this setting will differ in inclusion/exclusion of all the data from a particular user. [3]
Larger-Than-User-Level Privacy Unit: A larger privacy unit can be appropriate when private data is constructed from data from multiple groups. [3]
The most commonly used privacy units are example-level, user-level, larger-than-user-level, and sub-user-level. [2]
Differential privacy (DP) is a mathematical framework for ensuring that individual data points are not identifiable in a dataset. [1]

Factuality and Transparency Are All RAG Needs! Self-Explaining Contrastive Evidence Re-ranking

So Paulo State Universty

Rate paper: 👍 👎 ♥ Save

Abstract
This extended abstract introduces Self-Explaining Contrastive Evidence Re-Ranking (CER), a novel method that restructures retrieval around factual evidence by fine-tuning embeddings with contrastive learning and generating token-level attribution rationales for each retrieved passage. Hard negatives are automatically selected using a subjectivity-based criterion, forcing the model to pull factual rationales closer while pushing subjective or misleading explanations apart. As a result, the method creates an embedding space explicitly aligned with evidential reasoning. We evaluated our method on clinical trial reports, and initial experimental results show that CER improves retrieval accuracy, mitigates the potential for hallucinations in RAG systems, and provides transparent, evidence-based retrieval that enhances reliability, especially in safety-critical domains.

AI Bias

Humans incorrectly reject confident accusatory AI judgments

Tilburg University

Rate paper: 👍 👎 ♥ Save

Abstract
Automated verbal deception detection using methods from Artificial Intelligence (AI) has been shown to outperform humans in disentangling lies from truths. Research suggests that transparency and interpretability of computational methods tend to increase human acceptance of using AI to support decisions. However, the extent to which humans accept AI judgments for deception detection remains unclear. We experimentally examined how an AI model's accuracy (i.e., its overall performance in deception detection) and confidence (i.e., the model's uncertainty in single-statements predictions) influence human adoption of the model's judgments. Participants (n=373) were presented with veracity judgments of an AI model with high or low overall accuracy and various degrees of prediction confidence. The results showed that humans followed predictions from a highly accurate model more than from a less accurate one. Interestingly, the more confident the model, the more people deviated from it, especially if the model predicted deception. We also found that human interaction with algorithmic predictions either worsened the machine's performance or was ineffective. While this human aversion to accept highly confident algorithmic predictions was partly explained by participants' tendency to overestimate humans' deception detection abilities, we also discuss how truth-default theory and the social costs of accusing someone of lying help explain the findings.

AI Summary

Humans tend to reject AI judgments more when these are made with high confidence, especially when predicting deception. [3]
High-confidence aversion effect: humans may distrust AI predictions due to overestimating their own ability in detecting deception or perceiving the model's judgments as overconfidence. [3]
Hybrid performance: human oversight on machine predictions can worsen algorithmic performance rather than improve it. [3]
Truth Default Theory (TDT): people generally assume honesty unless they have reasons to suspect otherwise. [3]
Human-AI hybrid deception detection: combining human judgment with AI-based predictions for detecting deception. [3]
Humans tend to reject high-confidence AI judgments, especially when predicting deception. [3]
High-confidence aversion effect may be due to overestimating human ability in detecting deception or perceiving the model's judgments as overconfidence. [3]
Hybrid performance can worsen algorithmic performance rather than improve it. [3]
The study's findings may be influenced by the specific task and context used in the experiment. [2]

Polarization by Design: How Elites Could Shape Mass Preferences as AI Reduces Persuasion Costs

University of Chicago

Rate paper: 👍 👎 ♥ Save

Abstract
In democracies, major policy decisions typically require some form of majority or consensus, so elites must secure mass support to govern. Historically, elites could shape support only through limited instruments like schooling and mass media; advances in AI-driven persuasion sharply reduce the cost and increase the precision of shaping public opinion, making the distribution of preferences itself an object of deliberate design. We develop a dynamic model in which elites choose how much to reshape the distribution of policy preferences, subject to persuasion costs and a majority rule constraint. With a single elite, any optimal intervention tends to push society toward more polarized opinion profiles - a ``polarization pull'' - and improvements in persuasion technology accelerate this drift. When two opposed elites alternate in power, the same technology also creates incentives to park society in ``semi-lock'' regions where opinions are more cohesive and harder for a rival to overturn, so advances in persuasion can either heighten or dampen polarization depending on the environment. Taken together, cheaper persuasion technologies recast polarization as a strategic instrument of governance rather than a purely emergent social byproduct, with important implications for democratic stability as AI capabilities advance.

AI Summary

The paper introduces a novel mechanism that emerges from the tension between policies requiring public support and shifting public opinion being costly. [3]
The rise of AI-driven persuasion technologies marks a critical juncture for democratic governance, and understanding how these tools interact with elite incentives is essential for ensuring that advances in AI strengthen rather than destabilize democratic institutions. [3]
The effects of AI-driven persuasion technologies on democratic governance depend on the driving forces, including persuasion costs, control persistence, and policy volatility. [3]
Polarization becomes optimal from the elite's perspective because it minimizes the costs of aligning public preferences with future policy shocks. [2]
Strategic insurance is the driving force behind polarization, as elites face strong incentives to manufacture polarization when they must secure majority approval under uncertainty. [1]

Interests not found

Help us improve your experience!