Hi!

Your personalized paper recommendations for 24–28 November 2025.
🎯 Top Personalized Recommendations
Rate paper: 👍 👎 ♥ Save
AI Summary
  • The magnitude of these improvements depends on the specific objectives (metrics) used during optimization. [3]
  • Reweighting: a process that aims to improve model fairness by modulating, in the model's training loss, the relative influence of different subgroups within a dataset. [3]
  • Future work should investigate additional metric combinations, analyze the mechanisms underlying metric-dependent improvements, and explore the generalizability of the EW approach across different model architectures and application domains. [3]
  • A noted limitation is that metric combinations beyond those studied are not investigated in the paper itself. [3]
  • Evolutionary optimization of sample weights via NSGA-II can enhance fairness without substantial loss in predictive accuracy. [2]
  • Evolved Weights (EW): a Genetic Algorithm approach based on NSGA-II for evolving sample weights. [1]
Abstract
Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Reweighting is a method that can mitigate such bias in model predictions by assigning a weight to each data point used during model training. In this paper, we compare three methods for generating these weights: (1) evolving them using a Genetic Algorithm (GA), (2) computing them using only dataset characteristics, and (3) assigning equal weights to all data points. Model performance under each strategy was evaluated using paired predictive and fairness metrics, which also served as optimization objectives for the GA during evolution. Specifically, we used two predictive metrics (accuracy and area under the Receiver Operating Characteristic curve) and two fairness metrics (demographic parity difference and subgroup false negative fairness). Using experiments on eleven publicly available datasets (including two medical datasets), we show that evolved sample weights can produce models that achieve better trade-offs between fairness and predictive performance than alternative weighting methods. However, the magnitude of these benefits depends strongly on the choice of optimization objectives. Our experiments reveal that optimizing with accuracy and demographic parity difference metrics yields the largest number of datasets for which evolved weights are significantly better than other weighting strategies in optimizing both objectives.
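To make the evolved-weights idea concrete, the sketch below shows how a single candidate sample-weight vector could be scored against the paper's two optimization objectives. It is only an illustration under assumed libraries (scikit-learn and fairlearn) and hypothetical variable names, not the authors' implementation; an NSGA-II loop would evolve a population of such weight vectors and keep the non-dominated ones.

```python
# Hypothetical sketch of scoring one candidate sample-weight vector against the
# paper's two optimization objectives (accuracy and demographic parity
# difference). Library choices (scikit-learn, fairlearn) are assumptions, not
# the authors' implementation.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import demographic_parity_difference

def score_weights(w, X_tr, y_tr, X_val, y_val, sensitive_val):
    """Train with per-sample weights w and return the two objectives to minimize."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_tr, y_tr, sample_weight=w)       # reweighting enters the training loss here
    y_pred = model.predict(X_val)
    err = 1.0 - accuracy_score(y_val, y_pred)    # predictive objective (1 - accuracy)
    dpd = demographic_parity_difference(
        y_val, y_pred, sensitive_features=sensitive_val)  # fairness objective
    return err, dpd   # NSGA-II would minimize both over a population of weight vectors
```

The equal-weights and dataset-characteristic baselines from the abstract could be scored with the same function for comparison.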
Why we think this paper is great for you:
This paper directly addresses bias mitigation in machine learning models, offering practical approaches to ensure fairer outcomes. It aligns perfectly with your interest in understanding and tackling AI and data bias.
Rate paper: 👍 👎 ♥ Save
AI Summary
  • Interpretable clustering: A type of clustering where the clusters are represented by decision trees or other interpretable models. [3]
  • The authors also emphasize the need for further research in this area, particularly in developing methods that can handle large datasets and provide interpretable results. [3]
  • The paper discusses the importance of interpretable clustering and proposes a method for fair clustering using decision trees. [2]
Abstract
Fair clustering has gained increasing attention in recent years, especially in applications involving socially sensitive attributes. However, existing fair clustering methods often lack interpretability, limiting their applicability in high-stakes scenarios where understanding the rationale behind clustering decisions is essential. In this work, we address this limitation by proposing an interpretable and fair clustering framework, which integrates fairness constraints into the structure of decision trees. Our approach constructs interpretable decision trees that partition the data while ensuring fair treatment across protected groups. To further enhance the practicality of our framework, we also introduce a variant that requires no fairness hyperparameter tuning, achieved through post-pruning a tree constructed without fairness constraints. Extensive experiments on both real-world and synthetic datasets demonstrate that our method not only delivers competitive clustering performance and improved fairness, but also offers additional advantages such as interpretability and the ability to handle multiple sensitive attributes. These strengths enable our method to perform robustly under complex fairness constraints, opening new possibilities for equitable and transparent clustering.
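As a rough, hypothetical illustration of the fairness quantity such a method has to monitor (not the paper's tree-based algorithm), the sketch below computes each cluster's protected-group proportion, which a fairness-constrained split or post-pruning step could compare against the overall population proportion.

```python
# Hypothetical helper: measure how each cluster reflects a protected group.
# A fairness-constrained tree would consult a statistic like this when deciding
# whether a split or a pruning step is acceptable. Names and data are made up.
from collections import defaultdict

def group_balance(cluster_ids, group_labels):
    """Return {cluster: fraction of protected-group members in that cluster}."""
    counts = defaultdict(lambda: [0, 0])          # cluster -> [protected, total]
    for c, g in zip(cluster_ids, group_labels):
        counts[c][0] += int(g == 1)
        counts[c][1] += 1
    return {c: protected / total for c, (protected, total) in counts.items()}

# Example: clusters 0 and 1, protected attribute g in {0, 1}
print(group_balance([0, 0, 1, 1, 1], [1, 0, 0, 1, 1]))  # cluster 0: 1/2 protected, cluster 1: 2/3
```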
Why we think this paper is great for you:
This work combines 'fair clustering' with 'interpretability,' directly addressing your interests in AI fairness, data fairness, and transparency in high-stakes applications. It offers insights into making fair decisions understandable.
Rate paper: 👍 👎 ♥ Save
AI Summary
  • The paper proposes a new postprocessing algorithm, DP2DP, that combines differential privacy and fairness. [3]
  • The algorithm is designed for deep learning tasks and uses a Lagrangian dual approach to achieve both privacy and fairness. [3]
  • Fairness: The concept of ensuring that an algorithm's output is unbiased and does not discriminate against certain groups or individuals. [2]
  • The authors provide a theoretical analysis of the algorithm's privacy and fairness properties, as well as experimental results on benchmark datasets. [1]
Abstract
The increasing use of machine learning in sensitive applications demands algorithms that simultaneously preserve data privacy and ensure fairness across potentially sensitive sub-populations. While privacy and fairness have each been extensively studied, their joint treatment remains poorly understood. Existing research often frames them as conflicting objectives, with multiple studies suggesting that strong privacy notions such as differential privacy inevitably compromise fairness. In this work, we challenge that perspective by showing that differential privacy can be integrated into a fairness-enhancing pipeline with minimal impact on fairness guarantees. We design a postprocessing algorithm, called DP2DP, that enforces both demographic parity and differential privacy. Our analysis reveals that our algorithm converges towards its demographic parity objective at essentially the same rate (up to a logarithmic factor) as the best non-private methods from the literature. Experiments on both synthetic and real datasets confirm our theoretical results, showing that the proposed algorithm achieves state-of-the-art accuracy/fairness/privacy trade-offs.
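The abstract does not spell out the DP2DP mechanism, so the toy sketch below only illustrates the two ingredients it combines: counts released under the Laplace mechanism (differential privacy) and the demographic parity gap computed from them. The epsilon split and the data are assumptions for illustration.

```python
# Toy illustration only (not the DP2DP algorithm): a differentially private
# estimate of the demographic parity gap between two groups, using the Laplace
# mechanism on counts. The epsilon split across counts is an assumption.
import numpy as np

rng = np.random.default_rng(0)

def private_dp_gap(y_pred, group, epsilon):
    """Noisy, epsilon-DP estimate of |P(y_hat=1 | group 0) - P(y_hat=1 | group 1)|."""
    rates = []
    for g in (0, 1):
        mask = (group == g)
        # Four counts are released in total, so each gets epsilon/4; a count has sensitivity 1.
        pos = mask[y_pred == 1].sum() + rng.laplace(scale=4.0 / epsilon)
        tot = mask.sum() + rng.laplace(scale=4.0 / epsilon)
        rates.append(float(np.clip(pos / max(tot, 1.0), 0.0, 1.0)))
    return abs(rates[0] - rates[1])

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(private_dp_gap(y_pred, group, epsilon=1.0))
```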
Why we think this paper is great for you:
This paper tackles the critical intersection of fairness and privacy, which is highly relevant to your interests in AI fairness, data fairness, and data ethics. It explores how to achieve both simultaneously in sensitive applications.
Rate paper: 👍 👎 ♥ Save
AI Summary
  • The model assumes that students are assigned to schools based solely on their geographical location, which may not accurately reflect real-world scenarios. [3]
  • The price of fairness is substantial, with all seven demographic groups experiencing significant increases in travel time relative to the unconstrained model. [2]
Abstract
Socioeconomic segregation often arises in school districting and other contexts, causing some groups to be over- or under-represented within a particular district. This phenomenon is closely linked with disparities in opportunities and outcomes. We formulate a new class of geographical partitioning problems in which the population is heterogeneous, and it is necessary to ensure fair representation for each group at each facility. We prove that the optimal solution is a novel generalization of the additively weighted Voronoi diagram, and we propose a simple and efficient algorithm to compute it, thus resolving an open question dating back to Dvoretzky et al. (1951). The efficacy and potential for practical insight of the approach are demonstrated in a realistic case study involving seven demographic groups and $78$ district offices.
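The additively weighted Voronoi structure has a simple computational reading: each resident is assigned to the facility that minimizes distance minus an additive weight. The sketch below illustrates only that assignment rule with invented coordinates and weights; the paper's contribution is the algorithm that finds weights guaranteeing fair representation, which is not shown here.

```python
# Illustrative assignment rule for an additively weighted Voronoi diagram:
# point x goes to argmin_j ( ||x - c_j|| - w_j ). The weights w_j (which the
# paper's algorithm would compute) are made up here.
import numpy as np

centers = np.array([[0.0, 0.0], [5.0, 0.0], [2.0, 4.0]])   # facility locations (invented)
weights = np.array([0.0, 1.5, 0.5])                         # additive weights (invented)

def assign(points):
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)  # (n, k) distances
    return np.argmin(d - weights, axis=1)                   # weighted Voronoi cell index

pts = np.array([[1.0, 1.0], [4.0, 0.5], [2.5, 3.0]])
print(assign(pts))   # -> [0 1 2]
```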
Why we think this paper is great for you:
Focusing on individual and group fairness in real-world scenarios like socioeconomic segregation, this paper is highly relevant to your interests in AI fairness, data fairness, and data bias. It explores how disparities can arise and be addressed.
Rate paper: 👍 👎 ♥ Save
AI Summary
  • The problem of popularity bias in recommendation systems can be addressed by analyzing the spectral properties of the system's matrix. [3]
  • Popularity bias: The phenomenon where popular items are over-represented in recommendation systems, leading to an uneven distribution of popularity among items. [3]
  • Spectral properties: The study of the eigenvalues and eigenvectors of a matrix, which can provide insights into the behavior of the system. [3]
  • The analysis of spectral properties may not always provide a clear understanding of popularity bias. [3]
  • A lower and upper bound for the k-th eigenvalue of a matrix can be found using linear programming techniques. [2]
Abstract
We extend the Popularity Bias Memorization theorem of arXiv:2404.12008 in several directions, extending it to arbitrary degree distributions and proving both upper and lower estimates for the alignment with the top-k singular hyperspace.
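To ground the "alignment with the top-k singular hyperspace" phrase, the sketch below measures how much of an item-popularity vector lies in the span of the top-k right singular vectors of a synthetic user-item matrix. It illustrates the quantity being bounded, under invented data, and is not the paper's proof technique.

```python
# Hypothetical illustration: fraction of the item-popularity direction captured
# by the top-k right singular vectors of a user-item matrix R (rows = users).
import numpy as np

rng = np.random.default_rng(0)
R = (rng.random((200, 50)) < np.linspace(0.05, 0.6, 50)).astype(float)  # later columns are more popular

popularity = R.sum(axis=0)
popularity = popularity / np.linalg.norm(popularity)

_, _, Vt = np.linalg.svd(R, full_matrices=False)
k = 3
alignment = np.linalg.norm(Vt[:k] @ popularity)   # in [0, 1]; near 1 => popularity lies in the top-k subspace
print(f"alignment with top-{k} singular subspace: {alignment:.3f}")
```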
Why we think this paper is great for you:
This paper delves into 'Popularity Bias,' a specific type of bias, making it a strong match for your interests in data bias and AI bias. It provides theoretical estimates for understanding bias alignment.
Rate paper: 👍 👎 ♥ Save
AI Summary
  • The article discusses the Model Context Protocol (MCP) and evaluates its performance in serving model cards, comparing it with a REST interface. [2]
  • FAIR Signposting Profile: Implementation guidelines for exposing machine-actionable navigation links using standardized HTTP headers and HTML link elements. [1]
Abstract
AI/ML model cards can contain a benchmarked evaluation of an AI/ML model against intended use, but a one-time assessment during model training does not capture how and where a model is actually used over its lifetime. Through Patra Model Cards embedded in the ICICLE AI Institute software ecosystem, we study model cards as dynamic objects. The study reported here assesses the benefits and tradeoffs of adopting the Model Context Protocol (MCP) as an interface to the Patra Model Card server. Quantitative assessment shows the overhead of MCP as compared to a REST interface. The core question, however, concerns the active sessions enabled by MCP; this is a qualitative question of fit and use in the context of dynamic model cards that we address as well.
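The FAIR Signposting profile mentioned in the summary boils down to machine-actionable Link headers. The sketch below shows how a client might read such links from a model-card endpoint and time a plain REST fetch; the URL is hypothetical, and the MCP side of the comparison is omitted because its client interface is not described here.

```python
# Hypothetical client-side check of FAIR Signposting Link headers on a model
# card endpoint, plus a naive latency measurement of the REST path. The URL is
# made up and is not the Patra server's actual API.
import time
import requests

url = "https://example.org/patra/modelcards/1234"   # hypothetical endpoint

t0 = time.perf_counter()
resp = requests.get(url, timeout=10)
latency_ms = (time.perf_counter() - t0) * 1000
print(f"REST fetch: {resp.status_code} in {latency_ms:.1f} ms")

# Signposting exposes typed links (e.g. rel="describedby", rel="author")
# via the standard HTTP Link header.
for link in requests.utils.parse_header_links(resp.headers.get("Link", "")):
    print(link.get("rel"), "->", link.get("url"))
```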
Why we think this paper is great for you:
Model Cards are a key tool for promoting AI transparency and AI ethics by providing structured documentation about model performance and limitations. This paper explores their application in real-world AI systems.
Rate paper: 👍 👎 ♥ Save
AI Summary
  • The rise of digital ghosts and deadbots forces us to confront fundamental questions about how we remember our dead. [3]
  • Digital ghosts can become the blind spot between memory and trickery, a prolonged mourning disguised as dialogue. [3]
  • Deadbots: AI replicas that simulate the presence of deceased individuals. [3]
  • Digital ghosts: AI-generated representations of deceased people that can interact with the living. [3]
  • The era of AI 'afterlives' is here, and it falls upon us to ensure this technology is used in a way that supports memory without becoming an imposture, and helps heal without betraying the dignity of those we love and lose. [3]
  • The article cites various studies and papers on the topic of digital ghosts and deadbots, including works by authors such as Jed R. Brubaker and John Danaher. [3]
  • From an ethical standpoint, intent and transparency arguably make a substantial difference when creating or engaging with a simulacrum of a loved one. [2]
Abstract
Advances in artificial intelligence now make it possible to simulate the dead through chatbots, voice clones, and video avatars trained on a person's digital traces. These "digital ghosts" are moving from fiction to commercial reality, reshaping how people mourn and remember. This paper offers a conceptual and ethical analysis of AI-mediated digital afterlives. We define what counts as a digital ghost, trace their rise across personal, commercial, and institutional contexts, and identify core ethical tensions around grief and well-being, truthfulness and deception, consent and posthumous privacy, dignity and misrepresentation, and the commercialization of mourning. To analyze these challenges, we propose a nine-dimensional taxonomy of digital afterlife technologies and, building on it, outline the features of an ethically acceptable digital ghost: premortem intent, mutual consent, transparent and limited data use, clear disclosure, restricted purposes and access, family or estate stewardship, and minimal behavioral agency. We argue for targeted regulation and professional guidelines to ensure that digital ghosts can aid remembrance without slipping into forms of deception.
Why we think this paper is great for you:
This paper explores 'Ethical AI' in a unique and thought-provoking context, directly aligning with your broader interest in AI ethics. It raises important questions about the societal implications of advanced AI.
AI Ethics
Rate paper: 👍 👎 ♥ Save
Abstract
In AI, the existential risk denotes the hypothetical threat posed by an artificial system that would possess both the capability and the objective, either directly or indirectly, to eradicate humanity. This issue is gaining prominence in scientific debate due to recent technical advancements and increased media coverage. In parallel, AI progress has sparked speculation and studies about the potential emergence of artificial consciousness. The two questions, AI consciousness and existential risk, are sometimes conflated, as if the former entailed the latter. Here, I explain that this view stems from a common confusion between consciousness and intelligence. Yet these two properties are empirically and theoretically distinct. Arguably, while intelligence is a direct predictor of an AI system's existential threat, consciousness is not. There are, however, certain incidental scenarios in which consciousness could influence existential risk, in either direction. Consciousness could be viewed as a means towards AI alignment, thereby lowering existential risk; or, it could be a precondition for reaching certain capabilities or levels of intelligence, and thus positively related to existential risk. Recognizing these distinctions can help AI safety researchers and public policymakers focus on the most pressing issues.
AI Summary
  • The concept of artificial consciousness is being explored in various fields, including AI research, neuroscience, and philosophy. [3]
  • Artificial consciousness: The ability of a machine or computer program to possess consciousness, which is often defined as subjective experience, self-awareness, and the ability to have thoughts and feelings. [3]
  • Consciousness: A complex and multifaceted concept that refers to the state of being aware of one's surroundings, thoughts, and emotions. [3]
  • Selection-broadcast cycle structure: A hypothetical mechanism that proposes how the global workspace integrates information from various sources to generate conscious experience. [3]
  • Researchers are debating whether large language models (LLMs) can be considered conscious or intelligent. [2]
  • Some experts argue that LLMs lack the ability to truly understand and reason about their environment, while others propose that they may possess a form of consciousness. [1]
AI Fairness
Rate paper: 👍 👎 ♥ Save
Abstract
This paper presents a computational account of how legal norms can influence the behavior of artificial intelligence (AI) agents, grounded in the active inference framework (AIF) that is informed by principles of economic legal analysis (ELA). The ensuing model aims to capture the complexity of human decision-making under legal constraints, offering a candidate mechanism for agent governance in AI systems, that is, the (auto)regulation of AI agents themselves rather than human actors in the AI industry. We propose that lawful and norm-sensitive AI behavior can be achieved through regulation by design, where agents are endowed with intentional control systems, or behavioral safety valves, that guide real-time decisions in accordance with normative expectations. To illustrate this, we simulate an autonomous driving scenario in which an AI agent must decide when to yield the right of way by balancing competing legal and pragmatic imperatives. The model formalizes how AIF can implement context-dependent preferences to resolve such conflicts, linking this mechanism to the conception of law as a scaffold for rational decision-making under uncertainty. We conclude by discussing how context-dependent preferences could function as safety mechanisms for autonomous agents, enhancing lawful alignment and risk mitigation in AI governance.
AI Summary
  • The paper presents an Active Inference Framework (AIF) driven agent that can act adequately in context, facing normative conflicts, similar to human agents. [3]
  • Context-dependent preferences allow AIF-driven agents to make nuanced decisions based on the current context. [3]
  • Gamma dynamics reflect the affective aspect of belief updating in human subjects, where valence and arousal emerge from precision-weighted prediction-error flow and belief updating about policies. [3]
  • Active Inference Framework (AIF): A computational framework that models how agents make decisions based on their internal state and external environment. [3]
  • Context-dependent preferences: Preferences that change depending on the current context, allowing the agent to adapt its behavior accordingly. [3]
  • Low confidence is conducive to vigilance and allows for normatively appropriate conduct in context. [2]
  • Precision: A measure of confidence in a decision or policy, with higher precision indicating greater certainty. [1]
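For readers unfamiliar with the precision (gamma) and context-dependent preference terms above, the toy sketch below shows the standard active-inference-style policy rule they feed into: a softmax over (negative) expected free energy, sharpened or flattened by precision, with values that differ by context. The numbers, contexts, and policies are invented and are not taken from the paper's driving simulation.

```python
# Toy active-inference-style policy selection. Expected free energy values and
# the context-dependent preferences are invented for illustration only.
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

# Negative expected free energy (-G) for two policies: [yield, proceed]
neg_G = {"pedestrian_present": np.array([2.0, 0.5]),   # context-dependent preferences
         "road_clear":         np.array([0.5, 2.0])}   # favor different policies

for gamma in (0.5, 4.0):                 # precision: confidence in the policy evaluation
    for context, values in neg_G.items():
        p = softmax(gamma * values)      # low gamma -> flatter, more "vigilant" behavior
        print(f"gamma={gamma:>3}, {context:18s} P(yield, proceed) = {np.round(p, 2)}")
```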
Data Representation
Rate paper: 👍 👎 ♥ Save
Paper visualization
Rate image: 👍 👎
Abstract
A wiring diagram is a labeled directed graph that represents an abstract concept such as a temporal process. In this article, we introduce the notion of a quasi-skeleton wiring diagram graph, and prove that quasi-skeleton wiring diagram graphs correspond to Hasse diagrams. Using this result, we designed algorithms that extract wiring diagrams from sequential data. We used our algorithms in analyzing the behavior of an autonomous agent playing a computer game, and the algorithms correctly identified the winning strategies. We compared the performance of our main algorithm with two other algorithms based on standard clustering techniques (DBSCAN and agglomerative hierarchical), including when some of the data was perturbed. Overall, this article brings together techniques in category theory, graph theory, clustering, reinforcement learning, and data engineering.
AI Summary
  • A proximal policy optimization (PPO) algorithm was used to train a computer agent through reinforcement learning, producing 284 episodes of the agent playing the game. [3]
  • The problem is to enable an autonomous system to learn what it means to perform a concept by observing multiple instances of the same concept. [2]
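The pipeline in the abstract (cluster observed states from sequential data, then read off a directed structure between clusters) can be caricatured in a few lines. The sketch below uses one of the baselines the paper compares against (agglomerative clustering) plus a simple transition graph, with invented data; it is a stand-in, not the wiring-diagram extraction algorithm itself.

```python
# Rough stand-in for the pipeline: cluster observed states from an episode,
# then record which clusters follow which, giving a directed "wiring-like"
# structure. Data and parameters are invented; this is not the authors' algorithm.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
episode = np.concatenate([rng.normal(loc, 0.1, size=(30, 2))       # 3 phases of one run
                          for loc in ([0, 0], [1, 0], [1, 1])])

labels = AgglomerativeClustering(n_clusters=3).fit_predict(episode)

edges = {(int(a), int(b)) for a, b in zip(labels[:-1], labels[1:]) if a != b}
print("directed transitions between clusters:", sorted(edges))
```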
Rate paper: 👍 👎 ♥ Save
Abstract
The DataSquad at Carleton College addresses a common problem at small liberal arts colleges: limited capacity for data services and few opportunities for students to gain practical experience with data and software development. Academic Technologist Paula Lackie designed the program as a work-study position that trains undergraduates through structured peer mentorship and real client projects. Students tackle data problems of increasing complexity-from basic data analysis to software development-while learning FAIR data principles and open science practices. The model's core components (peer mentorship structure, project-based learning, and communication training) make it adaptable to other institutions. UCLA and other colleges have adopted the model using openly shared materials through "DataSquad International." This paper describes the program's implementation at Carleton College and examines how structured peer mentorship can simultaneously improve institutional data services and provide students with professional skills and confidence.
AI Summary
  • The DataSquad program has been highly effective in providing students with practical experience and skills in data science, software engineering, and project management. [3]
  • The program's emphasis on teamwork, communication, and client interaction has helped students develop valuable soft skills. [3]
  • The DataSquad environment is highly encouraging, with 100% of alumni reporting that they felt encouraged to participate in the program. [3]
  • Skill areas covered by the program include: Statistical Analysis (collecting, exploring, and presenting large amounts of data to discover underlying patterns and trends); Database Design/Cloud Systems (designing a safe place to capture data, in SQL or otherwise, and working with data capture or management tools like Qualtrics or Google Forms); Coding and Software Engineering (using programming languages such as Python and R, and file management tools like Git); Project Management/Planning (organizing tasks, managing time, and coordinating resources to achieve goals); and Effective Teamwork (collaborating well with others, supporting teammates, and achieving shared objectives). [3]
  • Many students experienced multiple roles during their tenure, gaining breadth across the program's offerings. [2]
Data Transparency
Rate paper: 👍 👎 ♥ Save
Abstract
Differential Privacy (DP) is a widely adopted standard for privacy-preserving data analysis, but it assumes a uniform privacy budget across all records, limiting its applicability when privacy requirements vary with data values. Per-record Differential Privacy (PrDP) addresses this by defining the privacy budget as a function of each record, offering better alignment with real-world needs. However, the dependency between the privacy budget and the data value introduces challenges in protecting the budget's privacy itself. Existing solutions either handle specific privacy functions or adopt relaxed PrDP definitions. A simple workaround is to use the global minimum of the privacy function, but this severely degrades utility, as the minimum is often set extremely low to account for rare records with high privacy needs. In this work, we propose a general and practical framework that enables any standard DP mechanism to support PrDP, with error depending only on the minimal privacy requirement among records actually present in the dataset. Since directly revealing this minimum may leak information, we introduce a core technique called privacy-specified domain partitioning, which ensures accurate estimation without compromising privacy. We also extend our framework to the local DP setting via a novel technique, privacy-specified query augmentation. Using our framework, we present the first PrDP solutions for fundamental tasks such as count, sum, and maximum estimation. Experimental results show that our mechanisms achieve high utility and significantly outperform existing Personalized DP (PDP) methods, which can be viewed as a special case of PrDP with relaxed privacy protection.
AI Summary
  • Local differential privacy (LDP): A model of differential privacy where each party processes their record locally and then sends the result to the analyzer directly. [3]
  • The paper demonstrates the applicability of the framework through several case studies, including sum estimation and distinct count. [2]
  • The paper proposes a general framework for per-record differential privacy, which provides a solution to the problem of achieving differential privacy with varying privacy budgets across individuals. [1]
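The "global minimum" baseline the abstract criticizes is easy to state in code: calibrate Laplace noise for a sum query to the smallest per-record epsilon over the whole domain rather than over the records actually present. The sketch below contrasts the two noise scales with an invented privacy function and data; it is not the paper's mechanism, and the present-records minimum cannot be used naively (that is what the paper's privacy-specified domain partitioning addresses).

```python
# Naive per-record DP baseline for sum estimation: use the global minimum of the
# privacy function to calibrate Laplace noise. Privacy function, clipping bound,
# and data are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

def privacy_budget(value):
    """Per-record epsilon: rare/large values demand stronger protection (smaller eps)."""
    return 1.0 if value < 100 else 0.05

data = rng.integers(0, 50, size=1000).astype(float)     # no "large" records actually present
clip = 100.0                                            # bounded contribution per record

eps_domain_min = 0.05                                   # minimum over the whole domain (pessimistic)
eps_present_min = min(privacy_budget(v) for v in data)  # minimum over records actually present
# NOTE: using eps_present_min directly would leak which records are present;
# the paper's privacy-specified domain partitioning is what makes this target safe.

for name, eps in [("global-min baseline", eps_domain_min),
                  ("present-min (paper's target)", eps_present_min)]:
    noisy_sum = data.sum() + rng.laplace(scale=clip / eps)
    print(f"{name:30s} eps={eps:<5} noisy sum = {noisy_sum:,.0f} (true {data.sum():,.0f})")
```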
Rate paper: 👍 👎 ♥ Save
Abstract
With the rapid expansion of data lakes storing health data and hosting AI algorithms, a prominent concern arises: how safe is it to export machine learning models from these data lakes? In particular, deep network models, widely used for health data processing, encode information from their training dataset, potentially leading to the leakage of sensitive information upon its export. This paper thoroughly examines this issue in the context of medical imaging data and introduces a novel data exfiltration attack based on image compression techniques. This attack, termed Data Exfiltration by Compression, requires only access to a data lake and is based on lossless or lossy image compression methods. Unlike previous data exfiltration attacks, it is compatible with any image processing task and depends solely on an exported network model without requiring any additional information to be collected during the training process. We explore various scenarios, and techniques to limit the size of the exported model and conceal the compression codes within the network. Using two public datasets of CT and MR images, we demonstrate that this attack can effectively steal medical images and reconstruct them outside the data lake with high fidelity, achieving an optimal balance between compression and reconstruction quality. Additionally, we investigate the impact of basic differential privacy measures, such as adding Gaussian noise to the model parameters, to prevent the Data Exfiltration by Compression Attack. We also show how the attacker can make their attack resilient to differential privacy at the expense of decreasing the number of stolen images. Lastly, we propose an alternative prevention strategy by fine-tuning the model to be exported.
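The basic defence examined in the abstract, adding Gaussian noise to model parameters before export, is straightforward to express. The sketch below uses PyTorch with a placeholder model and noise level; calibrating the noise to a formal differential privacy guarantee is a separate step not shown.

```python
# Illustrative mitigation mentioned in the abstract: perturb exported model
# parameters with Gaussian noise before they leave the data lake. The model,
# sigma, and export format are placeholders, not the paper's setup.
import torch
import torch.nn as nn

def noisy_export(model: nn.Module, sigma: float = 0.01) -> dict:
    """Return a copy of the state dict with Gaussian noise added to every parameter."""
    with torch.no_grad():
        return {name: p + sigma * torch.randn_like(p)
                for name, p in model.state_dict().items()}

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
exported = noisy_export(model, sigma=0.01)
print({k: tuple(v.shape) for k, v in exported.items()})
```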
AI Bias
Rate paper: 👍 👎 ♥ Save
Abstract
The Adam optimizer is a cornerstone of modern deep learning, yet the empirical necessity of each of its individual components is often taken for granted. This paper presents a focused investigation into the role of bias-correction, a feature whose contribution remains poorly understood. Through a series of systematic ablations on vision and language modelling tasks, we demonstrate that the conventional wisdom surrounding bias correction is misleading. In particular, we demonstrate that in the optimal hyper-parameter configuration, the inclusion of bias correction leads to no improvement in final test performance. Moreover, unless appropriate learning rate scheduling is implemented, the inclusion of bias correction can sometimes be detrimental to performance. We further reinterpret bias correction as a form of implicit learning rate scheduling whose behaviour is strongly dependent on the choice of smoothing hyper-parameters $\beta_1, \beta_2 \in [0,1)$. Our findings challenge the universal inclusion of this component.
AI Summary
  • The paper investigates the effect of removing bias correction in Adam optimization. [3]
  • Results show that removing bias correction has no significant impact on performance in both language and vision tasks. [3]
  • The study also explores different values of β (β1=β2) and finds that increasing values of β lead to a worsening of optimal performance when using constant learning rate. [3]
  • Bias correction: a step in Adam that rescales the running averages of the gradient and squared gradient to compensate for their initialization at zero. [3]
  • Adam optimization: An adaptive stochastic gradient descent algorithm for training deep neural networks. [3]
  • β (β1=β2): the smoothing hyper-parameter that sets the decay rate of Adam's moment estimates; the study ties β1 and β2 to a single value. [3]
  • Limited scope: The study only considers a few specific models and datasets. [3]
  • Warm-up cosine scheduling can mitigate the negative effects of bias correction. [2]
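Because the whole entry hinges on one small term, here is the textbook Adam update with the bias-correction factors written explicitly; setting the flag to False is the ablation studied. Relative to the uncorrected update, the correction multiplies the step by roughly sqrt(1 - β2^t) / (1 - β1^t) (ignoring the epsilon term), which is the "implicit learning rate schedule" reading mentioned in the abstract. This is a generic reference implementation, not the paper's experimental code.

```python
# Standard Adam step with the bias-correction factors written explicitly.
# Setting bias_correction=False corresponds to the ablation studied in the paper.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, bias_correction=True):
    m = beta1 * m + (1 - beta1) * grad           # first-moment running average
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment running average
    if bias_correction:
        m_hat = m / (1 - beta1 ** t)             # corrects the zero initialization of m
        v_hat = v / (1 - beta2 ** t)             # corrects the zero initialization of v
    else:
        m_hat, v_hat = m, v                      # ablated variant
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 4):
    theta, m, v = adam_step(theta, np.array([0.1, -0.2, 0.3]), m, v, t)
print(theta)
```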
Data Ethics
Rate paper: 👍 👎 ♥ Save
Abstract
We use electronic communication networks for more than simply traditional telecommunications: we access the news, buy goods online, file our taxes, contribute to public debate, and more. As a result, a wider array of privacy interests is implicated for users of electronic communications networks and services. This development calls into question the scope of electronic communications privacy rules. This paper analyses the scope of these rules, taking into account the rationale and the historic background of the European electronic communications privacy framework. We develop a framework for analysing the scope of electronic communications privacy rules using three approaches: (i) a service-centric approach, (ii) a data-centric approach, and (iii) a value-centric approach. We discuss the strengths and weaknesses of each approach. The current e-Privacy Directive contains a complex blend of the three approaches, which does not seem to be based on a thorough analysis of their strengths and weaknesses. The upcoming review of the directive announced by the European Commission provides an opportunity to improve the scoping of the rules.