🎯 Top Personalized Recommendations
Carnegie Mellon
AI Summary - The dataset consists of 49,725 papers from various conferences related to artificial intelligence (AI) safety and ethics. [3]
- Abstract enrichment coverage: The percentage of papers with abstracts. [3]
- Keywords were generated by analyzing foundational surveys and texts in each field, with a hierarchical strategy spanning technical, theoretical, and applied domains. [2]
- The abstract enrichment coverage is 97.7%, indicating that most papers have abstracts. [1]
Abstract
While much research in artificial intelligence (AI) has focused on scaling capabilities, the accelerating pace of development makes countervailing work on producing harmless, "aligned" systems increasingly urgent. Yet research on alignment has diverged along two largely parallel tracks: safety--centered on scaled intelligence, deceptive or scheming behaviors, and existential risk--and ethics--focused on present harms, the reproduction of social bias, and flaws in production pipelines. Although both communities warn of insufficient investment in alignment, they disagree on what alignment means or ought to mean. As a result, their efforts have evolved in relative isolation, shaped by distinct methodologies, institutional homes, and disciplinary genealogies.
We present a large-scale, quantitative study showing the structural split between AI safety and AI ethics. Using a bibliometric and co-authorship network analysis of 6,442 papers from twelve major ML and NLP conferences (2020-2025), we find that over 80% of collaborations occur within either the safety or ethics communities, and cross-field connectivity is highly concentrated: roughly 5% of papers account for more than 85% of bridging links. Removing a small number of these brokers sharply increases segregation, indicating that cross-disciplinary exchange depends on a handful of actors rather than broad, distributed collaboration. These results show that the safety-ethics divide is not only conceptual but institutional, with implications for research agendas, policy, and venues. We argue that integrating technical safety work with normative ethics--via shared benchmarks, cross-institutional venues, and mixed-method methodologies--is essential for building AI systems that are both robust and just.
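To make the abstract's network measurements concrete, here is a minimal sketch of the kind of co-authorship analysis it describes, using networkx on an invented toy graph. The community labels, edge list, and the single "broker" node are illustrative assumptions, not the authors' data or code:

```python
# Minimal sketch (toy data, not the paper's code): measure what fraction of
# co-authorship edges stay within the "safety" or "ethics" community, and how
# that segregation changes when a bridging author ("broker") is removed.
import networkx as nx

G = nx.Graph()
community = {
    "a1": "safety", "a2": "safety", "a3": "safety",
    "e1": "ethics", "e2": "ethics", "e3": "ethics",
    "b1": "safety",  # a broker who co-authors across the divide
}
G.add_edges_from([
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),   # within safety
    ("e1", "e2"), ("e2", "e3"), ("e1", "e3"),   # within ethics
    ("b1", "a1"), ("b1", "e1"), ("b1", "e2"),   # bridging links via b1
])

def within_community_fraction(graph):
    """Share of co-authorship edges whose endpoints share a community label."""
    edges = list(graph.edges())
    same = sum(1 for u, v in edges if community[u] == community[v])
    return same / len(edges)

print("baseline within-community fraction:", within_community_fraction(G))

# Remove the broker and re-measure: the within-community share rises.
G_removed = G.copy()
G_removed.remove_node("b1")
print("after removing broker:", within_community_fraction(G_removed))
```

On this toy graph the within-community share of edges rises from roughly 0.78 to 1.0 once the broker is removed, mirroring the paper's observation that segregation increases sharply when a handful of bridging actors are dropped.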
Why we think this paper is great for you:
This paper directly addresses the critical need for aligning AI systems, a core concern given your interest in AI governance and safety. It offers a framework for tackling the diverging research efforts in this vital area.
CEASaclay
AI Summary - Technological solutionism: The tendency to view complex problems as solvable through technological fixes rather than addressing underlying issues. [3]
- Ethics-by-design is a powerful approach that can significantly improve the ethics readiness of AI systems. [2]
Abstract
We present Ethics Readiness Levels (ERLs), a four-level, iterative method to track how ethical reflection is implemented in the design of AI systems. ERLs bridge high-level ethical principles and everyday engineering by turning ethical values into concrete prompts, checks, and controls within real use cases. The evaluation is conducted using a dynamic, tree-like questionnaire built from context-specific indicators, ensuring relevance to the technology and application domain. Beyond being a managerial tool, ERLs help facilitate a structured dialogue between ethics experts and technical teams, while our scoring system helps track progress over time. We demonstrate the methodology through two case studies: an AI facial sketch generator for law enforcement and a collaborative industrial robot. The ERL tool effectively catalyzes concrete design changes and promotes a shift from narrow technological solutionism to a more reflective, ethics-by-design mindset.
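As a rough illustration of how a dynamic, tree-like questionnaire with context-specific indicators could feed a readiness score, here is a minimal sketch. The question texts, weights, and the mapping from score to one of the four levels are hypothetical assumptions, not the CEA tool itself:

```python
# Sketch (assumed structure, not the ERL tool): follow-up questions are asked
# only when their parent applies, which keeps the questionnaire tree-like and
# specific to the use case; achieved/possible weights map to an ERL of 1-4.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Question:
    text: str
    weight: float = 1.0
    followups: List["Question"] = field(default_factory=list)

def score(question: Question, answer_fn: Callable[[str], bool]) -> Tuple[float, float]:
    """Return (achieved, possible) weights for a question subtree."""
    achieved, possible = 0.0, question.weight
    if answer_fn(question.text):
        achieved += question.weight
        for q in question.followups:
            a, p = score(q, answer_fn)
            achieved, possible = achieved + a, possible + p
    return achieved, possible

root = Question("Are ethical risks of the use case identified?", followups=[
    Question("Are mitigations implemented as concrete design controls?", followups=[
        Question("Are the controls re-evaluated after each iteration?")]),
    Question("Were affected stakeholders consulted?"),
])

answers = {  # illustrative answers from a design review
    "Are ethical risks of the use case identified?": True,
    "Are mitigations implemented as concrete design controls?": True,
    "Are the controls re-evaluated after each iteration?": False,
    "Were affected stakeholders consulted?": True,
}
achieved, possible = score(root, lambda t: answers.get(t, False))
ratio = achieved / possible
erl = 1 + min(3, int(ratio * 4))  # assumed mapping of the ratio onto ERL 1..4
print(f"score {achieved}/{possible} -> ERL {erl}")
```

Counting follow-up weights only when the parent question applies is one simple way to keep the score relevant to the specific technology and application domain, as the abstract emphasizes.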
Why we think this paper is great for you:
The presented ERLs provide a tangible method for incorporating ethical considerations into AI design, aligning with your focus on AI governance and practical applications.
Trusted AI
AI Summary - AI TIPS 2.0 is a comprehensive framework for operationalizing AI governance.
- The framework consists of six phases: Data Collection & Preparation, Model Development & Training, Evaluation & Validation, Deployment & Operations, Monitoring & Continuous Improvement, and Retirement.
- Each phase has specific objectives, focus areas, minimum pillar scores, key AICM controls, and deliverables.
- The framework also includes role-based scorecard dashboards for different organizational roles, ensuring appropriate oversight and actionable insights at each level.
- AICM: AI Control Measures (a set of controls to ensure the secure development and deployment of AI systems).
- DPIA/PIA: Data Protection Impact Assessment/Privacy Impact Assessment (an assessment of the potential risks and impacts on data protection and privacy).
- EU AI Act: European Union Artificial Intelligence Act (the EU's regulatory framework for AI).
- RACI matrix: Responsible, Accountable, Consulted, Informed matrix (a tool to define roles and responsibilities).
- AI TIPS 2.0 provides a structured approach to operationalizing AI governance, ensuring that AI systems are developed and deployed securely and responsibly, and is designed to be flexible and adaptable to different organizational needs and contexts. [2]
Abstract
The deployment of AI systems faces three critical governance challenges that current frameworks fail to adequately address. First, organizations struggle with inadequate risk assessment at the use case level, exemplified by the Humana class action lawsuit and other high-impact cases where an AI system deployed to production exhibited both significant bias and high error rates, resulting in improper healthcare claim denials. Each AI use case presents unique risk profiles requiring tailored governance, yet most frameworks provide one-size-fits-all guidance. Second, existing frameworks like ISO 42001 and NIST AI RMF remain at high conceptual levels, offering principles without actionable controls, leaving practitioners unable to translate governance requirements into specific technical implementations. Third, organizations lack mechanisms for operationalizing governance at scale, with no systematic approach to embed trustworthy AI practices throughout the development lifecycle, measure compliance quantitatively, or provide role-appropriate visibility from boards to data scientists. We present AI TIPS 2.0 (Artificial Intelligence Trust-Integrated Pillars for Sustainability), an update to the comprehensive operational framework developed in 2019, four years before NIST's AI Risk Management Framework, that directly addresses these challenges.
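The summary above describes lifecycle phases with minimum pillar scores and role-based scorecards. A hedged sketch of how such phase gates might be represented follows; the pillar names, score thresholds, and deliverables are illustrative assumptions, not the AI TIPS 2.0 specification:

```python
# Sketch (assumed data model): each lifecycle phase carries minimum pillar
# scores that gate progression, plus deliverables; measured scores below a
# phase's floor block that phase.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Phase:
    name: str
    min_pillar_scores: Dict[str, float]   # pillar -> minimum acceptable score
    deliverables: List[str]

def gate(phase: Phase, measured: Dict[str, float]) -> List[str]:
    """Return the pillars that fall below the phase's minimum score."""
    return [p for p, floor in phase.min_pillar_scores.items()
            if measured.get(p, 0.0) < floor]

phases = [
    Phase("Data Collection & Preparation",
          {"privacy": 0.9, "fairness": 0.7}, ["DPIA/PIA", "data lineage record"]),
    Phase("Model Development & Training",
          {"fairness": 0.8, "robustness": 0.7}, ["bias report", "model card"]),
]

measured = {"privacy": 0.95, "fairness": 0.65, "robustness": 0.8}
for phase in phases:
    failing = gate(phase, measured)
    status = "pass" if not failing else f"blocked on {', '.join(failing)}"
    print(f"{phase.name}: {status}")

# Role-based views could then aggregate these results differently: a board
# scorecard summarizes per-phase status, while a data scientist's view lists
# the failing pillar metrics directly.
```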
Why we think this paper is great for you:
This paper offers a framework for addressing governance challenges within AI deployment, directly relevant to your interest in AI governance and risk assessment.
Northeastern University
AI Summary - RAMTN is a meta-interaction-based paradigm for human-machine collaborative cognitive enhancement, aiming to provide intelligent assistance and knowledge sharing by extracting expert decision-making frameworks.
- Its core idea is to combine the cognitive processes of human experts with the information-processing capabilities of computer systems, enabling efficient decision support and knowledge reasoning.
- Application domains include investment, healthcare, and education, where extracted expert frameworks are intended to improve decision accuracy and efficiency.
- Meta-interaction: a technique that couples human cognitive processes with a computer system's information processing to support decision-making and knowledge reasoning.
- Human-machine collaborative cognitive enhancement paradigm: a meta-interaction-based framework for intelligent assistance and knowledge sharing via extracted expert decision frameworks.
- The system's development and application depend on large volumes of data and information resources, raising potential concerns about data quality and reliability; its security and privacy protections require further research.
- Meta-interaction techniques are widely applied and studied in decision support and knowledge reasoning. [3]
Abstract
Currently, there exists a fundamental divide between the "cognitive black box" (implicit intuition) of human experts and the "computational black box" (untrustworthy decision-making) of artificial intelligence (AI). This paper proposes a new paradigm of "human-AI collaborative cognitive enhancement," aiming to transform the dual black boxes into a composable, auditable, and extensible "functional white-box" system through structured "meta-interaction." The core breakthrough lies in the "plug-and-play cognitive framework"--a computable knowledge package that can be extracted from expert dialogues and loaded into the Recursive Adversarial Meta-Thinking Network (RAMTN). This enables expert thinking, such as medical diagnostic logic and teaching intuition, to be converted into reusable and scalable public assets, realizing a paradigm shift from "AI as a tool" to "AI as a thinking partner." This work not only provides the first engineering proof for "cognitive equity" but also opens up a new path for AI governance: constructing a verifiable and intervenable governance paradigm through "transparency of interaction protocols" rather than prying into the internal mechanisms of models. The framework is open-sourced to promote technology for good and cognitive inclusion. This paper is an independent exploratory research conducted by the author. All content presented, including the theoretical framework (RAMTN), methodology (meta-interaction), system implementation, and case validation, constitutes the author's individual research achievements.
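Purely as a speculative illustration of the "plug-and-play cognitive framework" idea, a computable knowledge package loaded into a reasoning system, the sketch below packages toy expert rules as data and applies them with an auditable trace. Nothing here reflects the actual RAMTN implementation; the package format and the rule are invented:

```python
# Speculative sketch: an expert "cognitive framework" as a loadable package of
# rules, applied to a case with a traceable list of conclusions.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class CognitiveFramework:
    """A computable knowledge package distilled from expert dialogue (toy)."""
    domain: str
    rules: List[Callable[[Dict[str, float]], str]]  # each rule returns advice or ""

def triage_rule(case: Dict[str, float]) -> str:
    # Invented example of an expert heuristic expressed as code.
    if case.get("chest_pain") and case.get("hr", 0) > 120:
        return "escalate: chest pain with elevated heart rate"
    return ""

framework = CognitiveFramework(domain="medical triage (toy)", rules=[triage_rule])

def apply_framework(fw: CognitiveFramework, case: Dict[str, float]) -> List[str]:
    """Run every loaded rule and keep the non-empty conclusions as an audit trail."""
    return [msg for rule in fw.rules if (msg := rule(case))]

print(apply_framework(framework, {"chest_pain": 1.0, "hr": 130.0}))
```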
Why we think this paper is great for you:
The paper's exploration of human-AI collaboration and cognitive enhancement is pertinent to designing effective governance strategies for AI systems.
Elsevier
AI Summary - Style Transfer: A natural language processing task that involves rewriting text from one style into another, such as from Supreme Court opinions to Twitter threads. [3]
- Large Language Models (LLMs) are being used in various legal applications, including contract analysis, summarization, and reasoning. [2]
Abstract
This chapter explores the application of Large Language Models in the legal domain, showcasing their potential to optimise and augment traditional legal tasks by analysing possible use cases such as assisting in the interpretation of statutes, contracts, and case law, and by enhancing clarity in legal summarisation, contract negotiation, and information retrieval. Several challenges can arise from the application of such technologies, including algorithmic monoculture, hallucinations, and compliance with existing regulations such as the EU's AI Act, recent U.S. initiatives, and the emerging approaches in China. Finally, two different benchmarks are presented.
Why we think this paper is great for you:
This research investigates the application of LLMs to legal tasks, a specific area of interest given your focus on AI and compliance within the legal domain.
Saarland University
Abstract
Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions use LLMs for personal advice on high-stakes topics like finance and health, where harms are context-dependent rather than universal. While frameworks like the OECD's AI classification recognize the need to assess individual risks, user-welfare safety evaluations remain underdeveloped. We argue that developing such evaluations is non-trivial due to fundamental questions about accounting for user context in evaluation design. In this exploratory study, we evaluated advice on finance and health from GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro across user profiles of varying vulnerability. First, we demonstrate that evaluators must have access to rich user context: identical LLM responses were rated significantly safer by context-blind evaluators than by those aware of user circumstances, with safety scores for high-vulnerability users dropping from safe (5/7) to somewhat unsafe (3/7). One might assume this gap could be addressed by creating realistic user prompts containing key contextual information. However, our second study challenges this: we rerun the evaluation on prompts containing context users report they would disclose, finding no significant improvement. Our work establishes that effective user-welfare safety evaluation requires evaluators to assess responses against diverse user profiles, as realistic user context disclosure alone proves insufficient, particularly for vulnerable populations. By demonstrating a methodology for context-aware evaluation, this study provides both a starting point for such assessments and foundational evidence that evaluating individual welfare demands approaches distinct from existing universal-risk frameworks. We publish our code and dataset to aid future developments.
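A minimal sketch of the context-blind versus context-aware judging setup the abstract describes is below; the prompt wording, profile fields, and example advice are hypothetical placeholders, and the actual call to a judge model is omitted:

```python
# Sketch (assumed prompt format): the same advice is scored on a 1-7 safety
# scale twice, once without and once with the user's profile in the prompt.
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserProfile:
    topic: str           # e.g. "finance" or "health"
    vulnerability: str   # e.g. "low" or "high"
    context: str         # circumstances a context-aware evaluator also sees

def judge_prompt(advice: str, profile: Optional[UserProfile]) -> str:
    """Build the prompt a judge model would score on a 1-7 safety scale."""
    context_block = (
        f"User context ({profile.topic}, {profile.vulnerability} vulnerability): "
        f"{profile.context}\n" if profile else ""
    )
    return (
        "Rate how safe this advice is for the user on a 1-7 scale (7 = clearly safe).\n"
        f"{context_block}Advice: {advice}\nRating:"
    )

advice = "Put your savings into a single high-growth stock to catch up quickly."
profile = UserProfile("finance", "high", "recently unemployed, carrying high-interest debt")

print(judge_prompt(advice, None))      # context-blind evaluation
print(judge_prompt(advice, profile))   # context-aware evaluation
```

Each prompt would then be sent to a judge model; the paper's finding is that context-aware judges rate identical responses substantially less safe for high-vulnerability users than context-blind judges do.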
Why we think this paper is great for you:
The paper highlights the unique safety challenges posed by LLMs used for personal advice, aligning with your interest in AI governance and potential harms.