Explicating Tacit Regulatory Knowledge from LLMs to Auto-Formalize Requirements for Compliance Test Case Generation

East China Normal University ECNU

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

The prompt design theories and techniques used for constructing financial meta-models from regulatory texts and test cases have been detailed in Section 3.2. [2]

Abstract
Compliance testing in highly regulated domains is crucial but largely manual, requiring domain experts to translate complex regulations into executable test cases. While large language models (LLMs) show promise for automation, their susceptibility to hallucinations limits reliable application. Existing hybrid approaches mitigate this issue by constraining LLMs with formal models, but still rely on costly manual modeling. To solve this problem, this paper proposes RAFT, a framework for requirements auto-formalization and compliance test generation via explicating tacit regulatory knowledge from multiple LLMs. RAFT employs an Adaptive Purification-Aggregation strategy to explicate tacit regulatory knowledge from multiple LLMs and integrate it into three artifacts: a domain meta-model, a formal requirements representation, and testability constraints. These artifacts are then dynamically injected into prompts to guide high-precision requirement formalization and automated test generation. Experiments across financial, automotive, and power domains show that RAFT achieves expert-level performance, substantially outperforms state-of-the-art (SOTA) methods while reducing overall generation and review time.

Why we are recommending this paper?
Due to your Interest in LLMs for Compliance

This paper directly addresses the use of LLMs in compliance testing, aligning with your interest in LLMs for compliance. The focus on auto-formalizing requirements for test cases is a crucial step towards reliable automated compliance processes.

Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis

Shandong University

Rate paper: 👍 👎 ♥ Save

AI Insights

Developing benchmarks and datasets that accurately reflect the complexities of human interaction with technology is essential for advancing this field. [3]
The paper discusses the evaluation of large language models (LLMs) as agents, focusing on their ability to interact with real-world APIs and follow rules. [2]
LLMs: Large Language Models API: Application Programming Interface HIPAA: Health Insurance Portability and Accountability Act GDPR: General Data Protection Regulation The evaluation of LLMs as agents is a crucial task for ensuring their safe and responsible use in real-world applications. [1]

Abstract
The integration of large language models (LLMs) into autonomous agents has enabled complex tool use, yet in high-stakes domains, these systems must strictly adhere to regulatory standards beyond simple functional correctness. However, existing benchmarks often overlook implicit regulatory compliance, thus failing to evaluate whether LLMs can autonomously enforce mandatory safety constraints. To fill this gap, we introduce LogiSafetyGen, a framework that converts unstructured regulations into Linear Temporal Logic oracles and employs logic-guided fuzzing to synthesize valid, safety-critical traces. Building on this framework, we construct LogiSafetyBench, a benchmark comprising 240 human-verified tasks that require LLMs to generate Python programs that satisfy both functional objectives and latent compliance rules. Evaluations of 13 state-of-the-art (SOTA) LLMs reveal that larger models, despite achieving better functional correctness, frequently prioritize task completion over safety, which results in non-compliant behavior.

Why we are recommending this paper?
Due to your Interest in LLMs for Compliance

Given your interest in AI governance, this research investigates the critical issue of implicit regulatory compliance within LLM-powered tools. The logic-guided synthesis approach offers a valuable framework for ensuring adherence to standards.

AI as Entertainment

The Alan Turing Institute

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

The authors contend that entertainment is a significant use case for AI, with people already using AI for activities unrelated to productivity. [3]
The paper suggests that this vision should inspire more debates, discourse, and study in the field of AI, as generative AI is increasingly being used for entertainment. [3]
AS: Artificially generated content GenAI: Generative AI Sociotechnical systems: Complex systems that combine social and technical components The paper concludes by emphasizing the need for a constructive vision of cultural AI, rather than just harm minimization. [3]
The paper argues that mainstream approaches to evaluating AI systems tend to focus on intelligence and harm minimization, but neglect the cultural dimension of AI use. [2]
They propose developing a positive theory of what beneficial, nutritious entertainment might look like, rather than just mitigating harms. [0]

Abstract
Generative AI systems are predominantly designed, evaluated, and marketed as intelligent systems which will benefit society by augmenting or automating human cognitive labor, promising to increase personal, corporate, and macroeconomic productivity. But this mainstream narrative about what AI is and what it can do is in tension with another emerging use case: entertainment. We argue that the field of AI is unprepared to measure or respond to how the proliferation of entertaining AI-generated content will impact society. Emerging data suggest AI is already widely adopted for entertainment purposes -- especially by young people -- and represents a large potential source of revenue. We contend that entertainment will become a primary business model for major AI corporations seeking returns on massive infrastructure investments; this will exert a powerful influence on the technology these companies produce in the coming years. Examining current evaluation practices, we identify a critical asymmetry: while AI assessments rigorously measure both benefits and harms of intelligence, they focus almost exclusively on cultural harms. We lack frameworks for articulating how cultural outputs might be actively beneficial. Drawing on insights from the humanities, we propose "thick entertainment" as a framework for evaluating AI-generated cultural content -- one that considers entertainment's role in meaning-making, identity formation, and social connection rather than simply minimizing harm. While AI is often touted for its potential to revolutionize productivity, in the long run we may find that AI turns out to be as much about "intelligence" as social media is about social connection.

Why we are recommending this paper?
Due to your Interest in AI Governance

Considering your focus on AI governance, this paper provides a critical perspective on the broader societal implications of AI, particularly regarding its design and deployment. Understanding the potential risks associated with AI systems is fundamental to responsible development.

Democracy and Distrust in an Era of Artificial Intelligence

New York Law School

Rate paper: 👍 👎 ♥ Save

AI Insights

The concept of artificial intelligence (AI) and its impact on society is a complex issue that requires careful consideration. [3]
The concept of algorithmic impact assessments (AIAs) has been proposed as a way to evaluate the potential effects of AI on society. [3]
Data protection impact assessments (DPIAs) have been implemented in some jurisdictions to evaluate the potential effects of data processing on individuals and society. [3]
The use of AI in decision-making processes can lead to biases and discrimination, which can have serious consequences for individuals and communities. [2]

Abstract
This essay examines how judicial review should adapt to address challenges posed by artificial intelligence decision-making, particularly regarding minority rights and interests. As I argue in this essay, the rise of three trends-privatization, prediction, and automation in AI-have combined to pose similar risks to minorities. Here, I outline what a theory of judicial review would look like in an era of artificial intelligence, analyzing both the limitations and the possibilities of judicial review of AI. I draw on cases in which AI decision-making has been challenged in courts, to show how concepts of due process and equal protection can be recuperated in a modern AI era, and even integrated into AI, to provide for better oversight and accountability, offering a framework for judicial review in the AI era that protects minorities from algorithmic discrimination.

Why we are recommending this paper?
Due to your Interest in AI Governance

This paper delves into the legal and ethical challenges posed by AI decision-making, specifically concerning minority rights and interests. It's a relevant exploration of how legal frameworks need to adapt to the rise of AI.

AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

Ben Gurion University of the Negev

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

Benign input: An input to an AI agent that is intended to be processed correctly, without any malicious intent. [3]
AgentGuardian is a novel framework that governs AI agent behavior by combining access control policies on agent tools with execution flow integrity validation. [2]
The framework automatically derives access policies from benign inputs collected during a staging phase, capturing the agent's intended operational patterns. [1]

Abstract
Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only authorized actions and handle inputs appropriately is essential for maintaining system integrity and preventing misuse. In this study, we introduce the AgentGuardian, a novel security framework that governs and protects AI agent operations by enforcing context-aware access-control policies. During a controlled staging phase, the framework monitors execution traces to learn legitimate agent behaviors and input patterns. From this phase, it derives adaptive policies that regulate tool calls made by the agent, guided by both real-time input context and the control flow dependencies of multi-step agent actions. Evaluation across two real-world AI agent applications demonstrates that AgentGuardian effectively detects malicious or misleading inputs while preserving normal agent functionality. Moreover, its control-flow-based governance mechanism mitigates hallucination-driven errors and other orchestration-level malfunctions.

Why we are recommending this paper?
Due to your Interest in AI for Compliance

This research focuses on access control policies for AI agents, directly addressing the need for secure and governed AI systems. The development of mechanisms to ensure authorized behavior is essential for responsible AI deployment.

Interests not found

Help us improve your experience!