Responsible AI: The Good, The Bad, The AI

University of Tartu

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

It emphasizes the need for a more comprehensive understanding of the complex relationships between technological, social, and economic factors in AI development and deployment. (ML: 0.99)👍👎
It cites studies on AI bias, transparency, accountability, and the need for responsible AI development and deployment. (ML: 0.99)👍👎
The study acknowledges that it has limitations due to the complexity of the topic and the need for further research. (ML: 0.98)👍👎
It also recognizes that the development of AI governance frameworks is a dynamic process that requires ongoing evaluation and refinement. (ML: 0.98)👍👎
The study highlights the importance of considering paradoxes when developing AI governance frameworks. (ML: 0.97)👍👎
The paper explores these kinds of complexities in AI governance and why they need to be considered when developing frameworks for responsible AI development and deployment. (ML: 0.97)👍👎
The paper explores the concept of paradox in the context of artificial intelligence (AI) and its governance, highlighting the importance of considering these complexities when developing AI governance frameworks. (ML: 0.97)👍👎
The paper discusses the concept of paradox in management and organization theories, specifically focusing on artificial intelligence (AI) and its governance. (ML: 0.96)👍👎
The paper draws on existing literature in management, organization theory, and AI ethics to inform its discussion of paradoxes in AI governance. (ML: 0.95)👍👎
Imagine you're trying to create a system that can make decisions without being biased, but at the same time, you want it to be transparent so people understand how those decisions are made. (ML: 0.93)👍👎
That's a paradox! (ML: 0.91)👍👎
Paradox: a situation or condition that is contradictory or opposite to what would be expected. (ML: 0.86)👍👎

Abstract
The rapid proliferation of artificial intelligence across organizational contexts has generated profound strategic opportunities while introducing significant ethical and operational risks. Despite growing scholarly attention to responsible AI, extant literature remains fragmented and is often adopting either an optimistic stance emphasizing value creation or an excessively cautious perspective fixated on potential harms. This paper addresses this gap by presenting a comprehensive examination of AI's dual nature through the lens of strategic information systems. Drawing upon a systematic synthesis of the responsible AI literature and grounded in paradox theory, we develop the Paradox-based Responsible AI Governance (PRAIG) framework that articulates: (1) the strategic benefits of AI adoption, (2) the inherent risks and unintended consequences, and (3) governance mechanisms that enable organizations to navigate these tensions. Our framework advances theoretical understanding by conceptualizing responsible AI governance as the dynamic management of paradoxical tensions between value creation and risk mitigation. We provide formal propositions demonstrating that trade-off approaches amplify rather than resolve these tensions, and we develop a taxonomy of paradox management strategies with specified contingency conditions. For practitioners, we offer actionable guidance for developing governance structures that neither stifle innovation nor expose organizations to unacceptable risks. The paper concludes with a research agenda for advancing responsible AI governance scholarship.

Why we are recommending this paper?
Due to your Interest in AI Agents

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios

xbenchai

Rate paper: 👍 👎 ♥ Save

AI Insights

The results highlight the need for further research in instruction-following tasks and the development of more effective language models. (ML: 0.98)👍👎
The authors acknowledge that the proposed benchmark may not capture all aspects of real-world instruction-following tasks. (ML: 0.98)👍👎
The synthetic data generation approach relies on human-made questions and may not accurately reflect real-world scenarios. (ML: 0.96)👍👎
The paper proposes a new benchmark for evaluating the ability of language models to follow complex instructions and perform tasks that require multiple steps. (ML: 0.96)👍👎
The proposed benchmark provides a more comprehensive evaluation of language models' ability to follow instructions and perform tasks. (ML: 0.96)👍👎
Synthetic data generation: The process of creating artificial data that mimics real-world scenarios, used to train and evaluate language models. (ML: 0.96)👍👎
The results show that state-of-the-art language models struggle to perform well on this benchmark, highlighting the need for more research in this area. (ML: 0.95)👍👎
The authors introduce a novel approach to generating synthetic data for instruction-following tasks, which allows them to create a large-scale dataset with diverse and realistic scenarios. (ML: 0.94)👍👎
Instruction-following task: A task that requires a model to follow a set of instructions to complete a specific goal or achieve a certain outcome. (ML: 0.93)👍👎
The synthetic data generation approach allows for the creation of diverse and realistic scenarios, making it easier to train and evaluate language models. (ML: 0.93)👍👎

Abstract
The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow, demonstrating exceptional performance in coding, deep research, and complex problem-solving evaluations. However, in daily scenarios, the perception of these advanced AI capabilities among general users remains limited. We argue that current evaluations prioritize increasing task difficulty without sufficiently addressing the diversity of agentic tasks necessary to cover the daily work, life, and learning activities of a broad demographic. To address this, we propose AgentIF-OneDay, aimed at determining whether general users can utilize natural language instructions and AI agents to complete a diverse array of daily tasks. These tasks require not only solving problems through dialogue but also understanding various attachment types and delivering tangible file-based results. The benchmark is structured around three user-centric categories: Open Workflow Execution, which assesses adherence to explicit and complex workflows; Latent Instruction, which requires agents to infer implicit instructions from attachments; and Iterative Refinement, which involves modifying or expanding upon ongoing work. We employ instance-level rubrics and a refined evaluation pipeline that aligns LLM-based verification with human judgment, achieving an 80.1% agreement rate using Gemini-3-Pro. AgentIF-OneDay comprises 104 tasks covering 767 scoring points. We benchmarked four leading general AI agents and found that agent products built based on APIs and ChatGPT agents based on agent RL remain in the first tier simultaneously. Leading LLM APIs and open-source models have internalized agentic capabilities, enabling AI application teams to develop cutting-edge Agent products.

Why we are recommending this paper?
Due to your Interest in AI Agents

Organizational Practices and Socio-Technical Design of Human-Centered AI

University of Bochum

Rate paper: 👍 👎 ♥ Save

AI Insights

The concept of human-centered AI may be challenging to implement in organizations with rigid structures or cultures. (ML: 0.98)👍👎
Collaboration between experts, 7. (ML: 0.98)👍👎
Human-centered AI may not be suitable for all types of tasks or industries, requiring careful consideration and evaluation. (ML: 0.98)👍👎
Continuous learning and improvement, 2. (ML: 0.98)👍👎
Monitoring and evaluation, and 10. (ML: 0.97)👍👎
Clear roles and responsibilities, 5. (ML: 0.97)👍👎
Effective communication, 6. (ML: 0.96)👍👎
The concept of keeping the organization in the loop is crucial for the successful implementation of human-centered AI. (ML: 0.96)👍👎
Interacting organizational practices require significant resources and effort to establish and maintain. (ML: 0.96)👍👎
Ten types of interacting organizational practices are identified as essential to accompany human-centered AI: 1. (ML: 0.96)👍👎
Case B substantiates this concept by highlighting the collaboration between technical and analytical experts, anchored in systematic communication structures. (ML: 0.96)👍👎
Interacting organizational practices are essential to accompany human-centered AI and ensure its effectiveness and adaptability. (ML: 0.95)👍👎
Adaptation processes, 8. (ML: 0.95)👍👎
Human-centered AI: An approach to keeping the human in the loop by emphasizing the importance of interacting organizational practices. (ML: 0.95)👍👎
The concept of keeping the organization in the loop is developed based on case A, which emphasizes the importance of human-centered AI and the need for interacting organizational practices. (ML: 0.95)👍👎
Interacting organizational practices: The essential types of practices that need to accompany human-centered AI, including continuous learning and improvement, feedback mechanisms, regular meetings and updates, clear roles and responsibilities, effective communication, collaboration between experts, adaptation processes, documentation and knowledge management, monitoring and evaluation, and continuous refinement of the AI system. (ML: 0.94)👍👎
Feedback mechanisms, 3. (ML: 0.93)👍👎
Continuous refinement of the AI system. (ML: 0.91)👍👎
Documentation and knowledge management, 9. (ML: 0.89)👍👎
Regular meetings and updates, 4. (ML: 0.89)👍👎

Abstract
This contribution explores how the integration of Artificial Intelligence (AI) into organizational practices can be effectively framed through a socio-technical perspective to comply with the requirements of Human-centered AI (HCAI). Instead of viewing AI merely as a technical tool, the analysis emphasizes the importance of embedding AI into communication, collaboration, and decision-making processes within organizations from a human-centered perspective. Ten case-based patterns illustrate how AI support of predictive maintenance can be organized to address quality assurance and continuous improvement and to provide different types of sup-port for HCAI. The analysis shows that AI adoption often requires and enables new forms of organizational learning, where specialists jointly interpret AI output, adapt workflows, and refine rules for system improve-ment. Different dimensions and levels of socio-technical integration of AI are considered to reflect the effort and benefits of keeping the organization in the loop.

Why we are recommending this paper?
Due to your Interest in AI and Society

MARS: Modular Agent with Reflective Search for Automated AI Research

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

Abstract
Automating AI research differs from general software engineering due to computationally expensive evaluation (e.g., model training) and opaque performance attribution. Current LLM-based agents struggle here, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research. MARS relies on three pillars: (1) Budget-Aware Planning via cost-constrained Monte Carlo Tree Search (MCTS) to explicitly balance performance with execution expense; (2) Modular Construction, employing a "Design-Decompose-Implement" pipeline to manage complex research repositories; and (3) Comparative Reflective Memory, which addresses credit assignment by analyzing solution differences to distill high-signal insights. MARS achieves state-of-the-art performance among open-source frameworks on MLE-Bench under comparable settings, maintaining competitiveness with the global leaderboard's top methods. Furthermore, the system exhibits qualitative "Aha!" moments, where 63% of all utilized lessons originate from cross-branch transfer, demonstrating that the agent effectively generalizes insights across search paths.

Why we are recommending this paper?
Due to your Interest in Research Automation with AI

Unmediated AI-Assisted Scholarly Citations

Rate paper: 👍 👎 ♥ Save

Abstract
Traditional bibliography databases require users to navigate search forms and manually copy citation data. Language models offer an alternative: a natural-language interface where researchers write text with informal citation fragments, which are automatically resolved to proper references. However, language models are not reliable for scholarly work as they generate fabricated (hallucinated) citations at substantial rates. We present an architectural approach that combines the natural-language interface of LLM chatbots with the accuracy of direct database access, implemented through the Model Context Protocol. Our system enables language models to search bibliographic databases, perform fuzzy matching, and export verified entries, all through conversational interaction. A key architectural principle bypasses the language model during final data export: entries are fetched directly from authoritative sources, with timeout protection, to guarantee accuracy. We demonstrate this approach with MCP-DBLP, a server providing access to the DBLP computer science bibliography. The system transforms form-based bibliographic services into conversational assistants that maintain scholarly integrity. This architecture is adaptable to other bibliographic databases and academic data sources.

Why we are recommending this paper?
Due to your Interest in Research Automation with AI

Deep Neural Networks as Iterated Function Systems and a Generalization Bound

Universit Paris Cit , CNRS

Rate paper: 👍 👎 ♥ Save

AI Insights

The authors show that any feedforward network with ReLU activations can be viewed as a place-independent IFS, and they extend this result to other types of neural networks, including residual blocks and MoE models. (ML: 0.92)👍👎
The paper discusses the interpretation of deep neural networks as iterated function systems (IFSs) and provides a general framework for analyzing their convergence properties. (ML: 0.89)👍👎
The paper provides several examples of neural network architectures that can be interpreted as IFSs, including ResNet with Softplus activation, Transformer block, and MoE model. (ML: 0.88)👍👎
The authors use the Hutchinson operator to analyze the convergence properties of IFSs and show that they can be used to bound the Wasserstein distance between the output of a neural network and its fixed point. (ML: 0.87)👍👎
Definition 1: A Markov recursion is a sequence of random variables {Xn} defined by X0 = x and Xt+1 = w(Xt, Θ), where w is a function that depends on the current state Xt and the parameter Θ. (ML: 0.82)👍👎
Definition 3: A place-dependent IFS (P-IFS) is an IFS {wξ} where each wξ depends on the current state x and the parameter Θ. (ML: 0.80)👍👎
Definition 2: An iterated function system (IFS) is a collection of functions {wξ} indexed by ξ ∈ I, where each wξ is a Lipschitz map from X to itself. (ML: 0.80)👍👎
They also introduce the concept of strong average Lipschitz contractivity for place-dependent IFSs and provide conditions under which it holds. (ML: 0.75)👍👎
Definition 5: A P-IFS {wξ} is strongly average-contractive if sup_x∈X ∑_{ξ∈I} pξ(x)cξ ≤ c < 1. (ML: 0.67)👍👎
Definition 4: The Hutchinson operator T is a contraction on the space of probability measures PP(X) with respect to the Wasserstein distance W2 if there exists a constant c < 1 such that W2(T(µ), T(ν)) ≤ cW2(µ, ν) for all µ, ν ∈ PP(X). (ML: 0.67)👍👎
The Hutchinson operator T is defined as T(µ) = ∑_{ξ∈I} pwξ#µq. (ML: 0.49)👍👎

Abstract
Deep neural networks (DNNs) achieve remarkable performance on a wide range of tasks, yet their mathematical analysis remains fragmented: stability and generalization are typically studied in disparate frameworks and on a case-by-case basis. Architecturally, DNNs rely on the recursive application of parametrized functions, a mechanism that can be unstable and difficult to train, making stability a primary concern. Even when training succeeds, there are few rigorous results on how well such models generalize beyond the observed data, especially in the generative setting. In this work, we leverage the theory of stochastic Iterated Function Systems (IFS) and show that two important deep architectures can be viewed as, or canonically associated with, place-dependent IFS. This connection allows us to import results from random dynamical systems to (i) establish the existence and uniqueness of invariant measures under suitable contractivity assumptions, and (ii) derive a Wasserstein generalization bound for generative modeling. The bound naturally leads to a new training objective that directly controls the collage-type approximation error between the data distribution and its image under the learned transfer operator. We illustrate the theory on a controlled 2D example and empirically evaluate the proposed objective on standard image datasets (MNIST, CelebA, CIFAR-10).

Why we are recommending this paper?
Due to your Interest in Deep Learning

TinyTorch: Building Machine Learning Systems from First Principles

Harvard University

Rate paper: 👍 👎 ♥ Save

AI Insights

Milestones serve dual pedagogical and validation purposes, providing motivation through historical framing and demonstrating implementation correctness through real-world task performance. (ML: 0.98)👍👎
Each module concludes with systems reasoning prompts measuring conceptual understanding beyond syntactic correctness. (ML: 0.97)👍👎
Milestones are designed to be challenging but achievable, allowing students to demonstrate their understanding of complex concepts through real-world tasks. (ML: 0.96)👍👎
Assessment validates both isolated correctness and cross-module integration. (ML: 0.96)👍👎
The TinyTorch framework is designed for teaching machine learning concepts through hands-on implementation and analysis. (ML: 0.95)👍👎
Reflect: Systems Analysis Questions. (ML: 0.94)👍👎
TinyTorch follows a consistent Build-Use-Reflect cycle, integrating implementation, application, and systems reasoning to address multiple learning objectives. (ML: 0.94)👍👎
It's a pedagogical tool aimed at bridging the gap between theoretical understanding and practical application. (ML: 0.94)👍👎
Students implement components in Jupyter notebooks with scaffolded guidance. (ML: 0.91)👍👎
TinyTorch's design emphasizes systems thinking, encouraging students to analyze and understand the relationships between components, rather than just focusing on individual functions. (ML: 0.87)👍👎
The framework includes six historical milestones that recreate actual breakthroughs using exclusively student code, validating success through task-appropriate performance. (ML: 0.85)👍👎
The framework is built with a focus on explicit dependencies, making it easier for students to understand where each module fits in the larger architecture. (ML: 0.83)👍👎
Use: Integration Testing Beyond Unit Tests. (ML: 0.77)👍👎
Build: Implementation with Explicit Dependencies. (ML: 0.66)👍👎

Abstract
Machine learning education faces a fundamental gap: students learn algorithms without understanding the systems that execute them. They study gradient descent without measuring memory, attention mechanisms without analyzing O(N^2) scaling, optimizer theory without knowing why Adam requires 3x the memory of SGD. This "algorithm-systems divide" produces practitioners who can train models but cannot debug memory failures, optimize inference latency, or reason about deployment trade-offs--the very skills industry demands as "ML systems engineering." We present TinyTorch, a 20-module curriculum that closes this gap through "implementation-based systems pedagogy": students construct PyTorch's core components (tensors, autograd, optimizers, CNNs, transformers) in pure Python, building a complete framework where every operation they invoke is code they wrote. The design employs three patterns: "progressive disclosure" of complexity, "systems-first integration" of profiling from the first module, and "build-to-validate milestones" recreating 67 years of ML breakthroughs--from Perceptron (1958) through Transformers (2017) to MLPerf-style benchmarking. Requiring only 4GB RAM and no GPU, TinyTorch demonstrates that deep ML systems understanding is achievable without specialized hardware. The curriculum is available open-source at mlsysbook.ai/tinytorch.

Why we are recommending this paper?
Due to your Interest in Deep Learning

Artifact-Aware Evaluation for High-Quality Video Generation

Southeast University

Rate paper: 👍 👎 ♥ Save

AI Insights

Artifact-aware evaluation framework: A comprehensive framework for evaluating video generation models based on their ability to detect and correct artifacts. (ML: 0.94)👍👎
The proposed framework provides a comprehensive evaluation of video generation models, enabling researchers to identify areas for improvement. (ML: 0.94)👍👎
Video quality assessment methods: Techniques for evaluating the quality of generated videos based on various metrics, such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). (ML: 0.93)👍👎
The novel DV AR (Dense Video Artifact Recognition) framework is proposed, which leverages the strengths of both text-to-video diffusion models and video quality assessment methods. (ML: 0.91)👍👎
The paper proposes a comprehensive artifact-aware evaluation framework for video generation, focusing on fine-grained artifact detection across Appearance, Motion, and Camera dimensions. (ML: 0.91)👍👎
Experimental results demonstrate the accuracy of the approach in identifying artifacts in generated videos, with a significant improvement over state-of-the-art methods. (ML: 0.90)👍👎
Text-to-video diffusion models: A type of deep learning model that generates videos from text descriptions. (ML: 0.88)👍👎
Fine-grained artifact detection: The process of detecting specific types of artifacts in videos, such as noise, blurriness, or color distortion. (ML: 0.87)👍👎
The large-scale GenVID dataset is introduced, which includes a wide range of videos with diverse content, styles, and quality levels. (ML: 0.87)👍👎
The FMG-DFS strategy enhances temporal localization, enabling more precise and efficient artifact detection. (ML: 0.68)👍👎

Abstract
With the rapid advancement of video generation techniques, evaluating and auditing generated videos has become increasingly crucial. Existing approaches typically offer coarse video quality scores, lacking detailed localization and categorization of specific artifacts. In this work, we introduce a comprehensive evaluation protocol focusing on three key aspects affecting human perception: Appearance, Motion, and Camera. We define these axes through a taxonomy of 10 prevalent artifact categories reflecting common generative failures observed in video generation. To enable robust artifact detection and categorization, we introduce GenVID, a large-scale dataset of 80k videos generated by various state-of-the-art video generation models, each carefully annotated for the defined artifact categories. Leveraging GenVID, we develop DVAR, a Dense Video Artifact Recognition framework for fine-grained identification and classification of generative artifacts. Extensive experiments show that our approach significantly improves artifact detection accuracy and enables effective filtering of low-quality content.

Why we are recommending this paper?
Due to your Interest in Image and Video Generation

Creative Image Generation with Diffusion Model

Rutgers University

Rate paper: 👍 👎 ♥ Save

AI Insights

Generative models: A class of machine learning algorithms that can generate new data samples based on a given dataset. (ML: 0.97)👍👎
The paper assumes that the input text is well-formed and does not contain any errors or ambiguities. (ML: 0.97)👍👎
Diffusion prior constraints: A type of regularization technique used in generative models that encourages the model to produce samples that are similar to the data distribution. (ML: 0.92)👍👎
The paper discusses the concept of ConceptLab, a creative generation system that uses diffusion prior constraints to generate novel images. (ML: 0.90)👍👎
ConceptLab is based on a combination of generative models and text-image harmony, which allows it to produce high-quality images with specific styles and concepts. (ML: 0.88)👍👎
ConceptLab is based on a combination of generative models and text-image harmony, which allows it to produce high-quality images with specific styles and concepts. (ML: 0.88)👍👎
The paper presents ConceptLab, a creative generation system that uses diffusion prior constraints to generate novel images. (ML: 0.86)👍👎
The authors propose a new method for training generative models using diffusion prior constraints, which enables the model to learn from the data distribution and generate novel samples. (ML: 0.84)👍👎
The authors propose a new method for training generative models using diffusion prior constraints, which enables the model to learn from the data distribution and generate novel samples. (ML: 0.84)👍👎
Text-image harmony: A method for combining text and image features to generate images with specific styles and concepts. (ML: 0.76)👍👎

Abstract
Creative image generation has emerged as a compelling area of research, driven by the need to produce novel and high-quality images that expand the boundaries of imagination. In this work, we propose a novel framework for creative generation using diffusion models, where creativity is associated with the inverse probability of an image's existence in the CLIP embedding space. Unlike prior approaches that rely on a manual blending of concepts or exclusion of subcategories, our method calculates the probability distribution of generated images and drives it towards low-probability regions to produce rare, imaginative, and visually captivating outputs. We also introduce pullback mechanisms, achieving high creativity without sacrificing visual fidelity. Extensive experiments on text-to-image diffusion models demonstrate the effectiveness and efficiency of our creative generation framework, showcasing its ability to produce unique, novel, and thought-provoking images. This work provides a new perspective on creativity in generative models, offering a principled method to foster innovation in visual content synthesis.

Why we are recommending this paper?
Due to your Interest in Image and Video Generation

Interests not found

Help us improve your experience!