Hi!
Your personalized paper recommendations for 2–6 February 2026.
University of Michigan
AI Insights
- The framework has implications for fairness in machine learning, particularly in multi-group settings. (ML: 0.99)
- Relative Improvement: The ratio of the risk reduction achieved from the baseline to the maximum possible reduction for each group. (ML: 0.99)
- The paper presents a framework for fairness in machine learning based on relative improvement. (ML: 0.98)
- It provides a way to quantify and compare fairness across different groups. (ML: 0.98)
- The framework is based on a transformation between risk space and relative improvement space. (ML: 0.98)
- The goal is to find the predictor that maximizes the minimum relative improvement across all groups. (ML: 0.97)
- Group Optimal Points: The points where each group achieves its maximum relative improvement. (ML: 0.97)
- Disagreement Point: The point where both groups have equal risk. (ML: 0.97)
- Feasible Region: The set of all possible outcomes, including the disagreement point and the group optimal points. (ML: 0.95)
- The transformation maps the disagreement point to (0, 0) and the group optimal points to (1, Ο^(1)_2) and (Ο^(2)_1, 1). (ML: 0.89)
- The paper establishes a uniqueness result for the predictor under strict convexity of the loss function. (ML: 0.89)
- The paper also discusses the connection between the KS solution and the maximin relative improvement solution. (ML: 0.88)
- Pareto Frontier: The set of all points in the feasible region that are not dominated by any other point. (ML: 0.84)
- The paper shows that the KS solution coincides with the maximin relative improvement solution. (ML: 0.82)
- This result is strictly stronger than uniqueness of the induced risk vector or of the bargaining solution itself. (ML: 0.79)
- The fairness diagonal intersects the Pareto frontier at the KS solution. (ML: 0.79)
- Kalai-Smorodinsky (KS) Solution: A bargaining solution that selects the point on the Pareto frontier where both players achieve equal normalized gains. (ML: 0.76)
- The maximin relative improvement approach is motivated by the KS solution. (ML: 0.73)
Abstract
When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. This game-theoretic perspective reveals that existing robust optimization methods such as minimizing worst-group loss or regret correspond to classical bargaining solutions and embody different fairness principles. We propose relative improvement, the ratio of actual risk reduction to potential reduction from a baseline predictor, which recovers the Kalai-Smorodinsky solution. Unlike absolute-scale methods that may not be comparable when groups have different potential predictability, relative improvement provides axiomatic justification including scale invariance and individual monotonicity. We establish finite-sample convergence guarantees under mild conditions.
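For readers who want the objective in symbols, the following is a minimal formalization consistent with the abstract and the insights above; the notation (baseline f_0, group risks R_g, relative improvement ρ_g) is ours, not necessarily the paper's:

```latex
% Relative improvement of a predictor f for group g, given baseline f_0 and
% group-optimal risk R_g^* (a sketch in our own notation):
\[
  \rho_g(f) \;=\; \frac{R_g(f_0) - R_g(f)}{R_g(f_0) - R_g^{*}},
  \qquad
  \hat{f} \;=\; \arg\max_{f} \, \min_{g} \, \rho_g(f).
\]
% At the disagreement point f = f_0 every group has \rho_g = 0, and a group at
% its optimum has \rho_g = 1, matching the (0, 0) and unit-coordinate group
% optimal points described in the insights above.
```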
Why are we recommending this paper?
Due to your interest in Social Inequality
This paper directly addresses inequality through a game-theoretic lens, exploring fairness in predictive models across subpopulations. Given your interest in social inequality and economic disparity, this approach offers a novel framework for understanding and mitigating biased outcomes.
Maastricht University
AI Insights
- The text also discusses the relationship between groundedness and maximization of complete and transitive preference relations. (ML: 0.97)
- Some of the key concepts explored include consistency, monotonicity, and the weak axiom of revealed preference (WARP). (ML: 0.97)
- The results have implications for understanding rationalizability and groundedness in choice theory. (ML: 0.95)
- GAIC: Grounded Axiom of Revealed Preference. (ML: 0.92)
- A choice function c is said to satisfy GMAIC if it maximizes a complete and transitive preference relation over non-empty subsets of X. (ML: 0.91)
- Groundedness: A choice function c satisfies groundedness if for all x ∈ X, there exists a set S ⊆ X \ {x} such that I(S) = ∅. (ML: 0.89)
- GMAIC: Grounded Maximizing Axiom of Choice. (ML: 0.89)
- The text provides comprehensive proofs of various theorems and propositions related to choice theory. (ML: 0.89)
- The proofs cover topics such as injectivity, surjectivity, and double union closure of interpretation functions. (ML: 0.88)
- A choice function c is said to satisfy GAIC if it satisfies groundedness and the corresponding interpretation I satisfies consistency, monotonicity, and WARP. (ML: 0.88)
- The proofs demonstrate the relationship between different axioms and properties of choice functions. (ML: 0.86)
- The text develops proofs of theorems and propositions in choice theory, specifically concerning rationalizability and groundedness. (ML: 0.86)
Abstract
This paper proposes a model of choice via agentic artificial intelligence (AI). A key feature is that the AI may misinterpret a menu before recommending what to choose. A single acyclicity condition guarantees that there is a monotonic interpretation and a strict preference relation that together rationalize the AI's recommendations. Since this preference is in general not unique, there is no safeguard against it misaligning with that of a decision maker. What enables the verification of such AI alignment is interpretations satisfying double monotonicity. Indeed, double monotonicity ensures full identifiability and internal consistency. But, an additional idempotence property is required to guarantee that recommendations are fully rational and remain grounded within the original feasible set.
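A compact way to see the model's moving parts, in our own notation (the paper's exact axioms may differ): the AI reads menu S through an interpretation I and recommends the preference-best element of the misread menu.

```latex
% Sketch of choice via an interpretation I and strict preference \succ
% (our reconstruction from the abstract, not the paper's exact statement):
\[
  c(S) \;=\; \max_{\succ} I(S) \qquad \text{for every menu } S \subseteq X .
\]
% Monotonicity (assumed form):  S \subseteq T \;\Rightarrow\; I(S) \subseteq I(T).
% Idempotence:                  I(I(S)) \;=\; I(S).
% Per the abstract, double monotonicity is what makes (I, \succ) identifiable,
% and idempotence is what keeps recommendations grounded, i.e. c(S) \in S.
```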
Why are we recommending this paper?
Because "AI agents" is a popular topic and you have fewer than three interests with available recommendations
The concept of an AI misinterpreting a menu aligns with concerns about algorithmic bias and potentially flawed decision-making processes, a key aspect of inequality. This research provides a model for understanding how AI systems can exacerbate existing inequalities through flawed recommendations.
University of California, Riverside
AI Insights
- A suitably broad understanding of intelligence is useful for recognizing the diverse and impressive capacities of different types of beings. (ML: 0.98)
- Tests of a single capacity in an AI system should not be used to generalize about the system's overall capacities. (ML: 0.97)
- Intelligence is not a single-dimensional concept, but rather a multidimensional one. (ML: 0.96)
- The theory of strange intelligence invites us to pause when considering the ethical significance of intelligence, as correlations between intelligence and other ethically significant properties may be absent. (ML: 0.96)
- multidimensional intelligence: Intelligence that cannot be reduced to a single dimension or concept. (ML: 0.94)
- strange intelligence: A type of intelligence that operates very differently from typical humans. (ML: 0.94)
- The linear model of progress in AI is rejected, and instead, a nonlinear approach to intelligence is proposed. (ML: 0.94)
- AIs can be highly intelligent in their own distinct way, even if they don't possess human-like intelligence. (ML: 0.94)
- general intelligence: The ability to achieve a broad range of goals in a broad range of environments, which is a matter of degree and always relative to the range of goals and environments of interest. (ML: 0.93)
Abstract
We endorse and expand upon Susan Schneider's critique of the linear model of AI progress and introduce two novel concepts: "familiar intelligence" and "strange intelligence". AI intelligence is likely to be strange intelligence, defying familiar patterns of ability and inability, combining superhuman capacities in some domains with subhuman performance in other domains, and even within domains sometimes combining superhuman insight with surprising errors that few humans would make. We develop and defend a nonlinear model of intelligence on which "general intelligence" is not a unified capacity but instead the ability to achieve a broad range of goals in a broad range of environments, in a manner that defies nonarbitrary reduction to a single linear quantity. We conclude with implications for adversarial testing approaches to evaluating AI capacities. If AI is strange intelligence, we should expect that even the most capable systems will sometimes fail in seemingly obvious tasks. On a nonlinear model of AI intelligence, such errors on their own do not demonstrate a system's lack of outstanding general intelligence. Conversely, excellent performance on one type of task, such as an IQ test, cannot warrant assumptions of broad capacities beyond that task domain.
Why are we recommending this paper?
Because "AI and society" is a popular topic and you have fewer than three interests with available recommendations
This paper's critique of linear AI models and its introduction of "strange intelligence" are highly relevant to understanding potential unforeseen consequences of advanced AI systems. It raises important questions about the nature of intelligence and its potential impact on societal structures and inequalities.
Gratex International
AI Insights
- Further research is needed to improve the quality and reliability of LLM-based systems, as well as to address issues such as hallucinations and data bias. (ML: 0.99)
- They also discuss the challenges associated with using LLMs in software engineering, including the need for high-quality training data and the potential for hallucinations. (ML: 0.97)
- Limitations: limited training data, potential for hallucinations, and data bias. (ML: 0.95)
- The use of LLMs in software engineering has shown promising results, but there are still challenges associated with their adoption. (ML: 0.94)
- The paper discusses the use of large language models (LLMs) in software engineering, specifically in the areas of test case generation and scenario-based GUI testing. (ML: 0.94)
- LLMs: Large Language Models are a type of artificial intelligence model that can process and generate human-like text. (ML: 0.94)
- Retrieval-augmented models: These models use LLMs to retrieve relevant information from a database or knowledge graph, which is then used to augment the LLM's output. (ML: 0.93)
- The authors propose a framework that combines LLMs with retrieval-augmented models to generate test cases from natural language requirements. (ML: 0.90)
Abstract
The introduction of large language models ignited great retooling and rethinking of the software development models. The ensuing response of software engineering research yielded a massive body of tools and approaches. In this paper, we join the hassle by introducing agentic AI solutions for two tasks. First, we developed a solution for automatic test scenario generation from a detailed requirements description. This approach relies on specialized worker agents forming a star topology with the supervisor agent in the middle. We demonstrate its capabilities on a real-world example. Second, we developed an agentic AI solution for the document retrieval task in the context of software engineering documents. Our solution enables performing various use cases on a body of documents related to the development of a single software, including search, question answering, tracking changes, and large document summarization. In this case, each use case is handled by a dedicated LLM-based agent, which performs all subtasks related to the corresponding use case. We conclude by hinting at the future perspectives of our line of research.
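As a concrete illustration of the star topology the abstract describes, here is a minimal Python sketch; the llm() helper, the worker roles, and the merge prompt are our own placeholders, not the authors' implementation:

```python
# Minimal sketch of a supervisor/worker star topology for test-scenario generation.
# llm() is a stand-in for any chat-completion call; roles and prompts are illustrative.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    # Placeholder: swap in a real model call here.
    return f"[model output for: {prompt[:48]}...]"

@dataclass
class Worker:
    role: str  # e.g. "precondition extraction" or "test-step drafting"

    def run(self, requirements: str) -> str:
        return llm(f"You are the {self.role} agent.\nRequirements:\n{requirements}")

@dataclass
class Supervisor:
    workers: list[Worker] = field(default_factory=list)

    def generate_scenarios(self, requirements: str) -> str:
        drafts = [w.run(requirements) for w in self.workers]  # fan out to the spokes
        return llm("Merge these drafts into test scenarios:\n" + "\n---\n".join(drafts))

supervisor = Supervisor([Worker("precondition extraction"), Worker("test-step drafting")])
print(supervisor.generate_scenarios("The app shall lock after three failed logins."))
```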
Why are we recommending this paper?
Because "research automation with AI" is a popular topic and you have fewer than three interests with available recommendations
The use of agentic AI in software engineering, particularly in document retrieval and test scenario generation, has implications for resource allocation and potentially exacerbating inequalities within the tech industry. This work offers a perspective on how AI tools could impact labor markets and access to opportunities.
New York University
AI Insights
- The authors emphasize that these challenges require a multidisciplinary approach involving computer science, medicine, and social sciences. (ML: 0.99)
- It proposes a framework for evaluating their performance, focusing on three main aspects: Adaptation & Learning, Safety & Ethics, and Human-AI Collaboration. (ML: 0.99)
- The paper discusses the challenges of using Large Language Models (LLMs) in healthcare settings. (ML: 0.98)
- The paper discusses the challenges of deploying Large Language Models (LLMs) in healthcare settings, including drift detection, reinforcement-based adaptation, meta-learning, few-shot competence, safety, and ethics. (ML: 0.97)
- Adaptation & Learning: This discipline involves keeping deployed agents calibrated to a moving world by detecting distributional and behavioral shifts, applying targeted mitigations before clinical performance erodes. (ML: 0.97)
- The authors propose a framework for evaluating LLMs in healthcare, focusing on three main aspects: Adaptation & Learning, Safety & Ethics, and Human-AI Collaboration. (ML: 0.97)
- Limited discussion on human-AI collaboration. Recent studies document automatic prompt injection attacks [113] and organize defenses into clear taxonomies [114]. The paper discusses the challenges of deploying LLMs in healthcare settings and proposes a framework for evaluating their performance. (ML: 0.95)
- The paper highlights the importance of addressing drift detection, reinforcement-based adaptation, meta-learning, few-shot competence, safety, and ethics in healthcare LLMs. (ML: 0.94)
Abstract
Large Language Model (LLM)-based agents that plan, use tools and act have begun to shape healthcare and medicine. Reported studies demonstrate competence on various tasks ranging from EHR analysis and differential diagnosis to treatment planning and research workflows. Yet the literature largely consists of overviews which are either broad surveys or narrow dives into a single capability (e.g., memory, planning, reasoning), leaving healthcare work without a common frame. We address this by reviewing 49 studies using a seven-dimensional taxonomy: Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology and Core Tasks & Subtasks with 29 operational sub-dimensions. Using explicit inclusion and exclusion criteria and a labeling rubric (Fully Implemented, Partially Implemented, Not Implemented), we map each study to the taxonomy and report quantitative summaries of capability prevalence and co-occurrence patterns. Our empirical analysis surfaces clear asymmetries. For instance, the External Knowledge Integration sub-dimension under Knowledge Management is commonly realized (~76% Fully Implemented) whereas the Event-Triggered Activation sub-dimension under Interaction Patterns is largely absent (~92% Not Implemented) and the Drift Detection & Mitigation sub-dimension under Adaptation & Learning is rare (~98% Not Implemented). Architecturally, the Multi-Agent Design sub-dimension under Framework Typology is the dominant pattern (~82% Fully Implemented) while orchestration layers remain mostly partial. Across Core Tasks & Subtasks, information-centric capabilities lead, e.g., Medical Question Answering & Decision Support and Benchmarking & Simulation, while action- and discovery-oriented areas such as Treatment Planning & Prescription still show substantial gaps (~59% Not Implemented).
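To make the reported percentages concrete, here is a toy reproduction of the rubric arithmetic; the helper and the example labels are ours, with 48 of 49 studies labeled "Not Implemented" to mirror the ~98% figure for Drift Detection & Mitigation:

```python
# Toy illustration of the three-label rubric: prevalence of each label across the
# 49 reviewed studies for one sub-dimension. The labels below are illustrative only.
from collections import Counter

LABELS = ("Fully Implemented", "Partially Implemented", "Not Implemented")

def prevalence(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)
    return {lab: round(100 * counts[lab] / len(labels), 1) for lab in LABELS}

drift_detection = ["Not Implemented"] * 48 + ["Partially Implemented"]
print(prevalence(drift_detection))
# {'Fully Implemented': 0.0, 'Partially Implemented': 2.0, 'Not Implemented': 98.0}
```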
Why are we recommending this paper?
Because "AI agents" is a popular topic and you have fewer than three interests with available recommendations
Examining the deployment of LLM-based agents in healthcare raises critical questions about access to care and potential biases embedded within these systems. This research provides a framework for evaluating the fairness and equity of AI applications in a sector with significant existing inequalities.
JigsawStack, Inc
AI Insights
- Small language models may not be as effective as large language models in certain tasks. (ML: 0.99)
- The use of LLMs with tools may be limited by the availability of high-quality training data. (ML: 0.99)
- Researchers are exploring the potential of multimodal safety classification, which may have significant implications for industries that rely on text-based data. (ML: 0.98)
- Multimodal safety classification is a technique used to identify potential risks or hazards in text-based data. (ML: 0.98)
- The use of small language models is gaining traction as a valuable plug-in for large language models. (ML: 0.95)
- LLMs (Large Language Models) are artificial intelligence models that can process and generate human-like language. (ML: 0.94)
- The use of LLMs with tools and small language models is becoming increasingly prevalent in various applications. (ML: 0.94)
- There is a growing interest in multimodal safety classification, with the introduction of Llama Guard 4 by Meta AI. (ML: 0.89)
- LLMs with tools are becoming increasingly popular, and researchers are exploring their potential in various applications. (ML: 0.88)
- Agentic AI refers to artificial intelligence systems that can perform tasks autonomously, often requiring human oversight or intervention. (ML: 0.83)
Abstract
We present Interfaze, a system that treats modern LLM applications as a problem of building and acting over context, not just picking the right monolithic model. Instead of a single transformer, we combine (i) a stack of heterogeneous DNNs paired with small language models as perception modules for OCR involving complex PDFs, charts and diagrams, and multilingual ASR with (ii) a context-construction layer that crawls, indexes, and parses external sources (web pages, code, PDFs) into compact structured state, and (iii) an action layer that can browse, retrieve, execute code in a sandbox, and drive a headless browser for dynamic web pages. A thin controller sits on top of this stack and exposes a single, OpenAI-style endpoint: it decides which small models and actions to run and always forwards the distilled context to a user-selected LLM that produces the final response.
On this architecture, Interfaze-Beta achieves 83.6% on MMLU-Pro, 91.4% on MMLU, 81.3% on GPQA-Diamond, 57.8% on LiveCodeBench v5, and 90.0% on AIME-2025, along with strong multimodal scores on MMMU (val) (77.3%), AI2D (91.5%), ChartQA (90.9%), and Common Voice v16 (90.8%). We show that most queries are handled primarily by the small-model and tool stack, with the large LLM operating only on distilled context, yielding competitive accuracy while shifting the bulk of computation away from the most expensive and monolithic models.
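The controller pattern in the first paragraph is the load-bearing idea, so here is a minimal sketch of it as we read it. The module names, routing rule, and big_llm callable are our assumptions, not Interfaze's API:

```python
# Sketch of a "thin controller": run small perception/action modules, then forward
# only the distilled context to one user-selected LLM. Routing here is naive
# keyword matching, purely for illustration.
from typing import Callable

def ocr_module(query: str) -> str: return "[text/tables extracted from the PDF]"
def browse_module(query: str) -> str: return "[parsed content of the crawled page]"
def sandbox_module(query: str) -> str: return "[stdout from sandboxed code execution]"

ROUTES: dict[str, Callable[[str], str]] = {
    "pdf": ocr_module, "http": browse_module, "run": sandbox_module,
}

def controller(query: str, big_llm: Callable[[str], str]) -> str:
    distilled = "\n".join(fn(query) for key, fn in ROUTES.items() if key in query.lower())
    # The large model only ever sees the distilled context, never the raw inputs.
    return big_llm(f"Context:\n{distilled or '[no tools used]'}\n\nQuestion: {query}")

print(controller("Summarize the attached PDF report", big_llm=lambda p: f"[answer to]\n{p}"))
```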
Why are we recommending this paper?
Because "AGI: artificial general intelligence" is a popular topic and you have fewer than three interests with available recommendations
Carnegie Mellon University
AI Insights
- Social cost of intelligence: The negative consequences of intelligent behavior in multi-agent systems, including the emergence, propagation, and amplification of stereotypical bias. (ML: 0.95)
- The paper highlights the importance of considering the social cost of intelligence in multi-agent systems, including the emergence, propagation, and amplification of stereotypical bias. (ML: 0.93)
- The authors propose a framework for evaluating and improving the performance of multi-agent systems using a combination of metrics such as accuracy, efficiency, and fairness. (ML: 0.92)
- The authors emphasize the need for a deeper understanding of the social cost of intelligence in multi-agent systems and its impact on real-world applications. (ML: 0.92)
- The paper concludes by highlighting the importance of continued research into the development of more effective and fair multi-agent systems. (ML: 0.91)
- The paper suggests that future work should focus on developing more sophisticated evaluation metrics and methods for improving the performance of multi-agent systems. (ML: 0.89)
- The paper's focus on theoretical concepts may limit its practical applications. (ML: 0.87)
- Agent: An autonomous entity that can perceive its environment, make decisions, and take actions to achieve its goals. (ML: 0.86)
- Multi-agent system: A system consisting of multiple agents that interact with each other to achieve a common goal. (ML: 0.82)
- The paper discusses the concept of multi-agent systems, which involve multiple agents interacting with each other to achieve a common goal. (ML: 0.82)
Abstract
While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes AgentArk, a novel framework to distill multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of multi-agent systems while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios: reasoning-enhanced fine-tuning; trajectory-based augmentation; and process-aware distillation. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work can shed light on future research on efficient and robust multi-agent development. Our code is at https://github.com/AIFrontierLab/AgentArk.
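As a sketch of what trajectory-based distillation could look like in practice (our reconstruction from the abstract; names, signatures, and the debate loop are assumptions, not AgentArk's code):

```python
# Record a multi-agent debate at training time, then turn it into a supervised
# example so a single student model can internalize the interaction implicitly.
from typing import Callable

Agent = Callable[[str, list[str]], str]  # (question, transcript so far) -> reply

def debate(question: str, agents: list[Agent], rounds: int = 2) -> list[str]:
    transcript: list[str] = []
    for _ in range(rounds):
        for agent in agents:
            transcript.append(agent(question, transcript))  # each agent sees prior turns
    return transcript

def to_training_example(question: str, transcript: list[str], answer: str) -> dict:
    # Fine-tuning on (prompt -> debate + answer) shifts the debate's computation
    # from inference time into the student's weights.
    return {"prompt": question,
            "completion": "\n".join(transcript) + f"\nFinal answer: {answer}"}

stub: Agent = lambda q, t: f"[turn {len(t) + 1} on: {q}]"
ex = to_training_example("Is 91 prime?", debate("Is 91 prime?", [stub, stub]), "No, 91 = 7 x 13")
print(ex["completion"])
```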
Why are we recommending this paper?
Because "AGI: artificial general intelligence" is a popular topic and you have fewer than three interests with available recommendations
Luxembourg Institute of Science and Technology
AI Insights
- Additionally, the generated neural network architectures may not always outperform state-of-the-art models in various tasks. (ML: 0.98)
- They also discuss the limitations and challenges associated with this approach. (ML: 0.98)
- This can help improve the performance of various machine learning tasks such as image classification, object detection, and natural language processing. (ML: 0.97)
- The proposed method relies heavily on the capabilities of LLMs, which may not be available to all researchers or practitioners. (ML: 0.97)
- The paper proposes a method for generating neural network architectures using large language models (LLMs). (ML: 0.92)
- The authors cite several papers that demonstrate the effectiveness of using LLMs for generating neural network architectures. (ML: 0.92)
- The authors demonstrate the effectiveness of their approach by generating neural network architectures that outperform state-of-the-art models in several tasks. (ML: 0.92)
- The paper presents a novel approach to generating neural network architectures using LLMs. (ML: 0.91)
- The authors propose a method that leverages the capabilities of LLMs to generate neural network architectures, which can be used for various tasks such as image classification, object detection, and natural language processing. (ML: 0.91)
- LLM: Large Language Model. (ML: 0.89)
- The proposed method for generating neural network architectures using LLMs is a promising approach that can be used to improve the performance of various machine learning tasks. (ML: 0.89)
- The proposed method is based on a combination of two techniques: instruction-guided autoregressive neural network parameter generation and tabular data generation using agentic LLM methods. (ML: 0.85)
Abstract
Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as NN code verification, refactoring, and migration. These tools play a crucial role in guaranteeing both the correctness and maintainability of neural network architectures, helping to prevent implementation errors, simplify model updates, and ensure that complex networks can be reliably extended and reused. Yet, assessing their effectiveness remains challenging due to the lack of publicly available, diverse datasets of neural networks that would allow systematic evaluation. To address this gap, we leverage large language models (LLMs) to automatically generate a dataset of neural networks that can serve as a benchmark for validation. The dataset is designed to cover diverse architectural components and to handle multiple input data types and tasks. In total, 608 samples are generated, each conforming to a set of precise design choices. To further ensure their consistency, we validate the correctness of the generated networks using static analysis and symbolic tracing. We make the dataset publicly available to support the community in advancing research on neural network reliability and adaptability.
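The validation step is worth a sketch. One cheap way to realize "static analysis and symbolic tracing" for PyTorch models is torch.fx; this is our reconstruction under that assumption, not necessarily the authors' exact pipeline:

```python
# Symbolic tracing fails on graphs torch.fx cannot express (e.g. data-dependent
# control flow), and Graph.lint() runs static well-formedness checks, so passing
# both is a cheap consistency filter for generated networks.
import torch.nn as nn
from torch.fx import symbolic_trace

def is_consistent(model: nn.Module) -> bool:
    try:
        traced = symbolic_trace(model)  # symbolic tracing
        traced.graph.lint()             # static analysis of the traced graph
        return True
    except Exception:
        return False

generated = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
print(is_consistent(generated))  # True for this well-formed sample
```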
Why are we recommending this paper?
Because "deep learning" is a popular topic and you have fewer than three interests with available recommendations
Beijing University of Posts and Telecommunications
AI Insights
- Scale-invariant components: Components in neural networks that are invariant to scaling, such as BatchNorm layers. (ML: 0.92)
- Architecture-aware updates/projections: Updates and projections tailored to specific architectural components, such as BatchNorm layers. (ML: 0.92)
- Curvature-adaptive radial step sizing: An adaptive learning-rate scheme that adjusts the step size based on the curvature of the loss function. (ML: 0.89)
- Its performance is superior to other algorithms, including AdamW and AdamP, on both CIFAR-100 and modular-arithmetic Grokking tasks. (ML: 0.88)
- The paper evaluates AdamO on CIFAR-100 and modular-arithmetic Grokking tasks, showing that it outperforms other optimization algorithms, including AdamW and AdamP. (ML: 0.84)
- Orthogonal dynamics: A method for optimizing neural networks by decoupling the update rules for different dimensions. (ML: 0.81)
- AdamO is designed to handle scale-invariant components in neural networks, such as BatchNorm, by using projections to suppress ineffective updates. (ML: 0.81)
- AdamO is a robust and effective optimization algorithm for deep learning tasks, particularly those involving scale-invariant components. (ML: 0.71)
- AdamO's performance is robust across a wide range of hyperparameters, making it easier to tune and use in practice. (ML: 0.67)
- The paper proposes a new adaptive optimization algorithm, AdamO, which fully decouples orthogonal dynamics, with curvature-adaptive radial step sizing and architecture-aware updates/projections. (ML: 0.61)
Abstract
Is the standard weight decay in AdamW truly optimal? Although AdamW decouples weight decay from adaptive gradient scaling, a fundamental conflict remains: the Radial Tug-of-War. In deep learning, gradients tend to increase parameter norms to expand effective capacity while steering directions to learn features, whereas weight decay indiscriminately suppresses norm growth. This push-pull interaction induces radial oscillations, injecting noise into Adam's second-moment estimates and potentially degrading delicate tangential feature learning. We argue that magnitude and direction play distinct roles and should be decoupled in optimizer dynamics. We propose Orthogonal Dynamics Decoupling and instantiate it as AdamO: an SGD-style update handles the one-dimensional norm control, while Adam's adaptive preconditioning is confined to the tangential subspace. AdamO further incorporates curvature-adaptive radial step sizing and architecture-aware rules and projections for scale-invariant layers and low-dimensional parameters. Experiments on vision and language tasks show that AdamO improves generalization and stability over AdamW without introducing additional complex constraints.
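To make the decoupling concrete, here is a minimal single-tensor sketch of our reading of the abstract; this is not the released AdamO (the curvature-adaptive radial sizing and architecture-aware projections are omitted), and all names are ours:

```python
# Split the gradient into a radial part (along w) and a tangential part (orthogonal
# to w); Adam-style preconditioning acts only tangentially, while a plain SGD-style
# step handles the one-dimensional norm control.
import torch

def decoupled_step(w, g, m, v, t, lr=1e-3, radial_lr=1e-2, betas=(0.9, 0.999), eps=1e-8):
    u = w / (w.norm() + eps)          # radial unit direction
    radial_mag = (g * u).sum()        # signed gradient component along w
    g_tan = g - radial_mag * u        # tangential component
    # Adam confined to the tangential subspace.
    m.mul_(betas[0]).add_(g_tan, alpha=1 - betas[0])
    v.mul_(betas[1]).addcmul_(g_tan, g_tan, value=1 - betas[1])
    w -= lr * (m / (1 - betas[0] ** t)) / ((v / (1 - betas[1] ** t)).sqrt() + eps)
    # SGD-style norm control along the (pre-update) radial direction.
    w -= radial_lr * radial_mag * u
    return w

w, g = torch.randn(64), torch.randn(64)
m, v = torch.zeros_like(w), torch.zeros_like(w)
w = decoupled_step(w, g, m, v, t=1)
```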
Why are we recommending this paper?
Because "deep learning" is a popular topic and you have fewer than three interests with available recommendations
Aalto University
AI Insights
- Ground-truth intents: The actual intentions or goals of a user that can be used as a reference for evaluating the performance of a system. (ML: 0.98)
- To avoid learning and ordering effects, we adopted a between-subjects design where each participant was assigned one condition and one task. (ML: 0.97)
- The user study was deployed on a web-based interface. (ML: 0.97)
- We conducted a power analysis using G*Power for a two-condition between-subjects design. (ML: 0.97)
- Each task was used equally often per condition. (ML: 0.95)
- Participants were given a task scenario from a set of eight design types: interior design, painting, photography, app icon design, poster design, logo design, fashion design, and architectural style design. (ML: 0.94)
- Between-subjects design: A research design in which each participant is assigned to only one group or condition, and the groups are compared to each other. (ML: 0.93)
- LLM tools: Large language model tools that use artificial intelligence to generate text based on input prompts. (ML: 0.92)
- The study evaluates APE using a prototyped user interface that reflects how the system would be deployed in practice, and compares it against manual prompt engineering, a baseline condition representing the current standard in which users iteratively refine textual prompts. (ML: 0.90)
- The required sample size was estimated at 128 participants. (ML: 0.90)
- Text-to-image generation: A technique in computer vision that generates images from text-based descriptions. (ML: 0.85)
- The server side integrated our APE algorithm, supporting real-time interaction with the text-to-image model. (ML: 0.78)
Abstract
Aligning text-to-image generation with user intent remains challenging for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.
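One plausible reading of "interactive intent inference under an information-theoretic framework" is expected-information-gain query selection; this formalization and its symbols are ours, not the paper's:

```latex
% Let \theta be the latent intent (interpretable feature requirements), q a
% candidate visual query, and a the user's answer to q. A natural objective is
% to ask the query with the largest expected information gain about \theta:
\[
  q^{\star} \;=\; \arg\max_{q \in \mathcal{Q}} \;
  \mathbb{E}_{a \sim p(a \mid q)}
  \bigl[ H(\theta) - H(\theta \mid q, a) \bigr].
\]
% The elicited requirements are then compiled into the final prompt.
```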
Why are we recommending this paper?
Because "image and video generation" is a popular topic and you have fewer than three interests with available recommendations
Peking University
AI Insights
- It then localizes the visual regions associated with each extracted word. (ML: 0.94)
- The VLM first parses the editing instruction and extracts all nouns that correspond to the reference images. (ML: 0.92)
- The prompt used to predict the bounding boxes associated with each extracted word is presented in Figure 8. (ML: 0.90)
- Key terms: instruction-reference alignment; correspondence-aware masked attention; VAE dropout probability. The model effectively balances VLM features and VAE-based appearance information when trained with an appropriate VAE dropout probability. (ML: 0.88)
- The prompt used to extract reference-related words from the editing instruction is shown in Figure 7. (ML: 0.86)
- Qwen3-VL-30B-A3B-Instruct (QwenTeam, 2025) was used for instruction-reference alignment. (ML: 0.83)
- The images at the bottom visualize these grounded regions on the reference images. (ML: 0.73)
Abstract
Multi-subject image generation aims to synthesize images that faithfully preserve the identities of multiple reference subjects while following textual instructions. However, existing methods often suffer from identity inconsistency and limited compositional control, as they rely on diffusion models to implicitly associate text prompts with reference images. In this work, we propose Hierarchical Concept-to-Appearance Guidance (CAG), a framework that provides explicit, structured supervision from high-level concepts to fine-grained appearances. At the conceptual level, we introduce a VAE dropout training strategy that randomly omits reference VAE features, encouraging the model to rely more on robust semantic signals from a Visual Language Model (VLM) and thereby promoting consistent concept-level generation in the absence of complete appearance cues. At the appearance level, we integrate the VLM-derived correspondences into a correspondence-aware masked attention module within the Diffusion Transformer (DiT). This module restricts each text token to attend only to its matched reference regions, ensuring precise attribute binding and reliable multi-subject composition. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the multi-subject image generation, substantially improving prompt following and subject consistency.
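The correspondence-aware masked attention module admits a compact sketch. The shapes, names, and toy mask below are our assumptions for illustration; in the paper the module sits inside a DiT block:

```python
# Each text token may attend only to reference-image tokens inside its grounded
# region; all other positions are masked to -inf before the softmax.
import torch
import torch.nn.functional as F

def correspondence_masked_attention(q, k, v, allowed):
    """q: (T, d) text tokens; k, v: (N, d) reference tokens;
    allowed: (T, N) bool, True where token t's grounded region covers token n."""
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

T, N, d = 4, 16, 32
q, k, v = torch.randn(T, d), torch.randn(N, d), torch.randn(N, d)
allowed = torch.zeros(T, N, dtype=torch.bool)
allowed[0, :8] = True   # e.g. noun token 0 grounded to the first subject's region
allowed[1:, 8:] = True  # remaining tokens grounded to the second subject's region
out = correspondence_masked_attention(q, k, v, allowed)  # (T, d)
```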
Why are we recommending this paper?
Because "image and video generation" is a popular topic and you have fewer than three interests with available recommendations
We did not find much content matching your interests, so we have included some additional popular topics above.
Also, be aware that if a topic is not present on arXiv, we won't be able to recommend papers for it.
New York University
AI Insights - The authors emphasize that these challenges require a multidisciplinary approach involving computer science, medicine, and social sciences. (ML: 0.99)ππ
- It proposes a framework for evaluating their performance, focusing on three main aspects: Adaptation & Learning, Safety & Ethics, and Human-AI Collaboration. (ML: 0.99)ππ
- The paper discusses the challenges of using Large Language Models (LLMs) in healthcare settings. (ML: 0.98)ππ
- The paper discusses the challenges of deploying Large Language Models (LLMs) in healthcare settings, including drift detection, reinforcement-based adaptation, meta-learning, few-shot competence, safety, and ethics. (ML: 0.97)ππ
- Adaptation & Learning: This discipline involves keeping deployed agents calibrated to a moving world by detecting distributional and behavioral shifts, applying targeted mitigations before clinical performance erodes. (ML: 0.97)ππ
- The authors propose a framework for evaluating LLMs in healthcare, focusing on three main aspects: Adaptation & Learning, Safety & Ethics, and Human-AI Collaboration. (ML: 0.97)ππ
- Limited discussion on human-AI collaboration Recent studies document automatic prompt injection attacks [113] and organize defenses into clear taxonomies [114] The paper discusses the challenges of deploying Large Language Models (LLMs) in healthcare settings and proposes a framework for evaluating their performance. (ML: 0.95)ππ
- The paper highlights the importance of addressing drift detection, reinforcement-based adaptation, meta-learning, few-shot competence, safety, and ethics in healthcare LLMs. (ML: 0.94)ππ
Abstract
Large Language Model (LLM)-based agents that plan, use tools and act has begun to shape healthcare and medicine. Reported studies demonstrate competence on various tasks ranging from EHR analysis and differential diagnosis to treatment planning and research workflows. Yet the literature largely consists of overviews which are either broad surveys or narrow dives into a single capability (e.g., memory, planning, reasoning), leaving healthcare work without a common frame. We address this by reviewing 49 studies using a seven-dimensional taxonomy: Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology and Core Tasks & Subtasks with 29 operational sub-dimensions. Using explicit inclusion and exclusion criteria and a labeling rubric (Fully Implemented, Partially Implemented, Not Implemented), we map each study to the taxonomy and report quantitative summaries of capability prevalence and co-occurrence patterns. Our empirical analysis surfaces clear asymmetries. For instance, the External Knowledge Integration sub-dimension under Knowledge Management is commonly realized (~76% Fully Implemented) whereas Event-Triggered Activation sub-dimenison under Interaction Patterns is largely absent (~92% Not Implemented) and Drift Detection & Mitigation sub-dimension under Adaptation & Learning is rare (~98% Not Implemented). Architecturally, Multi-Agent Design sub-dimension under Framework Typology is the dominant pattern (~82% Fully Implemented) while orchestration layers remain mostly partial. Across Core Tasks & Subtasks, information centric capabilities lead e.g., Medical Question Answering & Decision Support and Benchmarking & Simulation, while action and discovery oriented areas such as Treatment Planning & Prescription still show substantial gaps (~59% Not Implemented).
Why we are recommending this paper?
Because ai agents is a popular topic and you have less than 3 interests with available recommendations
Maastricht University
AI Insights - The text also discusses the relationship between groundedness and maximization of complete and transitive preference relations. (ML: 0.97)ππ
- Some of the key concepts explored include consistency, monotonicity, and weak axiom of revealed preference (WARP). (ML: 0.97)ππ
- The results have implications for understanding rationalizability and groundedness in choice theory. (ML: 0.95)ππ
- GAIC: Grounded Axiom of Revealed Preference. (ML: 0.92)ππ
- A choice function c is said to satisfy GMAIC if it maximizes a complete and transitive preference relation over non-empty subsets of X. (ML: 0.91)ππ
- Groundedness: A choice function c satisfies groundedness if for all x β X, there exists a set S β X \{x such that I(S) = β
. (ML: 0.89)ππ
- GMAIC: Grounded Maximizing Axiom of Choice. (ML: 0.89)ππ
- The provided text provides a comprehensive proof of various theorems and propositions related to choice theory. (ML: 0.89)ππ
- The proofs cover topics such as injectivity, surjectivity, and double union closure of interpretation functions. (ML: 0.88)ππ
- A choice function c is said to satisfy GAIC if it satisfies groundedness and the corresponding interpretation I satisfies consistency, monotonicity, and WARP. (ML: 0.88)ππ
- The proofs demonstrate the relationship between different axioms and properties of choice functions. (ML: 0.86)ππ
- The provided text appears to be a proof of various theorems and propositions related to choice theory, specifically in the context of rationalizability and groundedness. (ML: 0.86)ππ
Abstract
This paper proposes a model of choice via agentic artificial intelligence (AI). A key feature is that the AI may misinterpret a menu before recommending what to choose. A single acyclicity condition guarantees that there is a monotonic interpretation and a strict preference relation that together rationalize the AI's recommendations. Since this preference is in general not unique, there is no safeguard against it misaligning with that of a decision maker. What enables the verification of such AI alignment is interpretations satisfying double monotonicity. Indeed, double monotonicity ensures full identifiability and internal consistency. But, an additional idempotence property is required to guarantee that recommendations are fully rational and remain grounded within the original feasible set.
Why we are recommending this paper?
Because ai agents is a popular topic and you have less than 3 interests with available recommendations
University of California, Riverside
AI Insights - A suitably broad understanding of intelligence is useful for recognizing the diverse and impressive capacities of different types of beings. (ML: 0.98)ππ
- Tests of a single capacity in an AI system should not be used to generalize about the system's overall capacities. (ML: 0.97)ππ
- Intelligence is not a single-dimensional concept, but rather a multidimensional one. (ML: 0.96)ππ
- The theory of strange intelligence invites us to pause when considering the ethical significance of intelligence, as correlations between intelligence and other ethically significant properties may be absent. (ML: 0.96)ππ
- multidimensional intelligence: Intelligence that cannot be reduced to a single dimension or concept. (ML: 0.94)ππ
- strange intelligence: A type of intelligence that operates very differently from typical humans. (ML: 0.94)ππ
- The linear model of progress in AI is rejected, and instead, a nonlinear approach to intelligence is proposed. (ML: 0.94)ππ
- AIs can be highly intelligent in their own distinct way, even if they don't possess human-like intelligence. (ML: 0.94)ππ
- AIs can be intelligent in their own distinct way, even if they don't possess human-like intelligence. (ML: 0.93)ππ
- general intelligence: The ability to achieve a broad range of goals in a broad range of environments, which is a matter of degree and always relative to the range of goals and environments of interest. (ML: 0.93)ππ
Abstract
We endorse and expand upon Susan Schneider's critique of the linear model of AI progress and introduce two novel concepts: "familiar intelligence" and "strange intelligence". AI intelligence is likely to be strange intelligence, defying familiar patterns of ability and inability, combining superhuman capacities in some domains with subhuman performance in other domains, and even within domains sometimes combining superhuman insight with surprising errors that few humans would make. We develop and defend a nonlinear model of intelligence on which "general intelligence" is not a unified capacity but instead the ability to achieve a broad range of goals in a broad range of environments, in a manner that defies nonarbitrary reduction to a single linear quantity. We conclude with implications for adversarial testing approaches to evaluating AI capacities. If AI is strange intelligence, we should expect that even the most capable systems will sometimes fail in seemingly obvious tasks. On a nonlinear model of AI intelligence, such errors on their own do not demonstrate a system's lack of outstanding general intelligence. Conversely, excellent performance on one type of task, such as an IQ test, cannot warrant assumptions of broad capacities beyond that task domain.
Why we are recommending this paper?
Because ai and society is a popular topic and you have less than 3 interests with available recommendations
Gratex International
AI Insights - Further research is needed to improve the quality and reliability of LLM-based systems, as well as to address issues such as hallucinations and data bias. (ML: 0.99)ππ
- They also discuss the challenges associated with using LLMs in software engineering, including the need for high-quality training data and the potential for hallucinations. (ML: 0.97)ππ
- Limited training data Potential for hallucinations Data bias (ML: 0.95)ππ
- The use of LLMs in software engineering has shown promising results, but there are still challenges associated with their adoption. (ML: 0.94)ππ
- The paper discusses the use of large language models (LLMs) in software engineering, specifically in the areas of test case generation and scenario-based GUI testing. (ML: 0.94)ππ
- LLMs: Large Language Models are a type of artificial intelligence model that can process and generate human-like text. (ML: 0.94)ππ
- Retrieval-augmented models: These models use LLMs to retrieve relevant information from a database or knowledge graph, which is then used to augment the LLM's output. (ML: 0.93)ππ
- The authors propose a framework that combines LLMs with retrieval-augmented models to generate test cases from natural language requirements. (ML: 0.90)ππ
Abstract
The introduction of large language models ignited great retooling and rethinking of the software development models. The ensuing response of software engineering research yielded a massive body of tools and approaches. In this paper, we join the hassle by introducing agentic AI solutions for two tasks. First, we developed a solution for automatic test scenario generation from a detailed requirements description. This approach relies on specialized worker agents forming a star topology with the supervisor agent in the middle. We demonstrate its capabilities on a real-world example. Second, we developed an agentic AI solution for the document retrieval task in the context of software engineering documents. Our solution enables performing various use cases on a body of documents related to the development of a single software, including search, question answering, tracking changes, and large document summarization. In this case, each use case is handled by a dedicated LLM-based agent, which performs all subtasks related to the corresponding use case. We conclude by hinting at the future perspectives of our line of research.
Why we are recommending this paper?
Because research automation with ai is a popular topic and you have less than 3 interests with available recommendations
JigsawStack, Inc
AI Insights - Small language models may not be as effective as large language models in certain tasks. (ML: 0.99)ππ
- The use of LLMs with tools may be limited by the availability of high-quality training data. (ML: 0.99)ππ
- Researchers are exploring the potential of multimodal safety classification, which may have significant implications for industries that rely on text-based data. (ML: 0.98)ππ
- Multimodal safety classification is a technique used to identify potential risks or hazards in text-based data. (ML: 0.98)ππ
- The use of small language models is gaining traction as a valuable plug-in for large language models. (ML: 0.95)ππ
- LLMs (Large Language Models) are artificial intelligence models that can process and generate human-like language. (ML: 0.94)ππ
- The use of LLMs with tools and small language models is becoming increasingly prevalent in various applications. (ML: 0.94)ππ
- There is a growing interest in multimodal safety classification, with the introduction of Llama Guard 4 by Meta AI. (ML: 0.89)ππ
- LLMs with tools are becoming increasingly popular, and researchers are exploring their potential in various applications. (ML: 0.88)ππ
- Agentic AI refers to artificial intelligence systems that can perform tasks autonomously, often requiring human oversight or intervention. (ML: 0.83)ππ
Abstract
We present Interfaze, a system that treats modern LLM applications as a problem of building and acting over context, not just picking the right monolithic model. Instead of a single transformer, we combine (i) a stack of heterogeneous DNNs paired with small language models as perception modules for OCR involving complex PDFs, charts and diagrams, and multilingual ASR with (ii) a context-construction layer that crawls, indexes, and parses external sources (web pages, code, PDFs) into compact structured state, and (iii) an action layer that can browse, retrieve, execute code in a sandbox, and drive a headless browser for dynamic web pages. A thin controller sits on top of this stack and exposes a single, OpenAI-style endpoint: it decides which small models and actions to run and always forwards the distilled context to a user-selected LLM that produces the final response.
On this architecture, Interfaze-Beta achieves 83.6% on MMLU-Pro, 91.4% on MMLU, 81.3% on GPQA-Diamond, 57.8% on LiveCodeBench v5, and 90.0% on AIME-2025, along with strong multimodal scores on MMMU (val) (77.3%), AI2D (91.5%), ChartQA (90.9%), and Common Voice v16 (90.8%). We show that most queries are handled primarily by the small-model and tool stack, with the large LLM operating only on distilled context, yielding competitive accuracy while shifting the bulk of computation away from the most expensive and monolithic models.
Why we are recommending this paper?
Because agi: artificial general intelligence is a popular topic and you have less than 3 interests with available recommendations
Carnegie Mellon University
AI Insights - Social cost of intelligence: The negative consequences of intelligent behavior in multi-agent systems, including the emergence, propagation, and amplification of stereotypical bias. (ML: 0.95)ππ
- The paper highlights the importance of considering the social cost of intelligence in multi-agent systems, including the emergence, propagation, and amplification of stereotypical bias. (ML: 0.93)ππ
- The authors propose a framework for evaluating and improving the performance of multi-agent systems using a combination of metrics such as accuracy, efficiency, and fairness. (ML: 0.92)ππ
- The authors emphasize the need for a deeper understanding of the social cost of intelligence in multi-agent systems and its impact on real-world applications. (ML: 0.92)ππ
- The paper concludes by highlighting the importance of continued research into the development of more effective and fair multi-agent systems. (ML: 0.91)ππ
- The paper suggests that future work should focus on developing more sophisticated evaluation metrics and methods for improving the performance of multi-agent systems. (ML: 0.89)ππ
- The paper's focus on theoretical concepts may limit its practical applications. (ML: 0.87)ππ
- Agent: An autonomous entity that can perceive its environment, make decisions, and take actions to achieve its goals. (ML: 0.86)ππ
- Multi-agent system: A system consisting of multiple agents that interact with each other to achieve a common goal. (ML: 0.82)ππ
- The paper discusses the concept of multi-agent systems, which involve multiple agents interacting with each other to achieve a common goal. (ML: 0.82)ππ
Abstract
While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes AgentArk, a novel framework to distill multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of multi-agent systems while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios: reasoning-enhanced fine-tuning; trajectory-based augmentation; and process-aware distillation. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work can shed light on future research on efficient and robust multi-agent development. Our code is at https://github.com/AIFrontierLab/AgentArk.
Why we are recommending this paper?
Because agi: artificial general intelligence is a popular topic and you have less than 3 interests with available recommendations
Luxembourg Institute of Science and Technology
AI Insights - Additionally, the generated neural network architectures may not always outperform state-of-the-art models in various tasks. (ML: 0.98)ππ
- They also discuss the limitations and challenges associated with this approach. (ML: 0.98)ππ
- This can help improve the performance of various machine learning tasks such as image classification, object detection, and natural language processing. (ML: 0.97)ππ
- However, it relies heavily on the capabilities of LLMs, which may not be available to all researchers or practitioners. (ML: 0.97)ππ
- The proposed method relies heavily on the capabilities of LLMs, which may not be available to all researchers or practitioners. (ML: 0.97)ππ
- The paper proposes a method for generating neural network architectures using large language models (LLMs). (ML: 0.92)ππ
- The authors cite several papers that demonstrate the effectiveness of using LLMs for generating neural network architectures. (ML: 0.92)ππ
- The authors demonstrate the effectiveness of their approach by generating neural network architectures that outperform state-of-the-art models in several tasks. (ML: 0.92)ππ
- The authors demonstrate the effectiveness of their approach by generating neural network architectures that outperform state-of-the-art models in several tasks. (ML: 0.92)ππ
- The paper presents a novel approach to generating neural network architectures using large language models (LLMs). (ML: 0.91)ππ
- The paper proposes a new way to generate neural network architectures using large language models (LLMs). (ML: 0.91)ππ
- The authors propose a method that leverages the capabilities of LLMs to generate neural network architectures, which can be used for various tasks such as image classification, object detection, and natural language processing. (ML: 0.91)ππ
- LLM: Large Language Model The proposed method for generating neural network architectures using LLMs is a promising approach that can be used to improve the performance of various machine learning tasks. (ML: 0.89)ππ
- The proposed method is based on a combination of two techniques: instruction-guided autoregressive neural network parameter generation and tabular data generation using agentic LLM methods. (ML: 0.85)ππ
Abstract
Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as NN code verification, refactoring, and migration. These tools play a crucial role in guaranteeing both the correctness and maintainability of neural network architectures, helping to prevent implementation errors, simplify model updates, and ensure that complex networks can be reliably extended and reused. Yet, assessing their effectiveness remains challenging due to the lack of publicly diverse datasets of neural networks that would allow systematic evaluation. To address this gap, we leverage large language models (LLMs) to automatically generate a dataset of neural networks that can serve as a benchmark for validation. The dataset is designed to cover diverse architectural components and to handle multiple input data types and tasks. In total, 608 samples are generated, each conforming to a set of precise design choices. To further ensure their consistency, we validate the correctness of the generated networks using static analysis and symbolic tracing. We make the dataset publicly available to support the community in advancing research on neural network reliability and adaptability.
Why we are recommending this paper?
Because deep learning is a popular topic and you have less than 3 interests with available recommendations
Beijing University of Posts and Telecommunications
AI Insights - Scale-invariant components: Components in neural networks that are invariant to scaling, such as BatchNorm layers. (ML: 0.92)
- Architecture-aware updates/projections: Updates and projections tailored to specific architectural components of the network, such as BatchNorm layers. (ML: 0.92)
- Curvature-adaptive radial step sizing: An adaptive scheme that adjusts the radial step size based on the curvature of the loss function. (ML: 0.89)
- The paper evaluates AdamO on CIFAR-100 and modular-arithmetic Grokking tasks, showing that it outperforms other optimization algorithms, including AdamW and AdamP. (ML: 0.88)
- Orthogonal dynamics: A method for optimizing neural networks by decoupling the update rules for different dimensions. (ML: 0.81)
- AdamO is designed to handle scale-invariant components in neural networks, such as BatchNorm, by using projections to suppress ineffective updates. (ML: 0.81)
- AdamO is a robust and effective optimization algorithm for deep learning tasks, particularly those involving scale-invariant components. (ML: 0.71)
- AdamO's performance is robust across a wide range of hyperparameters, making it easier to tune and use in practice. (ML: 0.67)
- The paper proposes a new adaptive optimization algorithm called AdamO, which fully decouples orthogonal dynamics, combining curvature-adaptive radial step sizing with architecture-aware updates and projections. (ML: 0.61)
Abstract
Is the standard weight decay in AdamW truly optimal? Although AdamW decouples weight decay from adaptive gradient scaling, a fundamental conflict remains: the Radial Tug-of-War. In deep learning, gradients tend to increase parameter norms to expand effective capacity while steering directions to learn features, whereas weight decay indiscriminately suppresses norm growth. This push-pull interaction induces radial oscillations, injecting noise into Adam's second-moment estimates and potentially degrading delicate tangential feature learning. We argue that magnitude and direction play distinct roles and should be decoupled in optimizer dynamics. We propose Orthogonal Dynamics Decoupling and instantiate it as AdamO: an SGD-style update handles the one-dimensional norm control, while Adam's adaptive preconditioning is confined to the tangential subspace. AdamO further incorporates curvature-adaptive radial step sizing and architecture-aware rules and projections for scale-invariant layers and low-dimensional parameters. Experiments on vision and language tasks show that AdamO improves generalization and stability over AdamW without introducing additional complex constraints.
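To make the orthogonal-dynamics idea concrete, here is a toy sketch of the radial/tangential split the abstract describes, assuming a single dense parameter tensor. The function name and hyperparameters are invented, bias correction is omitted, and the curvature-adaptive radial sizing and architecture-aware projections are left out, so this is one reading of the decomposition rather than the authors' AdamO.

import torch

def orthogonal_decoupled_step(p, g, state, lr=1e-3, radial_lr=0.1,
                              betas=(0.9, 0.999), eps=1e-8):
    # Split the gradient into a radial part (changes ||w||) and a
    # tangential part (steers direction), then update them separately.
    w = p.flatten()
    r = w / (w.norm() + eps)          # unit radial direction
    g_flat = g.flatten()
    g_rad = (g_flat @ r) * r          # component along w: norm control
    g_tan = g_flat - g_rad            # orthogonal component: feature learning

    # Adam-style moments are maintained only for the tangential subspace,
    # so radial oscillations no longer pollute the second-moment estimate.
    state["m"] = betas[0] * state["m"] + (1 - betas[0]) * g_tan
    state["v"] = betas[1] * state["v"] + (1 - betas[1]) * g_tan ** 2
    adam_tan = state["m"] / (state["v"].sqrt() + eps)

    # SGD-style step on the (one-dimensional) norm, Adam on the direction.
    w = w - radial_lr * g_rad - lr * adam_tan
    p.copy_(w.view_as(p))

p = torch.randn(4, 4)
state = {"m": torch.zeros(16), "v": torch.zeros(16)}
orthogonal_decoupled_step(p, torch.randn(4, 4), state)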
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations.
Aalto University
AI Insights - Ground-truth intents: The actual intentions or goals of a user, used as a reference for evaluating the performance of a system. (ML: 0.98)
- To avoid learning and ordering effects, we adopted a between-subjects design where each participant was assigned one condition and one task. (ML: 0.97)
- The user study was deployed on a web-based interface. (ML: 0.97)
- We conducted a power analysis using G*Power for a two-condition between-subjects design. (ML: 0.97)
- Each task was used equally often per condition. (ML: 0.95)
- Participants were given a task scenario from a set of eight design types: interior design, painting, photography, app icon design, poster design, logo design, fashion design, and architectural style design. (ML: 0.94)
- Between-subjects design: A research design in which each participant is assigned to only one group or condition, and the groups are compared to each other. (ML: 0.93)
- LLM tools: Tools built on large language models that generate text from input prompts. (ML: 0.92)
- The study evaluates APE using a prototyped user interface that reflects how the system would be deployed in practice, and compares it against manual prompt engineering, a baseline condition representing the current standard in which users iteratively refine textual prompts. (ML: 0.90)
- The required sample size was estimated at 128 participants. (ML: 0.90)
- Text-to-image generation: A technique in computer vision that generates images from text-based descriptions. (ML: 0.85)
- The server side integrated our APE algorithm, supporting real-time interaction with the text-to-image model. (ML: 0.78)
Abstract
Aligning text-to-image generation with user intent remains challenging for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.
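The "interactive intent inference under an information-theoretic framework" can plausibly be read as choosing, at each turn, the visual query with the highest expected information gain about the latent intent. A toy sketch of that selection rule follows; the intent set, the queries, and all probabilities are invented for illustration and are not APE's actual model.

import math

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

def expected_info_gain(prior, likelihoods):
    # prior[i] = P(intent_i); likelihoods[a][i] = P(answer a | intent_i).
    # EIG = H(prior) - E_a[ H(posterior | answer a) ].
    gain = entropy(prior)
    for lik in likelihoods:
        joint = [l * p for l, p in zip(lik, prior)]
        p_ans = sum(joint)
        if p_ans > 0:
            posterior = [j / p_ans for j in joint]
            gain -= p_ans * entropy(posterior)
    return gain

# Two candidate visual queries over three latent intents.
prior = [0.5, 0.3, 0.2]
query_a = [[0.9, 0.1, 0.1],   # answer 1: likely only under intent 0
           [0.1, 0.9, 0.9]]   # answer 2: likely under intents 1 and 2
query_b = [[0.5, 0.5, 0.5],   # both answers equally likely everywhere:
           [0.5, 0.5, 0.5]]   # this query is uninformative
best = max([query_a, query_b], key=lambda q: expected_info_gain(prior, q))
print(best is query_a)  # True: ask the query that best splits the intents

In APE itself the intent space is built from language-model priors over interpretable feature requirements rather than a hand-listed set, but the greedy pick-the-most-informative-query loop is the general shape the abstract suggests.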
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations.
Peking University
AI Insights - It then localizes the visual regions associated with each extracted word. (ML: 0.94)
- The VLM first parses the editing instruction and extracts all nouns that correspond to the reference images. (ML: 0.92)
- The prompt used to predict the bounding boxes associated with each extracted word is presented in Figure 8. (ML: 0.90)
- Key components: instruction-reference alignment, correspondence-aware masked attention, and VAE dropout. The model effectively balances VLM features and VAE-based appearance information when trained with an appropriate VAE dropout probability. (ML: 0.88)
- The prompt used to extract reference-related words from the editing instruction is shown in Figure 7. (ML: 0.86)
- Qwen3-VL-30B-A3B-Instruct (QwenTeam, 2025) was used for Instruction-Reference Alignment. (ML: 0.83)
- The images at the bottom visualize these grounded regions on the reference images. (ML: 0.73)
Abstract
Multi-subject image generation aims to synthesize images that faithfully preserve the identities of multiple reference subjects while following textual instructions. However, existing methods often suffer from identity inconsistency and limited compositional control, as they rely on diffusion models to implicitly associate text prompts with reference images. In this work, we propose Hierarchical Concept-to-Appearance Guidance (CAG), a framework that provides explicit, structured supervision from high-level concepts to fine-grained appearances. At the conceptual level, we introduce a VAE dropout training strategy that randomly omits reference VAE features, encouraging the model to rely more on robust semantic signals from a Visual Language Model (VLM) and thereby promoting consistent concept-level generation in the absence of complete appearance cues. At the appearance level, we integrate the VLM-derived correspondences into a correspondence-aware masked attention module within the Diffusion Transformer (DiT). This module restricts each text token to attend only to its matched reference regions, ensuring precise attribute binding and reliable multi-subject composition. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multi-subject image generation, substantially improving prompt following and subject consistency.
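The correspondence-aware masked attention is the most mechanical piece of CAG: each text token may attend only to the reference regions the VLM grounded it to. A minimal sketch of that masking idea in PyTorch follows; the shapes, the standalone function, and the hand-built mask are illustrative assumptions, whereas the real module operates inside the DiT's attention layers.

import torch
import torch.nn.functional as F

def correspondence_masked_attention(text_q, ref_kv, token_to_region):
    # text_q: (T, d) text-token queries; ref_kv: (R, d) reference-region
    # keys/values; token_to_region: (T, R) bool, True where attention allowed.
    scores = text_q @ ref_kv.T / text_q.shape[-1] ** 0.5
    scores = scores.masked_fill(~token_to_region, float("-inf"))
    return F.softmax(scores, dim=-1) @ ref_kv

T, R, d = 4, 6, 8
mask = torch.zeros(T, R, dtype=torch.bool)
mask[0, :3] = True    # e.g. the token "dog" is grounded to regions 0-2
mask[1:, 3:] = True   # remaining tokens bind to the other subject's regions
out = correspondence_masked_attention(torch.randn(T, d),
                                      torch.randn(R, d), mask)
print(out.shape)  # torch.Size([4, 8])

Masking at the score level enforces the attribute binding the abstract describes: a token simply cannot mix in appearance features from an unmatched subject's regions.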
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations.
Interests not found
We did not find any papers that match the interests below.
Try other terms, and consider whether the content exists on arxiv.org.
- Inequality
- Poverty
- Economic Inequality
Help Shape Our Pricing
We're exploring pricing options to make this project sustainable. Take 3 minutes to share what you'd be willing to pay (if anything). Your input guides our future investment.
Share Your Feedback
Help us improve your experience!
This project is in its early stages; your feedback can be pivotal to its future.
Let us know what you think about this week's papers and suggestions!
Give Feedback