Hi!
Your personalized paper recommendations for 2–6 February 2026.
New York University
AI Insights - The authors emphasize that these challenges require a multidisciplinary approach involving computer science, medicine, and social sciences. (ML: 0.99)👍👎
- It proposes a framework for evaluating their performance, focusing on three main aspects: Adaptation & Learning, Safety & Ethics, and Human-AI Collaboration. (ML: 0.99)👍👎
- The paper discusses the challenges of using Large Language Models (LLMs) in healthcare settings. (ML: 0.98)👍👎
- The paper discusses the challenges of deploying Large Language Models (LLMs) in healthcare settings, including drift detection, reinforcement-based adaptation, meta-learning, few-shot competence, safety, and ethics. (ML: 0.97)👍👎
- Adaptation & Learning: This discipline involves keeping deployed agents calibrated to a moving world by detecting distributional and behavioral shifts and applying targeted mitigations before clinical performance erodes. (ML: 0.97)👍👎
- The authors propose a framework for evaluating LLMs in healthcare, focusing on three main aspects: Adaptation & Learning, Safety & Ethics, and Human-AI Collaboration. (ML: 0.97)👍👎
- Limited discussion on human-AI collaboration. Recent studies document automatic prompt injection attacks [113] and organize defenses into clear taxonomies [114]. The paper discusses the challenges of deploying Large Language Models (LLMs) in healthcare settings and proposes a framework for evaluating their performance. (ML: 0.95)👍👎
- The paper highlights the importance of addressing drift detection, reinforcement-based adaptation, meta-learning, few-shot competence, safety, and ethics in healthcare LLMs. (ML: 0.94)👍👎
Abstract
Large Language Model (LLM)-based agents that plan, use tools and act have begun to shape healthcare and medicine. Reported studies demonstrate competence on various tasks ranging from EHR analysis and differential diagnosis to treatment planning and research workflows. Yet the literature largely consists of overviews which are either broad surveys or narrow dives into a single capability (e.g., memory, planning, reasoning), leaving healthcare work without a common frame. We address this by reviewing 49 studies using a seven-dimensional taxonomy: Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology and Core Tasks & Subtasks with 29 operational sub-dimensions. Using explicit inclusion and exclusion criteria and a labeling rubric (Fully Implemented, Partially Implemented, Not Implemented), we map each study to the taxonomy and report quantitative summaries of capability prevalence and co-occurrence patterns. Our empirical analysis surfaces clear asymmetries. For instance, the External Knowledge Integration sub-dimension under Knowledge Management is commonly realized (~76% Fully Implemented) whereas the Event-Triggered Activation sub-dimension under Interaction Patterns is largely absent (~92% Not Implemented) and the Drift Detection & Mitigation sub-dimension under Adaptation & Learning is rare (~98% Not Implemented). Architecturally, the Multi-Agent Design sub-dimension under Framework Typology is the dominant pattern (~82% Fully Implemented) while orchestration layers remain mostly partial. Across Core Tasks & Subtasks, information-centric capabilities lead (e.g., Medical Question Answering & Decision Support and Benchmarking & Simulation), while action- and discovery-oriented areas such as Treatment Planning & Prescription still show substantial gaps (~59% Not Implemented).
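The ~76% / ~92% / ~98% figures in the abstract come from tallying rubric labels across the 49 reviewed studies. A minimal sketch of that tally (the label counts below are hypothetical, chosen only to reproduce the ~76% figure, not taken from the paper's data):

```python
from collections import Counter

# Hypothetical labels for one sub-dimension across the 49 reviewed studies,
# using the paper's rubric: Fully / Partially / Not Implemented.
labels = (
    ["Fully Implemented"] * 37
    + ["Partially Implemented"] * 8
    + ["Not Implemented"] * 4
)

def prevalence(labels):
    """Return the share of each rubric label, rounded to a whole percent."""
    counts = Counter(labels)
    total = len(labels)
    return {k: round(100 * v / total) for k, v in counts.items()}

print(prevalence(labels))
# e.g. {'Fully Implemented': 76, 'Partially Implemented': 16, 'Not Implemented': 8}
```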
Why are we recommending this paper?
Because AI agents are a popular topic and you have fewer than 3 interests with available recommendations
This paper directly addresses the application of AI, specifically LLMs, within a critical domain – healthcare – aligning with the user’s interest in AI for product management. The taxonomy provides a valuable framework for evaluating agentic AI systems, which is relevant to strategic vision setting.
Gratex International
AI Insights - Further research is needed to improve the quality and reliability of LLM-based systems, as well as to address issues such as hallucinations and data bias. (ML: 0.99)👍👎
- They also discuss the challenges associated with using LLMs in software engineering, including the need for high-quality training data and the potential for hallucinations. (ML: 0.97)👍👎
- Limited training data; potential for hallucinations; data bias. (ML: 0.95)👍👎
- The use of LLMs in software engineering has shown promising results, but there are still challenges associated with their adoption. (ML: 0.94)👍👎
- The paper discusses the use of large language models (LLMs) in software engineering, specifically in the areas of test case generation and scenario-based GUI testing. (ML: 0.94)👍👎
- LLMs: Large Language Models are a type of artificial intelligence model that can process and generate human-like text. (ML: 0.94)👍👎
- Retrieval-augmented models: These models use LLMs to retrieve relevant information from a database or knowledge graph, which is then used to augment the LLM's output. (ML: 0.93)👍👎
- The authors propose a framework that combines LLMs with retrieval-augmented models to generate test cases from natural language requirements. (ML: 0.90)👍👎
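The retrieve-then-augment pattern behind the framework described above can be sketched with a toy term-overlap retriever (the scoring, documents, and prompt format here are illustrative assumptions, not the paper's implementation):

```python
def retrieve(query, documents, k=2):
    """Rank documents by naive term overlap with the query; return the top-k.
    Real systems use embeddings or a knowledge graph, not raw word overlap."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, documents):
    """Augment the query with retrieved context before handing it to an LLM."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Test cases are derived from the requirements document.",
    "The build pipeline runs nightly.",
    "Requirements describe the login scenario in detail.",
]
print(build_prompt("Which requirements cover the login scenario?", docs))
```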
Abstract
The introduction of large language models ignited great retooling and rethinking of the software development models. The ensuing response of software engineering research yielded a massive body of tools and approaches. In this paper, we join the hassle by introducing agentic AI solutions for two tasks. First, we developed a solution for automatic test scenario generation from a detailed requirements description. This approach relies on specialized worker agents forming a star topology with the supervisor agent in the middle. We demonstrate its capabilities on a real-world example. Second, we developed an agentic AI solution for the document retrieval task in the context of software engineering documents. Our solution enables performing various use cases on a body of documents related to the development of a single software, including search, question answering, tracking changes, and large document summarization. In this case, each use case is handled by a dedicated LLM-based agent, which performs all subtasks related to the corresponding use case. We conclude by hinting at the future perspectives of our line of research.
Why are we recommending this paper?
Because research automation with AI is a popular topic and you have fewer than 3 interests with available recommendations
Given the user’s focus on product roadmap and product strategy, this paper’s exploration of agentic AI tools for software engineering tasks is highly pertinent. The demonstration on document retrieval and test scenario generation offers practical insights into AI-driven development processes.
Maastricht University
AI Insights - The text also discusses the relationship between groundedness and maximization of complete and transitive preference relations. (ML: 0.97)👍👎
- Some of the key concepts explored include consistency, monotonicity, and weak axiom of revealed preference (WARP). (ML: 0.97)👍👎
- The results have implications for understanding rationalizability and groundedness in choice theory. (ML: 0.95)👍👎
- GAIC: Grounded Axiom of Revealed Preference. (ML: 0.92)👍👎
- A choice function c is said to satisfy GMAIC if it maximizes a complete and transitive preference relation over non-empty subsets of X. (ML: 0.91)👍👎
- Groundedness: A choice function c satisfies groundedness if for all x ∈ X, there exists a set S ⊆ X \ {x} such that I(S) = ∅. (ML: 0.89)👍👎
- GMAIC: Grounded Maximizing Axiom of Choice. (ML: 0.89)👍👎
- The provided text provides a comprehensive proof of various theorems and propositions related to choice theory. (ML: 0.89)👍👎
- The proofs cover topics such as injectivity, surjectivity, and double union closure of interpretation functions. (ML: 0.88)👍👎
- A choice function c is said to satisfy GAIC if it satisfies groundedness and the corresponding interpretation I satisfies consistency, monotonicity, and WARP. (ML: 0.88)👍👎
- The proofs demonstrate the relationship between different axioms and properties of choice functions. (ML: 0.86)👍👎
- The provided text appears to be a proof of various theorems and propositions related to choice theory, specifically in the context of rationalizability and groundedness. (ML: 0.86)👍👎
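The WARP condition named in the insights above can be checked mechanically on a finite alternative set. A toy sketch (the set X, the ranking, and the chooser are hypothetical illustrations, not the paper's formalism):

```python
from itertools import combinations

def menus(X):
    """All non-empty menus (subsets) over a finite alternative set X."""
    return [frozenset(c) for r in range(1, len(X) + 1) for c in combinations(X, r)]

def satisfies_warp(X, c):
    """Weak Axiom of Revealed Preference: if x is chosen while y is available
    in some menu, then y is never chosen over an available-but-rejected x
    in any other menu."""
    for S in menus(X):
        for T in menus(X):
            for x in c(S):
                for y in S:
                    if y in T and x in T and y in c(T) and x not in c(T):
                        return False
    return True

X = {"a", "b", "c"}
rank = {"a": 3, "b": 2, "c": 1}            # a strict preference: a > b > c
maximize = lambda S: {max(S, key=rank.get)}  # choose the most-preferred item

print(satisfies_warp(X, maximize))  # a preference maximizer satisfies WARP
```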
Abstract
This paper proposes a model of choice via agentic artificial intelligence (AI). A key feature is that the AI may misinterpret a menu before recommending what to choose. A single acyclicity condition guarantees that there is a monotonic interpretation and a strict preference relation that together rationalize the AI's recommendations. Since this preference is in general not unique, there is no safeguard against it misaligning with that of a decision maker. What enables the verification of such AI alignment is interpretations satisfying double monotonicity. Indeed, double monotonicity ensures full identifiability and internal consistency. But, an additional idempotence property is required to guarantee that recommendations are fully rational and remain grounded within the original feasible set.
Why are we recommending this paper?
Due to your interest in AI for Product Management
This paper’s investigation into AI models making choices and potential misinterpretations is directly relevant to the user’s interest in product strategy and vision setting. Understanding how AI can influence decision-making is crucial for effective product management.
New Jersey Institute of Technology
AI Insights - The authors compared the similarity of classification probability vectors for models across the spectrum using the Jensen-Shannon divergence. (ML: 0.98)👍👎
- Jensen-Shannon divergence: A measure of similarity between two probability distributions. (ML: 0.96)👍👎
- The paper assumes that the IB module is applied independently per attention head and per token, which may not be the case in practice. (ML: 0.94)👍👎
- KL-divergence term: A regularization term used during training to encourage the model to produce outputs with lower entropy. (ML: 0.93)👍👎
- The paper demonstrates that controlling information flow in ViT can lead to improved performance and interpretability. (ML: 0.92)👍👎
- The IB module is applied independently per attention head and per token, and returns both the transformed messages and a KL-divergence term used for regularization during training. (ML: 0.90)👍👎
- The use of a KL-divergence term as a regularization term may not always lead to better results. (ML: 0.86)👍👎
- Variational Information Bottleneck (IB) module: A module that applies an information bottleneck to the attention mechanism, reducing the dimensionality of the input while preserving important information. (ML: 0.83)👍👎
- The paper focuses on controlling information flow in Vision Transformers (ViT) by introducing a variational information bottleneck (IB) module. (ML: 0.83)👍👎
- The use of a variational IB module allows for more efficient processing of visual data, reducing the need for large models or high computational resources. (ML: 0.82)👍👎
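The Jensen-Shannon comparison mentioned in the insights above is straightforward to compute. A self-contained sketch using base-2 logarithms, so the divergence lies in [0, 1] (the probability vectors are made up for illustration):

```python
from math import log2

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two classification probability vectors over the same classes.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
print(js_divergence(p, q))  # symmetric: same value with arguments swapped
print(js_divergence(p, p))  # identical distributions -> 0.0
```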
Abstract
We make the information transmitted by attention an explicit, measurable quantity in vision transformers. By inserting variational information bottlenecks on all attention-mediated writes to the residual stream -- without other architectural changes -- we train models with an explicit information cost and obtain a controllable spectrum from independent patch processing to fully expressive global attention. On ImageNet-100, we characterize how classification behavior and information routing evolve across this spectrum, and provide initial insights into how global visual representations emerge from local patch processing by analyzing the first attention heads that transmit information. By biasing learning toward solutions with constrained internal communication, our approach yields models that are more tractable for mechanistic analysis and more amenable to control.
Why are we recommending this paper?
Due to your interest in Vision Setting for Tech Teams
The exploration of information flow control in vision transformers aligns with the user's interest in AI for product management, particularly concerning the underlying technologies driving innovation. Understanding how AI processes information is fundamental to strategic product decisions.
JigsawStack, Inc
AI Insights - Small language models may not be as effective as large language models in certain tasks. (ML: 0.99)👍👎
- The use of LLMs with tools may be limited by the availability of high-quality training data. (ML: 0.99)👍👎
- Researchers are exploring the potential of multimodal safety classification, which may have significant implications for industries that rely on text-based data. (ML: 0.98)👍👎
- Multimodal safety classification is a technique used to identify potential risks or hazards across data modalities such as text and images. (ML: 0.98)👍👎
- The use of small language models is gaining traction as a valuable plug-in for large language models. (ML: 0.95)👍👎
- LLMs (Large Language Models) are artificial intelligence models that can process and generate human-like language. (ML: 0.94)👍👎
- The use of LLMs with tools and small language models is becoming increasingly prevalent in various applications. (ML: 0.94)👍👎
- There is a growing interest in multimodal safety classification, with the introduction of Llama Guard 4 by Meta AI. (ML: 0.89)👍👎
- LLMs with tools are becoming increasingly popular, and researchers are exploring their potential in various applications. (ML: 0.88)👍👎
- Agentic AI refers to artificial intelligence systems that can perform tasks autonomously, often with minimal human oversight or intervention. (ML: 0.83)👍👎
Abstract
We present Interfaze, a system that treats modern LLM applications as a problem of building and acting over context, not just picking the right monolithic model. Instead of a single transformer, we combine (i) a stack of heterogeneous DNNs paired with small language models as perception modules for OCR involving complex PDFs, charts and diagrams, and multilingual ASR with (ii) a context-construction layer that crawls, indexes, and parses external sources (web pages, code, PDFs) into compact structured state, and (iii) an action layer that can browse, retrieve, execute code in a sandbox, and drive a headless browser for dynamic web pages. A thin controller sits on top of this stack and exposes a single, OpenAI-style endpoint: it decides which small models and actions to run and always forwards the distilled context to a user-selected LLM that produces the final response.
On this architecture, Interfaze-Beta achieves 83.6% on MMLU-Pro, 91.4% on MMLU, 81.3% on GPQA-Diamond, 57.8% on LiveCodeBench v5, and 90.0% on AIME-2025, along with strong multimodal scores on MMMU (val) (77.3%), AI2D (91.5%), ChartQA (90.9%), and Common Voice v16 (90.8%). We show that most queries are handled primarily by the small-model and tool stack, with the large LLM operating only on distilled context, yielding competitive accuracy while shifting the bulk of computation away from the most expensive and monolithic models.
Why are we recommending this paper?
Because AGI (artificial general intelligence) is a popular topic and you have fewer than 3 interests with available recommendations
This paper’s focus on a modular approach to AI, utilizing small models for specific tasks, directly addresses the need for efficient and adaptable AI systems – a key consideration for product roadmaps. The approach supports the user’s interest in leveraging AI for product management.
National Institute of Advanced Industrial Science and Technology
AI Insights - However, the authors acknowledge some limitations of their approach, such as the need for high-quality data and the potential for bias in the LLMs. (ML: 0.99)👍👎
- The paper discusses the application of large language models (LLMs) in supply chain management, specifically in inventory management. (ML: 0.95)👍👎
- They suggest that future research should focus on addressing these limitations and exploring other applications of LLMs in supply chain management. (ML: 0.93)👍👎
- The paper also discusses the potential benefits of using LLMs in supply chain management, including improved decision-making and reduced costs. (ML: 0.92)👍👎
- They compare the performance of InvAgent with other existing methods and find that it outperforms them in terms of accuracy and efficiency. (ML: 0.90)👍👎
- Large Language Models (LLMs): Deep learning models that are trained on vast amounts of text data to generate human-like language. (ML: 0.89)👍👎
- Supply Chain Management: The management of the flow of goods, services, and information from raw materials to end customers. (ML: 0.85)👍👎
- Multi-Agent System: A system composed of multiple agents that interact with each other and their environment to achieve a common goal. (ML: 0.83)👍👎
- The authors propose a multi-agent system based on LLMs for inventory management, which they call InvAgent. (ML: 0.77)👍👎
- Inventory Management: The process of managing the flow of goods, materials, and supplies into and out of an organization. (ML: 0.73)👍👎
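The safe-stock strategy referenced in the abstract below is commonly realized as an order-up-to (base-stock) rule. A minimal sketch (parameter values are illustrative; this is not InvAgent's prompt-encoded policy):

```python
def order_quantity(on_hand, on_order, backorders, base_stock):
    """Order-up-to rule: raise the inventory position to the base-stock level."""
    inventory_position = on_hand + on_order - backorders
    return max(0, base_stock - inventory_position)

def base_stock_level(mean_demand, lead_time, safety_stock):
    """Cover expected demand over the lead time plus a safety-stock buffer."""
    return mean_demand * lead_time + safety_stock

level = base_stock_level(mean_demand=20, lead_time=2, safety_stock=10)  # 50
print(order_quantity(on_hand=15, on_order=5, backorders=0, base_stock=level))  # 30
```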
Abstract
This study investigates large language model (LLM) -based multi-agent systems (MASs) as a promising approach to inventory management, which is a key component of supply chain management. Although these systems have gained considerable attention for their potential to address the challenges associated with typical inventory management methods, key uncertainties regarding their effectiveness persist. Specifically, it is unclear whether LLM-based MASs can consistently derive optimal ordering policies and adapt to diverse supply chain scenarios. To address these questions, we examine an LLM-based MAS with a fixed-ordering strategy prompt that encodes the stepwise processes of the problem setting and a safe-stock strategy commonly used in inventory management. Our empirical results demonstrate that, even without detailed prompt adjustments, an LLM-based MAS can determine optimal ordering decisions in a restricted scenario. To enhance adaptability, we propose a novel agent called AIM-RM, which leverages similar historical experiences through similarity matching. Our results show that AIM-RM outperforms benchmark methods across various supply chain scenarios, highlighting its robustness and adaptability.
Why are we recommending this paper?
Due to your interest in Product Management
University of Copenhagen
AI Insights - Scaffolding: Establishing conditions that enable a system or agent to make an adequate contribution. (ML: 0.97)👍👎
- Perceptual grounding: The process of establishing sensory correspondences between differentially perceived properties of an object. (ML: 0.95)👍👎
- Laura scaffolds GPT by establishing conditions that enable it to make an adequate contribution, but GPT neither responds to the ongoing course of Laura's actions nor provides her with additional resources to 'go on'. (ML: 0.95)👍👎
- Laura and GPT do not achieve co-operative actions in Goodwin's sense, where each participant's action responds and builds in real time on the other participant's hearable and/or embodied conduct. (ML: 0.90)👍👎
- Cooperative actions: Each participant's action responds and builds in real time on the other participant's hearable and/or embodied conduct. (ML: 0.88)👍👎
- GPT provides no resources that Laura can use to improve their mutual perception of the object being inspected. (ML: 0.85)👍👎
- The inspection sequence must each time be started over by Laura as she requests a new visual account from GPT. (ML: 0.84)👍👎
- The interaction between Laura and GPT is characterized by a one-sided distribution of work, where Laura systematically initiates modifications to GPT's perspective without any guidance from GPT. (ML: 0.81)👍👎
Abstract
Does human-AI assistance unfold in the same way as human-human assistance? This research explores what can be learned from the expertise of blind individuals and sighted volunteers to inform the design of multimodal voice agents and address the enduring challenge of proactivity. Drawing on granular analysis of two representative fragments from a larger corpus, we contrast the practices co-produced by an experienced human remote sighted assistant and a blind participant-as they collaborate to find a stain on a blanket over the phone-with those achieved when the same participant worked with a multimodal voice agent on the same task, a few moments earlier. This comparison enables us to specify precisely which fundamental proactive practices the agent did not enact in situ. We conclude that, so long as multimodal voice agents cannot produce environmentally occasioned vision-based actions, they will lack a key resource relied upon by human remote sighted assistants.
Why are we recommending this paper?
Due to your interest in Vision Setting for Tech Teams
University of California, Riverside
AI Insights - A suitably broad understanding of intelligence is useful for recognizing the diverse and impressive capacities of different types of beings. (ML: 0.98)👍👎
- Tests of a single capacity in an AI system should not be used to generalize about the system's overall capacities. (ML: 0.97)👍👎
- Intelligence is not a single-dimensional concept, but rather a multidimensional one. (ML: 0.96)👍👎
- The theory of strange intelligence invites us to pause when considering the ethical significance of intelligence, as correlations between intelligence and other ethically significant properties may be absent. (ML: 0.96)👍👎
- multidimensional intelligence: Intelligence that cannot be reduced to a single dimension or concept. (ML: 0.94)👍👎
- strange intelligence: A type of intelligence that operates very differently from typical humans. (ML: 0.94)👍👎
- The linear model of progress in AI is rejected, and instead, a nonlinear approach to intelligence is proposed. (ML: 0.94)👍👎
- AIs can be highly intelligent in their own distinct way, even if they don't possess human-like intelligence. (ML: 0.94)👍👎
- general intelligence: The ability to achieve a broad range of goals in a broad range of environments, which is a matter of degree and always relative to the range of goals and environments of interest. (ML: 0.93)👍👎
Abstract
We endorse and expand upon Susan Schneider's critique of the linear model of AI progress and introduce two novel concepts: "familiar intelligence" and "strange intelligence". AI intelligence is likely to be strange intelligence, defying familiar patterns of ability and inability, combining superhuman capacities in some domains with subhuman performance in other domains, and even within domains sometimes combining superhuman insight with surprising errors that few humans would make. We develop and defend a nonlinear model of intelligence on which "general intelligence" is not a unified capacity but instead the ability to achieve a broad range of goals in a broad range of environments, in a manner that defies nonarbitrary reduction to a single linear quantity. We conclude with implications for adversarial testing approaches to evaluating AI capacities. If AI is strange intelligence, we should expect that even the most capable systems will sometimes fail in seemingly obvious tasks. On a nonlinear model of AI intelligence, such errors on their own do not demonstrate a system's lack of outstanding general intelligence. Conversely, excellent performance on one type of task, such as an IQ test, cannot warrant assumptions of broad capacities beyond that task domain.
Why are we recommending this paper?
Because AI and society is a popular topic and you have fewer than 3 interests with available recommendations
Carnegie Mellon University
AI Insights - Social cost of intelligence: The negative consequences of intelligent behavior in multi-agent systems, including the emergence, propagation, and amplification of stereotypical bias. (ML: 0.95)👍👎
- The paper highlights the importance of considering the social cost of intelligence in multi-agent systems, including the emergence, propagation, and amplification of stereotypical bias. (ML: 0.93)👍👎
- The authors propose a framework for evaluating and improving the performance of multi-agent systems using a combination of metrics such as accuracy, efficiency, and fairness. (ML: 0.92)👍👎
- The authors emphasize the need for a deeper understanding of the social cost of intelligence in multi-agent systems and its impact on real-world applications. (ML: 0.92)👍👎
- The paper concludes by highlighting the importance of continued research into the development of more effective and fair multi-agent systems. (ML: 0.91)👍👎
- The paper suggests that future work should focus on developing more sophisticated evaluation metrics and methods for improving the performance of multi-agent systems. (ML: 0.89)👍👎
- The paper's focus on theoretical concepts may limit its practical applications. (ML: 0.87)👍👎
- Agent: An autonomous entity that can perceive its environment, make decisions, and take actions to achieve its goals. (ML: 0.86)👍👎
- Multi-agent system: A system consisting of multiple agents that interact with each other to achieve a common goal. (ML: 0.82)👍👎
- The paper discusses the concept of multi-agent systems, which involve multiple agents interacting with each other to achieve a common goal. (ML: 0.82)👍👎
Abstract
While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes AgentArk, a novel framework to distill multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of multi-agent systems while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios: reasoning-enhanced fine-tuning; trajectory-based augmentation; and process-aware distillation. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work can shed light on future research on efficient and robust multi-agent development. Our code is at https://github.com/AIFrontierLab/AgentArk.
Why are we recommending this paper?
Because AGI (artificial general intelligence) is a popular topic and you have fewer than 3 interests with available recommendations
Luxembourg Institute of Science and Technology
AI Insights - Additionally, the generated neural network architectures may not always outperform state-of-the-art models in various tasks. (ML: 0.98)👍👎
- They also discuss the limitations and challenges associated with this approach. (ML: 0.98)👍👎
- This can help improve the performance of various machine learning tasks such as image classification, object detection, and natural language processing. (ML: 0.97)👍👎
- However, it relies heavily on the capabilities of LLMs, which may not be available to all researchers or practitioners. (ML: 0.97)👍👎
- The paper proposes a method for generating neural network architectures using large language models (LLMs). (ML: 0.92)👍👎
- The authors cite several papers that demonstrate the effectiveness of using LLMs for generating neural network architectures. (ML: 0.92)👍👎
- The authors demonstrate the effectiveness of their approach by generating neural network architectures that outperform state-of-the-art models in several tasks. (ML: 0.92)👍👎
- The paper presents a novel approach to generating neural network architectures using large language models (LLMs). (ML: 0.91)👍👎
- The authors propose a method that leverages the capabilities of LLMs to generate neural network architectures, which can be used for various tasks such as image classification, object detection, and natural language processing. (ML: 0.91)👍👎
- LLM: Large Language Model. The proposed method for generating neural network architectures using LLMs is a promising approach that can be used to improve the performance of various machine learning tasks. (ML: 0.89)👍👎
- The proposed method is based on a combination of two techniques: instruction-guided autoregressive neural network parameter generation and tabular data generation using agentic LLM methods. (ML: 0.85)👍👎
Abstract
Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as NN code verification, refactoring, and migration. These tools play a crucial role in guaranteeing both the correctness and maintainability of neural network architectures, helping to prevent implementation errors, simplify model updates, and ensure that complex networks can be reliably extended and reused. Yet, assessing their effectiveness remains challenging due to the lack of publicly diverse datasets of neural networks that would allow systematic evaluation. To address this gap, we leverage large language models (LLMs) to automatically generate a dataset of neural networks that can serve as a benchmark for validation. The dataset is designed to cover diverse architectural components and to handle multiple input data types and tasks. In total, 608 samples are generated, each conforming to a set of precise design choices. To further ensure their consistency, we validate the correctness of the generated networks using static analysis and symbolic tracing. We make the dataset publicly available to support the community in advancing research on neural network reliability and adaptability.
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations
Beijing University of Posts and Telecommunications
AI Insights - Scale-invariant components: Components in neural networks that are invariant to scaling, such as BatchNorm layers. (ML: 0.92)👍👎
- Architecture-aware updates/projections: Updates and projections that are aware of the architecture of the neural network, such as BatchNorm layers. (ML: 0.92)👍👎
- Curvature-adaptive radial step sizing: An adaptive learning rate scheme that adjusts the step size based on the curvature of the loss function. (ML: 0.89)👍👎
- Its performance is superior to other algorithms, including AdamW and AdamP, on both CIFAR-100 and modular-arithmetic Grokking tasks. (ML: 0.88)👍👎
- The paper evaluates AdamO on CIFAR-100 and modular-arithmetic Grokking tasks, showing that it outperforms other optimization algorithms, including AdamW and AdamP. (ML: 0.84)👍👎
- Orthogonal dynamics: A method for optimizing neural networks by decoupling the update rules for different dimensions. (ML: 0.81)👍👎
- AdamO is designed to handle scale-invariant components in neural networks, such as BatchNorm, by using projections to suppress ineffective updates. (ML: 0.81)👍👎
- AdamO is a robust and effective optimization algorithm for deep learning tasks, particularly those involving scale-invariant components. (ML: 0.71)👍👎
- AdamO's performance is robust across a wide range of hyperparameters, making it easier to tune and use in practice. (ML: 0.67)👍👎
- The paper proposes a new adaptive optimization algorithm, AdamO, which fully decouples orthogonal dynamics and combines curvature-adaptive radial step sizing with architecture-aware updates/projections. (ML: 0.61)👍👎
Abstract
Is the standard weight decay in AdamW truly optimal? Although AdamW decouples weight decay from adaptive gradient scaling, a fundamental conflict remains: the Radial Tug-of-War. In deep learning, gradients tend to increase parameter norms to expand effective capacity while steering directions to learn features, whereas weight decay indiscriminately suppresses norm growth. This push--pull interaction induces radial oscillations, injecting noise into Adam's second-moment estimates and potentially degrading delicate tangential feature learning. We argue that magnitude and direction play distinct roles and should be decoupled in optimizer dynamics. We propose Orthogonal Dynamics Decoupling and instantiate it as AdamO: an SGD-style update handles the one-dimensional norm control, while Adam's adaptive preconditioning is confined to the tangential subspace. AdamO further incorporates curvature-adaptive radial step sizing and architecture-aware rules and projections for scale-invariant layers and low-dimensional parameters. Experiments on vision and language tasks show that AdamO improves generalization and stability over AdamW without introducing additional complex constraints.
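The magnitude/direction split at the heart of the abstract can be sketched in a few lines. This toy version replaces AdamO's actual update (curvature-adaptive radial step, Adam preconditioning confined to the tangential subspace) with plain scalar learning rates; `toy_decoupled_step` and its rates are assumptions for illustration only:

```python
def radial_tangential_split(grad, w):
    # Project grad onto w (the radial part, which changes ||w||) and
    # keep the orthogonal remainder (the tangential part, which only
    # steers the direction of w).
    norm_sq = sum(wi * wi for wi in w)
    coef = sum(gi * wi for gi, wi in zip(grad, w)) / norm_sq
    radial = [coef * wi for wi in w]
    tangential = [gi - ri for gi, ri in zip(grad, radial)]
    return radial, tangential

def toy_decoupled_step(w, grad, lr_radial=0.1, lr_tangential=0.01):
    # Hypothetical simplification: an SGD-style step handles norm control
    # via the radial part, while the direction is updated separately from
    # the tangential part with its own learning rate.
    radial, tangential = radial_tangential_split(grad, w)
    return [wi - lr_radial * ri - lr_tangential * ti
            for wi, ri, ti in zip(w, radial, tangential)]

w, g = [3.0, 4.0], [1.0, 1.0]
radial, tangential = radial_tangential_split(g, w)
# The tangential part is orthogonal to w, so it cannot change ||w|| to
# first order -- the "radial tug-of-war" is confined to the radial part.
print(abs(sum(t * wi for t, wi in zip(tangential, w))) < 1e-9)  # → True
```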
Why are we recommending this paper?
Because deep learning is a popular topic and you have fewer than 3 interests with available recommendations.
Aalto University
AI Insights - Ground-truth intents: The actual intentions or goals of a user that can be used as a reference for evaluating the performance of a system. (ML: 0.98)👍👎
- To avoid learning and ordering effects, we adopted a between-subjects design where each participant was assigned one condition and one task. (ML: 0.97)👍👎
- The user study was deployed on a web-based interface. (ML: 0.97)👍👎
- We conducted a power analysis using G*Power for a two-condition between-subjects design. (ML: 0.97)👍👎
- Each task was used equally often per condition. (ML: 0.95)👍👎
- Participants were given a task scenario from a set of eight design types: interior design, painting, photography, app icon design, poster design, logo design, fashion design, and architectural style design. (ML: 0.94)👍👎
- Between-subjects design: A research design in which each participant is assigned to only one group or condition, and the groups are compared to each other. (ML: 0.93)👍👎
- LLM tools: Large language model tools that use artificial intelligence to generate text based on input prompts. (ML: 0.92)👍👎
- The study evaluates APE using a prototyped user interface that reflects how the system would be deployed in practice, and compares it against manual prompt engineering, a baseline condition representing the current standard in which users iteratively refine textual prompts. (ML: 0.90)👍👎
- The required sample size was estimated at 128 participants. (ML: 0.90)👍👎
- Text-to-image generation: A technique in computer vision that generates images from text-based descriptions. (ML: 0.85)👍👎
- The server side integrated our APE algorithm, supporting real-time interaction with the text-to-image model. (ML: 0.78)👍👎
Abstract
Aligning text-to-image generation with user intent remains challenging for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.
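The information-theoretic formulation can be reduced to a discrete toy: latent intent as a belief over hypotheses, and each visual query as a yes/no feature question scored by expected information gain. The intents and feature names below are invented; APE's real queries are visual, not textual:

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_info_gain(prior, has_feature):
    # Expected entropy reduction over latent intents from asking one
    # yes/no feature question.
    gain = entropy(prior)
    for answer in (True, False):
        mass = [p if f == answer else 0.0 for p, f in zip(prior, has_feature)]
        total = sum(mass)
        if total > 0:
            gain -= total * entropy([m / total for m in mass])
    return gain

def pick_query(prior, features):
    # Greedily ask the question with maximal expected information gain.
    return max(features, key=lambda name: expected_info_gain(prior, features[name]))

prior = [0.25, 0.25, 0.25, 0.25]  # uniform belief over 4 candidate intents
features = {
    "warm palette": [True, True, False, False],   # splits 2/2 -> 1.0 bit
    "minimalist":   [True, False, False, False],  # splits 1/3 -> ~0.81 bit
}
print(pick_query(prior, features))  # → warm palette
```

An even split of the belief yields the largest expected gain, which is why adaptive elicitation asks the most discriminating question first.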
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations.
Peking University
AI Insights - It then localizes the visual regions associated with each extracted word. (ML: 0.94)👍👎
- The VLM first parses the editing instruction and extracts all nouns that correspond to the reference images. (ML: 0.92)👍👎
- The prompt used to predict the bounding boxes associated with each extracted word is presented in Figure 8. (ML: 0.90)👍👎
- Key factors: instruction-reference alignment, correspondence-aware masked attention, and VAE dropout probability. The model effectively balances VLM features and VAE-based appearance information when trained with an appropriate VAE dropout probability. (ML: 0.88)👍👎
- The prompt used to extract reference-related words from the editing instruction is shown in Figure 7. (ML: 0.86)👍👎
- Qwen3-VL-30B-A3B-Instruct (QwenTeam, 2025) was used for Instruction-Reference Alignment. (ML: 0.83)👍👎
- The images at the bottom visualize these grounded regions on the reference images. (ML: 0.73)👍👎
Abstract
Multi-subject image generation aims to synthesize images that faithfully preserve the identities of multiple reference subjects while following textual instructions. However, existing methods often suffer from identity inconsistency and limited compositional control, as they rely on diffusion models to implicitly associate text prompts with reference images. In this work, we propose Hierarchical Concept-to-Appearance Guidance (CAG), a framework that provides explicit, structured supervision from high-level concepts to fine-grained appearances. At the conceptual level, we introduce a VAE dropout training strategy that randomly omits reference VAE features, encouraging the model to rely more on robust semantic signals from a Visual Language Model (VLM) and thereby promoting consistent concept-level generation in the absence of complete appearance cues. At the appearance level, we integrate the VLM-derived correspondences into a correspondence-aware masked attention module within the Diffusion Transformer (DiT). This module restricts each text token to attend only to its matched reference regions, ensuring precise attribute binding and reliable multi-subject composition. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multi-subject image generation, substantially improving prompt following and subject consistency.
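The correspondence-aware masking constraint can be sketched as a boolean attention mask. In the actual DiT this would be applied as an additive mask on attention logits over VLM-grounded bounding-box patches; the token/patch region labels below are invented for illustration:

```python
def correspondence_mask(token_region, patch_region):
    # mask[t][p] is True iff text token t may attend to reference patch p.
    # token_region[t]: grounded region id for token t, or None for tokens
    # not tied to any reference subject (those may attend everywhere).
    # patch_region[p]: region label of each reference-image patch.
    return [[tr is None or tr == pr for pr in patch_region]
            for tr in token_region]

# "a cat beside a dog": 'cat' grounded to region 0, 'dog' to region 1,
# over four reference patches labelled [0, 0, 1, 1]
mask = correspondence_mask([None, 0, None, None, 1], [0, 0, 1, 1])
print(mask[1])  # 'cat' token → [True, True, False, False]
print(mask[4])  # 'dog' token → [False, False, True, True]
```

In practice the boolean mask becomes additive negative-infinity biases, so each text token reads only its matched reference regions during attention.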
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations.
We did not find much content matching your interests, so we have included some additional topics that are popular.
Also be aware that if a topic is not present on arXiv, we will not be able to recommend it.
University of California, Riverside
AI Insights - A suitably broad understanding of intelligence is useful for recognizing the diverse and impressive capacities of different types of beings. (ML: 0.98)👍👎
- Tests of a single capacity in an AI system should not be used to generalize about the system's overall capacities. (ML: 0.97)👍👎
- Intelligence is not a single-dimensional concept, but rather a multidimensional one. (ML: 0.96)👍👎
- The theory of strange intelligence invites us to pause when considering the ethical significance of intelligence, as correlations between intelligence and other ethically significant properties may be absent. (ML: 0.96)👍👎
- multidimensional intelligence: Intelligence that cannot be reduced to a single dimension or concept. (ML: 0.94)👍👎
- strange intelligence: A type of intelligence that operates very differently from typical humans. (ML: 0.94)👍👎
- The linear model of progress in AI is rejected, and instead, a nonlinear approach to intelligence is proposed. (ML: 0.94)👍👎
- AIs can be highly intelligent in their own distinct way, even if they don't possess human-like intelligence. (ML: 0.94)👍👎
- general intelligence: The ability to achieve a broad range of goals in a broad range of environments, which is a matter of degree and always relative to the range of goals and environments of interest. (ML: 0.93)👍👎
Abstract
We endorse and expand upon Susan Schneider's critique of the linear model of AI progress and introduce two novel concepts: "familiar intelligence" and "strange intelligence". AI intelligence is likely to be strange intelligence, defying familiar patterns of ability and inability, combining superhuman capacities in some domains with subhuman performance in other domains, and even within domains sometimes combining superhuman insight with surprising errors that few humans would make. We develop and defend a nonlinear model of intelligence on which "general intelligence" is not a unified capacity but instead the ability to achieve a broad range of goals in a broad range of environments, in a manner that defies nonarbitrary reduction to a single linear quantity. We conclude with implications for adversarial testing approaches to evaluating AI capacities. If AI is strange intelligence, we should expect that even the most capable systems will sometimes fail in seemingly obvious tasks. On a nonlinear model of AI intelligence, such errors on their own do not demonstrate a system's lack of outstanding general intelligence. Conversely, excellent performance on one type of task, such as an IQ test, cannot warrant assumptions of broad capacities beyond that task domain.
Why are we recommending this paper?
Because ai and society is a popular topic and you have fewer than 3 interests with available recommendations.
Gratex International
AI Insights - Further research is needed to improve the quality and reliability of LLM-based systems, as well as to address issues such as hallucinations and data bias. (ML: 0.99)👍👎
- They also discuss the challenges associated with using LLMs in software engineering, including the need for high-quality training data and the potential for hallucinations. (ML: 0.97)👍👎
- Limitations: limited training data, potential for hallucinations, and data bias. (ML: 0.95)👍👎
- The use of LLMs in software engineering has shown promising results, but there are still challenges associated with their adoption. (ML: 0.94)👍👎
- The paper discusses the use of large language models (LLMs) in software engineering, specifically in the areas of test case generation and scenario-based GUI testing. (ML: 0.94)👍👎
- LLMs: Large Language Models are a type of artificial intelligence model that can process and generate human-like text. (ML: 0.94)👍👎
- Retrieval-augmented models: These models use LLMs to retrieve relevant information from a database or knowledge graph, which is then used to augment the LLM's output. (ML: 0.93)👍👎
- The authors propose a framework that combines LLMs with retrieval-augmented models to generate test cases from natural language requirements. (ML: 0.90)👍👎
Abstract
The introduction of large language models ignited great retooling and rethinking of the software development models. The ensuing response of software engineering research yielded a massive body of tools and approaches. In this paper, we join the hassle by introducing agentic AI solutions for two tasks. First, we developed a solution for automatic test scenario generation from a detailed requirements description. This approach relies on specialized worker agents forming a star topology with the supervisor agent in the middle. We demonstrate its capabilities on a real-world example. Second, we developed an agentic AI solution for the document retrieval task in the context of software engineering documents. Our solution enables performing various use cases on a body of documents related to the development of a single software, including search, question answering, tracking changes, and large document summarization. In this case, each use case is handled by a dedicated LLM-based agent, which performs all subtasks related to the corresponding use case. We conclude by hinting at the future perspectives of our line of research.
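The star topology can be reduced to a deterministic skeleton: every message flows through the supervisor, and workers never talk to each other directly. In the real system each worker wraps an LLM call and the plan itself would come from the supervisor LLM; the worker names and subtasks below are invented:

```python
class Worker:
    def __init__(self, name, handle):
        self.name, self.handle = name, handle

class Supervisor:
    # Hub of the star: all coordination passes through here; workers
    # never communicate with each other directly.
    def __init__(self):
        self.workers = {}

    def register(self, worker):
        self.workers[worker.name] = worker

    def run(self, requirements, plan):
        # plan: ordered (worker_name, subtask) pairs; in the real system
        # the supervisor LLM would produce this plan itself.
        results = {}
        for name, subtask in plan:
            results[subtask] = self.workers[name].handle(requirements, results)
        return results

def scenario_writer(req, _ctx):
    # Stand-in for an LLM worker that drafts a scenario title.
    return f"Scenario: {req.splitlines()[0]}"

def step_extractor(req, _ctx):
    # Stand-in for an LLM worker that turns requirement lines into steps.
    return [f"Step {i}: {line}" for i, line in enumerate(req.splitlines()[1:], 1)]

sup = Supervisor()
sup.register(Worker("writer", scenario_writer))
sup.register(Worker("steps", step_extractor))
out = sup.run("User logs in\nOpen login page\nSubmit credentials",
              [("writer", "title"), ("steps", "steps")])
print(out["title"])  # → Scenario: User logs in
```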
Why are we recommending this paper?
Because research automation with ai is a popular topic and you have fewer than 3 interests with available recommendations.
JigsawStack, Inc
AI Insights - Small language models may not be as effective as large language models in certain tasks. (ML: 0.99)👍👎
- The use of LLMs with tools may be limited by the availability of high-quality training data. (ML: 0.99)👍👎
- Researchers are exploring the potential of multimodal safety classification, which may have significant implications for industries that rely on text-based data. (ML: 0.98)👍👎
- Multimodal safety classification is a technique used to identify potential risks or hazards in text-based data. (ML: 0.98)👍👎
- The use of small language models is gaining traction as a valuable plug-in for large language models. (ML: 0.95)👍👎
- LLMs (Large Language Models) are artificial intelligence models that can process and generate human-like language. (ML: 0.94)👍👎
- The use of LLMs with tools and small language models is becoming increasingly prevalent in various applications. (ML: 0.94)👍👎
- There is a growing interest in multimodal safety classification, with the introduction of Llama Guard 4 by Meta AI. (ML: 0.89)👍👎
- LLMs with tools are becoming increasingly popular, and researchers are exploring their potential in various applications. (ML: 0.88)👍👎
- Agentic AI refers to artificial intelligence systems that can perform tasks autonomously, often requiring human oversight or intervention. (ML: 0.83)👍👎
Abstract
We present Interfaze, a system that treats modern LLM applications as a problem of building and acting over context, not just picking the right monolithic model. Instead of a single transformer, we combine (i) a stack of heterogeneous DNNs paired with small language models as perception modules for OCR involving complex PDFs, charts and diagrams, and multilingual ASR with (ii) a context-construction layer that crawls, indexes, and parses external sources (web pages, code, PDFs) into compact structured state, and (iii) an action layer that can browse, retrieve, execute code in a sandbox, and drive a headless browser for dynamic web pages. A thin controller sits on top of this stack and exposes a single, OpenAI-style endpoint: it decides which small models and actions to run and always forwards the distilled context to a user-selected LLM that produces the final response.
On this architecture, Interfaze-Beta achieves 83.6% on MMLU-Pro, 91.4% on MMLU, 81.3% on GPQA-Diamond, 57.8% on LiveCodeBench v5, and 90.0% on AIME-2025, along with strong multimodal scores on MMMU (val) (77.3%), AI2D (91.5%), ChartQA (90.9%), and Common Voice v16 (90.8%). We show that most queries are handled primarily by the small-model and tool stack, with the large LLM operating only on distilled context, yielding competitive accuracy while shifting the bulk of computation away from the most expensive and monolithic models.
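The controller's role can be sketched as a routing function plus context distillation. The keyword rules below stand in for whatever policy Interfaze actually uses (the abstract does not specify one), and the module names are assumptions:

```python
def route(query):
    # Toy controller: decide which small perception/action modules to run.
    # These keyword rules are a stand-in for the real routing policy.
    modules = []
    q = query.lower()
    if "pdf" in q or "chart" in q:
        modules.append("ocr")
    if "http" in q:
        modules.append("crawler")
    if "run" in q.split() or any(ch.isdigit() for ch in q):
        modules.append("sandbox")
    return modules or ["direct"]

def handle(query, llm):
    # Run the selected modules, distill their outputs into compact
    # context, and forward only that context to the user-selected LLM.
    context = "\n".join(f"[{m}] output for: {query}" for m in route(query))
    return llm(query, context)

answer = handle("Summarize https://example.com/report",
                lambda q, ctx: f"LLM saw {ctx.count('[')} module output(s)")
print(answer)  # → LLM saw 1 module output(s)
```

The point of the design is visible even in the toy: the expensive model only ever sees the distilled context, not the raw inputs.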
Why are we recommending this paper?
Because agi: artificial general intelligence is a popular topic and you have fewer than 3 interests with available recommendations.
Carnegie Mellon University
AI Insights - Social cost of intelligence: The negative consequences of intelligent behavior in multi-agent systems, including the emergence, propagation, and amplification of stereotypical bias. (ML: 0.95)👍👎
- The paper highlights the importance of considering the social cost of intelligence in multi-agent systems, including the emergence, propagation, and amplification of stereotypical bias. (ML: 0.93)👍👎
- The authors propose a framework for evaluating and improving the performance of multi-agent systems using a combination of metrics such as accuracy, efficiency, and fairness. (ML: 0.92)👍👎
- The authors emphasize the need for a deeper understanding of the social cost of intelligence in multi-agent systems and its impact on real-world applications. (ML: 0.92)👍👎
- The paper concludes by highlighting the importance of continued research into the development of more effective and fair multi-agent systems. (ML: 0.91)👍👎
- The paper suggests that future work should focus on developing more sophisticated evaluation metrics and methods for improving the performance of multi-agent systems. (ML: 0.89)👍👎
- The paper's focus on theoretical concepts may limit its practical applications. (ML: 0.87)👍👎
- Agent: An autonomous entity that can perceive its environment, make decisions, and take actions to achieve its goals. (ML: 0.86)👍👎
- Multi-agent system: A system consisting of multiple agents that interact with each other to achieve a common goal. (ML: 0.82)👍👎
- The paper discusses the concept of multi-agent systems, which involve multiple agents interacting with each other to achieve a common goal. (ML: 0.82)👍👎
Abstract
While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes AgentArk, a novel framework to distill multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of multi-agent systems while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios: reasoning-enhanced fine-tuning; trajectory-based augmentation; and process-aware distillation. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work can shed light on future research on efficient and robust multi-agent development. Our code is at https://github.com/AIFrontierLab/AgentArk.
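The trajectory-based strategy can be illustrated with a deterministic stand-in for a debate trace, flattened into (prompt, target) fine-tuning pairs. The agents and pair format below are assumptions, not the paper's actual recipe:

```python
def debate(question, agents, rounds=2):
    # Toy multi-agent debate: each round, every agent answers after
    # seeing the previous round's answers (deterministic functions
    # replace LLM agents here).
    answers = [agent(question, []) for agent in agents]
    transcript = [list(answers)]
    for _ in range(rounds - 1):
        answers = [agent(question, answers) for agent in agents]
        transcript.append(list(answers))
    return transcript

def distill_pairs(question, transcript):
    # Flatten the trajectory into (prompt, target) fine-tuning pairs:
    # intermediate rounds become auxiliary targets, and the converged
    # final answer becomes the main target, so a single model can learn
    # the revise-and-converge behaviour implicitly.
    pairs = [(f"{question} (debate round {i})", " | ".join(r))
             for i, r in enumerate(transcript[:-1])]
    pairs.append((question, transcript[-1][0]))
    return pairs

solver = lambda q, prev: "4"                         # confident agent
follower = lambda q, prev: prev[0] if prev else "5"  # revises toward its peer
transcript = debate("What is 2+2?", [solver, follower])
print(distill_pairs("What is 2+2?", transcript)[-1])  # → ('What is 2+2?', '4')
```

Training on such pairs shifts the cost of the debate from inference time to training time, which is the efficiency argument the abstract makes.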
Why are we recommending this paper?
Because agi: artificial general intelligence is a popular topic and you have fewer than 3 interests with available recommendations.
Beijing University of Posts and Telecommunications
AI Insights - Scale-invariant components: Components in neural networks that are invariant to scaling, such as BatchNorm layers. (ML: 0.92)👍👎
- Architecture-aware updates/projections: Updates and projections that are aware of the architecture of the neural network, such as BatchNorm layers. (ML: 0.92)👍👎
- Curvature-adaptive radial step sizing: An adaptive learning rate scheme that adjusts the step size based on the curvature of the loss function. (ML: 0.89)👍👎
- Its performance is superior to other algorithms, including AdamW and AdamP, on both CIFAR-100 and modular-arithmetic Grokking tasks. (ML: 0.88)👍👎
- The paper evaluates AdamO on CIFAR-100 and modular-arithmetic Grokking tasks, showing that it outperforms other optimization algorithms, including AdamW and AdamP. (ML: 0.84)👍👎
- Orthogonal dynamics: A method for optimizing neural networks by decoupling the update rules for different dimensions. (ML: 0.81)👍👎
- AdamO is designed to handle scale-invariant components in neural networks, such as BatchNorm, by using projections to suppress ineffective updates. (ML: 0.81)👍👎
- AdamO is a robust and effective optimization algorithm for deep learning tasks, particularly those involving scale-invariant components. (ML: 0.71)👍👎
- AdamO's performance is robust across a wide range of hyperparameters, making it easier to tune and use in practice. (ML: 0.67)👍👎
- The paper proposes a new adaptive optimization algorithm called AdamO, which is fully decoupled orthogonal dynamics with curvature-adaptive radial step sizing and architecture-aware updates/projections. (ML: 0.61)👍👎
Abstract
Is the standard weight decay in AdamW truly optimal? Although AdamW decouples weight decay from adaptive gradient scaling, a fundamental conflict remains: the Radial Tug-of-War. In deep learning, gradients tend to increase parameter norms to expand effective capacity while steering directions to learn features, whereas weight decay indiscriminately suppresses norm growth. This push--pull interaction induces radial oscillations, injecting noise into Adam's second-moment estimates and potentially degrading delicate tangential feature learning. We argue that magnitude and direction play distinct roles and should be decoupled in optimizer dynamics. We propose Orthogonal Dynamics Decoupling and instantiate it as AdamO: an SGD-style update handles the one-dimensional norm control, while Adam's adaptive preconditioning is confined to the tangential subspace. AdamO further incorporates curvature-adaptive radial step sizing and architecture-aware rules and projections for scale-invariant layers and low-dimensional parameters. Experiments on vision and language tasks show that AdamO improves generalization and stability over AdamW without introducing additional complex constraints.
Why we are recommending this paper?
Because deep learning is a popular topic and you have less than 3 interests with available recommendations
Aalto University
AI Insights - Ground-truth intents: The actual intentions or goals of a user that can be used as a reference for evaluating the performance of a system. (ML: 0.98)👍👎
- To avoid learning and ordering effects, we adopted a between-subjects design where each participant was assigned one condition and one task. (ML: 0.97)👍👎
- The user study was deployed on a web-based interface. (ML: 0.97)👍👎
- We conducted a power analysis using G*Power for a two-condition between-subjects design. (ML: 0.97)👍👎
- Each task was used equally often per condition. (ML: 0.95)👍👎
- Participants were given a task scenario from a set of eight design types: interior design, painting, photography, app icon design, poster design, logo design, fashion design, and architectural style design. (ML: 0.94)👍👎
- Between-subjects design: A research design in which each participant is assigned to only one group or condition, and the groups are compared to each other. (ML: 0.93)👍👎
- LLM tools: Large language model tools that use artificial intelligence to generate text based on input prompts. (ML: 0.92)👍👎
- The study evaluates APE using a prototyped user interface that reflects how the system would be deployed in practice, and compares it against manual prompt engineering, a baseline condition representing the current standard in which users iteratively refine textual prompts. (ML: 0.90)👍👎
- The required sample size was estimated at 128 participants. (ML: 0.90)👍👎
- Text-to-image generation: A technique in computer vision that generates images from text-based descriptions. (ML: 0.85)👍👎
- The server side integrated our APE algorithm, supporting real-time interaction with the text-to-image model. (ML: 0.78)👍👎
Abstract
Aligning text-to-image generation with user intent remains challenging, for users who provide ambiguous inputs and struggle with model idiosyncrasies. We propose Adaptive Prompt Elicitation (APE), a technique that adaptively asks visual queries to help users refine prompts without extensive writing. Our technical contribution is a formulation of interactive intent inference under an information-theoretic framework. APE represents latent intent as interpretable feature requirements using language model priors, adaptively generates visual queries, and compiles elicited requirements into effective prompts. Evaluation on IDEA-Bench and DesignBench shows that APE achieves stronger alignment with improved efficiency. A user study with challenging user-defined tasks demonstrates 19.8% higher alignment without workload overhead. Our work contributes a principled approach to prompting that, for general users, offers an effective and efficient complement to the prevailing prompt-based interaction paradigm with text-to-image models.
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations.
Peking University
AI Insights - It then localizes the visual regions associated with each extracted word. (ML: 0.94)👍👎
- The VLM first parses the editing instruction and extracts all nouns that correspond to the reference images. (ML: 0.92)👍👎
- The prompt used to predict the bounding boxes associated with each extracted word is presented in Figure 8. (ML: 0.90)👍👎
- The model effectively balances VLM features and VAE-based appearance information when trained with an appropriate VAE dropout probability. (ML: 0.88)👍👎
- The prompt used to extract reference-related words from the editing instruction is shown in Figure 7. (ML: 0.86)👍👎
- Qwen3-VL-30B-A3B-Instruct (QwenTeam, 2025) was used for Instruction-Reference Alignment. (ML: 0.83)👍👎
- The images at the bottom visualize these grounded regions on the reference images. (ML: 0.73)👍👎
Abstract
Multi-subject image generation aims to synthesize images that faithfully preserve the identities of multiple reference subjects while following textual instructions. However, existing methods often suffer from identity inconsistency and limited compositional control, as they rely on diffusion models to implicitly associate text prompts with reference images. In this work, we propose Hierarchical Concept-to-Appearance Guidance (CAG), a framework that provides explicit, structured supervision from high-level concepts to fine-grained appearances. At the conceptual level, we introduce a VAE dropout training strategy that randomly omits reference VAE features, encouraging the model to rely more on robust semantic signals from a Visual Language Model (VLM) and thereby promoting consistent concept-level generation in the absence of complete appearance cues. At the appearance level, we integrate the VLM-derived correspondences into a correspondence-aware masked attention module within the Diffusion Transformer (DiT). This module restricts each text token to attend only to its matched reference regions, ensuring precise attribute binding and reliable multi-subject composition. Extensive experiments demonstrate that our method achieves state-of-the-art performance on multi-subject image generation, substantially improving prompt following and subject consistency.
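The correspondence-aware masked attention the abstract describes, restricting each text token to its matched reference regions, can be sketched in a few lines of NumPy. The toy tensor shapes and the boolean token-to-region mask here are illustrative assumptions; in the paper this module sits inside a Diffusion Transformer, and each token is assumed to have at least one allowed region.

```python
import numpy as np

def masked_cross_attention(q, k, v, token_region_mask):
    """Cross-attention where each text token (query row) may attend only
    to the reference-image tokens its mask row allows.

    q: (T, d) text-token queries
    k, v: (R, d) reference-image keys/values
    token_region_mask: (T, R) boolean; True where attention is permitted
                       (each row assumed to have at least one True)
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (T, R)
    scores = np.where(token_region_mask, scores, -np.inf)  # block disallowed regions
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over allowed regions
    return weights @ v, weights
```

With uniform scores, a token masked to a single region attends to it exclusively, while an unmasked token spreads attention evenly, which is the attribute-binding behavior the module enforces.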
Why are we recommending this paper?
Because image and video generation is a popular topic and you have fewer than 3 interests with available recommendations.
Interests not found
We did not find any papers that match the interests below.
Try other terms, and consider whether the content exists on arxiv.org.
- Product Roadmap
- Product Strategy
💬 Help Shape Our Pricing
We're exploring pricing options to make this project sustainable. Take 3 minutes to share what you'd be willing to pay (if anything). Your input guides our future investment.
Share Your Feedback
Help us improve your experience!
This project is in its early stages; your feedback can be pivotal to its future.
Let us know what you think about this week's papers and suggestions!
Give Feedback