Hi j34nc4rl0+deep_learning,

Here are our personalized paper recommendations for you, sorted by relevance.
Large Language Models
Xiaoduo AI
Abstract
The rapid evolution of specialized large language models (LLMs) has transitioned from simple domain adaptation to sophisticated native architectures, marking a paradigm shift in AI development. This survey systematically examines this progression across the healthcare, finance, legal, and technical domains. Beyond the wide adoption of specialized LLMs, recent LLM agents increasingly incorporate technical breakthroughs such as domain-native designs that go beyond fine-tuning, a growing emphasis on parameter efficiency through sparse computation and quantization, and deeper integration of multimodal capabilities. Our analysis reveals how these innovations address fundamental limitations of general-purpose LLMs in professional applications, with specialized models consistently achieving performance gains on domain-specific benchmarks. The survey further highlights the implications for the e-commerce field, helping to fill gaps in that area.
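
Quantization, one of the parameter-efficiency techniques this survey covers, is easy to illustrate. Below is a minimal sketch (ours, not from the paper) of symmetric int8 post-training quantization of a weight matrix:

    import numpy as np

    def quantize_int8(w: np.ndarray):
        """Symmetric per-tensor int8 quantization: w ~ scale * q."""
        scale = np.abs(w).max() / 127.0                      # largest magnitude maps to 127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)         # toy weight matrix
    q, scale = quantize_int8(w)
    print("mean abs error:", np.abs(w - dequantize(q, scale)).mean())

Storing q in int8 cuts memory by 4x relative to float32, at the cost of the reconstruction error printed above.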
Abstract
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a comprehensive, data-centric synthesis that reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge, emphasizing the multimodal, cross-scale, and domain-specific challenges that differentiate scientific corpora from general natural language processing datasets. We systematically review recent Sci-LLMs, from general-purpose foundations to specialized models across diverse scientific disciplines, alongside an extensive analysis of over 270 pre-/post-training datasets, showing why Sci-LLMs pose distinct demands -- heterogeneous, multi-scale, uncertainty-laden corpora that require representations preserving domain invariance and enabling cross-modal reasoning. On evaluation, we examine over 190 benchmark datasets and trace a shift from static exams toward process- and discovery-oriented assessments with advanced evaluation protocols. These data-centric analyses highlight persistent issues in scientific data development and discuss emerging solutions involving semi-automated annotation pipelines and expert validation. Finally, we outline a paradigm shift toward closed-loop systems where autonomous agents based on Sci-LLMs actively experiment, validate, and contribute to a living, evolving knowledge base. Collectively, this work provides a roadmap for building trustworthy, continually evolving artificial intelligence (AI) systems that function as a true partner in accelerating scientific discovery.
Deep Learning
Institute of Informatics, University of Warsaw
Abstract
Deep learning has transformed computer vision (CV), achieving outstanding performance in classification, segmentation, and related tasks. Such AI-based CV systems are becoming prevalent, with applications spanning from medical imaging to surveillance. State-of-the-art models such as convolutional neural networks (CNNs) and vision transformers (ViTs) are often regarded as "black boxes," offering limited transparency into their decision-making processes. Despite recent advances in explainable AI (XAI), explainability remains underutilized in practical CV deployments. A primary obstacle is the absence of integrated software solutions that connect XAI techniques with robust knowledge management and monitoring frameworks. To close this gap, we have developed Obz AI, a comprehensive software ecosystem designed to facilitate state-of-the-art explainability and observability for vision AI systems. Obz AI provides a seamless integration pipeline, from a Python client library to a full-stack analytics dashboard. With Obz AI, a machine learning engineer can easily incorporate advanced XAI methodologies, extract and analyze features for outlier detection, and continuously monitor AI models in real time. By making the decision-making mechanisms of deep models interpretable, Obz AI promotes observability and responsible deployment of computer vision systems.
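
One XAI technique such a pipeline would expose is gradient-based saliency. Here is a generic PyTorch sketch of that technique; it is illustrative only, not Obz AI's actual client API:

    import torch
    import torchvision.models as models

    model = models.resnet18(weights=None).eval()             # untrained stand-in model
    x = torch.randn(1, 3, 224, 224, requires_grad=True)      # stand-in image batch

    score = model(x).max()                                   # top-class logit
    score.backward()                                         # d(score) / d(pixel)
    saliency = x.grad.abs().max(dim=1).values                # (1, 224, 224) importance map
    print(saliency.shape)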
Monash Centre for Consciousness and Contemplative Studies
Abstract
I consider motivation and value-alignment in AI systems from the perspective of (constrained) entropy maximization. Though the structures encoding knowledge in any physical system can be understood as energetic constraints, only living agents harness entropy in the endogenous generation of actions. I argue that this exploitation of "mortal" or thermodynamic computation, in which cognitive and physical dynamics are inseparable, is of the essence of desire, motivation, and value, while the lack of true endogenous motivation in simulated "agents" predicts pathologies like reward hacking.
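
For readers who want the formal anchor behind "constrained entropy maximization": maximizing Shannon entropy subject to a fixed expected energy yields the Boltzmann distribution. This is the standard result, not a derivation from the paper:

    \max_{p}\; -\sum_i p_i \log p_i
    \quad \text{s.t.} \quad \sum_i p_i E_i = \bar{E},\;\; \sum_i p_i = 1
    \;\;\Longrightarrow\;\;
    p_i = \frac{e^{-\beta E_i}}{\sum_j e^{-\beta E_j}},

where the multiplier \beta is fixed by the energy constraint.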
Deep Learning Architectures
Qualcomm AI Research
Abstract
As deep learning models are increasingly deployed on mobile devices, these devices incorporate deep learning-specific accelerators to handle the growing computational demands, increasing their hardware heterogeneity. However, existing work on scheduling deep learning workloads across these processors has significant limitations: most studies focus on single-model scenarios rather than realistic multi-model scenarios, overlook performance variations arising from different hardware/software configurations, and struggle with accurate execution time estimation. To address these challenges, we propose a novel genetic algorithm-based methodology for scheduling multiple deep learning networks on heterogeneous processors by partitioning the networks into multiple subgraphs. Our approach incorporates three different types of chromosomes for partition/mapping/priority exploration, and leverages device-in-the-loop profiling and evaluation for accurate execution time estimation. Based on this methodology, our system, Puzzle, demonstrates superior performance in extensive evaluations with randomly generated scenarios involving nine state-of-the-art networks. The results demonstrate that Puzzle can support 3.7 and 2.2 times higher request frequency on average compared to the two heuristic baselines, NPU Only and Best Mapping, respectively, while satisfying an equivalent level of real-time requirements.
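
The three-chromosome encoding can be made concrete with a toy Python sketch. All names are hypothetical, and a placeholder latency table stands in where the real system would use device-in-the-loop profiling:

    import random

    PROCESSORS = ["CPU", "GPU", "NPU"]   # heterogeneous targets (illustrative)
    NUM_LAYERS = 12                      # layers of one toy network
    NUM_SUBGRAPHS = 4
    LATENCY = {"CPU": 3.0, "GPU": 1.5, "NPU": 0.8}   # placeholder ms/layer

    def random_individual():
        """Three chromosomes, mirroring the abstract: partition, mapping, priority."""
        cuts = sorted(random.sample(range(1, NUM_LAYERS), NUM_SUBGRAPHS - 1))
        mapping = [random.choice(PROCESSORS) for _ in range(NUM_SUBGRAPHS)]
        priority = random.sample(range(NUM_SUBGRAPHS), NUM_SUBGRAPHS)
        return cuts, mapping, priority

    def estimated_time(ind):
        """Stub fitness; the real system profiles on-device instead."""
        cuts, mapping, _ = ind
        bounds = [0] + cuts + [NUM_LAYERS]
        sizes = [bounds[i + 1] - bounds[i] for i in range(NUM_SUBGRAPHS)]
        return sum(n * LATENCY[m] for n, m in zip(sizes, mapping))

    population = [random_individual() for _ in range(20)]
    best = min(population, key=estimated_time)   # one selection step of a GA
    print(best, estimated_time(best))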
Multimodal Learning
Shu Shen
Abstract
Multimodal learning has significantly enhanced machine learning performance but still faces numerous challenges and limitations. Imbalanced multimodal learning is one of the problems extensively studied in recent works and is typically mitigated by modulating the learning of each modality. However, we find that these methods typically hinder the dominant modality's learning in order to promote weaker modalities, which harms overall multimodal performance. We analyze the cause of this issue and highlight a commonly overlooked problem: optimization bias within networks. To address this, we propose Adaptive Intra-Network Modulation (AIM) to improve balanced modality learning. AIM accounts for differences in optimization state across parameters and depths within the network during modulation, achieving, for the first time, balanced multimodal learning that hinders neither the dominant nor the weaker modalities. Specifically, AIM decouples the dominant modality's under-optimized parameters into Auxiliary Blocks and encourages reliance on these performance-degraded blocks for joint training with weaker modalities. This approach effectively prevents suppression of weaker modalities while enabling targeted optimization of under-optimized parameters to improve the dominant modality. Additionally, AIM assesses the level of modality imbalance across network depths and adaptively adjusts the modulation strength at each depth. Experimental results demonstrate that AIM outperforms state-of-the-art imbalanced modality learning methods across multiple benchmarks and exhibits strong generalizability across different backbones, fusion strategies, and optimizers.
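
Depth-adaptive modulation can be pictured with a small PyTorch sketch. The scaling rule below is hypothetical and much simpler than AIM's actual Auxiliary Block construction:

    import torch

    def modulate_gradients(model, imbalance_per_depth):
        """Scale each layer's gradients by a depth-specific factor in (0, 1].
        imbalance_per_depth maps depth index -> imbalance in [0, 1]; larger
        means the dominant modality is further ahead at that depth.
        Hypothetical rule, not AIM's exact formulation."""
        for depth, layer in enumerate(model.children()):
            strength = 1.0 - 0.5 * imbalance_per_depth.get(depth, 0.0)
            for p in layer.parameters():
                if p.grad is not None:
                    p.grad.mul_(strength)

    model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
    model(torch.randn(4, 8)).sum().backward()
    modulate_gradients(model, {0: 0.6, 2: 0.2})   # stronger damping at depth 0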
InternVL Team
Abstract
We introduce InternVL 3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency along the InternVL series. A key innovation is the Cascade Reinforcement Learning (Cascade RL) framework, which enhances reasoning through a two-stage process: offline RL for stable convergence and online RL for refined alignment. This coarse-to-fine training strategy leads to substantial improvements on downstream reasoning tasks, e.g., MMMU and MathVista. To optimize efficiency, we propose a Visual Resolution Router (ViR) that dynamically adjusts the resolution of visual tokens without compromising performance. Coupled with ViR, our Decoupled Vision-Language Deployment (DvD) strategy separates the vision encoder and language model across different GPUs, effectively balancing computational load. These contributions collectively enable InternVL3.5 to achieve up to a +16.0% gain in overall reasoning performance and a 4.05× inference speedup compared to its predecessor, i.e., InternVL3. In addition, InternVL3.5 supports novel capabilities such as GUI interaction and embodied agency. Notably, our largest model, i.e., InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks -- narrowing the performance gap with leading commercial models like GPT-5. All models and code are publicly released.
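
The resolution-routing idea behind ViR can be pictured as a small gating network that assigns each image a visual-token budget. This is a toy sketch with a hypothetical hard-routing rule, not the released implementation:

    import torch
    import torch.nn as nn

    class ResolutionRouter(nn.Module):
        """Toy router: score an image embedding, pick a token budget."""
        def __init__(self, dim=768, budgets=(64, 256, 1024)):
            super().__init__()
            self.budgets = budgets
            self.score = nn.Linear(dim, len(budgets))

        def forward(self, img_embedding):
            idx = self.score(img_embedding).argmax(dim=-1)   # hard routing decision
            return [self.budgets[i] for i in idx.tolist()]   # tokens per image

    router = ResolutionRouter()
    print(router(torch.randn(2, 768)))   # e.g. [256, 64]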
Diffusion Models
MSU-BIT-SMBU Joint Research Center of Applied Mathematics
Abstract
We propose threshold diffusion processes as unique solutions to stochastic differential equations with step-function coefficients, and obtain explicit expressions for the conditional Laplace transform of the hitting times and the potential measures. Applying these results, we further discuss their asymptotic behaviors such as the stationary distributions and the escape probabilities.
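
Schematically, a threshold diffusion solves an SDE whose drift and diffusion coefficients are step functions switching at a threshold r (notation ours, for illustration):

    dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad
    \mu(x) = \mu_1 \mathbf{1}_{\{x < r\}} + \mu_2 \mathbf{1}_{\{x \ge r\}}, \quad
    \sigma(x) = \sigma_1 \mathbf{1}_{\{x < r\}} + \sigma_2 \mathbf{1}_{\{x \ge r\}},

with B_t a standard Brownian motion.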
TTIC
Abstract
When do diffusion models reproduce their training data, and when are they able to generate samples beyond it? A practically relevant theoretical understanding of this interplay between memorization and generalization may significantly impact real-world deployments of diffusion models with respect to issues such as copyright infringement and data privacy. In this work, to disentangle the different factors that influence memorization and generalization in practical diffusion models, we introduce a scientific and mathematical "laboratory" for investigating these phenomena in diffusion models trained on fully synthetic or natural image-like structured data. Within this setting, we hypothesize that the memorization or generalization behavior of an underparameterized trained model is determined by the difference in training loss between an associated memorizing model and a generalizing model. To probe this hypothesis, we theoretically characterize a crossover point wherein the weighted training loss of a fully generalizing model becomes greater than that of an underparameterized memorizing model at a critical value of model (under)parameterization. We then demonstrate via carefully-designed experiments that the location of this crossover predicts a phase transition in diffusion models trained via gradient descent, validating our hypothesis. Ultimately, our theory enables us to analytically predict the model size at which memorization becomes predominant. Our work provides an analytically tractable and practically meaningful setting for future theoretical and empirical investigations. Code for our experiments is available at https://github.com/DruvPai/diffusion_mem_gen.
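
The hypothesized crossover can be written schematically (our paraphrase, not the paper's exact statement): with \mathcal{L}_w the weighted training loss, \theta_{\text{gen}} a fully generalizing model, and \theta_{\text{mem}}(n) an underparameterized memorizing model at size n, memorization predominates once

    \mathcal{L}_w(\theta_{\text{mem}}(n)) < \mathcal{L}_w(\theta_{\text{gen}}),

with the crossover occurring at a critical size n = n^{*}.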
Mixture of Experts
School of Software, Shandong University
Abstract
Current inference systems for Mixture-of-Experts (MoE) models primarily employ static parallelization strategies. However, these static approaches cannot consistently achieve optimal performance across different inference scenarios, as they lack the flexibility to adapt to varying computational requirements. In this work, we propose HAP (Hybrid Adaptive Parallelism), a novel method that dynamically selects hybrid parallel strategies to enhance MoE inference efficiency. The fundamental innovation of HAP lies in hierarchically decomposing MoE architectures into two distinct computational modules, the Attention module and the Expert module, each augmented with a specialized inference latency simulation model. This decomposition enables the construction of a comprehensive search space of model-parallel strategies. By leveraging Integer Linear Programming (ILP), HAP can solve for the optimal hybrid parallel configuration that maximizes inference efficiency under varying computational constraints. Our experiments demonstrate that HAP consistently determines parallel configurations that achieve comparable or superior performance to the tensor parallelism (TP) strategy prevalent in mainstream inference systems. Compared to TP-based inference, HAP-based inference achieves speedups of 1.68x, 1.77x, and 1.57x on A100, A6000, and V100 GPU platforms, respectively. Furthermore, HAP showcases remarkable generalization capability, maintaining performance effectiveness across diverse MoE model configurations, including the Mixtral and Qwen series models.
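
The ILP step can be sketched with PuLP: choose exactly one strategy per module so that simulated total latency is minimized. The strategy set and latency numbers below are toy stand-ins for HAP's latency simulation models:

    import pulp

    modules = ["attention", "expert"]
    strategies = ["TP2", "TP4", "EP4"]   # hypothetical parallel strategies
    latency = {("attention", "TP2"): 5.1, ("attention", "TP4"): 3.4, ("attention", "EP4"): 6.0,
               ("expert", "TP2"): 7.2, ("expert", "TP4"): 5.9, ("expert", "EP4"): 4.1}

    prob = pulp.LpProblem("hybrid_parallelism", pulp.LpMinimize)
    x = {(m, s): pulp.LpVariable(f"x_{m}_{s}", cat="Binary")
         for m in modules for s in strategies}

    # Objective: minimize total simulated latency across both modules.
    prob += pulp.lpSum(latency[m, s] * x[m, s] for m in modules for s in strategies)
    for m in modules:
        prob += pulp.lpSum(x[m, s] for s in strategies) == 1   # one strategy per module

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print({m: s for (m, s), var in x.items() if var.value() == 1})

A real formulation would add memory and interconnect constraints per GPU platform.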
State Key Laboratory of General Artificial Intelligence, BIGAI
Abstract
Context faithfulness is essential for reliable reasoning in context-dependent scenarios. However, large language models often struggle to ground their outputs in the provided context, resulting in irrelevant responses. Inspired by the emergent expert specialization observed in mixture-of-experts architectures, this work investigates whether certain experts exhibit specialization in context utilization, offering a potential pathway toward targeted optimization for improved context faithfulness. To explore this, we propose Router Lens, a method that accurately identifies context-faithful experts. Our analysis reveals that these experts progressively amplify attention to relevant contextual information, thereby enhancing context grounding. Building on this insight, we introduce Context-faithful Expert Fine-Tuning (CEFT), a lightweight optimization approach that selectively fine-tunes context-faithful experts. Experiments across a wide range of benchmarks and models demonstrate that CEFT matches or surpasses the performance of full fine-tuning while being significantly more efficient.
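
The CEFT idea, fine-tuning only the experts identified as context-faithful, can be sketched in PyTorch by toggling requires_grad. The expert naming convention below is hypothetical:

    import torch.nn as nn

    def freeze_except_experts(model: nn.Module, faithful_experts: set):
        """Freeze all weights except the experts flagged as context-faithful
        (assumes parameter names like '...experts.<idx>....')."""
        for name, param in model.named_parameters():
            param.requires_grad = any(f"experts.{i}." in name for i in faithful_experts)

    # Usage sketch: only experts 2 and 5 receive gradient updates during tuning.
    # freeze_except_experts(moe_model, {2, 5})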

Interests not found

We did not find any papers matching the interests below. Try other search terms, and consider whether such content exists on arxiv.org.
  • Deep Learning Models
  • Deep Learning Optimization
You can edit or add interests at any time.

Unsubscribe from these updates