Hi!

Your personalized paper recommendations for 5 to 9 January 2026.
University of Bologna
Abstract
Cartoon-texture image decomposition is a critical preprocessing problem bottlenecked by the numerical intractability of classical variational or optimization models and by the tedious manual tuning of global regularization parameters. We propose a Guided Variational Decomposition (GVD) model that introduces spatially adaptive quadratic norms whose pixel-wise weights are learned either through local probabilistic statistics or via a lightweight neural network within a bilevel framework. This leads to a unified, interpretable, and computationally efficient model that bridges classical variational ideas with modern adaptive and data-driven methodologies. Numerical experiments on this framework, which inherently includes automatic parameter selection, establish GVD as a robust, self-tuning, and superior solution for reliable image decomposition.
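To make the construction concrete, here is a minimal sketch of what a spatially weighted decomposition energy of this kind can look like. The weights w_c and w_t are illustrative stand-ins for the learned pixel-wise weights; this is not the paper's exact model.

```latex
% f: observed image, u: cartoon component, v: texture component
% w_c(x), w_t(x): learned pixel-wise weights (illustrative symbols)
\min_{u,v}\ \int_\Omega w_c(x)\,\lvert\nabla u(x)\rvert^2\,dx
          + \int_\Omega w_t(x)\,\lvert v(x)\rvert^2\,dx
\quad \text{subject to} \quad f = u + v.
```

Keeping both terms quadratic makes the inner minimization linear for fixed weights, which is one plausible source of the computational efficiency the abstract claims.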
Why are we recommending this paper?
Due to your interest in Image Processing

This paper proposes a novel approach to image decomposition, addressing a common bottleneck in image processing workflows. The Guided Variational Decomposition model directly tackles issues related to image preprocessing, aligning with your interest in image processing techniques.
Technical University of Denmark
Abstract
This paper introduces a diffusion-based framework for universal image segmentation, enabling agnostic segmentation that does not depend on mask-based frameworks and instead predicts the full segmentation in a holistic manner. We present several key adaptations to diffusion models that are important in this discrete setting. Notably, we show that a location-aware palette with our 2D gray code ordering improves performance, and that adding a final tanh activation function is crucial for discrete data. In optimizing diffusion parameters, the sigmoid loss weighting consistently outperforms alternatives regardless of the prediction type used, and we settle on x-prediction. While our current model does not yet surpass leading mask-based architectures, it narrows the performance gap and introduces unique capabilities, such as principled ambiguity modeling, that these models lack. All models were trained from scratch, and we believe that combining our proposed improvements with large-scale pretraining or promptable conditioning could lead to competitive models.
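The 2D gray code ordering is specific to the paper, but the underlying idea, assigning class codes so that neighboring indices differ in a single bit, can be illustrated with the standard binary-reflected Gray code. The sketch below is a 1D toy version, not the authors' construction:

```python
import numpy as np

def gray_code(i: int) -> int:
    # Binary-reflected Gray code: consecutive integers map to
    # bit patterns that differ in exactly one position.
    return i ^ (i >> 1)

def palette(num_classes: int, bits: int = 8) -> np.ndarray:
    # Toy 1D palette: each class index is Gray-coded, then unpacked
    # into a binary vector a diffusion model could regress with tanh
    # outputs in {-1, +1}. The paper's location-aware 2D ordering is
    # more elaborate; this only illustrates the coding idea.
    codes = np.array([gray_code(i) for i in range(num_classes)])
    bits_mat = (codes[:, None] >> np.arange(bits)) & 1
    return (2.0 * bits_mat - 1.0).astype(np.float32)  # map {0,1} -> {-1,+1}
```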
Why are we recommending this paper?
Due to your interest in Image Processing

This work presents a diffusion-based framework for universal image segmentation, offering a potentially more flexible solution compared to traditional methods. The holistic segmentation approach aligns with your interest in multimodal models and fusion techniques.
Umeå University
Abstract
The class of generalized gamma convolutions (GGC) is closed with respect to (wrt) changes of scale, weak limits, and addition and multiplication of independent random variables. Our main result adds the new property that GGC is also closed wrt q-th powers, q > 1. The proof uses explicit formulas for the densities of finite sums of independent gamma variables, hyperbolically completely monotone (HCM) functions, and the Laplace transform. The result is applied to sums and products of independent gamma variables and to symmetric extended GGC (symEGGC).
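For readers less familiar with the class: a GGC is a positive random variable whose Laplace transform admits the Thorin representation below. This is the standard textbook form from Bondesson's framework, stated here for orientation rather than taken from the paper:

```latex
% X is a GGC iff, for some a >= 0 and a Thorin measure U on (0, \infty),
\mathbb{E}\!\left[e^{-sX}\right]
  = \exp\!\left(-a s + \int_0^\infty \log\!\frac{t}{t+s}\; U(dt)\right),
  \qquad s \ge 0.
% The paper's new closure property: X \in \mathrm{GGC} \Rightarrow X^{q} \in \mathrm{GGC}
% for every q > 1.
```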
Why are we recommending this paper?
Due to your interest in convolution

This paper delves into the mathematical properties of generalized gamma convolutions, a foundational topic in probability theory. Closure properties of this kind govern how distributions behave under sums, products, and powers of independent random variables, which connects to your interest in convolution.
USTC
Rate paper: 👍 👎 ♥ Save
Abstract
While Unified Multimodal Models (UMMs) have achieved remarkable success in cross-modal comprehension, a significant gap persists in their ability to leverage such internal knowledge for high-quality generation. We formalize this discrepancy as Conduction Aphasia, a phenomenon where models accurately interpret multimodal inputs but struggle to translate that understanding into faithful and controllable synthesis. To address this, we propose UniCorn, a simple yet elegant self-improvement framework that eliminates the need for external data or teacher supervision. By partitioning a single UMM into three collaborative roles (Proposer, Solver, and Judge), UniCorn generates high-quality interactions via self-play and employs cognitive pattern reconstruction to distill latent understanding into explicit generative signals. To validate the restoration of multimodal coherence, we introduce UniCycle, a cycle-consistency benchmark based on a Text-to-Image-to-Text reconstruction loop. Extensive experiments demonstrate that UniCorn achieves comprehensive and substantial improvements over the base model across six general image generation benchmarks. Notably, it achieves SOTA performance on TIIF (73.8), DPG (86.8), CompBench (88.5), and UniCycle, while further delivering substantial gains of +5.0 on WISE and +6.5 on OneIG. These results highlight that our method significantly enhances T2I generation while maintaining robust comprehension, demonstrating the scalability of fully self-supervised refinement for unified multimodal intelligence.
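The Proposer/Solver/Judge split can be pictured as a simple self-play loop. Everything below (the interface, method names, and threshold) is a hypothetical sketch of the idea, not UniCorn's actual implementation:

```python
from typing import Protocol

class UMM(Protocol):
    # Hypothetical interface for a unified multimodal model.
    def propose(self, seed: str) -> str: ...                 # Proposer: invent a generation task
    def solve(self, task: str) -> bytes: ...                 # Solver: synthesize an image
    def judge(self, task: str, image: bytes) -> float: ...   # Judge: score faithfulness in [0, 1]

def self_play_round(umm: UMM, seeds: list[str], keep_above: float = 0.8):
    # One round of self-improvement: generate interactions via self-play
    # and keep only high-quality (task, image) pairs for later distillation.
    kept = []
    for seed in seeds:
        task = umm.propose(seed)
        image = umm.solve(task)
        if umm.judge(task, image) >= keep_above:
            kept.append((task, image))
    return kept  # fed back as a training signal (distillation step omitted)
```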
Why are we recommending this paper?
Due to your interest in multimodal models

This paper addresses the challenge of training unified multimodal models, a growing area of interest. Its focus on self-improvement and on leveraging internal knowledge for generation is highly relevant to your exploration of multimodal models.
Harbin Institute of Technology
Abstract
Large Multimodal Models (LMMs) have demonstrated impressive capabilities in video reasoning via Chain-of-Thought (CoT). However, the robustness of their reasoning chains remains questionable. In this paper, we identify a critical failure mode, termed textual inertia, in which, once a textual hallucination occurs during the thinking process, models tend to blindly adhere to the erroneous text while neglecting conflicting visual evidence. To investigate this systematically, we propose the LogicGraph Perturbation Protocol, which structurally injects perturbations into the reasoning chains of diverse LMMs, spanning both native reasoning architectures and prompt-driven paradigms, to evaluate their self-reflection capabilities. The results reveal that models successfully self-correct in less than 10% of cases and predominantly succumb to blind textual error propagation. To mitigate this, we introduce Active Visual-Context Refinement, a training-free inference paradigm that orchestrates an active visual re-grounding mechanism to enforce fine-grained verification, coupled with an adaptive context refinement strategy that summarizes and denoises the reasoning history. Experiments demonstrate that our approach significantly stifles hallucination propagation and enhances reasoning robustness.
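As a rough mental model, the training-free loop interleaves generation with visual re-verification. The helper names below are hypothetical stand-ins, since the paper's exact interfaces are not given in the abstract:

```python
def robust_video_reasoning(model, video, question, max_steps: int = 8):
    # Hedged sketch of an Active-Visual-Context-Refinement-style loop:
    # after each reasoning step, re-ground the claim in the frames
    # (countering textual inertia), then compress the history so later
    # steps do not inherit earlier noise. All model.* methods are
    # hypothetical placeholders.
    history = [question]
    for _ in range(max_steps):
        step = model.next_step(video, history)      # generate the next CoT step
        if not model.verify_visually(video, step):  # fine-grained visual check
            step = model.revise(video, step)        # prefer visual evidence over text
        history.append(step)
        history = model.refine_context(history)     # summarize and denoise history
        if model.is_answer(step):
            return step
    return history[-1]
```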
Why are we recommending this paper?
Due to your interest in multimodal models

This research investigates the robustness of large multimodal models, a critical area for reliable performance. Understanding how models handle conflicts between textual reasoning and visual evidence is essential for building reliable multimodal systems, aligning with your interest in multimodal models.
Dhaka University
Rate paper: 👍 👎 ♥ Save
Abstract
Convolutional Neural Networks (CNNs) have demonstrated remarkable success in image classification tasks; however, the choice between designing a custom CNN from scratch and employing established pre-trained architectures remains an important practical consideration. In this work, we present a comparative analysis of a custom-designed CNN and several widely used deep learning architectures, including VGG-16, ResNet-50, and MobileNet, for an image classification task. The custom CNN is developed and trained from scratch, while the popular architectures are employed using transfer learning under identical experimental settings. All models are evaluated using standard performance metrics such as accuracy, precision, recall, and F1-score. Experimental results show that pre-trained CNN architectures consistently outperform the custom CNN in terms of classification accuracy and convergence speed, particularly when training data is limited. However, the custom CNN demonstrates competitive performance with significantly fewer parameters and reduced computational complexity. This study highlights the trade-offs between model complexity, performance, and computational efficiency, and provides practical insights into selecting appropriate CNN architectures for image classification problems.
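A minimal version of the transfer-learning arm of such a comparison, using a frozen ImageNet-pretrained MobileNet in Keras, might look like the sketch below; the head design and hyperparameters are illustrative, not the paper's settings:

```python
import tensorflow as tf

def build_transfer_model(num_classes: int) -> tf.keras.Model:
    # Frozen pretrained backbone plus a small trainable classification head.
    base = tf.keras.applications.MobileNet(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # reuse ImageNet features as-is
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The custom-CNN arm would instead be a small stack of Conv2D/MaxPooling2D layers trained end to end, which is where the parameter-count advantage reported in the abstract would come from.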
Why are we recommending this paper?
Due to your interest in Image Recognition
Ho Chi Minh City University of Technology
Abstract
Bahnar, a minority language spoken across Vietnam, Cambodia, and Laos, faces significant preservation challenges due to limited research and data availability. This study addresses the critical need for accurate digitization of Bahnar language documents through optical character recognition (OCR) technology. Digitizing scanned paper documents poses significant challenges, as degraded image quality from broken or blurred areas introduces considerable OCR errors that compromise information retrieval systems. We propose a comprehensive approach combining advanced table and non-table detection techniques with probability-based post-processing heuristics to enhance recognition accuracy. Our method first applies detection algorithms to improve input data quality, then employs probabilistic error correction on OCR output. Experimental results indicate a substantial improvement, with recognition accuracy increasing from 72.86% to 79.26%. This work contributes valuable resources for Bahnar language preservation and provides a framework applicable to other minority language digitization efforts.
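The probability-based post-processing step can be pictured as a small noisy-channel correction: replace an unrecognized OCR token with the closest lexicon entry, weighted by how often the language actually uses it. The sketch below is a generic illustration, not the authors' heuristics:

```python
from difflib import get_close_matches

def correct_token(token: str, lexicon_freq: dict) -> str:
    # lexicon_freq: word -> corpus frequency (acts as a crude prior).
    if token in lexicon_freq:
        return token                      # already a known word
    candidates = get_close_matches(token, lexicon_freq.keys(), n=5, cutoff=0.8)
    if not candidates:
        return token                      # no plausible fix; keep OCR output
    # Among similar-looking words, prefer the one the corpus uses most.
    return max(candidates, key=lexicon_freq.get)
```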
Why are we recommending this paper?
Due to your interest in Image Recognition
National Taiwan University
Rate paper: 👍 👎 ♥ Save
Abstract
We introduce a novel fusion scheme enabled by laser-plasma solitons, which promises to overcome several fundamental obstructions to reaching the breakeven condition. For concreteness, we invoke deuterium-tritium (DT) as the fuel. The intense electromagnetic field trapped inside the soliton significantly enhances the DT-fusion cross section, its ponderomotive potential evacuates electrons, and it accelerates D/T to kinetic energies suitable for fusion reactions. While electrons are expelled almost instantly, the much heavier D/T moves on a picosecond time scale. This difference in time scales opens a window for DT fusion to occur efficiently in an electron-free environment. We inject two consecutive lasers: the first excites plasma solitons, and the second, much more intense and with a matched lower frequency, resonantly fortifies the soliton electromagnetic field. We impose a plasma density gradient to induce soliton motion. All D/T inside the plasma column swept by the moving soliton during its lifetime would participate in this fusion mechanism. We show that the breakeven condition is attainable. Invoking fiber-laser and iCAN laser technologies for high-repetition-rate and high-intensity operation, gigawatt output may be conceivable.
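For orientation, the breakeven condition the authors target is the usual Q >= 1 threshold for the DT reaction. The energy figures below are standard nuclear-physics values, not results from the paper:

```latex
% DT fusion reaction and the scientific-breakeven criterion
\mathrm{D} + \mathrm{T} \;\longrightarrow\; {}^{4}\mathrm{He}\,(3.5\ \mathrm{MeV}) + n\,(14.1\ \mathrm{MeV}),
\qquad
Q \;\equiv\; \frac{P_{\mathrm{fusion}}}{P_{\mathrm{input}}} \;\ge\; 1.
```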
Why are we recommending this paper?
Due to your interest in fusion models