Hi!

Your personalized paper recommendations for 01 to 05 December, 2025.
Knowledge Management
The Hong Kong Polytechnic University
Abstract
In the history of knowledge distillation, the focus shifted over time from logit-based to feature-based approaches. However, this transition has been revisited with the advent of Decoupled Knowledge Distillation (DKD), which re-emphasizes the importance of logit knowledge through advanced decoupling and weighting strategies. While DKD marks a significant advancement, its underlying mechanisms merit deeper exploration. As a response, we rethink DKD from a predictive distribution perspective. First, we introduce an enhanced version, the Generalized Decoupled Knowledge Distillation (GDKD) loss, which offers a more versatile method for decoupling logits. Then we pay particular attention to the teacher model's predictive distribution and its impact on the gradients of GDKD loss, uncovering two critical insights often overlooked: (1) the partitioning by the top logit considerably improves the interrelationship of non-top logits, and (2) amplifying the focus on the distillation loss of non-top logits enhances the knowledge extraction among them. Utilizing these insights, we further propose a streamlined GDKD algorithm with an efficient partition strategy to handle the multimodality of teacher models' predictive distribution. Our comprehensive experiments conducted on a variety of benchmarks, including CIFAR-100, ImageNet, Tiny-ImageNet, CUB-200-2011, and Cityscapes, demonstrate GDKD's superior performance over both the original DKD and other leading knowledge distillation methods. The code is available at https://github.com/ZaberKo/GDKD.
AI Summary
  • GDKD (Generalized Decoupled Knowledge Distillation) is a novel knowledge distillation method that effectively transfers knowledge from teacher models to student models by leveraging the relationships among logits. [3]
  • GDKD outperforms traditional knowledge distillation methods, such as KD and DKD, in various tasks and scenarios, including those with multimodal predictions or small logits. [3]
  • The lowKD-other component is an essential driver of performance enhancements in GDKD, as it effectively isolates small logits into a separate group and enhances their relational structure via recalibrated softmax predictions. [3]
  • GDKD maintains robustness with minimal performance degradation when using a softmax temperature of T=1, while other logit-based distillation methods face notable declines in performance. [3]
  • Logits: The raw outputs of a neural network before softmax; applying a (temperature-scaled) softmax turns them into a probability distribution over classes (see the sketch after this list). [3]
  • Softmax Temperature (T): A hyperparameter that controls the spread of the softmax distribution. [3]
  • Dark Knowledge: The informative signal carried by the teacher's small (non-target) logits, which traditional KD methods can suppress due to the coupling of small and large logits. [3]
  • Dynamic scaling factors may not significantly enhance the distillation process, so a fixed-weights approach is chosen for its simplicity and computational efficiency. [1]
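For readers new to these terms, here is a minimal PyTorch sketch of temperature-scaled softmax and a DKD-style split of the distillation loss into a top-class part and a non-top part, the kind of decomposition GDKD generalizes. The weights and temperature are illustrative placeholders, not the paper's settings, and this is not the GDKD loss itself.

    import torch
    import torch.nn.functional as F

    def kl(p, log_q):
        # KL(p || q), with q supplied as log-probabilities; the clamp guards log(0).
        return (p * (p.clamp_min(1e-12).log() - log_q)).sum(dim=1).mean()

    def decoupled_kd_sketch(student_logits, teacher_logits, T=4.0, alpha=1.0, beta=8.0):
        """Distill the (top class vs. rest) split and the renormalised non-top
        distribution separately, each softened by temperature T."""
        p_t = F.softmax(teacher_logits / T, dim=1)
        p_s = F.softmax(student_logits / T, dim=1)
        top = p_t.argmax(dim=1, keepdim=True)          # partition by the teacher's top logit

        # Binary distributions: mass on the top class vs. everything else.
        bin_t = torch.cat([p_t.gather(1, top), 1 - p_t.gather(1, top)], dim=1)
        bin_s = torch.cat([p_s.gather(1, top), 1 - p_s.gather(1, top)], dim=1)
        top_loss = kl(bin_t, bin_s.clamp_min(1e-12).log())

        # Distributions over the non-top classes only, renormalised to sum to one.
        mask = torch.ones_like(p_t).scatter_(1, top, 0.0)
        q_t = (p_t * mask) / (p_t * mask).sum(dim=1, keepdim=True)
        q_s = (p_s * mask) / (p_s * mask).sum(dim=1, keepdim=True)
        non_top_loss = kl(q_t, q_s.clamp_min(1e-12).log())

        return (alpha * top_loss + beta * non_top_loss) * T * T   # conventional T^2 scaling

A larger beta puts more weight on the non-top part, which is the kind of emphasis the second insight above argues for.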
Wuhan University
Abstract
Knowledge editing aims to update specific facts in large language models (LLMs) without full retraining. Prior efforts sought to tune the knowledge layers of LLMs, proving effective for making selective edits. However, a significant gap exists between their performance in controlled, teacher-forcing evaluations and their real-world effectiveness in lifelong learning scenarios, which greatly limits their practical applicability. This work's empirical analysis reveals two recurring issues associated with this gap: (1) Most traditional methods lead the edited model to overfit to the new fact, thereby degrading pre-trained capabilities; (2) There is a critical absence of a knowledge consolidation stage, leaving new facts insufficiently integrated into LLMs' inference-time behavior under autoregressive generation, thereby leading to a mismatch between parametric knowledge and actual generation behavior. To this end, we propose Edit-then-Consolidate, a novel knowledge editing paradigm that aims to bridge the gap between theoretical knowledge editing methods and their real-world applicability. Specifically, (1) our framework mitigates overfitting via Targeted Proximal Supervised Fine-Tuning (TPSFT) that localizes the edit via a trust-region objective to limit policy drift; (2) Then, a consolidation stage using Group Relative Policy Optimization (GRPO) aligns the edited knowledge with CoT-based inference policy by optimizing trajectory-level behavior under comprehensive reward signals. Extensive experiments demonstrate our framework consistently improves editing reliability and generalization under real-world evaluations, while better preserving locality and pre-trained capabilities.
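As a rough illustration of the trust-region idea described above (fit the new fact while limiting drift from the pre-edit model), the following PyTorch snippet combines a token-level cross-entropy on the edited fact with a KL penalty to a frozen reference model. It is a generic sketch under assumed tensor shapes, not the paper's TPSFT objective, and the GRPO consolidation stage is not shown.

    import torch
    import torch.nn.functional as F

    def proximal_edit_loss(new_logits, ref_logits, target_ids, beta=0.1):
        """Cross-entropy on the edit plus a KL penalty to the frozen pre-edit model.
        Shapes assumed: logits (batch, seq, vocab), target_ids (batch, seq)."""
        ce = F.cross_entropy(new_logits.flatten(0, 1), target_ids.flatten())
        log_p_new = F.log_softmax(new_logits, dim=-1)
        p_ref = F.softmax(ref_logits, dim=-1)
        # KL(p_ref || p_new), averaged over tokens: large values signal policy drift.
        kl = (p_ref * (p_ref.clamp_min(1e-12).log() - log_p_new)).sum(-1).mean()
        return ce + beta * kl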
Knowledge Graphs
University of New South Wales
Abstract
Environmental, Social, and Governance (ESG) disclosure frameworks such as SASB, TCFD, and IFRS S2 require organizations to compute and report numerous metrics for compliance, yet these requirements are embedded in long, unstructured PDF documents that are difficult to interpret, standardize, and audit. Manual extraction is unscalable, while unconstrained large language model (LLM) extraction often produces inconsistent entities, hallucinated relationships, missing provenance, and high validation failure rates. We present OntoMetric, an ontology-guided framework that transforms ESG regulatory documents into validated, AI- and web-ready knowledge graphs. OntoMetric operates through a three-stage pipeline: (1) structure-aware segmentation using table-of-contents boundaries, (2) ontology-constrained LLM extraction that embeds the ESGMKG schema into prompts while enriching entities with semantic fields for downstream reasoning, and (3) two-phase validation that combines LLM-based semantic verification with rule-based schema checking across entity, property, and relationship levels (VR001-VR006). The framework preserves both segment-level and page-level provenance for audit traceability. Evaluated on five ESG standards (SASB Commercial Banks, SASB Semiconductors, TCFD, IFRS S2, AASB S2) totaling 228 pages and 60 segments, OntoMetric achieves 65-90% semantic accuracy and 80-90% schema compliance, compared to 3-10% for baseline unconstrained extraction, at approximately 0.01 to 0.02 USD per validated entity. Our results demonstrate that combining symbolic ontology constraints with neural extraction enables reliable, auditable knowledge graphs suitable for regulatory compliance and web integration, supporting downstream applications such as sustainable-finance analytics, transparency portals, and automated compliance tools.
AI Summary
  • OntoMetric ensures that the extracted outputs are consistent with the ESGMKG ontology and maintain structural integrity (a toy schema-check is sketched after this list). [3]
  • The text describes an ontology-guided framework for automated ESG knowledge graph construction called OntoMetric. [2]
  • ESG: Environmental, Social, and Governance; ESGMKG: ESG Knowledge Graph Model; LLM: Large Language Model; OntoMetric: an ontology-guided framework for automated ESG knowledge graph construction. The framework provides a structured approach to extracting ESG knowledge graphs from regulatory text. [1]
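To make the ontology-constrained validation stage concrete, here is a toy sketch of rule-based schema checking at the entity and relationship levels. The mini-ontology, field names, and relation signatures are invented for illustration; they are not the actual ESGMKG schema or the paper's VR001-VR006 rules.

    ALLOWED_TYPES = {"Metric", "Topic", "Standard"}
    ALLOWED_RELATIONS = {("Standard", "defines", "Topic"),
                         ("Topic", "hasMetric", "Metric")}
    REQUIRED_FIELDS = {"id", "type", "label", "source_page"}   # source_page keeps provenance

    def validate_entity(entity):
        """Return rule violations for one extracted entity (empty list = passes)."""
        errors = []
        missing = REQUIRED_FIELDS - entity.keys()
        if missing:
            errors.append(f"missing fields: {sorted(missing)}")
        if entity.get("type") not in ALLOWED_TYPES:
            errors.append(f"unknown entity type: {entity.get('type')}")
        return errors

    def validate_relation(triple, entities):
        """Check that a (subject, predicate, object) triple is licensed by the ontology."""
        s, p, o = triple
        signature = (entities[s]["type"], p, entities[o]["type"])
        return [] if signature in ALLOWED_RELATIONS else [f"relation not in ontology: {signature}"]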
Emory University
Abstract
Electronic health records (EHRs) support powerful clinical prediction models, but existing methods typically provide coarse, post hoc explanations that offer limited value for patient-level decision making. We introduce a knowledge graph (KG)-guided chain-of-thought (CoT) framework that generates clinically grounded and temporally consistent reasoning for visit-level disease prediction in MIMIC-III. ICD-9 codes are mapped to PrimeKG, from which disease-relevant nodes and multi-hop reasoning paths are extracted and used as scaffolds for CoT generation; only explanations whose conclusions match observed outcomes are retained. Lightweight LLaMA-3.1-Instruct-8B and Gemma-7B models are then fine-tuned on this supervision corpus. Across ten PrimeKG-mapped diseases and limited training cohorts (400 and 1000 cases), KG-guided models outperform strong classical baselines, achieving AUROC values of 0.66 to 0.70 and macro-AUPR values of 0.40 to 0.47. The models also transfer zero-shot to the CRADLE cohort, improving accuracy from approximately 0.40 to 0.51 up to 0.72 to 0.77. A blinded clinician evaluation shows consistent preference for KG-guided CoT explanations in clarity, relevance, and clinical correctness.
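A minimal sketch of the KG-scaffolding step described above: map coded diagnoses to graph nodes and harvest short multi-hop paths between them to seed a chain-of-thought prompt. The toy graph, node names, and hop limit are illustrative assumptions; the paper uses PrimeKG and ICD-9 mappings.

    import networkx as nx

    kg = nx.Graph()
    kg.add_edges_from([
        ("diabetes_mellitus", "hyperglycemia"),
        ("hyperglycemia", "chronic_kidney_disease"),
        ("diabetes_mellitus", "insulin_resistance"),
        ("insulin_resistance", "chronic_kidney_disease"),
    ])

    def reasoning_paths(graph, source, target, max_hops=3):
        """All simple paths of at most max_hops edges between two mapped concepts."""
        return list(nx.all_simple_paths(graph, source, target, cutoff=max_hops))

    paths = reasoning_paths(kg, "diabetes_mellitus", "chronic_kidney_disease")
    scaffold = "\n".join(" -> ".join(p) for p in paths)   # lines used to seed the CoT prompt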
Product Categorization
Dartmouth
Abstract
Making adequate use of information measured continuously over a period of time represents a methodological challenge. In recent decades, most traditional statistical procedures have been extended to accommodate such functional data. The binary classification problem, which aims to correctly identify units as positive or negative based on marker values, is no exception. The crucial point for making binary classifications based on a marker is to establish an order on the marker values, which is not immediate when these values are presented as functions. Here, we argue that if the marker is related to the characteristic under study, a trajectory from a positive participant should be more similar to trajectories from the positive population than to those drawn from the negative one. With this criterion, a classification procedure based on the distance between the involved functions is proposed. In addition, we propose a fully non-parametric estimator for this so-called probability-based criterion, PBC. We explore its asymptotic properties and its finite-sample behavior through an extensive Monte Carlo study. The results suggest that the proposed methodology works adequately, and frequently better than its competitors, in a wide variety of situations when the sample size in both the training and the testing cohorts is adequate. The practical use of the proposal is illustrated on a real-world dataset. As online supplementary material, the manuscript includes a document with further simulations and additional comments. An R function which wraps up the implemented routines is also provided.
AI Summary
  • Functional data analysis is a statistical approach that deals with data that can be represented as functions. [3]
  • The receiver operating characteristic (ROC) curve is a graphical representation of the performance of a binary classifier. [3]
  • The area under the ROC curve (AUC) is a widely used metric for evaluating the performance of a binary classifier. [3]
  • A new method called the generalized ROC (gROC) curve has been proposed to handle non-monotone relationships between the true positive rate and false positive rate. [3]
  • The gROC curve can be estimated using a two-stage approach, where the first stage involves estimating the empirical cumulative distribution functions (eCDFs) and the second stage involves estimating the ROC curve. [3]
  • The results showed that the gROC curve estimator performed well in terms of accuracy and precision compared to other existing methods. [3]
  • The gROC curve can be used as a tool for evaluating the performance of binary classifiers, especially when there are non-monotone relationships between the true positive rate and false positive rate. [3]
  • A new R package called logitFD has been developed to implement functional principal component logit regression. [3]
  • The logitFD package provides functions for estimating the eCDFs and the ROC curve using a two-stage approach. [3]
  • A simulation study was conducted to evaluate the performance of the gROC curve estimator compared to other existing methods. [1]
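Returning to the criterion in the abstract: a trajectory is labelled positive when it resembles the positive training trajectories more than the negative ones. Below is a minimal sketch, assuming curves sampled on a common grid, a plain discrete L2-type distance, and a nearest-mean-distance rule; the paper's PBC estimator is fully non-parametric and more refined than this.

    import numpy as np

    def l2_dist(f, g):
        # Discrete L2-type distance between two curves sampled on the same grid.
        return np.sqrt(np.mean((f - g) ** 2))

    def classify_curve(new_curve, pos_curves, neg_curves):
        """Positive if the trajectory is, on average, closer to positive training
        trajectories than to negative ones."""
        d_pos = np.mean([l2_dist(new_curve, f) for f in pos_curves])
        d_neg = np.mean([l2_dist(new_curve, g) for g in neg_curves])
        return "positive" if d_pos < d_neg else "negative"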
Macquarie University
Abstract
Restriction categories provide a categorical framework for partiality. In this paper, we introduce three new categorical theories for partiality: local categories, partial categories, and inclusion categories. The objects of a local category are partially accessible resources, and morphisms are processes between these resources. In a partial category, partiality is addressed via two operators, restriction and contraction, which control the domain of definition of a morphism. Finally, an inclusion category is a category equipped with a family of monics which axiomatize the inclusions between sets. The main result of this paper shows that restriction categories are $2$-equivalent to local categories, that partial categories are $2$-equivalent to inclusion categories, and that both restriction/local categories are $2$-equivalent to bounded partial/inclusion categories. Our result offers four equivalent ways to describe partiality: on morphisms, via restriction categories; on objects, with local categories; operationally, with partial categories; and via inclusions, with inclusion categories. We also translate several key concepts from restriction category theory to the local category context, which allows us to show that various special kinds of restriction categories, such as inverse categories, are $2$-equivalent to their analogous kind of local categories. In particular, the equivalence between inverse (restriction) categories and inverse local categories is a generalization of the celebrated Ehresmann-Schein-Nambooripad theorem for inverse semigroups.
AI Summary
  • A local category is a category equipped with a notion of 'local' objects that satisfy certain properties. [3]
  • A Cartesian object is an object in a 2-category that satisfies certain universal properties. [3]
  • The paper is a mathematical text that requires the reader to understand and apply various definitions and concepts related to restriction categories, local categories, and Cartesian objects. [2]
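For orientation, the baseline notion the abstract builds on can be stated via the standard Cockett-Lack axioms; this is textbook background rather than material reproduced from the paper. A restriction structure assigns to each morphism $f\colon A \to B$ a map $\bar{f}\colon A \to A$ (composition written right to left) such that

    \begin{align*}
      \textbf{(R.1)}\quad & f\,\bar{f} = f, \\
      \textbf{(R.2)}\quad & \bar{f}\,\bar{g} = \bar{g}\,\bar{f} \quad \text{whenever } \operatorname{dom} f = \operatorname{dom} g, \\
      \textbf{(R.3)}\quad & \overline{g\,\bar{f}} = \bar{g}\,\bar{f} \quad \text{whenever } \operatorname{dom} f = \operatorname{dom} g, \\
      \textbf{(R.4)}\quad & \bar{g}\,f = f\,\overline{g\,f} \quad \text{whenever } \operatorname{cod} f = \operatorname{dom} g.
    \end{align*}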
Perimeter Institute for Theoretical Physics
Abstract
The task of conclusive exclusion for a set of quantum states is to find a measurement such that for each state in the set, there is an outcome that allows one to conclude with certainty that the state in question was not prepared. Defining classicality of statistics as realizability by a generalized-noncontextual ontological model, we show that there is a quantum-over-classical advantage for how well one can achieve conclusive exclusion. This is achieved in an experimental scenario motivated by the construction appearing in the Pusey-Barrett-Rudolph theorem. We derive noise-robust noncontextuality inequalities bounding the conclusiveness of exclusion, and describe a quantum violation of these. Finally, we show that this bound also constitutes a classical causal compatibility inequality within the bilocality scenario, and that its violation in quantum theory yields a novel possibilistic proof of a quantum-classical gap in that scenario.
AI Summary
  • The paper presents a conclusive exclusion task in quantum mechanics and shows that it cannot be explained by any noncontextual ontological model. [3]
  • Conclusive Exclusion Task: given a set of quantum states, find a measurement such that, for each state, some outcome certifies with certainty that this state was not the one prepared (stated formally after this list). [3]
  • Bilocality Scenario: a network causal structure with two independent sources, each shared between the central party and one of the two outer parties. [3]
  • The paper adopts realizability by a generalized-noncontextual ontological model as its notion of classicality, against which the quantum advantage in conclusive exclusion is demonstrated. [3]
  • The bilocality scenario induces a prepare-measure scenario on the bipartite system AB via steering, allowing for a connection to be made with the conclusive exclusion task from the main text. [2]
  • The experiment admits of a noncontextual ontological model since all states and measurements are in the stabilizer subtheory of the qubit theory. [1]
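For context on the task stated in the first bullet, the noiseless version has a compact standard formulation; the paper's noise-robust noncontextuality inequalities are not reproduced here, and the worst-case error below is one common figure of merit rather than necessarily the paper's.

    Given states $\{\rho_k\}_{k=1}^{n}$, find a POVM $\{E_k\}_{k=1}^{n}$ such that
    \[
      \operatorname{tr}(E_k\,\rho_k) = 0 \quad \text{for every } k,
    \]
    so that observing outcome $k$ certifies that $\rho_k$ was not the prepared state.
    A noise-robust relaxation asks to minimise the worst-case error
    $\varepsilon = \max_{k}\operatorname{tr}(E_k\,\rho_k)$.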
Continual Generalized Category Discovery
TU Berlin
Abstract
We frame novelty detection on path space as a hypothesis testing problem with signature-based test statistics. Using transportation-cost inequalities of Gasteratos and Jacquier (2023), we obtain tail bounds for false positive rates that extend beyond Gaussian measures to laws of RDE solutions with smooth bounded vector fields, yielding estimates of quantiles and p-values. Exploiting the shuffle product, we derive exact formulae for smooth surrogates of conditional value-at-risk (CVaR) in terms of expected signatures, leading to new one-class SVM algorithms optimising smooth CVaR objectives. We then establish lower bounds on type-$\mathrm{II}$ error for alternatives with finite first moment, giving general power bounds when the reference measure and the alternative are absolutely continuous with respect to each other. Finally, we evaluate numerically the type-$\mathrm{I}$ error and statistical power of the signature-based test statistics, using synthetic anomalous diffusion data and real-world molecular biology data.
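To make the flavour of a signature-based statistic concrete, the sketch below truncates the path signature at level two (computed exactly for piecewise-linear paths) and scores a new path by the distance of its signature to the mean training signature. The scoring rule and any threshold are illustrative assumptions, not the test statistics, tail bounds, or CVaR objectives of the paper.

    import numpy as np

    def signature_level2(path):
        """Levels 1 and 2 of the signature of a piecewise-linear path, given as an (N, d) array."""
        inc = np.diff(path, axis=0)                 # segment increments
        level1 = inc.sum(axis=0)                    # total increment
        prev = np.cumsum(inc, axis=0) - inc         # increments strictly before each segment
        level2 = prev.T @ inc + 0.5 * inc.T @ inc   # exact iterated integrals for linear segments
        return np.concatenate([level1, level2.ravel()])

    def novelty_scores(train_paths, test_paths):
        """Distance of each test path's truncated signature to the mean training signature."""
        train_sigs = np.stack([signature_level2(p) for p in train_paths])
        mean_sig = train_sigs.mean(axis=0)
        return np.array([np.linalg.norm(signature_level2(p) - mean_sig) for p in test_paths])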
Ontology for Products
Duke University
Abstract
High-throughput screening (HTS) is useful for evaluating chemicals for potential human health risks. However, given the extraordinarily large number of genes, assay endpoints, and chemicals of interest, available data are sparse, with dose-response curves missing for the vast majority of chemical-gene pairs. Although gene ontologies characterize similarity among genes with respect to known cellular functions and biological pathways, the sensitivity of various pathways to environmental contaminants remains unclear. We propose a novel Dose-Activity Response Tracking (DART) approach to predict the biological activity of chemicals across genes using information on chemical structural properties and gene ontologies within a Bayesian factor model. Designed to provide toxicologists with a flexible tool applicable across diverse HTS assay platforms, DART reveals the latent processes driving dose-response behavior and predicts new activity profiles for chemical-gene pairs lacking experimental data. We demonstrate the performance of DART through simulation studies and an application to a vast new multi-experiment data set consisting of dose-response observations generated by the exposure of HepG2 cells to per- and polyfluoroalkyl substances (PFAS), where it provides actionable guidance for chemical prioritization and inference on the structural and functional mechanisms underlying assay activation.
AI Summary
  • DART extends existing approaches by handling highly sparse data and providing additional interpretive structure. [3]
  • The DART model's imputation and prioritization capacity has applied value, identifying additional chemicals likely to be active despite lacking direct experimental measurements. [3]
  • TCPL: the classical dose-response modeling pipeline used as a baseline, whose models can only be fit on the smaller, more densely observed subsets of the data. [3]
  • PFAS: Per- and polyfluoroalkyl substances. [3]
  • CASRN: Chemical Abstracts Service Registry Number. [3]
  • DART is a novel Bayesian framework for imputing missing dose-response curves from high-throughput screening (HTS) studies. [2]
  • The DART model performed competitively with classical TCPL models in a smaller subset, suggesting that it matches the strengths of existing approaches while extending to settings with far greater sparsity. [1]
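To illustrate the factor-model imputation idea that DART refines, here is a toy low-rank completion of a sparse chemical-by-gene activity matrix via alternating least squares. The dimensions, rank, and ridge term are arbitrary, and the paper's Bayesian model additionally uses chemical-structure and gene-ontology information rather than this plain factorization.

    import numpy as np

    rng = np.random.default_rng(0)
    n_chem, n_gene, rank = 30, 20, 3
    Y = rng.normal(size=(n_chem, rank)) @ rng.normal(size=(rank, n_gene))   # toy activity matrix
    observed = rng.random(Y.shape) < 0.2                                    # only ~20% of pairs measured

    U = rng.normal(size=(n_chem, rank))     # chemical factors
    V = rng.normal(size=(n_gene, rank))     # gene factors
    lam = 1e-2                              # ridge term keeps the small solves stable
    for _ in range(50):
        for i in range(n_chem):             # update chemical factors from observed genes
            obs = observed[i]
            A = V[obs].T @ V[obs] + lam * np.eye(rank)
            U[i] = np.linalg.solve(A, V[obs].T @ Y[i, obs])
        for j in range(n_gene):             # update gene factors from observed chemicals
            obs = observed[:, j]
            A = U[obs].T @ U[obs] + lam * np.eye(rank)
            V[j] = np.linalg.solve(A, U[obs].T @ Y[obs, j])

    Y_hat = U @ V.T                         # predicted activity for unmeasured chemical-gene pairs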
Graphs for Products
Indian Institute of Technology
Abstract
For a positive integer $k$, the \emph{ total $k$-cut complex} of a graph $G$, denoted as $Δ_k^t(G)$, is the simplicial complex whose facets are $σ\subseteq V(G)$ such that $|σ| = |V(G)|-k$ and the induced subgraph $G[V(G) \setminus σ]$ does not contain any edge. These complexes were introduced by Bayer et al.\ in \cite{Bayer2024TotalCutcomplex} in connection with commutative algebra. In the same paper, they studied the homotopy types of these complexes for various families of graphs, including cycle graphs $C_n$, squared cycle graphs $C_n^2$, and Cartesian products of complete graphs and path graphs $K_m \square P_2$ and $K_2 \square P_n$. In this article, we extend the work of Bayer et al.\ for these families of graphs. We focus on the complexes $Δ_2^t(G)$ and determine the homotopy types of these complexes for three classes of graphs: (i) $p$-th powers of cycle graphs $C_n^p$ (ii) $K_m \square P_n$ and (iii) $K_m \square C_n$. Using discrete Morse theory, we show that these complexes are homotopy equivalent to wedges of spheres. We also give the number and dimension of spheres appearing in the homotopy type. Our result on powers of cycle graphs $C_n^p$ proves a conjecture of Shen et al.\ about the homotopy type of the complexes $Δ_2^t(C_n^p)$.
AI Summary
  • Their method involves constructing a simplicial complex associated with the graph and then applying various tools from algebraic topology to compute its homology groups. [3]
  • Key terms: homology group, simplicial complex, graph homomorphism. The new approach developed by the authors provides a more efficient and effective way of calculating the homology groups of complexes of graph homomorphisms. [3]
  • Their method has far-reaching implications for various areas of mathematics, including algebraic topology, combinatorics, and computer science. [3]
  • Their approach may not be applicable to all types of graphs, particularly those with complex structures. [3]
  • The authors have developed a new approach to calculate the homology groups of these complexes, using techniques from algebraic topology and combinatorics. [2]
  • The problem of calculating the homology groups of complexes of graph homomorphisms has been studied extensively in recent years. [1]
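A quick computational illustration of the definition in the abstract: the facets of the total $k$-cut complex are the vertex sets of size $|V(G)| - k$ whose complementary $k$ vertices induce no edge. The choice of the cycle $C_6$ and $k = 2$ below is just an example.

    from itertools import combinations

    n, k = 6, 2
    vertices = range(n)
    edges = {frozenset((i, (i + 1) % n)) for i in vertices}   # cycle graph C_n

    def is_independent(S):
        """True if no two vertices of S are joined by an edge."""
        return all(frozenset(pair) not in edges for pair in combinations(S, 2))

    facets = [set(vertices) - set(removed)
              for removed in combinations(vertices, k)
              if is_independent(removed)]
    print(len(facets), "facets of the total 2-cut complex of C_6")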
Princeton University
Abstract
Maximal clique enumeration is a fundamental graph mining task, but its utility is often limited by computational intractability and highly redundant output. To address these challenges, we introduce \emph{$ρ$-dense aggregators}, a novel approach that succinctly captures maximal clique structure. Instead of listing all cliques, we identify a small collection of clusters with edge density at least $ρ$ that collectively contain every maximal clique. In contrast to maximal clique enumeration, we prove that for all $ρ< 1$, every graph admits a $ρ$-dense aggregator of \emph{sub-exponential} size, $n^{O(\log_{1/ρ}n)}$, and provide an algorithm achieving this bound. For graphs with bounded degeneracy, a typical characteristic of real-world networks, our algorithm runs in near-linear time and produces near-linear size aggregators. We also establish a matching lower bound on aggregator size, proving our results are essentially tight. In an empirical evaluation on real-world networks, we demonstrate significant practical benefits for the use of aggregators: our algorithm is consistently faster than the state-of-the-art clique enumeration algorithm, with median speedups over $6\times$ for $ρ=0.1$ (and over $300\times$ in an extreme case), while delivering a much more concise structural summary.
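To make the object in the abstract concrete, the sketch below checks the two defining properties of a $ρ$-dense aggregator: each cluster has induced edge density at least $ρ$, and every maximal clique is contained in some cluster. The paper's sub-exponential construction algorithm is not reproduced here.

    from itertools import combinations

    def edge_density(adj, S):
        """Induced edge density of vertex set S (taken as 1.0 for fewer than two vertices)."""
        S = list(S)
        if len(S) < 2:
            return 1.0
        possible = len(S) * (len(S) - 1) / 2
        present = sum(1 for u, v in combinations(S, 2) if v in adj[u])
        return present / possible

    def is_aggregator(adj, clusters, maximal_cliques, rho):
        """True if every cluster is rho-dense and every maximal clique lies in some cluster."""
        dense = all(edge_density(adj, c) >= rho for c in clusters)
        covered = all(any(set(q) <= set(c) for c in clusters) for q in maximal_cliques)
        return dense and covered

    # Tiny example: a triangle plus a pendant vertex, covered by a single cluster.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(is_aggregator(adj, clusters=[{0, 1, 2, 3}], maximal_cliques=[{0, 1, 2}, {2, 3}], rho=0.5))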

Interests not found

We did not find any papers that match the interests below. Try other terms, and also consider whether the content exists on arxiv.org.
  • Taxonomy of Products
You can edit or add more interests any time.