Hi!

Your personalized paper recommendations for 15–19 December 2025.
OSF HealthCare
AI Insights
  • The platform uses various techniques, including retrieval-augmented generation and graph embedding, to create knowledge graphs. [3]
  • Darth Vecdor has been tested on several datasets, including the Metathesaurus dataset, which contains information about medical concepts and relationships. [3]
  • Retrieval-augmented generation: a technique that uses a large language model to retrieve relevant information from a knowledge graph and then generate text based on that information. [3]
  • The platform has been tested on several datasets and has shown promising results. [3]
  • Graph embedding: a technique for representing nodes in a graph as vectors in a high-dimensional space, allowing for efficient computation of similarity between nodes (see the sketch after this list). [2]
  • Darth Vecdor is a platform for creating knowledge graphs from large language models (LLMs). [1]
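
To make the graph-embedding idea above concrete, here is a minimal sketch in Python. The node vectors are random stand-ins rather than the output of a trained embedding model; the point is only that relatedness reduces to vector similarity.

    import numpy as np

    # Random stand-in embeddings; a real system would train these on the graph.
    rng = np.random.default_rng(0)
    embeddings = {name: rng.normal(size=64) for name in ("aspirin", "ibuprofen", "naproxen")}

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        """Cosine similarity between two node-embedding vectors."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(embeddings["aspirin"], embeddings["ibuprofen"]))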
Abstract
Many large language models (LLMs) are trained on a massive body of knowledge present on the Internet. Darth Vecdor (DV) was designed to extract this knowledge into a structured, terminology-mapped, SQL database ("knowledge base" or "knowledge graph"). Knowledge graphs may be useful in many domains, including healthcare. Although one might query an LLM directly rather than a SQL-based knowledge graph, concerns such as cost, speed, safety, and confidence may arise, especially in high-volume operations. These may be mitigated when the information is pre-extracted from the LLM and becomes query-able through a standard database. However, the author found the need to address several issues. These included erroneous, off-topic, free-text, overly general, and inconsistent LLM responses, as well as allowing for multi-element responses. DV was built with features intended to mitigate these issues. To facilitate ease of use, and to allow for prompt engineering by those with domain expertise but little technical background, DV provides a simple, browser-based graphical user interface. DV has been released as free, open-source, extensible software, on an "as is" basis, without warranties or conditions of any kind, either express or implied. Users need to be cognizant of the potential risks and benefits of using DV and its outputs, and users are responsible for ensuring any use is safe and effective. DV should be assumed to have bugs, potentially very serious ones. However, the author hopes that appropriate use of current and future versions of DV and its outputs can help improve healthcare.
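
The extract-then-query pattern the abstract describes can be sketched as below. This is an illustration of the idea, not DV's implementation: ask_llm is a hypothetical stand-in for a real LLM client, and DV's actual pipeline adds terminology mapping, validation, and consistency checks.

    import sqlite3

    def ask_llm(prompt: str) -> list[str]:
        # Hypothetical stand-in for a real LLM client; canned output keeps the
        # sketch runnable. A constrained prompt would request one term per line
        # to avoid the free-text responses the abstract warns about.
        return ["chest pain", "shortness of breath", "nausea"]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (subject TEXT, relation TEXT, object TEXT)")

    subject, relation = "myocardial infarction", "has_symptom"
    for obj in ask_llm(f"List symptoms of {subject}, one per line, no prose."):
        conn.execute("INSERT INTO triple VALUES (?, ?, ?)", (subject, relation, obj))
    conn.commit()

    # High-volume consumers later query the database instead of the LLM.
    print(conn.execute("SELECT object FROM triple WHERE subject = ?", (subject,)).fetchall())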
Why are we recommending this paper?
Due to your Interest in: Knowledge Management

This paper directly addresses knowledge graph construction using LLMs, aligning with the user's interest in Knowledge Graphs and their application in extracting information from large datasets. Darth Vecdor itself offers a practical system for knowledge representation, a key area of interest.
Anhalt University of Applied Sciences
Abstract
Many complex real-world systems exhibit inherently intertwined temporal and spatial characteristics. Spatio-temporal knowledge graphs (STKGs) have therefore emerged as a powerful representation paradigm, as they integrate entities, relationships, time and space within a unified graph structure. They are increasingly applied across diverse domains, including environmental systems and urban, transportation, social and human mobility networks. However, modeling STKGs remains challenging: their foundations span classical graph theory as well as temporal and spatial graph models, which have evolved independently across different research communities and follow heterogeneous modeling assumptions and terminologies. As a result, existing approaches often lack conceptual alignment, generalizability and reusability. This survey provides a systematic review of spatio-temporal knowledge graph models, tracing their origins in static, temporal and spatial graph modeling. We analyze existing approaches along key modeling dimensions, including edge semantics, temporal and spatial annotation strategies, temporal and spatial semantics and relate these choices to their respective application domains. Our analysis reveals that unified modeling frameworks are largely absent and that most current models are tailored to specific use cases rather than designed for reuse or long-term knowledge preservation. Based on these findings, we derive modeling guidelines and identify open challenges to guide future research.
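
As an illustration of the modeling dimensions the survey analyzes, the sketch below bundles edge semantics, a temporal validity interval, and a spatial annotation into a single record. It is a toy structure invented for this digest, not a metamodel taken from the survey.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass(frozen=True)
    class STEdge:
        subject: str          # entity id
        relation: str         # edge semantics, e.g. "located_in"
        obj: str              # entity id
        valid_from: datetime  # temporal annotation: validity interval
        valid_to: datetime
        lat: float            # spatial annotation: point geometry
        lon: float

    edge = STEdge("bus_42", "located_in", "district_3",
                  datetime(2025, 12, 15, 8, 0), datetime(2025, 12, 15, 9, 0),
                  52.79, 12.01)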
Why are we recommending this paper?
Due to your Interest in: Knowledge Graphs

Given the user's focus on Knowledge Graphs and their integration with spatial and temporal data, this survey offers a valuable overview of relevant models. The exploration of STKGs is highly relevant to the user's interest in representing complex, time-dependent systems.
Tsinghua University
AI Insights
  • The method searches for the worst-case failure by imposing small perturbations on the model parameters, then applies a worst-case loss sharpness penalty (LSP). [3]
  • By penalizing loss sharpness, the model can generate a flatter loss surface and effectively suppress overfitting to noisy samples. [3]
  • GCD: Generalized Category Discovery; LSP: Loss Sharpness Penalty; DAS: Dynamic Anchor Selection. The proposed method can effectively alleviate the overfitting of parameterized GCD methods to noisy samples and improve clustering accuracy. [3]
  • The paper also proposes a novel dynamic anchor selection strategy called DAS, which selects anchors based on their purity and adaptively updates them during training. [2]
  • The paper proposes a novel method to alleviate the overfitting of parameterized GCD methods to noisy samples. [1]
Abstract
Generalized category discovery (GCD) is an important and challenging task in open-world learning. Specifically, given some labeled data of known classes, GCD aims to cluster unlabeled data that contain both known and unknown classes. Current GCD methods based on parametric classification adopt the DINO-like pseudo-labeling strategy, where the sharpened probability output of one view is used as supervision information for the other view. However, large pre-trained models have a preference for some specific visual patterns, resulting in encoding spurious correlations for unlabeled data and generating noisy pseudo-labels. To address this issue, we propose a novel method, which contains two modules: Loss Sharpness Penalty (LSP) and Dynamic Anchor Selection (DAS). LSP enhances the robustness of model parameters to small perturbations by minimizing the worst-case loss sharpness of the model, which suppresses the encoding of trivial features, thereby reducing overfitting to noisy samples and improving the quality of pseudo-labels. Meanwhile, DAS selects representative samples for the unknown classes based on KNN density and class probability during model training and assigns hard pseudo-labels to them, which not only alleviates the confidence difference between known and unknown classes but also enables the model to quickly learn a more accurate feature distribution for the unknown classes, thus further improving the clustering accuracy. Extensive experiments demonstrate that the proposed method effectively mitigates the noise of pseudo-labels and achieves state-of-the-art results on multiple GCD benchmarks.
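
The worst-case sharpness idea behind LSP can be sketched with the generic sharpness-aware-minimization recipe: ascend to an approximate worst-case point inside a small parameter ball, then update using the gradient taken there. This is the standard two-pass template, not necessarily the paper's exact LSP formulation.

    import torch

    def sam_step(model, loss_fn, batch, optimizer, rho=0.05):
        # First pass: gradient of the clean loss.
        loss_fn(model, batch).backward()
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        # Ascend within the rho-ball toward the approximate worst case.
        eps = {}
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    continue
                eps[p] = rho * p.grad / (grad_norm + 1e-12)
                p.add_(eps[p])
        optimizer.zero_grad()
        # Second pass: the gradient at the perturbed point drives the update,
        # penalizing sharp minima and, per the paper, noisy pseudo-labels.
        loss_fn(model, batch).backward()
        with torch.no_grad():
            for p, e in eps.items():
                p.sub_(e)  # restore the unperturbed parameters
        optimizer.step()
        optimizer.zero_grad()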
Why are we recommending this paper?
Due to your Interest in: Continual Generalized Category Discovery

This paper tackles Generalized Category Discovery, a task closely related to Continual Generalized Category Discovery, a core interest for the user. The focus on clustering unlabeled data that mixes known and unknown classes is a strong match.
University of Michigan
AI Insights
  • Generalized category discovery is a subfield of computer vision that focuses on discovering new categories or classes in images without prior knowledge of their existence. [3]
  • The field has seen significant advancements in recent years, with the introduction of new architectures and techniques such as CLIP-GCD (Contrastive Learning for Generalized Category Discovery) and GET (Generalized Category Discovery using Enhanced Transfer). [3]
  • Researchers have also explored the use of multimodal data, including text and audio, to improve category discovery performance. [3]
  • Generalized Category Discovery (GCD): The task of discovering new categories or classes in images without prior knowledge of their existence. [3]
  • Contrastive Learning for Generalized Category Discovery (CLIP-GCD): A technique used for GCD that involves training a model to distinguish between positive and negative pairs of images. [3]
  • The field of generalized category discovery is rapidly evolving, with new techniques and architectures introduced regularly, yet it remains in its early stages. [3]
  • Multimodal data has shown promise in improving category discovery performance, but many open challenges and research questions remain to be addressed. [2]
  • Further research is needed to fully understand the capabilities and limitations of GCD models. [1]
Abstract
Adaptive categorization of visual scenes is essential for AI agents to handle changing tasks. Unlike fixed common categories for plants or animals, ad-hoc categories are created dynamically to serve specific goals. We study open ad-hoc categorization: Given a few labeled exemplars and abundant unlabeled data, the goal is to discover the underlying context and to expand ad-hoc categories through semantic extension and visual clustering around it. Building on the insight that ad-hoc and common categories rely on similar perceptual mechanisms, we propose OAK, a simple model that introduces a small set of learnable context tokens at the input of a frozen CLIP and optimizes with both CLIP's image-text alignment objective and GCD's visual clustering objective. On Stanford and Clevr-4 datasets, OAK achieves state-of-the-art in accuracy and concept discovery across multiple categorizations, including 87.4% novel accuracy on Stanford Mood, surpassing CLIP and GCD by over 50%. Moreover, OAK produces interpretable saliency maps, focusing on hands for Action, faces for Mood, and backgrounds for Location, promoting transparency and trust while enabling adaptive and generalizable categorization.
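
OAK's central mechanism, a handful of learnable context tokens in front of a frozen backbone, follows the familiar prompt-tuning pattern sketched below. The encoder and tensor shapes here are generic placeholders, not the real CLIP interface, and OAK's joint image-text alignment and clustering objectives are omitted.

    import torch
    import torch.nn as nn

    class ContextTokenWrapper(nn.Module):
        """Prepends a few learnable context tokens to a frozen encoder's input."""
        def __init__(self, encoder: nn.Module, dim: int, n_ctx: int = 4):
            super().__init__()
            self.encoder = encoder
            for p in self.encoder.parameters():
                p.requires_grad_(False)  # the backbone stays frozen, as in OAK
            self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # only trainable part

        def forward(self, tok_emb: torch.Tensor) -> torch.Tensor:
            # tok_emb: (batch, seq, dim) token embeddings; the shared context
            # tokens are broadcast across the batch and prepended.
            ctx = self.ctx.unsqueeze(0).expand(tok_emb.size(0), -1, -1)
            return self.encoder(torch.cat([ctx, tok_emb], dim=1))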
Why are we recommending this paper?
Due to your Interest in: Product Categorization

The paper's exploration of open ad-hoc categorization aligns well with the user's interest in dynamic categorization and creating categories based on specific contexts. This approach directly addresses the challenge of adapting to changing requirements, a key element of the user's interests.
Harbin Institute of Technology
AI Insights
  • Delta-L, a low-complexity variant of Delta-P, yields a solution remarkably close to the optimal frontier, demonstrating its near-optimal performance. [2]
  • The proposed algorithm, Delta-P, achieves near-optimal performance by precisely aligning updates with segment boundaries and fast-decay periods. [1]
Abstract
The Channel Knowledge Map (CKM) maps position information to channel state information, leveraging environmental knowledge to reduce signaling overhead in sixth-generation networks. However, constructing a reliable CKM demands substantial data and computation, and in dynamic environments, a pre-built CKM becomes outdated, degrading performance. Frequent retraining restores accuracy but incurs significant waste, creating a fundamental trade-off between CKM efficacy and update overhead. To address this, we introduce a Map Efficacy Function (MEF) capturing both gradual aging and abrupt environmental transitions, and formulate the update scheduling problem as fractional programming. We develop two Dinkelbach-based algorithms: Delta-P guarantees global optimality, while Delta-L achieves near-optimal performance with near-linear complexity. For unpredictable environments, we derive a threshold-based policy: immediate updates are optimal when the environmental degradation rate exceeds the resource consumption acceleration; otherwise, delay is preferable. For predictable environments, long-term strategies strategically relax these myopic rules to maximize global performance. Across these regimes, the policy reveals that stronger entry loss and faster decay favor immediate updates, while weaker entry loss and slower decay favor delayed updates.
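
The Dinkelbach method that Delta-P and Delta-L build on reduces a fractional objective f(x)/g(x) to a sequence of parameterized subproblems. The sketch below shows only that generic template; argmax_aux is an assumed, problem-specific inner solver (for this paper, a search over update schedules) that is not reproduced here.

    def dinkelbach(f, g, argmax_aux, x0, tol=1e-9, max_iter=100):
        """Maximize f(x)/g(x) with g > 0; argmax_aux(lam) solves max_x f(x) - lam*g(x)."""
        x = x0
        for _ in range(max_iter):
            lam = f(x) / g(x)
            x = argmax_aux(lam)
            if abs(f(x) - lam * g(x)) < tol:  # F(lam) near 0 => lam is the optimal ratio
                break
        return x, f(x) / g(x)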
Why are we recommending this paper?
Due to your Interest in: Knowledge Management

This paper's focus on knowledge maps within complex network environments aligns with the user's interest in Knowledge Management and Knowledge Graphs. The application to sixth-generation networks demonstrates a practical use case for graph-based knowledge representation.
Mayo Clinic
Abstract
Continual learning remains a fundamental challenge in machine learning, requiring models to learn from a stream of tasks without forgetting previously acquired knowledge. A major obstacle in this setting is catastrophic forgetting, where performance on earlier tasks degrades as new tasks are learned. In this paper, we introduce PPSEBM, a novel framework that integrates an Energy-Based Model (EBM) with Progressive Parameter Selection (PPS) to effectively address catastrophic forgetting in continual learning for natural language processing tasks. In PPSEBM, progressive parameter selection allocates distinct, task-specific parameters for each new task, while the EBM generates representative pseudo-samples from prior tasks. These generated samples actively inform and guide the parameter selection process, enhancing the model's ability to retain past knowledge while adapting to new tasks. Experimental results on diverse NLP benchmarks demonstrate that PPSEBM outperforms state-of-the-art continual learning methods, offering a promising and robust solution to mitigate catastrophic forgetting.
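
How an energy-based model can supply pseudo-samples of earlier tasks is sketched below using Langevin dynamics, a standard EBM sampler; PPSEBM's actual generator and its coupling to progressive parameter selection are specific to the paper and not shown.

    import torch

    def langevin_sample(energy, x, steps=50, step_size=0.01, noise_scale=0.005):
        """Draw an approximate sample by noisy gradient descent on the energy."""
        x = x.clone().requires_grad_(True)
        for _ in range(steps):
            grad, = torch.autograd.grad(energy(x).sum(), x)
            with torch.no_grad():
                # Step downhill on the energy landscape with injected noise.
                x += -step_size * grad + noise_scale * torch.randn_like(x)
        return x.detach()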
Why are we recommending this paper?
Due to your Interest in: Continual Generalized Category Discovery
University of the Aegean
Abstract
This paper explores the integration of Large Language Models (LLMs) in the engineering of a Parkinson's Disease (PD) monitoring and alerting ontology through four key methodologies: One Shot (OS) prompt techniques, Chain of Thought (CoT) prompts, X-HCOME, and SimX-HCOME+. The primary objective is to determine whether LLMs alone can create comprehensive ontologies and, if not, whether human-LLM collaboration can achieve this goal. Consequently, the paper assesses the effectiveness of LLMs in automated ontology development and the enhancement achieved through human-LLM collaboration. Initial ontology generation was performed using One Shot (OS) and Chain of Thought (CoT) prompts, demonstrating the capability of LLMs to autonomously construct ontologies for PD monitoring and alerting. However, these outputs were not comprehensive and required substantial human refinement to enhance their completeness and accuracy. X-HCOME, a hybrid ontology engineering approach that combines human expertise with LLM capabilities, showed significant improvements in ontology comprehensiveness. This methodology resulted in ontologies that are very similar to those constructed by experts. Further experimentation with SimX-HCOME+, another hybrid methodology emphasizing continuous human supervision and iterative refinement, highlighted the importance of ongoing human involvement. This approach led to the creation of more comprehensive and accurate ontologies. Overall, the paper underscores the potential of human-LLM collaboration in advancing ontology engineering, particularly in complex domains like PD. The results suggest promising directions for future research, including the development of specialized GPT models for ontology construction.
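
The contrast between the One Shot and Chain of Thought prompting styles the paper evaluates can be sketched as below. The prompt wording is illustrative, not the paper's actual templates, and complete is a hypothetical stand-in for an LLM client.

    ONE_SHOT = (
        "Example: for diabetes monitoring, classes include Patient, GlucoseReading, Alert.\n"
        "Now produce an OWL class hierarchy for Parkinson's disease monitoring and alerting."
    )
    CHAIN_OF_THOUGHT = (
        "Think step by step: (1) list the symptoms to monitor, (2) list the sensors "
        "and measurements that capture them, (3) list alert conditions, then "
        "(4) organize all of these into an OWL class hierarchy with properties."
    )

    def complete(prompt: str) -> str:
        return "..."  # hypothetical stand-in for a real LLM client

    draft = complete(CHAIN_OF_THOUGHT)  # per the paper, still needs expert refinement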
AI Insights
  • The authors propose three collaborative OEMs: X-HCOME, Expert Review of X-HCOME, and Sim-X-HCOME+. [3]
  • Parkinson's disease is a neurodegenerative disorder characterized by tremors, rigidity, bradykinesia, and postural instability. [3]
  • Monitoring and alerting systems for Parkinson's disease aim to detect early signs of the disease or its progression. [3]
  • The paper discusses the use of Large Language Models (LLMs) in Ontology Engineering for Parkinson's disease monitoring and alerting. [2]
  • The study highlights the potential of LLM-enhanced OE in healthcare contexts, but also notes the need for refinement before these approaches can be considered mature OEMs. [1]
Why are we recommending this paper?
Due to your Interest in: Ontology for Products
Free University of Bozen-Bolzano
Abstract
Object-centric process mining is a new branch of process mining where events are associated with multiple objects, and where object-to-object interactions are essential to understand the process dynamics. Traditional event data models, also called case-centric, are unable to cope with the complexity introduced by these more refined relationships. Several models have been made to move from case-centric to Object-Centric Event Data (OCED), trying to retain simplicity as much as possible. Still, these suffer from inherent ambiguities, and lack a comprehensive support of essential dimensions related to time and (dynamic) relations. In this work, we propose to fill this gap by leveraging a well-founded ontology of events and bringing ontological foundations to OCED, with a three-step approach. First, we start from key open issues reported in the literature regarding current OCED metamodels, and witness their ambiguity and expressiveness limitations on illustrative and representative examples proposed therein. Second, we consider the OCED Core Model, currently proposed as the basis for defining a new standard for object-centric event data, and we enhance it by grounding it on a lightweight version of UFO-B called gUFO, a well-known foundational ontology tailored to the representation of objects, events, time, and their (dynamic) relations. This results in a new metamodel, which we call gOCED. The third contribution then shows how gOCED at once covers the features of existing metamodels preserving their simplicity, and extends them with the essential features needed to overcome the ambiguity and expressiveness issues reported in the literature.
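
The object-centric shift the abstract describes can be illustrated with a toy record: one event touching several objects through qualified involvements, plus a dynamic object-to-object relation. This is an illustration only; the gOCED metamodel adds ontological grounding (gUFO) well beyond this.

    from dataclasses import dataclass, field

    @dataclass
    class Event:
        event_id: str
        activity: str
        timestamp: str
        objects: dict[str, str] = field(default_factory=dict)  # qualifier -> object id

    # One event involving several objects, each with a qualified role...
    pack = Event("e1", "pack items", "2025-12-15T10:00:00",
                 objects={"order": "o1", "item": "i7", "packer": "emp3"})
    # ...plus an object-to-object relation, the structure case-centric logs lack.
    object_relations = [("i7", "part_of", "o1")]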
Why are we recommending this paper?
Due to your Interest in: Ontology for Products
University of California
Abstract
Given a graph $H$, let $\chi_H(\mathbb{R}^n)$ be the smallest positive integer $r$ such that there exists an $r$-coloring of $\mathbb{R}^n$ with no monochromatic unit-copy of $H$, that is a set of $|V(H)|$ vertices of the same color such that any two vertices corresponding to an edge of $H$ are at distance one. This Ramsey-type function extends the famous Hadwiger–Nelson problem on the chromatic number $\chi(\mathbb{R}^n)=\chi_{K_2}(\mathbb{R}^n)$ of the space from a complete graph $K_2$ on two vertices to an arbitrary graph $H$. It also extends the classical Euclidean Ramsey problem for congruent monochromatic subsets to the family of those defined by a specific subset of unit distances. Among others, we show that $\chi_H(\mathbb{R}^n)=\chi(\mathbb{R}^n)$ for any even cycle $H$ of length $8$ or at least $12$ as well as for any forest and that $\chi_H(\mathbb{R}^n)=\lceil\chi(\mathbb{R}^n)/2\rceil$ for any sufficiently long odd cycle. Our main tools and results, which are of independent interest, establish that Cartesian powers enjoy Ramsey-type properties for graphs with favorable Turán-type characteristics, such as zero hypercube Turán density. In addition, we prove induced variants of these results, find bounds on $\chi_H(\mathbb{R}^n)$ for growing dimensions $n$, and prove a canonical-type result. We conclude with many open problems. One of these is to determine $\chi_{C_4}(\mathbb{R}^2)$, for a cycle $C_4$ on four vertices.
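
In compact form, the quantity studied is (a restatement of the definition above, adding nothing beyond the abstract):

\[
\chi_H(\mathbb{R}^n) \;=\; \min\bigl\{\, r \in \mathbb{N} \;:\; \exists\, c : \mathbb{R}^n \to \{1,\dots,r\} \text{ with no monochromatic unit-copy of } H \,\bigr\},
\]

so taking $H = K_2$ recovers the Hadwiger–Nelson chromatic number $\chi(\mathbb{R}^n)$.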
AI Insights
  • Chromatic number: The minimum number of colors needed to color a graph so that no two adjacent vertices have the same color. [3]
  • Hypercube Turán density: A measure of how many edges are present in a subgraph, relative to the maximum possible number of edges. [3]
  • The results have implications for various areas of mathematics, such as graph theory, number theory, and dynamical systems. [3]
  • The problem of determining the chromatic number of a graph in Euclidean space is a long-standing open question in mathematics. [2]
Why are we recommending this paper?
Due to your Interest in: Graphs for Products
Amazon
Abstract
Graph Convolutional Networks (GCNs) have become a standard approach for semi-supervised node classification, yet practitioners lack clear guidance on when GCNs provide meaningful improvements over simpler baselines. We present a diagnostic study using the Amazon Computers co-purchase data to understand when and why GCNs help. Through systematic experiments with simulated label scarcity, feature ablation, and per-class analysis, we find that GCN performance depends critically on the interaction between graph homophily and feature quality. GCNs provide the largest gains under extreme label scarcity, where they leverage neighborhood structure to compensate for limited supervision. Surprisingly, GCNs can match their original performance even when node features are replaced with random noise, suggesting that structure alone carries sufficient signal on highly homophilous graphs. However, GCNs hurt performance when homophily is low and features are already strong, as noisy neighbors corrupt good predictions. Our quadrant analysis reveals that GCNs help in three of four conditions and only hurt when low homophily meets strong features. These findings offer practical guidance for practitioners deciding whether to adopt graph-based methods.
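
The homophily measure driving the paper's quadrant analysis can be sketched as the standard edge-homophily ratio: the fraction of edges whose endpoints share a label. The paper's exact estimator may differ.

    def edge_homophily(edges: list[tuple[int, int]], labels: list[int]) -> float:
        """Fraction of edges connecting two nodes with the same label."""
        same = sum(labels[u] == labels[v] for u, v in edges)
        return same / len(edges)

    # Toy example: 3 of 4 edges connect same-label nodes -> homophily 0.75.
    print(edge_homophily([(0, 1), (1, 2), (2, 3), (0, 2)], [0, 0, 0, 1]))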
Why are we recommending this paper?
Due to your Interest in: Graphs for Products

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • Taxonomy of Products
  • MECE (Mutually Exclusive, Collectively Exhaustive)
You can edit or add more interests any time.