Hi!

Your personalized paper recommendations for 10 to 14 November 2025.
🎯 Top Personalized Recommendations
University of Bologna
Why we think this paper is great for you:
This paper directly addresses the creation of structured knowledge from unstructured text using Knowledge Graphs and ontological engineering, which is highly relevant to your interest in organizing information. It provides a systematic methodology for building these crucial structures.
Abstract
Cultural Heritage texts contain rich knowledge that is difficult to query systematically due to the challenges of converting unstructured discourse into structured Knowledge Graphs (KGs). This paper introduces ATR4CH (Adaptive Text-to-RDF for Cultural Heritage), a systematic five-step methodology for Large Language Model-based Knowledge Extraction from Cultural Heritage documents. We validate the methodology through a case study on authenticity assessment debates.
Methodology - ATR4CH combines annotation models, ontological frameworks, and LLM-based extraction through iterative development: foundational analysis, annotation schema development, pipeline architecture, integration refinement, and comprehensive evaluation. We demonstrate the approach using Wikipedia articles about disputed items (documents, artifacts...), implementing a sequential pipeline with three LLMs (Claude Sonnet 3.7, Llama 3.3 70B, GPT-4o-mini).
Findings - The methodology successfully extracts complex Cultural Heritage knowledge: 0.96-0.99 F1 for metadata extraction, 0.7-0.8 F1 for entity recognition, 0.65-0.75 F1 for hypothesis extraction, 0.95-0.97 for evidence extraction, and 0.62 G-EVAL for discourse representation. Smaller models performed competitively, enabling cost-effective deployment.
Originality - This is the first systematic methodology for coordinating LLM-based extraction with Cultural Heritage ontologies. ATR4CH provides a replicable framework adaptable across CH domains and institutional resources.
Research Limitations - The produced KG is limited to Wikipedia articles. While the results are encouraging, human oversight is necessary during post-processing.
Practical Implications - ATR4CH enables Cultural Heritage institutions to systematically convert textual knowledge into queryable KGs, supporting automated metadata enrichment and knowledge discovery.
AI Summary
  • The methodology successfully extracts complex CH knowledge with high F1-scores: 0.96-0.99 for metadata, 0.7-0.8 for entity recognition, 0.65-0.75 for hypothesis extraction, and 0.95-0.97 for evidence extraction, achieving 0.62 G-EVAL for discourse representation. [3]
  • Smaller Large Language Models (LLMs) demonstrated competitive performance compared to larger architectures within the ATR4CH pipeline, indicating potential for cost-effective deployment in resource-constrained CH institutions. [3]
  • This methodology enables Cultural Heritage institutions to systematically convert unstructured textual knowledge into queryable Knowledge Graphs (KGs), thereby supporting automated metadata enrichment and advanced knowledge discovery. [3]
  • ATR4CH tackles the practical challenge of extracting complex scholarly information, which traditionally requires enormous manual labor and combines deep humanities scholarship with technical knowledge representation skills. [3]
  • Core Ontological Patterns (COPs): Essential Knowledge Graph patterns, identified during foundational analysis, that represent the central ontological nodes and relationships both present in the corpus as extractable information and necessary for addressing research Competency Questions. [3]
  • ATR4CH is a systematic five-step methodology for LLM-based Knowledge Extraction from Cultural Heritage (CH) documents, providing a replicable framework adaptable across various CH domains and institutional resources. [2]
  • ATR4CH is the first systematic methodology to coordinate LLM-based extraction with Cultural Heritage ontologies, specifically validated through authenticity assessment debates. [2]
  • The approach directly addresses the fundamental misalignment between rich, nuanced textual scholarly discourse and the often sparse, categorical assertions found in existing structured CH knowledge bases. [2]
  • ATR4CH (Adaptive Text-to-RDF for Cultural Heritage): A systematic five-step methodology for Large Language Model-based Knowledge Extraction from Cultural Heritage documents, combining annotation models, ontological frameworks, and LLM-based extraction through iterative development. [2]
  • Minimal Working Annotation (MWA): An iteratively developed annotation schema that captures essential knowledge structures from the pilot corpus, prioritizing simplicity and feasibility while ensuring adequate coverage of the COPs for both manual annotation and automated extraction. [2]
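For a feel of what such a pipeline looks like in practice, here is a minimal Python sketch of a sequential LLM-based extraction pipeline in the spirit of the abstract above. The stage names, prompts, and the call_llm() helper are illustrative assumptions, not the authors' ATR4CH implementation; a real deployment would plug in an actual LLM client and the paper's annotation schema, and keep human oversight during post-processing.

    # Minimal sketch of a sequential text-to-RDF extraction pipeline in the spirit of
    # ATR4CH. All prompts, stage names, and the call_llm() helper are hypothetical;
    # swap in a real LLM client (Claude, Llama, GPT-4o-mini, ...) behind call_llm().
    import json

    def call_llm(prompt: str) -> str:
        """Placeholder for an LLM call; returns empty JSON so the sketch runs offline."""
        return "{}"

    STAGES = ["metadata", "entities", "hypotheses", "evidence"]

    def extract(text: str) -> dict:
        """Run the extraction stages sequentially, feeding earlier results forward."""
        results = {}
        for stage in STAGES:
            prompt = (
                f"Extract {stage} from the following Cultural Heritage text as JSON.\n"
                f"Previously extracted: {json.dumps(results)}\n\nText:\n{text}"
            )
            results[stage] = json.loads(call_llm(prompt))
        return results

    def to_triples(results: dict, item_uri: str) -> list[tuple[str, str, str]]:
        """Flatten the extraction output into naive (subject, predicate, object) triples."""
        triples = []
        for stage, payload in results.items():
            for key, value in (payload or {}).items():
                triples.append((item_uri, f"ex:{stage}/{key}", str(value)))
        return triples

    if __name__ == "__main__":
        out = extract("The Shroud of X has been the subject of authenticity debates...")
        print(to_triples(out, "ex:item/shroud-of-x"))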
Paderborn University
Why we think this paper is great for you:
This framework offers practical tools for creating and managing OWL ontologies, directly supporting your work in defining product taxonomies and knowledge structures. It provides a hands-on approach to ontology engineering.
Abstract
In this paper, we introduce OWLAPY, a comprehensive Python framework for OWL ontology engineering. OWLAPY streamlines the creation, modification, and serialization of OWL 2 ontologies. It uniquely integrates native Python-based reasoners with support for external Java reasoners, offering flexibility for users. OWLAPY facilitates multiple implementations of core ontology components and provides robust conversion capabilities between OWL class expressions and formats such as Description Logics, Manchester Syntax, and SPARQL. It also allows users to define custom workflows to leverage large language models (LLMs) in ontology generation from natural language text. OWLAPY serves as a well-tested software framework for users seeking a flexible Python library for advanced ontology engineering, including those transitioning from Java-based environments. The project is publicly available on GitHub at https://github.com/dice-group/owlapy and on the Python Package Index (PyPI) at https://pypi.org/project/owlapy/, with over 50,000 downloads at the time of writing.
Hanyang University
Why we think this paper is great for you:
This paper explores advanced reasoning over Knowledge Graphs, enhancing their utility for complex, knowledge-intensive tasks. It aligns well with your focus on leveraging structured knowledge for robust information management.
Abstract
Large Language Models (LLMs) demonstrate strong reasoning capabilities but struggle with hallucinations and limited transparency. Recently, KG-enhanced LLMs that integrate knowledge graphs (KGs) have been shown to improve reasoning performance, particularly for complex, knowledge-intensive tasks. However, these methods still face significant challenges, including inaccurate retrieval and reasoning failures, often exacerbated by long input contexts that obscure relevant information or by context constructions that struggle to capture the richer logical directions required by different question types. Furthermore, many of these approaches rely on LLMs to directly retrieve evidence from KGs, and to self-assess the sufficiency of this evidence, which often results in premature or incorrect reasoning. To address the retrieval and reasoning failures, we propose ProgRAG, a multi-hop knowledge graph question answering (KGQA) framework that decomposes complex questions into sub-questions, and progressively extends partial reasoning paths by answering each sub-question. At each step, external retrievers gather candidate evidence, which is then refined through uncertainty-aware pruning by the LLM. Finally, the context for LLM reasoning is optimized by organizing and rearranging the partial reasoning paths obtained from the sub-question answers. Experiments on three well-known datasets demonstrate that ProgRAG outperforms existing baselines in multi-hop KGQA, offering improved reliability and reasoning quality.
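As a rough illustration of the progressive loop described above, the following Python sketch decomposes a question into sub-questions, retrieves candidate triples per sub-question, and prunes them by an LLM-estimated confidence before final answering. Every helper here is a stub standing in for the paper's retrievers and LLM calls; none of this is the ProgRAG code.

    # Illustrative sketch of a progressive multi-hop KGQA loop in the spirit of ProgRAG.
    # decompose(), retrieve_candidates(), llm_confidence(), and answer() are hypothetical
    # stand-ins, not the paper's implementation.

    def decompose(question: str) -> list[str]:
        return [question]                      # stub: a real system would emit sub-questions

    def retrieve_candidates(sub_q: str, path: list[str]) -> list[str]:
        return ["(entity, relation, entity)"]  # stub: external KG retrievers go here

    def llm_confidence(sub_q: str, triple: str) -> float:
        return 1.0                             # stub: LLM-estimated confidence in the triple

    def answer(question: str, path: list[str]) -> str:
        return "unknown"                       # stub: final LLM reasoning over the path

    def progressive_kgqa(question: str, keep_threshold: float = 0.5) -> str:
        path: list[str] = []
        for sub_q in decompose(question):
            candidates = retrieve_candidates(sub_q, path)
            # Uncertainty-aware pruning: keep only triples the LLM is confident about.
            kept = [t for t in candidates if llm_confidence(sub_q, t) >= keep_threshold]
            path.extend(kept)
        # Reorganize the partial reasoning paths into a compact context before answering.
        return answer(question, sorted(set(path)))

    print(progressive_kgqa("Which university did the spouse of the director of X attend?"))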
Why we think this paper is great for you:
This work on efficient continual learning for knowledge retention is very pertinent to your interest in adapting and evolving category discovery over time. It addresses how to maintain and update knowledge effectively in dynamic environments.
Abstract
Rehearsal-based Continual Learning (CL) maintains a limited memory buffer to store replay samples for knowledge retention, making these approaches heavily reliant on the quality of the stored samples. Current Rehearsal-based CL methods typically construct the memory buffer by selecting a representative subset (referred to as coresets), aiming to approximate the training efficacy of the full dataset with minimal storage overhead. However, mainstream Coreset Selection (CS) methods generally formulate the CS problem as a bi-level optimization problem that relies on numerous inner and outer iterations to solve, leading to substantial computational cost and thus limiting their practical efficiency. In this paper, we aim to provide a more efficient selection logic and scheme for coreset construction. To this end, we first analyze the Mean Squared Error (MSE) between the buffer-trained model and the Bayes-optimal model through the perspective of localized error decomposition to investigate the contribution of samples from different regions to MSE suppression. Further theoretical and experimental analyses demonstrate that samples with high probability density play a dominant role in error suppression. Inspired by this, we propose the Probability Density-Aware Coreset (PDAC) method. PDAC leverages the Projected Gaussian Mixture (PGM) model to estimate each sample's joint density, enabling efficient density-prioritized buffer selection. Finally, we introduce the streaming Expectation Maximization (EM) algorithm to enhance the adaptability of PGM parameters to streaming data, yielding Streaming PDAC (SPDAC) for streaming scenarios. Extensive comparative experiments show that our methods outperform other baselines across various CL settings while ensuring favorable efficiency.
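Below is a minimal sketch of density-prioritized buffer selection, using scikit-learn's PCA and GaussianMixture as stand-ins for the paper's Projected Gaussian Mixture and streaming EM; the feature matrix, projection dimension, and buffer size are arbitrary illustrative choices, not the authors' settings.

    # Density-prioritized replay-buffer selection, sketched with scikit-learn's
    # GaussianMixture in place of the paper's Projected Gaussian Mixture and
    # streaming EM. Feature dimensions and buffer size are arbitrary choices.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.mixture import GaussianMixture

    def select_coreset(features: np.ndarray, buffer_size: int, n_components: int = 5,
                       proj_dim: int = 16) -> np.ndarray:
        """Return indices of the buffer_size samples with the highest estimated density."""
        projected = PCA(n_components=min(proj_dim, features.shape[1])).fit_transform(features)
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              random_state=0).fit(projected)
        log_density = gmm.score_samples(projected)          # per-sample log p(x)
        return np.argsort(log_density)[::-1][:buffer_size]  # highest-density samples first

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        feats = rng.normal(size=(1000, 64))                 # stand-in for task features
        buffer_idx = select_coreset(feats, buffer_size=100)
        print(buffer_idx[:10])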
Why we think this paper is great for you:
Focusing on continual learning for adapting to changing patterns, this paper offers insights into dynamic categorization and discovery processes. It's highly relevant to your need for systems that can evolve their understanding of categories.
Abstract
Next point-of-interest (POI) recommendation improves personalized location-based services by predicting users' next destinations based on their historical check-ins. However, most existing methods rely on static datasets and fixed models, limiting their ability to adapt to changes in user behavior over time. To address this limitation, we explore a novel task termed continual next POI recommendation, where models dynamically adapt to evolving user interests through continual updates. This task is particularly challenging, as it requires capturing shifting user behaviors while retaining previously learned knowledge. Moreover, it is essential to ensure efficiency in update time and memory usage for real-world deployment. To this end, we propose GIRAM (Generative Key-based Interest Retrieval and Adaptive Modeling), an efficient, model-agnostic framework that integrates context-aware sustained interests with recent interests. GIRAM comprises four components: (1) an interest memory to preserve historical preferences; (2) a context-aware key encoding module for unified interest key representation; (3) a generative key-based retrieval module to identify diverse and relevant sustained interests; and (4) an adaptive interest update and fusion module to update the interest memory and balance sustained and recent interests. In particular, GIRAM can be seamlessly integrated with existing next POI recommendation models. Experiments on three real-world datasets demonstrate that GIRAM consistently outperforms state-of-the-art methods while maintaining high efficiency in both update time and memory consumption.
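The toy sketch below mimics the memory-plus-retrieval idea at a very high level: an interest memory stores key/value vectors, retrieval averages the most similar stored interests, and a fixed weight fuses them with a recent-interest vector. The key encoder, similarity measure, and fusion rule are simplifications assumed for illustration, not GIRAM's actual modules.

    # Toy interest memory with key-based retrieval and sustained/recent fusion,
    # loosely following the components described in the GIRAM abstract.
    import numpy as np

    class InterestMemory:
        def __init__(self, alpha: float = 0.5):
            self.keys: list[np.ndarray] = []    # context-aware interest keys
            self.values: list[np.ndarray] = []  # stored interest representations
            self.alpha = alpha                  # balance between sustained and recent interests

        def write(self, key: np.ndarray, value: np.ndarray) -> None:
            self.keys.append(key)
            self.values.append(value)

        def retrieve(self, query_key: np.ndarray, top_k: int = 3) -> np.ndarray:
            """Return the mean of the top_k most similar stored interests (sustained interest)."""
            sims = np.array([float(query_key @ k) /
                             (np.linalg.norm(query_key) * np.linalg.norm(k) + 1e-8)
                             for k in self.keys])
            top = np.argsort(sims)[::-1][:top_k]
            return np.mean([self.values[i] for i in top], axis=0)

        def fuse(self, query_key: np.ndarray, recent: np.ndarray) -> np.ndarray:
            """Blend retrieved sustained interest with the recent-interest vector."""
            sustained = self.retrieve(query_key)
            return self.alpha * sustained + (1 - self.alpha) * recent

    rng = np.random.default_rng(0)
    mem = InterestMemory()
    for _ in range(10):
        mem.write(rng.normal(size=8), rng.normal(size=8))
    print(mem.fuse(rng.normal(size=8), rng.normal(size=8)))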
SJTU
Why we think this paper is great for you:
This paper delves into optimizing performance for knowledge-intensive tasks, which is crucial for effective knowledge management systems. It provides strategies for improving how models interact with and utilize knowledge.
Abstract
While prompt optimization has emerged as a critical technique for enhancing language model performance, existing approaches primarily focus on elicitation-based strategies that search for optimal prompts to activate models' capabilities. These methods exhibit fundamental limitations when addressing knowledge-intensive tasks, as they operate within fixed parametric boundaries rather than providing the factual knowledge, terminology precision, and reasoning patterns required in specialized domains. To address these limitations, we propose Knowledge-Provision-based Prompt Optimization (KPPO), a framework that reformulates prompt optimization as systematic knowledge integration rather than potential elicitation. KPPO introduces three key innovations: 1) a knowledge gap filling mechanism for knowledge gap identification and targeted remediation; 2) a batch-wise candidate evaluation approach that considers both performance improvement and distributional stability; 3) an adaptive knowledge pruning strategy that balances performance and token efficiency, reducing token usage by up to 29%. Extensive evaluation on 15 knowledge-intensive benchmarks from various domains demonstrates KPPO's superiority over elicitation-based methods, with an average performance improvement of ~6% over the strongest baseline while achieving comparable or lower token consumption. Code at: https://github.com/xyz9911/KPPO.
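The schematic Python loop below illustrates the knowledge-provision idea as we read it from the abstract: collect failures on a batch, propose knowledge snippets, keep a snippet only if it improves batch accuracy, and prune snippets whose removal costs nothing. evaluate(), propose_knowledge(), and the accounting are placeholder stubs rather than the code released in the linked repository.

    # Schematic knowledge-provision loop in the spirit of KPPO. evaluate() and
    # propose_knowledge() are stubbed placeholders, not the released implementation.

    def evaluate(prompt: str, batch: list[dict]) -> float:
        return 0.0  # stub: fraction of batch items answered correctly with this prompt

    def propose_knowledge(failures: list[dict]) -> list[str]:
        return ["Domain fact distilled from the failed examples."]  # stub LLM proposal

    def assemble(base: str, knowledge: list[str]) -> str:
        return "\n".join([base] + knowledge)

    def optimize_prompt(base_prompt: str, batch: list[dict], rounds: int = 3) -> str:
        knowledge: list[str] = []
        for _ in range(rounds):
            prompt = assemble(base_prompt, knowledge)
            failures = [ex for ex in batch if evaluate(prompt, [ex]) == 0.0]
            if not failures:
                break
            # Batch-wise candidate evaluation: add the snippet with the largest gain, if any.
            gains = [(evaluate(assemble(base_prompt, knowledge + [s]), batch)
                      - evaluate(prompt, batch), s) for s in propose_knowledge(failures)]
            best_gain, best_snippet = max(gains)
            if best_gain > 0:
                knowledge.append(best_snippet)
            # Adaptive pruning: drop snippets whose removal does not reduce batch accuracy.
            knowledge = [s for s in knowledge
                         if evaluate(assemble(base_prompt, [k for k in knowledge if k != s]), batch)
                         < evaluate(assemble(base_prompt, knowledge), batch)]
        return assemble(base_prompt, knowledge)

    print(optimize_prompt("Answer the question.", [{"q": "...", "a": "..."}]))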
Aalto University
Why we think this paper is great for you:
This paper explores the mathematical properties of graph products and their structural information, which could be foundational for understanding complex graph-based representations. It offers a deeper look into the underlying structures of graphs.
Abstract
Topological descriptors have been increasingly utilized for capturing multiscale structural information in relational data. In this work, we consider various filtrations on the (box) product of graphs and their effect on the outputs of the topological descriptors - the Euler characteristic (EC) and persistent homology (PH). In particular, we establish a complete characterization of the expressive power of EC on general color-based filtrations. We also show that the PH descriptors of (virtual) graph products contain strictly more information than the computation on individual graphs, whereas EC does not. Additionally, we provide algorithms to compute the PH diagrams of the product of vertex- and edge-level filtrations on the graph product. We also substantiate our theoretical analysis with empirical investigations on runtime analysis, expressivity, and graph classification performance. Overall, this work paves the way for powerful graph persistent descriptors via product filtrations. Code is available at https://github.com/Aalto-QuML/tda_graph_product.
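As a small, concrete example of the objects involved, the sketch below builds the box (Cartesian) product of two graphs with networkx and computes the Euler characteristic curve of a sublevel vertex filtration (the EC of a graph viewed as a 1-complex is |V| - |E|). The degree-based filter is an arbitrary illustrative choice; the paper's color-based filtrations and persistent homology algorithms are not reproduced here.

    # Euler characteristic curve of a vertex filtration on the Cartesian (box)
    # product of two graphs, using networkx.
    import networkx as nx

    def euler_characteristic_curve(G: nx.Graph, f: dict, thresholds: list) -> list[int]:
        """EC = |V| - |E| of the sublevel subgraph induced by {v : f(v) <= t}."""
        curve = []
        for t in thresholds:
            sub = G.subgraph([v for v in G if f[v] <= t])
            curve.append(sub.number_of_nodes() - sub.number_of_edges())
        return curve

    G, H = nx.cycle_graph(4), nx.path_graph(3)
    P = nx.cartesian_product(G, H)             # box product: nodes are pairs (g, h)
    f = dict(P.degree())                       # example vertex filter: degree in the product
    print(euler_characteristic_curve(P, f, thresholds=sorted(set(f.values()))))

For this 4-cycle times 3-path product the curve ends at |V| - |E| = 12 - 20 = -8 once every vertex has entered the filtration.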
Taxonomy of Products
UNAM
Abstract
We introduce a notion similar to the AB4 (resp. AB4*) condition for abelian categories but in the context of extriangulated categories. We will refer to this notion as AET4 (resp. AET4*). One of our main results shows equivalent statements for AET4 (resp. AET4*), which generalize statements commonly used in homological constructions in abelian categories. As an application, we will give conditions for a recollement $(\mathcal{A},\mathcal{B},\mathcal{C})$ of extriangulated categories with $\mathcal{B}$ AET4 (resp. AET4*) to imply that the categories $\mathcal{A}$ and $\mathcal{C}$ are AET4 (resp. AET4*); and we will show a relation between the $n$-smashing (resp. $n$-co-smashing) condition for a $t$-structure and the AET4 (resp. AET4*) condition of the extended hearts of the $t$-structure. An appendix is also included, in which we study in detail the properties of adjoint pairs between extriangulated categories that are necessary for the development of the paper, including some special properties of higher extension groups.
Product Categorization
University of Warwick
Abstract
Given i.i.d. samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. When $P$ belongs to the class of product distributions on the Boolean hypercube $\{0,1\}^d$, it is known that $\Omega(d/\varepsilon^2)$ samples are necessary to learn $P$ within total variation (TV) distance $\varepsilon$. We revisit this problem when the learner is also given as advice the parameters of a product distribution $Q$. We show that there is an efficient algorithm to learn $P$ within TV distance $\varepsilon$ that has sample complexity $\tilde{O}(d^{1-\eta}/\varepsilon^2)$, if $\|\mathbf{p} - \mathbf{q}\|_1 < \varepsilon d^{0.5 - \Omega(\eta)}$. Here, $\mathbf{p}$ and $\mathbf{q}$ are the mean vectors of $P$ and $Q$ respectively, and no bound on $\|\mathbf{p} - \mathbf{q}\|_1$ is known to the algorithm a priori.
Graphs for Products
IonQ Inc
Abstract
Hypergraph product (HGP) codes are one of the most popular families of quantum low-density parity-check (LDPC) codes. Circuit-level simulations show that they can achieve the same logical error rate as surface codes with a reduced qubit overhead. They have been extensively optimized by importing classical techniques such as progressive edge growth, or through random search, simulated annealing, or reinforcement learning techniques. In this work, instead of machine learning (ML) algorithms that improve the code performance through local transformations, we impose additional global symmetries that are hard to discover through ML, and we perform an exhaustive search. Precisely, we focus on the hypergraph product of two cyclic codes, which we call CxC codes, and we study C2 codes, which are the product of a cyclic code with itself, and CxR codes, which are the product of a cyclic code with a repetition code. We discover C2 codes and CxR codes that significantly outperform previously optimized HGP codes, achieving better parameters and a logical error rate per logical qubit that is up to three orders of magnitude better. Moreover, some C2 codes achieve simultaneously a lower logical error rate and a smaller qubit overhead than state-of-the-art LDPC codes such as the bivariate bicycle codes, at the price of a larger block length. Finally, leveraging the cyclic symmetry imposed on the codes, we design an efficient planar layout for the QCCD architecture, allowing for a trapped-ion implementation of the syndrome extraction circuit in constant depth.
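For context, the sketch below spells out the standard hypergraph product construction from two classical parity-check matrices (a circulant check of a cyclic code and a repetition-code check) and verifies the CSS commutation condition. The specific check matrices are arbitrary examples; the optimized CxC/CxR codes and the trapped-ion layout from the paper are not reproduced.

    # Generic hypergraph product (HGP) construction from two classical parity-check
    # matrices: HX = [H1 x I | I x H2^T], HZ = [I x H2 | H1^T x I] over GF(2).
    import numpy as np

    def circulant(first_row: list[int]) -> np.ndarray:
        """Circulant parity-check matrix of a cyclic code from its first row."""
        n = len(first_row)
        return np.array([np.roll(first_row, i) for i in range(n)], dtype=int) % 2

    def repetition_check(n: int) -> np.ndarray:
        """(n-1) x n parity-check matrix of the length-n repetition code."""
        H = np.zeros((n - 1, n), dtype=int)
        for i in range(n - 1):
            H[i, i] = H[i, i + 1] = 1
        return H

    def hypergraph_product(H1: np.ndarray, H2: np.ndarray):
        r1, n1 = H1.shape
        r2, n2 = H2.shape
        HX = np.hstack([np.kron(H1, np.eye(n2, dtype=int)),
                        np.kron(np.eye(r1, dtype=int), H2.T)]) % 2
        HZ = np.hstack([np.kron(np.eye(n1, dtype=int), H2),
                        np.kron(H1.T, np.eye(r2, dtype=int))]) % 2
        assert not ((HX @ HZ.T) % 2).any()   # CSS commutation condition
        return HX, HZ

    H1 = circulant([1, 1, 0, 1, 0])          # example cyclic-code check (not an optimized one)
    H2 = repetition_check(5)
    HX, HZ = hypergraph_product(H1, H2)
    print(HX.shape, HZ.shape)                # (X checks, qubits) and (Z checks, qubits)

With these example inputs the product has n1*n2 + r1*r2 = 25 + 20 = 45 physical qubits, with 25 X-type and 20 Z-type checks.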

Interests not found

We did not find any papers that match the interests below. Try other terms, and consider whether the content exists on arxiv.org.
  • MECE (Mutually Exclusive, Collectively Exhaustive).
You can edit or add more interests any time.