🎯 Top Personalized Recommendations
Graphcore Research
Why we think this paper is great for you:
This paper directly addresses how to improve the training and evaluation of systems that use structured knowledge bases. You will find its insights on graph retrieval and knowledge graph augmentation particularly useful for your work.
Abstract
Retrieval of information from graph-structured knowledge bases represents a
promising direction for improving the factuality of LLMs. While various
solutions have been proposed, a comparison of methods is difficult due to the
lack of challenging QA datasets with ground-truth targets for graph retrieval.
We present SynthKGQA, a framework for generating high-quality synthetic
Knowledge Graph Question Answering datasets from any Knowledge Graph, providing
the full set of ground-truth facts in the KG to reason over each question. We
show how, in addition to enabling more informative benchmarking of KG
retrievers, the data produced with SynthKGQA also allows us to train better
models. We apply SynthKGQA to Wikidata to generate GTSQA, a new dataset
designed to test zero-shot generalization abilities of KG retrievers with
respect to unseen graph structures and relation types, and benchmark popular
solutions for KG-augmented LLMs on it.
AI Summary
- The GTSQA dataset, generated by SynthKGQA from Wikidata, is specifically designed to test zero-shot generalization abilities of KG retrievers with respect to unseen graph structures and relation types, addressing a critical gap in existing benchmarks. [3]
- SOTA KG-RAG models struggle significantly on the GTSQA benchmark, particularly with questions requiring intersection of paths from multiple seed entities and generalizing to unseen graph isomorphism or relation types. [3]
- Training KG retrievers using ground-truth answer subgraphs as supervision signal, rather than approximated shortest paths, leads to substantial performance improvements, with EM Hits scores increasing by 5% to 20% and ground-truth triple precision by up to 141% for multi-hop questions. [3]
- Ground-Truth Answer Subgraph (G): The exact set of triples in the Knowledge Graph required to reason over a specific question, serving as the golden target for retrieval. [3]
- GTSQA: A challenging new synthetic KGQA dataset with 32,099 questions, grounded in Wikidata, generated by SynthKGQA, and designed to test zero-shot generalization abilities of KG-RAG models. [3]
- All-at-once subgraph retrievers generally outperform path-based retrievers and KG agents on GTSQA, primarily due to their higher recall of ground-truth triples, although they can struggle with unseen graph isomorphism types requiring complex projections. [2]
- The SynthKGQA framework enables the generation of high-quality, synthetic Knowledge Graph Question Answering (KGQA) datasets from any Knowledge Graph, providing explicit ground-truth answer subgraphs and SPARQL queries for robust benchmarking and training. [1]
- Shortest paths between seed and answer nodes are often poor approximations of the true ground-truth answer subgraphs, especially for multi-hop questions, due to 'shortcuts' and 'parallel paths' that do not capture the required reasoning. [1]
- KG agents and path-based methods exhibit widespread inefficiency in properly expanding and coordinating search from multiple seed entities, leading to low recall of ground-truth triples for multi-seed questions. [1]
- SynthKGQA: A framework for generating large synthetic KGQA datasets from any Knowledge Graph, providing high-quality, diverse questions with procedurally-verified ground-truth answer subgraphs and SPARQL queries. [1]
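The triple-level metrics quoted in this summary (EM Hits, ground-truth triple precision) reduce to set overlap between the retrieved triples and the golden answer subgraph. A minimal sketch of that scoring; the (head, relation, tail) layout and the function name are illustrative assumptions, not the paper's code:

```python
# Sketch: scoring a KG retriever against a ground-truth answer subgraph,
# as implied by the summary above. Triple layout and names are assumed.

def triple_precision_recall(retrieved, ground_truth):
    """Precision/recall of retrieved triples w.r.t. the golden subgraph."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    hits = retrieved & ground_truth
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Example: a 2-hop question whose golden subgraph holds two facts.
gt = [("Q1", "director", "Q2"), ("Q2", "spouse", "Q3")]
pred = [("Q1", "director", "Q2"), ("Q1", "cast member", "Q9")]
print(triple_precision_recall(pred, gt))  # (0.5, 0.5)
```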
Nanjing University of Nan
Why we think this paper is great for you:
This work on a foundation retriever for knowledge graph question answering, especially its focus on scalability for large and unseen graphs, aligns well with your interest in leveraging graph structures for information retrieval. It offers valuable approaches for handling extensive knowledge bases.
Abstract
Large language models (LLMs) excel at reasoning but struggle with
knowledge-intensive questions due to limited context and parametric knowledge.
Augmenting LLMs with knowledge-graph retrieval can help, but existing methods
that rely on finetuned LLMs or GNN retrievers are limited by dataset-specific
tuning and poor scalability on large or unseen graphs.
We propose the LLM-KGFR collaborative framework, where an LLM works with a
structured retriever, the Knowledge Graph Foundation Retriever (KGFR). KGFR
encodes relations using LLM-generated descriptions and initializes entities
based on their roles in the question, enabling zero-shot generalization to
unseen KGs. To handle large graphs efficiently, it employs Asymmetric
Progressive Propagation (APP), a stepwise expansion that selectively limits
high-degree nodes while retaining informative paths. Through node-, edge-, and
path-level interfaces, the LLM iteratively requests candidate answers,
supporting facts, and reasoning paths, forming a controllable reasoning loop.
Experiments demonstrate that LLM-KGFR achieves strong performance while
maintaining scalability and generalization, providing a practical solution for
KG-augmented reasoning.
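The APP idea as described, a stepwise expansion that caps fan-out at high-degree nodes, can be sketched as a degree-capped frontier expansion. The cap, hop count, and triple-list graph format below are all assumptions, not the paper's algorithm:

```python
# Degree-capped stepwise expansion in the spirit of APP as described
# above. A real retriever would rank neighbours rather than truncate.
from collections import defaultdict

def progressive_expand(edges, seeds, hops=2, degree_cap=50):
    """Collect edges reachable from `seeds` within `hops`, capping fan-out."""
    adj = defaultdict(list)
    for h, r, t in edges:
        adj[h].append((r, t))
    frontier, visited, kept = set(seeds), set(seeds), []
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            # High-degree nodes contribute at most `degree_cap` neighbours,
            # keeping the expanded subgraph small on hub-heavy graphs.
            for r, t in adj[node][:degree_cap]:
                kept.append((node, r, t))
                if t not in visited:
                    nxt.add(t)
        visited |= nxt
        frontier = nxt
    return kept

edges = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "d")]
print(progressive_expand(edges, seeds=["a"], hops=2, degree_cap=2))
```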
Amazon.com
Why we think this paper is great for you:
This paper's exploration of continual learning is highly relevant to your needs for systems that can adapt and discover new categories over time without forgetting previous knowledge. It provides methods for maintaining performance on evolving datasets.
Abstract
Vision-Language Models (VLMs) suffer from catastrophic forgetting when
sequentially fine-tuned on new tasks, degrading performance on previously
learned foundational and task-specific capabilities. While multi-task learning
can mitigate forgetting, it requires simultaneous access to all datasets and
imposes computational overhead that scales linearly with the number of tasks.
In this work, we introduce a routing-based approach that enables the
integration of new tasks while preserving the foundational knowledge acquired
during pretraining. We evaluate our method using InternVL-2 models (2B and 8B
parameters) and demonstrate that routing preserves the model's foundational
capabilities by maintaining performance on general-purpose benchmarks such as
ChartQA, MMBench, and DocVQA, while simultaneously improving accuracy on
specialized tasks. Importantly, our approach achieves this without requiring
concurrent access to data from all tasks, avoiding the significant
computational and data overhead associated with traditional multi-task
learning. We further conduct extensive ablation studies to evaluate the
scalability and robustness of routing-based learning, showing that the approach
is resilient to a growing number of tasks and performs particularly well when
new tasks are semantically related. Finally, we show that the routing mechanism
enables superior cross-modal transfer between language and vision capabilities,
allowing knowledge learned in one modality to enhance performance in another, a
capability not achieved by existing continual learning methods.
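As a rough illustration of routing-based task integration, the sketch below routes each input to one of several per-task adapters via nearest-prototype matching, so adding a task appends a prototype/adapter pair instead of overwriting shared weights. The shapes, the cosine router, and the adapter form are assumptions; the paper's InternVL-2 integration is not reproduced here:

```python
# Minimal sketch of routing-based task integration; not the paper's module.
import torch
import torch.nn.functional as F

class TaskRouter(torch.nn.Module):
    def __init__(self, dim, num_tasks):
        super().__init__()
        # One learned prototype per task; a new task appends a row + adapter.
        self.prototypes = torch.nn.Parameter(torch.randn(num_tasks, dim))
        self.adapters = torch.nn.ModuleList(
            [torch.nn.Linear(dim, dim) for _ in range(num_tasks)]
        )

    def forward(self, x):  # x: (batch, dim) pooled features
        sims = F.cosine_similarity(x.unsqueeze(1), self.prototypes, dim=-1)
        idx = sims.argmax(dim=-1)  # hard routing: one adapter per sample
        return torch.stack(
            [self.adapters[int(i)](xi) for i, xi in zip(idx, x)]
        )

router = TaskRouter(dim=16, num_tasks=3)
print(router(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```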
Universitat de les Illes Balears
Why we think this paper is great for you:
The methodology presented here for multinomial classification offers direct applicability to developing robust systems for organizing items into distinct groups. You might find its approach to identifying separable features useful for creating clear distinctions.
Abstract
High-throughput sequencing has transformed microbiome research, but it also
produces inherently compositional data that challenge standard statistical and
machine learning methods. In this work, we propose a multinomial classification
framework for compositional microbiome data based on penalized log-ratio
regression and pairwise separability screening. The method quantifies the
discriminative ability of each OTU through the area under the receiver
operating characteristic curve (AUC) for all pairwise log-ratios and
aggregates these values into a global separability index $S_k$, yielding
interpretable rankings of taxa together with confidence intervals. We
illustrate the approach by reanalyzing the Baxter colorectal adenoma dataset
and comparing our results with Greenacre's ordination-based analysis using
Correspondence Analysis and Canonical Correspondence Analysis. Our models
consistently recover a core subset of taxa previously identified as
discriminant, thereby corroborating Greenacre's main findings, while also
revealing additional OTUs that become important once demographic covariates are
taken into account. In particular, adjustment for age, gender, and diabetes
medication improves the precision of the separation index and highlights new,
potentially relevant taxa, suggesting that part of the original signal may have
been influenced by confounding. Overall, the integration of log-ratio modeling,
covariate adjustment, and uncertainty estimation provides a robust and
interpretable framework for OTU selection in compositional microbiome data. The
proposed method complements existing ordination-based approaches by adding a
probabilistic and inferential perspective, strengthening the identification of
biologically meaningful microbial signatures.
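A minimal sketch of the pairwise-separability screening described above: for a given OTU, score every pairwise log-ratio by AUC against the class labels and aggregate. The pseudo-count, the direction-free max(AUC, 1-AUC), and the plain mean as the index are assumptions; the paper's penalized regression and confidence intervals are not reproduced:

```python
# Sketch of pairwise log-ratio AUC screening for OTU k (binary labels
# for simplicity; the paper handles the multinomial case).
import numpy as np
from sklearn.metrics import roc_auc_score

def separability_index(X, y, k, pseudo=0.5):
    """S_k for OTU column k of a count matrix X (n_samples, n_otus)."""
    Z = np.log(X + pseudo)  # pseudo-count handles the zeros in count data
    aucs = []
    for j in range(X.shape[1]):
        if j == k:
            continue
        lr = Z[:, k] - Z[:, j]  # the pairwise log-ratio log(x_k / x_j)
        auc = roc_auc_score(y, lr)
        aucs.append(max(auc, 1.0 - auc))  # ignore ratio orientation
    return float(np.mean(aucs))
```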
Princeton Language and Intelligence
Why we think this paper is great for you:
This paper on knowledge distillation presents an efficient strategy for transferring knowledge from larger models to smaller ones. This technique could be very beneficial for managing and deploying complex knowledge structures effectively.
Abstract
Knowledge distillation is an efficient strategy to use data generated by
large "teacher" language models to train smaller capable "student" models, but
selecting the optimal teacher for a specific student-task combination requires
expensive trial-and-error. We propose a lightweight score called GRACE to
quantify how effective a teacher will be for post-training a student model.
GRACE measures distributional properties of the student's gradients without
access to a verifier, teacher logits, teacher internals, or test data. From an
information-theoretic perspective, GRACE connects to leave-one-out stability of
gradient-based algorithms, which controls the generalization performance of the
distilled students. On GSM8K and MATH, GRACE correlates strongly (up to 86%
Spearman correlation) with the performance of the distilled LLaMA and OLMo
students. In particular, training a student using the GRACE-selected teacher
can improve performance by up to 7.4% over naively using the
best-performing teacher. Further, GRACE can provide guidance on crucial design
choices in distillation, including (1) the best temperature to use when
generating from the teacher, (2) the best teacher to use given a size
constraint, and (3) the best teacher to use within a specific model family.
Altogether, our findings demonstrate that GRACE can efficiently and effectively
identify a strongly compatible teacher for a given student and provide
fine-grained guidance on how to perform distillation.
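The abstract does not give GRACE's formula, so the following is only an illustration of the kind of gradient statistic involved: the mean pairwise cosine similarity of per-example student gradients on teacher-generated data. Every detail is an assumption, not the actual score:

```python
# Illustration only: one plausible distributional property of student
# gradients on teacher-generated examples, not GRACE itself.
import torch

def gradient_agreement(model, loss_fn, examples):
    grads = []
    for x, y in examples:  # teacher-generated (input, target) pairs
        model.zero_grad()
        loss_fn(model(x), y).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g / (g.norm() + 1e-12))  # unit-normalize
    G = torch.stack(grads)       # (n_examples, n_params)
    sims = G @ G.T               # pairwise cosine similarities
    n = sims.shape[0]
    return (sims.sum() - n) / (n * (n - 1))  # mean off-diagonal entry
```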
Indian Institute of Technology
Why we think this paper is great for you:
This paper provides a fundamental theoretical link between regression and classification, which could deepen your understanding of the underlying principles behind various categorization tasks. It offers a fresh perspective on these core machine learning concepts.
Abstract
A formal link between regression and classification has remained tenuous. Even
though the margin maximization term $\|w\|$ is used in support vector
regression, it has at best been justified as a regularizer. We show that a
regression problem with $M$ samples lying on a hyperplane has a one-to-one
equivalence with a linearly separable classification task with $2M$ samples. We
show that margin maximization on the equivalent classification task leads to a
different regression formulation than traditionally used. Using the
equivalence, we demonstrate a "regressability" measure that can be used to
estimate the difficulty of regressing a dataset, without needing to first learn
a model for it. We use the equivalence to train neural networks to learn a
linearizing map that transforms input variables into a space where a linear
regressor is adequate.
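One standard way to realize such a 2M-point equivalence, offered here as a hedged guess since the abstract does not spell out the authors' mapping, is to shift each regression sample up and down by a margin epsilon and label the two copies oppositely:

```latex
% Hedged sketch of one 2M-point construction (not necessarily the
% authors' mapping): shift each sample by \pm\epsilon and label the
% two copies oppositely.
\[
  \{(x_i, y_i)\}_{i=1}^{M}
  \;\longmapsto\;
  \bigl\{\bigl((x_i,\, y_i + \epsilon),\, +1\bigr),\;
         \bigl((x_i,\, y_i - \epsilon),\, -1\bigr)\bigr\}_{i=1}^{M}.
\]
% If all M samples lie on the hyperplane y = w^\top x + b, then
% sign(y - w^\top x - b) separates the 2M points with margin \epsilon,
% so exact regressability corresponds to linear separability.
```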
University of Washington
Why we think this paper is great for you:
This paper discusses frameworks for evaluating and comparing machine learning algorithms, which is crucial for assessing the performance of your own systems. It can help you establish robust benchmarks for your research.
Abstract
Machine learning (ML) and artificial intelligence (AI) algorithms are
transforming and empowering the characterization and control of dynamic systems
in the engineering, physical, and biological sciences. These emerging modeling
paradigms require comparative metrics to evaluate a diverse set of scientific
objectives, including forecasting, state reconstruction, generalization, and
control, while also considering limited data scenarios and noisy measurements.
We introduce a common task framework (CTF) for science and engineering, which
features a growing collection of challenge data sets with a diverse set of
practical and common objectives. The CTF is a critically enabling technology
that has contributed to the rapid advance of ML/AI algorithms in traditional
applications such as speech recognition, language processing, and computer
vision. There is a critical need for the objective metrics of a CTF to compare
the diverse algorithms being rapidly developed and deployed in practice today
across science and engineering.
Graphs for Products
Universidade Federal do C
Abstract
Given positive integers $k$ and $\ell$, we write $G \rightarrow (K_k,K_\ell)$
if every 2-colouring of the edges of $G$ yields a red copy of $K_k$ or a blue
copy of $K_\ell$, and we denote by $R(k)$ the minimum $n$ such that
$K_n\rightarrow (K_k,K_k)$. By using probabilistic methods and hypergraph
containers we prove that for every integer $k \geq 3$, there exists a graph $G$
such that $G \nrightarrow (K_k,K_k)$ and $G \rightarrow (K_{R(k)-1},K_{k-1})$.
This result can be viewed as a variation of a classical theorem of
Ne\v{s}et\v{r}il and R\"odl [The Ramsey property for graphs with forbidden
complete subgraphs, Journal of Combinatorial Theory, Series B, 20 (1976),
243-249], who proved that for every integer $k\geq 2$ there exists a graph $G$
with no copies of $K_k$ such that $G\rightarrow(K_{k-1}, K_{k-1})$.
NOVA Math
Abstract
The aim of this paper is to examine how commuting graphs interact with two
semigroup constructions: the zero-union and the direct product. For both
semigroup constructions, we investigate the diameter, clique number, girth,
chromatic number and knit degree of their commuting graphs and, when possible,
we exhibit the relationship between each one of these properties and the
corresponding properties of the commuting graphs of the original semigroups.