Abstract
The database community lacks a unified relational query language for subset
selection and optimisation queries, limiting both how users express such
problems and how query optimisers reason about them. Decades of research
(latterly under
the rubric of prescriptive analytics) have produced powerful evaluation
algorithms, but with incompatible, ad-hoc SQL extensions that each specify and
filter candidate solutions through distinct mechanisms. We present the first
unified algebraic foundation
for these queries, introducing relational exponentiation to complete the
fundamental algebraic operations alongside union (addition) and cross product
(multiplication). First, we extend relational algebra to complete domain
relations (relations defined by characteristic functions rather than explicit
extensions), achieving the expressiveness needed for NP-complete and NP-hard
problems while simultaneously ensuring query safety for finite inputs.
Second, we introduce solution sets, a higher-order relational algebra over
sets of relations that naturally expresses search spaces as functions
f: Base → Decision, yielding |Decision|^|Base| candidate relations. Third, we
provide structure-preserving
translation semantics from solution sets to standard relational algebra,
enabling mechanical translation to existing evaluation algorithms. This
framework achieves the expressiveness of the most powerful prior approaches
while providing the theoretical clarity and compositional properties absent in
previous work. We demonstrate the capabilities these algebras open up through a
polymorphic SQL where standard clauses seamlessly express data management,
subset selection, and optimisation queries within a single paradigm.
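As a rough illustration (our sketch, not the paper's notation or algorithms),
the following Python snippet models a complete domain relation by its
characteristic function and enumerates a solution set as the
|Decision|^|Base| functions from a base relation to a decision domain; all
names (even, base, decision, budget) are hypothetical.

```python
from itertools import product

# Complete domain relation: defined by a characteristic function over the
# whole (possibly infinite) domain rather than by an explicit extension.
# Query safety comes from evaluating it only against a finite input.
def even(x):
    return x % 2 == 0

finite_input = [1, 2, 3, 4, 5]
print([x for x in finite_input if even(x)])      # [2, 4]

# Solution set: the space of all functions f: Base -> Decision, i.e.
# |Decision|^|Base| candidate relations (2^3 = 8 here).
base = [("a", 3), ("b", 5), ("c", 4)]            # (key, weight) pairs
decision = [0, 1]                                # exclude / include

candidates = [
    dict(zip((k for k, _ in base), assignment))
    for assignment in product(decision, repeat=len(base))
]

# A subset-selection query over the solution set: maximise total weight
# subject to a budget, the kind of optimisation query the abstract targets.
budget = 8
def total(c):
    return sum(w for k, w in base if c[k] == 1)

feasible = [c for c in candidates if total(c) <= budget]
print(max(feasible, key=total))                  # {'a': 1, 'b': 1, 'c': 0}
```

The exhaustive enumeration is exponential by construction; it is shown only to
make the shape of the search space concrete, not as an evaluation strategy.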
Abstract
In recent years, there has been significant progress in the development of
deep learning models over relational databases, including architectures based
on heterogeneous graph neural networks (hetero-GNNs) and heterogeneous graph
transformers. In effect, such architectures state how the database records and
links (e.g., foreign-key references) translate into a large, complex numerical
expression, involving numerous learnable parameters. This complexity makes it
hard to explain, in human-understandable terms, how a model uses the available
data to arrive at a given prediction. We present a novel framework for
explaining machine-learning models over relational databases, where
explanations are view definitions that highlight focused parts of the database
that contribute most to the model's prediction. We establish such global
abductive explanations by adapting the classic notion of determinacy by Nash,
Segoufin, and Vianu (2010). In addition to tuning the tradeoff between
determinacy and conciseness, the framework allows controlling the level of
granularity by adopting different fragments of view definitions, such as ones
highlighting whole columns, foreign keys between tables, relevant groups of
tuples, and so on. We investigate the realization of the framework in the case
of hetero-GNNs. We develop heuristic algorithms that avoid the exhaustive
search over the space of all databases. We propose techniques that are
model-agnostic, and others that are tailored to hetero-GNNs via the notion of
learnable masking. Our approach is evaluated through an extensive empirical
study on the RelBench collection, covering a variety of domains and different
record-level tasks. The results demonstrate the usefulness of the proposed
explanations, as well as the efficiency of their generation.
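To convey the flavour of a column-granularity explanation, here is a hedged,
model-agnostic Python sketch (not the paper's algorithm, which also employs
learnable masking tailored to hetero-GNNs): greedily grow a set of kept
columns until the masked database approximately reproduces the model's
prediction. Both predict and mask_columns are hypothetical stand-ins for the
trained model and the view-application step.

```python
import numpy as np

def predict(db):
    # Hypothetical model over a dict of column arrays; a real model would be
    # a trained hetero-GNN or similar.
    return float(sum(a.sum() for a in db.values()))

def mask_columns(db, keep):
    # Zero out every column outside the kept "view".
    return {c: (a if c in keep else np.zeros_like(a)) for c, a in db.items()}

def greedy_explanation(db, tol=1e-3):
    target = predict(db)
    keep: set[str] = set()
    remaining = set(db)
    while remaining:
        # Add the column whose inclusion brings the masked prediction
        # closest to the original one.
        best = min(
            remaining,
            key=lambda c: abs(predict(mask_columns(db, keep | {c})) - target),
        )
        keep.add(best)
        remaining.remove(best)
        if abs(predict(mask_columns(db, keep)) - target) <= tol:
            break
    # A concise column-level view that (nearly) determines the prediction.
    return keep

db = {
    "age": np.array([30.0, 40.0]),
    "salary": np.array([1.0, 2.0]),
    "zip": np.array([0.0, 0.0]),
}
print(greedy_explanation(db))   # e.g. {'age', 'salary'}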
AI Insights
- GNNExplainer (Ying et al. 2019) is cited as a seminal method for extracting subgraph importance in GNNs.
- XGNN (Yuan et al. 2020) is referenced for its graph-synthesis approach to model-level explanations.
- ContextGNN (Yuan et al. 2025) is noted for extending explainability to two-tower recommendation architectures.
- The authors recommend "Probabilistic Databases" by Suciu et al. (2011) for a rigorous treatment of uncertainty in relational settings.
- The paper points out that current GNN explainers are limited in scope, calling for more comprehensive methods.
- The literature review underscores the importance of feature importance, saliency maps, and model-level explanations as core GNN interpretability techniques.