Amirali Rayegan, Tim Menzies
Abstract
Efficient, interpretable optimization is a critical but underexplored
challenge in software engineering, where practitioners routinely face vast
configuration spaces and costly, error-prone labeling processes. This paper
introduces EZR, a novel and modular framework for multi-objective optimization
that unifies active sampling, learning, and explanation within a single,
lightweight pipeline. Departing from conventional wisdom, our Maximum Clarity
Heuristic demonstrates that using less (but more informative) data can yield
optimization models that are both effective and deeply understandable. EZR
employs an active learning strategy based on Naive Bayes sampling to
efficiently identify high-quality configurations with a fraction of the labels
required by fully supervised approaches. It then distills optimization logic
into concise decision trees, offering transparent, actionable explanations for
both global and local decision-making. Extensive experiments across 60
real-world datasets establish that EZR reliably achieves over 90% of the
best-known optimization performance in most cases, while providing clear,
cohort-based rationales that surpass standard attribution-based explainable AI
(XAI) methods (LIME, SHAP, BreakDown) in clarity and utility. These results
endorse "less but better"; it is both possible and often preferable to use
fewer (but more informative) examples to generate label-efficient optimization
and explanations in software systems. To support transparency and
reproducibility, all code and experimental materials are publicly available at
https://github.com/amiiralii/Minimal-Data-Maximum-Clarity.
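To make the pipeline concrete, the sketch below shows the shape of the loop the abstract describes: label a few configurations, split them into "best" and "rest" halves, fit a Naive Bayes model, and spend each further costly label on the candidate most likely to be "best", before distilling the labeled data into a shallow decision tree. This is a minimal illustration, not the authors' EZR implementation (see the repository above for that); the GaussianNB stand-in, the acquisition rule, and all names (active_optimize, label_fn, budget, warmup) are assumptions made for this sketch.

import random
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeRegressor, export_text

def active_optimize(X, label_fn, budget=30, warmup=4, seed=0):
    """X: candidate configurations (list of numeric feature vectors).
    label_fn: the costly oracle; returns a score to MINIMIZE."""
    rng = random.Random(seed)
    pool = list(range(len(X)))
    rng.shuffle(pool)
    labeled = {}
    for _ in range(warmup):                       # a few random labels to start
        i = pool.pop()
        labeled[i] = label_fn(X[i])
    while pool and len(labeled) < budget:
        ranked = sorted(labeled, key=labeled.get) # lowest (best) scores first
        best = set(ranked[: len(ranked) // 2])    # split into "best" vs "rest"
        nb = GaussianNB().fit([X[i] for i in labeled],
                              [int(i in best) for i in labeled])
        probs = nb.predict_proba([X[i] for i in pool])[:, 1]
        j = max(range(len(pool)), key=lambda p: probs[p])
        i = pool.pop(j)                           # label the most promising candidate
        labeled[i] = label_fn(X[i])
    tree = DecisionTreeRegressor(max_depth=3)     # distill into a small, readable tree
    tree.fit([X[i] for i in labeled], [labeled[i] for i in labeled])
    best_i = min(labeled, key=labeled.get)
    return X[best_i], export_text(tree)

Here each call to label_fn models the costly labeling step, so budget caps the total number of labels spent; the string returned by export_text is the kind of concise, global rationale the abstract contrasts with attribution-based XAI output.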
AI Insights
- They use SHAP values as a feature‑selection compass, steering models toward the most informative inputs.
- The paper reviews XAI methods—SHAP, LIME, TPE—highlighting strengths and blind spots.
- It exposes gaps in current XAI tools, urging research into lighter, sharper explanations for software.
- Explainability is framed as essential, not optional, for trustworthy software engineering.
- A curated reading list appears, from The Book of Why to LLM‑driven active‑learning papers.
- The authors note limited empirical evidence, reminding us that theory must meet data.
- Definitions are clear: XAI demystifies AI; interpretability lets us see model decisions.
Abstract
In this paper, we study the class $\mathtt{cstPP}$ of operations
$\mathtt{op}: \mathbb{N}^k\to\mathbb{N}$, of any fixed arity $k\ge 1$,
satisfying the following property: for each fixed integer $d\ge 1$, there
exists an algorithm for a RAM machine which, for any input integer $N\ge 2$,
- pre-computes some tables in $O(N)$ time,
- then reads $k$ operands $x_1,\ldots,x_k < N^d$ and computes
$\mathtt{op}(x_1,\ldots,x_k)$ in constant time, using the precomputed tables.
We prove that the class $\mathtt{cstPP}$ is not modified if the preprocessing
time $O(N)$ is extended to $O(N^c)$, for any fixed $c>1$, or conversely, is
reduced to $N^{\varepsilon}$, for any positive $\varepsilon<1$ (provided the
set of primitive operations includes $+$, $\mathtt{div}$ and $\mathtt{mod}$). To
complete the picture, we demonstrate that the $\mathtt{cstPP}$ class
degenerates if the preprocessing time reduces to $N^{o(1)}$.
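As a concrete illustration of the pattern that defines $\mathtt{cstPP}$ (my example, assumed rather than taken from the paper): spend $O(N)$ time filling a table for all values below $N$, then answer each query on larger operands in constant time using only $+$, $\mathtt{div}$ and $\mathtt{mod}$. The sketch below takes $\mathtt{op}$ to be popcount and $d=2$, with $N$ a power of two so that div and mod by $N$ split an operand below $N^2$ exactly into two in-table halves.

def make_popcount(N):
    """Illustrative example. Assumes N is a power of two. After O(N)
    preprocessing, answers popcount(x) for any 0 <= x < N**2 in O(1)."""
    table = [0] * N
    for i in range(1, N):                  # O(N) table build
        table[i] = table[i // 2] + (i % 2)
    def popcount(x):                       # constant time: two lookups, one add
        return table[x // N] + table[x % N]
    return popcount

pc = make_popcount(1 << 16)                # one linear-time preprocessing pass
assert pc(0b10110101) == 5                 # operand below N
assert pc((1 << 30) + 1) == 2              # operand up to N**2, still O(1)

The same shape scales to any fixed $d$: an operand below $N^d$ splits into $d$ base-$N$ digits via $d-1$ div/mod pairs, so the per-query cost stays constant once $d$ is fixed, while the table build remains the $O(N)$ preprocessing the abstract allows.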