Leiden University
Abstract
Automatic performance tuning, or auto-tuning, accelerates high-performance
codes by exploring vast spaces of code variants. However, due to the large
number of possible combinations and complex constraints, constructing these
search spaces can be a major bottleneck. We have encountered real-world
applications in which search space construction takes minutes, hours, or even
days. Current state-of-the-art techniques for search space construction, such
as chain-of-trees, lack a formal foundation and perform adequately only on a
specific subset of search spaces.
We show that search space construction for constraint-based auto-tuning can
be reformulated as a Constraint Satisfaction Problem (CSP). Building on this
insight, we develop a runtime parser that translates user-defined constraint
functions into solver-optimal expressions, optimize a CSP solver to exploit
common structures in auto-tuning constraints, and integrate these and other
advances into open-source tools. These contributions
substantially improve performance and accessibility while preserving
flexibility.
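To make the reformulation concrete, the following minimal sketch shows how a small tuning search space with a user-defined restriction could be expressed as a CSP using the python-constraint package; the parameter names, value domains, and thread-limit constraint are illustrative assumptions, not taken from the benchmarks evaluated below.

# Minimal sketch (not the paper's implementation): an auto-tuning search
# space expressed as a Constraint Satisfaction Problem with the
# python-constraint package. Parameters and the constraint are illustrative.
from constraint import Problem

problem = Problem()

# Tunable parameters and their value domains (illustrative).
problem.addVariable("block_size_x", [16, 32, 64, 128, 256])
problem.addVariable("block_size_y", [1, 2, 4, 8])
problem.addVariable("tile_size", [1, 2, 4])

# A user-defined restriction, e.g. a hardware limit on threads per block.
problem.addConstraint(
    lambda bx, by: bx * by <= 1024,
    ("block_size_x", "block_size_y"),
)

# Enumerating the solutions yields exactly the valid configurations,
# i.e. the constructed search space, without brute-forcing the full
# Cartesian product of all parameter values.
configurations = problem.getSolutions()
print(len(configurations), "valid configurations")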
We evaluate our approach using a diverse set of benchmarks, demonstrating
that our optimized solver reduces construction time by four orders of magnitude
versus brute-force enumeration, three orders of magnitude versus an unoptimized
CSP solver, and one to two orders of magnitude versus leading auto-tuning
frameworks built on chain-of-trees. We thus eliminate a critical scalability
barrier for auto-tuning and provide a drop-in solution that enables the
exploration of previously unattainable problem scales in auto-tuning and
related domains.
Northwestern University
Abstract
Investigative journalists routinely confront large document collections.
Large language models (LLMs) with retrieval-augmented generation (RAG)
capabilities promise to accelerate the process of document discovery, but
newsroom adoption remains limited due to hallucination risks, verification
burden, and data privacy concerns. We present a journalist-centered approach to
LLM-powered document search that prioritizes transparency and editorial control
through a five-stage pipeline -- corpus summarization, search planning,
parallel thread execution, quality evaluation, and synthesis -- using small,
locally-deployable language models that preserve data security and maintain
complete auditability through explicit citation chains. Evaluating three
quantized models (Gemma 3 12B, Qwen 3 14B, and GPT-OSS 20B) on two corpora, we
find substantial variation in reliability. All models achieved high citation
validity and ran effectively on standard desktop hardware (e.g., 24 GB of
memory), demonstrating feasibility for resource-constrained newsrooms. However,
systematic challenges emerged, including error propagation through multi-stage
synthesis and dramatic performance variation based on training data overlap
with corpus content. These findings suggest that effective newsroom AI
deployment requires careful model selection and system design, alongside human
oversight to maintain standards of accuracy and accountability.
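To illustrate how the five stages described above could fit together, the sketch below chains corpus summarization, search planning, parallel thread execution, quality evaluation, and synthesis around a locally hosted model; query_local_model, the prompts, and the keyword-based retrieval are placeholders assumed for this example, not the system's actual implementation.

# Illustrative sketch of a five-stage pipeline like the one described above.
# query_local_model() is a placeholder for any locally deployed model (e.g. a
# quantized model served on desktop hardware); prompts, data structures, and
# the keyword retrieval are assumptions, not the authors' code.
from concurrent.futures import ThreadPoolExecutor

def query_local_model(prompt: str) -> str:
    # Placeholder: send a prompt to a locally hosted LLM and return its reply.
    raise NotImplementedError("connect this to your local inference server")

def summarize_corpus(documents: list[str]) -> str:
    # Stage 1: corpus summarization.
    return query_local_model("Summarize these documents:\n" + "\n---\n".join(documents))

def plan_searches(question: str, corpus_summary: str) -> list[str]:
    # Stage 2: search planning -- break the question into search threads.
    plan = query_local_model(
        f"Corpus summary:\n{corpus_summary}\n\nQuestion: {question}\n"
        "List one search query per line."
    )
    return [line.strip() for line in plan.splitlines() if line.strip()]

def run_thread(query: str, documents: list[str]) -> str:
    # Stage 3: execute one search thread; excerpts carry a [doc index] so the
    # model can cite its sources explicitly.
    hits = [f"[{i}] {doc}" for i, doc in enumerate(documents) if query.lower() in doc.lower()]
    return query_local_model(
        f"Answer '{query}' using only these excerpts and cite them by [index]:\n"
        + "\n".join(hits)
    )

def evaluate(findings: list[str]) -> list[str]:
    # Stage 4: quality evaluation -- keep only findings the model judges supported.
    return [
        f for f in findings
        if "yes" in query_local_model(
            "Is this finding well supported by its citations? Answer yes or no.\n" + f
        ).lower()
    ]

def synthesize(question: str, findings: list[str]) -> str:
    # Stage 5: synthesis, preserving the citation chain built up by the threads.
    return query_local_model(
        f"Question: {question}\nSynthesize these cited findings into one answer:\n"
        + "\n\n".join(findings)
    )

def answer(question: str, documents: list[str]) -> str:
    summary = summarize_corpus(documents)
    queries = plan_searches(question, summary)
    with ThreadPoolExecutor() as pool:  # parallel thread execution
        findings = list(pool.map(lambda q: run_thread(q, documents), queries))
    return synthesize(question, evaluate(findings))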