Universidade Federal de P
Abstract
Solutions to the Algorithm Selection Problem (ASP) in machine learning face
the challenge of high computational costs associated with evaluating various
algorithms' performances on a given dataset. To mitigate this cost, the
meta-learning field can leverage previously executed experiments shared in
online repositories such as OpenML. OpenML provides an extensive collection of
machine learning experiments. However, an analysis of OpenML's records reveals
two limitations. First, its pipelines lack diversity, particularly in data
preprocessing blocks such as scaling or imputation, so many configurations are
poorly represented. Second, its experiments often focus on a few popular
techniques within each pipeline block, leading to an imbalanced sample. To
overcome the
observed limitations of OpenML, we propose PIPES, a collection of experiments
involving multiple pipelines designed to represent all combinations of the
selected sets of techniques, aiming at diversity and completeness. PIPES stores
the results of experiments in which 9,408 pipelines were applied to 300 datasets.
It includes detailed information on the pipeline blocks, training and testing
times, predictions, performance scores, and any error messages raised. This
comprehensive collection of results allows researchers to perform analyses
across diverse and representative pipelines and datasets. PIPES also offers
potential for expansion, as additional data and experiments can be incorporated
to support the meta-learning community further. The data, code, supplementary
material, and all experiments can be found at
https://github.com/cynthiamaia/PIPES.git.
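
The phrase "all combinations of the selected sets of techniques" describes a full-factorial design: one technique is chosen per pipeline block, and every possible choice is enumerated. A minimal sketch of that idea follows. The block names and technique lists are hypothetical placeholders chosen for illustration; they are not the sets used in PIPES and do not reproduce its 9,408 pipelines.

```python
from itertools import product

# Hypothetical blocks and technique names, for illustration only.
# They are NOT the technique sets used in PIPES.
BLOCKS = {
    "imputation": ["mean", "median", "most_frequent"],
    "scaling": ["none", "standard", "minmax"],
    "feature_selection": ["none", "variance_threshold"],
    "classifier": ["decision_tree", "knn", "naive_bayes", "svm"],
}


def enumerate_pipelines(blocks):
    """Return every combination with exactly one technique per block."""
    names = list(blocks)
    options = [blocks[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*options)]


pipelines = enumerate_pipelines(BLOCKS)

# The count is the product of the block sizes: 3 * 3 * 2 * 4 = 72.
print(len(pipelines))
print(pipelines[0])
```

Because the count is the product of the block sizes, every technique appears equally often within its block, which is the balance that a sample concentrated on a few popular techniques lacks.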
University of Pennsylvann
Abstract
Generative Artificial Intelligence is emerging as an important technology,
promising to be transformative in many areas. At the same time, generative AI
techniques are based on sampling from probabilistic models, and by default,
they come with no guarantees about correctness, safety, fairness, or other
properties. Statistical methods offer a promising approach to improving the
reliability of generative AI techniques. They are also promising for
improving the quality and efficiency of AI evaluation,
as well as for designing interventions and experiments in AI.
In this paper, we review some of the existing work on these topics,
explaining both the general statistical techniques used and their
applications to generative AI. We also discuss limitations and potential future
directions.
AI Insights
- Conformal prediction gives distribution-free confidence bands for LLM outputs, turning uncertainty into a measurable metric.
- Activation engineering tweaks hidden activations to steer language models toward desired styles without retraining.
- Causal inference tools uncover hidden biases in LLM responses, enabling targeted debiasing.
- The reliability of steering vectors (directions in activation space that are added to hidden states to guide generation) remains largely uncharted, inviting further research.
- Uncertainty quantification turns hallucinations into measurable risks, paving the way for safer AI deployment.
- “Algorithmic Learning in a Random World” (Vovk, Gammerman, and Shafer) provides the rigorous statistical foundation for conformal prediction.
- Conformal abstention lets LLMs refuse uncertain queries, dramatically reducing hallucination rates.
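
To make the conformal prediction item above concrete, here is a minimal sketch of split conformal prediction on a toy regression problem. It is a generic textbook construction, not a method taken from the paper, and all numbers and function names are made up for illustration. Assuming the calibration and test points are exchangeable, the resulting interval contains the true value with probability at least 1 - alpha, whatever the underlying distribution.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, there are too few calibration points for the
    requested coverage, so the quantile is infinite.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction with half-width q."""
    return (point_prediction - q, point_prediction + q)


# Toy calibration set (made-up numbers): model predictions and true values.
cal_pred = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 5.0, 1.5, 4.5]
cal_true = [2.5, 3.0, 2.0, 4.0, 1.0, 3.25, 7.0, 1.5, 4.0]

# Nonconformity score: absolute residual on each calibration point.
scores = [abs(t - p) for p, t in zip(cal_pred, cal_true)]

q = conformal_quantile(scores, alpha=0.2)
interval = prediction_interval(3.0, q)
print(q, interval)
```

With nine calibration points and alpha = 0.2, the rank is ceil(10 * 0.8) = 8, so q is the eighth smallest residual. Applying the same recipe to LLM outputs requires a task-specific nonconformity score, which this sketch does not attempt.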