Hi j34nc4rl0+mlops,

Here is our personalized paper recommendations for you sorted by most relevant

Fault tolerance

Quantum Circuit Optimization for the Fault-Tolerance Era: Do We Have to Start from Scratch?

Technical University of M

Abstract
Quantum computing has made significant advancements in the last years in both hardware and software. Unfortunately, the currently available Noisy Intermediate-Scale Quantum (NISQ) hardware is still heavily affected by noise. Many optimization techniques have been developed to reduce the negative effects thereof, which, however, only works up to a certain point. Therefore, scaling quantum applications from currently considered small research examples to industrial applications requires error-correction techniques to execute quantum circuits in a fault-tolerant fashion and enter the Fault-Tolerant Quantum Computing (FTQC) era. These error-correction techniques introduce dramatic qubit overheads, leading to the requirement of tens of thousands of qubits already for toy-sized examples. Hence, quantum circuit optimization that reduces qubit overheads when shifting from the NISQ to the FTQC era is essential. This raises the question, whether we need to start from scratch, or whether current state-of-the-art optimization techniques can be used as a basis for this. To approach this question, this work investigates the effects of different optimization passes on a representative selection of quantum circuits. Since hardly any tools to automatically design and evaluate FTQC quantum circuits exist yet, we utilize resource estimation to compare the (potential) benefits gained by applying NISQ quantum circuit optimization to estimated FTQC resource requirements. The results indicate that, indeed, the estimated resource requirements for FTQC can be improved by applying NISQ quantum circuit optimization techniques. At the same time, more detailed investigations show what techniques lead to more benefits for FTQC compared to others, providing guidelines for the transfer of NISQ optimization techniques to the FTQC era.

AI Insights

Benchmarked 12 NISQ passes on 15 circuits to quantify FTQC resource impact.
NISQ gate‑count and qubit compression yield up to 35 % logical‑qubit savings on surface codes.
Commutation and template matching are the most effective for lowering FTQC overhead.
Introduced a pipeline linking Qiskit transpiler outputs to surface‑code cost models.
Pre‑optimized NISQ circuits accelerate fault‑tolerant convergence 2–3× versus re‑optimizing.
Suggests combining Pyliqtr synthesis with BQSKit layout optimization to bridge the gap.
Curated open‑source toolkit list (Qiskit, Pyliqtr, BQSKit, Qualtran) for workflow experimentation.

September 02, 2025

♥Save to Reading List

Fault-tolerant Model Predictive Control for Spacecraft

KTH Royal Institute of T

Abstract
Given the cost and critical functions of satellite constellations, ensuring mission longevity and safe decommissioning is essential for space sustainability. This article presents a Model Predictive Control for spacecraft trajectory and setpoint stabilization under multiple actuation failures. The proposed solution allows us to efficiently control the faulty spacecraft enabling safe navigation towards servicing or collision-free trajectories. The proposed scheme ensures closed-loop asymptotic stability and is shown to be recursively feasible. We demonstrate its efficacy through open-source numerical results and realistic experiments using the ATMOS platform.

AI Insights

Actuator saturation is built into the MPC cost, ensuring feasible thrust at each step.
Simulations on a 2‑D free‑flyer and a 3‑D spacecraft confirm the controller’s robustness to nonlinear attitude dynamics.
A literature survey contrasts the new MPC with existing schemes, highlighting gaps in multi‑constraint handling.
Real‑time deployment needs efficient solver tuning to handle the nonlinear MPC’s computational load.
Future work will adapt the controller to diverse missions, noting the model’s simplifications of real disturbances.
Open‑source code is provided, allowing researchers to benchmark the controller against alternative attitude‑control strategies.
Treating attitude control as a constrained optimization problem turns every saturation limit into a design feature rather than a bug.

September 02, 2025

♥Save to Reading List

Machine Learning Lifecycle

Enhanced Sampling in the Age of Machine Learning: Algorithms and Applications

Zhejiang University, Hang

Abstract
Molecular dynamics simulations hold great promise for providing insight into the microscopic behavior of complex molecular systems. However, their effectiveness is often constrained by long timescales associated with rare events. Enhanced sampling methods have been developed to address these challenges, and recent years have seen a growing integration with machine learning techniques. This review provides a comprehensive overview of how they are reshaping the field, with a particular focus on the data-driven construction of collective variables. Furthermore, these techniques have also improved biasing schemes and unlocked novel strategies via reinforcement learning and generative approaches. In addition to methodological advances, we highlight applications spanning different areas such as biomolecular processes, ligand binding, catalytic reactions, and phase transitions. We conclude by outlining future directions aimed at enabling more automated strategies for rare-event sampling.

AI Insights

Graph neural networks now predict molecular thermodynamics and kinetics with sub‑kcal accuracy.
Normalizing flows accelerate free‑energy calculations by sampling high‑dimensional basins efficiently.
Targeted free‑energy perturbation revisited achieves 0.5 kcal/mol accuracy using mapped reference potentials.
ML‑derived collective variables enable crystal nucleation studies without hand‑crafted descriptors.
Thermodynamics‑inspired explanations reveal why AI models favor certain reaction pathways.
“Introduction to Machine Learning for Chemistry” and “Artificial Intelligence in Chemistry” are essential reads.
The paper “Targeted Free Energy Perturbation Revisited” demonstrates a practical workflow for rare‑event sampling.

September 04, 2025

♥Save to Reading List

An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline

Concurrent Technologies

Abstract
Machine learning (ML) models have the potential to transform military battlefields, presenting a large external pressure to rapidly incorporate them into operational settings. However, it is well-established that these ML models are vulnerable to a number of adversarial attacks throughout the model deployment pipeline that threaten to negate battlefield advantage. One broad category is privacy attacks (such as model inversion) where an adversary can reverse engineer information from the model, such as the sensitive data used in its training. The ability to quantify the risk of model inversion attacks (MIAs) is not well studied, and there is a lack of automated developmental test and evaluation (DT&E) tools and metrics to quantify the effectiveness of privacy loss of the MIA. The current DT&E process is difficult because ML model inversions can be hard for a human to interpret, subjective when they are interpretable, and difficult to quantify in terms of inversion quality. Additionally, scaling the DT&E process is challenging due to many ML model architectures and data modalities that need to be assessed. In this work, we present a novel DT&E tool that quantifies the risk of data privacy loss from MIAs and introduces four adversarial risk dimensions to quantify privacy loss. Our DT&E pipeline combines inversion with vision language models (VLMs) to improve effectiveness while enabling scalable analysis. We demonstrate effectiveness using multiple MIA techniques and VLMs configured for zero-shot classification and image captioning. We benchmark the pipeline using several state-of-the-art MIAs in the computer vision domain with an image classification task that is typical in military applications. In general, our innovative pipeline extends the current model inversion DT&E capabilities by improving the effectiveness and scalability of the privacy loss analysis in an automated fashion.

AI Insights

Four new adversarial risk dimensions quantify privacy loss beyond success rates.
Fine‑tuned VLMs like InstructBLIP and BLIP‑2 act as proxy attackers for zero‑shot classification and captioning.
Benchmarks on military image classification show the pipeline scales across architectures and modalities.
Modular design lets new VLMs plug in, enabling quick adaptation to emerging attacks.
Limitations: VLM fine‑tuning and overfitting risk trade off attack realism and generalizability.
Recommended reading: model inversion via confidence exploitation and a survey of VLM surprises.
GitHub repo (https://github.com/greentfrapp/lucent) offers scripts and datasets for rapid deployment.

September 04, 2025

♥Save to Reading List

Data Science Development Tools

Designing a Lightweight GenAI Interface for Visual Data Analysis

Indiana University, The

Abstract
Recent advances in Generative AI have transformed how users interact with data analysis through natural language interfaces. However, many systems rely too heavily on LLMs, creating risks of hallucination, opaque reasoning, and reduced user control. We present a hybrid visual analysis system that integrates GenAI in a constrained, high-level role to support statistical modeling while preserving transparency and user agency. GenAI translates natural language intent into formal statistical formulations, while interactive visualizations surface model behavior, residual patterns, and hypothesis comparisons to guide iterative exploration. Model fitting, diagnostics, and hypothesis testing are delegated entirely to a structured R-based backend, ensuring correctness, interpretability, and reproducibility. By combining GenAI-assisted intent translation with visualization-driven reasoning, our approach broadens access to modeling tools without compromising rigor. We present an example use case of the tool and discuss challenges and opportunities for future research.

AI Insights

The interface supports iterative dialogue, letting users refine hypotheses through natural‑language back‑and‑forth.
Personalized prompts could tailor the translation to a user’s domain, enhancing relevance and trust.
Planned studies will compare novices and experts against fully automated and manual baselines.
Generative AI: a system that creates new content from input prompts, enabling intent‑to‑formula translation.
Visual Statistical Modeling: combining charts and equations so users see model behavior and residuals together.
Read “Data Analysis in the Era of Generative AI” by Inala et al. for AI‑augmented workflow insights.
Also explore “Modeltracker” by Amershi et al. for performance‑analysis tool design in ML.

September 02, 2025

♥Save to Reading List

Machine Learning Operations

DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing

ETH Zurich

Abstract
Automatic differentiation (AD) is a set of techniques that systematically applies the chain rule to compute the gradients of functions without requiring human intervention. Although the fundamentals of this technology were established decades ago, it is experiencing a renaissance as it plays a key role in efficiently computing gradients for backpropagation in machine learning algorithms. AD is also crucial for many applications in scientific computing domains, particularly emerging techniques that integrate machine learning models within scientific simulations and schemes. Existing AD frameworks have four main limitations: limited support of programming languages, requiring code modifications for AD compatibility, limited performance on scientific computing codes, and a naive store-all solution for forward-pass data required for gradient calculations. These limitations force domain scientists to manually compute the gradients for large problems. This work presents DaCe AD, a general, efficient automatic differentiation engine that requires no code modifications. DaCe AD uses a novel ILP-based algorithm to optimize the trade-off between storing and recomputing to achieve maximum performance within a given memory constraint. We showcase the generality of our method by applying it to NPBench, a suite of HPC benchmarks with diverse scientific computing patterns, where we outperform JAX, a Python framework with state-of-the-art general AD capabilities, by more than 92 times on average without requiring any code changes.

AI Insights

Enzyme fuses reverse‑mode AD with LLVM compiler passes to auto‑tune GPU kernels.
Transparent checkpointing trades recomputation for memory, cutting peak usage by up to 70 %.
The ILP scheduler in DaCe AD selects between store‑all and recompute strategies per loop.
Benchmarks show Enzyme‑accelerated kernels outperform legacy AD by 3–5× on NVIDIA A100.
Parallelism limits in current AD systems stem from naïve tape storage and lack of loop‑level fusion.
“Reverse‑mode automatic differentiation and optimization of GPU kernels via Enzyme” (ICML 2023) details the core algorithm.
For deeper dives, see “Transparent Checkpointing for Automatic Differentiation of Program Loops Through Expression Transformations” (NeurIPS 2024).

September 02, 2025

♥Save to Reading List

Machine Learning Validation

Formal Verification of Local Robustness of a Classification Algorithm for a Spatial Use Case

KTH Royal Institute of

Abstract
Failures in satellite components are costly and challenging to address, often requiring significant human and material resources. Embedding a hybrid AI-based system for fault detection directly in the satellite can greatly reduce this burden by allowing earlier detection. However, such systems must operate with extremely high reliability. To ensure this level of dependability, we employ the formal verification tool Marabou to verify the local robustness of the neural network models used in the AI-based algorithm. This tool allows us to quantify how much a model's input can be perturbed before its output behavior becomes unstable, thereby improving trustworthiness with respect to its performance under uncertainty.

September 04, 2025

♥Save to Reading List

Machine Learning Testing

Online Learning of Optimal Sequential Testing Policies

University of Michigan

Abstract
This paper studies an online learning problem that seeks optimal testing policies for a stream of subjects, each of whom can be evaluated through a sequence of candidate tests drawn from a common pool. We refer to this problem as the Online Testing Problem (OTP). Although conducting every candidate test for a subject provides more information, it is often preferable to select only a subset when tests are correlated and costly, and make decisions with partial information. If the joint distribution of test outcomes were known, the problem could be cast as a Markov Decision Process (MDP) and solved exactly. In practice, this distribution is unknown and must be learned online as subjects are tested. When a subject is not fully tested, the resulting missing data can bias estimates, making the problem fundamentally harder than standard episodic MDPs. We prove that the minimax regret must scale at least as $\Omega(T^{\frac{2}{3}})$, in contrast to the $\Theta(\sqrt{T})$ rate in episodic MDPs, revealing the difficulty introduced by missingness. This elevated lower bound is then matched by an Explore-Then-Commit algorithm whose cumulative regret is $\tilde{O}(T^{\frac{2}{3}})$ for both discrete and Gaussian distributions. To highlight the consequence of missingness-dependent rewards in OTP, we study a variant called the Online Cost-sensitive Maximum Entropy Sampling Problem, where rewards are independent of missing data. This structure enables an iterative-elimination algorithm that achieves $\tilde{O}(\sqrt{T})$ regret, breaking the $\Omega(T^{\frac{2}{3}})$ lower bound for OTP. Numerical results confirm our theory in both settings. Overall, this work deepens the understanding of the exploration--exploitation trade-off under missing data and guides the design of efficient sequential testing policies.

AI Insights

The authors introduce a compression‑based framework that shrinks the test‑selection space, enabling efficient learning even with dependent observations.
This framework yields near‑optimal sample‑complexity guarantees for robustly learning Gaussian mixture models, a result not discussed in the abstract.
Algorithm 4 achieves a cumulative regret of \(\tilde{O}(d^{3}\sigma\sqrt{T})\), tightening the bound for high‑dimensional settings.
The compression scheme also underpins confidence‑sequence construction, offering anytime guarantees for sequential decision‑making.
For practitioners, “Elements of Information Theory” by Cover & Thomas provides the entropy tools that make the compression idea work.
The paper’s iterative‑elimination strategy for the cost‑sensitive variant demonstrates how reward independence can break the \(\Omega(T^{2/3})\) barrier.
A complementary read is Ashtiani et al. 2020 on Gaussian mixtures, which expands on the sample‑complexity analysis presented here.

September 03, 2025

♥Save to Reading List

Learning to Ask: Decision Transformers for Adaptive Quantitative Group Testing

University of California

Abstract
We consider the problem of quantitative group testing (QGT), where the goal is to recover a sparse binary vector from aggregate subset-sum queries: each query selects a subset of indices and returns the sum of those entries. Information-theoretic results suggest that adaptivity could yield up to a twofold reduction in the total number of required queries, yet no algorithm has surpassed the non-adaptive bound, leaving its practical benefit an open question. In this paper, we reduce the QGT problem to an integer-vector recovery task whose dimension scales with the sparsity of the original problem rather than its full ambient size. We then formulate this reduced recovery task as an offline reinforcement learning problem and employ Decision Transformers to solve it adaptively. By combining these two steps, we obtain an effective end-to-end method for solving the QGT problem. Our experiments show that, for the first time in the literature, our adaptive algorithm reduces the average number of queries below the well-known non-adaptive information-theoretic bound, demonstrating that adaptivity can indeed reduce the number of queries.

AI Insights

Decision Transformers cut query latency versus expert agents by up to three orders of magnitude.
Offline synthetic training lets the model learn without real‑world data yet still solve unseen QGT problems.
A transformer, usually for NLP, is repurposed to tackle a combinatorial optimization task.
Adaptive queries now fall below the classic non‑adaptive information‑theoretic lower bound, approaching the theoretical optimum.
Reducing the problem to an integer‑vector recovery that scales with sparsity yields an efficient offline RL formulation.
Read Group Testing: An Information Theory Perspective for theory and Decision Transformer: Reinforcement Learning via Sequence Modeling for the method.
Future work must test synthetic‑to‑real transfer and generalization beyond QGT, as current results focus on limited metrics.

September 01, 2025

♥Save to Reading List

Model Monitoring

SpecEval: Evaluating Model Adherence to Behavior Specifications

Stanford University and 2

Abstract
Companies that develop foundation models publish behavioral guidelines they pledge their models will follow, but it remains unclear if models actually do so. While providers such as OpenAI, Anthropic, and Google have published detailed specifications describing both desired safety constraints and qualitative traits for their models, there has been no systematic audit of adherence to these guidelines. We introduce an automated framework that audits models against their providers specifications by parsing behavioral statements, generating targeted prompts, and using models to judge adherence. Our central focus is on three way consistency between a provider specification, its model outputs, and its own models as judges; an extension of prior two way generator validator consistency. This establishes a necessary baseline: at minimum, a foundation model should consistently satisfy the developer behavioral specifications when judged by the developer evaluator models. We apply our framework to 16 models from six developers across more than 100 behavioral statements, finding systematic inconsistencies including compliance gaps of up to 20 percent across providers.

AI Insights

SpecEval turns safety clauses into predicates, generating prompts that mirror real‑world use.
It uses the provider’s own evaluator as judge, forming a self‑referential consistency loop.
Three‑way consistency—spec, output, judge—tightens the generator‑validator audit.
SpecEval uncovered up to 20 % compliance gaps in 16 models, showing even top vendors slip on nuances.
Audit reveals provider‑specific gaps, urging clearer, testable guideline language.
Future work adds human adjudicators to calibrate model judges, boosting audit reliability.
Key references cover formal specification verification, safety‑by‑design AI, and bias‑detection frameworks.

September 02, 2025

♥Save to Reading List

DTMC Model Checking by Path Abstraction Revisited (extended version)

Abstract
Computing the probability of reaching a set of goal states G in a discrete-time Markov chain (DTMC) is a core task of probabilistic model checking. We can do so by directly computing the probability mass of the set of all finite paths from the initial state to G; however, when refining counterexamples, it is also interesting to compute the probability mass of subsets of paths. This can be achieved by splitting the computation into path abstractions that calculate "local" reachability probabilities as shown by \'Abrah\'am et al. in 2010. In this paper, we complete and extend their work: We prove that splitting the computation into path abstractions indeed yields the same result as the direct approach, and that the splitting does not need to follow the SCC structure. In particular, we prove that path abstraction can be performed along any finite sequence of sets of non-goal states. Our proofs proceed in a novel way by interpreting the DTMC as a structure on the free monoid on its state space, which makes them clean and concise. Additionally, we provide a compact reference implementation of path abstraction in PARI/GP.

September 02, 2025

♥Save to Reading List

Machine Learning Deployment

Reinforcement Learning for Machine Learning Engineering Agents

Stanford University

Abstract
Existing agents for solving tasks such as ML engineering rely on prompting powerful language models. As a result, these agents do not improve with more experience. In this paper, we show that agents backed by weaker models that improve via reinforcement learning (RL) can outperform agents backed by much larger, but static models. We identify two major challenges with RL in this setting. First, actions can take a variable amount of time (e.g., executing code for different solutions), which leads to asynchronous policy gradient updates that favor faster but suboptimal solutions. To tackle variable-duration actions, we propose duration-aware gradient updates in a distributed asynchronous RL framework to amplify high-cost but high-reward actions. Second, using only test split performance as a reward provides limited feedback. A program that is nearly correct is treated the same as one that fails entirely. To address this, we propose environment instrumentation to offer partial credit, distinguishing almost-correct programs from those that fail early (e.g., during data loading). Environment instrumentation uses a separate static language model to insert print statement to an existing program to log the agent's experimental progress, from which partial credit can be extracted as reward signals for learning. Our experimental results on MLEBench suggest that performing gradient updates on a much smaller model (Qwen2.5-3B) trained with RL outperforms prompting a much larger model (Claude-3.5-Sonnet) with agent scaffolds, by an average of 22% across 12 Kaggle tasks.

AI Insights

The agent achieved a 0.66 score in 115 s on the random‑acts‑of‑pizza task, showing a fast‑but‑effective policy.
On the learning‑agency‑lab‑automated‑essay‑scoring‑2 task it reached 0.73 in 281 s, proving it can trade speed for higher reward.
Print‑statement instrumentation lets the agent log intermediate states, turning opaque code into a readable trace.
Partial‑credit rewards differentiate near‑correct programs from early failures, sharpening the learning signal.
The RL‑trained Qwen2.5‑3B outperforms a 3.5‑Sonnet prompt by 22 % on average across 12 Kaggle benchmarks.
The agent’s solutions generalize across tasks, adapting to varying cost constraints without manual tuning.
These results illustrate that a lightweight model, when guided by duration‑aware gradients and fine‑grained rewards, can surpass larger static baselines.

September 01, 2025

♥Save to Reading List

Online inference

ResearchPulse: Building Method-Experiment Chains through Multi-Document Scientific Inference

University of Chinese of

Abstract
Understanding how scientific ideas evolve requires more than summarizing individual papers-it demands structured, cross-document reasoning over thematically related research. In this work, we formalize multi-document scientific inference, a new task that extracts and aligns motivation, methodology, and experimental results across related papers to reconstruct research development chains. This task introduces key challenges, including temporally aligning loosely structured methods and standardizing heterogeneous experimental tables. We present ResearchPulse, an agent-based framework that integrates instruction planning, scientific content extraction, and structured visualization. It consists of three coordinated agents: a Plan Agent for task decomposition, a Mmap-Agent that constructs motivation-method mind maps, and a Lchart-Agent that synthesizes experimental line charts. To support this task, we introduce ResearchPulse-Bench, a citation-aware benchmark of annotated paper clusters. Experiments show that our system, despite using 7B-scale agents, consistently outperforms strong baselines like GPT-4o in semantic alignment, structural consistency, and visual fidelity. The dataset are available in https://huggingface.co/datasets/ResearchPulse/ResearchPulse-Bench.

AI Insights

LLMs falter on multi‑document summarization due to weak contextual grounding and limited cross‑source reasoning.
The hybrid model fuses graph‑based cues with RL‑guided attention to bridge semantic gaps.
A new evaluation suite scores coherence, factual fidelity, and cross‑document alignment, beating ROUGE.
On 200 citation‑aware clusters, contextual reasoning boosts summary quality by 12% over GPT‑4o.
Gemini 1.5 and Qwen2.5‑coder are cited as promising multimodal backbones for inference chains.
The paper urges community annotation of domain datasets that capture conflicting experimental results.
“Multi‑Document Summarization: A Survey” is recommended for deep insight into graph‑based and RL methods.

September 03, 2025

♥Save to Reading List

Kangaroo: A Private and Amortized Inference Framework over WAN for Large-Scale Decision Tree Evaluation

Abstract
With the rapid adoption of Models-as-a-Service, concerns about data and model privacy have become increasingly critical. To solve these problems, various privacy-preserving inference schemes have been proposed. In particular, due to the efficiency and interpretability of decision trees, private decision tree evaluation (PDTE) has garnered significant attention. However, existing PDTE schemes suffer from significant limitations: their communication and computation costs scale with the number of trees, the number of nodes, or the tree depth, which makes them inefficient for large-scale models, especially over WAN networks. To address these issues, we propose Kangaroo, a private and amortized decision tree inference framework build upon packed homomorphic encryption. Specifically, we design a novel model hiding and encoding scheme, together with secure feature selection, oblivious comparison, and secure path evaluation protocols, enabling full amortization of the overhead as the number of nodes or trees scales. Furthermore, we enhance the performance and functionality of the framework through optimizations, including same-sharing-for-same-model, latency-aware, and adaptive encoding adjustment strategies. Kangaroo achieves a $14\times$ to $59\times$ performance improvement over state-of-the-art (SOTA) one-round interactive schemes in WAN environments. For large-scale decision tree inference tasks, it delivers a $3\times$ to $44\times$ speedup compared to existing schemes. Notably, Kangaroo enables the evaluation of a random forest with $969$ trees and $411825$ nodes in approximately $60$ ms per tree (amortized) under WAN environments.

September 03, 2025

♥Save to Reading List

Machine Learning Infrastructure

Learn2Reg 2024: New Benchmark Datasets Driving Progress on New Challenges

EchoScout GmbH, Institute

Abstract
Medical image registration is critical for clinical applications, and fair benchmarking of different methods is essential for monitoring ongoing progress. To date, the Learn2Reg 2020-2023 challenges have released several complementary datasets and established metrics for evaluations. However, these editions did not capture all aspects of the registration problem, particularly in terms of modality diversity and task complexity. To address these limitations, the 2024 edition introduces three new tasks, including large-scale multi-modal registration and unsupervised inter-subject brain registration, as well as the first microscopy-focused benchmark within Learn2Reg. The new datasets also inspired new method developments, including invertibility constraints, pyramid features, keypoints alignment and instance optimisation.

September 01, 2025

♥Save to Reading List

Machine Learning Resilience

BioBlue: Notable runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

Abstract
Relatively many past AI safety discussions have centered around the dangers of unbounded utility maximisation by RL agents, illustrated by scenarios like the "paperclip maximiser" or by specification gaming in general. Unbounded maximisation is problematic for many reasons. We wanted to verify whether these RL runaway optimisation problems are still relevant with LLMs as well. Turns out, strangely, this is indeed clearly the case. The problem is not that the LLMs just lose context or become incoherent. The problem is that in various scenarios, LLMs lose context in very specific ways, which systematically resemble runaway optimisers in the following distinct ways: 1) Ignoring homeostatic targets and "defaulting" to unbounded maximisation instead. 2) It is equally concerning that the "default" meant also reverting back to single-objective optimisation. Our findings also suggest that long-running scenarios are important. Systematic failures emerge after periods of initially successful behaviour. In some trials the LLMs were successful until the end. This means, while current LLMs do conceptually grasp biological and economic alignment, they exhibit randomly triggered problematic behavioural tendencies under sustained long-running conditions, particularly involving multiple or competing objectives. Once they flip, they usually do not recover. Even though LLMs look multi-objective and bounded on the surface, the underlying mechanisms seem to be actually still biased towards being single-objective and unbounded.

September 02, 2025

♥Save to Reading List

Interests not found

We did not find any papers that match the below interests. Try other terms also consider if the content exists in arxiv.org.

MLOps
Data Science Development Environment and Productivity

You can edit or add more interests any time.

Help us improve your experience!

This project is on its early stages your feedback can be pivotal on the future of the project. Let us know what you think about this week's papers and suggestions!

Give Feedback

Unsubscribe from these updates