Hi!

Your personalized paper recommendations for 10 to 14 November 2025.
🎯 Top Personalized Recommendations
Tsinghua University
Why we think this paper is great for you:
This paper directly addresses the crucial aspects of system reliability and fault tolerance in complex, interconnected environments. You will find its insights into maintaining robust operations particularly relevant.
Abstract
Ensuring the reliability of agent architectures and effectively identifying problematic agents when failures occur are crucial challenges in multi-agent systems (MAS). Advances in large language models (LLMs) have established LLM-based agents as a major branch of MAS, enabling major breakthroughs in complex problem solving and world modeling. However, the reliability implications of this shift remain largely unexplored, i.e., whether substituting traditional agents with LLM-based agents can effectively enhance the reliability of MAS. In this work, we investigate and quantify the reliability of LLM-based agents from the perspective of Byzantine fault tolerance. We observe that LLM-based agents demonstrate stronger skepticism when processing erroneous message flows, a characteristic that enables them to outperform traditional agents across different topological structures. Motivated by the results of the pilot experiment, we design CP-WBFT, a confidence probe-based weighted Byzantine Fault Tolerant consensus mechanism, to enhance the stability of MAS with different topologies. It capitalizes on the intrinsic reflective and discriminative capabilities of LLMs by employing a probe-based, weighted information flow transmission method to improve the reliability of LLM-based agents. Extensive experiments demonstrate that CP-WBFT achieves superior performance across diverse network topologies under extreme Byzantine conditions (85.7% fault rate). Notably, our approach surpasses traditional methods by attaining remarkable accuracy on various topologies and maintaining strong reliability in both mathematical reasoning and safety assessment tasks.
AI Summary
  • Hidden-level Confidence Probing (HCP) consistently outperforms Prompt-level Confidence Probing (PCP) and single-token extraction methods, demonstrating that decoder-level semantic consistency signals are superior for robust confidence assessment. [3]
  • LLM-based agents inherently possess stronger skepticism towards erroneous information, enabling them to significantly outperform traditional agents in Byzantine fault tolerance across various network topologies. [2]
  • The proposed CP-WBFT mechanism effectively enhances MAS reliability by leveraging LLM's intrinsic reflective and discriminative capabilities through confidence-guided weighted information flow. [2]
  • CP-WBFT, particularly with Hidden-level Confidence Probing (HCP), achieves remarkable Byzantine fault tolerance, maintaining 100% final accuracy even under extreme conditions (85.7% fault rate) in well-connected topologies like complete graphs. [2]
  • Network topology critically influences consensus effectiveness, with complete graphs maximizing information flow for optimal performance, while constrained topologies limit consensus due to restricted information exchange. [2]
  • LLM-based multi-agent systems can exceed the classical Byzantine fault tolerance bound of f < n/3, tolerating a much higher proportion of malicious nodes than traditional systems. [2]
  • Safety assessment tasks (XSTest) exhibit higher topology dependence for effective consensus compared to mathematical reasoning tasks (GSM8K), which show more topology-agnostic robustness. [2]
  • Byzantine Fault Tolerance (BFT) in LLM-based MAS: The ability of multi-agent systems composed of LLMs to achieve consensus and maintain reliability despite the presence of malicious or arbitrarily faulty LLM agents. [2]
  • CP-WBFT (Confidence Probe-based Weighted Byzantine Fault Tolerant consensus mechanism): A novel consensus protocol that enhances MAS stability by dynamically assigning information weights based on agents' confidence levels, derived from either prompt-level or hidden-level probes (see the sketch after this summary). [2]
  • Prompt-level Confidence Probe (PCP): A method to explicitly elicit and quantify an LLM agent's confidence in its response through structured prompting strategies, leveraging the LLM's self-reflective capabilities. [2]
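To make the confidence-weighted consensus idea concrete, here is a minimal sketch of probe-weighted voting among agents. It is a reading aid under assumptions, not the paper's implementation: the AgentReply structure, the [0, 1] confidence values, and the sum-of-weights aggregation rule are illustrative stand-ins for CP-WBFT's probe-derived weights.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class AgentReply:
    agent_id: str
    answer: str        # the agent's proposed answer (e.g., to a GSM8K problem)
    confidence: float  # probe-derived confidence in [0, 1] (PCP or HCP in the paper)

def weighted_consensus(replies: list[AgentReply]) -> str:
    """Aggregate replies by summing probe confidences per candidate answer.

    Illustrative stand-in for confidence-guided weighting: low-confidence
    (potentially Byzantine) messages contribute little weight, so honest
    agents with high-confidence probes can dominate the vote even when
    faulty agents are in the numerical majority.
    """
    weight_per_answer: dict[str, float] = defaultdict(float)
    for r in replies:
        weight_per_answer[r.answer] += max(0.0, min(1.0, r.confidence))
    return max(weight_per_answer, key=weight_per_answer.get)

# Hypothetical round: 6 of 7 agents are faulty but their probes yield low
# confidence, while the honest agent's hidden-level probe is confident.
replies = [AgentReply(f"byz{i}", "wrong", 0.12) for i in range(6)]
replies.append(AgentReply("honest", "42", 0.95))
print(weighted_consensus(replies))  # -> "42"
```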
1QBit
Why we think this paper is great for you:
Focusing on the critical timing aspects of advanced computing, this paper explores how to minimize delays in high-performance systems. Its discussion on fault-tolerant architectures will also align with your interests.
Abstract
The speed of a fault-tolerant quantum computer is dictated by the reaction time of its classical electronics, that is, the total time required by decoders and controllers to determine the outcome of a logical measurement and execute subsequent conditional logical operations. Despite its importance, the reaction time and its impact on the design of the logical microarchitecture of a quantum computer are not well understood. In this work, we build, for a surface-code-based architecture, a model for the reaction time in which the decoder latency is based on parallel space- and time-window decoding methods, and communication latencies are drawn from our envisioned quantum execution environment comprising a high-speed network of quantum processing units, controllers, decoders, and high-performance computing nodes. We use this model to estimate the increase in the logical error rate of magic state injections as a function of the reaction time. Next, we show how the logical microarchitecture can be optimized with respect to the reaction time, and then present detailed full-system quantum and classical resource estimates for executing utility-scale quantum circuits based on realistic hardware noise parameters and state-of-the-art decoding times. For circuits with $10^{6}$--$10^{11}$ $T$ gates involving 200--2000 logical qubits, under a $\Lambda = 9.3$ hardware model representative of a realistic target for superconducting quantum processors operating at a 2.86 MHz stabilization frequency, we show that even decoding at a sub-microsecond speed per stabilization round introduces substantial resource overheads: approximately 100k--250k additional physical qubits for correction qubit storage in the magic state factory; 300k--1.75M extra physical qubits in the core processor due to the code distance increase from $d$ to $d+4$ for extra memory protection; and a runtime longer by roughly a factor of 100.
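As a rough illustration of why reaction time matters, the sketch below counts the extra stabilization rounds a logical qubit idles through while decoding and communication complete, and scales a per-round logical error accordingly. The exponential-suppression formula, the code distance, and the 4 µs reaction time are assumptions for illustration only; the paper builds its own, much more detailed model.

```python
import math

def idle_rounds(reaction_time_s: float, stabilization_hz: float) -> int:
    """Extra stabilization rounds spent waiting on the classical electronics."""
    return math.ceil(reaction_time_s * stabilization_hz)

def logical_error_per_round(lam: float, d: int, p_ref: float = 1e-3) -> float:
    """Toy suppression model: error shrinks by ~lam for every +2 in code distance.

    Assumed functional form for illustration; not the paper's error model.
    """
    return p_ref / (lam ** ((d + 1) / 2))

reaction_time = 4e-6   # seconds: assumed decode + controller/network round trip
stab_freq = 2.86e6     # Hz, the stabilization frequency quoted in the abstract
d = 25                 # assumed code distance

rounds = idle_rounds(reaction_time, stab_freq)
added_error = rounds * logical_error_per_round(9.3, d)  # Lambda = 9.3 from the abstract
print(f"{rounds} idle rounds, added logical error per injection ~ {added_error:.1e}")
```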
Why we think this paper is great for you:
This work investigates techniques to achieve superior concurrent performance and efficient data movement by offloading communication tasks. You will appreciate its focus on optimizing system throughput and responsiveness.
Abstract
Offloading machine learning (ML) communication collectives to direct memory access (DMA) engines has emerged as an interesting and low-cost solution for efficiently overlapping computation and communication in inference and training. Doing so delivers superior concurrent performance by freeing up all GPU cores for computation and also lowers interference in the memory sub-system (caches). While DMA collectives show strong promise, prior works have studied them only in limited contexts (only bandwidth-bound transfer sizes, and only performance). To address this, we provide a comprehensive analysis of the performance, power/energy, and synchronization costs of offloading ML communication collectives (all-gather, all-to-all) to DMA engines on state-of-the-art AMD Instinct MI300X GPUs. Our analysis reveals that, compared to the state-of-the-art RCCL communication collectives library, DMA collectives are on par or better for large sizes (tens of MB to GB) in terms of both performance (16% better) and power (32% better). However, they significantly lag for latency-bound small sizes: 4.5X and 2.5X slower for all-gather and all-to-all, respectively. We provide a detailed latency breakdown of a DMA transfer and identify that DMA command scheduling and synchronization costs can limit DMA collective performance. To tackle this, we harness existing but hitherto untapped DMA architecture innovations to build optimized DMA collectives and demonstrate their efficacy on real hardware. Our optimized implementations considerably close the performance gap for DMA collectives at small sizes (all-gather 30% slower and all-to-all 20% faster, respectively) and further improve performance (by 7%) and power savings (3-10%) at larger sizes. Overall, this work represents a significant step toward making DMA collectives suitable for adoption in mainstream collective libraries.
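To see why small transfers are latency-bound, the back-of-the-envelope model below splits a DMA collective's time into fixed command-scheduling and synchronization terms plus a bandwidth term, in the spirit of the paper's latency breakdown. The constants are illustrative assumptions, not measured MI300X numbers.

```python
def dma_transfer_time_us(msg_bytes: int,
                         cmd_schedule_us: float = 10.0,  # assumed DMA command setup/scheduling
                         sync_us: float = 5.0,           # assumed inter-GPU synchronization
                         bandwidth_gb_s: float = 50.0) -> float:
    """Toy model: fixed overheads plus size / bandwidth.

    Small messages are dominated by the fixed terms (latency-bound regime);
    for tens of MB and up, the bandwidth term dominates and offloading to
    DMA engines pays off by freeing GPU cores for compute.
    """
    transfer_us = msg_bytes / (bandwidth_gb_s * 1e3)  # GB/s -> bytes per microsecond
    return cmd_schedule_us + sync_us + transfer_us

for size in (64 * 1024, 64 * 1024 * 1024):  # 64 KiB vs. 64 MiB
    print(f"{size:>10d} bytes: {dma_transfer_time_us(size):9.1f} us")
```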
Credere Associates LLC
Why we think this paper is great for you:
This paper offers valuable strategies for understanding and mitigating disruptions in complex systems, enhancing their ability to withstand unforeseen challenges. It provides a strong focus on maintaining stability and robustness.
Abstract
The increasing globalization and complexity of supply chains have resulted in recent unpredictable disruptions, ripple effects, and cascading failures. Proposed practices for managing these concerns include the advanced field of forward stress testing, where threats and their predicted impacts on the supply chain are evaluated to harden the system against the most damaging scenarios. Such approaches are limited by the almost endless number of potential threat scenarios and cannot capture residual risk. In contrast to forward stress testing, this paper develops a reverse stress testing (RST) methodology that probabilistically predicts which changes across the supply chain network are most likely to cause a specified level of disruption in a specific country or company. The methodology was applied to the case of copper wire production in the USA, a simple good that may have significant implications for national security. Results show that Canada, Chile, and Mexico are predicted to consistently be sources of disruptions at multiple loss levels. Other countries may contribute to overall losses during small disruptions but be less important if catastrophic losses are of concern to decision makers (e.g., Papua New Guinea), while some countries may be important only when catastrophic disruptions are considered (e.g., Chile). The probabilistic implementation of RST allows for robust and resilient supply chain design that addresses both risk and resilience.
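The reverse stress testing idea can be sketched with a small Monte Carlo: sample random disruptions across supplier countries, keep only the samples whose total loss reaches a target level, and ask which countries were most often disrupted in those samples. The supply shares, disruption probabilities, and loss model below are illustrative assumptions, not the paper's data or methodology details.

```python
import random

# Hypothetical copper-supply shares and per-country disruption probabilities.
supply_share = {"Chile": 0.35, "Canada": 0.25, "Mexico": 0.20,
                "Peru": 0.12, "Papua New Guinea": 0.08}
disrupt_prob = {country: 0.10 for country in supply_share}

def reverse_stress_test(loss_threshold: float, n_samples: int = 100_000,
                        seed: int = 0) -> dict[str, float]:
    """Estimate P(country disrupted | total supply loss >= threshold) by Monte Carlo."""
    rng = random.Random(seed)
    hits = {country: 0 for country in supply_share}
    exceedances = 0
    for _ in range(n_samples):
        disrupted = {c for c in supply_share if rng.random() < disrupt_prob[c]}
        loss = sum(supply_share[c] for c in disrupted)
        if loss >= loss_threshold:
            exceedances += 1
            for c in disrupted:
                hits[c] += 1
    return {c: hits[c] / exceedances for c in supply_share} if exceedances else {}

# Which countries are the likeliest sources of a >=40% supply loss?
for country, p in sorted(reverse_stress_test(0.40).items(), key=lambda kv: -kv[1]):
    print(f"{country:18s} {p:.2f}")
```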
Why we think this paper is great for you:
Exploring how cutting-edge hardware accelerates demanding computational tasks, this paper provides context on achieving high performance in AI. You will find its insights into architectural optimization beneficial for improving system speed.
Abstract
The remarkable progress in Artificial Intelligence (AI) is foundationally linked to a concurrent revolution in computer architecture. As AI models, particularly Deep Neural Networks (DNNs), have grown in complexity, their massive computational demands have pushed traditional architectures to their limits. This paper provides a structured review of this co-evolution, analyzing the architectural landscape designed to accelerate modern AI workloads. We explore the dominant architectural paradigms, Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and Field-Programmable Gate Arrays (FPGAs), by breaking down their design philosophies, key features, and performance trade-offs. The core principles essential for performance and energy efficiency, including dataflow optimization, advanced memory hierarchies, sparsity, and quantization, are analyzed. Furthermore, this paper looks ahead to emerging technologies such as Processing-in-Memory (PIM) and neuromorphic computing, which may redefine future computation. By synthesizing architectural principles with quantitative performance data from industry-standard benchmarks, this survey presents a comprehensive picture of the AI accelerator landscape. We conclude that AI and computer architecture are in a symbiotic relationship, where hardware-software co-design is no longer an optimization but a necessity for future progress in computing.
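As a concrete example of one of the efficiency principles the survey discusses, the sketch below applies symmetric per-tensor int8 quantization to a small weight matrix. The scale choice and rounding scheme are one common convention, not a method prescribed by the paper.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ~ scale * q with q in [-127, 127]."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs reconstruction error:", float(np.abs(w - dequantize(q, scale)).max()))
```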