Sandwich Reasoning: An Answer-Reasoning-Answer Approach for Low-Latency Query Correction

Renmin University of China


Abstract
Query correction is a critical entry point in modern search pipelines, demanding high accuracy strictly within real-time latency constraints. Chain-of-Thought (CoT) reasoning improves accuracy but incurs prohibitive latency for real-time query correction. A potential solution is to output an answer before reasoning to reduce latency; however, under autoregressive decoding, the early answer is independent of subsequent reasoning, preventing the model from leveraging its reasoning capability to improve accuracy. To address this issue, we propose Sandwich Reasoning (SandwichR), a novel approach that explicitly aligns a fast initial answer with post-hoc reasoning, enabling low-latency query correction without sacrificing reasoning-aware accuracy. SandwichR follows an Answer-Reasoning-Answer paradigm, producing an initial correction, an explicit reasoning process, and a final refined correction. To align the initial answer with post-reasoning insights, we design a consistency-aware reinforcement learning (RL) strategy: a dedicated consistency reward enforces alignment between the initial and final corrections, while margin-based rejection sampling prioritizes borderline samples where reasoning drives the most impactful corrective gains. Additionally, we construct a high-quality query correction dataset, addressing the lack of specialized benchmarks for complex query correction. Experimental results demonstrate that SandwichR achieves SOTA accuracy comparable to standard CoT while delivering a 40-70% latency reduction, resolving the latency-accuracy trade-off in online search.
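The two training ingredients named in the abstract, a consistency reward and margin-based rejection sampling, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's formulation: the `Rollout` type, the exact-match accuracy term, the `consistency_weight` of 0.5, and the reading of "margin" as the gap between final-answer and initial-answer accuracy across sampled rollouts.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One Answer-Reasoning-Answer sample (hypothetical container)."""
    initial: str  # fast correction emitted before reasoning
    final: str    # refined correction emitted after reasoning


def reward(rollout: Rollout, reference: str, consistency_weight: float = 0.5) -> float:
    """Accuracy of the final answer plus a bonus when initial and final agree.

    The additive form and the weight are assumptions, not the paper's values.
    """
    accuracy = float(rollout.final == reference)
    consistency = float(rollout.initial == rollout.final)
    return accuracy + consistency_weight * consistency


def reasoning_margin(rollouts: list[Rollout], reference: str) -> float:
    """Gap between final-answer and initial-answer accuracy over the rollouts.

    A large positive value marks a query where reasoning changes the outcome.
    """
    n = len(rollouts)
    final_acc = sum(r.final == reference for r in rollouts) / n
    initial_acc = sum(r.initial == reference for r in rollouts) / n
    return final_acc - initial_acc


def select_queries(batch: dict[str, tuple[list[Rollout], str]],
                   min_margin: float = 0.25) -> list[str]:
    """Keep only queries whose margin reaches the (assumed) threshold."""
    return [query for query, (rollouts, reference) in batch.items()
            if reasoning_margin(rollouts, reference) >= min_margin]


rollouts = [Rollout("iphnoe case", "iphone case"), Rollout("iphone case", "iphone case")]
batch = {"iphnoe csae": (rollouts, "iphone case")}
selected = select_queries(batch)
```

In this toy batch the first rollout gets the final answer right only after reasoning, so the query has a positive margin and is retained; the second rollout, where both answers agree and are correct, earns the consistency bonus.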

Why we are recommending this paper?
Due to your Interest in Low latency

This paper directly addresses the need for low-latency query correction, a core interest for the user. The proposed 'Sandwich Reasoning' approach tackles the latency challenges inherent in traditional reasoning methods, aligning perfectly with the user's focus on minimizing response times.

ResMAS: Resilience Optimization in LLM-based Multi-agent Systems

Tsinghua University


Abstract
Large Language Model-based Multi-Agent Systems (LLM-based MAS), where multiple LLM agents collaborate to solve complex tasks, have shown impressive performance in many areas. However, MAS are typically distributed across different devices or environments, making them vulnerable to perturbations such as agent failures. While existing works have studied adversarial attacks and corresponding defense strategies, they mainly focus on reactively detecting and mitigating attacks after they occur rather than proactively designing inherently resilient systems. In this work, we study the resilience of LLM-based MAS under perturbations and find that both the communication topology and prompt design significantly influence system resilience. Motivated by these findings, we propose ResMAS: a two-stage framework for enhancing MAS resilience. First, we train a reward model to predict the MAS's resilience, based on which we train a topology generator to automatically design resilient topologies for specific tasks through reinforcement learning. Second, we introduce a topology-aware prompt optimization method that refines each agent's prompt based on its connections and interactions with other agents. Extensive experiments across a range of tasks show that our approach substantially improves MAS resilience under various constraints. Moreover, our framework demonstrates strong generalization ability to new tasks and models, highlighting its potential for building resilient MAS.

Why we are recommending this paper?
Due to your Interest in Resilience

Given the user's interest in resilience, this paper's exploration of LLM-based multi-agent systems offers a relevant perspective on system robustness. The focus on perturbations within distributed systems is a key area of concern.

Majorum: Ebb-and-Flow Consensus with Dynamic Quorums

Ethereum Foundation


AI Insights

The paper also presents a new joining protocol for ebb-and-flow consensus protocols, which allows validators to join or leave the network without disrupting the consensus process. [2]
The paper presents a new ebb-and-flow consensus protocol, called Majorum, whose dynamically available component builds on a quorum-based protocol (TOB-SVD). [1]

Abstract
Dynamic availability is the ability of a consensus protocol to remain live despite honest participants going offline and later rejoining. A well-known limitation is that dynamically available protocols, on their own, cannot provide strong safety guarantees during network partitions or extended asynchrony. Ebb-and-flow protocols [SP21] address this by combining a dynamically available protocol with a partially synchronous finality protocol that irrevocably finalizes a prefix. We present Majorum, an ebb-and-flow construction whose dynamically available component builds on a quorum-based protocol (TOB-SVD). Under optimistic conditions, Majorum finalizes blocks in as few as three slots while requiring only a single voting phase per slot. In particular, when conditions remain favourable, each slot finalizes the next block extending the previously finalized one.

Why we are recommending this paper?
Due to your Interest in Distributed Systems

This paper’s investigation into dynamically available consensus protocols directly relates to the user's interest in resilience and distributed systems. The concept of handling participant availability aligns with the need for robust system operation.

Developing a Quantitative Resiliency Approach

Colorado State University


AI Insights

Reliability and resiliency are intertwined system characteristics that can feed into and/or off of each other. [2]

Abstract
Resiliency has garnered attention in the management of critical infrastructure as a metric of system performance, but there are significant roadblocks to its implementation in a realistic decision-making framework. In contrast to risk and reliability, which have robust quantification approaches and undergird many regulatory approaches to system safety (e.g., "risk-informed decision-making"), resiliency is a diffuse, qualitatively-understood characteristic, often treated differently or distinctly. However, in the emerging context of highly-complex, highly-interdependent critical systems, the idea of reliability (as the probability of non-failure) may not be an appropriate metric of system health. As a result, focus is shifting towards resiliency-centered approaches that value the response to failure as much as the avoidance of failure. Supporting this approach requires a robustly-defined, quantitative understanding of resiliency. In this paper, we explore the foundations of reliability and resiliency engineering, and propose an approach to resiliency-informed decision-making bolstered by a quantitative understanding of resiliency.

Why we are recommending this paper?
Due to your Interest in Resilience

This paper's focus on quantifying resiliency provides a foundational approach to the user's interests. The work aims to establish a robust framework for measuring system performance, a critical element for building resilient systems.

Distributed Online Convex Optimization with Efficient Communication: Improved Algorithm and Lower bounds

Nanjing University


Abstract
We investigate distributed online convex optimization with compressed communication, where $n$ learners connected by a network collaboratively minimize a sequence of global loss functions using only local information and compressed data from neighbors. Prior work has established regret bounds of $O(\max\{ω^{-2}ρ^{-4}n^{1/2},ω^{-4}ρ^{-8}\}n\sqrt{T})$ and $O(\max\{ω^{-2}ρ^{-4}n^{1/2},ω^{-4}ρ^{-8}\}n\ln{T})$ for convex and strongly convex functions, respectively, where $ω\in(0,1]$ is the compression quality factor ($ω=1$ means no compression) and $ρ<1$ is the spectral gap of the communication matrix. However, these regret bounds suffer from a \emph{quadratic} or even \emph{quartic} dependence on $ω^{-1}$. Moreover, the \emph{super-linear} dependence on $n$ is also undesirable. To overcome these limitations, we propose a novel algorithm that achieves improved regret bounds of $\tilde{O}(ω^{-1/2}ρ^{-1}n\sqrt{T})$ and $\tilde{O}(ω^{-1}ρ^{-2}n\ln{T})$ for convex and strongly convex functions, respectively. The primary idea is to design a \emph{two-level blocking update framework} incorporating two novel ingredients: an online gossip strategy and an error compensation scheme, which collaborate to \emph{achieve a better consensus} among learners. Furthermore, we establish the first lower bounds for this problem, justifying the optimality of our results with respect to both $ω$ and $T$. Additionally, we consider the bandit feedback scenario, and extend our method with the classic gradient estimators to enhance existing regret bounds.
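To get a feel for the gap between the prior and improved bounds in the convex case, the short sketch below evaluates only the leading factors that multiply $n\sqrt{T}$ in each bound. It is a rough illustration: the constants hidden by $O(\cdot)$ and the logarithmic factors hidden by $\tilde{O}(\cdot)$ are ignored, and the function names and the sample values of $ω$, $ρ$, and $n$ are chosen here, not taken from the paper.

```python
import math


def prior_factor(omega: float, rho: float, n: int) -> float:
    """Leading factor of the prior convex bound: max{w^-2 r^-4 n^(1/2), w^-4 r^-8}."""
    return max(omega ** -2 * rho ** -4 * math.sqrt(n),
               omega ** -4 * rho ** -8)


def improved_factor(omega: float, rho: float) -> float:
    """Leading factor of the improved convex bound: w^(-1/2) r^-1 (log factors dropped)."""
    return omega ** -0.5 * rho ** -1


# Hypothetical setting: fairly aggressive compression, moderate spectral gap, 16 learners.
omega, rho, n = 0.25, 0.5, 16

prior = prior_factor(omega, rho, n)        # max(16 * 16 * 4, 256 * 256) = 65536
improved = improved_factor(omega, rho)     # 2 * 2 = 4

print(f"prior factor:    {prior:.0f}")
print(f"improved factor: {improved:.0f}")
print(f"ratio:           {prior / improved:.0f}")
```

For these sample values the quartic term $ω^{-4}ρ^{-8}$ dominates the prior bound, which is the dependence the abstract singles out as undesirable.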

Why we are recommending this paper?
Due to your Interest in Distributed Systems

This paper addresses the core challenge of efficient communication in distributed systems, directly relevant to the user's interest in high throughput. The algorithm's focus on minimizing communication aligns with optimizing system performance.
