Long-term Monitoring of Kernel and Hardware Events to Understand Latency Variance

The Ohio State University

Rate paper: 👍 👎 ♥ Save

AI Insights

The system has three main components: offline analysis, online analysis, and output. [3]
VarMRI is a system for long-term monitoring of kernel and hardware events to understand latency variance. [2]

Abstract
This paper presents our experience to understand latency variance caused by kernel and hardware events, which are often invisible at the application level. For this purpose, we have built VarMRI, a tool chain to monitor and analyze those events in the long term. To mitigate the "big data" problem caused by long-term monitoring, VarMRI selectively records a subset of events following two principles: it only records events that are affecting the requests recorded by the application; it records coarse-grained information first and records additional information only when necessary. Furthermore, VarMRI introduces an analysis method that is efficient on large amount of data, robust on different data set and against missing data, and informative to the user. VarMRI has helped us to carry out a 3,000-hour study of six applications and benchmarks on CloudLab. It reveals a wide variety of events causing latency variance, including interrupt preemption, Java GC, pipeline stall, NUMA balancing etc.; simple optimization or tuning can reduce tail latencies by up to 31%. Furthermore, the impacts of some of these events vary significantly across different experiments, which confirms the necessity of long-term monitoring.

Why we are recommending this paper?
Due to your Interest in Low latency

This paper directly addresses latency, a core interest, by examining kernel and hardware events – key factors influencing system performance. Analyzing these events with VarMRI provides a valuable approach to understanding and mitigating latency issues, aligning with your focus on low latency systems.

Statistical Characterization and Prediction of E2E Latency over LEO Satellite Networks

Aalborg University

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

Predictive accuracy was quantified using both regression (Mean Squared error (MSE)) and classification (Area Under the PR-Curve (AUPRC)), confirming the feasibility of simple yet effective forecasting approaches. [3]
LEO: Low Earth Orbit RTT: Round-Trip Time UL: Uplink DL: Downlink MSE: Mean Squared Error AUPRC: Area Under the PR-Curve [3]
The study presents a statistical framework for period-level characterization, prediction, and classification of end-to-end latency in LEO satellite networks. [2]
The framework is applied to Starlink's system, characterizing its deterministic 15-second periodic structure, investigating Uplink (UL), Downlink (DL), and Round-Trip Time (RTT). [1]

Abstract
Low Earth Orbit (LEO) satellite networks are emerging as an essential communication infrastructure, with standardized 5G-based non-terrestrial networks and their integration with terrestrial systems envisioned as a key feature of 6G. However, current LEO systems still exhibit significant latency variations, limiting their suitability for latency-sensitive services. We present a detailed statistical analysis of end-to-end latency based on 500Hz experimental bidirectional one-way measurements and introduce a segmentation of the deterministic 15-second periodic behavior observed in Starlink. We characterize handover-induced boundary regions that produce latency spikes lasting approximately 140 ms at the beginning and 75 ms at the end of each cycle, followed by a stable intra-period regime, enabling accurate short-term prediction. This analysis shows that latency prediction based on long-term statistics leads to pessimistic estimates. In contrast, by exploiting the periodic structure, isolating boundary regions, and applying lightweight parametric and non-parametric models to intra-period latency distributions, we achieve 99th-percentile latency prediction errors below 50 ms. Furthermore, period-level latency prediction and classification enable adaptive transmission strategies by identifying upcoming periods where application latency requirements cannot be satisfied, necessitating the use of alternative systems.

Why we are recommending this paper?
Due to your Interest in Low latency

Given your interest in high throughput, this research investigates latency in LEO satellite networks, a potentially significant area for expanding network capacity. Understanding latency characteristics in this emerging infrastructure is crucial for optimizing system performance.

Transforming Crises into Opportunities: From Chaos to Urban Antifragility

Universitat Politcnica de CatalunyaBarcelonaTech

Rate paper: 👍 👎 ♥ Save

Rate image: 👍 👎

AI Insights

Co-activations reflect adaptive intelligence, with cities pairing principles to reduce vulnerabilities and open pathways for evolution. [3]
Antifragile trajectories depend on interdimensional alignment across learning, innovation, and post-crisis transformation. [2]

Abstract
Urban crises - floods, pandemics, economic shocks, and conflicts - function as accelerators of urban change, exposing structural vulnerabilities while creating windows for reinvention. Building on a prior theoretical contribution that identified fifteen principles of urban antifragility, this paper tests and operationalizes the framework through an empirical assessment of 26 cities selected for their post-crisis adaptation trajectories. Using a tailored diagnostic methodology, we benchmark cities' Stress Response Strategies (SRS) and then evaluate Urban Development Trajectories (UDT) across four weighted dimensions, positioning each case along a fragility-robustness-resilience-antifragility continuum and applying a balanced-threshold rule to confirm antifragile status. Results show that "resilience enhanced by innovation and technology" is the most effective response typology (86.9/100), and that six cities meet the antifragile trajectory criteria. By mapping best practices to activated principles and analysing co-activations, the study identifies a robust "hard core" of principles - Sustainable Resilience (O), Strategic Diversity (F), Proactive Innovation (I), and Active Prevention (N) - supplemented by operational enablers (e.g., anticipation, mobilization, shock absorption). The paper concludes by proposing an evidence-based, SDG-aligned operational model that links high-impact principle pairings to measurable indicators, offering a practical roadmap for cities seeking to convert crises into sustained transformation. Keywords: Post-crisis strategies, Urban antifragility, Sustainable cities and communities, Disaster resilience and urban regeneration, Risk governance and Black Swan adaptation.

Why we are recommending this paper?
Due to your Interest in Resilience

This paper’s focus on urban resilience and identifying vulnerabilities is relevant to building robust and reliable distributed systems. Understanding how systems respond to disruptions is key to achieving resilience, a central interest.

MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm

Southeast University

Rate paper: 👍 👎 ♥ Save

AI Insights

MixServe uses a hybrid TP-EP parallelism strategy, which balances data and expert parallelism to minimize the dominant communication term. [3]
The authors conduct ablation studies to evaluate the impact of different components in MixServe, including the trade-off between DP and EP, and the effect of overlapping communication on performance. [3]
The paper presents MixServe, a distributed serving system for large-scale transformer models that optimizes parallelism and communication. [2]

Abstract
The Mixture of Experts (MoE) models are emerging as the latest paradigm for Large Language Models (LLMs). However, due to memory constraints, MoE models with billions or even trillions of parameters can only be deployed in multi-GPU or even multi-node & multi-GPU based serving systems. Thus, communication has became a major bottleneck in distributed serving systems, especially inter-node communication. Contemporary distributed MoE models are primarily implemented using all-reduce (AR) based tensor parallelism (TP) and all-to-all (A2A) based expert parallelism (EP). However, TP generally exhibits low inter-node efficiency and is thus confined to high-speed intra-node bandwidth. In contrast, EP tends to suffer from load imbalance, especially when the parallel degree is high. In this work, we introduce MixServe, a novel automatic distributed serving system for efficient deployment of MoE models by a novel TP-EP hybrid parallelism based on fused AR-A2A communication algorithm. MixServe begins by evaluating the communication overhead associated with various parallel strategies, taking into account the model hyperparameters and the configurations of network and hardware resources, and then automatically selects the most efficient parallel strategy. Then, we propose the TP-EP hybrid parallelism based on fused AR-A2A communication algorithm that overlaps intra-node AR communication and inter-node A2A communication. Extensive experiments on DeepSeek-R1 and Qwen3 models demonstrate that MixServe achieves superior inference performance, with 1.08~3.80x acceleration in time to first token (TTFT), 1.03~1.66x acceleration in inter-token latency (ITL), and 5.2%~50.3% throughput improvement compared to existing approaches.

Why we are recommending this paper?
Due to your Interest in Distributed Systems

This work explores distributed serving systems, particularly for MoE models, which are increasingly important for high-throughput applications. The focus on communication algorithms directly relates to optimizing system performance and reducing latency.

Fundamental Limits of Multi-User Distributed Computing of Linearly Separable Functions

EURECOM

Rate paper: 👍 👎 ♥ Save

AI Insights

Distributed computing: a method of dividing tasks among multiple processors or computers to improve efficiency and speed up computations. [3]
Linearly separable functions: functions that can be computed using linear combinations of inputs, which are useful in machine learning and signal processing applications. [3]
The proposed method uses a combination of coding theory and graph theory to optimize the computation and communication process. [3]
The paper assumes that the inputs are uniformly distributed, which may not be the case in real-world applications. [3]
Each computer can only do a small part of the task, but together they can finish it faster than one computer alone. [3]
The problem is about distributed computing and the tradeoff between computation and communication costs. [2]

Abstract
This work establishes the fundamental limits of the classical problem of multi-user distributed computing of linearly separable functions. In particular, we consider a distributed computing setting involving $L$ users, each requesting a linearly separable function over $K$ basis subfunctions from a master node, who is assisted by $N$ distributed servers. At the core of this problem lies a fundamental tradeoff between communication and computation: each server can compute up to $M$ subfunctions, and each server can communicate linear combinations of their locally computed subfunctions outputs to at most $Δ$ users. The objective is to design a distributed computing scheme that reduces the communication cost (total amount of data from servers to users), and towards this, for any given $K$, $L$, $M$, and $Δ$, we propose a distributed computing scheme that jointly designs the task assignment and transmissions, and shows that the scheme achieves optimal performance in the real field under various conditions using a novel converse. We also characterize the performance of the scheme in the finite field using another converse based on counting arguments.

Why we are recommending this paper?
Due to your Interest in Distributed Systems

This paper delves into the theoretical limits of distributed computing, a foundational area for designing efficient and scalable systems. Understanding these limits is essential for optimizing performance and achieving high throughput in distributed environments.

Interests not found

Help us improve your experience!