Papers from 06 to 10 October, 2025

Here are the personalized paper recommendations, sorted by relevance.
High throughput
Abstract
Research in compute resource management for cloud-native applications is dominated by the problem of setting optimal CPU limits -- a fundamental OS mechanism that strictly restricts a container's CPU usage to its specified CPU-limits. Rightsizing and autoscaling works have innovated on allocation/scaling policies assuming the ubiquity and necessity of CPU-limits. We question this. Practical experiences of cloud users indicate that CPU-limits harms application performance and costs more than it helps. These observations are in contradiction to the conventional wisdom presented in both academic research and industry best practices. We argue that this indiscriminate adoption of CPU-limits is driven by erroneous beliefs that CPU-limits is essential for operational and safety purposes. We provide empirical evidence making a case for eschewing CPU-limits completely from latency-sensitive applications. This prompts a fundamental rethinking of auto-scaling and billing paradigms and opens new research avenues. Finally, we highlight specific scenarios where CPU-limits can be beneficial if used in a well-reasoned way (e.g. background jobs).
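On Linux, the CPU-limits mechanism the abstract refers to is the CFS bandwidth controller: a container may consume `quota` microseconds of CPU time per `period`, and is throttled until the next period once the quota is spent. A minimal sketch (the request sizes and limits below are made-up numbers, not figures from the paper) of how throttling stretches a short latency-sensitive burst:

```python
def completion_time_ms(burst_ms: float, quota_ms: float, period_ms: float) -> float:
    """Wall-clock time for a single-threaded burst of CPU work to finish
    under a CFS bandwidth limit of quota_ms runnable time per period_ms."""
    if burst_ms <= quota_ms:
        return burst_ms  # fits inside the first period, never throttled
    full_periods = int(burst_ms // quota_ms)
    remainder = burst_ms - full_periods * quota_ms
    if remainder == 0:
        full_periods -= 1
        remainder = quota_ms
    # each fully consumed quota costs a whole period of wall time
    return full_periods * period_ms + remainder

# a 30 ms request under a "0.5 CPU" limit (50 ms quota per 100 ms period)
print(completion_time_ms(30, 50, 100))   # 30.0 -- unthrottled
# the same request under a "0.1 CPU" limit (10 ms quota per 100 ms period)
print(completion_time_ms(30, 10, 100))   # 210.0 -- a 7x latency blow-up
```

This back-of-the-envelope model is one way to see why a mis-sized limit can hurt tail latency far more than it saves CPU.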
Low latency
Columbia University, a16z
Abstract
In the context of Byzantine consensus problems such as Byzantine broadcast (BB) and Byzantine agreement (BA), the good-case setting aims to study the minimal possible latency of a BB or BA protocol under certain favorable conditions, namely the designated leader being correct (for BB), or all parties having the same input value (for BA). We provide a full characterization of the feasibility and impossibility of good-case latency, for both BA and BB, in the synchronous sleepy model. Surprisingly to us, we find irrational resilience thresholds emerging: 2-round good-case BB is possible if and only if at all times, at least $\frac{1}{\varphi} \approx 0.618$ fraction of the active parties are correct, where $\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618$ is the golden ratio; 1-round good-case BA is possible if and only if at least $\frac{1}{\sqrt{2}} \approx 0.707$ fraction of the active parties are correct.
AI Insights
  • In the synchronous sleepy model, parties may sleep yet still progress when a correct leader is present.
  • A 2‑round Byzantine broadcast works only if ≥ 61.8% of active nodes are honest, a threshold tied to the golden ratio.
  • A single‑round Byzantine agreement needs ≥ 70.7% honest, linked to √2, revealing an irrational boundary.
  • No ρ‑secure BA protocol can finish in one round when ρ>1/3, even with static participation.
  • SyncBA achieves one‑round good‑case latency but only under the assumption that all participants stay online.
  • The paper distinguishes Byzantine broadcast from agreement, noting BB tolerates a dishonest leader while BA does not.
  • By mapping active‑party thresholds, the work paves the way for lightweight consensus in resource‑constrained networks.
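The two thresholds stated in the abstract can be turned into a direct feasibility check. This is only a restatement of the abstract's bounds, not a consensus protocol:

```python
import math

PHI = (1 + math.sqrt(5)) / 2            # golden ratio, ~1.618

def bb_2round_feasible(correct: int, active: int) -> bool:
    """2-round good-case Byzantine broadcast: correct fraction >= 1/phi (~0.618)."""
    return correct / active >= 1 / PHI

def ba_1round_feasible(correct: int, active: int) -> bool:
    """1-round good-case Byzantine agreement: correct fraction >= 1/sqrt(2) (~0.707)."""
    return correct / active >= 1 / math.sqrt(2)

# 62 correct out of 100 active parties clears the BB bound but not the BA bound
print(bb_2round_feasible(62, 100))   # True
print(ba_1round_feasible(62, 100))   # False
print(ba_1round_feasible(71, 100))   # True
```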
Georgia State University
Abstract
Large Language Models (LLMs) are increasingly integrated into everyday applications, but their prevalent cloud-based deployment raises growing concerns around data privacy and long-term sustainability. Running LLMs locally on mobile and edge devices (on-device LLMs) offers the promise of enhanced privacy, reliability, and reduced communication costs. However, realizing this vision remains challenging due to substantial memory and compute demands, as well as limited visibility into performance-efficiency trade-offs on resource-constrained hardware. We propose lm-Meter, the first lightweight, online latency profiler tailored for on-device LLM inference. lm-Meter captures fine-grained, real-time latency at both phase (e.g., embedding, prefill, decode, softmax, sampling) and kernel levels without auxiliary devices. We implement lm-Meter on commercial mobile platforms and demonstrate its high profiling accuracy with minimal system overhead, e.g., only 2.58% throughput reduction in prefill and 0.99% in decode under the most constrained Powersave governor. Leveraging lm-Meter, we conduct comprehensive empirical studies revealing phase- and kernel-level bottlenecks in on-device LLM inference, quantifying accuracy-efficiency trade-offs, and identifying systematic optimization opportunities. lm-Meter provides unprecedented visibility into the runtime behavior of LLMs on constrained platforms, laying the foundation for informed optimization and accelerating the democratization of on-device LLM systems. Code and tutorials are available at https://github.com/amai-gsu/LM-Meter.
AI Insights
  • lm‑Meter shows softmax consumes >40 % of decode time on Snapdragon 8 Gen 1, a hidden bottleneck.
  • 8‑bit quantization cuts memory by 50 % and boosts decode throughput by 3 % on ARM Cortex‑A78 cores.
  • Paged attention reduces peak RAM by 30 % on a 4‑GB device, enabling 1.5× larger context windows.
  • Splitting attention heads across 8 GPU cores gives a 2.2× prefill speedup, proving on‑device parallelism works.
  • Kernel profiling reveals sampling is 1.8× slower under low‑power governors, hinting at dynamic governor tuning.
  • A hybrid 4‑bit/8‑bit quantization keeps BLEU within 1 % while halving inference energy.
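The phase-level measurement idea can be illustrated with a toy timer. This is not the lm-Meter API (see the linked repository for that); it is just a sketch of phase-level latency accounting, with dummy loops standing in for real inference work:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class PhaseTimer:
    """Toy phase-level latency recorder in the spirit of lm-Meter."""
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # mean latency per phase, in seconds
        return {n: self.totals[n] / self.counts[n] for n in self.totals}

timer = PhaseTimer()
with timer.phase("prefill"):
    sum(i * i for i in range(100_000))      # stand-in for prompt processing
for _ in range(5):
    with timer.phase("decode"):
        sum(i * i for i in range(10_000))   # stand-in for one decode step
print(timer.report())
```

A real on-device profiler additionally needs kernel-level hooks and low overhead under constrained governors, which is exactly what lm-Meter contributes.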
Resilience
University of Texas at San
Abstract
Why do human populations remain vulnerable to collapse, even when they are large? Classical demographic theory predicts that volatility in growth should decline rapidly with size due to the averaging effects of the law of large numbers. As such, while small-scale societies may be demographically fragile, large-scale societies should be much more stable. Using a large census dataset of 228 indigenous societies from Brazil, we show that this prediction does not hold. Instead of volatility declining as the square root of population size, it falls much more slowly. This means that individuals within communities do not behave as independent demographic units as their lives are correlated through cooperation, shared subsistence practices, overlapping land use, and exposure to common shocks such as disease outbreaks or failed harvests. These correlations build demographic synchrony, drastically reducing the effective demographic degrees of freedom in a population, keeping volatility higher than expected at all scales. As a result, large-scale populations fluctuate as if they were much smaller, increasing their vulnerability to collapse. This helps explain why human societies of all sizes seem vulnerable to collapse, and why the archaeological and historical record is filled with examples of large, complex societies collapsing despite their size. We suggest demographic synchrony provides a general mechanism for understanding why human populations remain vulnerable across all scales: Scale still stabilizes synchronous populations via density increases, but synchrony ensures that stability grows only slowly with size, leaving large populations more volatile, and more vulnerable, than classical demographic theory predicts.
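The mechanism can be made concrete with the standard variance formula for correlated units: with pairwise correlation ρ, a population of N individuals fluctuates like N_eff = N / (1 + (N-1)ρ) independent ones. A sketch (ρ = 0.01 is an illustrative value, not an estimate from the paper):

```python
def n_eff(n: int, rho: float) -> float:
    """Effective number of independent demographic units when each pair
    of individuals' vital rates is correlated with coefficient rho."""
    return n / (1 + (n - 1) * rho)

def volatility_ratio(n: int, rho: float) -> float:
    """Std. dev. of per-capita growth relative to a single individual."""
    return (1 / n_eff(n, rho)) ** 0.5

# independent units: volatility falls as 1/sqrt(N), per classical theory
print(round(volatility_ratio(10_000, 0.0), 4))   # 0.01
# mild synchrony: a population of 10,000 behaves like ~99 independent units
print(round(n_eff(10_000, 0.01)))                # 99
```

Because N_eff plateaus near 1/ρ as N grows, even mild synchrony puts a hard ceiling on the stabilizing effect of scale, which is the paper's central point.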
Distributed Systems
Abstract
In this paper we present a new framework for integrated, distributed, and reliable systems. The proposed framework has three parts that increase its satisfaction and performance. First, we analyze previous frameworks for integrated systems; we then present the new proposed framework, which improves on the previous ones, and discuss its different phases. Finally, we compare simulation results for the new framework with those of previous frameworks. In the FIDRS framework, a heterogeneous distributed database technique is used to improve performance and response speed for users, which simultaneously improves the framework's dependability and reliability. In the extraction phase of the new framework we use the RMSD algorithm, which decreases response time on large databases. Using the FIDRS framework, we succeed in increasing the efficiency, performance, and reliability of integrated systems and in removing some of the problems of previous frameworks.
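The abstract does not specify FIDRS's interfaces, but the claimed benefit of a heterogeneous distributed database, namely faster responses by querying several stores in parallel, can be sketched as a fan-out. The backend names and return shapes below are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# hypothetical backends standing in for heterogeneous stores
def query_sql(q):    return {"backend": "sql",   "rows": [q.upper()]}
def query_nosql(q):  return {"backend": "nosql", "rows": [q.lower()]}

def federated_query(q, backends):
    """Fan the query out to every backend in parallel and merge results,
    so response time is bounded by the slowest store, not their sum."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        results = list(pool.map(lambda b: b(q), backends))
    merged = []
    for r in results:
        merged.extend(r["rows"])
    return merged

print(federated_query("Foo", [query_sql, query_nosql]))   # ['FOO', 'foo']
```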
Ryukoku University, Otsu
Abstract
Self-stabilization is a versatile methodology in the design of fault-tolerant distributed algorithms for transient faults. A self-stabilizing system automatically recovers from any kind and any finite number of transient faults. This property is specifically useful in modern distributed systems with a large number of components. In this paper, we propose a new communication and execution model named the R(1)W(1) model in which each process can read and write its own and neighbors' local variables in a single step. We propose self-stabilizing distributed algorithms in the R(1)W(1) model for the problems of maximal matching, minimal k-dominating set and maximal k-dependent set. Finally, we propose an example transformer, based on randomized distance-two local mutual exclusion, to simulate algorithms designed for the R(1)W(1) model in the synchronous message passing model with synchronized clocks.
AI Insights
  • TrR1W1 achieves an expected O(1) time overhead while keeping message complexity linear in n.
  • The transformer’s design is agnostic to the underlying R(1)W(1) model, paving the way for extensions to R(d_r)W(d_w).
  • Future work includes crafting efficient transformers for higher‑degree read/write models.
  • The approach assumes reliable communication during convergence, a subtle but critical requirement.
  • Target algorithms are silent, ensuring no extraneous state persists after stabilization.
  • For deeper insight, see “Introduction to Distributed Self‑stabilizing Algorithms” (2019) and “Self‑stabilization” (2000).
  • Complementary studies: “From state to link‑register model: A transformer for self‑stabilizing distributed algorithms” (2023) and “A new self‑stabilizing maximal matching algorithm” (2009).
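As a concrete illustration of the kind of target algorithm, here is a simplified simulation of pointer-based self-stabilizing maximal matching in the style of Hsu and Huang, run under a central daemon that always activates the lowest-id enabled node. The paper's own R(1)W(1) algorithms are not reproduced here; this only shows the marriage/proposal/withdrawal pattern:

```python
def stabilize_matching(adj):
    """Pointer-based self-stabilizing maximal matching (Hsu-Huang style),
    simulated under a central daemon. p[v] is v's pointer variable;
    v and u are matched once p[v] == u and p[u] == v."""
    p = {v: None for v in adj}                 # any initial state works
    def move(v):
        if p[v] is None:
            suitors = [u for u in adj[v] if p[u] == v]
            if suitors:                        # marriage: accept a suitor
                return ("set", min(suitors))
            free = [u for u in adj[v] if p[u] is None]
            if free:                           # proposal: court a free node
                return ("set", min(free))
            return None
        if p[p[v]] not in (v, None):           # withdrawal: our choice is taken
            return ("set", None)
        return None
    for _ in range(10 * len(adj) ** 2):        # safety bound on daemon steps
        for v in sorted(adj):
            m = move(v)
            if m:
                p[v] = m[1]
                break
        else:
            break                              # no node enabled: stabilized
    return {(min(v, p[v]), max(v, p[v]))
            for v in adj if p[v] is not None and p[p[v]] == v}

# path 0-1-2-3 stabilizes to the maximal matching {(0, 1), (2, 3)}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(stabilize_matching(adj))
```

In the R(1)W(1) model each such move, which reads and writes a neighbor's variable, is a single atomic step; the paper's transformer is what makes that power realizable over message passing.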