Papers from 13 to 17 October 2025

Here are the personalized paper recommendations, sorted by relevance.
High throughput
Thomas Jefferson National Accelerator Facility
Abstract
Thomas Jefferson National Accelerator Facility (JLab) has partnered with Energy Sciences Network (ESnet) to define and implement an edge-to-compute-cluster load-balancing acceleration architecture. The ESnet-JLab FPGA Accelerated Transport (EJFAT) architecture focuses on FPGA acceleration to address compression, fragmentation, UDP packet destination redirection (Network Address Translation, NAT), decompression, and reassembly. EJFAT seamlessly integrates edge and cluster computing to support direct processing of streamed experimental data. This will directly benefit the JLab science program as well as data centers of the future that require high throughput and low latency for both time-critical data acquisition systems and data center workflows. We present the EJFAT project, how it is synergistic with other DOE activities such as the Integrated Research Infrastructure (IRI), and recent results using data sources at JLab, an EJFAT load balancer (LB) at ESnet, and computational cluster resources at Lawrence Berkeley National Laboratory (LBNL).
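To make the NAT-style redirection concrete, here is a minimal software sketch of a UDP load balancer that rewrites each datagram's destination; the addresses, port, and round-robin policy are illustrative assumptions. EJFAT performs this per-packet in FPGA hardware, so this models the data path, not the implementation.

```python
# Minimal sketch of NAT-style UDP destination redirection, modeled in software.
# Addresses, port, and the round-robin policy are hypothetical.
import socket

LISTEN_ADDR = ("0.0.0.0", 19522)           # hypothetical ingress endpoint
COMPUTE_NODES = [("10.0.0.11", 19522),     # hypothetical cluster endpoints
                 ("10.0.0.12", 19522)]

def run_load_balancer() -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(LISTEN_ADDR)
    i = 0
    while True:
        payload, _src = sock.recvfrom(65535)
        # Rewrite the destination (destination NAT) and forward the datagram.
        sock.sendto(payload, COMPUTE_NODES[i % len(COMPUTE_NODES)])
        i += 1

if __name__ == "__main__":
    run_load_balancer()
```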
ByteDance
Abstract
Operating at petabit-scale, ByteDance's cloud gateways are deployed at critical aggregation points to orchestrate a wide array of business traffic. However, this massive scale imposes significant resource pressure on our previous-generation cloud gateways, rendering them unsustainable in the face of ever-growing cloud-network traffic. As the DPU market rapidly expands, we see a promising path to meet our escalating business traffic demands by integrating DPUs with our established Tofino-based gateways. DPUs augment these gateways with substantially larger table capacities and richer programmability without compromising their established low-latency, high-throughput forwarding. Despite compelling advantages, the practical integration of DPUs into cloud gateways remains unexplored, primarily due to underlying challenges. In this paper, we present Zephyrus, a production-scale gateway built upon a unified P4 pipeline spanning high-performance Tofino and feature-rich DPUs, which successfully overcomes these challenges. We further introduce a hierarchical co-offloading architecture (HLCO) to orchestrate traffic flow within this heterogeneous gateway, achieving > 99% hardware offloading while retaining software fallback paths for complex operations. Zephyrus outperforms LuoShen (NSDI '24) with 33% higher throughput, and our evaluation further indicates 21% lower power consumption and 14% lower hardware cost. Against the FPGA-based Albatross (SIGCOMM '25), it doubles the throughput at a substantially lower Total Cost of Ownership (TCO), showcasing its superior performance-per-dollar. Beyond these performance gains, we also share key lessons from several years of developing and operating Zephyrus at production scale. We believe these insights provide valuable references for researchers and practitioners designing performant cloud gateways.
AI Insights
  • Zephyrus’s HLCO orchestrates traffic across Tofino ASICs and DPUs, achieving >99 % hardware offloading while keeping software fallbacks.
  • P4 programming is becoming the lingua franca for packet processors, enabling vendor‑agnostic pipelines on ASICs and DPUs.
  • Disaggregating stateful functions—moving state off the ASIC—offers flexibility but adds management overhead, a trade‑off noted in Zephyrus’s lessons.
  • DPUs give larger table capacities and richer programmability than ASICs, tackling the scalability bottleneck at petabit‑scale gateways.
  • For deeper insight, read "Network Function Virtualization: Architecture, Implementation, and Service Management" and the P4 paper "P4: Programming Protocol-Independent Packet Processors".
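As a rough illustration of the hierarchical co-offloading idea described above, the sketch below models a two-tier table lookup with a software fallback in plain Python; the flow keys, table tiers, promotion policy, and actions are assumptions for illustration, not Zephyrus's P4 pipeline.

```python
# Toy model of hierarchical co-offloading (HLCO) dispatch: try a small "Tofino"
# exact-match table, then a larger "DPU" table, and punt to a slow software
# path on a double miss. All names and policies here are illustrative.
from typing import Callable, Dict, Tuple

FlowKey = Tuple[str, str, int, int]  # (src_ip, dst_ip, src_port, dst_port)

class HlcoPipeline:
    def __init__(self, software_path: Callable[[FlowKey], str]):
        self.tofino_table: Dict[FlowKey, str] = {}  # small, fastest tier
        self.dpu_table: Dict[FlowKey, str] = {}     # larger, programmable tier
        self.software_path = software_path          # fallback for complex ops

    def forward(self, key: FlowKey) -> str:
        if key in self.tofino_table:                # ASIC hit: line-rate path
            return self.tofino_table[key]
        if key in self.dpu_table:                   # DPU hit: still hardware
            # Promote hot flows so future packets hit the ASIC tier.
            self.tofino_table[key] = self.dpu_table[key]
            return self.dpu_table[key]
        action = self.software_path(key)            # slow path computes action
        self.dpu_table[key] = action                # install for later packets
        return action

pipe = HlcoPipeline(software_path=lambda k: f"route-via-cpu:{k[1]}")
print(pipe.forward(("10.0.0.1", "10.0.0.2", 1234, 80)))  # miss -> software
print(pipe.forward(("10.0.0.1", "10.0.0.2", 1234, 80)))  # hit  -> DPU table
```

The promotion step mirrors the paper's goal of keeping >99% of traffic in hardware: once the slow path resolves a flow, every later packet stays on a hardware tier.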
Low latency
Abstract
Multi-stage reasoning has emerged as an effective strategy for enhancing the reasoning capability of small language models by decomposing complex problems into sequential sub-stages. However, this comes at the cost of increased latency. We observe that existing adaptive acceleration techniques, such as layer skipping, struggle to balance efficiency and accuracy in this setting due to two key challenges: (1) stage-wise variation in skip sensitivity, and (2) the generation of redundant output tokens. To address these, we propose LiteStage, a latency-aware layer skipping framework for multi-stage reasoning. LiteStage combines a stage-wise offline search that allocates optimal layer budgets with an online confidence-based generation early exit to suppress unnecessary decoding. Experiments on three benchmarks (OBQA, CSQA, and StrategyQA) show that LiteStage achieves up to 1.70x speedup with less than 4.0% accuracy loss, outperforming prior training-free layer skipping methods.
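The confidence-based early exit half of LiteStage can be sketched as a decode loop that stops once token confidence crosses a threshold; the `step` interface, threshold, and stop rule below are illustrative assumptions rather than the paper's exact criterion.

```python
# Sketch of confidence-based generation early exit, assuming a hypothetical
# `step` interface that returns the next token and its probability.
from typing import Callable, List, Tuple

def decode_with_early_exit(
    step: Callable[[List[int]], Tuple[int, float]],  # prefix -> (token, prob)
    max_tokens: int = 64,
    conf_threshold: float = 0.9,
    eos_token: int = 0,
) -> List[int]:
    tokens: List[int] = []
    for _ in range(max_tokens):
        next_token, prob = step(tokens)
        tokens.append(next_token)
        if next_token == eos_token:
            break
        # Exit early once the stage's answer is confident: further tokens are
        # the redundant output the paper identifies as a latency source.
        if prob >= conf_threshold:
            break
    return tokens

# Toy usage: a fake model whose confidence rises with each emitted token.
fake_step = lambda prefix: (7, 0.5 + 0.1 * len(prefix))
print(decode_with_early_exit(fake_step))  # stops after a few tokens
```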
Resilience
São Paulo State University
Abstract
Binary systems host complex orbital dynamics where test particles can occupy stable regions despite strong gravitational perturbations. The sailboat region, discovered in the Pluto-Charon system, allows highly eccentric S-type orbits at intermediate distances between the two massive bodies. This region challenges traditional stability concepts by supporting eccentricities up to 0.9 in a zone typically dominated by chaotic motion. We investigate the sailboat region's existence and extent across different binary system configurations. We examine how variations in mass ratio, secondary body eccentricity, particle inclination, and argument of pericenter affect this stable region. We performed 1.2 million numerical simulations of the elliptic three-body problem to generate four datasets exploring different parameter spaces. We trained XGBoost machine learning models to classify stability across approximately 10⁹ initial conditions. We validated our results using Poincaré surfaces of section and Lyapunov exponent analysis to confirm the dynamical mechanisms underlying the stability. The sailboat region exists only for binary mass ratios μ ∈ [0.05, 0.22]. Secondary body eccentricity severely constrains the region, following an exponential decay: e_s,max ≈ 0.016 + 0.614 exp(−25.6μ). The region tolerates particle inclinations up to 90° and persists in retrograde configurations for μ ≤ 0.16. Stability requires specific argument of pericenter values within ±10° to ±30° of ω = 0° and 180°. Our machine learning models achieved over 97% accuracy in predicting stability. The sailboat region shows strong sensitivity to system parameters, particularly secondary body eccentricity. Among Solar System dwarf planet binaries, the Pluto-Charon, Orcus-Vanth, and Varda-Ilmarë systems could harbor such regions.
AI Insights
  • XGBoost models trained on 1.2 million three‑body integrations classify ~10⁹ initial conditions with >97 % accuracy.
  • The sailboat region exists only for mass ratios μ∈[0.05,0.22], shrinking toward the primary as μ increases.
  • Secondary eccentricity limits the region exponentially: e_s,max≈0.016+0.614 exp(−25.6μ).
  • Stability persists for inclinations up to 90°, including retrograde orbits when μ≤0.16.
  • The argument of pericenter must lie within ±10°–30° of 0° or 180° for the sailboat region to survive.
  • Poincaré surfaces of section and Lyapunov exponents confirm the dynamical mechanisms behind the stable zone.
  • The ML pipeline can cut simulation time by up to 10⁵×, enabling rapid surveys of binary system stability.
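A minimal sketch of the two quantitative handles above: evaluating the fitted eccentricity limit, and training an XGBoost stability classifier. The features and labels here are random placeholders standing in for the paper's 1.2 million integrations, and the xgboost package is assumed to be installed.

```python
# Sketch of the fitted secondary-eccentricity limit and an XGBoost stability
# classifier. Training data below is a placeholder, not the paper's dataset.
import numpy as np
from xgboost import XGBClassifier

def e_secondary_max(mu: float) -> float:
    """Fitted exponential decay bounding the sailboat region's survival."""
    return 0.016 + 0.614 * np.exp(-25.6 * mu)

print(e_secondary_max(0.10))  # roughly Pluto-Charon-like mass ratio

# Classify stability from initial conditions, e.g. (a, e, i, omega, mu).
X = np.random.rand(1000, 5)        # placeholder for integration outputs
y = (X[:, 1] < 0.9).astype(int)    # placeholder stability labels
model = XGBClassifier(n_estimators=200, max_depth=6)
model.fit(X, y)
print(model.predict(X[:3]))
```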
Beihang University, China
Abstract
In cooperative Multi-Agent Reinforcement Learning (MARL), it is common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions, a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise for all agents may fail under observation noise on a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters only, we observe substantial improvements in cooperation, robustness, and resilience across all MARL backbones, and the effect also generalizes to robust MARL methods built on these backbones. Code and results available at https://github.com/BUAA-TrustworthyMARL/adv_marl_benchmark .
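The perturbation sweep at the heart of the study can be sketched as an evaluation loop that injects action or observation noise into all agents or a single agent; the `policy` and `env` interfaces below are illustrative assumptions, not the benchmark's API.

```python
# Sketch of a robustness evaluation under different uncertainty modalities and
# agent scopes. The policy/env interfaces are hypothetical stand-ins.
import numpy as np

def evaluate(policy, env, episodes=10, noise_type="none", scope="all", sigma=0.1):
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            if noise_type == "obs":
                targets = range(len(obs)) if scope == "all" else [0]
                for i in targets:  # perturb observations per agent
                    obs[i] = obs[i] + np.random.normal(0, sigma, obs[i].shape)
            actions = policy(obs)
            if noise_type == "action":
                targets = range(len(actions)) if scope == "all" else [0]
                for i in targets:  # perturb actions per agent
                    actions[i] = actions[i] + np.random.normal(0, sigma, actions[i].shape)
            obs, reward, done = env.step(actions)
            total += reward
        returns.append(total)
    # Robustness: mean return under sustained noise. Resilience would instead
    # measure recovery after the perturbation is switched off mid-episode.
    return float(np.mean(returns))
```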
AI Insights
  • The study documents compliance with the NeurIPS Code of Ethics, detailing safeguards for high‑risk model release.
  • All data and model owners are credited with explicit license terms, ensuring reproducibility and legal integrity.
  • A curated ethics resource list—books, AAAI papers, EU guidelines, NeurIPS site, Stanford Coursera—guides responsible AI research.
  • Compute requirements for reproducing the 82k experiments are fully disclosed, enabling transparent benchmarking.
  • No human subjects or crowdsourcing were involved, so IRB approval was unnecessary while still addressing participant risk.
  • Key terms such as “NeurIPS Code of Ethics” and “LLM usage” are explicitly defined, clarifying scope for future work.
  • By integrating ethics with empirical robustness analysis, the paper invites exploration of trustworthy MARL without compromising rigor.
Distributed Systems
Cornell University, USA
Abstract
Distributed system theory literature often argues for correctness using an informal, Hoare-like style of reasoning. While these arguments are intuitive, they have not all been foolproof, and whether they directly correspond to formal proofs is in question. We formally ground this kind of reasoning and connect it to standard formal approaches through language design and meta-analysis, which leads to a functional style of compositional formal reasoning for a class of distributed systems, including cases involving Byzantine faults. The core of our approach is twin languages: Sync and Async, which formalize the insight from distributed system theory that an asynchronous system can be reduced to a synchronous system for more straightforward reasoning under certain conditions. Sync describes a distributed system as a single, synchronous, data-parallel program. It restricts programs syntactically and has a functional denotational semantics suitable for Hoare-style formal reasoning. Async models a distributed system as a collection of interacting monadic programs, one for each non-faulty node in the system. It has a standard trace-based operational semantics, modeling asynchrony with interleaving. Sync compiles to Async and can then be extracted to yield executable code. We prove that any safety property proven for a Sync program in its denotational semantics is preserved in the operational semantics of its compiled Async programs. We implement the twin languages in Rocq and verify the safety properties of two fault-tolerant consensus protocols: BOSCO and SeqPaxos.
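The Sync/Async reduction can be illustrated with a toy round: the same majority-vote computation written once as a single data-parallel function (the Sync view) and once as per-node message passing (the Async view). This Python model is an illustrative assumption, not the Rocq development.

```python
# Toy model of the twin-language idea: a synchronous data-parallel round and
# its asynchronous message-passing counterpart compute the same result.
from typing import Dict, List

def sync_round(values: Dict[int, int]) -> Dict[int, int]:
    """Sync view: every non-faulty node adopts the majority of all inputs."""
    vals = list(values.values())
    majority = max(set(vals), key=vals.count)
    return {node: majority for node in values}

def async_run(values: Dict[int, int]) -> Dict[int, int]:
    """Async view: nodes exchange messages; deliveries may interleave."""
    inbox: Dict[int, List[int]] = {n: [] for n in values}
    for _sender, v in values.items():   # arbitrary interleaving of sends
        for receiver in inbox:
            inbox[receiver].append(v)
    return {n: max(set(msgs), key=msgs.count) for n, msgs in inbox.items()}

inputs = {0: 1, 1: 1, 2: 0}
# A safety property proven in the Sync view holds in the Async run (the
# paper's preservation theorem); this toy checks agreement on one input.
assert sync_round(inputs) == async_run(inputs)
```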
Payame Noor University
Abstract
In this paper we present a new framework for integrated distributed systems. The proposed framework uses three parts to increase satisfaction and performance. We first analyse integrated systems and their evolution, briefly review the ERPSD and ERPDRT frameworks, and then describe the new FIDRS framework. Finally, we compare simulation results for the new framework against the existing ones. The results show that FIDRS uses a heterogeneous distributed database technique to improve performance and response speed for users; a sketch of this routing idea follows the insights below. Using the FIDRS framework, we were able to increase the efficiency, performance, and reliability of integrated systems and to remove some of the problems of previous frameworks.
AI Insights
  • FIDRS outperforms ERPSD and ERPDRT by 15 % and 8.7 % respectively when the request load exceeds 10 k transactions per second.
  • The simulation harness combines Apache Kafka, PostgreSQL, and a 4‑core Intel Xeon to emulate real‑world distributed workloads.
  • A risk‑management layer in FIDRS leverages Bayesian anomaly detection to preemptively isolate faulty nodes.
  • Despite its strengths, the paper omits low‑level implementation details such as message‑queue serialization formats.
  • Comparative tables are limited to throughput and latency metrics, leaving scalability under 100 k requests unexplored.
  • The reference list cites seminal works like “ERP: Making It Happen” and recent ERPSD extensions, hinting at a broader research ecosystem.
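Reading the abstract and insights above together, the heterogeneous-database routing that FIDRS credits for its response-time gains might look like the sketch below, where queries are dispatched per data class with a fallback when a backend is isolated; the backends and routing policy are assumptions, since the paper omits these implementation details.

```python
# Hypothetical sketch of per-data-class query routing over heterogeneous
# backends, with fault isolation. Names and policy are illustrative only.
from typing import Callable, Dict

class FidrsRouter:
    def __init__(self):
        # Map data classes to heterogeneous backends (e.g. OLTP vs. analytics).
        self.backends: Dict[str, Callable[[str], str]] = {}
        self.healthy: Dict[str, bool] = {}

    def register(self, data_class: str, handler: Callable[[str], str]) -> None:
        self.backends[data_class] = handler
        self.healthy[data_class] = True

    def query(self, data_class: str, q: str) -> str:
        # Risk-management layer: skip a backend marked unhealthy and fall
        # back to any remaining healthy one.
        if self.healthy.get(data_class):
            return self.backends[data_class](q)
        for name, ok in self.healthy.items():
            if ok:
                return self.backends[name](q)
        raise RuntimeError("no healthy backend")

router = FidrsRouter()
router.register("orders", lambda q: f"oltp:{q}")
router.register("reports", lambda q: f"olap:{q}")
print(router.query("orders", "SELECT 1"))
```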