Papers from 8 to 12 September 2025

Here are the personalized paper recommendations, sorted by relevance.
High throughput
Cornell University, UC M
Abstract
Recent deep learning workloads increasingly push computational demand beyond what current memory systems can sustain, with many kernels stalling on data movement rather than computation. While modern dataflow accelerators incorporate on-chip streaming to mitigate off-chip bandwidth limitations, existing programming models struggle to harness these capabilities effectively. Low-level interfaces provide fine-grained control but impose significant development overhead, whereas high-level tile-based languages abstract away communication details, restricting optimization and forcing compilers to reconstruct the intended dataflow. We present Dato, a Python-embedded, task-based programming model for dataflow accelerators that elevates data communication and sharding to first-class type constructs. Developers write programs as a graph of tasks connected via explicit stream types, with sharded inputs specified using layout types. These tasks are first mapped virtually onto the accelerator's spatial fabric, and the compiler then generates a physical mapping that respects hardware constraints. Experimental results on both AMD Ryzen AI NPU and Alveo FPGA devices demonstrate that Dato achieves high performance while significantly reducing the burden of writing optimized code. On the NPU, Dato attains up to 84% hardware utilization for GEMM and delivers a 2.81x speedup on attention kernels compared to a state-of-the-art commercial framework. On the FPGA, Dato surpasses leading frameworks in performance when generating custom systolic arrays, achieving 98% of the theoretical peak performance.
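To make the programming model concrete, here is a minimal Python sketch of a task graph in the spirit the abstract describes: tasks connected by explicit streams, with inputs sharded by a layout type. All names below (Stream, shard_rows, the producer/consumer tasks) are illustrative stand-ins, not Dato's actual API.

```python
"""Illustrative sketch of a Dato-style task-based dataflow program.
Every name here is a hypothetical stand-in, not Dato's real API."""
from dataclasses import dataclass, field
import queue

@dataclass
class Stream:
    """Explicit, typed channel between two tasks (first-class in Dato)."""
    name: str
    q: queue.Queue = field(default_factory=queue.Queue)

    def put(self, item):
        self.q.put(item)

    def get(self):
        return self.q.get()

def shard_rows(matrix, parts):
    """Row-wise layout: split the input so each task group owns a tile."""
    step = len(matrix) // parts
    return [matrix[i * step:(i + 1) * step] for i in range(parts)]

def producer(tile, out: Stream):
    for row in tile:               # stream one row at a time
        out.put(row)
    out.put(None)                  # end-of-stream marker

def consumer(inp: Stream, results: list):
    while (row := inp.get()) is not None:
        results.append(sum(row))   # stand-in for a compute kernel

# Wire a two-task graph per shard: producer -> stream -> consumer.
matrix = [[i + j for j in range(4)] for i in range(4)]
results: list[int] = []
for tile in shard_rows(matrix, parts=2):
    s = Stream(name="rows")
    producer(tile, s)              # in Dato these run spatially in parallel
    consumer(s, results)
print(results)                     # row sums, computed tile by tile
```

In Dato itself this virtual graph would then be mapped onto the accelerator's spatial fabric by the compiler; the sketch only shows the shape of the program a developer writes.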
Abstract
Remote Direct Memory Access (RDMA) improves host networking performance by eliminating software and server CPU involvement. However, RDMA has a limited set of operations, is difficult to program, and often requires multiple round trips to perform simple application operations. Programmable SmartNICs provide a different means to offload work from host CPUs to a NIC. This leaves applications with the complex choice of embedding logic as RPC handlers at servers, using RDMA's limited interface to access server structures via client-side logic, or running some logic on SmartNICs. The best choice varies between workloads and over time. To solve this dilemma, we present NAAM, network-accelerated active messages. NAAM applications specify small, portable eBPF functions associated with messages. Each message specifies what data it accesses using an RDMA-like interface. NAAM runs at various places in the network, including at clients, on server-attached SmartNICs, and on server host CPU cores. Due to eBPF's portability, the code associated with a message can be run at any location. Hence, the NAAM runtime can dynamically steer any message to execute its associated logic wherever it makes the most sense. To demonstrate NAAM's flexibility, we built several applications, including the MICA hash table and lookups from a Cell-style B-tree. Using an NVIDIA BlueField-2 SmartNIC and its NIC-embedded switch, NAAM can run any of these operations on client, server, and NIC cores, shifting load within tens of milliseconds when server compute becomes congested. NAAM dynamically offloads up to 1.8 million MICA ops/s for YCSB-B and 750,000 Cell lookups/s from server CPUs. Finally, whereas iPipe, the state-of-the-art SmartNIC offload framework, only scales to 8 application offloads on BlueField-2, NAAM scales to hundreds of application offloads with minimal impact on tail latency due to eBPF's low overhead.
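A toy model of the steering decision at the heart of NAAM: the same portable handler can run at the client, the SmartNIC, or a server core, and a runtime picks where based on load. The policy, thresholds, and names below are assumptions for illustration; NAAM's real mechanism runs eBPF handlers and reacts to server compute congestion within tens of milliseconds.

```python
"""Toy model of NAAM-style message steering. Locations, loads, and the
steering policy are invented for illustration only."""
from dataclasses import dataclass

@dataclass
class Location:
    name: str
    load: float          # fraction of compute budget in use, 0..1

def steer(message_cost: float, locations: list[Location]) -> Location:
    """Pick the least-loaded location that still has headroom for this
    message's logic; fall back to the least-loaded one if nothing fits."""
    fits = [loc for loc in locations if loc.load + message_cost <= 1.0]
    return min(fits or locations, key=lambda loc: loc.load)

def lookup_handler(store: dict, key: str):
    """Portable handler body (an eBPF function in NAAM); here, a plain
    hash-table lookup standing in for a MICA-style operation."""
    return store.get(key)

locations = [Location("client", 0.20),
             Location("smartnic", 0.55),
             Location("server", 0.90)]      # server compute is congested
store = {"k1": "v1"}
where = steer(message_cost=0.05, locations=locations)
print(where.name, lookup_handler(store, "k1"))   # -> client v1
```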
Low latency
Abstract
Manifolds are complex topological spaces that can be used to represent datasets of real-world measurements. Visualizing such manifolds can help with illustrating their topological characteristics (e.g., curvature) and providing insights into important properties of the underlying data (e.g., anomalies in the measurements). In this paper, we describe a new methodology and system for generating and visualizing manifolds that are inferred from actual Internet latency measurements between different cities and are projected over a 2D Euclidean space (e.g., a geographic map). Our method leverages a series of graphs that capture critical information contained in the data, including well-defined locations (for vertices) and Ricci curvature information (for edges). Our visualization approach then generates a curved surface (manifold) in which (a) geographical locations of vertices are maintained and (b) the Ricci curvature values of the graph edges determine the curvature properties of the manifold. The resulting manifold highlights areas of critical connectivity and defines an instance of "Internet delay space" where latency measurements manifest as geodesics. We describe details of our method and its implementation in a tool, which we call Matisse, for generating, visualizing and manipulating manifolds projected onto a base map. We illustrate Matisse with two case studies: a simple example to demonstrate key concepts, and visualizations of the US public Internet to show Matisse's utility.
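A minimal sketch of the per-edge curvature step on a latency graph. The paper uses Ricci curvature; the snippet below computes the cheaper combinatorial Forman-Ricci variant (augmented with triangle counts) purely for illustration, on invented city-to-city latencies.

```python
"""Per-edge curvature on a toy latency graph. Forman-Ricci is used here
as a simple combinatorial cousin of the Ricci curvature Matisse relies
on; the cities and latencies are made up."""
import networkx as nx

def forman_curvature(G: nx.Graph, u, v) -> int:
    # Augmented Forman curvature for an unweighted edge:
    # 4 - deg(u) - deg(v) + 3 * (#triangles through the edge)
    triangles = len(set(G[u]) & set(G[v]))
    return 4 - G.degree(u) - G.degree(v) + 3 * triangles

G = nx.Graph()
G.add_weighted_edges_from([
    ("NYC", "CHI", 18.0), ("CHI", "DEN", 23.0),   # weights: latency in ms
    ("DEN", "LAX", 25.0), ("NYC", "DEN", 40.0),
])
for u, v in G.edges:
    print(f"{u}-{v}: curvature {forman_curvature(G, u, v)}")
# Negative values flag tree-like, "stretched" links; positive values flag
# densely connected regions -- the per-edge signal a Matisse-style tool
# turns into curvature of the rendered surface.
```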
Abstract
Biomechanical data acquisition in sports demands sub-millisecond synchronization across distributed body-worn sensor nodes. This study evaluates and characterizes the Enhanced ShockBurst (ESB) protocol from Nordic Semiconductor under controlled laboratory conditions for wireless, low-latency command broadcasting, enabling fast event updates in multi-node systems. Through systematic profiling of protocol parameters, including cyclic-redundancy-check modes, bitrate, transmission modes, and payload handling, we achieve a mean Device-to-Device (D2D) latency of 504.99 ± 96.89 µs and a network-to-network core latency of 311.78 ± 96.90 µs using a one-byte payload with retransmission optimization. This performance significantly outperforms Bluetooth Low Energy (BLE), which is constrained by a 7.5 ms connection interval, by providing deterministic, sub-millisecond synchronization suitable for high-frequency (500 Hz to 1000 Hz) biosignals. These results position ESB as a viable solution for time-critical, multi-node wearable systems in sports, enabling precise event alignment and reliable high-speed data fusion for advanced athlete monitoring and feedback applications.
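A small sketch of the latency characterization the abstract reports: given matched send/receive timestamps in microseconds, compute the mean and standard deviation of device-to-device latency. The sample values below are fabricated for illustration.

```python
"""Compute mean +/- std D2D latency from timestamp pairs (microseconds).
The four sample captures are invented, not measured data."""
import statistics

send_us    = [0.0, 2000.0, 4000.0, 6000.0]
receive_us = [505.2, 2498.7, 4611.3, 6421.0]    # hypothetical captures

latencies = [rx - tx for tx, rx in zip(send_us, receive_us)]
mean = statistics.mean(latencies)
std  = statistics.stdev(latencies)              # sample standard deviation
print(f"D2D latency: {mean:.2f} ± {std:.2f} µs over {len(latencies)} trials")
```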
Resilience
6G Resilience White Paper
Abstract
6G must be designed to withstand, adapt to, and evolve amid prolonged, complex disruptions. Mobile networks' shift from efficiency-first to sustainability-aware design motivates this white paper's assertion that resilience is a primary design goal, alongside sustainability and efficiency, encompassing technology, architecture, and economics. We promote resilience by analysing dependencies between mobile networks and other critical systems, such as energy, transport, and emergency services, and illustrate how cascading failures spread through infrastructures. We formalise resilience using the 3R framework: reliability, robustness, resilience. Subsequently, we translate this into measurable capabilities: graceful degradation, situational awareness, rapid reconfiguration, and learning-driven improvement and recovery. Architecturally, we promote edge-native and locality-aware designs, open interfaces, and programmability to enable islanded operations, fallback modes, and multi-layer diversity (radio, compute, energy, timing). Key enablers include AI-native control loops with verifiable behaviour, zero-trust security rooted in hardware and supply-chain integrity, and networking techniques that prioritise critical traffic, time-sensitive flows, and inter-domain coordination. Resilience also has a techno-economic aspect: open platforms and high-quality complementors generate ecosystem externalities that enhance resilience while opening new markets. We identify nine business-model groups and several patterns aligned with the 3R objectives, and we outline governance and standardisation. This white paper serves as an initial step and catalyst for 6G resilience. It aims to inspire researchers, professionals, government officials, and the public, providing them with the essential components to understand and shape the development of 6G resilience.
Sapienza University of B
Abstract
Modern industrial systems require updated approaches to safety management, as the tight interplay between cyber-physical, human, and organizational factors has driven their processes toward increasing complexity. In addition to dealing with known risks, managing system resilience acquires great value to address complex behaviors pragmatically. This manuscript starts from the System-Theoretic Accident Model and Processes (STAMP) as a modelling initiative for such complexity. STAMP can be natively integrated with simulation-based approaches, which, however, fail to realistically represent human behaviors and their influence on system performance. To overcome this limitation, this paper proposes a Human-Hardware-in-the-Loop (HHIL) modeling and simulation framework aimed at supporting a more realistic and comprehensive assessment of systemic resilience. The approach is tested on an experimental oil and gas plant experiencing cyber-attacks, where two personas of operators (experts and novices) work. This research provides a means to quantitatively assess how variations in operator behavior impact overall system performance, offering insights into how resilience should be understood and implemented in complex socio-technical systems at large.
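A toy Monte-Carlo in the spirit of the HHIL evaluation: sample operator decision latency for two personas and estimate how often an attack escalates before the operator intervenes. The distributions, persona parameters, and escalation deadline are all invented for illustration; the paper's framework couples such sampling with STAMP and real hardware in the loop.

```python
"""Toy Monte-Carlo of operator response vs. attack escalation. Every
parameter here (exponential latency model, means, 30 s deadline) is an
assumption made for illustration, not a value from the paper."""
import random

def escalation_rate(mean_latency_s: float, trials: int = 100_000,
                    deadline_s: float = 30.0) -> float:
    """Fraction of trials where the sampled decision latency exceeds
    the (assumed) window before a cyber-attack cascades."""
    random.seed(42)
    misses = sum(random.expovariate(1.0 / mean_latency_s) > deadline_s
                 for _ in range(trials))
    return misses / trials

expert = escalation_rate(mean_latency_s=10.0)   # assumed expert persona
novice = escalation_rate(mean_latency_s=20.0)   # assumed novice persona
print(f"expert miss rate {expert:.3f}, novice miss rate {novice:.3f}")
# The miss rate grows sharply with mean decision latency, the kind of
# operator-dependent resilience effect HHIL is built to quantify.
```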
AI Insights
  • HHIL fuses STAMP with Monte‑Carlo to quantify how operator expertise shifts resilience in a simulated oil‑and‑gas plant.
  • Expert vs novice trials showed decision latency can double cascading‑failure risk during cyber attacks.
  • Real‑time hardware interfaces let HHIL capture sensor‑to‑human feedback loops missed by conventional models.
  • Embedding human‑centered design cuts risk exposure by up to 30 % versus purely technical safety analyses.
  • Resilience must treat human behavior as a first‑class system component, not an afterthought.
  • Leveson’s “Engineering a Safer World” and Woods’ “Graceful Extensibility” underpin the HHIL methodology.
  • Future work should couple HHIL with fuzzy Bayesian networks to better model human intent uncertainty.
Distributed Systems
Abstract
Distribution networks will experience more installations of distributed generation (DG) that is unpredictable and stochastic in nature. Greater distributed control and intelligence will allow challenges such as voltage control to be handled effectively. The partitioning of power networks into smaller clusters provides a method to split the control problem into manageable sub-problems. This paper presents a community detection-based partitioning technique for distribution networks considering local DGs, allowing them to be grouped and controlled in a distributed manner by using local signals and measurements. This method also allows each community to control the voltage using only neighboring DGs, and for each community to self-organize to reflect varying DG conditions and to maintain stable control. Simulations demonstrate that the partitioning of the large distribution network is effective, and each community is able to self-organize and to regulate the voltage independently using only its local DGs.
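A minimal sketch of the community-detection step on a toy feeder graph, using networkx's modularity-based clustering. The paper's technique additionally accounts for local DG placement and self-organizing voltage control; that part is not modeled here.

```python
"""Modularity-based partitioning of a toy distribution feeder. The graph
is invented; the paper's method further weights communities by local DG
conditions before assigning distributed voltage control."""
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy radial feeder with two laterals; nodes are buses, edges are lines.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3),          # main feeder
                  (2, 10), (10, 11), (11, 12),     # lateral A
                  (3, 20), (20, 21), (21, 22)])    # lateral B

communities = greedy_modularity_communities(G)
for i, community in enumerate(communities):
    print(f"community {i}: buses {sorted(community)}")
# Each community would then regulate voltage using only its local DGs
# and neighboring measurements, as the abstract describes.
```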
Abstract
Federated Learning (FL) has emerged as a widely studied paradigm for distributed learning. Despite its many advantages, FL remains vulnerable to adversarial attacks, especially under data heterogeneity. We propose a new Byzantine-robust FL algorithm called ProDiGy. The key novelty lies in evaluating client gradients using a joint dual scoring system based on the gradients' proximity and dissimilarity. We demonstrate through extensive numerical experiments that ProDiGy outperforms existing defenses in various scenarios. In particular, when clients' data do not follow an IID distribution, other defense mechanisms fail while ProDiGy maintains strong defense capabilities and model accuracy. These findings highlight the effectiveness of a dual-perspective approach that promotes natural similarity among honest clients while detecting suspicious uniformity as a potential indicator of an attack.
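An illustrative dual scoring function in the spirit of the abstract: rank client gradients by proximity to the coordinate-wise median, and penalize suspiciously uniform near-duplicates. The exact formulas below are assumptions for illustration, not ProDiGy's definitions.

```python
"""Illustrative dual scoring of client gradients: proximity plus a
uniformity penalty. The scoring formulas are invented stand-ins, not
the paper's actual construction."""
import numpy as np

def dual_scores(grads: np.ndarray) -> np.ndarray:
    """grads: (n_clients, dim). Lower score = more trusted."""
    median = np.median(grads, axis=0)
    proximity = np.linalg.norm(grads - median, axis=1)   # far = suspicious
    # Pairwise distances: a client whose nearest neighbor is (near-)
    # identical may be part of a colluding, suspiciously uniform group.
    d = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    uniformity_penalty = 1.0 / (d.min(axis=1) + 1e-8)
    return proximity + uniformity_penalty

rng = np.random.default_rng(0)
honest = rng.normal(0.0, 1.0, size=(8, 4))
colluders = np.tile(rng.normal(3.0, 0.01, size=(1, 4)), (3, 1))  # uniform attack
scores = dual_scores(np.vstack([honest, colluders]))
print(np.argsort(scores))   # colluders (indices 8-10) should rank last
```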