🎯 Top Personalized Recommendations
Outcoder Srl
Why we think this paper is great for you:
This paper directly explores distributed systems, focusing on agent communication across independent systems. It aligns perfectly with your interest in how distributed architectures operate.
Abstract
This paper presents a proof-of-concept demonstration of agent-to-agent communication across distributed systems, using only natural-language messages and without shared identifiers, structured schemas, or centralised data exchange. The prototype explores how multiple organisations (represented here as a Clinic, Insurer, and Specialist Network) can cooperate securely via pseudonymised case tokens, local data lookups, and controlled operational boundaries.
The system uses Orpius as the underlying platform for multi-agent orchestration, tool execution, and privacy-preserving communication. All agents communicate through OperationRelay calls, exchanging concise natural-language summaries. Each agent operates on its own data (such as synthetic clinic records, insurance enrolment tables, and clinical guidance extracts), and none receives or reconstructs patient identity. The Clinic computes an HMAC-based pseudonymous token, the Insurer evaluates coverage rules and consults the Specialist agent, and the Specialist returns an appropriateness recommendation.
The goal of this prototype is intentionally limited: to demonstrate feasibility, not to provide a clinically validated, production-ready system. No clinician review was conducted, and no evaluation beyond basic functional runs was performed. The work highlights architectural patterns, privacy considerations, and communication flows that enable distributed reasoning among specialised agents while keeping data local to each organisation. We conclude by outlining opportunities for more rigorous evaluation and future research in decentralised multi-agent systems.
AI Summary
- Establishes strict data locality, ensuring each participating organization maintains exclusive control over its own data, tools, and operational policies. [2]
- Presents a model for federated decision support where cross-node coordination relies on concise natural-language summaries and OperationRelay calls, rather than direct data or identifier exchange. [2]
- Leverages Orpius's OperationRelay as a core mechanism for structured cross-system communication, abstracting away the need for shared RPC-style contracts or schemas. [2]
- Strict Data Locality: An architectural principle where each organization maintains exclusive control over its own compute, storage, and secrets, with no direct cross-entity access to internal records. [2]
- Demonstrates a novel architecture for distributed LLM-driven agentic reasoning, enabling cooperation across independent systems without shared schemas, memory, or centralized infrastructure. [1]
- Introduces a privacy-preserving linkage strategy utilizing HMAC-derived pseudonymous tokens, allowing cross-organizational case coordination without disclosing patient identity. [1]
- Provides a practical, end-to-end proof-of-concept using a healthcare scenario (Clinic, Insurer, Specialist) to validate the architectural patterns and privacy mechanisms. [1]
- Shows that meaningful multi-step reasoning can occur through controlled natural-language interactions while maintaining strict data locality, addressing a gap in existing multi-agent frameworks. [1]
- Orpius: An underlying software platform for multi-agent orchestration, tool execution, and privacy-preserving communication, providing isolated deployments and cross-node operation calls. [1]
- Pseudonymised Case Tokens (HMAC-based): One-way cryptographic hashes (HMAC-SHA256) of patient identifiers, used to link cases across organizations (e.g., Clinic to Insurer) deterministically without revealing the original identity. [1]
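The HMAC-based linkage described above can be sketched in a few lines of Python. This is only an illustration of the mechanism, not the paper's actual implementation; the function name, secret, and patient-ID format here are invented for the example.

```python
import hmac
import hashlib

def case_token(patient_id: str, shared_secret: bytes) -> str:
    """Derive a deterministic pseudonymous case token from a patient ID.

    The same (ID, secret) pair always yields the same token, so two
    organisations holding the secret can link a case without ever
    exchanging the ID itself; the one-way HMAC-SHA256 construction
    means the token cannot be reversed to recover the identity.
    """
    return hmac.new(shared_secret, patient_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()

secret = b"per-deployment-shared-secret"   # hypothetical shared secret
t1 = case_token("patient-12345", secret)
t2 = case_token("patient-12345", secret)
t3 = case_token("patient-67890", secret)
assert t1 == t2   # deterministic: Clinic and Insurer derive the same token
assert t1 != t3   # distinct patients map to distinct tokens
```

Determinism is what makes the token usable as a join key across organisations, while the keyed hash keeps it meaningless to anyone without the secret.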
KAIST
Why we think this paper is great for you:
The title itself highlights "Low-Latency," a key area of your interest. This work addresses crucial challenges in achieving fast adaptation in resource-constrained environments.
Abstract
Test-Time Adaptation (TTA) adjusts models using unlabeled test data to handle dynamic distribution shifts. However, existing methods rely on frequent adaptation and high computational cost, making them unsuitable for resource-constrained edge environments. To address this, we propose SNAP, a sparse TTA framework that reduces adaptation frequency and data usage while preserving accuracy. SNAP maintains competitive accuracy even when adapting based on only 1% of the incoming data stream, demonstrating its robustness under infrequent updates. Our method introduces two key components: (i) Class and Domain Representative Memory (CnDRM), which identifies and stores a small set of samples that are representative of both class and domain characteristics to support efficient adaptation with limited data; and (ii) Inference-only Batch-aware Memory Normalization (IoBMN), which dynamically adjusts normalization statistics at inference time by leveraging these representative samples, enabling efficient alignment to shifting target domains. Integrated with five state-of-the-art TTA algorithms, SNAP reduces latency by up to 93.12%, while keeping the accuracy drop below 3.3%, even across adaptation rates ranging from 1% to 50%. This demonstrates its strong potential for practical use on edge devices serving latency-sensitive applications. The source code is available at https://github.com/chahh9808/SNAP.
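The idea behind inference-only normalization with a representative memory can be sketched as follows. This is a minimal NumPy illustration of blending stored statistics with the current batch's statistics; the actual blending rule, memory policy, and parameter names in SNAP may differ.

```python
import numpy as np

def memory_normalize(x, mem_mean, mem_var, alpha=0.5, eps=1e-5):
    """Sketch of inference-only, batch-aware normalization.

    Blends normalization statistics kept in a small representative
    memory (mem_mean, mem_var) with the current test batch's own
    statistics, so activations can track a shifting target domain
    without any gradient updates. alpha weights the memory side.
    """
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    mean = alpha * mem_mean + (1 - alpha) * batch_mean
    var = alpha * mem_var + (1 - alpha) * batch_var
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(16, 8))  # shifted test batch
out = memory_normalize(x, mem_mean=np.zeros(8), mem_var=np.ones(8))
```

Because only statistics are updated, not weights, this kind of step is cheap enough to run on every batch even when full adaptation happens on as little as 1% of the stream.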
Mississippi State
Why we think this paper is great for you:
This paper directly addresses the concept of resilience, a core interest for you. It provides a foundational review relevant to understanding robust system design.
Abstract
A key principle in resilience thinking is Embracing Change, because change is, indeed, inevitable. In the face of a growing number of natural and human-made disasters, our critical infrastructures (CIs) are being challenged like never before. This trend has sparked a wave of interest among practitioners and researchers alike in understanding the concept of resilience across multiple disciplines. This paper provides an accessible review of these new insights, exploring frameworks, guidebooks, and methodologies that define resilience through the lenses of ecology, engineering, psychology, social science, community, and disaster management during crises.
International
Why we think this paper is great for you:
This paper directly tackles "throughput maximization" in real-time scheduling, which is highly relevant to your interest in high-performance systems.
Abstract
We explore a generalization of the real-time scheduling problem sometimes called the real-time throughput maximization problem. Our input is a sequence of jobs, each specified by its release time, deadline, and processing time. We assume that jobs are announced at or before their release time. At each time step, the algorithm must decide whether to schedule a job based on the information available so far. The goal is to maximize the sum of the processing times of jobs that finish before their deadline, i.e., the total "active" time.
We extend this problem by defining a notion of $t$-advance-notice, a measure of how far in advance each job is announced relative to its processing time. We show that there exists a $\frac{t}{2t+1}$-competitive algorithm when all jobs have $t$-advance-notice for $t\in [0,1]$. We also show that this ratio is optimal for all algorithms under $t$-advance-notice, and that the upper bound of $\frac{t}{2t+1}$-competitiveness holds for all $t$; in particular, no matter how much advance notice is given, no algorithm can reach $\frac{1}{2}$-competitiveness.
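The stated bound is easy to tabulate, which makes the limiting behaviour concrete:

```python
def competitive_ratio(t: float) -> float:
    """Optimal competitive ratio t/(2t+1) under t-advance-notice, t >= 0."""
    return t / (2 * t + 1)

# No advance notice (t = 0): nothing can be guaranteed.
print(competitive_ratio(0.0))        # 0.0
# Notice equal to the processing time (t = 1): ratio 1/3.
print(competitive_ratio(1.0))        # 0.3333...
# Even enormous notice never reaches 1/2.
print(competitive_ratio(1_000_000))  # ~0.4999997
```

The ratio increases monotonically in $t$ but is bounded above by $\frac{1}{2}$, matching the paper's claim that no amount of advance notice yields a $\frac{1}{2}$-competitive algorithm.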
University of Chinese
Why we think this paper is great for you:
This work is highly relevant to your interest in low latency, specifically exploring techniques like coroutines to hide memory latency in modern systems.
Abstract
Modern data-intensive applications face memory latency challenges exacerbated by disaggregated memory systems. Recent work shows that coroutines are promising for effectively interleaving tasks and hiding memory latency, but they struggle to balance latency-hiding efficiency with runtime overhead. We present CoroAMU, a hardware-software co-designed system for memory-centric coroutines. It introduces compiler procedures that optimize coroutine code generation, minimize context, and coalesce requests, paired with a simple interface. With hardware support for decoupled memory operations, we enhance the Asynchronous Memory Unit to further exploit dynamic coroutine schedulers via coroutine-specific memory operations and a novel memory-guided branch-prediction mechanism. The system is implemented with LLVM and the open-source XiangShan RISC-V processor on an FPGA platform. Experiments demonstrate that the CoroAMU compiler achieves a 1.51x speedup over state-of-the-art coroutine methods on Intel server processors. When combined with the optimized decoupled-memory-access hardware, it delivers 3.39x and 4.87x average performance improvements over the baseline processor on FPGA-emulated disaggregated systems under 200 ns and 800 ns latency, respectively.
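The core latency-hiding idea can be sketched in pure software with Python generators: suspend each task at the point where a long memory access would be issued, and let a round-robin scheduler advance other tasks while the access is conceptually in flight. CoroAMU does this with compiler-generated coroutines and hardware support, so this sketch only illustrates the interleaving pattern, not the system itself.

```python
def lookup(table, key):
    """Coroutine-style lookup: yield at the 'long' memory access so the
    scheduler can run other tasks while the request is in flight."""
    yield               # suspension point: request issued
    return table[key]   # resume: the data has 'arrived'

def run_interleaved(tasks):
    """Round-robin scheduler: advance each coroutine in turn so many
    memory requests overlap instead of stalling one after another."""
    results = []
    pending = list(tasks)
    while pending:
        still_running = []
        for t in pending:
            try:
                next(t)
                still_running.append(t)
            except StopIteration as done:
                results.append(done.value)
        pending = still_running
    return results

table = {i: i * i for i in range(8)}
print(run_interleaved([lookup(table, k) for k in range(8)]))
```

In real systems the suspension point maps to a prefetch or decoupled memory operation, and the scheduler's overhead per switch is exactly the cost the paper's compiler and hardware co-design try to minimize.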
Not specified
Why we think this paper is great for you:
This dissertation focuses on the resilience of cyber-physical systems, directly aligning with your interest in building robust and fault-tolerant architectures.
Abstract
Cyber-physical systems (CPS) combine computational and physical components. An Online Collaborative AI System (OL-CAIS) is a type of CPS that learns online in collaboration with humans to achieve a common goal, which makes it vulnerable to disruptive events that degrade performance. Decision-makers must therefore restore performance while limiting energy impact, creating a trade-off between resilience and greenness. This research addresses how to balance these two properties in OL-CAIS. It aims to model resilience for automatic state detection, develop agent-based policies that optimize the greenness-resilience trade-off, and understand catastrophic forgetting to maintain performance consistency. We model OL-CAIS behavior through three operational states: steady, disruptive, and final. To support recovery during disruptions, we introduce the GResilience framework, which provides recovery strategies through multi-objective optimization (one agent), game-theoretic decision-making (two agents), and reinforcement learning (RL agent). We also design a measurement framework to quantify resilience and greenness. Empirical evaluation uses real and simulated experiments with a collaborative robot learning object classification from human demonstrations. Results show that the resilience model captures performance transitions during disruptions, and that GResilience policies improve green recovery by shortening recovery time, stabilizing performance, and reducing human dependency. RL-agent policies achieve the strongest results, although with a marginal increase in CO2 emissions. We also observe catastrophic forgetting after repeated disruptions, while our policies help maintain steadiness. A comparison with containerized execution shows that containerization cuts CO2 emissions in half. Overall, this research provides models, metrics, and policies that ensure the green recovery of OL-CAIS.
Cornell University
Why we think this paper is great for you:
This paper investigates systems with "distributed data" and "network structure," offering insights into the design and performance of distributed systems, a key area for you.
Abstract
In this paper, we model and analyze how a network of interacting LLMs performs collaborative question-answering (CQA) in order to estimate a ground truth given a distributed set of documents. This problem is interesting because LLMs often hallucinate when direct evidence to answer a question is lacking, and these effects become more pronounced in a network of interacting LLMs. The hallucination spreads, causing previously accurate LLMs to hallucinate. We study interacting LLMs and their hallucination by combining novel ideas of mean-field dynamics (MFD) from network science and the randomized utility model from economics to construct a useful generative model. We model the LLM with a latent state that indicates if it is truthful or not with respect to the ground truth, and extend a tractable analytical model considering an MFD to model the diffusion of information in a directed network of LLMs. To specify the probabilities that govern the dynamics of the MFD, we propose a randomized utility model. For a network of LLMs, where each LLM has two possible latent states, we posit sufficient conditions for the existence and uniqueness of a fixed point and analyze the behavior of the fixed point in terms of the incentive (e.g., test-time compute) given to individual LLMs. We experimentally study and analyze the behavior of a network of $100$ open-source LLMs with respect to data heterogeneity, node capability, network structure, and sensitivity to framing on multiple semi-synthetic datasets.
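A mean-field fixed point of the kind analyzed above can be illustrated with a toy update: let $p$ be the fraction of truthful LLMs, and iterate a map giving the probability an LLM is truthful when a fraction $p$ of its neighbours are. The logistic form below is an illustrative stand-in for the paper's randomized-utility probabilities, with an `incentive` knob loosely playing the role of test-time compute.

```python
import math

def influence(p: float, incentive: float = 1.0) -> float:
    """Probability an LLM is truthful given a truthful-neighbour
    fraction p. Logistic form chosen for illustration only."""
    return 1.0 / (1.0 + math.exp(-incentive * (2.0 * p - 1.0)))

def mean_field_fixed_point(p0=0.9, incentive=1.0, tol=1e-10, max_iter=10_000):
    """Iterate the mean-field update p <- influence(p) to convergence."""
    p = p0
    for _ in range(max_iter):
        q = influence(p, incentive)
        if abs(q - p) < tol:
            return q
        p = q
    return p

p_star = mean_field_fixed_point()
assert abs(influence(p_star) - p_star) < 1e-8  # it is indeed a fixed point
```

For this choice the map is a contraction (its derivative is bounded below 1), so the fixed point exists and is unique, which is the flavour of condition the paper establishes for its two-state model.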