Hi!

Your personalized paper recommendations for 2 to 6 February 2026.
University of Chicago
AI Insights
  • The algorithm uses a maximum weighted set coverage formulation, which is known to be NP-hard but admits provable guarantees. (ML: 0.83)
  • c: a user-defined coverage threshold (e.g., c=0.65 for 65% impact coverage) that controls the trade-off between minimizing the number of probes and maximizing coverage of impactful events. (ML: 0.82)
  • The authors' approach can reduce measurement cost and redundancy while ensuring that the most impactful network events are covered. (ML: 0.76)
  • log-impact: a transformation of the original impact values that shrinks their range, making probe selection less sensitive to noise from change-point detection while preserving the relative ordering of anomalies. (ML: 0.74)
  • The authors propose a greedy algorithm for probe selection that maximizes the coverage of high-impact latency anomalies across the dataset. (ML: 0.73)
  • The greedy algorithm provides a good trade-off between minimizing the number of probes and maximizing coverage of impactful events. (ML: 0.72)
  • The choice of δ_IoU can significantly affect the number of unique identifiers assigned to anomalies and ultimately increase the number of probes needed to cover the same set of anomalies. (ML: 0.71)
  • δ_IoU: a threshold on temporal overlap between probes, used to assign unique identifiers to anomalies. (ML: 0.63)
  • The authors experiment with different values of δ_IoU and the coverage fraction c to find the best trade-off between minimizing the number of probes and maximizing coverage of impactful events. (ML: 0.61)
Abstract
Latency anomalies, defined as persistent or transient increases in round-trip time (RTT), are common in residential Internet performance. When multiple users observe anomalies to the same destination, this may reflect shared infrastructure, routing behavior, or congestion. Inferring such shared behavior is challenging because anomaly magnitudes vary widely across devices, even within the same ISP and geographic area, and detailed network topology information is often unavailable. We study whether devices experiencing a shared latency anomaly observe similar changes in RTT magnitude using a topology-agnostic approach. Using four months of high-frequency RTT measurements from 99 residential probes in Chicago, we detect shared anomalies and analyze their consistency in amplitude and duration without relying on traceroutes or explicit path information. Building on prior change-point detection techniques, we find that many shared anomalies exhibit similar amplitude across users, particularly within the same ISP. Motivated by this observation, we design a sampling algorithm that reduces redundancy by selecting representative devices under user-defined constraints. Our approach captures 95 percent of aggregate anomaly impact using fewer than half of the deployed probes. Compared to two baselines, it identifies significantly more unique anomalies at comparable coverage levels. We further show that geographic diversity remains important when selecting probes within a single ISP, even at city scale. Overall, our results demonstrate that anomaly amplitude and duration provide effective topology-independent signals for scalable monitoring, troubleshooting, and cost-efficient sampling in residential Internet measurement.
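The greedy coverage idea summarized in the insights above can be sketched in a few lines. This is a hypothetical Python sketch, not the paper's exact algorithm: the function name `select_probes`, the use of `log1p` for the log-impact transform, and the tie-breaking are all my assumptions.

```python
import math

def select_probes(probe_anomalies, impact, c=0.65):
    """Greedy weighted set coverage: repeatedly pick the probe that adds
    the most uncovered (log-transformed) anomaly impact, until a fraction
    c of the total impact is covered. Illustrative sketch only."""
    # log-impact shrinks the value range, making selection less sensitive
    # to noise from change-point detection (assumed log1p form)
    w = {a: math.log1p(v) for a, v in impact.items()}
    total = sum(w.values())
    covered, selected, gained = set(), [], 0.0
    while gained < c * total:
        best, best_gain = None, 0.0
        for p, anomalies in probe_anomalies.items():
            if p in selected:
                continue
            gain = sum(w[a] for a in anomalies - covered)
            if gain > best_gain:
                best, best_gain = p, gain
        if best is None:          # no remaining probe adds coverage
            break
        selected.append(best)
        covered |= probe_anomalies[best]
        gained += best_gain
    return selected, gained / total
```

Raising c buys more coverage at the cost of more probes, which is the trade-off the authors explore jointly with δ_IoU.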
Why we recommend this paper
Due to your interest in low latency

This paper directly addresses latency, a key concern for your interests in low latency systems. Analyzing shared latency anomalies provides a method for optimizing probe selection, aligning with your focus on high throughput.
Delft University of Technology
AI Insights
  • Imagine you're at a restaurant, and there are two types of customers: those who want to eat quickly (high service rate) and those who take their time (low service rate). (ML: 0.98)
  • The goal is to find the best way to seat these customers so that everyone gets served as quickly as possible while also making sure the restaurant doesn't get too busy. (ML: 0.95)
  • This problem is like finding the optimal seating arrangement for these customers. (ML: 0.95)
  • The goal is to minimize customer waiting time while maximizing baseline task throughput. (ML: 0.94)
  • The problem is about finding an optimal routing policy for a system with exponential service times and Poisson arrivals. (ML: 0.83)
Abstract
Coordination in distributed systems is often hampered by communication latency, which degrades performance. Quantum entanglement offers fundamentally stronger correlations than classically achievable without communication. Crucially, these correlations manifest instantaneously upon measurement, irrespective of the physical distance separating the systems. We investigate the application of shared entanglement to a dual-work optimization problem in a distributed system comprising two servers. The system must process both a continuously available, preemptible baseline task and incoming customer requests arriving in pairs. System performance is characterized by the trade-off between baseline task throughput and customer waiting time. We present a rigorous analytical model demonstrating that when the baseline task throughput function is strictly convex, rewarding longer uninterrupted processing periods, entanglement-assisted routing strategies achieve Pareto-superior performance compared to optimal communication-free classical strategies. We prove this advantage through queueing-theoretic analysis, non-local game formulation, and computational certification of classical bounds. Our results identify distributed scheduling and coordination as a novel application domain for near-term entanglement-based quantum networks.
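As a toy illustration of the queueing setup in the abstract (pairs of customers arriving as a Poisson process, exponential service, two servers), the sketch below compares two communication-free routing policies. It omits the baseline task and its convex throughput reward, and every name and parameter (`simulate`, `split`, `bunch`, the rates) is an illustrative assumption, not the paper's model.

```python
import random

def simulate(route, lam=0.5, mu=1.0, n_pairs=20000, seed=0):
    """Two-server queue fed by pairs of customers: pair arrivals are
    Poisson(lam), service times are Exp(mu). `route(i)` returns a server
    index for each of the two customers in pair i. Returns the mean
    customer waiting time (time from arrival to start of service)."""
    rng = random.Random(seed)
    free = [0.0, 0.0]                 # time each server next becomes free
    t, total_wait = 0.0, 0.0
    for i in range(n_pairs):
        t += rng.expovariate(lam)     # next pair arrives
        for s in route(i):
            start = max(t, free[s])
            total_wait += start - t
            free[s] = start + rng.expovariate(mu)
    return total_wait / (2 * n_pairs)

def split(i):       # one customer of each pair to each server
    return (0, 1)

def bunch(i):       # both customers to one server, alternating servers
    return (i % 2, i % 2)
```

Splitting pairs minimizes waiting, while bunching creates longer uninterrupted idle stretches on the other server. The paper's point is that when a convex baseline-throughput reward values those stretches, the waiting/throughput trade-off matters, and entanglement-assisted routing can beat any classical communication-free policy on that frontier.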
Why we recommend this paper
Due to your interest in distributed systems

The exploration of quantum entanglement for coordination offers a novel approach to mitigating communication latency, a core interest in distributed systems. This research could significantly contribute to achieving your goals of low latency and high throughput.
Oxford University
AI Insights
  • However, it also notes that naive estimators can have high variance, which motivates the use of sequential correction techniques to improve performance. (ML: 0.98)
  • The authors discuss various applications of Monte Carlo methods in machine learning, including data augmentation, support vector machines, and neural network compression. (ML: 0.95)
  • The authors highlight the importance of regularizing priors for improving performance. (ML: 0.95)
  • Sequential correction (or 'boosting'): a technique for improving Monte Carlo estimators by iteratively refining them using additional data or information. (ML: 0.94)
  • They also touch on the methods' potential for solving non-linear systems and inverse problems. (ML: 0.87)
  • A key contribution is the discussion of the relationship between Monte Carlo methods and statistical inference algorithms, particularly in the context of ill-conditioned inverse problems. (ML: 0.87)
  • The paper concludes by emphasizing the need for further research on benchmarking Monte Carlo methods against deterministic solvers, and on applications in emerging fields such as generative pre-trained transformers. (ML: 0.86)
  • The authors provide an overview of the history and development of these methods, as well as their applications in fields such as machine learning, statistics, and signal processing. (ML: 0.84)
  • The paper discusses Monte Carlo methods for solving linear systems, with a focus on sequential correction (or 'boosting') techniques. (ML: 0.80)
  • The paper highlights the advantages of Monte Carlo methods for large-scale linear systems, including reduced computational complexity and improved accuracy. (ML: 0.79)
  • Fredholm equation: a type of integral equation that is typically discretized into a linear system. (ML: 0.77)
  • Neumann-series solution: a representation of the solution to a Fredholm equation of the second kind that admits a stochastic solution via a random walk. (ML: 0.71)
Abstract
Modern training and inference pipelines in statistical learning and deep learning repeatedly invoke linear-system solves as inner loops, yet high-accuracy deterministic solvers can be prohibitively expensive when solves must be repeated many times or when only partial information (selected components or linear functionals) is required. We position \emph{Monte Carlo boosting} as a practical alternative in this regime, surveying random-walk estimators and sequential residual correction in a unified notation (Neumann-series representation, forward/adjoint estimators, and Halton-style sequential correction), with extensions to overdetermined/least-squares problems and connections to IRLS-style updates in data augmentation and EM/ECM algorithms. Empirically, we compare Jacobi and Gauss--Seidel iterations with plain Monte Carlo, exact sequential Monte Carlo, and a subsampled sequential variant, illustrating scaling regimes that motivate when Monte Carlo boosting can be an enabling compute primitive for modern statistical learning workflows.
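The Neumann-series random-walk estimator mentioned in the insights is a textbook construction, and a minimal version fits in a few lines. The sketch below is that textbook forward estimator (uniform proposal transitions with importance weights), not the paper's specific estimator or its sequential-correction variants; `mc_solve_component` and all parameters are illustrative names.

```python
import random

def mc_solve_component(H, b, i, n_walks=20000, length=20, seed=0):
    """Estimate component i of x = (I - H)^{-1} b via the Neumann series
    x = sum_m H^m b, truncated at `length` terms and sampled by random
    walks. Requires the series to converge (spectral radius of H < 1)."""
    rng = random.Random(seed)
    n = len(b)
    est = 0.0
    for _ in range(n_walks):
        state, w, acc = i, 1.0, b[i]      # m = 0 term is b[i]
        for _ in range(length):
            nxt = rng.randrange(n)        # uniform proposal, prob 1/n
            w *= H[state][nxt] * n        # importance-weight correction
            state = nxt
            acc += w * b[state]           # adds an unbiased H^m b term
        est += acc
    return est / n_walks
```

Each walk yields an unbiased estimate of the truncated series for one component, which is exactly the regime the abstract highlights: cheap when only selected components or functionals of the solution are needed. The high variance of this naive estimator is what sequential residual correction is meant to tame.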
Why we recommend this paper
Due to your interest in high throughput

This work focuses on accelerating linear system solves, a common bottleneck in many distributed systems. The emphasis on speed and efficiency aligns directly with your requirements for both low latency and high throughput.
TU Berlin
AI Insights
  • The definition of perfect resilience is not explicitly stated in the problem statement. (ML: 0.87)
  • Skipping priority list: a function π_v that takes as input a link in E_v (or ⊥, modeling that the packet originates at v) and outputs a permutation of E_v. (ML: 0.86)
  • The problem of perfect resilience in rooted graphs involves finding a forwarding pattern that can handle link failures and still deliver packets efficiently. (ML: 0.77)
  • A rooted graph that is not perfectly resilient is called a trap. (ML: 0.76)
  • The authors also propose a forwarding pattern called the 'right-hand rule', which they show to be perfectly resilient for certain types of graphs. (ML: 0.73)
  • Perfectly resilient: a rooted graph for which a perfectly resilient forwarding pattern exists. (ML: 0.73)
  • The concept of perfect resilience was first introduced by [1] as a way to handle link failures in packet networks. (ML: 0.72)
  • The updated right-hand rule λ_{e_v}, for each node v and some chosen link e_v, will be perfectly resilient in both cases. (ML: 0.71)
  • Minimal trap: a trap that does not contain another trap as a rooted minor. (ML: 0.70)
  • To show that dipole outerplanar graphs and rings of outerplanar graphs are perfectly resilient, the authors compute an outerplanar embedding for each induced subgraph, 'stack' these graphs on top of each other, and argue that traversing an outer face of any graph yields a solution. (ML: 0.70)
  • Dipole outerplanar graphs and rings of outerplanar graphs are perfectly resilient. (ML: 0.69)
Abstract
Modern communication networks support local fast rerouting mechanisms to quickly react to link failures: nodes store a set of conditional rerouting rules which define how to forward an incoming packet in case of incident link failures. The rerouting decisions at any node $v$ must rely solely on local information available at $v$: the link from which a packet arrived at $v$, the target of the packet, and the incident link failures at $v$. Ideally, such rerouting mechanisms provide perfect resilience: any packet is routed from its source to its target as long as the two are connected in the underlying graph after the link failures. Already in their seminal paper at ACM PODC '12, Feigenbaum, Godfrey, Panda, Schapira, Shenker, and Singla showed that perfect resilience cannot always be achieved. While the design of local rerouting algorithms has received much attention since then, we still lack a detailed understanding of when perfect resilience is achievable. This paper closes this gap and presents a complete characterization of when perfect resilience can be achieved. This characterization also allows us to design an $O(n)$-time algorithm to decide whether a given instance is perfectly resilient and an $O(nm)$-time algorithm to compute perfectly resilient rerouting rules whenever it is. Our algorithm is also attractive for the simple structure of the rerouting rules it uses, known as skipping in the literature: alternative links are chosen according to an ordered priority list (per in-port), where failed links are simply skipped. Intriguingly, our result also implies that in the context of perfect resilience, skipping rerouting rules are as powerful as more general rerouting rules. This partially answers a long-standing open question by Chiesa, Nikolaevskiy, Mitrovic, Gurtov, Madry, Schapira, and Shenker [IEEE/ACM Transactions on Networking, 2017] in the affirmative.
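The "skipping" rule format from the abstract (a priority list per in-port, with failed links simply skipped) is simple enough to simulate directly. The sketch below shows the rule format only; the hand-picked priorities and the `route` helper are illustrative assumptions, not the paper's O(nm) construction of perfectly resilient rules.

```python
def route(priority, failed, src, dst, max_hops=50):
    """Forward a packet using skipping rules. At node v, arriving via
    in-port p (None if the packet originates at v), try the links in
    priority[v][p] in order and take the first one that has not failed.
    `failed` is a set of frozenset({u, v}) undirected links."""
    node, in_port = src, None
    for _ in range(max_hops):
        if node == dst:
            return True
        for nxt in priority[node][in_port]:
            if frozenset((node, nxt)) not in failed:
                in_port, node = node, nxt   # forward; remember the in-port
                break
        else:
            return False                    # every incident link failed
    return False                            # forwarding loop: not resilient
```

On a triangle, for example, a sensible priority list reroutes around a failed link via the third node; the paper's characterization says exactly when such per-in-port lists can be chosen to survive every failure pattern that leaves source and target connected.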
Why we recommend this paper
Due to your interest in resilience

The paper's exploration of network resilience and fast rerouting mechanisms directly addresses the need for robust distributed systems. This research is relevant to your interest in resilience and optimizing system performance.
Innopolis University
AI Insights
  • Intent represents the desired system behavior, Evidence captures the actual system performance, and Governance ensures that the system meets its intended behavior. (ML: 0.98)
  • Intent: the desired system behavior. (ML: 0.94)
  • The EmaC framework consists of three main components: Intent, Evidence, and Governance. (ML: 0.92)
  • Emergence-as-Code (EmaC) proposes a novel approach to make end-to-end journey reliability computable from intent plus evidence. (ML: 0.90)
  • EmaC uses a compiler-controller interface to produce governance artifacts such as alerts, rollout gates, and constrained actions from intent and evidence. (ML: 0.88)
  • EmaC has several benefits, including improved reliability, reduced downtime, and increased efficiency. (ML: 0.86)
  • Reliability in microservices is emergent from interactions, yet current SLO practice remains mostly local. (ML: 0.81)
  • The EmaC framework is designed to be extensible and adaptable to different use cases and environments. (ML: 0.77)
Abstract
SLO-as-code has made per-service reliability declarative, but user experience is defined by journeys whose reliability is an emergent property of microservice topology, routing, redundancy, timeouts/fallbacks, shared failure domains, and tail amplification. As a result, journey objectives (e.g., "checkout p99 < 400 ms") are often maintained outside code and drift as the system evolves, forcing teams to either miss user expectations or over-provision and gate releases with ad-hoc heuristics. We propose Emergence-as-Code (EmaC), a vision for making journey reliability computable and governable via intent plus evidence. An EmaC spec declares journey intent (objective, control-flow operators, allowed actions) and binds it to atomic SLOs and telemetry. A runtime inference component consumes operational artifacts (e.g., tracing and traffic configuration) to synthesize a candidate journey model with provenance and confidence. From the last accepted model, the EmaC compiler/controller derives bounded journey SLOs and budgets under explicit correlation assumptions (optimistic independence vs. pessimistic shared fate), and emits control-plane artifacts (burn-rate alerts, rollout gates, action guards) that are reviewable in a Git workflow. An anonymized artifact repository provides a runnable example specification and generated outputs.
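The abstract's "bounded journey SLOs under explicit correlation assumptions" can be illustrated for the simplest case, a sequential chain of services. The sketch below is my own stand-in, not EmaC's compiler: it uses the independence product for the optimistic bound and the Fréchet lower bound (error budgets add) as a pessimistic worst-case-correlation stand-in, and `journey_slo` is a hypothetical name.

```python
def journey_slo(atomic, assumption="independent"):
    """Bound the availability of a sequential journey from its atomic
    service SLOs.
    - 'independent' (optimistic): services fail independently, so the
      journey succeeds with the product of the atomic availabilities.
    - 'shared_fate' (pessimistic stand-in): worst-case correlation via
      the Frechet lower bound, i.e. the atomic error budgets add."""
    if assumption == "independent":
        p = 1.0
        for a in atomic:
            p *= a
        return p
    budget = sum(1.0 - a for a in atomic)
    return max(0.0, 1.0 - budget)
```

For a checkout journey through three services with 99.9%, 99.5%, and 99% availability, the two assumptions bracket the achievable journey SLO; EmaC's contribution is deriving such bounds (for richer control-flow operators) automatically from the spec and keeping them in the Git review loop.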
Why we recommend this paper
Due to your interest in distributed systems

This paper tackles the emergent properties of reliability in complex systems, a critical aspect of building resilient distributed systems. Understanding and managing these emergent behaviors is essential for achieving your desired system characteristics.
Karlsruhe Institute of Technology
AI Insights
  • Arithmetic encoding: a method for compressing binary data by representing it as a single number. (ML: 0.92)
  • Compact representation: a data structure that uses less memory than the original data. (ML: 0.89)
  • Rank/select queries: rank counts the number of 1s (or 0s) before a given position in a bit vector; select returns the position of the i-th 1 (or 0). (ML: 0.86)
  • The proposed method is compared with existing methods, including Spider, Sux, and Movi, in terms of memory usage and query performance. (ML: 0.85)
  • Bit vectors: arrays of bits used to represent binary data. (ML: 0.84)
  • The paper develops a new data structure for rank/select queries on bit vectors. (ML: 0.79)
  • The proposed method is suitable for applications where memory is limited, such as embedded systems or mobile devices. (ML: 0.72)
  • The authors propose a compact representation of the data structure using a combination of bitwise operations and arithmetic encoding. (ML: 0.72)
  • The new data structure achieves better memory efficiency and faster query times than existing methods. (ML: 0.72)
Abstract
Given a text, a query $\mathsf{rank}(q, c)$ counts the number of occurrences of character $c$ among the first $q$ characters of the text. Space-efficient methods to answer these rank queries form an important building block in many succinct data structures. For example, the FM-index is a widely used data structure that uses rank queries to locate all occurrences of a pattern in a text. In bioinformatics applications, the goal is usually to process a given input as fast as possible. Thus, data structures should have high throughput when used with many threads. Contributions. For the binary alphabet, we develop BiRank with 3.28% space overhead. It merges the central ideas of two recent papers: (1) we interleave (inline) offsets in each cache line of the underlying bit vector [Laws et al., 2024], reducing cache-misses, and (2) these offsets are to the middle of each block so that only half of them need popcounting [Gottlieb and Reinert, 2025]. In QuadRank (14.4% space overhead), we extend these techniques to the $\sigma=4$ (DNA) alphabet. Both data structures require only a single cache miss per query, making them highly suitable for high-throughput and memory-bound settings. To enable efficient batch-processing, we support prefetching the cache lines required to answer upcoming queries. Results. BiRank and QuadRank are around $1.5\times$ and $2\times$ faster than similar-overhead methods that do not use inlining. Prefetching gives an additional $2\times$ speedup, at which point the dual-channel DDR4 RAM bandwidth becomes a hard limit on the total throughput. With prefetching, both methods outperform all other methods apart from SPIDER [Laws et al., 2024] by $2\times$. When using QuadRank with prefetching in a toy count-only FM-index, QuadFm, this results in a smaller size and up to $4\times$ speedup over Genedex, a state-of-the-art batching FM-index implementation.
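The "offsets to the middle of each block" trick from the abstract can be illustrated in pure Python, ignoring the cache-line interleaving that makes the real BiRank fast. `build`, `rank1`, and the 8-word (512-bit) block layout below are illustrative assumptions; the point is only that storing the 1s-count up to each block's midpoint lets a query popcount at most half a block, scanning forward or backward.

```python
W = 64       # bits per word
WORDS = 8    # words per 512-bit block (one cache line)

def build(words):
    """For each 512-bit block, store the global number of 1 bits that
    occur before the block's midpoint (mid-block offsets)."""
    mids, total = [], 0
    for b in range(0, len(words), WORDS):
        block = words[b:b + WORDS]
        half = sum(bin(w).count("1") for w in block[:WORDS // 2])
        mids.append(total + half)
        total += half + sum(bin(w).count("1") for w in block[WORDS // 2:])
    return mids

def rank1(words, mids, q):
    """Number of 1 bits among the first q bits (LSB-first within words)."""
    blk = q // (W * WORDS)
    word = q // W
    mid_word = blk * WORDS + WORDS // 2
    r = mids[blk]
    low = bin(words[word] & ((1 << (q % W)) - 1)).count("1")
    if word >= mid_word:                      # scan forward from midpoint
        for w in range(mid_word, word):
            r += bin(words[w]).count("1")
    else:                                     # scan backward to midpoint
        for w in range(word, mid_word):
            r -= bin(words[w]).count("1")
    return r + low
```

In the C++ setting these offsets are additionally inlined into the bit vector's cache lines, so the half-block popcount and the offset lookup cost a single cache miss together.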
Why we recommend this paper
Due to your interest in high throughput
New York University
AI Insights
  • Potential risks of biased or inaccurate information being provided by large language models. (ML: 0.99)
  • To ensure that these systems are used responsibly, researchers must prioritize human-AI collaboration, explainability, and transparency. (ML: 0.99)
  • However, there are concerns about the potential risks of relying on AI for emotional support and therapy. (ML: 0.98)
  • Limited understanding of the long-term effects of relying on AI for emotional support. (ML: 0.98)
  • Researchers have proposed various methods to mitigate these risks, including human-AI collaboration, transparency, and explainability. (ML: 0.98)
  • The integration of large language models in socially assistive robots has the potential to revolutionize mental health support, but it also raises important questions about accountability and transparency. (ML: 0.97)
  • Human-AI collaboration: methods for combining the strengths of humans and AI to achieve better outcomes. (ML: 0.97)
  • The use of large language models in socially assistive robots has shown promise in improving mental health and well-being. (ML: 0.97)
  • Large language models: AI systems trained on vast amounts of text data to generate human-like responses. (ML: 0.95)
  • Socially assistive robots: robots designed to interact with humans in a way that is intended to be helpful or supportive. (ML: 0.89)
Abstract
Large language models (LLMs) are being integrated into socially assistive robots (SARs) and other conversational agents providing mental health and well-being support. These agents are often designed to sound empathic and supportive in order to maximize user's engagement, yet it remains unclear how increasing the level of supportive framing in system prompts influences safety relevant behavior. We evaluated 6 LLMs across 3 system prompts with varying levels of supportiveness on 80 synthetic queries spanning 4 well-being domains (1440 responses). An LLM judge framework, validated against human ratings, assessed safety and care quality. Moderately supportive prompts improved empathy and constructive support while maintaining safety. In contrast, strongly validating prompts significantly degraded safety and, in some cases, care across all domains, with substantial variation across models. We discuss implications for prompt design, model selection, and domain specific safeguards in SARs deployment.
Why we recommend this paper
Due to your interest in resilience