Carnegie Mellon University
Abstract
For streaming speech recognition, a Transformer-based encoder has been widely
used with block processing. Although many studies have addressed the emission
latency of transducers, little work has explored reducing the encoding
latency of block processing. We seek to reduce latency by frequently
emitting chunks with a small shift rather than infrequently emitting large
chunks, although this increases the computational cost. To compute efficiently with the small
chunk shift, we propose a new encoder, Spiralformer, tailored for block
processing by combining layer dropping and early exiting. We skip layer
computation in a cyclic manner and shift the computed layers spirally from
block to block, so that computation of all layers is completed over the course
of block processing. Experimentally, our method achieved a 21.6% reduction in
the average token emission delay on LibriSpeech, and 7.0% on CSJ, compared
with a baseline with similar computational cost and word error rates.
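
The cyclic layer skipping described above can be illustrated with a small scheduling sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `spiral_schedule`, the `period` parameter, and the rule "layer l is computed in block b when (l - b) is a multiple of the period" are all hypothetical choices made here to show how a spiral shift can cover every layer over consecutive blocks.

```python
def spiral_schedule(num_layers: int, period: int, num_blocks: int) -> list[list[int]]:
    """Return, for each block index, the layers computed in that block.

    Hypothetical rule (an assumption, not taken from the paper): layer l is
    computed in block b when (l - b) is a multiple of `period`, so the set of
    computed layers shifts up by one layer from each block to the next.
    """
    schedule = []
    for block in range(num_blocks):
        active = [layer for layer in range(num_layers)
                  if (layer - block) % period == 0]
        schedule.append(active)
    return schedule


# Six layers, every third layer computed per block, three consecutive blocks.
schedule = spiral_schedule(num_layers=6, period=3, num_blocks=3)
for block, layers in enumerate(schedule):
    print(f"block {block}: compute layers {layers}")
```

Under this assumed rule, each block computes only `num_layers / period` layers, yet every layer is computed exactly once over `period` consecutive blocks.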
ETH Zurich, University of
Abstract
The widespread availability of cellular devices introduces new threat vectors
that allow users or attackers to bypass security policies and physical barriers
and bring unauthorized devices into sensitive areas. These threats can arise
from user non-compliance or deliberate actions aimed at data
exfiltration/infiltration via hidden devices, drones, etc. We identify a
critical gap in this context: the absence of low-latency systems for
high-quality and instantaneous monitoring of cellular transmissions. Such
low-latency systems are crucial to allow for timely detection, decision (e.g.,
geofencing or localization), and disruption of unauthorized communication in
sensitive areas. Operator-based monitoring systems, built for purposes such as
people counting or tracking, lack real-time capability, require cooperation
across multiple operators, and thus are hard to deploy. Operator-independent
monitoring approaches proposed in the literature either lack low-latency
capabilities or do not scale.
We propose LTag, the first low-latency, operator-independent and scalable
system designed to monitor cellular connections across all operators prior to
any user data transmission. LTag consists of several downlink sniffers and a
distributed network of uplink sniffers that measure both downlink protocol
information and uplink signal characteristics at multiple locations to gain a
detailed spatial image of uplink signals. LTag aggregates the recorded
information, processes it, and provides a decision about the connection, all
before the connection of the user equipment (UE) is established. To evaluate LTag, we deployed it in
the context of geofencing, where LTag was able to determine whether a signal
originated inside or outside an area within 2.3 ms of the initial base
station-to-device message, thereby enabling prompt and targeted suppression
of communication before any user data was transmitted.
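
As a rough illustration of the aggregate-and-decide step in a geofencing setting, the sketch below classifies a transmission from per-sniffer uplink power readings. It is a toy example under stated assumptions and not LTag's actual decision method: the `UplinkReading` type, the `inside_area` flag, the `margin_db` threshold, and the strongest-reading comparison are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UplinkReading:
    sniffer_id: str
    inside_area: bool   # hypothetical: is this sniffer placed inside the geofence?
    power_dbm: float    # received uplink power measured by this sniffer


def classify_origin(readings: list[UplinkReading], margin_db: float = 3.0) -> str:
    """Toy rule (an assumption): compare the strongest inside and outside readings."""
    inside = [r.power_dbm for r in readings if r.inside_area]
    outside = [r.power_dbm for r in readings if not r.inside_area]
    if not inside or not outside:
        return "unknown"  # cannot compare without readings from both groups
    # Declare "inside" only if an inside sniffer hears the signal clearly louder.
    return "inside" if max(inside) - max(outside) >= margin_db else "outside"


readings = [
    UplinkReading("in-1", True, -60.0),
    UplinkReading("in-2", True, -72.0),
    UplinkReading("out-1", False, -75.0),
]
decision = classify_origin(readings)
print(decision)
```

A real deployment would likely need calibration, more robust spatial features, and protocol information from the downlink sniffers; the fixed margin here only stands in for that decision logic.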