Abstract
Artificial intelligence (AI) is expected to serve as a foundational
capability across the entire lifecycle of 6G networks, spanning design,
deployment, and operation. This article proposes a native AI-driven air
interface architecture built around two core characteristics: compression and
adaptation. Compression enables the system to understand and
extract essential semantic information from the source data, focusing on task
relevance rather than symbol-level accuracy. Adaptation, in turn,
allows the air interface to dynamically transmit semantic information across
diverse tasks, data types, and channel conditions, ensuring scalability and
robustness. This article first introduces the native AI-driven air interface
architecture, then discusses representative enabling methodologies, and then
presents a case study on semantic communication in 6G non-terrestrial networks. Finally,
it presents a forward-looking discussion on the future of native AI in 6G,
outlining key challenges and research opportunities.
Abstract
The increasing adoption of low-cost environmental sensors and AI-enabled
applications has accelerated the demand for scalable and resilient data
infrastructures, particularly in data-scarce and resource-constrained regions.
This paper presents the design, implementation, and evaluation of the AirQo
data pipeline: a modular, cloud-native Extract-Transform-Load (ETL) system
engineered to support both real-time and batch processing of heterogeneous air
quality data across urban deployments in Africa. Built on open-source
technologies such as Apache Airflow and Apache Kafka, together with Google
BigQuery, the pipeline integrates diverse data streams from low-cost sensors, third-party
weather APIs, and reference-grade monitors to enable automated calibration,
forecasting, and accessible analytics. We demonstrate the pipeline's ability to
ingest, transform, and distribute millions of air quality measurements monthly
from over 400 monitoring devices while achieving low latency, high throughput,
and robust data availability, even under constrained power and connectivity
conditions. The paper details key architectural features, including workflow
orchestration, decoupled ingestion layers, machine learning-driven sensor
calibration, and observability frameworks. Performance is evaluated across
operational metrics such as resource utilization, ingestion throughput,
calibration accuracy, and data availability, offering practical insights into
building sustainable environmental data platforms. By open-sourcing the
platform and documenting deployment experiences, this work contributes a
reusable blueprint for similar initiatives seeking to advance environmental
intelligence through data engineering in low-resource settings.