University of Electronic
Abstract
In recent years, there has been a growing trend in computer vision towards
exploiting RAW sensor data, which preserves richer information compared to
conventional low-bit RGB images. Early studies mainly focused on enhancing
visual quality, while more recent efforts aim to leverage the abundant
information in RAW data to improve the performance of visual perception tasks
such as object detection and segmentation. However, existing approaches still
face two key limitations: large-scale ISP networks impose heavy computational
overhead, while methods based on tuning traditional ISP pipelines are
restricted by limited representational capacity. To address these issues, we
propose Task-Aware Image Signal Processing (TA-ISP), a compact RAW-to-RGB
framework that produces task-oriented representations for pretrained vision
models. Instead of heavy dense convolutional pipelines, TA-ISP predicts a small
set of lightweight, multi-scale modulation operators that act at global,
regional, and pixel scales to reshape image statistics across different spatial
extents. This factorized control significantly expands the range of spatially
varying transforms that can be represented while keeping memory usage,
computation, and latency tightly constrained. Evaluated on several RAW-domain
detection and segmentation benchmarks under both daytime and nighttime
conditions, TA-ISP consistently improves downstream accuracy while markedly
reducing parameter count and inference time, making it well suited for
deployment on resource-constrained devices.
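
The idea of modulating an image at global, regional, and pixel scales can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the form of the operators, so the sketch assumes purely multiplicative gains on a single-channel image, and it takes the gains as fixed inputs rather than predicting them with a network. The function name `modulate` and all of its parameters are hypothetical.

```python
def modulate(image, global_gain, regional_gains, pixel_gains):
    """Apply global, regional, and per-pixel multiplicative gains to a 2-D image.

    image          : H x W list of lists of floats
    global_gain    : one scalar shared by the whole image
    regional_gains : Gh x Gw coarse grid; each cell covers a block of pixels
    pixel_gains    : H x W map with one gain per pixel

    Assumption: the operators are multiplicative gains. The real TA-ISP
    operators may take a different form.
    """
    height, width = len(image), len(image[0])
    grid_h, grid_w = len(regional_gains), len(regional_gains[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Nearest-neighbour lookup of the coarse cell covering this pixel.
            gy = y * grid_h // height
            gx = x * grid_w // width
            gain = global_gain * regional_gains[gy][gx] * pixel_gains[y][x]
            row.append(image[y][x] * gain)
        out.append(row)
    return out


# Toy 4x4 image with a 2x2 regional grid and a single boosted pixel.
image = [[1.0] * 4 for _ in range(4)]
regional = [[1.0, 2.0], [0.5, 1.0]]
pixel = [[1.0] * 4 for _ in range(4)]
pixel[0][0] = 3.0

result = modulate(image, 2.0, regional, pixel)
print(result[0][0], result[0][3], result[3][0], result[3][3])
```

In this toy setup the global and regional operators together need only 1 + 4 values, yet combined with the pixel map they produce a spatially varying transform. This is the sense in which factorized control can stay compact.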
AI Insights
- TA‑ISP learns a tiny set of multi‑scale modulation operators that reshape image statistics at global, regional, and pixel levels, enabling rich spatial transforms without heavy convolutions.
- Mask layers act as task‑specific attention maps, selectively amplifying or suppressing features to match the downstream detector’s receptive field.
- Ablation shows removing regional modulation drops nighttime detection mAP by 3.2%, proving its role in low‑light robustness.
- Compared to RAW‑Adapter and InvISP, TA‑ISP is 1.8× faster with only 0.4 M extra parameters, ideal for edge devices.
- Open‑source code and visualizations illustrate how modulation operators adjust color constancy and contrast across scenes.
- The ISP can be tuned on‑device for autonomous driving or surveillance, prioritizing pedestrian or license‑plate detection.
- For background, explore RAW‑Adapter’s mapping and InvISP’s inverse‑learning framework, both key to TA‑ISP’s design.