InstaDeep
Abstract
Open-access multispectral imagery from missions like Landsat 8-9 and
Sentinel-2 has fueled the development of geospatial foundation models (GFMs)
for humanitarian and environmental applications. Yet, their deployment remains
limited by (i) the absence of automated geospatial data pipelines and (ii) the
large size of fine-tuned models. Existing GFMs lack workflows for processing
raw satellite imagery, and downstream adaptations often retain the full
complexity of the original encoder.
We present InstaGeo, an open-source, end-to-end framework that addresses
these challenges by integrating: (1) automated data curation to transform raw
imagery into model-ready datasets; (2) task-specific model distillation to
derive compact, compute-efficient models; and (3) seamless deployment as
interactive web-map applications. Using InstaGeo, we reproduced datasets from
three published studies and trained models whose mIoU differs only marginally
from the original studies: -0.73 pp for flood mapping, -0.20 pp for crop
segmentation, and +1.79 pp for desert locust prediction. The distilled models
are up to 8× smaller than standard fine-tuned counterparts, reducing FLOPs and
CO₂ emissions with minimal accuracy loss.
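The abstract does not state InstaGeo's distillation objective. As a hedged illustration, the sketch below assumes a standard Hinton-style knowledge-distillation loss applied per pixel: a large fine-tuned GFM acts as the teacher and a compact model as the student. The function names `log_softmax` and `pixel_distillation_loss` and the `temperature` and `alpha` parameters are hypothetical and are not taken from InstaGeo.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed with the log-sum-exp trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def pixel_distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """Mean per-pixel loss: (1 - alpha) * hard-label CE + alpha * T^2 * KL(teacher || student).

    Hypothetical sketch: row i of student_logits / teacher_logits holds the class
    logits of pixel i, and labels[i] is that pixel's ground-truth class index.
    """
    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, labels):
        # Supervised cross-entropy against the ground-truth mask (temperature 1).
        ce = -log_softmax(s)[y]
        # Soft targets: KL divergence between temperature-softened distributions.
        log_pt = log_softmax(t, temperature)
        log_ps = log_softmax(s, temperature)
        kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
        # Scaling by T^2 keeps soft-target gradients comparable across temperatures.
        total += (1 - alpha) * ce + alpha * temperature ** 2 * kl
    return total / len(labels)


# Usage: two pixels, two classes; the teacher is more confident than the student.
teacher = [[2.0, 0.0], [0.0, 3.0]]
student = [[0.5, 0.0], [0.0, 1.0]]
print(pixel_distillation_loss(student, teacher, labels=[0, 1]))
```

The `alpha` weight lets the compact student learn from both the labels and the teacher's softened per-class probabilities. A practical implementation would vectorize this over whole image tensors in a deep-learning framework rather than loop over pixels.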
Leveraging InstaGeo's streamlined data pipeline, we also curated a larger
crop segmentation dataset, achieving a state-of-the-art mIoU of 60.65%, a 12 pp
improvement over prior baselines. Moreover, InstaGeo enables users to progress
from raw data to model deployment within a single working day.
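The results above are reported as mIoU, and the differences are given in percentage points (pp). These are absolute differences on the 0-100 scale, not relative percentages. The following is a minimal sketch of a standard mIoU computation over flattened label maps. It assumes that classes absent from both prediction and ground truth are excluded from the average and that an `ignore_index` marks unlabeled pixels. The paper's exact averaging convention is not stated here, and the names `mean_iou` and `pp_difference` are hypothetical.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int, ignore_index: int = -1) -> float:
    """Mean intersection-over-union over classes that appear in pred or target."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:  # skip unlabeled pixels
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatched pixel enlarges the union of both the predicted and true class.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


def pp_difference(miou_new: float, miou_ref: float) -> float:
    """Absolute difference in percentage points, with mIoU given as a fraction in [0, 1]."""
    return 100.0 * (miou_new - miou_ref)


# Usage: class 0 has IoU 1/2 and class 1 has IoU 2/3, so mIoU is their mean.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```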
By unifying data preparation, model compression, and deployment, InstaGeo
transforms research-grade GFMs into practical, low-carbon tools for real-time,
large-scale Earth observation. This approach shifts the focus of geospatial AI
toward data quality and application-driven innovation. Source code, datasets, and model
checkpoints are available at:
https://github.com/instadeepai/InstaGeo-E2E-Geospatial-ML.git
AI Insights
- InstaGeo supports multi-temporal foundation models such as Prithvi-EO-2.0, enabling land-cover, crop, and climate analysis from diverse satellite sources.
- Its pipeline ingests raw Landsat, Sentinel-2, and MODIS imagery and automatically converts it into model-ready datasets.
- Task-specific knowledge distillation produces models up to eight times smaller than standard fine-tuned GFMs, cutting FLOPs and CO₂ emissions with minimal accuracy loss.
- All code, datasets, and distilled checkpoints are open source on GitHub, supporting reproducibility.
- Researchers can progress from raw data to a model deployed as an interactive web map within a single working day.
- InstaGeo unifies data curation, model compression, and deployment, steering geospatial AI toward low-carbon, data-quality-driven solutions.
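The summary says the pipeline converts raw imagery into model-ready datasets but does not describe how. One step that such pipelines commonly include is tiling a large scene into fixed-size chips and discarding chips with too much missing data. The sketch below illustrates that step on a single band under that assumption. The function `chip_scene` and its parameters `chip_size`, `nodata`, and `max_nodata_frac` are hypothetical and are not InstaGeo's API.

```python
from typing import List, Tuple

Grid = List[List[float]]


def chip_scene(band: Grid, chip_size: int, nodata: float, max_nodata_frac: float = 0.0) -> List[Tuple[int, int, Grid]]:
    """Split a 2-D raster band into non-overlapping chip_size x chip_size tiles.

    Hypothetical sketch: returns (row_offset, col_offset, chip) tuples. Partial
    edge tiles are dropped, and tiles whose no-data fraction exceeds
    max_nodata_frac are skipped.
    """
    rows, cols = len(band), len(band[0])
    chips = []
    for r in range(0, rows - chip_size + 1, chip_size):
        for c in range(0, cols - chip_size + 1, chip_size):
            chip = [row[c:c + chip_size] for row in band[r:r + chip_size]]
            # Count missing pixels to decide whether the chip is usable for training.
            n_nodata = sum(v == nodata for row in chip for v in row)
            if n_nodata / (chip_size * chip_size) <= max_nodata_frac:
                chips.append((r, c, chip))
    return chips


# Usage: a 4x4 band with one no-data pixel in the top-left corner.
band = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
band[0][0] = -9999.0
for r, c, _ in chip_scene(band, chip_size=2, nodata=-9999.0):
    print(r, c)
```

A real multispectral pipeline would apply the same window to every band and to the label mask so that chips stay spatially aligned. Reading, reprojection, and cloud masking would typically be handled by a geospatial library such as rasterio.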