AI-powered satellite monitoring for early wildfire detection in Indonesia

Overview
Indonesia loses more than 21 million hectares of forest and peatland to fire each year. Over 100,000 hotspots are detected annually by NASA satellite instruments, yet translating that raw data into timely, actionable alerts has historically required manual analysis, costly infrastructure, or both.

wildfire-detection is an open-source project built by Itsavirus to change that. It ingests NASA FIRMS satellite data, applies Isolation Forest anomaly detection across a hexagonal spatial grid, and delivers ranked daily alerts. The whole pipeline runs in under an hour, is fully automated, and is free to self-host.

The project is deployed and live for Indonesia, with a public dashboard and an MIT-licensed codebase. It is designed to be adaptable to any geography where NASA FIRMS data is available.
The Problem
Satellite fire data is plentiful. NASA FIRMS makes near real-time hotspot detections freely available for the entire globe. The bottleneck has never been data access — it has been what happens next.

Raw hotspot records carry no context. A single detection could be agricultural burning, an industrial flare, or the start of a major wildfire. Without a way to distinguish anomalous behaviour from baseline activity, and to rank what matters most on any given day, the data overwhelms rather than informs.

The challenge was not detecting fire. Satellites already do that. The challenge was detecting fires that matter: anomalies that deviate meaningfully from what is normal for a given location and time.

Existing approaches to this problem typically rely on fixed thresholds: flag anything above X hotspots, or above Y fire radiative power. Those thresholds work in some regions and fail in others. They cannot adapt to seasonal variation, geographic heterogeneity, or the difference between a controlled burn and a spreading wildfire.
Our Approach
We built a four-stage pipeline that moves from raw satellite data to ranked, validated alerts without any manual intervention.
01
Data ingestion
NASA FIRMS satellite data is pulled daily via an automated pipeline. The VIIRS (375 m) and MODIS (1 km) instruments provide near real-time fire detections across Indonesia's entire landmass, twice daily.
VIIRS · MODIS · PostgreSQL
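A minimal sketch of the ingestion step, assuming the FIRMS "area" API URL pattern (`/api/area/csv/{MAP_KEY}/{SOURCE}/{west,south,east,north}/{DAY_RANGE}`, which requires a free FIRMS MAP_KEY) and its CSV column names `latitude`, `longitude`, `acq_date` and `frp`. The bounding box, the helper names `firms_area_url`, `fetch_firms_csv` and `parse_firms_csv`, and the choice of columns kept are illustrative assumptions rather than the project's code; check the current FIRMS API documentation before relying on them. Loading into PostgreSQL is left out.

```python
import csv
import io
import urllib.request

# URL pattern of the FIRMS "area" API (assumed; verify against the FIRMS docs).
FIRMS_AREA_URL = (
    "https://firms.modaps.eosdis.nasa.gov/api/area/csv/"
    "{key}/{source}/{bbox}/{days}"
)

# Approximate Indonesian bounding box (west, south, east, north); an assumption.
INDONESIA_BBOX = (95.0, -11.0, 141.0, 6.0)


def firms_area_url(map_key, source, bbox=INDONESIA_BBOX, days=1):
    """Build the request URL for one sensor, e.g. "VIIRS_SNPP_NRT" or "MODIS_NRT"."""
    west, south, east, north = bbox
    return FIRMS_AREA_URL.format(
        key=map_key,
        source=source,
        bbox=f"{west},{south},{east},{north}",
        days=days,
    )


def fetch_firms_csv(url, timeout=60):
    """Download the raw CSV text (a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_firms_csv(text):
    """Keep only the fields later stages need: position, date and FRP."""
    return [
        {
            "lat": float(row["latitude"]),
            "lon": float(row["longitude"]),
            "acq_date": row["acq_date"],
            "frp": float(row["frp"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]
```

With a valid key, `parse_firms_csv(fetch_firms_csv(firms_area_url(MAP_KEY, "VIIRS_SNPP_NRT")))` would yield one record per detection for the past day; repeating the call with a MODIS source such as `"MODIS_NRT"` covers the second instrument.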
02
Spatial aggregation
Raw hotspot coordinates are mapped into H3 hexagonal cells at Resolution 7 (~5 km² per cell). All detections within the same cell on the same day are aggregated, and neighbour activity across the six adjacent cells is calculated.
H3 · PostGIS · Pandas
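A sketch of the aggregation step in plain Python, assuming each detection is a dict with `lat`, `lon`, `acq_date` and `frp`. The cell lookup and neighbour lookup are passed in as functions so the logic stays testable; with the `h3` package (v4 API) they would be `lambda lat, lon: h3.latlng_to_cell(lat, lon, 7)` and `lambda cell: h3.grid_ring(cell, 1)`. Defining neighbour activity as the number of adjacent cells with at least one detection on the same day is our assumption, and the function name `aggregate_cell_days` is hypothetical; the project itself lists PostGIS and pandas for this stage.

```python
from collections import defaultdict


def aggregate_cell_days(detections, to_cell, neighbours):
    """Aggregate detections per (cell, day) and count active adjacent cells.

    to_cell:    function (lat, lon) -> cell id
    neighbours: function cell id -> iterable of adjacent cell ids
    """
    stats = defaultdict(lambda: {"count": 0, "total_frp": 0.0, "max_frp": 0.0})
    for d in detections:
        key = (to_cell(d["lat"], d["lon"]), d["acq_date"])
        s = stats[key]
        s["count"] += 1
        s["total_frp"] += d["frp"]
        s["max_frp"] = max(s["max_frp"], d["frp"])

    active = set(stats)  # every (cell, day) with at least one detection
    for (cell, day), s in stats.items():
        # Neighbour activity: adjacent cells that also had detections that day.
        s["neighbour_active"] = sum((n, day) in active for n in neighbours(cell))
    return dict(stats)
```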
03
Anomaly detection
An Isolation Forest model trained on six temporal and spatial features — including Fire Radiative Power, day-over-day delta, 7-day rolling ratio, and neighbour activity — flags cell-days that deviate significantly from learned baselines.
Scikit-learn · NumPy
04
Alert generation
ML anomaly scores are cross-validated against spatial coherence. Alerts are ranked daily by a hybrid score (70% ML weight, 30% spatial coherence) and classified into four severity tiers before delivery to the dashboard.
Top-K · coherence scoring
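The final ranking step can be sketched as a sort over hybrid scores followed by a tier lookup. The four tier names, their score cutoffs, the default K of 20 and the helper names are hypothetical placeholders, since the project's actual tier definitions are not described here.

```python
# Hypothetical tier cutoffs on the hybrid score, highest first.
TIERS = [(0.8, "critical"), (0.6, "high"), (0.4, "medium"), (0.0, "low")]


def severity(score):
    """Map a hybrid score in [0, 1] to one of four (hypothetical) tiers."""
    for cutoff, name in TIERS:
        if score >= cutoff:
            return name
    return "low"


def top_k_alerts(scored_cells, k=20):
    """scored_cells: list of (cell_id, hybrid_score). Return the K highest with a tier."""
    ranked = sorted(scored_cells, key=lambda item: item[1], reverse=True)
    return [(cell, score, severity(score)) for cell, score in ranked[:k]]
```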
Core Deliverables
Why Isolation Forest?

Isolation Forest is an unsupervised machine learning algorithm designed specifically for anomaly detection. Unlike supervised models, it requires no labelled examples of wildfires. Instead, it learns what normal satellite activity looks like for a given area over time, then flags anything that deviates significantly from that baseline.

The core mechanic is elegant:

The model builds 100 random isolation trees, each on a random subsample of the data, by repeatedly splitting on a randomly selected feature at a random value within that feature's range. Anomalies get isolated in very few splits because they are statistically unusual; normal data points require many more. The average path length across all 100 trees is then normalised into an anomaly score, with shorter average paths meaning more anomalous cell-days.
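To make the path-length idea concrete, here is a deliberately small from-scratch isolation forest in plain Python. It follows the original Isolation Forest formulation (random feature, random split value, subsamples of 256 points, path lengths normalised by c(n)), but it is an illustration of the technique, not the project's implementation, which uses scikit-learn. All function names are our own.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points.
    Normalises path lengths and credits leaves that still hold several points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random value in its range."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    f = rng.randrange(len(rows[0]))
    lo = min(r[f] for r in rows)
    hi = max(r[f] for r in rows)
    if lo == hi:  # cannot split on a constant feature
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[f] < split]
    right = [r for r in rows if r[f] >= split]
    return ("node", f, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Splits needed to reach x's leaf, plus c(leaf size)."""
    if node[0] == "leaf":
        return depth + avg_path_length(node[1])
    _, f, split, left, right = node
    return path_length(x, left if x[f] < split else right, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees on random subsamples (needs >= 2 rows)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, forest):
    """s = 2 ** (-mean path length / c(psi)); values near 1 are anomalous."""
    trees, psi = forest
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / avg_path_length(psi))
```

In scikit-learn the equivalent model is `IsolationForest(n_estimators=100)`; note that its `score_samples` method returns the negative of this score, so lower values there mean more anomalous.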

The model runs on six input features per cell-day:

Hotspot count, total Fire Radiative Power (FRP), maximum FRP, day-over-day delta, ratio versus the 7-day rolling average, and neighbour activity. No hand-tuned per-feature thresholds are set. The only calibration choice is the expected anomaly rate: the model flags the most anomalous 10% of cell-days according to its learned patterns.
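A sketch of how the six features might be assembled for one cell's daily history, followed by the "flag the most anomalous 10%" step. The exact definitions are assumptions: days without detections are filled in with zeros, the day-over-day delta and the 7-day ratio are computed on hotspot counts, the rolling average covers the seven days before the current one, and the ratio uses add-one smoothing (`count / (avg + 1)`) so cells with no recent history do not divide by zero. The helper names are hypothetical.

```python
import math


def cell_features(daily, neighbour_active):
    """Build one 6-feature row per day for a single cell.

    daily:            list of (count, total_frp, max_frp), oldest day first
    neighbour_active: list of active adjacent-cell counts, same length
    """
    rows = []
    for t, (count, total_frp, max_frp) in enumerate(daily):
        prev_count = daily[t - 1][0] if t > 0 else 0
        window = [d[0] for d in daily[max(0, t - 7):t]]  # up to 7 previous days
        rolling_avg = sum(window) / len(window) if window else 0.0
        rows.append([
            count,                        # hotspot count
            total_frp,                    # total Fire Radiative Power
            max_frp,                      # maximum FRP
            count - prev_count,           # day-over-day delta
            count / (rolling_avg + 1.0),  # ratio vs 7-day rolling average (smoothed)
            neighbour_active[t],          # neighbour activity
        ])
    return rows


def flag_top_fraction(scores, fraction=0.10):
    """Indices of the most anomalous `fraction` of scores (higher = more anomalous)."""
    k = max(1, math.ceil(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])
```

Rows from every cell-day would then be stacked and scored together. In scikit-learn, `IsolationForest(contamination=0.1)` applies a comparable 10% cut-off through its `predict` method, which labels outliers as -1.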

Why H3 hexagonal spatial indexing?

Rather than working with raw latitude/longitude coordinates, we map every hotspot into Uber's H3 hexagonal grid at Resolution 7, approximately 5 km² per cell. Hexagons offer a specific mathematical advantage over square grids: every hexagon has exactly six neighbours whose centres are all the same distance away, so spatial spread is measured consistently in every direction. A square grid, by contrast, mixes edge neighbours and corner neighbours at different distances.
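The equidistance property is easy to check numerically. This sketch uses an idealised flat hexagonal lattice in axial coordinates (a simplification: real H3 cells live on the sphere and vary slightly in size) and compares centre-to-centre neighbour distances with those of a square grid. The names are our own.

```python
import math

# Axial-coordinate offsets of the six neighbours of a hexagon.
HEX_NEIGHBOURS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]
# Eight neighbours of a square cell (four edge, four corner).
SQUARE_NEIGHBOURS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0)]


def hex_centre(q, r, size=1.0):
    """Cartesian centre of a pointy-top hexagon with circumradius `size`."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def distinct_distances(points):
    """Distinct distances from the origin, rounded to absorb float noise."""
    return {round(math.hypot(x, y), 9) for x, y in points}


hex_d = distinct_distances(hex_centre(q, r) for q, r in HEX_NEIGHBOURS)
square_d = distinct_distances(SQUARE_NEIGHBOURS)
print(sorted(hex_d))     # [1.732050808]
print(sorted(square_d))  # [1.0, 1.414213562]
```

On a square grid, deciding whether corner cells count as adjacent changes any neighbour-based measure; on a hexagonal grid the question never arises.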

Hybrid scoring: ML + spatial coherence

A pure ML anomaly score can surface statistical outliers that are not wildfires. The hybrid scoring layer cross-validates every ML flag against its geographic context. Each alert receives a final score that weights the ML anomaly score at 70% and spatial coherence at 30%, where spatial coherence reflects how many of the six adjacent cells also show hotspot or anomalous activity on the same day. Alerts with zero active neighbours are flagged for manual review rather than surfaced as high-priority events.
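The hybrid score reduces to a weighted sum. This sketch assumes the ML anomaly score has already been scaled to [0, 1] and that spatial coherence is the fraction of the six adjacent cells active on the same day; both normalisations, the function name `hybrid_alert` and the `needs_review` status label are our assumptions.

```python
ML_WEIGHT = 0.7
COHERENCE_WEIGHT = 0.3


def hybrid_alert(ml_score, active_neighbours, n_neighbours=6):
    """Combine an ML score in [0, 1] with spatial coherence; return (score, status)."""
    coherence = active_neighbours / n_neighbours
    score = ML_WEIGHT * ml_score + COHERENCE_WEIGHT * coherence
    # Isolated detections with no active neighbours go to manual review.
    status = "needs_review" if active_neighbours == 0 else "alert"
    return score, status
```

For example, an ML score of 0.9 with three active neighbours gives 0.7 × 0.9 + 0.3 × 0.5 = 0.78.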

Outcome
This project is a proof of concept for a broader approach: applying modern ML techniques to freely available public data to build infrastructure that would otherwise require significant investment in rules-based engineering and domain expertise.

The same pipeline — anomaly detection on geospatial time-series data, with spatial coherence validation — is applicable well beyond wildfire monitoring. Flood inundation, agricultural stress signals, illegal deforestation: the pattern transfers wherever satellite data meets a need for early anomaly detection at scale.

By open-sourcing the project, Itsavirus contributes a working, deployable reference implementation that development teams and research organisations can adapt to their own geography and data sources. The MIT licence permits commercial and institutional use, requiring only that the copyright and licence notice be retained.

For organisations considering AI integration in their own operations, this project illustrates what practical AI engineering looks like: well-defined inputs, a defensible model choice, spatial reasoning built into the architecture, and a clear path from raw data to actionable output.