● STATIC · PIPELINE REFERENCE sar_toolkit · v0.1 · banda_aceh

/ METHOD · END-TO-END

From a raw radar archive to the flood map on the case page.

Six stages. Six real scripts. Six artefacts on disk. This page walks the exact WSL workflow that produced every image in the Banda Aceh case, ending with a scrubbable view of the training run that built the model itself.

Students learn what each button in a SAR pipeline actually does, and why the numbers on the case page move.
Teachers get a ready-made 8-minute walkthrough with copy-pasteable commands and a “check yourself” prompt per stage.
Reviewers can verify that every image shown on this site is the lossless output of a script linked below.

/ LEARNING OBJECTIVES

By the end of this walkthrough you should be able to:

Name the 5 transformations a raw Sentinel-1 archive goes through before a model sees it.
Explain why clamp normalization is one of the most sensitive knobs in a SAR flood pipeline.
Read a pairwise agreement matrix without being fooled by the big number.
Point at the epoch in a training run that produced the deployed checkpoint — and justify why.

/ PREREQUISITES

✓ You need

Python basics (run a script, read an import)
Know what a convolutional layer is, roughly
A terminal

✗ You don't need

SNAP installed — intermediate TIFs are shipped
A GPU — the case outputs are pre-computed
Prior SAR experience — glossary is at the bottom

6 pipeline stages · 1 reproduce-no-sar command · 37 training epochs scrubbable

/ JUMP TO

Big-picture flow diagram
00 · Environment
01 · SNAP preprocessing
02 · Tile dataset · sample-a-tile ▸
03 · Normalization stats · clamp playground ▸
04 · Inference · run live ▸
05 · Validation · agreement matrix ▸
↳ Training trajectory
Glossary (8 terms)

/ THE WHOLE THING AT ONCE

Five transformations.
Raw radar bytes → a map you can hand to a responder.

Before zooming into any single stage, keep this shape in your head. Every box below is a real file or folder on disk; every arrow is a real script. The stages further down the page zoom into each arrow one at a time.

3 × SAFE.zip · raw Sentinel-1A · 20251021 · 20251102 · 20251126
↓ 01 · SNAP
3 × GeoTIFF · calibrated · WGS84 · VV + VH · 2-band · terrain-corrected
↓ 02 · Tile
Tiles + PKL · 224² patches · KuroSiwo v2 layout · grid_dict.pkl
↓ 03 · Stats
Stats + CKPT · normalization + weights · mean/std × 3 clamps · best_model.pt
↓ 04 · Predict
4 × GeoTIFF · flood predictions · 0=land · 1=water · 2=flood
↓ 05 · Validate
Report · validation artefacts · comparison PNG · agreement JSON · diff maps

Legend: raw input (given to you) · intermediate (produced by the pipeline) · final output (what you ship)

00 / ENVIRONMENT ~1 min read

One dispatcher, five real scripts.

Every stage on this page maps to a concrete file under sar_toolkit/. A single dispatcher (run_banda_aceh_pipeline.py) owns the step→script table, and a small env-var contract pins inputs and outputs without hard-coding WSL paths.

script sar_toolkit/run_banda_aceh_pipeline.py

The toolkit runs on a WSL2 Ubuntu box with SNAP, GDAL, PyTorch and a CUDA GPU. Everything is pinned in environment-sar-toolkit.yml. A single env var — ASIA_FLOOD_BASE_DIR — points at the working tree that holds raw SAFE archives, intermediate TIFs, the pickle index, and the checkpoint. Set it once, never edit code paths again.

Each stage below is a first-class script, not a notebook cell. That matters for teaching: students can run one stage, inspect outputs/, then run the next with confidence that nothing upstream is hiding in memory.
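Here is a minimal sketch of the single-env-var contract. It assumes every script resolves its inputs relative to the base directory. base_dir and data_path are hypothetical helpers, not functions the toolkit is known to expose. The dispatcher synopsis below simply reads os.environ["ASIA_FLOOD_BASE_DIR"], which raises a bare KeyError when the variable is missing.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "ASIA_FLOOD_BASE_DIR"


def base_dir(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the working-tree root named by ASIA_FLOOD_BASE_DIR, or fail loudly."""
    env = os.environ if environ is None else environ
    value = env.get(ENV_VAR, "")
    if not value:
        raise RuntimeError(
            f"{ENV_VAR} is not set; run e.g. export {ENV_VAR}=/home/yang/asia_flood_base"
        )
    return Path(value).expanduser()


def data_path(*parts: str, environ: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve a location inside the working tree instead of hard-coding it (hypothetical)."""
    return base_dir(environ).joinpath(*parts)


# Example: the processed-SAR folder (assumed to live under the base dir).
demo = data_path("outputs", "preprocess", "processed_sar",
                 environ={ENV_VAR: "/home/yang/asia_flood_base"})
```

Moving to another machine then means changing one export, not every script.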

IN

WSL base: ASIA_FLOOD_BASE_DIR · /home/yang/asia_flood_base
Python env: conda · sar-toolkit · environment-sar-toolkit.yml

OUT

Entry point: python -m sar_toolkit … · run_banda_aceh_pipeline.py
Stage table: STEP_TO_SCRIPT = {...} · preprocess · build-dataset · predict · validate · stats

? CHECK YOURSELF Which single environment variable makes the whole pipeline portable across machines?

ASIA_FLOOD_BASE_DIR. It points at the working tree that holds raw SAFE archives, intermediate TIFs, the pickle index, and the checkpoint. Set it once in your shell and no script has a hard-coded path.

</> CODE see the actual sar_toolkit/run_banda_aceh_pipeline.py excerpt

yang@wsl · ~/asia_flood_base python · bash

# WSL · activate the toolkit env
$ conda activate sar-toolkit
$ export ASIA_FLOOD_BASE_DIR=/home/yang/asia_flood_base

# Run any one stage in isolation
$ python sar_toolkit/run_banda_aceh_pipeline.py predict
$ python sar_toolkit/run_banda_aceh_pipeline.py validate

# Or reproduce build-dataset → predict → validate in one shot
$ python sar_toolkit/run_banda_aceh_pipeline.py reproduce-no-sar

# The step table (run_banda_aceh_pipeline.py)
STEP_TO_SCRIPT = {
  "preprocess":     "preprocess/snap_preprocess_banda_aceh.py",
  "build-dataset":  "dataset/prepare_dataset_from_three_tifs.py",
  "stats":          "infer/calculate_banda_aceh_stats.py",
  "predict":        "infer/predict_banda_aceh_adapted.py",
  "validate":       "validate/validate_predictions.py",
}

sar_toolkit/run_banda_aceh_pipeline.py


The 5-script dispatcher

One env-var, one command, one stage at a time. Maps every step name on this site to its concrete Python file.

/ AFTER READING

Explain how a flat dispatcher beats a notebook for reproducibility, and why ASIA_FLOOD_BASE_DIR is the only path that ever needs to change.

Synopsis only. The full source will be streamed from sar_toolkit/run_banda_aceh_pipeline.py once the teaching endpoint is wired. The outline below reflects the real file's signatures, constants and shape contracts.

"""run_banda_aceh_pipeline.py — stage dispatcher.

One entry point. One env var (ASIA_FLOOD_BASE_DIR). Each stage lives in
its own file so students can run them in isolation and inspect outputs.
"""
import os, sys, subprocess
from pathlib import Path

BASE_DIR = Path(os.environ["ASIA_FLOOD_BASE_DIR"])

STEP_TO_SCRIPT: dict[str, str] = {
    "preprocess":     "preprocess/snap_preprocess_banda_aceh.py",
    "build-dataset":  "dataset/prepare_dataset_from_three_tifs.py",
    "stats":          "infer/calculate_banda_aceh_stats.py",
    "predict":        "infer/predict_banda_aceh_adapted.py",
    "validate":       "validate/validate_predictions.py",
}

# Shorthand macro: build-dataset → predict → validate, without SNAP.
REPRODUCE_NO_SAR: list[str] = ["build-dataset", "predict", "validate"]


def run_step(step: str) -> int:
    """Exec one pipeline stage in a subprocess, propagating env and cwd."""
    ...


def main() -> None:
    """CLI: 'python run_banda_aceh_pipeline.py <step> | reproduce-no-sar'"""
    ...
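The synopsis elides the bodies of run_step and main. Below is one way they could be filled in. expand and command_for are hypothetical helpers. Launching each script with the current interpreter from a toolkit root directory is an assumption, not the file's confirmed behavior.

```python
import os
import subprocess
import sys
from pathlib import Path

STEP_TO_SCRIPT: dict[str, str] = {
    "preprocess":    "preprocess/snap_preprocess_banda_aceh.py",
    "build-dataset": "dataset/prepare_dataset_from_three_tifs.py",
    "stats":         "infer/calculate_banda_aceh_stats.py",
    "predict":       "infer/predict_banda_aceh_adapted.py",
    "validate":      "validate/validate_predictions.py",
}
REPRODUCE_NO_SAR: list[str] = ["build-dataset", "predict", "validate"]


def expand(target: str) -> list[str]:
    """Turn a CLI target into the ordered list of stages it stands for."""
    if target == "reproduce-no-sar":
        return list(REPRODUCE_NO_SAR)
    if target not in STEP_TO_SCRIPT:
        raise SystemExit(f"unknown step {target!r}; choose from {sorted(STEP_TO_SCRIPT)}")
    return [target]


def command_for(step: str, toolkit_root: Path) -> list[str]:
    """argv for one stage: this interpreter running the mapped script."""
    return [sys.executable, str(toolkit_root / STEP_TO_SCRIPT[step])]


def run_step(step: str, toolkit_root: Path) -> int:
    """Run one stage in a fresh subprocess.

    env is passed explicitly so ASIA_FLOOD_BASE_DIR reaches the child;
    the working directory is inherited from the caller.
    """
    return subprocess.run(command_for(step, toolkit_root), env=os.environ.copy()).returncode


def main(argv: list[str], toolkit_root: Path) -> int:
    """Run the requested stage(s) in order, stopping at the first failure."""
    if len(argv) != 1:
        print("usage: run_banda_aceh_pipeline.py <step> | reproduce-no-sar", file=sys.stderr)
        return 2
    for step in expand(argv[0]):
        code = run_step(step, toolkit_root)
        if code != 0:
            return code
    return 0
```

In a real entry point, main would be called as sys.exit(main(sys.argv[1:], Path(__file__).resolve().parent)) under an if __name__ == "__main__": guard. Stopping at the first non-zero exit code keeps reproduce-no-sar from validating stale predictions after a failed predict step.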

01 / SNAP PREPROCESSING ~2 min read

Radar bytes → calibrated, terrain-corrected TIFs.

Three raw Sentinel-1A SAFE archives go into SNAP's GPT engine with an explicit graph XML, an external SRTM DEM for geocoding, and a two-step AOI crop. Out come three clean, radiometrically calibrated, co-registered GeoTIFFs.

script preprocess/snap_preprocess_banda_aceh.py

The graph runs Apply-Orbit-File → Calibration → Speckle-Filter → Range-Doppler Terrain-Correction → Subset in one GPT invocation. A second gdalwarp pass then tightens the bounding box so there are no black edges around the coast. The result is three co-registered, calibrated scenes at 10 m resolution that a downstream tiler can slice without further care.

This is also the only stage that needs SNAP. Everything after it is pure PyTorch + rasterio, so a student without SNAP can still reproduce from stage 02 onward using the shipped intermediate TIFs.

IN

Raw scenes: 3 × S1A SAFE.zip · 20251021 · 20251102 · 20251126
Graph: GRD preprocessing w/ external DEM · preprocess/grd_preprocessing_external_dem.xml
DEM: SRTM 1 arc-sec · assets/dem/N05E095.tif

OUT

Processed TIFs: 3 × VV/VH · WGS84 · LZW · S1A_BandaAceh_<date>_snap_processed_final.tif
AOI: lon [95.25, 95.40] · lat [5.45, 5.60] · Banda Aceh coastal strip
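The final AOI also predicts how big the scene is before any tiling. The back-of-envelope check below uses standard spherical approximations for metres per degree. That is an assumption: the real grid comes out of SNAP's terrain correction, so expect small differences. The estimate lands within a few percent of the ~1672×1676, 2.8-million-pixel grid quoted in stage 02.

```python
import math

FINAL_AOI = {"lon_min": 95.25, "lon_max": 95.40, "lat_min": 5.45, "lat_max": 5.60}
PIXEL_M = 10.0                  # 10 m output spacing, as stated for the processed TIFs

M_PER_DEG_LAT = 110_574.0       # approx. metres per degree of latitude near the equator
M_PER_DEG_LON_EQ = 111_320.0    # approx. metres per degree of longitude at the equator


def aoi_size_px(aoi: dict, pixel_m: float = PIXEL_M) -> tuple[int, int]:
    """Approximate (height, width) in pixels of a lon/lat box at a given spacing."""
    mid_lat = math.radians((aoi["lat_min"] + aoi["lat_max"]) / 2)
    height_m = (aoi["lat_max"] - aoi["lat_min"]) * M_PER_DEG_LAT
    # A degree of longitude shrinks with cos(latitude); at ~5.5° N that factor is ~0.995.
    width_m = (aoi["lon_max"] - aoi["lon_min"]) * M_PER_DEG_LON_EQ * math.cos(mid_lat)
    return round(height_m / pixel_m), round(width_m / pixel_m)


h, w = aoi_size_px(FINAL_AOI)
print(f"~{h} x {w} px = {h * w:,} pixels")
```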

? CHECK YOURSELF Why pass -PexternalDEMFile instead of letting SNAP auto-download the DEM?

SNAP's auto-download sometimes fails silently in restricted network environments (such as some WSL setups) and falls back to a coarser DEM source. That silently wrecks the terrain correction near the coast and makes student outputs disagree with teacher outputs. Supplying a known SRTM tile makes the run deterministic and reproducible.

</> CODE see the actual preprocess/snap_preprocess_banda_aceh.py excerpt

yang@wsl · ~/asia_flood_base python · gpt

# preprocess/snap_preprocess_banda_aceh.py (excerpt)
SNAP_HOME   = Path("/home/yang/snap")
GRAPH_FILE  = "preprocess/grd_preprocessing_external_dem.xml"
AOI_WKT     = "POLYGON((95.15 5.35, 95.50 5.35, 95.50 5.70, 95.15 5.70, 95.15 5.35))"
FINAL_AOI   = {"lon_min": 95.25, "lon_max": 95.40,
               "lat_min": 5.45,  "lat_max": 5.60}
DATES       = ["20251021", "20251102", "20251126"]

# 1) SNAP GPT: calibration, speckle, terrain correction, subset
$ gpt preprocess/grd_preprocessing_external_dem.xml \
    -PinputFile=S1A_IW_GRDH_20251126.SAFE.zip \
    -PoutputFile=S1A_BandaAceh_20251126_snap_processed.tif \
    -PgeoRegion="$AOI_WKT" \
    -PexternalDEMFile=$DEM/N05E095.tif -e

# 2) gdalwarp: precise final crop, kill black edges
$ gdalwarp -te 95.25 5.45 95.40 5.60 -te_srs EPSG:4326 \
    -r bilinear -co COMPRESS=LZW -co TILED=YES \
    in.tif S1A_BandaAceh_20251126_snap_processed_final.tif

sar_toolkit/preprocess/snap_preprocess_banda_aceh.py


SNAP preprocessing driver

Calls SNAP's GPT with an explicit graph XML + external SRTM DEM, then gdalwarp tightens the AOI. Turns signal into a ground-registered GeoTIFF.

/ AFTER READING

Name the 5 SNAP operators used to go from raw SAFE to a calibrated TIF, and explain why the DEM must be supplied explicitly.

Synopsis only. The full source will be streamed from sar_toolkit/preprocess/snap_preprocess_banda_aceh.py once the teaching endpoint is wired. The outline below reflects the real file's signatures, constants and shape contracts.

"""snap_preprocess_banda_aceh.py — raw SAFE → calibrated GeoTIFF.

GPT graph: Apply-Orbit-File → Calibration → Speckle-Filter →
Range-Doppler Terrain-Correction → Subset, with an external SRTM tile
as DEM. A second gdalwarp pass trims black edges.
"""
from pathlib import Path
import subprocess

SNAP_HOME   = Path("/home/yang/snap")
GRAPH_FILE  = "preprocess/grd_preprocessing_external_dem.xml"
DEM_TILE    = "assets/dem/N05E095.tif"
AOI_WKT     = "POLYGON((95.15 5.35, 95.50 5.35, 95.50 5.70, 95.15 5.70, 95.15 5.35))"
FINAL_AOI   = {"lon_min": 95.25, "lon_max": 95.40, "lat_min": 5.45, "lat_max": 5.60}
DATES       = ["20251021", "20251102", "20251126"]


def run_snap(date: str) -> Path:
    """GPT invocation with -PexternalDEMFile pinned for reproducibility."""
    ...


def tighten_with_gdalwarp(src: Path) -> Path:
    """gdalwarp -te <FINAL_AOI> -r bilinear -co COMPRESS=LZW -co TILED=YES."""
    ...


def main() -> None:
    """Run SNAP + gdalwarp for all three acquisition dates."""
    ...
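The synopsis leaves run_snap and tighten_with_gdalwarp elided. The sketch below shows how they might assemble their command lines as argument lists for subprocess.run, using only the flags shown in the excerpt above. gpt_argv and gdalwarp_argv are hypothetical names. The -P parameter names must match the ${...} placeholders inside the graph XML, which is not shown here.

```python
from pathlib import Path

GRAPH_FILE = "preprocess/grd_preprocessing_external_dem.xml"
DEM_TILE = "assets/dem/N05E095.tif"
AOI_WKT = "POLYGON((95.15 5.35, 95.50 5.35, 95.50 5.70, 95.15 5.70, 95.15 5.35))"
FINAL_AOI = {"lon_min": 95.25, "lon_max": 95.40, "lat_min": 5.45, "lat_max": 5.60}


def gpt_argv(safe_zip: Path, out_tif: Path, gpt: str = "gpt") -> list[str]:
    """SNAP GPT call: each -P<name>=<value> sets a parameter of the graph."""
    return [
        gpt, GRAPH_FILE,
        f"-PinputFile={safe_zip}",
        f"-PoutputFile={out_tif}",
        f"-PgeoRegion={AOI_WKT}",        # one list item, so no shell quoting is needed
        f"-PexternalDEMFile={DEM_TILE}",  # pinned DEM for reproducible terrain correction
        "-e",                             # detailed error messages / stack traces on failure
    ]


def gdalwarp_argv(src: Path, dst: Path) -> list[str]:
    """Second pass: crop to FINAL_AOI given as -te xmin ymin xmax ymax in EPSG:4326."""
    a = FINAL_AOI
    return [
        "gdalwarp",
        "-te", str(a["lon_min"]), str(a["lat_min"]), str(a["lon_max"]), str(a["lat_max"]),
        "-te_srs", "EPSG:4326",
        "-r", "bilinear",
        "-co", "COMPRESS=LZW", "-co", "TILED=YES",
        str(src), str(dst),
    ]
```

Passing a list to subprocess.run(argv, check=True) avoids a shell. The WKT polygon therefore needs none of the quoting that "$AOI_WKT" requires in the bash excerpt.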

Figure 01 · panels: VV polarization · VH polarization. The channels this stage produces, from the 2025-11-26 Banda Aceh scene. These are the actual pixels the network sees.

02 / TILE DATASET ~1 min read

3 big TIFs → 224² patches in KuroSiwo format.

The model was trained on KuroSiwo's tile layout, so the scene has to be cut into 224×224 patches with three temporal siblings per location — pre_event_1, pre_event_2, post_event — and an index pickle tying every patch back to its row/col.

script dataset/prepare_dataset_from_three_tifs.py · dataset/generate_pickle.py

Each patch folder carries three TIFs and a small info.json with its lon/lat/row/col. The pickle is a fast spatial index the Dataset class uses to stream batches — it's what gets looked up at inference time so we can reassemble predictions back to their geographic positions.

IN

3 calibrated TIFs: VV + VH · 2-band · outputs/preprocess/processed_sar/
Patch size: 224 × 224 px · stride = 224 · no overlap, deterministic grid

OUT

Tiles: kurosiwo_format_v2/999/01/<hash>/ · MS1.tif + SL1.tif + SL2.tif + info.json
Index: grid_dict_banda_aceh.pkl · list of (record, row_idx, col_idx) tuples
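The grid arithmetic behind the tiler is short enough to sketch. This assumes, as the ceil() in the excerpt suggests, that the last row and column of patches overhang the scene edge. tile_grid is a hypothetical helper, and the real script's edge handling is not shown.

```python
import math

PATCH_SIZE = 224  # stride equals the patch size: no overlap


def tile_grid(height: int, width: int, patch: int = PATCH_SIZE) -> list[tuple[int, int, int, int]]:
    """(row, col, y0, x0) for every cell of a non-overlapping patch grid.

    ceil() keeps the ragged last row/column, so edge tiles overhang the scene;
    how the real script fills that overhang (padding, NoData) is not shown.
    """
    n_rows, n_cols = math.ceil(height / patch), math.ceil(width / patch)
    return [(r, c, r * patch, c * patch) for r in range(n_rows) for c in range(n_cols)]


grid = tile_grid(1672, 1676)   # approximate scene size quoted in the tile-builder excerpt
print(len(grid), grid[-1])
```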

? CHECK YOURSELF Why does each tile ship as three sibling files (MS1.tif / SL1.tif / SL2.tif) instead of one?

Because flood detection is change detection. The model needs to see the same patch of ground before the event (SL2 = 21 Oct, SL1 = 2 Nov) and during it (MS1 = 26 Nov). 'Dark now but not dark a month ago' is how it tells new flood apart from permanent water.

</> CODE see the actual dataset/prepare_dataset_from_three_tifs.py · dataset/generate_pickle.py excerpt

yang@wsl · ~/asia_flood_base python

# dataset/prepare_dataset_from_three_tifs.py (excerpt)
PATCH_SIZE = 224
ACT_ID, AOI_ID = 999, 1   # banda_aceh as a custom "event"

# Patch grid over the 2,802,260-pixel scene (~1672×1676)
n_rows, n_cols = ceil(H / 224), ceil(W / 224)

# For each patch location, write the KuroSiwo triplet:
write_tif("MS1.tif", post_event_patch)    # 20251126 · main scene
write_tif("SL1.tif", pre_event_1_patch)   # 20251102 · approach
write_tif("SL2.tif", pre_event_2_patch)   # 20251021 · baseline
write_json("info.json", {"row": row, "col": col, "lon": lon, "lat": lat})  # plus further metadata

# dataset/generate_pickle.py: build the index
grid_dict = {
  (act_id, aoi_id): [
    { "info": { "row": r, "col": c, ... },
      "path": "999/01/<hash>/" },
    ...
  ]
}
with open("grid_dict_banda_aceh.pkl", "wb") as f:   # pickle.dump takes a file object, not a path
    pickle.dump(grid_dict, f)

sar_toolkit/dataset/prepare_dataset_from_three_tifs.py


Tile-builder · 3 TIFs → KuroSiwo patches

Slices three co-registered scenes into 224×224 patches. Each patch folder ships MS1/SL1/SL2 siblings plus an info.json with row/col/lon/lat.

/ AFTER READING

State why each tile has three sibling TIFs rather than one, and trace how a row/col index lets predictions be reassembled to a georeferenced map.

Synopsis only. The full source will be streamed from sar_toolkit/dataset/prepare_dataset_from_three_tifs.py once the teaching endpoint is wired. The outline below reflects the real file's signatures, constants and shape contracts.

"""prepare_dataset_from_three_tifs.py — 3 scenes → N tiles.

Deterministic non-overlapping 224² grid. Per tile: writes three TIFs
(MS1 = post, SL1 = mid, SL2 = baseline) plus info.json. A sibling script
builds the pickle index that the Dataset class later looks up at load
time.
"""
import math, json, pickle
from pathlib import Path
import rasterio

PATCH_SIZE     = 224
STRIDE         = 224                 # no overlap, deterministic grid
ACT_ID, AOI_ID = 999, 1               # banda_aceh as a custom 'event'


def slice_scene(scene_tif: Path, out_root: Path, role: str) -> list[dict]:
    """Walk the patch grid, write <role>.tif per cell, record info."""
    ...


def build_pickle(records: list[dict], out: Path) -> None:
    """Pickle {(act_id, aoi_id): [{info, path}, ...]} for Dataset lookup."""
    ...


def main() -> None:
    """Run slice_scene on pre-2, pre-1, post. Write pickle index last."""
    ...
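Below is a sketch of build_pickle plus the reverse lookup that reassembly relies on. It follows the dict layout in the generate_pickle.py excerpt ({(act_id, aoi_id): [{info, path}, ...]}) rather than the tuple layout in the OUT row. load_index is a hypothetical helper, and the folder names are placeholders.

```python
import pickle
import tempfile
from pathlib import Path

ACT_ID, AOI_ID = 999, 1


def build_pickle(records: list[dict], out: Path) -> None:
    """Write {(act_id, aoi_id): [{"info": ..., "path": ...}, ...]} to disk."""
    with open(out, "wb") as fh:              # pickle needs a binary file object
        pickle.dump({(ACT_ID, AOI_ID): records}, fh)


def load_index(pkl: Path) -> dict[tuple[int, int], str]:
    """Invert the pickle into (row, col) -> tile folder, the lookup reassembly needs."""
    with open(pkl, "rb") as fh:
        grid_dict = pickle.load(fh)
    return {(rec["info"]["row"], rec["info"]["col"]): rec["path"]
            for rec in grid_dict[(ACT_ID, AOI_ID)]}


# Round trip in a throwaway directory with two placeholder records.
records = [{"info": {"row": 0, "col": 1}, "path": "999/01/aaaa/"},
           {"info": {"row": 2, "col": 3}, "path": "999/01/bbbb/"}]
with tempfile.TemporaryDirectory() as tmp:
    pkl = Path(tmp) / "grid_dict_banda_aceh.pkl"
    build_pickle(records, pkl)
    index = load_index(pkl)
```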

/ STAGE 02 · INTERACTIVE · TILE EXPLORER

Sample one 224² tile and see what's inside.

Each KuroSiwo-format tile directory packs 6 bands in three GeoTIFFs: VV + VH at three acquisition times, pre-event 2 (21 Oct, baseline), pre-event 1 (2 Nov, approach) and co-event (26 Nov, main flood scene). Press Sample to load a random tile from the …-tile Banda Aceh test split. Each click round-trips to the WSL box in ~1 s.

Sentinel-1 VV backscatter · Banda Aceh · pre-event 2 · 21 Oct 2025 · baseline: river dark, city bright, no flood

Sentinel-1 VV backscatter · Banda Aceh · pre-event 1 · 2 Nov 2025 · approach: same scene, 24 days before the co-event acquisition

Sentinel-1 VV backscatter · Banda Aceh · co-event · 26 Nov 2025 · main flood scene: new dark patches across farmland

Three Sentinel-1 VV composites of the same scene, with 12 days between the two pre-event acquisitions and 24 days between the second one and the co-event scene. Each tile the model sees is a 224 × 224 px crop of these, stacking all 6 bands (VV + VH at each of the three dates). About 911 such tiles make up the Banda Aceh test split. When the GPU endpoint is online, the Sample button above will sample one at random and show its 6 bands.

03 / NORMALIZATION STATS ~3 min read

The root cause of the clamp story.

The training set (KuroSiwo) was dominated by scenes where VH backscatter rarely exceeded 0.15. Banda Aceh isn't like that. Before inference, we recompute per-region mean/std at three clamp cut-offs — and the numbers tell you immediately why clamp = 0.3 is the right choice.

script infer/calculate_banda_aceh_stats.py

Two facts drive the whole story on the case page:

Banda Aceh VH is ~5–8× brighter than the KuroSiwo training mean. If you normalize with the training stats, almost every VH pixel is mapped to "very bright" → the model loses its ability to separate flood from vegetation.
The clamp itself silently truncates pixels. At 0.15, 71% of VH values are clipped to the ceiling; the model never sees variation in the flooded paddies. At 0.5, only 16% clip, but speckle noise dominates. 0.3 is the sweet spot, and it's the setting marked as recommended in CONFIGS on the case page.
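Both facts can be reproduced with a few lines of arithmetic. The sketch assumes the Dataset applies the usual z-score, (x - mean) / std, after clamping; the excerpts show the statistics but not the transform. The statistics are the values printed by the stats step below. The toy sample is illustrative, not Banda Aceh data.

```python
import statistics

KUROSIWO_VH = (0.0264, 0.0215)           # training-time VH (mean, std), from the stats output
BANDA_VH_CLAMP03 = (0.207845, 0.091627)  # Banda Aceh VH (mean, std) at clamp = 0.30


def clamp(values: list[float], c: float) -> list[float]:
    """Clip to the ceiling c: every value above c becomes exactly c."""
    return [min(v, c) for v in values]


def truncated_fraction(values: list[float], c: float) -> float:
    """Share of values the clamp flattens."""
    return sum(v > c for v in values) / len(values)


def zscore(v: float, mean: float, std: float) -> float:
    return (v - mean) / std


# Toy VH sample: at clamp 0.15, three distinct values collapse to one.
sample = [0.05, 0.12, 0.18, 0.25, 0.40]
clipped = clamp(sample, 0.15)
post_mean = statistics.fmean(clipped)

# One VH pixel of 0.2 under each set of statistics.
z_train = zscore(0.2, *KUROSIWO_VH)        # about 8 std above the training mean
z_region = zscore(0.2, *BANDA_VH_CLAMP03)  # about 0: an ordinary pixel for this scene
```

A pixel of 0.2 sits about 8 standard deviations above the training mean, so the network sees it as extreme. Under the clamp-0.3 regional statistics it is close to 0, an ordinary pixel for this scene.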

IN

Scenes: 3 × processed TIFs · VV + VH bands only · pixels > 0 (skip NoData)
Clamps swept: [0.15, 0.3, 0.5] · matches training-time, recommended, aggressive

OUT

Stats table: per-clamp VV/VH mean & std · configs/banda_aceh_adapted_configs.json
Finding: VH truncation 71% @ 0.15 → 35% @ 0.3 → 16% @ 0.5 · Banda Aceh VH is 7× brighter than KuroSiwo mean

See the 4 configs on the case page →

? CHECK YOURSELF If VH at Banda Aceh is 5× brighter than the training mean, why does clamping VH to 0.15 hurt flood detection?

At clamp = 0.15, roughly 71% of VH pixels get clipped to the ceiling. Every bright value looks identical to the model. So all the nuance that distinguishes 'very bright paddy edge' from 'wet flooded paddy' is erased — the model literally can't see the signal that would separate them. A looser clamp restores that signal, at the cost of some speckle noise.

</> CODE see the actual infer/calculate_banda_aceh_stats.py excerpt

yang@wsl · ~/asia_flood_base python

# infer/calculate_banda_aceh_stats.py (excerpt)
$ python sar_toolkit/run_banda_aceh_pipeline.py stats

# KuroSiwo training statistics (the baseline)
VV: mean=0.0953  std=0.0427
VH: mean=0.0264  std=0.0215

# Banda Aceh statistics under 3 clamp cut-offs
clamp = 0.15
  VV: mean=0.050021  std=0.034309
  VH: mean=0.131718  std=0.036703   # VH mean is 5× KuroSiwo · 71% of VH pixels are truncated

clamp = 0.30 <-- recommended
  VV: mean=0.053819  std=0.048925
  VH: mean=0.207845  std=0.091627   # VH truncation drops to 35%, std explodes: information returns

clamp = 0.50
  VV: mean=0.055808  std=0.060930
  VH: mean=0.256493  std=0.153328   # VH truncation 16%, but noise starts dominating signal

sar_toolkit/infer/calculate_banda_aceh_stats.py


Per-region clamp statistics

Sweeps the clamp at {0.15, 0.30, 0.50}, recomputes per-channel mean and std for the Banda Aceh scene, and writes the JSON the inference step reads.

/ AFTER READING

Explain why feeding a test-time scene through training-time normalization can silently hide a flood, and why clamp + mean/std are inseparable.

NOTEBOOK Notebook §4.3 · Clamping

Synopsis only. The full source will be streamed from sar_toolkit/infer/calculate_banda_aceh_stats.py once the teaching endpoint is wired. The outline below reflects the real file's signatures, constants and shape contracts.

"""calculate_banda_aceh_stats.py — per-region, per-clamp normalization.

Input : 3 × processed TIFs (VV, VH bands, > 0 pixels only).
Output: configs/banda_aceh_adapted_configs.json
          { clamp015: { data_mean, data_std, clamp_input }, clamp03: ..., ... }
The three clamp entries (+ 'original' = KuroSiwo defaults) are the four
configs the inference step loops over.
"""
import json
from pathlib import Path
import numpy as np
import rasterio

CLAMPS = [0.15, 0.30, 0.50]


def iter_pixels(tif: Path) -> np.ndarray:
    """Read VV & VH, drop NoData (<= 0), return (N, 2) array."""
    ...


def stats_for_clamp(x: np.ndarray, c: float) -> dict:
    """Per-channel mean/std after clipping, plus truncation ratio."""
    x_c    = np.minimum(x, c)
    trunc  = (x > c).mean(axis=0)
    mean   = x_c.mean(axis=0)
    std    = x_c.std(axis=0)
    return {"clamp_input": c, "data_mean": mean.tolist(),
            "data_std": std.tolist(), "truncated_pct": trunc.tolist()}


def main() -> None:
    pixels  = np.vstack([iter_pixels(p) for p in sorted(TIFS)])
    configs = {"clamp" + str(c).replace(".", ""): stats_for_clamp(pixels, c)
               for c in CLAMPS}              # 0.15 -> clamp015, 0.3 -> clamp03, 0.5 -> clamp05
    configs["original"] = KUROSIWO_DEFAULTS        # training-time stats
    Path("configs/banda_aceh_adapted_configs.json").write_text(
        json.dumps(configs, indent=2))

Figure 03 · Banda Aceh VH backscatter distribution, with the three clamp cut-offs

The pink/amber/teal zones mark what each clamp truncates. At clamp 0.15 the model sees almost none of the true distribution — most flood-vs-edge variation lives above 0.15 in this scene. At clamp 0.5 you keep the signal but the far tail brings speckle noise in with it. The 0.3 setting threads the needle.

/ STAGE 03 · INTERACTIVE · CLAMP PLAYGROUND

Drag the clamp, watch the model's view of Banda Aceh change.

Every bar is a VH backscatter bucket. Everything to the right of your clamp value gets clipped to the ceiling — identical to the model. Find the clamp that keeps the flood tail visible without drowning in speckle. This one runs entirely in your browser.

clamp 0.300 · truncated 17.6% · post-clamp mean 0.1884 · × KuroSiwo 7.14×

Goldilocks · most of the flood tail survives, noise still manageable.
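The playground runs in the browser, but its readouts are simple histogram arithmetic. The Python sketch below assumes each bucket is represented by its centre value; the widget's actual bucketing is not shown. It uses a toy five-bucket histogram rather than the Banda Aceh one. clamp_readout is a hypothetical name.

```python
KUROSIWO_VH_MEAN = 0.0264  # training-time VH mean from the stats step


def clamp_readout(centers: list[float], counts: list[int], c: float) -> dict:
    """Playground-style readouts for one clamp value from a bucketed VH histogram.

    Each bucket stands in for its centre value; buckets above c are clipped to c.
    """
    total = sum(counts)
    clipped = sum(n for x, n in zip(centers, counts) if x > c)
    mean = sum(min(x, c) * n for x, n in zip(centers, counts)) / total
    return {
        "truncated_pct": 100 * clipped / total,
        "post_clamp_mean": mean,
        "x_kurosiwo": mean / KUROSIWO_VH_MEAN,
    }


# Toy histogram, NOT the Banda Aceh data: five buckets of VH backscatter.
centers = [0.05, 0.15, 0.25, 0.35, 0.45]
counts = [10, 30, 30, 20, 10]
r = clamp_readout(centers, counts, 0.30)
```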

04 / INFERENCE ~2 min read

CS-Mamba · 6 channels in, 3 classes out, 4 configs.

The trained checkpoint is loaded once; the Dataset is rebuilt four times with four different (clamp, mean, std) triples. Each run stitches 224² predictions back to a full 2.8-million-pixel map and writes a GeoTIFF plus a stats JSON.

script infer/predict_banda_aceh_adapted.py

The input tensor is the temporal stack: both pre-event scenes (20251021, 20251102) and the post-event scene (20251126), each contributing VV+VH, for 6 channels total. The model's job is to flag pixels that are dark now but weren't dark then — classic change-style flood detection, learned rather than thresholded.

Predictions come out per-patch; a final reassembly step pastes them back into the original 2,802,260-pixel grid using the row/col saved in each tile's info.json. The four output GeoTIFFs are exactly the files placed in public/case-banda-aceh/ and rendered on the case page.
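torch.cat(..., dim=1) concatenates along the channel axis in argument order, so the six channel indices follow the date order of the call. The sketch below shows the resulting layout. It assumes each per-date tensor carries VV in its first band and VH in its second; the band order inside the TIFs is not stated.

```python
DATES = {"pre_event_2": "20251021", "pre_event_1": "20251102", "post_event": "20251126"}
STACK_ORDER = ["pre_event_2", "pre_event_1", "post_event"]  # argument order of torch.cat
BANDS = ["VV", "VH"]                                         # assumed order inside each TIF


def channel_table() -> list[tuple[int, str, str]]:
    """(channel index, acquisition date, polarization) for the 6-channel input."""
    return [(i * len(BANDS) + j, DATES[role], pol)
            for i, role in enumerate(STACK_ORDER)
            for j, pol in enumerate(BANDS)]


for channel, date, pol in channel_table():
    print(channel, date, pol)
```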

IN

Checkpoint: CSMamba_FloodFocus_best_model.pt · assets/checkpoints/
Model: U-shape · Cross-Scale Mamba blocks · 3-class head · embed_dim=96 · depths=[1,1,6,1] · ISPRS 2026 submission
Input tensor: cat([pre_event_2, pre_event_1, post_event], dim=1) · 6 channels · 224² · float32

OUT

GeoTIFF × 4: flood_prediction.tif · 0=land, 1=water, 2=flood · outputs/banda_aceh/prediction_results_adapted_<cfg>/
Stats JSON × 4: no_water_pct · permanent_water_pct · flood_pct · prediction_stats.json
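The decision rule and the stats JSON fit in a few lines. This sketch mirrors model(x).argmax(1) on a toy 2×2 patch of per-pixel class scores, then computes per-class percentages. The key names follow the OUT row above; the JSON's exact schema and rounding are assumptions.

```python
CLASSES = {0: "no_water", 1: "permanent_water", 2: "flood"}  # 0=land, 1=water, 2=flood


def argmax(scores: list[float]) -> int:
    """Index of the largest class score; ties resolve to the lower class index."""
    return max(range(len(scores)), key=lambda k: scores[k])


def prediction_stats(label_map: list[list[int]]) -> dict[str, float]:
    """Percentage of pixels per class, in the spirit of prediction_stats.json."""
    flat = [v for row in label_map for v in row]
    return {f"{name}_pct": 100 * flat.count(k) / len(flat) for k, name in CLASSES.items()}


# Per-pixel scores for a toy 2x2 patch (land, water, flood for each pixel).
logits = [[[2.0, 0.1, 0.3], [0.2, 1.5, 0.9]],
          [[0.1, 0.4, 2.2], [1.0, 0.2, 0.8]]]
labels = [[argmax(px) for px in row] for row in logits]
stats = prediction_stats(labels)
```

Because argmax simply picks the highest-scoring class per pixel, there is no tunable threshold. Changing the clamp changes the inputs, and therefore the scores, not the decision rule.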

? CHECK YOURSELF The input tensor has 6 channels. Where do the 6 come from?

Three dates (pre-event-2 on 21 Oct, pre-event-1 on 2 Nov, post-event on 26 Nov), each contributing VV and VH polarization → 3 × 2 = 6. They're stacked along the channel dimension with torch.cat so the network sees the time series jointly, not sequentially.

</> CODE see the actual infer/predict_banda_aceh_adapted.py excerpt

yang@wsl · ~/asia_flood_base python · torch

# infer/predict_banda_aceh_adapted.py (excerpt)
model = CSMamba(   # Cross-Scale Mamba: our RSMamba extension, ISPRS 2026
    img_size=224, in_channels=6, num_classes=3,
    embed_dims=[96, 192, 384, 768],
    depths=[1, 1, 6, 1], d_state=16,
).to(device).eval()
ckpt = torch.load("…/FloodFocus_best_model.pt", map_location=device)
model.load_state_dict(ckpt["model_state_dict"])

# For each of 4 configs: rebuild the Dataset with a new clamp/mean/std
for cfg_key in ["original", "clamp015", "clamp03", "clamp05"]:
    cfg = ADAPTED_CONFIGS[cfg_key]
    ds  = Dataset(mode="test", configs={
        "clamp_input": cfg["clamp_input"],
        "data_mean":   cfg["data_mean"],
        "data_std":    cfg["data_std"],
        ...
    })

    with torch.no_grad():
        for _, _, image, _, _, _, pre1, _, _, pre2 in loader:
            x = torch.cat([pre2, pre1, image], dim=1).to(device)   # (B, 6, 224, 224)
            pred = model(x).argmax(1).cpu().numpy()                 # (B, 224, 224)
            reassemble(pred, row_idx, col_idx)                      # into 2.8M-pixel map

    # profile: georeferencing copied from the input scene (not shown in the excerpt)
    with rasterio.open("flood_prediction.tif", "w", **profile) as dst:
        dst.write(full_map, 1)
    with open("prediction_stats.json", "w") as f:
        json.dump(stats, f)

sar_toolkit/infer/predict_banda_aceh_adapted.py


Inference · 4 configs, one scene

Loads the checkpoint once, rebuilds the Dataset four times with four (clamp, mean, std) triples, and stitches 224² predictions back into a 2.8-million-pixel GeoTIFF per config.

/ AFTER READING

Justify why model(x).argmax(1) is the entire decision rule, and point at the line where the four configs diverge.

NOTEBOOK Notebook §5 · Experiments

Synopsis only. The full source will be streamed from sar_toolkit/infer/predict_banda_aceh_adapted.py once the teaching endpoint is wired. The outline below reflects the real file's signatures, constants and shape contracts.

"""predict_banda_aceh_adapted.py — 4 configs × one scene.

For each of {original, clamp015, clamp03, clamp05}:
  1. Rebuild KuroSiwoDataset with that config's (clamp, mean, std).
  2. Loop 224² patches, forward, argmax → 3-class mask.
  3. Paste each prediction back at its (row, col) → 2.8-Mpx full map.
  4. Write GeoTIFF + stats.json into prediction_results_adapted_<cfg>/.
"""
import json
from pathlib import Path
import numpy as np
import rasterio
import torch
from torch.utils.data import DataLoader

from models.csmamba           import CSMamba
from dataset.kurosiwo_dataset import KuroSiwoDataset

ADAPTED_CONFIGS = json.loads(
    Path("configs/banda_aceh_adapted_configs.json").read_text()
)


def build_model(device: str) -> CSMamba:
    model = CSMamba(
        img_size=224, in_channels=6, num_classes=3,
        embed_dims=[96, 192, 384, 768], depths=[1, 1, 6, 1], d_state=16,
    ).to(device).eval()
    ckpt = torch.load("assets/checkpoints/CSMamba_FloodFocus_best_model.pt", map_location=device)
    model.load_state_dict(ckpt["model_state_dict"])
    return model


def reassemble(preds, row_idx, col_idx, H_full: int, W_full: int) -> np.ndarray:
    """Paste (B, 224, 224) predictions back into a (H_full, W_full) map."""
    ...


def main() -> None:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model  = build_model(device)

    for cfg_key in ("original", "clamp015", "clamp03", "clamp05"):
        cfg = ADAPTED_CONFIGS[cfg_key]
        ds  = KuroSiwoDataset(mode="test", configs=cfg, pickle_path=PKL)
        dl  = DataLoader(ds, batch_size=8, num_workers=4, shuffle=False)
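        preds, rows, cols = [], [], []

        with torch.no_grad():
            for image, pre1, pre2, _, _, rr, cc in dl:
                x = torch.cat([pre2, pre1, image], dim=1).to(device)      # (B, 6, 224, 224)
                preds.append(model(x).argmax(1).cpu().numpy())            # (B, 224, 224)
                rows.append(np.asarray(rr))
                cols.append(np.asarray(cc))

        # reassemble (defined above) pastes every patch back at its (row, col).
        full = reassemble(np.concatenate(preds), np.concatenate(rows),
                          np.concatenate(cols), H_full, W_full)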

        write_geotiff(full, out_dir / "flood_prediction.tif")
        dump_stats(full, out_dir / "prediction_stats.json")
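The synopsis elides the body of reassemble. This pure-Python sketch rests on two assumptions. First, the row/col values stored per tile are grid indices rather than pixel offsets. Second, the last row and column of tiles overhang the scene and must be cropped. The real function presumably works on NumPy arrays, where the inner loop becomes one slice assignment per tile.

```python
def reassemble(preds, row_idx, col_idx, h_full, w_full, patch=224):
    """Paste square patch predictions into an (h_full, w_full) grid of class ids.

    preds[k] is a patch x patch list of lists for the tile at grid cell
    (row_idx[k], col_idx[k]); tiles that overhang the scene edge are cropped.
    """
    full = [[0] * w_full for _ in range(h_full)]
    for pred, r, c in zip(preds, row_idx, col_idx):
        y0, x0 = r * patch, c * patch
        rows = min(patch, h_full - y0)   # crop the ragged last row of tiles
        cols = min(patch, w_full - x0)   # crop the ragged last column of tiles
        for i in range(rows):
            full[y0 + i][x0:x0 + cols] = pred[i][:cols]
    return full


# 3x3 scene, 2x2 patches: four tiles, each filled with its own id.
tiles = [[[k, k], [k, k]] for k in (1, 2, 3, 4)]
scene = reassemble(tiles, [0, 0, 1, 1], [0, 1, 0, 1], 3, 3, patch=2)
```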

Panels: SAR post-event scene · flood prediction map. The clamp = 0.3 run. Every pixel in the middle panel is a call the model made without any threshold or post-processing. The right panel is that same call, colored and composited over the SAR so a human can read it.

/ STAGE 04 · INTERACTIVE · INFERENCE STATION

One unseen tile. Four clamps. Your experiment.

Same model weights, same unseen 2025 tile — only the preprocessing clamp changes. Flip to LIVE to run on the GPU right now, or stay on CACHED to compare against the figure in the Notebook. Use NEXT TILE to move to a different patch of Banda Aceh. Nothing in the training set came from this scene.

FLOOD 10.43% · WATER 23.59% · LAND 65.98% · REGIONS 4,697

from prediction_report.json · full scene

● CACHED · static figure identical to the homepage showcase

05 / VALIDATION ~2 min read

No ground truth? Triangulate.

There's no pixel-level flood label for Banda Aceh on 2025-11-26. Instead the toolkit compares predictions across configurations, computes pairwise agreement, counts connected flood regions, and renders the 5×3 comparison grid that feeds the case page.

script validate/validate_predictions.py

The validation script doubles as the renderer. Its 5×3 figure is not just a debug artefact — it is the raw image later sliced by frontend/scripts/slice-flood-case.py into the per-config tiles you see in the interactive showcase. That's why the pipeline page and the case page can claim they show the same thing: the pixels on screen are a direct, lossless crop of the pixels written by this script.

Predictions 4 × flood_prediction.tif prediction_results_adapted_{original,clamp015,clamp03,clamp05}/
Reference SAR VV + VH bands of the post-event scene for visual overlay only

OUT

Validation report per-config stats · pairwise agreement · boundary ratio validation_report.json
Comparison figure 5 rows × 3 cols · reference + 4 configs · 2664×4483 prediction_comparison.png
Difference maps 6 pairwise diff PNGs difference_<a>_vs_<b>.png

Back to the pairwise disagreement gallery →

? CHECK YOURSELF Two configs produce 90% pixel agreement. Does that mean they're almost the same prediction?

Not really. 90% agreement sounds high, but only around 10% of the pixels in this scene are flood in the first place. Most of the 90% is both configs agreeing on dry land, so two configs whose flood masks barely overlap can still agree on 80% or more of the pixels. The interesting signal is the disagreement concentrated around the flood edges; that's where the pairwise diff maps on the case page become more informative than the single agreement number.
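The arithmetic is easy to see on a toy example. Flood-class intersection-over-union is used here as one common complement to raw agreement. It is not a metric the validation report is stated to compute, and agreement and flood_iou are illustrative helpers.

```python
from itertools import combinations


def agreement(a: list[int], b: list[int]) -> float:
    """Fraction of pixels where two label maps give the same class."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def flood_iou(a: list[int], b: list[int], flood: int = 2) -> float:
    """Intersection-over-union of the flood class only (1.0 if neither has flood)."""
    inter = sum(x == flood and y == flood for x, y in zip(a, b))
    union = sum(x == flood or y == flood for x, y in zip(a, b))
    return inter / union if union else 1.0


# 100 pixels: 90 land in both, 5 flood only in A, 5 flood only in B.
a = [0] * 90 + [2] * 5 + [0] * 5
b = [0] * 90 + [0] * 5 + [2] * 5
print(agreement(a, b), flood_iou(a, b))   # high agreement, zero flood overlap

# Four configs give C(4, 2) = 6 unordered pairs, hence six diff maps.
pairs = list(combinations(["original", "clamp015", "clamp03", "clamp05"], 2))
```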

</> CODE see the actual validate/validate_predictions.py excerpt

yang@wsl · ~/asia_flood_base python

# validate/validate_predictions.py (excerpt)
$ python sar_toolkit/run_banda_aceh_pipeline.py validate

# Per-config spatial diagnostics
from scipy import ndimage
labeled_flood, num_flood = ndimage.label(pred == 2)
boundary_ratio = sum_of_class_boundaries / (2 * (H + W))

# Pairwise agreement across every config pair
for a, b in combinations(configs, 2):
    agreement = (preds[a] == preds[b]).mean()   # 0.0 ... 1.0

# The reported numbers (validation_report.json)
original_vs_clamp03   → agreement = 0.8413
clamp015_vs_clamp03   → agreement = 0.9462
clamp03_vs_clamp05    → agreement = 0.9050
original_vs_clamp05   → agreement = 0.9014

# Grid visualization feeds the case page
create_visualization(predictions, vv, vh, "prediction_comparison.png")
# → later sliced by frontend/scripts/slice-flood-case.py
#   into row/cell webp tiles under public/case-banda-aceh/

sar_toolkit/validate/validate_predictions.py


Validation without ground truth

Per-config connected-component stats + pairwise pixel agreement between every config pair. Renders the 5×3 comparison grid later sliced into the case-page tiles.

/ AFTER READING

Explain why 90% pairwise agreement is a misleading number on a scene where only ~10% of pixels are flood, and what triangulation offers instead.

Synopsis only. The full source will be streamed from sar_toolkit/validate/validate_predictions.py once the teaching endpoint is wired. The outline below reflects the real file's signatures, constants and shape contracts.

```python
"""validate_predictions.py — cross-config triangulation.

Inputs : 4 × flood_prediction.tif (one per config).
Outputs: validation_report.json + prediction_comparison.png + 6 diff PNGs.
"""
from itertools import combinations
from pathlib import Path
import json

import numpy as np
import rasterio
from scipy import ndimage


def per_config_diagnostics(pred: np.ndarray) -> dict:
    """Pixel counts per class, flood connected components, boundary ratio."""
    # Synopsis: only two of the listed outputs are shown. Class 2 = flood.
    _, n = ndimage.label(pred == 2)          # default structure: 4-connected regions
    return {"flood_regions": int(n),
            "flood_pct": float((pred == 2).mean())}   # stored as a fraction, 0.0 ... 1.0


def pairwise_agreement(preds: dict[str, np.ndarray]) -> dict:
    """For every unordered pair, (pred_a == pred_b).mean()."""
    return {f"{a}_vs_{b}": float((preds[a] == preds[b]).mean())
            for a, b in combinations(preds.keys(), 2)}


def render_comparison(preds, vv, vh, out: Path) -> None:
    """5 rows × 3 cols: reference + 4 configs, side-by-side with SAR."""
    ...


def main() -> None:
    # TIF (config -> path), CONFIGS and the vv / vh SAR arrays come from
    # parts of the real file that are not shown in this synopsis.
    preds = {}
    for k in CONFIGS:
        with rasterio.open(TIF[k]) as src:   # close each GeoTIFF after reading band 1
            preds[k] = src.read(1)
    report = {k: per_config_diagnostics(p) for k, p in preds.items()}
    report |= pairwise_agreement(preds)       # dict merge operator, Python 3.9+
    Path("validation_report.json").write_text(json.dumps(report, indent=2))
    render_comparison(preds, vv, vh, Path("prediction_comparison.png"))
```
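The synopsis omits the boundary ratio, and the excerpt shows only its denominator, `2 * (H + W)`, which is the image perimeter in pixels. One plausible reading is that `sum_of_class_boundaries` counts the pixel edges where two 4-neighbours carry different classes. This is an assumption and is not taken from the real file. Under that reading, the ratio measures how much class boundary the map contains relative to the frame around it, so fragmented, speckly predictions score high. `boundary_ratio_sketch` is a hypothetical name.

```python
import numpy as np

def boundary_ratio_sketch(pred: np.ndarray) -> float:
    """Hypothetical: class-change edges between 4-neighbours / image perimeter."""
    H, W = pred.shape
    vertical_edges = np.count_nonzero(pred[1:, :] != pred[:-1, :])    # row-to-row changes
    horizontal_edges = np.count_nonzero(pred[:, 1:] != pred[:, :-1])  # column-to-column changes
    return (vertical_edges + horizontal_edges) / (2 * (H + W))

clean = np.zeros((8, 8), dtype=np.uint8)
clean[:, 4:] = 2                                    # one straight flood edge
noisy = np.indices((8, 8)).sum(axis=0) % 2 * 2      # checkerboard of land (0) and flood (2)
print(boundary_ratio_sketch(clean), boundary_ratio_sketch(noisy))   # → 0.25 3.5
```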

Original vs. clamp 0.3 — **Red** = flood only in A · **blue** = flood only in B. The top-left (original vs. recommended) has the lowest agreement — and the difference is mostly blue, meaning the training-time clamp silently *missed* real flood. That's the whole case page in one figure.

Clamp 0.15 vs. 0.3 — **Red** = flood only in A · **blue** = flood only in B.
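A red/blue disagreement map like the ones captioned above takes only a few lines to build from two label maps. This sketch rests on some assumptions. The colour convention follows the captions (red = flood only in A, blue = flood only in B). The white and black colours for "both" and "neither", and the `diff_rgb` name, are illustrative choices and not the real renderer's.

```python
import numpy as np

FLOOD = 2  # class index for flood (land, water, flood argmax order)

def diff_rgb(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """H×W×3 uint8 image: red = flood only in A, blue = flood only in B,
    white = flood in both, black = flood in neither."""
    fa, fb = a == FLOOD, b == FLOOD
    rgb = np.zeros(a.shape + (3,), dtype=np.uint8)
    rgb[fa & ~fb] = (255, 0, 0)
    rgb[~fa & fb] = (0, 0, 255)
    rgb[fa & fb] = (255, 255, 255)
    return rgb

a = np.array([[2, 2, 0], [0, 0, 0]])
b = np.array([[2, 0, 0], [2, 2, 0]])
img = diff_rgb(a, b)
red = np.all(img == (255, 0, 0), axis=-1).sum()
blue = np.all(img == (0, 0, 255), axis=-1).sum()
print(red, blue)   # → 1 2
```

Counting red against blue pixels gives a quantitative version of the "mostly blue" reading in the original vs. clamp 0.3 caption.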

/ STAGE 05 · INTERACTIVE · AGREEMENT MATRIX

Click any pair — see where the models actually disagree.

No ground truth exists for Banda Aceh on 2025-11-26, so we triangulate: measure the pixel-for-pixel agreement between every pair of configurations. Low numbers are not wrong — they're the teaching signal. Clicking a cell pulls up the real disagreement map.
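The agreement matrix is the pairwise dictionary folded into a symmetric table. The sketch below uses the four pair values quoted from validation_report.json earlier on this page. The two pairs not quoted there are left as NaN rather than guessed. The config keys are inferred from the report's `a_vs_b` names.

```python
import numpy as np

configs = ["original", "clamp015", "clamp03", "clamp05"]
pairs = {  # values quoted from validation_report.json on this page
    "original_vs_clamp03": 0.8413,
    "clamp015_vs_clamp03": 0.9462,
    "clamp03_vs_clamp05": 0.9050,
    "original_vs_clamp05": 0.9014,
}

n = len(configs)
matrix = np.full((n, n), np.nan)         # unknown pairs stay NaN
np.fill_diagonal(matrix, 1.0)            # a config always agrees with itself
for key, value in pairs.items():
    a, b = key.split("_vs_")
    i, j = configs.index(a), configs.index(b)
    matrix[i, j] = matrix[j, i] = value  # agreement is symmetric

# The least-agreeing quoted pair is where to look first.
i, j = np.unravel_index(np.nanargmin(matrix), matrix.shape)
print(configs[i], configs[j], matrix[i, j])   # → original clamp03 0.8413
```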

/ TRAINING TRAJECTORY

Scrub through the 37-epoch run that produced the checkpoint above.

The inference you see on the case page is not magic — it comes from a specific checkpoint, saved at a specific epoch of a specific training run. Below is the real shape of that run: loss, per-class IoU, learning-rate schedule, and the exact epoch where the best weights were picked.
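Picking "the epoch where the best weights were picked" usually amounts to a running argmax over a validation metric, with a checkpoint saved whenever the metric improves. The page does not say which validation metric this run used, so the sketch assumes validation flood IoU. The `toy_val_flood_iou` values are made-up placeholders and do not come from the real 37-row log.

```python
def select_best_epoch(val_scores: list[float]) -> tuple[int, float]:
    """Return (1-based epoch, score) of the highest validation score.

    Mirrors a 'save checkpoint when val improves' loop: ties keep the earlier epoch.
    """
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:              # strict '>' so a later tie does not overwrite
            best_epoch, best_score = epoch, score
            # a real training loop would save the checkpoint here
    return best_epoch, best_score

# Toy curve (placeholder numbers): improves, peaks, then drifts down.
toy_val_flood_iou = [0.01, 0.20, 0.45, 0.52, 0.51, 0.52, 0.48]
print(select_best_epoch(toy_val_flood_iou))   # → (4, 0.52)
```

Only strict improvements are saved. Training can therefore run well past the selected epoch without changing the deployed weights; in this run, that epoch is 12 of 37.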

Note · The run is real: 37 epochs, best val at epoch 12, 79.79 % test mIoU with TTA. Per-epoch curve values below are an approximation that matches the three-phase summary in TEACHING_NOTES_BEST_MODEL.md. The full 37-row log ships at `public/sources/checkpoints/UNetRSMamba_FloodFocus2/`.

Model: CS-Mamba (class name UNetRSMamba) · 40.55M params
Data: KuroSiwo Honduras · 6322 train · 649 val · 911 test
Compute: WSL2 · 1× RTX 4090 · PyTorch 2.3 · fp16 AMP · EMA 0.999
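"EMA 0.999" means an exponential moving average of the weights is kept alongside the trained weights. The minimal NumPy sketch below assumes the common form `ema = decay * ema + (1 - decay) * w`, applied after every optimizer step. The page does not say whether the deployed checkpoint is the EMA copy or the raw weights, and `ema_update` is an illustrative name.

```python
import numpy as np

def ema_update(ema: dict, weights: dict, decay: float = 0.999) -> None:
    """In-place EMA of each parameter: ema ← decay·ema + (1 − decay)·w."""
    for name, w in weights.items():
        ema[name] *= decay
        ema[name] += (1.0 - decay) * w

weights = {"conv.weight": np.zeros(3)}
ema = {k: v.copy() for k, v in weights.items()}

weights["conv.weight"][:] = 1.0          # pretend training jumped the weights to 1.0
for _ in range(1000):                    # 1000 optimizer steps at the new value
    ema_update(ema, weights)

print(round(float(ema["conv.weight"][0]), 3))   # → 0.632
```

With decay 0.999 the average has an effective memory of roughly 1 / (1 − 0.999) = 1000 steps. That is why, after 1000 updates, it has covered only about 63 % of a step change.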

Charts: Loss (train, val) · Validation IoU (flood, water, land), scrubbable from epoch 1 to epoch 37.

Preview at epoch 1 (E1): flood IoU 1.1%.

The network's view, animated. As flood IoU climbs, the prediction sharpens from noise into the final Banda Aceh flood map. Illustrative reveal — actual per-epoch predictions were not exported.

Epoch snapshot

01 / 37

Flood IoU: 1.1%
Mean IoU: 28.6%
Pixel acc: 83.2%
Train loss: 1.208
Val loss: 1.220
Learning rate: 1.0e-5
Flood P / R: 28.8% / 10.0%
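The snapshot shows the trap the glossary warns about: 83.2 % pixel accuracy next to 1.1 % flood IoU. The confusion counts below are made-up toy numbers, not figures from this run. They show that a model which never predicts flood still scores high flood-vs-rest pixel accuracy with zero flood IoU. They also show how IoU, precision and recall are linked when all three come from the same pixel counts: 1/IoU = 1/P + 1/R − 1. The snapshot values are approximate per-epoch reconstructions (see the note above the charts), so they need not satisfy that identity.

```python
def flood_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary flood-vs-rest metrics from pixel counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"pixel_acc": (tp + tn) / total, "precision": precision,
            "recall": recall, "iou": iou}

# 10,000 pixels, 500 of them truly flooded (toy numbers).
lazy = flood_metrics(tp=0, fp=0, fn=500, tn=9500)     # never predicts flood
print(lazy["pixel_acc"], lazy["iou"])                 # → 0.95 0.0

m = flood_metrics(tp=300, fp=200, fn=200, tn=9300)
p, r = m["precision"], m["recall"]
# IoU, P and R computed from the same counts obey 1/IoU = 1/P + 1/R - 1.
assert abs(1 / m["iou"] - (1 / p + 1 / r - 1)) < 1e-12
print(round(m["iou"], 3))                             # → 0.429
```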

· PHASE A STARTS

Linear LR warmup over 10 epochs begins. "No water" IoU shoots above 90 % almost immediately: the model learns to predict the majority class first. Flood IoU is still only about 1 % (1.1 % in the snapshot above).
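The scheduler listed under Hyperparameters (10-epoch linear warmup, then cosine annealing, lr 1e-6 → 1e-4 → 1e-6) can be written as a per-epoch function. This sketch makes two assumptions. First, the real scheduler may step per iteration rather than per epoch. Second, the page does not say how long the anneal was planned to run, so the cosine horizon `total_epochs` is a free parameter; 37 is used below only for illustration. `lr_at` is a hypothetical name.

```python
import math

def lr_at(epoch: int, total_epochs: int, warmup: int = 10,
          lr_min: float = 1e-6, lr_max: float = 1e-4) -> float:
    """Learning rate for a 1-based epoch: linear warmup, then cosine decay."""
    if epoch <= warmup:
        # ramp linearly from lr_min toward lr_max, reaching lr_max at the last warmup epoch
        return lr_min + (lr_max - lr_min) * epoch / warmup
    # cosine from lr_max (first post-warmup epoch) down to lr_min (final epoch)
    progress = (epoch - warmup - 1) / max(1, total_epochs - warmup - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

for e in (1, 10, 11, 24, 37):
    print(e, f"{lr_at(e, total_epochs=37):.2e}")   # total_epochs=37 only for illustration
```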

/ HYPERPARAMETERS

Loss: Focal + Dice (0.35 / 0.65, γ = 2.5, class_weights [0.2, 2.5, 4.0])
Optimizer: AdamW (β₁ = 0.9, β₂ = 0.999, wd = 1e-4, grad clip ≤ 1.0)
Scheduler: cosine anneal · 10-epoch linear warmup · lr 1e-6 → 1e-4 → 1e-6
Batch × tile: 16 × 224²
Channels: 6 in → 3 classes
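The NumPy sketch below shows one way the listed loss could be computed, so the hyperparameters have something concrete to attach to. Because the training code is not shown, it rests on several assumptions:

- the class weights act as the per-class α in the focal term;
- the Dice term is a soft multi-class Dice averaged over the three classes;
- the 0.35 / 0.65 split means focal / Dice, in the order listed.

The real PyTorch implementation may differ in reduction and smoothing details.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the class axis (axis 0) of a C×H×W array."""
    z = logits - logits.max(axis=0, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def focal_dice_loss(logits, target, w_focal=0.35, w_dice=0.65, gamma=2.5,
                    class_weights=(0.2, 2.5, 4.0), eps=1e-6):
    """logits: C×H×W scores; target: H×W integer labels in [0, C)."""
    probs = softmax(logits)
    C = logits.shape[0]
    onehot = np.eye(C)[target].transpose(2, 0, 1)   # H×W×C → C×H×W
    p_true = (probs * onehot).sum(axis=0)           # probability of the true class per pixel
    alpha = np.asarray(class_weights)[target]       # per-pixel class weight (assumed α)
    focal = -(alpha * (1 - p_true) ** gamma * np.log(p_true + eps)).mean()

    inter = (probs * onehot).sum(axis=(1, 2))
    denom = probs.sum(axis=(1, 2)) + onehot.sum(axis=(1, 2))
    dice = 1 - ((2 * inter + eps) / (denom + eps)).mean()   # soft Dice, mean over classes
    return w_focal * focal + w_dice * dice

target = np.array([[0, 1, 2], [2, 1, 0]])                 # land / water / flood labels
confident = 10.0 * np.eye(3)[target].transpose(2, 0, 1)   # large logit on the true class
uniform = np.zeros((3, 2, 3))                             # no preference between classes
print(focal_dice_loss(confident, target) < focal_dice_loss(uniform, target))   # → True
```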

/ GLOSSARY

Eight terms, one page.

Every jargon word used above, defined plainly. Each entry points at the stage where the term first shows up. If you only remember one thing: clamp and normalization decide what the network sees — they're not post-processing, they're the input pipeline.

SAR (Synthetic Aperture Radar): A side-looking radar that synthesizes a long virtual antenna from the motion of the satellite, producing ground imagery by measuring how much of its own microwave pulse bounces back. → First used in the hero; deep-dived in the homepage primer.
Backscatter (σ⁰, sigma-naught): A normalized measure of how much transmitted radar energy returns to the satellite from a given ground patch. Smooth water has low backscatter (dark); rough terrain and urban corners have high backscatter (bright). This is the raw number the whole pipeline works on. → Stage 01: SNAP calibration gives you calibrated backscatter.
VV / VH (polarization channels): VV = transmit vertical, receive vertical. VH = transmit vertical, receive horizontal. VV is best at seeing water surfaces; VH is most sensitive to volume scattering from vegetation. Sentinel-1 delivers both, and we feed both to the network. → Stage 02: each tile stores VV + VH as a 2-band TIF.
DEM (Digital Elevation Model): A raster of ground elevation. SAR imaging geometry depends on terrain height, so without a DEM you can't project pixels back onto geographic coordinates correctly. We ship an SRTM tile so every machine gets the same DEM. → Stage 01: passed to SNAP as -PexternalDEMFile.
Clamp (input saturation ceiling): Before normalization, every backscatter value above the clamp is clipped down to the clamp. Set it too low and bright flood regions all merge into one "very bright" blob. Set it too high and noise dominates. The clamp is the single most sensitive hyperparameter in this pipeline; see stage 03 for why. → Stage 03: 0.15 / 0.3 / 0.5 swept and compared.
Speckle (coherent-imaging noise): The salt-and-pepper graininess characteristic of SAR images. It is not sensor noise; it is interference between coherent returns from many scatterers in one pixel. Speckle filtering (stage 01) tames it while trying to preserve edges, which matters near flood boundaries. → Stage 01: SNAP speckle filter step.
IoU (Intersection over Union): For a class, IoU = (pixels both model and truth call this class) ÷ (pixels either model or truth calls this class). It is a stricter metric than pixel accuracy: a model can get 95 % pixel accuracy by just calling everything "land" and still have 0 % flood IoU. → Training trajectory: flood IoU is the number we optimize.
argmax (final decision step): At each pixel the network produces 3 scores (land, water, flood). argmax picks the highest-scoring class, with no threshold, no calibration and no post-processing. So "which config wins" is determined entirely by the scores the network produced, which for a given checkpoint are determined entirely by the input it saw. → Stage 04: model(x).argmax(1) is the whole decision rule.
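Clamp and argmax bracket the network: the clamp shapes what it sees, and argmax turns its scores into a label. The sketch below assumes linear σ⁰ input and a simple clip-then-divide normalization to [0, 1]. The real normalization statistics come from stage 03 and may differ (for example mean/std scaling or dB units). The `clamp_normalize` helper, the four σ⁰ values and the logits are all illustrative.

```python
import numpy as np

def clamp_normalize(sigma0: np.ndarray, clamp: float) -> np.ndarray:
    """Clip backscatter at `clamp`, then rescale to [0, 1] (assumed normalization)."""
    return np.clip(sigma0, 0.0, clamp) / clamp

# Linear σ⁰ for four pixels: calm water, field, bright urban edge, very bright corner.
sigma0 = np.array([0.01, 0.08, 0.35, 0.90])
for c in (0.15, 0.3, 0.5):
    print(c, np.round(clamp_normalize(sigma0, c), 3))
# Lower clamps stretch the dark end (water vs. field) but pin every bright pixel at 1.0;
# clamp 0.5 keeps the urban edge (0.7) distinct from the corner (1.0).

# argmax: the class with the highest score wins, with no threshold.
# Class axis is 0 here (C × pixels); model(x).argmax(1) uses axis 1 for N × C × H × W.
logits = np.array([[2.0, 0.1],    # land score for pixel 0, pixel 1
                   [0.5, 0.3],    # water
                   [1.9, 1.2]])   # flood
print(logits.argmax(axis=0))      # → [0 2]
```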

/ LOOP CLOSED

You've walked the full path.
Now interrogate the outputs — or the argument.

