@bjyberg: this surfaced via CR-115 (duplicate `adm0_obs` rows; producer-side tracking issue AdaptationAtlas/hazards_prototype#11), but it is cross-cutting, so I am flagging it for a convention call rather than patching one pipeline.
Problem
At admin0 zonal extraction, `zone_id = gaul0_code`, and each disputed-territory sliver has its own gaul0 code attributed to a claimant country. A country therefore ends up with multiple adm0 rows: its main polygon plus each disputed sliver it claims. E.g. KEN → main `137` + Ilemi `135`; both rows have `admin1`/`admin2 = NULL`, so a naive `WHERE iso3='KEN'` returns 2 rows and consumers double-count or double-plot.
Disputed slivers (`hazards_prototype/R/misc/calc_disputed_weights.R`): Abyei `100` (SSD/SDN), Bīr Ṭawīl `110` (EGY/SDN), Hala'Ib `133` (EGY/SDN), Ilemi `135` (KEN/SSD). (W. Sahara excluded.)
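For reference, the sliver list above can be written as a small lookup. This is only a sketch: the codes and claimants are transcribed from the list, while `DISPUTED_SLIVERS`, `claimed_slivers` and `expected_adm0_zone_ids` are hypothetical names, not anything in `calc_disputed_weights.R`. Whether a given table attributes a sliver to every claimant or only to one is not assumed here.

```python
# Disputed slivers as listed above: gaul0 code -> (name, claimant ISO3 codes).
# Hypothetical structure for illustration; not taken from the R script.
DISPUTED_SLIVERS = {
    100: ("Abyei", ("SSD", "SDN")),
    110: ("Bīr Ṭawīl", ("EGY", "SDN")),
    133: ("Hala'Ib", ("EGY", "SDN")),
    135: ("Ilemi", ("KEN", "SSD")),
}


def claimed_slivers(iso3):
    """Return the sorted gaul0 codes of slivers that list `iso3` as a claimant."""
    return sorted(
        code
        for code, (_name, claimants) in DISPUTED_SLIVERS.items()
        if iso3 in claimants
    )


def expected_adm0_zone_ids(iso3, main_code):
    """Zone ids under which `iso3` may appear at admin0: main polygon, then claimed slivers."""
    return [main_code] + claimed_slivers(iso3)


# KEN example from the text: main 137 plus Ilemi 135.
print(expected_adm0_zone_ids("KEN", 137))  # [137, 135]
```

A consumer could use a lookup like this to tell a main-polygon row from a sliver row before deciding how to combine them.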
Why it's a convention, not a fix
Every admin-extracted Atlas table — exposure, hazard, hazard_exposure, observational, … — inherits this. How we attribute disputed territory is politically sensitive and must be consistent Atlas-wide.
Options
- Drop slivers — country = its main polygon only (disputed area excluded everywhere).
- Count each sliver for every claimant: politically neutral; each claimant's national stat is the area-weighted pooled mean/sd over {main + claimed slivers}, with the obs-grid cell count as the weight. The disputed area appears in both countries' stats. (Pete's preferred direction.)
- Apportion — split the sliver between claimants by some weight (less neutral).
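To make the second option concrete, here is a minimal sketch of a cell-count-weighted pooled mean/sd. It assumes each zone row carries a cell count, a mean and an sd, and that the sd is a population sd (divisor n); a sample sd would need the n-1 form instead. `pool_zones` and all numbers are made up for illustration and are not the producer's implementation.

```python
import math


def pool_zones(zones):
    """Pool per-zone (n_cells, mean, sd) tuples into one (mean, sd).

    Assumes population sd per zone (divisor n), weighted by cell count.
    """
    total = sum(n for n, _mean, _sd in zones)
    if total <= 0:
        raise ValueError("need at least one grid cell")
    pooled_mean = sum(n * mean for n, mean, _sd in zones) / total
    # Within-zone variance plus the squared shift of each zone mean
    # from the pooled mean, both weighted by cell count.
    pooled_var = sum(
        n * (sd ** 2 + (mean - pooled_mean) ** 2) for n, mean, sd in zones
    ) / total
    return pooled_mean, math.sqrt(pooled_var)


# Illustrative values only: (n_cells, mean, sd).
ken_main = (900, 10.0, 2.0)
ssd_main = (1500, 14.0, 3.0)
ilemi = (100, 20.0, 1.0)

# Under this option the Ilemi sliver is pooled into both claimants.
ken_mean, ken_sd = pool_zones([ken_main, ilemi])
ssd_mean, ssd_sd = pool_zones([ssd_main, ilemi])
print(ken_mean, ssd_mean)
```

Dropping slivers corresponds to passing only the main tuple, and apportioning would scale the sliver's cell count by some weight before pooling.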
Ask
Which convention should all admin-extraction producers adopt? Once set, the pipeline implements it uniformly (the obs `adm0_obs` republish plus the same collapse in the hazard/exposure producers). Until then, consumers keep the client-side `GROUP BY` / `AVG` dedup.
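For anyone hitting this in the meantime, a sketch of the naive query versus the client-side dedup, using an in-memory SQLite table. The `adm0_obs` name and the KEN codes come from this issue; the column layout and the `value` numbers are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical column layout; only the table name and KEN codes are from the issue.
con.execute(
    "CREATE TABLE adm0_obs ("
    "iso3 TEXT, gaul0_code INTEGER, admin1 TEXT, admin2 TEXT, value REAL)"
)
con.executemany(
    "INSERT INTO adm0_obs VALUES (?, ?, ?, ?, ?)",
    [
        ("KEN", 137, None, None, 10.0),  # main polygon
        ("KEN", 135, None, None, 20.0),  # Ilemi sliver
    ],
)

# Naive filter: returns both the main row and the sliver row.
naive = con.execute(
    "SELECT iso3, gaul0_code, value FROM adm0_obs WHERE iso3 = 'KEN'"
).fetchall()

# Stopgap dedup: collapse adm0 rows to one row per country.
dedup = con.execute(
    "SELECT iso3, AVG(value) FROM adm0_obs "
    "WHERE admin1 IS NULL AND admin2 IS NULL "
    "GROUP BY iso3"
).fetchall()

print(len(naive))  # 2
print(dedup)       # [('KEN', 15.0)]
```

Note that a plain `AVG` weights the sliver row and the main row equally, so it is a stopgap rather than any of the three conventions above.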
Related: STAC/CDH cataloging #2 · producer issue AdaptationAtlas/hazards_prototype#11