Summary
write_dataset is append-only: when the store already exists it always uses {"mode": "a", "append_dim": "time"} (storage/zarr_store.py:480). There is no insertion logic — a new date is concatenated onto the end of the time axis regardless of its chronological position. The doy attr is likewise appended positionally (zarr_store.py:840), tracking the same physical order.
This means any out-of-order arrival leaves the store's time axis non-monotonic.
How it bites
Concrete path (the one that surfaced this):
ingest_s2_roi_reflectance processes a broad date range. A date is skipped: e.g. an asset-incomplete STAC item causes the SCL load to be dropped in _compute_scl_phase (see the No such band/alias handling in ingest/s2_roi.py). Later dates are written normally.
- The skipped date is never written, so it never enters get_existing_dates.
- On a later rerun (e.g. after the upstream item is reprocessed into a complete one), query_stac_items(existing_dates=...) filters out the already-written later dates but keeps the previously skipped earlier date, which then gets appended at the end.
Result, e.g.:
2025-07-12, 2025-07-14, 2025-07-11 ← out of chronological order
This is not unique to the dropped-date case — any backfill of an older date into an existing store produces it.
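The two-run sequence can be reproduced without touching real storage. This is a minimal sketch: FakeStore and its methods are hypothetical stand-ins that model only the positional append (mode="a", append_dim="time"), the positionally appended doy attr, and a set-valued existing-dates lookup. They are not the real write_dataset, get_existing_dates, or query_stac_items.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for the zarr store (not the real API)."""

    time: list[date] = field(default_factory=list)
    doy: list[int] = field(default_factory=list)

    def append(self, new_dates: list[date]) -> None:
        # Mimics mode="a", append_dim="time": entries go on the end, unsorted.
        self.time.extend(new_dates)
        # The doy attr is appended positionally, mirroring the physical time order.
        self.doy.extend(d.timetuple().tm_yday for d in new_dates)

    def existing_dates(self) -> set[date]:
        # Set-valued like get_existing_dates, so it carries no order information.
        return set(self.time)


requested = [date(2025, 7, 11), date(2025, 7, 12), date(2025, 7, 14)]
store = FakeStore()

# Run 1: 2025-07-11 is dropped (asset-incomplete item); later dates are written.
dropped = {date(2025, 7, 11)}
store.append([d for d in requested if d not in dropped])

# Run 2: the item is complete now. Filtering on existing dates keeps only
# 2025-07-11, which lands after the later dates.
existing = store.existing_dates()
store.append([d for d in requested if d not in existing])

print([d.isoformat() for d in store.time])  # ['2025-07-12', '2025-07-14', '2025-07-11']
print(store.doy)  # [193, 195, 192]
```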
Why it's a problem
- Consumers that assume a monotonic time axis break: label-based slicing such as .sel(time=slice(...)) on a non-monotonic index typically raises in xarray/pandas, and resolve_region's contiguity check explicitly rejects an out-of-order axis (zarr_store.py:659-661).
- Exact-label selection and get_existing_dates (set-valued) are unaffected, so the corruption is silent until something order-sensitive runs.
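Because the corruption is silent, a cheap audit over a store's time values can surface it before an order-sensitive consumer does. A minimal sketch, assuming the time coordinate has already been loaded into a Python list of datetimes (loading it from the store is left out); order_violations is a hypothetical helper. For a pandas index, Index.is_monotonic_increasing combined with Index.is_unique gives the same yes/no answer.

```python
from datetime import datetime


def order_violations(times: list[datetime]) -> list[int]:
    """Return indices i where times[i] <= times[i - 1].

    An empty result means the axis is strictly increasing. Equal neighbours
    are flagged too, since a strictly increasing axis is the invariant at stake.
    """
    return [i for i in range(1, len(times)) if times[i] <= times[i - 1]]


axis = [datetime(2025, 7, 12), datetime(2025, 7, 14), datetime(2025, 7, 11)]
print(order_violations(axis))  # [2]
```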
Proposed fix
Preferred: make the append path fail fast instead of silently appending out of order. Reject any write whose new time coordinate(s) are <= the store's current max time (or are not themselves strictly increasing), with a clear error pointing at this issue. Appending strictly increasing dates stays the fast path; backfills require an explicit ordered-insert/rewrite path (note that resolve_region is overwrite-only and cannot grow the axis).
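A sketch of the fail-fast check, under stated assumptions: check_append_order and OutOfOrderAppendError are hypothetical names, and write_dataset is assumed to be able to read the store's existing time values and call this only on the append branch, before the {"mode": "a", "append_dim": "time"} write. In the real code the error message would also reference this issue.

```python
from collections.abc import Sequence
from datetime import datetime


class OutOfOrderAppendError(ValueError):
    """Raised when an append would leave the time axis non-monotonic."""


def check_append_order(existing: Sequence[datetime], new: Sequence[datetime]) -> None:
    """Fail fast unless every new time is strictly later than the store's max.

    `existing` is the store's current time axis; `new` is what would be appended.
    """
    # The incoming batch must itself be strictly increasing.
    for prev, cur in zip(new, new[1:]):
        if cur <= prev:
            raise OutOfOrderAppendError(
                f"incoming times are not strictly increasing: {prev} then {cur}"
            )
    if not existing or not new:
        return  # fresh store or empty batch: nothing to compare against
    store_max = max(existing)
    # `new` is strictly increasing here, so new[0] is its earliest value.
    if new[0] <= store_max:
        raise OutOfOrderAppendError(
            f"refusing to append {new[0]}: store max time is {store_max}. "
            "Backfilling an older date needs an ordered insert/rewrite, not append_dim."
        )


axis = [datetime(2025, 7, 12), datetime(2025, 7, 14)]
check_append_order(axis, [datetime(2025, 7, 15)])  # fast path: passes
try:
    check_append_order(axis, [datetime(2025, 7, 11)])  # backfill: rejected
except OutOfOrderAppendError as err:
    print(err)
```

Comparing against max(existing) rather than the last element keeps the check meaningful even on a store that is already out of order.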
Alternative (weaker): sort-on-read in consumers — does not fix the on-disk invariant and is easy to forget.
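Part of why sort-on-read is easy to get wrong: sorting the time coordinate alone is not enough, because every positionally appended array, including the doy attr, has to be permuted the same way. In xarray, Dataset.sortby("time") reorders variables along time, but as far as we understand it carries attrs over unchanged, so a doy list kept in attrs would stay in on-disk order. A minimal pure-Python sketch; sort_on_read is a hypothetical helper.

```python
from datetime import date


def sort_on_read(times: list[date], doy: list[int]) -> tuple[list[date], list[int]]:
    """Return time and doy reordered chronologically, keeping them aligned."""
    # Argsort on time, then apply the same permutation to every positional array.
    order = sorted(range(len(times)), key=times.__getitem__)
    return [times[i] for i in order], [doy[i] for i in order]


times = [date(2025, 7, 12), date(2025, 7, 14), date(2025, 7, 11)]
doy = [193, 195, 192]
sorted_times, sorted_doy = sort_on_read(times, doy)
print(sorted_doy)  # [192, 193, 195]
```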
Context
Found while debugging an S2 ROI ingest failure on conus_corn_sample_49 for 2025-07-10..2025-07-15, where earth-search served an asset-incomplete item (S2A_13UEQ_20250711_0_L2A, missing scl + reflectance bands) that crashed the whole run. The immediate crash is handled by dropping the date; this issue tracks the deeper append-ordering invariant the drop exposes.