fix(cosmos): enable PPCB to engage for EPK-range feed-range queries#4623
Conversation
Fixes #4611. PPCB partition-level marking is keyed on partition_key_range_id. Previously, pre_resolve_partition_key_range_id only extracted the ID if the target was a logical partition key. EPK-range queries (e.g. \SELECT * FROM c\ scoped to one partition via FeedRange) yielded \None\, so the failover pipeline ran its first attempt without a known PK range ID. When the first attempt hit a 503 from the hub region, the PPCB skipped marking the partition unavailable, meaning the failure counter never accumulated and the breaker never tripped. This fix resolves the EPK range via \ esolve_overlapping_ranges\ and seeds the ID when the range maps to exactly one physical partition. With the ID seeded from the first attempt, \MarkPartitionUnavailable\ is applied reliably, allowing the breaker to trip.
…4611) Extracts the single-overlapping-range decision from pre_resolve_partition_key_range_id into a pure, documented single_overlapping_range_id helper so the behavior is unit-testable without a full driver. Adds six tests locking in the incident-mitigation contract: single->Some and multi->None (both as direct helper unit tests and as cache-backed tests driving the real resolve_overlapping_ranges with a mocked fetch, matching the existing resolve_overlapping_ranges_* pattern), empty->None, and logical-PK->owning-range unchanged.
There was a problem hiding this comment.
Pull request overview
Fixes a Cosmos driver routing bug where the per-partition circuit breaker (and per-partition automatic failover) couldn’t attribute first-attempt failures for EPK-range FeedRange targets because partition_key_range_id was never pre-seeded.
Changes:
- Update
CosmosDriver::pre_resolve_partition_key_range_idto seed a PK range id for EPK-range feed ranges by resolving overlapping routing-map ranges and returning the id only when exactly one physical partition overlaps. - Add
single_overlapping_range_idhelper to collapse overlap results into an optionalPartitionKeyRangeId. - Add unit tests covering the single/multi/empty overlap collapse behavior plus cache-backed overlap resolution scenarios.
…4623 review) Addresses Copilot's review comment on pre_resolve_partition_key_range_id: it resolved (and cloned) every overlapping PartitionKeyRange just to check whether exactly one partition overlapped, discarding the vector in the common multi-partition case. - Adds ContainerRoutingMap::single_overlapping_range_id, which reuses the same O(log n) binary-search bounds (factored into a shared overlapping_range_bounds helper) and clones at most one range ID instead of every overlapping range. - Adds PartitionKeyRangeCache::resolve_single_overlapping_range_id as the non-cloning counterpart to resolve_overlapping_ranges. - pre_resolve_partition_key_range_id now calls the new cache method directly and maps the optional ID, removing the driver-level single_overlapping_range_id collapse helper. - Moves the single/multi/empty collapse unit tests down to the routing-map layer (where the logic now lives) and rewires the cache-backed EPK tests to the new non-cloning method. Same single->Some / multi->None / logical-PK-unchanged coverage, no wasted clone.
analogrelay
left a comment
There was a problem hiding this comment.
Seems OK, but I wonder if we should require that when an operation reaches the operation pipeline, it must map to a single PK Range (according to the client's cache). If it maps to multiple (which should only happen if a PK Range cache refresh hits between the dataflow pipeline and the operation pipeline for a given operation), we should actually consider emitting a synthetic 410 to force the dataflow pipeline to "split".
|
What might have happened here is that because PPCB and the dataflow pipeline logic landed so close, we got a semantic merge conflict. The dataflow pipeline added the concept of |
|
Honestly, we might want to just make a |
…nges The merge-preview against origin/main surfaced pre-existing breakage: #4588 renamed CosmosDriverRuntime::get_or_create_driver to create_driver (now taking only DriverOptions, account embedded), made PartitionFailoverOptions fields private, and removed the per-operation PPCB knobs from OperationOptions. Several inherited driver tests still used the old APIs and only compiled behind feature gates, so CI's --all-features --all-targets clippy fails. Fixes: migrate the three create_driver call sites to the new single-arg signature; migrate the PPCB override tests to configure thresholds via driver-level PartitionFailoverOptions (new build_ppcb_fixture with a 1-failure threshold; build_fixture delegates with None so non-PPCB tests are unchanged); cast write_failure_threshold() (u32) to i32 for the write_failure_count field; add a test-only PartitionFailoverConfig alias to keep inherited operation_pipeline tests compiling.
…ture Addresses @analogrelay's review on #4623. The dataflow pipeline fans a query into per-physical-partition sub-operations and stamps the owning partition_key_range_id (plus narrowed feed range / partition key) onto OperationOverrides rather than mutating the shared CosmosOperation. pre_resolve_partition_key_range_id previously inspected only the CosmosOperation, so for EPK-range query sub-ops it re-resolved overlapping ranges from scratch and could collapse to None on a multi-range match -- silently dropping the PPCB/PPAF seed. Fix: pass OperationOverrides into pre_resolve_partition_key_range_id and consult it first -- (1) use overrides.partition_key_range_id directly when present (no cache lookup, no multi-range collapse), (2) else resolve a logical partition key from overrides/operation, (3) else resolve an EPK range from overrides/operation, seeding only on a single owning partition. Also fixes the W2 in-memory-emulator failure: the non-PPCB region-failover fixture (build_fixture) was left on the driver-default circuit_breaker_enabled=true after the #4588 PPCB-config migration, so pre-resolution issued an unfaulted pkranges read to the primary that polluted the recorded host list. build_fixture now explicitly disables PPCB; build_ppcb_fixture keeps it enabled with a 1-failure threshold.
…611_ppcb_feed_range_queries # Conflicts: # sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs
…ution Add a warn! + debug_assert! in ContainerRoutingMap::single_overlapping_range_id when a feed range maps to multiple physical partitions. By the time a request reaches the operation pipeline it should map to exactly one partition (the dataflow pipeline splits multi-partition feed ranges first); a multi-owner result here signals a stale routing map / partition-split race. Behavior is unchanged in release builds (still returns None so PPCB/PPAF degrade gracefully). Existing multi-partition tests become should_panic to cover the debug assertion.
PPCB: Seed partition key range ID for EPK-range feed ranges
Fixes: #4611
Summary
This PR fixes a bug where the Per-Partition Circuit Breaker (PPCB) failed to engage for feed-range queries targeted using an Effective Partition Key (EPK) range (e.g.,
SELECT * FROM cscoped to one physical partition viaFeedRange).Because the breaker never tripped, the SDK would continue contacting the faulted hub region on every query instead of routing traffic away.
Root cause
PPCB attributes failures using the
partition_key_range_idinOperationRetryState.Previously,
CosmosDriver::pre_resolve_partition_key_range_idonly populated this ID if the operation targeted a logical partition key.For
FeedRangeRepr::Range(EPK ranges), no logical partition key exists, so the function returnedNone.With no pre-resolved ID:
LocationStateStore::record_hedge_outcomedrops theMarkPartitionUnavailableeffect because there is no ID to attribute the failure to.read_failure_thresholdfrom being reached.The Fix
Updated
pre_resolve_partition_key_range_idincosmos_driver.rsto introspectFeedRangetargets.resolve_partition_key_range_idlookup.pk_range_cache.resolve_overlapping_ranges(min_inclusive..max_exclusive).With this change, the first attempt of an EPK-range query starts with the correct
partition_key_range_id. Unavailability effects map successfully, counters increment, and the breaker correctly trips.Validation
cargo fmt+cargo clippy --testsclean.