Skip to content

fix(cosmos): enable PPCB to engage for EPK-range feed-range queries#4623

Merged
kundadebdatta merged 12 commits into
mainfrom
users/kundadebdatta/4611_ppcb_feed_range_queries
Jun 18, 2026
Merged

fix(cosmos): enable PPCB to engage for EPK-range feed-range queries#4623
kundadebdatta merged 12 commits into
mainfrom
users/kundadebdatta/4611_ppcb_feed_range_queries

Conversation

@kundadebdatta

Copy link
Copy Markdown
Member

PPCB: Seed partition key range ID for EPK-range feed ranges

Fixes: #4611

Summary

This PR fixes a bug where the Per-Partition Circuit Breaker (PPCB) failed to engage for feed-range queries targeted using an Effective Partition Key (EPK) range (e.g., SELECT * FROM c scoped to one physical partition via FeedRange).

Because the breaker never tripped, the SDK would continue contacting the faulted hub region on every query instead of routing traffic away.

Root cause

PPCB attributes failures using the partition_key_range_id in OperationRetryState.
Previously, CosmosDriver::pre_resolve_partition_key_range_id only populated this ID if the operation targeted a logical partition key.
For FeedRangeRepr::Range (EPK ranges), no logical partition key exists, so the function returned None.

With no pre-resolved ID:

  1. The first attempt of an operation runs with an empty PK Range ID.
  2. If this attempt hits a 503 (or other server error) in the hub region, LocationStateStore::record_hedge_outcome drops the MarkPartitionUnavailable effect because there is no ID to attribute the failure to.
  3. The failure counter never increments, preventing the PPCB read_failure_threshold from being reached.
  4. The operation eventually succeeds via cross-region failover, but the next request starts from scratch and the hub is contacted again.

The Fix

Updated pre_resolve_partition_key_range_id in cosmos_driver.rs to introspect FeedRange targets.

  • If it has a logical partition key, it delegates to the existing resolve_partition_key_range_id lookup.
  • If it is an EPK range, it calls pk_range_cache.resolve_overlapping_ranges(min_inclusive..max_exclusive).
  • If the overlapping ranges map to exactly one physical partition, we extract and return its ID.

With this change, the first attempt of an EPK-range query starts with the correct partition_key_range_id. Unavailability effects map successfully, counters increment, and the breaker correctly trips.

Validation

kundadebdatta and others added 2 commits June 17, 2026 13:22
Fixes #4611.

PPCB partition-level marking is keyed on partition_key_range_id. Previously, pre_resolve_partition_key_range_id only extracted the ID if the target was a logical partition key. EPK-range queries (e.g. \SELECT * FROM c\ scoped to one partition via FeedRange) yielded \None\, so the failover pipeline ran its first attempt without a known PK range ID. When the first attempt hit a 503 from the hub region, the PPCB skipped marking the partition unavailable, meaning the failure counter never accumulated and the breaker never tripped.

This fix resolves the EPK range via \
esolve_overlapping_ranges\ and seeds the ID when the range maps to exactly one physical partition. With the ID seeded from the first attempt, \MarkPartitionUnavailable\ is applied reliably, allowing the breaker to trip.
kundadebdatta and others added 2 commits June 17, 2026 17:07
…4611)

Extracts the single-overlapping-range decision from pre_resolve_partition_key_range_id into a pure, documented single_overlapping_range_id helper so the behavior is unit-testable without a full driver.

Adds six tests locking in the incident-mitigation contract: single->Some and multi->None (both as direct helper unit tests and as cache-backed tests driving the real resolve_overlapping_ranges with a mocked fetch, matching the existing resolve_overlapping_ranges_* pattern), empty->None, and logical-PK->owning-range unchanged.
@kundadebdatta kundadebdatta marked this pull request as ready for review June 18, 2026 02:29
@kundadebdatta kundadebdatta requested a review from a team as a code owner June 18, 2026 02:29
Copilot AI review requested due to automatic review settings June 18, 2026 02:29

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a Cosmos driver routing bug where the per-partition circuit breaker (and per-partition automatic failover) couldn’t attribute first-attempt failures for EPK-range FeedRange targets because partition_key_range_id was never pre-seeded.

Changes:

  • Update CosmosDriver::pre_resolve_partition_key_range_id to seed a PK range id for EPK-range feed ranges by resolving overlapping routing-map ranges and returning the id only when exactly one physical partition overlaps.
  • Add single_overlapping_range_id helper to collapse overlap results into an optional PartitionKeyRangeId.
  • Add unit tests covering the single/multi/empty overlap collapse behavior plus cache-backed overlap resolution scenarios.

Comment thread sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs Outdated
kundadebdatta and others added 2 commits June 18, 2026 08:51
…4623 review)

Addresses Copilot's review comment on pre_resolve_partition_key_range_id: it
resolved (and cloned) every overlapping PartitionKeyRange just to check whether
exactly one partition overlapped, discarding the vector in the common
multi-partition case.

- Adds ContainerRoutingMap::single_overlapping_range_id, which reuses the same
  O(log n) binary-search bounds (factored into a shared overlapping_range_bounds
  helper) and clones at most one range ID instead of every overlapping range.
- Adds PartitionKeyRangeCache::resolve_single_overlapping_range_id as the
  non-cloning counterpart to resolve_overlapping_ranges.
- pre_resolve_partition_key_range_id now calls the new cache method directly and
  maps the optional ID, removing the driver-level single_overlapping_range_id
  collapse helper.
- Moves the single/multi/empty collapse unit tests down to the routing-map layer
  (where the logic now lives) and rewires the cache-backed EPK tests to the new
  non-cloning method. Same single->Some / multi->None / logical-PK-unchanged
  coverage, no wasted clone.
@kundadebdatta kundadebdatta enabled auto-merge (squash) June 18, 2026 15:55

@analogrelay analogrelay left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK, but I wonder if we should require that when an operation reaches the operation pipeline, it must map to a single PK Range (according to the client's cache). If it maps to multiple (which should only happen if a PK Range cache refresh hits between the dataflow pipeline and the operation pipeline for a given operation), we should actually consider emitting a synthetic 410 to force the dataflow pipeline to "split".

Comment thread sdk/cosmos/azure_data_cosmos_driver/src/driver/cache/container_routing_map.rs Outdated
Comment thread sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs
@analogrelay

Copy link
Copy Markdown
Member

What might have happened here is that because PPCB and the dataflow pipeline logic landed so close, we got a semantic merge conflict. The dataflow pipeline added the concept of OperationOverrides, which is a set of values passed into the operation pipeline to "override" those on the CosmosOperation. The reason we do that is so that we can hold a single CosmosOperation for the query, but vary only the PK Range ID, Partition Key, and EPK Range. However, pre_resolve_partition_key_range_id is only looking at the CosmosOperation. It should look at the OperationOverrides too.

@analogrelay

Copy link
Copy Markdown
Member

Honestly, we might want to just make a CosmosOperationView<'a> type that holds a &'a CosmosOperation and the OperationOverrides and provides accessors that check the overrides first. That way we have one thing we can pass through the pipeline. I wouldn't try to land that here though, let's just get this fix in and then review that.

kundadebdatta added 2 commits June 18, 2026 10:05
…nges

The merge-preview against origin/main surfaced pre-existing breakage: #4588 renamed CosmosDriverRuntime::get_or_create_driver to create_driver (now taking only DriverOptions, account embedded), made PartitionFailoverOptions fields private, and removed the per-operation PPCB knobs from OperationOptions. Several inherited driver tests still used the old APIs and only compiled behind feature gates, so CI's --all-features --all-targets clippy fails.

Fixes: migrate the three create_driver call sites to the new single-arg signature; migrate the PPCB override tests to configure thresholds via driver-level PartitionFailoverOptions (new build_ppcb_fixture with a 1-failure threshold; build_fixture delegates with None so non-PPCB tests are unchanged); cast write_failure_threshold() (u32) to i32 for the write_failure_count field; add a test-only PartitionFailoverConfig alias to keep inherited operation_pipeline tests compiling.
…ture

Addresses @analogrelay's review on #4623. The dataflow pipeline fans a query into per-physical-partition sub-operations and stamps the owning partition_key_range_id (plus narrowed feed range / partition key) onto OperationOverrides rather than mutating the shared CosmosOperation. pre_resolve_partition_key_range_id previously inspected only the CosmosOperation, so for EPK-range query sub-ops it re-resolved overlapping ranges from scratch and could collapse to None on a multi-range match -- silently dropping the PPCB/PPAF seed.

Fix: pass OperationOverrides into pre_resolve_partition_key_range_id and consult it first -- (1) use overrides.partition_key_range_id directly when present (no cache lookup, no multi-range collapse), (2) else resolve a logical partition key from overrides/operation, (3) else resolve an EPK range from overrides/operation, seeding only on a single owning partition.

Also fixes the W2 in-memory-emulator failure: the non-PPCB region-failover fixture (build_fixture) was left on the driver-default circuit_breaker_enabled=true after the #4588 PPCB-config migration, so pre-resolution issued an unfaulted pkranges read to the primary that polluted the recorded host list. build_fixture now explicitly disables PPCB; build_ppcb_fixture keeps it enabled with a 1-failure threshold.
@kundadebdatta kundadebdatta self-assigned this Jun 18, 2026
…611_ppcb_feed_range_queries

# Conflicts:
#	sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs
…ution

Add a warn! + debug_assert! in ContainerRoutingMap::single_overlapping_range_id when a feed range maps to multiple physical partitions. By the time a request reaches the operation pipeline it should map to exactly one partition (the dataflow pipeline splits multi-partition feed ranges first); a multi-owner result here signals a stale routing map / partition-split race. Behavior is unchanged in release builds (still returns None so PPCB/PPAF degrade gracefully). Existing multi-partition tests become should_panic to cover the debug assertion.
Comment thread sdk/cosmos/azure_data_cosmos_driver/src/driver/cosmos_driver.rs
@kundadebdatta kundadebdatta merged commit 4787eae into main Jun 18, 2026
12 checks passed
@kundadebdatta kundadebdatta deleted the users/kundadebdatta/4611_ppcb_feed_range_queries branch June 18, 2026 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Cosmos The azure_cosmos crate

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants