cosmos: expose CosmosRuntime, consolidate client config #4588
…efault_operation_options
Renames `CosmosDriverRuntimeBuilder::with_operation_options` to
`with_default_operation_options` and the matching runtime accessors
`CosmosDriverRuntime::operation_options()` →
`default_operation_options()` and `set_operation_options()` →
`set_default_operation_options()`. The new names make the runtime-layer
role explicit and match the option-resolution hierarchy
(per-op → per-driver → runtime → env → built-in default).
`DriverOptions{,Builder}::operation_options()` and
`DriverOptionsBuilder::with_operation_options` are unchanged — those
are the per-driver layer and keep their plain names.
Updates all in-driver call sites, the SDK's CosmosClientBuilder forwarding
in cosmos_client_builder.rs, the partition-level failover spec doc, and
in-driver tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
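The option-resolution hierarchy named in this commit (per-op → per-driver → runtime → env → built-in default) can be pictured as a chain of optional values in which the first layer that sets a value wins. The sketch below is illustrative only: `Layer`, `max_retries`, and `resolve_max_retries` are hypothetical names and are not taken from the driver crate.

```rust
/// Hypothetical stand-in for one layer of option values.
/// `None` means "not set at this layer".
#[derive(Clone, Copy, Default)]
pub struct Layer {
    pub max_retries: Option<u32>,
}

/// Resolve one setting by walking the layers from most to least specific.
/// The first layer that sets the value wins; the built-in default is the
/// final fallback.
pub fn resolve_max_retries(
    per_op: Layer,
    per_driver: Layer,
    runtime: Layer,
    env: Layer,
    built_in_default: u32,
) -> u32 {
    per_op
        .max_retries
        .or(per_driver.max_retries)
        .or(runtime.max_retries)
        .or(env.max_retries)
        .unwrap_or(built_in_default)
}
```

Under this reading, the `default_` prefix on the runtime accessors signals that the runtime layer only supplies values that more specific layers left unset.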
Add a per-driver `user_agent_suffix` override on `DriverOptions`. The runtime continues to precompute a single `UserAgent` and now stores it behind `Arc<UserAgent>`. When a driver opts in via `DriverOptionsBuilder::with_user_agent_suffix`, the driver computes its own `UserAgent` from the runtime's wrapping-SDK identifier and the driver-level suffix and stores it in its own `Arc`. When the override is unset, the driver clones the runtime's `Arc<UserAgent>` — drivers sharing a runtime without overrides share the same allocation (verified by a new `Arc::ptr_eq` test). `CosmosDriver` now stamps requests using its own `user_agent` field (both on the data-plane hot path and through metadata refresh paths). The metadata-refresh closure captured by `LocationStateStore` now also captures the driver's `Arc<UserAgent>` so refresh requests carry the driver's identity. Bootstrap requests (account-properties probes before any `CosmosDriver` exists) keep using the runtime's User-Agent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
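A minimal sketch of the sharing behaviour described above, using only the standard library. `Runtime`, `driver_user_agent`, and the way the suffix is joined onto the identifier are assumptions made for illustration; only the `Arc` cloning and `Arc::ptr_eq` check mirror what the commit message states.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the driver crate's `UserAgent`.
#[derive(Debug, PartialEq)]
pub struct UserAgent(pub String);

/// Hypothetical runtime that precomputes one shared `UserAgent`.
pub struct Runtime {
    sdk_identifier: String,
    user_agent: Arc<UserAgent>,
}

impl Runtime {
    pub fn new(sdk_identifier: &str) -> Self {
        Self {
            sdk_identifier: sdk_identifier.to_owned(),
            user_agent: Arc::new(UserAgent(sdk_identifier.to_owned())),
        }
    }

    /// Without an override the driver clones the runtime's `Arc`, so no new
    /// `UserAgent` is allocated. With an override it builds its own value
    /// from the identifier plus the suffix (the join format is assumed).
    pub fn driver_user_agent(&self, suffix_override: Option<&str>) -> Arc<UserAgent> {
        match suffix_override {
            Some(suffix) => Arc::new(UserAgent(format!("{} {}", self.sdk_identifier, suffix))),
            None => Arc::clone(&self.user_agent),
        }
    }
}
```

Two drivers created with `None` receive pointers to the same allocation, which is the property an `Arc::ptr_eq` test can assert.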
Rename `CosmosDriverRuntime::get_or_create_driver` to `create_driver`. The runtime no longer caches drivers per account; each call returns a fresh `CosmosDriver` wrapped in an `Arc`. Direct consumers of the driver runtime (currently only `azure_data_cosmos`) are now responsible for any sharing they want; the SDK pattern remains "one CosmosClient => one CosmosDriver per build, with the underlying runtime shared via `CosmosClientBuilder::with_runtime`". Updated all in-driver and SDK call sites, tests, doc examples, README, and ARCHITECTURE.md (per-account-cache claim replaced with the new model). Extended the driver test framework so per-operation helpers inherit `preferred_regions` configured by `run_with_unique_db_and_hedging` (they formerly relied on the pre-warmed cache).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fault injection rules are now configured per-driver via `DriverOptionsBuilder::with_fault_injection_rules`, not on the shared `CosmosDriverRuntime`. Each `CosmosDriver` owns its own (potentially FI-wrapped) HTTP client factory: drivers without rules cheaply clone the runtime's factory `Arc`; drivers with rules wrap it once with `FaultInjectingHttpClientFactory`. This enables true per-driver fault-injection isolation across clients sharing a runtime.
`CosmosDriverRuntime{,Builder}::with_fault_injection_rules` and the runtime-level `fault_injection_enabled` flag are removed. The diagnostic flag is now per-driver. The SDK's `CosmosClientBuilder::with_fault_injection` continues to work; rules flow through `build_driver_options` onto the per-driver options.
Test framework migrated: `DriverTestClient` now stores FI rules and applies them per-operation driver via the new `DriverTestRunContext::driver_options()` helper.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
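The per-driver factory ownership described in this commit can be sketched as follows. Everything here is a simplified stand-in: `HttpClientFactory`, `PlainFactory`, `FaultInjectingFactory`, and `factory_for_driver` are hypothetical names, and rules are modelled as plain strings rather than the crate's rule type.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an HTTP client factory.
pub trait HttpClientFactory: Send + Sync {
    fn describe(&self) -> String;
}

/// The runtime's shared factory in this sketch.
pub struct PlainFactory;

impl HttpClientFactory for PlainFactory {
    fn describe(&self) -> String {
        "plain".to_owned()
    }
}

/// Hypothetical wrapper that would apply fault-injection rules around an
/// inner factory.
pub struct FaultInjectingFactory {
    inner: Arc<dyn HttpClientFactory>,
    rules: Vec<String>,
}

impl HttpClientFactory for FaultInjectingFactory {
    fn describe(&self) -> String {
        format!("fault-injecting({} rules) -> {}", self.rules.len(), self.inner.describe())
    }
}

/// Drivers without rules share the runtime's factory by cloning the `Arc`;
/// drivers with rules wrap it exactly once.
pub fn factory_for_driver(
    runtime_factory: &Arc<dyn HttpClientFactory>,
    rules: Vec<String>,
) -> Arc<dyn HttpClientFactory> {
    if rules.is_empty() {
        Arc::clone(runtime_factory)
    } else {
        Arc::new(FaultInjectingFactory {
            inner: Arc::clone(runtime_factory),
            rules,
        })
    }
}
```

Because the wrapper lives on the driver rather than on the shared runtime, two clients sharing one runtime can carry different rule sets without seeing each other's faults.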
Throughput-control-group registrations are now configurable at both the runtime layer (shared defaults across all drivers using the same runtime) and the driver layer (per-client extensions via DriverOptionsBuilder). CosmosDriver merges the runtime and driver registries at construction. The merge is additive: cross-layer (container, name) collisions error before the driver becomes visible. Within-builder collisions still error at register-time. Mutable settings (priority level, throughput bucket) propagate across layers via Arc<RwLock<...>>, so callers holding either reference observe the same updated state. CosmosDriver::new now returns Result; create_driver propagates the error with the existing CLIENT_THROUGHPUT_CONTROL_GROUP_REGISTRATION_FAILED status. All TCG lookups during request processing now route through the driver's merged registry instead of the runtime's. SDK CosmosClientBuilder::with_throughput_control_group now flows registrations into the per-driver options instead of the runtime. SDK clients sharing a runtime no longer inherit each other's TCG registrations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
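As a rough illustration of the additive merge described at this point in the branch (a later commit in this thread removes the runtime-layer registry), the sketch below merges two maps keyed by (container, name) and fails on the first cross-layer collision. `GroupKey`, `DuplicateGroup`, and `merge_registries` are hypothetical names; the real registry and error types are not shown in this thread.

```rust
use std::collections::HashMap;

/// Registry key in this sketch: (container, group name).
pub type GroupKey = (String, String);

/// Hypothetical merge error naming the colliding key.
#[derive(Debug, PartialEq)]
pub struct DuplicateGroup(pub GroupKey);

/// Additive merge: every runtime entry is kept, every driver entry is added,
/// and a key present in both layers is reported as an error before any
/// merged registry is returned.
pub fn merge_registries<V: Clone>(
    runtime: &HashMap<GroupKey, V>,
    driver: &HashMap<GroupKey, V>,
) -> Result<HashMap<GroupKey, V>, DuplicateGroup> {
    let mut merged = runtime.clone();
    for (key, value) in driver {
        if merged.contains_key(key) {
            return Err(DuplicateGroup(key.clone()));
        }
        merged.insert(key.clone(), value.clone());
    }
    Ok(merged)
}
```

Returning `Result` from the merge is what forces a constructor built on top of it to become fallible, which matches the note that `CosmosDriver::new` now returns `Result`.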
Introduce CosmosRuntime and CosmosRuntimeBuilder in azure_data_cosmos as thin newtypes around the driver crate's CosmosDriverRuntime and CosmosDriverRuntimeBuilder. The SDK builder exposes only the runtime-shaped options end users are likely to touch: connection pool, default operation options, user-agent suffix, CPU refresh interval, and throughput-control groups. Runtime-only diagnostics options (workload id, correlation id) stay on the driver builder for advanced consumers. CosmosRuntime::global() returns a lazily-initialized process-wide default runtime backed by async_lock::OnceCell. It honors AZURE_COSMOS_PER_PARTITION_CIRCUIT_BREAKER_ENABLED, applies the SDK's wrapping-SDK identifier (azsdk-rust-cosmos/<version>), and — when the allow_invalid_certificates Cargo feature is enabled — defaults emulator server certificate validation to DangerousDisabled. CosmosRuntimeBuilder::build() always applies the wrapping-SDK identifier, so even custom runtimes are correctly attributed to this crate. A doc-hidden CosmosRuntimeBuilder::from_driver_builder escape hatch (gated on __internal_in_memory_emulator) lets the test harness inject mock HTTP factories while still getting the SDK's identifier wired up. Also re-export ConnectionPoolOptions, ConnectionPoolOptionsBuilder, and EmulatorServerCertValidation from azure_data_cosmos::options so users can configure custom runtimes without referencing the driver crate directly. The new types are wired up in lib.rs but are not yet referenced by CosmosClientBuilder — that integration lands in the next change, when the SDK builder gains with_runtime() and falls back to global() on build(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
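The newtype-plus-lazy-global shape described above might look roughly like this. The commit says the real `CosmosRuntime::global()` is backed by `async_lock::OnceCell`; this sketch substitutes the synchronous `std::sync::OnceLock` so that it stays self-contained, and `DriverRuntime` and `SdkRuntime` are hypothetical stand-ins rather than the crate's types.

```rust
use std::sync::{Arc, OnceLock};

/// Hypothetical stand-in for the driver crate's runtime type.
pub struct DriverRuntime {
    pub sdk_identifier: String,
}

/// Thin newtype: the SDK-facing handle wraps a shared driver runtime.
#[derive(Clone)]
pub struct SdkRuntime(Arc<DriverRuntime>);

impl SdkRuntime {
    /// Lazily initialize one process-wide runtime and hand out cheap clones.
    /// The initializer runs at most once, even under concurrent callers.
    pub fn global() -> SdkRuntime {
        static GLOBAL: OnceLock<SdkRuntime> = OnceLock::new();
        GLOBAL
            .get_or_init(|| {
                SdkRuntime(Arc::new(DriverRuntime {
                    // Placeholder identifier; the real value embeds the crate version.
                    sdk_identifier: "azsdk-rust-cosmos/<version>".to_owned(),
                }))
            })
            .clone()
    }

    pub fn sdk_identifier(&self) -> &str {
        &self.0.sdk_identifier
    }

    /// True when both handles point at the same underlying runtime.
    pub fn shares_state_with(&self, other: &SdkRuntime) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}
```

Whether the real `global()` returns an owned handle or a reference is not stated in this thread; the clone here is an assumption.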
Rewire CosmosClientBuilder around the new CosmosRuntime model. The
builder now resolves a runtime at build() time (using the supplied one or
falling back to CosmosRuntime::global()), then constructs per-driver
DriverOptions carrying everything the client wants to override on top of
the runtime: default OperationOptions, user-agent suffix override,
fault-injection rules, and throughput-control groups.
API changes (breaking):
- Add with_runtime(CosmosRuntime), with_default_operation_options(OperationOptions).
- Rename with_fault_injection -> with_fault_injection_rules (now -> Result<Self>).
- Rename with_throughput_control_group -> register_throughput_control_group
(now -> Result<Self>).
- Remove with_proxy_allowed, with_allow_emulator_invalid_certificates,
with_throttling_retry_options, and the doc-hidden with_driver_runtime_builder
in favor of a custom CosmosRuntime via with_runtime.
- Drop the corresponding fields from CosmosClientOptions: allow_proxy,
allow_emulator_invalid_certificates, throughput_control_groups buffer,
fault_injection_rules buffer, driver_runtime_builder cell.
Per-client UA suffix now propagates through DriverOptions::user_agent_suffix
rather than the runtime builder, matching the runtime/driver layered model.
Tests:
- Updated tests/framework/test_client.rs: removed with_allow_emulator_invalid_certificates
calls (the global runtime handles this via the allow_invalid_certificates
Cargo feature now), renamed with_fault_injection -> with_fault_injection_rules.
- Updated tests/emulator_tests/cosmos_proxy.rs: build a custom CosmosRuntime
with a ConnectionPoolOptions configured for proxy + emulator-cert
relaxation, attach via with_runtime.
- Updated tests/emulator_tests/cosmos_backup_endpoints.rs: dropped the
per-client cert flag (handled by the global runtime).
- Updated tests/in_memory_emulator_tests/{end_to_end,user_agent}.rs:
every with_driver_runtime_builder call now routes through
CosmosRuntimeBuilder::from_driver_builder(...).build().await + with_runtime.
- Replaced one with_throttling_retry_options site with
with_default_operation_options containing the same throttling settings.
Docs:
- Updated fault_injection/mod.rs doc-comment example and prose to use
with_fault_injection_rules.
- Updated docs/sdk-to-driver-cutover.md and docs/ConfigurationOptions.md
to reflect the new builder names.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds entries to both CHANGELOGs for the runtime-options refactor:
- azure_data_cosmos_driver: per-driver UA suffix override, per-driver
fault-injection rules, per-driver throughput-control group registry
(additive merge), and the breaking renames/removals on
CosmosDriverRuntime{,Builder} (with_operation_options ->
with_default_operation_options, removal of the per-account driver
cache + get_or_create_driver -> create_driver, removal of runtime-
level fault injection, registry lookup move from runtime to driver,
CosmosDriver::new now returns Result).
- azure_data_cosmos: new CosmosRuntime / CosmosRuntimeBuilder types
(with full delegating-setter surface), CosmosRuntime::global(), new
ConnectionPoolOptions{,Builder} and EmulatorServerCertValidation
re-exports, and the slim CosmosClientBuilder surface (added
with_runtime / with_default_operation_options /
with_fault_injection_rules / register_throughput_control_group;
removed with_proxy_allowed / with_allow_emulator_invalid_certificates
/ with_throttling_retry_options / with_fault_injection /
with_throughput_control_group / with_driver_runtime_builder). The
allow_invalid_certificates Cargo feature is now scoped to
CosmosRuntime::global()'s default cert-validation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… it on driver to set options
…Options Removes the throughput-control-group registry from CosmosDriverRuntime entirely — TCGs are now a driver-level concern. The SDK's CosmosRuntimeBuilder::register_throughput_control_group is removed (use CosmosClientBuilder::register_throughput_control_group). Adds a nested ThroughputControlOptions group on OperationOptions::throughput_control mirroring the ThrottlingRetryOptions pattern, with three independently layered fields: group_name (replaces the old top-level OperationOptions::throughput_control_group), and direct throughput_bucket / priority_level overrides that emit wire headers without requiring a registered group. Final header resolution per field: direct value wins, else resolved group_name lookup, else omit. The implicit "default group for container" fallback on the request path is removed. Pipeline contexts now carry a small, Copy ResolvedThroughputControl by value instead of an Option<&ThroughputControlGroupSnapshot>, removing lifetime juggling in the attempt / hedge context structs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
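The per-field header resolution stated above (direct value wins, else the resolved `group_name` lookup, else omit) can be sketched like this. The types are simplified stand-ins: the registry is keyed by group name only, `Priority` has two illustrative variants, and `resolve` is a hypothetical helper, not the driver's function.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Priority {
    High,
    Low,
}

/// Settings carried by a registered group (hypothetical shape).
#[derive(Clone, Copy, Default)]
pub struct GroupSettings {
    pub throughput_bucket: Option<u32>,
    pub priority_level: Option<Priority>,
}

/// Per-operation throughput-control options with three independent fields.
#[derive(Default)]
pub struct ThroughputControl {
    pub group_name: Option<String>,
    pub throughput_bucket: Option<u32>,
    pub priority_level: Option<Priority>,
}

/// Small `Copy` result passed by value; `None` means "omit the header".
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Resolved {
    pub throughput_bucket: Option<u32>,
    pub priority_level: Option<Priority>,
}

/// Resolve each field independently: direct value, else group value, else omit.
pub fn resolve(options: &ThroughputControl, registry: &HashMap<String, GroupSettings>) -> Resolved {
    let group = options
        .group_name
        .as_deref()
        .and_then(|name| registry.get(name))
        .copied()
        .unwrap_or_default();
    Resolved {
        throughput_bucket: options.throughput_bucket.or(group.throughput_bucket),
        priority_level: options.priority_level.or(group.priority_level),
    }
}
```

Because the result is `Copy` and holds no references, it can be stored in attempt or hedge contexts without lifetime parameters, which is the simplification the commit describes.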
Pull request overview
This PR promotes a shared Cosmos “runtime” concept to the SDK surface (CosmosRuntime) and restructures configuration layering so process-wide concerns (transport / TLS / proxy / UA defaults) live on the runtime while per-client/per-driver concerns (default operation options, fault injection, throughput-control groups, partition-failover tuning) live on the client/driver builder surfaces.
Changes:
- Adds `CosmosRuntime`/`CosmosRuntimeBuilder` to `azure_data_cosmos` and rewires `CosmosClientBuilder` to consume a runtime and produce per-client driver options.
- Refactors `azure_data_cosmos_driver` to remove per-account driver caching (`get_or_create_driver` → `create_driver`) and move fault injection / throughput-control groups / partition-failover tuning to `DriverOptions`.
- Consolidates configuration into nested option groups (e.g., `OperationOptions::throughput_control`, `PartitionFailoverOptions`) and renames emulator TLS policy to `ServerCertificateValidation`.
| File | Description |
|---|---|
| sdk/cosmos/azure_data_cosmos/tests/in_memory_emulator_tests/user_agent.rs | Updates tests to build SDK clients using CosmosRuntimeBuilder and with_runtime. |
| sdk/cosmos/azure_data_cosmos/tests/in_memory_emulator_tests/end_to_end.rs | Updates emulator E2E tests for runtime-based construction and new operation-options wiring. |
| sdk/cosmos/azure_data_cosmos/tests/in_memory_emulator_tests/dual_backend.rs | Adapts dual-backend test harness to create_driver and new TLS validation policy type. |
| sdk/cosmos/azure_data_cosmos/tests/in_memory_emulator_tests/driver_end_to_end.rs | Updates driver-focused emulator tests for per-driver fault injection and create_driver. |
| sdk/cosmos/azure_data_cosmos/tests/framework/test_client.rs | Updates shared SDK test framework to configure emulator TLS via CosmosRuntime and new FI registration API. |
| sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_proxy.rs | Reworks proxy test to configure proxy + TLS validation via runtime connection pool options. |
| sdk/cosmos/azure_data_cosmos/tests/emulator_tests/cosmos_backup_endpoints.rs | Updates backup-endpoint test (but currently drops explicit emulator TLS runtime wiring). |
| sdk/cosmos/azure_data_cosmos/src/runtime.rs | Introduces the SDK-level CosmosRuntime wrapper over the driver runtime and its builder. |
| sdk/cosmos/azure_data_cosmos/src/options/mod.rs | Re-exports additional driver option types to support runtime/client configuration without direct driver dependency. |
| sdk/cosmos/azure_data_cosmos/src/lib.rs | Exposes CosmosRuntime / CosmosRuntimeBuilder at the crate root. |
| sdk/cosmos/azure_data_cosmos/src/fault_injection/mod.rs | Updates fault-injection module docs to reflect the new per-driver FI registration flow. |
| sdk/cosmos/azure_data_cosmos/src/constants.rs | Removes an SDK env-var constant that’s now consolidated under AZURE_COSMOS_PPCB_*. |
| sdk/cosmos/azure_data_cosmos/src/clients/cosmos_client_builder.rs | Major refactor: runtime attachment, per-client defaults, partition-failover options, per-driver FI & throughput groups. |
| sdk/cosmos/azure_data_cosmos/docs/sdk-to-driver-cutover.md | Updates design notes/docs for the new FI wiring and runtime/client split. |
| sdk/cosmos/azure_data_cosmos/docs/ConfigurationOptions.md | Updates configuration docs to match the new “defaults via operation options/runtime” model. |
| sdk/cosmos/azure_data_cosmos/CHANGELOG.md | Documents new runtime surface and migration notes for builder/config changes. |
| sdk/cosmos/azure_data_cosmos/Cargo.toml | Removes the allow_invalid_certificates feature from the SDK crate metadata/features. |
| sdk/cosmos/azure_data_cosmos_driver/tests/multi_write_tests/driver_partition_failover.rs | Moves PPCB enable/tuning tests to PartitionFailoverOptions (driver-level). |
| sdk/cosmos/azure_data_cosmos_driver/tests/multi_region_tests/driver_partition_failover.rs | Same as above for multi-region scenarios. |
| sdk/cosmos/azure_data_cosmos_driver/tests/multi_region_failover.rs | Updates multi-region failover tests for per-driver FI and create_driver. |
| sdk/cosmos/azure_data_cosmos_driver/tests/in_memory_emulator_tests/throttling.rs | Updates in-memory emulator throttling tests for runtime defaults + per-driver FI. |
| sdk/cosmos/azure_data_cosmos_driver/tests/in_memory_emulator_tests/hedging.rs | Updates hedging tests for driver-only options (PartitionFailoverOptions, create_driver). |
| sdk/cosmos/azure_data_cosmos_driver/tests/in_memory_emulator_tests/excluded_regions_fallback.rs | Switches to create_driver + DriverOptions construction. |
| sdk/cosmos/azure_data_cosmos_driver/tests/in_memory_emulator_tests/error_diagnostics.rs | Switches to create_driver + DriverOptions construction. |
| sdk/cosmos/azure_data_cosmos_driver/tests/in_memory_emulator_tests/account_metadata_refresh.rs | Switches to create_driver + DriverOptions construction. |
| sdk/cosmos/azure_data_cosmos_driver/tests/gateway_query_plan_comparison.rs | Switches to create_driver + DriverOptions construction. |
| sdk/cosmos/azure_data_cosmos_driver/tests/framework/test_client.rs | Refactors driver test harness to apply preferred regions/FI/PPCB via per-operation DriverOptions. |
| sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_partition_failover.rs | Moves PPCB enable/tuning to PartitionFailoverOptions and new harness entrypoint. |
| sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_backup_endpoints.rs | Updates to use create_driver + DriverOptions. |
| sdk/cosmos/azure_data_cosmos_driver/tests/emulator_tests/driver_account_metadata_failover.rs | Updates docs/comments and expectations for create_driver bootstrap path. |
| sdk/cosmos/azure_data_cosmos_driver/src/options/throughput_control.rs | Adds resolved throughput-control struct and registry merge support. |
| sdk/cosmos/azure_data_cosmos_driver/src/options/policies.rs | Renames/reshapes TLS validation policy enum and emulator detection hook. |
| sdk/cosmos/azure_data_cosmos_driver/src/options/partition_failover.rs | Adds new driver-level PartitionFailoverOptions with env-var parsing under AZURE_COSMOS_PPCB_*. |
| sdk/cosmos/azure_data_cosmos_driver/src/options/operation_options.rs | Introduces nested ThroughputControlOptions and removes PPCB knobs from per-operation options. |
| sdk/cosmos/azure_data_cosmos_driver/src/options/mod.rs | Wires new option modules/exports (PartitionFailoverOptions, ThroughputControlOptions, ServerCertificateValidation). |
| sdk/cosmos/azure_data_cosmos_driver/src/options/identity.rs | Adjusts identity docs (minor cleanup). |
| sdk/cosmos/azure_data_cosmos_driver/src/options/env_parsing.rs | Improves validation error field naming across multiple env-var groups. |
| sdk/cosmos/azure_data_cosmos_driver/src/options/driver_options.rs | Expands DriverOptions to carry per-driver UA suffix, FI rules, throughput groups, PPCB tuning. |
| sdk/cosmos/azure_data_cosmos_driver/src/options/connection_pool.rs | Renames emulator TLS toggle to ServerCertificateValidation and updates env-var mapping. |
| sdk/cosmos/azure_data_cosmos_driver/src/models/cosmos_operation.rs | Updates docs/examples to use create_driver(DriverOptions...). |
| sdk/cosmos/azure_data_cosmos_driver/src/in_memory_emulator/client.rs | Updates docs/examples to use create_driver(DriverOptions...). |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/transport/mod.rs | Switches emulator TLS decision logic to new ServerCertificateValidation API. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/runtime.rs | Removes driver cache, renames operation-defaults APIs, changes UA storage to Arc, removes runtime-level FI/groups. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/routing_systems.rs | Replaces internal partition-failover config with PartitionFailoverOptions getters. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/partition_endpoint_state.rs | Removes internal PartitionFailoverConfig and threads PartitionFailoverOptions through routing state. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/routing/location_state_store.rs | Threads new partition-failover options type through store + failback loop. |
| sdk/cosmos/azure_data_cosmos_driver/src/driver/pipeline/operation_pipeline.rs | Switches from group snapshot to resolved throughput-control header inputs per request. |
| sdk/cosmos/azure_data_cosmos_driver/README.md | Updates README, but currently includes literal diff markers in a code block. |
| sdk/cosmos/azure_data_cosmos_driver/docs/PARTITION_LEVEL_FAILOVER_SPEC.md | Updates spec snippet for renamed runtime default-operation-options accessor. |
| sdk/cosmos/azure_data_cosmos_driver/CHANGELOG.md | Documents the driver-side option layering restructure and breaking API changes. |
| sdk/cosmos/azure_data_cosmos_driver/ARCHITECTURE.md | Updates architecture docs for create_driver and no driver caching. |
| sdk/cosmos/azure_data_cosmos_benchmarks/src/lib.rs | Updates benchmarks to use create_driver(DriverOptions...). |
| sdk/cosmos/.github/skills/cosmos-pre-commit-validation/SKILL.md | Updates local test command examples to remove the deleted allow_invalid_certificates feature. |
Copilot's findings
- Files reviewed: 54/54 changed files
- Comments generated: 5
Restructured the SDK and driver CHANGELOG entries for the runtime/options refactor so the whole branch reads as one Features Added bullet (with sub-bullets per option group) and one Breaking Changes bullet (with a catalog of every renamed / removed / relocated public surface) per crate, matching the "1-2 entries per PR" repo convention.
- azure_data_cosmos (0.35.0):
  - Restored the cross-regional hedging (Azure#4432) and throttling-retry (Azure#4544) entries that earlier passes had dropped, updating their CosmosClientBuilder API references to use with_default_operation_options instead of the removed with_operation_options / with_throttling_retry_options setters.
  - Folded the previous 4 Features Added entries (runtime types, re-exports, PartitionFailoverOptions, ThroughputControlOptions) into one bullet with sub-sections for runtime/client setters, re-exports, and the new nested throughput_control group.
  - Folded the slimmed-builder Breaking Changes into one entry that lists every removed / renamed setter side-by-side, plus the allow_invalid_certificates feature removal with its ServerCertificateValidation::RequiredUnlessEmulator migration path.
- azure_data_cosmos_driver (0.4.0):
  - Updated the existing Azure#4432 entry to reference PartitionFailoverOptions::consecutive_hedge_win_threshold (the field's new location) instead of the now-removed OperationOptions field.
  - Folded the previous 3 Features Added entries (DriverOptionsBuilder overrides, PartitionFailoverOptions, ThroughputControlOptions) into a single bullet with sub-sections.
  - Folded the seven Breaking Changes entries about the restructure (runtime rename, driver cache, runtime FI, runtime TCG registry, throughput_control_group removal, OperationOptions PPCB-field removals, cert-validation rename, env-var renames) into one Migration impact bullet with sub-sections. Retained the pre-existing entries for resolve_container_by_rid (Azure#4506) and PartitionKey::EMPTY removal.
Also drops the stale "merges with runtime-layer registry" rustdoc on CosmosClientBuilder::register_throughput_control_group; the runtime registry no longer exists, so the doc now describes the driver-only scope with build()-time duplicate detection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
55bbff6 to 52d7c7e
- Remove the `CosmosClientBuilder::with_throttling_retry_options` helper that the CHANGELOG already documented as removed. Throttle retry configuration goes through `with_default_operation_options` (or a `CosmosRuntime` for process-wide defaults).
- Clean up unresolved `-`/`+` diff markers in the driver README's Usage example.
- Promote `allow_invalid_certificates` from an env-var-only hack into an explicit `TestOptions` opt-in (`TestOptions::for_emulator()`). Each emulator-only test in `tests/emulator_tests/` now opts into the relaxed runtime explicitly; live and multi-write tests are untouched. The existing `AZURE_COSMOS_CONNECTION_STRING=emulator` shorthand still auto-relaxes for backward compatibility.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
52d7c7e to 33191ba
🟡 PR Deep Review · 5 inline findings
Reviewed this runtime/config refactor in depth, including the history of #4147 (driver cache), #4252 (TLS feature gating), and #4156 (PPCB env vars). Overall this is a strong, well-executed breaking refactor -- the env rename is complete and consistent, the new
Items 1 and 2 are the ones I would most want resolved or explicitly signed off before merge. Separately: the existing Copilot comment that
9701a12 to 971fa3b
kundadebdatta
left a comment
LGTM. We can discuss the allow_invalid_certificates separately.
Yep, it's straightforward to add back.
Integrate the Public API Reorganization (#4512) and runtime/options restructure (#4588) from main with the RID-addressing work on this branch.
Conflict resolutions:
- lib.rs / cosmos_client.rs / container_client.rs: adopt main's reorganized module layout and import grouping, re-adding the RID surface (resource_identity module, ResourceId/ResourceIdentity, ResourceIdentity imports).
- database_client.rs: keep the RID-aware resource_id() short-circuit; route throughput methods through it and delegate the missing-_rid error path to main's resource_id_or_error helper.
- cosmos_status.rs: drop the duplicate 20306 status; main's generic SERVICE_RETURNED_OBJECT_WITHOUT_RID is now canonical.
- cosmos_driver.rs: keep fetch_container_by_rid; adopt main's fallible CosmosDriver::new signature.
- CHANGELOGs: combine both crates' Unreleased entries.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The two from_options_env_override_* tests referenced OperationOptions::per_partition_circuit_breaker_enabled and PartitionFailoverConfig::from_options, both removed by #4588 when PPCB enablement moved to driver-level PartitionFailoverOptions. They were dead code carried in by the main merge and broke 'cargo clippy --all-targets' (lib test) compilation.
…view) Addresses analogrelay review comment on RuntimeEnvConfig: 'Might want to align this with changes coming in #4588.' Removes the bespoke RuntimeEnvConfig CosmosOptions macro struct and routes AZURE_COSMOS_CPU_REFRESH_INTERVAL_MS through the shared parse_duration_millis_from_env helper, matching how #4588's driver-level PartitionFailoverOptions builder reads its duration env vars. All duration env vars now resolve on one path. Drops the now-obsolete RuntimeEnvConfig field-mapping unit test (the helper is covered by env_parsing tests) and updates the driver CHANGELOG to reflect the actual consolidation scope.
…Options (PR #4562) The #4588 merge moved PPCB enablement off OperationOptions onto driver-level PartitionFailoverOptions, which dropped this PR's PPCB _OVERRIDE kill switch while the changelogs/docs still advertised it. This restores the feature on its new home. Adds PartitionFailoverOptions::circuit_breaker_enabled_override (env AZURE_COSMOS_PPCB_ENABLED_OVERRIDE, lenient boolean via new parse_optional_bool_from_env helper) plus the matching builder setter. The override is authoritative over BOTH the circuit_breaker_enabled option and the account property enable_per_partition_failover_behavior, applied at the two effective-PPCB resolution sites (PartitionEndpointState::new and LocationStateStore account-property refresh). Updates both CHANGELOGs and ConfigurationOptions.md to the correct env-var name/scope, and adds unit tests covering both override directions at the options and routing-state layers.
…nges The merge-preview against origin/main surfaced pre-existing breakage: #4588 renamed CosmosDriverRuntime::get_or_create_driver to create_driver (now taking only DriverOptions, account embedded), made PartitionFailoverOptions fields private, and removed the per-operation PPCB knobs from OperationOptions. Several inherited driver tests still used the old APIs and only compiled behind feature gates, so CI's --all-features --all-targets clippy fails. Fixes: migrate the three create_driver call sites to the new single-arg signature; migrate the PPCB override tests to configure thresholds via driver-level PartitionFailoverOptions (new build_ppcb_fixture with a 1-failure threshold; build_fixture delegates with None so non-PPCB tests are unchanged); cast write_failure_threshold() (u32) to i32 for the write_failure_count field; add a test-only PartitionFailoverConfig alias to keep inherited operation_pipeline tests compiling.
…ture Addresses @analogrelay's review on #4623. The dataflow pipeline fans a query into per-physical-partition sub-operations and stamps the owning partition_key_range_id (plus narrowed feed range / partition key) onto OperationOverrides rather than mutating the shared CosmosOperation. pre_resolve_partition_key_range_id previously inspected only the CosmosOperation, so for EPK-range query sub-ops it re-resolved overlapping ranges from scratch and could collapse to None on a multi-range match -- silently dropping the PPCB/PPAF seed. Fix: pass OperationOverrides into pre_resolve_partition_key_range_id and consult it first -- (1) use overrides.partition_key_range_id directly when present (no cache lookup, no multi-range collapse), (2) else resolve a logical partition key from overrides/operation, (3) else resolve an EPK range from overrides/operation, seeding only on a single owning partition. Also fixes the W2 in-memory-emulator failure: the non-PPCB region-failover fixture (build_fixture) was left on the driver-default circuit_breaker_enabled=true after the #4588 PPCB-config migration, so pre-resolution issued an unfaulted pkranges read to the primary that polluted the recorded host list. build_fixture now explicitly disables PPCB; build_ppcb_fixture keeps it enabled with a 1-failure threshold.
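The lookup order described in this fix can be sketched with a toy routing map. All names here (`RoutingMap`, `Target`, `pre_resolve`) are hypothetical, effective partition keys are modelled as plain `u32` values, and a logical partition key is assumed to be already hashed to a single EPK point; the real function's signature and cache are not shown in this thread.

```rust
/// Hypothetical routing map: each entry is
/// (range id, inclusive min EPK, exclusive max EPK).
pub struct RoutingMap(pub Vec<(&'static str, u32, u32)>);

impl RoutingMap {
    /// Ids of every physical partition whose range overlaps `[min, max)`.
    fn owners(&self, min: u32, max: u32) -> Vec<&'static str> {
        self.0
            .iter()
            .filter(|(_, lo, hi)| *lo < max && min < *hi)
            .map(|(id, _, _)| *id)
            .collect()
    }
}

/// Addressing information carried by either the overrides or the operation.
#[derive(Default)]
pub struct Target {
    pub partition_key_range_id: Option<&'static str>,
    /// Logical partition key, already hashed to an EPK point (assumption).
    pub partition_key_epk: Option<u32>,
    pub epk_range: Option<(u32, u32)>,
}

/// Consult the overrides first, then fall back to the shared operation.
pub fn pre_resolve(overrides: &Target, operation: &Target, map: &RoutingMap) -> Option<&'static str> {
    // (1) A stamped range id is used directly: no lookup, no multi-range collapse.
    if let Some(id) = overrides.partition_key_range_id {
        return Some(id);
    }
    // (2) Else a logical partition key, (3) else an EPK range.
    let (min, max) = match overrides.partition_key_epk.or(operation.partition_key_epk) {
        Some(point) => (point, point.saturating_add(1)),
        None => overrides.epk_range.or(operation.epk_range)?,
    };
    // Seed only when exactly one physical partition owns the span.
    match map.owners(min, max).as_slice() {
        [only] => Some(*only),
        _ => None,
    }
}
```

In this model a query spanning both partitions resolves to `None` when only the operation is inspected, but resolves to the stamped id once the per-partition sub-operation's overrides are consulted first.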
PRs #4588 and #4590 had "semantic" merge conflicts. They didn't actually conflict with each other at the diff level, but new tests added by #4590 used APIs that #4588 renamed or moved. Because we don't enforce up-to-date branches before merge, they both landed in main and broke it. This PR resolves that conflict and fixes the tests from #4590 to use the correct APIs.
…PPCB, plus `AZURE_COSMOS_HEDGING_ENABLED` master switch (#4562)

## Summary

Adds environment-driven enablement controls for two availability features (cross-region read hedging and the per-partition circuit breaker, PPCB) on top of the existing programmatic configuration, plus an incident `_OVERRIDE` kill switch for each.

- **Hedging** gains a master switch `AZURE_COSMOS_HEDGING_ENABLED` (env layer of the normal `OperationOptions` layering) and a top-priority kill switch `AZURE_COSMOS_HEDGING_ENABLED_OVERRIDE`. Both are implemented generically in the `CosmosOptions` derive via `#[option(env = "...", overridable)]`.
- **PPCB** enablement is a driver-level `PartitionFailoverOptions` concern (since #4588), so its kill switch is a dedicated `PartitionFailoverOptions::circuit_breaker_enabled_override` field, set via `AZURE_COSMOS_PPCB_ENABLED_OVERRIDE`. When set it is authoritative over **both** the `circuit_breaker_enabled` option and the server account property `enable_per_partition_failover_behavior`.

Hedging and PPCB remain enabled by default; all new variables are inert unless set.

## Resolution layering

- **Hedging** (`hedging_enabled`): `{ENV}_OVERRIDE → operation → account → runtime → {ENV}`.
- **PPCB** (`circuit_breaker_enabled_override`): override wins over `circuit_breaker_enabled` option **and** the account property. (Not part of the per-operation layering; PPCB enablement is driver-level.)

All `_OVERRIDE` values are read once at runtime-build time; flipping mid-incident requires a process restart. Booleans parse leniently (`true/false`, `1/0`, `yes/no`, `on/off`); an unrecognized value is logged and ignored.

## Changes

### `azure_data_cosmos_macros`

- New `overridable` field flag (`#[option(env = "...", overridable)]`) → auto-generates `{ENV}_OVERRIDE` parsing + a top-priority `env_override` view layer via `new_with_override`.
- New `#[options(env_only)]` struct mode → generates only `from_env()`/`from_env_vars()` (no View/Builder/Default), letting an existing builder type double as its own env source.
- New `#[option(env = "...", parser = path)]` attribute → custom `fn(&str) -> Option<T>` parsing (e.g. `Duration` from a millisecond count), with lenient None-is-ignored semantics.
- Crate version `0.1.0` → `0.2.0`; driver depends on it by `path`.

### `azure_data_cosmos_driver` (core)

- `OperationOptions::hedging_enabled: Option<bool>` (`#[option(env = "AZURE_COSMOS_HEDGING_ENABLED", overridable)]`); a new Priority-0 branch in `resolve_availability_strategy` evaluates it before `availability_strategy`.
- `PartitionFailoverOptions::circuit_breaker_enabled_override` (env `AZURE_COSMOS_PPCB_ENABLED_OVERRIDE`), applied at the two effective-PPCB resolution sites (`PartitionEndpointState::new` and the `LocationStateStore` account-property refresh).
- Options-layering cleanup (per review): `RuntimeEnvConfig`, `DiagnosticsEnvConfig`, and `ConnectionPoolEnvConfig` removed; the builders now read env directly (via `env_only` + `parser`) or through the shared `parse_duration_millis_from_env` helper. Malformed env values are warn-and-ignored (fail-soft); bounds violations still hard-error.

### Documentation

- `ConfigurationOptions.md` `_OVERRIDE` table updated with both switches and a note that PPCB's override is driver-level (outside the per-operation layering).
- CHANGELOG entries in `azure_data_cosmos`, `azure_data_cosmos_driver`, and `azure_data_cosmos_macros`.

## Out of scope

- PPAF (server-driven) enablement is unchanged.
- The multi-region eligibility gate (`should_hedge`) and default threshold are unchanged.

---------

Co-authored-by: kundadebdatta <kunda.debdatta@microsoft.com>
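To make the hedging resolution order and the lenient boolean parsing from this description concrete, here is a small standard-library sketch. `parse_lenient_bool`, `HedgingLayers`, and `hedging_enabled` are hypothetical names; trimming and case-insensitive matching are assumptions, since the description only lists the accepted spellings.

```rust
/// Lenient boolean parsing: accepts true/false, 1/0, yes/no, on/off.
/// Anything else yields `None`, which the caller would log and ignore.
/// Trimming and case-insensitivity are assumptions of this sketch.
pub fn parse_lenient_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "true" | "1" | "yes" | "on" => Some(true),
        "false" | "0" | "no" | "off" => Some(false),
        _ => None,
    }
}

/// One optional value per layer, listed from highest to lowest priority.
#[derive(Default)]
pub struct HedgingLayers {
    pub env_override: Option<bool>,
    pub operation: Option<bool>,
    pub account: Option<bool>,
    pub runtime: Option<bool>,
    pub env: Option<bool>,
}

/// `{ENV}_OVERRIDE → operation → account → runtime → {ENV}`; hedging stays
/// enabled when no layer sets a value.
pub fn hedging_enabled(layers: &HedgingLayers) -> bool {
    layers
        .env_override
        .or(layers.operation)
        .or(layers.account)
        .or(layers.runtime)
        .or(layers.env)
        .unwrap_or(true)
}
```

An unrecognized override string parses to `None` and therefore leaves the lower layers in control, which is consistent with the "inert unless set" behaviour described above.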
Promotes the driver's `CosmosDriverRuntime` to a first-class SDK concept as `CosmosRuntime`, and re-shapes `CosmosClientBuilder` so that per-process concerns (transport, TLS, proxy, UA defaults) live on the runtime while per-client concerns (operation defaults, fault injection, throughput-control groups, partition-failover tuning) stay on the builder. Along the way, a handful of related config knobs are consolidated into proper nested option groups (`PartitionFailoverOptions`, `ThroughputControlOptions`), the partition-failover env vars get a single `AZURE_COSMOS_PPCB_*` prefix, and the `EmulatorServerCertValidation` enum is renamed and reshaped to a safer `ServerCertificateValidation` with an emulator-aware `RequiredUnlessEmulator` policy.
See the `azure_data_cosmos` and `azure_data_cosmos_driver` CHANGELOGs for the full surface-level migration catalog, including every renamed, removed, and relocated public item.