
Add CosmosDB control queue transport for node agent communication #2312

Merged: jeremydmiller merged 3 commits into main from cosmos-control-queue on Mar 16, 2026


jeremydmiller (Member) commented Mar 16, 2026

Closes #2311

Summary

  • Adds a CosmosDB-based control transport (cosmoscontrol:// protocol) for direct node-to-node agent communication, mirroring the existing DatabaseControlTransport pattern used by RDBMS persistence
  • Stores control messages as documents in the existing wolverine container, using partition keys of the form control-{nodeId} so that each node polls only its own single partition
  • Automatically registers the control transport in CosmosDbMessageStore.Initialize() when running in DurabilityMode.Balanced
  • Adds both [JsonProperty] (Newtonsoft.Json) and [JsonPropertyName] (System.Text.Json) attributes to CosmosDB document types so they serialize with the same property names under either serializer
  • Adds LeadershipElectionCompliance tests in a dedicated CosmosDbTests.LeaderElection project
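The partitioning scheme described above can be illustrated with a small in-memory sketch. Wolverine's implementation is C#; Python is used here only as a neutral illustration, and the helper names (control_partition_key, poll_partition), the partitionKey property name, the string node ids, and the non-control example document are all hypothetical. Because every control message for a node shares the key control-{nodeId}, that node's poll reads exactly one logical partition and skips both other nodes' messages and the other document types stored in the shared container.

```python
from typing import Dict, List


def control_partition_key(node_id: str) -> str:
    # Hypothetical helper: one logical partition per receiving node.
    return f"control-{node_id}"


def poll_partition(container: List[Dict], node_id: str) -> List[Dict]:
    # Stands in for a single-partition query such as
    #   SELECT * FROM c WHERE c.partitionKey = @pk
    # issued with the partition key set, so no cross-partition fan-out occurs.
    pk = control_partition_key(node_id)
    return [doc for doc in container if doc.get("partitionKey") == pk]


# The shared container also holds non-control documents (hypothetical example).
container = [
    {"id": "m1", "partitionKey": control_partition_key("node-a"), "docType": "control"},
    {"id": "m2", "partitionKey": control_partition_key("node-b"), "docType": "control"},
    {"id": "e1", "partitionKey": "some-other-partition", "docType": "envelope"},
]

print([d["id"] for d in poll_partition(container, "node-a")])
```

Only node-a's message is returned; a node with no pending messages gets an empty list.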

Implementation

  • CosmosDbControlTransport: ITransport implementation managing per-node endpoints
  • CosmosDbControlEndpoint: Endpoint subclass that creates listeners and senders per node
  • CosmosDbControlListener: IListener that polls CosmosDB every second for control messages
  • CosmosDbControlSender: ISender that writes control message documents with a 30-second TTL
  • ControlMessage: document model stored in the shared wolverine container
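The sender/listener pair listed above can be sketched as follows, again in Python as a language-neutral stand-in for the C# types. The 30-second TTL and 1-second polling interval come from the PR description; everything else is assumed for illustration: the FakeContainer class and its manual clock, the document field names, the StartAgent and Ping message types, and the choice to delete a message after handling it. The fake also assumes that an expired item is no longer returned by queries.

```python
import uuid
from typing import Callable, Dict, List

TTL_SECONDS = 30           # document TTL stated in the PR
POLL_INTERVAL_SECONDS = 1  # listener polling interval stated in the PR


class FakeContainer:
    """Hypothetical in-memory stand-in for the shared container with a manual
    clock. Expiry is modeled as: hidden from queries once now >= created + ttl."""

    def __init__(self) -> None:
        self.now = 0.0
        self._docs: Dict[str, Dict] = {}

    def create_item(self, doc: Dict) -> None:
        self._docs[doc["id"]] = {**doc, "_created": self.now}

    def query_partition(self, pk: str) -> List[Dict]:
        return [
            d for d in self._docs.values()
            if d["partitionKey"] == pk and self.now < d["_created"] + d["ttl"]
        ]

    def delete_item(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)


def send_control_message(container: FakeContainer, destination: str,
                         message_type: str, body: str) -> None:
    """Sender side: write one document into the destination node's partition."""
    container.create_item({
        "id": str(uuid.uuid4()),
        "partitionKey": f"control-{destination}",
        "messageType": message_type,
        "body": body,
        "ttl": TTL_SECONDS,
    })


def poll_once(container: FakeContainer, node_id: str,
              handle: Callable[[Dict], None]) -> int:
    """Listener side: one polling tick (the real listener runs on a timer).
    Hands each pending message to the handler, then deletes it (assumed)."""
    pending = container.query_partition(f"control-{node_id}")
    for doc in pending:
        handle(doc)
        container.delete_item(doc["id"])
    return len(pending)


container = FakeContainer()
received: List[str] = []

send_control_message(container, "node-b", "StartAgent", "agent-uri")
container.now += POLL_INTERVAL_SECONDS
poll_once(container, "node-b", lambda d: received.append(d["messageType"]))

# A message addressed to a node that never polls simply ages out.
send_control_message(container, "node-c", "Ping", "")
container.now += TTL_SECONDS
print(received, container.query_partition("control-node-c"))
```

In this model the TTL acts as a safety net: a message addressed to a node that has gone away is never read, but it also does not accumulate indefinitely. Note that Cosmos DB only applies a per-item ttl when time to live is enabled on the container (DefaultTimeToLive set to -1 or a positive value).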

Test plan

  • LeadershipElectionCompliance tests pass individually: 10 of 12 pass on the first run, and the remaining 2 are timing-sensitive and pass on retry
  • All 55 existing CosmosDbTests pass
  • Leadership tests moved to separate CosmosDbTests.LeaderElection project to avoid flakiness affecting core tests

🤖 Generated with Claude Code

jeremydmiller and others added 3 commits March 16, 2026 15:01
Introduces a CosmosDB-based control transport that enables node-to-node
communication using the existing wolverine container with partition-based
message routing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…uments

Fixes CosmosDB 400/1001 errors (HTTP 400, substatus 1001: partition key
mismatch) when the CosmosClient is configured with System.Text.Json instead of
Newtonsoft.Json. All document types now carry both [JsonProperty] and
[JsonPropertyName] attributes so property names serialize to camelCase
regardless of which serializer is active.

Adds serialization tests verifying both serializers produce identical
property names for all CosmosDB document types.
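The partition key mismatch fixed by this commit can be modeled in a few lines. This Python sketch is hypothetical: the /partitionKey path, the property names, and the (status, substatus) tuple are assumptions for illustration, and the serialize helper only mimics the rule that each serializer honors its own attribute family and otherwise falls back to the declared C# property name (the default for both Newtonsoft.Json and System.Text.Json). With only [JsonProperty] present, a System.Text.Json client emits PartitionKey, so the value at the container's partition key path is missing and the write is rejected with 400/1001.

```python
import json
from typing import Dict, Tuple

# Assumed container partition key path; the real wolverine container may differ.
PARTITION_KEY_PATH = "/partitionKey"

# Hypothetical document values, keyed by their C# property names.
values = {"Id": "msg-1", "PartitionKey": "control-node-a", "MessageType": "Ping"}

# camelCase names declared by an attribute family ([JsonProperty] or [JsonPropertyName]).
CAMEL = {"Id": "id", "PartitionKey": "partitionKey", "MessageType": "messageType"}


def serialize(values: Dict[str, object], honored: Dict[str, str]) -> str:
    """Mimic a serializer that honors one attribute family; properties without
    a recognized attribute keep their declared (PascalCase) C# name."""
    return json.dumps({honored.get(name, name): v for name, v in values.items()})


def write_status(body: str, request_pk: str) -> Tuple[int, int]:
    """Model the server-side check that the body's partition key matches the
    request: (201, 0) on success, (400, 1001) on a partition key mismatch."""
    doc = json.loads(body)
    value = doc.get(PARTITION_KEY_PATH.lstrip("/"))
    return (201, 0) if value == request_pk else (400, 1001)


# Before the fix: only [JsonProperty] exists, which System.Text.Json ignores.
newtonsoft_before = serialize(values, CAMEL)
stj_before = serialize(values, {})

# After the fix: both attribute families declare the same camelCase names.
newtonsoft_after = serialize(values, CAMEL)
stj_after = serialize(values, CAMEL)

print(write_status(stj_before, "control-node-a"))
print(write_status(stj_after, "control-node-a"))
```

Comparing the parsed output of both serializers, as the final assertion in a test would, is the same idea as the serialization tests described above: once the names agree, the partition key lands at the configured path regardless of which serializer the CosmosClient uses.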

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the LeadershipElectionCompliance tests from CosmosDbTests into a
dedicated CosmosDbTests.LeaderElection project under LeaderElection/,
matching the pattern used by PostgreSQL, SQL Server, MySQL, and RavenDb
leadership election tests. This prevents flaky leadership tests from
affecting the core CosmosDb compliance test suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Linked issue: Create a control queue option for CosmosDb
