fix(streaming): prevent checkpoint gaps causing replay by fickleEfrit · Pull Request #10096 · dotnet/orleans

fickleEfrit · 2026-05-13T16:59:21Z

Problem

When an Event Hub streaming subscription is used, newly added silos replay too many streaming messages due to stale checkpoints (#10093). Three root causes:

No checkpoint on shutdown/rebalance — EventHubAdapterReceiver.Shutdown() disposed the cache and closed the receiver without persisting the latest offset.
Checkpoint stalls under low/moderate throughput — the only periodic checkpoint mechanism was cache eviction (OnPurge), which requires messages to be in cache >5 min AND have >30 min relative age. For streams that don't accumulate enough spread, purging never triggered.
Throttled Update() dropped offsets — EventHubCheckpointer.Update() silently discarded the offset entirely when the persist throttle was active.

Solution

Checkpointer fixes

Add FlushAsync() default interface method to IStreamQueueCheckpointer<T> — non-breaking for third-party implementations.
Fix EventHubCheckpointer.Update() to always track the latest offset in a separate latestOffset field (avoids mutating entity during in-flight Azure Table saves). Enforce monotonic offset advancement via IsBefore check.
Implement FlushAsync() in EventHubCheckpointer — awaits any in-progress save, then persists latestOffset if it diverges.

Delivery-based periodic checkpointing

The pulling agent already tracks per-subscription delivery progress in its pubSubCache. Rather than duplicating that state in the receiver (via lifecycle callbacks), we periodically scan it and push a snapshot to the cache.

Add UpdateDeliveryProgress(IReadOnlyList<StreamSequenceToken>, bool) default interface method to IQueueCache. The pulling agent calls this every ~100ms pump tick and once at shutdown.
Add LastProcessedToken to StreamConsumerData — set after each delivered or filtered batch, and seeded from the handshake start token when a subscription is added (so idle subscriptions don't block checkpointing).
PersistentStreamPullingAgent.NotifyDeliveryProgress() scans pubSubCache, collects all LastProcessedToken values, and reports whether any stream registrations are still pending.
EventHubAdapterReceiver.UpdateDeliveryProgress() computes a low-watermark (minimum EventHub offset across all subscription tokens). The checkpoint only advances to the minimum — ensuring no subscription's undelivered messages are skipped. Pending registrations block advancement entirely. When no subscriptions exist, falls back to the cache purge offset.
Call FlushAsync() in EventHubAdapterReceiver.Shutdown() to persist the final checkpoint.

What this preserves

At-least-once delivery — the checkpoint never advances past the slowest subscriber's confirmed delivery point.
No new locking — all pubSubCache access is on the SystemTarget scheduler; the receiver's cachePurgeOffset is also accessed from the same context.
Non-breaking for third-party IQueueCache implementations — the new method is a default interface method (no-op).

Fixes #10093

ReubenBond · 2026-05-13T17:16:46Z

cc @egil

egil · 2026-05-13T18:44:13Z

This looks promising. I am concerned about the trade-off mentioned. Does CheckpointOnRead mean that events read from an eventhub, but not yet delivered, are check pointed?

If yes, is it possible to instead checkpoint when messages have been delivered, aka OnNextAsync on a target grain has returned without exception instead? Big thing would be to avoid the cache purge issue where messages are only checkpointed after being purged from cache. I would love to have that while being sure not to lose messages.

fickleEfrit · 2026-05-13T19:58:46Z

This looks promising. I am concerned about the trade-off mentioned. Does CheckpointOnRead mean that events read from an eventhub, but not yet delivered, are check pointed?

If yes, is it possible to instead checkpoint when messages have been delivered, aka OnNextAsync on a target grain has returned without exception instead? Big thing would be to avoid the cache purge issue where messages are only checkpointed after being purged from cache. I would love to have that while being sure not to lose messages.

My understanding of the current checkpointing system for Event Hub is that there's an existing gap where a consumer can fall behind messages that are evicted if those messages have already been purged.

I'll explore approaches for only advancing the checkpoint after delivery.

egil · 2026-05-13T20:11:31Z

My understanding of the current checkpointing system for Event Hub is that there's an existing gap where a consumer can fall behind messages that are evicted if those messages have already been purged.

True, if messages have not been processed at DataMaxAgeInCache, then yes, I think so too.

I suspect that you have a bunch of other problems if that happens. Back pressure features in pulling agents / in the cache should prevent this though, to some extend.

fickleEfrit · 2026-05-13T20:46:03Z

@egil What do you think about this approach to checkpointing on delivery:

Introduce a callback on the queue interface that the pulling agent will call after completing delivery.
When the callback comes in to the EventHubAdapterReceiver, we track relative to each stream and subscription to make sure delivery has passed a given sequence value for all subscribers.
Once we see each subscriber for a stream move past a given offset, we can advance the checkpoint.

egil · 2026-05-13T21:20:16Z

@egil What do you think about this approach to checkpointing on delivery:

Introduce a callback on the queue interface that the pulling agent will call after completing delivery. When the callback comes in to the EventHubAdapterReceiver, we track relative to each stream and subscription to make sure delivery has passed a given sequence value for all subscribers. Once we see each subscriber for a stream move past a given offset, we can advance the checkpoint.

I honestly do not know enough about the inner workings of the eventhub based stream provider to have a strong opinion. From an end user point of view, I want to have as close to zero chance of losing events. So a crashed silo should not be something results in lost messages. Cloud env is unstable, so it happens more often than I would like. ACA getting impatient and killing silos, etc.,

egil · 2026-05-13T21:23:17Z

What I do not need is the replay functionality that is available with the eventhub based stream providers, which is suspect is the reason for keeping items around in the cache for a longer time and not checkpointing as aggressive as we would want.

fickleEfrit · 2026-05-13T21:41:49Z

I think the approach I mentioned/pushed minimizes the risk of a message being lost. We only advance the checkpoint if the message gets delivered to all known subscribers for a stream.

If we want zero replay chances we probably should checkpoint on read, but that's riskier. Say the latest checkpoint was at message 5. With the approach above it's possible for us to have delivered message 10 to subscriber A for a stream but not subscriber B. Then if the silo crashes, we will start again from message 5. This means we would redeliver up to 10 for A as well.

Even though this doesn't completely eliminate redelivery, just checkpointing more frequently should minimize it imo.

Based on my previous conversations with the Orleans devs I think the massive replay is an unintended consequence of infrequent checkpointing, though perhaps @ReubenBond could clarify that.

egil · 2026-05-13T21:46:25Z

I think the approach I mentioned/pushed minimizes the risk of a message being lost. We only advance the checkpoint if the message gets delivered to all known subscribers for a stream.

If this is the case, then we have our "at least-once delivery guarantee" which is what I need.

Even though this doesn't completely eliminate redelivery, just checkpointing more frequently should minimize it imo.

To be clear, I do not mind redelivery of messages, my grains are idempotent, however, like you, I dislike having 4 hours of events being pulled in again.

What I was referring to before when I talked about "replay" feature is that the eventhub backed streaming provider allows grains to request older messages they have already seen - they support getting older messages a second time, on purpose. I do not need this particular feature, and I think some of the behavior we are seeing is due to this feature.

egil · 2026-05-15T08:04:28Z

+        // Per-subscription progress tracking for low-watermark checkpointing.
+        // Tracks the max processed EventHub offset per (stream, subscription) so a fast
+        // subscription cannot advance the checkpoint past a slower active subscription.
+        private readonly object deliveryTrackingLock = new();


Suggested change

private readonly object deliveryTrackingLock = new();

private readonly Lock deliveryTrackingLock = new();

small nit. not sure if Lock is being used in this codebase yet.

We multi-target .NET 8.0, which doesn't have Lock, so if we want to use Lock then we need to use it conditionally or we need to add a polyfill.

ReubenBond · 2026-05-22T15:01:41Z

+        // subscription cannot advance the checkpoint past a slower active subscription.
+        private readonly object deliveryTrackingLock = new();
+        private readonly Dictionary<(StreamId StreamId, GuidId SubscriptionId), long?> processedOffsets = new();
+        private readonly HashSet<StreamId> pendingRegistrations = new();


All the extra state tracking here gives me pause. I wonder if there is a cheaper way to do this. Eg, scanning cursors on shutdown/xfer, since that is the gap that this PR is intending to cover (hand-off)

Are you saying we wouldn't want to update the checkpoints through the normal lifecycle, only when shutting down or transferring? It's definitely a lot of state to be managing, but if we want to have delivery guarantees alongside periodic checkpointing it might be necessary.

Also, if we just scan the cursors when we transfer, we don't have guarantees that the messages were actually delivered to the consumers. Kind of similar to the point @egil raised on my earlier iteration where we checkpointed with every read. The difference being that instead of every read, we would checkpoint the latest read when we transfer or shut down.

Switched to an approach that relies on the state that already exists in the pulling agent.

Copilot

Pull request overview

This PR targets a known reliability issue in Orleans Event Hubs streaming where checkpoints can lag or be lost, causing excessive message replay when silos restart or rebalance. It introduces delivery-driven checkpoint updates (instead of relying on cache eviction) and adds an explicit flush step during shutdown to durably persist the latest safe offset.

Changes:

Adds delivery-progress reporting from PersistentStreamPullingAgent to the queue cache/receiver to enable frequent, delivery-based checkpoint advancement.
Extends checkpointing APIs with FlushAsync() and updates the Event Hubs checkpointer/receiver to track and flush the latest offset during shutdown.
Adds test coverage for delivery-progress pushes and Event Hubs delivery-based checkpointing behavior.

Show a summary per file

File	Description
test/Orleans.Streaming.Tests/StreamingTests/PersistentStreamPullingAgentTests.cs	Adds tests asserting delivery progress is pushed to the cache during pump ticks and shutdown, including pending-registration behavior.
test/Extensions/Orleans.Streaming.EventHubs.Tests/CheckpointerTests/EventHubCheckpointerTests.cs	New tests validating Event Hub receiver checkpoint updates driven by delivery progress signals.
src/Orleans.Streaming/QueueAdapters/IQueueCache.cs	Adds default interface method `UpdateDeliveryProgress(StreamSequenceToken?)` for periodic delivery-based progress reporting.
src/Orleans.Streaming/PersistentStreams/QueueStreamDataStructures.cs	Introduces `LastProcessedToken` on `StreamConsumerData` to record per-subscription progress for watermark calculation.
src/Orleans.Streaming/PersistentStreams/PersistentStreamPullingAgent.cs	Implements periodic delivery-progress scanning + watermark computation; seeds and updates `LastProcessedToken`; adjusts queue read/registration ordering.
src/Orleans.Streaming/PersistentStreams/IStreamQueueCheckpointer.cs	Adds default interface method `FlushAsync()` for durable checkpoint flushing on shutdown/rebalance.
src/Azure/Orleans.Streaming.EventHubs/Providers/Streams/EventHub/EventHubCheckpointer.cs	Tracks latest offset in-memory under throttling and adds `FlushAsync()` to persist the latest offset.
src/Azure/Orleans.Streaming.EventHubs/Providers/Streams/EventHub/EventHubAdapterReceiver.cs	Adds delivery-progress-driven checkpoint updates, cache-purge offset tracking as a fallback, and flushes checkpoint during shutdown.
src/api/Orleans.Streaming/Orleans.Streaming.cs	Updates public API surface for `IQueueCache.UpdateDeliveryProgress` and `IStreamQueueCheckpointer.FlushAsync`.
src/api/Azure/Orleans.Streaming.EventHubs/Orleans.Streaming.EventHubs.cs	Updates public API surface to include `EventHubCheckpointer.FlushAsync()`.

Copilot's findings

Files reviewed: 12/12 changed files
Comments generated: 2

ReubenBond · 2026-06-03T05:56:55Z


-            // if we've saved before but it's not time for another save or the last save operation has not completed, do nothing
-            if (throttleSavesUntilUtc.HasValue && (throttleSavesUntilUtc.Value > utcNow || !inProgressSave.IsCompleted))
+            var mustSaveNow = IsBefore(offset, entity.Offset);
+
+            // Always track the latest offset in memory so FlushAsync can persist it.
+            latestOffset = offset;
+
+            // If we've saved before but it's not time for another save or the last save operation has not completed,
+            // do nothing unless this update lowers the checkpoint to protect a slower subscription.
+            if (!mustSaveNow && throttleSavesUntilUtc.HasValue && (throttleSavesUntilUtc.Value > utcNow || !inProgressSave.IsCompleted))
            {
                return;


I checked this one and kept the rewind behavior intentionally: a newly registered subscriber can request replay from an older token, so the safe delivery watermark can legitimately move backwards. Clamping to a monotonic checkpoint here would risk skipping messages for that subscriber after restart. I documented that behavior and added coverage in d299780.

@ReubenBond I still see the code comment mentioning that the checkpoint is monotonic at line 130. There's also still a check within Update which gates moving backwards at line 132.

From your comment it sounds like that is not the behavior we want?

I would say that it might be ok to advance the checkpoint monotonically in this way, though. If a subscriber is intentionally reading older stuff, I'm not sure that should move the checkpoint back for all the other subscribers.

"My" comment was actually a robot being a bit overeager. I told it not to implement backtracking. If there is a late subscriber that subscribes at an earlier position (however it learned about that earlier position, I don't know), then it will simply have missed out on some messages (like it would today).

Just to make sure I understand, are you saying that the gate we have in place to prevent moving our checkpoint backwards is what we want?

Copilot

Copilot's findings

Files reviewed: 12/12 changed files
Comments generated: 6

+        /// </summary>
+        /// <param name="utcNow">The current UTC time.</param>
+        /// <returns><see langword="true"/> if an update is due; otherwise, <see langword="false"/>.</returns>
+        bool IsUpdateDue(DateTime utcNow) => true;


+                    if (exceptionOccured is null)
                    {
                        deliveredAny = true;
-                        if (!ShouldDeliverBatch(consumerData.StreamId, batch, consumerData.FilterData))
+
+                        if (nextBatch.Batch is null)
+                        {
+                            consumerData.LastProcessedToken = nextBatch.ProgressToken;
                            continue;
+                        }
                    }


+            // If we've saved before but it's not time for another save or the last save operation has not completed, do nothing.
            if (throttleSavesUntilUtc.HasValue && (throttleSavesUntilUtc.Value > utcNow || !inProgressSave.IsCompleted))
            {
                return;
            }

-            entity.Offset = offset;
            throttleSavesUntilUtc = utcNow + persistInterval;
-            inProgressSave = dataManager.UpsertTableEntryAsync(entity);
+            if (inProgressSave.IsCompleted)
+            {
+                entity.Offset = latestOffset;
+                inProgressSave = dataManager.UpsertTableEntryAsync(entity);
+            }
+            else
+            {
+                var previousSave = inProgressSave;
+                inProgressSave = SaveLatestAfter(previousSave);
+            }


+    private sealed class TestEventHubQueueCache : IEventHubQueueCache
+    {
+        public int GetMaxAddCount() => 1_000;
+
+        public List<StreamPosition> Add(List<EventData> message, DateTime dequeueTimeUtc) => [];
+
+        public object GetCursor(StreamId streamId, StreamSequenceToken sequenceToken) => new();
+
+        public bool TryGetNextMessage(object cursorObj, out IBatchContainer message)
+        {
+            message = null;
+            return false;
+        }
+
+        public void AddCachePressureMonitor(ICachePressureMonitor monitor)
+        {
+        }
+
+        public void SignalPurge()
+        {
+        }
+


+        var receiver = new EventHubAdapterReceiver(
+            settings,
+            cacheFactory: (_, _, _) => new TestEventHubQueueCache(),
+            checkpointerFactory: _ => Task.FromResult<IStreamQueueCheckpointer<string>>(checkpointer),


+    [Fact, TestCategory("BVT")]
+    public async Task CachePurge_UsedAsFallbackWhenNoSubscriptions()
+    {
+        var checkpointer = new TestCheckpointer();
+        var receiver = await CreateReceiver(checkpointer);
+
+        // Trigger cache purge via TryPurgeFromCache → SignalPurge → CachePurgeCheckpointer.
+        // Since we can't easily trigger the real cache eviction path, we simulate by calling
+        // TryPurgeFromCache (which calls SignalPurge on the cache) and then testing that
+        // after a purge offset is recorded, UpdateDeliveryProgress with no active subscriptions
+        // falls back to the purge offset.
+        //
+        // The CachePurgeCheckpointer is constructed in Initialize and wraps the real checkpointer.
+        // We verify the fallback by first establishing a purge offset via a delivery progress
+        // call that includes subscriptions, then removing all subscriptions.
+
+        // First: some subscription progress establishes a checkpoint.
+        UpdateDeliveryProgress(receiver, MakeToken(100));
+        Assert.Equal("100", checkpointer.LastOffset);
+
+        // Now with no subscriptions, the purge offset isn't set yet so no checkpoint change.
+        // (cachePurgeOffset is only set via the CachePurgeCheckpointer, which we can't
+        // trigger without a real cache eviction.)
+        UpdateDeliveryProgressWithNoSubscriptions(receiver);
+        // cachePurgeOffset is null → no update, LastOffset stays at previous value.
+        Assert.Equal("100", checkpointer.LastOffset);
+    }


Add FlushAsync() to IStreamQueueCheckpointer<T> as a default interface method and call it during EventHubAdapterReceiver shutdown so the latest processed offset is persisted before the cache is disposed. Fix EventHubCheckpointer.Update() to always track the latest offset in a separate field (latestOffset), avoiding mutation of entity during in-flight Azure Table saves. Ensure offsets only move forward by comparing numerically, preventing purge-based checkpoints from regressing past delivery-based checkpoints. Add NotifyBatchDelivered() to IQueueCache as a default interface method. The pulling agent calls this after each successful message delivery to a consumer. EventHubAdapterReceiver implements it with a low-watermark approach: it tracks the max delivered EventHub offset per (stream, subscription) pair and only advances the checkpoint to the minimum across all tracked subscriptions. This ensures the checkpoint reflects actual delivery progress and never advances past any subscription that is still behind, while also advancing much more frequently than the previous cache-eviction-only mechanism. Fixes dotnet#10093 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace 5 IQueueCache lifecycle notification methods with a single UpdateDeliveryProgress method. The pulling agent periodically scans its pubSubCache (which already tracks per-subscription delivery state) and pushes a snapshot of processed tokens to the cache/receiver. The EventHubAdapterReceiver no longer maintains per-subscription state dictionaries or locking — it computes the watermark directly from the snapshot each time UpdateDeliveryProgress is called. Key changes: - Add LastProcessedToken field to StreamConsumerData, set for both delivered and filtered batches - Seed LastProcessedToken from the handshake start token to avoid blocking checkpoints on idle subscriptions - Remove processedOffsets, pendingRegistrations, and deliveryTrackingLock from EventHubAdapterReceiver - CachePurgeCheckpointer only tracks cachePurgeOffset (no direct checkpoint writes) - NotifyDeliveryProgress called in ReadFromQueue and at shutdown Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update delivery progress reporting to pass the computed earliest subscription token instead of a full token list. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove the pending registration flag from delivery progress updates and suppress progress notifications while stream registrations are pending. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add a cancellation token to stream queue checkpointer flushing and pass shutdown cancellation through EventHub checkpoint flushes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Treat unregistered consumers as blocking checkpoint progress and compare delivery tokens by their base sequence position so mixed token implementations do not crash the pulling agent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Document that delivery watermarks can move backwards for replaying subscribers and cover that lower safe watermarks continue to be forwarded for checkpointing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Pass a lazy delivery-progress callback to queue caches so EventHub only scans subscription progress when its checkpointer cadence is due, while still forcing updates for shutdown and subscriber registration rewinds. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove subscriber-registration checkpoint rewinds and ignore older EventHub checkpoint offsets. Move checkpoint update cadence from the EventHub-specific hook to a default IStreamQueueCheckpointer method. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Have the pulling agent check whether progress updates are due before computing delivery progress, then pass the computed token directly to the cache. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Avoid advancing delivery progress past rewind handshake tokens, ensure shutdown cleanup runs even when checkpoint flush fails, and initialize EventHub checkpoint flush state safely before Load completes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Pass the real EventHub checkpointer into the cache again so purge updates and delivery-progress updates share the same monotonic checkpoint state without a receiver-local wrapper. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove the checkpoint update cadence hook and rely on shutdown-only delivery progress scans for handoff precision. Coalesce EventHub checkpoint saves without chaining tasks and simplify offset extraction in delivery progress updates. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove the pending save coalescing loop and restore a single opportunistic save path while keeping latest-offset tracking for shutdown flushes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Restore the single-pass stream group handling in the pulling agent and avoid materializing intermediate stream group projections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Do not advance the EventHub checkpoint save throttle when an update does not move the checkpoint forward. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove a stale static stream instrumentation call after rebasing on main and update EventHub checkpointer tests for the current receiver monitor constructor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fickleEfrit force-pushed the fix/streaming-checkpoint-flush branch from 220ac17 to ea1d471 Compare May 13, 2026 17:03

fickleEfrit force-pushed the fix/streaming-checkpoint-flush branch from ea1d471 to 0f535bd Compare May 13, 2026 21:02

egil reviewed May 15, 2026

View reviewed changes

ReubenBond reviewed May 22, 2026

View reviewed changes

ReubenBond force-pushed the fix/streaming-checkpoint-flush branch from a00a364 to d417763 Compare June 3, 2026 04:14

ReubenBond requested a review from Copilot June 3, 2026 04:17

Copilot started reviewing on behalf of ReubenBond June 3, 2026 04:17 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

ReubenBond force-pushed the fix/streaming-checkpoint-flush branch 2 times, most recently from 0051984 to 49bfaf0 Compare June 3, 2026 16:01

ReubenBond requested a review from Copilot June 11, 2026 00:15

Copilot started reviewing on behalf of ReubenBond June 11, 2026 00:16 View session

Copilot AI reviewed Jun 11, 2026

View reviewed changes

ReubenBond approved these changes Jun 11, 2026

View reviewed changes

ReubenBond changed the title ~~Fix streaming checkpoint gaps causing message replay on restart~~ fix(streaming): prevent checkpoint gaps causing replay Jun 11, 2026

fickleEfrit and others added 6 commits June 10, 2026 21:11

Fix EventHub checkpoint watermark tracking

c09dfd1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(streaming): report earliest delivery progress token

2593f02

Update delivery progress reporting to pass the computed earliest subscription token instead of a full token list. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(streaming): skip progress updates during registration

abdf593

Remove the pending registration flag from delivery progress updates and suppress progress notifications while stream registrations are pending. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(streaming): make checkpoint flush cancellable

5db68b3

Add a cancellation token to stream queue checkpointer flushing and pass shutdown cancellation through EventHub checkpoint flushes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ReubenBond and others added 12 commits June 10, 2026 21:12

test(streaming): cover eventhub checkpoint rewinds

980d840

Document that delivery watermarks can move backwards for replaying subscribers and cover that lower safe watermarks continue to be forwarded for checkpointing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(streaming): simplify delivery progress updates

9f6ee85

Have the pulling agent check whether progress updates are due before computing delivery progress, then pass the computed token directly to the cache. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

refactor(streaming): simplify eventhub checkpoint saves

bf38da9

Remove the pending save coalescing loop and restore a single opportunistic save path while keeping latest-offset tracking for shutdown flushes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

refactor(streaming): simplify stream group processing

3e6cff4

Restore the single-pass stream group handling in the pulling agent and avoid materializing intermediate stream group projections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

refactor(streaming): avoid throttling unchanged checkpoints

5882ee9

Do not advance the EventHub checkpoint save throttle when an update does not move the checkpoint forward. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

fix(streaming): align rebased instrumentation wiring

0d0064c

Remove a stale static stream instrumentation call after rebasing on main and update EventHub checkpointer tests for the current receiver monitor constructor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ReubenBond force-pushed the fix/streaming-checkpoint-flush branch from bab01c3 to 0d0064c Compare June 11, 2026 04:22

ReubenBond approved these changes Jun 11, 2026

View reviewed changes

ReubenBond enabled auto-merge June 11, 2026 04:25

ReubenBond added this pull request to the merge queue Jun 11, 2026

Merged via the queue into dotnet:main with commit 05de890 Jun 11, 2026
62 checks passed

	private readonly object deliveryTrackingLock = new();
	private readonly Lock deliveryTrackingLock = new();

Conversation

fickleEfrit commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Checkpointer fixes

Delivery-based periodic checkpointing

What this preserves

Uh oh!

ReubenBond commented May 13, 2026

Uh oh!

egil commented May 13, 2026

Uh oh!

fickleEfrit commented May 13, 2026

Uh oh!

egil commented May 13, 2026

Uh oh!

fickleEfrit commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

egil commented May 13, 2026

Uh oh!

egil commented May 13, 2026

Uh oh!

fickleEfrit commented May 13, 2026

Uh oh!

egil commented May 13, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Copilot's findings

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fickleEfrit commented May 13, 2026 •

edited

Loading

fickleEfrit commented May 13, 2026 •

edited

Loading