Skip to content

fix(streaming): prevent checkpoint gaps causing replay#10096

Merged
ReubenBond merged 18 commits into
dotnet:mainfrom
fickleEfrit:fix/streaming-checkpoint-flush
Jun 11, 2026
Merged

fix(streaming): prevent checkpoint gaps causing replay#10096
ReubenBond merged 18 commits into
dotnet:mainfrom
fickleEfrit:fix/streaming-checkpoint-flush

Conversation

@fickleEfrit

@fickleEfrit fickleEfrit commented May 13, 2026

Copy link
Copy Markdown
Contributor

Problem

When an Event Hub streaming subscription is used, newly added silos replay too many streaming messages due to stale checkpoints (#10093). Three root causes:

  1. No checkpoint on shutdown/rebalanceEventHubAdapterReceiver.Shutdown() disposed the cache and closed the receiver without persisting the latest offset.
  2. Checkpoint stalls under low/moderate throughput — the only periodic checkpoint mechanism was cache eviction (OnPurge), which requires messages to be in cache >5 min AND have >30 min relative age. For streams that don't accumulate enough spread, purging never triggered.
  3. Throttled Update() dropped offsetsEventHubCheckpointer.Update() silently discarded the offset entirely when the persist throttle was active.

Solution

Checkpointer fixes

  • Add FlushAsync() default interface method to IStreamQueueCheckpointer<T> — non-breaking for third-party implementations.
  • Fix EventHubCheckpointer.Update() to always track the latest offset in a separate latestOffset field (avoids mutating entity during in-flight Azure Table saves). Enforce monotonic offset advancement via IsBefore check.
  • Implement FlushAsync() in EventHubCheckpointer — awaits any in-progress save, then persists latestOffset if it diverges.

Delivery-based periodic checkpointing

The pulling agent already tracks per-subscription delivery progress in its pubSubCache. Rather than duplicating that state in the receiver (via lifecycle callbacks), we periodically scan it and push a snapshot to the cache.

  • Add UpdateDeliveryProgress(IReadOnlyList<StreamSequenceToken>, bool) default interface method to IQueueCache. The pulling agent calls this every ~100ms pump tick and once at shutdown.
  • Add LastProcessedToken to StreamConsumerData — set after each delivered or filtered batch, and seeded from the handshake start token when a subscription is added (so idle subscriptions don't block checkpointing).
  • PersistentStreamPullingAgent.NotifyDeliveryProgress() scans pubSubCache, collects all LastProcessedToken values, and reports whether any stream registrations are still pending.
  • EventHubAdapterReceiver.UpdateDeliveryProgress() computes a low-watermark (minimum EventHub offset across all subscription tokens). The checkpoint only advances to the minimum — ensuring no subscription's undelivered messages are skipped. Pending registrations block advancement entirely. When no subscriptions exist, falls back to the cache purge offset.
  • Call FlushAsync() in EventHubAdapterReceiver.Shutdown() to persist the final checkpoint.

What this preserves

  • At-least-once delivery — the checkpoint never advances past the slowest subscriber's confirmed delivery point.
  • No new locking — all pubSubCache access is on the SystemTarget scheduler; the receiver's cachePurgeOffset is also accessed from the same context.
  • Non-breaking for third-party IQueueCache implementations — the new method is a default interface method (no-op).

Fixes #10093

@fickleEfrit fickleEfrit force-pushed the fix/streaming-checkpoint-flush branch from 220ac17 to ea1d471 Compare May 13, 2026 17:03
@ReubenBond

Copy link
Copy Markdown
Member

cc @egil

@egil

egil commented May 13, 2026

Copy link
Copy Markdown
Contributor

This looks promising. I am concerned about the trade-off mentioned. Does CheckpointOnRead mean that events read from an eventhub, but not yet delivered, are check pointed?

If yes, is it possible to instead checkpoint when messages have been delivered, aka OnNextAsync on a target grain has returned without exception instead? Big thing would be to avoid the cache purge issue where messages are only checkpointed after being purged from cache. I would love to have that while being sure not to lose messages.

@fickleEfrit

Copy link
Copy Markdown
Contributor Author

This looks promising. I am concerned about the trade-off mentioned. Does CheckpointOnRead mean that events read from an eventhub, but not yet delivered, are check pointed?

If yes, is it possible to instead checkpoint when messages have been delivered, aka OnNextAsync on a target grain has returned without exception instead? Big thing would be to avoid the cache purge issue where messages are only checkpointed after being purged from cache. I would love to have that while being sure not to lose messages.

My understanding of the current checkpointing system for Event Hub is that there's an existing gap where a consumer can fall behind messages that are evicted if those messages have already been purged.

I'll explore approaches for only advancing the checkpoint after delivery.

@egil

egil commented May 13, 2026

Copy link
Copy Markdown
Contributor

My understanding of the current checkpointing system for Event Hub is that there's an existing gap where a consumer can fall behind messages that are evicted if those messages have already been purged.

True, if messages have not been processed at DataMaxAgeInCache, then yes, I think so too.

I suspect that you have a bunch of other problems if that happens. Back pressure features in pulling agents / in the cache should prevent this though, to some extend.

@fickleEfrit

fickleEfrit commented May 13, 2026

Copy link
Copy Markdown
Contributor Author

@egil What do you think about this approach to checkpointing on delivery:

Introduce a callback on the queue interface that the pulling agent will call after completing delivery.
When the callback comes in to the EventHubAdapterReceiver, we track relative to each stream and subscription to make sure delivery has passed a given sequence value for all subscribers.
Once we see each subscriber for a stream move past a given offset, we can advance the checkpoint.

@fickleEfrit fickleEfrit force-pushed the fix/streaming-checkpoint-flush branch from ea1d471 to 0f535bd Compare May 13, 2026 21:02
@egil

egil commented May 13, 2026

Copy link
Copy Markdown
Contributor

@egil What do you think about this approach to checkpointing on delivery:

Introduce a callback on the queue interface that the pulling agent will call after completing delivery. When the callback comes in to the EventHubAdapterReceiver, we track relative to each stream and subscription to make sure delivery has passed a given sequence value for all subscribers. Once we see each subscriber for a stream move past a given offset, we can advance the checkpoint.

I honestly do not know enough about the inner workings of the eventhub based stream provider to have a strong opinion. From an end user point of view, I want to have as close to zero chance of losing events. So a crashed silo should not be something results in lost messages. Cloud env is unstable, so it happens more often than I would like. ACA getting impatient and killing silos, etc.,

@egil

egil commented May 13, 2026

Copy link
Copy Markdown
Contributor

What I do not need is the replay functionality that is available with the eventhub based stream providers, which is suspect is the reason for keeping items around in the cache for a longer time and not checkpointing as aggressive as we would want.

@fickleEfrit

Copy link
Copy Markdown
Contributor Author

I think the approach I mentioned/pushed minimizes the risk of a message being lost. We only advance the checkpoint if the message gets delivered to all known subscribers for a stream.

If we want zero replay chances we probably should checkpoint on read, but that's riskier. Say the latest checkpoint was at message 5. With the approach above it's possible for us to have delivered message 10 to subscriber A for a stream but not subscriber B. Then if the silo crashes, we will start again from message 5. This means we would redeliver up to 10 for A as well.

Even though this doesn't completely eliminate redelivery, just checkpointing more frequently should minimize it imo.

Based on my previous conversations with the Orleans devs I think the massive replay is an unintended consequence of infrequent checkpointing, though perhaps @ReubenBond could clarify that.

@egil

egil commented May 13, 2026

Copy link
Copy Markdown
Contributor

I think the approach I mentioned/pushed minimizes the risk of a message being lost. We only advance the checkpoint if the message gets delivered to all known subscribers for a stream.

If this is the case, then we have our "at least-once delivery guarantee" which is what I need.

Even though this doesn't completely eliminate redelivery, just checkpointing more frequently should minimize it imo.

To be clear, I do not mind redelivery of messages, my grains are idempotent, however, like you, I dislike having 4 hours of events being pulled in again.

What I was referring to before when I talked about "replay" feature is that the eventhub backed streaming provider allows grains to request older messages they have already seen - they support getting older messages a second time, on purpose. I do not need this particular feature, and I think some of the behavior we are seeing is due to this feature.

// Per-subscription progress tracking for low-watermark checkpointing.
// Tracks the max processed EventHub offset per (stream, subscription) so a fast
// subscription cannot advance the checkpoint past a slower active subscription.
private readonly object deliveryTrackingLock = new();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private readonly object deliveryTrackingLock = new();
private readonly Lock deliveryTrackingLock = new();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit. not sure if Lock is being used in this codebase yet.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We multi-target .NET 8.0, which doesn't have Lock, so if we want to use Lock then we need to use it conditionally or we need to add a polyfill.

// subscription cannot advance the checkpoint past a slower active subscription.
private readonly object deliveryTrackingLock = new();
private readonly Dictionary<(StreamId StreamId, GuidId SubscriptionId), long?> processedOffsets = new();
private readonly HashSet<StreamId> pendingRegistrations = new();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the extra state tracking here gives me pause. I wonder if there is a cheaper way to do this. Eg, scanning cursors on shutdown/xfer, since that is the gap that this PR is intending to cover (hand-off)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying we wouldn't want to update the checkpoints through the normal lifecycle, only when shutting down or transferring? It's definitely a lot of state to be managing, but if we want to have delivery guarantees alongside periodic checkpointing it might be necessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we just scan the cursors when we transfer, we don't have guarantees that the messages were actually delivered to the consumers. Kind of similar to the point @egil raised on my earlier iteration where we checkpointed with every read. The difference being that instead of every read, we would checkpoint the latest read when we transfer or shut down.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to an approach that relies on the state that already exists in the pulling agent.

@ReubenBond ReubenBond force-pushed the fix/streaming-checkpoint-flush branch from a00a364 to d417763 Compare June 3, 2026 04:14
@ReubenBond ReubenBond requested a review from Copilot June 3, 2026 04:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets a known reliability issue in Orleans Event Hubs streaming where checkpoints can lag or be lost, causing excessive message replay when silos restart or rebalance. It introduces delivery-driven checkpoint updates (instead of relying on cache eviction) and adds an explicit flush step during shutdown to durably persist the latest safe offset.

Changes:

  • Adds delivery-progress reporting from PersistentStreamPullingAgent to the queue cache/receiver to enable frequent, delivery-based checkpoint advancement.
  • Extends checkpointing APIs with FlushAsync() and updates the Event Hubs checkpointer/receiver to track and flush the latest offset during shutdown.
  • Adds test coverage for delivery-progress pushes and Event Hubs delivery-based checkpointing behavior.
Show a summary per file
File Description
test/Orleans.Streaming.Tests/StreamingTests/PersistentStreamPullingAgentTests.cs Adds tests asserting delivery progress is pushed to the cache during pump ticks and shutdown, including pending-registration behavior.
test/Extensions/Orleans.Streaming.EventHubs.Tests/CheckpointerTests/EventHubCheckpointerTests.cs New tests validating Event Hub receiver checkpoint updates driven by delivery progress signals.
src/Orleans.Streaming/QueueAdapters/IQueueCache.cs Adds default interface method UpdateDeliveryProgress(StreamSequenceToken?) for periodic delivery-based progress reporting.
src/Orleans.Streaming/PersistentStreams/QueueStreamDataStructures.cs Introduces LastProcessedToken on StreamConsumerData to record per-subscription progress for watermark calculation.
src/Orleans.Streaming/PersistentStreams/PersistentStreamPullingAgent.cs Implements periodic delivery-progress scanning + watermark computation; seeds and updates LastProcessedToken; adjusts queue read/registration ordering.
src/Orleans.Streaming/PersistentStreams/IStreamQueueCheckpointer.cs Adds default interface method FlushAsync() for durable checkpoint flushing on shutdown/rebalance.
src/Azure/Orleans.Streaming.EventHubs/Providers/Streams/EventHub/EventHubCheckpointer.cs Tracks latest offset in-memory under throttling and adds FlushAsync() to persist the latest offset.
src/Azure/Orleans.Streaming.EventHubs/Providers/Streams/EventHub/EventHubAdapterReceiver.cs Adds delivery-progress-driven checkpoint updates, cache-purge offset tracking as a fallback, and flushes checkpoint during shutdown.
src/api/Orleans.Streaming/Orleans.Streaming.cs Updates public API surface for IQueueCache.UpdateDeliveryProgress and IStreamQueueCheckpointer.FlushAsync.
src/api/Azure/Orleans.Streaming.EventHubs/Orleans.Streaming.EventHubs.cs Updates public API surface to include EventHubCheckpointer.FlushAsync().

Copilot's findings

  • Files reviewed: 12/12 changed files
  • Comments generated: 2

Comment thread src/Orleans.Streaming/PersistentStreams/PersistentStreamPullingAgent.cs Outdated
Comment on lines 135 to 145

// if we've saved before but it's not time for another save or the last save operation has not completed, do nothing
if (throttleSavesUntilUtc.HasValue && (throttleSavesUntilUtc.Value > utcNow || !inProgressSave.IsCompleted))
var mustSaveNow = IsBefore(offset, entity.Offset);

// Always track the latest offset in memory so FlushAsync can persist it.
latestOffset = offset;

// If we've saved before but it's not time for another save or the last save operation has not completed,
// do nothing unless this update lowers the checkpoint to protect a slower subscription.
if (!mustSaveNow && throttleSavesUntilUtc.HasValue && (throttleSavesUntilUtc.Value > utcNow || !inProgressSave.IsCompleted))
{
return;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this one and kept the rewind behavior intentionally: a newly registered subscriber can request replay from an older token, so the safe delivery watermark can legitimately move backwards. Clamping to a monotonic checkpoint here would risk skipping messages for that subscriber after restart. I documented that behavior and added coverage in d299780.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ReubenBond I still see the code comment mentioning that the checkpoint is monotonic at line 130. There's also still a check within Update which gates moving backwards at line 132.

From your comment it sounds like that is not the behavior we want?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that it might be ok to advance the checkpoint monotonically in this way, though. If a subscriber is intentionally reading older stuff, I'm not sure that should move the checkpoint back for all the other subscribers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"My" comment was actually a robot being a bit overeager. I told it not to implement backtracking. If there is a late subscriber that subscribes at an earlier position (however it learned about that earlier position, I don't know), then it will simply have missed out on some messages (like it would today).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand, are you saying that the gate we have in place to prevent moving our checkpoint backwards is what we want?

@ReubenBond ReubenBond force-pushed the fix/streaming-checkpoint-flush branch 2 times, most recently from 0051984 to 49bfaf0 Compare June 3, 2026 16:01
@ReubenBond ReubenBond requested a review from Copilot June 11, 2026 00:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot's findings

  • Files reviewed: 12/12 changed files
  • Comments generated: 6

/// </summary>
/// <param name="utcNow">The current UTC time.</param>
/// <returns><see langword="true"/> if an update is due; otherwise, <see langword="false"/>.</returns>
bool IsUpdateDue(DateTime utcNow) => true;
Comment on lines +825 to 834
if (exceptionOccured is null)
{
deliveredAny = true;
if (!ShouldDeliverBatch(consumerData.StreamId, batch, consumerData.FilterData))

if (nextBatch.Batch is null)
{
consumerData.LastProcessedToken = nextBatch.ProgressToken;
continue;
}
}
Comment on lines +141 to +157
// If we've saved before but it's not time for another save or the last save operation has not completed, do nothing.
if (throttleSavesUntilUtc.HasValue && (throttleSavesUntilUtc.Value > utcNow || !inProgressSave.IsCompleted))
{
return;
}

entity.Offset = offset;
throttleSavesUntilUtc = utcNow + persistInterval;
inProgressSave = dataManager.UpsertTableEntryAsync(entity);
if (inProgressSave.IsCompleted)
{
entity.Offset = latestOffset;
inProgressSave = dataManager.UpsertTableEntryAsync(entity);
}
else
{
var previousSave = inProgressSave;
inProgressSave = SaveLatestAfter(previousSave);
}
Comment on lines +61 to +82
private sealed class TestEventHubQueueCache : IEventHubQueueCache
{
public int GetMaxAddCount() => 1_000;

public List<StreamPosition> Add(List<EventData> message, DateTime dequeueTimeUtc) => [];

public object GetCursor(StreamId streamId, StreamSequenceToken sequenceToken) => new();

public bool TryGetNextMessage(object cursorObj, out IBatchContainer message)
{
message = null;
return false;
}

public void AddCachePressureMonitor(ICachePressureMonitor monitor)
{
}

public void SignalPurge()
{
}

Comment on lines +125 to +128
var receiver = new EventHubAdapterReceiver(
settings,
cacheFactory: (_, _, _) => new TestEventHubQueueCache(),
checkpointerFactory: _ => Task.FromResult<IStreamQueueCheckpointer<string>>(checkpointer),
Comment on lines +255 to +281
[Fact, TestCategory("BVT")]
public async Task CachePurge_UsedAsFallbackWhenNoSubscriptions()
{
var checkpointer = new TestCheckpointer();
var receiver = await CreateReceiver(checkpointer);

// Trigger cache purge via TryPurgeFromCache → SignalPurge → CachePurgeCheckpointer.
// Since we can't easily trigger the real cache eviction path, we simulate by calling
// TryPurgeFromCache (which calls SignalPurge on the cache) and then testing that
// after a purge offset is recorded, UpdateDeliveryProgress with no active subscriptions
// falls back to the purge offset.
//
// The CachePurgeCheckpointer is constructed in Initialize and wraps the real checkpointer.
// We verify the fallback by first establishing a purge offset via a delivery progress
// call that includes subscriptions, then removing all subscriptions.

// First: some subscription progress establishes a checkpoint.
UpdateDeliveryProgress(receiver, MakeToken(100));
Assert.Equal("100", checkpointer.LastOffset);

// Now with no subscriptions, the purge offset isn't set yet so no checkpoint change.
// (cachePurgeOffset is only set via the CachePurgeCheckpointer, which we can't
// trigger without a real cache eviction.)
UpdateDeliveryProgressWithNoSubscriptions(receiver);
// cachePurgeOffset is null → no update, LastOffset stays at previous value.
Assert.Equal("100", checkpointer.LastOffset);
}
@ReubenBond ReubenBond changed the title Fix streaming checkpoint gaps causing message replay on restart fix(streaming): prevent checkpoint gaps causing replay Jun 11, 2026
fickleEfrit and others added 6 commits June 10, 2026 21:11
Add FlushAsync() to IStreamQueueCheckpointer<T> as a default interface
method and call it during EventHubAdapterReceiver shutdown so the latest
processed offset is persisted before the cache is disposed.

Fix EventHubCheckpointer.Update() to always track the latest offset in
a separate field (latestOffset), avoiding mutation of entity during
in-flight Azure Table saves. Ensure offsets only move forward by
comparing numerically, preventing purge-based checkpoints from
regressing past delivery-based checkpoints.

Add NotifyBatchDelivered() to IQueueCache as a default interface method.
The pulling agent calls this after each successful message delivery to a
consumer. EventHubAdapterReceiver implements it with a low-watermark
approach: it tracks the max delivered EventHub offset per (stream,
subscription) pair and only advances the checkpoint to the minimum
across all tracked subscriptions. This ensures the checkpoint reflects
actual delivery progress and never advances past any subscription that
is still behind, while also advancing much more frequently than the
previous cache-eviction-only mechanism.

Fixes dotnet#10093

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace 5 IQueueCache lifecycle notification methods with a single
UpdateDeliveryProgress method. The pulling agent periodically scans
its pubSubCache (which already tracks per-subscription delivery state)
and pushes a snapshot of processed tokens to the cache/receiver.

The EventHubAdapterReceiver no longer maintains per-subscription state
dictionaries or locking — it computes the watermark directly from the
snapshot each time UpdateDeliveryProgress is called.

Key changes:
- Add LastProcessedToken field to StreamConsumerData, set for both
  delivered and filtered batches
- Seed LastProcessedToken from the handshake start token to avoid
  blocking checkpoints on idle subscriptions
- Remove processedOffsets, pendingRegistrations, and deliveryTrackingLock
  from EventHubAdapterReceiver
- CachePurgeCheckpointer only tracks cachePurgeOffset (no direct
  checkpoint writes)
- NotifyDeliveryProgress called in ReadFromQueue and at shutdown

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update delivery progress reporting to pass the computed earliest subscription token instead of a full token list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the pending registration flag from delivery progress updates and suppress progress notifications while stream registrations are pending.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a cancellation token to stream queue checkpointer flushing and pass shutdown cancellation through EventHub checkpoint flushes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ReubenBond and others added 12 commits June 10, 2026 21:12
Treat unregistered consumers as blocking checkpoint progress and compare delivery tokens by their base sequence position so mixed token implementations do not crash the pulling agent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document that delivery watermarks can move backwards for replaying subscribers and cover that lower safe watermarks continue to be forwarded for checkpointing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass a lazy delivery-progress callback to queue caches so EventHub only scans subscription progress when its checkpointer cadence is due, while still forcing updates for shutdown and subscriber registration rewinds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove subscriber-registration checkpoint rewinds and ignore older EventHub checkpoint offsets. Move checkpoint update cadence from the EventHub-specific hook to a default IStreamQueueCheckpointer method.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Have the pulling agent check whether progress updates are due before computing delivery progress, then pass the computed token directly to the cache.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Avoid advancing delivery progress past rewind handshake tokens, ensure shutdown cleanup runs even when checkpoint flush fails, and initialize EventHub checkpoint flush state safely before Load completes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass the real EventHub checkpointer into the cache again so purge updates and delivery-progress updates share the same monotonic checkpoint state without a receiver-local wrapper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the checkpoint update cadence hook and rely on shutdown-only delivery progress scans for handoff precision. Coalesce EventHub checkpoint saves without chaining tasks and simplify offset extraction in delivery progress updates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the pending save coalescing loop and restore a single opportunistic save path while keeping latest-offset tracking for shutdown flushes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restore the single-pass stream group handling in the pulling agent and avoid materializing intermediate stream group projections.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Do not advance the EventHub checkpoint save throttle when an update does not move the checkpoint forward.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove a stale static stream instrumentation call after rebasing on main and update EventHub checkpointer tests for the current receiver monitor constructor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ReubenBond ReubenBond force-pushed the fix/streaming-checkpoint-flush branch from bab01c3 to 0d0064c Compare June 11, 2026 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Streaming] Newly added silos replay too many streaming messages due to no updated checkpointing

4 participants