fix: atomic claim for LlmQueue non-capture processing (#1190) #1200
ProcessNextRequestAsync and LlmQueueToProposalWorker.ProcessSingleItemAsync used a non-atomic fetch-then-mutate pattern for claiming pending LLM requests. Under concurrent workers, the same request could be claimed twice, producing duplicate proposals.

Add TryClaimProcessingAsync to ILlmQueueRepository / LlmQueueRepository that atomically transitions Pending -> Processing with an optimistic concurrency guard (WHERE Status = Pending AND UpdatedAt = @expected), mirroring the existing TryClaimProcessingCaptureAsync pattern.

Update both ProcessNextRequestAsync and ProcessSingleItemAsync to use the atomic claim instead of the racy read-then-MarkAsProcessing-then-save flow. ProcessNextRequestAsync now iterates candidates and skips any that fail the atomic claim, falling through to the next FIFO candidate.

Tests: 6 new unit tests (service claim success, claim failure, fallthrough to next candidate, skip capture requests, empty queue, FIFO ordering) + 4 new integration tests (claim pending, fail when stale, concurrent race exactly-one, reject non-pending). All 3,297 Application.Tests + 1,744 Api.Tests green.
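The atomic claim described above can be illustrated with a minimal sketch. This is not the PR's C# / EF Core code: it uses Python's `sqlite3` module, and the table name `llm_queue`, the column names, and the helper `try_claim_processing` are hypothetical stand-ins. It also assumes the claim bumps `updated_at`, which the source does not spell out. The point is only that a single guarded `UPDATE` lets exactly one caller win.

```python
import sqlite3

def try_claim_processing(conn, request_id, expected_updated_at, now):
    """Attempt Pending -> Processing in one statement.

    The WHERE clause is the optimistic concurrency guard: the row must
    still be Pending and still carry the timestamp the caller read.
    """
    cur = conn.execute(
        "UPDATE llm_queue SET status = 'Processing', updated_at = ? "
        "WHERE id = ? AND status = 'Pending' AND updated_at = ?",
        (now, request_id, expected_updated_at),
    )
    conn.commit()
    # Exactly one affected row means this caller won the claim.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE llm_queue (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
)
conn.execute("INSERT INTO llm_queue VALUES (1, 'Pending', '2024-01-01T00:00:00Z')")
conn.commit()

# Two workers read the same row, then both try to claim it.
seen = "2024-01-01T00:00:00Z"
first = try_claim_processing(conn, 1, seen, "2024-01-01T00:00:05Z")
second = try_claim_processing(conn, 1, seen, "2024-01-01T00:00:06Z")
final_status = conn.execute(
    "SELECT status FROM llm_queue WHERE id = 1"
).fetchone()[0]
```

In this sketch the second attempt matches zero rows because the status and timestamp have both moved on, so the caller can skip the item instead of processing it twice.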
Adversarial Code Review
CRITICAL
HIGH
MEDIUM
LOW
Bot Comments Addressed
Summary
0 CRITICAL, 0 HIGH, 2 MEDIUM, 1 LOW. No merge blockers. All findings are fixable with minimal changes.
Code Review
This pull request introduces optimistic concurrency for claiming pending non-capture requests in the LLM queue by adding and implementing TryClaimProcessingAsync using raw SQL updates. It updates the background worker and queue service to use this atomic claim mechanism and adds corresponding integration and unit tests. The review feedback identifies a critical bug in LlmQueueService.ProcessNextRequestAsync where re-fetching the claimed request via GetByIdAsync returns a stale, tracked in-memory entity from EF Core's cache instead of the updated database state. The reviewer suggests directly updating the tracked entity's state in-memory using candidate.MarkAsProcessing() to avoid an extra database roundtrip and ensure the correct status is returned.
M1: Document orphaned-Processing edge case on null re-fetch after successful TryClaimProcessingAsync claim with explanatory comment. M2: Rename snake_case `claimed_request` to camelCase `claimedRequest`. L1: Rename misleading test from ShouldReturnConflict to ShouldReturnNotFound_WhenAllClaimsFail.
Adversarial Review -- Fixes Applied
All findings addressed. CI status: PENDING (new run triggered by fix push).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f7eb7a451
The raw-SQL UPDATE in TryClaimProcessingAsync bypasses the EF change tracker, leaving any tracked instance stale at Pending. Reload the tracked entity on a successful claim so callers holding the instance (and GetByIdAsync via the FindAsync identity map) observe Processing. Document the refresh contract on ILlmQueueRepository.
Drop the post-claim GetByIdAsync re-fetch: it served the stale tracked entity from the identity map, so the returned DTO reported Pending after a successful claim. The repository now refreshes the tracked candidate on claim, so map it directly. This also removes the misleading comments (the re-fetch did not reflect DB state, and no warning was logged). Unit test mocks now honor the refresh contract via callbacks.
Two integration tests against real SQLite: a tracked candidate fetched via GetByStatusAsync shows Processing after a successful claim, and GetByIdAsync without clearing the change tracker returns Processing. Both fail with stale Pending without the post-claim reload.
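The stale-entity problem these commits fix can be shown with a toy model. The sketch below is an assumption-laden illustration in Python, not EF Core: the `identity_map` dictionary stands in for the change tracker, and `QueueRepo`, `get_by_id`, and `try_claim` are hypothetical names. It shows why a raw `UPDATE` leaves the tracked instance at Pending unless the repository reloads it after a successful claim.

```python
import sqlite3

class QueueRepo:
    """Toy repository; identity_map stands in for an ORM change tracker."""

    def __init__(self, conn):
        self.conn = conn
        self.identity_map = {}  # id -> tracked entity (a plain dict here)

    def get_by_id(self, request_id):
        # Identity-map lookup: a tracked instance is returned without a query.
        if request_id in self.identity_map:
            return self.identity_map[request_id]
        row = self.conn.execute(
            "SELECT id, status FROM llm_queue WHERE id = ?", (request_id,)
        ).fetchone()
        if row is None:
            return None
        entity = {"id": row[0], "status": row[1]}
        self.identity_map[request_id] = entity
        return entity

    def try_claim(self, entity, reload_on_success):
        # Raw UPDATE: the database row changes, the tracked dict does not.
        cur = self.conn.execute(
            "UPDATE llm_queue SET status = 'Processing' "
            "WHERE id = ? AND status = 'Pending'",
            (entity["id"],),
        )
        self.conn.commit()
        claimed = cur.rowcount == 1
        if claimed and reload_on_success:
            # Refresh the tracked instance from the persisted row.
            row = self.conn.execute(
                "SELECT status FROM llm_queue WHERE id = ?", (entity["id"],)
            ).fetchone()
            entity["status"] = row[0]
        return claimed

def status_after_claim(reload_on_success):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE llm_queue (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO llm_queue VALUES (1, 'Pending')")
    conn.commit()
    repo = QueueRepo(conn)
    tracked = repo.get_by_id(1)           # entity is now tracked
    repo.try_claim(tracked, reload_on_success)
    return repo.get_by_id(1)["status"]    # served from the identity map

without_reload = status_after_claim(False)
with_reload = status_after_claim(True)
```

Without the reload, the lookup after a successful claim still reports the old status, which is the behaviour the two integration tests are described as catching.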
Review Fixes -- Stale Tracked Entity After Atomic Claim
How the fix works
Out-of-scope finding tracked
Test evidence
In-thread replies posted on both bot comments. Threads left open for the orchestrator to resolve.
…ngAsync Add 'AND RequestType NOT LIKE inbox.capture.%' to the non-capture claim WHERE clause, mirroring the inverse guard in TryClaimProcessingCaptureAsync so the two claim paths are mutually exclusive at the SQL layer.
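As a rough illustration of the mutual exclusion described above, the sketch below runs two guarded `UPDATE` statements against an in-memory SQLite table. The table and column names and the request type values are hypothetical, the `LIKE` form of the capture guard is an assumption based on the phrase "inverse guard", and the `UpdatedAt` check is omitted for brevity.

```python
import sqlite3

# Non-capture path: refuses any row whose type starts with inbox.capture.
NON_CAPTURE_CLAIM = (
    "UPDATE llm_queue SET status = 'Processing' "
    "WHERE id = ? AND status = 'Pending' "
    "AND request_type NOT LIKE 'inbox.capture.%'"
)
# Capture path: assumed to carry the inverse guard.
CAPTURE_CLAIM = (
    "UPDATE llm_queue SET status = 'Processing' "
    "WHERE id = ? AND status = 'Pending' "
    "AND request_type LIKE 'inbox.capture.%'"
)

def claim(conn, sql, request_id):
    cur = conn.execute(sql, (request_id,))
    conn.commit()
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE llm_queue (id INTEGER PRIMARY KEY, status TEXT, request_type TEXT)"
)
conn.executemany(
    "INSERT INTO llm_queue VALUES (?, 'Pending', ?)",
    [(1, "inbox.capture.note"), (2, "proposal.generate")],
)
conn.commit()

# Each row can only be claimed through its own path.
capture_via_non_capture = claim(conn, NON_CAPTURE_CLAIM, 1)
other_via_capture = claim(conn, CAPTURE_CLAIM, 2)
capture_via_capture = claim(conn, CAPTURE_CLAIM, 1)
other_via_non_capture = claim(conn, NON_CAPTURE_CLAIM, 2)
```

Because the two predicates are complements of each other, no row can satisfy both statements, regardless of which worker issues which claim.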
When the post-claim re-fetch no longer sees the row Processing, we DID win the claim but the row vanished/mutated between UPDATE and SELECT. Log a Warning and emit a distinct 'claimed_then_missing' telemetry outcome instead of conflating it with losing the claim race.
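The distinction drawn here can be sketched as a small classification step. Only the `claimed_then_missing` outcome name comes from the commit message; the other outcome strings, the function name, and the use of Python's `logging` module are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("llm_queue_worker")  # hypothetical logger name

def classify_claim_outcome(claimed, refetched_status):
    """Map a claim attempt plus the post-claim re-fetch to a telemetry outcome.

    refetched_status is None when the row is no longer found.
    """
    if not claimed:
        # Another worker won the race; nothing unusual happened.
        return "lost_race"
    if refetched_status != "Processing":
        # The claim succeeded, but the row vanished or changed before the
        # re-fetch. This is reported separately from losing the race.
        logger.warning(
            "claim succeeded but row is %r on re-fetch", refetched_status
        )
        return "claimed_then_missing"
    return "claimed"

outcomes = [
    classify_claim_outcome(False, "Processing"),
    classify_claim_outcome(True, None),
    classify_claim_outcome(True, "Completed"),
    classify_claim_outcome(True, "Processing"),
]
```

Keeping the two failure modes apart means a dashboard counting lost races is not inflated by rows that disappeared after a successful claim.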
…after claim Read the persisted row via a fresh AsNoTracking query and assert tracked.UpdatedAt equals it, distinguishing a true ReloadAsync from an in-memory MarkAsProcessing() substitute that would set a different UTC-now timestamp.
Record the expectedUpdatedAt each fake claim receives and add a happy-path test asserting it equals the pending item's actual UpdatedAt. Catches a regression passing default/now (which would stall the queue in production via a no-match optimistic-concurrency UPDATE while tests stayed green) in both worker fakes.
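A hedged sketch of the recording-fake idea follows. The class and function names are hypothetical and the code is Python rather than the project's C# test doubles. The fake always reports success, which is exactly why a caller passing the wrong timestamp would go unnoticed unless the test also inspects what the fake received.

```python
from dataclasses import dataclass, field

@dataclass
class PendingItem:
    id: int
    updated_at: str

@dataclass
class RecordingFakeRepository:
    """Permissive fake: every claim succeeds, but arguments are recorded."""
    received: list = field(default_factory=list)

    def try_claim_processing(self, request_id, expected_updated_at):
        self.received.append((request_id, expected_updated_at))
        return True

def process_single_item(repo, item):
    # Correct caller: passes the timestamp it read with the item.
    return repo.try_claim_processing(item.id, item.updated_at)

def process_single_item_regressed(repo, item):
    # Regressed caller: passes a default value instead of the item's timestamp.
    return repo.try_claim_processing(item.id, "")

item = PendingItem(id=7, updated_at="2024-01-01T00:00:00Z")

good_repo = RecordingFakeRepository()
good_result = process_single_item(good_repo, item)

bad_repo = RecordingFakeRepository()
bad_result = process_single_item_regressed(bad_repo, item)

# Both calls "succeed" against the fake; only the recorded value differs.
good_matches = good_repo.received == [(item.id, item.updated_at)]
bad_matches = bad_repo.received == [(item.id, item.updated_at)]
```

Against a real database the regressed caller's guard would never match, so an assertion on the recorded argument is what turns that silent stall into a failing test.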
Residual Review Findings -- Fixes Applied
All five residual LOW findings from the prior review are addressed. Verified with
CI note
Main is green at
Verification commands
Summary
Fixes #1190 -- `ProcessNextRequestAsync` and `LlmQueueToProposalWorker.ProcessSingleItemAsync` used a non-atomic fetch-then-mutate pattern for claiming pending LLM requests. Under concurrent workers, the same request could be claimed twice, producing duplicate proposals.
- Add `TryClaimProcessingAsync` to `ILlmQueueRepository` / `LlmQueueRepository` that atomically transitions `Pending -> Processing` with an optimistic concurrency guard (`WHERE Status = Pending AND UpdatedAt = @expected`), mirroring the existing `TryClaimProcessingCaptureAsync` pattern
- Update `LlmQueueService.ProcessNextRequestAsync` to use the atomic claim, iterating FIFO candidates and skipping any that fail the claim
- Update `LlmQueueToProposalWorker.ProcessSingleItemAsync` to use `TryClaimProcessingAsync` instead of the racy `GetByIdAsync` + `MarkAsProcessing` + `SaveChangesAsync` flow
- `ExpectedUpdatedAt` for non-capture batch items in `BuildFairBatchItems`
Test plan
- `LlmQueueServiceTests`: claim success, claim failure (concurrent), fallthrough to next candidate, skip capture requests, empty queue, FIFO ordering
- `LlmQueueRepositoryIntegrationTests`: claim pending request, fail when status already changed, concurrent race (exactly one wins), reject non-pending request
- `ProcessBatch_ItemClaimedBetweenFetchAndProcess_SkipsGracefully` worker test to use atomic claim pattern
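The service behaviours listed in the test plan (fallthrough, skipping capture requests, empty queue, FIFO ordering) can be sketched as one loop. This is a Python illustration under assumptions, not the service's code: the `Request` fields, ordering by `created_at`, and the `try_claim` callback signature are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    id: int
    request_type: str
    created_at: int
    updated_at: int

def process_next_request(pending, try_claim):
    """Return the first non-capture request that can be claimed, else None."""
    for candidate in sorted(pending, key=lambda r: r.created_at):  # FIFO
        if candidate.request_type.startswith("inbox.capture."):
            continue  # capture requests belong to the other claim path
        if try_claim(candidate.id, candidate.updated_at):
            return candidate
        # Claim failed: another worker took it, fall through to the next one.
    return None

pending = [
    Request(3, "proposal.generate", created_at=30, updated_at=30),
    Request(1, "proposal.generate", created_at=10, updated_at=10),
    Request(2, "inbox.capture.note", created_at=20, updated_at=20),
    Request(4, "proposal.generate", created_at=40, updated_at=40),
]

attempts = []
already_taken = {1}  # pretend a concurrent worker claimed request 1

def try_claim(request_id, expected_updated_at):
    attempts.append(request_id)
    return request_id not in already_taken

claimed = process_next_request(pending, try_claim)
empty_result = process_next_request([], try_claim)
all_fail_result = process_next_request(pending, lambda _id, _ts: False)
```

In this run the oldest request fails its claim, the capture request is never attempted, and the next non-capture request in FIFO order is returned; when every claim fails the caller gets nothing back.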