ENG-3687: Fix requires_input watchdog guard for manual webhook and manual task DSRs#8264
Conversation
The requeue_interrupted_tasks watchdog incorrectly canceled or requeued privacy requests in requires_input/pending_external status. These DSRs are intentionally paused waiting for manual webhook data or manual task input and have no running Celery task by design. Additionally, requeue_requires_input_requests blindly requeued all requires_input DSRs when no AccessManualWebhooks existed, bricking DSRs paused by manual_task connections (which have RequestTasks). Fixes: - Add early continue in watchdog loop for paused statuses - Only requeue DSRs with zero RequestTasks (manual_webhook, pre-graph) and skip those with RequestTasks (manual_task, in-graph) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
1 Skipped Deployment
|
Verifies that when both types are in requires_input simultaneously, only the webhook DSR (zero RequestTasks) is requeued while the manual_task DSR (has RequestTasks) is left alone. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8264 +/- ##
=======================================
Coverage 85.09% 85.09%
=======================================
Files 669 669
Lines 43585 43591 +6
Branches 5125 5127 +2
=======================================
+ Hits 37087 37095 +8
+ Misses 5392 5391 -1
+ Partials 1106 1105 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/code-review |
There was a problem hiding this comment.
Code Review
The fix is correct and well-scoped. Two independent bugs are addressed:
-
requeue_interrupted_taskswatchdog — An earlycontinueguard at the top of the per-request loop now skips allrequires_input/pending_externalDSRs before any path that could cancel or error them. Clean and correct. -
requeue_requires_input_requests— The newhas_request_tasksguard correctly distinguishes between manual-webhook DSRs (noRequestTasks, pre-graph) and manual-task DSRs (haveRequestTasks, in-graph), preventing the latter from being incorrectly requeued when connection configs are updated.
The changelog entry is present and the log message typo fix is a welcome cleanup.
Findings
connection_util.py — .limit(1).count() > 0 is functional but non-idiomatic; see inline comment for cleaner alternatives (.first() is not None or .exists()).
request_service.py — The inner-loop requires_input/pending_external check (~line 688) is now dead code: the new outer guard fires first for any request with those statuses. No behavior change, but worth removing in a follow-up.
Tests — The four test cases in TestWatchdogSkipsPausedRequests all exercise the same single code path (the new early continue). Their docstrings reference four different prior code paths that are no longer individually reachable. Several mock setups (e.g., the get_cached_task_id side-effect chain in test_subtask_cache_exception_skips_cancel) are never consumed. The tests prove the right behavior but could be simplified to avoid the misleading setup overhead.
Overall a solid fix with good test coverage for the two new behaviors; the notes above are all minor.
🔬 Codegraph: connected (51846 nodes)
💡 Write /code-review in a comment to re-run this review.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use .first() is not None instead of .limit(1).count() > 0 - Remove dead inner requires_input/pending_external guard - Simplify watchdog tests: collapse 4 tests into 1 parametrized test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
eastandwestwind
left a comment
There was a problem hiding this comment.
Nice work! Some small things to note, but not blocking.
- Add NOTE comment calling out RequestTask heuristic assumption in connection_util.py - Rename _M to _REQUEST_SERVICE_MODULE in test file for clarity Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The early guard was blanket-skipping all requires_input/pending_external DSRs, preventing orphaned async task detection from running. A pending_external DSR with an orphaned callback task (deleted connector) should still be requeued. Fix: guard at two levels instead of one blanket skip: - No cached task_id: skip paused DSRs (never dispatched to Celery) - No cached subtask_id: skip paused DSR subtasks (Celery completed, waiting for manual input or external system) Orphaned tasks (have a cached subtask_id but connection deleted) bypass both guards and are correctly requeued. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…om:ethyca/fides into eng-3687/fix-requires-input-watchdog-guard
…nual task DSRs (#8264) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ticket ENG-3687
Description Of Changes
The
requeue_interrupted_taskswatchdog andrequeue_requires_input_requestsfunction incorrectly error/requeue privacy requests that are intentionally paused for manual input.Bug 1 — Watchdog errors paused DSRs: The watchdog polls for "stuck" privacy requests and cancels/requeues them. Four code paths in the watchdog had no awareness of
requires_inputorpending_externalstatus — DSRs paused for manual webhook data or manual task input were treated as stuck and errored within one polling cycle (~45 seconds to 5 minutes).Bug 2 — Connection config updates brick manual task DSRs:
requeue_requires_input_requests()is called unconditionally after everypatch_connection_configscall. If noAccessManualWebhookrecords exist (normal for manual_task-only setups), ALLrequires_inputDSRs are force-transitioned toin_processing— including those paused by manual_task connections. These DSRs then fail because the manual task data hasn't been submitted.Code Changes
request_service.py: Added earlycontinueguard at the top of the watchdog loop forrequires_inputandpending_externalstatuses. This skips all four downstream cancellation/requeue paths with a single check.connection_util.py: AddedRequestTaskexistence check inrequeue_requires_input_requests. DSRs with RequestTasks (manual_task, paused in-graph) are skipped. DSRs without RequestTasks (manual_webhook, paused pre-graph) are still correctly requeued when all webhooks are disabled.Steps to Confirm
Setup
Create a
manual_webhookconnection and configure its AccessManualWebhook fields:Create a
manual_taskconnection and add a field:Link both connections to a system:
Bug 1: Watchdog no longer errors requires_input DSRs
requires_inputFIDES__EXECUTION__INTERRUPTED_TASK_REQUEUE_INTERVAL=30for faster testing)requires_input— it should NOT transition toerrorSkipping privacy request <id> in requires_input status - intentionally pausedBug 2: Patching a connection no longer bricks requires_input DSRs
requires_inputstatus, PATCH an unrelated connection:requires_inputDSR is NOT requeued toin_processingCoexistence: Both webhook and manual task DSRs
requires_inputrequires_inputResume flow still works
requires_inputDSRPOST /api/v1/privacy-request/{id}/resume_from_requires_inputin_processingand continues processingAutomated tests
All 12 tests should pass. Also run existing watchdog tests for regressions:
Pre-Merge Checklist
CHANGELOG.mdupdated