ENG-3687: Fix requires_input watchdog guard for manual webhook and manual task DSRs#8264

Merged
JadeCara merged 10 commits into main from eng-3687/fix-requires-input-watchdog-guard on May 26, 2026

Conversation

@JadeCara JadeCara commented May 21, 2026

Ticket ENG-3687

Description Of Changes

The requeue_interrupted_tasks watchdog and the requeue_requires_input_requests function incorrectly error or requeue privacy requests that are intentionally paused for manual input.

Bug 1 — Watchdog errors paused DSRs: The watchdog polls for "stuck" privacy requests and cancels/requeues them. Four code paths in the watchdog had no awareness of requires_input or pending_external status — DSRs paused for manual webhook data or manual task input were treated as stuck and errored within one polling cycle (~45 seconds to 5 minutes).

Bug 2 — Connection config updates brick manual task DSRs: requeue_requires_input_requests() is called unconditionally after every patch_connection_configs call. If no AccessManualWebhook records exist (normal for manual_task-only setups), ALL requires_input DSRs are force-transitioned to in_processing — including those paused by manual_task connections. These DSRs then fail because the manual task data hasn't been submitted.

Code Changes

  • request_service.py: Added early continue guard at the top of the watchdog loop for requires_input and pending_external statuses. This skips all four downstream cancellation/requeue paths with a single check.
  • connection_util.py: Added RequestTask existence check in requeue_requires_input_requests. DSRs with RequestTasks (manual_task, paused in-graph) are skipped. DSRs without RequestTasks (manual_webhook, paused pre-graph) are still correctly requeued when all webhooks are disabled.
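The connection_util.py rule above can be sketched in isolation. This is a minimal model under stated assumptions, not the fides implementation: `PausedRequest`, `select_requests_to_requeue`, and the `enabled_webhook_count` parameter are hypothetical names standing in for the ORM models and queries the PR touches.

```python
from dataclasses import dataclass


@dataclass
class PausedRequest:
    """Hypothetical stand-in for a privacy request in requires_input."""

    id: str
    request_task_count: int  # 0 means paused pre-graph (manual_webhook)


def select_requests_to_requeue(paused, enabled_webhook_count):
    """Return the ids that may be moved back to in_processing.

    Assumed rule, taken from the description above: requeue only when no
    enabled manual webhooks remain, and only requests with zero
    RequestTasks. Requests that have RequestTasks are paused in-graph by a
    manual_task connection and must stay paused.
    """
    if enabled_webhook_count > 0:
        # Webhook input may still be needed, so leave everything paused.
        return []
    return [r.id for r in paused if r.request_task_count == 0]


paused = [
    PausedRequest(id="webhook_dsr", request_task_count=0),
    PausedRequest(id="manual_task_dsr", request_task_count=3),
]
to_requeue = select_requests_to_requeue(paused, enabled_webhook_count=0)
```

In this model only `webhook_dsr` is selected; the manual task DSR stays in requires_input until its data is submitted.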

Steps to Confirm

Note: Connections must be linked to a system via PUT /api/v1/connection/{key}/system-links before DSRs will process correctly. Unlinked connections cause an unrelated runner crash.

Setup

  1. Create a manual_webhook connection and configure its AccessManualWebhook fields:

    PATCH /api/v1/connection
    [{"key": "test_manual_webhook", "connection_type": "manual_webhook", "name": "Test Manual Webhook", "access": "write"}]
    
    POST /api/v1/connection/test_manual_webhook/access_manual_webhook
    {"fields": [{"pii_field": "id", "dsr_package_label": "customer_id"}]}
    
  2. Create a manual_task connection and add a field:

    PATCH /api/v1/connection
    [{"key": "test_manual_task", "connection_type": "manual_task", "name": "Test Manual Task", "access": "write"}]
    
    POST /api/v1/plus/connection/test_manual_task/manual-field
    {"label": "Customer Name", "help_text": "Full name", "field_type": "text", "request_type": "access"}
    
  3. Link both connections to a system:

    PUT /api/v1/connection/test_manual_webhook/system-links
    {"links": [{"system_fides_key": "<your_system_key>"}]}
    
    PUT /api/v1/connection/test_manual_task/system-links
    {"links": [{"system_fides_key": "<your_system_key>"}]}
    

Bug 1: Watchdog no longer errors requires_input DSRs

  1. Submit and approve a DSR (access request) — it should transition to requires_input
  2. Wait at least one watchdog polling cycle (default 300s, or set FIDES__EXECUTION__INTERRUPTED_TASK_REQUEUE_INTERVAL=30 for faster testing)
  3. Verify the DSR remains in requires_input — it should NOT transition to error
  4. Check logs for: Skipping privacy request <id> in requires_input status - intentionally paused
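Steps 2 and 3 can be scripted rather than watched by hand. The helper below is a sketch only: `stays_in_status` and its `get_status` callback are hypothetical, and the caller is assumed to supply a function that returns the DSR's current status string (for example by querying the privacy request API). The 330 second default is simply one 300 second polling cycle plus a margin.

```python
import time


def stays_in_status(get_status, expected="requires_input",
                    duration_s=330.0, poll_s=15.0, sleep=time.sleep):
    """Poll get_status() for duration_s seconds and fail fast on any change.

    Returns (True, status) if the status never left `expected`, otherwise
    (False, status) with the first unexpected status observed.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status != expected:
            return False, status
        if waited >= duration_s:
            return True, status
        sleep(poll_s)
        waited += poll_s


# Usage with a fake status source; a real check would call the API instead.
observed = iter(["requires_input", "requires_input", "error"])
result = stays_in_status(lambda: next(observed), duration_s=60, poll_s=15,
                         sleep=lambda seconds: None)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.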

Bug 2: Patching a connection no longer bricks requires_input DSRs

  1. With a DSR in requires_input status, PATCH an unrelated connection:
    PATCH /api/v1/connection
    [{"key": "test_postgres", "connection_type": "postgres", "name": "Test Postgres", "access": "read"}]
    
  2. Verify the requires_input DSR is NOT requeued to in_processing

Coexistence: Both webhook and manual task DSRs

  1. Have both a manual_webhook and manual_task connection configured (from setup)
  2. Submit two DSRs — both should enter requires_input
  3. PATCH an unrelated connection (same as Bug 2)
  4. Verify both DSRs remain in requires_input

Resume flow still works

  1. Upload manual webhook data for a requires_input DSR
  2. Call POST /api/v1/privacy-request/{id}/resume_from_requires_input
  3. Verify the DSR transitions to in_processing and continues processing

Automated tests

    docker exec fides bash -c "uv sync && pytest --no-cov tests/fides/ops/service/privacy_request/test_requeue_interrupted_tasks_guards.py tests/fides/ops/util/test_connection_util.py -v"

All 12 tests should pass. Also run existing watchdog tests for regressions:

    docker exec fides bash -c "pytest --no-cov tests/fides/ops/service/privacy_request/test_request_service.py -v"

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • No followup issues
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

JadeCara and others added 2 commits May 21, 2026 16:26
The requeue_interrupted_tasks watchdog incorrectly canceled or requeued
privacy requests in requires_input/pending_external status. These DSRs
are intentionally paused waiting for manual webhook data or manual task
input and have no running Celery task by design.

Additionally, requeue_requires_input_requests blindly requeued all
requires_input DSRs when no AccessManualWebhooks existed, bricking
DSRs paused by manual_task connections (which have RequestTasks).

Fixes:
- Add early continue in watchdog loop for paused statuses
- Only requeue DSRs with zero RequestTasks (manual_webhook, pre-graph)
  and skip those with RequestTasks (manual_task, in-graph)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
fides-plus-nightly | Ready | Preview, Comment | May 26, 2026 10:37pm

1 Skipped Deployment

Project | Deployment | Actions | Updated (UTC)
fides-privacy-center | Ignored | | May 26, 2026 10:37pm

Verifies that when both types are in requires_input simultaneously,
only the webhook DSR (zero RequestTasks) is requeued while the
manual_task DSR (has RequestTasks) is left alone.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.09%. Comparing base (5a94d9a) to head (2ba2f00).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8264   +/-   ##
=======================================
  Coverage   85.09%   85.09%           
=======================================
  Files         669      669           
  Lines       43585    43591    +6     
  Branches     5125     5127    +2     
=======================================
+ Hits        37087    37095    +8     
+ Misses       5392     5391    -1     
+ Partials     1106     1105    -1     


@JadeCara

/code-review

@claude claude Bot left a comment

Code Review

The fix is correct and well-scoped. Two independent bugs are addressed:

  1. requeue_interrupted_tasks watchdog — An early continue guard at the top of the per-request loop now skips all requires_input / pending_external DSRs before any path that could cancel or error them. Clean and correct.

  2. requeue_requires_input_requests — The new has_request_tasks guard correctly distinguishes between manual-webhook DSRs (no RequestTasks, pre-graph) and manual-task DSRs (have RequestTasks, in-graph), preventing the latter from being incorrectly requeued when connection configs are updated.

The changelog entry is present and the log message typo fix is a welcome cleanup.

Findings

connection_util.py: .limit(1).count() > 0 is functional but non-idiomatic; see inline comment for cleaner alternatives (.first() is not None or .exists()).

request_service.py — The inner-loop requires_input/pending_external check (~line 688) is now dead code: the new outer guard fires first for any request with those statuses. No behavior change, but worth removing in a follow-up.

Tests — The four test cases in TestWatchdogSkipsPausedRequests all exercise the same single code path (the new early continue). Their docstrings reference four different prior code paths that are no longer individually reachable. Several mock setups (e.g., the get_cached_task_id side-effect chain in test_subtask_cache_exception_skips_cancel) are never consumed. The tests prove the right behavior but could be simplified to avoid the misleading setup overhead.

Overall a solid fix with good test coverage for the two new behaviors; the notes above are all minor.


🔬 Codegraph: connected (51846 nodes)

💡 Write /code-review in a comment to re-run this review.

Comment thread src/fides/api/util/connection_util.py Outdated
Comment thread src/fides/api/service/privacy_request/request_service.py Outdated
Comment thread tests/fides/ops/service/privacy_request/test_requeue_interrupted_tasks_guards.py Outdated
Comment thread tests/fides/ops/service/privacy_request/test_requeue_interrupted_tasks_guards.py Outdated
JadeCara and others added 2 commits May 21, 2026 17:21
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use .first() is not None instead of .limit(1).count() > 0
- Remove dead inner requires_input/pending_external guard
- Simplify watchdog tests: collapse 4 tests into 1 parametrized test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
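
The switch from counting rows to probing for one can be illustrated without SQLAlchemy. The sketch below uses the standard-library sqlite3 module and a hypothetical `request_task` table; it is not the query from connection_util.py, only the same existence-check idea expressed in raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE request_task (id INTEGER PRIMARY KEY, privacy_request_id TEXT)"
)
conn.executemany(
    "INSERT INTO request_task (privacy_request_id) VALUES (?)",
    [("pri_1",), ("pri_1",)],
)


def has_request_tasks(conn, privacy_request_id):
    """Existence probe: fetch at most one row and test it against None."""
    row = conn.execute(
        "SELECT 1 FROM request_task WHERE privacy_request_id = ? LIMIT 1",
        (privacy_request_id,),
    ).fetchone()
    return row is not None
```

The probe states the intent ("is there at least one row?") directly instead of counting rows and comparing the count to zero.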
@JadeCara JadeCara marked this pull request as ready for review May 22, 2026 00:01
@JadeCara JadeCara requested a review from a team as a code owner May 22, 2026 00:01
@JadeCara JadeCara requested review from eastandwestwind and galvana and removed request for a team and galvana May 22, 2026 00:01

@eastandwestwind eastandwestwind left a comment

Nice work! Some small things to note, but not blocking.

Comment thread src/fides/api/util/connection_util.py
Comment thread tests/fides/ops/service/privacy_request/test_requeue_interrupted_tasks_guards.py Outdated
JadeCara and others added 2 commits May 26, 2026 15:56
- Add NOTE comment calling out RequestTask heuristic assumption in connection_util.py
- Rename _M to _REQUEST_SERVICE_MODULE in test file for clarity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JadeCara and others added 2 commits May 26, 2026 16:33
The early guard was blanket-skipping all requires_input/pending_external
DSRs, preventing orphaned async task detection from running. A
pending_external DSR with an orphaned callback task (deleted connector)
should still be requeued.

Fix: guard at two levels instead of one blanket skip:
- No cached task_id: skip paused DSRs (never dispatched to Celery)
- No cached subtask_id: skip paused DSR subtasks (Celery completed,
  waiting for manual input or external system)

Orphaned tasks (have a cached subtask_id but connection deleted) bypass
both guards and are correctly requeued.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
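
The two-level guard described in this commit message can be modelled as a small decision function. This is a simplified sketch, not the code in request_service.py: `watchdog_decision`, its return values, and the single `cached_subtask_id` argument are hypothetical, and the real loop is assumed to check subtasks one at a time.

```python
PAUSED_STATUSES = {"requires_input", "pending_external"}


def watchdog_decision(status, cached_task_id, cached_subtask_id):
    """Return "skip" or "continue_checks" for one privacy request.

    "continue_checks" means the existing stuck-task handling still runs;
    this model does not say whether that ends in a requeue or an error.
    """
    paused = status in PAUSED_STATUSES
    if cached_task_id is None and paused:
        # Level 1: a paused DSR that was never dispatched to Celery.
        return "skip"
    if cached_task_id is not None and cached_subtask_id is None and paused:
        # Level 2: Celery work finished; waiting on manual or external input.
        return "skip"
    # Everything else, including a paused DSR that still has a cached
    # subtask id (a possible orphan), falls through to the normal checks.
    return "continue_checks"
```

A pending_external request that still has a cached subtask id is not skipped, which is what lets orphaned callback tasks be detected.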
…om:ethyca/fides into eng-3687/fix-requires-input-watchdog-guard
@JadeCara JadeCara enabled auto-merge May 26, 2026 22:44
@JadeCara JadeCara added this pull request to the merge queue May 26, 2026
Merged via the queue into main with commit d85fa4a May 26, 2026
68 of 69 checks passed
@JadeCara JadeCara deleted the eng-3687/fix-requires-input-watchdog-guard branch May 26, 2026 22:54
JadeCara added a commit that referenced this pull request May 27, 2026
…nual task DSRs (#8264)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants