Skip to content

feat(background-checks): hourly reconciliation for stuck checks (CS-473)#2996

Merged
Marfuen merged 5 commits into
mainfrom
cs-473
Jun 2, 2026
Merged

feat(background-checks): hourly reconciliation for stuck checks (CS-473)#2996
Marfuen merged 5 commits into
mainfrom
cs-473

Conversation

@Marfuen
Copy link
Copy Markdown
Contributor

@Marfuen Marfuen commented Jun 2, 2026

CS-473 — Reconcile stuck background checks

Relates to: https://linear.app/compai/issue/CS-473 (sub-issue of CS-475)

Problem

Background-check status is driven entirely by Identity webhooks. When a webhook is missed/not processed, the check stays stuck forever (e.g. Identity:Pending, References:Pending) with no fallback to catch up.

Fix

An hourly Trigger.dev scheduled task (reconcile-background-checks-schedule) that:

  • Finds stale, in-flight checks: status in invited/in_progress/in_review, with an identityBackgroundCheckId, whose lastSyncedAt is older than 1h (terminal + cancelled are skipped).
  • Polls Identity (getBackgroundCheck) for each and, if the reported status changed, applies it via the same fields the webhook handler writes (status, sub-statuses, and the report snapshot for completed checks).
  • Backs off (just bumps lastSyncedAt) when nothing changed.

CS-475's admin Retry gives a manual unblock; this is the automatic safety net.

⚠️ Assumption to verify

The job reads the lifecycle status from Identity's GET /v1/background-checks/:id. The app doesn't currently derive status from that response (it's the detailed report), so this assumes the GET returns a top-level status (consistent with what create returns). The parse is defensive: if status is absent/unknown, the check is left untouched and counted as unparseable (visible in the run logs) — safe no-op. If the GET doesn't expose status, we'll need a dedicated Identity status endpoint (follow-up); the logs will make this obvious after deploy.

Tests

  • API (jest): reconcile-background-checks-schedule.spec.ts → 10/10 — parseIdentityCheckState (valid / missing / invalid / extra-fields / non-object) and runReconciliation (api-key skip, status applied + snapshot, no-change bump, unparseable, query shape).
  • tsc clean on the new files.

Notes

  • Declarative hourly cron; runs in prod/staging only from the latest deployment, and locally only with the dev CLI running.

🤖 Generated with Claude Code


Summary by cubic

Adds an hourly reconciliation task that polls Identity to unstick background checks when webhooks are missed. Fixes CS-473 so completed checks no longer stay in Pending; the stale cutoff uses the actual run time to avoid late-cron gaps.

  • New Features
    • Added reconcile-background-checks-schedule via @trigger.dev/sdk (hourly cron, 30-minute maxDuration) to find stale in-flight checks (>1h since last sync; invited/in_progress/in_review) with an identityBackgroundCheckId (cutoff based on current run time).
    • Polls Identity and applies changes: updates status and sub-statuses (even if only sub-statuses changed) and stores the completed report snapshot; otherwise only bumps lastSyncedAt.
    • Defensive and safe: parses status and statuses independently; never nulls sub-statuses missing from GET; writes are guarded to non-terminal states to avoid resurrecting; logs unparseable responses and skips if BACKGROUND_CHECK_API_KEY is not set.

Written for commit 98dcc11. Summary will update on new commits.

Review in cubic

Background check status is normally driven by Identity webhooks; when one
is missed the check stays stuck (e.g. Identity:Pending forever). Add an
hourly Trigger.dev scheduled task that polls Identity for stale in-flight
checks and applies any status it reports (same fields the webhook writes).
Parses the Identity GET response defensively and no-ops when status can't
be determined.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Marfuen
Copy link
Copy Markdown
Contributor Author

Marfuen commented Jun 2, 2026

@cubic-dev-ai ultrareview this

@linear
Copy link
Copy Markdown

linear Bot commented Jun 2, 2026

CS-473

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai ultrareview this

@Marfuen Ultrareview monthly budget exhausted (12/12 used). Budget resets at the start of next month.

@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 2, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
app Ready Ready Preview, Comment Jun 2, 2026 8:07pm
comp-framework-editor Ready Ready Preview, Comment Jun 2, 2026 8:07pm
portal Ready Ready Preview, Comment Jun 2, 2026 8:07pm

Request Review

@Marfuen
Copy link
Copy Markdown
Contributor Author

Marfuen commented Jun 2, 2026

@cubic-dev-ai review this

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai review this

@Marfuen I have started the AI code review. It will take a few minutes to complete.

@Marfuen Marfuen marked this pull request as ready for review June 2, 2026 19:29
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

2 issues found across 2 files

Confidence score: 3/5

  • There is concrete regression risk in apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts: reconciliation updates keyed only by id are not concurrency-safe and can overwrite checks that became terminal/cancelled after selection.
  • parseIdentityCheckState in apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts can discard a valid top-level status when statuses is malformed, which may cause recoverable checks to be skipped instead of reconciled.
  • Given the medium-to-high severity and solid confidence on both findings, this looks mergeable only with caution rather than a low-risk change.
  • Pay close attention to apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts - race-safe update conditions and status parsing behavior could directly affect reconciliation correctness.

Linked issue analysis

Linked issue: CS-473: [BUG] Background check not updating

Status Acceptance criteria Notes
Query stale in-flight checks (invited/in_progress/in_review) with an identityBackgroundCheckId and lastSyncedAt older than 1h The task queries db.backgroundCheckRequest.findMany with the non-terminal statuses, identityBackgroundCheckId not null, and lastSyncedAt < payload - 1h.
Poll Identity (GET /v1/background-checks/:id) and parse returned lifecycle status and sub-statuses defensively The code calls identityClient.getBackgroundCheck and uses parseIdentityCheckState to extract status/statuses defensively; tests cover valid/missing/invalid inputs.
When Identity reports a different status, apply the same fields the webhook handler writes (status, sub-statuses, and completed report snapshot) On status change the task fetches a report snapshot and updates status, identityStatus, employmentStatus, referenceStatus, etc., and reportSnapshot/reportSyncedAt when present; tests assert the update call.
When the Identity status has not changed, back off by bumping lastSyncedAt (no status change applied) If nextStatus equals current status the code only updates lastSyncedAt; a test verifies that update shape and that updated count remains 0.
If Identity's response has no parseable status, leave the record untouched and count as unparseable (safe no-op) parseIdentityCheckState returns empty for unparseable responses; runReconciliation increments unparseable and skips updates; tests cover this behavior.
Schedule the task to run hourly (cron) in prod/staging The file declares a schedules.task with cron '0 * * * *' and an hourly comment.

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts Outdated
Comment thread apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts Outdated
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

1 issue found across 2 files

Confidence score: 3/5

  • There is a concrete medium-severity risk in apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts: if top-level status does not change, sub-status updates are skipped, which can leave pending sub-statuses stale.
  • Given the 6/10 severity and 7/10 confidence, this is likely user-impacting behavior rather than a cosmetic issue, so merge risk is moderate until addressed.
  • Pay close attention to apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts - ensure sub-status reconciliation runs even when the parent status is unchanged.

Linked issue analysis

Linked issue: CS-473: [BUG] Background check not updating

Status Acceptance criteria Notes
Query stale in-flight checks (invited/in_progress/in_review) with an identityBackgroundCheckId and lastSyncedAt older than 1h The task queries db.backgroundCheckRequest.findMany with the non-terminal statuses, identityBackgroundCheckId not null, and lastSyncedAt < payload - 1h.
Poll Identity (GET /v1/background-checks/:id) and parse returned lifecycle status and sub-statuses defensively The code calls identityClient.getBackgroundCheck and uses parseIdentityCheckState to extract status/statuses defensively; tests cover valid/missing/invalid inputs.
When Identity reports a different status, apply the same fields the webhook handler writes (status, sub-statuses, and completed report snapshot) On status change the task fetches a report snapshot and updates status, identityStatus, employmentStatus, referenceStatus, etc., and reportSnapshot/reportSyncedAt when present; tests assert the update call.
When the Identity status has not changed, back off by bumping lastSyncedAt (no status change applied) If nextStatus equals current status the code only updates lastSyncedAt; a test verifies that update shape and that updated count remains 0.
If Identity's response has no parseable status, leave the record untouched and count as unparseable (safe no-op) parseIdentityCheckState returns empty for unparseable responses; runReconciliation increments unparseable and skips updates; tests cover this behavior.
Schedule the task to run hourly (cron) in prod/staging The file declares a schedules.task with cron '0 * * * *' and an hourly comment.

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts Outdated
- parseIdentityCheckState parses status/statuses independently so a
  malformed statuses object no longer drops a valid status (P2)
- refresh sub-statuses even when the top-level status is unchanged, and
  never null out sub-statuses the GET omitted (P2 — the CS-473 symptom)
- write via updateMany guarded on status IN non-terminal, so a check that
  became terminal/cancelled between select and write isn't resurrected (P1)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel vercel Bot temporarily deployed to Preview – portal June 2, 2026 19:37 Inactive
@vercel vercel Bot temporarily deployed to Preview – app June 2, 2026 19:37 Inactive
@Marfuen
Copy link
Copy Markdown
Contributor Author

Marfuen commented Jun 2, 2026

@cubic-dev-ai review this

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai review this

@Marfuen I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files

Confidence score: 3/5

  • There is a concrete medium-severity regression risk in apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts: maxDuration is set in milliseconds while Trigger.dev expects seconds.
  • This likely changes runtime behavior from a 30-minute timeout to roughly 20 days, which can cause jobs to run far longer than intended and delay failure handling.
  • Given the high confidence (9/10) and direct behavioral impact, this is some merge risk rather than a merge-blocking failure, but it should be corrected soon. Pay close attention to apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts - timeout unit mismatch (ms vs seconds).

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts Outdated
Trigger.dev maxDuration is specified in seconds; 1000*60*30 was ~20 days
instead of 30 minutes. Use 30*60.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel vercel Bot temporarily deployed to Preview – portal June 2, 2026 19:44 Inactive
@vercel vercel Bot temporarily deployed to Preview – app June 2, 2026 19:44 Inactive
@Marfuen
Copy link
Copy Markdown
Contributor Author

Marfuen commented Jun 2, 2026

@cubic-dev-ai review this

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented Jun 2, 2026

@cubic-dev-ai review this

@Marfuen I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cubic analysis

1 issue found across 2 files

Confidence score: 4/5

  • This PR is likely safe to merge, with one moderate logic issue rather than a broad stability concern.
  • In apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts, the stale-check cutoff is based on scheduled time instead of actual run time, which can delay reconciliation when cron starts late and may cause user-visible lag in background check updates.
  • Given the issue is medium severity (5/10) but high confidence (8/10), the merge risk looks limited but worth a follow-up fix soon.
  • Pay close attention to apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts - stale cutoff timing may drift when jobs run behind schedule.

Linked issue analysis

Linked issue: CS-473: [BUG] Background check not updating

Status Acceptance criteria Notes
Add an hourly Trigger.dev scheduled task reconcile-background-checks-schedule that finds stale, in-flight checks (>1h) with an identityBackgroundCheckId and status in invited/in_progress/in_review. The schedule is registered (schedules.task) and the code queries db.backgroundCheckRequest.findMany with NON_TERMINAL_STATUSES and a stale-before timestamp.
Poll Identity (GET /v1/background-checks/:id) for each stale check and parse a top-level status and granular sub-statuses defensively (count unparseable responses and leave record untouched). The code uses BackgroundCheckIdentityClient.getBackgroundCheck and parseIdentityCheckState; unparseable responses increment unparseable and skip updates.
Apply reported changes: update top-level status and sub-status fields using the same fields the webhook handler writes; for completed checks also fetch and store the report snapshot. The task builds an update payload that sets status and specific sub-status fields when they differ, calls fetchCompletedReportSnapshot and writes reportSnapshot/reportSyncedAt when present, then updates via db.updateMany.
Refresh changed sub-statuses even when the top-level status is unchanged (i.e., handle cases where only sub-statuses advanced). The code updates individual sub-status fields when they differ and tests assert updating identityStatus without changing top-level status.
When nothing changed, back off by only bumping lastSyncedAt (no other writes). When no fields differ, data.lastSyncedAt is set and updateMany is called with only lastSyncedAt; tests assert this behavior.
Skip reconciliation entirely when BACKGROUND_CHECK_API_KEY is not configured. runReconciliation early-returns with a logged warning if BACKGROUND_CHECK_API_KEY is missing; test asserts no DB queries in that case.
Concurrency-safe update: ensure rows are only updated if still non-terminal to avoid resurrecting terminal states. updateMany uses a WHERE clause asserting status is in NON_TERMINAL_STATUSES, preventing updates to rows that became terminal after selection.
Include tests covering parseIdentityCheckState and runReconciliation behaviors (valid/malformed responses, status changes, no-change backoff, unparseable cases). A comprehensive jest test file was added verifying parsing, status updates, sub-status changes, no-change behavior, unparseable handling, query shape, and API-key skip.

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread apps/api/src/trigger/background-checks/reconcile-background-checks-schedule.ts Outdated
Use Date.now() instead of the scheduled payload.timestamp so a late cron
start doesn't narrow the reconciliation window and delay recovery.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Marfuen Marfuen merged commit 3d6e609 into main Jun 2, 2026
4 of 7 checks passed
@Marfuen Marfuen deleted the cs-473 branch June 2, 2026 20:02
claudfuen pushed a commit that referenced this pull request Jun 2, 2026
# [3.67.0](v3.66.2...v3.67.0) (2026-06-02)

### Bug Fixes

* **api:** guarantee non-null SoA justification on YES defaults ([7f564df](7f564df))
* **api:** include a default justification on SoA ([732f262](732f262))
* **app:** able to edit the justification ([2939178](2939178))
* **app:** fix empty justification issue on SoA ([43fa889](43fa889))
* **app:** keep SoA justification dialog open when save fails ([a5621cb](a5621cb))
* **app:** return a generic default when no family match on SoA ([6682be1](6682be1))
* **app:** show default justification at all times on SoA ([13f468a](13f468a))
* **background-checks:** move admin actions into the status card footer ([#2998](#2998)) ([dcd4b4d](dcd4b4d))
* **people:** stop tracking background checks for auditor-only members (CS-416) ([#2995](#2995)) ([4e7d57d](4e7d57d))

### Features

* **admin:** add Finding Templates management to admin panel ([c381397](c381397))
* **background-checks:** admin cancel/delete/retry (CS-475) ([#2993](#2993)) ([51c3b3d](51c3b3d))
* **background-checks:** hourly reconciliation for stuck checks (CS-473) ([#2996](#2996)) ([3d6e609](3d6e609))
@claudfuen
Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.67.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants