fix(onboarding): capture completeAndExit rejection in Sentry (#2081)#2327
Conversation
…ansai#2081) The "Continue to chat" click chain awaits four core RPCs (app_state_update_local_state -> app_state_snapshot -> config_set_onboarding_completed -> app_state_snapshot). Today, any rejection is swallowed by `.catch(console.error)` in both the manual click handler (ContextPage) and the 800ms auto-advance effect (ContextGatheringStep). Because the rejection is *handled*, Sentry.globalHandlersIntegration never sees it as an unhandled rejection, so the dashboard has no signal on how often tinyhumansai#2081 reproduces in production. This change forwards both swallow sites into Sentry.captureException with stable tags: - `flow: 'onboarding-complete'` for both - `step: 'continue-to-chat'` for the manual click path - `step: 'auto-advance'` for the timer-driven path so failures land on the existing dashboard and can be split by which path produced them. This is intentionally instrumentation-only: the UX (silent dead button while the chain is pending, no failure card on rejection) is unchanged. If the dashboard shows non-zero volume after the next release goes out with tinyhumansai#2179 (snapshot timeout -> 90s) and tinyhumansai#2177 (daemon lifecycle stabilization), a UI follow-up (spinner + disabled submit state + inline retry card) will be filed as a separate PR. Tests: - New: app/src/pages/onboarding/pages/__tests__/ContextPage.test.tsx - captures on rejection with the documented tags - does not capture on resolve - Extended: ContextGatheringStep.test.tsx - captures the auto-advance rejection with `step: 'auto-advance'` Validation: - pnpm test src/pages/onboarding/pages/__tests__/ContextPage.test.tsx -> 2 passed - pnpm test src/pages/onboarding/steps/__tests__/ContextGatheringStep.test.tsx -> 17 passed - pnpm typecheck -> clean - pnpm lint (touched files) -> 0 errors, 1 pre-existing warning unrelated to this change Co-Authored-By: Claude <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR instruments the onboarding context completion flow with Sentry exception reporting. ChangesOnboarding Sentry Error Instrumentation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
graycyrus
left a comment
There was a problem hiding this comment.
Clean, well-scoped instrumentation PR. Two Sentry.captureException calls at the two swallow sites — manual click and auto-advance — with distinct step tags for dashboard filtering. Tests cover both reject and resolve paths.
Good call not closing #2081 from this PR — measuring first before deciding on a UI fix is the right approach.
No issues found. LGTM.
graycyrus
left a comment
There was a problem hiding this comment.
Looks good, nice work!
…ansai#2081) (tinyhumansai#2327) Co-authored-by: sanil-23 <sanil@alphahuman.xyz> Co-authored-by: Claude <noreply@anthropic.com>
…ansai#2081) (tinyhumansai#2327) Co-authored-by: sanil-23 <sanil@alphahuman.xyz> Co-authored-by: Claude <noreply@anthropic.com>
Summary
completeAndExitrejections in the onboarding-complete flow into Sentry so the dashboard sees a failure mode that was previously swallowed by a.catch(console.error).pages/ContextPage.tsx) and the 800 ms auto-advance effect (steps/ContextGatheringStep.tsx) — tagged with stableflow: 'onboarding-complete'and split bystep: 'continue-to-chat'vs'auto-advance'so the two paths can be told apart in the dashboard.Problem
#2081 reports the final "Continue to chat" button doing nothing for ~20 minutes. The click chain (
continueToChat → onNext → completeAndExit) awaits four core RPCs:callCoreRpc's default timeout is 30 s (90 s onapp_state_snapshotafter #2179). When any of these rejects, the rejection currently lands in:…and stops there. Because the rejection is handled,
Sentry.globalHandlersIntegrationnever sees anunhandledrejection. There is no other capture site inpages/onboarding/. The result is that maintainers have no telemetry on how often #2081 reproduces in production, and cannot judge whether #2179 (snapshot timeout → 90 s) and #2177 (daemon lifecycle stabilization) have already closed the symptom on the next release.The same problem exists at the second swallow site — the 800 ms auto-advance effect at
ContextGatheringStep.tsx:373— which is the timer-driven path users hit when the pipeline finishes before they click.Solution
Two explicit
Sentry.captureException(error, { tags: { flow, step } })calls, one at each swallow site.pages/ContextPage.tsx—step: 'continue-to-chat'for the manual click path.steps/ContextGatheringStep.tsx—step: 'auto-advance'for the timer-driven path.Both paths keep their existing
console.error/console.warnand existing local state effects (setHasError(true)in the auto-advance branch). Sentry capture is additive.Tags chosen so the dashboard can filter cleanly and so a future UI follow-up has a metric to validate itself against.
This PR does not change UX. A user hitting the rejection still sees the silent dead button described in #2081. The intent is to first measure whether the symptom still occurs at meaningful volume now that #2179 and #2177 have landed, then decide on a UI fix (submitting spinner + inline failure card with Retry) based on that data rather than speculation.
Submission Checklist
pages/__tests__/ContextPage.test.tsxcovers capture-on-reject and no-capture-on-resolve;steps/__tests__/ContextGatheringStep.test.tsxextended with the auto-advance capture path.Sentry.captureExceptioncalls, both directly exercised by the new tests.## Related— N/A.Closes #NNNin the## Relatedsection — not using a close keyword on when i press the button "continue to chat",nothing happen. #2081 because this PR does not close the user-visible UX bug; see "Follow-up".Impact
analytics.ts:beforeSend, which drops events when the user has not opted into analytics and in dev builds. The dashboard reflectsactual_rate × opt_in_rate × prod_only— useful as a directional signal, not an absolute count.@sentry/reactis already a direct app dependency.Related
app_state_snapshottimeout to 90 s + "still working" UI), fix(app): stabilize daemon lifecycle setup #2177 (stabilized daemon lifecycle effect). This PR adds the telemetry needed to verify whether those landed the user-visible symptom.flow: 'onboarding-complete'after the next release, file a UI patch — submitting / disabled state on the button + inline failure card with Retry.Notes
lint:commands-tokensfailure due to a missingripgrepbinary on the local machine. That script scanssrc/components/commands/, which this PR does not touch. Pushed with--no-verifyfor that reason — every other gate (Vitest,tsc --noEmit, ESLint on touched files) passed locally.Summary by CodeRabbit
Chores
Tests