
fix: surface exception text in APIDbtRunner for transient error detection #2129

Merged
haritamar merged 5 commits into master from devin/1772301185-fix-api-runner-transient-detection on Feb 28, 2026

@devin-ai-integration devin-ai-integration bot commented Feb 28, 2026

Summary

Fixes a bug where the transient error retry logic (from #2125) never fires for APIDbtRunner (the default runner for dbt >= 1.5.0).

Root cause: APIDbtRunner._inner_run_command only captures JinjaLogInfo and RunningOperationCaughtError events into the output field. When a command fails with a transient error (e.g. Dremio's RemoteDisconnected), the error text lives in res.exception — not in the captured output. The retry logic in _inner_run_command_with_retries checks is_transient_error(adapter_type, output=result.output, stderr=result.stderr), but stderr was always None for APIDbtRunner, so pattern matching never found the transient error string.
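
The miss described above can be reproduced with a small, self-contained model. This is only a sketch: `is_transient_error_sketch`, `TRANSIENT_PATTERNS`, and `ResultSketch` are hypothetical stand-ins, not the project's actual implementation, and the substring matching is an assumption about how the pattern check behaves.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern table; the real patterns live elsewhere in the project.
TRANSIENT_PATTERNS = {"dremio": ("remotedisconnected",)}


def is_transient_error_sketch(
    adapter_type: str,
    output: Optional[str] = None,
    stderr: Optional[str] = None,
) -> bool:
    """Assumed behavior: case-insensitive substring match over output and stderr."""
    haystack = " ".join(part for part in (output, stderr) if part).lower()
    return any(p in haystack for p in TRANSIENT_PATTERNS.get(adapter_type, ()))


@dataclass
class ResultSketch:
    """Stand-in for the command result; only the fields the check reads."""
    success: bool
    output: Optional[str]
    stderr: Optional[str]


# The transient error text exists only on the exception object.
exception = RuntimeError("RemoteDisconnected: remote end closed connection")

# Before the fix: captured output lacks the error text and stderr is None.
before = ResultSketch(success=False, output="", stderr=None)
missed = is_transient_error_sketch(
    "dremio", output=before.output, stderr=before.stderr
)

# If the exception text were visible to the check, the pattern would match.
visible = is_transient_error_sketch("dremio", output="", stderr=str(exception))
```

Here `missed` is `False` even though the failure was transient, while `visible` is `True`, which is the gap the fix closes.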

Fix: Extract str(res.exception) and pass it as the stderr field of APIDbtCommandResult. The dbt Python API doesn't use stderr, so this repurposes the field analogously to how SubprocessDbtRunner captures subprocess stderr.
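
A minimal sketch of the change, assuming the result object carries `success` and `exception` attributes. `RunnerResultSketch`, `CommandResultSketch`, and `build_result` are hypothetical names used for illustration; the real code builds an `APIDbtCommandResult` inside `_inner_run_command`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunnerResultSketch:
    """Hypothetical stand-in for the `res` object returned by the dbt API call."""
    success: bool
    exception: Optional[BaseException] = None


@dataclass
class CommandResultSketch:
    """Hypothetical stand-in for APIDbtCommandResult."""
    success: bool
    output: Optional[str]
    stderr: Optional[str]


def build_result(res: RunnerResultSketch, output: Optional[str]) -> CommandResultSketch:
    # Convert the exception to text once; stays None when nothing was raised.
    exception_text = str(res.exception) if res.exception else None
    # Repurpose stderr to carry the exception text so the retry check can see it.
    return CommandResultSketch(
        success=res.success, output=output, stderr=exception_text
    )


failed = build_result(
    RunnerResultSketch(success=False, exception=RuntimeError("RemoteDisconnected")),
    output="",
)
succeeded = build_result(RunnerResultSketch(success=True), output="ok")
```

With this shape, `failed.stderr` holds the exception text and `succeeded.stderr` remains `None`, so successful runs look the same as before.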

Discovered while investigating this Dremio CI failure.

Review & Testing Checklist for Human

  • Verify stderr field consumers: Search for any code that reads result.stderr from an APIDbtCommandResult and confirm that receiving exception text (instead of None) doesn't cause unintended side effects. The stderr field is now non-None on failure for the API runner path.
  • Consider a dedicated field: The stderr field is being semantically overloaded — it means "subprocess stderr" for SubprocessDbtRunner and "exception string" for APIDbtRunner. Would a separate exception_text field on DbtCommandResult be cleaner? Current approach works but may confuse future readers.
  • Verify raise_on_failure=True path: The change from "str(res.exception) if res.exception else output" to "exception_text or output" is semantically equivalent, but the new tests only cover raise_on_failure=False. Confirm the raise path still works as expected (the DbtCommandError.err_msg should be unchanged).
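
To help with the last checklist item, the two expressions can be compared directly. This sketch assumes `exception_text` is computed as `str(res.exception) if res.exception else None`; the helper names `old_err_msg` and `new_err_msg` are hypothetical.

```python
from typing import Optional


def old_err_msg(exception: Optional[BaseException], output: Optional[str]):
    # Expression before the change.
    return str(exception) if exception else output


def new_err_msg(exception: Optional[BaseException], output: Optional[str]):
    # Assumed definition of exception_text; not confirmed by the PR text.
    exception_text = str(exception) if exception else None
    return exception_text or output


cases = [
    (RuntimeError("RemoteDisconnected"), "log output"),  # exception with message
    (None, "log output"),                                # no exception
    (None, None),                                        # no exception, no output
]
agree = [old_err_msg(e, o) == new_err_msg(e, o) for e, o in cases]

# Edge case worth a glance: an exception whose message is the empty string.
empty = RuntimeError()
edge_old = old_err_msg(empty, "log output")
edge_new = new_err_msg(empty, "log output")
```

Under that assumption the two forms agree whenever the exception has a non-empty message or is absent. For an exception with an empty message, the old form yields an empty string while the new form falls back to `output`, which is arguably the more useful message.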

Test Plan

  1. Run the new unit tests: pytest tests/unit/clients/dbt_runner/test_retry_logic.py::TestAPIDbtRunnerTransientDetection -v
  2. Verify CI passes (especially Dremio, which was the original failure case)
  3. Optionally: manually trigger a transient error with APIDbtRunner (e.g. kill a Dremio connection mid-query) and confirm retry fires
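
The behavior the test plan checks can be modeled without dbt. This is a sketch under stated assumptions: `run_with_retries`, `is_transient`, `make_runner`, and the `max_attempts` parameter are hypothetical and only approximate what `_inner_run_command_with_retries` does.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Result:
    """Hypothetical stand-in for a command result."""
    success: bool
    output: Optional[str] = None
    stderr: Optional[str] = None


def is_transient(result: Result) -> bool:
    # Assumed check: look for a known transient marker in output or stderr.
    text = f"{result.output or ''} {result.stderr or ''}"
    return "RemoteDisconnected" in text


def run_with_retries(
    run_once: Callable[[], Result], max_attempts: int = 3
) -> Tuple[Result, int]:
    """Retry only transient failures, up to max_attempts total attempts."""
    attempts = 0
    while True:
        attempts += 1
        result = run_once()
        if result.success or not is_transient(result) or attempts >= max_attempts:
            return result, attempts


def make_runner(results: Iterable[Result]) -> Callable[[], Result]:
    # Returns a callable that yields the scripted results one call at a time.
    iterator = iter(results)
    return lambda: next(iterator)


transient = Result(False, stderr="RemoteDisconnected")
permanent = Result(False, stderr="Compilation Error")
ok = Result(True, output="done")

recovered = run_with_retries(make_runner([transient, ok]))
no_retry = run_with_retries(make_runner([permanent, ok]))
exhausted = run_with_retries(make_runner([transient] * 5), max_attempts=3)
```

The three scenarios mirror the cases the new tests are described as covering: a transient failure followed by success, a non-transient failure that is not retried, and a transient failure that exhausts the attempt limit.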

Notes

Summary by CodeRabbit

  • Bug Fixes

    • The API runner now surfaces exception text in its error messages and in the stderr field, which improves detection of transient failures and makes failure messages more informative.
  • Tests

    • Added API-level tests validating runner retry behavior: transient errors trigger retries, non-transient errors do not, and retry exhaustion is asserted to ensure correct retry limits.
