fix: surface exception text in APIDbtRunner for transient error detection by devin-ai-integration[bot] · Pull Request #2129 · elementary-data/elementary

devin-ai-integration · 2026-02-28T18:00:36Z

Summary

Fixes a bug where the transient error retry logic (from #2125) never fires for APIDbtRunner (the default runner for dbt >= 1.5.0).

Root cause: APIDbtRunner._inner_run_command only captures JinjaLogInfo and RunningOperationCaughtError events into the output field. When a command fails with a transient error (e.g. Dremio's RemoteDisconnected), the error text lives in res.exception — not in the captured output. The retry logic in _inner_run_command_with_retries checks is_transient_error(adapter_type, output=result.output, stderr=result.stderr), but stderr was always None for APIDbtRunner, so pattern matching never found the transient error string.

Fix: Extract str(res.exception) and pass it as the stderr field of APIDbtCommandResult. The dbt Python API doesn't use stderr, so this repurposes the field analogously to how SubprocessDbtRunner captures subprocess stderr.
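The root cause and fix can be illustrated with a minimal, self-contained sketch. It is not the actual elementary code: `FakeApiResult`, `CommandResult`, `is_transient`, `TRANSIENT_PATTERNS`, and the two `build_result_*` helpers are hypothetical stand-ins for the dbt API result, `APIDbtCommandResult`, and `is_transient_error`, and the pattern list is assumed for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern list; the real per-adapter patterns live in elementary.
TRANSIENT_PATTERNS = ("remotedisconnected",)


@dataclass
class FakeApiResult:
    """Stand-in for the object returned by the dbt Python API."""
    success: bool
    exception: Optional[BaseException] = None


@dataclass
class CommandResult:
    """Stand-in for APIDbtCommandResult (only the fields discussed here)."""
    success: bool
    output: Optional[str]
    stderr: Optional[str]


def is_transient(output: Optional[str], stderr: Optional[str]) -> bool:
    """Simplified substring match over both text fields."""
    haystack = f"{output or ''}\n{stderr or ''}".lower()
    return any(pattern in haystack for pattern in TRANSIENT_PATTERNS)


def build_result_before(res: FakeApiResult, output: str) -> CommandResult:
    # Before the fix: stderr is always None, so the exception text is lost.
    return CommandResult(res.success, output, None)


def build_result_after(res: FakeApiResult, output: str) -> CommandResult:
    # After the fix: the exception text is carried in the stderr field.
    exception_text = str(res.exception) if res.exception else None
    return CommandResult(res.success, output, exception_text)


res = FakeApiResult(
    success=False,
    exception=ConnectionError("RemoteDisconnected('Remote end closed connection')"),
)
before = build_result_before(res, output="")
after = build_result_after(res, output="")
```

In this sketch, `is_transient(before.output, before.stderr)` is false because the error text never reaches the matcher, while `is_transient(after.output, after.stderr)` is true.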

Discovered while investigating this Dremio CI failure.

Review & Testing Checklist for Human

- Verify stderr field consumers: Search for any code that reads result.stderr from an APIDbtCommandResult and confirm that receiving exception text (instead of None) doesn't cause unintended side effects. The stderr field is now non-None on failure for the API runner path.
- Consider a dedicated field: The stderr field is being semantically overloaded: it means "subprocess stderr" for SubprocessDbtRunner and "exception string" for APIDbtRunner. Would a separate exception_text field on DbtCommandResult be cleaner? Current approach works but may confuse future readers.
- Verify raise_on_failure=True path: The change from str(res.exception) if res.exception else output → exception_text or output is semantically equivalent, but the new tests only cover raise_on_failure=False. Confirm the raise path still works as expected (the DbtCommandError.err_msg should be unchanged).
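For the last checklist item, a reviewer could compare the two expressions directly. The sketch below assumes that `exception_text` is computed as `str(res.exception) if res.exception else None`; the helper names `old_err_msg` and `new_err_msg` are hypothetical and exist only for this comparison. Under that assumption the two forms agree whenever the exception has a non-empty message or is absent, and differ only when the exception stringifies to an empty string.

```python
from typing import Optional


def old_err_msg(exception: Optional[BaseException], output: str) -> str:
    # Expression used before the change.
    return str(exception) if exception else output


def new_err_msg(exception: Optional[BaseException], output: str) -> str:
    # Assumed definition of exception_text; not confirmed by the PR text.
    exception_text = str(exception) if exception else None
    # Expression used after the change.
    return exception_text or output


with_message = ValueError("boom")
without_message = Exception()  # str() of this is the empty string

same_for_message = old_err_msg(with_message, "out") == new_err_msg(with_message, "out")
same_for_none = old_err_msg(None, "out") == new_err_msg(None, "out")
old_empty = old_err_msg(without_message, "out")
new_empty = new_err_msg(without_message, "out")
```

In the empty-message case the old expression yields an empty string while the new one falls back to the captured output, which is arguably the more useful behavior but is worth confirming against the real code.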

Test Plan

- Run the new unit tests: pytest tests/unit/clients/dbt_runner/test_retry_logic.py::TestAPIDbtRunnerTransientDetection -v
- Verify CI passes (especially Dremio, which was the original failure case)
- Optionally: manually trigger a transient error with APIDbtRunner (e.g. kill a Dremio connection mid-query) and confirm retry fires

Notes

- Link to Devin run: https://app.devin.ai/sessions/e01c77a2322a476faf5f6162e9390351
- Requested by: @haritamar
- The raise_on_failure=True path was already partially working before this fix (exception text was in DbtCommandError.err_msg), but the raise_on_failure=False path was completely broken for transient detection.

Summary by CodeRabbit

Bug Fixes
- API runner now surfaces exception text in both error messages and stderr, improving detection of transient failures and making failure reports more informative.
Tests
- Added API-level tests validating runner retry behavior: transient errors trigger retries, non-transient errors do not, and retry exhaustion is asserted to ensure correct retry limits.
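
The three behaviors listed above (retry on transient errors, no retry on non-transient errors, and a bounded number of attempts) can be sketched with a toy retry loop. This is an illustration only, not the code from test_retry_logic.py: `Result`, `run_with_retries`, `make_runner`, and `looks_transient` are hypothetical names, and the default of three attempts is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Result:
    """Hypothetical command result with only the fields needed here."""
    success: bool
    stderr: Optional[str] = None


def looks_transient(result: Result) -> bool:
    # Simplified stand-in for the real transient-error pattern matching.
    return "remotedisconnected" in (result.stderr or "").lower()


def run_with_retries(
    run_once: Callable[[], Result],
    is_transient: Callable[[Result], bool],
    max_attempts: int = 3,
) -> Tuple[Result, int]:
    """Run once, then retry only while the failure looks transient."""
    result = run_once()
    attempts = 1
    while not result.success and is_transient(result) and attempts < max_attempts:
        result = run_once()
        attempts += 1
    return result, attempts


def make_runner(results: List[Result]) -> Callable[[], Result]:
    """Return a fake runner that yields the given results in order."""
    iterator = iter(results)
    return lambda: next(iterator)


transient = Result(False, "RemoteDisconnected")
fatal = Result(False, "Compilation Error")
ok = Result(True)

recovered = run_with_retries(make_runner([transient, ok]), looks_transient)
not_retried = run_with_retries(make_runner([fatal, ok]), looks_transient)
exhausted = run_with_retries(make_runner([transient] * 5), looks_transient)
```

Here `recovered` succeeds on the second attempt, `not_retried` stops after one attempt with the original failure, and `exhausted` gives up after three attempts while still failing.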

haritamar merged 5 commits into master from devin/1772301185-fix-api-runner-transient-detection
