test(e2e): fix agent timeouts by delaying test execution until Phoenix tracing endpoints are fully bound by visahak · Pull Request #202 · AgentToolkit/altk-evolve

visahak · 2026-04-19T23:32:13Z

Context & Issue

The e2e test pipeline was consistently timing out at the 90-second limit on the smolagents test, which caused cascading delays for developers running the tests locally.

Root Cause

A race condition occurred during the startup of the phoenix_server fixture. The tests aggressively proceeded the moment the Phoenix /status web endpoint reported a 200 OK. However, the underlying OpenTelemetry tracing receivers (/v1/traces) weren't fully bound yet. Because smolagents is the first agent to run, its rapid script completion (in around 1.8 seconds) resulted in immediate span exports that bounced with a Connection refused error.

In response, OpenTelemetry entered an aggressive asynchronous retry loop that blocked the thread from shutting down on its own, so the script hung until the overarching 90-second test suite limit killed it. (Subsequent tests trivially succeeded because Phoenix had 90 extra seconds to finish exposing its endpoints while smolagents was frozen.)
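The PR does not show how the 90-second limit is enforced. As a minimal sketch, assuming the agent script is launched as a child process with a timeout, the hang-then-kill behavior described above looks roughly like this. The helper name `run_with_deadline` is hypothetical and does not come from the repository.

```python
import subprocess
import sys


def run_with_deadline(argv, timeout_seconds):
    """Run argv as a child process; return (finished, returncode).

    Hypothetical helper: illustrates a hard deadline around a script
    that may hang (for example, while an exporter keeps retrying).
    """
    try:
        completed = subprocess.run(
            argv, timeout=timeout_seconds, capture_output=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so the caller only sees the failure once the full timeout has elapsed.
        return False, None
    return True, completed.returncode


if __name__ == "__main__":
    # A stand-in for a script that never exits by itself within the limit.
    hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_deadline(hanging, timeout_seconds=0.5))
```

Under this assumption, a script whose real work takes about 1.8 seconds still costs the whole timeout budget whenever shutdown blocks, which matches the symptom reported here.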

Solution

Added a simple time.sleep(5) immediately after detecting 200 OK on /status inside tests/e2e/test_e2e_pipeline.py. This provides adequate breathing room for Phoenix to establish and expose its full set of gRPC/HTTP span endpoints prior to smolagents beginning its trace injection. Validated fully that all 12 pipeline tests now complete efficiently.
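The change described above can be sketched as a polling loop with a fixed settle delay. This is an illustration under stated assumptions, not the fixture's actual code: the function name `wait_for_status` and its parameters are hypothetical, and only the `/status` check and the 5-second pause come from the PR description.

```python
import time
import urllib.error
import urllib.request


def wait_for_status(status_url, max_retries=30, retry_delay=1.0, settle_seconds=5.0):
    """Poll status_url until it answers, then pause for settle_seconds.

    Returns True once the endpoint has responded and the settle delay has
    elapsed, or False if every attempt failed. All names are hypothetical.
    """
    for _ in range(max_retries):
        try:
            # urlopen raises on connection errors and on non-2xx responses.
            with urllib.request.urlopen(status_url, timeout=2):
                pass
        except (urllib.error.URLError, OSError):
            time.sleep(retry_delay)
            continue
        # Fixed grace period so other receivers can finish starting up.
        time.sleep(settle_seconds)
        return True
    return False


if __name__ == "__main__":
    ready = wait_for_status("http://localhost:6006/status")
    print("Phoenix ready:", ready)
```

A fixed delay is simple, but it is a guess about startup time rather than a check of the trace endpoint itself, which is the trade-off the review comment below raises.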

Summary by CodeRabbit

Tests
- Improved end-to-end test reliability with enhanced server initialization timing to ensure consistent test execution during local development.

coderabbitai · 2026-04-19T23:32:27Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dccbca06-0dde-4b5c-bf35-7eab14d15dad

📥 Commits

Reviewing files that changed from the base of the PR and between 7f910c1 and e018a4e.

📒 Files selected for processing (1)

tests/e2e/test_e2e_pipeline.py

📝 Walkthrough

A 5-second delay was added to the e2e test pipeline immediately after the Phoenix /status endpoint becomes reachable during local server startup, providing additional initialization time before subsequent operations begin.

Changes

| Cohort / File(s) | Summary |
|---|---|
| E2E Test Timing `tests/e2e/test_e2e_pipeline.py` | Added fixed 5-second delay after Phoenix `/status` endpoint becomes reachable to ensure server is fully initialized before proceeding with test execution. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

test: restructure test suite into four distinct execution tiers #128: Introduced the phoenix_server fixture and startup/polling behavior that this PR extends with additional initialization delay.

Suggested reviewers

gaodan-fang
jayaramkr
vinodmut

Poem

🐰 A heartbeat, then five more seconds fly,
Phoenix spreads its wings up to the sky,
Patience whispers soft through the delay,
Ready now to test the hopeful way! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding a delay to fix e2e test timeouts caused by Phoenix endpoints not being fully bound. It directly addresses the primary issue resolved in the PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/e2e/test_e2e_pipeline.py (1)

28-56: ⚠️ Potential issue | 🟠 Major

Probe OTLP readiness in both Phoenix paths to ensure trace receivers are bound before tests run.

The pre-existing server path (line 30) yields immediately after /status succeeds, while the locally started path (line 56) sleeps 5 seconds. Both paths have the same race: /status responds before the OTLP trace receiver endpoint is fully bound. Additionally, 5 seconds is arbitrary and can be insufficient on slow CI runners or wasteful on fast ones.

Poll the actual OTLP HTTP endpoint (/v1/traces) with a timeout-bounded retry loop in both paths. The documentation confirms the endpoint is http://localhost:6006/v1/traces.

Example fix

+def _wait_for_phoenix_otlp_ready(base_url: str, timeout: float = 15.0) -> None:
+    deadline = time.monotonic() + timeout
+    last_error = None
+
+    while time.monotonic() < deadline:
+        try:
+            request = urllib.request.Request(
+                f"{base_url}/v1/traces",
+                data=b'{"resourceSpans":[]}',
+                headers={"Content-Type": "application/json"},
+                method="POST",
+            )
+            with urllib.request.urlopen(request, timeout=2):
+                return
+        except urllib.error.HTTPError as exc:
+            if exc.code in {400, 405, 415}:
+                return
+            last_error = exc
+        except (urllib.error.URLError, ConnectionError) as exc:
+            last_error = exc
+
+        time.sleep(0.25)
+
+    pytest.fail(f"Phoenix OTLP receiver was not ready within {timeout}s: {last_error}")
+
+
 @pytest.fixture(scope="session", autouse=True)
 def phoenix_server():
@@
         urllib.request.urlopen("http://localhost:6006/status", timeout=2)
         print("\nPhoenix is already running on port 6006.")
+        _wait_for_phoenix_otlp_ready("http://localhost:6006")
         yield "http://localhost:6006"
         return
@@
             urllib.request.urlopen("http://localhost:6006/status", timeout=2)
             print("Phoenix server is up and running.")
-            time.sleep(5)  # Wait for OTLP receivers to fully spin up
+            _wait_for_phoenix_otlp_ready("http://localhost:6006")
             break

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/test_e2e_pipeline.py` around lines 28 - 56, The /status check can
return before the OTLP trace receiver is bound; update both code paths that call
urllib.request.urlopen("http://localhost:6006/status", ...) (the
pre-existing-server path that yields immediately and the locally-started path
that currently does time.sleep(5) after confirming /status) to instead poll the
OTLP HTTP endpoint "http://localhost:6006/v1/traces" with a bounded retry loop
and short per-request timeout, failing the test setup if the trace endpoint does
not become responsive within the retry window; keep the existing
max_retries/timeout semantics but replace the unconditional yield/sleep with the
polling logic so both the pre-existing server and the proc-started server wait
for OTLP readiness before proceeding.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e65c71e-d2b9-473b-a07c-51a14ea0d1e7

📥 Commits

Reviewing files that changed from the base of the PR and between 034de13 and 7f910c1.

📒 Files selected for processing (1)

tests/e2e/test_e2e_pipeline.py

gaodan-fang · 2026-04-20T18:56:31Z

is the e2e pipeline run in the CI?

coderabbitai Bot reviewed Apr 19, 2026

test(e2e): add startup delay to phoenix tracing fixture to fix timeouts

fcd8127

visahak force-pushed the fix/e2e-phoenix-startup branch from 7f910c1 to fcd8127 Compare April 19, 2026 23:39

visahak requested review from gaodan-fang, illeatmyhat and vinodmut April 20, 2026 18:26

Merge branch 'main' into fix/e2e-phoenix-startup

e018a4e

vinodmut approved these changes Apr 20, 2026

visahak merged commit 928da40 into AgentToolkit:main Apr 20, 2026
16 checks passed

visahak merged 2 commits into AgentToolkit:main from visahak:fix/e2e-phoenix-startup