
test(e2e): fix agent timeouts by delaying test execution until Phoenix tracing endpoints are fully bound #202

Merged
visahak merged 2 commits into AgentToolkit:main from visahak:fix/e2e-phoenix-startup
Apr 20, 2026


Conversation

@visahak (Collaborator) commented Apr 19, 2026

Context & Issue

The e2e test pipeline was consistently timing out (failing at the 90-second limit), specifically on the smolagents test, which caused cascading delays for developers running the tests locally.

Root Cause

A race condition occurred during startup of the phoenix_server fixture. The tests proceeded as soon as Phoenix's /status web endpoint returned 200 OK, but the underlying OpenTelemetry trace receivers (/v1/traces) were not yet fully bound. Because smolagents is the first agent to run and its script finishes quickly (in about 1.8 seconds), its spans were exported immediately and were rejected with a Connection refused error.
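Why a 200 OK from /status is not enough: it proves only that the route serving /status is up, not that every span receiver is accepting connections. Below is a minimal standard-library sketch of a TCP-level check that tells a bound listener apart from the Connection refused case. `is_port_accepting` is a hypothetical helper, not part of the test suite, and the demo uses a throwaway ephemeral port rather than Phoenix's actual ports.

```python
import socket


def is_port_accepting(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if some TCP listener accepts connections on host:port.

    Hypothetical helper for illustration. A ConnectionRefusedError (the error
    the early span exports hit) means nothing is bound to the port yet.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, connect timeouts, unreachable hosts.
        return False


if __name__ == "__main__":
    # Bind a throwaway listener on an ephemeral port to show both outcomes.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen()
    port = server.getsockname()[1]
    print(is_port_accepting("127.0.0.1", port))  # True: listener is bound
    server.close()
    print(is_port_accepting("127.0.0.1", port))  # False: nothing listening
```

A TCP connect only shows that something is listening on the port; it cannot tell whether a specific HTTP route such as /v1/traces has been registered, so it is a coarser signal than probing the route itself.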

In response to these refused exports, the OpenTelemetry exporter enters an asynchronous retry loop that keeps the script from shutting down normally, so it hangs until the 90-second test timeout kills it. (Subsequent tests succeed trivially because Phoenix has had roughly 90 extra seconds to bind its endpoints while smolagents was stuck.)
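A back-of-the-envelope sketch of how a retrying exporter can outlast a fixed test timeout. It models a generic doubling backoff between failed export attempts; the 1-second starting delay and 64-second cap are illustrative assumptions, not values taken from the OpenTelemetry SDK, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(initial: float, max_delay: float) -> list[float]:
    """Waits for a doubling backoff that stops once the next wait would exceed max_delay.

    Generic model for illustration only; not the OpenTelemetry exporter's code.
    """
    delays = []
    delay = initial
    while delay <= max_delay:
        delays.append(delay)
        delay *= 2  # each failed attempt doubles the next wait
    return delays


if __name__ == "__main__":
    delays = backoff_delays(initial=1, max_delay=64)
    print(delays)       # [1, 2, 4, 8, 16, 32, 64]
    print(sum(delays))  # 127
```

Under those assumed values the waits alone add up to 127 seconds, already past a 90-second limit, which is consistent with the hang described above. The real exporter's schedule and overall timeout may differ by version and configuration.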

Solution

Added a simple time.sleep(5) immediately after /status returns 200 OK in tests/e2e/test_e2e_pipeline.py. This gives Phoenix time to bind and expose its full set of gRPC/HTTP span endpoints before smolagents starts exporting traces. Verified that all 12 pipeline tests now complete without timing out.
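A minimal sketch of the startup sequence described above: poll /status and, once it answers, sleep a fixed 5 seconds before letting the tests run. The function name `wait_for_phoenix`, the retry count, and the 1-second poll interval are assumptions for illustration; the actual fixture may be structured differently. `opener` and `sleep` are injectable so the logic can be exercised without a running Phoenix server.

```python
import time
import urllib.request
from typing import Callable

# Hypothetical sketch of the fixture's startup wait; names and retry
# values are assumptions, not copied from tests/e2e/test_e2e_pipeline.py.


def wait_for_phoenix(
    status_url: str = "http://localhost:6006/status",
    max_retries: int = 30,
    settle_seconds: float = 5.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll status_url until it answers, then pause so trace receivers can bind.

    Returns the attempt number that succeeded; raises RuntimeError otherwise.
    """
    for attempt in range(1, max_retries + 1):
        try:
            # Close the response right away; only reachability matters here.
            with opener(status_url, timeout=2):
                pass
        except OSError:
            # URLError and ConnectionRefusedError are both OSError subclasses.
            sleep(1)
            continue
        # /status is up, but OTLP span receivers may still be binding,
        # so wait a fixed settle period (the 5-second fix from this PR).
        sleep(settle_seconds)
        return attempt
    raise RuntimeError(f"{status_url} did not respond after {max_retries} attempts")
```

A fixed settle delay is simple, but as the CodeRabbit review on this PR notes, it can be too short on slow CI runners and wasteful on fast ones; polling the trace endpoint itself is the alternative that review proposes.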

Summary by CodeRabbit

  • Tests
    • Improved end-to-end test reliability with enhanced server initialization timing to ensure consistent test execution during local development.

coderabbitai Bot commented Apr 19, 2026

Warning

Rate limit exceeded

@visahak has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 48 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dccbca06-0dde-4b5c-bf35-7eab14d15dad

📥 Commits

Reviewing files that changed from the base of the PR and between 7f910c1 and e018a4e.

📒 Files selected for processing (1)
  • tests/e2e/test_e2e_pipeline.py
📝 Walkthrough


A 5-second delay was added to the e2e test pipeline immediately after the Phoenix /status endpoint becomes reachable during local server startup, providing additional initialization time before subsequent operations begin.

Changes

  • E2E Test Timing (tests/e2e/test_e2e_pipeline.py): Added a fixed 5-second delay after the Phoenix /status endpoint becomes reachable to ensure the server is fully initialized before test execution proceeds.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

Suggested reviewers

  • gaodan-fang
  • jayaramkr
  • vinodmut

Poem

🐰 A heartbeat, then five more seconds fly,
Phoenix spreads its wings up to the sky,
Patience whispers soft through the delay,
Ready now to test the hopeful way! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically summarizes the main change: adding a delay to fix e2e test timeouts caused by Phoenix endpoints not being fully bound. It directly addresses the primary issue resolved in the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/test_e2e_pipeline.py (1)

28-56: ⚠️ Potential issue | 🟠 Major

Probe OTLP readiness in both Phoenix paths to ensure trace receivers are bound before tests run.

The pre-existing server path (line 30) yields immediately after /status succeeds, while the locally started path (line 56) sleeps 5 seconds. Both paths have the same race: /status responds before the OTLP trace receiver endpoint is fully bound. Additionally, 5 seconds is arbitrary and can be insufficient on slow CI runners or wasteful on fast ones.

Poll the actual OTLP HTTP endpoint (/v1/traces) with a timeout-bounded retry loop in both paths. The documentation confirms the endpoint is http://localhost:6006/v1/traces.

Example fix
+def _wait_for_phoenix_otlp_ready(base_url: str, timeout: float = 15.0) -> None:
+    deadline = time.monotonic() + timeout
+    last_error = None
+
+    while time.monotonic() < deadline:
+        try:
+            request = urllib.request.Request(
+                f"{base_url}/v1/traces",
+                data=b'{"resourceSpans":[]}',
+                headers={"Content-Type": "application/json"},
+                method="POST",
+            )
+            with urllib.request.urlopen(request, timeout=2):
+                return
+        except urllib.error.HTTPError as exc:
+            if exc.code in {400, 405, 415}:
+                return
+            last_error = exc
+        except OSError as exc:
+            # URLError, ConnectionError, and read timeouts (TimeoutError) all subclass OSError
+            last_error = exc
+
+        time.sleep(0.25)
+
+    pytest.fail(f"Phoenix OTLP receiver was not ready within {timeout}s: {last_error}")
+
+
 @pytest.fixture(scope="session", autouse=True)
 def phoenix_server():
@@
         urllib.request.urlopen("http://localhost:6006/status", timeout=2)
         print("\nPhoenix is already running on port 6006.")
+        _wait_for_phoenix_otlp_ready("http://localhost:6006")
         yield "http://localhost:6006"
         return
@@
             urllib.request.urlopen("http://localhost:6006/status", timeout=2)
             print("Phoenix server is up and running.")
-            time.sleep(5)  # Wait for OTLP receivers to fully spin up
+            _wait_for_phoenix_otlp_ready("http://localhost:6006")
             break
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/test_e2e_pipeline.py` around lines 28 - 56, The /status check can
return before the OTLP trace receiver is bound; update both code paths that call
urllib.request.urlopen("http://localhost:6006/status", ...) (the
pre-existing-server path that yields immediately and the locally-started path
that currently does time.sleep(5) after confirming /status) to instead poll the
OTLP HTTP endpoint "http://localhost:6006/v1/traces" with a bounded retry loop
and short per-request timeout, failing the test setup if the trace endpoint does
not become responsive within the retry window; keep the existing
max_retries/timeout semantics but replace the unconditional yield/sleep with the
polling logic so both the pre-existing server and the proc-started server wait
for OTLP readiness before proceeding.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e65c71e-d2b9-473b-a07c-51a14ea0d1e7

📥 Commits

Reviewing files that changed from the base of the PR and between 034de13 and 7f910c1.

📒 Files selected for processing (1)
  • tests/e2e/test_e2e_pipeline.py

@visahak visahak force-pushed the fix/e2e-phoenix-startup branch from 7f910c1 to fcd8127 on April 19, 2026 23:39
@gaodan-fang (Collaborator)

is the e2e pipeline run in the CI?

@visahak visahak merged commit 928da40 into AgentToolkit:main Apr 20, 2026
16 checks passed

Labels

None yet

Projects

None yet


3 participants