isolating and improving timeout tests #11839

Merged
brettsam merged 1 commit into dev from brettsam/fix-timeout-token-flake on Jun 19, 2026

@brettsam brettsam commented Jun 18, 2026

Member

The TimeoutTest_UsingToken_CSharp test would sometimes fail, seemingly whenever the CI machine was reporting health issues (low memory). Analysis suggests that the tight while loop this test used may have caused the failure under those conditions by consuming a thread while waiting for the test scenario to run.

Changing the behavior in the test function to be more CPU-friendly by reacting directly to the token rather than spinning. Also swapping the other test scenario to use a Task rather than blocking a thread.

Not 100% guaranteed that this will fix the issue, but it seems like the right direction.

It's entirely possible that the low-memory conditions are caused by other tests leaking somewhere, but that's analysis I'll be doing elsewhere.

Edited:
It turns out that the 3 timeout tests were leaking a massive amount of memory. During some runs they'd use upwards of 2.4 GB. Dump analysis showed tens of thousands of CancellationToken registrations that appeared to come from configuration somewhere.

The one big oddity with these tests was that they wired up both the production configuration and the test configurations. It's possible that something in there was causing cyclical change token registrations, leading to chaos.

In reality, these tests are testing in-proc cancellation and likely don't even impact anything we do today, which is strictly out-of-proc. So rather than chase it down to the exact culprit, I:

  • removed the production registration from the tests (memory usage plummeted)
  • kept the optimization from before with the test method itself to avoid spinning
  • separated the timeout tests into their own process

I've run this several times, both with and without memory diagnostics running, and it seems stable now.

Pull request checklist

IMPORTANT: Currently, changes must be backported to the in-proc branch to be included in Core Tools and non-Flex deployments.

  • Backporting to the in-proc branch is not required
    • Otherwise: Link to backporting PR
  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • My changes do not require diagnostic events changes
    • Otherwise: I have added/updated all related diagnostic events and their documentation (Documentation issue linked to PR)
  • I have added all required tests (Unit tests, E2E tests)

@brettsam brettsam requested a review from a team as a code owner June 18, 2026 17:12
Copilot AI review requested due to automatic review settings June 18, 2026 17:12

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the C# TimeoutToken test script to avoid high CPU usage and thread blocking during timeout scenarios, improving CI stability under resource-constrained conditions.

Changes:

  • Replaced the tight spin-wait loop with an infinite Task.Delay(..., token) that completes immediately on cancellation.
  • Replaced Thread.Sleep with await Task.Delay in the non-token scenario to avoid blocking a thread.
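
The two changes listed above might look roughly like the sketch below. This is an illustration only, not the actual test script from this PR: the class name `TimeoutSketch`, the method names, and the 200 ms and 50 ms durations are all hypothetical. It assumes the original code polled `token.IsCancellationRequested` in a loop and called `Thread.Sleep` in the non-token scenario.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch; names and durations are illustrative, not from the PR.
public static class TimeoutSketch
{
    // Assumed "before" shape: occupies a thread and burns CPU until cancelled.
    public static void SpinUntilCancelled(CancellationToken token)
    {
        while (!token.IsCancellationRequested) { }
    }

    // "After" shape for the token scenario: an infinite delay that ends only
    // when the token is cancelled. No thread is held while waiting.
    public static async Task<bool> WaitForCancellationAsync(CancellationToken token)
    {
        try
        {
            await Task.Delay(Timeout.Infinite, token);
        }
        catch (OperationCanceledException)
        {
            // Task.Delay surfaces cancellation as TaskCanceledException,
            // which derives from OperationCanceledException.
            return true;
        }
        return false; // an infinite delay should never complete normally
    }

    // "After" shape for the non-token scenario: await a delay instead of
    // blocking the thread with Thread.Sleep.
    public static async Task DelayWithoutBlockingAsync(TimeSpan duration)
    {
        await Task.Delay(duration);
    }

    public static async Task Main()
    {
        // Cancel automatically after 200 ms to simulate a timeout.
        using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
        bool cancelled = await WaitForCancellationAsync(cts.Token);
        Console.WriteLine($"Cancelled: {cancelled}");

        await DelayWithoutBlockingAsync(TimeSpan.FromMilliseconds(50));
    }
}
```

Because both waits are awaited rather than blocking, they hold no thread while pending, which is the property the PR is after on a resource-constrained CI machine.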

@brettsam

Member Author

ok that clearly didn't work -- adding diagnostics to see if we can catch it and determine where mem is leaking. i'll re-ping when it's ready for another review

@brettsam

Member Author

/azp run host.integration-tests

@azure-pipelines

Azure Pipelines failed to run 1 pipeline(s).

@brettsam

Member Author

/azp run host.integration-tests

@azure-pipelines

Azure Pipelines failed to run 1 pipeline(s).

@brettsam

Member Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s), but failed to run 1 pipeline(s).

@brettsam

Member Author

/azp run host.integration-tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@brettsam

Member Author

/azp run host.integration-tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@brettsam brettsam force-pushed the brettsam/fix-timeout-token-flake branch from 510ed9b to 651dbc3 June 19, 2026 15:28
@brettsam brettsam changed the title from "removing high-cpu issues in test app" to "isolating and improving timeout tests" Jun 19, 2026
@brettsam brettsam merged commit 285dd58 into dev Jun 19, 2026
11 checks passed
@brettsam brettsam deleted the brettsam/fix-timeout-token-flake branch June 19, 2026 16:50