Make scheduled outerloop builds succeed when only Helix tests fail by mmitche · Pull Request #129049 · dotnet/runtime

mmitche · 2026-06-05T17:20:42Z

Note

This pull request was authored with the assistance of GitHub Copilot.

Problem

Several scheduled outerloop pipelines (the outerloop.yml family: runtime-libraries-coreclr outerloop and its -windows/-linux/-osx variants) use an always: false scheduled trigger. With always: false, AzDO only starts a new scheduled run if the source changed since the last successful scheduled run.
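For readers less familiar with the setting, a scheduled trigger with `always: false` looks roughly like the sketch below. The cron expression, display name, and branch filter are illustrative placeholders and are not copied from outerloop.yml.

```yaml
# Illustrative scheduled trigger. The cron expression, displayName, and
# branch filter are placeholders, not the values used in outerloop.yml.
schedules:
- cron: "0 11 * * *"            # once a day, in UTC
  displayName: Daily outerloop run
  branches:
    include:
    - release/8.0
  # Only start a run if the source changed since the last successful
  # scheduled run. This is the setting the problem hinges on.
  always: false
```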

Because the repo has many flaky outerloop tests, the Helix test work items virtually always have at least one failure, which fails the "Send to Helix" step and therefore the whole build. The build never reaches a succeeded state, so AzDO re-queues the same, unchanged commit day after day, submitting more and more Helix work for no benefit. (Empirically confirmed: a single commit was re-run and failed for 19 consecutive days; once a sibling definition produced a genuinely successful run, the same-SHA re-queue stopped.)

Why `continueOnError` is not enough

continueOnError: true only downgrades the build to partiallySucceeded, which AzDO's always: false scheduler still does not treat as successful — so the same commit keeps getting re-queued. The Helix step must end fully successful (exit 0).
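A minimal sketch of the behavior described above, using a deliberately failing placeholder step rather than the real Send to Helix step:

```yaml
# Placeholder step that always fails, to illustrate continueOnError.
steps:
- script: exit 1
  displayName: Step that fails
  # The pipeline keeps going, but the task ends as "succeeded with issues",
  # so the overall build result is partiallySucceeded, not succeeded.
  continueOnError: true
```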

Fix

Make the "Send to Helix" step actually succeed on scheduled runs by disabling the two Arcade Microsoft.DotNet.Helix.Sdk properties that fail the build (both default to true):

FailOnWorkItemFailure — CheckHelixJobStatus errors when a work item exits non-zero.
FailOnTestFailure — CheckAzurePipelinesTestResults errors when any published test failed.

Setting both to false lets the msbuild step exit 0, producing a fully succeeded build. Failed tests are still published and visible in the test results tab; AzDO does not auto-degrade a build to partiallySucceeded just because a published test run contains failures — only a failing task would.

Changes

eng/pipelines/libraries/helix.yml: Added a failOnTestFailures parameter (default true, preserving today's behavior) wired to /p:FailOnWorkItemFailure and /p:FailOnTestFailure on the Send to Helix msbuild invocation.
eng/pipelines/libraries/outerloop.yml: Passes failOnTestFailures: false only on scheduled runs (Build.Reason == 'Schedule') for all three matrix legs (Release, Debug, NET48).
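The two changes could be sketched roughly as follows. This is a trimmed-down illustration, not the actual template content: the `dotnet msbuild` command line and the project path are stand-ins for whatever the real Send to Helix step invokes, and the direct `template:` call is a simplification of how outerloop.yml reaches helix.yml.

```yaml
# Hypothetical, trimmed-down shape of eng/pipelines/libraries/helix.yml.
parameters:
  failOnTestFailures: true      # default keeps the existing behavior

steps:
# The command and project path below are assumptions for illustration only.
- script: >-
    dotnet msbuild src/libraries/sendtohelix.proj
    /p:FailOnWorkItemFailure=${{ parameters.failOnTestFailures }}
    /p:FailOnTestFailure=${{ parameters.failOnTestFailures }}
  displayName: Send to Helix
```

```yaml
# Hypothetical caller fragment: false only when the run was started by a schedule.
steps:
- template: /eng/pipelines/libraries/helix.yml
  parameters:
    failOnTestFailures: ${{ ne(variables['Build.Reason'], 'Schedule') }}
```

Because the parameter defaults to `true`, any caller that does not pass it keeps failing on Helix failures as before.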

Behavior preservation

The new parameter defaults to true, so all other helix.yml callers are unaffected (none set WaitForWorkItemCompletion or these properties on this path, so they already resolve to true). Only scheduled outerloop runs change behavior. PR / rolling / manual outerloop runs continue to fail on Helix failures exactly as before. Build/compile breaks still fail scheduled runs (this only affects the Helix step).

Tradeoff

On scheduled runs, FailOnWorkItemFailure=false also masks work-item crashes/timeouts/infra failures, not just test-assertion failures. This is an accepted tradeoff for the goal of stopping the wasteful daily re-queue of unchanged commits; results remain visible in the Helix/test reporting.

The libraries outerloop pipeline runs on a daily schedule with always:false, meaning AzDO only re-queues a commit if there were changes since the last successful scheduled run. Because flaky outerloop tests cause the 'Send to Helix' task to fail on essentially every scheduled run, the build never succeeds, so AzDO re-queues the same commit every day and submits ever more Helix work for an unchanged sha.

Set shouldContinueOnError on the Send to Helix step for scheduled builds only (Build.Reason == 'Schedule'), so Helix work item failures no longer fail the build. Compile/build breaks still fail the build, and PR/CI/manual runs are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dotnet-policy-service · 2026-06-05T17:22:07Z

Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR updates the libraries outerloop Azure DevOps pipeline to avoid failing scheduled runs due to Helix work item/test failures, with the intent of preventing always: false schedules from repeatedly re-queuing the same commit and submitting duplicate Helix work.

Changes:

Pass shouldContinueOnError: ${{ eq(variables['Build.Reason'], 'Schedule') }} into the three platform-matrix.yml invocations in outerloop.yml.
Add inline YAML comments explaining the rationale (avoid same-SHA daily re-queues and wasted Helix capacity).

mmitche · 2026-06-05T17:38:29Z

Bleh, it's right. partiallySucceeded won't cause AzDO to avoid scheduling.

continueOnError only marks the build partiallySucceeded, which AzDO's always:false scheduler still treats as not-successful, so the same commit keeps getting re-queued daily.

Instead, for scheduled builds, tell the Helix SDK not to fail the build on work item / test failures by passing FailOnWorkItemFailure=false and FailOnTestFailure=false. The Send to Helix step then fully succeeds, so a perpetually-flaky scheduled run no longer causes AzDO to re-queue the same sha.

- helix.yml: add failOnTestFailures parameter (default true = current behavior) wired to the FailOnWorkItemFailure/FailOnTestFailure Helix SDK properties.
- outerloop.yml: pass failOnTestFailures=false only for scheduled builds (Build.Reason == 'Schedule'); replaces the earlier shouldContinueOnError approach.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mmitche · 2026-06-05T20:56:13Z

If this looks reasonable we should backport to 9.0 and 10.0 for outerloop.

lewing · 2026-06-06T01:11:31Z

/azp list

azure-pipelines · 2026-06-06T01:11:37Z

CI/CD Pipelines for this repository: runtime-coreclr outerloop runtime-coreclr jitstress runtime-coreclr jitstressregs runtime-coreclr jitstress2-jitstressregs runtime-coreclr gcstress0x3-gcstress0xc runtime-coreclr gcstress-extra runtime-coreclr r2r-extra runtime-coreclr jitstress-isas-x86 runtime-coreclr jitstress-isas-arm runtime-coreclr jitstressregs-x86 runtime-coreclr libraries-jitstressregs runtime-coreclr libraries-jitstress2-jitstressregs runtime-coreclr r2r runtime-coreclr runincontext runtime-coreclr crossgen2 runtime-libraries-coreclr outerloop runtime-libraries-coreclr outerloop-windows runtime-libraries-coreclr outerloop-linux runtime-libraries-coreclr outerloop-osx runtime runtime-libraries enterprise-linux runtime-libraries stress-http runtime-libraries stress-ssl runtime-dev-innerloop runtime-coreclr crossgen2 outerloop coreclr-release-outerloop-nightly runtime-coreclr crossgen2-composite runtime-jit-experimental runtime-coreclr libraries-jitstress dotnet-linker-tests runtime-coreclr ilasm runtime-coreclr crossgen2-composite gcstress runtime-coreclr pgo runtime-coreclr libraries-pgo Antigen runtime-community Fuzzlyn runtime-coreclr superpmi-replay runtime-wasm runtime-coreclr superpmi-diffs runtime-coreclr superpmi-asmdiffs-checked-release runtime-extra-platforms jit-cfg runtime-wasm-perf runtime-llvm runtime-coreclr jitstress-random runtime-coreclr libraries-jitstress-random runtime-wasm-non-libtests runtime-android runtime-androidemulator runtime-ioslike runtime-ioslikesimulator runtime-linuxbionic runtime-maccatalyst runtime-coreclr pgostress runtime-coreclr jitstress-isas-avx512 runtime-libraries-mono outerloop runtime-sanitized runtime-wasm-dbgtests runtime-wasm-optional runtime-nativeaot-outerloop runtime-coreclr superpmi-collect-test runtime-diagnostics runtime-interpreter runtime-report-green runtime-coreclr hardware-intrinsics runtime-libraries-interpreter runtime-libraries stress-http-pr runtime-libraries stress-ssl-pr 
hardware-intrinsics-arm64

mmitche · 2026-06-17T15:41:31Z

@lewing any concerns here? See https://dev.azure.com/dnceng-public/public/_build/results?buildId=1451767&view=results for a test run (conditional changed to "manual" to verify the functionality)

lewing

I'm fine with it. @steveisok @jeffschwMSFT for visibility

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

mmitche · 2026-06-17T18:56:10Z

Will watch to see whether outerloop runs on 8.0 stop happening when there are no changes. If so, will backport to 9.0 and 10.0

Make scheduled outerloop Helix step succeed instead of continueOnError

PR dotnet#129049 set FailOnWorkItemFailure/FailOnTestFailure to false on scheduled outerloop runs so the Send to Helix step succeeds (avoiding always:false re-queue of the same commit). That hid work item failures entirely.

Add a WarnOnHelixTestFailure property that emits a build warning for each failed Helix work item, keeping them visible in the AzDO timeline without failing the build (the Helix step already disables warnaserror).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…9629)

## Problem

PR #129049 made scheduled outerloop builds succeed when only Helix tests fail, by setting `FailOnWorkItemFailure`/`FailOnTestFailure` to `false` on scheduled runs (via the `failOnTestFailures: false` parameter). This stopped AzDO's `always: false` scheduler from re-queueing the same commit day after day.

The side effect: failed Helix work items became **completely invisible** in the Azure DevOps timeline. The `Send to Helix` step is fully green, so there is no signal that work items failed (even though, for flaky outerloop, they almost always do).

## Fix

Surface failed work items as **warnings** instead of silently dropping them. Warnings keep the failures visible in the timeline but do **not** degrade the build below `succeeded` (so the `always: false` re-queue fix from #129049 is preserved).

- **`src/libraries/sendtohelixhelp.proj`**: new `WarnOnHelixWorkItemFailure` target (`AfterTargets=CheckHelixJobStatus`) that emits a `<Warning>` for each failed `@(CompletedWorkItem)` when `WarnOnHelixTestFailure=true`. This mirrors what the Arcade SDK's `CheckHelixJobStatus` would have *errored* on, but as a warning.
- **`eng/pipelines/libraries/helix.yml`**: new `warnOnTestFailures` parameter (default `false`) wired to `/p:WarnOnHelixTestFailure`.
- **`eng/pipelines/libraries/outerloop.yml`**: scheduled runs now set `warnOnTestFailures: true` alongside `failOnTestFailures: false` on all three legs.

No warn-as-error change was needed: the `Send to Helix` step already runs with warnaserror disabled (`_warnAsErrorParamHelixOverride`), so these warnings are not promoted back into build-failing errors.

## Validation

Ran the `runtime-libraries-coreclr outerloop` pipeline (dnceng-public def 125, [build 1472840](https://dev.azure.com/dnceng-public/public/_build/results?buildId=1472840)) with a temporary Manual gate. Multiple CoreCLR_Release legs completed **succeeded** with failed work items surfaced as warnings and **zero errors**, e.g.:

```
src/libraries/sendtohelixhelp.proj(364,5): warning : Work item System.Runtime.Numerics.Tests in job 2e01f1b1-... has failed. Failure log: https://helix.dot.net/api/.../console
```

Legs whose work items all passed produced no such warning, as expected.

> [!NOTE]
> This pull request was authored with the assistance of GitHub Copilot.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
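As a rough illustration of how the follow-up's two parameters might combine on a scheduled run, here is a caller-side sketch. The direct `template:` call is a simplification and an assumption; only the parameter names and the `Build.Reason == 'Schedule'` condition come from the descriptions above.

```yaml
# Hypothetical caller fragment. On scheduled runs, do not fail on Helix
# failures but surface them as warnings; on every other run, keep failing.
steps:
- template: /eng/pipelines/libraries/helix.yml
  parameters:
    failOnTestFailures: ${{ ne(variables['Build.Reason'], 'Schedule') }}
    warnOnTestFailures: ${{ eq(variables['Build.Reason'], 'Schedule') }}
```

On the template side, `warnOnTestFailures` would be forwarded as `/p:WarnOnHelixTestFailure`, as the description of the follow-up states.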

Copilot AI review requested due to automatic review settings June 5, 2026 17:20

Copilot started reviewing on behalf of mmitche June 5, 2026 17:20

github-actions Bot added the area-Infrastructure-libraries label Jun 5, 2026

dotnet-policy-service Bot assigned mmitche Jun 5, 2026

Copilot AI reviewed Jun 5, 2026

Comment thread eng/pipelines/libraries/outerloop.yml Outdated

mmitche changed the title ~~Don't fail scheduled outerloop builds on Helix work item failures~~ Make scheduled outerloop builds succeed when only Helix tests fail Jun 5, 2026

TEMP: also apply failOnTestFailures on Manual runs (validation only, …

b36307c

…will revert) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 5, 2026 18:16

Copilot started reviewing on behalf of mmitche June 5, 2026 18:16

Copilot AI reviewed Jun 5, 2026

Revert temporary Manual-gate validation change

522f1e2

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mmitche requested review from akoeplinger and lewing June 5, 2026 20:55

lewing requested a review from kotlarmilos June 17, 2026 16:19

lewing approved these changes Jun 17, 2026

lewing requested a review from Copilot June 17, 2026 16:39

Copilot started reviewing on behalf of lewing June 17, 2026 16:39

Copilot AI reviewed Jun 17, 2026

mmitche merged commit d5fbb45 into dotnet:release/8.0 Jun 17, 2026
91 of 97 checks passed

mmitche mentioned this pull request Jun 19, 2026

Surface scheduled outerloop Helix work item failures as warnings #129629

Merged

This was referenced Jun 26, 2026

[release/9.0] Surface scheduled outerloop Helix work item failures (backport of #129049, #129629) #129908

Open

[release/10.0] Surface scheduled outerloop Helix work item failures (backport of #129049, #129629) #129909

Open

mmitche merged 4 commits into dotnet:release/8.0 from mmitche:dev/scheduled-outerloop-helix-continueonerror
