[S-TIR][MetaSchedule] Make evolutionary search resilient to trace replay failures by cchung100m · Pull Request #19438 · apache/tvm

cchung100m · 2026-04-24T11:44:18Z

Hi Committers,

This PR attempts to fix issue #17934. Any suggestions would be appreciated.

Root Cause

During EvolutionarySearch candidate generation, trace->ApplyToSchedule(...) could throw a ScheduleError.
The exception propagated through the parallel execution and aborted the whole tuning run.
Error handling was inconsistent between the measured and unmeasured paths, and replay failures were hard to see in the logs.
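
To illustrate the failure mode described above, here is a minimal sketch of how one throwing task can abort an entire parallel batch. It assumes the parallel helper rethrows a worker's exception on the calling thread. `ParallelForSketch` and `BatchAbortsOnSingleFailure` are hypothetical names for this example, not TVM APIs, and `std::runtime_error` stands in for ScheduleError.

```cpp
#include <exception>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel-for helper (not TVM's implementation).
// Runs f(i) for i in [0, n), one thread per task, and rethrows the first
// captured exception on the calling thread after all workers have joined.
void ParallelForSketch(int n, const std::function<void(int)>& f) {
  std::vector<std::thread> workers;
  std::exception_ptr first_error = nullptr;
  std::mutex mu;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([&, i] {
      try {
        f(i);
      } catch (...) {
        std::lock_guard<std::mutex> lock(mu);
        if (!first_error) first_error = std::current_exception();
      }
    });
  }
  for (std::thread& t : workers) t.join();
  if (first_error) std::rethrow_exception(first_error);
}

// Returns true if a single failing task made the whole batch throw.
bool BatchAbortsOnSingleFailure() {
  try {
    ParallelForSketch(4, [](int i) {
      // Simulated replay failure for one candidate out of four.
      if (i == 2) throw std::runtime_error("ScheduleError (simulated)");
    });
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

In this sketch the three healthy tasks finish normally, yet the caller still sees an exception, which is the behavior the root cause attributes to the tuning abort.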

Solutions

Catch trace replay failures in ThreadedTraceApply::Apply and return nullopt instead of crashing.
Add trace replay failure counting (trace_fail_counter_) and accessor (TraceFailCount()).
Align measured path PickBestFromDatabase with unmeasured behavior: skip invalid candidates and continue.
Add visible WARNING logs when trace replay failures occur (to avoid silent failures).
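
The approach in the list above can be sketched in isolation as follows. This is not the code from the PR: `TraceApplySketch`, `ReplayError`, `ReplayAll`, and `ExampleTraces` are hypothetical stand-ins, a schedule is modeled as a plain string, and the warning goes to `std::cerr` rather than TVM's logging macros.

```cpp
#include <atomic>
#include <functional>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are TVM types.
struct ReplayError : std::runtime_error {
  using std::runtime_error::runtime_error;
};
using Schedule = std::string;
using Trace = std::function<Schedule()>;

class TraceApplySketch {
 public:
  // Replays one trace. A replay failure is counted, logged, and turned
  // into std::nullopt instead of escaping to the caller.
  std::optional<Schedule> Apply(const Trace& trace) {
    try {
      return trace();
    } catch (const ReplayError& e) {
      ++fail_count_;
      std::cerr << "WARNING: trace replay failed: " << e.what() << "\n";
      return std::nullopt;
    }
  }

  int FailCount() const { return fail_count_.load(); }

 private:
  std::atomic<int> fail_count_{0};  // atomic so concurrent callers can share it
};

// Keeps the schedules that replayed successfully and skips the rest.
std::vector<Schedule> ReplayAll(TraceApplySketch& applier,
                                const std::vector<Trace>& traces) {
  std::vector<Schedule> valid;
  for (const Trace& trace : traces) {
    if (std::optional<Schedule> sch = applier.Apply(trace)) {
      valid.push_back(*sch);
    }
  }
  return valid;
}

// Three example traces; the middle one simulates a replay failure.
std::vector<Trace> ExampleTraces() {
  return {
      [] { return Schedule("sch0"); },
      []() -> Schedule { throw ReplayError("simulated replay failure"); },
      [] { return Schedule("sch2"); },
  };
}
```

Calling `ReplayAll` on `ExampleTraces()` keeps the two valid schedules, skips the failing one, and leaves `FailCount()` at 1, so the caller can continue and still report how many candidates were dropped.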

gemini-code-assist

Code Review

This pull request enhances the robustness of the evolutionary search strategy by gracefully handling trace replay failures. Key changes include wrapping schedule application in try-catch blocks within ThreadedTraceApply, introducing an atomic counter to track these failures, and updating PickBestFromDatabase and SampleInitPopulation to log warnings and filter out invalid schedules instead of terminating. The review feedback suggests replacing DLOG with TVM_PY_LOG to ensure that detailed failure information is visible in production builds as well as debug builds.

cchung100m · 2026-04-24T16:00:56Z

Hi @tlopex @mshr-h

This PR attempts to fix issue #17934. Any suggestions would be appreciated.

tlopex

Overall LGTM. Could you take a look at Gemini's review and address its suggestions?

cchung100m · 2026-04-25T09:26:02Z

Hi @tlopex
Thanks for the prompt reply. I updated the part you mentioned. 😄

cchung100m · 2026-04-26T00:57:07Z

Thanks to @tlopex 😄

[S-TIR][MetaSchedule] Make evolutionary search resilient to trace rep…

d8f8193

…lay failures

gemini-code-assist Bot reviewed Apr 24, 2026

Comment thread src/s_tir/meta_schedule/utils.h Outdated

Comment thread src/s_tir/meta_schedule/utils.h Outdated

[S-TIR][MetaSchedule] Add test case

9f54fe0

cchung100m marked this pull request as ready for review April 24, 2026 16:00

tlopex requested changes Apr 25, 2026

[S-TIR][MetaSchedule] Update the log

fadd299

tlopex approved these changes Apr 25, 2026

tlopex merged commit 0a0dd31 into apache:main Apr 25, 2026
9 checks passed

cchung100m deleted the issue-17934 branch April 26, 2026 00:56

ysh329 mentioned this pull request May 6, 2026

[Release] v0.24.0 Release Candidate Notes #19513

Closed

Cookiee235 mentioned this pull request Jun 1, 2026

[Bug] [MetaSchedule] parallel_for_dynamic error with ScheduleError: (not rendered) #17934

Closed

tlopex merged 3 commits into apache:main from cchung100m:issue-17934

2 participants
