
Add skip to flaky MacOS RPC test #9753

Merged: areusch merged 5 commits into apache:main from driazati:flaky_rpc on Jan 6, 2022

driazati (Member) commented Dec 16, 2021

Skip for #9824

cc @areusch

driazati marked this pull request as ready for review on December 16, 2021 08:14

KJlaccHoeUM9l (Contributor) commented

Hello @driazati!

Thank you for your comment! Our team also noticed this problem.
It looks like this error first occurred when this action ran in PR #9483.

The failure occurs in a test that verifies that the auto-scheduler can be used with an RPC session. The error is raised when the test checks the tuning log after a tuning session: at this stage, the test finds an error code in the log, which says the following:

Errors happen when compiling code on device
(e.g. OpenCL JIT on the device)

It doesn't look like the problem lies in the test as a whole or in the iOS RPC application.

So far, we have not been able to reproduce the problem locally to learn more about this failure. If you have any additional information about it or can reproduce it yourself, we would be very grateful for your observations.

driazati (Member, Author) commented

I wasn't able to reproduce it locally either, and it doesn't fail in CI very often (maybe 1 out of every 20 recent runs on main). I don't know the autotuning/RPC code well enough to guess where the problem might be, but it still comes up from time to time, so I think it should be disabled until a proper repro or fix is found. It makes more sense to mark it as a flaky failure with xfail rather than skip, though: that way it still runs, but a failure isn't reported as an unexpected error.
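To illustrate the skip vs. xfail distinction, here is a minimal sketch using pytest's built-in markers. The test names and the `_run_rpc_tuning_session` helper are hypothetical stand-ins (the real test needs an iOS RPC setup), and this is not necessarily the marker the merged commit uses.

```python
import pytest


def _run_rpc_tuning_session() -> bool:
    """Hypothetical stand-in for the real auto-scheduler RPC tuning run."""
    return True


# skip: the test body never executes, so CI learns nothing from it.
@pytest.mark.skip(reason="Flaky on macOS CI, see #9824")
def test_rpc_tuning_skipped():
    assert _run_rpc_tuning_session()


# xfail with strict=False: the body still runs on every CI job; a failure
# is reported as XFAIL and an unexpected pass as XPASS, so neither
# outcome fails the test suite.
@pytest.mark.xfail(reason="Flaky on macOS CI, see #9824", strict=False)
def test_rpc_tuning_flaky():
    assert _run_rpc_tuning_session()
```

Setting `strict=False` explicitly matters because a project can set `xfail_strict = true` in its pytest config, which would turn an unexpected pass of a flaky test into a failure.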

KJlaccHoeUM9l (Contributor) commented

Hello @driazati!
We have only been able to reproduce this issue in the Azure cloud.
To do this, we slightly modified main.yaml to run the test in a loop:

```bash
for ((i=1; i < 100; i++)); do python -m pytest -vrP tests/python/contrib/test_rpc_server_device.py; done
```

It looks like the failure happens on this line:

```python
func = remote.load_module(os.path.split(build_res.filename)[1])
```

The problem requires further investigation.
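To put a number on this kind of flakiness (for example, the rough "1 out of every 20 runs" estimate earlier in the thread), the loop in this comment could be expressed as a small Python helper that counts passing and failing runs by exit code instead of only printing pytest output. This is a sketch, not part of the PR; `tally_runs` is a hypothetical name.

```python
import subprocess
from collections import Counter
from typing import List


def tally_runs(cmd: List[str], runs: int) -> Counter:
    """Run `cmd` `runs` times and count passing vs. failing exit codes."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        # capture_output keeps pytest's verbose output out of the console;
        # only the exit code is used to classify the run.
        proc = subprocess.run(cmd, capture_output=True)
        outcomes["pass" if proc.returncode == 0 else "fail"] += 1
    return outcomes
```

For example, `tally_runs([sys.executable, "-m", "pytest", "-q", "tests/python/contrib/test_rpc_server_device.py"], 99)` mirrors the 99 iterations of the shell loop, assuming the same CI environment in which the test can run at all.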

areusch (Contributor) commented Jan 3, 2022

thanks @driazati, can you create a GH issue for this test or mention it in this PR?

areusch (Contributor) commented Jan 6, 2022

sorry, one q: has anyone repro'd this on Linux? if it hasn't failed there, i'd lean towards disabling it only on Windows/OS X for now. the GH Actions runs can be retriggered and don't necessarily block commits.

@KJlaccHoeUM9l could you let us know which OS you were using in Azure?

areusch (Contributor) commented Jan 6, 2022

oh sorry--ignore. i misread the test decorators.

areusch merged commit 33724bb into apache:main on Jan 6, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* Add skip to flaky MacOS RPC test

* Use flaky marker instead

* link issue

* trigger ci

* trigger ci

Co-authored-by: driazati <driazati@users.noreply.github.com>
ylc pushed a commit with the same message to ylc/tvm that referenced this pull request on Jan 13, 2022
