
[CONTRIB] PopenPoolExecutor #6959

Merged
junrushao merged 1 commit into apache:main from tqchen:popen on Dec 20, 2020

Conversation

@tqchen

@tqchen tqchen commented Nov 23, 2020

Member

PopenPoolExecutor implements a ProcessPoolExecutor backed by popen.

  • Can handle invoking functions in the tvm namespace (because the worker does not import context from the runner, only tvm).
  • Unlike multiprocessing, does not require a __main__ block,
    which means it can run directly in a Jupyter notebook cell.
  • Comes with timeout and fault-tolerance support: it times out
    long-running jobs and restarts the process when an error happens.

Recommended usage: create a pool and reuse it in a long-running
job (e.g. autotuning) so that the processes are reused when possible.
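
To illustrate the idea of a popen-backed worker, here is a minimal standard-library sketch. It is not the code in this PR: the class name `PopenWorkerSketch`, the length-prefixed pickle protocol, and the convention of naming a function by `(module, name)` are all assumptions made for illustration. The worker is a fresh interpreter started with `-c`, so it only sees importable modules, which mirrors the "functions in the tvm namespace" restriction and is also why no `__main__` block is needed.

```python
import pickle
import struct
import subprocess
import sys

# Hypothetical worker program: a fresh interpreter that only sees importable modules.
_WORKER_SRC = r'''
import importlib, pickle, struct, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    header = inp.read(4)
    if len(header) < 4:
        break  # parent closed the pipe
    (size,) = struct.unpack('<i', header)
    module, name, args = pickle.loads(inp.read(size))
    try:
        result = (True, getattr(importlib.import_module(module), name)(*args))
    except Exception as err:
        result = (False, repr(err))
    payload = pickle.dumps(result)
    out.write(struct.pack('<i', len(payload)) + payload)
    out.flush()
'''


class PopenWorkerSketch:
    """One reusable worker process driven over stdin/stdout pipes."""

    def __init__(self):
        self._proc = None

    def _ensure_started(self):
        # Start lazily, and start again if the previous process has exited.
        if self._proc is None or self._proc.poll() is not None:
            self._proc = subprocess.Popen(
                [sys.executable, "-c", _WORKER_SRC],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
            )

    def call(self, module, name, *args):
        """Run module.name(*args) in the worker and return its result."""
        self._ensure_started()
        payload = pickle.dumps((module, name, args))
        self._proc.stdin.write(struct.pack("<i", len(payload)) + payload)
        self._proc.stdin.flush()
        header = self._proc.stdout.read(4)
        if len(header) < 4:
            raise RuntimeError("worker exited before replying")
        (size,) = struct.unpack("<i", header)
        ok, value = pickle.loads(self._proc.stdout.read(size))
        if not ok:
            raise RuntimeError(value)
        return value

    def close(self):
        if self._proc is not None:
            self._proc.stdin.close()  # EOF ends the worker loop
            self._proc.wait()
            self._proc.stdout.close()
            self._proc = None
```

With this sketch, `PopenWorkerSketch().call("math", "sqrt", 9.0)` returns `3.0`, and repeated calls on the same object reuse the same process, which is the motivation for keeping a pool alive across a long-running job.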

@tqchen

tqchen commented Nov 23, 2020

Member Author

PopenPoolExecutor implements a ProcessPoolExecutor backed by popen.

- Only handles invoking functions in the tvm namespace.
- Unlike multiprocessing, does not require a __main__ block,
  which means it can run directly in a Jupyter notebook.
- Comes with timeout and fault-tolerance support: it times out
  long-running jobs and restarts the process when an error happens.

Recommended usage: create a pool and reuse it in a long-running
job (e.g. autotuning) so that the processes are reused when possible.
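
The timeout and restart behavior described above can be sketched with the standard library alone. This is a hedged illustration, not the PR's implementation: `TimedWorkerSketch`, its `starts` counter, and the pickle-over-pipes protocol are hypothetical. The reply is read on a helper thread so the caller can give up after `timeout` seconds, kill the worker, and start a fresh one on the next call.

```python
import concurrent.futures
import pickle
import subprocess
import sys

# Hypothetical worker program; an exception in the job crashes the worker,
# which the parent observes as EOF on the reply pipe.
_WORKER_SRC = r'''
import importlib, pickle, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    try:
        module, name, args = pickle.load(inp)
    except EOFError:
        break  # parent closed the pipe
    value = getattr(importlib.import_module(module), name)(*args)
    pickle.dump(value, out)
    out.flush()
'''


class TimedWorkerSketch:
    """A reusable worker that is killed and restarted on timeout or crash."""

    def __init__(self, timeout):
        self._timeout = timeout
        self._proc = None
        self._reader = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self.starts = 0  # how many worker processes have been launched

    def _start(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", _WORKER_SRC],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
        )
        self.starts += 1

    def _kill(self, pending):
        self._proc.kill()
        self._proc.wait()
        # The killed worker closes its pipe end, so the blocked read finishes.
        concurrent.futures.wait([pending])
        self._proc.stdin.close()
        self._proc.stdout.close()
        self._proc = None

    def call(self, module, name, *args):
        if self._proc is None:
            self._start()
        pickle.dump((module, name, args), self._proc.stdin)
        self._proc.stdin.flush()
        pending = self._reader.submit(pickle.load, self._proc.stdout)
        try:
            return pending.result(timeout=self._timeout)
        except concurrent.futures.TimeoutError:
            self._kill(pending)
            raise TimeoutError(f"{module}.{name} exceeded {self._timeout}s") from None
        except EOFError:
            self._kill(pending)
            raise RuntimeError(f"worker died while running {module}.{name}") from None

    def close(self):
        if self._proc is not None:
            self._proc.stdin.close()  # EOF ends the worker loop
            self._proc.wait()
            self._proc.stdout.close()
            self._proc = None
        self._reader.shutdown()
```

In this sketch, a call such as `call("time", "sleep", 60)` with a short timeout raises `TimeoutError`, and the next call transparently launches a new worker, so one stuck job does not poison the rest of a tuning run.
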
@tqchen

tqchen commented Nov 23, 2020

Member Author

Additional note: the system overhead of the popen pool and multiprocessing.Pool is around 1e-4 sec/item. This means both can be used for heavy-duty tasks like compilation, but they are not intended for fine-grained parallelism; parallel_for in C++ should be used in those cases.
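
As a back-of-the-envelope check of that claim, the snippet below computes what share of each item's wall time goes to dispatch, taking the 1e-4 sec/item figure above as given. The task durations are illustrative assumptions, not measurements.

```python
def overhead_fraction(task_seconds, overhead_seconds=1e-4):
    """Fraction of per-item wall time spent on pool dispatch."""
    return overhead_seconds / (task_seconds + overhead_seconds)


# Illustrative task sizes (assumed, not measured).
for label, seconds in [
    ("1 s compilation", 1.0),
    ("1 ms task", 1e-3),
    ("1 us loop body", 1e-6),
]:
    print(f"{label}: {overhead_fraction(seconds):.2%} of time is dispatch overhead")
```

For a one-second compilation the overhead is about 0.01%, for a one-millisecond task about 9%, and for a one-microsecond loop body about 99%, which is why fine-grained work belongs in an in-process parallel loop instead.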

@tkonolige

Contributor

How does this work when a user registers a function? Will the registered function be available in the subprocess?

@tqchen

tqchen commented Nov 23, 2020

Member Author

In that case the function needs to be registered at startup, when tvm is imported (since the popen worker also imports tvm during startup). Otherwise it won't be available in the subprocess. Closures can still be passed via cloudpickle. To make registration from any place work, we would need to use fork (note that multiprocessing + spawn also only works when registration happens in global scope). We could support an additional closure for registration during pool creation, if there is really a need to do so.

PopenPool is not intended to serve as a general-purpose pool, but it can be used to solve the particular problem of TIR compilation, where we can control the behavior inside tvm.
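
The registration point can be demonstrated by analogy with a process-global registry from the standard library. The sketch below uses Python's `codecs` error-handler registry as a stand-in for TVM's global function registry (an assumption made purely for illustration; the handler name `sketch_handler` and the helper `lookup_in_fresh_process` are hypothetical). A handler registered at runtime in the parent is not visible to a freshly started interpreter, while one registered at interpreter startup is.

```python
import codecs
import subprocess
import sys


def lookup_in_fresh_process(name):
    """Return True if a newly started interpreter can find error handler `name`."""
    probe = (
        "import codecs, sys\n"
        "try:\n"
        "    codecs.lookup_error(sys.argv[1])\n"
        "except LookupError:\n"
        "    sys.exit(1)\n"
    )
    return subprocess.run([sys.executable, "-c", probe, name]).returncode == 0


# Runtime registration in the parent process only.
codecs.register_error("sketch_handler", codecs.lookup_error("ignore"))

print(lookup_in_fresh_process("strict"))          # registered at interpreter startup
print(lookup_in_fresh_process("sketch_handler"))  # registered only in the parent
```

The first lookup succeeds and the second fails, for the same reason a function registered after `import tvm` in the parent is not found by a popen worker that only imports tvm.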

@tqchen

tqchen commented Dec 15, 2020

Member Author

cc @tkonolige @merrymercy @junrushao1994 let me know if we want to review, merge and try it out

@tkonolige

Contributor

This seems reasonable, but I'm not really sure how well it will work. Have you tested it with autoscheduler or autotvm?

@junrushao

Member

It looks promising :-) I can try it out with Jupyter later today

@tqchen

tqchen commented Dec 15, 2020

Member Author

@tkonolige I do not have the bandwidth to test it out; a preliminary benchmark shows it is close to multiprocessing.Pool on most platforms. Given that it is mostly self-contained, we could try to merge it in.

@tkonolige

Contributor

I can try and use it with autotvm today.

@junrushao

junrushao commented Dec 15, 2020

Member

Just tested with Jupyter notebook - it works smoothly!

@junrushao junrushao left a comment

Member

LGTM

Comment thread python/tvm/exec/popen_worker.py
@tqchen

tqchen commented Dec 17, 2020

Member Author

going to merge after two days

@tkonolige tkonolige left a comment

Contributor

Just tested on macOS. It appears to work!

@junrushao junrushao merged commit 37af2d7 into apache:main Dec 20, 2020
masahi pushed a commit to masahi/tvm that referenced this pull request Dec 24, 2020
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Jan 20, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jan 21, 2021
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021
3 participants