Use OnceLock to store TokioRuntime#895
Conversation
6fa05ba to
ff1e758
Compare
|
Because the change was performance related, we should probably benchmark the change. I did a quick & dirty one (making no effort to isolate resources / stop other processes on my laptop), but the results tell me I should run a cleaner one tomorrow before merging. Using
|
|
From the benchmarking above it appears to be a mostly negative impact. Am I reading that right? |
|
Yes. My ~dirty benchmark had a pretty consistently negative performance impact. Hence, the need for further investigation. That result surprises me, though. |
18c6098 to
030ec20
Compare
|
Alright, second benchmark running after clean boot look like this has basically no impact. I wouldn't mind if @andygrove or anyone else has a nice benchmarking server setup to run their own, but I'm more comfortable with this now. Reminder:
|
efd2046 to
37d3356
Compare
37d3356 to
85f3ca3
Compare
I also included a reference comment in case future users experience problems with using datafusion-python behind a forking app server l ike `gunicorn`.
|
Ran it one more time with the
|
| // If we run into that problem, in the future we can look to `delta-rs` | ||
| // which adds a check in that disallows calls from a forked process | ||
| // https://github.com/delta-io/delta-rs/blob/87010461cfe01563d91a4b9cd6fa468e2ad5f283/python/src/utils.rs#L10-L31 | ||
| static RUNTIME: OnceLock<TokioRuntime> = OnceLock::new(); |
There was a problem hiding this comment.
if I'm understanding it correctly (please correct me if I'm not making sense).
So now it use OnceLock<Runtime> than OnceLock<Arc<Runtime>> to get rid of Python heap thoroughly. And it will not be thread-safe?
There was a problem hiding this comment.
I recommend looking at the PR diff to see the real change.
There is an intermediate commit that went from OnceLock<Arc<Runtime>> to OnceLock<Runtime>, but that's because the Arc was superfluous.
OnceLock is already Send + Sync and explicitly thread-safe.
So, if you discover that it is not actually thread-safe, please share!
Which issue does this PR close?
Closes #737.
Rationale for this change
To avoid repeatedly recreating the tokio runtime, #341 began storing the tokio runtime on the python heap at
datafusion.runtime.This created the requirement that the
datafusionpython package must be available to any user of the rust cratedatafusion-python.However, some users (#737) want to leverage the rust-crate
datafusion-pythonto create their own custom python package without depending on or using the pythondatafusionpackage.What changes are included in this PR?
This PR uses
std::sync::OnceLockinstead of the python heap to ensure that only one tokio runtime gets created once.This both removes the implicit dependency and avoids
rust --> python --> rustroundtrip every time the runtime is used.Are there any user-facing changes?
datafusion.runtimehas been removed, but that was already unusable for users of the python package.