Reuse the Tokio Runtime by kdbrooks · Pull Request #341 · apache/datafusion-python

kdbrooks · 2023-04-24T22:49:29Z

Which issue does this PR close?

Closes #340

Rationale for this change

Currently, we frequently create a new Tokio Runtime, along with its associated threads, which hurts performance. This PR stores the runtime in a module-level attribute so that it is created once and reused.
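As a rough illustration of the create-once-and-reuse idea, here is a minimal Python sketch. It is an analogy only, not the PR's code: the PR stores a Tokio Runtime on the Rust side, while this sketch uses a `ThreadPoolExecutor` as a stand-in for "a runtime plus its threads". The names `_POOL`, `get_pool`, `run_with_fresh_pool`, and `run_with_shared_pool` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level slot. The PR keeps a Tokio Runtime in a
# module-level attribute; this analogy keeps a thread pool instead.
_POOL = None
_POOL_LOCK = threading.Lock()


def get_pool() -> ThreadPoolExecutor:
    """Return the shared pool, creating it on first use only."""
    global _POOL
    if _POOL is None:
        with _POOL_LOCK:
            # Re-check under the lock so two threads cannot both create a pool.
            if _POOL is None:
                _POOL = ThreadPoolExecutor(max_workers=4)
    return _POOL


def run_with_fresh_pool(fn, *args):
    """The costly pattern: build and tear down a pool for every call."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return pool.submit(fn, *args).result()


def run_with_shared_pool(fn, *args):
    """The cheaper pattern: every call reuses the same pool and threads."""
    return get_pool().submit(fn, *args).result()
```

Both helpers return the same result for the same call; the difference is that `run_with_shared_pool` pays the thread start-up cost once per process rather than once per call.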

Are there any user-facing changes?

No.

andygrove · 2023-04-24T23:33:53Z

Thanks @kylebrooks-8451. Can you share any info on how much difference this makes to performance? I am wondering if we should delay the 23.0.0 release to get this merged in?

kdbrooks · 2023-04-24T23:36:17Z

I wanted to get some numbers on this, but I didn't have a good way to benchmark it. I know it is significant for our use case, which is running an Arrow Flight server using DataFusion as an engine, but I don't have any hard numbers. Is there an easy way to benchmark the Python bindings? I see a benchmark suite for DataFusion proper.

andygrove · 2023-04-24T23:47:12Z

I'm mostly testing with TPC-H using code here:

https://github.com/sql-benchmarks/sqlbench-runners/tree/main/datafusion-python

I doubt it will impact this benchmark all that much though.

This PR is fixing an ugly hack, so I think we should go ahead and merge this.

cc @jdye64

kdbrooks · 2023-04-25T13:35:21Z

@andygrove - I ran the benchmark you linked on my MacBook Pro (6-core i7, 2.6 GHz). Using the TPC-H Parquet data at scale factor 1.0 and the sqlbench-h SF=1 queries, I got a 2.45x speedup with the PR (total query time dropped to about 41% of the original), using release wheel builds. I'm glad this made it into the 23 release!

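| Query | Before | After |
|-------|--------|-------|
| setup | 246.6 | 26.7 |
| q1 | 639.2 | 322.3 |
| q2 | 616.3 | 198.2 |
| q3 | 539.7 | 150.7 |
| q4 | 408.1 | 107.5 |
| q5 | 702.5 | 198.6 |
| q6 | 125.3 | 50.7 |
| q7 | 897.8 | 413.3 |
| q8 | 868.4 | 237.5 |
| q9 | 1265.6 | 348.2 |
| q10 | 683.7 | 256.5 |
| q11 | 245.4 | 105.1 |
| q12 | 318.4 | 133 |
| q13 | 1390.9 | 591 |
| q14 | 195.8 | 88.7 |
| q15 | 296.9 | 133.4 |
| q16 | 269.4 | 106 |
| q17 | 3291.9 | 1555 |
| q18 | 2970.2 | 1060.1 |
| q19 | 262.4 | 156.1 |
| q20 | 668.9 | 370.5 |
| q21 | 1018 | 624.7 |
| total | 17674.9 | 7207.1 |

Speedup: 2.452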
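For readers who want to check the arithmetic, the reported speedup is the ratio of the two totals. The short Python sketch below reproduces it from the figures in the table; the function names `speedup` and `time_reduction_pct` are hypothetical helpers, and the script is not part of the benchmark.

```python
def speedup(before: float, after: float) -> float:
    """Speedup factor: how many times faster the 'after' run is."""
    return before / after


def time_reduction_pct(before: float, after: float) -> float:
    """Percentage of the original running time that was eliminated."""
    return (1.0 - after / before) * 100.0


# Totals copied from the "total" row of the table above.
BEFORE_TOTAL = 17674.9
AFTER_TOTAL = 7207.1

overall = speedup(BEFORE_TOTAL, AFTER_TOTAL)
saved = time_reduction_pct(BEFORE_TOTAL, AFTER_TOTAL)

print(f"overall speedup: {overall:.3f}x")  # matches the reported 2.452
print(f"time saved: {saved:.1f}%")
```

Note that a 2.45x speedup means the run takes roughly 41% of the original time, which is not the same as a 245% improvement in running time.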

Store the Tokio Runtime in an _internal module level attribute and re…

ada5c27

…use it.

andygrove approved these changes Apr 24, 2023

andygrove merged commit 545e93e into apache:main Apr 24, 2023

kdbrooks deleted the hotfix/reuse-tokio-runtime branch April 25, 2023 12:14

Michael-J-Ward mentioned this pull request Jun 21, 2024

How do I bring dependencies in my binding? #737

Closed

Michael-J-Ward mentioned this pull request Oct 3, 2024

Use OnceLock to store TokioRuntime #895

Merged
