Describe the bug
A blog post reports relatively poor performance for DataFusion compared to DuckDB and Hyper on a TPC-H benchmark over Parquet files:
https://www.architecture-performance.fr/ap_blog/tpc-h-benchmark-of-hyper-duckdb-and-datafusion-on-parquet-files/
To Reproduce
I would like someone to try to reproduce the DataFusion performance reported in the blog post and propose ways to improve it (perhaps by enabling some of the options that are off by default).
Expected behavior
No response
Additional context
@Dandandan suggested on Slack that this benchmark may run faster with a parallelized scan of a single Parquet file.
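To make the suggestion concrete, here is a minimal, hypothetical Rust sketch (std only) of one way a single file could be divided among parallel scan partitions. It splits the file into contiguous byte ranges and assigns each row group to the partition whose range contains the row group's midpoint, so every row group is read exactly once. The function names, the midpoint rule, and the sample row-group layout are assumptions for illustration only. They are not DataFusion's API, and DataFusion's actual scan planning may differ.

```rust
/// Hypothetical helper: split a file of `file_len` bytes into at most
/// `partitions` contiguous, non-overlapping byte ranges [start, end).
fn split_byte_ranges(file_len: u64, partitions: u64) -> Vec<(u64, u64)> {
    assert!(partitions > 0, "need at least one partition");
    // Ceiling division so the ranges cover the whole file.
    let chunk = (file_len + partitions - 1) / partitions;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < file_len {
        let end = (start + chunk).min(file_len);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

/// Hypothetical assignment rule: a row group belongs to the partition whose
/// byte range contains the row group's midpoint, so no row group is scanned twice.
fn owner_of(row_group_start: u64, row_group_len: u64, ranges: &[(u64, u64)]) -> Option<usize> {
    let mid = row_group_start + row_group_len / 2;
    ranges.iter().position(|&(s, e)| mid >= s && mid < e)
}

fn main() {
    // Assumed example: a 1000-byte file scanned by 4 partitions.
    let ranges = split_byte_ranges(1000, 4);
    println!("{:?}", ranges);

    // Assumed row-group layout as (start offset, length) pairs.
    let row_groups = [(0u64, 300u64), (300, 300), (600, 400)];
    for (i, &(start, len)) in row_groups.iter().enumerate() {
        println!("row group {} -> partition {:?}", i, owner_of(start, len, &ranges));
    }
}
```

With this layout some partitions may receive no row groups, which is one reason real implementations balance work by row group rather than purely by bytes.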