The purpose of this issue is to track the work that we need to do in the DataFusion project to support moving the dask-sql planner to DataFusion.
High Priority Tech Debt
We need to fix some issues before we can really get started on the main features.
- physical-expr crate to expr crate apache/datafusion#2251
- LogicalPlan::TableScan should not depend on the physical plan apache/datafusion#2247
High Priority Features
The SQL Query Planner and Logical Plan need to implement these features. Note that DataFusion does not necessarily need to implement a physical plan for these features, so that reduces the scope of this work.
- LogicalPlan::TableScan should not depend on the physical plan apache/datafusion#2247
- Query for datafusion apache/datafusion#2181
- Expr::InSubquery and Expr::ScalarSubquery apache/datafusion#2342
- ROLLUP and CUBE grouping sets in SQL query planner and logical plan apache/datafusion#2378
- grouping aggregate function in the SQL planner apache/datafusion#2477
- Decimal to Float64 apache/datafusion#2380
High Priority Bugs
These are the bugs that we are seeing when attempting to parse all the queries from our benchmark suite.
- round function with two arguments apache/datafusion#2420
Ongoing improvements
We do not need these for the benchmark suite, but we are likely to run into these features and bugs eventually, so it makes sense to work on them proactively.
- OFFSET in SQL query planner + logical plan apache/datafusion#2377
- order by expression that references complex group by expression apache/datafusion#2360
- UNION vs UNION ALL (introduce a LogicalPlan::Distinct) apache/datafusion#2573
Refactoring of DataFusion crates
We currently bring in the full datafusion crate as a dependency, including the physical plans and execution engine. We should depend only on the features necessary for SQL query planning and logical plan optimization. These are the issues that need to be resolved to achieve that.
- ExecutionProps from OptimizerRule trait apache/datafusion#2614
Lower Priority Tech Debt / Misc Other Items