diff --git a/README.md b/README.md
index e78f61370..d83b78ce3 100644
--- a/README.md
+++ b/README.md
@@ -28,26 +28,26 @@ DataFusion's Python bindings can be used as an end-user tool as well as providin
 
 ## Features
 
-- Execute queries using SQL or DataFrames against CSV, Parquet, and JSON data sources
-- Queries are optimized using DataFusion's query optimizer
-- Execute user-defined Python code from SQL
-- Exchange data with Pandas and other DataFrame libraries that support PyArrow
-- Serialize and deserialize query plans in Substrait format
-- Experimental support for executing SQL queries against Polars, Pandas and cuDF
+- Execute queries using SQL or DataFrames against CSV, Parquet, and JSON data sources.
+- Queries are optimized using DataFusion's query optimizer.
+- Execute user-defined Python code from SQL.
+- Exchange data with Pandas and other DataFrame libraries that support PyArrow.
+- Serialize and deserialize query plans in Substrait format.
+- Experimental support for transpiling SQL queries to DataFrame calls with Polars, Pandas, and cuDF.
 
 ## Comparison with other projects
 
-Here is a comparison with similar projects that may help understand when DataFusion might be suitable and unsuitable
+Here is a comparison with similar projects that may help understand when DataFusion might be suitable and unsuitable
 for your needs:
 
-- [DuckDB](http://www.duckdb.org/) is an open source, in-process analytic database. Like DataFusion, it supports
-  very fast execution, both from its custom file format and directly from Parquet files. Unlike DataFusion, it is
-  written in C/C++ and it is primarily used directly by users as a serverless database and query system rather than
-  as a library for building such database systems.
+- [DuckDB](http://www.duckdb.org/) is an open source, in-process analytic database. Like DataFusion, it supports
+  very fast execution, both from its custom file format and directly from Parquet files. Unlike DataFusion, it is
+  written in C/C++ and it is primarily used directly by users as a serverless database and query system rather than
+  as a library for building such database systems.
 
-- [Polars](http://pola.rs/) is one of the fastest DataFrame libraries at the time of writing. Like DataFusion, it
-  is also written in Rust and uses the Apache Arrow memory model, but unlike DataFusion it does not provide full SQL
-  support, nor as many extension points.
+- [Polars](http://pola.rs/) is one of the fastest DataFrame libraries at the time of writing. Like DataFusion, it
+  is also written in Rust and uses the Apache Arrow memory model, but unlike DataFusion it does not provide full SQL
+  support, nor as many extension points.
 
 ## Example Usage
 
@@ -110,6 +110,7 @@ See [examples](examples/README.md) for more information.
 
 - [Executing SQL on Polars](./examples/sql-on-polars.py)
 - [Executing SQL on Pandas](./examples/sql-on-pandas.py)
+- [Executing SQL on cuDF](./examples/sql-on-cudf.py)
 
 ## How to install (from pip)
 
diff --git a/examples/README.md b/examples/README.md
index ce98600fe..2c4775ea4 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -29,21 +29,22 @@ Here is a direct link to the file used in the examples:
 
 ### Executing Queries with DataFusion
 
-- [Query a Parquet file using SQL](./examples/sql-parquet.py)
-- [Query a Parquet file using the DataFrame API](./examples/dataframe-parquet.py)
-- [Run a SQL query and store the results in a Pandas DataFrame](./examples/sql-to-pandas.py)
-- [Query PyArrow Data](./examples/query-pyarrow-data.py)
+- [Query a Parquet file using SQL](./sql-parquet.py)
+- [Query a Parquet file using the DataFrame API](./dataframe-parquet.py)
+- [Run a SQL query and store the results in a Pandas DataFrame](./sql-to-pandas.py)
+- [Query PyArrow Data](./query-pyarrow-data.py)
 
 ### Running User-Defined Python Code
 
-- [Register a Python UDF with DataFusion](./examples/python-udf.py)
-- [Register a Python UDAF with DataFusion](./examples/python-udaf.py)
+- [Register a Python UDF with DataFusion](./python-udf.py)
+- [Register a Python UDAF with DataFusion](./python-udaf.py)
 
 ### Substrait Support
 
-- [Serialize query plans using Substrait](./examples/substrait.py)
+- [Serialize query plans using Substrait](./substrait.py)
 
 ### Executing SQL against DataFrame Libraries (Experimental)
 
-- [Executing SQL on Polars](./examples/sql-on-polars.py)
-- [Executing SQL on Pandas](./examples/sql-on-pandas.py)
+- [Executing SQL on Polars](./sql-on-polars.py)
+- [Executing SQL on Pandas](./sql-on-pandas.py)
+- [Executing SQL on cuDF](./sql-on-cudf.py)
diff --git a/examples/sql-on-cudf.py b/examples/sql-on-cudf.py
index 407cb1f00..999756fc8 100644
--- a/examples/sql-on-cudf.py
+++ b/examples/sql-on-cudf.py
@@ -19,8 +19,6 @@
 
 ctx = SessionContext()
 
-ctx.register_parquet(
-    "taxi", "/home/jeremy/Downloads/yellow_tripdata_2021-01.parquet"
-)
+ctx.register_parquet("taxi", "yellow_tripdata_2021-01.parquet")
 df = ctx.sql("select passenger_count from taxi")
 print(df)
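The link rewrite in `examples/README.md` comes down to how relative Markdown links are resolved. A target such as `./examples/sql-parquet.py` is interpreted relative to the directory that contains the linking file. From `examples/README.md`, that target pointed at `examples/examples/sql-parquet.py`, which does not exist. The sketch below models only this directory-relative resolution using Python's `posixpath`. `resolve_link` is a hypothetical helper written for illustration. It ignores anchors, URL encoding, and any renderer-specific rules.

```python
import posixpath


def resolve_link(markdown_file: str, target: str) -> str:
    """Hypothetical helper: resolve a relative link target against the
    directory of the Markdown file that contains the link (POSIX-style paths)."""
    base_dir = posixpath.dirname(markdown_file)  # "" for a file at the repo root
    return posixpath.normpath(posixpath.join(base_dir, target))


# Old link in examples/README.md: the "examples/" segment is doubled.
print(resolve_link("examples/README.md", "./examples/sql-parquet.py"))  # examples/examples/sql-parquet.py
# New link in examples/README.md: resolves to the real file.
print(resolve_link("examples/README.md", "./sql-parquet.py"))  # examples/sql-parquet.py
# Top-level README.md: its base directory is the repo root, so the prefix is still needed.
print(resolve_link("README.md", "./examples/sql-on-cudf.py"))  # examples/sql-on-cudf.py
```

The same reasoning explains why the top-level `README.md` keeps the `./examples/` prefix on its new cuDF link while `examples/README.md` drops it.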