Is your feature request related to a problem or challenge?
Thanks to the great work from @metesynnada @devinjdangelo and others, it is now possible to have DataFusion insert data into CSV and JSON tables.
However, the User Experience / UX is tough as it is not easy to write datasets because: External tables require an existing location to exist, even if empty
So for example, if I wanted to write to json files in /tmp/my_table I currently need to create a the target file / directory externally
mkdir /tmp/my_table
echo "" > /tmp/my_table/1.json
datafusion-cli
❯ create external table my_table(x int, y int) stored as JSON location '/tmp/my_table';
0 rows in set. Query took 0.002 seconds.
❯ insert into my_table values (1,2), (3, 4);
+-------+
| count |
+-------+
| 2 |
+-------+
1 row in set. Query took 0.006 seconds.
❯
\q
$ cat /tmp/my_table/1.json
{"x":1,"y":2}
{"x":3,"y":4}
Describe the solution you'd like
I would like to be able to have datafusion create the targets directly
Something like this, without any setup:
❯ create external table my_table(x int, y int) stored as JSON location '/tmp/my_table';
0 rows in set. Query took 0.002 seconds.
❯ insert into my_table values (1,2), (3, 4);
+-------+
| count |
+-------+
| 2 |
+-------+
1 row in set. Query took 0.006 seconds.
$ cat /tmp/my_table/1.json
{"x":1,"y":2}
{"x":3,"y":4}
Describe alternatives you've considered
Option 1: Automatically create the target
One option is simply to remove the existence check on CREATE EXTERNAL TABLE
❯ create external table my_table(x int, y int) stored as JSON location '/tmp/my_table';
0 rows in set. Query took 0.002 seconds.
❯ insert into my_table values (1,2), (3, 4);
+-------+
| count |
+-------+
| 2 |
+-------+
1 row in set. Query took 0.006 seconds.
The downside is that the reason CREATE EXTERNAL TABLE errors if the target doesn't exist is to help people debug errors when reading
Option 2: Add new DDL to CREATE EXTERNAL TABLE
Perhaps we could add a phrase to CREATE EXTERNAL TABLE like FOR WRITE
❯ create external table my_table(x int, y int) stored as JSON location '/tmp/my_table' FOR WRITE;
0 rows in set. Query took 0.002 seconds.
The semantics would be if the target file/directory doesn't already exist, then create it rather than error
Option 3: Add a config parameter to control the default behavior
We could also add a config parameter like
❯ set catalog.auto_create_tables = true;
SET
❯ create external table my_table(x int, y int) stored as JSON location '/tmp/my_table' FOR WRITE;
0 rows in set. Query took 0.002 seconds.
with the same semantics as option 2 (create it target doesn't exist)
Additional context
No response
Is your feature request related to a problem or challenge?
Thanks to the great work from @metesynnada @devinjdangelo and others, it is now possible to have DataFusion insert data into CSV and JSON tables.
However, the User Experience / UX is tough as it is not easy to write datasets because: External tables require an existing location to exist, even if empty
So for example, if I wanted to write to json files in
/tmp/my_tableI currently need to create a the target file / directory externally$ cat /tmp/my_table/1.json {"x":1,"y":2} {"x":3,"y":4}Describe the solution you'd like
I would like to be able to have datafusion create the targets directly
Something like this, without any setup:
$ cat /tmp/my_table/1.json {"x":1,"y":2} {"x":3,"y":4}Describe alternatives you've considered
Option 1: Automatically create the target
One option is simply to remove the existence check on
CREATE EXTERNAL TABLEThe downside is that the reason
CREATE EXTERNAL TABLEerrors if the target doesn't exist is to help people debug errors when readingOption 2: Add new DDL to
CREATE EXTERNAL TABLEPerhaps we could add a phrase to
CREATE EXTERNAL TABLElikeFOR WRITEThe semantics would be if the target file/directory doesn't already exist, then create it rather than error
Option 3: Add a config parameter to control the default behavior
We could also add a config parameter like
with the same semantics as option 2 (create it target doesn't exist)
Additional context
No response