Compressed CSV/JSON support#3642
Conversation
|
This looks very cool @Licht-T -- thank you for the contribution. Hopefully we'll get a chance to review it in the next few days |
|
Thanks, @alamb. The test works on my Windows environment but still fails on the Windows CI. I will push a debug code. |
|
I found |
|
Now all green. Ready to get reviewed. |
| async-compression = { version = "0.3.14", features = ["bzip2", "gzip", "futures-io", "tokio"] } | ||
| async-trait = "0.1.41" | ||
| bytes = "1.1" | ||
| bzip2 = "0.4.3" |
There was a problem hiding this comment.
Any ideas how many dependencies we are adding? I guess some of them are already transitive dependencies through parquet.
|
I took a quick look through this PR and it looks great -- thank you @Licht-T Can you please merge / rebase from master to resolve conflicts so that I can merge it? Thanks again! |
|
@alamb Merged from master. I believe we need some documentation about this, right? I am willing to do that. |
Yes please @Licht-T -- that would be super helpful. Perhaps https://arrow.apache.org/datafusion/user-guide/sql/ddl.html?highlight=external+table and https://arrow.apache.org/datafusion/user-guide/sql/index.html?highlight=external+table ? If you are not able to do so, we should at least file a ticket to document the feature |
| doc-comment = "0.3" | ||
| env_logger = "0.9" | ||
| fuzz-utils = { path = "fuzz-utils" } | ||
| rstest = "0.15.0" |
There was a problem hiding this comment.
@Dandandan what do we think about adding this new testing library?
|
Thanks again @Licht-T |
|
Benchmark runs are scheduled for baseline = 58afdf7 and contender = b8a3a78. b8a3a78 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
@Licht-T Were you able to test this change? I am trying these csv options and it simple returns nothing on a valid .gz file.
|

Which issue does this PR close?
Closes #3641.
Rationale for this change
Explained in #3641.
What changes are included in this PR?
FileCompressionTypeas the text file compression type definition.Read/Stream.COMPRESSION TYPESQL token.Are there any user-facing changes?
Yes.
I am not sure which are public APIs, but maybe yes.