JuliaAPlavin/QuackIO.jl
QuackIO.jl 🐣🦆

If it quacks like a tabular IO package, then it is a tabular IO package (powered by DuckDB).

QuackIO provides a native Julia interface to DuckDB read/write functions. These functions support all Julia table types that follow the Tables.jl interface. Reading and writing are most efficient with columnar table types, such as StructArray or columntable.

Write table

write_table(filename, table; options...)
Executes COPY table TO file (options) in DuckDB. Here, table is an arbitrary Julia table object:

using QuackIO

write_table("my_file.csv", tbl)
write_table("my_file.pq", tbl, format=:parquet)

Read full table into memory

read_*(tabletype, filename; options...) where * is csv, parquet, or json
Calls the corresponding DuckDB function and converts the result to the specified Julia table type.

using QuackIO, Tables, StructArrays

# read_* functions call DuckDB functions with corresponding names
# and support any Julia table format:
tbl = read_csv(columntable, "my_file.csv", delim=";")
tbl = read_parquet(StructArray, "my_file.pq")
tbl = read_json(rowtable, "my_file.json")

Thanks to DuckDB and its Julia integration, QuackIO functions are performant. They can even be faster than native Julia readers for these formats.

Lazy read: only materialize a subset

QuackIO is based on DuckDB – a fully-featured SQL database. Such a backend makes it straightforward to do basic data manipulation on the fly, without materializing the whole table in memory.
Thanks to the SQLCollections.jl integration, even the syntax for performing manipulations in SQL is basically the same as for in-memory Julia datasets.

Column selection

Load only specific columns from a CSV file into memory:

using QuackIO
using SQLCollections

# @o syntax comes from Accessors.jl
map(
   (@o (;_.version_number, _.release_date)),
   read_csv(SQLCollection, "https://duckdb.org/data/duckdb-releases.csv")) |> collect

Extract first N rows

Load only the first 3 rows from a CSV file into memory:

using QuackIO
using SQLCollections

first(read_csv(SQLCollection, "https://duckdb.org/data/duckdb-releases.csv"), 3) |> collect

Generic queries

More complex queries are written with basically the same syntax as well; DataPipes.jl is especially useful for writing multi-step pipelines.
Compare loading all data into memory and filtering there:

using QuackIO
using Tables
using DataPipes

# plain Julia tables –
# load everything into memory first, then filter and select columns:
@p read_csv(rowtable, "https://duckdb.org/data/duckdb-releases.csv") |>
   filter(startswith(_.version_number, "0.10.")) |>
   map((;_.version_number, _.release_date)) |>
   first(__, 3)

Versus doing everything in DuckDB and loading only the small final table into memory:

using SQLCollections

# with SQLCollections –
# filter and select columns lazily in DuckDB,
# load only the small final table into memory:
@p read_csv(SQLCollection, "https://duckdb.org/data/duckdb-releases.csv") |>
   filter(@o startswith(_.version_number, "0.10.")) |>
   map(@o (;_.version_number, _.release_date)) |>
   first(__, 3) |>
   collect

Experimental: Very common tasks, such as column selection, are also supported through dedicated keyword arguments in read_* functions. For example, read_csv(...; select=["version_number", "release_date"], limit=3); see docstrings for more details.
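For concreteness, a minimal sketch of these keyword arguments, based directly on the inline example above (the full set of supported options is described in the docstrings):

```julia
using QuackIO, Tables

# select and limit are pushed down into DuckDB,
# so only the requested subset of the file is materialized in Julia:
tbl = read_csv(columntable, "https://duckdb.org/data/duckdb-releases.csv";
               select=["version_number", "release_date"], limit=3)
```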
