Client for Updating a Simple Data Warehouse on Blob Storage
- optimize for simplicity and user-friendliness
- storage is cheap (compared to compute)
- pre-compute as much as possible
- should work out of the box
- advanced configuration should be opt-in
- explicit is better than implicit
- straightforwardness over magic
pip install datablob

Supported output formats:
- csv
- geojson (points only)
- json
- json lines
- parquet, including geoparquet
- shapefile (points only)
- xlsx (Microsoft Excel)
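To give a feel for the input side, here is a hypothetical sketch of what the `data=rows` argument in the example below might look like: a list of dicts, with latitude/longitude columns so the point-based formats (geojson, shapefile) have coordinates to draw on. The column names `lat` and `lon` are an assumption for illustration, not a documented datablob convention. The JSON Lines output is then just one JSON record per line:

```python
import json

# Hypothetical rows: plain list-of-dicts tabular data.
# "lat"/"lon" column names are assumed here, not prescribed by datablob.
rows = [
    {"vehicle_id": "v-001", "status": "active", "lat": 52.52, "lon": 13.405},
    {"vehicle_id": "v-002", "status": "idle", "lat": 48.137, "lon": 11.575},
]

# The json lines format amounts to one serialized record per line:
jsonl = "\n".join(json.dumps(row) for row in rows)
print(jsonl)
```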
from datablob import DataBlobClient
client = DataBlobClient(bucket_name="example-test-bucket-123", bucket_path="prefix/to/dataportal")
client.update_dataset(name="fleet", version="2", data=rows, xlsx=True)
# automatically creates the following files
# s3://example-test-bucket-123/prefix/to/dataportal/fleet/v2/meta.json
# s3://example-test-bucket-123/prefix/to/dataportal/fleet/v2/data.csv
# s3://example-test-bucket-123/prefix/to/dataportal/fleet/v2/data.points.geojson
# s3://example-test-bucket-123/prefix/to/dataportal/fleet/v2/data.json
# s3://example-test-bucket-123/prefix/to/dataportal/fleet/v2/data.jsonl
# s3://example-test-bucket-123/prefix/to/dataportal/fleet/v2/data.parquet
# s3://example-test-bucket-123/prefix/to/dataportal/fleet/v2/data.points.shp.zip
# s3://example-test-bucket-123/prefix/to/dataportal/fleet/v2/data.xlsx
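Because the key layout shown above is predictable (`<bucket_path>/<name>/v<version>/<filename>`), downstream consumers can construct object keys directly instead of listing the bucket. The helper below is an illustrative sketch of that convention, not part of the datablob API:

```python
def dataset_key(bucket_path: str, name: str, version: str, filename: str) -> str:
    # Illustrative helper (not part of datablob): rebuilds the key layout
    # shown in the generated-files comment: <bucket_path>/<name>/v<version>/<filename>
    return f"{bucket_path}/{name}/v{version}/{filename}"

key = dataset_key("prefix/to/dataportal", "fleet", "2", "data.csv")
# key == "prefix/to/dataportal/fleet/v2/data.csv"
```

Keeping the layout deterministic like this is what lets the client "pre-compute as much as possible": every format for every dataset version lives at a known address.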