
feat(zstd): encode nullable columns#167

Merged
dfa1 merged 2 commits into main from feat/zstd-nullable-encode on Jun 26, 2026

Conversation

dfa1 (Owner) commented Jun 26, 2026

Summary

Closes the vortex.zstd nullable encode gap (TODO.md). The reader already decoded nullable vortex.zstd (validity child[0] + a packed valid-only payload, scattered on decode), but ZstdEncodingEncoder could only write fully-valid primitive and varbin columns.

What changed

  • ZstdEncodingEncoder.encode(...) now accepts:
    • NullableData (primitive nullable): strips null positions from the storage array, compresses only the valid values.
    • String[] carrying nulls (utf8/binary nullable): strips nulls, compresses only the valid strings.
      In both cases the validity bitmap is encoded as a Bool child[0]; its buffer indices are shifted by 1 because the frame payload owns buffer[0]. This mirrors the Rust reference, where only valid values reach the compressed payload.
  • Tests: encode_nullablePrimitive_roundTrips and encode_nullableUtf8_roundTrips (encode → decode → assert values + validity).
  • TODO.md — removed the completed item.
  • CHANGELOG.md — user-facing Added entry.
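
For readers unfamiliar with the layout, the strip-then-scatter idea described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class NullPackingSketch, its Packed holder, and the use of Long[] and BitSet are all assumptions made for the example, and the zstd compression step is left out so the block has no dependencies.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical helper (not the project's API): packs valid values, then scatters them back. */
final class NullPackingSketch {

    /** Holder for the two pieces an encoder would emit: validity bits and valid-only values. */
    static final class Packed {
        final BitSet validity;     // bit i is set when row i is non-null
        final int rowCount;        // original column length, nulls included
        final long[] validValues;  // only the non-null values, in row order

        Packed(BitSet validity, int rowCount, long[] validValues) {
            this.validity = validity;
            this.rowCount = rowCount;
            this.validValues = validValues;
        }
    }

    /** Encode side: drop null slots and remember which rows were valid. */
    static Packed pack(Long[] column) {
        BitSet validity = new BitSet(column.length);
        long[] valid = new long[column.length];
        int count = 0;
        for (int row = 0; row < column.length; row++) {
            if (column[row] != null) {
                validity.set(row);
                valid[count++] = column[row];
            }
        }
        // Only this trimmed array would be handed to the compressor.
        return new Packed(validity, column.length, Arrays.copyOf(valid, count));
    }

    /** Decode side: scatter the packed values back to their original row positions. */
    static Long[] scatter(Packed packed) {
        Long[] out = new Long[packed.rowCount];
        int next = 0;
        for (int row = 0; row < out.length; row++) {
            if (packed.validity.get(row)) {
                out[row] = packed.validValues[next++];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Long[] column = {7L, null, 9L, null, 11L};
        Packed packed = pack(column);
        System.out.println(Arrays.toString(packed.validValues));
        System.out.println(Arrays.toString(scatter(packed)));
    }
}
```

An all-null column yields an empty validValues array, which is the corner case the second commit adds a unit test for.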

Testing

  • All 14 ZstdEncodingEncoderTest tests pass (including the 2 new round-trips).
  • ./mvnw -q test -pl writer -am -Dtest=ZstdEncodingEncoderTest

Notes

  • Multi-frame encode (the other vortex.zstd TODO item) is still open.

🤖 Generated with Claude Code

dfa1 and others added 2 commits June 26, 2026 16:37
ZstdEncodingEncoder had no null handling: only fully-valid primitive and
varbin columns could be written, while the reader already decoded
nullable vortex.zstd (validity child[0] + a packed valid-only payload).

Close the asymmetry. The encoder now accepts NullableData (primitive)
and String[] carrying nulls (utf8/binary): null positions are stripped
so only valid values reach the compressed frame, and the validity bitmap
is encoded as a Bool child[0] — mirroring the Rust reference. Adds
encode->decode round-trip tests for both shapes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Route NullableData straight to a configured nullable-capable encoder
(EncodingEncoder.acceptsNullable) instead of always masked-wrapping, so
an explicitly-configured vortex.zstd encodes nullable primitive columns
directly (validity as Bool child[0]). Default writes are unaffected:
DEFAULT_CODECS has no zstd, and the cascade path keeps the masked layout.

Reject non-nullable utf8/binary carrying a stray null rather than
silently emitting a nullable layout. Nullable varbin stays data-driven
(validity child only when nulls are present), matching the Rust ref.

Add Rust-interop ITs for nullable zstd (primitive I64 + utf8) and unit
tests for the all-null payload corner and the non-nullable-with-null guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
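
The routing and guard rules in the second commit can be pictured roughly like this. It is a sketch under assumptions: only the name acceptsNullable comes from the commit message, while the EncodingEncoder interface shape, the Layout enum, and both method names are hypothetical stand-ins for illustration.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical decision logic; names other than acceptsNullable are invented for this sketch. */
final class NullabilityGuardSketch {

    /** Stand-in for the writer's encoder interface; the real one likely has more methods. */
    interface EncodingEncoder {
        boolean acceptsNullable();
    }

    enum Layout { NO_VALIDITY_CHILD, VALIDITY_CHILD, MASKED_WRAP }

    /** Nullable primitive: go direct only when a nullable-capable encoder was configured. */
    static Layout layoutForNullablePrimitive(EncodingEncoder configured) {
        // Assumption: null models "nothing explicitly configured", i.e. the default cascade path.
        if (configured != null && configured.acceptsNullable()) {
            return Layout.VALIDITY_CHILD;
        }
        return Layout.MASKED_WRAP;
    }

    /** utf8/binary: reject stray nulls in non-nullable columns, otherwise stay data-driven. */
    static Layout layoutForVarbin(String[] values, boolean declaredNullable) {
        boolean hasNulls = Arrays.stream(values).anyMatch(Objects::isNull);
        if (hasNulls && !declaredNullable) {
            throw new IllegalArgumentException("non-nullable utf8/binary column contains a null");
        }
        // A validity child is emitted only when nulls are actually present.
        return hasNulls ? Layout.VALIDITY_CHILD : Layout.NO_VALIDITY_CHILD;
    }

    public static void main(String[] args) {
        EncodingEncoder zstd = () -> true;
        System.out.println(layoutForNullablePrimitive(zstd));
        System.out.println(layoutForNullablePrimitive(null));
        System.out.println(layoutForVarbin(new String[] {"a", null}, true));
        System.out.println(layoutForVarbin(new String[] {"a", "b"}, true));
    }
}
```

Throwing on a stray null rather than quietly switching layouts keeps the written schema and the written data in agreement, which is the point of the guard described above.
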
dfa1 merged commit 1b3713b into main on Jun 26, 2026 (6 checks passed)
dfa1 deleted the feat/zstd-nullable-encode branch on June 26, 2026 15:30