4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- `DType.isUnsigned()` — `true` for the unsigned integer primitives (`U8`–`U64`), `false` otherwise. ([#159](https://github.com/dfa1/vortex-java/issues/159))

### Changed

- The `vortex.zstd` encoding now compresses and decompresses through `io.github.dfa1.zstd:zstd` (FFM bindings to the native `libzstd`) instead of `io.airlift:aircompressor-v3`. Consumers of `vortex.zstd` must declare the `zstd` dependency plus the `zstd-native-<platform>` artifact for their platform (e.g. `zstd-native-osx-aarch64`, `zstd-native-linux-x86_64`).

### Fixed

- Zone-map pruning now compares filter values in the *column's* type domain rather than by the boxed value's type. A predicate whose value is boxed at a different width (e.g. `Integer` on an `I64` column) — or any value on a `U64` column — previously pruned nothing and silently degraded to a full scan; it now prunes correctly (unsigned columns by unsigned order). As part of this, a filter value genuinely incomparable to its column (e.g. a `String` against a numeric column) now raises `VortexException` during the scan instead of silently disabling pruning — a behaviour change for callers that relied on the previous silent full scan. ([#159](https://github.com/dfa1/vortex-java/issues/159))
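As a rough illustration of the zone-map entry above, the following Java sketch compares a boxed filter value against a zone's min/max in the column's type domain rather than by the boxed type. `ZoneMapSketch`, `Kind`, and `canPrune` are hypothetical names, not vortex-java API, and `IllegalArgumentException` stands in for `VortexException`. The sketch assumes an equality predicate and assumes `U64` values are carried as raw 64-bit patterns in a `long`.

```java
final class ZoneMapSketch {
    private ZoneMapSketch() {}

    // Hypothetical stand-in for the column's DType; only two cases are modelled.
    enum Kind { I64, U64 }

    // Widens any boxed integer to long so that Integer and Long compare alike.
    // Anything else is treated as incomparable and rejected loudly.
    static long widen(Object filterValue) {
        if (filterValue instanceof Byte
                || filterValue instanceof Short
                || filterValue instanceof Integer
                || filterValue instanceof Long) {
            return ((Number) filterValue).longValue();
        }
        throw new IllegalArgumentException(
                "filter value is not comparable to an integer column: " + filterValue);
    }

    // Returns true when the zone [min, max] cannot contain the filter value,
    // i.e. an equality predicate may skip the zone.
    static boolean canPrune(Kind kind, long min, long max, Object filterValue) {
        long v = widen(filterValue);
        // Assumption: for U64 the long holds the raw unsigned bit pattern,
        // so ordering must use unsigned comparison.
        int vsMin = kind == Kind.U64 ? Long.compareUnsigned(v, min) : Long.compare(v, min);
        int vsMax = kind == Kind.U64 ? Long.compareUnsigned(v, max) : Long.compare(v, max);
        return vsMin < 0 || vsMax > 0;
    }

    public static void main(String[] args) {
        // Integer on an I64 column: widened, then compared as a long.
        System.out.println(canPrune(Kind.I64, 10L, 20L, Integer.valueOf(5)));  // true
        System.out.println(canPrune(Kind.I64, 10L, 20L, Integer.valueOf(15))); // false
        // U64 zone whose max is 2^64 - 1 (bit pattern -1L): 5 lies inside it.
        System.out.println(canPrune(Kind.U64, 1L, -1L, Long.valueOf(5L)));     // false
    }
}
```

The last call is the case a signed comparison gets wrong: read as signed, the zone's max of `-1L` is below `5`, so the zone would be skipped even though it contains the value.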
8 changes: 4 additions & 4 deletions bom/pom.xml
Expand Up @@ -18,11 +18,11 @@
<dependencyManagement>
<dependencies>
<!-- optional transitive deps: consumers who use vortex.zstd must declare
aircompressor explicitly; the BOM pins the tested version. -->
the zstd FFM bindings explicitly; the BOM pins the tested version. -->
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<version>${aircompressor.version}</version>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<version>${zstd.version}</version>
</dependency>
<dependency>
<groupId>io.github.dfa1.vortex</groupId>
4 changes: 2 additions & 2 deletions calcite/pom.xml
Expand Up @@ -40,8 +40,8 @@
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<scope>test</scope>
</dependency>
<dependency>
4 changes: 2 additions & 2 deletions cli/pom.xml
Expand Up @@ -34,8 +34,8 @@
<artifactId>vortex-inspector</artifactId>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
</dependency>
<dependency>
<!-- Required by hardwood-core (via vortex-parquet) for ZSTD-compressed
4 changes: 2 additions & 2 deletions csv/pom.xml
Expand Up @@ -28,8 +28,8 @@
</dependency>
<!-- testing -->
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<scope>test</scope>
</dependency>
<dependency>
4 changes: 2 additions & 2 deletions integration/pom.xml
Expand Up @@ -19,8 +19,8 @@
<dependencies>
<!-- testing -->
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<scope>test</scope>
</dependency>
<!-- hardwood (Parquet reader) requires zstd-jni to decompress ZSTD-compressed Parquet pages -->
4 changes: 2 additions & 2 deletions jdbc/pom.xml
Expand Up @@ -24,8 +24,8 @@
</dependency>
<!-- testing -->
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<scope>test</scope>
</dependency>
<dependency>
6 changes: 3 additions & 3 deletions parquet/pom.xml
Expand Up @@ -35,10 +35,10 @@
<artifactId>vortex-reader</artifactId>
<scope>test</scope>
</dependency>
<!-- reader declares aircompressor-v3 optional; pull it in so the test can decode ZSTD chunks -->
<!-- reader declares the zstd bindings optional; pull it in so the test can decode ZSTD chunks -->
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<scope>test</scope>
</dependency>
<dependency>
4 changes: 2 additions & 2 deletions performance/pom.xml
Expand Up @@ -56,8 +56,8 @@
<scope>compile</scope>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
</dependency>
<!-- hardwood (Parquet reader) requires zstd-jni to decompress ZSTD-compressed Parquet pages -->
<dependency>
65 changes: 61 additions & 4 deletions pom.xml
Expand Up @@ -58,7 +58,8 @@
<maven.compiler.release>25</maven.compiler.release>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<!-- production -->
<aircompressor.version>3.6</aircompressor.version>
<zstd.version>0.1</zstd.version>


<fastcsv.version>4.3.0</fastcsv.version>
<hardwood.version>1.0.0.CR2</hardwood.version>
Expand Down Expand Up @@ -190,9 +191,9 @@
<version>${hardwood.version}</version>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<version>${aircompressor.version}</version>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<version>${zstd.version}</version>
</dependency>
<dependency>
<groupId>com.github.luben</groupId>
Expand Down Expand Up @@ -435,6 +436,62 @@
</build>

<profiles>
<profile>
<!-- io.github.dfa1.zstd:zstd ships no bundled native; its FFM loader reads
libzstd from a /native/<platform>/ classpath resource carried by a separate
zstd-native-<platform> artifact. Pull in the one matching the build host.
Add more platforms as needed. -->
<id>zstd-native-osx-aarch64</id>
<activation>
<os>
<family>mac</family>
<arch>aarch64</arch>
</os>
</activation>
<dependencies>
<dependency>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd-native-osx-aarch64</artifactId>
<version>${zstd.version}</version>
<scope>runtime</scope>
</dependency>
</dependencies>
</profile>
<profile>
<id>zstd-native-linux-x86_64</id>
<activation>
<os>
<family>unix</family>
<name>linux</name>
<arch>amd64</arch>
</os>
</activation>
<dependencies>
<dependency>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd-native-linux-x86_64</artifactId>
<version>${zstd.version}</version>
<scope>runtime</scope>
</dependency>
</dependencies>
</profile>
<profile>
<id>zstd-native-windows-x86_64</id>
<activation>
<os>
<family>windows</family>
<arch>amd64</arch>
</os>
</activation>
<dependencies>
<dependency>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd-native-windows-x86_64</artifactId>
<version>${zstd.version}</version>
<scope>runtime</scope>
</dependency>
</dependencies>
</profile>
<profile>
<!-- Activated in CI: ./mvnw verify -P coverage -->
<!-- Attaches the JaCoCo agent to surefire/failsafe and writes one
4 changes: 2 additions & 2 deletions reader/pom.xml
Expand Up @@ -21,8 +21,8 @@
<artifactId>vortex-core</artifactId>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<optional>true</optional>
</dependency>
<!-- testing -->
Original file line number Diff line number Diff line change
Expand Up @@ -18,12 +18,11 @@
import io.github.dfa1.vortex.reader.array.MaterializedShortArray;
import io.github.dfa1.vortex.reader.array.VarBinArray;

import io.airlift.compress.v3.zstd.ZstdDecompressor;
import io.airlift.compress.v3.zstd.ZstdJavaDecompressor;
import io.github.dfa1.zstd.ZstdDecompressCtx;

import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

/// Read-only decoder for `vortex.zstd`.
public final class ZstdEncodingDecoder implements EncodingDecoder {
Expand Down Expand Up @@ -52,7 +51,7 @@ public Array decode(DecodeContext ctx) {
}
if (meta.dictionary_size() != 0) {
throw new VortexException(EncodingId.VORTEX_ZSTD,
"dictionary-compressed Zstd segments are not supported (pure-Java decoder)");
"dictionary-compressed Zstd segments are not supported");
}

BoolArray validity = null;
Expand Down Expand Up @@ -149,25 +148,38 @@ private static MemorySegment decompressFrames(
int frameCount,
long totalUncompressed
) {
// Zero-copy: decompress each native frame straight into its slice of the arena output,
// no heap byte[] bounce. The mmap'd file buffers are already native; the scratch arena
// only services the heap segments unit tests hand in.
MemorySegment out = ctx.arena().allocate(totalUncompressed);
ZstdDecompressor decompressor = new ZstdJavaDecompressor();
long outOffset = 0;
for (int i = 0; i < frameCount; i++) {
MemorySegment frameSeg = ctx.buffer(i);
byte[] compressed = frameSeg.toArray(ValueLayout.JAVA_BYTE);
int uncompSize = (int) meta.frames().get(i).uncompressed_size();
byte[] temp = new byte[uncompSize];
int written = decompressor.decompress(compressed, 0, compressed.length, temp, 0, uncompSize);
if (written != uncompSize) {
throw new VortexException(EncodingId.VORTEX_ZSTD,
"frame " + i + ": expected " + uncompSize + " bytes, got " + written);
try (ZstdDecompressCtx dctx = new ZstdDecompressCtx();
Arena scratch = Arena.ofConfined()) {
long outOffset = 0;
for (int i = 0; i < frameCount; i++) {
MemorySegment src = asNative(ctx.buffer(i), scratch);
int uncompSize = (int) meta.frames().get(i).uncompressed_size();
long written = dctx.decompress(out.asSlice(outOffset, uncompSize), src);
if (written != uncompSize) {
throw new VortexException(EncodingId.VORTEX_ZSTD,
"frame " + i + ": expected " + uncompSize + " bytes, got " + written);
}
outOffset += uncompSize;
}
MemorySegment.copy(MemorySegment.ofArray(temp), 0, out, outOffset, uncompSize);
outOffset += uncompSize;
}
return out;
}

/// Returns `seg` unchanged when it is already native (the production mmap path); otherwise
/// copies it into `scratch` so the zero-copy native API can read it.
private static MemorySegment asNative(MemorySegment seg, Arena scratch) {
if (seg.isNative()) {
return seg;
}
MemorySegment copy = scratch.allocate(Math.max(seg.byteSize(), 1));
MemorySegment.copy(seg, 0, copy, 0, seg.byteSize());
return copy.asSlice(0, seg.byteSize());
}

private static Array buildArray(DType dtype, long n, MemorySegment decompressed, DecodeContext ctx) {
if (dtype instanceof DType.Primitive dt) {
return buildPrimitive(dt, n, decompressed);
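The `asNative` helper in the hunk above relies only on the standard `java.lang.foreign` API, so the pattern can be tried without the zstd bindings. The following is a minimal sketch, assuming Java 22 or later. `NativeSegments` and `concat` are hypothetical names, and a plain byte copy stands in for `ZstdDecompressCtx.decompress`, so this shows only the segment handling, not decompression.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Arrays;

final class NativeSegments {
    private NativeSegments() {}

    // Returns seg unchanged when it is already native; otherwise copies it
    // into scratch so that a native-only API could read it.
    static MemorySegment asNative(MemorySegment seg, Arena scratch) {
        if (seg.isNative()) {
            return seg;
        }
        // Allocate at least one byte, then slice back to the real size.
        MemorySegment copy = scratch.allocate(Math.max(seg.byteSize(), 1));
        MemorySegment.copy(seg, 0, copy, 0, seg.byteSize());
        return copy.asSlice(0, seg.byteSize());
    }

    // Writes each frame into its own slice of one output segment.
    // Stand-in: a plain copy replaces the real per-frame decompression.
    static MemorySegment concat(Arena outArena, MemorySegment... frames) {
        long total = 0;
        for (MemorySegment frame : frames) {
            total += frame.byteSize();
        }
        MemorySegment out = outArena.allocate(Math.max(total, 1)).asSlice(0, total);
        // The scratch arena lives only for the loop; out belongs to outArena.
        try (Arena scratch = Arena.ofConfined()) {
            long offset = 0;
            for (MemorySegment frame : frames) {
                MemorySegment src = asNative(frame, scratch);
                out.asSlice(offset, src.byteSize()).copyFrom(src);
                offset += src.byteSize();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment heap = MemorySegment.ofArray(new byte[] {1, 2, 3});
            MemorySegment nativeSeg = arena.allocate(2);
            nativeSeg.set(ValueLayout.JAVA_BYTE, 0, (byte) 4);
            nativeSeg.set(ValueLayout.JAVA_BYTE, 1, (byte) 5);
            MemorySegment out = concat(arena, heap, nativeSeg);
            // prints [1, 2, 3, 4, 5]
            System.out.println(Arrays.toString(out.toArray(ValueLayout.JAVA_BYTE)));
        }
    }
}
```

Only the heap-backed frame is copied. The native frame is passed through untouched, which matches the diff's stated intent of keeping the mmap path copy-free.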
Original file line number Diff line number Diff line change
Expand Up @@ -114,9 +114,9 @@ void scan_publishedFixture_decodesAllRows(String fixture) throws Exception {
assertThat(totalRows).isGreaterThan(0);
}

// The published zstd.vortex fixture is dictionary-compressed; the pure-Java decoder has no
// Zstd dictionary support and must fail fast with a clear message rather than mis-decode.
// Tracked by https://github.com/dfa1/vortex-java/issues/104 (upstream airlift/aircompressor#119).
// The published zstd.vortex fixture is dictionary-compressed; the decoder has no Zstd
// dictionary support and must fail fast with a clear message rather than mis-decode.
// Tracked by https://github.com/dfa1/vortex-java/issues/104.
@Test
void scan_zstdVortex_rejectsDictionaryCompression() throws Exception {
// Given
4 changes: 2 additions & 2 deletions writer/pom.xml
Expand Up @@ -21,8 +21,8 @@
<artifactId>vortex-core</artifactId>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>aircompressor-v3</artifactId>
<groupId>io.github.dfa1.zstd</groupId>
<artifactId>zstd</artifactId>
<optional>true</optional>
</dependency>
<!-- testing -->