Slightly improves the performance of writing rows.
What changes are included in this PR?
To avoid cloning the SchemaRef, we pass in the schema as a separate parameter. I also marked the benchmark functions as inline(never) so that they stand out more in the profiler. Since they operate on large chunks of data, this should not create any overhead.
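As a rough illustration of the idea (not the actual code from this PR), the sketch below contrasts fetching an `Arc`-backed schema inside a per-row function with fetching it once and passing it down by reference. `Schema`, `Batch`, `field_widths` and the `row_width_*` / `total_bytes_*` functions are hypothetical stand-ins invented for this example. Only the `Arc` cloning pattern and the `#[inline(never)]` attribute correspond to what is described above.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a schema; only the per-field byte widths matter here.
pub struct Schema {
    pub field_widths: Vec<usize>,
}

pub type SchemaRef = Arc<Schema>;

// Hypothetical stand-in for a record batch that owns a shared schema.
pub struct Batch {
    schema: SchemaRef,
    pub num_rows: usize,
}

impl Batch {
    pub fn new(schema: SchemaRef, num_rows: usize) -> Self {
        Batch { schema, num_rows }
    }

    // Returns an owned handle, so every call bumps the atomic reference count.
    pub fn schema(&self) -> SchemaRef {
        Arc::clone(&self.schema)
    }
}

// Before: the per-row function fetches (and therefore clones) the schema itself.
fn row_width_cloning(batch: &Batch, _row: usize) -> usize {
    let schema = batch.schema();
    schema.field_widths.iter().sum()
}

// After: the caller passes the schema as a separate, borrowed parameter.
fn row_width_borrowed(_batch: &Batch, schema: &Schema, _row: usize) -> usize {
    schema.field_widths.iter().sum()
}

// inline(never) keeps this function visible as its own frame in a profiler.
#[inline(never)]
pub fn total_bytes_cloning(batch: &Batch) -> usize {
    (0..batch.num_rows)
        .map(|row| row_width_cloning(batch, row))
        .sum()
}

#[inline(never)]
pub fn total_bytes_borrowed(batch: &Batch) -> usize {
    // One clone per batch instead of one clone per row.
    let schema = batch.schema();
    (0..batch.num_rows)
        .map(|row| row_width_borrowed(batch, &schema, row))
        .sum()
}
```

Both variants compute the same result. The borrowed one simply moves the atomic increment/decrement pair out of the per-row loop.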
Benchmark results on i7-10510U, run with $ RUSTFLAGS="-C target-cpu=skylake" cargo bench --features row,jit --bench jit:
master branch:
row serializer time: [2.0518 s 2.0745 s 2.1029 s]
row serializer jit time: [1.8530 s 1.8626 s 1.8723 s]
this branch:
row serializer time: [1.6923 s 1.7042 s 1.7161 s]
row serializer jit time: [1.8468 s 1.8562 s 1.8657 s]
If I understand the code correctly, the JIT calls the same write_field_xyz functions as the Rust version but is not able to inline them. So it avoids the type dispatch, but makes several more function calls than the Rust code (which is able to inline some of the write_field functions). It should be possible to speed up the JIT a lot if it could directly generate code corresponding to the write_field methods, which could then be inlined and would also avoid the downcasting.
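To make the trade-off concrete, here is a minimal sketch of the two shapes as I understand them. It is a model only, not the actual row writer or JIT code: `Column`, `FieldWriter`, `resolve_writers` and the `*_opaque` functions are hypothetical names, and plain function pointers stand in for the calls that JIT-generated code would make. The first path dispatches on the type for every field but lets the compiler inline the small write functions. The second resolves the writer once per column, but each field then costs an indirect call that cannot be inlined, and the callee still has to downcast.

```rust
// Hypothetical column representation; real code would use Arrow arrays.
pub enum Column {
    Int32(Vec<i32>),
    Float64(Vec<f64>),
}

#[inline]
fn write_field_i32(buf: &mut Vec<u8>, v: i32) {
    buf.extend_from_slice(&v.to_le_bytes());
}

#[inline]
fn write_field_f64(buf: &mut Vec<u8>, v: f64) {
    buf.extend_from_slice(&v.to_le_bytes());
}

// Interpreted path: type dispatch per field, but the callees can be inlined.
pub fn write_row_dispatch(cols: &[Column], row: usize, buf: &mut Vec<u8>) {
    for col in cols {
        match col {
            Column::Int32(v) => write_field_i32(buf, v[row]),
            Column::Float64(v) => write_field_f64(buf, v[row]),
        }
    }
}

// Stand-in for what generated code calls: an opaque function per field type.
pub type FieldWriter = fn(&Column, usize, &mut Vec<u8>);

fn write_i32_opaque(col: &Column, row: usize, buf: &mut Vec<u8>) {
    // The "downcast" still happens inside the opaque callee.
    if let Column::Int32(v) = col {
        write_field_i32(buf, v[row]);
    }
}

fn write_f64_opaque(col: &Column, row: usize, buf: &mut Vec<u8>) {
    if let Column::Float64(v) = col {
        write_field_f64(buf, v[row]);
    }
}

// Resolve the writer for each column once, ahead of the row loop.
pub fn resolve_writers(cols: &[Column]) -> Vec<FieldWriter> {
    cols.iter()
        .map(|c| match c {
            Column::Int32(_) => write_i32_opaque as FieldWriter,
            Column::Float64(_) => write_f64_opaque as FieldWriter,
        })
        .collect()
}

// JIT-like path: no per-field type dispatch, but one indirect call per field.
pub fn write_row_precompiled(
    writers: &[FieldWriter],
    cols: &[Column],
    row: usize,
    buf: &mut Vec<u8>,
) {
    for (write, col) in writers.iter().zip(cols) {
        write(col, row, buf);
    }
}
```

Both paths produce identical bytes for a row. The difference is only in where the dispatch happens and whether the field writers are visible to the optimizer.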
After searching and discussing with @houqp, it seems complicated to make Cranelift inline Rust functions into JIT code. I want to try out LLVM, which has both assembly and IR inlining capabilities. I will report back here if I make some progress.
Quoting the Postgres JIT docs here:
One big advantage of JITing expressions is that it can significantly
reduce the overhead of PostgreSQL's extensible function/operator
mechanism, by inlining the body of called functions/operators.
It obviously is undesirable to maintain a second implementation of
commonly used functions, just for inlining purposes. Instead we take
advantage of the fact that the Clang compiler can emit LLVM IR.
The ability to do so allows us to get the LLVM IR for all operators
(e.g. int8eq, float8pl etc), without maintaining two copies. These
bitcode files get installed into the server's
$pkglibdir/bitcode/postgres/
Using existing LLVM functionality (for parallel LTO compilation),
additionally an index is over these is stored to
$pkglibdir/bitcode/postgres.index.bc
Which issue does this PR close?
Closes #1973.