Commit 2816f37
ARROW-10812: [Rust] Make BooleanArray not a PrimitiveArray
This PR creates a new struct, `BooleanArray`, that replaces `PrimitiveArray<BooleanType>`, so that we no longer have to distinguish between bit-packed and non-bit-packed data.
This distinction causes the significant performance degradation described in ARROW-10453 and #8837.
The split is already visible in most of our kernels: the code for byte-width and bit-packed data is almost always different because of how offsets are computed. With this PR, the offset computation no longer depends on whether the data is bit-packed.
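To illustrate why the two layouts need different offset logic, here is a minimal standalone sketch. It does not use the arrow crate, and the function names `primitive_value` and `boolean_value` are hypothetical, not part of this PR. It assumes only the Arrow convention that booleans are packed least-significant bit first within each byte.

```rust
/// Read element `i` of a byte-width buffer (here `i32`), honouring an array offset.
/// The offset is a plain element index into the typed slice.
fn primitive_value(values: &[i32], offset: usize, i: usize) -> i32 {
    values[offset + i]
}

/// Read element `i` of a bit-packed boolean buffer, honouring an array offset.
/// The offset is a bit position, so it must be split into a byte index
/// and a bit index within that byte (least-significant bit first).
fn boolean_value(bits: &[u8], offset: usize, i: usize) -> bool {
    let pos = offset + i;
    (bits[pos / 8] >> (pos % 8)) & 1 == 1
}

fn main() {
    let ints = [10, 20, 30, 40];
    assert_eq!(primitive_value(&ints, 1, 2), 40);

    // Bits, LSB first: index 0 = 1, index 1 = 0, index 2 = 1, index 3 = 1.
    let bits = [0b0000_1101u8];
    assert!(boolean_value(&bits, 0, 0));
    assert!(!boolean_value(&bits, 0, 1));
    assert!(boolean_value(&bits, 1, 1)); // bit position 2
}
```

A generic `PrimitiveArray<T>` that also covers booleans has to choose between these two access paths for every value, which is the kind of distinction a dedicated `BooleanArray` avoids.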
IMPORTANT: this removes support for Boolean arrays in `UnionArray`, as `UnionArray` currently only supports `PrimitiveType`.
Micro benchmarks (worst to best; statistically insignificant results omitted):
| benchmark | variation |
|-------------- | -------------- |
| min nulls 512 | 33.7 |
| record_batches_to_csv | 23.1 |
| array_string_from_vec 256 | 5.6 |
| array_string_from_vec 512 | 5.2 |
| take bool nulls 512 | 4.9 |
| cast int32 to int64 512 | 2.5 |
| equal_512 | 2.3 |
| filter u8 very low selectivity | 2.2 |
| array_slice 512 | 2.1 |
| take bool nulls 1024 | 2.0 |
| cast int64 to int32 512 | 1.6 |
| min 512 | 1.6 |
| take i32 512 | 1.1 |
| add 512 | 1.1 |
| array_slice 2048 | 1.0 |
| length | 1.0 |
| filter u8 low selectivity | 0.9 |
| filter u8 high selectivity | 0.9 |
| array_string_from_vec 128 | 0.9 |
| cast int32 to float64 512 | 0.9 |
| cast timestamp_ms to i64 512 | 0.8 |
| take str null indices 512 | 0.6 |
| sum 512 | 0.4 |
| filter context u8 very low selectivity | -0.7 |
| take i32 1024 | -0.9 |
| filter context f32 very low selectivity | -0.9 |
| cast float64 to float32 512 | -1.0 |
| equal_nulls_512 | -1.0 |
| cast time32s to time32ms 512 | -1.1 |
| sort 2^12 | -1.2 |
| struct_array_from_vec 128 | -1.4 |
| array_from_vec 256 | -1.4 |
| array_from_vec 128 | -1.5 |
| filter context u8 high selectivity | -1.6 |
| limit 512, 512 | -1.7 |
| equal_string_nulls_512 | -1.8 |
| take i32 nulls 1024 | -1.8 |
| struct_array_from_vec 512 | -1.9 |
| filter context f32 high selectivity | -2.0 |
| cast timestamp_ms to timestamp_ns 512 | -2.2 |
| take i32 nulls 512 | -2.3 |
| buffer_bit_ops or | -2.4 |
| array_from_vec 512 | -2.6 |
| cast float64 to uint64 512 | -2.7 |
| take str 512 | -2.8 |
| min nulls string 512 | -3.1 |
| cast int32 to int32 512 | -3.3 |
| array_slice 128 | -3.3 |
| filter context u8 w NULLs very low selectivity | -3.3 |
| buffer_bit_ops and | -3.4 |
| struct_array_from_vec 256 | -4.2 |
| cast int32 to uint32 512 | -4.5 |
| multiply 512 | -5.2 |
| equal_string_512 | -5.5 |
| take str null values null indices 1024 | -6.8 |
| sum nulls 512 | -13.3 |
| add_nulls_512 | -17.6 |
| like_utf8 scalar contains | -17.8 |
| nlike_utf8 scalar contains | -17.9 |
| nlike_utf8 scalar complex | -24.6 |
| like_utf8 scalar complex | -25.2 |
| cast time64ns to time32s 512 | -42.7 |
| cast date64 to date32 512 | -49.1 |
| cast date32 to date64 512 | -50.7 |
| nlike_utf8 scalar starts with | -51.1 |
| nlike_utf8 scalar ends with | -55.1 |
| like_utf8 scalar ends with | -55.5 |
| like_utf8 scalar starts with | -56.3 |
| nlike_utf8 scalar equals | -67.8 |
| like_utf8 scalar equals | -74.2 |
| eq Float32 | -75.7 |
| gt_eq Float32 | -76.1 |
| lt_eq Float32 | -76.5 |
| not | -77.1 |
| and | -78.6 |
| or | -78.7 |
| lt_eq scalar Float32 | -79.4 |
| eq scalar Float32 | -82.1 |
| neq Float32 | -82.1 |
| lt scalar Float32 | -82.1 |
| lt Float32 | -82.3 |
| gt Float32 | -82.4 |
| gt_eq scalar Float32 | -82.4 |
| neq scalar Float32 | -82.6 |
| gt scalar Float32 | -84.7 |
Closes #8842 from jorgecarleitao/boolean
Lead-authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
1 parent a774ae7 commit 2816f37
24 files changed
Lines changed: 988 additions & 434 deletions
File tree
- rust
  - arrow
    - benches
    - src
      - array
        - equal
      - compute/kernels
      - csv
      - util
  - parquet/src/arrow