ARROW-373: [C++] JSON serialization format for testing #202
wesm wants to merge 27 commits into apache:master
This has ended up being quite a large project. Getting closer to something workable, though; I will try to get there this week.

Getting closer here. @xhochy I won't do it in this PR, but I'd sort of like to consolidate all the array classes into array.h and the builder classes into builder.h, then remove the
cpp/src/arrow/CMakeLists.txt
What's our policy on - vs _ in filenames?
I'm not sure there's a hard and fast rule. Kudu seems to use dashes for internal files (and for unit tests), and _ as a separator in "public" headers:
https://github.com/apache/kudu/tree/master/src/kudu/client
so you would have
widget_xyz-test.cc
or
widget_xyz.h
widget_xyz-internal.h
https://google.github.io/styleguide/cppguide.html#File_Names
let me know what you think
The Kudu conventions seem fine to me. As long as we have some kind of convention I'm happy with it.
Okay. By this reasoning we should rename bit-util.h and memory-pool.h to bit_util.h and memory_pool.h. I can do this here or in another patch.
OK, I'm done with this patch for now (before it gets any bigger), with a round-trip JSON record batch test. Additional features and testing can get done in the course of building the integration test harness.
For the sake of documentation and understanding, it would be nice to have a test that reads JSON from a string embedded in the code here. I'm not sure how large the minimal possible version would be, but it would be really helpful for understanding the format.
Agreed, let me try to write a minimal example now.
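For illustration only, here is a rough sketch of what a JSON document embedded as a C++ raw string literal might look like. The key names (`schema`, `fields`, `batches`, `columns`, `DATA`, `VALIDITY`) and the overall layout are assumptions and are not confirmed anywhere in this thread. The helper `DelimitersBalanced` is a hypothetical sanity check on the literal; it is not a JSON parser and not part of Arrow.

```cpp
#include <cstddef>
#include <string>

// Hypothetical minimal document: one nullable int32 column "foo" with five
// values. All key names here are assumptions, not taken from this thread.
static const char* kMinimalExample = R"example(
{
  "schema": {
    "fields": [
      {"name": "foo",
       "type": {"name": "int", "isSigned": true, "bitWidth": 32},
       "nullable": true, "children": []}
    ]
  },
  "batches": [
    {"count": 5,
     "columns": [
       {"name": "foo", "count": 5,
        "DATA": [1, 2, 3, 4, 5],
        "VALIDITY": [1, 0, 1, 1, 1]}
     ]}
  ]
}
)example";

// Returns true if braces and brackets outside string literals are balanced.
// This only guards against typos in the embedded literal.
bool DelimitersBalanced(const std::string& text) {
  std::string stack;  // open delimiters seen so far
  bool in_string = false;
  for (std::size_t i = 0; i < text.size(); ++i) {
    const char c = text[i];
    if (in_string) {
      if (c == '\\') {
        ++i;  // skip the escaped character
      } else if (c == '"') {
        in_string = false;
      }
      continue;
    }
    if (c == '"') {
      in_string = true;
    } else if (c == '{' || c == '[') {
      stack.push_back(c);
    } else if (c == '}' || c == ']') {
      const char open = (c == '}') ? '{' : '[';
      if (stack.empty() || stack.back() != open) return false;
      stack.pop_back();
    }
  }
  return stack.empty() && !in_string;
}
```

A real test would presumably hand the literal to the JSON reader added in this patch and compare the resulting record batch against expected arrays; the check above only confirms that the embedded string is structurally plausible.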
I also changed this code to use `arrow/api.h` in some places to be less sensitive to some kinds of header changes. Author: Wes McKinney <wes.mckinney@twosigma.com> Closes apache#202 from wesm/PARQUET-797 and squashes the following commits: d909443 [Wes McKinney] Fix arrow/util/status.h use 72758ea [Wes McKinney] Update arrow hash 4802052 [Wes McKinney] Update parquet/arrow to use arrow/api.h where relevant, ARROW-418 API changes Change-Id: I2fd353425abc2fcba337a127c595f059786c9daf
updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.2 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.2...v0.11.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Initial commit * init project * complete most of the annotations * fix FixedSizeBufferWriter init annotation * bump 10.0.1.2 * complete parquet core annotations * bump 10.0.1.3 * re-export modules * fix: add return type for foreign_buffer * fix output_stream and read_message annotations * ci: add release job * pre-commit specify flake8 version to 5.0.4 * flake8 ignore F821 for private files * optimize annotations * bump 10.0.1.4 * if param supports IOBase, it should also support NativeFile * bump 10.0.1.5 * pre-commit adds mypy lint * bump 10.0.1.6 * fix ci name * Remove version restrictions for Python. * release 10.0.1.7 * update poetry ci * Fix stubs for Table factory methods The main problem was that these were annotated as instance methods rather than static/class methods, but I've added some detail, too. * update pre-commit * update * fix: make fs.FileSystem.from_uri and hdfs.HadoopFileSystem.from_uri as classmethod * fix: fix read_metadata and read_schema wrong annotations (#11) * fix: typo S3FileSystem schema -> scheme (#12) * bump version 10.0.1.8 (#13) * . 
(#16) * make DataType hashable (#22) * pa.table support recordbatch (#20) * RecordBatchStreamReader supports next (#18) * add RecordBatch.to_pylist (#23) * precise return types for to_pandas (#25) * bump version 10.0.1.9 (#26) * [pre-commit.ci] pre-commit autoupdate (#27) * [pre-commit.ci] pre-commit autoupdate (#28) * Fix types in FlightDescriptor class (#29) * Fix types in FlightDescriptor class * Add argument types * chore: update pre-commit config (#30) * build: use `pixi` to manage project (#31) * chore: add taplo config (#32) * chore: update LICENSE date (#33) * doc: add CODE_OF_CONDUCT.md (#34) * [pre-commit.ci] pre-commit autoupdate (#38) * [pre-commit.ci] pre-commit autoupdate (#39) updates: - [github.com/astral-sh/ruff-pre-commit: v0.5.7 → v0.6.1](astral-sh/ruff-pre-commit@v0.5.7...v0.6.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (#48) updates: - [github.com/astral-sh/ruff-pre-commit: v0.6.1 → v0.6.2](astral-sh/ruff-pre-commit@v0.6.1...v0.6.2) - [github.com/pre-commit/mirrors-mypy: v1.11.1 → v1.11.2](pre-commit/mirrors-mypy@v1.11.1...v1.11.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * refactor: rewrite type annotations by hand. 
(#35) * chore: restart * update ruff config * build: add extra dependencies * update mypy config * feat: add util.pyi * feat: add types.pyi * feat: impl lib.pyi * update * feat: add acero.pyi * feat: add compute.pyi * add benchmark.pyi * add cffi * feat: add csv.pyi * disable isort single line * reformat * update compute.pyi * add _auzurefs.pyi * add _cuda.pyi * add _dataset.pyi * rename _stub_typing.pyi -> _stubs_typing.pyi * add _dataset_orc.pyi * add pyarrow-stubs/_dataset_parquet_encryption.pyi * add _dataset_parquet.pyi * add _feather.pyi * feat: add _flight.pyi * add _fs.pyi * add _gcsfs.pyi * add _hdfs.pyi * add _json.pyi * add _orc.pyi * add _parquet_encryption.pyi * add _parquet.pyi * update * add _parquet.pyi * add _s3fs.pyi * add _substrait.pyi * update * update * add parquet/core.pyi * add parquet/encryption.pyi * add BufferProtocol * impl _filesystemdataset_write * add dataset.pyi * add feather.pyi * add flight.pyi * add fs.pyi * add gandiva.pyi * add json.pyi * add orc.pyi * add pandas_compat.pyi * add substrait.pyi * update util.pyi * add interchange * add __lib_pxi * update __lib_pxi * update * update * add types.pyi * feat: add scalar.pyi * update types.pyi * update types.pyi * update scalar.pyi * update * update * update * update * update * update * feat: impl array * feat: add builder.pyi * add scipy * add tensor.pyi * feat: impl NativeFile * update io.pyi * complete io.pyi * add ipc.pyi * mv benchmark.pyi into __lib_pxi * add table.pyi * do re-export in lib.pyi * fix io.pyi * update * optimize scalar.pyi * optimize indices * complete ipc.pyi * update * fix NullableIterable * fix string array * ignore overload-overlap error * fix _Tabular.__getitem__ * remove additional_dependencies * remove check-mypy.sh (apache#49) * release 20240828 (apache#50) * fix release tag (apache#51) * ci: install hatch by pip (apache#52) * ci: fix hatch keyring (apache#53) * ci: use Release environment (apache#54) * remove Scalar generic type var _IsValid (apache#56) * 
remove Scalar generic type var _IsValid * make Array, Scalar, Types generic type var as covariant type (apache#57) * remove Field generic type var _Nullable (apache#58) * remove Field generic type var _Nullable * fix: pa.dictionary and pa.schema annotation (apache#59) * fix pa.dictionary annotation * fix: schema annotation * release new version (apache#60) * [pre-commit.ci] pre-commit autoupdate (apache#62) * release: 2024.9.3 (apache#63) use new date release format %Y.%m.%d * support pyarrow compute funcs (apache#61) * update compute.pyi * impl Aggregation funcs * impl arithmetic * imit bit-wise functions * imit rounding functions * optimize annotation * impl logarithmic functions * update * impl comparisons funcs * impl logical funcs * impl string predicates and transforms * impl string padding * impl string trimming * impl string splitting and component extraction * impl string joining and slicing * impl Containment tests * impl Categorizations * impl Structural transforms * impl Conversions * impl Temporal component extraction * impl random, Timezone handling * impl Array-wise functions * fix timestamp scalar * support build array with list of scalar (apache#64) * release 2024.9.4 (apache#65) * Version follows the version of pyarrow (apache#66) * import parquet.core into parquet __init__.py (apache#67) Update __init__.pyi * release 17.1 (apache#69) * fix: add missing submodule benchmark, csv and cuda (apache#71) * release 17.2 (apache#72) * fix: from_pylist covariance (apache#73) * [pre-commit.ci] pre-commit autoupdate (apache#74) * Fix return type for middleware factory's start_call (apache#75) It can return None if middleware is not needed for a given call. 
* release 17.3 (apache#76) * fix: add missing return type in FlightDescriptor static methods (apache#80) * Support Tabular filter with Expression (apache#81) support Tabular filter with Expression * Support compute functions to accept Expression as parameter (apache#82) * fix: Fix the return value of Expression comparison (apache#83) * release 17.4 (apache#84) * fix: fix the array return type (apache#89) * a few type improvements, mostly flight related (apache#90) * FlightError.extra_info -> bytes * annotate FlightStreamReader.cancel return * BasicAuth serialize/deserialize * RecordBatchFileReader.schema * actually str | bytes * add_type_to_Field (apache#87) * add_type_to_Field * Field.type should return the covariant DataType --------- Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> * Support fsspec.AbstractFileSystem (apache#88) * supported_filesystem * fixes * remove unused import --------- Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> * release 17.5 (apache#91) * [pre-commit.ci] pre-commit autoupdate (apache#95) * fix: parquet not accepting NativeFile (apache#98) * feat: support pa.Buffer buffer protocol (apache#99) * feat: Support `compute` functions to accept ChunkedArray. 
(apache#100) * release 17.6 (apache#101) * [pre-commit.ci] pre-commit autoupdate (apache#102) * working towards making return signatures only have one type (mean and exp) (apache#105) * group_by_returns_TableGroupBy * return_single_type_for_mean_exp * revert table.pyi * compute.mean does not support BinaryScalar or BinaryArray --------- Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> * a table group_by was returing Self but should return TableGroupBy (apache#104) group_by_returns_TableGroupBy * [pre-commit.ci] pre-commit autoupdate (apache#106) updates: - [github.com/pre-commit/pre-commit-hooks: v4.6.0 → v5.0.0](pre-commit/pre-commit-hooks@v4.6.0...v5.0.0) - [github.com/astral-sh/ruff-pre-commit: v0.6.7 → v0.6.9](astral-sh/ruff-pre-commit@v0.6.7...v0.6.9) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: RecordBatch missing `from_arrays` and `from_pandas` (apache#108) * release 17.7 (apache#109) * fix_combine_chunks (apache#110) * make Self backward compatible (apache#115) * fix: update ConvertOptions (apache#114) * add type property to Array (apache#112) * add type property to Array * Array.type should return covariant --------- Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> * release 17.8 (apache#117) * Add include_columns parameter in ConvertOptions (apache#118) * add list[str] overload to rename_columns (apache#119) * release 17.9 (apache#120) * [pre-commit.ci] pre-commit autoupdate (apache#124) updates: - [github.com/astral-sh/ruff-pre-commit: v0.6.9 → v0.7.0](astral-sh/ruff-pre-commit@v0.6.9...v0.7.0) - [github.com/pre-commit/mirrors-mypy: v1.11.2 → v1.12.1](pre-commit/mirrors-mypy@v1.11.2...v1.12.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * improve type annotations for parquet writer (apache#125) Add support for per-field compression specification Add missing none compression value. 
* Add missing return type for Schema.serialize (apache#123) * Add `Schema.field(int)` (apache#122) * Change various io related functions to support `StrPath` as a path input (apache#121) * Change various io related functions to support StrPath as a path input * fmt * Added StrPath | IO for feather types * fix type hint for sort_by (apache#130) sort_by takes str or list[tuple(name, order)] as its argument where str is a field name not a sort order * metadata on a schema can be passed as str (apache#128) For details see https://github.com/apache/arrow/blob/apache-arrow-17.0.0/python/pyarrow/types.pxi\#L2053-L2056 * Correct typevars for DictionaryType, MapType, RunEncodedType (apache#126) Correct type hints for Dictionary, RunEndEncoded and Map Signed-off-by: Jonas Dedden <university@jonas-dedden.de> Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> * Add some more StrPath io parts that were overlooked. (apache#131) * Add some more StrPath io parts that were overlooked. Additionally, add the utility typealias `SingleOrList` that can be used in places where we want a concise type declaration but the there is a large union of types. * write_dataset(base_dir = ) can also take Path * Support ChunkedArray in add/append methods in Table (apache#129) * Add missing partitioning typing case (apache#132) This should now support the examples in the docstring for partitioning. 
* fix: typo 'permissive' instead of 'premissive' (apache#133) * release 17.10 (apache#134) * fix incorrect type hints for compute.sort_indices (apache#135) * disallow passing `names` as an argument to table when using dictionaries (apache#137) * [pre-commit.ci] pre-commit autoupdate (apache#138) updates: - [github.com/astral-sh/ruff-pre-commit: v0.7.0 → v0.7.1](astral-sh/ruff-pre-commit@v0.7.0...v0.7.1) - [github.com/pre-commit/mirrors-mypy: v1.12.1 → v1.13.0](pre-commit/mirrors-mypy@v1.12.1...v1.13.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add missing type for FlightEndpoint (apache#136) * release 17.11 (apache#139) * [pre-commit.ci] pre-commit autoupdate (apache#140) updates: - [github.com/astral-sh/ruff-pre-commit: v0.7.1 → v0.7.2](astral-sh/ruff-pre-commit@v0.7.1...v0.7.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [pre-commit.ci] pre-commit autoupdate (apache#142) updates: - [github.com/astral-sh/ruff-pre-commit: v0.7.2 → v0.7.3](astral-sh/ruff-pre-commit@v0.7.2...v0.7.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * chore: Create FUNDING.yml (apache#143) Create FUNDING.yml * fix: `read_schema` should return Schema (apache#145) fix: read_schema should return Schema * release 17.12 (apache#146) * [pre-commit.ci] pre-commit autoupdate (apache#147) updates: - [github.com/astral-sh/ruff-pre-commit: v0.7.3 → v0.7.4](astral-sh/ruff-pre-commit@v0.7.3...v0.7.4) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: `to_table` argument `columns` can be a dict of expressions (apache#149) * [pre-commit.ci] pre-commit autoupdate (apache#148) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/astral-sh/ruff-pre-commit: v0.7.4 → v0.8.1](astral-sh/ruff-pre-commit@v0.7.4...v0.8.1) * ruff: ignore PYI063 --------- Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> * release 17.13 (apache#151) * fix: FileSystem metadata value should be str (apache#152) * fix: FileSystemHandler metadata value should be str (apache#153) * [pre-commit.ci] pre-commit autoupdate (apache#154) updates: - [github.com/astral-sh/ruff-pre-commit: v0.8.1 → v0.8.2](astral-sh/ruff-pre-commit@v0.8.1...v0.8.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * improve coverage for pyarrow.struct typehint (apache#157) * fix: ipc typing (apache#159) * release 17.14 (apache#160) * fix: add missing param 'nbytes' to NativeFile.read (apache#163) * release 17.15 (apache#164) * [pre-commit.ci] pre-commit autoupdate (apache#161) updates: - [github.com/astral-sh/ruff-pre-commit: v0.8.2 → v0.8.3](astral-sh/ruff-pre-commit@v0.8.2...v0.8.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add 'None' as a valid argument for partitioning to the various parquet reading functions (apache#166) * [pre-commit.ci] pre-commit autoupdate (apache#165) updates: - [github.com/astral-sh/ruff-pre-commit: v0.8.3 → v0.8.6](astral-sh/ruff-pre-commit@v0.8.3...v0.8.6) - [github.com/pre-commit/mirrors-mypy: v1.13.0 → v1.14.1](pre-commit/mirrors-mypy@v1.13.0...v1.14.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: should use Collection[Array] instead list[Array] (apache#170) "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance Consider using "Sequence" instead, which is covariant * fix: update type hints for path_or_paths and source parameters in ParquetDataset and read_table (apache#171) * [pre-commit.ci] pre-commit autoupdate (apache#167) updates: - [github.com/astral-sh/ruff-pre-commit: v0.8.6 → v0.9.1](astral-sh/ruff-pre-commit@v0.8.6...v0.9.1) Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * release 17.16 (apache#172) * Fixed pa.fixed_shape_tensor (apache#175) * [pre-commit.ci] pre-commit autoupdate (apache#173) updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.1 → v0.9.4](astral-sh/ruff-pre-commit@v0.9.1...v0.9.4) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: Preserve generic in `ChunkedArray.type` (apache#177) * release 17.17 (apache#178) * [pre-commit.ci] pre-commit autoupdate (apache#176) updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.4 → v0.9.6](astral-sh/ruff-pre-commit@v0.9.4...v0.9.6) - [github.com/pre-commit/mirrors-mypy: v1.14.1 → v1.15.0](pre-commit/mirrors-mypy@v1.14.1...v1.15.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: support to construct ListArray with primitive type (apache#179) * fix: Avoid `chunked_array` overlapping overloads (apache#183) * fix: Add placeholder annotations to `pc.if_else` (apache#182) * fix: Widen `Array` to `Array | ChunkedArray` (apache#181) * fix: add `pc.fill_null` (apache#185) - https://arrow.apache.org/docs/python/generated/pyarrow.compute.fill_null.html - https://github.com/narwhals-dev/narwhals/blob/05e47b27ebe27b24196cee5956d07748d65a62ee/narwhals/_arrow/series.py#L675 * fix: Allow Table.from_arrays to take a list containing a mix of Array and ChunkedArray (apache#187) Update table.pyi * release 17.18 (apache#188) * [pre-commit.ci] pre-commit autoupdate (apache#180) updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.6 → v0.9.10](astral-sh/ruff-pre-commit@v0.9.6...v0.9.10) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: from_arrays for both Table and RecordBatch (apache#189) * fix: resolve some `pa.compute` overlaps (apache#184) * fix: resolve overlapping `compute.(add|divide)` * fix: copy from non-cloned signature * fix: resolve overlapping `compute.exp` * fix: resolve overlapping 
`compute.power` * fix: resolve overlapping `compute.equal` * fix: resolve overlapping `compute.and_` * fix: Include `Array` in `chunked_array` overload (apache#190) narwhals-dev/narwhals@0237f7a * release 17.19 (apache#191) * Add Scalar, Array and Type classes for Json & Uuid (apache#194) * Add Scalar, Array and Type classes for Json & Uuid * Formatting fixes * [pre-commit.ci] pre-commit autoupdate (apache#192) updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.10 → v0.11.2](astral-sh/ruff-pre-commit@v0.9.10...v0.11.2) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Revert "Add Scalar, Array and Type classes for Json & Uuid" (apache#195) Revert "Add Scalar, Array and Type classes for Json & Uuid (apache#194)" This reverts commit 8f77909. * fix: Add missing `pc.equal` overload (apache#196) * feat: support pyarrow 19.0 (apache#198) * build: upgrade pyarrow min version to 19.0 * feat: support pyarrow 19.0 * omit mypy bool8 override error * fix: reexport new types (apache#199) * feat: override new patterns for func repeat and nulls (apache#200) * fix: reexport decimal64 array and decimal128 array * feat: override new patterns for func `repeat` and `nulls` * release: 19.1 (apache#201) * fix: Allow `Iterable[Table]` in `concat_tables` (apache#203) https://arrow.apache.org/docs/python/generated/pyarrow.concat_tables.html > tables : iterable of pyarrow.Table objects * fix: Allow `ChunkedArray[BooleanScalar]` in `pc.invert` (apache#204) Fixes https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L298-L299 * feat: Fully spec `TableGroupBy.aggregate` (apache#197) ## Related - https://arrow.apache.org/docs/python/compute.html#grouped-aggregations - https://arrow.apache.org/docs/python/generated/pyarrow.TableGroupBy.html#pyarrow.TableGroupBy.aggregate - https://github.com/apache/arrow/blob/34a984c842db42b409a1359e6e2cf167a2365a48/python/pyarrow/table.pxi#L6578-L6604 * fix: 
Add missing return type to `ChunkedArray.filter` (apache#205) * fix: Add relaxed final overload to logical functions (apache#206) Covers all of `pc.(and_ | and_kleene | and_not | and_not_kleene | or_ | or_kleene | xor)` Resolves: - https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L219-L233 - https://github.com/narwhals-dev/narwhals/blob/caabc0efdef54f117c83888926860e3972ef69d5/narwhals/_arrow/series.py#L662 * fix: Allow `ChunkedArray` in `Table.set_column` (apache#211) Also being more consistent with `ArrayOrChunkedArray[Any]` everywhere Discovered in - https://github.com/vega/vega-datasets/blob/343b7101391a81190ba24e1e8d62a381d2fef3bd/scripts/species.py#L798-L799 * chore: Ignore `fsspec` `[import-untyped]` (apache#210) ```py _fs.pyi:18: error: Skipping analyzing "fsspec": module is installed, but missing library stubs or py.typed marker [import-untyped] _fs.pyi:18: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports Found 1 error in 1 file (checked 64 source files) ``` - fsspec/filesystem_spec#625 - fsspec/filesystem_spec#1676 * feat: Convert `types.is_*` into `TypeIs` guards (apache#215) * chore: Add `types.__all__` * feat: Convert `types._is_*` into `TypeIs` guards I've been using this for a little while, but makes more sense to live in the stubs https://github.com/narwhals-dev/narwhals/blob/16427440e6d74939c403083b52ce3fb0af7d63c7/narwhals/_arrow/utils.py#L44-L67 * fix: Resolve `bit_wise_and` overlaps (apache#214) Fixes 3 errors: ```py compute.pyi:608:5 - error: Overload 1 for "bit_wise_and" overlaps overload 4 and returns an incompatible type (reportOverlappingOverload) compute.pyi:608:5 - error: Overload 1 for "bit_wise_and" overlaps overload 5 and returns an incompatible type (reportOverlappingOverload) compute.pyi:620:5 - error: Overload 3 for "bit_wise_and" will never be used because its parameters overlap overload 1 (reportOverlappingOverload) ``` * fix: Resolve 
`list_*` overlapping overloads (apache#213) * fix: Resolve `list_value_length` overlaps * fix: Resolve `list_element` overlaps * fix: Resolve `list_(flatten|slice|parent_indices)` overlaps An improvement, but still not that accurate * fix: Include `VarianceOptions` in `TableGroupBy.aggregate` (apache#212) - Follow-up to apache#197 - Noticed while writing up (narwhals-dev/narwhals#2385) - We already use it for `std`, `var` in https://github.com/narwhals-dev/narwhals/blob/16427440e6d74939c403083b52ce3fb0af7d63c7/narwhals/_arrow/group_by.py#L81-L82 * [pre-commit.ci] pre-commit autoupdate (apache#202) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.2 → v0.11.5](astral-sh/ruff-pre-commit@v0.11.2...v0.11.5) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix: Resolve `Scalar.as_py` warnings for `DictionaryType` (apache#207) > scalar.pyi:75:20 - warning: TypeVar "_AsPyTypeK" appears only once in generic function signature > Use "object" instead (reportInvalidTypeVarUse) > scalar.pyi:85:20 - warning: TypeVar "_AsPyTypeK" appears only once in generic function signature > Use "object" instead (reportInvalidTypeVarUse) Instead just using `int`, which should be all that is possible from: https://github.com/zen-xu/pyarrow-stubs/blob/02552b81161d19d4aa71d8656b028eefac84612b/pyarrow-stubs/__lib_pxi/types.pyi#L154-L164 https://github.com/zen-xu/pyarrow-stubs/blob/02552b81161d19d4aa71d8656b028eefac84612b/pyarrow-stubs/__lib_pxi/types.pyi#L63-L70 * fix: Add default to `pc.sort_indices` (apache#216) * fix: Add default to `pc.sort_indices` Fixes narwhals-dev/narwhals#2390 (comment) Default is specified in https://arrow.apache.org/docs/python/generated/pyarrow.compute.sort_indices.html * refactor: Reuse some aliases * fix: Allow `list_size` with `Field` in `pa.list_` (apache#218) Closes apache#217 * allow `Table` or `RecordBatch` for dataset (apache#222) allow source argument pyarrow.dataset.dataset() to be RecordBatch | Table * 
refactor: Simplify `types` overloads (apache#219) * fix: `binary` overlap * fix: Simplify list constructors, `_Ordered` * refactor: Use `_Tz` default * fix: iter ChunkedArray should return scalar value (apache#224) * release: 19.2 (apache#225) * fix: Add missing `DictionaryArray` methods/properties (apache#226) ## Docs - https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary - https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.indices - https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary_decode - https://arrow.apache.org/docs/python/generated/pyarrow.DictionaryArray.html#pyarrow.DictionaryArray.dictionary_encode ## Fixes - https://github.com/narwhals-dev/narwhals/blob/c23e56c56630761f0fbc58b575a1c987e57d58d5/narwhals/_arrow/series.py#L787-L798 - https://github.com/narwhals-dev/narwhals/blob/c23e56c56630761f0fbc58b575a1c987e57d58d5/narwhals/_arrow/series_cat.py#L14-L18 * chore: use pyright as static type checker (apache#227) * use pyright as static type checker * make pyright happy * fix: fix pyright action (apache#229) fix github ci * fix: Match runtime behavior of `(Table|RecordBatch).select` (apache#221) * fix: Match runtime behavior of `(Table|RecordBatch).select` ## Resolves - https://github.com/MarcoGorelli/narwhals/blob/5b02b592183b8d39e2d32e0aedd6c234bb22d405/narwhals/_arrow/dataframe.py#L305-L307 - https://github.com/MarcoGorelli/narwhals/blob/5b02b592183b8d39e2d32e0aedd6c234bb22d405/narwhals/_arrow/dataframe.py#L285-L294 ##Description Following up on what I thought was a simple stub issue, but we're both *too strict* and *too permissive* in different ways ##Examples {placeholder} ##Related - https://github.com/apache/arrow/blob/d2ddee62329eb711572b4d71d6380673d7f7edd1/python/pyarrow/table.pxi#L4367-L4374 - 
https://github.com/apache/arrow/blob/d2ddee62329eb711572b4d71d6380673d7f7edd1/python/pyarrow/table.pxi#L1721-L1739 * update select * update select --------- Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> * [pre-commit.ci] pre-commit autoupdate (apache#220) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.5 → v0.11.8](astral-sh/ruff-pre-commit@v0.11.5...v0.11.8) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * feat: narrow scalar when type is given (apache#230) * rename Uint -> UInt * feat: narrow scalar when type is given * release 19.3 (apache#231) * chore: pyright use strict mode (apache#233) * fix types * update array.pyi * update scalar.pyi * update * update array * update array * optimize chunked_array * optimizer iterchunks * update * update pyproject.toml * fix: pa.nulls accept type rather than types (apache#234) * [pre-commit.ci] pre-commit autoupdate (apache#232) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.8 → v0.11.9](astral-sh/ruff-pre-commit@v0.11.8...v0.11.9) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * release 19.4 (apache#235) * lint(pyright): disable reportUnknownMemberType (apache#239) * [pre-commit.ci] pre-commit autoupdate (apache#236) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.9 → v0.11.13](astral-sh/ruff-pre-commit@v0.11.9...v0.11.13) - [github.com/RobertCraigie/pyright-python: v1.1.400 → v1.1.401](RobertCraigie/pyright-python@v1.1.400...v1.1.401) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * feat: support pyarrow 20.0 (apache#240) * [pre-commit.ci] pre-commit autoupdate (apache#241) updates: - [github.com/RobertCraigie/pyright-python: v1.1.401 → v1.1.402](RobertCraigie/pyright-python@v1.1.401...v1.1.402) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * support docstring (apache#242) * doc: complete tensor doc * doc: complete table doc 
* doc: complete scalar doc * doc: complete orc doc * doc: complete memory doc * doc: complete lib doc * doc: complete json doc * doc: complete hdfs doc * doc: complete gcsfs doc * doc: complete fs doc * doc: complete flight doc * doc: complete dataset doc * doc: complete dataset parquet doc * doc: complete dataset parquet encryption doc * doc: complete cuda doc * doc: complete csv doc * doc: complete azurefs doc * doc: complete core doc * doc: complete interchange doc * doc: complete array doc * doc: complete builder doc * doc: complete device doc * doc: complete io doc * doc: complete ipc doc * doc: complete types doc * mark deprecated apis * doc: complete _compute doc * doc: complete compute doc * doc: update compute doc * lint code * release 20.0.0.20250618 (apache#243) * fix: make ParquetFileFormat constructor args optional (apache#244) * fix: Field.remove_metadata should return Self (apache#246) * [pre-commit.ci] pre-commit autoupdate (apache#245) updates: - [github.com/astral-sh/ruff-pre-commit: v0.11.13 → v0.12.0](astral-sh/ruff-pre-commit@v0.11.13...v0.12.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * release 20.0.0.20250627 (apache#247) * fix: chunked_array with type should be specified (apache#250) * [pre-commit.ci] pre-commit autoupdate (apache#248) updates: - [github.com/astral-sh/ruff-pre-commit: v0.12.0 → v0.12.3](astral-sh/ruff-pre-commit@v0.12.0...v0.12.3) - [github.com/RobertCraigie/pyright-python: v1.1.402 → v1.1.403](RobertCraigie/pyright-python@v1.1.402...v1.1.403) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * release 20.0.0.20250715 (apache#251) * fix: The type parameter of array should be covariant (apache#253) * release 20.0.0.20250716 (apache#254) * Add py.typed file to signify that the library is typed See the relevant PEP https://peps.python.org/pep-0561 * Prepare `pyarrow-stubs` for history merging MINOR: [Python] Prepare `pyarrow-stubs` for 
history merging Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> * Add `ty` configuration and suppress error codes * One line per rule * Add licence header from original repo for all `.pyi` files * Revert "Add licence header from original repo for all `.pyi` files" This reverts commit 1631f39. * Prepare for licence merging * Exclude `stubs` from `rat` test * Add Apache licence clause to `py.typed` * Reduce list * Resolve merge conflict --------- Signed-off-by: Jonas Dedden <university@jonas-dedden.de> Co-authored-by: ZhengYu, Xu <zen-xu@outlook.com> Co-authored-by: Jim Bosch <talljimbo@gmail.com> Co-authored-by: Oliver Mannion <125105+tekumara@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eugene Toder <eltoder@users.noreply.github.com> Co-authored-by: fvankrieken <fvankrieken@planning.nyc.gov> Co-authored-by: Ilia Ablamonov <ilia@flamefork.ru> Co-authored-by: Mathias Beguin <mathias.beguin@hotmail.com> Co-authored-by: Dylan Scott <dylan.scott@gmail.com> Co-authored-by: deanm0000 <37878412+deanm0000@users.noreply.github.com> Co-authored-by: Jan Moravec <moravecj@post.cz> Co-authored-by: Marius van Niekerk <marius.v.niekerk@gmail.com> Co-authored-by: Jonas Dedden <university@jonas-dedden.de> Co-authored-by: Fábio D. Batista <fabio@atelie.dev.br> Co-authored-by: ben-freist <93315290+ben-freist@users.noreply.github.com> Co-authored-by: Jiahao Yuan <kahojyun@icloud.com> Co-authored-by: Pim de Haan <pimdehaan@gmail.com> Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com> Co-authored-by: Tom Crasset <25140344+tcrasset@users.noreply.github.com> Co-authored-by: Tom McTiernan <tmct@users.noreply.github.com> Co-authored-by: Rok Mihevc <rok@mihevc.org>
C++ version of ARROW-372