.take silently overflows on list arrays (when casting to large_list is needed) #26467

@asfimport

Reproducer below:

import numpy as np
import pyarrow as pa
arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)])
nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
big_arr = arr.take(indices)
print(big_arr.offsets[-5:])
big_arr.validate() # hopefully this can catch it 

[
  -21,
  -16,
  -11,
  -6,
  -1
]
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-1-09503f9cbb04> in <module>
      6 big_arr = arr.take(indices)
      7 print(big_arr.offsets[-5:])
----> 8 big_arr.validate()

/opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.Array.validate()

/opt/conda/envs/model/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Negative offsets in list array
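
The negative offsets are what you would get if the true 64-bit offsets were truncated to 32 bits. Below is a small sketch of that arithmetic using only NumPy. It assumes the wraparound behaves like a plain int64 to int32 cast, which is an assumption about the mechanism and not taken from the Arrow source:

```python
import numpy as np

# List lengths in the reproducer: arange(0), arange(1), ..., arange(5)
lengths = np.arange(6, dtype=np.int64)
per_pass = int(lengths.sum())            # 15 values for one pass over the array

nb_repeat = 2**32 // per_pass            # same expression as in the reproducer
total = per_pass * nb_repeat             # true final offset after the take

# The last five lists taken all have length 5, so the true trailing offsets
# step back from `total` in increments of 5.
true_tail = total - 5 * np.arange(4, -1, -1, dtype=np.int64)

# Keep only the low 32 bits, as a signed 32-bit offset would.
wrapped_tail = true_tail.astype(np.int32)

print(nb_repeat)
print(total)
print(true_tail.tolist())
print(wrapped_tail.tolist())
```

The wrapped tail comes out as -21, -16, -11, -6, -1, the same values printed by the reproducer above.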

It works fine with large_list (as expected):

import numpy as np
import pyarrow as pa
arr = pa.array([np.arange(x).astype(np.int8) for x in range(6)], type=pa.large_list(pa.int8()))
nb_repeat = 2**32 // arr.offsets.to_numpy()[-1]
indices = pa.array(np.repeat(np.arange(len(arr)), nb_repeat))
big_arr = arr.take(indices)
print(big_arr.offsets[-5:])
big_arr.validate()

[
  4294967275,
  4294967280,
  4294967285,
  4294967290,
  4294967295
]

Reporter: Artem KOZHEVNIKOV / @artemru

Note: This issue was originally created as ARROW-10494. Please see the migration documentation for further details.
