ARROW-7663: [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table #8044
arw2019 wants to merge 6 commits into apache:master
We lose the more specific traceback and ZeroDivisionError message, in favor of
In [11]: class MyBrokenInt:
    ...:     def __init__(self):
    ...:         1/0
In [12]: pa.array([MyBrokenInt()], type=pa.int64())
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
<ipython-input-12-1cf156b165b3> in <module>
----> 1 pa.array([MyBrokenInt()], type=pa.int64())
~/git_repo/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
    269     else:
    270         # ConvertPySequence does strict conversion if type is explicitly passed
--> 271         return _sequence_to_array(obj, mask, size, type, pool, c_from_pandas)
    272
    273
~/git_repo/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
     38
     39     with nogil:
---> 40         check_status(ConvertPySequence(sequence, mask, options, &out))
     41
     42     if out.get().num_chunks() == 1:
~/git_repo/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
     82
     83     if status.IsInvalid():
---> 84         raise ArrowInvalid(message)
     85     elif status.IsIOError():
     86         # Note: OSError constructor is
ArrowInvalid: Could not convert <__main__.MyBrokenInt object at 0x7fc331394290> with type MyBrokenInt: tried to convert to int
but this is the same message as what we get on master for
In [11]: class MyBrokenInt:
    ...:     def __init__(self):
    ...:         1/1
so maybe it's ok?
I think that is fine, personally
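To illustrate the trade-off discussed here, the following is a minimal pure-Python sketch, not pyarrow's actual implementation. `ConversionError`, `BrokenInt`, and `convert_to_int` are hypothetical names, and the sketch assumes the object fails inside `__int__`. It shows how a catch-all wrapper replaces a specific `ZeroDivisionError` with one generic conversion message, and how exception chaining is one way such a wrapper could keep the original cause reachable.

```python
class ConversionError(ValueError):
    """Hypothetical stand-in for pyarrow's ArrowInvalid."""


class BrokenInt:
    # Assumption: the failure happens during int conversion.
    def __int__(self):
        return 1 // 0


def convert_to_int(obj):
    """Convert obj to int, reporting any failure with one generic message."""
    try:
        return int(obj)
    except Exception as exc:
        message = (
            f"Could not convert {obj!r} with type {type(obj).__name__}: "
            "tried to convert to int"
        )
        # 'from exc' keeps the original error available as __cause__.
        raise ConversionError(message) from exc


try:
    convert_to_int(BrokenInt())
except ConversionError as err:
    caught = err

print(caught)                           # the generic message
print(type(caught.__cause__).__name__)  # the specific error that was wrapped
```

Without the `from exc` clause the caller would see only the generic message, which is the loss of detail described in the comment above.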
jorisvandenbossche left a comment
Thanks for working on this!
python/pyarrow/tests/test_compute.py
What did the error message say before, and what does it show now?
On master it's
TypeError: an integer is required (got type pyarrow.lib.Int8Array)
versus on this branch
ArrowInvalid: Could not convert [
  5
] with type pyarrow.lib.Int8Array: tried to convert to int
Hmm, for this case I find the original error message clearer ..
That's the consequence of the scalar(..) conversion using the array conversion under the hood, I suppose?
But OK, I suppose this is fine (it's maybe mainly the multiline repr of the array in the middle of the sentence that makes it more confusing)
I think that is fine, personally
What are the cases that this couldn't be converted, but that obj is an integer? When the integer is too big to fit in a C int?
Yes, and also when converting a negative integer to a uint:
pa.scalar(-1, type='uint8')
No other tests are touched if I recompile without this check
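The two failure cases mentioned here (an integer too large for the target type, and a negative integer for an unsigned type) both come down to a range check. A minimal pure-Python sketch, with `INT_RANGES` and `fits` as hypothetical helpers rather than pyarrow API:

```python
# Inclusive value ranges for a few fixed-width integer types.
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "uint8": (0, 2**8 - 1),
    "int64": (-2**63, 2**63 - 1),
    "uint64": (0, 2**64 - 1),
}


def fits(value, type_name):
    """Return True if the Python int `value` is representable in `type_name`."""
    low, high = INT_RANGES[type_name]
    return low <= value <= high


print(fits(-1, "uint8"))     # negative value, unsigned target
print(fits(2**63, "int64"))  # one past the int64 maximum
print(fits(255, "uint8"))    # largest uint8 value
```

Python integers are arbitrary precision, so a value can be a perfectly valid `int` and still fail to convert to a fixed-width type.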
@jorisvandenbossche was there more to be done here?
Thanks for the ping. I think all good. @arw2019 can you just rebase to ensure it's still all passing with latest master?
@jorisvandenbossche Rebased and seeing some failures. They're ones also popping up in other, unrelated PRs, so not sure they're to do with this patch? I'm happy to investigate, though.
There are some known failures on Mac and Appveyor at the moment, so nothing to worry about for this PR.
Thanks @arw2019!
Thanks @jorisvandenbossche for reviewing!
This PR homogenizes error messages for mixed-type Pandas inputs to pa.Table.
The message for a Pandas column with int followed by string is now the same as for double followed by string.
As a side effect, the snippet from [xref #5866, ARROW-7168] now throws an ArrowInvalid (it has raised a FutureWarning since 0.16).
Finally, this does break a test [xref #4484, ARROW-4036]; see the code comment.
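As an illustration of what a homogenized message means here, the following pure-Python sketch infers a target type from the first value of a column and reports every later mismatch in one shared format, whether the column started with an int or a double. It is not pyarrow's conversion code: `TARGETS` and `convert_column` are hypothetical, the target type names are assumptions, and the message wording simply follows the `Could not convert ... tried to convert to ...` pattern quoted in the review comments.

```python
# Hypothetical mapping from a Python type to a target type name.
TARGETS = {int: "int64", float: "double"}


def convert_column(values):
    """Convert a non-empty column, using the type of its first value as target."""
    py_type = type(values[0])
    target = TARGETS[py_type]
    out = []
    for value in values:
        if not isinstance(value, py_type):
            # One message format for every target type.
            raise ValueError(
                f"Could not convert {value!r} with type "
                f"{type(value).__name__}: tried to convert to {target}"
            )
        out.append(value)
    return out


for column in ([1, "a"], [1.5, "a"]):
    try:
        convert_column(column)
    except ValueError as err:
        print(err)
```

Both columns fail with messages that differ only in the target type name, which is the kind of consistency the PR description refers to.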