Panic when reading empty pyarrow.Table #575

@jwimberl

Describe the bug
When I call SessionContext.from_arrow_table on a pyarrow.Table that has a nonzero number of columns but zero rows, the call panics at src/context.rs:294 with an index-out-of-bounds error.

To Reproduce

>>> import datafusion as df
>>> import pandas as pd
>>> import pyarrow as pa
>>> ctx = df.SessionContext()
>>> pdf = pd.DataFrame({'col': []})
>>> emptyTable = pa.Table.from_pandas(pdf)
>>> emptyTable
pyarrow.Table
col: double
----
col: [[]]
>>> ctx.from_arrow_table(emptyTable)
thread '<unnamed>' panicked at src/context.rs:294:37:
index out of bounds: the len is 0 but the index is 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pyo3_runtime.PanicException: index out of bounds: the len is 0 but the index is 0

Expected behavior
I expect this to create a DataFrame with zero rows, like the following one (named empty here, created by calling .limit(0) on a non-empty DataFrame):

>>> empty
DataFrame()
++
++
>>> empty.describe()
DataFrame()
+------------+-----+
| describe   | col |
+------------+-----+
| count      | 0.0 |
| null_count | 0.0 |
| mean       |     |
| std        |     |
| min        |     |
| max        |     |
| median     |     |
+------------+-----+

Additional context

  • Operating system: Rocky 8
  • Python version: 3.10.4
  • Python module versions used:
>>> df.__version__
'34.0.0'
>>> pa.__version__
'15.0.0'
>>> pd.__version__
'2.2.0'

Labels: bug
