feat: Add flatten array function #562
|
Hello @andygrove, do you mind giving me a hand with this PR? I exposed
Hi @mobley-trent
# build and install package
maturin develop
Also, don't forget to activate the venv before running this command.
|
Hey @ongchi I tested the
from datafusion import SessionContext, column
from datafusion import functions as f
import numpy as np
import pyarrow as pa

def py_flatten(arr):
    # Testing helper function
    result = []
    for elem in arr:
        if isinstance(elem, list):
            result.extend(py_flatten(elem))
        else:
            result.append(elem)
    return result

ctx = SessionContext()
data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
batch = pa.RecordBatch.from_arrays(
    [np.array(data, dtype=object)], names=["arr"]
)
df = ctx.create_dataframe([[batch]])
col = column("arr")
stmt = f.flatten(col)
py_expr = lambda: [py_flatten(data)]
result = df.select(stmt).collect()[0].column(0).tolist()
print(f"flatten query: {result}")
print(f"py_expr: {py_expr()}")
Results: I expected the flatten query to be identical to the
|
Using a regular
ctx = SessionContext()
ctx.sql("select flatten([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]);")
Result:
Hi @mobley-trent It contains multiple rows of one-dimensional array values. For the
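The difference between the two cases in this thread can be modelled in plain Python. This is only a minimal sketch, not DataFusion's implementation: `flatten_value` is a hypothetical helper, and the sketch assumes that `flatten` is applied to each row's value independently. Under that assumption, a column with three rows of one-dimensional lists comes back unchanged, while a single row holding the whole nested list collapses into one flat list.

```python
def flatten_value(value):
    """Recursively flatten one row's (possibly nested) list. Hypothetical helper."""
    out = []
    for elem in value:
        if isinstance(elem, list):
            out.extend(flatten_value(elem))
        else:
            out.append(elem)
    return out


data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Case 1: the column has three rows, each holding a one-dimensional list.
# Flattening row by row has nothing to unnest, so every row is unchanged.
per_row = [flatten_value(row) for row in data]

# Case 2: the column has a single row whose value is the whole nested list
# (comparable to passing the nested array literal in the SQL query).
single_row = [flatten_value(row) for row in [data]]

print(f"per_row:    {per_row}")
print(f"single_row: {single_row}")
```

If this model matches the observed behaviour, the test data would need to be a single nested value rather than one row per inner list for the query result to line up with `py_flatten(data)`.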
|
Fixed the merge conflicts.
Which issue does this PR close?
Refer to issue #463
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?