
Expose named_struct in python #692

@timsaucer


Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently the only way to create a struct from expressions is datafusion.functions.struct, which assigns the fixed field names c0, c1, and so on. These names carry no meaning and make the resulting struct difficult to work with. The Rust implementation has a named_struct function that would serve this purpose.

Describe the solution you'd like
In an ideal world, the name of each field in a struct would come from the name of the corresponding expression. It would be great to do something like:

df = df.with_column("d", F.struct(col("a"), col("b"), col("c")))

And then the struct would contain field names a, b, and c.

From a brief look at the code, this may not be simple to implement. If it is not feasible, I would at least like the named_struct function to be exposed in the Python bindings.
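To illustrate the fallback option: named_struct takes its arguments as alternating field names and values. A Python wrapper could accept (name, expression) pairs and flatten them into that layout. The sketch below is only an assumption about how such a wrapper might prepare its arguments. The helper name `flatten_name_pairs` is hypothetical and not part of datafusion-python, and plain strings stand in for expressions so the snippet runs without any dependencies.

```python
from typing import Any, List, Sequence, Tuple


def flatten_name_pairs(pairs: Sequence[Tuple[str, Any]]) -> List[Any]:
    """Flatten (name, value) pairs into [name1, value1, name2, value2, ...].

    Hypothetical helper: this mirrors the alternating argument layout that
    named_struct expects, but it is not an existing datafusion-python API.
    """
    flat: List[Any] = []
    for name, value in pairs:
        if not isinstance(name, str):
            raise TypeError(f"field name must be a str, got {type(name).__name__}")
        flat.extend([name, value])
    return flat


# Plain strings stand in for column expressions such as col("a").
args = flatten_name_pairs([("a", "col_a"), ("b", "col_b"), ("c", "col_c")])
print(args)
```

In a real binding, each name would presumably be wrapped as a string literal expression before the flattened list is handed to the Rust function.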

Describe alternatives you've considered
I have not considered any alternatives beyond the two described above.

Additional context
Minimal example showing current state:

from datafusion import SessionContext, col, functions as F
import pyarrow as pa

ctx = SessionContext()

batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array([4, 5, 6]), pa.array([7, 8, 9])],
    names=["a", "b", "c"],
)

df = ctx.create_dataframe([[batch]])

df = df.with_column("d", F.struct(col("a"), col("b"), col("c")))

df.show()

This produces:

DataFrame()
+---+---+---+-----------------------+
| a | b | c | d                     |
+---+---+---+-----------------------+
| 1 | 4 | 7 | {c0: 1, c1: 4, c2: 7} |
| 2 | 5 | 8 | {c0: 2, c1: 5, c2: 8} |
| 3 | 6 | 9 | {c0: 3, c1: 6, c2: 9} |
+---+---+---+-----------------------+
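A possible interim workaround, assuming the SQL function named_struct is available in the installed DataFusion version, is to build the struct through a SQL query instead of the expression API. The sketch below only constructs the query string. The helper `named_struct_sql` and the table name `t` are hypothetical, and the helper assumes column names contain no quote characters.

```python
from typing import Sequence


def named_struct_sql(columns: Sequence[str], alias: str, table: str) -> str:
    """Build a SELECT that packs `columns` into a struct with matching field names.

    Hypothetical helper, not part of datafusion-python. Assumes the names
    contain no single or double quotes, since no escaping is performed.
    """
    # named_struct takes alternating 'field name', value arguments.
    pairs = ", ".join(f"'{name}', \"{name}\"" for name in columns)
    select_list = ", ".join(f'"{name}"' for name in columns)
    return f'SELECT {select_list}, named_struct({pairs}) AS "{alias}" FROM "{table}"'


query = named_struct_sql(["a", "b", "c"], alias="d", table="t")
print(query)
```

With the batch from the example above registered as a table named t, the resulting string could then be passed to ctx.sql(query). The exact registration call depends on the datafusion-python version in use.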

Labels: enhancement