Add a `describe` method on DataFrame like Polars #5226
```rust
if field.data_type().is_numeric() {
    Field::new(field.name(), DataType::Float64, true)
} else {
    Field::new(field.name(), DataType::Utf8, true)
}
```
I would expect the schema for `count` and `null_count` to always be Int64, and the schema for `min`/`max` to always be Utf8.
The `describe` method returns a single schema for its result, so each column must have the same datatype for every statistic it holds. For example:

- `bool_col`: `count`/`null_count` return Int64, but `min`/`max` return an error, so `bool_col` is typed Utf8.
- `float_col`: `count`/`null_count` return Int64 and `min`/`max` return floats, so `float_col` is typed Float64.
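The per-column type rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not DataFusion's code: `DataType` is a hypothetical, trimmed-down stand-in for Arrow's `DataType`, and `describe_column_type` and `count_as_float64` are hypothetical helper names. The sketch mirrors the `is_numeric()` branch in the diff: numeric input columns become Float64 (so Int64 counts and numeric min/max both fit), and everything else falls back to Utf8.

```rust
/// Hypothetical stand-in for Arrow's `DataType`, reduced to the variants
/// used in this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Boolean,
    Int32,
    Int64,
    Float64,
    Utf8,
    Binary,
}

impl DataType {
    /// Simplified numeric check for this reduced enum.
    pub fn is_numeric(&self) -> bool {
        matches!(self, DataType::Int32 | DataType::Int64 | DataType::Float64)
    }
}

/// Choose one output type for a column of the `describe` result.
/// Every statistic row (count, null_count, min, max, ...) must fit in it.
pub fn describe_column_type(input: DataType) -> DataType {
    if input.is_numeric() {
        // Int64 counts and numeric min/max can both be stored as Float64.
        DataType::Float64
    } else {
        // Non-numeric columns fall back to a string representation.
        DataType::Utf8
    }
}

/// Hypothetical helper: store an Int64 count in a Float64 output column.
pub fn count_as_float64(count: i64) -> f64 {
    count as f64
}
```

With this rule, a Boolean column such as `bool_col` maps to Utf8 and a Float64 column such as `float_col` stays Float64, matching the two examples above.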
```rust
vec![],
fields_iter
    .clone()
    .filter(|f| matches!(f.data_type().is_numeric(), true))
```
I wonder why the min/max aggregation is restricted to numeric fields.
To make min/max work for all columns, you could call `cast` to convert them to the same datatype.
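One way to read the casting suggestion is to compute min/max in each column's own type and then convert the results to a common type (here a string) so they can share one output column. The sketch below assumes a hypothetical `Value` enum in place of Arrow arrays and hypothetical `cast_to_utf8` / `min_max_as_utf8` helpers; it is not DataFusion's `cast` API.

```rust
/// Hypothetical cell value standing in for one element of a typed Arrow column.
#[derive(Debug, Clone, PartialEq, PartialOrd)]
pub enum Value {
    Int(i64),
    Float(f64),
    Text(String),
}

/// Convert a value to the common output type used for the min/max rows.
pub fn cast_to_utf8(v: &Value) -> String {
    match v {
        Value::Int(i) => i.to_string(),
        Value::Float(f) => f.to_string(),
        Value::Text(s) => s.clone(),
    }
}

/// Compute min and max of one column in its own type, then cast both to a
/// string. Assumes all values in `column` share one variant, as they would
/// in a single Arrow column. Returns `None` for an empty column.
pub fn min_max_as_utf8(column: &[Value]) -> Option<(String, String)> {
    let mut iter = column.iter();
    let first = iter.next()?;
    let (mut min, mut max) = (first, first);
    for v in iter {
        if v < min {
            min = v;
        }
        if v > max {
            max = v;
        }
    }
    Some((cast_to_utf8(min), cast_to_utf8(max)))
}
```

Casting after the aggregation keeps each column's own ordering (numeric for numbers, lexicographic for text); casting to strings before taking min/max would instead compare `"10"` as smaller than `"9"`.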
Boolean and Binary don't work with min/max, so filtering out `DataType::Binary` and `DataType::Boolean` would be better:

```rust
!matches!(f.data_type(), DataType::Binary | DataType::Boolean)
```
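A minimal sketch of that filter, assuming hypothetical `Field` and `DataType` types that mimic the Arrow accessors used in the diff (`name()`, `data_type()`); `min_max_fields` is a hypothetical helper name. The predicate is the same `!matches!(...)` expression proposed above.

```rust
/// Hypothetical stand-in for Arrow's `DataType` (reduced).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Boolean,
    Int64,
    Float64,
    Utf8,
    Binary,
}

/// Hypothetical stand-in for Arrow's `Field`, exposing the same accessors.
#[derive(Debug, Clone)]
pub struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    pub fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn data_type(&self) -> &DataType {
        &self.data_type
    }
}

/// Keep only fields whose type can go through min/max, skipping
/// Binary and Boolean columns before any aggregation is attempted.
pub fn min_max_fields(fields: &[Field]) -> Vec<&Field> {
    fields
        .iter()
        .filter(|f| !matches!(f.data_type(), DataType::Binary | DataType::Boolean))
        .collect()
}
```

In this sketch a Boolean column and a Binary-typed column are both excluded, while numeric and Utf8 columns are kept.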
`date_string_col` and `string_col` also have datatype Binary, and running min/max on them fails with:

```
called `Result::unwrap()` on an `Err` value: Internal("Min/Max accumulator not implemented for type Binary")
```
alamb left a comment
Looks good to me -- thank you @jiangzhx
cc @andygrove (as I think this is a neat thing to expose in datafusion-python)
```rust
)),
);

let describe_record_batch =
```
Benchmark runs are scheduled for baseline = ea3b965 and contender = 96aa2a6. 96aa2a6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.


Which issue does this PR close?
Closes #4974.