[R] Allow functions with pkg:: prefixes #30124

@asfimport

Proposed approach:

  • Add functionality to allow binding registration with the pkg::fun() name:
    • Modify register_binding() to register two identical copies of each pkg::fun binding, one as fun and one as pkg::fun.
    • Add a binding for the :: operator, which retrieves bindings from the function registry.
    • Add generic unit tests for the pkg::fun functionality.
  • Register nse_funcs requiring indirect mapping:
    • Register each binding with and without the pkg:: prefix.
    • Add or update unit tests for the nse_funcs bindings to include at least one pkg::fun() call for each binding.
  • Register nse_funcs requiring direct mapping (unary and binary bindings):
    • Register each binding with and without the pkg:: prefix.
    • Add or update unit tests for the nse_funcs bindings to include at least one pkg::fun() call for each binding.
  • Register agg_funcs for use with summarise().
  • Document changes in the Writing bindings documentation:
    • Going forward, bindings should be defined using pkg::fun, which registers two copies of the same binding.

Different implementation options are outlined and discussed in the design document.
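
To make the first group of bullets concrete, here is a minimal base-R sketch of registering one binding under both names and resolving `pkg::fun` through a `::` binding. It is an illustration under stated assumptions, not arrow's actual code: `registry`, `register_binding_sketch()`, and the dummy `year` binding are all hypothetical, and a plain environment stands in for the function registry.

```r
# Hypothetical stand-in for the function registry: a plain environment.
registry <- new.env(parent = emptyenv())

# Hypothetical helper (not arrow's register_binding()): store `fun` under the
# bare name and, when a "pkg::" prefix was given, under the prefixed name too.
register_binding_sketch <- function(fun_name, fun, registry) {
  bare_name <- sub("^.*::", "", fun_name)
  assign(bare_name, fun, envir = registry)
  if (!identical(bare_name, fun_name)) {
    assign(fun_name, fun, envir = registry)
  }
  invisible(fun)
}

# A binding for `::` that looks up "pkg::fun" in the registry first and
# falls back to the real exported function when no binding is registered.
assign("::", function(lhs, rhs) {
  pkg <- as.character(substitute(lhs))
  name <- as.character(substitute(rhs))
  fun_name <- paste0(pkg, "::", name)
  if (exists(fun_name, envir = registry, inherits = FALSE)) {
    get(fun_name, envir = registry, inherits = FALSE)
  } else {
    getExportedValue(pkg, name)
  }
}, envir = registry)

# Register a dummy binding that only records how it was called.
register_binding_sketch(
  "lubridate::year",
  function(x) paste0("arrow_year(", x, ")"),
  registry
)

# Both spellings now resolve to the same registered function.
with_prefix <- eval(quote(lubridate::year("time_hour")), registry)
without_prefix <- eval(quote(year("time_hour")), registry)
identical(with_prefix, without_prefix)
```

Because the expression is evaluated with the registry as its environment, the overridden `::` is found before base R's, so the namespaced call never reaches lubridate itself; an unregistered call such as `base::toupper("a")` falls through to the real function.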

Description:
Currently we implement a number of functions from packages like lubridate, which work well when called without namespacing (e.g. year()). However, if someone calls lubridate::year(), we get a not-supported warning (e.g. Warning: Expression lubridate::year(time_hour) not supported in Arrow) and the data is pulled into R. Would it be possible for us to check whether we have an Arrow binding that matches the namespaced function?
{code:r}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

ds <- InMemoryDataset$create(nycflights13::flights)

ds %>%
  mutate(year = lubridate::year(time_hour)) %>%
  collect()
#> Warning: Expression lubridate::year(time_hour) not supported in Arrow; pulling
#> data into R
#> # A tibble: 336,776 × 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> 7 2013 1 1 555 600 -5 913 854
#> 8 2013 1 1 557 600 -3 709 723
#> 9 2013 1 1 557 600 -3 838 846
#> 10 2013 1 1 558 600 -2 753 745
#> # … with 336,766 more rows, and 11 more variables: arr_delay ,
#> # carrier , flight , tailnum , origin , dest ,
#> # air_time , distance , hour , minute , time_hour

ds %>%
  mutate(year = year(time_hour)) %>%
  collect()
#> # A tibble: 336,776 × 19
#> year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>
#> 1 2013 1 1 517 515 2 830 819
#> 2 2013 1 1 533 529 4 850 830
#> 3 2013 1 1 542 540 2 923 850
#> 4 2013 1 1 544 545 -1 1004 1022
#> 5 2013 1 1 554 600 -6 812 837
#> 6 2013 1 1 554 558 -4 740 728
#> 7 2013 1 1 555 600 -5 913 854
#> 8 2013 1 1 557 600 -3 709 723
#> 9 2013 1 1 557 600 -3 838 846
#> 10 2013 1 1 558 600 -2 753 745
#> # … with 336,766 more rows, and 11 more variables: arr_delay ,
#> # carrier , flight , tailnum , origin , dest ,
#> # air_time , distance , hour , minute , time_hour
{code}
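
For comparison with the reprex above, one other way to match a namespaced call to an existing binding is to rewrite the expression before it is translated, dropping the `pkg::` prefix whenever the bare function name is known. The sketch below is only an illustration of that idea in base R and is not the approach proposed in this issue; `supported` and `strip_pkg_prefix()` are hypothetical names, and calls with empty arguments (such as `x[, 1]`) are not handled.

```r
# Hypothetical set of function names that already have bindings.
supported <- c("year", "month")

# Recursively replace `pkg::fun(...)` with `fun(...)` when `fun` is supported.
strip_pkg_prefix <- function(expr, supported) {
  # Symbols and constants are returned unchanged.
  if (!is.call(expr)) {
    return(expr)
  }
  head <- expr[[1]]
  # A namespaced call has a head that is itself a call to `::`.
  if (is.call(head) && identical(head[[1]], as.name("::"))) {
    fun_name <- as.character(head[[3]])
    if (fun_name %in% supported) {
      expr[[1]] <- as.name(fun_name)
    }
  }
  # Rebuild the call, applying the same rewrite to every element.
  as.call(lapply(as.list(expr), strip_pkg_prefix, supported = supported))
}

rewritten <- strip_pkg_prefix(
  quote(mutate(ds, year = lubridate::year(time_hour))),
  supported
)
deparse(rewritten)
```

Unlike registering two copies of each binding, this rewrite ignores which package the prefix names, so `otherpkg::year()` would be treated the same as `lubridate::year()`; that trade-off is one reason a registry keyed on the full `pkg::fun` name can be preferable.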

Reporter: Jonathan Keane / @jonkeane
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg

Subtasks:

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-14575. Please see the migration documentation for further details.
