Currently if we submit data within a task like the following:
x = np.array([1, 2, 3])
future = client.submit(func, x)
Then we construct a task like (func, x) and call pickle on that task. We don't do custom serialization once we construct tasks. This is mostly because data embedded in tasks is rarely large, and traversing tasks would add overhead in the common case. Typically we encourage people to use dask.delayed or something similar to mark data explicitly:
x = np.array([1, 2, 3])
x = dask.delayed(x)
future = client.submit(func, x)
But this is error-prone and requires expertise that few users have.
We might again consider traversing arguments.
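
As a rough illustration of what traversing arguments could look like, here is a minimal sketch. It is not how distributed works today: the helper name `find_large_args`, the size threshold, and the use of an `nbytes` attribute as the size heuristic are all assumptions made for the example. It only inspects top-level arguments, which keeps the cost proportional to the number of arguments.

```python
def find_large_args(args, threshold=1_000_000):
    """Return the positions of top-level arguments that look large.

    Hypothetical helper for illustration only; not part of dask or distributed.
    """
    large = []
    for i, arg in enumerate(args):
        # Assumed heuristic: NumPy arrays (and memoryviews) expose `nbytes`.
        nbytes = getattr(arg, "nbytes", None)
        # bytes and bytearray have no `nbytes`, so fall back to their length.
        if nbytes is None and isinstance(arg, (bytes, bytearray)):
            nbytes = len(arg)
        if nbytes is not None and nbytes >= threshold:
            large.append(i)
    return large


# Only the second argument crosses the (assumed) 1 MB threshold.
positions = find_large_args((b"small", bytearray(2_000_000), 3))
```

Arguments flagged this way could then be routed through custom serialization instead of being pickled along with the rest of the task, while small arguments keep the current cheap path.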