I got this error: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
After some digging, I found that the solve_1d_linesearch_quad function in optim.py calls np.divide directly, which would explain the failure when the inputs are CUDA tensors. I think it should call nx.divide instead, where nx = get_backend(a, b, c). This also means adding a divide method to the TorchBackend class that wraps torch.div.
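
To make the suggestion concrete, here is a rough sketch of the dispatch I have in mind. It is not the library's actual code: NumpyBackend, TorchBackend and get_backend below are simplified stand-ins I wrote for illustration, and quad_minimizer is a hypothetical helper that only mimics the kind of division done in the line search (the minimizer of a*x^2 + b*x + c is -b / (2a)).

```python
import numpy as np

try:
    import torch
except ImportError:  # torch is optional for this sketch
    torch = None


class NumpyBackend:
    """Simplified stand-in for a NumPy backend (illustration only)."""

    def divide(self, a, b):
        return np.divide(a, b)


class TorchBackend:
    """Simplified stand-in; the proposed divide method wraps torch.div."""

    def divide(self, a, b):
        # torch.div keeps the result on the same device as the inputs,
        # so no conversion to NumPy is attempted.
        return torch.div(a, b)


def get_backend(*args):
    """Assumed behaviour: pick the torch backend if any input is a tensor."""
    if torch is not None and any(isinstance(x, torch.Tensor) for x in args):
        return TorchBackend()
    return NumpyBackend()


def quad_minimizer(a, b):
    """Hypothetical helper: unconstrained minimizer of a*x**2 + b*x + c."""
    nx = get_backend(a, b)
    return nx.divide(-b, 2.0 * a)
```

With this kind of dispatch, calling quad_minimizer on CUDA tensors should go through torch.div and stay on the GPU, while plain floats or NumPy arrays still go through np.divide.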