Multi GPU dataparallel crash

## 🐛 Bug



Model I am using (Bert, XLNet....): GPT2

Language I am using the model on (English, Chinese....): English

The problem arise when using:
* [ ] the official example scripts: run_lm_finetuning
* [ ] my own modified scripts: (give details)

The tasks I am working on is:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: I am finetuning the GPT2 model with a dataset that I have used in the past to fine-tune the  BERT model among others

## To Reproduce

Steps to reproduce the behavior:

1.
2.
3.


raceback (most recent call last):████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 7587/7588 [1:42:01<00:00,  1.26it/s]
  File "run_lm_finetuning.py", line 551, in <module>
    main()
  File "run_lm_finetuning.py", line 503, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "run_lm_finetuning.py", line 228, in train
    outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, labels=labels)
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.gather(outputs, self.output_device)
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 68, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/cuda/comm.py", line 165, in gather
    return torch._C._gather(tensors, dim, destination)
**RuntimeError: Gather got an input of invalid size: got [2, 2, 12, 1024, 64], but expected [2, 3, 12, 1024, 64] (gather at /opt/conda/conda-**bld/pytorch_1565272279342/work/torch/csrc/cuda/comm.cpp:226)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f47a13d5e37 in /home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: torch::cuda::gather(c10::ArrayRef<at::Tensor>, long, c10::optional<int>) + 0x3c7 (0x7f4720c61327 in /home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: <unknown function> + 0x5fa742 (0x7f47a420f742 in /home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x1c8316 (0x7f47a3ddd316 in /home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #14: THPFunction_apply(_object*, _object*) + 0x98f (0x7f47a40024bf in /home/dev/deeplearning/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)


## Expected behavior

it crashes at the last batch all the time. I expect it to move to the next epoch 

## Environment

* OS:
* Python version: 3.6
* PyTorch version: 1.2
* PyTorch Transformers version (or branch):
* Using GPU ? 4 Quadro 8000
* Distributed of parallel setup ? parallel
* Any other relevant information:

## Additional context

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi GPU dataparallel crash #1779

🐛 Bug

To Reproduce

Expected behavior

Environment

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Multi GPU dataparallel crash #1779

Description

🐛 Bug

To Reproduce

Expected behavior

Environment

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions