parallel_rspec hanging

Hi, @grosser!

We are observing that, from time to time, the build never finishes and hangs for hours until CI times it out (after 3 hours).

The build is otherwise fine: most of the time it works, but every now and then it gets stuck.

We have observed this on [Semaphore](https://semaphore.io) and on Jenkins, but I have never been able to reproduce it on my local machine.

The CI output is not much help; it always looks similar to this:

```
...
-> Process 1 finishes and outputs here
FFF -> some failures
Process 2 simply hangs here and never finishes or dumps the failures
```

We are sure it's not a single spec getting stuck, because we have this in place:

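```ruby
require "timeout"

RSpec.configure do |config|
  # Fail any single example that runs longer than 5 minutes
  # (`5.minutes` comes from ActiveSupport; use `5 * 60` without it)
  config.around(:each) do |example|
    Timeout.timeout(5.minutes) do
      example.run
    end
  end
end
```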

While this is not perfect, it ensures that no single `it` runs for more than 5 minutes; if one does, the spec fails and we can see it, which is very rare.

I added this to our `spec_helper.rb`:

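```ruby
# SIGINFO only exists on BSD/macOS; fall back to SIGTTIN where it is missing (e.g. Linux)
dump_signal = Signal.list.key?("INFO") ? "INFO" : "TTIN"

Signal.trap(dump_signal) do
  Thread.list.each do |thread|
    puts "Thread TID-#{(thread.object_id ^ ::Process.pid).to_s(36)} #{thread.name}"
    if thread.backtrace
      puts thread.backtrace.join("\n")
    else
      puts "<no backtrace available>"
    end
  end
  $stdout.flush # make sure the dump reaches the CI log right away
end
```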

_Shamelessly copied from Sidekiq [here](https://github.com/sidekiq/sidekiq/blob/c2941506ddf5c8b691d0c6c318f454a76b0abf78/lib/sidekiq/cli.rb#L216-L225)._

The problem is that sending a signal to the `parallel_rspec` PID does not propagate it to the child processes. Still, I don't think a single spec is getting stuck.

Do you think it makes sense to propagate at least some signals to the child processes, either by default or as a configurable option?
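To make this request concrete, here is a minimal sketch of a parent process that forwards a signal to its workers. It is not how parallel_tests works internally; the worker script, the `spawn_worker` helper, and the choice of `USR1` (which, unlike `INFO`, exists on both Linux and macOS) are assumptions for illustration only.

```ruby
require "rbconfig"

# Hypothetical worker: traps USR1, reports it, then idles like a long-running spec process.
WORKER_CODE = <<~'RUBY'
  Signal.trap("USR1") { $stdout.puts "worker #{Process.pid} got USR1"; $stdout.flush }
  $stdout.puts "ready"
  $stdout.flush
  sleep 10
RUBY

# Hypothetical helper: spawn one worker and wait until it has installed its trap.
def spawn_worker
  reader, writer = IO.pipe
  pid = Process.spawn(RbConfig.ruby, "-e", WORKER_CODE, out: writer)
  writer.close
  reader.gets # consume "ready"
  [pid, reader]
end

workers = Array.new(2) { spawn_worker }
worker_pids = workers.map(&:first)

# In the parent: forward every USR1 we receive to all workers.
Signal.trap("USR1") do
  worker_pids.each do |pid|
    Process.kill("USR1", pid)
  rescue Errno::ESRCH
    # worker already exited; nothing to forward
  end
end

# Simulate CI running `kill -USR1 <parent pid>`.
Process.kill("USR1", Process.pid)
replies = workers.map { |_pid, reader| reader.gets }
puts replies

# Clean up the workers.
worker_pids.each do |pid|
  Process.kill("TERM", pid)
  Process.wait(pid)
end
```

With forwarding like this in the runner, the thread-dump trap in each child's `spec_helper.rb` would fire even though CI only knows the parent's PID.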

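I don't think a stuck spec is the issue, because otherwise we would simply see the timeout error. However, `Timeout` is known to leave state corrupted every now and then (it raises an exception asynchronously in the running thread, so it can interrupt code at any point), and this might be affecting parallel or parallel_tests, even though I see some `Thread.handle_interrupt` usage in parallel.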
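A minimal sketch of that mechanism: `Timeout.timeout` raises into the block wherever it happens to be, so a two-step update can be cut in half, while `Thread.handle_interrupt` defers the exception until a critical section exits. This does not show what happens inside parallel or parallel_tests; the `run` helper and `state` hash are hypothetical, and masking `Exception` rather than only `Timeout::Error` is a conservative choice that covers whatever class `Timeout` raises internally.

```ruby
require "timeout"

# Hypothetical helper: a two-step "critical section" that a timeout interrupts.
def run(protect:)
  state = { started: 0, finished: 0, timed_out: false }
  begin
    Timeout.timeout(0.1) do
      critical = lambda do
        state[:started] += 1
        sleep 0.3             # slow work: the timeout fires in here
        state[:finished] += 1 # skipped if the exception lands during the sleep
      end

      if protect
        # Defer asynchronous exceptions (including Timeout's) until this block exits
        Thread.handle_interrupt(Exception => :never) { critical.call }
      else
        critical.call
      end
    end
  rescue Timeout::Error
    state[:timed_out] = true
  end
  state
end

unprotected = run(protect: false)
protected_run = run(protect: true)
p unprotected   # started once, never finished: state left half-updated
p protected_run # finished, and the timeout is raised only after the section completes
```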

So a second request: could parallel or parallel_tests handle something like this INFO signal and dump the thread state? Then we could do something simple on CI, along these lines:

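```bash
bundle exec parallel_rspec &
pid=$!     # PID of the background job ($1 would be the script's first argument)
SECONDS=0  # bash's built-in elapsed-time counter

while kill -0 "$pid" 2>/dev/null; do
  if (( SECONDS > 60 * 60 )); then
    kill -9 "$pid"      # kill abruptly
  elif (( SECONDS > 55 * 60 )); then
    kill "$pid"         # try to interrupt (SIGTERM)
  elif (( SECONDS > 50 * 60 )); then
    kill -INFO "$pid"   # try to dump where it is stuck (SIGINFO exists on BSD/macOS, not Linux)
  fi

  sleep 30
done

wait "$pid"  # propagate parallel_rspec's exit status
```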

That way we could find out whether parallel_tests got stuck or whether it was actually our own code.

WDYT?

---

We are using:

- parallel: 1.24.0
- parallel_tests: 4.4.0

I found other issues mentioning something similar:

- https://github.com/grosser/parallel_tests/issues/372
- https://github.com/grosser/parallel_tests/issues/74

parallel_rspec hanging #1028

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

parallel_rspec hanging #1028

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions