Sync protocol gets stuck and node requires a restart.

### Is there an existing issue?

- [X] I have searched the existing issues

### Experiencing problems? Have you tried our Stack Exchange first?

- [X] This is not a support question.

### Description of bug

We use Substrate (branch polkadot-v0.9.26) on our testnet (Aleph Zero testnet) and quite often (some nodes hit this every couple of days) see the sync protocol get stuck. Everything works fine up to some point, after which the node enters a strange state and keeps logging:
`2022-08-11 13:12:51 ⚙️  Syncing  0.0 bps, target=#7930720 (26 peers), best: #7927340 (0xa832…7ebb), finalized #7927337 (0x292d…fd2a), ⬇ 2.9kiB/s ⬆ 0.4kiB/s`
without making any further progress. Restarting the node always helps.
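
For anyone trying to spot the same condition in their own logs, here is a minimal sketch of a stall detector. It assumes the status lines have exactly the format shown above (timestamp, `target=#N`, `best: #N`). The helper names `parse_status` and `find_stall` and the 5-minute threshold are our own illustrative choices, not part of Substrate.

```python
import re
from datetime import datetime, timedelta

# Matches status lines in the format quoted above, e.g.
# "2022-08-11 13:12:51 ... Syncing  0.0 bps, target=#7930720 (26 peers), best: #7927340 (...)"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?"
    r"target=#(?P<target>\d+).*?best: #(?P<best>\d+)"
)


def parse_status(line):
    """Return (timestamp, target, best) for a status line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    return ts, int(m.group("target")), int(m.group("best"))


def find_stall(lines, max_idle=timedelta(minutes=5)):
    """Return (best, first_seen, last_seen) for the first stall found, or None.

    A stall here means: the best block has not changed for at least `max_idle`
    while the target is still ahead of it. The threshold is an assumption.
    """
    last_best = None
    since = None
    for line in lines:
        status = parse_status(line)
        if status is None:
            continue  # not a status line
        ts, target, best = status
        if best != last_best:
            # Progress was made (or this is the first sample): restart the clock.
            last_best, since = best, ts
        elif target > best and ts - since >= max_idle:
            return best, since, ts
    return None
```

Feeding it the lines of a node's log file (for example `find_stall(open("node.log", encoding="utf-8"))`, where `node.log` is a hypothetical path) returns the block the node is stuck on and the time window, which makes it easier to compare stalls across nodes.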

So far we haven't found any plausible explanation for this. It doesn't seem to happen at any specific block: different nodes get stuck on different blocks. We found a strange way to (sometimes) reproduce it: run a node on a laptop, hibernate the laptop for some time, and then wake it up. Once the node reconnects and tries to sync, it gets stuck in the same way (and a restart is necessary). This is how we produced the log attached below (it has `trace`-level logs for `sync`).
[verbose-logs.log](https://github.com/paritytech/substrate/files/9417363/verbose-logs.log)

However, this (a long network disconnect) is not the typical situation in which the problem arises. Usually nodes run without issues and then, with no apparent trigger, stop syncing :/

The main differences between Aleph Zero and Polkadot or standard parachain nodes are that we have a short block time (1 sec) and that we use Aura + AlephBFT instead of Babe + Grandpa. However, the latter shouldn't really matter, because the problem mostly affects non-validator nodes, which don't even run any consensus code.

Our only guess was that this might be because our Substrate dependency lacks the fix from https://github.com/paritytech/substrate/pull/11817, but from inspecting the code it doesn't seem that it could have such an effect...

We would appreciate any help in finding the culprit. Thanks!

### Steps to reproduce

_No response_

Sync protocol gets stuck and node requires a restart. #12101