This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Bad signature synchronization error of past blocks with Aura #10103

@remzrn

Hello everyone,
While migrating Edgeware, which uses Aura, to Substrate 4, I ran into the following issue: whenever the authority set changes at an epoch boundary, the chain fails with a bad signature error.
Here is how I traced it:
The error message is only produced here (and in BABE, but Edgeware does not use BABE):

#[display(fmt = "Bad signature on {:?}", _0)]

And, for Aura, this enum variant is only returned by the check_header function, here:

Err(Error::BadSignature(hash))

So I built two nodes, an old one and a new one, each with a modified Substrate version that prints this information. At block 700 (the end of the first session), the authority set of the working (old) version is extended, which yields an expected_author different from the one computed by my upgraded version, which keeps the same authority set. I don’t really understand what I missed during the migration that caused this, so I dug a bit further:
The check_header function is only used in Aura (and in PoW and BABE, which are not relevant here), and the authorities are fetched here:

let authorities = authorities(self.client.as_ref(), &BlockId::Hash(parent_hash))

Which seems to reference this function:

fn authorities<A, B, C>(client: &C, at: &BlockId<B>) -> Result<Vec<A>, ConsensusError>

Printing the two outputs also shows the difference in the authority set: 4.0.0-dev does not pick up the change, while Substrate 3 handled it correctly.
I migrated the chain following the examples in node-template, and it does not seem that any change is required in the runtime implementation of the Aura API. Besides, since I am syncing past blocks, I suppose the runtime being executed is the one that was in effect when each block was produced, so the authorities should be the same.
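To illustrate why a stale authority set would surface as a bad signature rather than some other error: as far as I understand, Aura assigns slots round-robin, picking the expected author as the slot number modulo the number of authorities. The sketch below is not the Substrate code. The `AuthorityId` type, the authority names, and the slot number are all made up for illustration, and it only assumes the modulo selection rule.

```rust
// Hypothetical stand-in for an Aura authority id (the real type is a public key).
type AuthorityId = &'static str;

/// Round-robin selection: the expected author of a slot is
/// `authorities[slot % authorities.len()]` (assumed rule, simplified types).
fn expected_author(slot: u64, authorities: &[AuthorityId]) -> Option<AuthorityId> {
    if authorities.is_empty() {
        return None;
    }
    let idx = (slot % authorities.len() as u64) as usize;
    authorities.get(idx).copied()
}

fn main() {
    // Made-up sets: the old node sees the extended set after the session
    // change, the upgraded node still sees the previous one.
    let extended_set = ["alice", "bob", "charlie", "dave"];
    let stale_set = ["alice", "bob", "charlie"];
    let slot = 7;

    let on_old_node = expected_author(slot, &extended_set);
    let on_new_node = expected_author(slot, &stale_set);

    println!("old node expects {:?}, new node expects {:?}", on_old_node, on_new_node);
    // The block was signed by the author from the extended set, so a node that
    // verifies the seal against the other key would report a bad signature.
    assert_ne!(on_old_node, on_new_node);
}
```

Under that assumption, the signature check itself is fine; it is the input to it (the authority list returned for the parent block) that differs between the two nodes.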
Unfortunately, I could not get any further since the Runtime API construction implementation is too complicated for me to understand and involves macros etc.
I also have a weird feeling that the issue might be somehow related to this one too, with some information from a block not being processed or not going through the adequate call chain.

Any help or pointers would be really appreciated. I am ready to try things to trace this further, but since the runtime being executed is a very old one, I cannot print any more state once the API is called.

Metadata

Assignees

No one assigned

    Labels

    J2-unconfirmed: Issue might be valid, but it’s not yet known.
