Hello everyone,
While migrating Edgeware, which uses Aura, to Substrate 4, I ran into the following issue: whenever the authority set changes at an epoch boundary, the chain fails with a bad signature error.
Here is how I traced it:
The error message is produced only here (and in BABE, but Edgeware does not use BABE), in substrate/client/consensus/aura/src/lib.rs, line 512 at commit 632b323:
    #[display(fmt = "Bad signature on {:?}", _0)]
For Aura, this enum variant is only returned from the check_header function, in substrate/client/consensus/aura/src/import_queue.rs, line 106 at commit 632b323:
    Err(Error::BadSignature(hash))
So I built two nodes, an old one and a new one, each on a modified Substrate version that prints this information. At block 700 (the end of the first session), the working (old) version extends its authority set, while my upgraded version keeps the same authority set, so the two end up with different expected_author values. I don't really understand what I missed during the migration that caused this, so I dug a bit further:
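To illustrate why a stale authority set leads to a different expected_author, here is a minimal sketch. It assumes Aura picks the author round-robin as the authority at index `slot % authorities.len()`; the `AuthorityId` alias, the names, and the slot number are hypothetical stand-ins, not values taken from the Edgeware chain.

```rust
/// Hypothetical stand-in for an Aura authority id (real ones are public keys).
type AuthorityId = &'static str;

/// Round-robin author selection, assumed here to mirror Aura's rule:
/// the expected author for a slot is the authority at `slot % len`.
fn expected_author(slot: u64, authorities: &[AuthorityId]) -> Option<AuthorityId> {
    if authorities.is_empty() {
        return None;
    }
    let idx = (slot % authorities.len() as u64) as usize;
    authorities.get(idx).copied()
}

fn main() {
    // Set seen by the node that did not pick up the change.
    let stale_set = ["alice", "bob", "charlie"];
    // Set seen by the node that applied the session change.
    let extended_set = ["alice", "bob", "charlie", "dave"];
    let slot = 7;

    // The same slot maps to a different index once the set length changes.
    println!("stale:    {:?}", expected_author(slot, &stale_set));
    println!("extended: {:?}", expected_author(slot, &extended_set));
}
```

Under that assumption, a block signed by the author the extended set selects would be checked against the key the stale set selects, which would surface as a bad signature.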
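The function check_header is only used in Aura (a function of that name also appears in PoW and BABE, but that is not relevant here), and the authorities are fetched there, in substrate/client/consensus/aura/src/import_queue.rs, line 213 at commit 632b323:
    let authorities = authorities(self.client.as_ref(), &BlockId::Hash(parent_hash))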
This seems to reference the following function, in substrate/client/consensus/aura/src/lib.rs, line 544 at commit 632b323:
    fn authorities<A, B, C>(client: &C, at: &BlockId<B>) -> Result<Vec<A>, ConsensusError>
Printing the output of this call on both nodes also shows the difference in the authority set: 4.0.0-dev does not pick up the change, while Substrate 3 returned the correct set.
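When comparing the printed sets from the two nodes, a small helper like the one below can make the difference explicit. This is only a sketch for offline comparison of log output: `diff_authorities` is a hypothetical name, and the string ids are placeholders for whatever representation the nodes print.

```rust
use std::collections::BTreeSet;

/// Returns (ids only in `a`, ids only in `b`), each in sorted order.
/// Note: this ignores ordering inside the sets, which also matters
/// if the author is chosen by index.
fn diff_authorities<'a>(a: &[&'a str], b: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    let set_a: BTreeSet<&str> = a.iter().copied().collect();
    let set_b: BTreeSet<&str> = b.iter().copied().collect();
    let only_a = set_a.difference(&set_b).copied().collect();
    let only_b = set_b.difference(&set_a).copied().collect();
    (only_a, only_b)
}

fn main() {
    // Placeholder ids standing in for the authorities printed by each node.
    let new_node = ["alice", "bob", "charlie"];
    let old_node = ["alice", "bob", "charlie", "dave"];

    let (only_new, only_old) = diff_authorities(&new_node, &old_node);
    println!("only on upgraded node: {:?}", only_new);
    println!("only on old node:      {:?}", only_old);
}
```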
I migrated the chain following the examples in node-template, and it does not seem that any change is required in the runtime API implementation for the Aura API. Besides, since I am syncing past blocks, I assume the runtime being executed is the one that was in effect when each block was produced, so the authorities should be the same.
Unfortunately, I could not get any further, since the runtime API construction is too complicated for me to follow and relies heavily on macros.
I also have a weird feeling that the issue might be somehow related to this one too, with some information from a block not being processed or not going through the adequate call chain.
Any help or pointers would be really appreciated. I am ready to try things to trace this further, but since the runtime being executed is a very old one, I cannot print any more state once the API is called.