Isolate the code of ChainSync and turn it into a state machine#2497
Conversation
| } | ||
|
|
||
| is_offline.store(protocol.is_offline(), Ordering::Relaxed); | ||
| is_major_syncing.store(protocol.is_major_syncing(), Ordering::Relaxed); |
There was a problem hiding this comment.
What's the rationale for moving those here? Previously they would only be updated when something changed, as opposed to at each call to poll as is done here. Also, since the logic is inside of ChainSync, why move it to here and expose it via those methods?
There was a problem hiding this comment.
I'm trying to unbloat sync.rs and protocol.rs by isolating code as much as possible from the threading strategy. To me sync.rs really should only ever be a state machine that decides who to send messages to and processes responses, and shouldn't be concerned with updating the content of Arcs or knowing about channels.
If the boolean update is an issue, a solution would be to make methods from ChainSync return a value indicating that there was a state change in is_offline and/or is_major_syncing.
There was a problem hiding this comment.
To expand: to me the situation in sync.rs and protocol.rs (and really anything in network) is catastrophic. People (including me) don't want to work on this part of the code because it is so horribly bloated and complicated.
And we don't even have warp sync or anything like that implemented. This is just the very basic straight-forward "let's pull blocks" strategy that really shouldn't be that hard.
There was a problem hiding this comment.
Ok, and how does moving those operations to here improves the situation? Whether it happens somewhere inside ChainSync, or here, with the current approach it will still run in the same task as a set of sequential steps(for which there is indeed something to be said).
is_offline.store(protocol.is_offline(), Ordering::Relaxed); is certainly simpler than the equivalent that was found in ChainSync, on the other hand I'm not sure if there was anything incorrect about the previous code.
There was a problem hiding this comment.
Well, simplification. I don't think there was anything correct EDIT: anything incorrect indeed, but it clearly makes the code of ChainSync more complex and strongly discourages any improvement to it.
There was a problem hiding this comment.
Ok. The previous was more complicated, and it was also more targeted. However I don't think it's an important optimization...
| protocol.on_peer_connected(peer_id, debug_msg), | ||
| Some(FromNetworkMsg::PeerDisconnected(peer_id, debug_msg)) => | ||
| protocol.on_peer_disconnected(peer_id, debug_msg), | ||
| let outcome = match msg { |
There was a problem hiding this comment.
Here I am also wondering the point of moving the import-queue to here. It seems the code is more complicated, with those CustomMessageOutcome, and it also appears somewhat bloated since the None case is hit on all branch but one.
Does performing the operation of sending the message to the import-queue here make ChainSync really more deterministic? To me it reads like the logic is just moved around, and made more complicated.
The changes will also make it much more complicated to break things up into separate tasks.
I'm in favor of merging protocol away from its own thread and into the networking tokio runtime, but I'm less in favor of doing everything in one task. If the async nature of the previous setup caused problems, I'd rather see those problems isolated and fixed, while still allowing the tasks to, at least eventually, run separately.
There was a problem hiding this comment.
That's the code of the tests, which I haven't really updated yet.
At the moment, the tests assume the presence of a background thread and I've had issues with them not being deterministic because of that.
As an example, I'm doing these changes from my personal laptop because the network tests don't pass on my office laptop (even with a commit on the master branch from two months ago).
I really would like to make everything deterministic, but I haven't touched the tests code yet. At the moment I'm just keeping the tests code in sync with the changes so that the tests continue to pass.
There was a problem hiding this comment.
I'm in favor of merging protocol away from its own thread and into the networking tokio runtime, but I'm less in favor of doing everything in one task.
It's true that everything is one task at the moment, but that's not the ultimate plan.
The general direction of this rework is to move channels as an internal detail of the code, and not expose them through the API.
The future in service.rs could still create sub-tasks, but would handle all communications between the "main task" and the sub-tasks internally. It would expose a deterministic non-channels API, and all the sub-modules would expose a deterministic non-channels APIs.
There's actually a commit in #2475 that has handling the protocol as a sub-task, but I reverted that in the following commit, because maintaining that code is quite painful when you're changing APIs.
There was a problem hiding this comment.
Ok, I think it makes sense to try to simplify the code, and even running everything in a single task could help de-buggability(since it will all happen one after the other in the same task).
I'm still wondering why the import_queue needs to be moved to the "top-level" right here in the poll_fn, why not just keep it in protocol? By moving it here you now have to introduce this outcome. Does that simplify something?
There was a problem hiding this comment.
I'm still wondering why the import_queue needs to be moved to the "top-level" right here in the poll_fn, why not just keep it in protocol? By moving it here you now have to introduce this outcome. Does that simplify something?
You can now write simple tests that verify the value of the outcome without having to implement the ImportQueue trait.
I'm generally trying to move out the ImportQueue out of the logic of network, so that we can remove the fact that we're forced to run a background thread for it.
|
@arkpar please merge when you're happy. |
…ytech#2497) * Move the is_offline and is_major_syncing logic to service.rs * Move the ImportQueue to service.rs * Remove stop() and abort() * Add some more documentation to sync.rs
Based on top of #2475
Removes all
Arcs (explicit or implicit (ie. the one inside ofBox<ImportQueue>)) from thesync.rsmodule. This makes the code ofsync.rsdeterministic (except for request timeouts).The only logic that's been removed, which is importing blocks and justifications to the import queue, has been moved to
service.rs.