Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Isolate the code of ChainSync and turn it into a state machine#2497

Merged
arkpar merged 4 commits into
paritytech:masterfrom
tomaka:remove-arcs-sync
May 12, 2019
Merged

Isolate the code of ChainSync and turn it into a state machine#2497
arkpar merged 4 commits into
paritytech:masterfrom
tomaka:remove-arcs-sync

Conversation

@tomaka

@tomaka tomaka commented May 7, 2019

Copy link
Copy Markdown
Contributor

Based on top of #2475

Removes all Arcs (explicit or implicit (ie. the one inside of Box<ImportQueue>)) from the sync.rs module. This makes the code of sync.rs deterministic (except for request timeouts).

The only logic that's been removed, which is importing blocks and justifications to the import queue, has been moved to service.rs.

@tomaka tomaka added the I7-refactor Code needs refactoring. label May 7, 2019
@tomaka tomaka mentioned this pull request May 7, 2019
@tomaka tomaka added the A0-please_review Pull request needs code review. label May 7, 2019
@tomaka tomaka requested a review from arkpar May 7, 2019 17:08
}

is_offline.store(protocol.is_offline(), Ordering::Relaxed);
is_major_syncing.store(protocol.is_major_syncing(), Ordering::Relaxed);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale for moving those here? Previously they would only be updated when something changed, as opposed to at each call to poll as is done here. Also, since the logic is inside of ChainSync, why move it to here and expose it via those methods?

@tomaka tomaka May 8, 2019

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to unbloat sync.rs and protocol.rs by isolating code as much as possible from the threading strategy. To me sync.rs really should only ever be a state machine that decides who to send messages to and processes responses, and shouldn't be concerned with updating the content of Arcs or knowing about channels.

If the boolean update is an issue, a solution would be to make methods from ChainSync return a value indicating that there was a state change in is_offline and/or is_major_syncing.

@tomaka tomaka May 8, 2019

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expand: to me the situation in sync.rs and protocol.rs (and really anything in network) is catastrophic. People (including me) don't want to work on this part of the code because it is so horribly bloated and complicated.
And we don't even have warp sync or anything like that implemented. This is just the very basic straight-forward "let's pull blocks" strategy that really shouldn't be that hard.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and how does moving those operations to here improves the situation? Whether it happens somewhere inside ChainSync, or here, with the current approach it will still run in the same task as a set of sequential steps(for which there is indeed something to be said).

is_offline.store(protocol.is_offline(), Ordering::Relaxed); is certainly simpler than the equivalent that was found in ChainSync, on the other hand I'm not sure if there was anything incorrect about the previous code.

@tomaka tomaka May 8, 2019

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, simplification. I don't think there was anything correct EDIT: anything incorrect indeed, but it clearly makes the code of ChainSync more complex and strongly discourages any improvement to it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. The previous was more complicated, and it was also more targeted. However I don't think it's an important optimization...

protocol.on_peer_connected(peer_id, debug_msg),
Some(FromNetworkMsg::PeerDisconnected(peer_id, debug_msg)) =>
protocol.on_peer_disconnected(peer_id, debug_msg),
let outcome = match msg {

@gterzian gterzian May 8, 2019

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am also wondering the point of moving the import-queue to here. It seems the code is more complicated, with those CustomMessageOutcome, and it also appears somewhat bloated since the None case is hit on all branch but one.

Does performing the operation of sending the message to the import-queue here make ChainSync really more deterministic? To me it reads like the logic is just moved around, and made more complicated.

The changes will also make it much more complicated to break things up into separate tasks.

I'm in favor of merging protocol away from its own thread and into the networking tokio runtime, but I'm less in favor of doing everything in one task. If the async nature of the previous setup caused problems, I'd rather see those problems isolated and fixed, while still allowing the tasks to, at least eventually, run separately.

@tomaka tomaka May 8, 2019

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the code of the tests, which I haven't really updated yet.
At the moment, the tests assume the presence of a background thread and I've had issues with them not being deterministic because of that.
As an example, I'm doing these changes from my personal laptop because the network tests don't pass on my office laptop (even with a commit on the master branch from two months ago).

I really would like to make everything deterministic, but I haven't touched the tests code yet. At the moment I'm just keeping the tests code in sync with the changes so that the tests continue to pass.

@tomaka tomaka May 8, 2019

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of merging protocol away from its own thread and into the networking tokio runtime, but I'm less in favor of doing everything in one task.

It's true that everything is one task at the moment, but that's not the ultimate plan.
The general direction of this rework is to move channels as an internal detail of the code, and not expose them through the API.

The future in service.rs could still create sub-tasks, but would handle all communications between the "main task" and the sub-tasks internally. It would expose a deterministic non-channels API, and all the sub-modules would expose a deterministic non-channels APIs.

There's actually a commit in #2475 that has handling the protocol as a sub-task, but I reverted that in the following commit, because maintaining that code is quite painful when you're changing APIs.

@gterzian gterzian May 8, 2019

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think it makes sense to try to simplify the code, and even running everything in a single task could help de-buggability(since it will all happen one after the other in the same task).

I'm still wondering why the import_queue needs to be moved to the "top-level" right here in the poll_fn, why not just keep it in protocol? By moving it here you now have to introduce this outcome. Does that simplify something?

@tomaka tomaka May 8, 2019

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still wondering why the import_queue needs to be moved to the "top-level" right here in the poll_fn, why not just keep it in protocol? By moving it here you now have to introduce this outcome. Does that simplify something?

You can now write simple tests that verify the value of the outcome without having to implement the ImportQueue trait.

I'm generally trying to move out the ImportQueue out of the logic of network, so that we can remove the fact that we're forced to run a background thread for it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense.

@tomaka tomaka force-pushed the remove-arcs-sync branch from 512806f to 7e1e8ab Compare May 8, 2019 13:14
@gavofyork

Copy link
Copy Markdown
Member

@arkpar please merge when you're happy.

@arkpar arkpar merged commit 66058ee into paritytech:master May 12, 2019
@tomaka tomaka deleted the remove-arcs-sync branch May 12, 2019 17:26
MTDK1 pushed a commit to bdevux/substrate that referenced this pull request Jul 10, 2019
…ytech#2497)

* Move the is_offline and is_major_syncing logic to service.rs

* Move the ImportQueue to service.rs

* Remove stop() and abort()

* Add some more documentation to sync.rs
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A0-please_review Pull request needs code review. I7-refactor Code needs refactoring.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants