Iterator::fold is a little slow compared to bare loop #76725
Closed
Labels
A-iterators: Area: Iterators
I-slow: Issue: Problems and improvements with respect to performance of generated code.
T-libs: Relevant to the library team, which will review and decide on the PR/issue.
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.
Background
I was writing a small project to compare the memory impact of iterators. I decided to also investigate the runtime impact and see how close Rust came to providing zero-cost abstractions for iteration. I compared three types of iteration:
and saw this Criterion output plot:

The outliers to the right are the timings of the `fold` functions, while the two pairs on the left are the `filter_map_filter` and `raw` versions. Rust did a great job ensuring `.filter.map.filter` was the same speed as a raw loop, but `.fold` seemed to be lacking.
Quick Investigation
Looking at the source for `.fold`, the `accum` is reassigned with the result of each invocation of `f`. I quickly tested whether this could be improved with a `&mut` instead in this PR. The result was surprising (to me):
The "custom fold" method was faster than all the other options (which doesn't make a ton of sense to me but that's what y'all are here for!).
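For readers who want the two shapes side by side, here is a minimal sketch. It is not the code from the PR: `fold_by_value` only mirrors the reassigning pattern described above, and `fold_mut` is a hypothetical free function (not a std API) that hands the closure a `&mut` accumulator instead. The `sum_*` helpers are likewise illustrative names.

```rust
// Mirrors the pattern described above: the accumulator is moved into
// the closure and reassigned with the closure's return value.
fn fold_by_value<I, B, F>(iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(B, I::Item) -> B,
{
    let mut accum = init;
    for x in iter {
        accum = f(accum, x);
    }
    accum
}

// Hypothetical alternative (not part of std): the closure mutates the
// accumulator through a `&mut` reference and returns nothing.
fn fold_mut<I, B, F>(iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(&mut B, I::Item),
{
    let mut accum = init;
    for x in iter {
        f(&mut accum, x);
    }
    accum
}

// Illustrative callers: both compute the same sum.
fn sum_by_value(xs: &[u64]) -> u64 {
    fold_by_value(xs.iter(), 0u64, |acc, x| acc + x)
}

fn sum_in_place(xs: &[u64]) -> u64 {
    fold_mut(xs.iter(), 0u64, |acc, x| *acc += x)
}
```

Both helpers produce the same result; any speed difference between them would have to be measured (for example with Criterion), since it depends on what the optimizer does with each shape.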
Path Forward
I initially was going to suggest adding some sort of `fold_mut` or some better-named method to allow for this faster `fold` iterator. This could be a performance improvement in some areas and could also improve the syntax when the closure couldn't "easily" return the new accumulator:
I made a branch for this if we want to head in that direction (the tests are slim, the benchmarks are probably overkill, the stability is missing, and the docs are probably slim and improperly formatted but it's a start!) and saw some improvements in the benchmarks I added:
Now I'm not sure if this "`fold_mut`" path is the right way to go - I'm not sure if it's awkward or dangerous. It seems similar to Ruby's `each_with_object` so there's maybe something there. It could also be that with some compiler witchcraft we can just make `fold` a "true" zero cost abstraction.
In any case, thought I'd post here instead of making a PR so we could decide if there should be any PR and I'm happy to help with whatever path forward we choose!
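To make the syntax point and the `each_with_object` comparison concrete, here is a rough sketch of counting words into a map. `fold_mut` is again a hypothetical free function rather than anything in std, and the `count_with_*` names are made up for this illustration.

```rust
use std::collections::HashMap;

// Hypothetical helper (not part of std): fold with a `&mut` accumulator.
fn fold_mut<I, B, F>(iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(&mut B, I::Item),
{
    let mut accum = init;
    for x in iter {
        f(&mut accum, x);
    }
    accum
}

// With the standard `fold`, the closure has to take the map by value
// and hand it back at the end of every call.
fn count_with_fold(words: &[&str]) -> HashMap<String, usize> {
    words.iter().fold(HashMap::new(), |mut counts, w| {
        *counts.entry(w.to_string()).or_insert(0) += 1;
        counts // explicit return of the accumulator
    })
}

// With the `&mut` variant, the closure only mutates the map.
fn count_with_fold_mut(words: &[&str]) -> HashMap<String, usize> {
    fold_mut(words.iter(), HashMap::new(), |counts, w| {
        *counts.entry(w.to_string()).or_insert(0) += 1;
    })
}
```

The second version reads much like Ruby's `each_with_object`: the accumulator is passed along and mutated, and the closure has nothing to return.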