Limit memory reads for mmaped files #329
Merged
brian-brazil merged 2 commits into prometheus:master on Oct 18, 2018
Signed-off-by: Simon Davy <simon.davy@canonical.com>
Contributor
Thanks!
Contributor Author
For posterity: the workers were of course failing because of the corrupted file. That file is cached in the master's MultiProcessValue and, until #328, wasn't cleared on metric initialisation after forking, so a metric initialising on the worker would try to read its state from the master's corrupt file. The above is still a useful improvement in error messages, though; it took us a while to figure out which file was corrupt.
This is a follow-on fix for the initialisation corruption fixed in #328.
When the master's mmaped file was corrupted, the reader ended up interpreting the corrupted data as a large integer. It then fed this into struct.unpack_from as the length to read, which rightly raised an exception.
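To illustrate the failure mode described above, here is a minimal sketch. The layout (a 4-byte little-endian length prefix followed by the key bytes) and the `read_key` helper are assumptions made for this example, not the library's actual file format or code.

```python
import struct

# Hypothetical layout, assumed for illustration only: a 4-byte
# little-endian length prefix followed by that many bytes of key data.
buf = bytearray(64)
struct.pack_into("<i", buf, 0, 5)
buf[4:9] = b"hello"


def read_key(data, pos=0):
    """Read a length-prefixed key with no validation of the length."""
    (length,) = struct.unpack_from("<i", data, pos)
    (key,) = struct.unpack_from("{}s".format(length), data, pos + 4)
    return key


intact = read_key(buf)

# Simulate corruption by overwriting the length prefix with garbage.
buf[0:4] = b"\xff\xff\xff\x7f"
(corrupt_length,) = struct.unpack_from("<i", buf, 0)

# The garbage is now treated as the number of bytes to read, and
# struct.unpack_from rejects it because the buffer is far too small.
try:
    read_key(buf)
    error = None
except struct.error as exc:
    error = exc
```

In this sketch the exception only says that the buffer is too small; it does not say which file is corrupt or that corruption is the likely cause.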
However, it also caused a much bigger problem. After this had occurred, Gunicorn would, for reasons we could not determine, fail to launch new workers and then exit. We saw this pattern of corruption and subsequent Gunicorn death consistently across multiple machines and services.
The issue was transient, and we didn't get coredumps, but the suspicion is that trying to read a large chunk of memory somehow broke Gunicorn.
So this change adds a reasonable bounds check on the read length. If corruption occurs in future, it should blow up earlier, with a better error, and without breaking Gunicorn.
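A rough sketch of what such a bounds check could look like follows. The function name `read_key_checked`, the `used` parameter (the number of valid bytes in the file), the layout, and the error message are all hypothetical and chosen for this example; they are not taken from the actual diff.

```python
import struct


def read_key_checked(data, pos, used):
    """Read a length-prefixed key, validating the length before using it.

    Assumed layout (illustrative only): a 4-byte little-endian length
    at `pos`, followed by that many bytes of key data. `used` is the
    number of valid bytes in `data`.
    """
    (length,) = struct.unpack_from("<i", data, pos)
    # Reject lengths that are negative or would run past the valid data.
    if length < 0 or pos + 4 + length > used:
        raise RuntimeError(
            "corrupt length prefix: pos={}, length={}, used={}".format(
                pos, length, used
            )
        )
    (key,) = struct.unpack_from("{}s".format(length), data, pos + 4)
    return key


# Usage: a valid entry reads normally.
buf = bytearray(64)
struct.pack_into("<i", buf, 0, 5)
buf[4:9] = b"hello"
ok = read_key_checked(buf, 0, used=9)

# A corrupted length prefix is caught before any large read is attempted.
buf[0:4] = b"\xff\xff\xff\x7f"
try:
    read_key_checked(buf, 0, used=9)
    message = None
except RuntimeError as exc:
    message = str(exc)
```

Checking the length against the known amount of valid data means the oversized value never reaches struct.unpack_from, and the error can carry enough context to point at the corruption.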