Prevent infinite recursion loop while reading messages #52
…ogical test cases
}

/// An internal initializer with no logic. Can be used to create pathological test cases.
required init(
We need a way to inject these private properties so that we can mangle them before executing our test.
try header.updateOffsetInFileAtEndOfNewestMessage(
    to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)

XCTAssertThrowsError(try cache.messages()) {
Before the next commit, this test loops infinitely and crashes.
    to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)

XCTAssertThrowsError(try cache.messages()) {
    XCTAssertEqual($0 as? CacheAdvanceError, CacheAdvanceError.fileCorrupted)
We should be able to detect this case and throw an appropriate error. We should also be able to prevent this case from happening in production. We solve both of these "shoulds" in the next few commits.
func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {
    let header = try CacheHeaderHandle(
        forReadingFrom: testFileLocation,
        maximumBytes: 10_000,
This number is effectively arbitrary; its exact value doesn't matter.
Let's move this into a local constant called randomHighValue; that way the property name documents it.
// We'll need to start writing the file from the beginning of the file.

// Trim the file to the current position to remove soon-to-be-abandoned data from the file.
try writer.truncate(at: writer.offsetInFile)
I believe we got into the state the previous commit tests because we were truncating the file before updating the header. We should always update our header before making a write, not after.
I inspected the other places where we write data, and we were already careful there to update our header first.
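The ordering principle above can be illustrated with a toy model. This is a minimal sketch, not CacheAdvance's code: `SimulatedCache`, `Step`, and `run(_:on:crashAfter:)` are hypothetical, the file is an in-memory byte array, and `headerOffset` stands in for a header field (such as the offset of the oldest message) that must never point past the end of the file. A crash is modeled by applying only the first few steps.

```swift
/// Hypothetical in-memory stand-in for the cache file, for illustration only.
struct SimulatedCache {
    /// Bytes after the reserved header space.
    var data: [UInt8]
    /// Stands in for a header offset that must stay within `data`.
    var headerOffset: Int

    /// The header is consistent if the offset it records lies within the file.
    var isConsistent: Bool { headerOffset <= data.count }
}

/// The two operations whose order is under discussion.
enum Step {
    case truncate(at: Int)
    case updateHeader(to: Int)
}

/// Applies `steps` in order, but simulates a crash after the first `crashAfter` steps.
func run(_ steps: [Step], on cache: SimulatedCache, crashAfter: Int) -> SimulatedCache {
    var cache = cache
    for step in steps.prefix(crashAfter) {
        switch step {
        case .truncate(at: let offset):
            cache.data.removeSubrange(offset...)
        case .updateHeader(to: let offset):
            cache.headerOffset = offset
        }
    }
    return cache
}

// 100 bytes of message data with the header pointing 90 bytes in. We want to
// truncate at 40 and point the header back to the start of the data.
let start = SimulatedCache(data: [UInt8](repeating: 0, count: 100), headerOffset: 90)
let truncateFirst: [Step] = [.truncate(at: 40), .updateHeader(to: 0)]
let headerFirst: [Step] = [.updateHeader(to: 0), .truncate(at: 40)]
```

A crash between the two steps of `truncateFirst` leaves a header pointing at offset 90 of a 40-byte file, while every crash point in `headerFirst` leaves a consistent file. As the later discussion in this thread shows, this model only illustrates the general principle; it does not reproduce how the production file reached its bad state.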
b94c612 to 81ac1d6
Co-authored-by: Steven Hepting <shepting@gmail.com>
Co-authored-by: Francisco Diaz <fdiaz@users.noreply.github.com>
bachand left a comment
Thanks for addressing this issue, @dfed. I've left a few comments. If you can help refresh my memory on the format of the underlying file (see one of my comments), I plan to revisit this PR, at which point I'll be able to review more confidently.
// Our file is empty. Make the file corrupted by setting the offset at end of newest message to be further in the file.
// This should never happen, but past versions of this repo could lead to a file having this kind of inconsistency if a crash occurred at the wrong time.
try header.updateOffsetInFileAtEndOfNewestMessage(
    to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)
If this file is empty, would FileHeader.expectedEndOfHeaderInFile + 1 cause the issue? I'm trying to understand why we need to add UInt64(MessageSpan.storageLength).
FileHeader.expectedEndOfHeaderInFile + 1 does indeed cause the issue, though I don't think that's what the file would have looked like when we were in the bad state. My guess is that the recorded end of the newest message accounted for at least one message span (MessageSpan.storageLength) plus the length of that message.
I'm happy to change this to just be + 1 since that does reproduce the issue, and since I think we've solved the original issue. I'll do that in a follow-up PR.
// Trim the file to the current writer position to remove soon-to-be-abandoned data from the file.
truncateAtOffset = writer.offsetInFile

// Set the offset back to the beginning of the file.
The next comment says:
// We know the oldest message is at the beginning of the file, since we just tossed out the rest of the file.
I don't think that's accurate anymore, since we haven't truncated yet.
@@ -121,6 +144,12 @@ public final class CacheAdvance<T: Codable> {
// If the application crashes between writing the header and writing the message data, we'll have lost the messages between the previous offsetInFileOfOldestMessage and the new offsetInFileOfOldestMessage.
try header.updateOffsetInFileOfOldestMessage(to: offsetInFileOfOldestMessage)
I'm trying to load back into my head how this code works. If the cache file allows overwriting, we start writing from the beginning of the file again. From the code, it seems that when we start overwriting, we seek to the next-oldest message. What happens if the new message we are trying to write is so long that we need to overwrite multiple older messages?
We handle that in the following code just above the code you highlighted here:
// Prepare the reader before writing the message.
try prepareReaderForWriting(dataOfLength: bytesNeededToStoreMessage)
This code does the following:
/// Advances the reader until there is room to store a new message without writing past the reader.
/// This method should only be called on a cache that overwrites old messages.
/// - Parameter messageLength: the length of the next message that will be written.
private func prepareReaderForWriting(dataOfLength messageLength: Bytes) throws {
// If our writer is behind our reader,
while writer.offsetInFile < reader.offsetInFile
// And our writer doesn't have enough room to write a message such that it stays behind the current reader position.
&& writer.offsetInFile + messageLength >= reader.offsetInFile
{
// Then writing this message would write into the oldest-known message.
// We must advance our reader to the next-oldest message to help make room for the next message we want to write.
try reader.seekToNextMessage()
}
}
After this method returns, we are willing to write up to wherever the reader is located.
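To see how the loop handles a message that overlaps several older ones, here is a minimal model of `prepareReaderForWriting`. `ReaderModel` is hypothetical: offsets are measured from the first byte after the header, the reader is represented by the start offsets of the messages it has not yet skipped, and an exhausted reader is assumed to wrap to offset 0. Only the `while` condition is copied from the code above.

```swift
/// Hypothetical stand-in for the cache's reader. Offsets are relative to the end of the header.
struct ReaderModel {
    /// Start offsets of the messages still ahead of the reader, oldest first.
    var messageStarts: [Int]

    /// Assumption: once every message has been skipped, the reader wraps to offset 0.
    var offsetInFile: Int { messageStarts.first ?? 0 }

    /// Models `reader.seekToNextMessage()`: skip past the oldest remaining message.
    mutating func seekToNextMessage() {
        if !messageStarts.isEmpty { messageStarts.removeFirst() }
    }
}

/// Same loop condition as `prepareReaderForWriting(dataOfLength:)` above.
func prepareReaderForWriting(writerOffset: Int, messageLength: Int, reader: inout ReaderModel) {
    // While the writer is behind the reader and the new message would reach the reader...
    while writerOffset < reader.offsetInFile
        && writerOffset + messageLength >= reader.offsetInFile
    {
        // ...give up the oldest remaining message to make room.
        reader.seekToNextMessage()
    }
}

// The writer has wrapped to offset 0; older messages start at 20, 50, and 80.
var reader = ReaderModel(messageStarts: [20, 50, 80])
prepareReaderForWriting(writerOffset: 0, messageLength: 45, reader: &reader)
```

A 45-byte message overlaps only the message at offset 20, so the reader stops at 50. A 70-byte message overlaps the messages at 20 and 50, so the loop advances twice and stops at 80. Note the `>=`: a message that would end exactly at the reader's offset also forces the reader forward.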
// Truncate the file if it needs truncation before we write the next message, and after we update our header.
// If the application crashes between truncating this message data and writing the next message, our file will still be consistent.
if let truncateAtOffset = truncateAtOffset {
    try writer.truncate(at: truncateAtOffset)
}
I'm having trouble remembering why we need to truncate the end of the file. What would really help me read this code is a reminder of how the file is formatted. Do we have a graphic or ASCII art anywhere that shows the file format? If not, I think it would be really helpful to add, since it would let me validate this code more confidently.
We do not have any ASCII art, though I agree it would help a ton.
High level, the format is:
[reserved header space]([length-of-message][message-content])*
where ([length-of-message][message-content]) can be repeated until the maximum byte count is reached.
Truncating the file prevents us from reading data that should have been deleted when we read all messages. Since the only ordering information in the header is the start of the first message and the end of the last message, we simply read all data between these two points. If we haven't deleted the data at the end of the file, we would read those supposedly-deleted messages back in when reading all messages.
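The layout and the truncation argument above can be sketched as follows. This is an illustration under stated assumptions, not CacheAdvance's encoding: the 8-byte header and the 4-byte big-endian length prefix are hypothetical (the real sizes come from FileHeader and MessageSpan), and `readAllMessages` assumes that when the oldest message sits after the newest one, reading continues to the end of the file and then resumes just after the header.

```swift
// Hypothetical sizes for illustration; the real values come from FileHeader and MessageSpan.
let headerSize = 8
let lengthPrefixSize = 4

/// Encodes one ([length-of-message][message-content]) record with a big-endian length.
func record(_ message: [UInt8]) -> [UInt8] {
    let length = UInt32(message.count)
    let prefix: [UInt8] = [24, 16, 8, 0].map { (shift: Int) -> UInt8 in
        UInt8(truncatingIfNeeded: length >> shift)
    }
    return prefix + message
}

/// Reads consecutive records that lie entirely within file[start..<end].
func readRecords(_ file: [UInt8], from start: Int, to end: Int) -> [[UInt8]] {
    var messages: [[UInt8]] = []
    var offset = start
    while offset + lengthPrefixSize <= end {
        let length = file[offset..<(offset + lengthPrefixSize)].reduce(0) { ($0 << 8) | Int($1) }
        let contentStart = offset + lengthPrefixSize
        guard contentStart + length <= end else { break }
        messages.append(Array(file[contentStart..<(contentStart + length)]))
        offset = contentStart + length
    }
    return messages
}

/// Reads every message between the oldest message and the end of the newest message.
/// Assumption: if the oldest message sits after the newest one, the cache has wrapped,
/// so we read to the end of the file and then continue from just after the header.
func readAllMessages(_ file: [UInt8], oldest: Int, endOfNewest: Int) -> [[UInt8]] {
    guard oldest > endOfNewest else {
        return readRecords(file, from: oldest, to: endOfNewest)
    }
    return readRecords(file, from: oldest, to: file.count)
        + readRecords(file, from: headerSize, to: endOfNewest)
}

// Layout after wrapping:
// [header][4 (newest)][leftover][2][3][9 (stale, should have been abandoned)]
let header = [UInt8](repeating: 0, count: headerSize)
let newest = record([4])
let leftover: [UInt8] = [0xAA, 0xAA]  // remains of an overwritten message; never read
let live = record([2]) + record([3])
let stale = record([9])
let untrimmed: [UInt8] = header + newest + leftover + live + stale
let endOfNewest = headerSize + newest.count
let oldest = endOfNewest + leftover.count
// Truncated at the writer's position before it wrapped.
let trimmed = Array(untrimmed[..<(oldest + live.count)])
```

Reading `untrimmed` yields [2], [3], [9], [4]: the stale [9] comes back because nothing in the header records where the live data before the wrap ends. Reading `trimmed` yields [2], [3], [4].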
guard !previousReadWasEmpty else {
    // If the previous read was also empty, then the file has been corrupted.
    // Two empty reads in a row means that offsetInFileAtEndOfNewestMessage is incorrect.
    // This inconsistency is likely due to a crash occurring during a message write.
    throw CacheAdvanceError.fileCorrupted
}
To make sure I follow: we never expect to be in this state anymore, right? It could be useful to make clear that this is a safety net.
Yeah, let me update this comment 👍
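The safety net above can be sketched as an iterative loop, which also shows why the recursive version could spin forever: if the recorded end of the newest message lies past the real end of the file, the reader never lands on it. Everything here is hypothetical (`CacheError` stands in for `CacheAdvanceError`, and the 1-byte length prefix only keeps the sketch short); the only rule taken from the diff is that two empty reads in a row mean `offsetInFileAtEndOfNewestMessage` is incorrect.

```swift
/// Stands in for CacheAdvanceError in this sketch.
enum CacheError: Error, Equatable {
    case fileCorrupted
}

/// Hypothetical reader over records laid out as [1-byte length][content] after the header.
struct WrappingReader {
    let file: [UInt8]
    let headerSize: Int
    var offset: Int

    /// Returns the next message, or nil for an "empty read" at the end of the file,
    /// after which the reader wraps back to the first byte after the header.
    /// Assumes well-formed records.
    mutating func readOnce() -> [UInt8]? {
        guard offset < file.count else {
            offset = headerSize
            return nil
        }
        let length = Int(file[offset])
        let message = Array(file[(offset + 1)..<(offset + 1 + length)])
        offset += 1 + length
        return message
    }
}

/// Reads from `oldest` until the reader reaches `endOfNewest`.
func allMessages(file: [UInt8], headerSize: Int, oldest: Int, endOfNewest: Int) throws -> [[UInt8]] {
    var reader = WrappingReader(file: file, headerSize: headerSize, offset: oldest)
    var messages: [[UInt8]] = []
    var previousReadWasEmpty = false
    while reader.offset != endOfNewest {
        if let message = reader.readOnce() {
            messages.append(message)
            previousReadWasEmpty = false
        } else {
            // Safety net: a second empty read in a row means we wrapped without ever
            // reaching endOfNewest, so the header is out of sync with the file.
            guard !previousReadWasEmpty else { throw CacheError.fileCorrupted }
            previousReadWasEmpty = true
        }
    }
    return messages
}
```

With a header-only file whose recorded end of the newest message lies two bytes past the header, the first read is empty and wraps, the second read is empty again, and the function throws `fileCorrupted` instead of looping. Like the guard it mirrors, this only catches the case where the reader comes back around without finding anything; it is not a general corruption check.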
if let truncateAtOffset = truncateAtOffset {
    try writer.truncate(at: truncateAtOffset)
}
Putting here so we have a thread. My understanding is that we intend for the reader to always be at the start of the oldest message in the file. Is that understanding correct?
Yes, your understanding is correct.
    XCTAssertEqual(messages, [])
}

func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {
I am struggling to connect how the code that we changed in append(message:) could have resulted in the situation we have in this test case. Let me walk through my reasoning.
The premise of this test case is that we have an empty file. The offset of the start of the oldest message is at some point beyond the end of the header. And the offset of the end of the newest message is at the end of the header.
We created a fallback fix in nextEncodedMessage(...) and the real fix is in append(message:).
Before our change in append(message:) we were truncating the file before we updated the header. For us to get into this situation of this test case we would need to truncate the entire contents of the file, so we have an empty file (besides the header).
I cannot figure out a sequence of messages where the previous code in append(message:) would ever truncate the entire file. The only way this could happen is if the writer (and offset of the start of the oldest message) was at the end of the file's header. If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false, we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile.
CacheAdvance/Sources/CacheAdvance/CacheAdvance.swift
Lines 104 to 111 in 2470113
As far as I can tell, the previous code in append(message:) could never truncate the entire writable portion of the file.
The only way this could happen is if the writer (and offset of the start of the oldest message) was at the end of the file's header
I agree with this analysis!
If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile
I disagree!
If we were writing a message of length header.maximumBytes - FileHeader.expectedEndOfHeaderInFile, then bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile would evaluate as true since bytesNeededToStoreMessage is equal to header.maximumBytes - FileHeader.expectedEndOfHeaderInFile.
If writer.offsetInFile is at the FileHeader.expectedEndOfHeaderInFile (which I agree is a precondition for this bug), then in the next line cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true because writer.offsetInFile + bytesNeededToStoreMessage equals header.maximumBytes:
At which point, I think we've proved that our original analysis was correct. Am I missing something?
To dig in a bit further, I believe this issue occurred because:
- offsetInFileAtEndOfNewestMessage was beyond the actual end of the file
- The file was empty except for the header
We only update offsetInFileAtEndOfNewestMessage after writing the message:
CacheAdvance/Sources/CacheAdvance/CacheAdvance.swift
Lines 238 to 246 in ed9ece4
Which means that the only way for offsetInFileAtEndOfNewestMessage to be beyond the actual end of the file is if we deleted part of the file before updating the header with our next write.
Now, this analysis (thank you for making me write this out!) indicates that my fix to prevent this data corruption in the future was incorrect. The issue isn't when we truncate the file, but rather when we update offsetInFileAtEndOfNewestMessage. PR incoming!
This explanation makes sense. It seems like I was off by one. Thank you very much for writing this out.
If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile
I disagree!
If we were writing a message of length header.maximumBytes - FileHeader.expectedEndOfHeaderInFile, then bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile would evaluate as true since bytesNeededToStoreMessage is equal to header.maximumBytes - FileHeader.expectedEndOfHeaderInFile.
If writer.offsetInFile is at the FileHeader.expectedEndOfHeaderInFile (which I agree is a precondition for this bug), then in the next line cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true because writer.offsetInFile + bytesNeededToStoreMessage equals header.maximumBytes:
I thought more about this, and I've circled back to my original conclusion. Yes, I agree that cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true. However, we only truncate the file if cacheHasSpaceForNewMessageBeforeEndOfFile is false. Accordingly, I don't think there is any way that we could truncate the entire file through this code path.
We talked on Zoom and agreed with where I landed in my last comment.
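The boundary argument in this thread can be checked mechanically. The sketch below restates the two checks as they are quoted in the comments above: `bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile`, and `writer.offsetInFile + bytesNeededToStoreMessage <= header.maximumBytes` for `cacheHasSpaceForNewMessageBeforeEndOfFile`. The function names and the 64-byte header size are hypothetical, and the `<=` in the second check is inferred from the remark that it evaluates to true when the sum equals `header.maximumBytes`, not copied from the source.

```swift
/// First check: can the message fit in the cache at all? (Assumes endOfHeader <= maximumBytes.)
func messageFitsInCache(bytesNeeded: UInt64, maximumBytes: UInt64, endOfHeader: UInt64) -> Bool {
    bytesNeeded <= maximumBytes - endOfHeader
}

/// Second check, standing in for cacheHasSpaceForNewMessageBeforeEndOfFile.
func hasSpaceBeforeEndOfFile(writerOffset: UInt64, bytesNeeded: UInt64, maximumBytes: UInt64) -> Bool {
    writerOffset + bytesNeeded <= maximumBytes
}

/// Truncation only happens when the second check fails. Returns true if no message that
/// passes the first check can fail the second while the writer sits at the end of the header.
func truncationUnreachableFromEndOfHeader(maximumBytes: UInt64, endOfHeader: UInt64) -> Bool {
    (0...maximumBytes).allSatisfy { bytesNeeded in
        !messageFitsInCache(bytesNeeded: bytesNeeded, maximumBytes: maximumBytes, endOfHeader: endOfHeader)
            || hasSpaceBeforeEndOfFile(writerOffset: endOfHeader, bytesNeeded: bytesNeeded, maximumBytes: maximumBytes)
    }
}

let endOfHeader: UInt64 = 64  // hypothetical header size
let maximumBytes: UInt64 = 10_000
// The largest message that fits exactly fills the file from the end of the header.
let largest = maximumBytes - endOfHeader
```

Both checks pass for `largest`, the boundary case raised above, and the exhaustive check agrees with the conclusion in this thread: with the writer at the end of the header, the truncation branch is never taken. From a later writer offset (for example 9,000 with a 2,000-byte message), the second check fails and truncation becomes possible.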
    XCTAssertEqual(messages, [])
}

func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {
Nit: the indentation on this method is off.
I'll update this in my next PR
Airbnb has seen an infinite recursion crash when reading messages() in the wild. The crash's stack trace involves repeatedly calling through to nextEncodedMessage(). This PR makes our nextEncodedMessage() method defensive against an infinite recursion loop, and also fixes the underlying issue that enabled a cache file to get into this bad state. Best reviewed commit by commit.

I will rebase this PR over #53 once that lands in order to get CI running on this PR, but I have run all tests locally and they are passing, so this should be good to review.