Skip to content

Prevent infinite recursion loop while reading messages#52

Merged
dfed merged 9 commits into
mainfrom
dfed--defensive-writes
Mar 8, 2022
Merged

Prevent infinite recursion loop while reading messages#52
dfed merged 9 commits into
mainfrom
dfed--defensive-writes

Conversation

@dfed

@dfed dfed commented Mar 7, 2022

Copy link
Copy Markdown
Owner

Airbnb has seen an infinite recursion crash when reading messages() in the wild.

The crash's stack trace involves repeatedly calling through to nextEncodedMessage(). This PR makes our nextEncodedMessage() method defensive against an infinite recursion loop, and also fixes the underlying issue that enabled a cache file to get into this bad state. Best reviewed commit by commit

I will rebase this PR over #53 once that lands in order to get CI running on this PR, but I have run all tests locally and they are passing, so this should be good to review.

}

/// An internal initializer with no logic. Can be used to create pathological test cases.
required init(

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a way to inject these private properties so that we can mangle them prior to executing our test.

try header.updateOffsetInFileAtEndOfNewestMessage(
to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)

XCTAssertThrowsError(try cache.messages()) {

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the next commit, this test will infinite loop and crash.

to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)

XCTAssertThrowsError(try cache.messages()) {
XCTAssertEqual($0 as? CacheAdvanceError, CacheAdvanceError.fileCorrupted)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to detect this case and throw an appropriate error. We should also be able to prevent this case from happening in production. We solve both of these "shoulds" in the next few commits.

func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {
let header = try CacheHeaderHandle(
forReadingFrom: testFileLocation,
maximumBytes: 10_000,

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this number is effectively random. Doesn't matter what it is.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add this to a local constant called randomHighValue, that way we document this via the property name.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// We'll need to start writing the file from the beginning of the file.

// Trim the file to the current position to remove soon-to-be-abandoned data from the file.
try writer.truncate(at: writer.offsetInFile)

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the way we managed to get into the state the previous commit is testing is that we were truncating the file before we updated the header. We should always update our header before making a write, not after.

I inspected other places where we write data and we were already careful in the other spots to update our header first.

@dfed dfed force-pushed the dfed--defensive-writes branch from b94c612 to 81ac1d6 Compare March 7, 2022 20:48
@dfed dfed requested review from bachand and fdiaz March 8, 2022 01:57
@dfed dfed marked this pull request as ready for review March 8, 2022 01:57
Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
dfed and others added 2 commits March 8, 2022 09:25
func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {
let header = try CacheHeaderHandle(
forReadingFrom: testFileLocation,
maximumBytes: 10_000,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add this to a local constant called randomHighValue, that way we document this via the property name.

Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
dfed and others added 2 commits March 8, 2022 09:46
Co-authored-by: Francisco Diaz <fdiaz@users.noreply.github.com>
@dfed dfed merged commit b87b4f3 into main Mar 8, 2022
@dfed dfed deleted the dfed--defensive-writes branch March 8, 2022 19:31

@bachand bachand left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this issue @dfed . I've left a few comments. If you can help refresh my memory on the format of the underlying file (see one of my comments) I plan to revisit this PR. At that point I will be able to review more confidently.

// Our file is empty. Make the file corrupted by setting the offset at end of newest message to be further in the file.
// This should never happen, but past versions of this repo could lead to a file having this kind of inconsistency if a crash occurred at the wrong time.
try header.updateOffsetInFileAtEndOfNewestMessage(
to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this file is empty would FileHeader.expectedEndOfHeaderInFile + 1 cause the issue? I am trying to understand why we need to add UInt64(MessageSpan.storageLength).

@dfed dfed Mar 9, 2022

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileHeader.expectedEndOfHeaderInFile + 1 does indeed cause the issue, though I don't think that's what the file would have looked like when we were in the bad state. My guess is that we had at least one span in the expected end of newest message, plus the length of that message.

I'm happy to change this to just be + 1 since that does reproduce the issue, and since I think we've solved the original issue. I'll do that in a follow-up PR.

// Trim the file to the current writer position to remove soon-to-be-abandoned data from the file.
truncateAtOffset = writer.offsetInFile

// Set the offset back to the beginning of the file.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next comment says:

// We know the oldest message is at the beginning of the file, since we just tossed out the rest of the file.

I don't think that's accurate anymore since we haven't truncated yet.

@@ -121,6 +144,12 @@ public final class CacheAdvance<T: Codable> {
// If the application crashes between writing the header and writing the message data, we'll have lost the messages between the previous offsetInFileOfOldestMessage and the new offsetInFileOfOldestMessage.
try header.updateOffsetInFileOfOldestMessage(to: offsetInFileOfOldestMessage)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to load back into my mind how this code works. If the cache file allows overwriting we start to write from the beginning of the file again. It seems from the code that when we start overwriting, we seek to the next oldest message. What happens if the new message we are trying to write is so long that we need to overwrite multiple older messages?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We handle that in the following code just above the code you highlighted here:

            // Prepare the reader before writing the message.
            try prepareReaderForWriting(dataOfLength: bytesNeededToStoreMessage)

This code does the following:

    /// Advances the reader until there is room to store a new message without writing past the reader.
    /// This method should only be called on a cache that overwrites old messages.
    /// - Parameter messageLength: the length of the next message that will be written.
    private func prepareReaderForWriting(dataOfLength messageLength: Bytes) throws {
        // If our writer is behind our reader,
        while writer.offsetInFile < reader.offsetInFile
            // And our writer doesn't have enough room to write a message such that it stays behind the current reader position.
            && writer.offsetInFile + messageLength >= reader.offsetInFile
        {
            // Then writing this message would write into the oldest-known message.
            // We must advance our reader to the next-oldest message to help make room for the next message we want to write.
            try reader.seekToNextMessage()
        }
    }

We then are willing to write up to where the reader is located after this method returns.

Comment on lines +147 to +151
// Truncate the file if it needs truncation before we write the next message, and after we update our header.
// If the application crashes between truncating this message data and writing the next message, our file will still be consistent.
if let truncateAtOffset = truncateAtOffset {
try writer.truncate(at: truncateAtOffset)
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble reminding myself why we need to truncate the end of the file. What would really help me in reading this code is a reminder of how the file is formatted. Do we have any graphic or ASCII art anywhere that shows what the format of the file is? If not, I think that would be really helpful to add as it would allow me to validate this code more confidently.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have any ascii art, though I agree that would help a ton.

High level, the format is:

[reserved header space]([length-of-message][message-content])*

where ([length-of-message][message-content]) can be repeated until the maximum byte count is reached.

Truncating the file prevents us from reading data that should have been deleted when we try to read all messages. Since the only data we have about ordering is the start of first message and end of last message in the header, we'll just try to read all data between these points. If we haven't deleted the data at the end of the file, we would read these supposedly-deleted messages in when trying to read all messages.

Comment on lines +63 to +68
guard !previousReadWasEmpty else {
// If the previous read was also empty, then the file has been corrupted.
// Two empty reads in a row means that offsetInFileAtEndOfNewestMessage is incorrect.
// This inconsistency likely is likely due to a crash occurring during a message write.
throw CacheAdvanceError.fileCorrupted
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I follow: we never expect to be in this state anymore, right? It could be useful to make clear that this is a safety net.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let me update this comment 👍

if let truncateAtOffset = truncateAtOffset {
try writer.truncate(at: truncateAtOffset)
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting here so we have a thread. My understanding is that we intend for the reader to always be at the start of the oldest message in the file. Is that understanding correct?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes your understanding is correct.

XCTAssertEqual(messages, [])
}

func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {

@bachand bachand Mar 9, 2022

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am struggling to connect how the code that we changed in append(message:) could have resulted in the situation we have in this test case. Let me walk through my reasoning.

The premise of this test case is that we have an empty file. The offset of the start of the oldest message is at some point beyond the end of the header. And the offset of the end of the newest message is at the end of the header.

We created a fallback fix in nextEncodedMessage(...) and the real fix is in append(message:).

Before our change in append(message:) we were truncating the file before we updated the header. For us to get into this situation of this test case we would need to truncate the entire contents of the file, so we have an empty file (besides the header).

I cannot figure out a sequence of messages where the previous code in append(message:) would ever truncate the entire file. The only way this could happen is if the writer (and offset of the start of the oldest message) was at the end of the file's header. If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile.

guard
bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile // Make sure we have room in this file for this message.
&& bytesNeededToStoreMessage < Int32.max // Make sure we can read this message back out with Int on a 32-bit device.
else
{
// The message is too long to be written to a cache of this size.
throw CacheAdvanceError.messageLargerThanCacheCapacity
}

As far as I can tell, the previous code in append(message:) could never truncate the entire writable portion of the file.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way this could happen is if the writer (and offset of the start of the oldest message) was at the end of the file's header

I agree with this analysis!

If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile

I disagree!

If we were writing a message of length header.maximumBytes - FileHeader.expectedEndOfHeaderInFile, then bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile would evaluate as true since bytesNeededToStoreMessage is equal to header.maximumBytes - FileHeader.expectedEndOfHeaderInFile.

If writer.offsetInFile is at the FileHeader.expectedEndOfHeaderInFile (which I agree is a precondition for this bug), then in the next line cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true because writer.offsetInFile + bytesNeededToStoreMessage equals header.maximumBytes:

let cacheHasSpaceForNewMessageBeforeEndOfFile = writer.offsetInFile + bytesNeededToStoreMessage <= header.maximumBytes

At which point, I think we've proved that our original analysis was correct. Am I missing something?

To dig in a bit further, I believe this issue occurred because:

  1. offsetInFileAtEndOfNewestMessage was beyond the actual end of the file
  2. The file was empty except for the header

We only update offsetInFileAtEndOfNewestMessage after writing the message:

private func write(messageData: Data) throws {
// Write the message data.
try writer.write(data: messageData)
// Update the offsetInFileAtEndOfNewestMessage in our header and reader now that we've written the message.
// If the application crashes between writing the message data and writing the header, we'll have lost the most recent message.
try header.updateOffsetInFileAtEndOfNewestMessage(to: writer.offsetInFile)
reader.offsetInFileAtEndOfNewestMessage = writer.offsetInFile
}

Which means that the only way for offsetInFileAtEndOfNewestMessage to be beyond the actual end of the file is if we deleted part of the file before updating the header with our next write.

Now, this analysis (thank you for making me write this out!) indicates that my fix to prevent this data corruption in the future was incorrect. The issue isn't when we truncate the file, but rather when we update offsetInFileAtEndOfNewestMessage. PR incoming!

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#58

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation makes sense. It seems like I was off by one. Thank you very much for writing this out.

@bachand bachand Mar 11, 2022

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile

I disagree!

If we were writing a message of length header.maximumBytes - FileHeader.expectedEndOfHeaderInFile, then bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile would evaluate as true since bytesNeededToStoreMessage is equal to header.maximumBytes - FileHeader.expectedEndOfHeaderInFile.

If writer.offsetInFile is at the FileHeader.expectedEndOfHeaderInFile (which I agree is a precondition for this bug), then in the next line cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true because writer.offsetInFile + bytesNeededToStoreMessage equals header.maximumBytes:

let cacheHasSpaceForNewMessageBeforeEndOfFile = writer.offsetInFile + bytesNeededToStoreMessage <= header.maximumBytes

I thought more about this and I've circled back to my original conclusion. Yes, I agree that cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true. However, we only truncate the file if cacheHasSpaceForNewMessageBeforeEndOfFile is false. Accordingly I don't think there is any way that we could truncate the entire file through this code path.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked on Zoom and agreed with where I landed in my last comment.

XCTAssertEqual(messages, [])
}

func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the indentation on this method is off.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this in my next PR

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#58

dfed added a commit that referenced this pull request Mar 15, 2022
* Delete duplicative comment + code

* Remove incorrect assertion in comment

* Add test for file header length
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants