Prevent infinite recursion loop while reading messages by dfed · Pull Request #52 · dfed/CacheAdvance

dfed · 2022-03-07T20:02:32Z

Airbnb has seen an infinite recursion crash when reading messages() in the wild.

The crash's stack trace involves repeated calls to nextEncodedMessage(). This PR makes our nextEncodedMessage() method defensive against an infinite recursion loop, and also fixes the underlying issue that allowed a cache file to get into this bad state. Best reviewed commit by commit.

I will rebase this PR over #53 once that lands in order to get CI running on this PR, but I have run all tests locally and they are passing, so this should be good to review.


dfed · 2022-03-07T20:02:58Z

+    }

+    /// An internal initializer with no logic. Can be used to create pathological test cases.
+    required init(


We need a way to inject these private properties so that we can mangle them before executing our test.
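To illustrate the pattern described here, a minimal sketch of a class with a logic-free internal initializer that tests can use to build deliberately inconsistent state. `ToyCache`, its properties, and the 64-byte header length are hypothetical stand-ins, not CacheAdvance's actual types or values.

```swift
/// Hypothetical stand-in for a cache type; not CacheAdvance's actual API.
final class ToyCache {
    /// Assumed header size, for illustration only.
    static let headerLength: UInt64 = 64

    let maximumBytes: UInt64
    let offsetInFileOfOldestMessage: UInt64
    let offsetInFileAtEndOfNewestMessage: UInt64

    /// Production-facing initializer: derives consistent state (here, an empty cache).
    convenience init(maximumBytes: UInt64) {
        self.init(
            maximumBytes: maximumBytes,
            offsetInFileOfOldestMessage: ToyCache.headerLength,
            offsetInFileAtEndOfNewestMessage: ToyCache.headerLength)
    }

    /// Initializer with no logic. Swift's default access level is `internal`, so tests in
    /// the same module (or via `@testable import`) can inject any state, including state
    /// that production code should never produce.
    init(
        maximumBytes: UInt64,
        offsetInFileOfOldestMessage: UInt64,
        offsetInFileAtEndOfNewestMessage: UInt64)
    {
        self.maximumBytes = maximumBytes
        self.offsetInFileOfOldestMessage = offsetInFileOfOldestMessage
        self.offsetInFileAtEndOfNewestMessage = offsetInFileAtEndOfNewestMessage
    }
}

// A "pathological" empty cache whose end-of-newest-message offset points past the header.
let corrupted = ToyCache(
    maximumBytes: 10_000,
    offsetInFileOfOldestMessage: ToyCache.headerLength,
    offsetInFileAtEndOfNewestMessage: ToyCache.headerLength + 1)
```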

dfed · 2022-03-07T20:33:12Z

+      try header.updateOffsetInFileAtEndOfNewestMessage(
+        to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)
+
+      XCTAssertThrowsError(try cache.messages()) {


Before the next commit, this test will infinite loop and crash.

dfed · 2022-03-07T20:33:44Z

+        to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)
+
+      XCTAssertThrowsError(try cache.messages()) {
+          XCTAssertEqual($0 as? CacheAdvanceError, CacheAdvanceError.fileCorrupted)


We should be able to detect this case and throw an appropriate error. We should also be able to prevent this case from happening in production. We solve both of these "shoulds" in the next few commits.

dfed · 2022-03-07T20:33:57Z

+  func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {
+      let header = try CacheHeaderHandle(
+        forReadingFrom: testFileLocation,
+        maximumBytes: 10_000,


This number is effectively random; its exact value doesn't matter.

Let's add this to a local constant called randomHighValue; that way we document this via the property name.

dfed · 2022-03-07T20:35:26Z

                // We'll need to start writing the file from the beginning of the file.

-                // Trim the file to the current position to remove soon-to-be-abandoned data from the file.
-                try writer.truncate(at: writer.offsetInFile)


I believe we got into the state that the previous commit tests because we were truncating the file before updating the header. We should always update our header before making a write, not after.

I inspected other places where we write data and we were already careful in the other spots to update our header first.
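A toy model of why this ordering matters, assuming a crash can stop the process between any two steps. `ToyFile`, `Step`, and `run(_:on:crashAfter:)` are hypothetical; the model tracks only the file length and a single header offset (think of an offset such as the start of the oldest message), and the offsets 500, 400, 300, and 64 are arbitrary.

```swift
/// Toy model: a file length plus one offset that the header promises lies inside the file.
struct ToyFile {
    var length: UInt64
    var headerOffset: UInt64
    /// The header must never point past the end of the file.
    var isConsistent: Bool { headerOffset <= length }
}

enum Step {
    case truncate(to: UInt64)
    case updateHeader(to: UInt64)
}

/// Applies `steps` in order but "crashes" (stops) after the first `crashAfter` steps.
func run(_ steps: [Step], on start: ToyFile, crashAfter: Int) -> ToyFile {
    var file = start
    for step in steps.prefix(crashAfter) {
        switch step {
        case .truncate(to: let newLength):
            file.length = min(file.length, newLength)
        case .updateHeader(to: let offset):
            file.headerOffset = offset
        }
    }
    return file
}

// Before wrapping: the header points at offset 400 in a 500-byte file.
let start = ToyFile(length: 500, headerOffset: 400)
// Wrapping discards everything after offset 300 and moves the header offset back to 64.
let truncateFirst: [Step] = [.truncate(to: 300), .updateHeader(to: 64)]
let headerFirst: [Step] = [.updateHeader(to: 64), .truncate(to: 300)]

// Prints "0 true true", "1 false true", "2 true true".
for crashPoint in 0...2 {
    print(crashPoint,
          run(truncateFirst, on: start, crashAfter: crashPoint).isConsistent,
          run(headerFirst, on: start, crashAfter: crashPoint).isConsistent)
}
```

Only a crash between the two steps distinguishes the orderings: truncating first leaves the header pointing past the end of the file, while updating the header first never does.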

Co-authored-by: Steven Hepting <shepting@gmail.com>

fdiaz · 2022-03-08T17:37:55Z

+  func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {
+      let header = try CacheHeaderHandle(
+        forReadingFrom: testFileLocation,
+        maximumBytes: 10_000,


Let's add this to a local constant called randomHighValue; that way we document this via the property name.

Co-authored-by: Francisco Diaz <fdiaz@users.noreply.github.com>

bachand

Thanks for addressing this issue, @dfed. I've left a few comments. If you can help refresh my memory on the format of the underlying file (see one of my comments), I plan to revisit this PR. At that point I will be able to review more confidently.

bachand · 2022-03-09T01:11:24Z

+      // Our file is empty. Make the file corrupted by setting the offset at end of newest message to be further in the file.
+      // This should never happen, but past versions of this repo could lead to a file having this kind of inconsistency if a crash occurred at the wrong time.
+      try header.updateOffsetInFileAtEndOfNewestMessage(
+        to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)


If this file is empty, would FileHeader.expectedEndOfHeaderInFile + 1 cause the issue? I am trying to understand why we need to add UInt64(MessageSpan.storageLength).

FileHeader.expectedEndOfHeaderInFile + 1 does indeed cause the issue, though I don't think that's what the file looked like when it was in the bad state. My guess is that the stale end-of-newest-message offset covered at least one message span (the length prefix) plus the length of that message.

I'm happy to change this to just be + 1 since that does reproduce the issue, and since I think we've solved the original issue. I'll do that in a follow-up PR.

bachand · 2022-03-09T01:18:44Z

+                // Trim the file to the current writer position to remove soon-to-be-abandoned data from the file.
+                truncateAtOffset = writer.offsetInFile

                // Set the offset back to the beginning of the file.


The next comment says:

// We know the oldest message is at the beginning of the file, since we just tossed out the rest of the file.

I don't think that's accurate anymore since we haven't truncated yet.

bachand · 2022-03-09T01:20:59Z

@@ -121,6 +144,12 @@ public final class CacheAdvance<T: Codable> {
            // If the application crashes between writing the header and writing the message data, we'll have lost the messages between the previous offsetInFileOfOldestMessage and the new offsetInFileOfOldestMessage.
            try header.updateOffsetInFileOfOldestMessage(to: offsetInFileOfOldestMessage)


I'm trying to load back into my mind how this code works. If the cache file allows overwriting we start to write from the beginning of the file again. It seems from the code that when we start overwriting, we seek to the next oldest message. What happens if the new message we are trying to write is so long that we need to overwrite multiple older messages?

We handle that in the following code just above the code you highlighted here:

// Prepare the reader before writing the message.
try prepareReaderForWriting(dataOfLength: bytesNeededToStoreMessage)

This code does the following:

/// Advances the reader until there is room to store a new message without writing past the reader.
/// This method should only be called on a cache that overwrites old messages.
/// - Parameter messageLength: the length of the next message that will be written.
private func prepareReaderForWriting(dataOfLength messageLength: Bytes) throws {
    // If our writer is behind our reader,
    while writer.offsetInFile < reader.offsetInFile
        // And our writer doesn't have enough room to write a message such that it stays behind the current reader position.
        && writer.offsetInFile + messageLength >= reader.offsetInFile
    {
        // Then writing this message would write into the oldest-known message.
        // We must advance our reader to the next-oldest message to help make room for the next message we want to write.
        try reader.seekToNextMessage()
    }
}

We then are willing to write up to where the reader is located after this method returns.
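A toy model of the loop above, using plain integer offsets instead of file handles. `ToyReader` and `prepareReaderForWriting(writerOffset:messageLength:reader:)` are hypothetical; the reader is modeled as the list of old-message start offsets still ahead of the writer, and the extra `!reader.upcomingOffsets.isEmpty` condition is an addition of this sketch that stands in for whatever the real reader does when it runs out of messages before the end of the file.

```swift
/// Toy reader: start offsets of old messages that still lie ahead of the writer, oldest first.
struct ToyReader {
    var upcomingOffsets: [UInt64]
    let endOfFile: UInt64
    /// Start of the oldest surviving message, or `endOfFile` if none remain ahead of the writer.
    var offsetInFile: UInt64 { upcomingOffsets.first ?? endOfFile }

    mutating func seekToNextMessage() {
        if !upcomingOffsets.isEmpty { upcomingOffsets.removeFirst() }
    }
}

/// Advances the reader until a message of `messageLength` bytes written at `writerOffset`
/// would end strictly before the reader, mirroring the `>=` comparison quoted above.
func prepareReaderForWriting(writerOffset: UInt64, messageLength: UInt64, reader: inout ToyReader) {
    while writerOffset < reader.offsetInFile
        && writerOffset + messageLength >= reader.offsetInFile
        && !reader.upcomingOffsets.isEmpty
    {
        reader.seekToNextMessage()
    }
}

// The writer sits at 100; old messages start at 120, 150, 200, and 260.
var reader = ToyReader(upcomingOffsets: [120, 150, 200, 260], endOfFile: 300)
prepareReaderForWriting(writerOffset: 100, messageLength: 60, reader: &reader)
print(reader.offsetInFile) // 200: the messages at 120 and 150 are overwritten.
```

A message long enough to reach several old messages simply runs the loop several times, which is how the code above handles the multi-message overwrite case.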

bachand · 2022-03-09T01:23:49Z

+            // Truncate the file if it needs truncation before we write the next message, and after we update our header.
+            // If the application crashes between truncating this message data and writing the next message, our file will still be consistent.
+            if let truncateAtOffset = truncateAtOffset {
+                try writer.truncate(at: truncateAtOffset)
+            }


I'm having trouble reminding myself why we need to truncate the end of the file. What would really help me in reading this code is a reminder of how the file is formatted. Do we have any graphic or ASCII art anywhere that shows what the format of the file is? If not, I think that would be really helpful to add as it would allow me to validate this code more confidently.

We do not have any ASCII art, though I agree that would help a ton.

High level, the format is:

[reserved header space]([length-of-message][message-content])*

where ([length-of-message][message-content]) can be repeated until the maximum byte count is reached.
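A minimal sketch of that layout, assuming a 64-byte reserved header and a 4-byte big-endian `UInt32` length prefix. Both sizes are assumptions for illustration; CacheAdvance's real values come from `FileHeader.expectedEndOfHeaderInFile` and `MessageSpan.storageLength`. `encode(_:)` and `decode(_:from:to:)` are hypothetical helpers that work on an in-memory byte array rather than a file.

```swift
let headerLength = 64 // assumed reserved header size
let spanLength = 4    // assumed size of the big-endian UInt32 length prefix

/// Lays out [reserved header space]([length-of-message][message-content])*.
func encode(_ messages: [[UInt8]]) -> [UInt8] {
    var file = [UInt8](repeating: 0, count: headerLength)
    for message in messages {
        let length = UInt32(message.count)
        file += [UInt8(truncatingIfNeeded: length >> 24), UInt8(truncatingIfNeeded: length >> 16),
                 UInt8(truncatingIfNeeded: length >> 8), UInt8(truncatingIfNeeded: length)]
        file += message
    }
    return file
}

/// Reads every complete message between `start` and `end`.
func decode(_ file: [UInt8], from start: Int, to end: Int) -> [[UInt8]] {
    var messages: [[UInt8]] = []
    var offset = start
    while offset + spanLength <= end {
        let length = file[offset..<offset + spanLength].reduce(0) { ($0 << 8) | Int($1) }
        offset += spanLength
        guard offset + length <= end else { break } // Incomplete message: stop reading.
        messages.append(Array(file[offset..<offset + length]))
        offset += length
    }
    return messages
}

let file = encode([[1, 2, 3], [], [9]])
print(file.count) // 80: 64 header bytes + (4 + 3) + (4 + 0) + (4 + 1)
print(decode(file, from: headerLength, to: file.count))
```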

Truncating the file prevents us from reading data that should have been deleted when we try to read all messages. Since the only ordering information in the header is the start of the oldest message and the end of the newest message, we just try to read all data between these points. If we hadn't deleted the data at the end of the file, we would read these supposedly-deleted messages when trying to read all messages.
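A toy reproduction of the stale-read problem, assuming 1-byte length prefixes, a 2-byte header, and that a wrapped cache is read from the oldest message to the end of the file and then from the end of the header to the end of the newest message. The helpers and the byte layout are hypothetical. In the example, the message `x` was abandoned when the writer last wrapped at offset 7, and a single garbage byte sits between the end of the newest message and the oldest message.

```swift
let headerLength = 2 // toy header size (assumed)

/// Reads [1-byte length][content] records from `start` up to `end`.
func readRecords(_ file: [UInt8], from start: Int, to end: Int) -> [String] {
    var messages: [String] = []
    var offset = start
    while offset < end {
        let length = Int(file[offset])
        let content = file[(offset + 1)..<(offset + 1 + length)]
        messages.append(String(decoding: content, as: UTF8.self))
        offset += 1 + length
    }
    return messages
}

func readAllMessages(_ file: [UInt8], oldest: Int, endOfNewest: Int) -> [String] {
    if oldest <= endOfNewest {
        return readRecords(file, from: oldest, to: endOfNewest)
    }
    // Wrapped: older messages run from `oldest` to the end of the file; newer ones start after the header.
    return readRecords(file, from: oldest, to: file.count)
        + readRecords(file, from: headerLength, to: endOfNewest)
}

let n = UInt8(ascii: "n"), o = UInt8(ascii: "o"), x = UInt8(ascii: "x")
// [header][1, n][garbage][1, o][1, x], where "x" should have been deleted.
let untruncated: [UInt8] = [0, 0, 1, n, 0, 1, o, 1, x]
let truncated = Array(untruncated.prefix(7))

print(readAllMessages(untruncated, oldest: 5, endOfNewest: 4)) // ["o", "x", "n"]
print(readAllMessages(truncated, oldest: 5, endOfNewest: 4))   // ["o", "n"]
```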

bachand · 2022-03-09T01:24:39Z

+            guard !previousReadWasEmpty else {
+                // If the previous read was also empty, then the file has been corrupted.
+                // Two empty reads in a row means that offsetInFileAtEndOfNewestMessage is incorrect.
+                // This inconsistency is likely due to a crash occurring during a message write.
+                throw CacheAdvanceError.fileCorrupted
+            }


To make sure I follow: we never expect to be in this state anymore, right? It could be useful to make clear that this is a safety net.

Yeah, let me update this comment 👍
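A minimal sketch of this safety net, written as a loop instead of recursion. `ToyReader` and `ToyCacheError` are hypothetical; messages use a 1-byte length prefix, and an "empty read" is modeled as the reader sitting at the end of the in-memory file. Without the `previousReadWasEmpty` check, the empty-file case below would wrap back to the end of the header forever, because it can never reach the bogus end-of-newest-message offset.

```swift
enum ToyCacheError: Error, Equatable {
    case fileCorrupted
}

struct ToyReader {
    let file: [UInt8]
    let headerLength: Int
    let offsetInFileAtEndOfNewestMessage: Int
    var offsetInFile: Int

    /// Returns the next message, or nil once the reader reaches the end of the newest message.
    mutating func nextMessage() throws -> [UInt8]? {
        var previousReadWasEmpty = false
        while offsetInFile != offsetInFileAtEndOfNewestMessage {
            guard offsetInFile < file.count else {
                // Empty read at the end of the file. A second one in a row means the
                // end-of-newest-message offset can never be reached.
                guard !previousReadWasEmpty else { throw ToyCacheError.fileCorrupted }
                previousReadWasEmpty = true
                offsetInFile = headerLength // Wrap to just past the header.
                continue
            }
            let length = Int(file[offsetInFile])
            let message = Array(file[(offsetInFile + 1)..<(offsetInFile + 1 + length)])
            offsetInFile += 1 + length
            return message
        }
        return nil
    }
}

// Empty file (header only) whose end-of-newest-message offset points past the header.
var emptyButCorrupted = ToyReader(
    file: [0, 0], headerLength: 2, offsetInFileAtEndOfNewestMessage: 4, offsetInFile: 2)
do {
    _ = try emptyButCorrupted.nextMessage()
} catch {
    print(error) // fileCorrupted
}
```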

bachand · 2022-03-09T18:24:27Z

+            if let truncateAtOffset = truncateAtOffset {
+                try writer.truncate(at: truncateAtOffset)
+            }
+


Putting here so we have a thread. My understanding is that we intend for the reader to always be at the start of the oldest message in the file. Is that understanding correct?

Yes, your understanding is correct.

bachand · 2022-03-09T18:47:26Z

        XCTAssertEqual(messages, [])
    }

+  func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {


I am struggling to connect how the code that we changed in append(message:) could have resulted in the situation we have in this test case. Let me walk through my reasoning.

The premise of this test case is that we have an empty file. The offset of the start of the oldest message is at some point beyond the end of the header. And the offset of the end of the newest message is at the end of the header.

We created a fallback fix in nextEncodedMessage(...) and the real fix is in append(message:).

Before our change in append(message:) we were truncating the file before we updated the header. For us to get into this situation of this test case we would need to truncate the entire contents of the file, so we have an empty file (besides the header).

I cannot figure out a sequence of messages where the previous code in append(message:) would ever truncate the entire file. The only way this could happen is if the writer (and offset of the start of the oldest message) was at the end of the file's header. If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile.

CacheAdvance/Sources/CacheAdvance/CacheAdvance.swift

Lines 104 to 111 in 2470113

guard
    bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile // Make sure we have room in this file for this message.
    && bytesNeededToStoreMessage < Int32.max // Make sure we can read this message back out with Int on a 32-bit device.
else {
    // The message is too long to be written to a cache of this size.
    throw CacheAdvanceError.messageLargerThanCacheCapacity
}

As far as I can tell, the previous code in append(message:) could never truncate the entire writable portion of the file.

The only way this could happen is if the writer (and offset of the start of the oldest message) was at the end of the file's header

I agree with this analysis!

If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile

I disagree!

If we were writing a message of length header.maximumBytes - FileHeader.expectedEndOfHeaderInFile, then bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile would evaluate as true since bytesNeededToStoreMessage is equal to header.maximumBytes - FileHeader.expectedEndOfHeaderInFile.

If writer.offsetInFile is at the FileHeader.expectedEndOfHeaderInFile (which I agree is a precondition for this bug), then in the next line cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true because writer.offsetInFile + bytesNeededToStoreMessage equals header.maximumBytes:

CacheAdvance/Sources/CacheAdvance/CacheAdvance.swift

Line 113 in ed9ece4

let cacheHasSpaceForNewMessageBeforeEndOfFile = writer.offsetInFile + bytesNeededToStoreMessage <= header.maximumBytes

At which point, I think we've proved that our original analysis was correct. Am I missing something?

To dig in a bit further, I believe this issue occurred because:

offsetInFileAtEndOfNewestMessage was beyond the actual end of the file

The file was empty except for the header

We only update offsetInFileAtEndOfNewestMessage after writing the message:

CacheAdvance/Sources/CacheAdvance/CacheAdvance.swift

Lines 238 to 246 in ed9ece4

private func write(messageData: Data) throws {
    // Write the message data.
    try writer.write(data: messageData)

    // Update the offsetInFileAtEndOfNewestMessage in our header and reader now that we've written the message.
    // If the application crashes between writing the message data and writing the header, we'll have lost the most recent message.
    try header.updateOffsetInFileAtEndOfNewestMessage(to: writer.offsetInFile)
    reader.offsetInFileAtEndOfNewestMessage = writer.offsetInFile
}

This means that the only way for offsetInFileAtEndOfNewestMessage to be beyond the actual end of the file is if we deleted part of the file before updating the header with our next write.

Now, this analysis (thank you for making me write this out!) indicates that my fix to prevent this data corruption in the future was incorrect. The issue isn't when we truncate the file, but rather when we update offsetInFileAtEndOfNewestMessage. PR incoming!

This explanation makes sense. It seems like I was off by one. Thank you very much for writing this out.


I thought more about this and I've circled back to my original conclusion. Yes, I agree that cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true. However, we only truncate the file if cacheHasSpaceForNewMessageBeforeEndOfFile is false. Accordingly, I don't think there is any way that we could truncate the entire file through this code path.
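A worked check of this argument, using the two conditions quoted earlier in this thread with assumed values: 64 bytes for `FileHeader.expectedEndOfHeaderInFile` and 10_000 for `header.maximumBytes` (the test above also uses 10_000). `messageFitsInCache(_:)` and `wouldTruncate(writerOffset:bytesNeeded:)` are hypothetical helpers. With the writer at the end of the header, the largest message the guard admits ends exactly at `maximumBytes`, so `cacheHasSpaceForNewMessageBeforeEndOfFile` is true and the truncation branch is not taken.

```swift
let headerEnd: UInt64 = 64        // assumed stand-in for FileHeader.expectedEndOfHeaderInFile
let maximumBytes: UInt64 = 10_000 // assumed stand-in for header.maximumBytes

/// Mirrors the size guard at the top of append(message:).
func messageFitsInCache(_ bytesNeeded: UInt64) -> Bool {
    bytesNeeded <= maximumBytes - headerEnd && bytesNeeded < UInt64(Int32.max)
}

/// The old code truncated only when there was no room before the end of the file.
func wouldTruncate(writerOffset: UInt64, bytesNeeded: UInt64) -> Bool {
    let cacheHasSpaceForNewMessageBeforeEndOfFile = writerOffset + bytesNeeded <= maximumBytes
    return !cacheHasSpaceForNewMessageBeforeEndOfFile
}

let largest = maximumBytes - headerEnd
print(messageFitsInCache(largest), wouldTruncate(writerOffset: headerEnd, bytesNeeded: largest)) // true false
print(messageFitsInCache(largest + 1)) // false
```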

We talked on Zoom and agreed with where I landed in my last comment.

bachand · 2022-03-09T19:12:05Z

        XCTAssertEqual(messages, [])
    }

+  func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {


Nit: the indentation on this method is off.

I'll update this in my next PR.


Create internal initializer for CacheAdvance to enable testing pathological test cases

b728029

dfed commented Mar 7, 2022


dfed added 4 commits March 7, 2022 12:48

Write test that will fail

45b66b5

Fix test

41e32b5

Attempt to prevent the test scenario from happening again

4ae0a03

Bugfix version bump

81ac1d6

dfed force-pushed the dfed--defensive-writes branch from b94c612 to 81ac1d6 Compare March 7, 2022 20:48

dfed requested review from bachand and fdiaz March 8, 2022 01:57

dfed marked this pull request as ready for review March 8, 2022 01:57

shepting reviewed Mar 8, 2022


Comment thread Sources/CacheAdvance/CacheReader.swift Outdated

dfed and others added 2 commits March 8, 2022 09:25

Merge remote-tracking branch 'origin/HEAD' into dfed--defensive-writes

8780b71

Dan learns to type

0bb76ff

Co-authored-by: Steven Hepting <shepting@gmail.com>

fdiaz approved these changes Mar 8, 2022


dfed and others added 2 commits March 8, 2022 09:46

Better comment

a56f206

Co-authored-by: Francisco Diaz <fdiaz@users.noreply.github.com>

Document that value is random

2470113

dfed merged commit b87b4f3 into main Mar 8, 2022

dfed deleted the dfed--defensive-writes branch March 8, 2022 19:31

bachand reviewed Mar 9, 2022


dfed mentioned this pull request Mar 9, 2022

Bachand feedback #56

Merged

bachand mentioned this pull request Mar 9, 2022

Potential pathological case where Cache Advance reads no messages for a non-empty file #57

Closed

bachand reviewed Mar 9, 2022


This was referenced Mar 10, 2022

Limit possibility of file corruption #58

Merged

Clean up after #52 and #58 #59

Merged

dfed added a commit that referenced this pull request Mar 15, 2022

Clean up after #52 and #58 (#59)

cc069c1

* Delete duplicative comment + code
* Remove incorrect assertion in comment
* Add test for file header length



4 participants
