Merged
2 changes: 1 addition & 1 deletion CacheAdvance.podspec
@@ -1,6 +1,6 @@
Pod::Spec.new do |s|
s.name = 'CacheAdvance'
- s.version = '1.2.0'
+ s.version = '1.2.1'
s.license = 'Apache License, Version 2.0'
s.summary = 'A performant cache for logging systems. CacheAdvance persists log events 30x faster than SQLite.'
s.homepage = 'https://github.com/dfed/CacheAdvance'
47 changes: 38 additions & 9 deletions Sources/CacheAdvance/CacheAdvance.swift
@@ -36,20 +36,39 @@ public final class CacheAdvance<T: Codable> {
/// - Warning: `shouldOverwriteOldMessages` must be consistent for the life of a cache. Changing this value after logs have been persisted to a cache will prevent appending new messages to this cache.
/// - Warning: `decoder` must have a consistent implementation for the life of a cache. Changing this value after logs have been persisted to a cache may prevent reading messages from this cache.
/// - Warning: `encoder` must have a consistent implementation for the life of a cache. Changing this value after logs have been persisted to a cache may prevent reading messages from this cache.
- public init(
+ public convenience init(
fileURL: URL,
maximumBytes: Bytes,
shouldOverwriteOldMessages: Bool,
decoder: MessageDecoder = JSONDecoder(),
encoder: MessageEncoder = JSONEncoder())
throws
{
- self.fileURL = fileURL
-
- writer = try FileHandle(forWritingTo: fileURL)
- reader = try CacheReader(forReadingFrom: fileURL)
- header = try CacheHeaderHandle(forReadingFrom: fileURL, maximumBytes: maximumBytes, overwritesOldMessages: shouldOverwriteOldMessages)
+ self.init(
+ fileURL: fileURL,
+ writer: try FileHandle(forWritingTo: fileURL),
+ reader: try CacheReader(forReadingFrom: fileURL),
+ header: try CacheHeaderHandle(
+ forReadingFrom: fileURL,
+ maximumBytes: maximumBytes,
+ overwritesOldMessages: shouldOverwriteOldMessages),
+ decoder: decoder,
+ encoder: encoder)
+ }
+
+ /// An internal initializer with no logic. Can be used to create pathological test cases.
+ required init(

Owner Author

we need a way to inject these private properties so that we can mangle them prior to executing our test.

+ fileURL: URL,
+ writer: FileHandle,
+ reader: CacheReader,
+ header: CacheHeaderHandle,
+ decoder: MessageDecoder,
+ encoder: MessageEncoder)
+ {
+ self.fileURL = fileURL
+ self.writer = writer
+ self.reader = reader
+ self.header = header
self.decoder = decoder
self.encoder = encoder
}
@@ -93,12 +112,16 @@ public final class CacheAdvance<T: Codable> {

let cacheHasSpaceForNewMessageBeforeEndOfFile = writer.offsetInFile + bytesNeededToStoreMessage <= header.maximumBytes
if header.overwritesOldMessages {
- if !cacheHasSpaceForNewMessageBeforeEndOfFile {
+ let truncateAtOffset: UInt64?
+ if cacheHasSpaceForNewMessageBeforeEndOfFile {
+ // We have room for this message. No need to truncate.
+ truncateAtOffset = nil
+ } else {
// This message can't be written without exceeding our maximum file length.
// We'll need to start writing the file from the beginning of the file.

- // Trim the file to the current position to remove soon-to-be-abandoned data from the file.
- try writer.truncate(at: writer.offsetInFile)

Owner Author

I believe we got into the state that the previous commit tests because we were truncating the file before we updated the header. We should always update our header before making a write, not after.

I inspected other places where we write data and we were already careful in the other spots to update our header first.

+ // Trim the file to the current writer position to remove soon-to-be-abandoned data from the file.
+ truncateAtOffset = writer.offsetInFile

// Set the offset back to the beginning of the file.

Collaborator

The next comment says:

// We know the oldest message is at the beginning of the file, since we just tossed out the rest of the file.

I don't think that's accurate anymore since we haven't truncated yet.

try writer.seek(to: FileHeader.expectedEndOfHeaderInFile)
@@ -121,6 +144,12 @@ public final class CacheAdvance<T: Codable> {
// If the application crashes between writing the header and writing the message data, we'll have lost the messages between the previous offsetInFileOfOldestMessage and the new offsetInFileOfOldestMessage.
try header.updateOffsetInFileOfOldestMessage(to: offsetInFileOfOldestMessage)

Collaborator

I'm trying to load back into my mind how this code works. If the cache file allows overwriting we start to write from the beginning of the file again. It seems from the code that when we start overwriting, we seek to the next oldest message. What happens if the new message we are trying to write is so long that we need to overwrite multiple older messages?

Owner Author

We handle that in the following code just above the code you highlighted here:

            // Prepare the reader before writing the message.
            try prepareReaderForWriting(dataOfLength: bytesNeededToStoreMessage)

This code does the following:

    /// Advances the reader until there is room to store a new message without writing past the reader.
    /// This method should only be called on a cache that overwrites old messages.
    /// - Parameter messageLength: the length of the next message that will be written.
    private func prepareReaderForWriting(dataOfLength messageLength: Bytes) throws {
        // If our writer is behind our reader,
        while writer.offsetInFile < reader.offsetInFile
            // And our writer doesn't have enough room to write a message such that it stays behind the current reader position.
            && writer.offsetInFile + messageLength >= reader.offsetInFile
        {
            // Then writing this message would write into the oldest-known message.
            // We must advance our reader to the next-oldest message to help make room for the next message we want to write.
            try reader.seekToNextMessage()
        }
    }

We then are willing to write up to where the reader is located after this method returns.
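
As a rough illustration of that loop, here is a minimal in-memory sketch. `OffsetModel` and `upcomingMessageStarts` are hypothetical names that are not part of CacheAdvance, the 64-byte header is an assumed value, and the sketch simply stops when it runs out of known messages, whereas the real `reader.seekToNextMessage()` reads from the file.

```swift
/// Hypothetical in-memory stand-in for the writer and reader offsets.
struct OffsetModel {
    var writerOffset: UInt64
    var readerOffset: UInt64
    /// Start offsets of the messages after the one the reader points at, oldest first.
    var upcomingMessageStarts: [UInt64]

    /// Mirrors the loop condition in `prepareReaderForWriting(dataOfLength:)`.
    mutating func prepareReaderForWriting(dataOfLength messageLength: UInt64) {
        while writerOffset < readerOffset,
              writerOffset + messageLength >= readerOffset,
              !upcomingMessageStarts.isEmpty
        {
            // Writing here would clobber the oldest message, so skip past it.
            readerOffset = upcomingMessageStarts.removeFirst()
        }
    }
}

// The writer sits just after an assumed 64-byte header.
// Old messages start at offsets 100, 130 and 170.
var model = OffsetModel(writerOffset: 64, readerOffset: 100, upcomingMessageStarts: [130, 170])

// An 80-byte write would end at offset 144, overlapping the messages at 100 and 130.
model.prepareReaderForWriting(dataOfLength: 80)
print(model.readerOffset) // prints 170
```

In this model a single long write advances the reader past two older messages, which is the multi-message case asked about above.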


+ // Truncate the file if it needs truncation before we write the next message, and after we update our header.
+ // If the application crashes between truncating this message data and writing the next message, our file will still be consistent.
+ if let truncateAtOffset = truncateAtOffset {
+ try writer.truncate(at: truncateAtOffset)
+ }
Comment on lines +147 to +151

Collaborator

I'm having trouble reminding myself why we need to truncate the end of the file. What would really help me in reading this code is a reminder of how the file is formatted. Do we have any graphic or ASCII art anywhere that shows what the format of the file is? If not, I think that would be really helpful to add as it would allow me to validate this code more confidently.

Owner Author

We do not have any ASCII art, though I agree that would help a ton.

High level, the format is:

[reserved header space]([length-of-message][message-content])*

where ([length-of-message][message-content]) can be repeated until the maximum byte count is reached.

Truncating the file prevents us from reading data that should have been deleted when we try to read all messages. Since the only data we have about ordering is the start of first message and end of last message in the header, we'll just try to read all data between these points. If we haven't deleted the data at the end of the file, we would read these supposedly-deleted messages in when trying to read all messages.
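
To make the layout concrete, here is a small sketch that builds and parses a buffer in the `[reserved header space]([length-of-message][message-content])*` shape. It rests on assumptions: the 4-byte big-endian length prefix, the 64-byte header, and the names `makeFile` and `readMessages` are all hypothetical, and the real prefix width is whatever `MessageSpan.storageLength` defines.

```swift
import Foundation

/// Hypothetical width of the length prefix, for illustration only.
let assumedLengthPrefixBytes = 4

/// Builds a zeroed header followed by `[length-of-message][message-content]` records.
func makeFile(headerLength: Int, messages: [Data]) -> Data {
    var file = Data(count: headerLength)
    for message in messages {
        let length = UInt32(message.count).bigEndian
        // Append the raw bytes of the big-endian length, then the content.
        withUnsafeBytes(of: length) { file.append(contentsOf: $0) }
        file.append(message)
    }
    return file
}

/// Reads every record found between the `start` and `end` offsets.
func readMessages(in file: Data, from start: Int, to end: Int) -> [Data] {
    var messages = [Data]()
    var offset = start
    while offset + assumedLengthPrefixBytes <= end {
        let prefix = file.subdata(in: offset..<offset + assumedLengthPrefixBytes)
        // Reassemble the big-endian length one byte at a time.
        let length = Int(prefix.reduce(UInt32(0)) { ($0 << 8) | UInt32($1) })
        offset += assumedLengthPrefixBytes
        guard offset + length <= end else { break }
        messages.append(file.subdata(in: offset..<offset + length))
        offset += length
    }
    return messages
}

let file = makeFile(headerLength: 64, messages: [Data("a".utf8), Data("bc".utf8)])
let decoded = readMessages(in: file, from: 64, to: file.count)
print(decoded.map { String(decoding: $0, as: UTF8.self) })
```

Because the parser trusts only the two offsets it is given, any stale bytes left between them would be decoded as messages, which is the reason given above for truncating.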


Collaborator

Putting here so we have a thread. My understanding is that we intend for the reader to always be at the start of the oldest message in the file. Is that understanding correct?

Owner Author

yes your understanding is correct.

// Let the reader know where the oldest message begins.
reader.offsetInFileOfOldestMessage = offsetInFileOfOldestMessage

10 changes: 8 additions & 2 deletions Sources/CacheAdvance/CacheReader.swift
@@ -42,7 +42,7 @@ final class CacheReader {
}

/// Returns the next encodable message, seeking to the beginning of the next message.
- func nextEncodedMessage() throws -> Data? {
+ func nextEncodedMessage(previousReadWasEmpty: Bool = false) throws -> Data? {
let startingOffset = offsetInFile

guard startingOffset != offsetInFileAtEndOfNewestMessage else {
@@ -60,11 +60,17 @@ final class CacheReader {
return message

case .emptyRead:
+ guard !previousReadWasEmpty else {
+ // If the previous read was also empty, then the file has been corrupted.
+ // Two empty reads in a row means that offsetInFileAtEndOfNewestMessage is incorrect.
+ // This inconsistency is likely due to a crash occurring during a message write.
+ throw CacheAdvanceError.fileCorrupted
+ }
Comment on lines +63 to +68

Collaborator

To make sure I follow: we never expect to be in this state anymore, right? It could be useful to make clear that this is a safety net.

Owner Author

yeah let me update this comment 👍

// We know the next message is at the end of the file header. Let's seek to it.
try reader.seek(to: FileHeader.expectedEndOfHeaderInFile)

// We know there's a message to read now that we're at the start of the file.
- return try nextEncodedMessage()
+ return try nextEncodedMessage(previousReadWasEmpty: true)

case .invalidFormat:
throw CacheAdvanceError.fileCorrupted
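
The effect of the `previousReadWasEmpty` flag can be sketched with a simplified in-memory reader. `ReaderModel`, `ReadError`, `storedMessages`, and `claimedEnd` are hypothetical names for this illustration, and positions are message indexes rather than byte offsets.

```swift
enum ReadError: Error { case fileCorrupted }

/// Hypothetical model: positions are message indexes, not byte offsets.
struct ReaderModel {
    /// Messages actually present after the header.
    let storedMessages: [String]
    /// Where the reader currently points.
    var position: Int
    /// Where the header claims the newest message ends.
    let claimedEnd: Int

    mutating func nextMessage(previousReadWasEmpty: Bool = false) throws -> String? {
        // The reader has caught up with the newest message.
        guard position != claimedEnd else { return nil }

        if position < storedMessages.count {
            defer { position += 1 }
            return storedMessages[position]
        }

        // Empty read. A second one in a row means the claimed end is wrong.
        guard !previousReadWasEmpty else { throw ReadError.fileCorrupted }

        // Wrap to the first message and try exactly once more.
        position = 0
        return try nextMessage(previousReadWasEmpty: true)
    }
}

// An empty body whose header claims one message exists.
var corrupted = ReaderModel(storedMessages: [], position: 0, claimedEnd: 1)
var threw = false
do { _ = try corrupted.nextMessage() } catch { threw = true }

// A healthy wrap-around: the reader is at the end and the newest message ends at index 1.
var healthy = ReaderModel(storedMessages: ["a", "b"], position: 2, claimedEnd: 1)
let first = try healthy.nextMessage()
let second = try healthy.nextMessage()
```

Without the guard, the corrupted case in this model would recurse forever, since every wrap lands on another empty read.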
29 changes: 29 additions & 0 deletions Tests/CacheAdvanceTests/CacheAdvanceTests.swift
@@ -96,6 +96,35 @@ final class CacheAdvanceTests: XCTestCase {
XCTAssertEqual(messages, [])
}

+ func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {

@bachand bachand Mar 9, 2022

Collaborator

I am struggling to connect how the code that we changed in append(message:) could have resulted in the situation we have in this test case. Let me walk through my reasoning.

The premise of this test case is that we have an empty file. The offset of the start of the oldest message is at some point beyond the end of the header. And the offset of the end of the newest message is at the end of the header.

We created a fallback fix in nextEncodedMessage(...) and the real fix is in append(message:).

Before our change in append(message:) we were truncating the file before we updated the header. For us to get into this situation of this test case we would need to truncate the entire contents of the file, so we have an empty file (besides the header).

I cannot figure out a sequence of messages where the previous code in append(message:) would ever truncate the entire file. The only way this could happen is if the writer (and offset of the start of the oldest message) was at the end of the file's header. If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile.

guard
bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile // Make sure we have room in this file for this message.
&& bytesNeededToStoreMessage < Int32.max // Make sure we can read this message back out with Int on a 32-bit device.
else
{
// The message is too long to be written to a cache of this size.
throw CacheAdvanceError.messageLargerThanCacheCapacity
}

As far as I can tell, the previous code in append(message:) could never truncate the entire writable portion of the file.

Owner Author

The only way this could happen is if the writer (and offset of the start of the oldest message) was at the end of the file's header

I agree with this analysis!

If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile

I disagree!

If we were writing a message of length header.maximumBytes - FileHeader.expectedEndOfHeaderInFile, then bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile would evaluate as true since bytesNeededToStoreMessage is equal to header.maximumBytes - FileHeader.expectedEndOfHeaderInFile.

If writer.offsetInFile is at the FileHeader.expectedEndOfHeaderInFile (which I agree is a precondition for this bug), then in the next line cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true because writer.offsetInFile + bytesNeededToStoreMessage equals header.maximumBytes:

let cacheHasSpaceForNewMessageBeforeEndOfFile = writer.offsetInFile + bytesNeededToStoreMessage <= header.maximumBytes

At which point, I think we've proved that our original analysis was correct. Am I missing something?

To dig in a bit further, I believe this issue occurred because:

  1. offsetInFileAtEndOfNewestMessage was beyond the actual end of the file
  2. The file was empty except for the header

We only update offsetInFileAtEndOfNewestMessage after writing the message:

private func write(messageData: Data) throws {
// Write the message data.
try writer.write(data: messageData)
// Update the offsetInFileAtEndOfNewestMessage in our header and reader now that we've written the message.
// If the application crashes between writing the message data and writing the header, we'll have lost the most recent message.
try header.updateOffsetInFileAtEndOfNewestMessage(to: writer.offsetInFile)
reader.offsetInFileAtEndOfNewestMessage = writer.offsetInFile
}

Which means that the only way for offsetInFileAtEndOfNewestMessage to be beyond the actual end of the file is if we deleted part of the file before updating the header with our next write.

Now, this analysis (thank you for making me write this out!) indicates that my fix to prevent this data corruption in the future was incorrect. The issue isn't when we truncate the file, but rather when we update offsetInFileAtEndOfNewestMessage. PR incoming!

Owner Author

#58

Collaborator

This explanation makes sense. It seems like I was off by one. Thank you very much for writing this out.

@bachand bachand Mar 11, 2022

Collaborator

If we were in this situation, cacheHasSpaceForNewMessageBeforeEndOfFile could never be false. If cacheHasSpaceForNewMessageBeforeEndOfFile were false we would have already failed this previous check, since in this scenario writer.offsetInFile would equal FileHeader.expectedEndOfHeaderInFile

I disagree!

If we were writing a message of length header.maximumBytes - FileHeader.expectedEndOfHeaderInFile, then bytesNeededToStoreMessage <= header.maximumBytes - FileHeader.expectedEndOfHeaderInFile would evaluate as true since bytesNeededToStoreMessage is equal to header.maximumBytes - FileHeader.expectedEndOfHeaderInFile.

If writer.offsetInFile is at the FileHeader.expectedEndOfHeaderInFile (which I agree is a precondition for this bug), then in the next line cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true because writer.offsetInFile + bytesNeededToStoreMessage equals header.maximumBytes:

let cacheHasSpaceForNewMessageBeforeEndOfFile = writer.offsetInFile + bytesNeededToStoreMessage <= header.maximumBytes

I thought more about this and I've circled back to my original conclusion. Yes, I agree that cacheHasSpaceForNewMessageBeforeEndOfFile would evaluate to true. However, we only truncate the file if cacheHasSpaceForNewMessageBeforeEndOfFile is false. Accordingly I don't think there is any way that we could truncate the entire file through this code path.

Collaborator

We talked on Zoom and agreed with where I landed in my last comment.
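
The boundary case discussed in this thread can be checked with plain arithmetic. The sketch below uses assumed values (a 64-byte header and a 1,000-byte `maximumBytes`) and hypothetical helper names; only the two comparisons are taken from the code quoted above.

```swift
// Assumed values, for illustration only.
let assumedEndOfHeader: UInt64 = 64
let maximumBytes: UInt64 = 1_000

/// The capacity guard quoted from `append(message:)`.
func passesCapacityGuard(bytesNeeded: UInt64) -> Bool {
    bytesNeeded <= maximumBytes - assumedEndOfHeader
        && bytesNeeded < UInt64(Int32.max)
}

/// The `cacheHasSpaceForNewMessageBeforeEndOfFile` comparison.
func hasSpaceBeforeEndOfFile(writerOffset: UInt64, bytesNeeded: UInt64) -> Bool {
    writerOffset + bytesNeeded <= maximumBytes
}

// The largest message that passes the guard fills the body exactly.
let largest = maximumBytes - assumedEndOfHeader
print(passesCapacityGuard(bytesNeeded: largest))
print(hasSpaceBeforeEndOfFile(writerOffset: assumedEndOfHeader, bytesNeeded: largest))

// One byte more is rejected by the guard before the space check is reached.
print(passesCapacityGuard(bytesNeeded: largest + 1))
```

With the writer at the end of the header, every message that passes the guard also has space before the end of the file, so the truncating branch is not reached in that state. This matches the conclusion the reviewers settled on.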

Collaborator

Nit: the indentation on this method is off.

Owner Author

I'll update this in my next PR

Owner Author

#58

+ let randomHighValue: UInt64 = 10_1000
+ let header = try CacheHeaderHandle(
+ forReadingFrom: testFileLocation,
+ maximumBytes: randomHighValue,
+ overwritesOldMessages: true)
+ let cache = CacheAdvance<TestableMessage>(
+ fileURL: testFileLocation,
+ writer: try FileHandle(forWritingTo: testFileLocation),
+ reader: try CacheReader(forReadingFrom: testFileLocation),
+ header: try CacheHeaderHandle(
+ forReadingFrom: testFileLocation,
+ maximumBytes: header.maximumBytes,
+ overwritesOldMessages: header.overwritesOldMessages),
+ decoder: JSONDecoder(),
+ encoder: JSONEncoder())
+
+ // Make sure the header data is persisted before we read it as part of the `messages()` call below.
+ try header.synchronizeHeaderData()
+ // Our file is empty. Make the file corrupted by setting the offset at end of newest message to be further in the file.
+ // This should never happen, but past versions of this repo could lead to a file having this kind of inconsistency if a crash occurred at the wrong time.
+ try header.updateOffsetInFileAtEndOfNewestMessage(
+ to: FileHeader.expectedEndOfHeaderInFile + UInt64(MessageSpan.storageLength) + 1)

Collaborator

If this file is empty would FileHeader.expectedEndOfHeaderInFile + 1 cause the issue? I am trying to understand why we need to add UInt64(MessageSpan.storageLength).

@dfed dfed Mar 9, 2022

Owner Author

FileHeader.expectedEndOfHeaderInFile + 1 does indeed cause the issue, though I don't think that's what the file would have looked like when we were in the bad state. My guess is that we had at least one span in the expected end of newest message, plus the length of that message.

I'm happy to change this to just be + 1 since that does reproduce the issue, and since I think we've solved the original issue. I'll do that in a follow-up PR.


+ XCTAssertThrowsError(try cache.messages()) {

Owner Author

Before the next commit, this test will loop infinitely and crash.

+ XCTAssertEqual($0 as? CacheAdvanceError, CacheAdvanceError.fileCorrupted)

Owner Author

We should be able to detect this case and throw an appropriate error. We should also be able to prevent this case from happening in production. We solve both of these "shoulds" in the next few commits.

+ }
+ }

func test_isWritable_returnsTrueWhenStaticHeaderMetadataMatches() throws {
let originalCache = try createCache(overwritesOldMessages: false)
XCTAssertTrue(try originalCache.isWritable())