Skip to content

Split into two ranges to read messages if offsetInFileOfOldestMessage > offsetInFileAtEndOfNewestMessage to avoid possible loop#72

Merged
dfed merged 11 commits into
dfed:mainfrom
jianjunwoo:public_dev
Nov 10, 2022
Merged

Split into two ranges to read messages if offsetInFileOfOldestMessage > offsetInFileAtEndOfNewestMessage to avoid possible loop#72
dfed merged 11 commits into
dfed:mainfrom
jianjunwoo:public_dev

Conversation

@jianjunwoo

Copy link
Copy Markdown
Collaborator

My concern is there is still this line of code: try reader.seek(to: FileHeader.expectedEndOfHeaderInFile) , which keeps the possibility to run into an infinite loop.

Thus, I have a thought to fix this for shouldOverwriteOldMessages true | false without the seek back action in this func.

  1. When shouldOverwriteOldMessages is true and new_offset < old_offset , split the reader by two ranges according old message offset, new message offset.
    1.1 Range 1 would be : from old message offset to file end.
    1.2 Range 2 would be: from top to newest offset
  2. When shouldOverwriteOldMessages is false, the reader read from top to end.
  3. Both of the above situations, we never seek back for next read which can guarantee there is no loop.

@dfed @bachand

@dfed dfed left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you're thinking about this problem! I need to dig in more before I can approve, which I won't have time to do until next week, but I'm excited by this direction 😄. In the meantime, I've left some comments/questions.

Comment thread Sources/CacheAdvance/CacheAdvance.swift Outdated
Comment on lines +173 to +194
// There is only one range: | `offsetInFileOfOldestMessage` -> `offsetInFileAtEndOfNewestMessage`|
if reader.offsetInFileOfOldestMessage < reader.offsetInFileAtEndOfNewestMessage {
for encodedMessage in try reader.encodedMessagesFromOffset(reader.offsetInFileOfOldestMessage,
endOffset: reader.offsetInFileAtEndOfNewestMessage) {
messages.append(try decoder.decode(T.self, from: encodedMessage))
}
} else {
// In this case, the messages could be split to two ranges
// | First Range | (GAP: ignore) | Second Range |

// This is second range: | `offsetInFileOfOldestMessage` -> EOF |
for encodedMessage in try reader.encodedMessagesFromOffset(reader.offsetInFileOfOldestMessage) {
messages.append(try decoder.decode(T.self, from: encodedMessage))
}

// This is first range: | `expectedEndOfHeaderInFile` -> `offsetInFileAtEndOfNewestMessage`|
for encodedMessage in try reader.encodedMessagesFromOffset(
FileHeader.expectedEndOfHeaderInFile,
endOffset: reader.offsetInFileAtEndOfNewestMessage) {
messages.append(try decoder.decode(T.self, from: encodedMessage))
}
}

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clever approach! I don't have time to dig to deeply into this PR this week, but I'll do my best to get to it next week. My initial impression is that this approach seems reasonable. At a minimum, I really like the code comment you've written here – they'll really help future maintainers (myself included!).

@jianjunwoo jianjunwoo Nov 4, 2022

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how you're thinking about this problem! I need to dig in more before I can approve, which I won't have time to do until next week, but I'm excited by this direction 😄. In the meantime, I've left some comments/questions.

This approach came in when I read the discussion in PR#64 again. really happy to see this move forward if you feel it right too : D @dfed

Comment thread Sources/CacheAdvance/CacheAdvance.swift Outdated
Comment thread Sources/CacheAdvance/CacheReader.swift
Comment thread Sources/CacheAdvance/CacheReader.swift
Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
XCTAssertEqual(messages, [])
}

func test_messages_whenOffsetInFileAtEndOfNewestMessageIsBeyondEndOfNewestMessageButBeforeEndOfFile_throwsFileCorrupted() throws {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this test (and the deleted one below) no longer valid? I think testing these cases is still valuable, but I might be missing something.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests should be kept. I reverted the last commit, this was deleted then.

let message: TestableMessage = "This is a test"
let maximumBytes = try requiredByteCount(for: [message])
func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws {
let randomHighValue: UInt64 = 10_1000

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make this number less confusing (and fix an old mistake of mine) by doing either:

Suggested change
let randomHighValue: UInt64 = 10_1000
let randomHighValue: UInt64 = 10_000

or:

Suggested change
let randomHighValue: UInt64 = 10_1000
let randomHighValue: UInt64 = 101_000

Comment thread CacheAdvance.podspec
@codecov

codecov Bot commented Nov 4, 2022

Copy link
Copy Markdown

Codecov Report

Merging #72 (27c9f65) into main (3a1ed6f) will increase coverage by 0.07%.
The diff coverage is 96.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #72      +/-   ##
==========================================
+ Coverage   96.36%   96.44%   +0.07%     
==========================================
  Files          14       14              
  Lines         578      591      +13     
==========================================
+ Hits          557      570      +13     
  Misses         21       21              
Impacted Files Coverage Δ
Sources/CacheAdvance/CacheReader.swift 93.50% <93.93%> (-0.75%) ⬇️
Sources/CacheAdvance/CacheAdvance.swift 100.00% <100.00%> (ø)

Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
break
}
}
if let endOffset, offsetInFile != endOffset {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If use maximumOffset set default value for endOffset in the func's parameter, this line need change to offsetInFile < endOffset.
If changed to this, we need to change some test cases, like test_messages_whenOffsetInFileAtEndOfNewestMessageIsBeyondEndOfFile_throwsFileCorrupted. That case won't get error when offsetInFileAtEndOfNewestMessage is beyond EOF.

Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
@bachand

bachand commented Nov 5, 2022

Copy link
Copy Markdown
Collaborator

I will dig into this next week as well. Thanks @jianjunwoo !

@bachand bachand left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @jianjunwoo !

My understanding of this PR is as follows... We aren't currently aware of any type of file corruption that could lead to an infinite loop. At the same time, the way that code is currently written means that an infinite loop remains possible. This PR refactors the library so that an infinite loop is not possible.

It's worth pointing out that so far we haven't (as far as I can recall) acknowledged the two different "modes" of CacheAdvance in the code. The code has been written to transparently handle both situations. There's something elegant with not explicitly acknowledging each case and writing code that will work in either. At the same time, I have found that I spend a lot of time each time I review changes to CacheReader.swift thinking about the different possibilities. I feel like this change makes it easier to consider the edge cases since they are more explicitly captured in code.

I really like that we didn't need to change any tests in this PR. That makes me confident in the change.

Let's give @dfed the time to review as well to make sure he's onboard before we move ahead 👍

messages.append(try decoder.decode(T.self, from: encodedMessage))
}
// There is only one range: | `offsetInFileOfOldestMessage` -> `offsetInFileAtEndOfNewestMessage`|
if reader.offsetInFileOfOldestMessage < reader.offsetInFileAtEndOfNewestMessage {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the cache is empty I assume that reader.offsetInFileOfOldestMessage == reader.offsetInFileAtEndOfNewestMessage? I wonder if it would be best to explicitly handle the case of these two values being equal, to make this code easier to reason about.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be nice to validate my assumption that reader.offsetInFileOfOldestMessage == reader.offsetInFileAtEndOfNewestMessage when the cache is empty.

@dfed dfed Nov 8, 2022

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the cache is empty I assume that reader.offsetInFileOfOldestMessage == reader.offsetInFileAtEndOfNewestMessage

Per this comment block

// up until the current position of the writing handle – which is at the end of the newest persisted message. This algorithm implies that if
// the reading handle and the writing handle are at the same position in the file, then the file is empty. Therefore, when writing a message

when the reading handle and the writing handle point at the same position the file is empty. The reader starts out at header.offsetInFileOfOldestMessage, and the writer starts out at header.offsetInFileAtEndOfNewestMessage. And our reader.offsetInFileOfOldestMessage and reader.offsetInFileAtEndOfNewestMessage should always be set to the same values as those in the header, so your assumption is indeed true. I like the idea of explicitly handling this case.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this idea too. handle == as empty will make it more clear

Comment thread Sources/CacheAdvance/CacheAdvance.swift
///
/// - Parameters:
/// - file: The file URL indicating the desired location of the on-disk store. This file should already exist.
/// - maximumBytes: The maximum size of the cache, in bytes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
}

/// Returns the next encodable message, seeking to the beginning of the next message.
func nextEncodedMessage() throws -> Data? {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this private now?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be private, let me do in next commit.
I kept it here to make the diff more readable.

Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
}

// We know the next message is at the end of the file header. Let's seek to it.
try reader.seek(to: FileHeader.expectedEndOfHeaderInFile)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the library is safer against unknown unknown errors without this line of code.

@jianjunwoo jianjunwoo changed the title Split two ranges to read messages if offsetInFileOfOldestMessage > offsetInFileAtEndOfNewestMessage to avoid possible loop Split into two ranges to read messages if offsetInFileOfOldestMessage > offsetInFileAtEndOfNewestMessage to avoid possible loop Nov 8, 2022

@dfed dfed left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a ton of little nit suggestions, but overall I really dig this approach. I appreciate your spending the time to make this library easier to maintain!

I have also validated that the lines not covered by tests in CacheReader.swift are already not covered on main.

Comment thread Sources/CacheAdvance/CacheAdvance.swift Outdated
Comment thread Sources/CacheAdvance/CacheAdvance.swift Outdated
Comment thread Sources/CacheAdvance/CacheAdvance.swift Outdated
Comment thread Sources/CacheAdvance/CacheAdvance.swift Outdated
Comment thread Sources/CacheAdvance/CacheAdvance.swift Outdated
Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
try reader.seek(to: startOffset)
while let data = try nextEncodedMessage() {
encodedMessages.append(data)
if let endOffset = endOffset, offsetInFile >= endOffset {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When offsetInFile > endOffset, wouldn't that mean that the file is corrupted?

EDIT: yup! you covered this case a few lines down 😄

Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
}

/// Returns the next encodable message, seeking to the beginning of the next message.
func nextEncodedMessage() throws -> Data? {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that this method is no longer very smart. Our code is more declarative and less imperative now, and the new approach should be simpler to maintain 😄

Comment thread Sources/CacheAdvance/CacheReader.swift Outdated
Co-authored-by: Dan Federman <dfed@me.com>
// MARK: Private

/// Returns the next encodable message, seeking to the beginning of the next message.
private func nextEncodedMessage() throws -> Data? {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark this func as private since it is not used by other files any more.

}
}
}
if let endOffset = endOffset, offsetInFile != endOffset {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this if anymore given the if within the while loop?

@jianjunwoo jianjunwoo Nov 9, 2022

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This make no difference compared to the code before this. However, this looks more readable to handle == and > separately. @dfed

                if offsetInFile >= endOffset {
                    break
                }

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. This code is catching the offsetInFile < endOffset case. Maybe we could make that explicit here? Far from necessary, but it could make the intent more clear if I'm reading this right.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, I misunderstood your question. The if within the while loop is need to check offsetInFile == endOffset to stop at the end.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. This code is catching the offsetInFile < endOffset case. Maybe we could make that explicit here? Far from necessary, but it could make the intent more clear if I'm reading this right.

I agree it would be nice to make this more clear.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take this in a follow-up PR

@bachand bachand left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

}
} else if reader.offsetInFileOfOldestMessage == reader.offsetInFileAtEndOfNewestMessage {
// This is an empty cache.
return []

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
}
}
if let endOffset = endOffset, offsetInFile != endOffset {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. This code is catching the offsetInFile < endOffset case. Maybe we could make that explicit here? Far from necessary, but it could make the intent more clear if I'm reading this right.

I agree it would be nice to make this more clear.

@jianjunwoo

Copy link
Copy Markdown
Collaborator Author

Looks like we've all set about the details, or do we still have any other comments that I missed to discuss?

@dfed dfed merged commit 26a1c97 into dfed:main Nov 10, 2022
@dfed

dfed commented Nov 10, 2022

Copy link
Copy Markdown
Owner

That's it! Thanks again @jianjunwoo for tackling this + making future maintenance of this lib easier! Your contributions have immensely improved this library in recent weeks 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants