Reimplement with zip and two DirWalkers #13
The current implementation has the problem that every matching file is opened and read twice. This is an alternate implementation that only checks every pair of files once.
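To make the single-pass idea concrete, here is a rough std-only sketch. It is not the code from this PR: `Entry`, `walk_sorted`, and this `is_different` are hypothetical stand-ins for walkdir's sorted iterators, and the sketch collects entries into vectors instead of zipping two lazy walkers. It only shows that each pair of entries is visited once and each file is read once.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One visited entry: its depth below the base and its full path.
/// (Hypothetical type, used only for this sketch.)
struct Entry {
    depth: usize,
    path: PathBuf,
}

/// Pre-order walk with children sorted by file name.
/// The base directory itself is never pushed, only what is below it.
fn walk_sorted(dir: &Path, depth: usize, out: &mut Vec<Entry>) -> io::Result<()> {
    let mut children: Vec<PathBuf> = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<_>>()?;
    children.sort_by(|a, b| a.file_name().cmp(&b.file_name()));
    for path in children {
        let is_dir = path.is_dir();
        out.push(Entry { depth, path: path.clone() });
        if is_dir {
            walk_sorted(&path, depth + 1, out)?;
        }
    }
    Ok(())
}

/// Returns Ok(true) if the two trees differ in layout or file contents.
fn is_different(a_base: &Path, b_base: &Path) -> io::Result<bool> {
    let (mut a, mut b) = (Vec::new(), Vec::new());
    walk_sorted(a_base, 1, &mut a)?;
    walk_sorted(b_base, 1, &mut b)?;
    // zip stops at the shorter side, so unequal lengths are checked up front.
    if a.len() != b.len() {
        return Ok(true);
    }
    for (x, y) in a.iter().zip(b.iter()) {
        if x.depth != y.depth
            || x.path.is_dir() != y.path.is_dir()
            || x.path.file_name() != y.path.file_name()
        {
            return Ok(true);
        }
        // Each pair of files is read exactly once.
        if x.path.is_file() && fs::read(&x.path)? != fs::read(&y.path)? {
            return Ok(true);
        }
    }
    Ok(false)
}
```

Symlinks and very large files are ignored here to keep the sketch short.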
epage left a comment
Sorry I didn't notice these earlier.
    WalkDir::new(path)
        .sort_by(compare_by_file_name)
        .into_iter()
        .skip(1)
btw we didn't need to skip in the original implementation. Why is this needed now?
WalkDir starts iteration with the path you give it. If you take the skip out, the tests will fail when it compares 'a_base' to 'b_base'. In the old code this wasn't necessary, since every comparison started with the pointless check that 'a_base'.strip_prefix('a_base') exists in 'b_base'.
Ok, so it's an optimization to avoid checking the current dir. Does this deserve a comment?
(not for the what but the why)
It's not an optimization. You need to skip the base dir in the comparison, otherwise two dirs will always be treated as different unless they have the same name.
You're right; thanks for pointing that out!
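A small illustration of why the root entry has to be skipped. This models a walker's output as plain `(file name, depth)` pairs with the root at index 0; `same_layout` is a hypothetical helper for the sketch, not a function from this crate or from walkdir.

```rust
/// Compares two walks given as (file name, depth) pairs in walk order.
/// Index 0 is the root directory itself, as in a walk that starts at the base path.
fn same_layout(a: &[(&str, usize)], b: &[(&str, usize)], skip_root: bool) -> bool {
    let skip = if skip_root { 1 } else { 0 };
    a.len() == b.len()
        && a.iter()
            .skip(skip)
            .zip(b.iter().skip(skip))
            .all(|(x, y)| x == y)
}

/// Two trees with identical contents but differently named base directories.
fn sample_walks() -> (Vec<(&'static str, usize)>, Vec<(&'static str, usize)>) {
    (
        vec![("a_base", 0), ("file.txt", 1)],
        vec![("b_base", 0), ("file.txt", 1)],
    )
}
```

With `skip_root` set to `false`, the very first pair compares `a_base` with `b_base` and the trees are reported as different, even though everything below the roots matches.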
src/lib.rs
    if a_count != b_count {
        return Ok(true);
    if a.file_type() != b.file_type() || a.file_name() != b.file_name()
This is checking just file_name. I feel I'm missing something. In the case of diffing:
- a_base/foo/file.txt
- b_base/bar/file.txt
It looks like this will say there is no difference when in fact there is.
With only one file in each base, the sort won't help us spot a discrepancy. We'll then check the file name and file contents, which will be the same, but the directories are different.
Do we instead need to check a.strip_prefix(a_base) != b.strip_prefix(b_base)?
I've corrected the error. strip_prefix is unnecessary; it's enough to check depth.
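A sketch of why the depth check matters, using simulated entries rather than real walkers. The `Entry` struct and both `differs_*` functions are hypothetical; the sample data is modelled on the asc/dir1 and asc/dir2 layout discussed in this review, where both sorted walks yield a directory `a` followed by a file `b.txt`.

```rust
/// A simulated walker entry (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry {
    name: &'static str,
    depth: usize,
    is_dir: bool,
}

/// Compares only type and file name for each zipped pair.
fn differs_without_depth(a: &[Entry], b: &[Entry]) -> bool {
    a.len() != b.len()
        || a.iter().zip(b).any(|(x, y)| x.is_dir != y.is_dir || x.name != y.name)
}

/// Same comparison, but the depth of each pair must match as well.
fn differs_with_depth(a: &[Entry], b: &[Entry]) -> bool {
    a.len() != b.len()
        || a.iter()
            .zip(b)
            .any(|(x, y)| x.depth != y.depth || x.is_dir != y.is_dir || x.name != y.name)
}

/// dir1: directory `a` containing `b.txt`.
fn dir1() -> Vec<Entry> {
    vec![
        Entry { name: "a", depth: 1, is_dir: true },
        Entry { name: "b.txt", depth: 2, is_dir: false },
    ]
}

/// dir2: empty directory `a` next to `b.txt`.
fn dir2() -> Vec<Entry> {
    vec![
        Entry { name: "a", depth: 1, is_dir: true },
        Entry { name: "b.txt", depth: 1, is_dir: false },
    ]
}
```

Without the depth comparison the two walks look identical. Differently named parent directories (foo vs bar) need no extra handling, because the walkers yield the directories themselves and their names are compared like any other entry.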
    }

    fn ensure_dir<P: AsRef<Path>>(path: P) -> Result<(), std::io::Error> {
        match create_dir(path) {
It's needed to create an empty directory for the test, because it's not possible to add an empty directory to git. The workaround is to add a ".gitkeep" file inside the empty directory, but then it's no longer empty.
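Only the first two lines of `ensure_dir` are visible in the diff above, so the body below is an assumption about how such a helper could look, not the code from this PR: create the directory and treat "already exists" as success, so the test can run repeatedly.

```rust
use std::fs::create_dir;
use std::io::ErrorKind;
use std::path::Path;

/// Creates `path` as a directory unless something already exists there.
/// Sketch only: the match arms are assumed, not taken from the PR.
fn ensure_dir<P: AsRef<Path>>(path: P) -> Result<(), std::io::Error> {
    match create_dir(path) {
        Ok(()) => Ok(()),
        // A previous test run may have created the directory already.
        // Note: this is also reported when a regular file occupies the path.
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(()),
        Err(e) => Err(e),
    }
}
```

Calling it twice on the same path succeeds both times, which is what a test fixture needs.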
I was more so thinking higher level (with the test data split, it was harder to see what's going on).
So it looks like you have
asc/dir1/a/b.txt
asc/dir2/a
asc/dir2/b.txt
and
desc/dir1/a.txt
desc/dir1/b
desc/dir2/b/a.txt
So this will ensure that the entries for the a and b directories are the same, so we can progress to compare the a.txt and b.txt files, ensuring that they are treated differently.
That right?
    }

    #[test]
    fn filedepth() {
This is testing the case of dir/file.txt vs file.txt but what about dirA/file.txt vs dirB/file.txt?
This was never a problem. The walkers will compare dirA and dirB and the difference will be detected. I added a test for this to be sure.
Ah, been a while since I've messed enough with WalkDir. Thanks for pointing that out.
Now that reviewing is done for reals this time, could you squash your commits?
I squashed the commits in a new branch. The new pull request is #14.