Skip to content

Reimplement with zip and two DirWalkers#13

Closed
ghost wants to merge 3 commits intomasterfrom
unknown repository
Closed

Reimplement with zip and two DirWalkers#13
ghost wants to merge 3 commits intomasterfrom
unknown repository

Conversation

@ghost
Copy link

@ghost ghost commented Dec 27, 2017

The current implementation has the problem that every matching file is
opened and read twice. This is an alternate implementation that only checks
every pair of files once.

The current implementation has the problem that every matching file is
opened and read twice. This is an alternate implementation that only checks
every pair of files once.
@ghost ghost mentioned this pull request Dec 27, 2017
Copy link
Collaborator

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't notice these earlier.

WalkDir::new(path)
.sort_by(compare_by_file_name)
.into_iter()
.skip(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw we didn't need to skip in the original implementation. Why is this needed now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WalkDir starts iteration with the path you give it. If you take the skip out, the tests will fail when it compares 'a_base' to 'b_base'. In the old code this wasn't necessary since every comparison would start with the pointless check that ('a_base').strip_prexfix('a_base') exists in 'b_base'.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so its an optimization to avoid checking the current dir. Does this deserve a comment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not for the what but the why)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an optimization. You need to skip the base dir in the comparison, otherwise two dirs will always be treated as different unless they have the same name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right; thanks for pointing that out!

src/lib.rs Outdated

if a_count != b_count {
return Ok(true);
if a.file_type() != b.file_type() || a.file_name() != b.file_name()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking just file_name. I feel I'm missing something. In the case of diffing:

  • a_base/foo/file.txt
  • a_base/bar/file.txt
    It looks like this will say there is no difference when in fact there is.

With only one file in each base, the sort won't help us spot a discrepancy. We'' then check the file name and file contents which will be the same but the directories are different.

Do we instead need to check a.strip_prefix(a_base) != b.strip_prefix(b_base)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've corrected the error. strip_prefix is unnecessary it's enough to check depth.

}

fn ensure_dir<P: AsRef<Path>>(path: P) -> Result<(), std::io::Error> {
match create_dir(path) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this is needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's needed to create an empty directory for the test because it's not possible to add an empty directory to git. The work around is add a ".gitkeep" file inside the empty directory but then it's no longer empty.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more so thinking higher level (with the test data split, it was harder to see whats going on).

So it looks like you have

asc/dir1/a/b.txt
asc/dir2/a
asc/dir2/b.txt

and

desc/dir1/a.txt
desc/dir1/b
desc/dir2/b/a.txt

So this will ensure that the entries for the a and b directories are the same, so we can progress to compare the a.txt and b.txt files, ensuring that they are treated differently.

That right?

}

#[test]
fn filedepth() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing the case of dir/file.txt vs file.txt but what about dirA/file.txt vs dirB/file.txt?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was never a problem. The walkers will compare dirA and dirB and the difference will be detected. I added a test for this to be sure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, been a while since I've messed enough with WalkDir. Thanks for pointing that out.

}

fn ensure_dir<P: AsRef<Path>>(path: P) -> Result<(), std::io::Error> {
match create_dir(path) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more so thinking higher level (with the test data split, it was harder to see whats going on).

So it looks like you have

asc/dir1/a/b.txt
asc/dir2/a
asc/dir2/b.txt

and

desc/dir1/a.txt
desc/dir1/b
desc/dir2/b/a.txt

So this will ensure that the entries for the a and b directories are the same, so we can progress to compare the a.txt and b.txt files, ensuring that they are treated differently.

That right?

@epage
Copy link
Collaborator

epage commented Dec 29, 2017

Now that reviewing is done for reals this time, could you squash your commits?

@ghost
Copy link
Author

ghost commented Jan 1, 2018

I squashed the commits in a new branch. The new pull request is #14.

@ghost ghost closed this Jan 1, 2018
epage added a commit to epage/dir-diff that referenced this pull request Apr 16, 2024
…ion-3.x

chore(deps): update github/codeql-action action to v3
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant