Skip to content

Switch to LTRData.DiscUtils fork#145

Merged
gfs merged 14 commits into
microsoft:mainfrom
Erik-White:switch-to-ltrdata-discutils
Jan 30, 2024
Merged

Switch to LTRData.DiscUtils fork#145
gfs merged 14 commits into
microsoft:mainfrom
Erik-White:switch-to-ltrdata-discutils

Conversation

@Erik-White

@Erik-White Erik-White commented Jan 20, 2024

Copy link
Copy Markdown
Contributor

Closes #144, fixes #93, fixes #59

The current fork of DiscUtils is almost abandoned at this point. The LTRData fork is much further ahead and has made some great performance improvements and bug fixes. Moving to the newer fork would help resolve some existing issues, as well as make future fixes much easier to resolve.

@Erik-White

Copy link
Copy Markdown
Contributor Author

@microsoft-github-policy-service agree

@Erik-White Erik-White marked this pull request as ready for review January 24, 2024 21:46
Comment thread RecursiveExtractor/Extractors/WimExtractor.cs
@Erik-White

Copy link
Copy Markdown
Contributor Author

@gfs I think this is ready to go now. All the tests are passing, it just needs checking over to see if you think the LTRData fork of DiscUtils is ready for use.

@gfs

gfs commented Jan 24, 2024

Copy link
Copy Markdown
Contributor

@Erik-White awesome. I'm running pipeline tests now and I'll aim to give this a thorough review tomorrow.

@gfs

gfs commented Jan 25, 2024

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines failed to run 1 pipeline(s).

@gfs

gfs commented Jan 25, 2024

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@gfs gfs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @Erik-White for the contribution!

@gfs gfs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon closer inspection it looks like this does not resolve #59. Need to add DMG type detection to MiniMagic and to add a DMG extractor.

@gfs

gfs commented Jan 26, 2024

Copy link
Copy Markdown
Contributor

Implemented DMG extractor and added compatible DMG file.

@gfs

gfs commented Jan 26, 2024

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@gfs

gfs commented Jan 26, 2024

Copy link
Copy Markdown
Contributor

Test failure is related to different invalid path character definitions between net4.8 and net core. See dotnet/runtime#63383.

Will aim to determine which character is the issue and develop a new sample tomorrow.

Comment thread .github/workflows/publish-code-coverage.yml
@Erik-White

Erik-White commented Jan 26, 2024

Copy link
Copy Markdown
Contributor Author

The problem with the DMG test is due to a file entry that has an invalid character in the name: ".HFS+ Private Directory Data\r"
I can't see a difference in the invalid character list for .NET4.8 vs other versions, but there is a difference in the path handling:
https://referencesource.microsoft.com/#mscorlib/system/io/path.cs,1197
vs
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Path.cs#L685C3-L685C103

We end up calling Path.Combine from DiscUtils and I wonder if that helper method should maybe have a conditional that excludes .NET framework i.e. #if NETSTANDARD || NETCOREAPP.

The other option is to change the file entry name in the test data, but I'm not sure what's best. Is this a scenario that needs to be tested?

@gfs

gfs commented Jan 26, 2024

Copy link
Copy Markdown
Contributor

The problem with the DMG test is due to a file entry that has an invalid character in the name: ".HFS+ Private Directory Data\r" I can't see a difference in the invalid character list for .NET4.8 vs other versions, but there is a difference in the path handling: https://referencesource.microsoft.com/#mscorlib/system/io/path.cs,1197 vs https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Path.cs#L685C3-L685C103

We end up calling Path.Combine from DiscUtils and I wonder if that helper method should maybe have a conditional that excludes .NET framework i.e. #if NETSTANDARD || NETCOREAPP.

The other option is to change the file entry name in the test data, but I'm not sure what's best. Is this a scenario that needs to be tested?

This is an issue I've had to resolve at other times in Recursive Extractor in order to be able to use Path.Combine -

public string GetSanitizedPath(string replacement = "_") => SanitizePath(FullPath, replacement);
.

I think I see a couple options here:

  1. A patch to DiscUtils to sanitize file names for the platform before calling Path.Combine
  2. Exclude DMG support from Recursive Extractor on netstandard 2.1/net4.8
  3. Create a sample file which doesn't contain the .HFS+ Private Directory Data (I'm not positive this is possible - or desirable - I created this sample file with pretty standard hdiutil arguments on OS X from a folder containing two text files.)

Either way its likely prudent to add some extra exception handling as this exception bubbling up to the user does violate one of the design principles I try to follow for this project - that in general unparseable/malformed archives should not throw exceptions. https://github.com/microsoft/RecursiveExtractor/#exceptions

As for if this case is needed - we need a test case for DMG parsing, but this file in particular is not special. However, this file does represent a relatively naiive DMG that I imagine someone may create, so it would be nice if we could support it, and it would be nice to keep feature parity as much as possible between different runtimes.

I think the ideal solution is probably option 1, perhaps using a mechanism like the one I linked above in FileEntry.cs, however, if the combined path is needed as a key to access additional data from the image in later calls, and not just as a string used to create files on disc/as a file identifier for the user, this would not be feasible. I can look further into where/why disc utils is combining these paths today.

@gfs

gfs commented Jan 26, 2024

Copy link
Copy Markdown
Contributor

@Erik-White, got a chance to look at the discutils code you linked. I think I agree the simplest fix may be to change

#if NET461_OR_GREATER || NETSTANDARD || NETCOREAPP
        return Path.Combine(a, b);
#else

to

#if NETSTANDARD || NETCOREAPP
        return Path.Combine(a, b);
#else

@gfs

gfs commented Jan 26, 2024

Copy link
Copy Markdown
Contributor

I found the two locations with the differing lists of invalid path chars.

Net 4.8: https://referencesource.microsoft.com/#mscorlib/system/io/pathinternal.cs,32
NetCore (Windows): https://github.com/dotnet/runtime/blob/82bea3fe096185a337bc7a89e4649421215a10d3/src/libraries/System.Private.CoreLib/src/System/IO/Path.Windows.cs#L24

At a minimum it looks like at least ", > and < were removed between framework and core.

@Erik-White

Copy link
Copy Markdown
Contributor Author

At a minimum it looks like at least ", > and < were removed between framework and core.

Sorry, I should have been clearer, I meant characters that affected the failing test 😃

Changing to #if NETSTANDARD || NETCOREAPP would be nice and simple. Just need to check two things I think:

  • Does it have any unintended effects elsewhere in DiscUtils (or RecursiveExtractor)?
  • What is the implementation for NETSTANDARD? Is it the same as .NET Framework or Core?

@gfs

gfs commented Jan 26, 2024

Copy link
Copy Markdown
Contributor

At a minimum it looks like at least ", > and < were removed between framework and core.

Sorry, I should have been clearer, I meant characters that affected the failing test 😃

Changing to #if NETSTANDARD || NETCOREAPP would be nice and simple. Just need to check two things I think:

* Does it have any unintended effects elsewhere in DiscUtils (or RecursiveExtractor)?

* What is the implementation for NETSTANDARD? Is it the same as .NET Framework or Core?

Ah okay, yes, sorry missed what you meant. It also doesn't appear to me either that we have any of those characters here (perhaps the \r at the end of what you clipped above?).

I opened a PR with the change here, and validated locally by referencing the net code from recursive extractor that it seems to resolve the test issue with net4.8: LTRData/DiscUtils#4 . all the DiscUtils tests still pass without modification - but it would be hard for me to speak if it had some other unintended effect not covered by the tests. I dont think it has any negative impact for other recursive extractor paths as this should only effect net4.8 and testing indicates this resolves the issue.

I'm not totally sure on if netstandard should be included in the directive or not, I think there's a reasonable argument to only use Path.Combine on netcoreapp, but I think this is the minimal change which resolves the issue we see here.

@Erik-White

Copy link
Copy Markdown
Contributor Author

Nice! We just need DiscUtils with the latest fix and then hopefully it's all good to go

@Erik-White

Copy link
Copy Markdown
Contributor Author

Your fix is now in with the latest DiscUtils and the tests are green! Everything looks good from my side, nice work with the DmgExtractor 🥇

@gfs

gfs commented Jan 29, 2024

Copy link
Copy Markdown
Contributor

Nice. I'll give it one more look over and likely merge today.

@gfs

gfs commented Jan 29, 2024

Copy link
Copy Markdown
Contributor

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@Erik-White Erik-White closed this Jan 29, 2024
@Erik-White Erik-White reopened this Jan 29, 2024
@gfs gfs merged commit 38ad5d6 into microsoft:main Jan 30, 2024
@gfs

gfs commented Jan 30, 2024

Copy link
Copy Markdown
Contributor

Thanks again @Erik-White for your contribution!

@Erik-White

Copy link
Copy Markdown
Contributor Author

Fantastic, thanks for all your work on this project!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate to LTRData/DiscUtils GC thread exception when extracting corrupt wim file MultiExtractor DMG parsing

3 participants