Fix possible deadlock in UnixCertificateManager and MacOSCertificateManager#66727
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a possible Unix development-certificate startup hang by replacing a redirected certutil process wait with Process.RunAndCaptureText, allowing stdout/stderr to be consumed while waiting.
Changes:
- Updates the NSS database certificate lookup path to use
Process.RunAndCaptureText. - Preserves existing exit-code-based success behavior for the lookup.
9991cd8 to
cf3abff
Compare
56470e3 to
5326fb3
Compare
adamsitnik
left a comment
There was a problem hiding this comment.
I really like the fact that you have found other places that could run into the same bug and fixed them @Youssef1313 🥇
I also love the fact that you have tried the new APIs! But please keep in mind that in case you need to backport these changes to older versions, they won't be available.
PTAL at my comments, thanks!
| } | ||
| using (var process = Process.Start(MacOSTrustCertificateCommandLine, MacOSTrustCertificateCommandLineArguments + tmpFile)) | ||
|
|
||
| var exitStatus = Process.Run(new ProcessStartInfo(MacOSTrustCertificateCommandLine, MacOSTrustCertificateCommandLineArguments + tmpFile)); |
| return checkTrustProcess.ExitCode == 0 ? TrustLevel.Full : TrustLevel.None; | ||
| }; | ||
|
|
||
| var checkTrustProcessOutput = Process.RunAndCaptureText(checkTrustProcessStartInfo); |
There was a problem hiding this comment.
For this particular use case the best solution is to redirect standard output and error to null file.
What we get:
- no output is being printed
- no extra work is being performed to capture the text just to ignore it
This is how it could look like (pseudocode, I've not compiled it)
using SafeFileHandle nullHandle = File.OpenNullHandle();
var checkTrustProcessStartInfo = new ProcessStartInfo(
MacOSVerifyCertificateCommandLine,
string.Format(CultureInfo.InvariantCulture, MacOSVerifyCertificateCommandLineArgumentsFormat, tmpFile))
{
// Do this to avoid showing output to the console when the cert is not trusted. It is trivial to export
// the cert and replicate the command to see details.
StandardOutputHandle = nullHandle,
StandardErrorHandle= nullHandle
});
var checkTrustProcessOutput = Process.Run(checkTrustProcessStartInfo);There was a problem hiding this comment.
I've created dotnet/runtime#128453 in order to make it one-liner
There was a problem hiding this comment.
Is it different from StartDetached?
There was a problem hiding this comment.
StartDetached
Yes, start detached focuses on ensuring the process can outlive the parent (it creates new session on Unix etc) and in addition to that it redirects to null
|
|
||
| {output}"); | ||
| } | ||
| {processOutput.StandardOutput}{processOutput.StandardError}"); |
There was a problem hiding this comment.
This problem has existed before this PR, I just want to mention it: joining standard output and error like this may confuse the users when it produced invalid order.
Imagine that the child process produces following output:
OUT: A
ERR: B
OUT: C
The combined output will be:
A
C
B
In the past I was considering introducing a method that would redirect std out and err to the same pipe and producing a single string (and preserving original ordering). I think it would be very useful in such scenarios.
There was a problem hiding this comment.
Yup. As it's a pre-existing different issue, I would say it's out of scope for this PR for now.
| using var process = Process.Start(startInfo)!; | ||
| process.WaitForExit(); | ||
| return process.ExitCode == 0; | ||
| return Process.RunAndCaptureText(startInfo).ExitStatus.ExitCode == 0; |
There was a problem hiding this comment.
Similar to above, it would be better to redirect to null file and then just .Run
| using var process = Process.Start(startInfo)!; | ||
| process.WaitForExit(); | ||
| return process.ExitCode == 0; | ||
| return Process.RunAndCaptureText(startInfo).ExitStatus.ExitCode == 0; |
There was a problem hiding this comment.
Similar to above, it would be better to redirect to null file and then just .Run
| using var process = Process.Start(startInfo)!; | ||
| process.WaitForExit(); | ||
| return process.ExitCode == 0; | ||
| return Process.RunAndCaptureText(startInfo).ExitStatus.ExitCode == 0; |
There was a problem hiding this comment.
Similar to above, it would be better to redirect to null file and then just .Run
adamsitnik
left a comment
There was a problem hiding this comment.
LGTM (assuming the tests are going to pass), thank you for addressing my feedback @Youssef1313 !
| RedirectStandardOutput = true | ||
| }); | ||
| RedirectStandardOutput = true, | ||
| RedirectStandardError = true, |
There was a problem hiding this comment.
This changes behavior beyond just fixing the deadlock. Previously stderr from security find-certificate was inherited and visible on the user's console; now it's captured into findCertificateProcessOutput.StandardError and never read. In practice security find-certificate writes useful diagnostics there — e.g. SecKeychainSearchCopyNext: The specified item could not be found in the keychain., "keychain does not exist", or User interaction is not allowed. in locked / SSH / CI sessions — which have historically helped diagnose dev-certs trust issues.
Could we either (a) route stderr to File.OpenNullHandle() like the other call sites that don't care about it, to make the discard explicit, or (b) log findCertificateProcessOutput.StandardError via the existing Log.* event source when it's non-empty? Option (b) preserves the diagnostic signal even when EventSource listeners are attached. Not a blocker for the deadlock fix itself.
There was a problem hiding this comment.
Thanks for catching this. I think it might be safer to just revert to the old code. In this specific callsite, as the original code was reading stdout and not redirecting stderr at all, that path shouldn't have deadlock.
The paths that would have deadlocks are either not reading the output at all, or redirecting both stdout and stderr and reading them sequentially.
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 8d5666c (keeps arcade/Wix + main-only deps at main, no dep removal).
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open so the
hang-prone queue is exercised. Rolls the 2 CertificateGeneration files back to the
target commit to stay coherent with the older runtime (pre-dotnet#66727 Process.Run API).
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 8a68740 (keeps arcade/Wix + main-only deps at main, no dep removal).
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open so the
hang-prone queue is exercised. Rolls the 2 CertificateGeneration files back to the
target commit to stay coherent with the older runtime (pre-dotnet#66727 Process.Run API).
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 2f91a9d (keeps arcade/Wix + main-only deps at main, no dep removal).
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open so the
hang-prone queue is exercised. Rolls the 2 CertificateGeneration files back to the
target commit to stay coherent with the older runtime (pre-dotnet#66727 Process.Run API).
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 8d5666c (keeps arcade/Wix + main-only deps at main, no dep removal).
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open so the
hang-prone queue is exercised. Rolls the 2 CertificateGeneration files back to the
target commit to stay coherent with the older runtime (pre-dotnet#66727 Process.Run API).
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 8a68740 (keeps arcade/Wix + main-only deps at main, no dep removal).
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open so the
hang-prone queue is exercised. Rolls the 2 CertificateGeneration files back to the
target commit to stay coherent with the older runtime (pre-dotnet#66727 Process.Run API).
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 2f91a9d (keeps arcade/Wix + main-only deps at main, no dep removal).
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open so the
hang-prone queue is exercised. Rolls the 2 CertificateGeneration files back to the
target commit to stay coherent with the older runtime (pre-dotnet#66727 Process.Run API).
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 8d5666c (keeps arcade/Wix + main-only deps at main, no dep removal).
Adds the target-era darc-pub-dotnet-extensions feed to NuGet.config so rolled-back
package versions (e.g. Microsoft.Extensions.Caching.Hybrid 10.4.1) resolve.
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open.
Targeted API-break fixes coherent with the older runtime: rolls the 2
CertificateGeneration files back to the target (pre-dotnet#66727 Process.Run) and removes
the JsonTypeInfoKind.Union block (pre-dotnet#67001) in OpenApi.
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 8a68740 (keeps arcade/Wix + main-only deps at main, no dep removal).
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open so the
hang-prone queue is exercised. Rolls the 2 CertificateGeneration files back to the
target commit to stay coherent with the older runtime (pre-dotnet#66727 Process.Run API).
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…acOS VSD hang
Merge-downgrade of eng/Version.Details.{xml,props} + global.json to dotnet/dotnet
codeflow commit 2f91a9d (keeps arcade/Wix + main-only deps at main, no dep removal).
Reverts the macOS PR-check Helix queue (dotnet#63531) back to OSX.15.Amd64.Open so the
hang-prone queue is exercised. Rolls the 2 CertificateGeneration files back to the
target commit to stay coherent with the older runtime (pre-dotnet#66727 Process.Run API).
Draft probe PR for runtime mac-hang bisection. Not for merge.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes #65518
When stdout/stderr are redirected, the child process writes to buffer (typically 4kb, at least on Windows), and the parent process is expected to be reading the stdout/stderr. If the parent process didn't read stdout/stderr, then the child process will be blocked forever on any
Console.Write*call when the buffer is full (4kb of data is already written).The child process can only get unblocked when the parent process consumes the output, but the parent process never attempts to read the output.
RunAndCaptureText is a new API in .NET 11 that attempts to resolve that. See also the very recent blogpost about this: https://devblogs.microsoft.com/dotnet/process-api-improvements-in-dotnet-11/
@adamsitnik I'm less familiar with the new APIs and haven't used them much yet. I was hesitating between
RunAndCaptureText(keeping the current behavior of redirecting stdout/stderr which is likely done to avoid the avoid from showing on the running terminal) versus usingProcessStartInfo.StartDetachedwhich prevents the handle inheritance altogether and uses null/nop handles instead (which is probably faster but more deviating from the original code).