
Staging pipeline errors categorization  #12821

@epananth

Consolidation of staging pipeline errors from the past 6 months, plus comments from Matt Mitchell, Rahul, and Lee:

| Error | What type of error | Link to error (if any) | Should we fix this? | Thoughts on fixing this? | Actions to take | Ownership | Need automation? |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Common errors** | | | This is not something we need to fix, but I added this list just to understand release pipeline maintenance. | | | | |
| Which BarId to pick for release? | User error | | Do we need a separate mechanism than what we have today? Should we automate this? | mmitche: This one is hard to automate. It requires knowing what the desired state of the build is. Is there another build coming? Are all fixes in, etc.? I don't think this is a particularly hard thing to do today. It just requires understanding the product. | Low priority to automate, so no action needed for now | Coherency team | Manual |
| Linux signing: it is driven off a file list. Historically some files could have been skipped, but that is no longer the case, so the scripts have to be improved to be strict about what files they expect vs. what they find. | | | | mmitche: I think we should fix this one, at least investigate making the input list strict. If files are missing, you need to fix the list. | dotnet/arcade-services#2399 | Infra team | Automate |
| Channel with ID 'xxx' is not configured to be published to | User error | Link | Requires an arcade update. This is not an error, but it's good to know what has to be done when it occurs. | mmitche: This one gets fixed by regular arcade updates to the repo. | No action needed | Infra team | Manual |
| Object reference set to null | User error - Secret rotation error? | Link | | mmitche: We've never gotten to the bottom of this one, but I suspect it is some kind of transient authentication error. I'd like to see it fixed (or retried)? | dotnet/arcade-services#2634 | Infra team | Manual |
| Gather drop download failure | Enhancement | | | mmitche: I wish we could be stricter with these, but I think it will block shipping in some cases (vstest e.g.). On the plus side, we don't see this that often in .NET 6 or 7. | No action needed | Coherency team | Manual |
| Something missing/incorrect in manifest.json or config.json | | | | mmitche: These are usually indications of real errors. I think they cause an outright failure today in the first stage (Create Release Config). Those errors are real. | This can be part of the error manifest, like listing all the errors in one place | Infra team | Automate |
| **Publishing errors** | | | | | | | |
| Publishing didn't expect that a random channel (General testing) would be used for release | User error | Link | Do we need a mechanism to say that this build should not use the General testing channel? | mmitche: That's a good idea. What if we scan the channels used in the manifest.json when creating the release config, and verify that they are all "approved"? | https://github.com/dotnet/arcade/issues/13021 | Infra team | Automate |
| Switching from .NET 7 to .NET 8, there are scripts that need updating. This has to be documented if we need it in the future (meaning every major release). | Maintenance | | | mmitche: I think this is a combo of two things: 1 - documentation. 2 - Putting errors and exceptions in the scripting when we detect such situations. Be paranoid. And then point to the instructions in the error. | @dkurepa has created a wiki for this; we can update it. dotnet/arcade-services#2401 | Infra team | Manual |
| Error in publishing, which needs some improvements (Publish Assets to Staging Location) | Enhancement | Link | | mmitche: This looks like a hardening issue. | dotnet/arcade-services#2402 | Infra team | Manual |
| Error in publishing | Enhancement | Link | | mmitche: I think this one is because some major-version changes didn't get made (generating dynamic-feeds.json). This is another case where we should be strict when we find these cases, and then document appropriately. | Already covered by dotnet/arcade-services#2401 | Infra team | Manual |
| Publish to VS drop | Don't know | Link | Mostly Jaques cares about this, so for some releases he cares and for some he doesn't. | mmitche: I don't know about this one. @joeloff? | dotnet/arcade-services#2403 | Jack is working on this | Manual |
| Improve erroring in Publishing, Gather drop and Required validation, as these stages have to pass most of the time | Maintenance | | | | | | |
| Consolidate publishing to different blob storages in one stage | | | | rbhanda: The only real concern is that if one task fails, the entire stage will fail, blocking the whole pipeline. As of now, I don't know all the storage locations for the copy or how they are consumed. | dotnet/arcade-services#2404 | Infra team | Automate |
| Better way to check that the staging pipeline passed the Publishing CTI Validation assets stage, so that release pipeline Repo propagation does not fail | | | | rbhanda: A validation stage which can confirm that the assets exist for the repo prop to work. Manual copy of assets to public on the release day is painful since our assets are huge. | dotnet/arcade-services#2395 | Infra team | Automate |
| Ability to rerun failed stages, without restarting the pipeline from the beginning | | | | Lee comments | #13014 | Infra team | Does not apply |
| Skip some stages if possible | | | | Lee comments | dotnet/arcade-services#2390 (same as above) | Infra team | Does not apply |
| Tar file already exists | Enhancement | Link | Hardening required | | dotnet/arcade-services#2402 | Infra team | Automate |
| Publishing feeds not found. Did you add the channel to feeds.json and this script? | | | | mmitche: This is another major-version switchover issue. | Already covered by dotnet/arcade-services#2401 | Infra team | Manual |
| Failed to publish logs | Enhancement | | Something failed in the real step; go through the logs. | rbhanda: This isn't a blocking issue, but log consolidation and an easier way to identify issues in the pipeline is an enhancement we should pursue. | Seems like a good candidate for dotnet/arcade-services#2396 | Infra team | Automate |
| **Required Validation failure** | | | | | | | |
| General understanding of which repo validations have to pass before we can say that required validation is good to go? | | | | | | | |
| Failed required validation in source link | Don't know | Link | | mmitche: Generally sourcelink issues have not been a blocker for release. That's primarily why we have two different stages, "Validation" and "Required Validation". Validation can be a judgement call, and Required Validation should pass (unless we determine that its results are wrong). | Feels like no action needed here | Coherency | Manual |
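
For the "General testing" channel row, mmitche's suggestion of scanning the channels in manifest.json and verifying that they are all "approved" could look roughly like the sketch below. This is only an illustration. The manifest layout (`builds`, `name`, `channels`), the function name `find_unapproved_channels`, and the allow-list contents are all assumptions, since the real manifest.json schema and the set of approved channels are not shown in this issue.

```python
import json

# Hypothetical allow-list; the real set of approved release channels is not given in this issue.
APPROVED_CHANNELS = {".NET 7", ".NET 7.0.1xx SDK"}


def find_unapproved_channels(manifest, approved):
    """Return the sorted channel names used by builds in the manifest that are not approved.

    Assumes a made-up manifest shape: {"builds": [{"name": ..., "channels": [...]}]}.
    """
    unapproved = set()
    for build in manifest.get("builds", []):
        for channel in build.get("channels", []):
            if channel not in approved:
                unapproved.add(channel)
    return sorted(unapproved)


# Sample input in the assumed shape, standing in for a real manifest.json.
sample_manifest = json.loads("""
{
  "builds": [
    {"name": "runtime", "channels": [".NET 7"]},
    {"name": "sdk", "channels": [".NET 7.0.1xx SDK", "General Testing"]}
  ]
}
""")

bad_channels = find_unapproved_channels(sample_manifest, APPROVED_CHANNELS)
if bad_channels:
    print("Unapproved channels in manifest: " + ", ".join(bad_channels))
```

Running a check like this while the release config is created would surface the problem in the first stage rather than during publishing.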
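
For the Linux signing row, "making the input list strict" could mean failing whenever the expected file list and the files actually found differ in either direction. The sketch below is a minimal illustration of that idea. The function names and the sample file names are hypothetical and do not come from the actual signing scripts.

```python
def compare_file_lists(expected, found):
    """Return (missing, unexpected): files expected but not found, and files found but not expected."""
    expected_set, found_set = set(expected), set(found)
    missing = sorted(expected_set - found_set)
    unexpected = sorted(found_set - expected_set)
    return missing, unexpected


def require_exact_match(expected, found):
    """Raise ValueError if the found files do not match the expected list exactly."""
    missing, unexpected = compare_file_lists(expected, found)
    if missing or unexpected:
        raise ValueError(
            "Signing file list mismatch. Missing: %s. Unexpected: %s. "
            "Fix the input list before retrying." % (missing, unexpected)
        )


# Hypothetical file names, used only to illustrate the check.
expected_files = ["a.deb", "b.rpm", "c.tar.gz"]
found_files = ["a.deb", "c.tar.gz", "d.rpm"]

missing, unexpected = compare_file_lists(expected_files, found_files)
print("missing:", missing)
print("unexpected:", unexpected)
```

Raising on any mismatch follows the comment that, if files are missing, the list itself should be fixed rather than silently skipped.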

Labels: area-staging-pipeline (Issues related to the staging part of the .NET release infrastructure.)
