Remove Azure.Sdk.Tools.Cli.Benchmarks (superseded by Vally skill evals) #15697
Merged
The benchmark project has been replaced by skill evals under .github/skills/ run via @microsoft/vally-cli.

Removes:
- The Azure.Sdk.Tools.Cli.Benchmarks project and all scenarios/test data
- Its entry in Azure.Sdk.Tools.Cli.sln
- The Run_Benchmark job in tools/azsdk-cli/ci.yml

The RunBenchmarks parameter is kept as a deprecated no-op so any external pipeline definition or scheduled run still passing it does not break. Safe to delete once confirmed no caller sets it.
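A minimal sketch of what a deprecated no-op parameter can look like in an Azure Pipelines YAML file. This is an illustration only: the actual contents of tools/azsdk-cli/ci.yml are not shown on this page, and the displayName text and default value below are assumptions. The point is that the parameter stays declared, so callers that still pass it are not rejected for supplying an undeclared parameter, while no job or step reads its value.

```yaml
# Hypothetical excerpt; not copied from tools/azsdk-cli/ci.yml.
parameters:
  # Deprecated no-op: kept declared so existing callers that still pass
  # RunBenchmarks keep working. Nothing below references its value.
  - name: RunBenchmarks
    displayName: 'Deprecated (no-op): run benchmarks'  # assumed wording
    type: boolean
    default: false                                      # assumed default
```

Once no pipeline definition or scheduled run sets RunBenchmarks, the declaration can be deleted outright.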
Contributor
Pull request overview
This PR removes the deprecated Azure.Sdk.Tools.Cli.Benchmarks project and its CI integration, since skill behavior measurement is now owned by the Vally-based skill evals under .github/skills/ (and run via eng/pipelines/skill-eval.yml).
Changes:
- Deleted the tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/ project (code, scenarios, validators, docs, and test data).
- Removed the Benchmarks project from tools/azsdk-cli/Azure.Sdk.Tools.Cli.sln.
- Updated tools/azsdk-cli/ci.yml to drop the benchmark job while retaining a deprecated no-op RunBenchmarks parameter for compatibility.
Reviewed changes
Copilot reviewed 183 out of 184 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/azsdk-cli/ci.yml | Removes the benchmark CI job and keeps RunBenchmarks as a deprecated no-op parameter for compatibility. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.sln | Removes the Benchmarks project from the solution so builds no longer include it. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Azure.Sdk.Tools.Cli.Benchmarks.csproj | Deleted as part of removing the Benchmarks project. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/docs/running-in-containers.md | Deleted as part of removing Benchmarks documentation. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Infrastructure/ScenarioDiscovery.cs | Deleted as part of removing Benchmarks runtime/discovery infrastructure. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Infrastructure/SessionConfigHelper.cs | Deleted as part of removing Benchmarks runtime infrastructure. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Interaction/SyntheticAICustomer.cs | Deleted as part of removing Benchmarks interaction harness. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/BenchmarkDefaults.cs | Deleted as part of removing Benchmarks models/config. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/BenchmarkLog.cs | Deleted as part of removing Benchmarks models/logging. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/BenchmarkOptions.cs | Deleted as part of removing Benchmarks models/config. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/BenchmarkResult.cs | Deleted as part of removing Benchmarks models/results. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/CleanupPolicy.cs | Deleted as part of removing Benchmarks models/config. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/ExecutionConfig.cs | Deleted as part of removing Benchmarks execution configuration. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/ExecutionResult.cs | Deleted as part of removing Benchmarks execution result model. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/ExpectedToolCall.cs | Deleted as part of removing Benchmarks validation models. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/JudgmentResult.cs | Deleted as part of removing Benchmarks judgment model. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/RepoConfig.cs | Deleted as part of removing Benchmarks repo configuration model. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/TokenUsage.cs | Deleted as part of removing Benchmarks token-usage tracking. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/ToolCallRecord.cs | Deleted as part of removing Benchmarks tool-call capture. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/ValidationContext.cs | Deleted as part of removing Benchmarks validation context model. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/ValidationResult.cs | Deleted as part of removing Benchmarks validation result model. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Models/ValidationSummary.cs | Deleted as part of removing Benchmarks validation summary model. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/BenchmarkScenario.cs | Deleted as part of removing Benchmarks scenario framework. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/GitHub/GetPrLinkCurrentBranchScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/Pipeline/CheckSdkGenerationStatusScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/ReleasePlan/CreateReleasePlanScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/ReleasePlan/LinkNamespaceApprovalIssueScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/TypeSpec/CheckPublicRepoScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/TypeSpec/CheckPublicRepoThenValidateScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/TypeSpec/GetModifiedTypespecProjectsScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/TypeSpec/RenameClientPropertyScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/TypeSpec/TypespecGenerationStep02Scenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Scenarios/TypeSpec/ValidateTypespecScenario.cs | Deleted as part of removing Benchmarks scenarios. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001001-version-spread-property/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001001-version-spread-property/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001002-version-default-value/readme.md | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001002-version-default-value/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001003-version-required-to-optional/readme.md | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001003-version-required-to-optional/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001004-version-property-decorator/readme.md | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001004-version-property-decorator/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/readme.md | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Operations_List_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Operations_List_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Employees_Update_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Employees_ListBySubscription_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Employees_ListBySubscription_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Employees_ListByResourceGroup_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Employees_ListByResourceGroup_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Employees_Get_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2024-10-01-preview/Employees_Delete_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Operations_List_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Operations_List_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Employees_Update_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Employees_ListBySubscription_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Employees_ListBySubscription_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Employees_ListByResourceGroup_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Employees_ListByResourceGroup_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Employees_Get_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001005-version-add-preview-after-preview/examples/2021-10-01/Employees_Delete_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/readme.md | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Operations_List_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Operations_List_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Employees_Update_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Employees_ListBySubscription_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Employees_ListBySubscription_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Employees_ListByResourceGroup_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Employees_ListByResourceGroup_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Employees_Get_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2024-10-01/Employees_Delete_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Operations_List_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Operations_List_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Employees_Update_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Employees_ListBySubscription_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Employees_ListBySubscription_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Employees_ListByResourceGroup_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Employees_ListByResourceGroup_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Employees_Get_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001006-version-add-preview-after-stable/examples/2021-10-01/Employees_Delete_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/readme.md | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Operations_List_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Operations_List_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Employees_Update_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Employees_ListBySubscription_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Employees_ListBySubscription_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Employees_ListByResourceGroup_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Employees_ListByResourceGroup_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Employees_Get_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2024-10-01-preview/Employees_Delete_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Operations_List_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Operations_List_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Employees_Update_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Employees_ListBySubscription_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Employees_ListBySubscription_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Employees_ListByResourceGroup_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Employees_ListByResourceGroup_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Employees_Get_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001007-version-add-stable-after-preview/examples/2021-10-01/Employees_Delete_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/readme.md | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Operations_List_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Operations_List_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Employees_Update_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Employees_ListBySubscription_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Employees_ListBySubscription_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Employees_ListByResourceGroup_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Employees_ListByResourceGroup_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Employees_Get_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2024-10-01/Employees_Delete_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Operations_List_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Operations_List_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Employees_Update_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Employees_ListBySubscription_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Employees_ListBySubscription_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Employees_ListByResourceGroup_MinimumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Employees_ListByResourceGroup_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Employees_Get_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/001008-version-add-stable-after-stable/examples/2021-10-01/Employees_Delete_MaximumSet_Gen.json | Deleted as part of removing Benchmarks generated example data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/002003-ARM-define-full-update-operation/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/002004-ARM-define-extension-resource/badgeAssignment.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/002005-ARM-define-the-resource/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/003002-arm-action-lro/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/004001-decorate-mgmt-resource-name-parameter/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/004002-decorate-length-constrains-on-array-item/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/arm-action-sync-operation/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/arm-action-sync-operation/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/arm-add-patch-operation-to-resource/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/arm-add-patch-operation-to-resource/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-add-preview-after-preview/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-add-preview-after-preview/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-model-property-removed/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-model-property-removed/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-model-property-renamed/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-model-property-renamed/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-model-property-required/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-model-property-required/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-model-property-type-changed/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-model-property-type-changed/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-operation-return-type-changed/main.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/TestData/TypeSpec/version-operation-return-type-changed/employee.tsp | Deleted as part of removing Benchmarks TypeSpec test data. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Validation/IValidator.cs | Deleted as part of removing Benchmarks validation framework. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Validation/ValidatorRunner.cs | Deleted as part of removing Benchmarks validation framework. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Validation/Validators/ContainsValidator.cs | Deleted as part of removing Benchmarks validators. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Validation/Validators/FileExistsValidator.cs | Deleted as part of removing Benchmarks validators. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Validation/Validators/InteractionValidator.cs | Deleted as part of removing Benchmarks validators. |
| tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/Validation/Validators/ToolAndSkillTriggerValidator.cs | Deleted as part of removing Benchmarks validators. |
Member
Looks good, I'd say add @haolingdong-msft and @chunyu3 as reviewers just to make sure they migrated everything over to vally.
jeo02 approved these changes May 22, 2026
chunyu3 approved these changes May 27, 2026
haolingdong-msft left a comment
Member
Thanks @helen229 for cleaning up. LGTM
This was referenced May 27, 2026
helen229 added a commit that referenced this pull request Jun 16, 2026
… Vally (#15124) (#15811)

* Scaffold Azure.Sdk.Tools.Vally tool-scenario eval suite (#15124)

  Adds a new Vally eval suite under tools/azsdk-cli/Azure.Sdk.Tools.Vally/ for MCP tool / scenario evaluations, replacing the deleted Azure.Sdk.Tools.Cli.Benchmarks project (#15697).

  - README documents project intent, layout, local run instructions, and how to add a new scenario.
  - .vally.yaml wires the azsdk-mcp environment (stdio dotnet run against Azure.Sdk.Tools.Cli) and defines 'typespec' and 'all' suites.
  - evals/check-public-repo.eval.yaml is the first ported scenario (from the deleted CheckPublicRepoScenario): verifies the agent invokes azsdk_typespec_check_project_in_public_repo for a public-repo check prompt. Lints clean via 'vally lint --eval-spec'.
  - fixtures/.gitkeep reserves the per-scenario fixtures layout.

  Remaining scenarios from the deleted benchmark are tracked as a checklist in the project README and in #15124.

* Port remaining 9 benchmark scenarios to Vally (#15124)

  Adds eval YAMLs for every scenario that was deleted from Azure.Sdk.Tools.Cli.Benchmarks in #15697:

  - check-public-repo-then-validate
  - validate-typespec
  - typespec-generation-step02
  - get-modified-typespec-projects (stub; needs git-repo fixture / setup hook)
  - add-arm-resource (stub; needs fixtures + npx tsp compile post-check)
  - create-release-plan
  - link-namespace-approval-issue
  - get-pr-link-current-branch
  - check-sdk-generation-status

  Each eval uses the built-in tool-calls grader for presence checks; the original benchmark's argument/order/forbidden/optional assertions are captured in prompt text + inline TODOs (require custom graders or upstream Vally support, documented in README). Also adds release-plan/github/pipeline suites to .vally.yaml. All 10 evals pass 'vally lint --eval-spec'.

* Add rename-client-property stub eval to Vally suite (#15124)

  Ports the deleted RenameClientPropertyScenario as a tool-calls-only stub. Full expected-diff grading + sparse-clone setup hook are tracked as follow-ups in the README.

* Fix tool name prefix in graders, timeout format, expand README

* Reorganize evals into scenarios/ and triggers/; port trigger evals from #15183

  - Move 11 multi-step scenario evals to evals/scenarios/
  - Port 9 per-tool trigger evals from jeo02/migrate-evaluations-to-vally (PR #15183) to evals/triggers/, stripped azure-sdk-mcp- prefix from graders to match bare MCP tool names
  - Port Validate-EvalTools.ps1 to scripts/, retargeted at evals/triggers/ with bare-name regex
  - Update .vally.yaml suites for new layout (scenarios, triggers, all)
  - Update README to document the split and per-trigger-file tool coverage
  - Add .gitignore for vally-results/ and results/

* update the config and use gpt-5.4 model

* add disallowed

* Vally: restructure evals into unit/integration/e2e test pyramid

  Replace per-area folders (scenarios/, triggers/) with tier-based folders. Feature area moves to a YAML tag, enabling tag-filtered suites. Add composite suites (pr-gate, nightly) and area-filtered suites in .vally.yaml. Update Validate-EvalTools.ps1 to scan evals/unit for triggers-*.eval.yaml. Refresh README and Run-LiveEvals.ps1 paths.

* Vally: remove Run-LiveEvals.ps1 (local-only test wrapper)

  Drop the local-only convenience wrapper and refer directly to evals/setup/ensure-specs-clone.ps1 in docs and YAML comments. Users prime the spec clone manually and invoke 'vally eval --suite e2e'.

* some docs and test e2e one

* update docs

* update design

* update with skill evals

* reorg based on the design

* remove the duplicates

* add new scenarios

* update the doc

* update doc

* update names

* Vally: align release-planner mock stimuli with live e2e pattern

  All 5 release-planner mock stimuli now use environment.git worktree pointing at the per-user azure-rest-api-specs cache (matching the live e2e fixture), plus a structured e2e-style prompt that supplies the Contoso fixture IDs the mock handlers expect (TypeSpec project, service/product tree IDs, work-item ID 29262). Also document the --skill-dir requirement and worker-cap caveat in README, and fix one stale path in .vally.yaml comment.

* update doc

* Vally: fix MCP boot race + drop misconfigured grader (#15948)

  - Launch pre-built DLLs via 'dotnet <dll>' in both .vally.yaml files instead of 'dotnet run', so N parallel workers no longer race on Roslyn's exclusive write lock for the output DLL.
  - Add 'Build MCP servers' step to eng/pipelines/skill-eval.yml so the CI runner has the DLLs ready before vally starts.
  - Drop the skill-invocation grader from generate-sdk-for-existing-release-plan (no preflight reasoning step required; tools-only).
  - Strip 'I'm in a checkout of azure-rest-api-specs.' preamble from prompts; the worktree already provides that context.
  - Remove stray '// tools skills response' artifact in live release-planner.eval.yaml.
  - README: document 'dotnet build' as a prereq; rewrite workers warning.

  Validated: scenarios-mock at --workers 6 -> 5/5 stimuli pass, 0 race hits, ~4 min.

* update readme for running steps

* Vally: align mock release-planner grader with live + deterministic 'not found' lookup

  The create-release-plan-and-generate-sdk mock stimulus required the agent to call azsdk_update_sdk_details_in_release_plan, but neither the prompt nor the azsdk-common-prepare-release-plan skill's create flow asks for it. The agent correctly skipped the tool, and the grader flapped.
The dedicated update-sdk-details-in-release-plan stimulus already covers that tool with an explicit prompt. Drop it from the create+generate grader so mock matches the live release-planner-e2e contract (create / get / generate / link). Also patch GetReleasePlanForSpecPrHandler to return a deterministic 'not found' response (ReleasePlanDetails = null). The mock previously returned a 'plan exists' result for any spec PR, pushing the agent down the update path instead of the create path that the stimulus exercises. Stimuli that target an existing plan pass the work-item ID directly and call azsdk_get_release_plan, so this is safe. * update eval yaml * Address PR #15811 review: fix stale paths, exit codes, build output, cache portability - README/eval comments: evals/unit -> evals/tools, evals/scenarios -> evals/workflow-scenarios (Copilot C1/C5) - Validate-EvalTools.ps1: default EvalPath -> evals/tools; return 1 -> exit 1 so CI fails loudly (Copilot C2/C3) - MCP build output: dotnet build -o artifacts/mcp/{cli,mock}; pipeline switched to Release; .vally.yaml no longer hardcodes Debug/net8.0 (Praveen #1/#2) - ensure-specs-clone.ps1 + workflow evals: repo-relative artifacts/specs-cache path instead of C:/Users/gaoh; Vally resolves it relative to the eval file so it works for all contributors + CI (Copilot C6/C7, Praveen #4) - add-arm-resource/rename-client-property: comment clarifying 'edit' is the Copilot SDK built-in file tool, not an MCP tool (Praveen #5) * Refactor Vally tool evals: rename triggers-* to prompt-to-tool-*, consolidate standalone single-tool evals - Rename evals/tools/triggers-*.eval.yaml to prompt-to-tool-*.eval.yaml (Praveen review #6) - Consolidate 7 standalone single-tool scenario evals into the matching namespace files as full-context checks (check-public-repo, check-sdk-generation-status, create-release-plan, get-modified-typespec-projects, get-pr-link-current-branch, link-namespace-approval-issue, validate-typespec) - Keep 
add-arm-resource.eval.yaml standalone (produces a file edit, not a pure tool trigger) - Switch tool evals to gpt-5.4 and add explicit 'use the available Azure SDK MCP tools' steering plus concrete grounding to bare trigger prompts so they invoke the MCP tool reliably - Update README evals/tools section and Validate-EvalTools.ps1 to the new file names * Remove agent-eval-strategy design spec from PR (now reviewed standalone in #15918) * Drop flaky edit-tool assertion from add-arm-resource eval * remove script * Stabilize flaky tool-scenario prompts and add README command cookbook Ground 13 previously-flaky prompts with concrete IDs/paths so they route deterministically to the intended MCP tool; make the mock check-service-label handler convention-driven (status derived from the requested serviceLabel); document common vally invocation recipes in the README. * Fix outdated command examples in Vally README Replace references to consolidated/non-existent eval files (create-release-plan, check-public-repo, link-namespace-approval-issue) with the real prompt-to-tool-* and workflow-scenario files; correct the default output path to ./vally-results/<timestamp>/; fix the cookbook results.jsonl parser to locate the newest timestamped run; add the missing release-planner-workflows mock scenario to the index. * Fix invalid prompt-grader config in live release-planner eval The prompt (LLM-judge) grader schema uses 'prompt' for the rubric text, not 'rubric'. Rename the field and add 'scoring: binary' (the rubric is pass/fail) so the spec validates.
What
Removes the `Azure.Sdk.Tools.Cli.Benchmarks` project and its CI job. The benchmark project has been superseded by the Vally-based skill evals living under `.github/skills/` (merged in #15376, the "Migration waza skills to vally" PR).
Why
- Skill evals run via `@microsoft/vally-cli` are now the canonical way we measure skill/tool behavior; see `eng/pipelines/skill-eval.yml`.
- The `Run_Benchmark` CI job is currently disabled (it only runs when `parameters.RunBenchmarks=true` or `Build.Reason=Schedule`).
- […] confusion about which framework owns coverage.
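For context, gating of the kind described for `Run_Benchmark` is usually written in Azure Pipelines as a job-level condition. The sketch below is an illustration, not the removed code: the job name, parameter name, and the two trigger conditions come from this PR, while the parameter type, its default, and the placeholder step are assumptions.

```yaml
# Hypothetical reconstruction; the job actually removed from
# tools/azsdk-cli/ci.yml is not reproduced in this PR description.
parameters:
  - name: RunBenchmarks
    type: boolean      # assumed type
    default: false     # assumed default

jobs:
  - job: Run_Benchmark
    # Runs only on explicit opt-in or on a scheduled build,
    # so ordinary PR and CI runs skip it.
    condition: or(eq('${{ parameters.RunBenchmarks }}', 'true'), eq(variables['Build.Reason'], 'Schedule'))
    steps:
      - script: echo "placeholder for the benchmark steps"
```

With a condition like this, the job is skipped on every ordinary run, which is why the description calls it effectively disabled.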
Changes
- Deleted the `tools/azsdk-cli/Azure.Sdk.Tools.Cli.Benchmarks/` project (182 files, scenarios, validators, test data, docs).
- Removed the project from `tools/azsdk-cli/Azure.Sdk.Tools.Cli.sln` (via `dotnet sln remove`).
- Removed the `Run_Benchmark` job and the `AuthoringSpecRepo` parameter from `tools/azsdk-cli/ci.yml`.
- Kept the `RunBenchmarks` parameter as a deprecated no-op so any external pipeline definition or scheduled run still passing `RunBenchmarks: true` does not break. Safe to delete in a follow-up once we've confirmed no caller sets it.
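A minimal sketch of what keeping `RunBenchmarks` as a deprecated no-op could look like in `tools/azsdk-cli/ci.yml`. Only the parameter name comes from this PR; the `type`, `default`, and `displayName` values are assumptions. The idea is that the parameter stays declared, so a caller that still supplies it is accepted, but nothing in the pipeline reads the value.

```yaml
# Sketch only; type, default, and displayName are assumed, not taken from the PR.
parameters:
  # Deprecated no-op: no job or step references this value any more.
  # It stays declared so runs that still pass RunBenchmarks are accepted.
  - name: RunBenchmarks
    displayName: 'Run benchmarks (deprecated, no effect)'
    type: boolean
    default: false
```

Once no external definition or schedule passes the parameter, the whole entry can be dropped in a follow-up.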
Safety check
- `git grep "Cli.Benchmarks"` across the rest of the repo (`.cs`/`.csproj`/`.sln`/`.yml`)
- `dotnet build Azure.Sdk.Tools.Cli.sln`
- `AzSDK_Eval_Variable_group`, `azuresdk-copilot-github-pat` (`eng/pipelines/skill-eval.yml`): left alone
- `azuresdkqabot-dev` service connection, `qa-bot-service` Go binary (`tools/sdk-ai-bots/`): left alone

Residual risk
Only one path could break: an ADO pipeline definition that pins `RunBenchmarks: true` outside the repo. The no-op parameter above absorbs that case: the pipeline run will still succeed, just without running the (now-deleted) benchmark job.