Migrate waza skills to vally #15376

Merged: jeo02 merged 27 commits into Azure:main from jeo02:migrate-skill-evals-v2 on May 19, 2026

@jeo02 jeo02 commented Apr 29, 2026

Member

Summary

Migrates the skill evaluation infrastructure from the deprecated azd waza extension to the @microsoft/vally-cli
evaluation framework.

Changes

Eval Framework Migration

  • Replaced .waza.yaml with .vally.yaml project configuration
  • Converted all eval specs from waza task-based format to vally stimulus/grader format
  • Added trigger.eval.yaml files for trigger/anti-trigger coverage per skill
  • Consolidated scattered tasks/*.yaml files into single eval specs with inline stimuli
  • Added area and priority tags to all eval specs for filtering
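
The consolidation step above can be pictured as folding several per-task records into one spec that carries its stimuli inline. The sketch below is only an illustration in Python: every key it uses (`id`, `prompt`, `name`, `stimuli`, `tags`) and the tag values are hypothetical placeholders, not the actual waza or vally schema.

```python
from typing import Any


def consolidate_tasks(skill: str, tasks: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold per-task dicts into one spec dict with inline stimuli.

    All keys used here ("id", "prompt", "name", "stimuli", "tags") are
    hypothetical placeholders, not the real waza or vally schema.
    """
    stimuli = []
    # Sort by id so the consolidated file has a stable, reviewable order.
    for task in sorted(tasks, key=lambda t: t["id"]):
        stimuli.append({"name": task["id"], "prompt": task["prompt"]})
    return {
        "name": skill,
        "tags": {"area": "example-area", "priority": "p0"},  # illustrative values
        "stimuli": stimuli,
    }


tasks = [
    {"id": "edge-case", "prompt": "Handle an empty input"},
    {"id": "basic-usage", "prompt": "Run the common path"},
]
spec = consolidate_tasks("example-skill", tasks)
print([s["name"] for s in spec["stimuli"]])
```

Keeping the stimuli in one file means a reviewer sees every prompt for a skill in a single diff rather than across a tasks/ directory.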

Pipeline Changes

  • .github/workflows/skill-eval.yml — Simplified to lint-only (vally lint). Installs @microsoft/vally-cli via npm.
  • eng/pipelines/skill-eval.yml (new) — Azure DevOps pipeline for running evals. Uses AzSDK_Eval_Variable_group for
    Copilot API authentication since the copilot-sdk executor requires a user-scoped PAT (GitHub App tokens are not
    supported by the Copilot API).

SKILL.md Fixes

  • Flattened compatibility frontmatter from nested YAML objects to plain strings across all 8 skill files (required by
    vally lint valid-refs check)
  • Fixed invalid cross-skill references in sensei/SKILL.md and skill-authoring/SKILL.md
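
As a rough illustration of the frontmatter flattening, the Python sketch below turns a nested mapping into a single plain string. The sample frontmatter and the `key: value; key: value` output format are assumptions made for the example; the PR only states that nested objects became plain strings.

```python
from typing import Any


def flatten_compatibility(value: Any) -> str:
    """Render a nested compatibility value as one plain string.

    The "key: value; key: value" format is an assumption for illustration.
    """
    if isinstance(value, str):
        return value  # already flat, nothing to do
    if isinstance(value, dict):
        parts = []
        for key, item in value.items():
            parts.append(f"{key}: {flatten_compatibility(item)}")
        return "; ".join(parts)
    if isinstance(value, (list, tuple)):
        return ", ".join(flatten_compatibility(item) for item in value)
    return str(value)


nested = {"tools": ["copilot", "vscode"], "os": "any"}  # hypothetical frontmatter
print(flatten_compatibility(nested))  # prints "tools: copilot, vscode; os: any"
```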

Cleanup

  • Deleted old waza artifacts: root-level eval.yaml, tasks/ directories, trigger_tests.yaml, evals/tasks/ directories
  • Removed .waza.yaml
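
A cleanup like this can be double-checked with a small scan for leftovers. The following Python sketch is one possible check, not part of the PR: the function name is hypothetical, and it assumes the skills root contains one directory per skill, so that `<skill>/eval.yaml` is the old root-level config while `<skill>/evals/eval.yaml` is the converted spec that should stay.

```python
from pathlib import Path

# File names taken from the cleanup list above.
LEGACY_FILES = {".waza.yaml", "trigger_tests.yaml"}


def find_legacy_artifacts(skills_root: Path) -> list[str]:
    """Return relative paths of leftover waza artifacts under skills_root."""
    found = []
    for path in skills_root.rglob("*"):
        rel = path.relative_to(skills_root)
        if path.is_file() and path.name in LEGACY_FILES:
            found.append(rel.as_posix())
        elif path.is_dir() and path.name == "tasks":
            # Covers both <skill>/tasks/ and <skill>/evals/tasks/.
            found.append(rel.as_posix() + "/")
        elif path.is_file() and path.name == "eval.yaml" and len(rel.parts) == 2:
            # <skill>/eval.yaml is the old root-level config (assumed layout).
            found.append(rel.as_posix())
    return sorted(found)
```

An empty result from `find_legacy_artifacts(Path(".github/skills"))` would suggest the cleanup is complete.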

Why Azure DevOps for evals?

The vally copilot-sdk executor uses @github/copilot-sdk which authenticates via the GitHub CLI (gh). The default
GITHUB_TOKEN in GitHub Actions is a GitHub App server-to-server token, which the Copilot API rejects. The ADO pipeline
accesses a user-scoped PAT (azuresdk-copilot-github-pat) from the AzSDK_Eval_Variable_group variable group.
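
To make the token distinction concrete, here is a minimal Python sketch that classifies a token by GitHub's documented prefixes (`ghs_` for server-to-server tokens such as the Actions GITHUB_TOKEN, `ghp_` and `github_pat_` for personal access tokens). The function names are hypothetical, and treating both PAT forms as acceptable is an assumption; the PR only says a user-scoped PAT is required.

```python
def classify_github_token(token: str) -> str:
    """Classify a GitHub token by its prefix."""
    prefixes = {
        "ghs_": "app-server-to-server",          # e.g. the Actions GITHUB_TOKEN
        "ghp_": "personal-access-token",         # classic PAT
        "github_pat_": "personal-access-token",  # fine-grained PAT
        "gho_": "oauth",
        "ghu_": "app-user-to-server",
    }
    for prefix, kind in prefixes.items():
        if token.startswith(prefix):
            return kind
    return "unknown"


def can_run_copilot_evals(token: str) -> bool:
    """Per the description above, only a user-scoped PAT is expected to work."""
    return classify_github_token(token) == "personal-access-token"


print(can_run_copilot_evals("ghs_example"))  # prints False
print(can_run_copilot_evals("ghp_example"))  # prints True
```

A pre-flight check of this kind would fail fast with a clear message instead of surfacing an authentication error partway through an eval run.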

Testing

  • Vally lint passes for all skills
  • Eval discovery and tag filtering verified in CI (--tag "priority=p0" --tag "area=")
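
The tag filtering mentioned above can be sketched as follows. This is an illustration of how `key=value` filters might select specs, not vally's implementation: the function names, the spec tags, and the AND semantics across repeated `--tag` flags are all assumptions.

```python
def parse_tag_filters(raw_filters: list[str]) -> dict[str, str]:
    """Turn ["priority=p0", ...] into {"priority": "p0", ...}."""
    filters = {}
    for raw in raw_filters:
        key, sep, value = raw.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {raw!r}")
        filters[key] = value
    return filters


def matches(spec_tags: dict[str, str], filters: dict[str, str]) -> bool:
    """Select a spec only if every filter matches (AND semantics, assumed)."""
    return all(spec_tags.get(key) == value for key, value in filters.items())


# Hypothetical specs and tag values, for illustration only.
specs = {
    "release/eval.yaml": {"area": "release", "priority": "p0"},
    "sensei/eval.yaml": {"area": "authoring", "priority": "p1"},
}
filters = parse_tag_filters(["priority=p0", "area=release"])
selected = [name for name, tags in specs.items() if matches(tags, filters)]
print(selected)  # prints ['release/eval.yaml']
```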

@jeo02 jeo02 force-pushed the migrate-skill-evals-v2 branch from f003161 to e89ebe4 Compare April 29, 2026 21:44
@jeo02 jeo02 force-pushed the migrate-skill-evals-v2 branch from e89ebe4 to bf9ace7 Compare April 29, 2026 21:45
@jeo02 jeo02 marked this pull request as ready for review April 29, 2026 23:36
Copilot AI review requested due to automatic review settings April 29, 2026 23:36
@jeo02 jeo02 requested a review from a team as a code owner April 29, 2026 23:36

Copilot AI left a comment

Contributor

Pull request overview

Migrates the repo’s Copilot skill evaluation setup from the deprecated azd/waza format to the @microsoft/vally-cli framework, including updated CI/ADO automation and converted eval specs.

Changes:

  • Introduces Vally project configuration (.github/skills/.vally.yaml) and converts per-skill eval specs to Vally stimuli/graders (plus new trigger.eval.yaml files).
  • Simplifies the GitHub Actions workflow to Vally lint only, and adds an Azure DevOps pipeline to run evaluations.
  • Removes legacy waza artifacts (.waza.yaml, task-based eval YAMLs, and old trigger test YAMLs) and flattens compatibility frontmatter in SKILL files.

Reviewed changes

Copilot reviewed 89 out of 89 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| eng/pipelines/skill-eval.yml | Adds Azure DevOps pipeline to run Vally evals and publish results. |
| .github/workflows/skill-eval.yml | Switches CI to Vally lint-only workflow. |
| .github/skills/skill-authoring/tasks/authoring-basic-001.yaml | Removes legacy waza task definition. |
| .github/skills/skill-authoring/evals/trigger_tests.yaml | Removes legacy trigger test list. |
| .github/skills/skill-authoring/evals/trigger.eval.yaml | Adds Vally trigger/anti-trigger eval spec. |
| .github/skills/skill-authoring/evals/tasks/basic-trigger.yaml | Removes legacy waza task. |
| .github/skills/skill-authoring/evals/tasks/anti-trigger.yaml | Removes legacy waza task. |
| .github/skills/skill-authoring/evals/eval.yaml | Converts main eval spec to Vally stimuli/graders. |
| .github/skills/skill-authoring/eval.yaml | Removes legacy root eval config. |
| .github/skills/skill-authoring/SKILL.md | Flattens compatibility frontmatter and adjusts wording. |
| .github/skills/sensei/tasks/sensei-basic-001.yaml | Removes legacy waza task definition. |
| .github/skills/sensei/evals/trigger_tests.yaml | Removes legacy trigger test list. |
| .github/skills/sensei/evals/trigger.eval.yaml | Adds Vally trigger/anti-trigger eval spec. |
| .github/skills/sensei/evals/tasks/basic-trigger.yaml | Removes legacy waza task. |
| .github/skills/sensei/evals/tasks/anti-trigger.yaml | Removes legacy waza task. |
| .github/skills/sensei/evals/eval.yaml | Converts main eval spec to Vally stimuli/graders. |
| .github/skills/sensei/eval.yaml | Removes legacy root eval config. |
| .github/skills/sensei/SKILL.md | Removes/updates related skills section (per ref lint needs). |
| .github/skills/markdown-token-optimizer/tasks/optimize-basic-001.yaml | Removes legacy waza task definition. |
| .github/skills/markdown-token-optimizer/evals/trigger_tests.yaml | Removes legacy trigger test list. |
| .github/skills/markdown-token-optimizer/evals/trigger.eval.yaml | Adds Vally trigger/anti-trigger eval spec. |
| .github/skills/markdown-token-optimizer/evals/tasks/basic-trigger.yaml | Removes legacy waza task. |
| .github/skills/markdown-token-optimizer/evals/tasks/anti-trigger.yaml | Removes legacy waza task. |
| .github/skills/markdown-token-optimizer/evals/eval.yaml | Converts main eval spec to Vally stimuli/graders. |
| .github/skills/markdown-token-optimizer/eval.yaml | Removes legacy root eval config. |
| .github/skills/markdown-token-optimizer/SKILL.md | Flattens compatibility frontmatter. |
| .github/skills/azure-typespec-author/SKILL.md | Flattens compatibility frontmatter. |
| .github/skills/azsdk-common-sdk-release/tasks/release-trigger-001.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-sdk-release/tasks/release-readiness-001.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-sdk-release/tasks/release-negative-001.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-sdk-release/tasks/release-basic-001.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-sdk-release/evals/trigger_tests.yaml | Removes legacy trigger test list. |
| .github/skills/azsdk-common-sdk-release/evals/trigger.eval.yaml | Adds Vally trigger/anti-trigger eval spec. |
| .github/skills/azsdk-common-sdk-release/evals/tasks/basic-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-sdk-release/evals/tasks/anti-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-sdk-release/evals/eval.yaml | Converts main eval spec to Vally stimuli/graders. |
| .github/skills/azsdk-common-sdk-release/eval.yaml | Removes legacy root eval config. |
| .github/skills/azsdk-common-sdk-release/SKILL.md | Flattens compatibility frontmatter. |
| .github/skills/azsdk-common-prepare-release-plan/tasks/should-not-trigger.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-prepare-release-plan/tasks/link-sdk-prs.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-prepare-release-plan/tasks/edge-case.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-prepare-release-plan/tasks/basic-usage.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-prepare-release-plan/evals/trigger_tests.yaml | Removes legacy trigger test list. |
| .github/skills/azsdk-common-prepare-release-plan/evals/trigger.eval.yaml | Adds Vally trigger/anti-trigger eval spec. |
| .github/skills/azsdk-common-prepare-release-plan/evals/tasks/basic-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-prepare-release-plan/evals/tasks/anti-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-prepare-release-plan/evals/eval.yaml | Converts main eval spec to Vally stimuli/graders. |
| .github/skills/azsdk-common-prepare-release-plan/eval.yaml | Removes legacy root eval config. |
| .github/skills/azsdk-common-prepare-release-plan/SKILL.md | Flattens compatibility frontmatter. |
| .github/skills/azsdk-common-pipeline-troubleshooting/tasks/should-not-trigger.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-pipeline-troubleshooting/tasks/local-reproduction.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-pipeline-troubleshooting/tasks/edge-case.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-pipeline-troubleshooting/tasks/basic-usage.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-pipeline-troubleshooting/evals/trigger_tests.yaml | Removes legacy trigger test list. |
| .github/skills/azsdk-common-pipeline-troubleshooting/evals/trigger.eval.yaml | Adds Vally trigger/anti-trigger eval spec. |
| .github/skills/azsdk-common-pipeline-troubleshooting/evals/tasks/basic-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-pipeline-troubleshooting/evals/tasks/anti-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-pipeline-troubleshooting/evals/eval.yaml | Converts main eval spec to Vally stimuli/graders. |
| .github/skills/azsdk-common-pipeline-troubleshooting/eval.yaml | Removes legacy root eval config. |
| .github/skills/azsdk-common-pipeline-troubleshooting/SKILL.md | Flattens compatibility frontmatter. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/update-version.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/update-metadata.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/update-changelog.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/rename-client.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/hide-operation.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/full-workflow.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/edge-case.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/customization-workflow.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/breaking-changes.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/basic-usage.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/anti-trigger.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/tasks/analyzer-errors.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-generate-sdk-locally/evals/eval.yaml | Adds consolidated Vally eval spec for generate-sdk-locally skill. |
| .github/skills/azsdk-common-generate-sdk-locally/eval.yaml | Removes legacy root eval config. |
| .github/skills/azsdk-common-generate-sdk-locally/SKILL.md | Flattens compatibility frontmatter. |
| .github/skills/azsdk-common-apiview-feedback-resolution/tasks/should-not-trigger.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-apiview-feedback-resolution/tasks/no-feedback.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-apiview-feedback-resolution/tasks/edge-case.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-apiview-feedback-resolution/tasks/basic-usage.yaml | Removes legacy waza task definition. |
| .github/skills/azsdk-common-apiview-feedback-resolution/evals/trigger_tests.yaml | Removes legacy trigger test list. |
| .github/skills/azsdk-common-apiview-feedback-resolution/evals/trigger.eval.yaml | Adds Vally trigger/anti-trigger eval spec. |
| .github/skills/azsdk-common-apiview-feedback-resolution/evals/tasks/url-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-apiview-feedback-resolution/evals/tasks/basic-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-apiview-feedback-resolution/evals/tasks/anti-trigger.yaml | Removes legacy waza task. |
| .github/skills/azsdk-common-apiview-feedback-resolution/evals/eval.yaml | Converts main eval spec to Vally stimuli/graders. |
| .github/skills/azsdk-common-apiview-feedback-resolution/eval.yaml | Removes legacy root eval config. |
| .github/skills/azsdk-common-apiview-feedback-resolution/SKILL.md | Flattens compatibility frontmatter. |
| .github/skills/.waza.yaml | Removes deprecated waza project config. |
| .github/skills/.vally.yaml | Adds Vally project config (paths, environments, suites). |

Comment thread .github/skills/azsdk-common-apiview-feedback-resolution/evals/eval.yaml Outdated
Comment thread .github/skills/.vally.yaml Outdated
Comment thread .github/workflows/skill-eval.yml
Comment thread .github/workflows/skill-eval.yml Outdated
Comment thread .github/skills/azsdk-common-sdk-release/evals/eval.yaml
Comment thread .github/skills/azsdk-common-prepare-release-plan/evals/eval.yaml

@haolingdong-msft haolingdong-msft left a comment

Member

Thanks @jeo02 for the PR. Overall it looks good! I added some comments on implementation details.

Comment thread .github/skills/.vally.yaml Outdated
Comment thread .github/skills/sensei/SKILL.md
Comment thread .github/skills/.vally.yaml Outdated
Comment thread eng/pipelines/skill-eval.yml Outdated
jeo02 added a commit to Azure/azure-sdk-for-js that referenced this pull request May 19, 2026
Sync .github/skills directory with azure-sdk-tools for PR
Azure/azure-sdk-tools#15376 See [eng/common
workflow](https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/README.md#workflow)

---------

Co-authored-by: Juan Ospina <70209456+jeo02@users.noreply.github.com>
@jeo02 jeo02 merged commit 0ace94a into Azure:main May 19, 2026
5 of 8 checks passed
@jeo02 jeo02 mentioned this pull request May 20, 2026
4 participants