
Adding status badges #2

Merged
scbedd merged 2 commits into Azure:master from scbedd:status-badges on Feb 15, 2019

Conversation

scbedd commented Feb 15, 2019

Member

No description provided.

scbedd commented Feb 15, 2019

Member Author

Why are checks not appearing here?

scbedd merged commit e0a515b into Azure:master on Feb 15, 2019
arpanlaha added a commit to arpanlaha/azure-sdk-tools that referenced this pull request Jun 14, 2019
Initial processor testing
ghost pushed a commit that referenced this pull request Nov 15, 2021
…#2259)

This is attempt #2. Failure to patch is blocking the other PR. If I can just simplify the commits and have it work, that's definitely the easiest way.
wanlwanl added a commit to albertxavier100/azure-sdk-tools that referenced this pull request Jun 6, 2025
…ccuracy-test-1

Wanl/resolve issue found in accuracy test 1
chunyu3 pushed a commit to chunyu3/azure-sdk-tools that referenced this pull request Sep 23, 2025
remove sources and TopK parameter
helen229 added a commit that referenced this pull request Jun 9, 2026
…cache portability

- README/eval comments: evals/unit -> evals/tools, evals/scenarios -> evals/workflow-scenarios (Copilot C1/C5)

- Validate-EvalTools.ps1: default EvalPath -> evals/tools; return 1 -> exit 1 so CI fails loudly (Copilot C2/C3)

- MCP build output: dotnet build -o artifacts/mcp/{cli,mock}; pipeline switched to Release; .vally.yaml no longer hardcodes Debug/net8.0 (Praveen #1/#2)

- ensure-specs-clone.ps1 + workflow evals: repo-relative artifacts/specs-cache path instead of C:/Users/gaoh; Vally resolves it relative to the eval file so it works for all contributors + CI (Copilot C6/C7, Praveen #4)

- add-arm-resource/rename-client-property: comment clarifying 'edit' is the Copilot SDK built-in file tool, not an MCP tool (Praveen #5)
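The cache-portability item above says the specs cache is now addressed by a repo-relative path that is resolved relative to the eval file. As a rough illustration of that resolution rule only (this is not Vally's implementation; the function name and the example paths are hypothetical), the idea can be sketched in Python:

```python
from pathlib import Path


def resolve_cache_path(eval_file: str, relative_cache: str) -> Path:
    """Resolve a cache path against the directory that contains the eval file.

    Hypothetical helper: it only illustrates "relative to the eval file",
    as opposed to relative to the current working directory or a
    per-user absolute path.
    """
    base = Path(eval_file).resolve().parent
    return (base / relative_cache).resolve()


if __name__ == "__main__":
    # Assumed layout: the eval file sits two levels below the repo root.
    print(resolve_cache_path(
        "/repo/evals/workflow-scenarios/example.eval.yaml",
        "../../artifacts/specs-cache",
    ))
```

Because the base is the eval file's own directory, the same relative string resolves correctly on any contributor's machine and on a CI agent, whatever the working directory is.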
helen229 added a commit that referenced this pull request Jun 16, 2026
… Vally (#15124) (#15811)

* Scaffold Azure.Sdk.Tools.Vally tool-scenario eval suite (#15124)

Adds a new Vally eval suite under tools/azsdk-cli/Azure.Sdk.Tools.Vally/ for MCP tool / scenario evaluations, replacing the deleted Azure.Sdk.Tools.Cli.Benchmarks project (#15697).

- README documents project intent, layout, local run instructions, and how to add a new scenario.

- .vally.yaml wires the azsdk-mcp environment (stdio dotnet run against Azure.Sdk.Tools.Cli) and defines 'typespec' and 'all' suites.

- evals/check-public-repo.eval.yaml is the first ported scenario (from the deleted CheckPublicRepoScenario): verifies the agent invokes azsdk_typespec_check_project_in_public_repo for a public-repo check prompt. Lints clean via 'vally lint --eval-spec'.

- fixtures/.gitkeep reserves the per-scenario fixtures layout.

Remaining scenarios from the deleted benchmark are tracked as a checklist in the project README and in #15124.

* Port remaining 9 benchmark scenarios to Vally (#15124)

Adds eval YAMLs for every scenario that was deleted from Azure.Sdk.Tools.Cli.Benchmarks in #15697:

- check-public-repo-then-validate

- validate-typespec

- typespec-generation-step02

- get-modified-typespec-projects (stub — needs git-repo fixture / setup hook)

- add-arm-resource (stub — needs fixtures + npx tsp compile post-check)

- create-release-plan

- link-namespace-approval-issue

- get-pr-link-current-branch

- check-sdk-generation-status

Each eval uses the built-in tool-calls grader for presence checks; the original benchmark's argument/order/forbidden/optional assertions are captured in prompt text + inline TODOs (require custom graders or upstream Vally support, documented in README). Also adds release-plan/github/pipeline suites to .vally.yaml. All 10 evals pass 'vally lint --eval-spec'.

* Add rename-client-property stub eval to Vally suite (#15124)

Ports the deleted RenameClientPropertyScenario as a tool-calls-only stub. Full expected-diff grading + sparse-clone setup hook are tracked as follow-ups in the README.

* Fix tool name prefix in graders, timeout format, expand README

* Reorganize evals into scenarios/ and triggers/; port trigger evals from #15183

- Move 11 multi-step scenario evals to evals/scenarios/
- Port 9 per-tool trigger evals from jeo02/migrate-evaluations-to-vally (PR #15183) to evals/triggers/, stripped azure-sdk-mcp- prefix from graders to match bare MCP tool names
- Port Validate-EvalTools.ps1 to scripts/, retargeted at evals/triggers/ with bare-name regex
- Update .vally.yaml suites for new layout (scenarios, triggers, all)
- Update README to document the split and per-trigger-file tool coverage
- Add .gitignore for vally-results/ and results/
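The prefix stripping described in the list above can be sketched as follows. This is a minimal Python illustration, not the code used in the PR; the prefixed example name is an assumption built from the `azure-sdk-mcp-` prefix and a tool name that appears elsewhere in this log.

```python
PREFIX = "azure-sdk-mcp-"  # prefix named in the commit message


def bare_tool_name(name: str) -> str:
    """Return the bare MCP tool name, dropping the prefix if it is present.

    str.removeprefix (Python 3.9+) leaves names without the prefix untouched.
    """
    return name.removeprefix(PREFIX)


if __name__ == "__main__":
    # Hypothetical prefixed grader entry.
    print(bare_tool_name("azure-sdk-mcp-azsdk_typespec_check_project_in_public_repo"))
```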

* update the config and use gpt-5.4 model

* add disallowed

* Vally: restructure evals into unit/integration/e2e test pyramid

Replace per-area folders (scenarios/, triggers/) with tier-based folders. Feature area moves to a YAML tag, enabling tag-filtered suites. Add composite suites (pr-gate, nightly) and area-filtered suites in .vally.yaml. Update Validate-EvalTools.ps1 to scan evals/unit for triggers-*.eval.yaml. Refresh README and Run-LiveEvals.ps1 paths.

* Vally: remove Run-LiveEvals.ps1 (local-only test wrapper)

Drop the local-only convenience wrapper and refer directly to evals/setup/ensure-specs-clone.ps1 in docs and YAML comments. Users prime the spec clone manually and invoke 'vally eval --suite e2e'.

* some docs and test e2e one

* update docs

* update design

* update with skill evals

* reorg based on the design

* remove the duplicates

* add new scenarios

* update the doc

* update doc

* update names

* Vally: align release-planner mock stimuli with live e2e pattern

All 5 release-planner mock stimuli now use environment.git worktree pointing at the per-user azure-rest-api-specs cache (matching the live e2e fixture), plus a structured e2e-style prompt that supplies the Contoso fixture IDs the mock handlers expect (TypeSpec project, service/product tree IDs, work-item ID 29262). Also document the --skill-dir requirement and worker-cap caveat in README, and fix one stale path in .vally.yaml comment.

* update doc

* Vally: fix MCP boot race + drop misconfigured grader (#15948)

- Launch pre-built DLLs via 'dotnet <dll>' in both .vally.yaml files instead of 'dotnet run', so N parallel workers no longer race on Roslyn's exclusive write lock for the output DLL.

- Add 'Build MCP servers' step to eng/pipelines/skill-eval.yml so the CI runner has the DLLs ready before vally starts.

- Drop the skill-invocation grader from generate-sdk-for-existing-release-plan (no preflight reasoning step required; tools-only).

- Strip 'I'm in a checkout of azure-rest-api-specs.' preamble from prompts; the worktree already provides that context.

- Remove stray '// tools skills response' artifact in live release-planner.eval.yaml.

- README: document 'dotnet build' as a prereq; rewrite workers warning.

Validated: scenarios-mock at --workers 6 -> 5/5 stimuli pass, 0 race hits, ~4 min.
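The boot-race fix above amounts to "build once, then have every worker launch the already-built DLL". A minimal Python sketch of the two command shapes follows, assuming hypothetical project and output paths; it only constructs the argument lists and does not reproduce the actual .vally.yaml or pipeline contents.

```python
from typing import List


def build_command(project: str, out_dir: str, configuration: str = "Release") -> List[str]:
    """One-time build step: `dotnet build <project> -c <configuration> -o <out_dir>`."""
    return ["dotnet", "build", project, "-c", configuration, "-o", out_dir]


def launch_command(dll_path: str) -> List[str]:
    """Per-worker launch: `dotnet <dll>` runs the pre-built assembly without rebuilding."""
    return ["dotnet", dll_path]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(build_command("tools/azsdk-cli/Azure.Sdk.Tools.Cli", "artifacts/mcp/cli"))
    for worker in range(3):
        print(worker, launch_command("artifacts/mcp/cli/Azure.Sdk.Tools.Cli.dll"))
```

With `dotnet run`, each worker may trigger its own build of the same output DLL, which is where the commit message locates the race. Launching the DLL directly keeps the write to a single build step.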

* update readme for running steps

* Vally: align mock release-planner grader with live + deterministic 'not found' lookup

The create-release-plan-and-generate-sdk mock stimulus required the agent to call
azsdk_update_sdk_details_in_release_plan, but neither the prompt nor the
azsdk-common-prepare-release-plan skill's create flow asks for it. The agent
correctly skipped the tool, and the grader flapped. The dedicated
update-sdk-details-in-release-plan stimulus already covers that tool with an
explicit prompt. Drop it from the create+generate grader so mock matches the
live release-planner-e2e contract (create / get / generate / link).

Also patch GetReleasePlanForSpecPrHandler to return a deterministic
'not found' response (ReleasePlanDetails = null). The mock previously
returned a 'plan exists' result for any spec PR, pushing the agent down
the update path instead of the create path that the stimulus exercises.
Stimuli that target an existing plan pass the work-item ID directly and
call azsdk_get_release_plan, so this is safe.

* update eval yaml

* Address PR #15811 review: fix stale paths, exit codes, build output, cache portability

- README/eval comments: evals/unit -> evals/tools, evals/scenarios -> evals/workflow-scenarios (Copilot C1/C5)

- Validate-EvalTools.ps1: default EvalPath -> evals/tools; return 1 -> exit 1 so CI fails loudly (Copilot C2/C3)

- MCP build output: dotnet build -o artifacts/mcp/{cli,mock}; pipeline switched to Release; .vally.yaml no longer hardcodes Debug/net8.0 (Praveen #1/#2)

- ensure-specs-clone.ps1 + workflow evals: repo-relative artifacts/specs-cache path instead of C:/Users/gaoh; Vally resolves it relative to the eval file so it works for all contributors + CI (Copilot C6/C7, Praveen #4)

- add-arm-resource/rename-client-property: comment clarifying 'edit' is the Copilot SDK built-in file tool, not an MCP tool (Praveen #5)
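The `return 1 -> exit 1` item above concerns the difference between returning a value and setting the process exit code, which is what CI inspects. The script in the PR is PowerShell; the following is only a Python analogy of the same pitfall, with illustrative script bodies that are not taken from Validate-EvalTools.ps1.

```python
import subprocess
import sys

# A script whose main() returns 1 but never turns that into an exit code.
RETURNS_ONLY = "def main():\n    return 1\n\nmain()\n"

# The same script, with the return value passed to sys.exit().
EXITS = "import sys\n\ndef main():\n    return 1\n\nsys.exit(main())\n"


def exit_code(source: str) -> int:
    """Run the source in a child interpreter and report its process exit code."""
    return subprocess.run([sys.executable, "-c", source]).returncode


if __name__ == "__main__":
    print("return only:", exit_code(RETURNS_ONLY))
    print("explicit exit:", exit_code(EXITS))
```

In the first case the failure value is discarded and the process still exits with 0, so a CI step would be reported as successful. Only the explicit exit propagates the failure.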

* Refactor Vally tool evals: rename triggers-* to prompt-to-tool-*, consolidate standalone single-tool evals

- Rename evals/tools/triggers-*.eval.yaml to prompt-to-tool-*.eval.yaml (Praveen review #6)

- Consolidate 7 standalone single-tool scenario evals into the matching namespace files as full-context checks (check-public-repo, check-sdk-generation-status, create-release-plan, get-modified-typespec-projects, get-pr-link-current-branch, link-namespace-approval-issue, validate-typespec)

- Keep add-arm-resource.eval.yaml standalone (produces a file edit, not a pure tool trigger)

- Switch tool evals to gpt-5.4 and add explicit 'use the available Azure SDK MCP tools' steering plus concrete grounding to bare trigger prompts so they invoke the MCP tool reliably

- Update README evals/tools section and Validate-EvalTools.ps1 to the new file names

* Remove agent-eval-strategy design spec from PR (now reviewed standalone in #15918)

* Drop flaky edit-tool assertion from add-arm-resource eval

* remove script

* Stabilize flaky tool-scenario prompts and add README command cookbook

Ground 13 previously-flaky prompts with concrete IDs/paths so they route deterministically to the intended MCP tool; make the mock check-service-label handler convention-driven (status derived from the requested serviceLabel); document common vally invocation recipes in the README.

* Fix outdated command examples in Vally README

Replace references to consolidated/non-existent eval files (create-release-plan, check-public-repo, link-namespace-approval-issue) with the real prompt-to-tool-* and workflow-scenario files; correct the default output path to ./vally-results/<timestamp>/; fix the cookbook results.jsonl parser to locate the newest timestamped run; add the missing release-planner-workflows mock scenario to the index.
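Locating the newest timestamped run under ./vally-results/ and reading its results.jsonl could look roughly like the sketch below. It is not the README's cookbook parser. It assumes each run is a subdirectory containing a results.jsonl file and picks the newest run by modification time, since the exact timestamp format of the directory names is not stated here.

```python
import json
from pathlib import Path
from typing import List, Optional


def newest_run_dir(results_root: str) -> Optional[Path]:
    """Return the most recently modified run directory, or None if there is none."""
    root = Path(results_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def load_results(run_dir: Path) -> List[dict]:
    """Parse results.jsonl: one JSON object per non-empty line."""
    with (run_dir / "results.jsonl").open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    run = newest_run_dir("./vally-results")
    if run is None:
        print("no runs found")
    else:
        print(run, len(load_results(run)), "records")
```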

* Fix invalid prompt-grader config in live release-planner eval

The prompt (LLM-judge) grader schema uses 'prompt' for the rubric text, not 'rubric'. Rename the field and add 'scoring: binary' (the rubric is pass/fail) so the spec validates.
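The rename described above can be illustrated with a small Python helper that operates on an already-parsed grader mapping. Only the `rubric` -> `prompt` rename and `scoring: binary` come from the commit message. The function name and the rest of the example mapping are hypothetical, not the Vally schema.

```python
def fix_prompt_grader(grader: dict) -> dict:
    """Return a copy with 'rubric' renamed to 'prompt' and binary scoring set if absent."""
    fixed = dict(grader)  # shallow copy so the input is left unchanged
    if "rubric" in fixed:
        fixed["prompt"] = fixed.pop("rubric")
    fixed.setdefault("scoring", "binary")
    return fixed


if __name__ == "__main__":
    # Hypothetical grader entry with the old field name.
    print(fix_prompt_grader({"rubric": "Pass if the release plan was created."}))
```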


1 participant