Add A/B experiment for issue-arborist prompt_style (#30015) #30016
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/8d84e107-4daa-480c-b595-fde7c844eddf Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
{{#if __GH_AW_EXPERIMENTS__PROMPT_STYLE___detailed}}
@copilot this condition is pre-expanded and should be experiments... instead. Update the compiler to raise an error on this kind of syntactic error.
Fixed in 85d4daa — changed the condition to {{#if experiments.prompt_style == "detailed"}} (the experiments.<name> dot-notation form).
Also added validateNoPreExpandedExperimentPlaceholders() in template_validation.go which scans all {{#if ...}} condition expressions and raises an error if any __GH_AW_EXPERIMENTS_*__ placeholder is found, with a hint to use experiments.<name> instead. Wired into the compiler in compiler_orchestrator_frontmatter.go.
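The scan described above can be sketched roughly as follows. This is a minimal illustration, not the code from 85d4daa: `checkNoPreExpandedPlaceholders` is a hypothetical stand-in for `validateNoPreExpandedExperimentPlaceholders()`, the error wording is assumed apart from the `use experiments.<name> instead` hint, and the two regular expressions are copied from the snippets quoted in this review.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied from the snippets quoted in this review.
var (
	templateIfConditionPattern   = regexp.MustCompile(`\{\{#if\s+(.*?)\s*\}\}`)
	preExpandedExperimentPattern = regexp.MustCompile(`__GH_AW_EXPERIMENTS_[A-Z0-9_]+__`)
)

// checkNoPreExpandedPlaceholders is a hypothetical stand-in for the real
// validator. It walks every {{#if ...}} condition in the markdown and returns
// an error for the first condition containing an internal placeholder.
func checkNoPreExpandedPlaceholders(markdown string) error {
	for _, m := range templateIfConditionPattern.FindAllStringSubmatch(markdown, -1) {
		condition := m[1] // text between "#if" and the closing "}}"
		if found := preExpandedExperimentPattern.FindString(condition); found != "" {
			// Error wording is assumed; only the trailing hint comes from the PR's test.
			return fmt.Errorf("condition %q uses internal placeholder %q: use experiments.<name> instead", condition, found)
		}
	}
	return nil
}

func main() {
	bad := "{{#if __GH_AW_EXPERIMENTS__PROMPT_STYLE___detailed}}\ndetailed prompt\n{{/if}}"
	good := "{{#if experiments.prompt_style == \"detailed\"}}\ndetailed prompt\n{{/if}}"

	fmt.Println("bad:", checkNoPreExpandedPlaceholders(bad))
	fmt.Println("good:", checkNoPreExpandedPlaceholders(good))
}
```

Only the condition expression is checked, so a placeholder-like string appearing in ordinary prose outside a `{{#if ...}}` tag would pass in this sketch.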
…tion for pre-expanded placeholders Agent-Logs-Url: https://github.com/github/gh-aw/sessions/86f11b7c-590c-4f98-b723-e57b6871ff4f Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
🧪 Test Quality Sentinel Report
Test Quality Score: 100/100
✅ Excellent test quality
Test Classification Details
Analysis
Design invariant enforced: The compiler must reject workflow markdown files that contain pre-expanded internal experiment placeholder tokens (double-underscore form), directing authors to use the correct experiments.<name> form.
Build tag: ✅
No mock libraries used: real compiler instantiation.
Error assertions include the test case name for context.
Inflation ratio: 85 test lines / 43 production lines = 1.98:1, just under the 2:1 threshold ✅
Language Support
Tests analyzed:
Verdict
📖 Understanding Test Classifications
Design Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators. References: §25292452480
✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). The single new table-driven test covers 3 scenarios including 2 error cases, enforces a real behavioral contract, uses no mocks, and has the required build tag.
Pull request overview
Adds a 50/50 A/B experiment to the issue-arborist workflow to compare a concise vs detailed prompt, and introduces compiler-time validation to prevent authors from using internal pre-expanded experiment placeholders in {{#if ...}} conditions.
Changes:
- Adds `experiments.prompt_style` configuration to `issue-arborist` frontmatter and wraps the prompt body in a template conditional to select between `detailed` and `concise` variants.
- Adds `validateNoPreExpandedExperimentPlaceholders()` and wires it into frontmatter parsing validation.
- Regenerates the compiled lock workflow to include experiment selection/state steps and propagated `GH_AW_EXPERIMENTS_*` env vars.
Show a summary per file
| File | Description |
|---|---|
| pkg/workflow/template_validation.go | Adds a new validation pass intended to reject internal __GH_AW_EXPERIMENTS_*__ placeholders in {{#if ...}} conditions. |
| pkg/workflow/compiler_template_validation_test.go | Adds compiler tests asserting placeholder-based {{#if ...}} conditions are rejected. |
| pkg/workflow/compiler_orchestrator_frontmatter.go | Invokes the new experiment-placeholder validation during frontmatter/markdown parsing. |
| .github/workflows/issue-arborist.md | Declares the prompt_style experiment and adds concise-vs-detailed prompt branching. |
| .github/workflows/issue-arborist.lock.yml | Regenerated compiled workflow including experiment state restore/pick/push and env propagation. |
Copilot's findings
Comments suppressed due to low confidence (1)
pkg/workflow/template_validation.go:57
`preExpandedExperimentPattern` only matches the `__GH_AW_EXPERIMENTS_<NAME>__` form. It will not match other placeholder-like forms that start with `__GH_AW_EXPERIMENTS` (e.g. the `__GH_AW_EXPERIMENTS__PROMPT_STYLE___detailed` string used in the new test), so the validator won't reject them. Consider broadening this pattern to catch any `__GH_AW_EXPERIMENTS...` placeholder token(s) that could appear in a condition (or explicitly remove/adjust the test case if that format is not actually produced).
// preExpandedExperimentPattern matches the internal __GH_AW_EXPERIMENTS_*__ placeholder form
// that is produced by the runtime and must never be written manually in workflow markdown.
// Authors should use the experiments.<name> form (e.g. experiments.prompt_style == "detailed").
preExpandedExperimentPattern = regexp.MustCompile(`__GH_AW_EXPERIMENTS_[A-Z0-9_]+__`)
- Files reviewed: 5/5 changed files
- Comments generated: 5
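To see which strings the quoted `preExpandedExperimentPattern` accepts, a small standalone check can help. The sketch below copies the pattern verbatim; the helper name `containsPlaceholder` and the sample strings are illustrative assumptions. Because `MatchString` is unanchored, the double-underscore variant string from the test case contains a matching substring (`__GH_AW_EXPERIMENTS__PROMPT_STYLE___`), whereas a bare `__GH_AW_EXPERIMENTS` prefix without a closing `__` does not match.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern copied verbatim from the snippet quoted in this review.
var preExpandedExperimentPattern = regexp.MustCompile(`__GH_AW_EXPERIMENTS_[A-Z0-9_]+__`)

// containsPlaceholder (hypothetical helper) reports whether s contains a
// match anywhere in the string; the pattern has no ^ or $ anchors.
func containsPlaceholder(s string) bool {
	return preExpandedExperimentPattern.MatchString(s)
}

func main() {
	samples := []string{
		`__GH_AW_EXPERIMENTS_PROMPT_STYLE__`,           // canonical internal form
		`__GH_AW_EXPERIMENTS__PROMPT_STYLE___detailed`, // variant form used in the test
		`experiments.prompt_style == "detailed"`,       // author-facing dot notation
		`__GH_AW_EXPERIMENTS`,                          // bare prefix, no closing "__"
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, containsPlaceholder(s))
	}
}
```

Running this prints `true` for the first two samples and `false` for the last two, which is consistent with the test case expecting the double-underscore variant to be rejected.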
// templateIfConditionPattern matches the condition expression inside {{#if ...}} tags.
// Captures the condition string between #if and the closing }}.
templateIfConditionPattern = regexp.MustCompile(`\{\{#if\s+(.*?)\s*\}\}`)

// preExpandedExperimentPattern matches the internal __GH_AW_EXPERIMENTS_*__ placeholder form
// that is produced by the runtime and must never be written manually in workflow markdown.
// Authors should use the experiments.<name> form (e.g. experiments.prompt_style == "detailed").
preExpandedExperimentPattern = regexp.MustCompile(`__GH_AW_EXPERIMENTS_[A-Z0-9_]+__`)
// # Validation Functions
//
// - validateNoIncludesInTemplateRegions() - Validates that imports are not inside template blocks
// - validateNoPreExpandedExperimentPlaceholders() - Validates that pre-expanded __GH_AW_EXPERIMENTS_*__ placeholders are not used in template conditions
{
	name: "invalid workflow with double-underscore variant form",
	content: `---
on: issues
permissions:
  issues: read
strict: false
---

{{#if __GH_AW_EXPERIMENTS__PROMPT_STYLE___detailed}}
detailed prompt
{{/if}}`,
	shouldError: true,
	errContains: "use experiments.<name> instead",
},
// Validate that pre-expanded __GH_AW_EXPERIMENTS_*__ placeholders are not used in template conditions
if err := validateNoPreExpandedExperimentPlaceholders(result.Markdown); err != nil {
	orchestratorFrontmatterLog.Printf("Pre-expanded experiment placeholder validation failed: %v", err)
	return nil, fmt.Errorf("template condition validation failed: %w", err)
}
1. Use `jq` to identify clusters of 5+ related issues that share a theme but lack a parent.
2. Create a parent issue (title prefix `[Parent] `) for each cluster and link its members as sub-issues.
3. Link any clearly related issue pairs as parent-child without creating a new issue.
4. Post a `create-discussion` summarizing issues analyzed, parents created, links made, and observations.
@copilot Review all comments
… create_discussion, update tests Agent-Logs-Url: https://github.com/github/gh-aw/sessions/1e53a068-92aa-4e70-a871-d2d02a3318a3 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed all actionable items in 655f3c3:
Sets up a 50/50 A/B experiment on `issue-arborist` to determine whether the current verbose 5-step prompt (`detailed`) outperforms a concise 4-step variant by ≥15% on `links_created` per run.

Frontmatter changes
- `experiments.prompt_style` block with `concise`/`detailed` variants, Mann-Whitney analysis, 30-sample minimum, `empty_output_rate == 0` guardrail, and `issue: 30015`
- `guardrail_metrics.threshold` to use expression syntax `"==0"` (schema requires operator-prefixed string, not bare float)

Prompt changes
Wrapped the existing prompt body in `{{#if experiments.prompt_style == "detailed"}}` / `{{else}}` / `{{/if}}`:
- `detailed`: original prompt preserved verbatim (5 steps, jq examples, output template)
- `concise`: new 4-bullet variant directing the model to self-direct via `jq`, create parent issues, link pairs, and post a `create_discussion` summary

Compiler validation
Added `validateNoPreExpandedExperimentPlaceholders()` in `template_validation.go` which raises a compile-time error if the internal `__GH_AW_EXPERIMENTS_*__` placeholder form is used directly in a `{{#if ...}}` condition. Authors must use the `experiments.<name>` dot-notation form (e.g. `experiments.prompt_style == "detailed"`).

The validation uses the existing `TemplateIfPattern` from `expression_patterns.go`, which correctly handles conditions containing embedded `${{ ... }}` blocks without false negatives.

Lock file regenerated via `gh aw compile issue-arborist`.
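For readers unfamiliar with the analysis named in the frontmatter, the following is a rough sketch of how a Mann-Whitney U statistic, a relative lift, and a minimum-sample check could be computed over `links_created` values. It is not how gh-aw evaluates experiments: every function name here is hypothetical, the sample data is made up, and treating the 30-sample minimum as a per-variant requirement is an assumption.

```go
package main

import "fmt"

// mannWhitneyU returns the U statistic of sample a against sample b:
// the number of pairs (x, y) with x > y, where each tie counts as one half.
func mannWhitneyU(a, b []float64) float64 {
	u := 0.0
	for _, x := range a {
		for _, y := range b {
			switch {
			case x > y:
				u += 1
			case x == y:
				u += 0.5
			}
		}
	}
	return u
}

// mean returns the arithmetic mean of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// relativeLift returns (mean(a) - mean(b)) / mean(b).
// It assumes mean(b) is non-zero.
func relativeLift(a, b []float64) float64 {
	return (mean(a) - mean(b)) / mean(b)
}

// enoughSamples reports whether both variants have at least minN observations.
// Applying the minimum per variant is an assumption of this sketch.
func enoughSamples(a, b []float64, minN int) bool {
	return len(a) >= minN && len(b) >= minN
}

func main() {
	// Made-up links_created counts per run, for illustration only.
	detailed := []float64{4, 6, 5, 7}
	concise := []float64{3, 5, 4, 4}

	fmt.Println("U(detailed, concise):", mannWhitneyU(detailed, concise))
	fmt.Println("relative lift:", relativeLift(detailed, concise))
	fmt.Println("enough samples (min 30):", enoughSamples(detailed, concise, 30))
}
```

The pairwise loop is quadratic in the sample sizes, which is fine at a few dozen runs per variant; a rank-based computation would be the usual choice for larger samples. Turning U into a p-value is left out of the sketch.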