Skip to content

[ab-advisor] Experiment campaign for smoke-temporary-id: A/B test sub_agent_strategy #34010

@github-actions

Description

@github-actions

🧪 Experiment Campaign: smoke-temporary-id

Workflow file: .github/workflows/smoke-temporary-id.md
Selected dimension: sub_agent_strategy
Triggered by: ab-testing-advisor on 2026-05-22


Background

The smoke-temporary-id workflow validates temporary ID functionality for creating parent-child issue hierarchies and cross-references. Currently, it performs all work (creating 3 issues, adding 1 comment) in a single agent context. This experiment tests whether decomposing the work into sub-agents—one per issue creation—reduces token consumption and improves reliability without sacrificing correctness.

Hypothesis

H0: Decomposing issue creation into sub-agents does not reduce effective token count or improve success rate compared to single-agent execution.

H1: Using sub-agents for per-item work reduces effective token count by 15-25% and improves success rate by maintaining smaller, focused contexts that are less prone to hallucination or incomplete JSON emission.

Experiment Configuration

Add the following experiments: block to the workflow frontmatter:

experiments:
  sub_agent_strategy:
    variants: [single_agent, sub_agents]
    description: "Test whether decomposing issue creation into sub-agents reduces cost"
    hypothesis: "H0: no change in effective token count. H1: sub-agents reduce token count by 15-25% and improve success rate."
    metric: effective_token_count
    secondary_metrics: [run_duration_seconds, issue_creation_success_rate]
    guardrail_metrics:
      - name: all_issues_created
        direction: min
        threshold: "==3"
      - name: temporary_id_resolution_rate
        direction: min
        threshold: ">=0.95"
    min_samples: 20
    weight: [50, 50]
    start_date: "2026-05-23"
    issue: "#aw_campaign"
    analysis_type: t_test
    tags: [cost-efficiency, sub-agents, smoke-tests]

Variant descriptions:

  • single_agent: Current behavior—all work performed in the main agent prompt (3 issue creations + 1 comment in one context)
  • sub_agents: Decompose work—launch 3 background task agents (one per issue) from a coordinator prompt, then add comment in main context after sub-agents complete

Workflow Changes Required

The experiment uses handlebars conditional blocks to switch between single-agent and sub-agent execution paths. Always use the value-comparison form {{#if experiments.sub_agent_strategy == "variant"}} (never the internal env-var form).

Before (current single-agent approach, lines 50-110):

# Smoke Test: Temporary ID Functionality

This workflow validates that temporary IDs work correctly for:
1. Creating parent-child issue hierarchies
2. Cross-referencing issues in bodies
3. Different temporary ID formats (3-8 alphanumeric characters)

**IMPORTANT**: Use the exact temporary ID format `aw_` followed by 3-8 alphanumeric characters (A-Za-z0-9).

## Test 1: Create Parent Issue with Temporary ID

Create a parent tracking issue with a temporary ID. Use a 6-character alphanumeric ID.

```json
{
  "type": "create_issue",
  "temporary_id": "aw_test01",
  "title": "Test Parent: Temporary ID Validation",
  "body": "This is a parent issue created to test temporary ID functionality.\n\nSub-issues:\n- #aw_test02\n- #aw_test03\n\nAll references should be replaced with actual issue numbers."
}

Test 2: Create Sub-Issues with Cross-References

[... rest of single-agent instructions ...]


**After** (with experiment conditional):

```markdown
# Smoke Test: Temporary ID Functionality

This workflow validates that temporary IDs work correctly for:
1. Creating parent-child issue hierarchies
2. Cross-referencing issues in bodies
3. Different temporary ID formats (3-8 alphanumeric characters)

**IMPORTANT**: Use the exact temporary ID format `aw_` followed by 3-8 alphanumeric characters (A-Za-z0-9).

{{#if experiments.sub_agent_strategy == "single_agent"}}

## Single-Agent Mode

Create all issues in this context.

### Test 1: Create Parent Issue

```json
{
  "type": "create_issue",
  "temporary_id": "aw_test01",
  "title": "Test Parent: Temporary ID Validation",
  "body": "This is a parent issue created to test temporary ID functionality.\n\nSub-issues:\n- #aw_test02\n- #aw_test03\n\nAll references should be replaced with actual issue numbers."
}

Test 2: Create Sub-Issue 1

{
  "type": "create_issue",
  "temporary_id": "aw_test02",
  "parent": "aw_test01",
  "title": "Sub-Issue 1: Test Temporary ID References",
  "body": "This is sub-issue 1.\n\nParent: #aw_test01\nRelated: #aw_test03\n\nAll temporary IDs should be resolved to actual issue numbers."
}

Test 3: Create Sub-Issue 2

{
  "type": "create_issue",
  "temporary_id": "aw_test03",
  "parent": "aw_test01",
  "title": "Sub-Issue 2: Test Different ID Length",
  "body": "This is sub-issue 2 with an 8-character temporary ID.\n\nParent: #aw_test01\nRelated: #aw_test02\n\nTesting that longer temporary IDs (8 chars) work correctly."
}

{{/if}}
{\{#if experiments.sub_agent_strategy == "sub_agents"}}

Sub-Agent Mode

Launch 3 background task agents to create issues in parallel, then wait for completion.

You are the coordinator. Your job:

  1. Launch 3 background task agents (one per issue) with complete temporary ID instructions
  2. Wait for all 3 agents to complete (you will receive automatic completion notifications)
  3. After all complete, add a summary comment to the parent issue using temporary ID aw_test01

Agent 1: Create Parent Issue

Launch a background task agent:

# Use the task tool with agent_type: task, mode: background
# Prompt: "Create a parent issue with temporary ID aw_test01. Title: 'Test Parent: Temporary ID Validation'. Body: 'This is a parent issue created to test temporary ID functionality.\n\nSub-issues:\n- #aw_test02\n- #aw_test03\n\nAll references should be replaced with actual issue numbers.' Use safeoutputs create_issue with temporary_id: aw_test01."

Agent 2: Create Sub-Issue 1

Launch a second background task agent:

# Prompt: "Create a sub-issue with temporary ID aw_test02 and parent aw_test01. Title: 'Sub-Issue 1: Test Temporary ID References'. Body: 'This is sub-issue 1.\n\nParent: #aw_test01\nRelated: #aw_test03\n\nAll temporary IDs should be resolved to actual issue numbers.' Use safeoutputs create_issue with temporary_id: aw_test02 and parent: aw_test01."

Agent 3: Create Sub-Issue 2

Launch a third background task agent:

# Prompt: "Create a sub-issue with temporary ID aw_test03 and parent aw_test01. Title: 'Sub-Issue 2: Test Different ID Length'. Body: 'This is sub-issue 2 with an 8-character temporary ID.\n\nParent: #aw_test01\nRelated: #aw_test02\n\nTesting that longer temporary IDs (8 chars) work correctly.' Use safeoutputs create_issue with temporary_id: aw_test03 and parent: aw_test01."

After All Agents Complete

Once you receive completion notifications for all 3 agents, verify their success and add a summary comment.

{{/if}}

Final Step: Add Summary Comment

Regardless of strategy, add a comment to the parent issue summarizing test results:

{
  "type": "add_comment",
  "issue_number": "aw_test01",
  "body": "## Test Results\n\n✅ Parent issue created with temporary ID `aw_test01`\n✅ Sub-issue 1 created with temporary ID `aw_test02` and linked to parent\n✅ Sub-issue 2 created with temporary ID `aw_test03` and linked to parent\n✅ Cross-references resolved correctly\n\n**Validation**: Check that:\n1. All temporary ID references (#aw_*) in issue bodies are replaced with actual issue numbers (#123)\n2. Sub-issues show parent relationship in GitHub UI\n3. Parent issue shows sub-issues in task list\n\nTemporary ID format validated: `aw_[A-Za-z0-9]{3,8}`"
}

Expected Outcome

  1. Parent issue #aw_test01 created and assigned actual issue number (e.g., Add source field to track workflow origin in frontmatter #1234)
  2. Sub-issues #aw_test02 and #aw_test03 created with actual issue numbers
  3. All references like #aw_test01 replaced with actual numbers like #1234
  4. Sub-issues properly linked to parent with parent field
  5. Comment added to parent verifying the test results

Success Criteria: All 3 issues created, all temporary ID references resolved, parent-child relationships established.


### Success Metrics

| Metric | Type | Target |
|--------|------|--------|
| effective_token_count | Primary | 15-25% reduction with sub_agents |
| run_duration_seconds | Secondary | No significant increase |
| issue_creation_success_rate | Secondary | Maintain or improve |
| all_issues_created | Guardrail | Must equal 3 for both variants |
| temporary_id_resolution_rate | Guardrail | Must be >=0.95 (all temp IDs resolved) |

### Statistical Design

- **Variants**: `single_agent` (control), `sub_agents` (treatment)
- **Assignment**: Round-robin via `gh-aw` experiments runtime (cache-based)
- **Minimum runs per variant**: 20 (workflow runs ~1-3 times per day on PR labels or manual dispatch)
- **Expected experiment duration**: ~10-15 days to reach 40 total runs (20 per variant)
- **Analysis approach**: Two-sample t-test for token count reduction; proportion test for success rate; Mann-Whitney U for duration if non-normal

### Implementation Steps

- [ ] Add `experiments:` section to frontmatter with `sub_agent_strategy` experiment
- [ ] Wrap workflow body with `{{#if experiments.sub_agent_strategy == "single_agent"}}` and `{{#if experiments.sub_agent_strategy == "sub_agents"}}` conditional blocks
- [ ] Implement sub-agent coordination logic in the `sub_agents` variant branch
- [ ] Run `gh aw compile smoke-temporary-id` to regenerate lock file
- [ ] Monitor experiment assignments in workflow run step summaries
- [ ] After 20+ runs per variant, analyze results using `daily-experiment-report` workflow
- [ ] Document findings and promote winning variant by removing experiment block and keeping winning logic

### References

- [A/B Testing in gh-aw](https://github.com/github/gh-aw/blob/main/.github/aw/github-agentic-workflows.md)
- Workflow file: `.github/workflows/smoke-temporary-id.md`
- Experiment state artifact: `/tmp/gh-aw/experiments/state.json` (uploaded per run)
- Analysis workflow: `.github/workflows/daily-experiment-report.md`


<!-- gh-aw-tracker-id: ab-testing-advisor -->




> Generated by [🧪 Daily A/B Testing Advisor](https://github.com/github/gh-aw/actions/runs/26290849200) · ● 1.2M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fab-testing-advisor%22&type=issues)
> - [x] expires <!-- gh-aw-expires: 2026-06-05T13:41:25.432Z --> on Jun 5, 2026, 1:41 PM UTC

<!-- gh-aw-agentic-workflow: Daily A/B Testing Advisor, gh-aw-tracker-id: ab-testing-advisor, engine: copilot, version: 1.0.48, model: claude-sonnet-4.5, id: 26290849200, workflow_id: ab-testing-advisor, run: https://github.com/github/gh-aw/actions/runs/26290849200 -->

<!-- gh-aw-workflow-id: ab-testing-advisor -->
<!-- gh-aw-workflow-call-id: github/gh-aw/ab-testing-advisor -->
<!-- gh-aw-close-key: ab-testing-advisor -->

Metadata

Metadata

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions