[ab-advisor] Experiment campaign for smoke-temporary-id: A/B test sub_agent_strategy

### 🧪 Experiment Campaign: smoke-temporary-id

**Workflow file**: `.github/workflows/smoke-temporary-id.md`
**Selected dimension**: `sub_agent_strategy`
**Triggered by**: `ab-testing-advisor` on 2026-05-22

---

### Background

The `smoke-temporary-id` workflow validates temporary ID functionality for creating parent-child issue hierarchies and cross-references. Currently, it performs all work (creating 3 issues, adding 1 comment) in a single agent context. This experiment tests whether decomposing the work into sub-agents—one per issue creation—reduces token consumption and improves reliability without sacrificing correctness.

### Hypothesis

**H0**: Decomposing issue creation into sub-agents does not reduce effective token count or improve success rate compared to single-agent execution.

**H1**: Using sub-agents for per-item work reduces effective token count by 15-25% and improves success rate by maintaining smaller, focused contexts that are less prone to hallucination or incomplete JSON emission.

### Experiment Configuration

Add the following `experiments:` block to the workflow frontmatter:

```yaml
experiments:
  sub_agent_strategy:
    variants: [single_agent, sub_agents]
    description: "Test whether decomposing issue creation into sub-agents reduces cost"
    hypothesis: "H0: no change in effective token count. H1: sub-agents reduce token count by 15-25% and improve success rate."
    metric: effective_token_count
    secondary_metrics: [run_duration_seconds, issue_creation_success_rate]
    guardrail_metrics:
      - name: all_issues_created
        direction: min
        threshold: "==3"
      - name: temporary_id_resolution_rate
        direction: min
        threshold: ">=0.95"
    min_samples: 20
    weight: [50, 50]
    start_date: "2026-05-23"
    issue: "#aw_campaign"
    analysis_type: t_test
    tags: [cost-efficiency, sub-agents, smoke-tests]
```

**Variant descriptions**:
- `single_agent`: Current behavior—all work performed in the main agent prompt (3 issue creations + 1 comment in one context)
- `sub_agents`: Decompose work—launch 3 background `task` agents (one per issue) from a coordinator prompt, then add comment in main context after sub-agents complete

### Workflow Changes Required

The experiment uses handlebars conditional blocks to switch between single-agent and sub-agent execution paths. **Always use the value-comparison form** `{{#if experiments.sub_agent_strategy == "variant"}}` (never the internal env-var form).

**Before** (current single-agent approach, lines 50-110):

```markdown
# Smoke Test: Temporary ID Functionality

This workflow validates that temporary IDs work correctly for:
1. Creating parent-child issue hierarchies
2. Cross-referencing issues in bodies
3. Different temporary ID formats (3-8 alphanumeric characters)

**IMPORTANT**: Use the exact temporary ID format `aw_` followed by 3-8 alphanumeric characters (A-Za-z0-9).

## Test 1: Create Parent Issue with Temporary ID

Create a parent tracking issue with a temporary ID. Use a 6-character alphanumeric ID.

```json
{
  "type": "create_issue",
  "temporary_id": "aw_test01",
  "title": "Test Parent: Temporary ID Validation",
  "body": "This is a parent issue created to test temporary ID functionality.\n\nSub-issues:\n- #aw_test02\n- #aw_test03\n\nAll references should be replaced with actual issue numbers."
}
```

## Test 2: Create Sub-Issues with Cross-References

[... rest of single-agent instructions ...]
```

**After** (with experiment conditional):

```markdown
# Smoke Test: Temporary ID Functionality

This workflow validates that temporary IDs work correctly for:
1. Creating parent-child issue hierarchies
2. Cross-referencing issues in bodies
3. Different temporary ID formats (3-8 alphanumeric characters)

**IMPORTANT**: Use the exact temporary ID format `aw_` followed by 3-8 alphanumeric characters (A-Za-z0-9).

{{#if experiments.sub_agent_strategy == "single_agent"}}

## Single-Agent Mode

Create all issues in this context.

### Test 1: Create Parent Issue

```json
{
  "type": "create_issue",
  "temporary_id": "aw_test01",
  "title": "Test Parent: Temporary ID Validation",
  "body": "This is a parent issue created to test temporary ID functionality.\n\nSub-issues:\n- #aw_test02\n- #aw_test03\n\nAll references should be replaced with actual issue numbers."
}
```

### Test 2: Create Sub-Issue 1

```json
{
  "type": "create_issue",
  "temporary_id": "aw_test02",
  "parent": "aw_test01",
  "title": "Sub-Issue 1: Test Temporary ID References",
  "body": "This is sub-issue 1.\n\nParent: #aw_test01\nRelated: #aw_test03\n\nAll temporary IDs should be resolved to actual issue numbers."
}
```

### Test 3: Create Sub-Issue 2

```json
{
  "type": "create_issue",
  "temporary_id": "aw_test03",
  "parent": "aw_test01",
  "title": "Sub-Issue 2: Test Different ID Length",
  "body": "This is sub-issue 2 with an 8-character temporary ID.\n\nParent: #aw_test01\nRelated: #aw_test02\n\nTesting that longer temporary IDs (8 chars) work correctly."
}
```

\{\{/if}}
\{\\{\#if experiments.sub_agent_strategy == "sub_agents"}}

## Sub-Agent Mode

Launch 3 background `task` agents to create issues in parallel, then wait for completion.

You are the coordinator. Your job:

1. Launch 3 background `task` agents (one per issue) with complete temporary ID instructions
2. Wait for all 3 agents to complete (you will receive automatic completion notifications)
3. After all complete, add a summary comment to the parent issue using temporary ID `aw_test01`

### Agent 1: Create Parent Issue

Launch a background task agent:

```bash
# Use the task tool with agent_type: task, mode: background
# Prompt: "Create a parent issue with temporary ID aw_test01. Title: 'Test Parent: Temporary ID Validation'. Body: 'This is a parent issue created to test temporary ID functionality.\n\nSub-issues:\n- #aw_test02\n- #aw_test03\n\nAll references should be replaced with actual issue numbers.' Use safeoutputs create_issue with temporary_id: aw_test01."
```

### Agent 2: Create Sub-Issue 1

Launch a second background task agent:

```bash
# Prompt: "Create a sub-issue with temporary ID aw_test02 and parent aw_test01. Title: 'Sub-Issue 1: Test Temporary ID References'. Body: 'This is sub-issue 1.\n\nParent: #aw_test01\nRelated: #aw_test03\n\nAll temporary IDs should be resolved to actual issue numbers.' Use safeoutputs create_issue with temporary_id: aw_test02 and parent: aw_test01."
```

### Agent 3: Create Sub-Issue 2

Launch a third background task agent:

```bash
# Prompt: "Create a sub-issue with temporary ID aw_test03 and parent aw_test01. Title: 'Sub-Issue 2: Test Different ID Length'. Body: 'This is sub-issue 2 with an 8-character temporary ID.\n\nParent: #aw_test01\nRelated: #aw_test02\n\nTesting that longer temporary IDs (8 chars) work correctly.' Use safeoutputs create_issue with temporary_id: aw_test03 and parent: aw_test01."
```

### After All Agents Complete

Once you receive completion notifications for all 3 agents, verify their success and add a summary comment.

\{\{/if}}

## Final Step: Add Summary Comment

Regardless of strategy, add a comment to the parent issue summarizing test results:

```json
{
  "type": "add_comment",
  "issue_number": "aw_test01",
  "body": "## Test Results\n\n✅ Parent issue created with temporary ID `aw_test01`\n✅ Sub-issue 1 created with temporary ID `aw_test02` and linked to parent\n✅ Sub-issue 2 created with temporary ID `aw_test03` and linked to parent\n✅ Cross-references resolved correctly\n\n**Validation**: Check that:\n1. All temporary ID references (#aw_*) in issue bodies are replaced with actual issue numbers (#123)\n2. Sub-issues show parent relationship in GitHub UI\n3. Parent issue shows sub-issues in task list\n\nTemporary ID format validated: `aw_[A-Za-z0-9]{3,8}`"
}
```

## Expected Outcome

1. Parent issue #aw_test01 created and assigned actual issue number (e.g., #1234)
2. Sub-issues #aw_test02 and #aw_test03 created with actual issue numbers
3. All references like `#aw_test01` replaced with actual numbers like `#1234`
4. Sub-issues properly linked to parent with `parent` field
5. Comment added to parent verifying the test results

**Success Criteria**: All 3 issues created, all temporary ID references resolved, parent-child relationships established.
```

### Success Metrics

| Metric | Type | Target |
|--------|------|--------|
| effective_token_count | Primary | 15-25% reduction with sub_agents |
| run_duration_seconds | Secondary | No significant increase |
| issue_creation_success_rate | Secondary | Maintain or improve |
| all_issues_created | Guardrail | Must equal 3 for both variants |
| temporary_id_resolution_rate | Guardrail | Must be >=0.95 (all temp IDs resolved) |

### Statistical Design

- **Variants**: `single_agent` (control), `sub_agents` (treatment)
- **Assignment**: Round-robin via `gh-aw` experiments runtime (cache-based)
- **Minimum runs per variant**: 20 (workflow runs ~1-3 times per day on PR labels or manual dispatch)
- **Expected experiment duration**: ~10-15 days to reach 40 total runs (20 per variant)
- **Analysis approach**: Two-sample t-test for token count reduction; proportion test for success rate; Mann-Whitney U for duration if non-normal

### Implementation Steps

- [ ] Add `experiments:` section to frontmatter with `sub_agent_strategy` experiment
- [ ] Wrap workflow body with `{{#if experiments.sub_agent_strategy == "single_agent"}}` and `{{#if experiments.sub_agent_strategy == "sub_agents"}}` conditional blocks
- [ ] Implement sub-agent coordination logic in the `sub_agents` variant branch
- [ ] Run `gh aw compile smoke-temporary-id` to regenerate lock file
- [ ] Monitor experiment assignments in workflow run step summaries
- [ ] After 20+ runs per variant, analyze results using `daily-experiment-report` workflow
- [ ] Document findings and promote winning variant by removing experiment block and keeping winning logic

### References

- [A/B Testing in gh-aw](https://github.com/github/gh-aw/blob/main/.github/aw/github-agentic-workflows.md)
- Workflow file: `.github/workflows/smoke-temporary-id.md`
- Experiment state artifact: `/tmp/gh-aw/experiments/state.json` (uploaded per run)
- Analysis workflow: `.github/workflows/daily-experiment-report.md`







> Generated by [🧪 Daily A/B Testing Advisor](https://github.com/github/gh-aw/actions/runs/26290849200) · ● 1.2M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fab-testing-advisor%22&type=issues)
> - [x] expires  on Jun 5, 2026, 1:41 PM UTC

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ab-advisor] Experiment campaign for smoke-temporary-id: A/B test sub_agent_strategy #34010

🧪 Experiment Campaign: smoke-temporary-id

Background

Hypothesis

Experiment Configuration

Workflow Changes Required

Test 2: Create Sub-Issues with Cross-References

Test 2: Create Sub-Issue 1

Test 3: Create Sub-Issue 2

Sub-Agent Mode

Agent 1: Create Parent Issue

Agent 2: Create Sub-Issue 1

Agent 3: Create Sub-Issue 2

After All Agents Complete

Final Step: Add Summary Comment

Expected Outcome

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[ab-advisor] Experiment campaign for smoke-temporary-id: A/B test sub_agent_strategy #34010

Description

🧪 Experiment Campaign: smoke-temporary-id

Background

Hypothesis

Experiment Configuration

Workflow Changes Required

Test 2: Create Sub-Issues with Cross-References

Test 2: Create Sub-Issue 1

Test 3: Create Sub-Issue 2

Sub-Agent Mode

Agent 1: Create Parent Issue

Agent 2: Create Sub-Issue 1

Agent 3: Create Sub-Issue 2

After All Agents Complete

Final Step: Add Summary Comment

Expected Outcome

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions