docs: add token consumption optimization guide by Copilot · Pull Request #31477 · github/gh-aw

Copilot · 2026-05-11T11:04:19Z

✨ Enhancement

What does this improve?

No single reference existed for token cost reduction in agentic workflows. Guidance was scattered across campaign.md, subagents.md, experiments.md, debug-agentic-workflow.md, and pattern docs.

Why is this valuable?

Token cost is the primary operational expense for agentic workflows. A consolidated, opinionated reference accelerates cost reduction without requiring authors to mine multiple files.

Implementation approach:

`.github/aw/token-optimization.md`: new reference distilling six techniques:
- DataOps: move API calls into deterministic `steps:` so the agent reads compact pre-aggregated JSON instead of fetching iteratively
- gh-proxy / cli-proxy: `mode: gh-proxy` skips Docker MCP server startup; `cli-proxy: true` lets the agent pipe MCP output through `jq` before it hits the context window
- Inline sub-agents with `model: small`: delegate classification/summarization (~200 tokens/item at haiku pricing) to cheap models; the large model only synthesizes compact results
- Caveman: A/B experiment comparing verbose vs. minimal prompt variants with `metric: "effective_tokens"` to find the cheapest acceptable phrasing
- Experiments: gate every optimization behind an `experiments:` field before committing; measure first
- Audit-based comparison: `gh aw audit <base-id> <optimized-id>` confirms the effective token delta across runs before promoting a change; per-request detail is read from `logs/run-<run-id>/firewall-audit-logs/api-proxy-logs/token-usage.jsonl`, which `gh aw audit` populates automatically

The Quick-Reference Checklist is promoted to the top of the document so readers can immediately apply the highest-impact checks without reading the full guide. Cross-links to `subagents.md`, `experiments.md`, `memory.md`, `cli-commands.md`, and `syntax.md` are included.

`.github/agents/agentic-workflows.agent.md`: added a dispatcher routing entry (**Reducing token consumption / cost optimization**) and a `### Token Consumption Optimization` prompt section so the agent loads the guide when users ask about cost, token usage, or measuring prompt changes.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan

@copilot optimize for promoting efficiency

pelikhan · 2026-05-11T11:10:49Z

+### Per-request token detail
+
+For deep analysis, download the `firewall-audit-logs` artifact:
+
+```bash
+gh run download <run-id> -n firewall-audit-logs
+cat firewall-audit-logs/api-proxy-logs/token-usage.jsonl
+```
+
+Each line is one API call with `model`, `input_tokens`, `output_tokens`, `cache_read_tokens`, and `cache_write_tokens`. Use this to find which API calls are the most expensive.


@copilot the audit command downloads all the artifacts and caches them to files already.

Fixed. The section now runs `gh aw audit <run-id>` first (which already downloads all artifacts into `logs/run-<run-id>/`) and then references `token-usage.jsonl` from that local path.
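To illustrate the "find which API calls are the most expensive" step, here is a minimal Python sketch that ranks the records in `token-usage.jsonl`. The field names come from the diff above. The helper names (`load_calls`, `rank_calls`) are hypothetical, and ranking by the raw sum of input and output tokens is an assumption made for illustration, not the guide's effective-token formula.

```python
import json


def load_calls(lines):
    """Parse JSONL text lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def rank_calls(calls, top=5):
    """Return the `top` calls with the largest raw token total.

    Assumption: "expensive" is approximated as input_tokens + output_tokens.
    Missing fields are treated as 0.
    """
    def raw_total(call):
        return call.get("input_tokens", 0) + call.get("output_tokens", 0)

    return sorted(calls, key=raw_total, reverse=True)[:top]


def print_top_calls(path, top=5):
    """Print model and token counts for the most expensive calls in a file."""
    with open(path, encoding="utf-8") as fh:
        for call in rank_calls(load_calls(fh), top=top):
            print(call.get("model"), call.get("input_tokens"),
                  call.get("output_tokens"), call.get("cache_read_tokens"))
```

After `gh aw audit <run-id>` has populated the local logs, calling `print_top_calls` on the `token-usage.jsonl` path under `logs/run-<run-id>/` would list the heaviest requests first.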

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-11T11:26:51Z

@copilot optimize for promoting efficiency

The Quick-Reference Checklist is now the first thing readers see — each item describes the optimization concisely with its payoff, ordered from highest to lowest impact. The verbose "before optimizing" framing has been cut from the intro.

pelikhan · 2026-05-11T11:31:37Z

@copilot keep linking in the .agent.md file

…flows.agent.md Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-11T11:36:21Z

@copilot keep linking in the .agent.md file

The previous commit had accidentally dropped both the dispatcher routing entry and the ### Token Consumption Optimization section. Both are restored in this commit.

Copilot

Pull request overview

This PR adds a consolidated, opinionated guide for reducing token/cost in agentic workflows and wires it into the dispatcher agent so it’s loaded when users ask about cost optimization.

Changes:

- Added a new `.github/aw/token-optimization.md` reference covering measurement (`gh aw audit`) and several token-reduction techniques (DataOps/steps, gh-proxy/cli-proxy usage, sub-agents, experiments, caching).
- Updated the dispatcher agent prompt (`.github/agents/agentic-workflows.agent.md`) to route token/cost questions to the new guide.

Show a summary per file

| File | Description |
| --- | --- |
| `.github/aw/token-optimization.md` | New consolidated token optimization reference with measurement + technique guidance. |
| `.github/agents/agentic-workflows.agent.md` | Adds routing + prompt section so the dispatcher loads the new optimization guide for cost/token questions. |

Copilot's findings

Comments suppressed due to low confidence (1)

.github/aw/token-optimization.md:163

toolsets: is shown under tools.github in mode: gh-proxy, but gh-proxy mode does not register the GitHub MCP server, so toolsets don’t apply. Recommend removing toolsets: [default] from this snippet (or adding a note that toolsets only affect mode: local).

```yaml
tools:
  github:
    mode: gh-proxy      # ✅ preferred — pre-authenticated gh CLI, no MCP server startup
    toolsets: [default]
```


- **Files reviewed:** 2/2 changed files
- **Comments generated:** 3

+engine: copilot
+tools:
+  github:
+    mode: gh-proxy


+## Technique 2 — Use `gh-proxy` and `cli-proxy` Instead of the MCP Server
+
+**Eliminates Docker startup overhead and reduces per-call context overhead.**


+**Repeated context (system prompt, shared preamble) is charged at ~10× less when cached.**
+
+Prompt caching is automatically enabled by the AWF gateway. Effective cached input tokens are weighted at `0.1` in the effective token formula (versus `1.0` for uncached input).
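As a rough illustration of the weighting described above, the following Python sketch computes only the input-side contribution to effective tokens. The `0.1` and `1.0` weights come from the text. The function name and the split into "uncached" and "cache read" counts are assumptions, and any other terms in the effective token formula (for example output tokens or cache writes) are not given in this excerpt and are left out.

```python
# Weights quoted in the guide for input tokens.
CACHED_INPUT_WEIGHT = 0.1
UNCACHED_INPUT_WEIGHT = 1.0


def effective_input_tokens(uncached_input_tokens, cache_read_tokens):
    """Input-side effective tokens only (hypothetical helper).

    Assumption: the two arguments are disjoint counts, i.e. cached reads
    are not also counted in `uncached_input_tokens`.
    """
    return (UNCACHED_INPUT_WEIGHT * uncached_input_tokens
            + CACHED_INPUT_WEIGHT * cache_read_tokens)


# A 10,000-token prompt where 8,000 tokens are served from cache,
# compared with the same prompt fully uncached.
with_cache = effective_input_tokens(2_000, 8_000)
without_cache = effective_input_tokens(10_000, 0)
print(round(with_cache), round(without_cache))
```

Under these assumptions the cached run counts roughly 2,800 effective input tokens against 10,000 uncached, which is why keeping the system prompt and shared preamble stable across requests pays off.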


Add token consumption optimization guide and link in agent.md

5926292

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan May 11, 2026

Copilot AI changed the title ~~Add token consumption optimization guide~~ docs: add token consumption optimization guide May 11, 2026

Copilot created this pull request from a session on behalf of pelikhan May 11, 2026 11:05 View session

Copilot finished work on behalf of pelikhan May 11, 2026 11:05

Copilot AI requested a review from pelikhan May 11, 2026 11:05

pelikhan reviewed May 11, 2026

View reviewed changes

Copilot started work on behalf of pelikhan May 11, 2026 11:21 View session

Fix per-request token detail and promote checklist to top of doc

584b7e9

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 11, 2026 11:27

Copilot AI requested a review from pelikhan May 11, 2026 11:27

Copilot started work on behalf of pelikhan May 11, 2026 11:31 View session

Restore token optimization routing and prompt section in agentic-work…

2cc5cc2

…flows.agent.md Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan marked this pull request as ready for review May 11, 2026 11:37

Copilot AI review requested due to automatic review settings May 11, 2026 11:37

Copilot finished work on behalf of pelikhan May 11, 2026 11:37

pelikhan merged commit 79139ac into main May 11, 2026
1 check passed

pelikhan deleted the copilot/add-instructions-for-token-optimization branch May 11, 2026 11:37

Copilot started reviewing on behalf of pelikhan May 11, 2026 11:38 View session

Copilot AI reviewed May 11, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 11, 2026

[docs] Update documentation for features from 2026-05-11 #31491

Merged

pelikhan merged 3 commits into main from copilot/add-instructions-for-token-optimization

3 participants
