docs: add token consumption optimization guide #31477
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
### Per-request token detail

For deep analysis, download the `firewall-audit-logs` artifact:

```bash
gh run download <run-id> -n firewall-audit-logs
cat firewall-audit-logs/api-proxy-logs/token-usage.jsonl
```

Each line is one API call with `model`, `input_tokens`, `output_tokens`, `cache_read_tokens`, and `cache_write_tokens`. Use this to find which API calls are the most expensive.
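As an illustration of how these per-call records could be ranked, here is a minimal Python sketch. It assumes each line of `token-usage.jsonl` is a JSON object carrying the five fields named above. The helper names (`load_calls`, `most_expensive`, `totals_by_model`), the sample records, and the choice to rank by input plus output tokens are illustrative assumptions, not part of `gh aw`.

```python
import json
from io import StringIO
from typing import TextIO

TOKEN_FIELDS = ("input_tokens", "output_tokens", "cache_read_tokens", "cache_write_tokens")


def load_calls(stream: TextIO) -> list[dict]:
    """Parse one JSON object per non-empty line; absent token fields default to 0."""
    calls = []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for field in TOKEN_FIELDS:
            record.setdefault(field, 0)
        calls.append(record)
    return calls


def most_expensive(calls: list[dict], top: int = 3) -> list[dict]:
    """Rank calls by input + output tokens (an assumed proxy for cost)."""
    return sorted(
        calls,
        key=lambda call: call["input_tokens"] + call["output_tokens"],
        reverse=True,
    )[:top]


def totals_by_model(calls: list[dict]) -> dict:
    """Sum every token field per model name."""
    totals = {}
    for call in calls:
        bucket = totals.setdefault(call.get("model", "unknown"), dict.fromkeys(TOKEN_FIELDS, 0))
        for field in TOKEN_FIELDS:
            bucket[field] += call[field]
    return totals


# Made-up sample records; real data would come from token-usage.jsonl.
SAMPLE = """
{"model": "model-a", "input_tokens": 1200, "output_tokens": 300, "cache_read_tokens": 0, "cache_write_tokens": 1000}
{"model": "model-b", "input_tokens": 200, "output_tokens": 50, "cache_read_tokens": 0, "cache_write_tokens": 0}
{"model": "model-a", "input_tokens": 400, "output_tokens": 900, "cache_read_tokens": 1000, "cache_write_tokens": 0}
"""

calls = load_calls(StringIO(SAMPLE))
for call in most_expensive(calls, top=2):
    print(call["model"], call["input_tokens"] + call["output_tokens"])
```

To run it against a real log, replace `StringIO(SAMPLE)` with an opened file handle for the downloaded `token-usage.jsonl`.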
@copilot the audit command downloads all the artifacts and caches them to files already.
Fixed. The section now reads `gh aw audit <run-id>` first (which already downloads all artifacts into `logs/run-<run-id>/`) and then references `token-usage.jsonl` from that local path.
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
The Quick-Reference Checklist is now the first thing readers see: each item describes the optimization concisely with its payoff, ordered from highest to lowest impact. The verbose "before optimizing" framing has been cut from the intro.
@copilot keep linking in the .agent.md file
…flows.agent.md Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
The previous commit had accidentally dropped both the dispatcher routing entry and the |
Pull request overview
This PR adds a consolidated, opinionated guide for reducing token/cost in agentic workflows and wires it into the dispatcher agent so it’s loaded when users ask about cost optimization.
Changes:
- Added a new `.github/aw/token-optimization.md` reference covering measurement (`gh aw audit`) and several token-reduction techniques (DataOps/steps, gh-proxy/cli-proxy usage, sub-agents, experiments, caching).
- Updated the dispatcher agent prompt (`.github/agents/agentic-workflows.agent.md`) to route token/cost questions to the new guide.
Show a summary per file
| File | Description |
|---|---|
| .github/aw/token-optimization.md | New consolidated token optimization reference with measurement + technique guidance. |
| .github/agents/agentic-workflows.agent.md | Adds routing + prompt section so the dispatcher loads the new optimization guide for cost/token questions. |
Copilot's findings
Comments suppressed due to low confidence (1)
.github/aw/token-optimization.md:163
`toolsets:` is shown under `tools.github` in `mode: gh-proxy`, but gh-proxy mode does not register the GitHub MCP server, so toolsets don’t apply. Recommend removing `toolsets: [default]` from this snippet (or adding a note that toolsets only affect `mode: local`).
```yaml
tools:
  github:
    mode: gh-proxy  # ✅ preferred: pre-authenticated gh CLI, no MCP server startup
    toolsets: [default]
```
- **Files reviewed:** 2/2 changed files
- **Comments generated:** 3
engine: copilot
tools:
  github:
    mode: gh-proxy
## Technique 2: Use `gh-proxy` and `cli-proxy` Instead of the MCP Server

**Eliminates Docker startup overhead and reduces per-call context overhead.**
**Repeated context (system prompt, shared preamble) is charged at roughly one-tenth the rate when cached.**

Prompt caching is automatically enabled by the AWF gateway. Cached input tokens are weighted at `0.1` in the effective token formula (versus `1.0` for uncached input).
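To make the weighting concrete, here is a small Python sketch of the input side of the calculation. Only the two weights (`0.1` for cached input, `1.0` for uncached input) come from the guide. The function name, the token counts, and the omission of output tokens are assumptions made for illustration, since the full effective token formula is not reproduced here.

```python
# Weights stated in the guide for input tokens.
UNCACHED_INPUT_WEIGHT = 1.0
CACHED_INPUT_WEIGHT = 0.1


def effective_input_tokens(uncached_input: int, cached_input: int) -> float:
    """Weighted input tokens for one call (input side only; output weights are not covered)."""
    return uncached_input * UNCACHED_INPUT_WEIGHT + cached_input * CACHED_INPUT_WEIGHT


# Hypothetical call: an 8,000-token shared preamble plus 500 fresh tokens.
cold = effective_input_tokens(uncached_input=8500, cached_input=0)    # nothing cached yet
warm = effective_input_tokens(uncached_input=500, cached_input=8000)  # preamble served from cache

print(f"cold call: {cold:.0f} effective input tokens")
print(f"warm call: {warm:.0f} effective input tokens")
```

Under these made-up numbers, the warm call's effective input is a small fraction of the cold call's, which is why keeping the shared preamble stable across calls matters.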
✨ Enhancement
What does this improve?
No single reference existed for token cost reduction in agentic workflows. Guidance was scattered across `campaign.md`, `subagents.md`, `experiments.md`, `debug-agentic-workflow.md`, and pattern docs.
Why is this valuable?
Token cost is the primary operational expense for agentic workflows. A consolidated, opinionated reference accelerates cost reduction without requiring authors to mine multiple files.
Implementation approach:
- `.github/aw/token-optimization.md`: New reference distilling six techniques:
  - `steps:` so the agent reads compact pre-aggregated JSON instead of fetching iteratively
  - `gh-proxy` / `cli-proxy`: `mode: gh-proxy` skips Docker MCP server startup; `cli-proxy: true` lets the agent pipe MCP output through `jq` before it hits the context window
  - `model: small`: delegate classification/summarization (~200 tokens/item at haiku pricing) to cheap models; large model only synthesizes compact results
  - `metric: "effective_tokens"` to find the cheapest acceptable phrasing
  - `experiments:` field before committing; measure first
  - `gh aw audit <base-id> <optimized-id>` confirms effective token delta across runs before promoting a change; per-request detail is read from `logs/run-<run-id>/firewall-audit-logs/api-proxy-logs/token-usage.jsonl`, which `gh aw audit` populates automatically

  The Quick-Reference Checklist is promoted to the top of the document so readers can immediately apply the highest-impact checks without reading the full guide. Cross-links to `subagents.md`, `experiments.md`, `memory.md`, `cli-commands.md`, `syntax.md` are included.
- `.github/agents/agentic-workflows.agent.md`: Added dispatcher routing entry (**Reducing token consumption / cost optimization**) and `### Token Consumption Optimization` prompt section so the agent loads the guide when users ask about cost, token usage, or measuring prompt changes.
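
For readers who want to inspect the raw per-request logs of a baseline and an optimized run side by side, the comparison could be sketched in Python as follows. The path layout mirrors the `logs/run-<run-id>/firewall-audit-logs/api-proxy-logs/token-usage.jsonl` location mentioned above. The helper names, the run IDs `100` and `101`, and the fake records written to a temporary directory are all hypothetical. This is not how `gh aw audit` computes its own delta.

```python
import json
import tempfile
from pathlib import Path

FIELDS = ("input_tokens", "output_tokens", "cache_read_tokens", "cache_write_tokens")


def usage_path(logs_root: Path, run_id: str) -> Path:
    """Location of the per-request log for one run, following the layout described above."""
    return logs_root / f"run-{run_id}" / "firewall-audit-logs" / "api-proxy-logs" / "token-usage.jsonl"


def sum_usage(path: Path) -> dict:
    """Total each token field across all API calls in one JSONL file."""
    totals = dict.fromkeys(FIELDS, 0)
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                record = json.loads(line)
                for field in FIELDS:
                    totals[field] += record.get(field, 0)
    return totals


def delta(base: dict, optimized: dict) -> dict:
    """Optimized minus baseline; negative values mean fewer tokens."""
    return {field: optimized[field] - base[field] for field in FIELDS}


def write_fake_run(logs_root: Path, run_id: str, records: list) -> None:
    """Test helper: write made-up records where a real run's log would live."""
    path = usage_path(logs_root, run_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    logs_root = Path(tmp) / "logs"
    write_fake_run(logs_root, "100", [{"model": "model-a", "input_tokens": 5000, "output_tokens": 800}])
    write_fake_run(logs_root, "101", [{"model": "model-a", "input_tokens": 2000, "output_tokens": 700,
                                       "cache_read_tokens": 1500}])
    change = delta(sum_usage(usage_path(logs_root, "100")), sum_usage(usage_path(logs_root, "101")))

print(change)
```

Pointing `logs_root` at a real `logs/` directory and passing actual run IDs would replace the temporary-directory setup.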