
docs: add token consumption optimization guide #31477

Merged: pelikhan merged 3 commits into main from copilot/add-instructions-for-token-optimization on May 11, 2026
Contributor

Copilot AI commented May 11, 2026

✨ Enhancement

What does this improve?

No single reference existed for token cost reduction in agentic workflows. Guidance was scattered across campaign.md, subagents.md, experiments.md, debug-agentic-workflow.md, and pattern docs.

Why is this valuable?

Token cost is the primary operational expense for agentic workflows. A consolidated, opinionated reference accelerates cost reduction without requiring authors to mine multiple files.

Implementation approach:

  • .github/aw/token-optimization.md: New reference distilling six techniques:

    • DataOps: move API calls into deterministic `steps:` so the agent reads compact pre-aggregated JSON instead of fetching iteratively
    • gh-proxy / cli-proxy: `mode: gh-proxy` skips Docker MCP server startup; `cli-proxy: true` lets the agent pipe MCP output through `jq` before it hits the context window
    • Inline sub-agents with `model: small`: delegate classification/summarization (~200 tokens/item at haiku pricing) to cheap models; the large model only synthesizes compact results
    • Caveman: A/B experiment comparing verbose vs. minimal prompt variants with `metric: "effective_tokens"` to find the cheapest acceptable phrasing
    • Experiments: gate every optimization behind an `experiments:` field before committing; measure first
    • Audit-based comparison: `gh aw audit <base-id> <optimized-id>` confirms the effective token delta across runs before promoting a change; per-request detail is read from `logs/run-<run-id>/firewall-audit-logs/api-proxy-logs/token-usage.jsonl`, which `gh aw audit` populates automatically

    The Quick-Reference Checklist is promoted to the top of the document so readers can immediately apply the highest-impact checks without reading the full guide. Cross-links to subagents.md, experiments.md, memory.md, cli-commands.md, syntax.md are included.

  • .github/agents/agentic-workflows.agent.md — Added dispatcher routing entry (**Reducing token consumption / cost optimization**) and ### Token Consumption Optimization prompt section so the agent loads the guide when users ask about cost, token usage, or measuring prompt changes.
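
As a rough illustration of the DataOps item above, here is a minimal Python sketch of what a deterministic pre-aggregation step might do before the agent runs. Everything in it is an assumption for illustration: the `compact_issues` helper and the record fields (`number`, `title`, `labels`, `body`) are hypothetical and do not come from the guide.

```python
import json
from collections import Counter


def compact_issues(issues, top_labels=5):
    """Reduce verbose issue records to a small summary for the agent prompt.

    Hypothetical helper: the record fields are illustrative assumptions.
    """
    label_counts = Counter(
        label for issue in issues for label in issue.get("labels", [])
    )
    return {
        "total": len(issues),
        "top_labels": label_counts.most_common(top_labels),
        # Keep only the fields the agent needs; long bodies are dropped.
        "items": [
            {"number": i["number"], "title": i["title"][:80]} for i in issues
        ],
    }


if __name__ == "__main__":
    # Made-up sample data standing in for a verbose API response.
    raw = [
        {"number": 1, "title": "Crash on start", "labels": ["bug"], "body": "x" * 5000},
        {"number": 2, "title": "Add dark mode", "labels": ["feature"], "body": "y" * 5000},
        {"number": 3, "title": "Crash on exit", "labels": ["bug"], "body": "z" * 5000},
    ]
    summary = compact_issues(raw)
    print(json.dumps(summary, separators=(",", ":")))
    print(len(json.dumps(raw)), "->", len(json.dumps(summary)), "characters")
```

The point of the sketch is only that the summary the agent reads is much smaller than the raw payload; which fields to keep depends on the workflow.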

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "Add token consumption optimization guide" to "docs: add token consumption optimization guide" May 11, 2026
Copilot AI requested a review from pelikhan May 11, 2026 11:05
Collaborator

pelikhan left a comment


@copilot optimize for promoting efficiency

Comment on lines +51 to +60
### Per-request token detail

For deep analysis, download the `firewall-audit-logs` artifact:

```bash
gh run download <run-id> -n firewall-audit-logs
cat firewall-audit-logs/api-proxy-logs/token-usage.jsonl
```

Each line is one API call with `model`, `input_tokens`, `output_tokens`, `cache_read_tokens`, and `cache_write_tokens`. Use this to find which API calls are the most expensive.
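
To make the "find the most expensive calls" step concrete, here is a minimal Python sketch that ranks the entries of a `token-usage.jsonl` file. The field names are the ones listed above. The ranking key (a plain sum of the four counters) and the helper names `load_calls` and `top_calls` are assumptions for illustration, not part of `gh aw`.

```python
import json
import sys
from pathlib import Path

TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_tokens",
    "cache_write_tokens",
)


def load_calls(path):
    """Parse one JSON object per non-empty line of a token-usage.jsonl file."""
    calls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            calls.append(json.loads(line))
    return calls


def top_calls(calls, n=5):
    """Return the n calls with the largest raw token total.

    Assumption: an unweighted sum is used as the ranking key here;
    it is not the guide's effective-token formula.
    """
    def total(call):
        return sum(call.get(field, 0) for field in TOKEN_FIELDS)

    return sorted(calls, key=total, reverse=True)[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_calls.py <path-to>/token-usage.jsonl
    for call in top_calls(load_calls(sys.argv[1])):
        print(call.get("model"), call.get("input_tokens"), call.get("output_tokens"))
```

The path is passed as an argument, so the same script should work on a downloaded artifact or on a locally cached copy of the file.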
Collaborator


@copilot the audit command downloads all the artifacts and caches them to files already.

Contributor Author


Fixed: the section now starts with `gh aw audit <run-id>` (which already downloads all artifacts into `logs/run-<run-id>/`) and then references `token-usage.jsonl` from that local path.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 11, 2026

@copilot optimize for promoting efficiency

The Quick-Reference Checklist is now the first thing readers see — each item describes the optimization concisely with its payoff, ordered from highest to lowest impact. The verbose "before optimizing" framing has been cut from the intro.

Copilot AI requested a review from pelikhan May 11, 2026 11:27
Collaborator

@copilot keep linking in the .agent.md file

…flows.agent.md

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 11, 2026

@copilot keep linking in the .agent.md file

The previous commit had accidentally dropped both the dispatcher routing entry and the ### Token Consumption Optimization section. Both are restored in this commit.

pelikhan marked this pull request as ready for review May 11, 2026 11:37
Copilot AI review requested due to automatic review settings May 11, 2026 11:37
pelikhan merged commit 79139ac into main May 11, 2026
1 check passed
pelikhan deleted the copilot/add-instructions-for-token-optimization branch May 11, 2026 11:37
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a consolidated, opinionated guide for reducing token/cost in agentic workflows and wires it into the dispatcher agent so it’s loaded when users ask about cost optimization.

Changes:

  • Added a new .github/aw/token-optimization.md reference covering measurement (gh aw audit) and several token-reduction techniques (DataOps/steps, gh-proxy/cli-proxy usage, sub-agents, experiments, caching).
  • Updated the dispatcher agent prompt (.github/agents/agentic-workflows.agent.md) to route token/cost questions to the new guide.
Summary per file:

| File | Description |
| --- | --- |
| .github/aw/token-optimization.md | New consolidated token optimization reference with measurement + technique guidance. |
| .github/agents/agentic-workflows.agent.md | Adds routing + prompt section so the dispatcher loads the new optimization guide for cost/token questions. |

Copilot's findings


Comments suppressed due to low confidence (1)

.github/aw/token-optimization.md:163

  • toolsets: is shown under tools.github in mode: gh-proxy, but gh-proxy mode does not register the GitHub MCP server, so toolsets don’t apply. Recommend removing toolsets: [default] from this snippet (or adding a note that toolsets only affect mode: local).
```yaml
tools:
  github:
    mode: gh-proxy      # ✅ preferred — pre-authenticated gh CLI, no MCP server startup
    toolsets: [default]
```


- **Files reviewed:** 2/2 changed files
- **Comments generated:** 3


engine: copilot
tools:
  github:
    mode: gh-proxy
Comment on lines +152 to +154
## Technique 2 — Use `gh-proxy` and `cli-proxy` Instead of the MCP Server

**Eliminates Docker startup overhead and reduces per-call context overhead.**
Comment on lines +352 to +354
**Repeated context (system prompt, shared preamble) is charged at ~10× less when cached.**

Prompt caching is automatically enabled by the AWF gateway. Effective cached input tokens are weighted at `0.1` in the effective token formula (versus `1.0` for uncached input).
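
The weighting quoted above can be worked through with a small Python sketch. Only the two input weights (`0.1` for cached, `1.0` for uncached) come from the quoted text. The helper name `effective_input_tokens` is hypothetical, the sketch assumes cached and uncached input are counted separately, and it deliberately leaves out output and cache-write tokens, whose weights are not given here.

```python
CACHED_INPUT_WEIGHT = 0.1    # from the quoted guide text
UNCACHED_INPUT_WEIGHT = 1.0  # from the quoted guide text


def effective_input_tokens(uncached_input, cached_input):
    """Input-side contribution to effective tokens.

    Assumption: the two counts are disjoint. Output and cache-write
    tokens are omitted because their weights are not stated.
    """
    return (
        UNCACHED_INPUT_WEIGHT * uncached_input
        + CACHED_INPUT_WEIGHT * cached_input
    )


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens, 8,000 of them served from cache.
    with_cache = effective_input_tokens(uncached_input=2_000, cached_input=8_000)
    without_cache = effective_input_tokens(uncached_input=10_000, cached_input=0)
    print(f"with cache: {with_cache:.0f}, without cache: {without_cache:.0f}")
```

In this made-up example the cached request counts as 2,000 + 0.1 × 8,000 = 2,800 effective input tokens instead of 10,000, which is why a stable system prompt and shared preamble pay off.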