Epic: System Prompting & Context Trimming vNext #15328

@mattKorwel

Overview

This epic covers an overhaul of the Gemini CLI system prompting and context management to optimize for Gemini 3.0 and to address performance and coherence issues. The current prompts rely heavily on Gemini 2.5-specific "handholding," which is counterproductive for newer models that are more proficient with terminal-centric workflows. Managing context rot and token bloat is identified as the single most important factor for improving performance on long-running tasks and on benchmarks like SWE-bench.

Key Objectives

System Prompting & Gemini 3.0

  • Gemini 3.0 Optimization: Transition from handholding-heavy 2.5 prompts to flexible instructions suited to Gemini 3.0.
  • Terminal-Centric Approach: Leverage the newer model's Bash proficiency, allowing it to perform more exploratory actions (e.g., using piping or redirection to limit output) rather than relying on strictly gated tool parameters.
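
As a rough illustration of the terminal-centric idea, the sketch below limits a command's output with a shell pipe instead of a dedicated tool parameter. It assumes a POSIX shell with `seq` and `tail` available; the command is a stand-in for a noisy build or search, not something taken from the CLI.

```python
import subprocess

# Hypothetical noisy command: `seq 1 1000` stands in for a long build log.
# The pipe to `tail -n 3` keeps only the last three lines, so only those
# lines would ever reach the model's context.
result = subprocess.run(
    "seq 1 1000 | tail -n 3",
    shell=True,
    capture_output=True,
    text=True,
    check=True,
)

last_lines = result.stdout.splitlines()
print(last_lines)
```

The point is that the limit lives in the command the model writes, so no `max_lines`-style tool parameter is needed.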

Context Management

  • Rolling Context Window: Implement a rolling window for context compression/pruning, targeting a stable pool (e.g., ~50k tokens) to prevent model incoherence over time.
  • Context Pruning vs. Compression: Distinguish between compression (reactively shrinking the context once limits are hit) and pruning (proactively keeping the context focused by removing content and replacing it with file references).
  • Temporary Directory Redirection: Offload heavy outputs (build logs, large search results, pre-flights) to the session temporary directory to isolate them from the primary context window.
  • Improved Benchmark Performance: Utilize context trimming as the primary lever to increase success rates on high-token scenarios.
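
One way to picture pruning against a token budget is sketched below: the oldest messages are written to a temporary directory and replaced with short file references until the history fits. This is a minimal sketch, not the CLI's implementation. `Message`, `estimate_tokens`, `prune_to_budget`, the 4-characters-per-token estimate, and the budget of 100 are all hypothetical.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def prune_to_budget(history, budget, temp_dir, keep_recent=2):
    """Return a copy of history in which the oldest messages are offloaded
    to files in temp_dir until the estimated token total fits the budget."""
    pruned = list(history)
    total = sum(estimate_tokens(m.text) for m in pruned)
    # The most recent `keep_recent` messages are never pruned.
    for i in range(max(0, len(pruned) - keep_recent)):
        if total <= budget:
            break
        msg = pruned[i]
        path = Path(temp_dir) / f"msg_{i}.txt"
        stub = Message(msg.role, f"[pruned: full content saved to {path}]")
        saved = estimate_tokens(msg.text) - estimate_tokens(stub.text)
        if saved <= 0:
            continue  # The reference would not be smaller; keep the original.
        path.write_text(msg.text, encoding="utf-8")
        pruned[i] = stub
        total -= saved
    return pruned


history = [
    Message("tool", "build log line\n" * 500),
    Message("user", "Why did the build fail?"),
    Message("model", "Checking the log now."),
]

with tempfile.TemporaryDirectory() as tmp:
    result = prune_to_budget(history, budget=100, temp_dir=tmp)
    # The full log is still recoverable from the temporary directory.
    restored_ok = (Path(tmp) / "msg_0.txt").read_text(encoding="utf-8") == history[0].text

total_after = sum(estimate_tokens(m.text) for m in result)
print(result[0].text[:8], total_after <= 100, restored_ok)
```

Unlike compression, nothing here waits for a hard limit: the same function could run after every turn to hold the history near its target size.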

Requirements & Technical Tasks

  • Temp Dir Read/Write: Enable the model to both read from and write to the temporary directory.
  • Output Visibility: Implement feedback/streaming for redirected output to ensure users don't experience "hanging" during long operations.
  • Structured Interpolations: Implement structured interpolations for prompt variants (sub-agents, experimental features) as suggested in "Support template syntax for system prompts" (#13757).
  • Safety & Security: Incorporate a comprehensive security and safety section in the prompts.
  • Context Telemetry: Add telemetry to track context usage percentages across different components (prompt, tools, extensions) to identify "debt" offenders.
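
The context telemetry item could be approximated as below: given token counts per component, report each one's share of the window and the largest consumers. The function names, component names, and numbers are illustrative assumptions; the 50,000-token window only echoes the ~50k example mentioned earlier.

```python
def context_usage_percentages(token_counts: dict[str, int], window_size: int) -> dict[str, float]:
    """Return each component's share of the context window, in percent."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    usage = {name: round(100 * count / window_size, 1) for name, count in token_counts.items()}
    used = sum(token_counts.values())
    # Remaining headroom, clamped at zero if the window is over-committed.
    usage["free"] = round(100 * max(0, window_size - used) / window_size, 1)
    return usage


def top_offenders(token_counts: dict[str, int], n: int = 2) -> list[str]:
    """Return the n components that consume the most tokens."""
    return sorted(token_counts, key=token_counts.get, reverse=True)[:n]


# Made-up counts for a single request.
counts = {
    "system_prompt": 5000,
    "tools": 15000,
    "extensions": 2500,
    "conversation": 20000,
}

usage = context_usage_percentages(counts, window_size=50000)
offenders = top_offenders(counts)
print(usage)
print(offenders)
```

Emitting these percentages per request would show whether the "debt" comes from fixed overhead (prompt, tools, extensions) or from the growing conversation.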

Behavioral Evals & Metrics

  • Establish behavioral evals to prevent overreach (the agent taking more actions than the task calls for) and to ensure consistent tool use.
  • Coherence Hill-Climbing: Use a "long scenario" metric (e.g., fixing 700+ linter errors) to measure the point at which the agent loses coherence and loops.
  • Repetitive Task Reliability: Ensure reliable execution of multi-file, independent changes, as noted in the related issue "Gemini CLI often performs badly when the context window gets large" (#9791).
  • Track operational health metrics separate from data science metrics.
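
For the coherence metric, one simple proxy is the first step at which the agent repeats an identical action several times within a short window. The sketch below assumes actions are logged as plain strings; `first_loop_step`, the window of 6, and the threshold of 3 are hypothetical choices, not values from the planning session.

```python
def first_loop_step(actions, window=6, max_repeats=3):
    """Return the index of the first step at which the same action has
    occurred `max_repeats` times within the last `window` steps, else None."""
    for i, action in enumerate(actions):
        recent = actions[max(0, i - window + 1): i + 1]
        if recent.count(action) >= max_repeats:
            return i
    return None


# A healthy run moves on to a new file after each fix.
healthy = ["edit a.ts", "lint a.ts", "edit b.ts", "lint b.ts", "edit c.ts", "lint c.ts"]

# A looping run keeps retrying the same edit on the same file.
looping = [
    "edit a.ts", "lint a.ts",
    "edit b.ts", "lint b.ts",
    "edit b.ts", "lint b.ts",
    "edit b.ts", "lint b.ts",
]

print(first_loop_step(healthy))
print(first_loop_step(looping))
```

Tracking this index across prompt or pruning changes gives a single number to hill-climb: a later first-loop step on the same long scenario suggests the agent stays coherent for longer.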

Based on the planning session: Planning: Make Gcli smarter

Sub Issues By Category

I've categorized the 19 open sub-issues for Epic #15328 to help you decide what to tackle next:

🤖 System Prompt & Model Instructions

🧹 Context & Resource Management

🧠 Agent Logic & Workflow

⚡ UX & Benchmarking

Metadata

Assignees

Labels

  • area/agent: Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality
  • workstream-rollup: Label used to tag epics and features that are associated with one of the three primary workstreams
  • 🔒 maintainer only: ⛔ Do not contribute. Internal roadmap item.
