Epic: System Prompting & Context Trimming vNext

## Overview
This epic covers the overhaul of Gemini CLI system prompting and context management to optimize for Gemini 3.0 and to address performance and coherence issues. The current prompts rely heavily on Gemini 2.5-specific "handholding," which is counterproductive for newer models that are more proficient with terminal-centric workflows. Managing context rot and token bloat was identified as the single most important factor for improving performance on long-running tasks and on benchmarks like SWE-bench.

## Key Objectives

### System Prompting & Gemini 3.0
- **Gemini 3.0 Optimization**: Transition from handholding-heavy 2.5 prompts to more flexible instructions suited to Gemini 3.
- **Terminal-Centric Approach**: Leverage the newer model's Bash proficiency by allowing more exploratory actions (e.g., using piping or redirection to limit output) rather than relying on strictly gated tool parameters.
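
To illustrate the "piping or redirection to limit output" idea, here is a minimal sketch of a helper that rewrites a shell command so its full output lands in a log file and only the tail is returned to the model. The names `shellQuote` and `limitOutput`, the log path, and the default of 40 lines are all hypothetical and not part of Gemini CLI.

```typescript
// Hypothetical helpers for illustration only; not Gemini CLI APIs.

// Quote a value for a POSIX shell by wrapping it in single quotes
// and escaping any embedded single quotes.
function shellQuote(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// Build a command line that sends stdout and stderr of `command` to `logPath`
// and then prints only the last `tailLines` lines of that log.
function limitOutput(command: string, logPath: string, tailLines = 40): string {
  const quoted = shellQuote(logPath);
  return `{ ${command}; } > ${quoted} 2>&1; tail -n ${tailLines} ${quoted}`;
}
```

Note that this sketch discards the exit status of the original command; a real implementation would likely need to capture and report it.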

### Context Management
- **Rolling Context Window**: Implement a rolling window for context compression/pruning, targeting a stable pool (e.g., ~50k tokens) to prevent model incoherence over time.
- **Context Pruning vs. Compression**: Distinguish between compression (shrinking the context once limits are hit) and pruning (actively maintaining a focused working set by removing content and replacing it with file references).
- **Temporary Directory Redirection**: Offload heavy outputs (build logs, large search results, pre-flights) to the session temporary directory to isolate them from the primary context window.
- **Improved Benchmark Performance**: Utilize context trimming as the primary lever to increase success rates on high-token scenarios.
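
The pruning idea above (replacing older content with file references until the context fits a target pool) can be sketched as follows. This is a rough illustration under stated assumptions: the `Turn` shape, the `offload` callback, the 50-token minimum, and the 4-characters-per-token estimate are hypothetical and not taken from the Gemini CLI codebase.

```typescript
// Hypothetical conversation turn; the real history format may differ.
interface Turn {
  role: "user" | "model" | "tool";
  text: string;
  savedTo?: string; // set once the full text has been offloaded to a file
}

// Crude token estimate (about 4 characters per token); an assumption, not a tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Tool outputs at or below this size are not worth replacing with a stub.
const MIN_PRUNE_TOKENS = 50;

// Replace the oldest large tool outputs with short file references until the
// estimated total fits within `budget`. `offload` is expected to persist the
// turn (e.g. in the session temp directory) and return the path it used.
function pruneToBudget(
  turns: Turn[],
  budget: number,
  offload: (turn: Turn, index: number) => string,
): Turn[] {
  const result = turns.map((t) => ({ ...t })); // do not mutate the input
  let total = result.reduce((sum, t) => sum + estimateTokens(t.text), 0);
  for (let i = 0; i < result.length && total > budget; i++) {
    const turn = result[i];
    if (turn.role !== "tool" || turn.savedTo) continue;
    if (estimateTokens(turn.text) <= MIN_PRUNE_TOKENS) continue;
    const path = offload(turn, i);
    const stub = `[output pruned; full text saved to ${path}]`;
    total += estimateTokens(stub) - estimateTokens(turn.text);
    result[i] = { ...turn, text: stub, savedTo: path };
  }
  return result;
}
```

Pruning oldest-first keeps recent tool output intact, on the assumption that the most recent results matter most for the next step; other policies (for example, largest-first) would fit the same interface.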

## Requirements & Technical Tasks
- [ ] **Temp Dir Read/Write**: Enable the model to both read from and write to the temporary directory.
- [ ] **Output Visibility**: Implement feedback/streaming for redirected output to ensure users don't experience "hanging" during long operations.
- [ ] **Structured Interpolations**: Implement structured interpolations for prompt variants (sub-agents, experimental features) as suggested in #13757.
- [ ] **Safety & Security**: Incorporate a comprehensive security and safety section in the prompts.
- [ ] **Context Telemetry**: Add telemetry to track context usage percentages across different components (prompt, tools, extensions) to identify "debt" offenders.
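
As a sketch of what the context telemetry item might compute, the snippet below turns per-component token counts into percentages and picks the largest contributor. The function names and the component keys used in the example are hypothetical; the issue itself only names prompt, tools, and extensions as examples.

```typescript
// Hypothetical telemetry helpers; names are illustrative, not Gemini CLI APIs.

// Convert raw token counts per component into percentages of the total,
// rounded to one decimal place.
function contextShare(tokens: Record<string, number>): Record<string, number> {
  const total = Object.values(tokens).reduce((a, b) => a + b, 0);
  const share: Record<string, number> = {};
  for (const [name, count] of Object.entries(tokens)) {
    share[name] = total === 0 ? 0 : Math.round((count / total) * 1000) / 10;
  }
  return share;
}

// Return the component using the most tokens, or undefined if there are none.
function topOffender(tokens: Record<string, number>): string | undefined {
  let best: string | undefined;
  for (const [name, count] of Object.entries(tokens)) {
    if (best === undefined || count > tokens[best]) best = name;
  }
  return best;
}
```

For example, `contextShare({ prompt: 5000, tools: 10000, extensions: 2500, history: 32500 })` reports shares of 10, 20, 5, and 65 percent, which makes the dominant component easy to spot.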

## Behavioral Evals & Metrics
- Establish behavioral evals to prevent overaction (e.g., acting when the user only asked a question) and to ensure consistent tool use.
- **Coherence Hill-Climbing**: Use a "long scenario" metric (e.g., fixing 700+ linter errors) to measure the point at which the agent loses coherence and loops.
- **Repetitive Task Reliability**: Ensure reliable execution of multi-file, independent changes as noted in related issue #9791.
- Track operational health metrics separately from data science metrics.
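
One simple way to locate "the point at which the agent loses coherence and loops" in a long scenario is to scan the tool-call trace for runs of identical consecutive calls. The sketch below assumes a hypothetical `ToolCall` record and a repeat threshold of 3; real loop detection would probably also need to catch longer repeating cycles.

```typescript
// Hypothetical trace entry; the actual trace format may differ.
interface ToolCall {
  name: string;
  args: string; // serialized arguments, e.g. JSON
}

// Return the index of the call that completes the first run of `threshold`
// identical consecutive calls, or -1 if no such run exists.
// Intended for threshold >= 2.
function firstLoopIndex(calls: ToolCall[], threshold = 3): number {
  let run = 1;
  for (let i = 1; i < calls.length; i++) {
    const same =
      calls[i].name === calls[i - 1].name && calls[i].args === calls[i - 1].args;
    run = same ? run + 1 : 1;
    if (run >= threshold) return i;
  }
  return -1;
}
```

Tracking this index across runs of the same scenario gives a single number to hill-climb on: a later first-loop index (or none at all) suggests the agent stayed coherent longer.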

---
*Based on the planning session: [Planning: Make Gcli smarter](https://docs.google.com/document/d/1R9fBXf60S104s6wwFh7mGma0fA7MBPr9QQxykm3oo50/edit)*

## Sub-Issues by Category

The **19 open sub-issues** for Epic #15328, grouped by category:

### 🤖 System Prompt & Model Instructions
*   **#17161**: Isolated system prompts for Gemini 3 and Gemini 2.5
*   **#15678**: Refine instructions for `save_memory` tool
*   **#13757**: Support template syntax for system prompts
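
As an illustration of what template support for system prompts could look like, here is a minimal interpolation sketch. The `{{name}}` placeholder syntax and the `renderPrompt` function are assumptions made for this example; #13757 may settle on a different syntax.

```typescript
// Hypothetical renderer; the placeholder syntax is an assumption, not the
// syntax proposed in #13757.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    // Fail loudly so a prompt variant never ships with an unfilled slot.
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`Missing prompt variable: ${name}`);
    }
    return vars[name];
  });
}
```

With this shape, a sub-agent variant and the main prompt could share one template and differ only in the values passed in, e.g. `renderPrompt("You are {{role}}. {{safety}}", { role: "a sub-agent", safety: "Never run destructive commands." })`.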

### 🧹 Context & Resource Management
*   **#17164**: Fix FS tools (like `ReadFolder`) access to project temp directory
*   **#17190**: Shorten temp directory paths
*   **#16955**: Rolling Tool Output Pruning
*   **#17037**: Lower tool output truncation thresholds for Gemini 3
*   **#16999**: Hierarchical `GEMINI.md` context loading

### 🧠 Agent Logic & Workflow
*   **#17028**: Prevent Gemini 3 from auto-acting on simple questions
*   **#17108**: Auto-build and lint after every file change
*   **#17109**: Explicit 'review changes' step
*   **#17110**: Auto-run changes to validate application
*   **#17111**: Briefly describe "large" changes after completion
*   **#15032**: Prefer automated linting/formatting over manual changes

### ⚡ UX & Benchmarking
*   **#17188** & **#17189**: Auto-accept file/shell redirections
*   **#17162**: Experiment-based model control (Mendel)
*   **#17163**: Verify benchmark scores for new prompts
*   **#17035**: Re-enable `write_todos` for Gemini 3

