
Re-enable write_todos tool for Gemini 3 family #17035

@NTaylorMullen

Part of #15328

Background

The write_todos tool was initially disabled for the Gemini 3 model family because it was found to be a "net-negative" in its current form. However, we want to re-enable planning in a way that aligns with the 'Conductor' philosophy and the strategy defined in the system prompt overhaul (ntm/sys.prompt.overhaul).

Goals

Re-enable planning for Gemini 3 as a streamlined, incremental process that encourages piecemeal task execution and higher-quality outputs.

Refined Proposed Changes

  • Conductor Orchestration: Align the tool usage with the "Strategy" section of the system prompt overhaul. The tool should act as a 'Conductor' (similar to the Gemini CLI extension), encouraging the model to break down tasks into small, verifiable chunks rather than attempting large, monolithic changes.
  • Ecosystem Streamlining: Ensure the planning mechanism is compatible with existing tech (like the state snapshot logic used in history compression). The goal is to make 'creating a plan' and 'updating a plan' a first-class, lightweight operation.
  • Incremental Planning API: Move away from "replace-all" semantics. Provide a way to push, pop, or update specific sub-goals in the plan to reduce token overhead and cognitive load on the model.
  • Piecemeal Execution Loop: Update system instructions to explicitly link the plan/todo list to the execution loop: Plan -> Act -> Validate -> Update Plan.
  • UI Integration: Keep the TodoTray as the user-facing reflection of this 'Conductor' state.
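To make the "Incremental Planning API" and "Piecemeal Execution Loop" bullets concrete, here is a minimal sketch of what push/pop/update semantics and a Plan -> Act -> Validate -> Update Plan loop could look like. Every name here (`Todo`, `PlanOp`, `applyOp`, `runLoop`, the status values) is hypothetical and chosen for illustration; none of it is taken from the Gemini CLI codebase or the actual write_todos schema.

```typescript
// Hypothetical types for illustration only; not the real write_todos schema.
type TodoStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled';

interface Todo {
  id: number;
  description: string;
  status: TodoStatus;
}

interface Plan {
  nextId: number;
  todos: Todo[];
}

// Incremental operations: each one touches a single sub-goal,
// instead of resending the whole list ("replace-all").
type PlanOp =
  | { kind: 'push'; description: string }
  | { kind: 'pop' }
  | { kind: 'update'; id: number; status: TodoStatus };

const emptyPlan: Plan = { nextId: 1, todos: [] };

// Pure reducer: returns a new plan and leaves the input untouched.
function applyOp(plan: Plan, op: PlanOp): Plan {
  switch (op.kind) {
    case 'push':
      return {
        nextId: plan.nextId + 1,
        todos: [
          ...plan.todos,
          { id: plan.nextId, description: op.description, status: 'pending' },
        ],
      };
    case 'pop':
      // Drops the most recently pushed sub-goal; a no-op on an empty plan.
      return { ...plan, todos: plan.todos.slice(0, -1) };
    case 'update':
      return {
        ...plan,
        todos: plan.todos.map((t) =>
          t.id === op.id ? { ...t, status: op.status } : t,
        ),
      };
  }
}

// Plan -> Act -> Validate -> Update Plan, one sub-goal at a time.
// `act` and `validate` are placeholder callbacks standing in for the
// model's tool calls and verification steps (assumed, not a real API).
function runLoop(
  plan: Plan,
  act: (todo: Todo) => string,
  validate: (todo: Todo, result: string) => boolean,
): Plan {
  let current = plan;
  for (;;) {
    const next = current.todos.find((t) => t.status === 'pending');
    if (!next) return current; // nothing left to do
    current = applyOp(current, { kind: 'update', id: next.id, status: 'in_progress' });
    const result = act(next);
    const ok = validate(next, result);
    current = applyOp(current, {
      kind: 'update',
      id: next.id,
      status: ok ? 'completed' : 'cancelled',
    });
  }
}

// Usage: build a plan incrementally, then run it with stub callbacks.
let plan = emptyPlan;
plan = applyOp(plan, { kind: 'push', description: 'Add failing test' });
plan = applyOp(plan, { kind: 'push', description: 'Implement fix' });
const finished = runLoop(
  plan,
  (t) => `did: ${t.description}`,
  (_t, r) => r.startsWith('did'),
);
console.log(finished.todos.map((t) => `${t.id}:${t.status}`).join(', '));
// prints "1:completed, 2:completed"
```

Because each operation carries only the sub-goal it changes, the payload size stays constant as the plan grows, which is the token-overhead argument made above. A reducer-style `applyOp` would also make it straightforward to snapshot or replay plan state, though whether that fits the existing history compression logic is an open question.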

Evaluation

  • Measure whether the 'Conductor' approach leads to a higher success rate on complex coding evals by preventing the model from "getting lost" in long execution cycles.
  • Verify that incremental updates to the plan are reflected correctly in the CLI UI.

Metadata

Labels

  • area/agent: Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality
  • workstream-rollup: Label used to tag epics and features that are associated with one of the three primary workstreams
  • 🔒 maintainer only: ⛔ Do not contribute. Internal roadmap item.
