GSoC Behavioral evals, Quality, and the OSS Community

This issue tracks a Google Summer of Code (GSoC) project in which a participant will work closely with the Agent Intelligence team at Google on Gemini CLI. The goal is to improve the OSS community's ability to contribute to quality-impacting areas of Gemini CLI, including the prompt, tools, behavioral evals, and various pre-release guardrails.

The participant will do some or all of the following, depending on time:
- Onboard to the quality area and learn how to iterate and validate changes.
- Write and stabilize [behavioral eval](https://github.com/google-gemini/gemini-cli/blob/main/evals/README.md) tests and corresponding prompt and tool changes to refine the product's prompt-driven features.
- Write skills or subagents that help first- and third-party contributors work more effectively with evals, prompt changes, subagents, and skills.
- Identify and fix gaps in processes, documentation, logging, and tooling that impede one's ability to diagnose and fix bugs in the agent, subagents, skills, prompts, and tools.
- Implement tooling for external contributors to inventory, assess gaps in, generate, validate, and improve behavioral eval tests.
- Implement tooling for generating evals or unit tests from chat logs.

Finally, and most importantly: dogfood and improve the community contribution scenario for the agent intelligence area, and help us build a more vibrant quality ecosystem with the community.

GSoC Behavioral evals, Quality, and the OSS Community #23331

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

GSoC Behavioral evals, Quality, and the OSS Community #23331

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions