Skip to content

Fix incorrect agent API request example payload structure#29556

Merged
Sameerlite merged 1 commit into
BerriAI:litellm_oss_staging_030626from
IshaMeera:fix-agent-doc
Jun 3, 2026
Merged

Fix incorrect agent API request example payload structure#29556
Sameerlite merged 1 commit into
BerriAI:litellm_oss_staging_030626from
IshaMeera:fix-agent-doc

Conversation

@IshaMeera

Copy link
Copy Markdown
Contributor

Relevant issues
N/A

Linear ticket
N/A

Pre-Submission checklist
My PR's scope is as isolated as possible, it only solves 1 specific problem
Screenshots / Proof of Fix - Attached SS with and without agent wrapper with its respective response.
Before
The documented Swagger/OpenAPI examples for agent endpoints wrapped request payloads inside:

{
"agent": {
...
}
}
Sending the documented payload resulted in:

422 validation error
loc: ["body", "agent_name"]
After
Updated examples to use root-level request bodies:

{
"agent_name": "...",
"agent_card_params": { ... },
"litellm_params": { ... }
}
Verified locally:

Nested payload → 422 validation error
Root-level payload → 200 success
Verified updated examples through /openapi.json
Verified formatting with Black
Type
📖 Documentation
🐛 Bug Fix

Changes
Fix incorrect Agent API request examples in OpenAPI/Swagger documentation.

Root cause:
Agent endpoints accept request models directly (request: AgentConfig, request: PatchAgentRequest) and therefore expect root-level request fields rather than an outer agent object.

Updated examples for:

POST /v1/agents
PUT /v1/agents/{agent_id}
PATCH /v1/agents/{agent_id}
Also updated related example URLs for consistency:

GET /v1/agents/{agent_id}
DELETE /v1/agents/{agent_id}
Changes made:

  • Removed outer "agent" wrapper from example payloads
  • Updated /agents → /v1/agents
  • Corrected PATCH curl example to use the appropriate HTTP method
  • Cleaned example indentation
image image image

@greptile-apps

greptile-apps Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes the OpenAPI/Swagger example curl commands embedded in the agent endpoint docstrings. The examples previously wrapped request bodies in an "agent" object that the API never expected, causing 422 validation errors when users copied them verbatim.

  • Removed the outer "agent" wrapper from the POST, PUT, and PATCH example payloads so they match the AgentConfig and PatchAgentRequest flat models that FastAPI deserializes directly.
  • Updated example URLs from /agents to /v1/agents to match the actual registered router paths, and corrected the PATCH endpoint's example from curl -X PUT to curl -X PATCH.

Confidence Score: 5/5

Safe to merge — only docstring content is changed, with no functional code altered.

All changes are confined to example curl commands inside Python docstrings. The corrected payloads now match the actual FastAPI route signatures (request: AgentConfig, request: PatchAgentRequest), and the URLs and HTTP methods align with the router decorators in the same file. No runtime logic is touched.

No files require special attention.

Important Files Changed

Filename Overview
litellm/proxy/agent_endpoints/endpoints.py Docstring-only fix: removes incorrect "agent" wrapper from example request payloads, updates URLs from /agents to /v1/agents, and corrects the PATCH example's HTTP method from PUT to PATCH.

Reviews (1): Last reviewed commit: "Fix incorrect agent API request example ..." | Re-trigger Greptile

@codecov

codecov Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 75.00000% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
litellm/proxy/agent_endpoints/endpoints.py 75.00% 4 Missing ⚠️

📢 Thoughts on this report? Let us know!

@Sameerlite Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_staging June 3, 2026 11:02
@Sameerlite Sameerlite requested a review from a team June 3, 2026 11:02

@Sameerlite Sameerlite left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@Sameerlite Sameerlite enabled auto-merge (squash) June 3, 2026 11:03
@Sameerlite Sameerlite changed the base branch from litellm_oss_staging to litellm_oss_staging_030626 June 3, 2026 11:05
@Sameerlite Sameerlite merged commit 71c6a80 into BerriAI:litellm_oss_staging_030626 Jun 3, 2026
45 of 46 checks passed
mateo-berri added a commit that referenced this pull request Jun 3, 2026
* Fix incorrect agent API request example payload structure (#29556)

* fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs (#29427)

* fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs

On /v1/messages and other LITELLM_METADATA_ROUTES, the parent OTel span
is stored in litellm_params['litellm_metadata'] instead of
litellm_params['metadata']. When the request body contains a native
'metadata' field (e.g. Anthropic's {"user_id": "..."}),
litellm_params['metadata'] gets overwritten and the parent span is lost,
producing orphan root spans with a different trace_id.

Add fallback checks to litellm_metadata in:
- _get_span_context(): so child spans find the correct parent
- _end_proxy_span_from_kwargs(): so the proxy span gets closed

Fixes: #27934

* test(otel): tighten assertions per Greptile review

- test_span_context_metadata_takes_priority: assert litellm_metadata
  span is never accessed, proving metadata takes priority
- test_span_context_no_parent_when_neither_has_span: assert both ctx
  and detected_span are None

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* fix: remove premature end-user budget check from get_end_user_object (#29420)

* fix(proxy): remove premature end-user budget check from get_end_user_object

Problem:
- `_check_end_user_budget()` was called inside `get_end_user_object()`
- This caused budget checks to run BEFORE `skip_budget_checks` could be evaluated
- Zero-cost models (e.g., local vLLM) were incorrectly blocked when
  end-users exceeded their budget, even though they should bypass budget checks

Solution:
- Remove `_check_end_user_budget()` calls from `get_end_user_object()`
- Budget enforcement now happens exclusively in `common_checks()` where
  `skip_budget_checks` context is available
- `get_end_user_object()` keeps `route` as optional in function parameter for backwards compatibility and future implementation.

* refactor(tests): update budget enforcement tests to reflect changes in get_end_user_object

- test_get_end_user_object() verifies data fetching
- test_check_end_user_budget() verifies enforcement
- test_budget_enforcement_blocks_over_budget_users() integrates _check_end_user_budget()
- test_resolve_end_user_reraises_budget_exceeded() is now test_resolve_end_user since no budget exceeded is thrown in get_end_user_object()

* Gemini /images/generate and /images/edits billing fixes + add support for size and aspect ratio params (#29534)

* Fix Gemini image config mapping

* Address Gemini image config review

* Format Gemini image generation transform

* Fix Gemini image token usage logging

* Share Gemini image request helpers

* Fix Gemini Imagen model routing

* Fixes as per self code review

* Fixes per internal code review

* Stop gating Imagen imageSize forwarding

* Document Gemini image size mapping source

* chore: retrigger lint

* Clarify Gemini candidate count precedence

* Add Inception provider (#29522)

* add inception as provider (chat, fim)

* linting

* seperate test suite for chat and fim

* fix test coverage

* fix: model hub custom pricing model info (#29293)

* Opik user auth key metadata extractors (#28397)

* fix: enhance Opik metadata extraction to include user API key auth context fixed after refactoring to extractor logic

* test: add unit tests for OPik metadata extraction logic

* fix: enhance extract_opik_metadata function to prioritize metadata sources for improved accuracy

* fix(ci): clarified comments and edited unit tests

* test: add unit tests for OPik metadata extraction with auth and requester overrides

* fix(ui): replace fixed favicon.ico with current api get /get_favicon (#29532)

Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar>

* fix(vertex/gemini): keep tool_call reference when a text-only assistant message follows (#29561)

`_gemini_convert_messages_with_history` tracks `last_message_with_tool_calls`
so a following tool result can be matched back to its tool call. The assignment
was inside a branch guarded by
`assistant_msg.get("tool_calls", []) is not None`, which is also True for a
text-only assistant message (an empty list is not None). As a result, an
assistant message with no tool calls that appears between a tool call and its
tool result overwrote the reference, and conversion failed with:

    Exception: Missing corresponding tool call for tool response message.

This shape is common: a model emits a short narration/assistant message after a
tool call before the tool result is appended.

Only update `last_message_with_tool_calls` when the assistant message actually
carries tool_calls (or a function_call). Adds a regression test.

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models (#28572)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models

The 1-hour prompt-cache write tier
(`cache_creation_input_token_cost_above_1hr`) was added to the
us./global. variants of the Claude 4.5/4.6/4.7 family on Bedrock, but
the eu./au./jp. cross-region inference profiles were left without it.
AWS Bedrock pricing applies the same +10% regional premium across all
geo profiles, so eu./au./jp. should carry the same 1-hour rates as
us. (1.6x the 5-minute regional rate).

Without these fields, cost tracking on EU/AU/JP Bedrock 1-hour-TTL
prompt caching falls back to the 5-minute write rate and undercounts
spend by ~60% for European, Australian, and Japanese tenants.

Adds the 1-hour tier (and Sonnet 4.5's long-context >200K tier where
AWS publishes one) to 14 regional Bedrock entries in both
`model_prices_and_context_window.json` and the bundled
`model_prices_and_context_window_backup.json`:

  - eu./au.   Opus 4.6     ($11.00 / MTok)
  - eu./au.   Opus 4.7     ($11.00 / MTok)
  - eu./au./jp. Sonnet 4.6 ($6.60 / MTok)
  - eu./au./jp. Sonnet 4.5 ($6.60 / MTok regular, $13.20 / MTok LC)
  - eu./au./jp. Haiku 4.5  ($2.20 / MTok)

Also extends `tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py`
with a `REGIONAL_EXPECTED` parametrized block covering all 13 new
entries plus the existing 1.6x ratio invariant.

Note: `eu.anthropic.claude-opus-4-5-20251101-v1:0` carries the
wrong 5m rate today (base 6.25e-06 instead of regional 6.875e-06),
which would break the 1.6x ratio check. It is intentionally left out
of this PR so the scope stays "1-hour cache tier addition" — a
separate follow-up should correct the EU 5m rates for Opus 4.5.

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* Add 1-hour cache write pricing tier for Vertex AI Anthropic models (#28569)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add 1-hour cache write pricing tier for Vertex AI Anthropic models

GCP Vertex AI publishes a separate 1-hour cache write column for the
Claude family (1.6x the 5-minute write rate, matching the documented
Bedrock ratio). LiteLLM's Vertex AI Anthropic entries only carry the
5-minute tier, so any request that uses `cache_control: {"ttl": "1h"}`
on Vertex AI Claude is undercounted in cost tracking by ~60%.

The runtime side already supports the 1-hour tier — `VertexAIAnthropicConfig`
extends `AnthropicConfig`, populating `ephemeral_1h_input_tokens`, and
`_calculate_cache_creation_cost` reads `cache_creation_input_token_cost_above_1hr`.
Only the price registry was missing data.

Adds the field to 19 vertex_ai/claude-* entries across both
`model_prices_and_context_window.json` and the bundled
`model_prices_and_context_window_backup.json`:

  - Haiku 4.5 ($1.25 -> $2.00 / MTok)
  - Sonnet 3.7 / 4 / 4.5 / 4.6 ($3.75 -> $6.00 / MTok)
  - Opus 4.5 / 4.6 / 4.7 ($6.25 -> $10.00 / MTok)
  - Opus 4 / 4.1 ($18.75 -> $30.00 / MTok)

Adds `tests/test_litellm/test_vertex_anthropic_1hr_cache_pricing.py`
mirroring the Bedrock equivalent — pins each (5m, 1h) pair per model
and asserts the 1.6x ratio across the family.

Fixes #27781.

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* Fix Gemini multimodal function responses (#29325)

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* address greptile review: add _transform_image_usage method and model-map supports_image_size flag

- Add _transform_image_usage instance method to GoogleImageGenConfig that
  delegates to transform_gemini_image_usage, fixing the regression test
- Replace hardcoded "2.5-flash" string check in supports_gemini_image_size
  with a get_model_info lookup on supports_image_size (default true)
- Add supports_image_size: false to all gemini-2.5-flash model entries in
  model_prices_and_context_window.json so capability is controlled via the
  model map rather than embedded in code

* fix test failures: schema validation, mypy type, model info plumbing, pricing test

- Add supports_image_size to ModelInfoBase TypedDict so get_model_info surfaces it
- Pass supports_image_size through _get_model_info_helper constructor call
- Fix supports_gemini_image_size to use value is not False (None means unset, defaults to True)
- Add supports_image_size to JSON schema in test_aaamodel_prices_and_context_window_json_is_valid
- Correct gemini-3.1-flash-lite pricing assertions in test to match JSON values

* Add Azure AI Kimi K2.6 metadata (#27052)

* Add Azure AI Kimi K2.6 metadata

* Scope Kimi metadata test cost map setup

* fall back to substring check for models not in model_prices_and_context_window.json

Models like gemini-2.5-flash-image-preview are not in the pricing JSON,
so get_model_info raises. Fall back to "2.5-flash" not in model when the
JSON has no explicit supports_image_size entry for the model.

* fix(inception): don't forward global litellm.api_key to Inception FIM

Match the Inception chat config: resolve only an Inception-specific key
(param, litellm.inception_key, or INCEPTION_API_KEY) for the text-completion
FIM path. The global litellm.api_key (often an OpenAI key) was both leaking
to api.inceptionlabs.ai and taking precedence over the configured Inception
key when set.

* fix(auth): enforce end-user budget on custom-auth path that skips common_checks

get_end_user_object() no longer raises BudgetExceededError, so custom-auth
deployments with custom_auth_run_common_checks unset (which skip the
centralized common_checks gate) stopped enforcing the end-user budget,
letting an over-budget end user keep making requests. Re-enforce the
budget in _run_post_custom_auth_checks on that path.

---------

Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar>
Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com>
Co-authored-by: aneeshsangvikar <aneeshsangvikar@fiddler.ai>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com>
Co-authored-by: Suleiman Elkhoury <108065141+suleimanelkhoury@users.noreply.github.com>
Co-authored-by: Dmitriy Alergant <93501479+DmitriyAlergant@users.noreply.github.com>
Co-authored-by: Yanis Miraoui <yanis.miraoui19@imperial.ac.uk>
Co-authored-by: Lovro Seder <vrovro@gmail.com>
Co-authored-by: Thomas Mildner <12685945+Thomas-Mildner@users.noreply.github.com>
Co-authored-by: José Luis Di Biase <josx@interorganic.com.ar>
Co-authored-by: Lai Quang Huy <64073540+1qh@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: ZHONG Ziwen <67355585+zzw-math@users.noreply.github.com>
Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants