feat(provider): Add Neosantara provider as OpenAI Compatible by ErRickow · Pull Request #29646 · BerriAI/litellm

ErRickow · 2026-06-04T02:59:48Z

Relevant issues

This reopens PR #20641 with fixes applied.

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added meaningful tests
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible; it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

Ran LiteLLM proxy locally and verified Neosantara end-to-end.

Chat completions:

  curl -sS http://127.0.0.1:4000/v1/chat/completions \
    -H 'Authorization: Bearer sk-test' \
    -H 'Content-Type: application/json' \
    -d '{"model":"neosantara-chat","messages":[{"role":"user","content":"Reply exactly: litellm-neosantara-opus-chat-ok"}],"max_tokens":20}'

Returned: litellm-neosantara-opus-chat-ok

Responses API:

  curl -sS http://127.0.0.1:4000/v1/responses \
    -H 'Authorization: Bearer sk-test' \
    -H 'Content-Type: application/json' \
    -d '{"model":"neosantara-responses","input":"Reply exactly: litellm-neosantara-opus-responses-ok","max_output_tokens":40}'

Returned: status: completed, text: litellm-neosantara-opus-responses-ok
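The two curl checks above can also be sketched in Python. This is a minimal illustration using only the standard library, assuming the same local proxy address, test key, and model aliases as the curl commands; `build_request` and `send` are hypothetical helper names, not part of LiteLLM.

```python
import json
import urllib.request

PROXY_BASE = "http://127.0.0.1:4000"  # local proxy address from the curl checks
PROXY_KEY = "sk-test"                 # test key from the curl checks


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the local proxy."""
    return urllib.request.Request(
        PROXY_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {PROXY_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body; needs a running proxy."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same payloads as the two curl commands.
chat_request = build_request(
    "/v1/chat/completions",
    {
        "model": "neosantara-chat",
        "messages": [
            {"role": "user", "content": "Reply exactly: litellm-neosantara-opus-chat-ok"}
        ],
        "max_tokens": 20,
    },
)

responses_request = build_request(
    "/v1/responses",
    {
        "model": "neosantara-responses",
        "input": "Reply exactly: litellm-neosantara-opus-responses-ok",
        "max_output_tokens": 40,
    },
)
```

With the proxy running, `send(chat_request)` and `send(responses_request)` would perform the actual calls; nothing is sent when the block is merely imported.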

NOTE: some models may be unavailable on the free tier; use gemini-3-flash-preview, garda-code, or grok-4.1-fast-non-reasoning for testing :)

Type

🆕 New Feature
🧹 Refactoring
📖 Documentation
✅ Test

Changes

docs/my-website/docs/providers/neosantara.md
litellm/llms/openai_like/providers.json
litellm/types/utils.py
provider_endpoints_support.json
tests/test_litellm/llms/neosantara/test_neosantara.py


codspeed-hq · 2026-06-04T03:02:05Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

Comparing neosantara-xyz:litellm_neosantara_provider (d8b4b80) with main (5be0797)

greptile-apps · 2026-06-04T03:02:14Z

Greptile Summary

This PR integrates Neosantara as an OpenAI-compatible provider by following the JSON-registry pattern already used by providers such as crusoe and chutes. No custom HTTP handler is introduced; all routing reuses the existing openai_like infrastructure.

Adds a neosantara entry to providers.json (base URL, env-var names, max_completion_tokens→max_tokens mapping, and supported endpoints) and a matching NEOSANTARA enum value in LlmProviders.
Registers the provider's endpoint capabilities in provider_endpoints_support.json (chat completions + responses API enabled).
Adds a mock-only test suite covering registry lookup, dynamic config, provider-prefix routing, URL construction, parameter mapping, and responses-API config.
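To make the registry idea concrete, here is a rough sketch of how a data-driven entry might be resolved at call time. The field names `base_url`, `api_key_env`, `api_base_env`, and `supported_endpoints` come from the providers.json snippet quoted in this PR; the exact shape of `param_mappings` and the `resolve_provider` helper are assumptions for illustration, not LiteLLM's actual loader.

```python
import os
from typing import Mapping, Optional, Tuple

# Illustrative entry. The param_mappings shape is an assumption.
NEOSANTARA_ENTRY = {
    "base_url": "https://api.neosantara.xyz/v1",
    "api_key_env": "NEOSANTARA_API_KEY",
    "api_base_env": "NEOSANTARA_API_BASE",
    "param_mappings": {"max_completion_tokens": "max_tokens"},
    "supported_endpoints": ["/v1/chat/completions", "/v1/responses"],
}


def resolve_provider(
    entry: Mapping, env: Optional[Mapping[str, str]] = None
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (api_base, api_key) for a registry entry.

    The env-var override wins over the entry's default base URL; the API key
    is read from the env var named by the entry and may be None.
    """
    env = os.environ if env is None else env
    api_base = env.get(entry["api_base_env"]) or entry["base_url"]
    api_key = env.get(entry["api_key_env"])
    return api_base.rstrip("/"), api_key
```

Passing `env` explicitly keeps the helper testable without touching the real process environment, which is roughly what a mock-only test suite needs.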

Confidence Score: 5/5

Safe to merge; changes are additive and isolated to the JSON provider registry, the LlmProviders enum, the endpoint-support manifest, and a new mock test file — no existing behavior is altered.

The implementation uses the same data-driven registry pattern as other recently-added OpenAI-compatible providers, introduces no custom HTTP logic, and the tests are mock-only. The only gaps are a missing __init__.py in the test directory and the absence of a default-base-URL test case, neither of which affects runtime correctness.

tests/test_litellm/llms/neosantara/test_neosantara.py: missing __init__.py and a default-fallback test case that peer providers include.

Important Files Changed

Filename	Overview
litellm/llms/openai_like/providers.json	Adds the neosantara provider entry following the established JSON pattern; includes base URL, env var names, max_completion_tokens→max_tokens mapping, and supported_endpoints.
litellm/types/utils.py	Adds NEOSANTARA = "neosantara" to the LlmProviders enum between CHUTES and XIAOMI_MIMO; minimal, correct change.
provider_endpoints_support.json	Adds neosantara endpoint support entry; the docs URL references a page that may not be live until the docs-repo PR is merged.
tests/test_litellm/llms/neosantara/test_neosantara.py	Mock-only test suite covering registry, env-var config, provider detection, URL generation, param mapping, and responses API; missing __init__.py and a default-base-URL test that peer providers include.

Reviews (2): Last reviewed commit: "Address Neosantara provider review feedb..."

ErRickow · 2026-06-04T03:03:11Z

@mateo-berri hi, I've reopened this against main. Let me know if there are any issues, or if you need more information about Neosantara or the integration 👍🏻

codecov · 2026-06-04T03:03:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad947e101a


chatgpt-codex-connector · 2026-06-04T03:10:26Z

+os.environ["NEOSANTARA_API_KEY"] = ""  # your Neosantara API key
+
+response = litellm.responses(
+    model="neosantara/claude-opus-4-6",


Use a Neosantara-listed Responses model

When users copy this Responses example, it sends claude-opus-4-6, but Neosantara's own Models Overview does not list that model ID; the available Anthropic IDs there are in the claude-4.5-* family. In that scenario the new provider docs guide users to an immediate unknown-model/provider rejection even though their LiteLLM configuration is otherwise correct; the proxy sample below repeats the same ID.


It's fine: the model is live at api.neosantara.xyz/v1/models; the docs are just slow to update the model ID.
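One way to settle this kind of disagreement is to check the model ID against the provider's live model list rather than its docs page. The sketch below assumes the `/v1/models` endpoint returns the usual OpenAI-style list shape (`{"data": [{"id": ...}]}`); that shape is an assumption here, and the sample response is made up for illustration.

```python
def listed_model_ids(models_response: dict) -> set:
    """Collect model IDs from an OpenAI-style model list response (assumed shape)."""
    return {
        item["id"]
        for item in models_response.get("data", [])
        if isinstance(item, dict) and "id" in item
    }


def is_model_listed(models_response: dict, model_id: str) -> bool:
    """Return True when model_id appears in the list response."""
    return model_id in listed_model_ids(models_response)


# Made-up sample payload; a real check would fetch and decode the live endpoint.
sample_response = {
    "object": "list",
    "data": [{"id": "claude-opus-4-6"}, {"id": "garda-code"}],
}
```

Here `is_model_listed(sample_response, "claude-opus-4-6")` is true only because the sample says so; the real answer depends on what the live endpoint returns at the time.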

chatgpt-codex-connector · 2026-06-04T03:10:26Z

        "interactions": true
      }
    },
+    "neosantara": {


Add Neosantara to the packaged endpoint support file

This only adds Neosantara to the root support JSON, but the proxy endpoint that serves supported endpoints reads litellm/provider_endpoints_support_backup.json via public_endpoints.py:146; I checked that packaged file with rg and it still has no neosantara entry. As a result, /public/supported_endpoints will continue omitting this new provider even though this file says chat and responses are supported.
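A drift check between the two files could look roughly like the following. It assumes both files expose a mapping keyed by provider name somewhere in their structure; the exact JSON layout is not shown in this thread, so the helper takes the two provider mappings directly, and its name is hypothetical.

```python
def providers_missing_from(packaged: dict, root: dict) -> list:
    """Return provider names present in the root mapping but absent from the
    packaged (backup) mapping, sorted for stable output."""
    return sorted(set(root) - set(packaged))


# Made-up miniature mappings standing in for the two JSON files.
root_providers = {
    "chutes": {"chat_completions": True},
    "neosantara": {"chat_completions": True, "responses": True},
}
packaged_providers = {
    "chutes": {"chat_completions": True},
}
```

Run as a CI step against the real files, a non-empty result would flag exactly the situation described above, where the root file lists a provider that the packaged copy omits.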


chatgpt-codex-connector · 2026-06-04T03:10:26Z

+  "neosantara": {
+    "base_url": "https://api.neosantara.xyz/v1",
+    "api_key_env": "NEOSANTARA_API_KEY",
+    "api_base_env": "NEOSANTARA_API_BASE",
+    "supported_endpoints": ["/v1/chat/completions", "/v1/responses"]


Map max_completion_tokens to max_tokens

For callers that use LiteLLM's OpenAI chat parameter max_completion_tokens, this JSON config inherits the default OpenAI GPT supported-params list and forwards max_completion_tokens unchanged because no param_mappings entry is defined. Neosantara's chat-completions documentation lists max_tokens as the generation limit parameter, so those otherwise valid LiteLLM calls can be rejected by the provider; add the same max_completion_tokens → max_tokens mapping used by neighboring JSON providers.
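The renaming described here can be sketched as a small dictionary transform. This is an illustration of the idea only, not LiteLLM's implementation; `apply_param_mappings` is a hypothetical helper, and letting the renamed value win when both parameters are supplied is a design choice assumed for this sketch.

```python
def apply_param_mappings(params: dict, mappings: dict) -> dict:
    """Rename request parameters according to a source -> target mapping.

    Parameters without a mapping pass through unchanged. When a caller supplies
    both the source and the target name, the renamed source value wins
    (assumed precedence for this sketch).
    """
    passthrough = {k: v for k, v in params.items() if k not in mappings}
    renamed = {mappings[k]: v for k, v in params.items() if k in mappings}
    return {**passthrough, **renamed}


MAPPINGS = {"max_completion_tokens": "max_tokens"}
```

For example, `apply_param_mappings({"max_completion_tokens": 20, "temperature": 0.2}, MAPPINGS)` yields a payload that carries `max_tokens` instead, which is the parameter the provider's documentation is said to accept.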


Sameerlite · 2026-06-04T12:26:51Z

@ErRickow Can you get the score to 5/5?

ErRickow · 2026-06-04T13:02:21Z

@Sameerlite yup, wait a minute

ErRickow · 2026-06-04T13:14:08Z

@greptileai re review

ErRickow · 2026-06-05T09:55:09Z

@Sameerlite hi, the score is already 5/5. If you need any details or further changes, feel free to discuss.

Sameerlite · 2026-06-05T10:21:37Z

@ErRickow can you resolve conflicts? Thanks!

…AI/litellm into litellm_neosantara_provider # Conflicts: # litellm/llms/openai_like/providers.json

CLAassistant · 2026-06-05T10:41:12Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ yuneng-berri
✅ ErRickow
❌ shin-berri

ErRickow · 2026-06-05T10:49:02Z

@Sameerlite hi, I've resolved the conflict; you can check now.

* Add Neosantara provider
* Register Neosantara provider enum
* Address Neosantara provider review feedback
* Add Neosantara packaged endpoint support

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* Mark xAI models retiring on 2026-05-15 (#28788) Per https://docs.x.ai/developers/migration/may-15-retirement, xAI is retiring the following slugs on 2026-05-15 (auto-redirect to grok-4.3 with various reasoning efforts; callers continuing to use the old slugs will be billed at grok-4.3 pricing): grok-4-1-fast-reasoning{,-latest} -> grok-4.3 (low effort) grok-4-1-fast-non-reasoning{,-latest} -> grok-4.3 (none) grok-4-fast-reasoning -> grok-4.3 (low effort) grok-4-fast-non-reasoning -> grok-4.3 (none) grok-4-0709 -> grok-4.3 (low effort) grok-code-fast-1{,-0825} -> grok-build-0.1 grok-3 -> grok-4.3 (none) Only the direct xai/ slugs are tagged; third-party hosts (azure_ai, oci, vercel_ai_gateway, perplexity/xai) run their own schedules. The grok-3 retirement list explicitly names only the base grok-3 slug — the -mini / -fast / -beta / -latest variants are not listed, so they remain untouched. * feat(moonshot): advertise json_schema response support on live models (#29683) litellm.responses() already routes Moonshot through the responses->chat-completions bridge, and Moonshot honors response_format json_schema on chat completions. The cost-map entries left supports_response_schema unset, so discovery layers that gate on that flag dropped Moonshot from structured-output / responses listings even though the capability works end to end. Set supports_response_schema on the nine models currently live on api.moonshot.ai: kimi-k2.5, kimi-k2.6, the moonshot-v1 8k/32k/128k text and vision-preview variants, and moonshot-v1-auto. Verified against the live API that each honors json_schema and that litellm.responses() returns schema-valid structured output through the bridge. * chore(moonshot): mark models retired from api.moonshot.ai as deprecated (#29685) Thirteen Moonshot/Kimi models in the cost map no longer resolve on api.moonshot.ai (all return 404). 
Stamp each with its deprecation_date from platform.kimi.ai/docs/models rather than deleting the entries, so historical cost calculation keeps resolving the names while tooling can surface the retirement. Dates: kimi-thinking-preview 2025-11-11; kimi-latest and its 8k/32k/128k context variants 2026-01-28; the kimi-k2 preview/turbo/thinking series 2026-05-25; the moonshot-v1 -0430 snapshots use their own 2024-04-30 snapshot date (Moonshot publishes no discontinuation date for them). * fix(moonshot): drop temperature for reasoning models (kimi-k2.5/k2.6) (#29687) Kimi reasoning models reject every temperature except 1; a request with temperature=0.2 returns "invalid temperature: only 1 is allowed for this model". litellm only clamped temperature into [0.3, 1], so any value below 1 still 400'd. Drop the temperature param entirely for reasoning models (gated on supports_reasoning, the same signal transform_request already uses) so the model default is used; the non-reasoning moonshot-v1 models keep the existing clamp. Co-authored-by: Sameer Kankute <sameer@berri.ai> * feat(mcp): add per-server timeout configuration (#29672) * feat(mcp): add per-server timeout configuration * fix(mcp): address timeout field review comments - use is not None guard instead of or for 0.0 edge case - copy timeout in both LiteLLM_MCPServerTable constructions (health check path + _build_mcp_server_table) - add timeout Float? column to all three schema.prisma files - extend round-trip test to cover _build_mcp_server_table direction - add test for zero timeout not treated as falsy * fix(mcp): forward timeout in _build_temporary_mcp_server_record * fix(mcp): return 504 instead of 500 when per-server timeout fires * test(mcp): add 504 timeout regression test; fix black formatting * Add jp. Bedrock cross-region inference profile for claude-opus-4-7 (#28567) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. 
* feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. 
Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). * chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add jp. Bedrock cross-region inference profile for claude-opus-4-7 AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the existing us./eu./au./global. profiles for Claude Opus 4.7 (ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is missing from model_prices_and_context_window.json. Tokyo-region users currently get an "unknown model" error when routing through the JP geo profile. Adds the entry to both the canonical file and the bundled backup, mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches the other regional profiles (10% premium over base/global). Regression test pins all six documented profiles (base, global, us, eu, au, jp) and asserts pricing parity between jp. and au. variants. Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html --------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * feat(soniox): add soniox audio transcription integration (#29508) * feat(openmeter): add OPENMETER_TRUST_REQUEST_USER to prevent forged attribution (#29650) The OpenMeter callback resolves the CloudEvent subject from kwargs["user"] first, then falls back to the key-bound user_api_key_user_id. For multi-tenant proxy deployments, a client can set `"user": "..."` in the request body and cause their usage to be attributed to that arbitrary string — a billing-attribution forgery risk. Adds OPENMETER_TRUST_REQUEST_USER env var (default "true" for backward compatibility). 
When set to "false", the request-supplied `user` field is ignored and the subject is resolved solely from user_api_key_user_id. Matches the existing env-var-driven config pattern in this file (OPENMETER_API_KEY, OPENMETER_API_ENDPOINT, OPENMETER_EVENT_TYPE). * feat(search): add you_com as a search provider (#28370) * feat(search): add you_com as a search provider Registers You.com Search API as a first-class `search_provider` in the `search_tools` registry, alongside Tavily, Exa, Perplexity, etc. - New adapter: litellm/llms/you_com/search/transformation.py - POSTs to https://ydc-index.io/v1/search - Auth: X-API-Key from YOUCOM_API_KEY (or explicit api_key) - Maps Perplexity unified spec: max_results -> count, search_domain_filter -> include_domains, country -> country - Flattens results.web + results.news into a single SearchResult list; snippet prefers snippets[0], falls back to description; page_age -> date - Registry: SearchProviders.YOU_COM in litellm/types/utils.py and wired into ProviderConfigManager.get_provider_search_config() - Pricing entry: model_prices_and_context_window.json (placeholder $0.0; happy to adjust to maintainers' preferred public number) - Docs: example router config snippet and example proxy yaml updated - Tests: tests/search_tests/test_you_com_search.py - 5 mocked tests (payload shape, domain filter mapping, snippet fallback, news flattening, missing-api-key error) Refs upstream expansion signal: #15942 * review fixups: normalize api_base, lowercase country, scope env-var to test Addresses Greptile inline review comments on #28370: - get_complete_url: strip trailing slashes from api_base *before* the endswith("/v1/search") check, so a custom base like ".../v1/search/" doesn't become ".../v1/search/v1/search". - transform_search_request: .lower() country before sending, matching Tavily's convention so callers using the unified spec form ("US") get consistent behavior across providers. 
- Tests: replace direct os.environ writes with an autouse monkeypatch fixture so YOUCOM_API_KEY is set per-test and removed afterwards. The missing-key test now uses monkeypatch.delenv. New test asserts the trailing-slash normalization above. Reverts the ARCHITECTURE.md / example yaml edits per the reviewer note that documentation changes belong in the litellm-docs repo. * support keyless free tier (api.you.com/v1/agents/search) as default You.com offers an IP-throttled keyless endpoint that returns the same response shape as the keyed one (~100 queries/day, no signup). This is a significant onboarding lever - mirrors the keyless DuckDuckGo/SearXNG providers already in the search_tools registry. Behavior: - YOUCOM_API_KEY set -> keyed: POST https://ydc-index.io/v1/search (X-API-Key header) - no key -> free: POST https://api.you.com/v1/agents/search (no auth) - YOUCOM_API_BASE override -> honored as-is Tests: - New: test_you_com_search_keyless_free_tier - asserts URL + absence of X-API-Key when no key is configured. - New: test_you_com_search_validate_environment_keyless - asserts the config no longer raises when the key is absent. - Removed: test_you_com_search_raises_without_api_key (the precondition no longer holds). - Existing payload/domain-filter/etc tests still cover keyed mode via the autouse YOUCOM_API_KEY fixture. Verified both endpoints accept POST + return identical JSON shape: results.web[] / results.news[] with title, url, snippets, description, page_age. * register you_com in provider_endpoints_support.json Adding `litellm/llms/you_com/` requires a corresponding entry in provider_endpoints_support.json or the code-quality/check_provider_folders_documented CI check fails. Follows the compact tavily/serper pattern - endpoints: { search: true }. Local run of the check now reports "All 114 provider folders are documented". 
* move tests under tests/test_litellm/llms/ so CI exercises them The litellm CI workflows scope unit tests to `tests/test_litellm/...` (see test-unit-llm-providers.yml: `tests/test_litellm/llms` path), so tests living under `tests/search_tests/` are never run in CI - which is why codecov reports 0% patch coverage for the new adapter even though the unit tests exist and pass locally. Move test_you_com_search.py into `tests/test_litellm/llms/you_com/` so the test-unit-llm-providers job picks it up. 7/7 tests still pass at the new location. (Sibling search-only providers - tavily, exa_ai, brave, etc. - still live only in `tests/search_tests/` and would benefit from the same move, but that is out of scope for this PR.) * fix(you_com): pin Accept-Encoding: identity to dodge keyless gzip bug The keyless free-tier endpoint (api.you.com/v1/agents/search) advertises Content-Encoding: gzip but returns a body that httpx's decoder rejects with `zlib.error: Error -3 while decompressing data: incorrect header check`, surfacing as litellm.APIConnectionError in user code. curl works because it doesn't request compression by default. Pin Accept-Encoding: identity in validate_environment so the upstream server skips compression entirely. Harmless on the keyed endpoint (ydc-index.io/v1/search) which negotiates content-encoding correctly. The header uses setdefault so a caller-supplied Accept-Encoding still takes precedence. (Server-side bug has been flagged to the You.com team separately - once fixed there, this workaround can be removed.) New unit test: test_you_com_search_pins_identity_accept_encoding. --------- Co-authored-by: Sameer Kankute <sameer@berri.ai> * docs: fix README typo (#29419) Correct clear spelling mistakes in documentation without changing behavior. 
Confidence: high Scope-risk: narrow Tested: git diff --check; uvx codespell on changed files Not-tested: Full docs build not run; text-only changes * Fix(langfuse): pass httpx_client to Langfuse in langfuse_prompt_management to respect SSL_VERIFY (#29480) * fix(langfuse): pass ssl_verify to Langfuse httpx client * fix_langfuse_ * add unit tests * addressed comments --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * feat(models): add minimax/MiniMax-M3 to model cost map (#29412) Add MiniMax's new flagship MiniMax-M3 to the native minimax provider: 512K context, 128K max output, native multimodal (supports_vision), reasoning, prompt caching. Pricing (USD/M tokens): input 0.6 / output 2.4 / cache read 0.12. M3 has no active prompt-cache-write tier, so cache_creation_input_token_cost is omitted. Updated both the root model_prices_and_context_window.json (remote source) and the bundled litellm/model_prices_and_context_window_backup.json (local fallback), keeping them in sync. 
* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log (#29394) * fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log * fix(logging): extend terminal event handling to ResponseIncompleteEvent and ResponseFailedEvent; fix return type annotation * feat(provider): Add Neosantara provider as OpenAI Compatible (#29646) * Add Neosantara provider * Register Neosantara provider enum * Address Neosantara provider review feedback * Add Neosantara packaged endpoint support --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix: address greptile and veria review feedback - langfuse: guard httpx_client injection behind version check (>= 2.7.3) - soniox: propagate audio_transcription_duration in _hidden_params for spend tracking - soniox: give SONIOX_API_BASE env var priority over caller-supplied api_base - mcp: replace CancelledError catch with asyncio.wait_for + TimeoutError * chore(mcp): add migration for per-server timeout column * fix(test): add tool_use_system_prompt_tokens to model prices schema validator * fix: mcp timeout test uses real asyncio.wait_for timeout; you_com get_complete_url respects resolved api_key * fix: forward resolved api_key into you_com endpoint selection and apply timeout to soniox polling GETs The search flow resolves api_key in validate_environment but never passed it into get_complete_url, so a programmatic api_key (with no YOUCOM_API_KEY in the env) set the X-API-Key header yet still selected the keyless free-tier endpoint. Forward api_key through both the search entrypoint and the http handler so the keyed endpoint is chosen. HTTPHandler.get/AsyncHTTPHandler.get had no timeout parameter, so the Soniox poll and transcript-fetch GETs silently used the client global default instead of the caller timeout. Add a per-request timeout to get() and forward the configured timeout from the Soniox handler. 
* fix(soniox): price stt-async-v4 per second so transcriptions are billed The handler stores audio_transcription_duration in _hidden_params, but the model carried only token cost fields and the response has no token usage, so the transcription cost path fell through to cost_per_second and returned $0. An authenticated caller could transcribe Soniox audio without decrementing their budget. Switch the entry to output_cost_per_second at Soniox's published $0.10/hour async rate so the stored duration produces a real charge. * fix(langfuse): use a dedicated httpx client for the SDK injection The httpx_client handed to the Langfuse SDK came from _get_httpx_client(), which returns LiteLLM's globally cached HTTPHandler. If Langfuse closed that client on teardown it would invalidate the shared client used by every other LiteLLM HTTP call. Build a dedicated httpx.Client instead, still resolving SSL verification and client certificate from LiteLLM's configuration. * fix(soniox): prefer caller-supplied api_base over SONIOX_API_BASE env var * fix(cohere): support max_completion_tokens on cohere v2 chat (default route) (#29779) * fix(cohere): support max_completion_tokens on cohere v2 chat The default cohere_chat route resolves to CohereV2ChatConfig, which did not list or map max_completion_tokens, so get_optional_params raised UnsupportedParamsError for the standard OpenAI parameter (the modern replacement for the deprecated max_tokens). The v1 config already maps it to cohere's max_tokens; mirror that in v2 and add v2 regression tests. * fix(cohere): make max_completion_tokens take precedence over max_tokens on v2 When both max_tokens and max_completion_tokens are supplied, prefer max_completion_tokens explicitly rather than relying on dict iteration order, and cover both orderings with a regression test. 
--------- Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com> Co-authored-by: hectorc98 <hector.chamorroalvarez@adyen.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Dan Lemon <dan@danlemon.com> Co-authored-by: Saswat <saswatds@users.noreply.github.com> Co-authored-by: Brian Sparker <brainsparker@users.noreply.github.com> Co-authored-by: Zhao73 <156770117+Zhao73@users.noreply.github.com> Co-authored-by: Urain Ahmad Shah <60431964+urainshah@users.noreply.github.com> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: kape <168134658+kapelame@users.noreply.github.com> Co-authored-by: danisalvaa <159898202+danisalvaa@users.noreply.github.com> Co-authored-by: Just R <remixingmagelang@gmail.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>

shin-berri and others added 9 commits May 13, 2026 22:37

Merge pull request BerriAI#27906 from BerriAI/litellm_internal_staging

e58a561

[Infra] Promote internal staging to main

Merge pull request BerriAI#28100 from BerriAI/litellm_internal_staging

a72414a

[Infra] Promote internal staging to main

Merge pull request BerriAI#28292 from BerriAI/litellm_internal_staging

79b4578

chore(ci): promote internal staging to main

Merge pull request BerriAI#28680 from BerriAI/litellm_internal_staging

35f6961

chore(ci): promote internal staging to main

Merge pull request BerriAI#28709 from BerriAI/litellm_internal_staging

06f6cfc

chore(ci): promote internal staging to main

Merge pull request BerriAI#29243 from BerriAI/litellm_internal_staging

a021a5b

chore(ci): promote internal staging to main

Merge pull request BerriAI#29372 from BerriAI/litellm_internal_staging

5be0797

chore(ci): promote internal staging to main

Add Neosantara provider

acec6bc

Register Neosantara provider enum

ad947e1

greptile-apps Bot reviewed Jun 4, 2026


Comment thread docs/my-website/docs/providers/neosantara.md Outdated

Comment thread litellm/llms/openai_like/providers.json

chatgpt-codex-connector Bot reviewed Jun 4, 2026


Address Neosantara provider review feedback

d8b4b80

Sameerlite approved these changes Jun 5, 2026


Sameerlite changed the base branch from main to litellm_oss_staging_050626 June 5, 2026 10:21

Merge branch 'litellm_oss_staging_050626' of https://github.com/Berri…

9b6ad57

…AI/litellm into litellm_neosantara_provider # Conflicts: # litellm/llms/openai_like/providers.json

Add Neosantara packaged endpoint support

d2a9290

Sameerlite merged commit a419c7b into BerriAI:litellm_oss_staging_050626 Jun 5, 2026
44 of 45 checks passed

abhicris mentioned this pull request Jun 11, 2026

feat(provider): Add Hanzo provider as OpenAI Compatible #30197

Closed

6 tasks

ErRickow mentioned this pull request Jun 14, 2026

docs(neosantara): add provider documentation BerriAI/litellm-docs#309

Open
