Add soniox provider #29508

Merged
Sameerlite merged 1 commit into BerriAI:litellm_oss_staging_050626 from dan2k3k4:add-soniox-provider
Jun 5, 2026

Conversation

@dan2k3k4

@dan2k3k4 dan2k3k4 commented Jun 2, 2026

Contributor

Relevant issues

Fixes #29507

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added meaningful tests
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

image image

I have a proof-of-concept tool that transcribes audio to SRT and, if needed, translates the result into another language. Passing response_format=srt in the request should return the transcript in subtitle format:

image

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

@dan2k3k4

dan2k3k4 commented Jun 2, 2026

Contributor Author

@greptileai

@codecov

codecov Bot commented Jun 2, 2026


@greptile-apps

greptile-apps Bot commented Jun 2, 2026

Contributor

Greptile Summary

This PR adds Soniox as a new LiteLLM provider, supporting async speech-to-text transcription via Soniox's multi-step REST API (upload → create → poll → fetch → cleanup). The integration is well-structured: it follows established provider patterns, uses litellm's shared HTTP clients on both sync and async paths (including ssl_verify propagation), correctly passes model_response through to response building, and ships comprehensive mock-only unit tests.

  • New handler (handler.py) orchestrates the polling pipeline with server-side-clamped poll intervals and best-effort cleanup; response transformation supports plain text, SRT, VTT, and verbose JSON formats synthesized from Soniox token timestamps.
  • Provider is fully registered across LlmProviders enum, models_by_provider, lazy-import registry, model_prices_and_context_window.json, provider_endpoints_support.json, and the proxy UI credentials form.
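The upload → create → poll → fetch → cleanup flow summarized above can be sketched roughly as follows. This is an illustration only, not the handler from this PR: the `TranscriptionClient` protocol, its method names, the status strings, and the `InMemoryClient` fake are all hypothetical, chosen so the sketch runs without network access.

```python
import time
from typing import Callable, List, Optional, Protocol, Tuple


class TranscriptionClient(Protocol):
    """Hypothetical client interface; method names are assumptions."""

    def upload(self, audio: bytes) -> str: ...
    def create_job(self, file_id: str) -> str: ...
    def job_status(self, job_id: str) -> str: ...
    def fetch_transcript(self, job_id: str) -> str: ...
    def delete_job(self, job_id: str) -> None: ...
    def delete_file(self, file_id: str) -> None: ...


def transcribe(
    client: TranscriptionClient,
    audio: bytes,
    poll_interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run upload -> create -> poll -> fetch, then clean up in all cases."""
    file_id = client.upload(audio)
    job_id: Optional[str] = None
    try:
        job_id = client.create_job(file_id)
        for _ in range(max_polls):
            status = client.job_status(job_id)  # status names are assumed
            if status == "completed":
                return client.fetch_transcript(job_id)
            if status == "error":
                raise RuntimeError(f"transcription job {job_id} failed")
            sleep(poll_interval)
        raise TimeoutError(f"transcription job {job_id} did not finish")
    finally:
        # Best-effort cleanup: a cleanup failure must not mask the real outcome.
        for cleanup, resource in ((client.delete_job, job_id), (client.delete_file, file_id)):
            if resource is None:
                continue
            try:
                cleanup(resource)
            except Exception:
                pass


class InMemoryClient:
    """Fake client used only to exercise the sketch without network calls."""

    def __init__(self) -> None:
        self.statuses: List[str] = ["queued", "processing", "completed"]
        self.deleted: List[Tuple[str, str]] = []

    def upload(self, audio: bytes) -> str:
        return "file-1"

    def create_job(self, file_id: str) -> str:
        return "job-1"

    def job_status(self, job_id: str) -> str:
        return self.statuses.pop(0)

    def fetch_transcript(self, job_id: str) -> str:
        return "hello"

    def delete_job(self, job_id: str) -> None:
        self.deleted.append(("job", job_id))

    def delete_file(self, file_id: str) -> None:
        self.deleted.append(("file", file_id))


client = InMemoryClient()
text = transcribe(client, b"fake-audio", sleep=lambda _seconds: None)
```

Putting cleanup in a `finally` block means the uploaded file and the job are deleted whether the job completes, fails, or times out, which matches the "best-effort cleanup" wording in the summary.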

Confidence Score: 5/5

New provider addition with no modifications to existing behaviour; safe to merge.

The change is purely additive — all existing code paths are untouched. The new Soniox handler correctly uses litellm's shared HTTP clients, propagates ssl_verify on both sync and async paths, and passes model_response through to response construction. Tests are mock-only and cover the full polling lifecycle. The only minor issue is a missing defensive guard on the poll_interval float conversion, which has no impact on correctness for valid inputs.
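One way the missing defensive guard on the poll_interval float conversion could look is sketched below. The function name, the default, and the clamp bounds are hypothetical; the PR's actual constants are not shown in this thread.

```python
import math

# Hypothetical bounds and default, in seconds; not taken from the PR.
MIN_POLL_INTERVAL_S = 0.5
MAX_POLL_INTERVAL_S = 30.0
DEFAULT_POLL_INTERVAL_S = 1.0


def resolve_poll_interval(raw: object) -> float:
    """Coerce a caller-supplied poll interval to float and clamp it."""
    try:
        value = float(raw)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        # None, non-numeric strings, and other junk fall back to the default.
        return DEFAULT_POLL_INTERVAL_S
    if math.isnan(value):
        return DEFAULT_POLL_INTERVAL_S
    return min(max(value, MIN_POLL_INTERVAL_S), MAX_POLL_INTERVAL_S)
```

With this shape, invalid input degrades to the default instead of raising inside the polling loop, while valid input behaves exactly as before.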

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/llms/soniox/audio_transcription/handler.py | New handler orchestrating Soniox's multi-step async transcription (upload → create → poll → fetch → cleanup) on both sync and async paths; both paths correctly use litellm's shared HTTP clients with ssl_verify propagation. |
| litellm/llms/soniox/audio_transcription/transformation.py | Transformation config mapping OpenAI params to Soniox-native params; handles language, response_format (srt/vtt/verbose_json), and webhook secret redaction correctly. |
| litellm/llms/soniox/common_utils.py | Shared utilities: SRT/VTT subtitle rendering from token timestamps, API key/base resolution, poll-interval clamping constants; all well-bounded. |
| litellm/main.py | Adds soniox dispatch branch in transcription() following the same pattern as nvidia_riva; type-ignore on provider_config is intentional and safe at runtime. |
| litellm/__init__.py | Registers soniox_models set, adds it to model_list and models_by_provider; consistent with all other provider registrations. |
| litellm/types/utils.py | Adds SONIOX to LlmProviders enum; single-line, correct placement. |
| model_prices_and_context_window.json | Adds soniox/stt-async-v4 model entry with pricing, mode, and supported endpoints; no unrelated modifications in this version of the diff. |
| tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_handler.py | Comprehensive mock-based tests covering sync/async flows, upload, polling, cleanup, edge cases; no real network calls. |
| tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_transformation.py | Tests transformation config including param mapping, response building, SRT/VTT rendering, and error handling; good coverage. |
| tests/test_litellm/llms/soniox/test_soniox_provider_registration.py | Tests provider enum registration, model list inclusion, lazy import, and get_llm_provider resolution; all mock-based. |
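
The SRT rendering from token timestamps mentioned for common_utils.py might work along the lines of the sketch below. The token shape (`text`, `start_ms`, `end_ms`), the function names, and the rule for grouping tokens into cues are assumptions for illustration; the PR's real helpers are not reproduced here.

```python
from typing import Dict, List, Union

# Assumed token shape: {"text": str, "start_ms": int, "end_ms": int}
Token = Dict[str, Union[str, int]]


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def tokens_to_srt(tokens: List[Token], max_cue_ms: int = 3_000) -> str:
    """Group tokens into cues no longer than max_cue_ms and render SRT."""
    cues: List[List[Token]] = []
    for token in tokens:
        # Extend the current cue while it stays within the duration budget.
        if cues and int(token["end_ms"]) - int(cues[-1][0]["start_ms"]) <= max_cue_ms:
            cues[-1].append(token)
        else:
            cues.append([token])

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = srt_timestamp(int(cue[0]["start_ms"]))
        end = srt_timestamp(int(cue[-1]["end_ms"]))
        text = "".join(str(t["text"]) for t in cue).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + ("\n" if blocks else "")


sample_tokens: List[Token] = [
    {"text": "Hello", "start_ms": 0, "end_ms": 500},
    {"text": " world", "start_ms": 500, "end_ms": 1200},
    {"text": " Again", "start_ms": 4000, "end_ms": 4500},
]
srt_text = tokens_to_srt(sample_tokens)
```

A VTT renderer would follow the same grouping but start with a `WEBVTT` header line and use `.` instead of `,` before the milliseconds.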

Reviews (6): Last reviewed commit: "feat(soniox): add soniox audio transcrip..."

Comment thread litellm/llms/soniox/README.md Outdated
Comment thread provider_endpoints_support.json
Comment thread litellm/llms/soniox/audio_transcription/handler.py Outdated
Comment thread litellm/llms/soniox/audio_transcription/handler.py
@greptile-apps

greptile-apps Bot commented Jun 2, 2026

Contributor

Greptile Summary

This PR adds Soniox as a new audio transcription provider, implementing a multi-step async pipeline (file upload → create job → poll → fetch transcript → cleanup) behind LiteLLM's standard litellm.transcription() interface. The Soniox-specific code is well-structured, uses existing LiteLLM HTTP handlers, and is covered by comprehensive mock-only tests.

  • The core Soniox provider files (handler.py, transformation.py, common_utils.py) are clean and follow established LiteLLM patterns; the litellm/main.py dispatch and provider registration hooks are correct.
  • Both model_prices_and_context_window.json and its backup have unintended side-effects from JSON reformatting: "supports_reasoning": true was silently removed from bedrock/us-east-1/minimax.minimax-m2.5, bedrock/us-west-2/minimax.minimax-m2.5, and minimax.minimax-m2.5 (bedrock_converse), which would break any caller relying on reasoning for those models.
  • provider_endpoints_support.json also lost the a2a and interactions fields from the charity_engine entry as a formatting side-effect.

Confidence Score: 3/5

Safe to merge for Soniox functionality, but the JSON reformatting removed supports_reasoning from existing bedrock minimax models, which is a regression for those users.

The Soniox provider implementation itself is solid. The concern is in the JSON files: supports_reasoning: true was dropped from three existing bedrock/minimax model entries as a side-effect of reformatting, and two endpoint fields were removed from charity_engine. These unrelated changes could silently break users who rely on reasoning with those models.

model_prices_and_context_window.json and litellm/model_prices_and_context_window_backup.json — the minimax model entries need supports_reasoning restored; provider_endpoints_support.json needs the charity_engine a2a/interactions fields restored.
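
A regression of this kind (fields silently dropped from existing entries while a new entry is added) can be caught mechanically by diffing the parsed JSON before and after the change. The sketch below is a hypothetical check, not part of LiteLLM's CI; the function name and the sample data are made up for illustration.

```python
import json
from typing import Any, Dict, List


def find_dropped_fields(before: Dict[str, Any], after: Dict[str, Any]) -> List[str]:
    """Return entries, or 'entry.field' pairs, present in `before` but missing from `after`."""
    dropped: List[str] = []
    for name, old_entry in before.items():
        if name not in after:
            dropped.append(name)  # the whole entry disappeared
            continue
        new_entry = after[name]
        if not isinstance(old_entry, dict) or not isinstance(new_entry, dict):
            continue
        for field in old_entry:
            if field not in new_entry:
                dropped.append(f"{name}.{field}")
    return sorted(dropped)


# Illustrative data only; real entries carry many more fields.
before = json.loads('{"minimax.minimax-m2.5": {"mode": "chat", "supports_reasoning": true}}')
after = json.loads(
    '{"minimax.minimax-m2.5": {"mode": "chat"}, "soniox/stt-async-v4": {"litellm_provider": "soniox"}}'
)
dropped = find_dropped_fields(before, after)
print(dropped)  # ['minimax.minimax-m2.5.supports_reasoning']
```

Purely additive changes, such as the new soniox/stt-async-v4 entry, produce an empty result, so a non-empty list is a useful signal that a formatter or merge touched unrelated entries.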

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/llms/soniox/audio_transcription/handler.py | New handler orchestrating Soniox's multi-step async transcription flow (upload → create → poll → fetch → cleanup) for both sync and async paths; uses existing LiteLLM HTTP handlers correctly and applies cleanup in finally blocks. |
| litellm/llms/soniox/audio_transcription/transformation.py | New transformation config mapping OpenAI params to Soniox equivalents and building TranscriptionResponse from Soniox payloads; logic is sound. |
| litellm/llms/soniox/common_utils.py | Shared constants, exception class, and utility helpers for Soniox; straightforward and correct. |
| litellm/main.py | Adds Soniox dispatch branch to transcription(); follows the same pattern as the nvidia_riva/elevenlabs branches. |
| model_prices_and_context_window.json | Adds soniox/stt-async-v4 entry, but JSON reformatting also silently removes supports_reasoning: true from bedrock/us-east-1/minimax.minimax-m2.5, bedrock/us-west-2/minimax.minimax-m2.5, and minimax.minimax-m2.5 (bedrock_converse), a likely regression. |
| litellm/model_prices_and_context_window_backup.json | Mirrors the main pricing JSON changes, including the same unintended supports_reasoning removals for minimax models. |
| provider_endpoints_support.json | Adds soniox entry correctly, but also removes a2a and interactions fields from the charity_engine entry as a formatting side-effect. |
| litellm/llms/soniox/README.md | User-facing documentation added inside the litellm package; per contribution rules this should live in the litellm-docs repo instead. |
| litellm/__init__.py | Registers soniox_models set and adds it to model_list/models_by_provider; follows existing patterns correctly. |
| litellm/litellm_core_utils/get_llm_provider_logic.py | Adds soniox branch to resolve api_base and api_key from env vars; consistent with other provider branches. |
| tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_handler.py | Comprehensive mock-only tests covering upload, poll, cleanup, error, and async flows; no real network calls. |
| tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_transformation.py | Unit tests for the transformation config covering param mapping, response building, and token rendering. |

Reviews (2): Last reviewed commit: "Merge branch 'BerriAI:litellm_internal_s..."

Comment thread model_prices_and_context_window.json
Comment thread litellm/llms/soniox/README.md Outdated
Comment thread provider_endpoints_support.json Outdated
@dan2k3k4 dan2k3k4 requested a review from a team June 2, 2026 16:23
@dan2k3k4 dan2k3k4 force-pushed the add-soniox-provider branch 2 times, most recently from aa05908 to 7127e01 Compare June 2, 2026 16:30
@dan2k3k4

dan2k3k4 commented Jun 2, 2026

Contributor Author

@greptileai

Comment thread litellm/llms/soniox/audio_transcription/handler.py
Comment thread litellm/llms/soniox/audio_transcription/handler.py Outdated
Comment thread litellm/llms/soniox/audio_transcription/handler.py Outdated
@veria-ai

veria-ai Bot commented Jun 2, 2026

Contributor

PR overview

All previously flagged issues have been addressed. No open security concerns remain on this pull request.

Security review

No open security issues remain on this pull request.

Fixed/addressed: 3 · PR risk: 0/10

@dan2k3k4 dan2k3k4 force-pushed the add-soniox-provider branch 2 times, most recently from b33a5df to 175d129 Compare June 3, 2026 10:57
Comment thread litellm/llms/soniox/audio_transcription/handler.py Outdated
@dan2k3k4 dan2k3k4 force-pushed the add-soniox-provider branch 4 times, most recently from e8a41c9 to c276b12 Compare June 3, 2026 13:37
@dan2k3k4

dan2k3k4 commented Jun 3, 2026

Contributor Author

@greptileai

@dan2k3k4 dan2k3k4 force-pushed the add-soniox-provider branch 3 times, most recently from d9ec779 to 480958b Compare June 4, 2026 19:04
@dan2k3k4 dan2k3k4 requested a review from Sameerlite June 4, 2026 21:17
@Sameerlite

Collaborator

@greptileai

@Sameerlite

Collaborator

> The concern is in the JSON files: supports_reasoning: true was dropped from three existing bedrock/minimax model entries as a side-effect of reformatting, and two endpoint fields were removed from charity_engine. These unrelated changes could silently break users who rely on reasoning with those models.

Please fix this

@dan2k3k4 dan2k3k4 force-pushed the add-soniox-provider branch from 480958b to 0b51950 Compare June 5, 2026 09:29
@dan2k3k4

dan2k3k4 commented Jun 5, 2026

Contributor Author

> The concern is in the JSON files: supports_reasoning: true was dropped from three existing bedrock/minimax model entries as a side-effect of reformatting, and two endpoint fields were removed from charity_engine. These unrelated changes could silently break users who rely on reasoning with those models.
>
> Please fix this

Added a fix, squashed my commits into one and force pushed

@Sameerlite

Collaborator

@greptileai

@Sameerlite Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_staging_050626 June 5, 2026 10:28
@Sameerlite Sameerlite merged commit 38f2660 into BerriAI:litellm_oss_staging_050626 Jun 5, 2026
70 checks passed
@dan2k3k4 dan2k3k4 deleted the add-soniox-provider branch June 5, 2026 11:21
mateo-berri added a commit that referenced this pull request Jun 5, 2026
* Mark xAI models retiring on 2026-05-15 (#28788)

Per https://docs.x.ai/developers/migration/may-15-retirement, xAI is
retiring the following slugs on 2026-05-15 (auto-redirect to grok-4.3
with various reasoning efforts; callers continuing to use the old slugs
will be billed at grok-4.3 pricing):

  grok-4-1-fast-reasoning{,-latest}      -> grok-4.3 (low effort)
  grok-4-1-fast-non-reasoning{,-latest}  -> grok-4.3 (none)
  grok-4-fast-reasoning                  -> grok-4.3 (low effort)
  grok-4-fast-non-reasoning              -> grok-4.3 (none)
  grok-4-0709                            -> grok-4.3 (low effort)
  grok-code-fast-1{,-0825}               -> grok-build-0.1
  grok-3                                 -> grok-4.3 (none)

Only the direct xai/ slugs are tagged; third-party hosts (azure_ai,
oci, vercel_ai_gateway, perplexity/xai) run their own schedules. The
grok-3 retirement list explicitly names only the base grok-3 slug — the
-mini / -fast / -beta / -latest variants are not listed, so they remain
untouched.

* feat(moonshot): advertise json_schema response support on live models (#29683)

litellm.responses() already routes Moonshot through the responses->chat-completions
bridge, and Moonshot honors response_format json_schema on chat completions. The
cost-map entries left supports_response_schema unset, so discovery layers that gate
on that flag dropped Moonshot from structured-output / responses listings even though
the capability works end to end.

Set supports_response_schema on the nine models currently live on api.moonshot.ai:
kimi-k2.5, kimi-k2.6, the moonshot-v1 8k/32k/128k text and vision-preview variants,
and moonshot-v1-auto. Verified against the live API that each honors json_schema and
that litellm.responses() returns schema-valid structured output through the bridge.

* chore(moonshot): mark models retired from api.moonshot.ai as deprecated (#29685)

Thirteen Moonshot/Kimi models in the cost map no longer resolve on
api.moonshot.ai (all return 404). Stamp each with its deprecation_date from
platform.kimi.ai/docs/models rather than deleting the entries, so historical
cost calculation keeps resolving the names while tooling can surface the
retirement.

Dates: kimi-thinking-preview 2025-11-11; kimi-latest and its 8k/32k/128k context
variants 2026-01-28; the kimi-k2 preview/turbo/thinking series 2026-05-25; the
moonshot-v1 -0430 snapshots use their own 2024-04-30 snapshot date (Moonshot
publishes no discontinuation date for them).

* fix(moonshot): drop temperature for reasoning models (kimi-k2.5/k2.6) (#29687)

Kimi reasoning models reject every temperature except 1; a request with
temperature=0.2 returns "invalid temperature: only 1 is allowed for this model".
litellm only clamped temperature into [0.3, 1], so any value below 1 still 400'd.

Drop the temperature param entirely for reasoning models (gated on
supports_reasoning, the same signal transform_request already uses) so the model
default is used; the non-reasoning moonshot-v1 models keep the existing clamp.
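
The behavior described in this commit (drop temperature for reasoning models, keep the [0.3, 1] clamp otherwise) could be expressed roughly as below. The function name and signature are hypothetical; only the clamp range and the supports_reasoning gate come from the commit message.

```python
from typing import Any, Dict


def adjust_temperature(params: Dict[str, Any], supports_reasoning: bool) -> Dict[str, Any]:
    """Return a copy of params with temperature dropped or clamped."""
    adjusted = dict(params)
    if "temperature" not in adjusted:
        return adjusted
    if supports_reasoning:
        # Reasoning models accept only their default, so send nothing.
        adjusted.pop("temperature")
    else:
        adjusted["temperature"] = min(max(adjusted["temperature"], 0.3), 1.0)
    return adjusted
```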

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(mcp): add per-server timeout configuration (#29672)

* feat(mcp): add per-server timeout configuration

* fix(mcp): address timeout field review comments

- use is not None guard instead of or for 0.0 edge case
- copy timeout in both LiteLLM_MCPServerTable constructions (health check path + _build_mcp_server_table)
- add timeout Float? column to all three schema.prisma files
- extend round-trip test to cover _build_mcp_server_table direction
- add test for zero timeout not treated as falsy

* fix(mcp): forward timeout in _build_temporary_mcp_server_record

* fix(mcp): return 504 instead of 500 when per-server timeout fires

* test(mcp): add 504 timeout regression test; fix black formatting
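
The 0.0 edge case behind the "is not None guard instead of or" item can be shown in a few lines. The default value and function names here are hypothetical and only illustrate the Python truthiness pitfall.

```python
from typing import Optional

DEFAULT_TIMEOUT_S = 30.0  # hypothetical default, not taken from the PR


def resolve_timeout_with_or(server_timeout: Optional[float]) -> float:
    # Pitfall: 0.0 is falsy, so an explicit zero is replaced by the default.
    return server_timeout or DEFAULT_TIMEOUT_S


def resolve_timeout(server_timeout: Optional[float]) -> float:
    # Only a missing value (None) falls back to the default.
    return server_timeout if server_timeout is not None else DEFAULT_TIMEOUT_S
```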

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7 (#28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the
existing us./eu./au./global. profiles for Claude Opus 4.7
(ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is
missing from model_prices_and_context_window.json. Tokyo-region
users currently get an "unknown model" error when routing through
the JP geo profile.

Adds the entry to both the canonical file and the bundled backup,
mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches
the other regional profiles (10% premium over base/global).

Regression test pins all six documented profiles (base, global, us, eu,
au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(soniox): add soniox audio transcription integration (#29508)

* feat(openmeter): add OPENMETER_TRUST_REQUEST_USER to prevent forged attribution (#29650)

The OpenMeter callback resolves the CloudEvent subject from kwargs["user"]
first, then falls back to the key-bound user_api_key_user_id. For
multi-tenant proxy deployments, a client can set `"user": "..."` in the
request body and cause their usage to be attributed to that arbitrary
string — a billing-attribution forgery risk.

Adds OPENMETER_TRUST_REQUEST_USER env var (default "true" for backward
compatibility). When set to "false", the request-supplied `user` field is
ignored and the subject is resolved solely from user_api_key_user_id.

Matches the existing env-var-driven config pattern in this file
(OPENMETER_API_KEY, OPENMETER_API_ENDPOINT, OPENMETER_EVENT_TYPE).
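
A minimal sketch of the subject resolution described here, assuming the environment is passed in as a mapping for testability. The function name and the exact parsing of the flag (case-insensitive comparison against "false") are assumptions; only the variable name, its default, and the fallback order come from the commit message.

```python
from typing import Any, Mapping, Optional


def resolve_subject(
    kwargs: Mapping[str, Any],
    key_user_id: Optional[str],
    env: Mapping[str, str],
) -> Optional[str]:
    """Pick the usage subject, optionally ignoring the request-supplied user."""
    flag = env.get("OPENMETER_TRUST_REQUEST_USER", "true")
    trust_request_user = flag.strip().lower() != "false"
    if trust_request_user and kwargs.get("user"):
        return str(kwargs["user"])
    # Fall back to the user bound to the API key.
    return key_user_id
```

In a real callback the mapping would presumably be `os.environ`; passing it explicitly keeps the sketch free of global state.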

* feat(search): add you_com as a search provider (#28370)

* feat(search): add you_com as a search provider

Registers You.com Search API as a first-class `search_provider` in the
`search_tools` registry, alongside Tavily, Exa, Perplexity, etc.

- New adapter: litellm/llms/you_com/search/transformation.py
  - POSTs to https://ydc-index.io/v1/search
  - Auth: X-API-Key from YOUCOM_API_KEY (or explicit api_key)
  - Maps Perplexity unified spec: max_results -> count,
    search_domain_filter -> include_domains, country -> country
  - Flattens results.web + results.news into a single SearchResult list;
    snippet prefers snippets[0], falls back to description; page_age -> date
- Registry: SearchProviders.YOU_COM in litellm/types/utils.py and wired
  into ProviderConfigManager.get_provider_search_config()
- Pricing entry: model_prices_and_context_window.json (placeholder $0.0;
  happy to adjust to maintainers' preferred public number)
- Docs: example router config snippet and example proxy yaml updated
- Tests: tests/search_tests/test_you_com_search.py - 5 mocked tests
  (payload shape, domain filter mapping, snippet fallback, news flattening,
  missing-api-key error)

Refs upstream expansion signal: #15942

* review fixups: normalize api_base, lowercase country, scope env-var to test

Addresses Greptile inline review comments on #28370:

- get_complete_url: strip trailing slashes from api_base *before* the
  endswith("/v1/search") check, so a custom base like ".../v1/search/"
  doesn't become ".../v1/search/v1/search".
- transform_search_request: .lower() country before sending, matching
  Tavily's convention so callers using the unified spec form ("US") get
  consistent behavior across providers.
- Tests: replace direct os.environ writes with an autouse monkeypatch
  fixture so YOUCOM_API_KEY is set per-test and removed afterwards.
  The missing-key test now uses monkeypatch.delenv. New test asserts the
  trailing-slash normalization above.

Reverts the ARCHITECTURE.md / example yaml edits per the reviewer note
that documentation changes belong in the litellm-docs repo.
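
The trailing-slash normalization from the get_complete_url item above amounts to stripping slashes before the suffix check. The helper below is a hypothetical stand-in for that logic, not the adapter's actual method.

```python
SEARCH_PATH = "/v1/search"


def complete_url(api_base: str) -> str:
    """Append the search path unless the base already ends with it."""
    base = api_base.rstrip("/")  # normalize first, then test the suffix
    if base.endswith(SEARCH_PATH):
        return base
    return base + SEARCH_PATH
```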

* support keyless free tier (api.you.com/v1/agents/search) as default

You.com offers an IP-throttled keyless endpoint that returns the same
response shape as the keyed one (~100 queries/day, no signup). This is a
significant onboarding lever - mirrors the keyless DuckDuckGo/SearXNG
providers already in the search_tools registry.

Behavior:
- YOUCOM_API_KEY set        -> keyed:  POST https://ydc-index.io/v1/search
                                       (X-API-Key header)
- no key                    -> free:   POST https://api.you.com/v1/agents/search
                                       (no auth)
- YOUCOM_API_BASE override  -> honored as-is

Tests:
- New: test_you_com_search_keyless_free_tier - asserts URL + absence of
  X-API-Key when no key is configured.
- New: test_you_com_search_validate_environment_keyless - asserts the
  config no longer raises when the key is absent.
- Removed: test_you_com_search_raises_without_api_key (the precondition
  no longer holds).
- Existing payload/domain-filter/etc tests still cover keyed mode via
  the autouse YOUCOM_API_KEY fixture.

Verified both endpoints accept POST + return identical JSON shape:
  results.web[] / results.news[] with title, url, snippets, description,
  page_age.

* register you_com in provider_endpoints_support.json

Adding `litellm/llms/you_com/` requires a corresponding entry in
provider_endpoints_support.json or the
code-quality/check_provider_folders_documented CI check fails.

Follows the compact tavily/serper pattern - endpoints: { search: true }.
Local run of the check now reports "All 114 provider folders are documented".

* move tests under tests/test_litellm/llms/ so CI exercises them

The litellm CI workflows scope unit tests to `tests/test_litellm/...`
(see test-unit-llm-providers.yml: `tests/test_litellm/llms` path), so
tests living under `tests/search_tests/` are never run in CI - which is
why codecov reports 0% patch coverage for the new adapter even though
the unit tests exist and pass locally.

Move test_you_com_search.py into `tests/test_litellm/llms/you_com/` so
the test-unit-llm-providers job picks it up. 7/7 tests still pass at
the new location.

(Sibling search-only providers - tavily, exa_ai, brave, etc. - still
live only in `tests/search_tests/` and would benefit from the same
move, but that is out of scope for this PR.)

* fix(you_com): pin Accept-Encoding: identity to dodge keyless gzip bug

The keyless free-tier endpoint (api.you.com/v1/agents/search) advertises
Content-Encoding: gzip but returns a body that httpx's decoder rejects
with `zlib.error: Error -3 while decompressing data: incorrect header
check`, surfacing as litellm.APIConnectionError in user code. curl works
because it doesn't request compression by default.

Pin Accept-Encoding: identity in validate_environment so the upstream
server skips compression entirely. Harmless on the keyed endpoint
(ydc-index.io/v1/search) which negotiates content-encoding correctly.

The header uses setdefault so a caller-supplied Accept-Encoding still
takes precedence. (Server-side bug has been flagged to the You.com team
separately - once fixed there, this workaround can be removed.)

New unit test: test_you_com_search_pins_identity_accept_encoding.

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* docs: fix README typo (#29419)

Correct clear spelling mistakes in documentation without changing behavior.

Confidence: high
Scope-risk: narrow
Tested: git diff --check; uvx codespell on changed files
Not-tested: Full docs build not run; text-only changes

* Fix(langfuse): pass httpx_client to Langfuse in langfuse_prompt_management to respect SSL_VERIFY (#29480)

* fix(langfuse): pass ssl_verify to Langfuse httpx client

* fix_langfuse_

* add unit tests

* addressed comments

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(models): add minimax/MiniMax-M3 to model cost map (#29412)

Add MiniMax's new flagship MiniMax-M3 to the native minimax provider:
512K context, 128K max output, native multimodal (supports_vision),
reasoning, prompt caching. Pricing (USD/M tokens): input 0.6 / output
2.4 / cache read 0.12. M3 has no active prompt-cache-write tier, so
cache_creation_input_token_cost is omitted.

Updated both the root model_prices_and_context_window.json (remote
source) and the bundled litellm/model_prices_and_context_window_backup.json
(local fallback), keeping them in sync.

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log (#29394)

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log

* fix(logging): extend terminal event handling to ResponseIncompleteEvent and ResponseFailedEvent; fix return type annotation

* feat(provider): Add Neosantara provider as OpenAI Compatible (#29646)

* Add Neosantara provider

* Register Neosantara provider enum

* Address Neosantara provider review feedback

* Add Neosantara packaged endpoint support

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix: address greptile and veria review feedback

- langfuse: guard httpx_client injection behind version check (>= 2.7.3)
- soniox: propagate audio_transcription_duration in _hidden_params for spend tracking
- soniox: give SONIOX_API_BASE env var priority over caller-supplied api_base
- mcp: replace CancelledError catch with asyncio.wait_for + TimeoutError

* chore(mcp): add migration for per-server timeout column

* fix(test): add tool_use_system_prompt_tokens to model prices schema validator

* fix: mcp timeout test uses real asyncio.wait_for timeout; you_com get_complete_url respects resolved api_key

* fix: forward resolved api_key into you_com endpoint selection and apply timeout to soniox polling GETs

The search flow resolves api_key in validate_environment but never passed it
into get_complete_url, so a programmatic api_key (with no YOUCOM_API_KEY in the
env) set the X-API-Key header yet still selected the keyless free-tier endpoint.
Forward api_key through both the search entrypoint and the http handler so the
keyed endpoint is chosen.

HTTPHandler.get/AsyncHTTPHandler.get had no timeout parameter, so the Soniox
poll and transcript-fetch GETs silently used the client global default instead
of the caller timeout. Add a per-request timeout to get() and forward the
configured timeout from the Soniox handler.

* fix(soniox): price stt-async-v4 per second so transcriptions are billed

The handler stores audio_transcription_duration in _hidden_params, but the
model carried only token cost fields and the response has no token usage, so
the transcription cost path fell through to cost_per_second and returned $0.
An authenticated caller could transcribe Soniox audio without decrementing
their budget. Switch the entry to output_cost_per_second at Soniox's published
$0.10/hour async rate so the stored duration produces a real charge.
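
The arithmetic behind the per-second price is straightforward: $0.10 per hour divided by 3600 seconds. The sketch below uses hypothetical names and is not LiteLLM's cost calculator; it only shows how a stored duration turns into a non-zero charge.

```python
HOURLY_RATE_USD = 0.10  # async rate quoted in the commit message
OUTPUT_COST_PER_SECOND = HOURLY_RATE_USD / 3600  # roughly 2.78e-05 USD


def transcription_cost(
    duration_seconds: float,
    cost_per_second: float = OUTPUT_COST_PER_SECOND,
) -> float:
    """Cost of a transcription billed purely on audio duration."""
    return duration_seconds * cost_per_second
```

For example, a 90-second clip would cost about $0.0025 under this rate.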

* fix(langfuse): use a dedicated httpx client for the SDK injection

The httpx_client handed to the Langfuse SDK came from _get_httpx_client(),
which returns LiteLLM's globally cached HTTPHandler. If Langfuse closed that
client on teardown it would invalidate the shared client used by every other
LiteLLM HTTP call. Build a dedicated httpx.Client instead, still resolving SSL
verification and client certificate from LiteLLM's configuration.

* fix(soniox): prefer caller-supplied api_base over SONIOX_API_BASE env var

* fix(cohere): support max_completion_tokens on cohere v2 chat (default route) (#29779)

* fix(cohere): support max_completion_tokens on cohere v2 chat

The default cohere_chat route resolves to CohereV2ChatConfig, which did not
list or map max_completion_tokens, so get_optional_params raised
UnsupportedParamsError for the standard OpenAI parameter (the modern
replacement for the deprecated max_tokens). The v1 config already maps it to
cohere's max_tokens; mirror that in v2 and add v2 regression tests.

* fix(cohere): make max_completion_tokens take precedence over max_tokens on v2

When both max_tokens and max_completion_tokens are supplied, prefer
max_completion_tokens explicitly rather than relying on dict iteration order,
and cover both orderings with a regression test.
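
An explicit precedence rule that does not depend on dict iteration order might look like the following. The function name is hypothetical; the mapping of max_completion_tokens onto Cohere's max_tokens is taken from the commit message above.

```python
from typing import Any, Dict


def map_max_tokens(params: Dict[str, Any]) -> Dict[str, Any]:
    """Map OpenAI-style token limits to a single max_tokens value."""
    mapped = {
        key: value
        for key, value in params.items()
        if key not in ("max_tokens", "max_completion_tokens")
    }
    # max_completion_tokens wins whenever it is set, regardless of key order.
    if params.get("max_completion_tokens") is not None:
        mapped["max_tokens"] = params["max_completion_tokens"]
    elif params.get("max_tokens") is not None:
        mapped["max_tokens"] = params["max_tokens"]
    return mapped
```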

---------

Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com>
Co-authored-by: hectorc98 <hector.chamorroalvarez@adyen.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Dan Lemon <dan@danlemon.com>
Co-authored-by: Saswat <saswatds@users.noreply.github.com>
Co-authored-by: Brian Sparker <brainsparker@users.noreply.github.com>
Co-authored-by: Zhao73 <156770117+Zhao73@users.noreply.github.com>
Co-authored-by: Urain Ahmad Shah <60431964+urainshah@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: kape <168134658+kapelame@users.noreply.github.com>
Co-authored-by: danisalvaa <159898202+danisalvaa@users.noreply.github.com>
Co-authored-by: Just R <remixingmagelang@gmail.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>


Development

Successfully merging this pull request may close these issues.

[Feature]: Add support for Soniox Provider

2 participants