Add soniox provider by dan2k3k4 · Pull Request #29508 · BerriAI/litellm

dan2k3k4 · 2026-06-02T16:11:53Z

Relevant issues

Fixes #29507

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added meaningful tests
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible; it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

I have a proof-of-concept tool for testing transcription of audio into SRT and, if needed, translation into another language; passing response_format=srt in the request should return the transcript in a subtitle format.
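A minimal sketch of the kind of request described above, assuming the litellm package is installed and a Soniox API key is available in the environment (the variable name SONIOX_API_KEY is an assumption here). The model name soniox/stt-async-v4 comes from this PR; the helper name transcribe_to_srt is hypothetical, and the exact shape of the returned object is not confirmed by this thread.

```python
from typing import Any


def transcribe_to_srt(audio_path: str, model: str = "soniox/stt-async-v4") -> Any:
    """Send one audio file for transcription and ask for SRT output.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Imported lazily so this module still loads where litellm is absent.
    import litellm

    with open(audio_path, "rb") as audio_file:
        # response_format="srt" is the parameter the PR description refers to.
        return litellm.transcription(
            model=model,
            file=audio_file,
            response_format="srt",
        )
```

Calling transcribe_to_srt("sample.wav") would perform a real network request, so it is left out of the sketch.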

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

dan2k3k4 · 2026-06-02T16:13:24Z

@greptileai

codecov · 2026-06-02T16:16:15Z

Codecov Report

❌ Patch coverage is 96.21381% with 17 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/llms/soniox/common_utils.py	93.18%	9 Missing ⚠️
litellm/llms/soniox/audio_transcription/handler.py	98.52%	3 Missing ⚠️
.../litellm_core_utils/get_supported_openai_params.py	33.33%	2 Missing ⚠️
litellm/main.py	50.00%	2 Missing ⚠️
.../llms/soniox/audio_transcription/transformation.py	98.96%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-06-02T16:16:19Z

Greptile Summary

This PR adds Soniox as a new LiteLLM provider, supporting async speech-to-text transcription via Soniox's multi-step REST API (upload → create → poll → fetch → cleanup). The integration is well-structured: it follows established provider patterns, uses litellm's shared HTTP clients on both sync and async paths (including ssl_verify propagation), correctly passes model_response through to response building, and ships comprehensive mock-only unit tests.

New handler (handler.py) orchestrates the polling pipeline with server-side-clamped poll intervals and best-effort cleanup; response transformation supports plain text, SRT, VTT, and verbose JSON formats synthesized from Soniox token timestamps.
Provider is fully registered across LlmProviders enum, models_by_provider, lazy-import registry, model_prices_and_context_window.json, provider_endpoints_support.json, and the proxy UI credentials form.
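The upload → create → poll → fetch → cleanup flow summarized above can be sketched as follows. This is an illustration under stated assumptions, not the handler from the PR: the TranscriptionClient interface, its method names, and the "completed" / "error" status strings are all hypothetical.

```python
import time
from typing import Callable, Optional, Protocol


class TranscriptionClient(Protocol):
    """Hypothetical client interface; not the Soniox or LiteLLM API."""

    def upload(self, audio: bytes) -> str: ...
    def create(self, file_id: str) -> str: ...
    def status(self, job_id: str) -> str: ...
    def fetch(self, job_id: str) -> str: ...
    def delete_job(self, job_id: str) -> None: ...
    def delete_file(self, file_id: str) -> None: ...


def transcribe(
    client: TranscriptionClient,
    audio: bytes,
    poll_interval: float = 1.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run the multi-step flow and always attempt cleanup afterwards."""
    file_id = client.upload(audio)
    job_id: Optional[str] = None
    try:
        job_id = client.create(file_id)
        for _ in range(max_polls):
            state = client.status(job_id)
            if state == "completed":
                return client.fetch(job_id)
            if state == "error":
                raise RuntimeError(f"transcription {job_id} failed")
            sleep(poll_interval)
        raise TimeoutError(f"transcription {job_id} did not finish")
    finally:
        # Best-effort cleanup: a failure here must not mask the real outcome.
        for cleanup, resource_id in (
            (client.delete_job, job_id),
            (client.delete_file, file_id),
        ):
            if resource_id is None:
                continue
            try:
                cleanup(resource_id)
            except Exception:
                pass
```

Passing sleep as a parameter keeps the sketch testable without real delays; the cleanup loop runs on success, failure, and timeout alike.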

Confidence Score: 5/5

New provider addition with no modifications to existing behaviour; safe to merge.

The change is purely additive — all existing code paths are untouched. The new Soniox handler correctly uses litellm's shared HTTP clients, propagates ssl_verify on both sync and async paths, and passes model_response through to response construction. Tests are mock-only and cover the full polling lifecycle. The only minor issue is a missing defensive guard on the poll_interval float conversion, which has no impact on correctness for valid inputs.
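One way the defensive guard on the poll_interval conversion could look is sketched below. The bounds and default are hypothetical placeholders; the real clamping constants live in the PR's common_utils.py and are not quoted in this thread.

```python
import math

# Hypothetical bounds for illustration only.
MIN_POLL_INTERVAL_S = 0.5
MAX_POLL_INTERVAL_S = 30.0
DEFAULT_POLL_INTERVAL_S = 1.0


def resolve_poll_interval(raw: object) -> float:
    """Convert a caller-supplied value to a clamped float, falling back on bad input."""
    try:
        value = float(raw)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        # None, non-numeric strings, and other unconvertible values.
        return DEFAULT_POLL_INTERVAL_S
    if math.isnan(value):
        # NaN would otherwise slip through min()/max() comparisons.
        return DEFAULT_POLL_INTERVAL_S
    return min(max(value, MIN_POLL_INTERVAL_S), MAX_POLL_INTERVAL_S)
```

Valid inputs behave as before; only malformed values take the fallback path.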

No files require special attention.

Important Files Changed

Filename	Overview
litellm/llms/soniox/audio_transcription/handler.py	New handler orchestrating Soniox's multi-step async transcription (upload → create → poll → fetch → cleanup) on both sync and async paths; both paths correctly use litellm's shared HTTP clients with ssl_verify propagation.
litellm/llms/soniox/audio_transcription/transformation.py	Transformation config mapping OpenAI params to Soniox-native params; handles language, response_format (srt/vtt/verbose_json), and webhook secret redaction correctly.
litellm/llms/soniox/common_utils.py	Shared utilities: SRT/VTT subtitle rendering from token timestamps, API key/base resolution, poll-interval clamping constants — all well-bounded.
litellm/main.py	Adds soniox dispatch branch in transcription() following the same pattern as nvidia_riva; type-ignore on provider_config is intentional and safe at runtime.
litellm/__init__.py	Registers soniox_models set, adds it to model_list and models_by_provider — consistent with all other provider registrations.
litellm/types/utils.py	Adds SONIOX to LlmProviders enum — single-line, correct placement.
model_prices_and_context_window.json	Adds soniox/stt-async-v4 model entry with pricing, mode, and supported endpoints; no unrelated modifications in this version of the diff.
tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_handler.py	Comprehensive mock-based tests covering sync/async flows, upload, polling, cleanup, edge cases — no real network calls.
tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_transformation.py	Tests transformation config including param mapping, response building, SRT/VTT rendering, and error handling — good coverage.
tests/test_litellm/llms/soniox/test_soniox_provider_registration.py	Tests provider enum registration, model list inclusion, lazy import, and get_llm_provider resolution — all mock-based.
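The SRT rendering from token timestamps mentioned in the table can be illustrated with a short sketch. It assumes each token is a dict with text, start_ms, and end_ms keys, and it groups tokens into cues by a simple character budget; both the token shape and the grouping rule are assumptions, not the logic in common_utils.py.

```python
from typing import Dict, List, Union

Token = Dict[str, Union[str, int]]


def format_srt_time(ms: int) -> str:
    """Render milliseconds as the SRT timestamp form HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def tokens_to_srt(tokens: List[Token], max_chars: int = 40) -> str:
    """Group tokens into cues of at most max_chars characters and render SRT."""
    cues: List[List[Token]] = []
    current: List[Token] = []
    length = 0
    for token in tokens:
        text = str(token["text"])
        # Start a new cue when adding this token would exceed the budget.
        if current and length + len(text) > max_chars:
            cues.append(current)
            current, length = [], 0
        current.append(token)
        length += len(text)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_srt_time(int(cue[0]["start_ms"]))
        end = format_srt_time(int(cue[-1]["end_ms"]))
        text = "".join(str(t["text"]) for t in cue).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

A VTT renderer would differ mainly in the header line and in using a dot instead of a comma before the milliseconds.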

Reviews (6): Last reviewed commit: "feat(soniox): add soniox audio transcrip..."

greptile-apps · 2026-06-02T16:18:52Z

Greptile Summary

This PR adds Soniox as a new audio transcription provider, implementing a multi-step async pipeline (file upload → create job → poll → fetch transcript → cleanup) behind LiteLLM's standard litellm.transcription() interface. The Soniox-specific code is well-structured, uses existing LiteLLM HTTP handlers, and is covered by comprehensive mock-only tests.

The core Soniox provider files (handler.py, transformation.py, common_utils.py) are clean and follow established LiteLLM patterns; the litellm/main.py dispatch and provider registration hooks are correct.
Both model_prices_and_context_window.json and its backup have unintended side-effects from JSON reformatting: "supports_reasoning": true was silently removed from bedrock/us-east-1/minimax.minimax-m2.5, bedrock/us-west-2/minimax.minimax-m2.5, and minimax.minimax-m2.5 (bedrock_converse), which would break any caller relying on reasoning for those models.
provider_endpoints_support.json also lost the a2a and interactions fields from the charity_engine entry as a formatting side-effect.

Confidence Score: 3/5

Safe to merge for Soniox functionality, but the JSON reformatting removed supports_reasoning from existing bedrock minimax models, which is a regression for those users.

The Soniox provider implementation itself is solid. The concern is in the JSON files: supports_reasoning: true was dropped from three existing bedrock/minimax model entries as a side-effect of reformatting, and two endpoint fields were removed from charity_engine. These unrelated changes could silently break users who rely on reasoning with those models.

model_prices_and_context_window.json and litellm/model_prices_and_context_window_backup.json — the minimax model entries need supports_reasoning restored; provider_endpoints_support.json needs the charity_engine a2a/interactions fields restored.
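A regression of this kind (fields silently dropped from existing entries while a new entry is added) can be caught by comparing the parsed JSON before and after the change. The sketch below is a hypothetical helper, not a check that exists in the repository, and the sample entries in the usage note are illustrative values rather than the real file contents.

```python
from typing import Any, Dict, List


def find_dropped_fields(
    before: Dict[str, Dict[str, Any]],
    after: Dict[str, Dict[str, Any]],
) -> List[str]:
    """List entries or entry fields present in `before` but missing in `after`."""
    dropped: List[str] = []
    for model, old_entry in before.items():
        new_entry = after.get(model)
        if new_entry is None:
            # The whole entry disappeared.
            dropped.append(model)
            continue
        for field in old_entry:
            if field not in new_entry:
                dropped.append(f"{model}.{field}")
    return sorted(dropped)
```

For example, comparing {"minimax.minimax-m2.5": {"mode": "chat", "supports_reasoning": True}} against a version that lacks supports_reasoning but adds a soniox/stt-async-v4 entry reports only the dropped field; newly added entries are ignored.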

Important Files Changed

Filename	Overview
litellm/llms/soniox/audio_transcription/handler.py	New handler orchestrating Soniox's multi-step async transcription flow (upload → create → poll → fetch → cleanup) for both sync and async paths; uses existing LiteLLM HTTP handlers correctly and applies cleanup in finally blocks.
litellm/llms/soniox/audio_transcription/transformation.py	New transformation config mapping OpenAI params to Soniox equivalents and building TranscriptionResponse from Soniox payloads; logic is sound.
litellm/llms/soniox/common_utils.py	Shared constants, exception class, and utility helpers for Soniox; straightforward and correct.
litellm/main.py	Adds Soniox dispatch branch to transcription(); follows the same pattern as the nvidia_riva/elevenlabs branches.
model_prices_and_context_window.json	Adds soniox/stt-async-v4 entry, but JSON reformatting also silently removes supports_reasoning: true from bedrock/us-east-1/minimax.minimax-m2.5, bedrock/us-west-2/minimax.minimax-m2.5, and minimax.minimax-m2.5 (bedrock_converse) — a likely regression.
litellm/model_prices_and_context_window_backup.json	Mirrors the main pricing JSON changes, including the same unintended supports_reasoning removals for minimax models.
provider_endpoints_support.json	Adds soniox entry correctly, but also removes a2a and interactions fields from the charity_engine entry as a formatting side-effect.
litellm/llms/soniox/README.md	User-facing documentation added inside the litellm package; per contribution rules this should live in the litellm-docs repo instead.
litellm/__init__.py	Registers soniox_models set and adds it to model_list/models_by_provider; follows existing patterns correctly.
litellm/litellm_core_utils/get_llm_provider_logic.py	Adds soniox branch to resolve api_base and api_key from env vars; consistent with other provider branches.
tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_handler.py	Comprehensive mock-only tests covering upload, poll, cleanup, error, and async flows; no real network calls.
tests/test_litellm/llms/soniox/audio_transcription/test_soniox_audio_transcription_transformation.py	Unit tests for the transformation config covering param mapping, response building, and token rendering.

Reviews (2): Last reviewed commit: "Merge branch 'BerriAI:litellm_internal_s..."

dan2k3k4 · 2026-06-02T17:06:36Z

@greptileai

veria-ai · 2026-06-02T17:12:57Z

PR overview

All previously flagged issues have been addressed. No open security concerns remain on this pull request.

Security review

No open security issues remain on this pull request.

Fixed/addressed: 3 · PR risk: 0/10

dan2k3k4 · 2026-06-03T13:38:49Z

@greptileai

Sameerlite · 2026-06-05T09:03:04Z

@greptileai

Sameerlite · 2026-06-05T09:14:01Z

> The concern is in the JSON files: supports_reasoning: true was dropped from three existing bedrock/minimax model entries as a side-effect of reformatting, and two endpoint fields were removed from charity_engine. These unrelated changes could silently break users who rely on reasoning with those models.

Please fix this

dan2k3k4 · 2026-06-05T09:31:03Z

> The concern is in the JSON files: supports_reasoning: true was dropped from three existing bedrock/minimax model entries as a side-effect of reformatting, and two endpoint fields were removed from charity_engine. These unrelated changes could silently break users who rely on reasoning with those models.
>
> Please fix this

Added a fix, squashed my commits into one and force pushed

Sameerlite · 2026-06-05T10:21:58Z

@greptileai

* Mark xAI models retiring on 2026-05-15 (#28788)

Per https://docs.x.ai/developers/migration/may-15-retirement, xAI is retiring the following slugs on 2026-05-15 (auto-redirect to grok-4.3 with various reasoning efforts; callers continuing to use the old slugs will be billed at grok-4.3 pricing):

grok-4-1-fast-reasoning{,-latest} -> grok-4.3 (low effort)
grok-4-1-fast-non-reasoning{,-latest} -> grok-4.3 (none)
grok-4-fast-reasoning -> grok-4.3 (low effort)
grok-4-fast-non-reasoning -> grok-4.3 (none)
grok-4-0709 -> grok-4.3 (low effort)
grok-code-fast-1{,-0825} -> grok-build-0.1
grok-3 -> grok-4.3 (none)

Only the direct xai/ slugs are tagged; third-party hosts (azure_ai, oci, vercel_ai_gateway, perplexity/xai) run their own schedules. The grok-3 retirement list explicitly names only the base grok-3 slug; the -mini / -fast / -beta / -latest variants are not listed, so they remain untouched.

* feat(moonshot): advertise json_schema response support on live models (#29683)

litellm.responses() already routes Moonshot through the responses->chat-completions bridge, and Moonshot honors response_format json_schema on chat completions. The cost-map entries left supports_response_schema unset, so discovery layers that gate on that flag dropped Moonshot from structured-output / responses listings even though the capability works end to end. Set supports_response_schema on the nine models currently live on api.moonshot.ai: kimi-k2.5, kimi-k2.6, the moonshot-v1 8k/32k/128k text and vision-preview variants, and moonshot-v1-auto. Verified against the live API that each honors json_schema and that litellm.responses() returns schema-valid structured output through the bridge.

* chore(moonshot): mark models retired from api.moonshot.ai as deprecated (#29685)

Thirteen Moonshot/Kimi models in the cost map no longer resolve on api.moonshot.ai (all return 404). Stamp each with its deprecation_date from platform.kimi.ai/docs/models rather than deleting the entries, so historical cost calculation keeps resolving the names while tooling can surface the retirement. Dates: kimi-thinking-preview 2025-11-11; kimi-latest and its 8k/32k/128k context variants 2026-01-28; the kimi-k2 preview/turbo/thinking series 2026-05-25; the moonshot-v1 -0430 snapshots use their own 2024-04-30 snapshot date (Moonshot publishes no discontinuation date for them).

* fix(moonshot): drop temperature for reasoning models (kimi-k2.5/k2.6) (#29687)

Kimi reasoning models reject every temperature except 1; a request with temperature=0.2 returns "invalid temperature: only 1 is allowed for this model". litellm only clamped temperature into [0.3, 1], so any value below 1 still 400'd. Drop the temperature param entirely for reasoning models (gated on supports_reasoning, the same signal transform_request already uses) so the model default is used; the non-reasoning moonshot-v1 models keep the existing clamp.

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(mcp): add per-server timeout configuration (#29672)

* feat(mcp): add per-server timeout configuration

* fix(mcp): address timeout field review comments

- use is not None guard instead of or for 0.0 edge case
- copy timeout in both LiteLLM_MCPServerTable constructions (health check path + _build_mcp_server_table)
- add timeout Float? column to all three schema.prisma files
- extend round-trip test to cover _build_mcp_server_table direction
- add test for zero timeout not treated as falsy

* fix(mcp): forward timeout in _build_temporary_mcp_server_record

* fix(mcp): return 504 instead of 500 when per-server timeout fires

* test(mcp): add 504 timeout regression test; fix black formatting

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7 (#28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip.

New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the existing us./eu./au./global. profiles for Claude Opus 4.7 (ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is missing from model_prices_and_context_window.json. Tokyo-region users currently get an "unknown model" error when routing through the JP geo profile.

Adds the entry to both the canonical file and the bundled backup, mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches the other regional profiles (10% premium over base/global). Regression test pins all six documented profiles (base, global, us, eu, au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(soniox): add soniox audio transcription integration (#29508)

* feat(openmeter): add OPENMETER_TRUST_REQUEST_USER to prevent forged attribution (#29650)

The OpenMeter callback resolves the CloudEvent subject from kwargs["user"] first, then falls back to the key-bound user_api_key_user_id. For multi-tenant proxy deployments, a client can set `"user": "..."` in the request body and cause their usage to be attributed to that arbitrary string, which is a billing-attribution forgery risk.

Adds OPENMETER_TRUST_REQUEST_USER env var (default "true" for backward compatibility). When set to "false", the request-supplied `user` field is ignored and the subject is resolved solely from user_api_key_user_id. Matches the existing env-var-driven config pattern in this file (OPENMETER_API_KEY, OPENMETER_API_ENDPOINT, OPENMETER_EVENT_TYPE).

* feat(search): add you_com as a search provider (#28370)

* feat(search): add you_com as a search provider

Registers You.com Search API as a first-class `search_provider` in the `search_tools` registry, alongside Tavily, Exa, Perplexity, etc.

- New adapter: litellm/llms/you_com/search/transformation.py
- POSTs to https://ydc-index.io/v1/search
- Auth: X-API-Key from YOUCOM_API_KEY (or explicit api_key)
- Maps Perplexity unified spec: max_results -> count, search_domain_filter -> include_domains, country -> country
- Flattens results.web + results.news into a single SearchResult list; snippet prefers snippets[0], falls back to description; page_age -> date
- Registry: SearchProviders.YOU_COM in litellm/types/utils.py and wired into ProviderConfigManager.get_provider_search_config()
- Pricing entry: model_prices_and_context_window.json (placeholder $0.0; happy to adjust to maintainers' preferred public number)
- Docs: example router config snippet and example proxy yaml updated
- Tests: tests/search_tests/test_you_com_search.py
- 5 mocked tests (payload shape, domain filter mapping, snippet fallback, news flattening, missing-api-key error)

Refs upstream expansion signal: #15942

* review fixups: normalize api_base, lowercase country, scope env-var to test

Addresses Greptile inline review comments on #28370:

- get_complete_url: strip trailing slashes from api_base *before* the endswith("/v1/search") check, so a custom base like ".../v1/search/" doesn't become ".../v1/search/v1/search".
- transform_search_request: .lower() country before sending, matching Tavily's convention so callers using the unified spec form ("US") get consistent behavior across providers.
- Tests: replace direct os.environ writes with an autouse monkeypatch fixture so YOUCOM_API_KEY is set per-test and removed afterwards. The missing-key test now uses monkeypatch.delenv. New test asserts the trailing-slash normalization above.

Reverts the ARCHITECTURE.md / example yaml edits per the reviewer note that documentation changes belong in the litellm-docs repo.

* support keyless free tier (api.you.com/v1/agents/search) as default

You.com offers an IP-throttled keyless endpoint that returns the same response shape as the keyed one (~100 queries/day, no signup). This is a significant onboarding lever - mirrors the keyless DuckDuckGo/SearXNG providers already in the search_tools registry.

Behavior:

- YOUCOM_API_KEY set -> keyed: POST https://ydc-index.io/v1/search (X-API-Key header)
- no key -> free: POST https://api.you.com/v1/agents/search (no auth)
- YOUCOM_API_BASE override -> honored as-is

Tests:

- New: test_you_com_search_keyless_free_tier - asserts URL + absence of X-API-Key when no key is configured.
- New: test_you_com_search_validate_environment_keyless - asserts the config no longer raises when the key is absent.
- Removed: test_you_com_search_raises_without_api_key (the precondition no longer holds).
- Existing payload/domain-filter/etc tests still cover keyed mode via the autouse YOUCOM_API_KEY fixture.

Verified both endpoints accept POST + return identical JSON shape: results.web[] / results.news[] with title, url, snippets, description, page_age.

* register you_com in provider_endpoints_support.json

Adding `litellm/llms/you_com/` requires a corresponding entry in provider_endpoints_support.json or the code-quality/check_provider_folders_documented CI check fails. Follows the compact tavily/serper pattern - endpoints: { search: true }. Local run of the check now reports "All 114 provider folders are documented".

* move tests under tests/test_litellm/llms/ so CI exercises them

The litellm CI workflows scope unit tests to `tests/test_litellm/...` (see test-unit-llm-providers.yml: `tests/test_litellm/llms` path), so tests living under `tests/search_tests/` are never run in CI - which is why codecov reports 0% patch coverage for the new adapter even though the unit tests exist and pass locally. Move test_you_com_search.py into `tests/test_litellm/llms/you_com/` so the test-unit-llm-providers job picks it up. 7/7 tests still pass at the new location. (Sibling search-only providers - tavily, exa_ai, brave, etc. - still live only in `tests/search_tests/` and would benefit from the same move, but that is out of scope for this PR.)

* fix(you_com): pin Accept-Encoding: identity to dodge keyless gzip bug

The keyless free-tier endpoint (api.you.com/v1/agents/search) advertises Content-Encoding: gzip but returns a body that httpx's decoder rejects with `zlib.error: Error -3 while decompressing data: incorrect header check`, surfacing as litellm.APIConnectionError in user code. curl works because it doesn't request compression by default.

Pin Accept-Encoding: identity in validate_environment so the upstream server skips compression entirely. Harmless on the keyed endpoint (ydc-index.io/v1/search) which negotiates content-encoding correctly. The header uses setdefault so a caller-supplied Accept-Encoding still takes precedence. (Server-side bug has been flagged to the You.com team separately - once fixed there, this workaround can be removed.)

New unit test: test_you_com_search_pins_identity_accept_encoding.

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* docs: fix README typo (#29419)

Correct clear spelling mistakes in documentation without changing behavior.

Confidence: high
Scope-risk: narrow
Tested: git diff --check; uvx codespell on changed files
Not-tested: Full docs build not run; text-only changes

* Fix(langfuse): pass httpx_client to Langfuse in langfuse_prompt_management to respect SSL_VERIFY (#29480)

* fix(langfuse): pass ssl_verify to Langfuse httpx client

* fix_langfuse_

* add unit tests

* addressed comments

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(models): add minimax/MiniMax-M3 to model cost map (#29412)

Add MiniMax's new flagship MiniMax-M3 to the native minimax provider: 512K context, 128K max output, native multimodal (supports_vision), reasoning, prompt caching. Pricing (USD/M tokens): input 0.6 / output 2.4 / cache read 0.12. M3 has no active prompt-cache-write tier, so cache_creation_input_token_cost is omitted. Updated both the root model_prices_and_context_window.json (remote source) and the bundled litellm/model_prices_and_context_window_backup.json (local fallback), keeping them in sync.

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log (#29394)

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log

* fix(logging): extend terminal event handling to ResponseIncompleteEvent and ResponseFailedEvent; fix return type annotation

* feat(provider): Add Neosantara provider as OpenAI Compatible (#29646)

* Add Neosantara provider

* Register Neosantara provider enum

* Address Neosantara provider review feedback

* Add Neosantara packaged endpoint support

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix: address greptile and veria review feedback

- langfuse: guard httpx_client injection behind version check (>= 2.7.3)
- soniox: propagate audio_transcription_duration in _hidden_params for spend tracking
- soniox: give SONIOX_API_BASE env var priority over caller-supplied api_base
- mcp: replace CancelledError catch with asyncio.wait_for + TimeoutError

* chore(mcp): add migration for per-server timeout column

* fix(test): add tool_use_system_prompt_tokens to model prices schema validator

* fix: mcp timeout test uses real asyncio.wait_for timeout; you_com get_complete_url respects resolved api_key

* fix: forward resolved api_key into you_com endpoint selection and apply timeout to soniox polling GETs

The search flow resolves api_key in validate_environment but never passed it into get_complete_url, so a programmatic api_key (with no YOUCOM_API_KEY in the env) set the X-API-Key header yet still selected the keyless free-tier endpoint. Forward api_key through both the search entrypoint and the http handler so the keyed endpoint is chosen.

HTTPHandler.get/AsyncHTTPHandler.get had no timeout parameter, so the Soniox poll and transcript-fetch GETs silently used the client global default instead of the caller timeout. Add a per-request timeout to get() and forward the configured timeout from the Soniox handler.

* fix(soniox): price stt-async-v4 per second so transcriptions are billed

The handler stores audio_transcription_duration in _hidden_params, but the model carried only token cost fields and the response has no token usage, so the transcription cost path fell through to cost_per_second and returned $0. An authenticated caller could transcribe Soniox audio without decrementing their budget. Switch the entry to output_cost_per_second at Soniox's published $0.10/hour async rate so the stored duration produces a real charge.

* fix(langfuse): use a dedicated httpx client for the SDK injection

The httpx_client handed to the Langfuse SDK came from _get_httpx_client(), which returns LiteLLM's globally cached HTTPHandler. If Langfuse closed that client on teardown it would invalidate the shared client used by every other LiteLLM HTTP call. Build a dedicated httpx.Client instead, still resolving SSL verification and client certificate from LiteLLM's configuration.

* fix(soniox): prefer caller-supplied api_base over SONIOX_API_BASE env var

* fix(cohere): support max_completion_tokens on cohere v2 chat (default route) (#29779)

* fix(cohere): support max_completion_tokens on cohere v2 chat

The default cohere_chat route resolves to CohereV2ChatConfig, which did not list or map max_completion_tokens, so get_optional_params raised UnsupportedParamsError for the standard OpenAI parameter (the modern replacement for the deprecated max_tokens). The v1 config already maps it to cohere's max_tokens; mirror that in v2 and add v2 regression tests.

* fix(cohere): make max_completion_tokens take precedence over max_tokens on v2

When both max_tokens and max_completion_tokens are supplied, prefer max_completion_tokens explicitly rather than relying on dict iteration order, and cover both orderings with a regression test.

---------

Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com>
Co-authored-by: hectorc98 <hector.chamorroalvarez@adyen.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Dan Lemon <dan@danlemon.com>
Co-authored-by: Saswat <saswatds@users.noreply.github.com>
Co-authored-by: Brian Sparker <brainsparker@users.noreply.github.com>
Co-authored-by: Zhao73 <156770117+Zhao73@users.noreply.github.com>
Co-authored-by: Urain Ahmad Shah <60431964+urainshah@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: kape <168134658+kapelame@users.noreply.github.com>
Co-authored-by: danisalvaa <159898202+danisalvaa@users.noreply.github.com>
Co-authored-by: Just R <remixingmagelang@gmail.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
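The per-second pricing fix described in the commit message above comes down to simple arithmetic: an hourly rate divided by 3600, multiplied by the stored audio duration. The sketch below uses the $0.10/hour figure quoted there; the function and constant names are hypothetical and this is not LiteLLM's cost calculator.

```python
# Rate quoted in the commit message above; names here are illustrative only.
SONIOX_ASYNC_USD_PER_HOUR = 0.10
OUTPUT_COST_PER_SECOND = SONIOX_ASYNC_USD_PER_HOUR / 3600


def transcription_cost(
    duration_seconds: float,
    cost_per_second: float = OUTPUT_COST_PER_SECOND,
) -> float:
    """Charge for a transcription based on audio duration alone."""
    return duration_seconds * cost_per_second
```

With a per-second rate in place, a nonzero duration always yields a nonzero charge, which is the property the fix is after.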

dan2k3k4 mentioned this pull request Jun 2, 2026

chore: add soniox provider with support for async stt-async-v4 model #26885

Closed

7 tasks

greptile-apps Bot reviewed Jun 2, 2026

Comment thread litellm/llms/soniox/README.md Outdated

Comment thread provider_endpoints_support.json

Comment thread litellm/llms/soniox/audio_transcription/handler.py Outdated

Comment thread litellm/llms/soniox/audio_transcription/handler.py

greptile-apps Bot reviewed Jun 2, 2026

Comment thread model_prices_and_context_window.json

Comment thread litellm/llms/soniox/README.md Outdated

Comment thread provider_endpoints_support.json Outdated

dan2k3k4 requested a review from a team June 2, 2026 16:23

dan2k3k4 force-pushed the add-soniox-provider branch 2 times, most recently from aa05908 to 7127e01, on June 2, 2026 16:30

greptile-apps Bot reviewed Jun 2, 2026

Comment thread litellm/llms/soniox/audio_transcription/handler.py

veria-ai Bot reviewed Jun 2, 2026

Comment thread litellm/llms/soniox/audio_transcription/handler.py Outdated

Comment thread litellm/llms/soniox/audio_transcription/handler.py Outdated

dan2k3k4 force-pushed the add-soniox-provider branch 2 times, most recently from b33a5df to 175d129, on June 3, 2026 10:57

veria-ai Bot reviewed Jun 3, 2026

Comment thread litellm/llms/soniox/audio_transcription/handler.py Outdated

dan2k3k4 force-pushed the add-soniox-provider branch 4 times, most recently from e8a41c9 to c276b12, on June 3, 2026 13:37

dan2k3k4 force-pushed the add-soniox-provider branch 3 times, most recently from d9ec779 to 480958b, on June 4, 2026 19:04

dan2k3k4 requested a review from Sameerlite June 4, 2026 21:17

feat(soniox): add soniox audio transcription integration

0b51950

dan2k3k4 force-pushed the add-soniox-provider branch from 480958b to 0b51950 on June 5, 2026 09:29

Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_staging_050626 June 5, 2026 10:28

Sameerlite merged commit 38f2660 into BerriAI:litellm_oss_staging_050626 Jun 5, 2026
70 checks passed

dan2k3k4 deleted the add-soniox-provider branch June 5, 2026 11:21

Sameerlite pushed a commit that referenced this pull request Jun 5, 2026

feat(soniox): add soniox audio transcription integration (#29508)

7a788c1
