Add granian as a ASGI compliant web server. Provider better throughput stability, by harish-berri · Pull Request #26027 · BerriAI/litellm

harish-berri · 2026-04-18T20:07:36Z

Add Granian as an ASGI-compliant web server. It provides better throughput stability: the failure rate is nil, and throughput improves by 10-20 RPS over uvicorn. Peak throughput increased from 337 RPS to 365 RPS on a GCP VM with 4 vCPUs and 16 GB RAM, with the Redis cache enabled.

Check below for some numbers

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Raw Throughput of the Granian (4 workers (subprocesses) + 1 runtime worker)

Raw Throughput of the Uvicorn Server (4 workers (subprocesses))

Raw Throughput of the LLM endpoint

…, 10-20 RPS improvement under standard LT conditions. TODO: Verify poetry lock details and add locust numbers to PR

greptile-apps · 2026-04-18T20:09:50Z

Greptile Summary

This PR adds Granian as an optional ASGI server backend for the LiteLLM proxy, exposed via --run_granian and --granian_threads CLI flags. Most issues from the previous review round have been resolved: granian==2.5.7 is now correctly placed in the proxy optional-dependencies, the license_cache.json version matches the pin, requires-python is updated to >=3.10, and SSL test coverage has been added.

All five granian unit tests (test_init_granian_server*) use @patch("granian.Granian") as a decorator with pytest.importorskip("granian") in the function body. The decorator resolves and imports granian before the body runs, so the tests error with ModuleNotFoundError rather than skipping on environments where granian is not installed. The fix (module-level pytest.importorskip) was outlined in a prior review comment but has not been applied to any of the five tests.

Confidence Score: 4/5

Safe to merge for production use; the test defect only affects CI environments where granian is absent and doesn't impact runtime behavior.

Core implementation is correct and well-structured. All major issues from the previous review round (dependency placement, license version, Python floor, missing SSL tests) are resolved. The remaining P1 is confined to test infrastructure: the @patch/importorskip ordering affects all five granian tests and will cause errors (not skips) on environments without granian installed, which can break CI.

tests/test_litellm/proxy/test_proxy_cli.py — all five test_init_granian_server* tests need the module-level pytest.importorskip guard applied.
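
The ordering problem described above can be reproduced without granian. The following is a minimal sketch, assuming a placeholder module name (`granian_not_installed_example`, hypothetical and chosen so that it is not importable) in place of `granian`. The `@patch` decorator imports its target when the decorated function is called, so the import error surfaces before the first line of the body, which is where the skip guard currently sits.

```python
from unittest.mock import patch

# Hypothetical module name, assumed NOT to be installed (stands in for granian).
MISSING = "granian_not_installed_example"

body_ran = False


@patch(f"{MISSING}.Granian")
def decorated_test(mock_cls):
    # In the PR's tests, pytest.importorskip("granian") is the first statement here.
    global body_ran
    body_ran = True


try:
    decorated_test()
    outcome = "ran"
except ModuleNotFoundError:
    # patch tried to import the target module before entering the body.
    outcome = "ModuleNotFoundError"

print(outcome, body_ran)
```

In this sketch `outcome` ends up as `"ModuleNotFoundError"` and `body_ran` stays `False`, which is why a guard placed inside the body never gets the chance to skip.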

Important Files Changed

Filename	Overview
litellm/proxy/proxy_cli.py	Adds `_init_granian_server` static method and `--run_granian` / `--granian_threads` CLI flags; implementation is clean with SSL validation, workers clamping, and informational warnings for unsupported options.
tests/test_litellm/proxy/test_proxy_cli.py	Adds SSL and partial-SSL test cases for granian (addressing previously missing coverage), but all five granian tests still share the `@patch("granian.Granian")` / `pytest.importorskip` ordering defect that causes errors instead of skips when granian is absent.
pyproject.toml	`granian==2.5.7` correctly placed in `proxy` optional-dependencies (not core); `requires-python` updated to `>=3.10`; no remaining placement or marker concerns.
license_cache.json	Records `granian:2.5.7` with BSD-3-Clause license — version now matches the `pyproject.toml` pin; prior mismatch resolved.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[litellm proxy run_server CLI] --> B{--run_granian?}
    B -- Yes --> C[Import granian or raise ImportError]
    C --> D[_init_granian_server]
    D --> E{SSL flags?}
    E -- both cert+key --> F[Add ssl_cert / ssl_key to kwargs]
    E -- only one --> G[raise ClickException]
    E -- neither --> H[No SSL kwargs]
    F --> I{granian_runtime_threads?}
    H --> I
    I -- set --> J[kwargs runtime_threads = N]
    I -- None --> K[omit runtime_threads]
    J --> L[Granian kwargs .serve blocking]
    K --> L
    B -- No --> M{--run_gunicorn?}
    M -- Yes --> N[_run_gunicorn_server]
    M -- No --> O{--run_hypercorn?}
    O -- Yes --> P[_init_hypercorn_server]
    O -- No --> Q[uvicorn.run default path]
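
The SSL and runtime-thread branches of the flowchart can be sketched as a small keyword-argument builder. This is an illustration under stated assumptions, not the code in `proxy_cli.py`: `build_granian_kwargs` and `UsageError` are hypothetical names (the latter stands in for `click.ClickException`), the `address`, `port` and `workers` keys are assumed for illustration, and only the start of the error message ("Both --ssl_certfile_path") comes from the tests in this PR.

```python
from pathlib import Path
from typing import Any, Dict, Optional


class UsageError(Exception):
    """Stand-in for click.ClickException so the sketch needs no third-party imports."""


def build_granian_kwargs(
    host: str,
    port: int,
    num_workers: int,
    ssl_certfile_path: Optional[str] = None,
    ssl_keyfile_path: Optional[str] = None,
    granian_runtime_threads: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the branches in the flowchart."""
    # "address", "port" and "workers" are assumed key names for illustration.
    kwargs: Dict[str, Any] = {"address": host, "port": port, "workers": num_workers}

    if ssl_certfile_path and ssl_keyfile_path:
        # Both flags given: pass the certificate and key through as paths.
        kwargs["ssl_cert"] = Path(ssl_certfile_path)
        kwargs["ssl_key"] = Path(ssl_keyfile_path)
    elif ssl_certfile_path or ssl_keyfile_path:
        # Only one flag given: refuse to start rather than serve without TLS.
        raise UsageError("Both --ssl_certfile_path and --ssl_keyfile_path are required")

    if granian_runtime_threads is not None:
        # Omitted entirely when unset, so the server default applies.
        kwargs["runtime_threads"] = granian_runtime_threads
    return kwargs


plain = build_granian_kwargs("0.0.0.0", 4000, 1)
secured = build_granian_kwargs(
    "0.0.0.0", 4000, 1, "/path/to/cert.pem", "/path/to/key.pem",
    granian_runtime_threads=2,
)
```

Keeping the validation in a pure function like this would make the partial-SSL case testable without constructing a server object.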

Reviews (5): Last reviewed commit: "update uv lock to fix granian import err..."


greptile-apps · 2026-04-20T16:48:46Z

+    @patch("granian.Granian")
+    @patch("builtins.print")
+    def test_init_granian_server_ssl(self, mock_print, mock_granian_cls):
+        pytest.importorskip("granian")
+        mock_server = MagicMock()
+        mock_granian_cls.return_value = mock_server
+        fake_interfaces = SimpleNamespace(ASGI="asgi")
+        with patch("granian.constants.Interfaces", fake_interfaces):
+            ProxyInitializationHelpers._init_granian_server(
+                host="0.0.0.0",
+                port=4000,
+                num_workers=1,
+                ssl_certfile_path="/path/to/cert.pem",
+                ssl_keyfile_path="/path/to/key.pem",
+                max_requests_before_restart=None,
+                ciphers=None,
+                granian_runtime_threads=None,
+            )
+        call_kwargs = mock_granian_cls.call_args.kwargs
+        assert call_kwargs["ssl_cert"] == Path("/path/to/cert.pem")
+        assert call_kwargs["ssl_key"] == Path("/path/to/key.pem")
+        mock_server.serve.assert_called_once()
+
+    @patch("granian.Granian")
+    def test_init_granian_server_ssl_requires_cert_and_key(self, mock_granian_cls):
+        pytest.importorskip("granian")
+        fake_interfaces = SimpleNamespace(ASGI="asgi")
+        with patch("granian.constants.Interfaces", fake_interfaces):
+            with pytest.raises(click.ClickException, match="Both --ssl_certfile_path"):
+                ProxyInitializationHelpers._init_granian_server(
+                    host="0.0.0.0",
+                    port=4000,
+                    num_workers=1,
+                    ssl_certfile_path="/path/to/cert.pem",
+                    ssl_keyfile_path=None,
+                    max_requests_before_restart=None,
+                    ciphers=None,
+                    granian_runtime_threads=None,
+                )
+        mock_granian_cls.assert_not_called()


@patch resolves before pytest.importorskip in new SSL tests

The three newly added tests (test_init_granian_server_ssl, test_init_granian_server_ssl_requires_cert_and_key, and test_init_granian_server_runtime_threads) all carry @patch("granian.Granian") decorators with pytest.importorskip("granian") in their bodies. unittest.mock.patch resolves the target attribute (importing granian and looking up Granian) when the decorated function is invoked, before the function body executes. On an environment where granian is absent, this raises ModuleNotFoundError rather than skipping the test cleanly. The two original granian tests already had this pattern flagged; the same defect is present in all newly added tests. Moving the skip guard to module level (as suggested in the earlier thread) would fix all five tests at once.
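
The effect of hoisting the guard can be sketched with the standard library alone. This example swaps in `unittest.skipUnless` at class level as a stdlib analogue of a module-level `pytest.importorskip("granian")`; it is not the change made in this PR. The module name `granian_not_installed_example` is hypothetical and assumed not to be importable. Because the skip decision is made before the test method is called, the `@patch` decorator never attempts the import.

```python
import importlib.util
import io
import unittest
from unittest.mock import patch

# Hypothetical module name, assumed NOT to be installed (stands in for granian).
MISSING = "granian_not_installed_example"
HAVE_MODULE = importlib.util.find_spec(MISSING) is not None


# Guard evaluated once, outside any test body. With pytest, the equivalent
# would be a module-level call: pytest.importorskip("granian").
@unittest.skipUnless(HAVE_MODULE, f"{MISSING} is not installed")
class GuardedTests(unittest.TestCase):
    @patch(f"{MISSING}.Granian")
    def test_server_init(self, mock_cls):
        mock_cls.assert_not_called()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(GuardedTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(len(result.skipped), len(result.errors))
```

With the module absent, the run reports one skipped test and no errors, which is the outcome the review asks for.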


codecov · 2026-05-12T20:14:19Z

Codecov Report

❌ Patch coverage is 70.58824% with 10 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
litellm/proxy/proxy_cli.py	70.58%	10 Missing ⚠️



* feat: add guardrail violation span attributes and fix missing spans on pre-call blocks (#28364)
  - Fix missing guardrail child spans when a pre-call guardrail blocks the request before reaching the LLM provider; `async_post_call_failure_hook` now calls `_emit_guardrail_spans_from_request_data` to emit spans from `request_data["metadata"]` regardless of whether `_handle_failure` already fired
  - Add `guardrail_status`, `guardrail_action`, and `guardrail_violation_categories` as queryable top-level OTEL span attributes so trace backends can filter/group by violation type without parsing the redacted `guardrail_response` blob
  - Introduce `_emit_guardrail_spans_from_request_data` helper that constructs minimal kwargs from `request_data["metadata"]` and routes through `_create_guardrail_span`, sharing the same dedupe state to prevent double-emitting when both failure hooks fire
  - Extend `BedrockGuardrail` with `_build_tracing_detail` and `_extract_violation_category_names` which flatten BLOCKED assessments into human-readable category labels (topic names, content-filter types, PII entity types, named regex names) before redaction, and surface Bedrock's raw `action` field via `tracing_detail`
  - Security: violation category extraction deliberately omits `customWords.match` and unnamed regex `match` values because those fields carry the user-submitted content that triggered the rule; only operator-defined `name`/`type` labels are emitted
  - Add `violation_categories` and `guardrail_action` fields to `StandardLoggingGuardrailInformation` and `GuardrailTracingDetail` TypedDicts to carry the pre-redaction metadata through the logging pipeline
  - Add comprehensive test suite covering: guardrail span creation on failure, dedupe between `_handle_failure` and `async_post_call_failure_hook`, per-span status attributes for multi-guardrail sequences, Bedrock category extraction for all policy types, security leak prevention, and end-to-end `CustomGuardrail` violation path
  Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
* test(proxy): behavior-pinning matrix for team management endpoints (#28441)
  * test(proxy): behavior-pinning matrix for team management endpoints
    PR2 (Team Tier-1) of the management-endpoint behavior-pinning effort. Extends the tests/proxy_behavior/management/ harness PR1 built and adds the actor x target-resource authz matrix for the 7 team endpoints: /team/new, /team/info, /team/list, /team/update, /team/member_add, /team/member_delete, /team/member_update. Tests-only, no production code changes.
    Harness extensions:
    - actors.py: ORG_B_ADMIN actor (org admin of ORG_B) and TEAM_GAMMA (an ORG_A team with no actor members), so team-targeting endpoints get a clean own / same-org-other / cross-org target axis.
    - conftest.py: create_scratch_team() raw-seeds target teams without /team/new side effects; the scratch teardown now also strips dangling scratch-team refs from LiteLLM_UserTable.teams.
    156 new scenarios; status codes pinned to observed handler behavior.
  * test(proxy): record mutmut run blockers in PR2 triage doc
    Attempted a scoped local mutmut run for G5; it did not complete. Record the three concrete blockers in mutmut_triage/pr2-team-tier1.md so the next attempt has a head start:
    1. mutmut's mutants/ sandbox is import-shadowed by the worktree source.
    2. the legacy mock suite and the real-DB behavior suite cannot share a pytest session (mock suite globally patches prisma_client).
    3. the CI mutation-test.yml workflow starts no Postgres, so its stats phase now aborts on the behavior-suite tests PR1 added to tests_dir.
    mutmut stays a deferred follow-up (as in PR1); the binding pre-merge signal remains the behavior matrix (G1) and the G4 regression-replay.
  * test(proxy): drop suite README + triage doc, trim test comments
    Remove the two prose docs from the behavior suite (README.md and mutmut_triage/pr2-team-tier1.md) and tighten the comment blocks on the team test files + harness down to the load-bearing parts (the gate each matrix pins, plus genuinely surprising results). No behavior change; all 286 scenarios still pass.
  * test(proxy): remove mutmut tests_dir comment
* test(vertex_ai): tolerate transient 500 in google maps grounding test (#28503)
  test_gemini_google_maps_tool_simple makes live calls to Vertex AI's Google Maps grounding backend, which intermittently returns 500 INTERNAL ("Please retry"), a transient Google-side failure, not a LiteLLM bug. The request LiteLLM emits matches Google's published googleMaps grounding spec field-for-field, and the maps-platform 500 only occurs after Vertex accepts the request.
  The test already passes on RateLimitError; treat InternalServerError the same way so transient Vertex-side failures don't fail CI.
* fix(docker): restore npm to non_root builder image (#28519)
  The non_root builder stage installs `nodejs` but not `npm`. Without `npm` on PATH, prisma-python falls back to downloading a Node runtime via nodeenv from nodejs.org, and that downloaded binary fails to load `libatomic.so.1`, breaking `prisma generate` and the image build.
  `npm` was dropped from this apk list in ca52e34. Restoring it lets prisma-python use the system Node + npm, matching docker/Dockerfile which already installs `npm` for the same reason.
* build(deps): bump next from 16.2.4 to 16.2.6 in /ui/litellm-dashboard (#27665) (#28524)
  Bumps [next](https://github.com/vercel/next.js) from 16.2.4 to 16.2.6.
  - [Release notes](https://github.com/vercel/next.js/releases)
  - [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
  - [Commits](vercel/next.js@v16.2.4...v16.2.6)
  ---
  updated-dependencies:
  - dependency-name: next
    dependency-version: 16.2.6
    dependency-type: direct:production
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump black to 26.3.1 and apply formatting (#28525)
  * build(deps-dev): bump black 24.10.0 -> 26.3.1
  * style: apply black 26.3.1 formatting
  * chore: authorize black 26.3.1 license in liccheck.ini
* chore(deps): bump deps (#28528)
  * build(deps): bump next from 16.2.4 to 16.2.6 in /ui/litellm-dashboard (#27665)
    Bumps [next](https://github.com/vercel/next.js) from 16.2.4 to 16.2.6.
    - [Release notes](https://github.com/vercel/next.js/releases)
    - [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
    - [Commits](vercel/next.js@v16.2.4...v16.2.6)
    ---
    updated-dependencies:
    - dependency-name: next
      dependency-version: 16.2.6
      dependency-type: direct:production
    ...
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  * build(deps): bump protobufjs in /tests/pass_through_tests (#28296)
    Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.5.6 to 7.6.0.
    - [Release notes](https://github.com/protobufjs/protobuf.js/releases)
    - [Changelog](https://github.com/protobufjs/protobuf.js/blob/protobufjs-v7.6.0/CHANGELOG.md)
    - [Commits](protobufjs/protobuf.js@protobufjs-v7.5.6...protobufjs-v7.6.0)
    ---
    updated-dependencies:
    - dependency-name: protobufjs
      dependency-version: 7.6.0
      dependency-type: indirect
    ...
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  * build(deps): bump ws from 8.20.0 to 8.20.1 in /tests/pass_through_tests (#28303)
    Bumps [ws](https://github.com/websockets/ws) from 8.20.0 to 8.20.1.
    - [Release notes](https://github.com/websockets/ws/releases)
    - [Commits](websockets/ws@8.20.0...8.20.1)
    ---
    updated-dependencies:
    - dependency-name: ws
      dependency-version: 8.20.1
      dependency-type: indirect
    ...
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  ---------
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* test(e2e): forward LITELLM_LICENSE to UI e2e proxy (#28398)
  * test(e2e): forward LITELLM_LICENSE to UI e2e proxy
    The UI e2e job ran without LITELLM_LICENSE, so premium_user was always false in the issued login JWT and premium-gated UI surfaces (Team-BYOK Model switch, etc.) couldn't be driven through the UI. Forward the env var from run_e2e.sh and the CircleCI e2e_ui_testing job, and add a sanity test that decodes the admin storage state token and asserts premium_user=true so the wiring fails loudly if it ever regresses.
    Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
  * Update ui/litellm-dashboard/e2e_tests/tests/proxy-admin/license.spec.ts
    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
  ---------
  Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
  Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Add granian as a ASGI compliant web server. Provider better throughput stability, (#26027)
  * Add granian as a ASGI compliant web server. Provides better stability, 10-20 RPS improvement under standard LT conditions. TODO: Verify poetry lock details and add locust numbers to PR
  * Update granian version in license_cache.json and pyproject.toml to 2.5.7
  * Enhance proxy CLI tests by adding SSL initialization checks for Granian server. Remove Python version skip conditions and implement tests to ensure SSL certificate and key are required for server initialization.
  * update uv lock to fix granian import error
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: harish-berri <harish@berri.ai>

Add granian as a ASGI compliant web server. Provides better stability…

ac77396

…, 10-20 RPS improvement under standard LT conditions. TODO: Verify poetry lock details and add locust numbers to PR

greptile-apps Bot reviewed Apr 18, 2026


Comment thread pyproject.toml Outdated

Comment thread license_cache.json Outdated

Comment thread tests/test_litellm/proxy/test_proxy_cli.py

Comment thread tests/test_litellm/proxy/test_proxy_cli.py Outdated

Update granian version in license_cache.json and pyproject.toml to 2.5.7

8bd0008

harish-berri temporarily deployed to integration-postgres April 18, 2026 20:12 — with GitHub Actions Inactive

harish-berri had a problem deploying to integration-postgres April 18, 2026 20:13 — with GitHub Actions Error

harish-berri temporarily deployed to integration-postgres April 18, 2026 20:13 — with GitHub Actions Inactive

harish-berri had a problem deploying to integration-postgres April 18, 2026 20:13 — with GitHub Actions Error

Enhance proxy CLI tests by adding SSL initialization checks for Grani…

3064335

…an server. Remove Python version skip conditions and implement tests to ensure SSL certificate and key are required for server initialization.

harish-berri temporarily deployed to integration-postgres April 18, 2026 20:18 — with GitHub Actions Inactive

harish-berri had a problem deploying to integration-postgres April 18, 2026 20:19 — with GitHub Actions Error

harish-berri temporarily deployed to integration-postgres April 18, 2026 20:19 — with GitHub Actions Inactive

greptile-apps Bot reviewed Apr 18, 2026


Comment thread pyproject.toml

Merge branch 'litellm_internal_staging' of https://github.com/BerriAI…

fdc9c05

…/litellm into litellm_sidecar_robyn_perf

harish-berri temporarily deployed to integration-postgres April 20, 2026 16:15 — with GitHub Actions Inactive

harish-berri had a problem deploying to integration-postgres April 20, 2026 16:15 — with GitHub Actions Error

harish-berri temporarily deployed to integration-postgres April 20, 2026 16:15 — with GitHub Actions Inactive

update uv lock to fix granian import error

7322b33

harish-berri temporarily deployed to integration-postgres April 20, 2026 16:45 — with GitHub Actions Inactive

harish-berri had a problem deploying to integration-postgres April 20, 2026 16:45 — with GitHub Actions Error

greptile-apps Bot reviewed Apr 20, 2026


ishaan-berri approved these changes Apr 20, 2026


harish-berri added 2 commits May 1, 2026 19:59

Merge branch 'litellm_internal_staging' of https://github.com/BerriAI…

8ca6142

…/litellm into litellm_sidecar_robyn_perf

Merge branch 'litellm_internal_staging' of https://github.com/BerriAI…

290b9e5

…/litellm into litellm_sidecar_robyn_perf

Merge branch 'litellm_internal_staging' of https://github.com/BerriAI…

1c2869d

…/litellm into litellm_sidecar_robyn_perf

harish-berri requested a review from a team May 17, 2026 20:04

Merge branch 'litellm_internal_staging' of https://github.com/BerriAI…

ce151f5

…/litellm into litellm_sidecar_robyn_perf merge parent

yuneng-berri approved these changes May 22, 2026


Merge branch 'litellm_internal_staging' of https://github.com/BerriAI…

70e7013

…/litellm into litellm_sidecar_robyn_perf

harish-berri merged commit d04373f into litellm_internal_staging May 22, 2026
118 checks passed

harish-berri merged 10 commits into litellm_internal_staging from litellm_sidecar_robyn_perf

3 participants
