fix(bedrock): decouple STS region from Bedrock aws_region_name#28245

Merged
yuneng-berri merged 5 commits into litellm_internal_staging from litellm_fix_bedrock_sts_region_decouple
May 22, 2026

Conversation

@milan-berri

@milan-berri milan-berri commented May 19, 2026

Collaborator

Report: LiteLLM in eu-west-1 calling Bedrock model in eu-central-1 worked on v1.81.14 but broke on v1.83.14 with:

botocore.exceptions.SSLError: SSL validation failed for https://sts.eu-central-1.amazonaws.com/
[SSL: UNEXPECTED_EOF_WHILE_READING]

Environment is air-gapped — only an STS VPC interface endpoint in eu-west-1 is reachable; sts.eu-central-1 resolves to a public IP they cannot reach.

Workaround attempt with aws_sts_endpoint=https://sts.eu-west-1.amazonaws.com failed with:

An error occurred (SignatureDoesNotMatch) when calling the AssumeRole operation:
Credential should be scoped to a valid region.

because STS region_name still came from aws_region_name (eu-central-1).

Regression introduced by #21640
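
A minimal sketch of why the workaround failed, assuming the standard SigV4 credential scope layout (`<date>/<region>/<service>/aws4_request`). The function name and the fixed date are illustrative only and are not part of LiteLLM or botocore. The request goes to the eu-west-1 host, but the scope is built from the region the signer was given:

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def credential_scope(region_name: str, service: str = "sts") -> str:
    """Build a SigV4 credential scope string (hypothetical helper for illustration)."""
    # Fixed date so the example is reproducible; a real signer uses the request time.
    date = datetime(2026, 5, 19, tzinfo=timezone.utc).strftime("%Y%m%d")
    return f"{date}/{region_name}/{service}/aws4_request"


endpoint_url = "https://sts.eu-west-1.amazonaws.com"  # where the request is sent
region_name = "eu-central-1"  # what the signer was told (taken from aws_region_name)

host = urlparse(endpoint_url).hostname
scope = credential_scope(region_name)

print(host)   # sts.eu-west-1.amazonaws.com
print(scope)  # 20260519/eu-central-1/sts/aws4_request
```

The host and the scope name different regions, which is consistent with the "Credential should be scoped to a valid region" rejection quoted above.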

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Proof of fix

Customer setup: workload in eu-west-1 (AWS_REGION=eu-west-1), Bedrock model in eu-central-1 (aws_region_name: eu-central-1), air-gapped, STS VPC endpoint only in eu-west-1.

STS client kwargs resolved by _build_sts_client_kwargs after fix:

| Scenario | Resolved STS kwargs |
| --- | --- |
| AWS_REGION=eu-west-1, aws_region_name=eu-central-1 | {verify, region_name=eu-west-1} ← fixes air-gapped case |
| aws_sts_endpoint=https://sts.eu-west-1.amazonaws.com, aws_region_name=eu-central-1 | {verify, endpoint_url=…west-1, region_name=eu-west-1} ← fixes SignatureDoesNotMatch |
| aws_sts_endpoint=https://sts.us-east-1.amazonaws.com (#21640 use case) | {verify, endpoint_url=…east-1, region_name=us-east-1} |
| aws_sts_endpoint=https://vpce-abc.sts.eu-west-1.vpce.amazonaws.com | {verify, endpoint_url=…vpce…, region_name=eu-west-1} |
| aws_sts_endpoint=https://sts.amazonaws.com (global) | {verify, endpoint_url=…amazonaws.com} (no region forced) |
| Only aws_region_name=us-east-1, no env, no endpoint | region_name=None ← Bedrock region never leaks into STS |

Type

🐛 Bug Fix

Changes

Root cause

PR #21640 made _auth_with_aws_role resolve region = aws_region_name or AWS_REGION or AWS_DEFAULT_REGION and pass it as region_name to the STS client. For any model that sets aws_region_name (i.e. all Bedrock models), STS started hitting sts.{bedrock_region}.amazonaws.com instead of the workload region.

This breaks two valid cross-region setups:

  1. Air-gapped / single STS endpoint region — workload reaches STS only in eu-west-1, but Bedrock model lives in eu-central-1. Connection now goes to public sts.eu-central-1 and times out / drops TLS.
  2. aws_sts_endpoint override + cross-region Bedrock — endpoint URL points at one region, region_name follows Bedrock region, SigV4 fails with Credential should be scoped to a valid region.
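
The regression can be sketched as follows. This is an illustration of the resolution order described above, not the code from #21640; the function name and the `env` parameter are hypothetical:

```python
from typing import Mapping, Optional


def pre_fix_sts_region(
    aws_region_name: Optional[str], env: Mapping[str, str]
) -> Optional[str]:
    # Order described for #21640: the Bedrock region wins over the workload environment.
    return aws_region_name or env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


# Customer setup: workload in eu-west-1, Bedrock model in eu-central-1.
env = {"AWS_REGION": "eu-west-1"}
region = pre_fix_sts_region("eu-central-1", env)
sts_host = f"sts.{region}.amazonaws.com"

print(sts_host)  # sts.eu-central-1.amazonaws.com
```

Because every Bedrock model sets aws_region_name, the environment fallbacks are never reached, and STS is addressed in the Bedrock region.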

Fix

Decouple Bedrock region (aws_region_name) from STS signing region. STS region is resolved from the endpoint host or the caller environment only:

sts_region = parse_from(aws_sts_endpoint)
             or AWS_REGION
             or AWS_DEFAULT_REGION

Three small helpers in litellm/llms/bedrock/base_aws_llm.py:

  • _parse_sts_region_from_endpoint(endpoint) — extracts region from sts.{region}.amazonaws.com, sts-fips.{region}.amazonaws.com, vpce-x.sts.{region}.vpce.amazonaws.com, and .amazonaws.com.cn. Returns None for the global endpoint.
  • _resolve_sts_region(aws_sts_endpoint=None) — parse → env fallback. Never reads aws_region_name.
  • _build_sts_client_kwargs(aws_sts_endpoint, ssl_verify) — single source of truth for boto3 STS client kwargs; keeps endpoint_url and region_name aligned so SigV4 always matches the URL.
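
A rough sketch of how the three helpers could fit together. The regex, the function bodies, and the `env` parameter are assumptions made for illustration; the PR's actual pattern is not shown in full here, and the real helpers may differ (for example, in whether a `region_name=None` key is emitted, or in how a global endpoint combines with the env fallback):

```python
import os
import re
from typing import Any, Dict, Mapping, Optional
from urllib.parse import urlparse

# Hypothetical pattern covering regional, FIPS, VPCE, and .cn hosts.
_STS_HOST = re.compile(
    r"^(?:[a-z0-9-]+\.)?sts(?:-fips)?\.(?P<region>[a-z0-9-]+)"
    r"\.(?:vpce\.)?amazonaws\.com(?:\.cn)?$"
)


def parse_sts_region_from_endpoint(endpoint: Optional[str]) -> Optional[str]:
    """Return the region embedded in an STS hostname, or None (e.g. global endpoint)."""
    if not endpoint:
        return None
    host = urlparse(endpoint).hostname
    if not host:
        return None
    match = _STS_HOST.match(host)
    return match.group("region") if match else None


def resolve_sts_region(
    aws_sts_endpoint: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Endpoint host first, then the caller environment. Never reads aws_region_name."""
    env = os.environ if env is None else env
    return (
        parse_sts_region_from_endpoint(aws_sts_endpoint)
        or env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
    )


def build_sts_client_kwargs(
    aws_sts_endpoint: Optional[str],
    ssl_verify: Any,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble kwargs so endpoint_url and region_name always come from one place."""
    kwargs: Dict[str, Any] = {"verify": ssl_verify}
    if aws_sts_endpoint:
        kwargs["endpoint_url"] = aws_sts_endpoint
    region = resolve_sts_region(aws_sts_endpoint, env)
    if region:
        kwargs["region_name"] = region
    return kwargs
```

With this shape, `build_sts_client_kwargs("https://sts.eu-west-1.amazonaws.com", True, env={})` yields an `endpoint_url` and a `region_name` that both point at eu-west-1, regardless of the model's aws_region_name.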

Call sites updated to use the helper:

  • _auth_with_aws_role (AssumeRole path, both ambient and explicit credentials)
  • _handle_irsa_cross_account, _handle_irsa_same_account — drop the region positional arg
  • _auth_with_web_identity_token — drop the manual https://sts.{aws_region_name}.amazonaws.com synthesis (also fixed a latent bug where aws_region_name=None produced https://sts.None.amazonaws.com)
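
The latent bug in the web identity path is easy to reproduce in isolation. The function below is a hypothetical stand-in for the removed synthesis, not the original code:

```python
from typing import Optional


def synthesized_sts_endpoint(aws_region_name: Optional[str]) -> str:
    # The region is interpolated without a None check, so None becomes the text "None".
    return f"https://sts.{aws_region_name}.amazonaws.com"


print(synthesized_sts_endpoint(None))  # https://sts.None.amazonaws.com
```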

Backwards compatibility

  • PR #21640 (feat(bedrock): support optional regional STS endpoint in role assumption) use case (force regional STS): still works via aws_sts_endpoint=https://sts.{region}.amazonaws.com; the region is parsed from the URL.
  • Customers that relied on aws_region_name alone to drive STS region (no env, no endpoint): set aws_sts_endpoint: https://sts.{region}.amazonaws.com on the model. This is the documented escape hatch and the same approach as boto3.client("sts", endpoint_url=…).
  • Same-region happy path (Bedrock region == workload region): unchanged; AWS_REGION already matches.
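
For the second case, a model entry might look roughly like the dict below, written as Python for illustration. Only aws_region_name and aws_sts_endpoint come from this PR; the alias, the model placeholder, the role ARN, and the surrounding key layout are assumptions and should be checked against the LiteLLM config documentation:

```python
# Hypothetical model entry pinning STS to the Bedrock region explicitly.
model_entry = {
    "model_name": "bedrock-eu-central",  # hypothetical alias
    "litellm_params": {
        "model": "bedrock/<model-id>",  # placeholder, not a real model id
        "aws_region_name": "eu-central-1",  # Bedrock region
        "aws_role_name": "arn:aws:iam::123456789012:role/example",  # hypothetical role
        # Escape hatch: the STS region is parsed from this host, not from aws_region_name.
        "aws_sts_endpoint": "https://sts.eu-central-1.amazonaws.com",
    },
}
```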

Tests

tests/test_litellm/llms/bedrock/test_base_aws_llm.py:

  • test_parse_sts_region_from_endpoint — public, VPCE, FIPS, global, malformed URLs.
  • test_resolve_sts_region — env-only, endpoint-only, endpoint-wins-over-env, global, VPCE.
  • test_build_sts_client_kwargs — 8-case matrix covering endpoint/region/verify alignment.
  • test_sts_uses_workload_region_not_bedrock_region — air-gapped customer case (AWS_REGION=eu-west-1 + aws_region_name=eu-central-1 → STS west-1).
  • test_sts_endpoint_region_matches_bedrock_region_param — endpoint west + Bedrock central → signing west (no SignatureDoesNotMatch).
  • test_irsa_cross_account_sts_client_uses_resolved_region — every IRSA STS call gets workload region, not Bedrock.
  • test_web_identity_token_sts_client_uses_build_sts_client_kwargs — web identity path aligned.
  • Updated existing test_eks_irsa_ambient_credentials_used and test_explicit_credentials_used_when_provided: aws_region_name no longer drives STS region.
pytest tests/test_litellm/llms/bedrock -q
# 686 passed

milan-berri and others added 3 commits May 19, 2026 13:27
STS AssumeRole now resolves signing region from aws_sts_endpoint (parsed
host) or AWS_REGION/AWS_DEFAULT_REGION instead of aws_region_name, fixing
air-gapped cross-region Bedrock setups and endpoint/signature mismatches.

Co-authored-by: Cursor <cursoragent@cursor.com>
Parametrize _resolve_sts_region and _build_sts_client_kwargs matrix cases,
and assert IRSA/web-identity paths use aligned STS endpoint and region_name.

Co-authored-by: Cursor <cursoragent@cursor.com>
…identity endpoint synthesis

Co-authored-by: Cursor <cursoragent@cursor.com>
@codecov

codecov Bot commented May 19, 2026


Codecov Report

❌ Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| litellm/llms/bedrock/base_aws_llm.py | 95.83% | 1 Missing ⚠️ |

📢 Thoughts on this report? Let us know!

@greptile-apps

greptile-apps Bot commented May 19, 2026

Contributor

Greptile Summary

This PR fixes a regression introduced in #21640 where the Bedrock model region (aws_region_name) was incorrectly used as the STS signing region, breaking air-gapped deployments and causing SignatureDoesNotMatch errors when aws_sts_endpoint and aws_region_name pointed at different regions.

  • Introduces three small, single-responsibility helpers (_parse_sts_region_from_endpoint, _resolve_sts_region, _build_sts_client_kwargs) that derive the STS region solely from the endpoint URL or the caller's environment (AWS_REGION / AWS_DEFAULT_REGION), never from the Bedrock-specific aws_region_name.
  • Updates all five STS client creation sites (_auth_with_aws_role, _handle_irsa_cross_account, _handle_irsa_same_account, _auth_with_web_identity_token, ambient EKS path) to use the single _build_sts_client_kwargs helper, eliminating the divergent ad-hoc kwarg assembly that caused the bug.
  • Adds eight new test functions covering endpoint parsing, region resolution, and full auth flow integration for the reported failure scenarios; existing tests updated with correct expected values (not masked).

Confidence Score: 5/5

Safe to merge — the change is tightly scoped to STS client construction, all five call sites are updated consistently, and the helper logic is fully covered by parametrized unit tests including the exact customer-reported scenarios.

All STS client creation paths now flow through a single, well-tested helper. The regex correctly handles regional, FIPS, VPCE, China, and global endpoints. Existing tests were updated to reflect corrected behavior, not to hide regressions. No new network calls are made; no auth logic was altered beyond the region resolution fix.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/llms/bedrock/base_aws_llm.py | Adds three focused helpers (_parse_sts_region_from_endpoint, _resolve_sts_region, _build_sts_client_kwargs) and updates all STS call sites to use them; correctly decouples Bedrock region from STS signing region across all auth paths. |
| tests/test_litellm/llms/bedrock/test_base_aws_llm.py | Adds comprehensive parametrized tests for the three new helpers and integrates regression tests for the air-gapped and SignatureDoesNotMatch scenarios; existing tests updated correctly to reflect the new STS region resolution behavior rather than to hide regressions. |

Reviews (2): Last reviewed commit: "test(bedrock): cover FIPS, GovCloud, and..."

Comment thread tests/test_litellm/llms/bedrock/test_base_aws_llm.py
Comment thread litellm/llms/bedrock/base_aws_llm.py
Addresses greptile P2: regex sts(?:-fips)? supported sts-fips hosts but
was not exercised by the parametrized parse test.

Co-authored-by: Cursor <cursoragent@cursor.com>
@milan-berri

Collaborator Author

@greptileai - check again, test cases added and regarding 2nd item:
Push back on the flag. Pre-#21640 behavior was already “STS follows the boto3 session (AWS_REGION / AWS_DEFAULT_REGION / profile),” not “global default” — this PR restores that. Anyone needing STS pinned to the Bedrock region can set aws_sts_endpoint: https://sts.{region}.amazonaws.com, which the new parse logic signs correctly. A flag adds permanent surface for a 3-month-old behavior that’s actively breaking air-gapped deployments.


# Regional STS hostnames, e.g. sts.eu-west-1.amazonaws.com or
# vpce-xxx.sts.eu-west-1.vpce.amazonaws.com
_STS_REGION_FROM_ENDPOINT_PATTERN = re.compile(

Collaborator

Using regex here seems brittle, what is this doing? From the test cases I see:

"https://sts.us-east-1.amazonaws.com", "us-east-1"

is there a better way to do this?

Collaborator Author

I was thinking about explicit aws_sts_region param - clearest, no parsing; but skipped it to avoid another config knob. Why regex: boto3 doesn’t derive the SigV4 signing region from endpoint_url. If the URL says sts.eu-west-1 but region_name is eu-central-1, you get SignatureDoesNotMatch - that’s what the user hit. Parsing the hostname is the smallest way to keep URL and signing region in sync when aws_sts_endpoint is set.
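
For comparison, hostname parsing can also be done without a regex by walking the DNS labels. This is only a sketch of one alternative, not what the PR implements; the function name is hypothetical, and unlike a full pattern it does not validate the amazonaws.com suffix:

```python
from typing import Optional
from urllib.parse import urlparse


def region_from_sts_host(endpoint: str) -> Optional[str]:
    """Return the label that follows 'sts' or 'sts-fips' in the hostname, if any."""
    host = urlparse(endpoint).hostname or ""
    labels = host.split(".")
    for i, label in enumerate(labels[:-1]):
        if label in ("sts", "sts-fips"):
            candidate = labels[i + 1]
            # On the global endpoint, "amazonaws" directly follows "sts": no region.
            return None if candidate == "amazonaws" else candidate
    return None


print(region_from_sts_host("https://sts.us-east-1.amazonaws.com"))  # us-east-1
```

Either approach serves the same goal stated in the reply: the signing region is taken from the URL, so the two cannot drift apart.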

@yuneng-berri yuneng-berri enabled auto-merge (squash) May 22, 2026 18:16
@yuneng-berri yuneng-berri merged commit 1b141bc into litellm_internal_staging May 22, 2026
107 of 109 checks passed

2 participants