fix(bedrock): decouple STS region from Bedrock aws_region_name #28245
STS AssumeRole now resolves signing region from aws_sts_endpoint (parsed host) or AWS_REGION/AWS_DEFAULT_REGION instead of aws_region_name, fixing air-gapped cross-region Bedrock setups and endpoint/signature mismatches. Co-authored-by: Cursor <cursoragent@cursor.com>
Parametrize _resolve_sts_region and _build_sts_client_kwargs matrix cases, and assert IRSA/web-identity paths use aligned STS endpoint and region_name. Co-authored-by: Cursor <cursoragent@cursor.com>
…identity endpoint synthesis Co-authored-by: Cursor <cursoragent@cursor.com>
Greptile Summary
This PR fixes a regression introduced in #21640 where the Bedrock model region (
Confidence Score: 5/5
Safe to merge: the change is tightly scoped to STS client construction, all five call sites are updated consistently, and the helper logic is fully covered by parametrized unit tests including the exact customer-reported scenarios.
All STS client creation paths now flow through a single, well-tested helper. The regex correctly handles regional, FIPS, VPCE, China, and global endpoints. Existing tests were updated to reflect corrected behavior, not to hide regressions. No new network calls are made; no auth logic was altered beyond the region resolution fix.
No files require special attention.
| Filename | Overview |
|---|---|
| litellm/llms/bedrock/base_aws_llm.py | Adds three focused helpers (_parse_sts_region_from_endpoint, _resolve_sts_region, _build_sts_client_kwargs) and updates all STS call sites to use them; correctly decouples Bedrock region from STS signing region across all auth paths. |
| tests/test_litellm/llms/bedrock/test_base_aws_llm.py | Adds comprehensive parametrized tests for the three new helpers and integrates regression tests for the air-gapped and SignatureDoesNotMatch scenarios; existing tests updated correctly to reflect the new STS region resolution behavior rather than to hide regressions. |
Reviews (2): Last reviewed commit: "test(bedrock): cover FIPS, GovCloud, and..."
Addresses greptile P2: regex sts(?:-fips)? supported sts-fips hosts but was not exercised by the parametrized parse test. Co-authored-by: Cursor <cursoragent@cursor.com>
@greptileai - check again, test cases added and regarding 2nd item:
# Regional STS hostnames, e.g. sts.eu-west-1.amazonaws.com or
# vpce-xxx.sts.eu-west-1.vpce.amazonaws.com
_STS_REGION_FROM_ENDPOINT_PATTERN = re.compile(
Using regex here seems brittle. What is this doing? From the test cases I see:
"https://sts.us-east-1.amazonaws.com", "us-east-1"
Is there a better way to do this?
I was thinking about explicit aws_sts_region param - clearest, no parsing; but skipped it to avoid another config knob. Why regex: boto3 doesn’t derive the SigV4 signing region from endpoint_url. If the URL says sts.eu-west-1 but region_name is eu-central-1, you get SignatureDoesNotMatch - that’s what the user hit. Parsing the hostname is the smallest way to keep URL and signing region in sync when aws_sts_endpoint is set.
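The hostname parsing discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the pattern and the function name `parse_sts_region` are hypothetical, since the actual regex is not shown in full in this thread. It only assumes the host shapes named in the PR (regional, FIPS, VPCE, China, and global).

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical pattern for illustration only; the PR's actual regex
# is not shown in full in this thread.
_STS_HOST_PATTERN = re.compile(
    r"(?:^|\.)sts(?:-fips)?"                   # "sts" or "sts-fips" label
    r"\.(?P<region>[a-z]{2}(?:-[a-z]+)+-\d+)"  # e.g. eu-west-1, us-gov-west-1
    r"\.(?:vpce\.)?amazonaws\.com(?:\.cn)?$"   # optional VPCE label and .cn suffix
)


def parse_sts_region(endpoint: Optional[str]) -> Optional[str]:
    """Return the region embedded in an STS endpoint URL, or None.

    None is returned for the global endpoint (sts.amazonaws.com), for
    URLs without a scheme/host, and for hosts that do not look like STS.
    """
    if not endpoint:
        return None
    host = urlparse(endpoint).hostname
    if host is None:
        return None
    match = _STS_HOST_PATTERN.search(host)
    return match.group("region") if match else None


for url in (
    "https://sts.eu-west-1.amazonaws.com",
    "https://vpce-abc.sts.eu-west-1.vpce.amazonaws.com",
    "https://sts.amazonaws.com",
):
    print(url, "->", parse_sts_region(url))
```

Anchoring on the `sts` label (rather than splitting the host on dots by position) is what lets the same pattern cover VPCE hosts, where the region is not the second label.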
1b141bc into litellm_internal_staging
Report: LiteLLM in eu-west-1 calling Bedrock model in eu-central-1 worked on v1.81.14 but broke on v1.83.14 with:
botocore.exceptions.SSLError: SSL validation failed for https://sts.eu-central-1.amazonaws.com/
[SSL: UNEXPECTED_EOF_WHILE_READING]
Environment is air-gapped — only an STS VPC interface endpoint in eu-west-1 is reachable; sts.eu-central-1 resolves to a public IP they cannot reach.
Workaround attempt with aws_sts_endpoint=https://sts.eu-west-1.amazonaws.com failed with:
An error occurred (SignatureDoesNotMatch) when calling the AssumeRole operation:
Credential should be scoped to a valid region.
because STS region_name still came from aws_region_name (eu-central-1).
Regression introduced by #21640
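The failure in this report can be illustrated with a small sketch of the pre-fix resolution order that the PR description attributes to #21640 (`aws_region_name or AWS_REGION or AWS_DEFAULT_REGION`). The function name `pre_fix_sts_region` and the `env` parameter are hypothetical and exist only to make the example self-contained.

```python
import os
from typing import Mapping, Optional


def pre_fix_sts_region(
    aws_region_name: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical reconstruction of the pre-fix order: Bedrock region wins."""
    env = os.environ if env is None else env
    return aws_region_name or env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


# Customer setup from the report: workload in eu-west-1, model in eu-central-1.
workload_env = {"AWS_REGION": "eu-west-1"}
region = pre_fix_sts_region("eu-central-1", workload_env)

# The Bedrock region leaks into STS, so the client targets a host that the
# air-gapped network cannot reach.
print(f"https://sts.{region}.amazonaws.com")
```

Because `aws_region_name` is checked first, the workload's own `AWS_REGION` is never consulted whenever a model sets a Bedrock region.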
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
make test-unit
@greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review
Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
Proof of fix
Customer setup: workload in eu-west-1 (AWS_REGION=eu-west-1), Bedrock model in eu-central-1 (aws_region_name: eu-central-1), air-gapped, STS VPC endpoint only in eu-west-1.

STS client kwargs resolved by _build_sts_client_kwargs after fix:

| Inputs | Resolved STS client kwargs |
|---|---|
| AWS_REGION=eu-west-1, aws_region_name=eu-central-1 | {verify, region_name=eu-west-1} ← fixes air-gapped case |
| aws_sts_endpoint=https://sts.eu-west-1.amazonaws.com, aws_region_name=eu-central-1 | {verify, endpoint_url=…west-1, region_name=eu-west-1} ← fixes SignatureDoesNotMatch |
| aws_sts_endpoint=https://sts.us-east-1.amazonaws.com (#21640 use case) | {verify, endpoint_url=…east-1, region_name=us-east-1} |
| aws_sts_endpoint=https://vpce-abc.sts.eu-west-1.vpce.amazonaws.com | {verify, endpoint_url=…vpce…, region_name=eu-west-1} |
| aws_sts_endpoint=https://sts.amazonaws.com (global) | {verify, endpoint_url=…amazonaws.com} (no region forced) |
| aws_region_name=us-east-1, no env, no endpoint | region_name=None ← Bedrock region never leaks into STS |

Type
🐛 Bug Fix
Changes
Root cause
PR #21640 made _auth_with_aws_role resolve region = aws_region_name or AWS_REGION or AWS_DEFAULT_REGION and pass it as region_name to the STS client. For any model that sets aws_region_name (i.e. all Bedrock models), STS started hitting sts.{bedrock_region}.amazonaws.com instead of the workload region.
This breaks two valid cross-region setups:
- eu-west-1, but Bedrock model lives in eu-central-1. Connection now goes to public sts.eu-central-1 and times out / drops TLS.
- aws_sts_endpoint override + cross-region Bedrock: endpoint URL points at one region, region_name follows Bedrock region, SigV4 fails with "Credential should be scoped to a valid region."
Fix
Decouple Bedrock region (aws_region_name) from STS signing region. STS region is resolved from the endpoint host or the caller environment only:
Three small helpers in litellm/llms/bedrock/base_aws_llm.py:
- _parse_sts_region_from_endpoint(endpoint): extracts region from sts.{region}.amazonaws.com, sts-fips.{region}.amazonaws.com, vpce-x.sts.{region}.vpce.amazonaws.com, and .amazonaws.com.cn. Returns None for the global endpoint.
- _resolve_sts_region(aws_sts_endpoint=None): parse → env fallback. Never reads aws_region_name.
- _build_sts_client_kwargs(aws_sts_endpoint, ssl_verify): single source of truth for boto3 STS client kwargs; keeps endpoint_url and region_name aligned so SigV4 always matches the URL.
Call sites updated to use the helper:
- _auth_with_aws_role (AssumeRole path, both ambient and explicit credentials)
- _handle_irsa_cross_account, _handle_irsa_same_account: drop the region positional arg
- _auth_with_web_identity_token: drop the manual https://sts.{aws_region_name}.amazonaws.com synthesis (also fixed a latent bug where aws_region_name=None produced https://sts.None.amazonaws.com)
Backwards compatibility
- aws_sts_endpoint=https://sts.{region}.amazonaws.com: region is parsed from the URL.
- aws_region_name alone to drive STS region (no env, no endpoint): set aws_sts_endpoint: https://sts.{region}.amazonaws.com on the model. This is the documented escape hatch and the same approach as boto3.client("sts", endpoint_url=…).
- AWS_REGION already matches.
Tests
tests/test_litellm/llms/bedrock/test_base_aws_llm.py:
- test_parse_sts_region_from_endpoint: public, VPCE, FIPS, global, malformed URLs.
- test_resolve_sts_region: env-only, endpoint-only, endpoint-wins-over-env, global, VPCE.
- test_build_sts_client_kwargs: 8-case matrix covering endpoint/region/verify alignment.
- test_sts_uses_workload_region_not_bedrock_region: air-gapped customer case (AWS_REGION=eu-west-1 + aws_region_name=eu-central-1 → STS west-1).
- test_sts_endpoint_region_matches_bedrock_region_param: endpoint west + Bedrock central → signing west (no SignatureDoesNotMatch).
- test_irsa_cross_account_sts_client_uses_resolved_region: every IRSA STS call gets workload region, not Bedrock.
- test_web_identity_token_sts_client_uses_build_sts_client_kwargs: web identity path aligned.
- test_eks_irsa_ambient_credentials_used and test_explicit_credentials_used_when_provided: aws_region_name no longer drives STS region.
pytest tests/test_litellm/llms/bedrock -q # 686 passed
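Taken together, the "parse → env fallback" order and the endpoint/region alignment described in this PR could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged implementation: the names `resolve_sts_region` and `build_sts_client_kwargs`, the `env` parameter, and the simplified regex are hypothetical. Two further behaviours are assumed: unset values are omitted from the kwargs rather than passed as None, and the global endpoint still falls back to the environment (the source does not say whether the real helper does).

```python
import os
import re
from typing import Any, Dict, Mapping, Optional, Union
from urllib.parse import urlparse

# Hypothetical, simplified pattern; the PR's real regex is not reproduced here.
_REGIONAL_STS_HOST = re.compile(
    r"(?:^|\.)sts(?:-fips)?\.([a-z]{2}(?:-[a-z]+)+-\d+)"
    r"\.(?:vpce\.)?amazonaws\.com(?:\.cn)?$"
)


def resolve_sts_region(
    endpoint: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Region from the endpoint host first, then the caller environment.

    The Bedrock model region is deliberately not an input.
    """
    env = os.environ if env is None else env
    host = urlparse(endpoint).hostname if endpoint else None
    match = _REGIONAL_STS_HOST.search(host) if host else None
    if match:
        return match.group(1)
    # Assumption: plain env fallback, including for the global endpoint.
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


def build_sts_client_kwargs(
    endpoint: Optional[str],
    ssl_verify: Union[bool, str],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Keep endpoint_url and region_name aligned so signing matches the URL."""
    kwargs: Dict[str, Any] = {"verify": ssl_verify}
    if endpoint:
        kwargs["endpoint_url"] = endpoint
    region = resolve_sts_region(endpoint, env)
    if region:
        kwargs["region_name"] = region
    return kwargs


# Air-gapped case: no endpoint override, workload region comes from the env.
print(build_sts_client_kwargs(None, True, {"AWS_REGION": "eu-west-1"}))
# Endpoint override: the URL's region wins over the environment.
print(build_sts_client_kwargs(
    "https://sts.eu-west-1.amazonaws.com", True, {"AWS_REGION": "us-east-1"}
))
```

The resulting dict is shaped so it could be splatted into `boto3.client("sts", **kwargs)`; boto3 itself is left out so the sketch runs without AWS dependencies.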