What happened
POST /v1/messages/count_tokens against a Vertex AI–routed Claude model returns HTTP 500:
{"detail":{"error":"Internal server error: vertexai import failed please run `pip install -U \"google-cloud-aiplatform>=1.38\"`. Got error: No module named 'vertexai'"}}
The proxy is up, and all other Vertex AI routes (/v1/messages, tool calls, streaming, vision, web_search, etc.) work for the same model alias; only count_tokens requires the vertexai SDK module. The same count_tokens call against claude-sonnet-4-6 (anthropic-native), sent to the same proxy, returns {"input_tokens": 9} as expected.
google-cloud-aiplatform is not a LiteLLM core dependency, nor is it in the proxy-dev/proxy extras used by uv sync --frozen --group proxy-dev --extra proxy. So a fresh stable-tag install can never count tokens on Vertex AI Claude unless the operator separately installs the Gemini SDK.
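To confirm that a given proxy environment is affected, a quick check like the following may help. It is only a diagnostic sketch: the helper name is_module_available is made up for illustration, and it should be run inside the same virtualenv the proxy uses (for example via `uv run python`).

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when no finder can locate the module.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "vertexai" is the module named in the 500 error above; per that error
    # message it is provided by the google-cloud-aiplatform package.
    print("vertexai importable:", is_module_available("vertexai"))
```

If this prints False in the proxy's environment, the count_tokens route for the Vertex alias would be expected to fail as described.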
Reproducer
# proxy launched via `uv run litellm --config config.yaml --port 4101`
curl -sS -X POST http://127.0.0.1:4101/v1/messages/count_tokens \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{"model":"claude-sonnet-4-6-vertex","messages":[{"role":"user","content":"hello world"}]}'
Returns the 500 above. The claude-sonnet-4-6-vertex alias routes to Vertex AI's Claude API and works for every other endpoint without google-cloud-aiplatform.
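For anyone who wants to script the comparison between the two aliases, the curl call above can be approximated in Python with only the standard library. This is a rough sketch: the helper names are hypothetical, and the base URL, key, and aliases are simply the values from this report.

```python
import json
import urllib.error
import urllib.request


def build_count_tokens_request(base_url: str, api_key: str, model: str, text: str):
    """Build the same POST the curl reproducer sends (hypothetical helper)."""
    payload = {"model": model, "messages": [{"role": "user", "content": text}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/messages/count_tokens",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


def send(req):
    """Return (status, body) for both success and HTTP error responses."""
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A 500 from the proxy lands here; keep the body for inspection.
        return err.code, err.read().decode("utf-8")


if __name__ == "__main__":
    for alias in ("claude-sonnet-4-6", "claude-sonnet-4-6-vertex"):
        req = build_count_tokens_request(
            "http://127.0.0.1:4101", "sk-1234", alias, "hello world"
        )
        try:
            status, body = send(req)
        except urllib.error.URLError as err:
            print(alias, "proxy unreachable:", err.reason)
            continue
        print(alias, status, body)
```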
Expected
Counting tokens for Claude-on-Vertex should not require the Gemini-specific vertexai Python module. Vertex AI's Claude models are addressed via the Anthropic Messages API protocol; the token counter should use the same Anthropic-protocol path the rest of the vertex_ai Claude routing does (or call Vertex's :rawPredict count-tokens equivalent directly via HTTPS, like LiteLLM's other Vertex Claude transformations do).
Either:
- Make count_tokens for Claude-on-Vertex avoid import vertexai entirely (preferred: keeps deps minimal); or
- Add google-cloud-aiplatform to the default proxy extras, with a clear error message pointing the operator at the install when it's absent (worse: pulls in a heavy Gemini SDK for a feature that should be SDK-free).
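A rough sketch of what the preferred option could look like is below. Everything in it is an assumption for illustration rather than LiteLLM's actual code: the function names are hypothetical, the "claude" substring check is a stand-in for whatever model detection the router already does, and the count-tokens:rawPredict URL shape is my understanding of Vertex AI's partner-model endpoint for regional locations and should be verified against Google's documentation. The point is only that the Claude branch never touches import vertexai.

```python
from typing import Any, Dict, List, Tuple


def is_vertex_claude(model: str) -> bool:
    """Hypothetical check: treat any model id containing 'claude' as Claude."""
    return "claude" in model.lower()


def build_vertex_claude_count_tokens_request(
    project: str,
    location: str,
    model: str,
    messages: List[Dict[str, Any]],
) -> Tuple[str, Dict[str, Any]]:
    """Build URL and JSON body for a plain-HTTPS count-tokens call.

    ASSUMPTION: the endpoint path below is not taken from the issue; verify
    it against Vertex AI's Claude documentation before relying on it.
    """
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/"
        "publishers/anthropic/models/count-tokens:rawPredict"
    )
    # Anthropic Messages-style body: model id plus the messages to count.
    body = {"model": model, "messages": messages}
    return url, body


def choose_token_counter(model: str) -> str:
    """Pick a counting path; only the non-Claude path would need the SDK."""
    if is_vertex_claude(model):
        return "anthropic_https"  # no `import vertexai` on this branch
    return "vertexai_sdk"  # the SDK import would stay lazy, inside this branch
```

With a split like this, the vertexai import error could only surface for models that genuinely need the SDK, and the Claude path would reuse the same credentials and HTTP client as the other Vertex Claude routes.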
Why this matters
Claude Code's headless mode (and any tooling that uses count_tokens for budget display) is silently broken against any LiteLLM proxy that routes Claude to Vertex AI unless the operator knows to install the Gemini SDK.
Relevant LiteLLM version
v1.83.14-stable (also reproduced against current main).
Surfaced by
Daily Claude Code × LiteLLM compatibility-matrix cron (PR BerriAI/litellm-docs#142 — see count_tokens × vertex_ai cell).