Problem Statement
The production LiteLLM proxy (modelharbor) runs a fork pinned to upstream v1.82.3-stable.patch.4. Upstream has since advanced to v1.89.1 — 4,094 commits, including fixes the fork already carries as local patches. The operator needs to move the fork onto v1.89.1 so it tracks current upstream, but must (a) preserve the fork-specific license/premium unlock, (b) not regress the Anthropic /v1/messages passthrough cost accounting that was painstakingly fixed, and (c) not corrupt the production database through the multi-version Prisma migration. From the operator's perspective: how to advance to v1.89.1 with confidence that premium features stay unlocked, custom-priced passthrough cost still records > 0, and the production database migrates safely — all verifiable before any production change.
Solution
Create a vv1.89.1 branch on the fork that is upstream v1.89.1 plus only the fork-specific license/premium unlock. The previously carried six-commit Anthropic passthrough custom-pricing cost fix is dropped because upstream v1.89.1 already merged equivalent logic (the trusted model_info["id"] resolution path, the server_tool_use dict normalization, and a superset of the tests). Before any production change, the upgrade is verified locally first on the operator's Apple Silicon dev machine: the production database is cloned locally, a native arm64 build of the vv1.89.1 image is run against the clone (exercising both the forward Prisma migration and the live cost path through the real ZAI glm backend), and the operator confirms SpendLogs.response_cost > 0 via Claude Code pointed at the local proxy. Only after local verification passes is the amd64 production image built via CI and deployed to modelharbor.
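The branch shape described above (the upstream tag plus exactly one local commit) can be sketched as a small command plan. This is a minimal sketch, not the operator's actual procedure: the remote name `upstream`, the helper names, and the `<unlock-commit-sha>` placeholder are assumptions for illustration, and the cherry-pick is still expected to need the manual hunk handling described under Implementation Decisions.

```python
from typing import List


def plan_branch_commands(tag: str, branch: str, unlock_commit: str,
                         remote: str = "upstream") -> List[List[str]]:
    """Return the git commands that build `branch` as `tag` plus one cherry-picked commit."""
    return [
        ["git", "fetch", remote, "--tags"],
        # refs/tags/<tag> names the tag explicitly, so a same-named branch cannot shadow it.
        ["git", "switch", "-c", branch, f"refs/tags/{tag}"],
        # -x records the original commit id in the new commit message.
        ["git", "cherry-pick", "-x", unlock_commit],
        # Counts commits reachable from the branch but not from the tag.
        ["git", "rev-list", "--count", f"refs/tags/{tag}..{branch}"],
    ]


def carries_exactly_one_commit(rev_list_count_output: str) -> bool:
    """Interpret the output of the final rev-list command in the plan."""
    return rev_list_count_output.strip() == "1"


if __name__ == "__main__":
    # "<unlock-commit-sha>" is a placeholder, not a real commit id.
    for cmd in plan_branch_commands("v1.89.1", "vv1.89.1", "<unlock-commit-sha>"):
        print(" ".join(cmd))
```

Checking the `rev-list --count` result against 1 gives a cheap guard that none of the obsolete cost commits were carried along by accident.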
User Stories
As the proxy operator, I want the fork rebased onto upstream v1.89.1, so that it tracks current upstream fixes and features.
As the proxy operator, I want only the fork-specific license/premium unlock carried forward, so that the branch stays minimal and free of redundant patches.
As the proxy operator, I want the redundant Anthropic passthrough cost patches dropped rather than carried, so that the branch doesn't conflict with upstream's equivalent (richer) implementation.
As the proxy operator, I want the new branch frozen on the immutable v1.89.1 tag (not the moving stable/1.89.x branch), so that the base is a reproducible release point.
As the proxy operator, I want the CI image-build workflow to trigger on the new branch name, so that pushing vv1.89.1 produces a registry image.
As the proxy operator, I want premium features to remain unlocked without a license, so that enterprise functionality works as before.
As the proxy operator, I want the premium unlock to flow through the v1.89.1 useAuthorized hook, so that the UI and API both see premium_user = true without fighting upstream's refactor.
As the proxy operator, I want Anthropic /v1/messages passthrough cost to record response_cost > 0 for custom-priced models, so that spend tracking remains accurate.
As the proxy operator, I want the cost path verified against the real ZAI glm backend with the real custom-pricing model configuration, so that a unit-test green-light isn't mistaken for correct cost accounting.
As the proxy operator, I want the production database cloned to the dev machine and the vv1.89.1 image run against it, so that the Prisma migration is rehearsed on a copy before touching production.
As the proxy operator, I want the litellm-proxy-extras 0.4.63→0.4.74 migrations diffed and dry-run on the clone, so that no destructive schema change reaches production unexamined.
As the proxy operator, I want a current database backup captured before the production deploy, so that the forward-only Prisma migration can be rolled back by restore.
As the proxy operator, I want prisma migrate status checked on the production database before deploy, so that pre-existing drift doesn't corrupt the forward migration.
As the proxy operator, I want the local verification to run on a native arm64 build (throwaway), while production uses the CI-built amd64 image, so that verification is fast and the production architecture is correct.
As the proxy operator, I want the local stack bound to loopback only with a fresh local master key, so that cloned production data is not exposed and existing cloned keys are inert.
As the proxy operator, I want a fresh LiteLLM virtual key generated for the local test, so that I can point Claude Code at the local proxy without reusing production credentials.
As the proxy operator, I want to be handed the exact Claude Code settings (base URL + key + model), so that I can run a test request with minimal setup.
As the proxy operator, I want the end-to-end cost verification to confirm SpendLogs.response_cost > 0 for my test request, so that the go/no-go decision for production is evidence-based.
As the proxy operator, I want the production deployment to pull the CI amd64 image and run docker compose up, so that modelharbor runs the verified v1.89.1 code.
As the proxy operator, I want the previous vv1.82.3-stable.patch.4 image and a database backup retained, so that production can be rolled back if v1.89.1 misbehaves.
As the proxy operator, I want the stale litellm-js/spend-logs schema-sync guidance in CLAUDE.md updated, so that internal docs match v1.89.1 (where that schema file no longer exists).
As the proxy operator, I want the cost-path fallback to be a fresh minimal patch against v1.89.1 (never a replay of the old commits), so that any residual cost issue is fixed cleanly on the new base.
As the proxy operator, I want the local proxy to mirror production's custom per-token pricing for the glm model, so that the cost verification is representative and avoids a false-negative response_cost = 0.
Implementation Decisions
Branch base: vv1.89.1 is created at the immutable upstream v1.89.1 tag, not the moving stable/1.89.x branch.
Carry scope: exactly one local commit — the license/premium unlock. The six-commit Anthropic passthrough custom-pricing cost fix is not carried; upstream v1.89.1 already contains equivalent logic (get_router_model_id() resolving metadata["model_info"]["id"] / litellm_metadata["model_info"]["id"]; use_custom_pricing_for_model(...); completion_cost(custom_pricing=, router_model_id=); response_cost persisted to kwargs and model_call_details; the server_tool_use dict coercion in Usage; and a superset of the tests).
License unlock application: cherry-picked onto v1.89.1. Files that apply cleanly: the CI image-build workflow (added as a new file, absent upstream), the LicenseCheck.is_premium() short-circuit, the UI-token premium_user default, the proxy-server premium_user sites, and the useAuthorized hook returning premium_user: true. The two UI hunks targeting the old local-useState premium pattern are dropped — v1.89.1 refactored that state into the useAuthorized hook, which the kept hook change already forces to true. The proxy-server hunks are accepted with context-fuzz fixes only (the anchors survived upstream churn).
CI image-build workflow: its push trigger is updated from the old branch name to vv1.89.1 so pushing the branch builds a registry image. The workflow retains its fork-specific registry, secrets, and notification hook.
Cost verification, not cost carrying: because the fix is upstream, correctness for the fork's specific deployment is treated as a runtime verification, not a carried patch. The local proxy mirrors production's custom per-token pricing (registered under the deployment model_id) so the custom_pricing / router_model_id path actually fires. If verification shows response_cost == 0, the fallback is a fresh minimal patch against v1.89.1 — never a replay of the obsolete six commits.
Database migration strategy: the production database is cloned to the dev machine; running the vv1.89.1 image against the clone replays the litellm-proxy-extras 0.4.63→0.4.74 forward migrations on a copy. A production backup is captured before deploy; prisma migrate status is checked for drift first. Rollback is image-plus-backup-restore (Prisma migrations are forward-only).
Architecture split: local verification uses a native arm64 build (throwaway — not pushed to the registry or modelharbor). Production uses the CI-built amd64 image. Cost/logging/migration behavior is treated as architecture-independent, so the arm64 verify is a valid go/no-go for those dimensions; the CI build on real amd64 covers any arch-specific build issue.
Local stack isolation: the local proxy and database are bound to loopback; a fresh local master key is used; only key hashes travel in the cloned data (no plaintext secrets). A fresh LiteLLM virtual key is generated for the Claude Code test.
Backend model: the local proxy routes the Anthropic /v1/messages passthrough to the ZAI glm backend using the provided ZAI key, mirroring production's model definition (alias, passthrough format, custom pricing).
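The local stack isolation decision above can be illustrated with two small checks: one that accepts only literal loopback bind addresses, and one that mints a fresh random master key for the local stack. This is a sketch under assumptions: the helper names are hypothetical, and the `sk-` prefix is assumed as the key convention rather than taken from this document.

```python
import ipaddress
import secrets


def is_loopback_bind(host: str) -> bool:
    """True only when `host` is a literal loopback IP (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames such as "localhost" are rejected: they depend on resolver config.
        return False


def new_local_master_key() -> str:
    """Generate a random key for the local stack only (the "sk-" prefix is an assumption)."""
    return "sk-" + secrets.token_urlsafe(32)


if __name__ == "__main__":
    for host in ("127.0.0.1", "::1", "0.0.0.0", "localhost"):
        print(host, is_loopback_bind(host))
```

Because only key hashes travel in the cloned data, a freshly generated master key means the cloned production keys cannot authenticate against the local proxy in any useful way, which is the "inert" property the user stories ask for.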
Testing Decisions
Principle: test external behavior at the highest possible seam; prefer existing seams; do not test implementation details. Cost accounting is validated through observed response_cost, not through internal call shapes.
Cost code path (existing unit seam): run upstream's Anthropic passthrough logging-handler test suite on the vv1.89.1 branch. This is the highest existing seam for the cost logic and includes the server_tool_use-dict + mocked get_router_model_id → response_cost case. Prior art: the existing passthrough logging-handler tests already in the repo.
Cost in real config (new end-to-end seam): a real /v1/messages passthrough request through a running proxy (cloned data, mirrored custom pricing) to the ZAI glm backend, asserting SpendLogs.response_cost > 0. This seam is required because the unit tests mock the model and cannot validate the custom-pricing registration under the deployment model_id — the config-dependent behavior central to this upgrade. Prior art: the operator's documented verify procedure (stripped input_cost_per_token = 0 is normal; the spend-log response_cost must be > 0).
License/premium (behavioral seam): confirm premium-gated behavior is unlocked via the useAuthorized hook path. No new internal assertions; observe the external behavior.
Database migration (existing Prisma seam): prisma migrate deploy against the cloned production database, then prisma migrate status to confirm completion and no drift. Prior art: Prisma's own migration mechanism and the operator's documented startup-migration troubleshooting.
Production deploy (deployment-boundary seam): on modelharbor, docker compose pull followed by docker compose up with the CI amd64 image, proxy health verified, then one live glm request confirming response_cost > 0. Prior art: the operator's documented deploy path.
A good test here is one that would fail if the upgrade regressed cost accounting, premium unlocking, or migration integrity — and that runs against realistic configuration rather than mocks where the behavior is config-dependent.
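The go/no-go criteria above can be sketched as one function that turns collected evidence into a list of blocking failures. Everything here is illustrative: the `Evidence` shape, the `request_id` key, and the helper name are hypothetical, the `response_cost` field name simply mirrors this document's wording, and the sketch assumes that a non-zero exit code from prisma migrate status signals pending migrations or drift.

```python
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class Evidence:
    spend_rows: List[Mapping[str, object]]  # spend-log rows for the test request(s)
    premium_unlocked: bool                  # observed premium-gated behavior
    migrate_status_exit_code: int           # exit code of `prisma migrate status`


def blocking_failures(evidence: Evidence) -> List[str]:
    """Return every reason the upgrade should not proceed; an empty list means go."""
    problems: List[str] = []
    if not evidence.spend_rows:
        problems.append("no spend-log row found for the test request")
    for row in evidence.spend_rows:
        cost = row.get("response_cost")
        is_number = isinstance(cost, (int, float)) and not isinstance(cost, bool)
        if not is_number or cost <= 0:
            problems.append(
                f"response_cost is not > 0 for request {row.get('request_id')!r}"
            )
    if not evidence.premium_unlocked:
        problems.append("premium-gated behavior is locked")
    if evidence.migrate_status_exit_code != 0:
        problems.append("prisma migrate status did not exit cleanly")
    return problems


if __name__ == "__main__":
    sample = Evidence([{"request_id": "r1", "response_cost": 0.0042}], True, 0)
    print(blocking_failures(sample) or "go")
```

Treating an empty result set as a failure matters here: a request that never reached the spend log would otherwise pass silently, which is the false green light the end-to-end seam exists to prevent.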
Out of Scope
Upgrading past v1.89.1 (no forward tracking beyond this release point).
Carrying or re-implementing the obsolete Anthropic passthrough cost commits (upstream already has them); any residual cost issue is handled as a fresh minimal patch only if verification fails.
Changes to production data other than the forward Prisma migration; no data backfills, no manual schema edits.
Generalizing the license/premium unlock into a configurable feature (it remains a fork-specific hardcode).
Other litellm stacks on the modelharbor host (unrelated deployments) — untouched.
Performance/load testing of v1.89.1.
Backporting any v1.89.1 feature to the vv1.82.3-stable.patch.4 line.
Further Notes
The litellm-js/spend-logs/schema.prisma file no longer exists at v1.89.1 (it was consolidated), so the operator's CLAUDE.md schema-sync guidance referencing it is stale and should be updated as part of this work. litellm-proxy-extras is also now a monorepo workspace member rather than a standalone PyPI dependency, which changes the image build path.
The previously-active branch vv1.82.3-stable.patch.4 remains the production base and rollback image until v1.89.1 is verified in production; it is not deleted.
The ZAI API key used for local verification is injected as an environment variable only (never written into tracked config or the image); it should be rotated after use since it appeared in the planning session.
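The environment-only handling of the backend key can be sketched as follows. The variable name `ZAI_API_KEY` and both helper names are hypothetical; the sketch only shows the idea of reading the secret from the process environment and checking that its literal value does not appear in any text that would be committed or baked into the image.

```python
import os
from typing import Mapping, Optional


def load_backend_key(env: Optional[Mapping[str, str]] = None,
                     name: str = "ZAI_API_KEY") -> str:
    """Read the backend key from the environment; fail loudly when it is missing."""
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it in the shell, not in config")
    return key


def leaks_key(tracked_text: str, key: str) -> bool:
    """True when the literal key value appears in text destined for git or the image."""
    return key in tracked_text


if __name__ == "__main__":
    demo_env = {"ZAI_API_KEY": "example-not-a-real-key"}
    key = load_backend_key(demo_env)
    # A config that references the variable by name does not contain the secret itself.
    print(leaks_key("api_key: os.environ/ZAI_API_KEY", key))
```

Referencing the variable by name in config keeps the secret out of tracked files, so rotating the key afterwards only requires changing the environment.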