
feat(ai-monitoring): Fetch model context size and rename task to fetch_ai_model_info #112656

Open

constantinius wants to merge 5 commits into master from constantinius/feat/tasks/ai-agent-monitoring-fetch-llm-context-size

Conversation

Contributor

@constantinius constantinius commented Apr 10, 2026

Closes https://linear.app/getsentry/issue/TET-2219/sentry-map-llm-context-size-to-relay-cost-calculation-config

Introduces a new AIModelMetadata schema that nests costs under a costs field and adds an optional contextSize. Context size is fetched from OpenRouter's context_length field and models.dev's limit.context field, following the same precedence logic as costs (OpenRouter takes priority).

Architecture

Both tasks run independently on the same cron schedule (every 30 min):

| Task | Cache key | Format |
| --- | --- | --- |
| `fetch_ai_model_costs` (legacy, TODO remove) | `ai-model-costs:v2` | Flat `AIModelCostV2` |
| `fetch_ai_model_metadata` (new) | `ai-model-metadata:v1` | Nested `AIModelMetadata` |

They share helper functions (_normalize_model_id, _create_prefix_glob_model_name, safe_float_conversion) but fetch and cache independently. The old task + cache key will be removed once all consumers have migrated.

GlobalConfig serves both fields side by side:

  • aiModelCosts — legacy flat format (read by Relay today)
  • aiModelMetadata — new nested format with contextSize (not yet consumed by Relay)

New schema (ai-model-metadata:v1)

```json
{
  "version": 1,
  "models": {
    "gpt-4": {
      "costs": {
        "inputPerToken": 0.0000003,
        "outputPerToken": 0.00000165,
        "outputReasoningPerToken": 0.0,
        "inputCachedPerToken": 0.0000015,
        "inputCacheWritePerToken": 0.00001875
      },
      "contextSize": 1000000
    },
    "claude-3-5-sonnet": {
      "costs": {
        "inputPerToken": 0.000003,
        "outputPerToken": 0.000015,
        "outputReasoningPerToken": 0.0,
        "inputCachedPerToken": 0.0000015,
        "inputCacheWritePerToken": 0.00000375
      }
    }
  }
}
```

contextSize is optional — only present when the source API provides it.

New types

| Type | Purpose |
| --- | --- |
| `AIModelCost` | Token cost fields (nested under `costs`) |
| `AIModelMetadata` | Per-model entry: `costs` + optional `contextSize` |
| `AIModelMetadataConfig` | Top-level config: `version` + `models` dict |

Legacy types (AIModelCostV2, AIModelCosts) are kept unchanged for the old task.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

…h_ai_model_info

Extend the fetch_ai_model_costs task to also fetch context size (context
window length) for each AI model alongside token costs. Context size is
sourced from OpenRouter's context_length field and models.dev's
limit.context field, following the same precedence logic as costs
(OpenRouter takes priority).

The task is renamed from fetch_ai_model_costs to fetch_ai_model_info
since it now fetches more than just cost data. The AIModelCostV2 type
gains an optional contextSize field (int).

Updated references:
- Task registration name in server.py cron schedule
- Logger metric names in warning messages
- All test imports, method names, and assertions

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@constantinius constantinius requested review from a team as code owners April 10, 2026 10:01
@linear-code

linear-code bot commented Apr 10, 2026

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 10, 2026
Comment on lines +20 to +26

```python
class AIModelCostV2(TypedDict, total=False):
    inputPerToken: Required[float]
    outputPerToken: Required[float]
    outputReasoningPerToken: Required[float]
    inputCachedPerToken: Required[float]
    inputCacheWritePerToken: Required[float]
    contextSize: int
```
Contributor Author


Maybe we should update the config version, or use another structure.

Member


Maybe adding a new field in the config would be the best approach, because I can see us wanting to expand this even further for non-cost-related things. I am aware that it complicates this all a bit now, but it will allow us to make quick metadata changes in the future.

That new field could be called LLMModelMetadata and for each model would contain cost and context information for now, with a chance to expand it in the future.

Contributor Author


Added type defs for a new schema.


…acy task

Introduce a new LLMModelMetadata schema with costs nested under a
'costs' field and an optional contextSize. Context size is fetched from
OpenRouter's context_length and models.dev's limit.context fields.

Both tasks run independently on the same cron schedule:
- fetch_ai_model_costs -> writes ai-model-costs:v2 (flat AIModelCostV2)
- fetch_llm_model_metadata -> writes llm-model-metadata:v1 (nested LLMModelMetadata)

They share raw fetch helpers (_fetch_openrouter_raw, _fetch_models_dev_raw)
but format and cache independently. The old task + cache key will be
removed once all consumers have migrated.

GlobalConfig now serves both fields side by side:
- aiModelCosts: legacy flat format (TODO remove)
- llmModelMetadata: new nested format with contextSize

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Backend Test Failures

Failures on 80438bc in this run:

```
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py::test_global_config
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py:72: in test_global_config
    assert normalized == config
E   AssertionError: assert {'aiModelCost....0, ...}, ...} == {'aiModelCost.....]}]}}}, ...}
E
E     Omitting 5 identical items, use -vv to show
E     Right contains 1 more item:
E     {'llmModelMetadata': None}
E
E     Full diff:
E       {
E           'aiModelCosts': None,
E     -     'llmModelMetadata': None,
E           'measurements': {
E               'builtinMeasurements': [
E                   {
E                       'name': 'app_start_cold',
E                       'unit': 'millisecond',
E                   },
... (1493 more lines)
```
```
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py::test_global_config_valid_with_generic_filters
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py:127: in test_global_config_valid_with_generic_filters
    assert config == normalize_global_config(config)
E   AssertionError: assert {'aiModelCost...ts': 10}, ...} == {'aiModelCost.....]}]}}}, ...}
E
E     Omitting 5 identical items, use -vv to show
E     Left contains 1 more item:
E     {'llmModelMetadata': None}
E
E     Full diff:
E       {
E           'aiModelCosts': None,
E           'filters': {
E               'filters': [
E                   {
E                       'condition': {
E                           'inner': {
E                               'name': 'event.contexts.browser.name',
E                               'op': 'eq',
E                               'value': 'Firefox',
E                           },
E                           'op': 'not',
E                       },
E                       'id': 'test-id',
E                       'isEnabled': True,
E                   },
E               ],
E               'version': 1,
E           },
E     +     'llmModelMetadata': None,
E           'measurements': {
E               'builtinMeasurements': [
E                   {
E                       'name': 'app_start_cold',
E                       'unit': 'millisecond',
E                   },
... (1483 more lines)
```

Relay's normalize_global_config strips unknown fields, causing
test_relay_globalconfig_v3 failures. The new cache is still populated
and readable via llm_model_metadata_config() but should not be added
to Relay's GlobalConfig until Relay supports the field.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
```diff
     "measurements": get_measurements_config(),
-    "aiModelCosts": ai_model_costs_config(),
+    "aiModelCosts": ai_model_costs_config(),  # TODO: Remove once all consumers use aiModelMetadata
+    "aiModelMetadata": ai_model_metadata_config(),
```
Member


Was this schema already added to Relay?

Contributor Author


There is a PR in Relay for that: getsentry/relay#5814

Member

jjbayer commented Apr 13, 2026

A general problem with config migrations like this is that customer Relays also receive the global config, and they might still expect the old format. So the safer way forward is to leave the old format in place and add an additional key.

@constantinius
Contributor Author

> A general problem with config migrations like this is that customer Relays also receive the global config, and they might still expect the old format. So the safer way forward is to leave the old format in place and add an additional key.

@jjbayer the previous task and config remain in place for a time

…nfig

Rename all LLM prefixes to AI for consistency with existing naming:
- LLMModelCost -> AIModelCost
- LLMModelMetadata -> AIModelMetadata
- LLMModelMetadataConfig -> AIModelMetadataConfig
- llm-model-metadata:v1 -> ai-model-metadata:v1
- fetch_llm_model_metadata -> fetch_ai_model_metadata
- llm_model_metadata_config -> ai_model_metadata_config

Add aiModelMetadata to GlobalConfig alongside aiModelCosts. Relay's
normalize_global_config strips unknown fields, so relay globalconfig
tests pop aiModelMetadata before comparing until Relay adds support.

Restore original fetch_ai_model_costs code untouched from master.
New fetch_ai_model_metadata task appended at the bottom of the file.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
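The "tests pop aiModelMetadata before comparing" workaround amounts to something like this (hypothetical helper name; the real change lives in test_relay_globalconfig_v3.py):

```python
def strip_fields_unknown_to_relay(config: dict) -> dict:
    # Relay's normalize_global_config drops fields it doesn't recognize,
    # so the round-trip comparison must ignore them until Relay adds
    # support (tracked in getsentry/relay#5814).
    NOT_YET_IN_RELAY = {"aiModelMetadata"}
    return {k: v for k, v in config.items() if k not in NOT_YET_IN_RELAY}
```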
@constantinius constantinius force-pushed the constantinius/feat/tasks/ai-agent-monitoring-fetch-llm-context-size branch from ae0f9b1 to f81a89a Compare April 13, 2026 12:52
Contributor

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit f81a89a.

```python
    outputPerToken: float
    outputReasoningPerToken: float
    inputCachedPerToken: float
    inputCacheWritePerToken: float
```
Contributor


Identical TypedDict duplicated under a new name

Low Severity

AIModelCost is field-for-field identical to AIModelCostV2. A simple type alias (AIModelCost = AIModelCostV2) or reusing the existing type in AIModelMetadata would avoid the duplication while still allowing the legacy type to be independently removed later.
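A self-contained illustration of the suggested alias (the field shapes here are simplified from the snippets in this PR):

```python
from typing import TypedDict


class AIModelCostV2(TypedDict):
    inputPerToken: float
    outputPerToken: float
    outputReasoningPerToken: float
    inputCachedPerToken: float
    inputCacheWritePerToken: float


# The reviewer's suggestion: alias instead of duplicating the class body,
# so the legacy name can still be removed on its own later.
AIModelCost = AIModelCostV2
```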


Member

jjbayer commented Apr 14, 2026

> A general problem with config migrations like this is that customer Relays also receive the global config, and they might still expect the old format. So the safer way forward is to leave the old format in place and add an additional key.

> @jjbayer the previous task and config remains for a time

OK as long as we double-write in sentry before we deploy the relay change this should be fine. I discussed with @Dav1dde that customer relays should not be a problem because they will just skip this part of normalization once the old config key is gone, which means our own relays will populate the attributes. Since metrics extraction is essentially dead, we no longer require doing this on the edge.

