[FLINK-39059][models] Add unified inference metrics for model functions #27724

Open

dubin555 wants to merge 1 commit into apache:master from dubin555:oss-scout/verify-add-model-inference-metrics

Conversation


@dubin555 dubin555 commented Mar 2, 2026

What is the purpose of the change

The flink-models module (both flink-model-triton and flink-model-openai) currently has no metric instrumentation. Users running model inference in production have no visibility into request rates, error rates, or latency — making it impossible to set up monitoring or alerting for inference degradation.

This PR adds unified inference metrics to both Triton and OpenAI model function base classes, following the same MetricGroup patterns used elsewhere in Flink (e.g., CachingAsyncLookupFunction, AsyncMLPredictRunner).

Four metrics are registered under the model_inference group:

| Metric | Type | Description |
| --- | --- | --- |
| inference_requests | Counter | Total inference requests initiated |
| inference_requests_success | Counter | Successful inference completions |
| inference_requests_failure | Counter | Failed requests (network, HTTP, parse errors) |
| inference_latency_ms | Gauge | Last inference round-trip time in ms |

Brief change log

  • Added metric fields and registerMetrics() call in AbstractTritonModelFunction.open() so all Triton subclasses automatically get metrics
  • Instrumented all success/failure paths in TritonInferenceModelFunction.asyncPredict() with counter increments and latency tracking
  • Added metric fields, registration, and whenComplete() instrumentation in AbstractOpenAIModelFunction.open() / asyncPredict()
  • Null inputs and context-overflow-skipped inputs in OpenAI are filtered before incrementing request counts to avoid inflation
  • Used a volatile long gauge for latency rather than a histogram, since DescriptiveStatisticsHistogram lives in flink-runtime, which is not available as a dependency in flink-models
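The instrumentation pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the PR's actual code: `SimpleCounter` and the `instrument()` helper are invented stand-ins for Flink's `org.apache.flink.metrics.Counter` and `Gauge<Long>`, which the real change registers via `context.getMetricGroup().addGroup("model_inference")` in `open()`.

```java
import java.util.concurrent.CompletableFuture;

// Stand-in for org.apache.flink.metrics.Counter, so the sketch runs
// without a Flink dependency.
class SimpleCounter {
    private long count;
    void inc() { count++; }
    long getCount() { return count; }
}

class ModelInferenceMetrics {
    final SimpleCounter requests = new SimpleCounter();
    final SimpleCounter successes = new SimpleCounter();
    final SimpleCounter failures = new SimpleCounter();
    // Exposed as a Gauge<Long> in the real code; volatile so the metric
    // reporter thread sees the latest value written by the async callback.
    volatile long lastLatencyMs;

    // Hypothetical helper wrapping an async inference call with the
    // counter increments and latency tracking from the change log.
    <T> CompletableFuture<T> instrument(CompletableFuture<T> call) {
        final long start = System.currentTimeMillis();
        requests.inc();
        return call.whenComplete((result, error) -> {
            lastLatencyMs = System.currentTimeMillis() - start;
            if (error == null) {
                successes.inc();
            } else {
                failures.inc();
            }
        });
    }
}
```

Because `whenComplete` fires on both the success and failure paths, every initiated request is accounted for exactly once, which keeps `inference_requests` equal to the sum of success and failure counts.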

Verifying this change

This change added tests and can be verified as follows:

  • Added TritonInferenceMetricsTest — integration test using MockWebServer that verifies metrics are correctly registered and updated after successful inference calls
  • Added OpenAIInferenceMetricsTest — integration test using MockWebServer that verifies chat inference metrics and null-input skip behavior
  • Existing tests pass unchanged (no behavioral changes to inference logic)

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

Add inference metrics (request count, success/failure counters, latency
gauge) to both Triton and OpenAI model inference functions. The flink-models
module previously had zero MetricGroup/Counter/Gauge references, making it
impossible to monitor model inference performance in production.

Metrics registered under "model_inference" group:
- inference_requests: total inference requests
- inference_requests_success: successful completions
- inference_requests_failure: failed requests (network, HTTP errors, parse)
- inference_latency_ms: last inference round-trip latency

flinkbot commented Mar 2, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build
