A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
`prometheus_scrape` against a high-cardinality endpoint OOM-kills the agent inside the source, before any transform runs.
In our case a single broker exposes a `/metrics` payload of ~30 MiB / ~265k series. Vector parses all of it into `Vec<Event>` (each `Metric` carries an owned `BTreeMap<String, TagValueSet>`), peaks at ~450 MiB per scrape, and OOMs. Downstream we only ship <1K metric names, but by then the damage is done.
Attempted Solutions
A downstream `filter` transform doesn't help because the parse has already happened. `tag_cardinality_limit`, `expire_metrics_secs`, scrape interval staggering, and `scrape_timeout_secs` don't address the per-scrape parse peak either. There is currently no way to drop metrics at the scrape layer.
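For reference, this is roughly the shape of the downstream filter we tried (transform name hypothetical). It is semantically correct, but it only runs after the source has already materialized every series as an event:

```yaml
transforms:
  drop_percentiles:
    type: filter
    inputs: [pinot_broker]
    # Runs after prometheus_scrape has parsed the full ~30 MiB body,
    # so the ~450 MiB parse peak happens regardless.
    condition: '!match(string!(.name), r"_99thPercentile$")'
```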
Proposal
Add two optional source-level fields applied to the raw exposition text before parsing:
```yaml
sources:
  pinot_broker:
    type: prometheus_scrape
    endpoints: [http://broker:8080/metrics]
    metric_name_allowlist:
      - pinot_broker_queries_OneMinuteRate
      - "pinot_broker_queryTotalTimeMs_*"
    metric_name_denylist:
      - "pinot_broker_*_99thPercentile"
```
- Shell-style globs (`glob` is already a workspace dep).
- Empty allowlist + empty denylist → `Cow::Borrowed(body)` returned: a zero-copy fast path, identical bytes flow to `parse_text`. Strictly additive; no behavior change for existing users.
- Active filter walks the body line by line, preserves `# HELP` / `# TYPE` / unrelated comments, and drops data lines whose name fails the predicate.
- Patch fits in one file (`src/sources/prometheus/scrape.rs`): two `Vec<String>` config fields, a small `MetricNameFilter` struct, a ~40-line helper, and one extra line in `on_response` between `from_utf8_lossy` and `parse_text`.
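To make the mechanics concrete, here is a self-contained sketch of the helper. This is illustrative, not the actual patch: it substitutes a tiny `*`-only matcher for the `glob` crate so it runs standalone, and the struct and function names are hypothetical.

```rust
use std::borrow::Cow;

/// Minimal `*`-only glob matcher. Stand-in for the `glob` crate so this
/// sketch is self-contained; the real patch would reuse the workspace dep.
fn glob_match(pattern: &str, name: &str) -> bool {
    fn go(p: &[u8], n: &[u8]) -> bool {
        match p.first() {
            None => n.is_empty(),
            // `*` matches any (possibly empty) run of characters.
            Some(b'*') => (0..=n.len()).any(|k| go(&p[1..], &n[k..])),
            Some(c) => n.first() == Some(c) && go(&p[1..], &n[1..]),
        }
    }
    go(pattern.as_bytes(), name.as_bytes())
}

/// Illustrative shape of the proposed filter (names hypothetical).
struct MetricNameFilter {
    allowlist: Vec<String>,
    denylist: Vec<String>,
}

impl MetricNameFilter {
    /// A name passes if it matches some allow pattern (or the allowlist
    /// is empty) and matches no deny pattern.
    fn keep(&self, name: &str) -> bool {
        let allowed = self.allowlist.is_empty()
            || self.allowlist.iter().any(|p| glob_match(p, name));
        allowed && !self.denylist.iter().any(|p| glob_match(p, name))
    }

    /// Apply to the raw exposition body before parsing.
    fn filter_body<'a>(&self, body: &'a str) -> Cow<'a, str> {
        if self.allowlist.is_empty() && self.denylist.is_empty() {
            return Cow::Borrowed(body); // zero-copy fast path
        }
        let kept: Vec<&str> = body
            .lines()
            .filter(|line| {
                let t = line.trim_start();
                if t.is_empty() || t.starts_with('#') {
                    return true; // preserve # HELP / # TYPE / comments
                }
                // Metric name ends at '{' (labels) or whitespace (value).
                let end = t
                    .find(|c: char| c == '{' || c.is_whitespace())
                    .unwrap_or(t.len());
                self.keep(&t[..end])
            })
            .collect();
        Cow::Owned(kept.join("\n"))
    }
}
```

The `Cow` return keeps the no-filter path allocation-free; an active filter allocates only for the surviving lines.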
In our test cluster this collapses the per-scrape peak from ~450 MiB to <30 MiB and stops the OOM loop on a 1 GiB pod, with no change for sources that don't set the new fields.
I have a working implementation against current `master` with unit tests; happy to open a PR if maintainers are open to this direction. Narrower in scope than #18304's full `metric_relabel_configs`; that could still be layered on later.
References
- #18304: full `metric_relabel_configs` proposal; this issue proposes a narrow first step
Version
Reproduced on `0.43.0` and current `master` (`0.49.0`).