Summary
When using controller-runtime's priority queue (UsePriorityQueue: true), the workqueue_depth metric can cause unbounded memory growth and extremely slow metrics serialization times, leading to Prometheus scrape timeouts.
Problem
The workqueue_depth metric includes a priority label that creates a new metric time series for each unique priority value:
workqueue_depth{name="my-controller", controller="my-controller", priority="0"} 0
workqueue_depth{name="my-controller", controller="my-controller", priority="1"} 0
workqueue_depth{name="my-controller", controller="my-controller", priority="12345"} 0
The Prometheus client library does not remove a labeled series once it has been created, so these entries persist even after the items are dequeued and the gauge returns to 0. If an application uses incrementing priority values (e.g., for LIFO ordering), every enqueue with a previously unseen priority creates another persistent metric entry.
Impact
In a real-world scenario with ~70 controllers using incrementing priorities:
- 810K+ unique metric entries accumulated over time
- 15+ second metrics serialization time (exceeds typical 10s scrape timeout)
- Prometheus scrape failures with "broken pipe" errors
Root Cause
The metric is defined in pkg/internal/metrics/workqueue.go:
var (
	depth = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Subsystem: WorkQueueSubsystem,
		Name:      DepthKey,
		Help:      "Current depth of workqueue by workqueue and priority",
	}, []string{"name", "controller", "priority"}) // <-- priority label
	// ...
)
And used in depthWithPriorityMetric:
func (g *depthWithPriorityMetric) Inc(priority int) {
	depth.WithLabelValues(append(g.lvs, strconv.Itoa(priority))...).Inc()
}
Each unique priority integer becomes a distinct label value, creating a new persistent metric entry.
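The effect can be reproduced without controller-runtime or the Prometheus client. The following is a toy model only: toyGaugeVec and seriesAfter are hypothetical names, and the map stands in for the per-label-value storage of a gauge vector that never drops entries on its own. It enqueues and immediately dequeues items, then counts how many entries remain.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toyGaugeVec is a stand-in for a label-keyed gauge vector: one entry per
// distinct combination of label values, never removed automatically.
type toyGaugeVec struct {
	series map[string]float64
}

func newToyGaugeVec() *toyGaugeVec {
	return &toyGaugeVec{series: make(map[string]float64)}
}

// add adjusts the gauge for the given label values, creating the entry on
// first use.
func (v *toyGaugeVec) add(delta float64, labelValues ...string) {
	v.series[strings.Join(labelValues, "\x00")] += delta
}

// seriesAfter enqueues and immediately dequeues n items, using priorityFor(i)
// as the priority of the i-th item, and returns the number of retained entries.
func seriesAfter(n int, priorityFor func(i int) int) int {
	vec := newToyGaugeVec()
	for i := 0; i < n; i++ {
		p := strconv.Itoa(priorityFor(i))
		vec.add(1, "my-controller", "my-controller", p)  // enqueue
		vec.add(-1, "my-controller", "my-controller", p) // dequeue: value is 0 again, entry stays
	}
	return len(vec.series)
}

func main() {
	incrementing := func(i int) int { return i }
	bounded := func(i int) int { return i % 100 }
	fmt.Println("incrementing priorities:", seriesAfter(10000, incrementing))
	fmt.Println("bounded priorities:", seriesAfter(10000, bounded))
}
```

In this model the queue is empty at the end of both runs, yet the incrementing case retains one entry per item (10000) while the bounded case retains 100.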
Workaround
Applications using custom priorities should bound their priority values to a small range (e.g., 0-100) to limit metric cardinality:
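const maxPriority = 100

var priorityCounter atomic.Uint32

func enqueue(item T) {
	// Taking the counter modulo maxPriority keeps the value in [0, maxPriority)
	// even with concurrent callers. A separate check-and-reset after Add can let
	// several goroutines observe values above the bound before the reset lands.
	// Note: ordering across a wrap-around is not preserved.
	priority := int(priorityCounter.Add(1) % maxPriority)
	queue.AddWithOpts(priorityqueue.AddOpts{Priority: ptr.To(priority)}, item)
}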
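As a quick sanity check of the bounding idea, here is a standalone sketch of a modulo-based variant of the counter, exercised from several goroutines. It uses only the standard library; nextPriority and distinctPriorities are hypothetical helper names, not part of controller-runtime.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const maxPriority = 100

var priorityCounter atomic.Uint32

// nextPriority returns a priority in [0, maxPriority).
func nextPriority() int {
	return int(priorityCounter.Add(1) % maxPriority)
}

// distinctPriorities calls nextPriority concurrently and reports how many
// distinct values were produced and the largest one.
func distinctPriorities(goroutines, callsEach int) (distinct, largest int) {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = make(map[int]struct{})
	)
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < callsEach; i++ {
				p := nextPriority()
				mu.Lock()
				seen[p] = struct{}{}
				if p > largest {
					largest = p
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return len(seen), largest
}

func main() {
	distinct, largest := distinctPriorities(8, 1000)
	fmt.Println("distinct priorities:", distinct, "largest:", largest)
}
```

However the goroutines interleave, the number of distinct priorities, and therefore of priority label values, cannot exceed maxPriority.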
Potential Solutions
- Document the cardinality risk - Add warnings to priority queue documentation about metric cardinality when using custom priorities
- Make priority label optional - Add configuration to disable the priority label for users who don't need per-priority observability
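One possible shape for the optional label is sketched below. This is an illustration only: depthLabelNames, depthLabelValues and the includePriority switch are hypothetical and do not exist in controller-runtime. The point is that the label names passed when the metric is defined and the label values passed on each update have to be derived from the same setting.

```go
package main

import (
	"fmt"
	"strconv"
)

// depthLabelNames returns the label names for the depth metric.
// includePriority is a hypothetical configuration switch.
func depthLabelNames(includePriority bool) []string {
	names := []string{"name", "controller"}
	if includePriority {
		names = append(names, "priority")
	}
	return names
}

// depthLabelValues returns the label values for one update. With the
// priority label disabled, every priority maps to the same series.
func depthLabelValues(base []string, priority int, includePriority bool) []string {
	// Copy so the caller's slice is never modified or aliased.
	values := make([]string, 0, len(base)+1)
	values = append(values, base...)
	if includePriority {
		values = append(values, strconv.Itoa(priority))
	}
	return values
}

func main() {
	base := []string{"my-controller", "my-controller"}
	fmt.Println(depthLabelNames(true), depthLabelValues(base, 12345, true))
	fmt.Println(depthLabelNames(false), depthLabelValues(base, 12345, false))
}
```

With the switch off, the number of depth series per queue stays at one regardless of how many distinct priorities the application uses.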
Environment
- controller-runtime version: v0.22.2
- Go version: 1.24.x
- Kubernetes version: 1.31.x