Automated saturation finder for the portal HTTP streaming service. See docs/specs/2026-05-11-design.md for the design.
- Daily CronJob (`saturation-daily`) walks each query profile's concurrency upward in additive steps.
- Each step runs as a k6 `TestRun` (handled by the upstream k6-operator).
- After each step the controller queries `vmselect` for throughput and time-weighted error rate.
- On plateau or error-budget breach, the controller records the last good level and moves to the next profile.
- Per-profile summary metrics are pushed to `vminsert` and visualized in Grafana.
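The step-walk and stop conditions above can be sketched roughly as follows. This is an illustrative outline only, not the controller's actual code; `run_step`, the thresholds, and the step sizes are all assumed names and values:

```python
PLATEAU_GAIN = 0.05   # assumed: <5% throughput gain over the previous step counts as a plateau
ERROR_BUDGET = 0.01   # assumed: 1% time-weighted error rate budget

def find_saturation(run_step, start=10, step=10, max_vus=500):
    """Walk concurrency upward in additive steps.

    run_step(vus) -> (throughput, error_rate) runs one k6 step and reads
    the results back (in the real controller, from vmselect).
    Returns the last good concurrency level, or None if even the first
    step breached the error budget.
    """
    last_good, last_tput = None, 0.0
    vus = start
    while vus <= max_vus:
        tput, err = run_step(vus)
        if err > ERROR_BUDGET:
            break                  # error-budget breach: keep the previous level
        if last_tput and tput < last_tput * (1 + PLATEAU_GAIN):
            break                  # plateau: keep the previous level
        last_good, last_tput = vus, tput
        vus += step                # additive step
    return last_good
```

With a synthetic workload that stops scaling at 40 VUs, the walk stops one step past the knee and reports 40 as the last good level.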
| Path | What |
|---|---|
| `controller/` | Python controller image (`saturation`) |
| `k6/stream.js` | Per-user k6 streaming script |
| `chart/` | Helm chart: namespace, RBAC, ConfigMaps, CronJob |
| `grafana/dashboard.json` | Grafana dashboard (datasource UID `victoria`) |
- VictoriaMetrics with `vmselect` and `vminsert` reachable in-cluster.
- k6-operator installed (upstream chart): `helm install k6-operator grafana/k6-operator -n k6-operator-system --create-namespace`.
- portal deployed in a load environment, exposing the `/datasets/...` API.
```shell
cd controller
docker build -t <your-registry>/loadtest-portal-controller:<tag> .
docker push <your-registry>/loadtest-portal-controller:<tag>
```

Author a `values.yaml` with your profiles:
```yaml
controller:
  image:
    repository: <your-registry>/loadtest-portal-controller
    tag: <tag>
portal:
  namespace: mainnet-load-portal
  deployment: portal
  baseUrl: http://portal.mainnet-load-portal:8080
profiles:
  - name: evm-topic-scan
    dataset: ethereum-mainnet
    stream_url: /datasets/ethereum-mainnet/stream
    blocks_per_request: 100
    query: { ... }
```

Install:
```shell
helm install loadtest-portal ./chart -f values.yaml
```

To trigger a one-off release run:

```shell
kubectl -n portal-loadtest create job --from=cronjob/saturation-daily \
  run-release-$(date +%s) -- env RUN_TYPE=release VERSION=<release-tag>
```

(`RUN_TYPE` and `VERSION` overrides are picked up by the controller. Without overrides, the controller resolves `VERSION` from the portal Deployment image tag.)
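Resolving `VERSION` from the Deployment's image tag amounts to parsing the image reference. A hypothetical helper showing the idea (the controller's actual parsing may differ; digest references like `@sha256:...` are ignored here for brevity):

```python
def version_from_image(image: str) -> str:
    """Return the tag of a container image reference, or 'latest' if untagged.

    A ':' only separates the tag when it appears after the last '/',
    so registry hosts with ports (e.g. registry:5000/app) are handled.
    """
    last_colon = image.rfind(":")
    if last_colon > image.rfind("/"):
        return image[last_colon + 1:]
    return "latest"
```

For example, `version_from_image("ghcr.io/acme/portal:v1.2.3")` yields `v1.2.3`, while an untagged reference falls back to `latest`.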
```shell
curl -X POST -H "Content-Type: application/json" \
  -u admin:<pass> \
  http://grafana.example.com/api/dashboards/db \
  -d @grafana/dashboard.json
```

A new profile should have `blocks_per_request` set so a healthy 200-status response lands in ~3–6 seconds (so a 30 s measurement window contains multiple completions per user). If you're not sure, start with `blocks_per_request: 10` and adjust based on a smoke run.
If a saturation test plateaus at a suspiciously round number, check `container_cpu_usage_seconds_total` on the runner pods: the load generator may be the bottleneck. Increase `runner.parallelism` (more pods, fewer VUs per pod) and re-test.
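Since `container_cpu_usage_seconds_total` is a monotonically increasing counter of CPU seconds, "the runner is the bottleneck" means its rate over the measurement window approaches the pod's CPU limit. A small sketch of that check (the sample values and the limit are assumptions):

```python
def runner_cpu_utilization(counter_t0: float, counter_t1: float,
                           window_seconds: float, cpu_limit_cores: float) -> float:
    """Fraction of the runner pod's CPU limit in use over a sampling window.

    counter_t0 / counter_t1: container_cpu_usage_seconds_total sampled at
    the start and end of the window. Values near 1.0 mean the load
    generator, not the portal, is saturated.
    """
    cores_used = (counter_t1 - counter_t0) / window_seconds
    return cores_used / cpu_limit_cores
```

E.g. a counter advancing by 15 CPU-seconds over a 30 s window against a 1-core limit is 50% utilization; that runner still has headroom.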
- gzip only in v1. Real clients prefer zstd; saturation numbers will differ once zstd is enabled.
- No locking: the operator must confirm no run is active before triggering one manually.
See docs/specs/2026-05-11-design.md § "Known limitations and follow-ups".