optimize openmetrics text parsing (~4x perf)#402
optimize openmetrics text parsing (~4x perf)#402brian-brazil merged 10 commits intoprometheus:masterfrom
Conversation
e6d7944 to
7832fed
Compare
brian-brazil
left a comment
There was a problem hiding this comment.
I don't think this is correct, on several points.
There was a problem hiding this comment.
internal functions and constants should begin with _
There was a problem hiding this comment.
Shouldn't this comparison be against 0?
This is also going to be n^2 overall if there's many backslashes.
There was a problem hiding this comment.
it's n^2 in the worst case yes when we're escaping too many characters, I've mad a change by calling the function with smaller arguments _is_character_escaped(value_substr[:i], i) instead of _is_character_escaped(value_substr, i)
Let me know if you're thinking of a possible better solution to optimize 👍
There was a problem hiding this comment.
Why are you lstipping here?
There was a problem hiding this comment.
What if label_name is zero length?
There was a problem hiding this comment.
Yeah, this is n^2. Work from the start in one loop.
There was a problem hiding this comment.
it's n^2 in the worst case yes when we're escaping too many characters, I've mad a change by calling the function with smaller arguments _is_character_escaped(value_substr[:i], i) instead of _is_character_escaped(value_substr, i)
Let me know if you're thinking of a possible better solution to optimize 👍
There was a problem hiding this comment.
Couldn't you check each value, rather than the whole string?
There was a problem hiding this comment.
What if " # " is part of a label value?
You should also add a testcase for this so it doesn't trip up someone else.
There was a problem hiding this comment.
Tabs aren't supported.
There was a problem hiding this comment.
This is not a clean way to do this. Use find instead.
|
Thank you for reviewing the PR, I've made the requested changes and added a test case, the parsing logic is more solid now. |
There was a problem hiding this comment.
value_substr[-1] is more succinct
There was a problem hiding this comment.
This is still here, don't look inside error strings
There was a problem hiding this comment.
This is guaranteed to be right after the equals.
There was a problem hiding this comment.
Shouldn't the MetricFamily code catch this already?
There was a problem hiding this comment.
This is guaranteed to be after the ", if present.
There was a problem hiding this comment.
Use find instead of index
There was a problem hiding this comment.
What if it (incorrectly) does?
There was a problem hiding this comment.
changed it to if text.count(seperator) == 0:
it should garantee that 👍
|
Here is the benchmark after the changes we made in this PR, the perfs got even better now ~3.9x |
|
That looks about right. Could you expand the unittests to ensure we're covering everything for both the old and new way of parsing labels? Also, it'd be great if you could add tests for any things this PR had incorrect at any point as if you've made this mistake others likely will too and the tests are going to be used as the openmetrics test suite. |
|
Added test cases, I guess the PR is ready for a final review :) |
There was a problem hiding this comment.
How would this handle something like {a} ?
There was a problem hiding this comment.
made the required changes and added some test cases like that
tests/openmetrics/test_parser.py
Outdated
There was a problem hiding this comment.
As these tests will become the official regression suite for OpenMetrics, it'd be best to test a full line rather than just a function.
There was a problem hiding this comment.
added multiple test cases to assert what function are called 👍
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
Signed-off-by: Ahmed Mezghani <ahmed.mezghani@outlook.com>
|
Thanks! |
|
Great! @brian-brazil :) any plans to release soon? |
|
I'll add it to my todo list. |
This PR optimizes the openmetrics parser using the logic introduced in #282 to optimize the prometheus parser.
Here are some benchmark using
timeit: