Make metadata pod lookups more resilient to short lived processes #2094
Merged
ddelnano merged 3 commits into pixie-io:main on Jan 24, 2025
This changes the `df.ctx['pod']` syntax sugar to try a second pod name lookup if the default upid -> pod name lookup fails. That failure is common for pods with short lived processes, so a pod IP based lookup (using `local_addr`) is attempted when the first lookup fails.

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
aimichelle reviewed Jan 24, 2025
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
aimichelle approved these changes Jan 24, 2025
ddelnano added a commit to ddelnano/pixie that referenced this pull request Aug 6, 2025
…xie-io#2094)

Summary: Make metadata pod lookups more resilient to short lived processes

This is a continuation of the work started from pixie-io#1989. Since the `local_addr` column is populated for client side traces, it can be used as a fallback lookup for these traces. This doesn't solve all of the permutations of missing short lived processes (pixie-io#1638), but provides more coverage than before.

Relevant Issues: pixie-io#1638

Type of change: /kind bugfix

Test Plan: Verified the following
- [x] Compared the performance with and without this change with `src/e2e_test/vizier/exectime:exectime`. This change has a minor performance impact, but it closes the gap on certain situations that previously caused users to distrust Pixie's instrumentation
```
# Performance baseline
$ ./exectime benchmark -a testing.getcosmic.ai:443 -c <cluster_id> 2>&1 | tee baseline_for_simple_udf_swap_e20880ffd.txt

# Performance of this change
$ ./exectime benchmark -a testing.getcosmic.ai:443 -c <cluster_id> 2>&1 | tee simple_udf_swap_cd217c05c.txt
```
[simple_udf_swap_cd217c05c.txt](https://github.com/user-attachments/files/18497709/simple_udf_swap_cd217c05c.txt)
[baseline_for_simple_udf_swap_e20880ffd.txt](https://github.com/user-attachments/files/18497710/baseline_for_simple_udf_swap_e20880ffd.txt)
- [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that with this change all traces are shown, without this change a significant number of traces are missed. See before and after screenshots below:

Changelog Message: Fix a certain class of cases where Pixie previously missed protocol traces from short lived connections

---------

Signed-off-by: Dom Del Nano <ddelnano@gmail.com>
GitOrigin-RevId: 623e988
Summary: Make metadata pod lookups more resilient to short lived processes

This is a continuation of the work started from #1989. Since the `local_addr` column is populated for client side traces, it can be used as a fallback lookup for these traces. This doesn't solve all of the permutations of missing short lived processes (#1638), but provides more coverage than before.

Relevant Issues: #1638

Type of change: /kind bugfix

Test Plan: Verified the following
- [x] Compared the performance with and without this change with `src/e2e_test/vizier/exectime:exectime`. This change has a minor performance impact, but it closes the gap on certain situations that previously caused users to distrust Pixie's instrumentation
  - simple_udf_swap_cd217c05c.txt
  - baseline_for_simple_udf_swap_e20880ffd.txt
- [x] Ran `for i in $(seq 0 1000); do curl http://google.com/$i; sleep 2; done` within a pod and verified that with this change all traces are shown, without this change a significant number of traces are missed. See before and after screenshots below:

Changelog Message: Fix a certain class of cases where Pixie previously missed protocol traces from short lived connections
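
The fallback order described in the summary can be sketched in plain Python. This is only an illustration of the lookup order (upid first, then pod IP via `local_addr`), not the code in this PR: the `UPID_TO_POD` and `IP_TO_POD` tables, the `resolve_pod` function, the sample values, and the empty-string result for a failed lookup are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the metadata state (not Pixie's API).
UPID_TO_POD = {"upid-1": "default/web-0"}
IP_TO_POD = {"10.0.0.7": "default/curl-job"}


def resolve_pod(upid: str, local_addr: Optional[str]) -> str:
    """Return the pod name for a trace row, or "" if no lookup succeeds."""
    # Default lookup: upid -> pod name. This misses when the process exited
    # before its metadata was recorded.
    pod = UPID_TO_POD.get(upid, "")
    if pod:
        return pod
    # Fallback lookup: pod IP -> pod name, using the row's local_addr.
    # Only useful when local_addr is populated (client side traces).
    if local_addr:
        return IP_TO_POD.get(local_addr, "")
    return ""


if __name__ == "__main__":
    print(resolve_pod("upid-1", "10.0.0.7"))     # resolved by upid
    print(resolve_pod("upid-gone", "10.0.0.7"))  # resolved by pod IP fallback
    print(resolve_pod("upid-gone", None))        # still unresolved
```

The last call shows the case the summary says remains uncovered: when the upid lookup fails and `local_addr` is not populated, the row stays unattributed.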