Automatically record heap profiles in testplans #147
Add option to record heap profiles pre and post gc after every round
add incremental recording of stress test vals
[global]
plan = "graphsync"
case = "stress"
I'm not sure how different this test plan is from the one at https://github.com/filecoin-project/lotus/tree/master/testplans/graphsync, but if they differ (and I believe they do), we should rename this case.
The automated dashboards, such as https://ci.testground.ipfs.team/dashboard?task_id=c09omhl5p7a858f1470g, rely on the plan:case name being unique. If two testplans do different things but share the same plan:case, the dashboards become meaningless.
Nothing urgent; I'm just explaining in case we decide to run this on TaaS periodically as well, which I think would be nice.
id = "providers"
instances = { count = 1 }
[groups.resources]
memory = "4096Mi"
Running this test with the following params results in an OOMKilled error for the provider:
[global.run.test_params]
size = "8GB"
latencies = '["20ms"]'
bandwidths = '["128MiB"]'
[[groups]]
id = "providers"
instances = { count = 1 }
[groups.resources]
memory = "2048Mi"
cpu = "1000m"
[[groups]]
id = "requestors"
instances = { count = 1 }
[groups.resources]
memory = "2048Mi"
cpu = "1000m"
This should be fixed now.
I am still getting OOMKilled when I try to run the memory-stress-k8s.toml with size > memory for providers and requestors.
Goals
Memory performance is an ongoing concern in graphsync, and people want insight into what is happening. To support this more effectively, this PR adds an option to record heap profiles automatically after every round in the stress testplan, and optionally after every 10 blocks sent or received. People running the test can then inspect memory performance in the results to identify issues.
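For readers unfamiliar with the mechanics, the per-round recording could look roughly like the sketch below. It is not the code from this PR: the function names (`profileName`, `writeHeapProfile`, `recordRound`), the file naming scheme, and the output directory are all hypothetical, and only the standard library calls `pprof.WriteHeapProfile` and `runtime.GC` are real APIs.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
)

// profileName builds a file name for one heap snapshot.
// The naming scheme is an assumption made for this sketch.
func profileName(round int, stage string) string {
	return fmt.Sprintf("heap-round-%03d-%s.pprof", round, stage)
}

// writeHeapProfile writes the current heap profile to path.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := pprof.WriteHeapProfile(f); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// recordRound captures one profile before and one after a forced GC
// and returns the paths of the two files it wrote.
func recordRound(dir string, round int) ([]string, error) {
	pre := filepath.Join(dir, profileName(round, "pre-gc"))
	if err := writeHeapProfile(pre); err != nil {
		return nil, err
	}
	runtime.GC() // force a collection so the second profile excludes garbage
	post := filepath.Join(dir, profileName(round, "post-gc"))
	if err := writeHeapProfile(post); err != nil {
		return nil, err
	}
	return []string{pre, post}, nil
}

func main() {
	dir, err := os.MkdirTemp("", "heap-profiles")
	if err != nil {
		log.Fatal(err)
	}
	for round := 0; round < 3; round++ {
		// A real test plan would run one transfer round here.
		paths, err := recordRound(dir, round)
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("round %d: wrote %v", round, paths)
	}
}
```

The resulting files can be opened with `go tool pprof <file>`. Note that Go's heap profile reports statistics as of the most recently completed garbage collection, so the "pre-gc" snapshot reflects the last collection that ran on its own, not the exact instant of the call.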
Implementation