Conversation
Using gotestsum with --rerun-fails to run these tests will generate output that simulates passing, skipped, failing, and flaky tests. The output from this can be used to generate test data for the flaky test detection test.
- Detect a test being run multiple times
- Capture these additional runs as test retries
- Detect a fail->pass transition as a flaky test
- Additionally, fix start/stop time detection of tests
Thank you @gca3020, I really appreciate your contribution, and also for highlighting the future direction of Go regarding flaky tests. I've reviewed the code and tested end-to-end using gotestsum --rerun-fails, and it works nicely; a great addition. On the breaking-change point, thanks for flagging it. I agree it's the correct approach: the current behaviour of emitting a separate TestResult for each retry attempt was never the intended design, and collapsing retries into a single result with nested retryAttempts is the right model per the spec. That being said, I'll release this as v0.1.0 to give users a clear upgrade signal. I'll merge and release soon! Thanks again!
Thanks! One other thing I noticed while implementing this (and fighting the linter) is that gotestsum provides a package (testjson) for parsing the Go test JSON format. It might be worth updating the reporter to use that in the future, as it's maintained, well tested, and supports go test functionality like benchmarks and race detection. Regardless, thanks for creating this project in the first place; it's really great to be able to have good annotations in test runs!
Detect Retries and Flaky Tests
Note
This could be considered a breaking change, since a test with RetryAttempts will be reported differently than before. This will only occur when using a test harness that supports retries (see below), and since this project is still pre-1.0, I figured this would be acceptable, but wanted to call it out regardless. The new ctrf results file continues to conform to the specification documented at https://ctrf.io/docs/specification/
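With retries collapsed, a single test that failed once and then passed might be reported roughly like this. This is a hand-written illustration of the shape described above, not actual output from this project; the timestamps and test name are invented, and the CTRF spec is the authoritative reference for field names:

```json
{
  "name": "TestFlakyExample",
  "status": "passed",
  "flaky": true,
  "start": 1712000000000,
  "stop": 1712000002000,
  "retries": 1,
  "retryAttempts": [
    { "status": "failed", "start": 1711999990000, "stop": 1711999992000 }
  ]
}
```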
What's in this PR?
This PR refactors some of the detection logic to support accurately reporting test retries and flaky tests. This is a gap in the current implementation, and the functionality was requested in #12.

Additionally, this PR fixes some of the Start/Stop/Duration logic, which is required to accurately capture the test output of only a single retry attempt.
But I thought Go didn't do retries?
They don't, yet. There is currently an open (and accepted) proposal to add this to the core `go test` functionality. See golang/go#62244 for more details.

In the meantime, the gotestsum tool supports re-running failed tests using the `--rerun-fails=n` and `--rerun-fails-max-failures=n` flags. With these flags, after the full suite is complete, the subset of failed tests is re-run as part of a new suite. This repeats as many times as specified, with each new run containing just the tests that failed in the previous run.

So how do we detect them?
Essentially, as we are parsing the JSON "events" coming from the go test output, we look back through the list of already-captured TestResults to see if we already have an instance of this test in our report. If not, the behavior continues as it does today: we append the new TestResult to the list and update the summary counters as normal.

However, if we already have a "matching" test, then we instead update that test object, populating the list of RetryAttempts as per the spec in https://ctrf.io/docs/specification/test#retry-attempt-object.
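The look-back described above can be sketched as follows. This is a minimal, self-contained Go sketch; `TestResult`, `RetryAttempt`, and `recordResult` are simplified stand-ins for illustration, not the project's actual types:

```go
package main

import "fmt"

// RetryAttempt and TestResult are simplified, hypothetical versions of the
// CTRF report types; the field names here are illustrative only.
type RetryAttempt struct {
	Status string
}

type TestResult struct {
	Name          string
	Status        string
	Flaky         bool
	RetryAttempts []RetryAttempt
}

// recordResult appends a new TestResult, or folds a repeated run of the same
// test into the existing result's RetryAttempts, marking a fail->pass
// transition as flaky.
func recordResult(results []TestResult, name, status string) []TestResult {
	for i := range results {
		if results[i].Name == name {
			// The earlier outcome becomes a retry attempt...
			results[i].RetryAttempts = append(results[i].RetryAttempts,
				RetryAttempt{Status: results[i].Status})
			// ...a fail->pass transition marks the test flaky...
			if results[i].Status == "failed" && status == "passed" {
				results[i].Flaky = true
			}
			// ...and the latest run's status wins.
			results[i].Status = status
			return results
		}
	}
	return append(results, TestResult{Name: name, Status: status})
}

func main() {
	var results []TestResult
	results = recordResult(results, "TestFoo", "failed")
	results = recordResult(results, "TestFoo", "passed")
	fmt.Println(results[0].Status, results[0].Flaky, len(results[0].RetryAttempts))
	// → passed true 1
}
```

Note that the summary counters are only touched on the first sighting of a test, which is what keeps a retried test from being double-counted.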
What about the Start/Stop time changes?
With the nature of Go's test2json output, we know that a test has a "Start Timestamp" when the `run` event appears in the JSON output. We capture this time as the "start" time of a specific test in a map. When we eventually see a pass/fail/skip event for the same test, we use that as the "stop" time of the test.

Because we are guaranteed that a test won't retry before it has actually failed, we can use these start/stop times to gate how far we look back when building the "Messages" for the failed test. We only need to look backwards to the "startTime" of the current in-progress test, which prevents us from grabbing logs from previous runs.
Since I needed these start/stop times anyway, I figured I would go ahead and instrument the optional Start/Stop fields in each TestResult and RetryAttempt, so that ctrf can use them to compute better timing information.
AI Usage Disclosure