feat!: Detect retries and flaky tests #28

Open
gca3020 wants to merge 4 commits into ctrf-io:main from gca3020:flaky-test-support

Conversation

@gca3020

@gca3020 gca3020 commented Apr 3, 2026

Detect Retries and Flaky Tests

Note

This could be considered a breaking change, since a test with RetryAttempts will be reported differently than it was before. This will only occur when using a test harness that supports retries (see below), and since this project is still <1.0, I figured this would be acceptable, but wanted to call it out regardless. The new ctrf results file continues to conform to the specification documented at https://ctrf.io/docs/specification/

What's in this PR?

This PR refactors some of the detection logic to support accurately reporting test retries and flaky tests. This is a gap in the current implementation, and the functionality was requested in #12.

Additionally, this PR fixes some of the Start/Stop/Duration logic, which is required to accurately grab the test output of only a single retry.

But I thought Go didn't do retries?

It doesn't, yet. There is currently an open (and accepted) proposal to add this to the core go test functionality. See golang/go#62244 for more details.

In the meantime, the gotestsum package supports re-running failed tests using the --rerun-fails=n and --rerun-fails-max-failures=n flags. With these flags, once the full suite completes, the subset of failed tests is re-run as a new suite. This repeats up to the specified number of times, with each new run containing only the tests that failed in the previous run.

So how do we detect them?

Essentially, as we parse the JSON "events" coming from the go test output, we look back through the list of already-captured TestResults to see whether this test already appears in our report. If not, the behavior continues as it does today: we append the new TestResult to the list and update the summary counters as normal.

However, if we already have a "matching" test, then we instead update that test object, populating the list of RetryAttempts as per the spec in https://ctrf.io/docs/specification/test#retry-attempt-object.
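The append-or-merge step described above can be sketched roughly as follows. This is not the code from the PR: the `TestResult` and `RetryAttempt` types are simplified stand-ins, `record` is a hypothetical helper, matching on package plus test name is an assumption about what "matching" means, and the status strings are assumed to follow CTRF's `passed`/`failed` values. The flaky rule follows the fail-to-pass transition mentioned in the commit description.

```go
package main

import "fmt"

// RetryAttempt and TestResult are simplified, hypothetical stand-ins for the
// reporter's types; field names only loosely follow the CTRF test object.
type RetryAttempt struct {
	Attempt  int
	Status   string
	Duration int64 // milliseconds
}

type TestResult struct {
	Suite         string
	Name          string
	Status        string
	Duration      int64 // milliseconds
	Retries       int
	Flaky         bool
	RetryAttempts []RetryAttempt
}

// record appends a new result, or folds the outcome into an existing entry
// when the same test (assumed here: same suite and name) was seen before.
func record(results []TestResult, suite, name, status string, duration int64) []TestResult {
	for i := range results {
		r := &results[i]
		if r.Suite != suite || r.Name != name {
			continue
		}
		// Seen before: archive the previous outcome as a retry attempt.
		r.RetryAttempts = append(r.RetryAttempts, RetryAttempt{
			Attempt:  len(r.RetryAttempts) + 1,
			Status:   r.Status,
			Duration: r.Duration,
		})
		r.Retries = len(r.RetryAttempts)
		// The top-level fields take the latest outcome.
		r.Status, r.Duration = status, duration
		// A pass after at least one failed attempt is treated as flaky.
		r.Flaky = false
		if status == "passed" {
			for _, a := range r.RetryAttempts {
				if a.Status == "failed" {
					r.Flaky = true
					break
				}
			}
		}
		return results
	}
	// First sighting: behave as before and append a fresh result.
	return append(results, TestResult{Suite: suite, Name: name, Status: status, Duration: duration})
}

func main() {
	var results []TestResult
	results = record(results, "example/pkg", "TestFlaky", "failed", 12)
	results = record(results, "example/pkg", "TestStable", "passed", 3)
	results = record(results, "example/pkg", "TestFlaky", "passed", 9)
	for _, r := range results {
		fmt.Printf("%s status=%s retries=%d flaky=%t\n", r.Name, r.Status, r.Retries, r.Flaky)
	}
}
```

In this sketch the re-run of TestFlaky does not add a second entry; the existing entry ends up passed, with one retry attempt, and is flagged flaky. Summary counter updates are left out for brevity.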

What about the Start/Stop time changes?

Given the nature of Go's test2json output, we know that a test's "Start Timestamp" is the time at which its run event appears in the JSON output. We capture this time in a map as the "start" time of that specific test. When we eventually see a "pass/fail/skip" event for the same test, we use that event's time as the "stop" time of the test.
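A minimal sketch of that start/stop bookkeeping, assuming line-delimited events in the shape documented by `go doc test2json`. The `span` type, the `collectSpans` helper, and the package-plus-test map key are illustrative names, not the PR's actual code.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// TestEvent mirrors the event struct documented by `go doc test2json`.
type TestEvent struct {
	Time    time.Time
	Action  string
	Package string
	Test    string
	Elapsed float64
	Output  string
}

// span is a hypothetical record of one completed test attempt.
type span struct {
	Key         string
	Start, Stop time.Time
}

// Millis returns the attempt's wall-clock duration in milliseconds.
func (s span) Millis() int64 { return s.Stop.Sub(s.Start).Milliseconds() }

// collectSpans remembers the time of each "run" event and closes the span
// when a terminal event ("pass", "fail", "skip") arrives for the same test.
func collectSpans(input string) ([]span, error) {
	starts := map[string]time.Time{}
	var spans []span
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev TestEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, err
		}
		if ev.Test == "" {
			continue // package-level event, not tied to a single test
		}
		key := ev.Package + "/" + ev.Test
		switch ev.Action {
		case "run":
			starts[key] = ev.Time
		case "pass", "fail", "skip":
			start, ok := starts[key]
			if !ok {
				start = ev.Time // no run event seen; fall back to a zero-length span
			}
			spans = append(spans, span{Key: key, Start: start, Stop: ev.Time})
			delete(starts, key) // a later re-run gets a fresh start time
		}
	}
	return spans, sc.Err()
}

func main() {
	input := `{"Time":"2026-04-03T14:00:00Z","Action":"run","Package":"example/pkg","Test":"TestA"}
{"Time":"2026-04-03T14:00:01.5Z","Action":"fail","Package":"example/pkg","Test":"TestA","Elapsed":1.5}`
	spans, err := collectSpans(input)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range spans {
		fmt.Printf("%s ran for %dms\n", s.Key, s.Millis())
	}
}
```

Deleting the map entry on the terminal event means a re-run of the same test records its own start time rather than reusing the first attempt's.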

Because we are guaranteed that a test won't retry before it has actually failed, we can use these start/stop times to gate how far we look back when building the "Messages" for the failed test. We only need to look backwards to the "startTime" of the current in-progress test, which prevents us from grabbing logs from previous runs.
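The bounded look-back can be illustrated with a short sketch. It assumes the captured events are kept in chronological order; `Event`, `outputSince`, and the `at` helper are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a trimmed-down, hypothetical view of a captured test2json event.
type Event struct {
	Time   time.Time
	Action string
	Test   string
	Output string
}

// at builds a timestamp a given number of seconds into an arbitrary minute.
// It exists only to keep the example data readable.
func at(sec int) time.Time {
	return time.Date(2026, 4, 3, 14, 0, sec, 0, time.UTC)
}

// outputSince walks backwards through events (assumed chronological) and
// collects output lines for the given test, stopping at the first event
// older than start so that output from earlier attempts is never included.
func outputSince(events []Event, test string, start time.Time) []string {
	var lines []string
	for i := len(events) - 1; i >= 0; i-- {
		ev := events[i]
		if ev.Time.Before(start) {
			break // everything earlier belongs to a previous run
		}
		if ev.Test == test && ev.Action == "output" {
			lines = append(lines, ev.Output)
		}
	}
	// Lines were gathered newest-first; restore chronological order.
	for i, j := 0, len(lines)-1; i < j; i, j = i+1, j-1 {
		lines[i], lines[j] = lines[j], lines[i]
	}
	return lines
}

func main() {
	events := []Event{
		{at(0), "run", "TestFlaky", ""},
		{at(1), "output", "TestFlaky", "first attempt: boom\n"},
		{at(1), "fail", "TestFlaky", ""},
		{at(5), "run", "TestFlaky", ""},
		{at(6), "output", "TestFlaky", "second attempt: boom\n"},
	}
	// Only output from the attempt that started at second 5 is returned.
	fmt.Printf("%q\n", outputSince(events, "TestFlaky", at(5)))
}
```

Filtering on the test name as well as the start time keeps output from other tests that ran in the same window out of the message.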

Since I needed these start/stop times anyway, I figured I would go ahead and instrument the optional Start/Stop fields in each TestResult and RetryAttempt, where ctrf can use them to compute better timing information.

AI Usage Disclosure

  • Just trivial autocomplete suggestions from Copilot, which ended up being mostly useless anyway.

gca3020 added 3 commits April 3, 2026 14:08
Using gotestsum with --rerun-fails to run these tests will generate
output that simulates passing, skipped, failing, and flaky tests. The
output from this can be used to generate test data for the flaky test
detection test.
- Detect a test being run multiple times
- Capture these additional runs as test retries
- Detect a fail->pass transition as a flaky test
- Additionally, fix start/stop time detection of tests
@gca3020 gca3020 mentioned this pull request Apr 3, 2026
@gca3020 gca3020 force-pushed the flaky-test-support branch from f5495cd to caff748 Compare April 4, 2026 14:13
@gca3020 gca3020 force-pushed the flaky-test-support branch from caff748 to 962b295 Compare April 4, 2026 18:42
@Ma11hewThomas
Contributor

Thank you @gca3020, I really appreciate your contribution, and thanks also for highlighting the future direction of Go regarding flaky tests.

I've reviewed the code and tested end-to-end using gotestsum --rerun-fails and it works nicely, a great addition.

On the breaking change point, thanks for flagging, I agree it's the correct approach. The current behaviour of emitting a separate TestResult for each retry attempt was never the intended design and collapsing retries into a single result with nested retryAttempts is the right model per spec.

That being said, I'll release this as v0.1.0 to give users a clear upgrade signal.

I'll merge and release soon! Thanks again!

@gca3020
Author

gca3020 commented Apr 4, 2026

Thanks! One other thing I noticed while implementing this and fighting the linter is that gotestsum provides a package (testjson) for parsing the go test JSON format. It might be worth updating the reporter to use that in the future, as it's maintained, well tested, and supports go test functionality like benchmarks and race detection.

Regardless, thanks for creating this project in the first place; it's really great to be able to have good annotations in test runs!
