test(data): fix flaky quantized depth roundtrip tests #159
Merged
janickm merged 1 commit on Jun 22, 2026
The quantized depth roundtrip tests used an unseeded np.random RNG and an exact 0.5 * scale tolerance. The float32 source data and the float32 cast of the dequantized output each carry up to ~max_value * eps_f32 of representation error on top of the quantization rounding error, so the measured difference occasionally exceeds 0.5 * scale (about 0.8% of random seeds for the float64 intermediate case), making the test flaky in CI. Seed the RNG for deterministic, reproducible runs and widen the tolerance to account for float32 representation error. Verified the widened bound holds across 5000 seeds.
Problem
The test_quantized_depth_roundtrip test in ncore/impl/data/v4/components_test.py fails intermittently in CI (e.g. run 27951090321).
Root cause
Two issues compound:
1. The tests called np.random.default_rng() with no seed, so each CI run drew different data and results were non-reproducible.
2. The tolerance was exactly 0.5 * scale, but the source original is float32 and the dequantized output is also cast back to float32. Each carries up to ~max_value * eps_f32 of representation error on top of the rounding error. The measured |dequantized - original| therefore occasionally exceeds the exact 0.5 * scale bound (about 0.8% of random seeds for the float64-intermediate case), tripping the assertion with rtol=0.
Fix
- Seed the RNG (np.random.default_rng(0)) in all three quantized roundtrip tests so runs are deterministic and reproducible.
- Widen the exact 0.5 * scale tolerance to 0.5 * scale + 2 * max_value * eps_f32, reflecting the real float32 representation error budget. The other two cases already used looser bounds (scale and 1.0 * scale).
Verified the widened bound holds across 5000 seeds (worst observed error 0.000500 vs atol 0.000514), so the test stays green even if the seed changes later.
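For illustration only, the seeding and the widened tolerance can be sketched as below. This is not the ncore implementation: quantize_depth, dequantize_depth, max_roundtrip_error, the uint16 storage type, and the MAX_VALUE and SCALE constants are all hypothetical stand-ins chosen so the example is self-contained. It assumes a float64 intermediate and a final cast back to float32, as described above.

```python
import numpy as np

# Hypothetical parameters, not taken from ncore.
MAX_VALUE = 65.535                 # assumed depth range
SCALE = MAX_VALUE / 65535          # assumed uint16 quantization step
EPS_F32 = float(np.finfo(np.float32).eps)

# Widened tolerance: rounding error plus a float32 representation budget.
ATOL = 0.5 * SCALE + 2 * MAX_VALUE * EPS_F32


def quantize_depth(depth: np.ndarray) -> np.ndarray:
    """Quantize float32 depth to uint16 via a float64 intermediate."""
    q = np.rint(depth.astype(np.float64) / SCALE)
    return np.clip(q, 0, 65535).astype(np.uint16)


def dequantize_depth(q: np.ndarray) -> np.ndarray:
    """Dequantize in float64, then cast the result back to float32."""
    return (q.astype(np.float64) * SCALE).astype(np.float32)


def max_roundtrip_error(seed: int, n: int = 10_000) -> float:
    """Largest |dequantized - original| for data drawn from a seeded RNG."""
    rng = np.random.default_rng(seed)  # seeded, so each call is reproducible
    original = rng.uniform(0.0, MAX_VALUE, size=n).astype(np.float32)
    restored = dequantize_depth(quantize_depth(original))
    diff = restored.astype(np.float64) - original.astype(np.float64)
    return float(np.max(np.abs(diff)))


# Small seed sweep in the spirit of the 5000-seed check described above.
worst = max(max_roundtrip_error(seed) for seed in range(20))
assert worst <= ATOL
```

In this sketch the cast back to float32 adds at most half a float32 ulp at MAX_VALUE, so the 2 * MAX_VALUE * EPS_F32 term leaves headroom rather than being a tight bound.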
Testing