fix(licensing): preserve all versions in check_binary_deps multi-version diff#4711
Merged
bobbai00 merged 7 commits intoMay 3, 2026
Merged
Conversation
…ion diff The internal indexers built `dict[name, version]` (and `dict[artifact, (version, basename)]` for jars), so when a name appeared twice with different versions the second assignment silently overwrote the first. The combined LICENSE-binary on main today has 97 such artifacts and 106 of 566 jar entries were being dropped on the floor — undetectable by CI because apache#4632 split each ecosystem across services so any single CI invocation only saw one version per lib. Switch the indexers to multimaps: - npm/python: dict[name, set[version]] - jar: dict[artifact, dict[version, basename]] Update diff_simple/diff_jars to surface a per-version added/stale list and a drift entry shaped (name, sorted_claimed_versions, sorted_real_versions). Update the report to render multi-version drift as ~ jetty-server: LICENSE-binary=9.4.20.v20190813, 11.0.20 bundled=... falling back to the single-version form when there's only one version on each side. Also restore version info in added/stale lines (previously printed bare names after apache#4693). Verified end-to-end against the real combined LICENSE-binary: clean run now preserves all 566 jar entries (was 460), and per-version drift on multi- version artifacts is correctly reported and gated by --ignore-transitive-version. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4711 +/- ##
============================================
+ Coverage 43.20% 43.25% +0.05%
- Complexity 2036 2110 +74
============================================
Files 957 957
Lines 34077 34946 +869
Branches 3753 3893 +140
============================================
+ Hits 14722 15116 +394
- Misses 18580 19041 +461
- Partials 775 789 +14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Contributor
|
can you add a test case for it? I think this tool is complex enough. we want to guard its behavior. you can add its run in python CI. |
… amber CI Per review on apache#4711: _index_jar's two-level dict[str, dict[str, str]] was overkill — the basename storage was a defensive fallback for parser bugs and a "no reconstruction in report()" convenience, neither of which justifies the extra shape. Switched to dict[str, set[str]] matching _index_npm and _index_python; added _jar_basename(artifact, version) for rendering. The version regex captures classifier suffixes whole, so reconstruction round-trips byte-for-byte (verified by JarBasenameRoundTrip). Unparseable jar names are now warned about on stderr instead of being stored under a sentinel key. Replaced the ad-hoc smoke scripts with a stdlib unittest suite at bin/licensing/test_check_binary_deps.py (27 tests, ~4ms, no pytest dep). Coverage: - Multi-version preservation in all three indexers (regression test for the bug this PR fixes). - npm scoped-name parsing, jar parser-bug warning. - JAR_NAME_VERSION + _jar_basename round-trip including classifier- suffix jars (netty-tcnative-boringssl-static linux-x86_64) and Scala-suffix jars (fs2-core_2.13). - _is_direct_jar across bare / group-prefixed / Scala-suffix forms. - diff_simple and diff_jars: clean, single-version drift, multi- version drift, added/stale (per-version output). - End-to-end main() for python: clean, transitive drift (strict vs flag), direct drift (always fails), added/stale (always fail), multi-version drop classified as drift not stale. Wired into the amber job in build.yml (right after Python setup, before any check_binary_deps.py invocation): `python3 -m unittest discover`. Stdlib-only, no install needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Author
unit test added |
Yicong-Huang
approved these changes
May 3, 2026
Yicong-Huang
reviewed
May 3, 2026
…l 3.x) Per review on apache#4711: the unit tests belong in the python job, not the amber (Scala) job that happened to have python set up for shelling out. Place the step right after `Set up Python ${{ matrix.python-version }}` with no `if:` guard, so the test runs on every matrix row (3.10 / 3.11 / 3.12 / 3.13). The test is stdlib-only and ~4 ms, so the multi-version coverage is essentially free and guards against any future Python version compat regression in check_binary_deps.py before the actual license check (3.12 only) runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yicong-Huang
pushed a commit
that referenced
this pull request
May 3, 2026
…ion diff (#4711) ### What changes were proposed in this PR? Fixes a latent bug in `bin/licensing/check_binary_deps.py` — the internal indexers used `dict[name, version]` (one version per name), so when the same name appeared with two different versions the second assignment silently overwrote the first. Concretely, particularly, this PR switches all three indexers to the same shape: ```python _index_npm : dict[str, set[str]] # name -> versions _index_python : dict[str, set[str]] # name -> versions _index_jar : dict[str, set[str]] # artifact -> versions ``` `diff_simple` / `diff_jars` are updated to emit per-version `added` / `stale` and per-name `drift` tuples shaped `(name, sorted_claimed, sorted_real)`. Multi-version drift renders as ``` ~ jetty-server: LICENSE-binary=9.4.20.v20190813, 11.0.20 bundled=9.4.20.v20190813, 11.0.21 ``` falling back to the existing single-version form when there's only one version on each side. As a side benefit, `added` / `stale` lines now include the version (this regressed in #4693 which printed bare names). This PR also adds several **unit tests** for the `check_license_binary` script. Wired into the `amber` job in `build.yml` right after Python setup (before any `check_binary_deps.py` invocation): ```yaml - name: Unit-test licensing scripts run: python3 -m unittest discover -s bin/licensing -p "test_*.py" -v ``` ### Any related issues, documentation, discussions? Follow-up to #4693 ### How was this PR tested? `python3 -m unittest discover -s bin/licensing -p "test_*.py" -v` runs 27 tests in 4ms, all passing. The new step in CI runs the same command on every PR that exercises the `amber` job. Manually verified end-to-end against the real combined LICENSE-binary built via `concat_license_binary.py`. ### Was this PR authored or co-authored using generative AI tooling? Generated-by: Claude Code (claude-opus-4-7) --------- (backported from commit 9ece88e) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
Backport to |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this PR?
Fixes a latent bug in
bin/licensing/check_binary_deps.py— the internal indexers useddict[name, version](one version per name), so when the same name appeared with two different versions the second assignment silently overwrote the first. Concretely, particularly, this PR switches all three indexers to the same shape:diff_simple/diff_jarsare updated to emit per-versionadded/staleand per-namedrifttuples shaped(name, sorted_claimed, sorted_real). Multi-version drift renders asfalling back to the existing single-version form when there's only one version on each side. As a side benefit,
added/stalelines now include the version (this regressed in #4693 which printed bare names).This PR also adds several unit tests for the
check_license_binaryscript.Wired into the
amberjob inbuild.ymlright after Python setup (before anycheck_binary_deps.pyinvocation):Any related issues, documentation, discussions?
Follow-up to #4693
How was this PR tested?
python3 -m unittest discover -s bin/licensing -p "test_*.py" -vruns 27 tests in 4ms, all passing. The new step in CI runs the same command on every PR that exercises theamberjob. Manually verified end-to-end against the real combined LICENSE-binary built viaconcat_license_binary.py.Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)