fix(licensing): preserve all versions in check_binary_deps multi-version diff#4711

Merged
bobbai00 merged 7 commits into
apache:mainfrom
bobbai00:fix/license-multi-version-indexing
May 3, 2026

@bobbai00
Contributor

@bobbai00 bobbai00 commented May 3, 2026

What changes were proposed in this PR?

Fixes a latent bug in bin/licensing/check_binary_deps.py: the internal indexers used dict[name, version] (one version per name), so when the same name appeared with two different versions, the second assignment silently overwrote the first. Concretely, this PR switches all three indexers to the same shape:

_index_npm    : dict[str, set[str]]   # name      -> versions
_index_python : dict[str, set[str]]   # name      -> versions
_index_jar    : dict[str, set[str]]   # artifact  -> versions
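
To make the failure mode concrete, here is a minimal sketch of the old and new index shapes. The function names `index_single` and `index_multi` and the `(name, version)` input pairs are hypothetical; the real indexers in check_binary_deps.py do their own parsing and may differ in detail.

```python
from collections import defaultdict
from typing import Iterable


def index_single(entries: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Old shape: one version per name, so a later entry overwrites an earlier one."""
    index: dict[str, str] = {}
    for name, version in entries:
        index[name] = version  # silently drops any previously stored version
    return index


def index_multi(entries: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """New shape: every version seen for a name is kept in a set."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, version in entries:
        index[name].add(version)
    return dict(index)


entries = [("jetty-server", "9.4.20.v20190813"), ("jetty-server", "11.0.20")]
single = index_single(entries)  # keeps only the last version seen
multi = index_multi(entries)    # keeps both versions
```

With the single-value shape, only `11.0.20` survives for `jetty-server`; with the set-valued shape, both versions remain available to the diff.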

diff_simple / diff_jars are updated to emit per-version added / stale and per-name drift tuples shaped (name, sorted_claimed, sorted_real). Multi-version drift renders as

~ jetty-server: LICENSE-binary=9.4.20.v20190813, 11.0.20  bundled=9.4.20.v20190813, 11.0.21

falling back to the existing single-version form when there's only one version on each side. As a side benefit, added / stale lines now include the version (this regressed in #4693, which printed bare names).
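
The diff described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: `diff_versions` and `render_drift` are hypothetical names, the classification rules are inferred from the description (per-version added / stale, per-name drift), and versions are ordered with a plain string sort. The real script may order versions differently; the jetty-server example lists 9.4.20 before 11.0.20, which a plain string sort would not produce.

```python
def diff_versions(claimed: dict[str, set[str]], real: dict[str, set[str]]):
    """Compare two name -> versions indexes.

    Assumed classification (inferred, not confirmed by the source):
      added: name is bundled but absent from LICENSE-binary (one entry per version)
      stale: name is in LICENSE-binary but not bundled (one entry per version)
      drift: name is on both sides with different version sets (one entry per name)
    """
    added = sorted((n, v) for n in real.keys() - claimed.keys() for v in real[n])
    stale = sorted((n, v) for n in claimed.keys() - real.keys() for v in claimed[n])
    drift = [
        (n, sorted(claimed[n]), sorted(real[n]))
        for n in sorted(claimed.keys() & real.keys())
        if claimed[n] != real[n]
    ]
    return added, stale, drift


def render_drift(name: str, claimed_versions: list[str], real_versions: list[str]) -> str:
    """Render one drift tuple in the multi-version form shown in the PR description."""
    return (
        f"~ {name}: LICENSE-binary={', '.join(claimed_versions)}"
        f"  bundled={', '.join(real_versions)}"
    )


# Hypothetical package names and versions, for illustration only.
claimed = {"libfoo": {"1.0", "2.0"}, "libold": {"0.9"}}
real = {"libfoo": {"1.0", "2.1"}, "libnew": {"3.0"}}
added, stale, drift = diff_versions(claimed, real)
lines = [render_drift(*entry) for entry in drift]
```

Because drift is keyed by name rather than by version, dropping one of two versions of `libfoo` shows up as a single drift line instead of a stale entry.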

This PR also adds unit tests for the check_binary_deps.py script.

The tests are wired into the amber job in build.yml right after Python setup (before any check_binary_deps.py invocation):

- name: Unit-test licensing scripts
  run: python3 -m unittest discover -s bin/licensing -p "test_*.py" -v

Any related issues, documentation, discussions?

Follow-up to #4693

How was this PR tested?

python3 -m unittest discover -s bin/licensing -p "test_*.py" -v runs 27 tests in 4ms, all passing. The new step in CI runs the same command on every PR that exercises the amber job. Manually verified end-to-end against the real combined LICENSE-binary built via concat_license_binary.py.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-7)

…ion diff

The internal indexers built `dict[name, version]` (and `dict[artifact, (version,
basename)]` for jars), so when a name appeared twice with different versions
the second assignment silently overwrote the first. The combined LICENSE-binary
on main today has 97 such artifacts and 106 of 566 jar entries were being
dropped on the floor — undetectable by CI because apache#4632 split each ecosystem
across services so any single CI invocation only saw one version per lib.

Switch the indexers to multimaps:
  - npm/python: dict[name, set[version]]
  - jar:        dict[artifact, dict[version, basename]]

Update diff_simple/diff_jars to surface a per-version added/stale list and a
drift entry shaped (name, sorted_claimed_versions, sorted_real_versions).
Update the report to render multi-version drift as
  ~ jetty-server: LICENSE-binary=9.4.20.v20190813, 11.0.20  bundled=...
falling back to the single-version form when there's only one version on each
side. Also restore version info in added/stale lines (previously printed bare
names after apache#4693).

Verified end-to-end against the real combined LICENSE-binary: clean run now
preserves all 566 jar entries (was 460), and per-version drift on multi-
version artifacts is correctly reported and gated by --ignore-transitive-version.
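
The gating mentioned above can be sketched as follows, assuming (as the test list later in this PR suggests) that added and stale entries always fail, drift on a direct dependency always fails, and drift on a transitive dependency fails unless --ignore-transitive-version is passed. The function `should_fail` and the `direct_names` set are hypothetical; how the real script decides whether a dependency is direct is not shown here.

```python
def should_fail(added, stale, drift, direct_names, ignore_transitive_version=False):
    """Return True if the check should exit non-zero.

    added / stale: lists of (name, version) tuples
    drift:         list of (name, sorted_claimed, sorted_real) tuples
    direct_names:  hypothetical set of names treated as direct dependencies
    """
    if added or stale:
        return True  # missing or leftover entries always fail
    for name, _claimed, _real in drift:
        is_direct = name in direct_names
        # Direct drift always fails; transitive drift fails only in strict mode.
        if is_direct or not ignore_transitive_version:
            return True
    return False
```

In this sketch the flag only relaxes transitive version drift; it never hides an added or stale entry.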

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.25%. Comparing base (5a80494) to head (5f646d1).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4711      +/-   ##
============================================
+ Coverage     43.20%   43.25%   +0.05%     
- Complexity     2036     2110      +74     
============================================
  Files           957      957              
  Lines         34077    34946     +869     
  Branches       3753     3893     +140     
============================================
+ Hits          14722    15116     +394     
- Misses        18580    19041     +461     
- Partials        775      789      +14     
Flag Coverage Δ
access-control-service 28.12% <ø> (ø)
agent-service 33.49% <ø> (-0.24%) ⬇️
amber 41.75% <ø> (+0.41%) ⬆️
computing-unit-managing-service 0.00% <ø> (ø)
config-service 0.00% <ø> (ø)
file-service 32.40% <ø> (-0.85%) ⬇️
frontend 34.97% <ø> (-0.31%) ⬇️
python 84.72% <ø> (-0.12%) ⬇️
workflow-compiling-service 47.72% <ø> (ø)

@Yicong-Huang
Contributor

Can you add a test case for it? I think this tool is complex enough that we want to guard its behavior. You can add its run to the Python CI.

… amber CI

Per review on apache#4711: _index_jar's two-level dict[str, dict[str, str]] was
overkill — the basename storage was a defensive fallback for parser bugs
and a "no reconstruction in report()" convenience, neither of which
justifies the extra shape. Switched to dict[str, set[str]] matching
_index_npm and _index_python; added _jar_basename(artifact, version) for
rendering. The version regex captures classifier suffixes whole, so
reconstruction round-trips byte-for-byte (verified by JarBasenameRoundTrip).
Unparseable jar names are now warned about on stderr instead of being
stored under a sentinel key.
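
A minimal sketch of the round-trip idea, for illustration only. The pattern `JAR_RE` and the helpers `parse_jar` and `jar_basename` are hypothetical stand-ins, not the script's JAR_NAME_VERSION or _jar_basename, and the version numbers in the examples are made up. The assumption is that the artifact ends at the first "-<digit>" and everything up to ".jar" is the version, so a classifier suffix stays inside the version string.

```python
import re
import sys
from typing import Optional

# Hypothetical pattern: artifact is everything before the first "-<digit>";
# the version runs to ".jar", so classifier suffixes are captured whole.
JAR_RE = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d.*)\.jar$")


def parse_jar(basename: str) -> Optional[tuple[str, str]]:
    """Split a jar basename into (artifact, version), or warn and return None."""
    m = JAR_RE.match(basename)
    if m is None:
        print(f"warning: cannot parse jar name: {basename}", file=sys.stderr)
        return None
    return m.group("artifact"), m.group("version")


def jar_basename(artifact: str, version: str) -> str:
    """Rebuild the basename from its parts (inverse of parse_jar)."""
    return f"{artifact}-{version}.jar"


# Illustrative names only; the versions are not taken from the real LICENSE-binary.
samples = [
    "jetty-server-9.4.20.v20190813.jar",
    "fs2-core_2.13-3.9.3.jar",
    "netty-tcnative-boringssl-static-2.0.61.Final-linux-x86_64.jar",
]
round_trips = [jar_basename(*parse_jar(name)) == name for name in samples]
```

Since the version is captured as one opaque string, the index only needs artifact -> set of versions and the basename can be rebuilt when rendering. One limitation of this particular pattern: an artifact whose own name contains "-<digit>" would be split too early.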

Replaced the ad-hoc smoke scripts with a stdlib unittest suite at
bin/licensing/test_check_binary_deps.py (27 tests, ~4ms, no pytest dep).
Coverage:
  - Multi-version preservation in all three indexers (regression test
    for the bug this PR fixes).
  - npm scoped-name parsing, jar parser-bug warning.
  - JAR_NAME_VERSION + _jar_basename round-trip including classifier-
    suffix jars (netty-tcnative-boringssl-static linux-x86_64) and
    Scala-suffix jars (fs2-core_2.13).
  - _is_direct_jar across bare / group-prefixed / Scala-suffix forms.
  - diff_simple and diff_jars: clean, single-version drift, multi-
    version drift, added/stale (per-version output).
  - End-to-end main() for python: clean, transitive drift (strict vs
    flag), direct drift (always fails), added/stale (always fail),
    multi-version drop classified as drift not stale.
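
As an illustration of the first item in the list above, a regression test for multi-version preservation might look like the sketch below. It is not taken from test_check_binary_deps.py: `index_versions` is a hypothetical stand-in for an indexer, and the test class and method names are invented for this example.

```python
import unittest
from collections import defaultdict


def index_versions(entries):
    """Hypothetical stand-in for an indexer: name -> set of versions."""
    index = defaultdict(set)
    for name, version in entries:
        index[name].add(version)
    return dict(index)


class MultiVersionPreservation(unittest.TestCase):
    def test_both_versions_survive(self):
        # Regression guard: the second version must not overwrite the first.
        index = index_versions(
            [("jetty-server", "9.4.20.v20190813"), ("jetty-server", "11.0.20")]
        )
        self.assertEqual(index["jetty-server"], {"9.4.20.v20190813", "11.0.20"})

    def test_single_version_is_one_element_set(self):
        index = index_versions([("fs2-core_2.13", "3.9.3")])
        self.assertEqual(index, {"fs2-core_2.13": {"3.9.3"}})
```

Saved as a `test_*.py` file, a suite like this needs only the standard library and is picked up by `python3 -m unittest discover`.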

Wired into the amber job in build.yml (right after Python setup, before
any check_binary_deps.py invocation): `python3 -m unittest discover`.
Stdlib-only, no install needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the ci (changes related to CI) label May 3, 2026
@bobbai00 bobbai00 requested a review from Yicong-Huang May 3, 2026 06:44
@bobbai00
Contributor Author

bobbai00 commented May 3, 2026

Can you add a test case for it? I think this tool is complex enough that we want to guard its behavior. You can add its run to the Python CI.

unit test added

@Yicong-Huang Yicong-Huang added the release/v1.1.0-incubating (back porting to release/v1.1.0-incubating) label May 3, 2026
Comment thread .github/workflows/build.yml Outdated
bobbai00 and others added 2 commits May 3, 2026 00:26
…l 3.x)

Per review on apache#4711: the unit tests belong in the python job, not the
amber (Scala) job that happened to have python set up for shelling out.
Place the step right after `Set up Python ${{ matrix.python-version }}`
with no `if:` guard, so the test runs on every matrix row (3.10 / 3.11
/ 3.12 / 3.13). The test is stdlib-only and ~4 ms, so the multi-version
coverage is essentially free and guards against any future Python
version compat regression in check_binary_deps.py before the actual
license check (3.12 only) runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bobbai00 bobbai00 enabled auto-merge (squash) May 3, 2026 07:37
@bobbai00 bobbai00 merged commit 9ece88e into apache:main May 3, 2026
36 checks passed
Yicong-Huang pushed a commit that referenced this pull request May 3, 2026
…ion diff (#4711)

---------

(backported from commit 9ece88e)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 3, 2026

Backport to release/v1.1.0-incubating succeeded as 4d90034. Run

Labels

ci (changes related to CI), dev, fix, python, release/v1.1.0-incubating (back porting to release/v1.1.0-incubating)

3 participants