perf(ci): use uv pip for faster Python dep install #4521

Merged
Yicong-Huang merged 2 commits into apache:main from Yicong-Huang:perf/ci-cache-pip-and-sbt on Apr 26, 2026

Yicong-Huang commented Apr 26, 2026

What changes were proposed in this PR?

Both the `scala` and `python` jobs in `.github/workflows/github-action-build.yml` install Python dependencies via `pip install -r requirements.txt`. That step alone takes:

  • scala job: ~1m 21s
  • python matrix: 1m 12s – 4m 41s (3.13 is slowest because some packages lack prebuilt wheels for it)

An earlier attempt on this PR turned on the built-in pip wheel cache (`actions/setup-python` with `cache: 'pip'`) and saved only ~14s, because the bottleneck is installation (resolving, extracting, and writing site-packages for ~230 packages), not download.

This PR switches both jobs to `uv pip install --system`. uv is a Rust reimplementation of pip with no transitive dependencies, so installing it via `python -m pip install uv` adds only ~3s, and the same wheel set then installs in ~10s instead of ~70s on the same runner.
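The two-step install can be sketched in Python for illustration (the PR itself edits shell lines in the workflow YAML). The helper names `uv_install_commands` and `run_all` are hypothetical; the commands are the ones named above. The sketch assumes the `uv` entry point that pip installs ends up on `PATH`, as it normally does for an interpreter provided by `actions/setup-python`.

```python
import subprocess
import sys


def uv_install_commands(requirements: str = "requirements.txt") -> list[list[str]]:
    """Return argv lists that bootstrap uv with pip, then install with uv.

    Hypothetical helper; the commands mirror the ones described in the PR.
    """
    return [
        # Step 1: install uv itself with this interpreter's pip. The uv wheel
        # has no Python dependencies, so this step is small.
        [sys.executable, "-m", "pip", "install", "uv"],
        # Step 2: let uv install the requirements. --system installs into the
        # Python found on PATH instead of requiring a virtual environment.
        ["uv", "pip", "install", "--system", "-r", requirements],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

On a runner this would be invoked as `run_all(uv_install_commands())`. Bootstrapping uv through pip, rather than through a dedicated setup action, is what keeps the change inside the existing Actions allowlist.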

No new third-party GitHub Action is added (uv is fetched as a regular pip package), so this stays within the ASF Infra GitHub Actions allowlist already used by this repo (`sbt/setup-sbt`, `coursier/cache-action`, `docker/*`, `amannn/*`, `apache/*`).

Any related issues, documentation, discussions?

Closes #4519. Companion to #4508 (which combined the two lint sbt invocations).

How was this PR tested?

CI on this PR: comparing the `Install dependencies` step time before vs. after on both the `scala` and `python` (3.10, 3.11, 3.12, 3.13) jobs.

Earlier exploratory commits on this branch tried caching `target/scala-2.13/{classes,zinc,src_managed}` directly. That broke scalafix: zinc skipped the compile that produces the SemanticDB files scalafix needs, so the sbt-target cache was dropped.
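The PR compared step durations reported by GitHub Actions. For a similar before/after check outside CI, a small wall-clock harness is enough; this sketch is not part of the PR, and the names `time_command` and `PYTHON` are hypothetical.

```python
import subprocess
import sys
import time

# Interpreter running this script; handy for building pip commands.
PYTHON = sys.executable


def time_command(cmd: list[str]) -> float:
    """Run cmd to completion and return its wall-clock duration in seconds.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

For a fair comparison, time `[PYTHON, "-m", "pip", "install", "-r", "requirements.txt"]` and `["uv", "pip", "install", "--system", "-r", "requirements.txt"]` in two fresh environments; running both in the same one lets the second command find packages already installed or cached.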

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

github-actions bot added the `ci` (changes related to CI) label Apr 26, 2026
Yicong-Huang force-pushed the `perf/ci-cache-pip-and-sbt` branch from 750ba06 to f0cfa99 on April 26, 2026 02:11
The `Install dependencies` step in both the `scala` and `python` jobs
of `.github/workflows/github-action-build.yml` ran `pip install -r
requirements.txt`, which spent ~1m 21s in the `scala` job and 1m 12s –
4m 41s across the `python` matrix. Adding the built-in pip wheel cache
saved only ~14s because the bottleneck was install (resolve, extract,
write site-packages for ~230 packages), not download.

Switch both jobs to `uv pip install --system`. uv is a Rust
reimplementation of pip with no transitive deps, so installing it via
`python -m pip install uv` adds only ~3s, and the same wheel set then
installs in ~10s instead of ~70s. No third-party action is added (uv
itself is fetched as a regular pip package), so this stays inside the
ASF Infra GitHub Actions allowlist already used by this repo.

Closes apache#4519

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yicong-Huang force-pushed the `perf/ci-cache-pip-and-sbt` branch from 77c7b7a to 329cafb on April 26, 2026 04:33
Yicong-Huang changed the title from "perf(ci): cache pip wheels and sbt build artifacts" to "perf(ci): use uv pip for faster Python dep install" Apr 26, 2026
Yicong-Huang self-assigned this Apr 26, 2026
Yicong-Huang enabled auto-merge (squash) April 26, 2026 05:12

aglinxinyuan left a comment

LGTM!

Yicong-Huang merged commit f59dde5 into apache:main Apr 26, 2026
11 checks passed
Yicong-Huang added a commit that referenced this pull request Apr 26, 2026
### What changes were proposed in this PR?

Removes two PyPI backport packages from `amber/requirements.txt`:

- `typing==3.7.4.3` — backport of the `typing` stdlib module for Python
<3.5. Has been part of the stdlib since 3.5.
- `dataclasses==0.6` — backport of the `dataclasses` stdlib module for
Python <3.7. Has been part of the stdlib since 3.7.

This repo's CI matrix is Python 3.10–3.13, so both are obsolete.
Installing them on supported versions wastes CI time, and the PyPI
`typing` package can shadow the stdlib version in subtle ways because
its API is frozen in its 3.7-era form.

`typing_extensions==4.14.1` (line 34, kept) is a different,
still-maintained package that backports *new* typing features to older
Pythons; it's correctly retained.
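The reasoning above (drop a backport once the minimum supported Python ships the module) can be expressed as a small requirements check. This is a hypothetical sketch, not a tool in the repo; the table holds only the two backports and versions stated in this commit, and names are normalized PEP 503 style so `typing_extensions` is never mistaken for `typing`.

```python
import re

# PyPI backport -> first Python version whose stdlib ships the module
# (values taken from this commit message).
STDLIB_SINCE: dict[str, tuple[int, int]] = {
    "typing": (3, 5),
    "dataclasses": (3, 7),
}

# Distribution name at the start of a requirement line, e.g. "typing==3.7.4.3".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def obsolete_backports(lines: list[str], min_python: tuple[int, int]) -> list[str]:
    """Return requirement names that duplicate a stdlib module on min_python+."""
    found = []
    for line in lines:
        line = line.split("#", 1)[0]  # drop comments
        m = _NAME.match(line)
        if not m:  # blank lines, options such as "-r other.txt"
            continue
        # PEP 503 normalization: runs of "-", "_", "." become "-", lowercased.
        name = re.sub(r"[-_.]+", "-", m.group(1)).lower()
        since = STDLIB_SINCE.get(name)
        if since is not None and min_python >= since:
            found.append(name)
    return found
```

With the CI floor of 3.10, `obsolete_backports(lines, (3, 10))` flags both `typing` and `dataclasses` while leaving `typing_extensions` alone. On Python 3.10+, `sys.stdlib_module_names` offers a quick cross-check that both modules ship with the interpreter.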

### Any related issues, documentation, discussions?

Closes #4522. Surfaced during investigation in #4519/#4521.

### How was this PR tested?

CI

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yicong-Huang pushed a commit to Yicong-Huang/texera that referenced this pull request May 2, 2026
apache#4521 moved the Python dep install in the scala and python matrix
jobs to `uv pip install --system` for speed. apache#4597 unintentionally
rewrote those lines back to stock pip while inlining the binary license
checks, and the regression has been carried forward by every subsequent
rebase. Restore uv for speed.

The python job's 3.12 leg is the only one that drives the binary
license check (`pip-licenses` -> `check_binary_deps.py python`). Keep
stock pip on that leg so the resolved versions match
`amber/LICENSE-binary-python`, which is generated with pip and tracks
what the production image installs. uv and pip can resolve unpinned
transitives differently; without this carve-out the check would report
false positives on resolver drift, and we'd be forced to update
`LICENSE-binary-python` to chase the CI side (production still uses pip).

Other python legs (3.10, 3.11, 3.13) use uv. The scala job's binary
license check is jar-only, so it uses uv too. The dev-deps install runs
post-snapshot, so it can use uv on all legs.

Closes apache#4635
Yicong-Huang added a commit that referenced this pull request May 3, 2026
…4636)

## What changes were proposed in this PR?

#4521 moved the Python dep install in the scala and python matrix jobs to
`uv pip install --system` for install speed. #4597 unintentionally
rewrote those lines back to stock `pip install` while inlining the
binary license checks, and the regression has been carried forward by
every subsequent rebase. Restore uv, but with a targeted carve-out for
the leg that drives the binary license check.

### Why the carve-out

The python job's `3.12` matrix entry is the only leg that runs
`pip-licenses` and feeds the result into
`bin/licensing/check_binary_deps.py python`. That tool compares the
installed Python tree against `amber/LICENSE-binary-python`, which is
generated **with pip** and tracks what the production image installs. uv
and pip resolvers can land on different versions of unpinned transitives.
If the 3.12 leg installed with uv, `check_binary_deps.py` would report
false positives on resolver drift, forcing us to chase those drifts in
`LICENSE-binary-python` (and diverge from production).

So: stock `pip install` on the 3.12 leg only; uv everywhere else.
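Why resolver drift produces false positives can be seen with a toy comparison. This is a hypothetical sketch: the manifest is a plain name-to-version dict, not the actual format of `amber/LICENSE-binary-python`, and `drift` is not the logic of `check_binary_deps.py`, neither of which is shown here. If uv picks a different version of an unpinned transitive than pip did when the manifest was generated, the diff is non-empty even though nothing meaningful changed.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple


def installed_versions() -> Dict[str, str]:
    """Map normalized distribution name -> version for this interpreter."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            versions[re.sub(r"[-_.]+", "-", name).lower()] = dist.version
    return versions


def drift(
    installed: Dict[str, str], manifest: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (installed_version, manifest_version)} for each mismatch.

    A package present on only one side reports None for the other side.
    """
    return {
        name: (installed.get(name), manifest.get(name))
        for name in sorted(installed.keys() | manifest.keys())
        if installed.get(name) != manifest.get(name)
    }
```

Installing the checked leg with the same tool that generated the manifest keeps `drift(installed_versions(), manifest)` empty for reasons unrelated to licensing, which is the point of the carve-out.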

### Per-step shape

- **scala job → Install dependencies**: `uv pip install --system`. Its
license check is jar-only, so Python resolver differences don't matter
here.
- **python job → Install dependencies**: branches on
`matrix.python-version`. `3.12` keeps `pip install`; `3.10`, `3.11`,
`3.13` use `uv pip install --system`.
- **python job → Install dev dependencies**: `uv pip install --system`.
Runs post-snapshot, so uv is safe on all legs.
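In the workflow the per-step choice is a YAML branch on `matrix.python-version`; the Python below only models the decision table from the three bullets. The step labels and requirements paths are hypothetical placeholders.

```python
LICENSE_CHECK_LEG = "3.12"  # the only python leg that runs pip-licenses


def install_argv(step: str, python_version: str, requirements: str) -> list[str]:
    """Return the install command for a CI step under the uv/pip carve-out.

    step is a hypothetical label: "scala-deps", "python-deps" or
    "python-dev-deps". python_version only matters for "python-deps".
    """
    if step not in {"scala-deps", "python-deps", "python-dev-deps"}:
        raise ValueError(f"unknown step: {step!r}")
    if step == "python-deps" and python_version == LICENSE_CHECK_LEG:
        # Keep stock pip so the resolved tree matches the pip-generated manifest.
        return ["pip", "install", "-r", requirements]
    # Elsewhere, resolver differences cannot affect the Python license check.
    return ["uv", "pip", "install", "--system", "-r", requirements]
```

Keeping the exception to a single (step, version) pair means any new matrix leg defaults to uv unless it also starts driving the license check.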

No behaviour change for the license check itself. Other legs gain
install speed.

## Any related issues, documentation, discussions?

Closes #4635. Restores #4521. Regression introduced by #4597.

## How was this PR tested?

Will be exercised by this PR's own scala and python matrices. The
expected signal:

- [x] scala job: install step uses uv, tests still run.
- [x] python 3.10 / 3.11 / 3.13 legs: install step uses uv.
- [x] python 3.12 leg: install step uses pip; pip-licenses manifest
unchanged; `check_binary_deps.py python` passes.

## Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7 (Claude Code)
Yicong-Huang added a commit that referenced this pull request May 3, 2026
…4636)

(backported from commit a3d43db)

Generated-by: Claude Opus 4.7 (Claude Code)

Labels

ci (changes related to CI)

Development

Successfully merging this pull request may close these issues.

Switch CI Python dependency install from pip to uv

2 participants