perf(ci): trim redundant scala compile and reorder for early lint failure #4638

Merged
Yicong-Huang merged 2 commits into apache:main from Yicong-Huang:perf/scala-step-order
May 2, 2026
@Yicong-Huang Yicong-Huang commented May 2, 2026

What changes were proposed in this PR?

Tighten the scala job in build.yml:

  • Drop the Compile with sbt step (sbt clean package): its package output was unused, and it re-cleaned a tree the dist step had just compiled.
  • Drop the leading clean; from the dist step so it can reuse the lint compile.
  • Merge scalafmt, scalafix, and all per-module dist commands into a single sbt invocation with each as its own argument, so the whole chain runs in one JVM and sbt exits at the first failing command.
  • Move Create Databases ahead of any sbt step (the JOOQ source generators connect to texera_db during compile).
  • Move Install dependencies (pip) just before Run backend tests, since only the test step needs the python deps.

New step order:

Create Databases
Setup sbt launcher / coursier cache
sbt scalafmtCheckAll "scalafixAll --check" <Service>/dist ...   # one JVM, fail-fast
Unzip / license check / audit
Install dependencies (pip)
Create texera_db_for_test_cases
Set docker-java API version
Run backend tests
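
The merged invocation above relies on each sbt command being passed as its own shell argument, which is why "scalafixAll --check" is quoted. A minimal sketch of how that argument list is assembled, assuming two hypothetical module names (ServiceA and ServiceB; the PR only shows `<Service>/dist ...`). It is a dry run that prints the arguments rather than calling sbt:

```shell
# Start with the two lint commands. The quotes keep "scalafixAll --check"
# together as a single argument, so sbt sees it as one command.
set -- scalafmtCheckAll "scalafixAll --check"

# Append one dist command per module.
# ServiceA and ServiceB are hypothetical placeholders, not real module names.
for module in ServiceA ServiceB; do
  set -- "$@" "$module/dist"
done

# Dry run: show what a single sbt process would receive, one argument per line.
sbt_argc=$#
echo "sbt would receive $sbt_argc arguments:"
for arg in "$@"; do
  printf '  [%s]\n' "$arg"
done
```

Replacing the dry-run loop with `sbt "$@"` would run the commands in order in one JVM, stopping at the first one that fails, as the PR description states.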

Any related issues, documentation, discussions?

Closes #4637.

How was this PR tested?

Exercised by this PR's own scala matrix. Each individual command (scalafmt, scalafix, dist, license check, audit, tests) is unchanged; the only differences are the step ordering, the merged sbt invocation, and the removal of the redundant sbt clean package step.

Timing comparison on the scala job, sbt-touching steps only (run 25239784635 before, run 25241165819 after):

| step | before | after |
|---|---|---|
| Lint with scalafmt | 45 s | (merged) |
| Build distributable bundles (sbt 'clean; X/dist; ...') | 3 m 4 s | (merged) |
| Compile with sbt (sbt clean package) | 1 m 26 s | removed |
| Lint with scalafix | 47 s | (merged) |
| Combined sbt scalafmtCheckAll "scalafixAll --check" X/dist ... | n/a | 4 m 31 s |
| sbt subtotal | 6 m 2 s | 4 m 31 s |

Net savings on the sbt portion: ~1 m 30 s (matches the dropped redundant compile plus one fewer sbt JVM cold-start). The uv pip migration is independent (#4636) and would shave another ~45 s off the Python Install dependencies step.
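
The subtotal and savings can be rechecked from the per-step timings reported above. A small sketch, using only those reported figures (the `to_seconds` helper is introduced here for illustration):

```shell
# Convert "minutes seconds" into total seconds.
to_seconds() { echo $(( $1 * 60 + $2 )); }

# Per-step timings from the "before" run.
scalafmt=$(to_seconds 0 45)
dist=$(to_seconds 3 4)
compile=$(to_seconds 1 26)
scalafix=$(to_seconds 0 47)

# Single combined sbt step from the "after" run.
after=$(to_seconds 4 31)

before=$(( scalafmt + dist + compile + scalafix ))
saved=$(( before - after ))
echo "before=${before}s after=${after}s saved=${saved}s"
```

This gives 362 s before (6 m 2 s) and 271 s after, a saving of 91 s, of which the removed 86 s compile step accounts for nearly all.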

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

@github-actions github-actions Bot added the ci changes related to CI label May 2, 2026
@Yicong-Huang Yicong-Huang force-pushed the perf/scala-step-order branch 3 times, most recently from bdf8acd to d23ba67 Compare May 2, 2026 02:07
@Yicong-Huang Yicong-Huang added the release/v1.1.0-incubating back porting to release/v1.1.0-incubating label May 2, 2026
@Yicong-Huang Yicong-Huang enabled auto-merge (squash) May 2, 2026 02:26
@Yicong-Huang Yicong-Huang force-pushed the perf/scala-step-order branch from b8b1c25 to ec0e2e7 Compare May 2, 2026 02:29
…lure

The scala job re-compiled the project twice every run: once during
'Build distributable bundles' (which prefixed its sbt command with
'clean') and again during 'Compile with sbt: sbt clean package',
whose package output nothing later in the job consumed. Drop the
second step entirely and remove the leading 'clean;' from the dist
step so the dist build can reuse the preceding compile.

While here, reorder so cheap lint runs first and dist reuses
scalafix's compile output:

  scalafmt -> scalafix -> create dbs -> dist (no clean) -> license
  checks -> install python deps -> create test db -> docker-java
  config -> backend tests

scalafix moves up so its compile feeds the dist step (incremental
instead of from-scratch). 'Install dependencies' (pip) moves down
so a lint or dist failure does not pay the install cost; only the
backend test step needs the python deps.

Closes apache#4637

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Yicong-Huang Yicong-Huang force-pushed the perf/scala-step-order branch from ec0e2e7 to 73c406d Compare May 2, 2026 02:30
@Yicong-Huang Yicong-Huang requested a review from aglinxinyuan May 2, 2026 02:46
@Yicong-Huang Yicong-Huang merged commit 8a4f2dd into apache:main May 2, 2026
22 checks passed
github-actions Bot pushed a commit that referenced this pull request May 2, 2026
…lure (#4638)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

(backported from commit 8a4f2dd)
SarahAsad23 pushed a commit to SarahAsad23/texera that referenced this pull request May 4, 2026
…lure (apache#4638)

Development

Successfully merging this pull request may close these issues.

Speed up scala CI job: drop redundant compile and reorder for early lint failure
