
Add strip_components to extract/download_and_extract `http_arch…#29281

Open
willstranton wants to merge 1 commit into bazelbuild:master from willstranton:strip


@willstranton
Contributor

Add strip_components to extract/download_and_extract/http_archive

Description

The strip_components attribute works like `tar --strip-components`:

Strip NUMBER leading components from file names on extraction.

This is an alternative to the existing strip_prefix attribute, which required knowing the exact prefix to be stripped. Only one of the two attributes (strip_prefix, strip_components) can be set at one time.
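To make the difference between the two attributes concrete, here is a minimal Python sketch of how an archive entry's path could be mapped under each one. It assumes tar-like semantics: the first N `/`-separated components are dropped, and entries with nothing left are skipped. The helper names (`strip_components_from_path`, `strip_prefix_from_path`, `output_path`) and the example entry are hypothetical. This is not Bazel's actual implementation.

```python
from typing import Optional


def strip_components_from_path(path: str, n: int) -> Optional[str]:
    """Drop the first `n` path components, similar to `tar --strip-components=n`.

    Returns None when nothing remains; tar likewise skips such entries.
    """
    if n < 0:
        raise ValueError("strip_components must be non-negative")
    # Ignore empty components produced by leading, trailing, or doubled slashes.
    parts = [p for p in path.split("/") if p]
    remaining = parts[n:]
    return "/".join(remaining) if remaining else None


def strip_prefix_from_path(path: str, prefix: str) -> Optional[str]:
    """Existing-style behaviour: the caller must name the exact prefix."""
    prefix = prefix.strip("/") + "/"
    if not path.startswith(prefix):
        return None  # this sketch simply skips entries outside the prefix
    remaining = path[len(prefix):].strip("/")
    return remaining or None


def output_path(path: str, strip_prefix: str = "", strip_components: int = 0) -> Optional[str]:
    """Map an archive entry to its extracted path under one of the two attributes."""
    # Mirrors the PR's rule that only one of the two attributes may be set.
    if strip_prefix and strip_components:
        raise ValueError("only one of strip_prefix and strip_components may be set")
    if strip_prefix:
        return strip_prefix_from_path(path, strip_prefix)
    return strip_components_from_path(path, strip_components)


if __name__ == "__main__":
    entry = "mylib-3f9c2e1/src/lib.c"  # hypothetical entry with a commit-hash prefix
    print(output_path(entry, strip_prefix="mylib-3f9c2e1"))  # src/lib.c
    print(output_path(entry, strip_components=1))            # src/lib.c
```

Both calls produce the same path, but only the `strip_prefix` call needs to know `mylib-3f9c2e1`. The same `strip_components=1` call would also handle an npm tarball whose top-level directory happens not to be `package/`.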

Motivation

See #28879

Build API Changes

  1. Has this been discussed in a design doc or issue? (Please link it)

See #28879

  2. Is the change backward compatible?

Yes

  3. If it's a breaking change, what is the migration plan?

N/A - this is not a breaking change.

Checklist

  • I have added tests for the new use cases (if any).
  • I have updated the documentation (if applicable).

Release Notes

RELNOTES[NEW]: Adds the strip_components attribute to extract/download_and_extract/http_archive to allow stripping of path components when extracting files.

…ive`

The `strip_components` attribute works like `tar --strip-components`:

> Strip NUMBER leading components from file names on extraction.

This is an alternative to the existing `strip_prefix` attribute, which required
knowing the exact prefix to be stripped. Only one of the two attributes
(`strip_prefix`, `strip_components`) can be set at one time.

Fixes bazelbuild#28879

RELNOTES[NEW]: Adds the `strip_components` attribute to `extract`/`download_and_extract`/`http_archive` to allow stripping of path components when extracting files.
@willstranton willstranton marked this pull request as ready for review April 13, 2026 22:12
@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. team-Core Skyframe, bazel query, BEP, options parsing, bazelrc awaiting-review PR is awaiting review from an assigned reviewer labels Apr 13, 2026
@meteorcloudy
Member

which required knowing the exact prefix to be stripped.

If the source archive URL is deterministic, the exact prefix should be known?

@willstranton
Contributor Author

If the source archive URL is deterministic, the exact prefix should be known?

Yes, that's true, but it's inconvenient to have to examine an archive to determine that exact prefix. This pull request is a "quality of life" improvement. As you point out, it's not a "must have".
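For illustration, the manual inspection step that `strip_prefix` requires can be sketched in Python with the standard `tarfile` module: collect the distinct first path component of every member. The helper names (`top_level_entries`, `make_demo_archive`) and the archive contents are hypothetical. Because the prefix often embeds a version or commit hash, this inspection has to be repeated whenever the archive is updated.

```python
import io
import tarfile


def top_level_entries(archive: bytes) -> set[str]:
    """Return the distinct first path components of all members of a tar archive.

    If exactly one name comes back, that is the prefix `strip_prefix` would need.
    """
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        return {name.split("/", 1)[0] for name in tar.getnames() if name}


def make_demo_archive(names: list[str]) -> bytes:
    """Build a small in-memory .tar.gz containing empty files at the given paths."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    return buf.getvalue()


if __name__ == "__main__":
    v1 = make_demo_archive(["mylib-3f9c2e1/BUILD", "mylib-3f9c2e1/src/lib.c"])
    v2 = make_demo_archive(["mylib-8d4a7b0/BUILD", "mylib-8d4a7b0/src/lib.c"])
    # The prefix changes with every release, so strip_prefix must change too.
    print(top_level_entries(v1))  # {'mylib-3f9c2e1'}
    print(top_level_entries(v2))  # {'mylib-8d4a7b0'}
```

Running it against two releases of the same hypothetical library shows the prefix changing between them, which is the bookkeeping that `strip_components=1` avoids.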

Summarizing from the community:

  1. From the original issue, http_archive (also repository_ctx.extract) strip_components #28879, where the filer described the inconvenience and why the feature is useful:

Archives often have a containing directory.

Sometimes, this is long or not easily memorable -- a version number, or a commit hash
Sometimes, this is not readily known. E.g. npm packages usually use a package/ prefix, but not always.
Usually, users don't actually care what the leading component is, they just want to remove it.
...
This feature is in both BSD and GNU tar; it's very useful.
While not mentioned in my original comment, it would also be very useful for archive_override (bzlmod).

I remember having to update dependencies manually before BCR. You had to update the tar archive AND the prefix that was stripped.

  2. Feature request: download_and_extract(strip_prefix="*") #13960 is an earlier request from 2021 that expresses similar friction.

When first adding a http_archive (or alternative) to your workspace, it's easy enough to find what the top level directory is called... but with many archives it requires a bit more effort...
...with dependencies that change... this can get very tiresome....
My particular use case is a custom build definition that provides a simpler interface to private repositories... I don't know of any justification for requiring strip_prefix to be specified manually.

  3. Issue #28879 has at least 2 members commenting on or agreeing with this proposal; counting me as the author of this pull request, that makes 3. The earlier issue #13960 also has two members commenting, so roughly 5 people want this solved in some way. I'll admit that counting users can be misleading, since they could all be from the same company or friends rallying each other. I have no relation to any of the folks mentioned.


2 participants