refactor(amber): stop hardcoding S3 in REST catalog init#4988
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4988 +/- ##
============================================
+ Coverage 43.17% 43.19% +0.01%
+ Complexity 2214 2211 -3
============================================
Files 1045 1045
Lines 40260 40161 -99
Branches 4250 4234 -16
============================================
- Hits 17381 17346 -35
+ Misses 21812 21743 -69
- Partials 1067 1072 +5
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks @mengw15 can you make the description a bit concise about why this change is needed? Also please add tests to confirm this change works. |
Thanks for the questions. Why the change is needed. When a Lakekeeper warehouse is created, the S3 settings (endpoint, region, credentials, path-style, etc.) are already registered against that warehouse on the server side. At REST-catalog init the client only needs the warehouse identifier and uri — Lakekeeper resolves and serves the S3 config from the warehouse record. The previously hardcoded s3.* properties from StorageConfig were therefore redundant on the client; deleting them lets each warehouse own its own storage settings instead of all warehouses being forced onto the system bucket. I'll tighten the PR description to say just this. About tests. End-to-end verification needs a running Lakekeeper, which CI doesn't have yet. #4276 (draft) adds Lakekeeper to CI; once that lands I'll layer an integration test on top of it that creates a warehouse with its own S3 settings, opens a REST catalog with only warehouse + uri, and round-trips a table. |
Align test_iceberg_rest_catalog_integration.py with create_rest_catalog's new signature after S3 settings stopped being passed at catalog init. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
now as #4276 is merged, can we hook up with new tests? |
With #4276 merged, which brought over the Lakekeeper CI job and the two integration tests. These two tests are testing the createRestCatalog in Scala and python. so this PR is covered end-to-end. With CI passed, I think we can confirm that this change works. |
sg. thanks! let's also make sure the coverage is filled, this can make sure your changes in this PR are actually being tested in the CI.
See more #4988 (comment) |
Amber integration job runs without jacoco, so IcebergRestCatalogIntegrationSpec does not register on codecov. Add a unit test that drives createRestCatalog far enough to construct the property Map; .initialize then throws because no Lakekeeper is up in unit-test scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
It seems like the amber-integration job doesn't upload to Codecov. Added a mock unit test in IcebergUtilSpec to satisfy the patch number; it just drives createRestCatalog until .initialize throws (no Lakekeeper in unit-test scope). Real coverage still comes from the integration test. |
|
Yes integration test is for now designed not to alter coverage report: we rely on unit tests. |
Tightens the previous coverage-only test: instead of intercepting any Exception, assert RESTException specifically. The property Map is built before .initialize, so a RESTException from either an unreachable Lakekeeper or a missing warehouse confirms the Map composition is sound and the failure is server-side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unit test I added in IcebergUtilSpec is coverage-only — without a running Lakekeeper, RESTCatalog wraps the connection failure into RESTException The PR's real semantics (client sends only warehouse + uri, Lakekeeper resolves S3 config from the warehouse record) can only be verified end-to-end against a running Lakekeeper. That's what IcebergRestCatalogIntegrationSpec covers in amber-integration. For a real test in a coverage-uploading job, we'd need to add Lakekeeper (+ MinIO + Postgres) to the amber job — basically duplicating the service setup that amber-integration already does. Want me to do that, or is the current coverage-only test fine? |
Yicong-Huang
left a comment
There was a problem hiding this comment.
LGTM, minor comment on test.
Addresses review feedback on PR apache#4988: the previous name "build REST catalog properties without S3 settings" described the new call shape but didn't reflect what the assertion actually verifies (intercept RESTException on server-side connection failure). Rename to make the assertion clear to readers who don't have the PR context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What changes were proposed in this PR?
Stop hardcoding
s3.endpoint,s3.region,s3.path-style-access,s3.access-key-idands3.secret-access-keyat REST-catalog init in bothIcebergUtil.createRestCatalog(Scala) andiceberg_utils.create_rest_catalog(Python). Both helpers now pass onlywarehouse+ cataloguri(and on the Scala side theFileIOimpl hint).Why: When a Lakekeeper warehouse is created, its S3 settings (endpoint, region, credentials, path-style) are registered against that warehouse on the server. At catalog init the client only needs
warehouse+uri— Lakekeeper resolves the S3 config from the warehouse record and serves it back. The hardcodedStorageConfig.s3*values on the client were redundant, and forcing them everywhere also pinned every warehouse to the single system bucket. Removing them lets each warehouse own its own storage settings.StorageConfig.s3*itself is kept —pytexera/storage/large_binary_manager.pystill uses it for the non-Icebergtexera-large-binariesbucket (R UDF large-binary support), which is out of scope.Any related issues, documentation, discussions?
Closes #4987
How was this PR tested?
sbt "WorkflowCore/compile"— passes; verifies no other Scala caller depends on the removed properties.ast.parse; the only caller (iceberg_catalog_instance.py) is updated to match the newcreate_rest_catalogsignature.End-to-end verification (warehouse with its own S3 settings → REST catalog opened with only
warehouse+uri→ table round-trip) requires a running Lakekeeper, which CI doesn't have today. #4276 (draft) wires Lakekeeper into CI; once that lands I'll add the integration test on top of it.Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)