REST Spec: Add resolve endpoint for catalog objects by stevenzwu · Pull Request #15830 · apache/iceberg

stevenzwu · 2026-03-30T20:22:22Z

Design doc: https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0#heading=h.e6w7vgpr8t2f

Adds POST /v1/{prefix}/resolve — a single endpoint that takes one or more typed arrays of catalog items (currently relations; extensible to e.g. functions) and returns their current state with per-item outcomes (loaded, not-modified, not-found) plus a typed unprocessed list for partial-progress capping.

stevenzwu · 2026-03-30T20:34:40Z

+        view:
+          $ref: '#/components/schemas/LoadViewResult'
+
+    LoadRelationResult:


A future materialized-view result would look like:

{ "object-type": "materialized-view", "view": { }, "storage-table": { } }

No, I don't think we would have a new type. It would just have a storage-table associated with it, which is what makes it a materialized view. I'm not sure we need a separate type.

danielcweeks · 2026-04-01T15:06:08Z

+            "GET /v1/{prefix}/namespaces/{namespace}/relations/{relation}",
+            "POST /v1/{prefix}/relations/batch-load"


I have a minor objection to the use of relations here since we want this to be a general endpoint for resolution. Objects like tables and views are considered relations, but something like a function would not be (unless you adopt a strictly relational-algebra definition, which is not consistent with SQL usage).

We may also include other objects in the future, so a more general term like resolve, identifiers, resources, or entities might be better.

This is related to the other identifier conflict domain discussion, where we want to allow the same identifier for a relational object (like a table) and a function. With that assumption, the endpoint will need to have the object category in the path to distinguish them. Otherwise, we would require identifier uniqueness across all object types, which is not the consensus from the identifier conflict discussion.

danielcweeks · 2026-04-01T15:10:39Z

+      description:
+        "
+        Load metadata for multiple relations in one request. Identifiers may span different namespaces.
+        Each item includes a `TableIdentifier` and optional per-item parameters (`etag` and `snapshots`).


Seems like a good time to introduce an Identifier type that has the same structure as a table identifier. It seems odd that we would continue to use an object-specific identifier type to reference multiple kinds of objects.

Since it has the same structure, you could possibly just have them extend Identifier (depending on how that affects the OpenAPI structure and generated code).

I am actually about to share a proposal with the community. https://docs.google.com/document/d/1NTQhgNbP2dkIMuXUMA5JdwliVQKCp1TU_ux5J_AaPiw/edit?tab=t.0#heading=h.xzrfzeom8dqa

danielcweeks · 2026-04-01T15:14:57Z

+        The server resolves each identifier as a table or view.
+
+
+        The per-item `status` in the response indicates the outcome:


This feels awkward because we're using HTTP status codes for non-request/internal results. I don't think that makes a lot of sense and prefer we indicate result behavior in a different way.

What about defining an enum schema in the REST spec? It would be similar to the HTTP status names, though.

    BatchLoadItemResultStatus:
      type: string
      description: |
        The outcome of loading a single item in a batch load response.
      enum:
        - success
        - not-modified
        - not-found
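For illustration only, the proposed enum could surface in a client roughly as follows. This is a minimal Python sketch: the class name mirrors the schema name above, while `parse_status` is a hypothetical helper and not part of any Iceberg library.

```python
from enum import Enum


class BatchLoadItemResultStatus(Enum):
    """Outcome of loading a single item, mirroring the proposed enum values."""

    SUCCESS = "success"
    NOT_MODIFIED = "not-modified"
    NOT_FOUND = "not-found"


def parse_status(raw: str) -> BatchLoadItemResultStatus:
    """Convert a wire value to the enum; unknown values raise ValueError."""
    return BatchLoadItemResultStatus(raw)
```

A string enum like this rejects anything outside the declared set, so an HTTP-style value such as `"200"` would fail to parse rather than be silently accepted.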

Open to other suggestions.

BTW, @gaborkaszab and @jbonofre previously suggested using HTTP status codes in a design doc comment.

I did a little more research. Two patterns are common.

1. Use HTTP status codes.

Microsoft Graph API: JSON batching where each item has its own integer HTTP status, headers, and body.

Facebook/Meta Graph API: each item has a code field (HTTP integer) plus optional headers and body.

Elasticsearch Bulk API: each item has an integer status field (e.g. 200, 201, 404, 409) plus a string result field (e.g. "created", "updated", "not_found").

2. Split results into separate lists. AWS services commonly use this pattern.

AWS DynamoDB BatchGetItem: found items in Responses, absent items silently omitted, incomplete items in UnprocessedKeys. No status field.

AWS SQS SendMessageBatch / DeleteMessageBatch: results split into Successful and Failed lists. Failed entries have Code (a string error code like "InvalidParameterValue"), not HTTP status integers.

AWS S3 DeleteObjects: in verbose mode, successful deletes are listed in Deleted, failures in Errors with a string Code (e.g. "AccessDenied").
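To make the difference between the two patterns concrete, here is a small Python sketch that regroups a per-item-status response into separate lists keyed by outcome. The field names (`identifier`, `status`) and the function `split_by_outcome` are assumptions made for this example; they are not taken from any of the APIs listed above.

```python
from collections import defaultdict
from typing import Any


def split_by_outcome(items: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group items that each carry a `status` into one list per outcome."""
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        # Copy every field except the status; the list name now carries it.
        payload = {k: v for k, v in item.items() if k != "status"}
        grouped[item["status"]].append(payload)
    return dict(grouped)


# Hypothetical per-item response (first pattern).
per_item = [
    {"identifier": ["db", "orders"], "status": "loaded"},
    {"identifier": ["db", "missing"], "status": "not-found"},
    {"identifier": ["db", "users"], "status": "loaded"},
]

# The same information in the split-list shape (second pattern).
split = split_by_outcome(per_item)
```

Both shapes carry the same information; the split-list form simply drops the per-item field and lets the list a result appears in express the outcome.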

I would personally prefer a more structured response than just embedding HTTP codes. I ran into this issue with GraphQL, which returns something like a 200 wrapping a 403, which is a bit confusing.

Agreed. Switched the per-item shape to a discriminated union using the same object-type-style pattern already in LoadRelationResult:

    BatchLoadRelationResultItem:
      oneOf:
        - $ref: '#/components/schemas/BatchLoadRelationLoaded'
        - $ref: '#/components/schemas/BatchLoadRelationNotModified'
        - $ref: '#/components/schemas/BatchLoadRelationNotFound'
      discriminator:
        propertyName: result-type
        mapping:
          loaded: '#/components/schemas/BatchLoadRelationLoaded'
          not-modified: '#/components/schemas/BatchLoadRelationNotModified'
          not-found: '#/components/schemas/BatchLoadRelationNotFound'

Each variant declares exactly its own required fields:

loaded → result (required) + etag (optional, tables only)

not-modified → etag (required)

not-found → identifier (required)
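As a rough illustration of how a client might consume this union, the sketch below models the three variants as Python dataclasses and dispatches on `result-type`. The class names and `parse_result_item` are hypothetical, and the payloads are kept as plain dictionaries; real generated code would likely look different.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class Loaded:
    result: dict[str, Any]        # required
    etag: Optional[str] = None    # optional, tables only


@dataclass(frozen=True)
class NotModified:
    etag: str                     # required


@dataclass(frozen=True)
class NotFound:
    identifier: dict[str, Any]    # required


ResultItem = Union[Loaded, NotModified, NotFound]


def parse_result_item(raw: dict[str, Any]) -> ResultItem:
    """Dispatch on `result-type`; a missing required field raises KeyError."""
    kind = raw["result-type"]
    if kind == "loaded":
        return Loaded(result=raw["result"], etag=raw.get("etag"))
    if kind == "not-modified":
        return NotModified(etag=raw["etag"])
    if kind == "not-found":
        return NotFound(identifier=raw["identifier"])
    raise ValueError(f"unknown result-type: {kind!r}")
```

Because each variant only declares its own fields, a `not-modified` item without an `etag` fails at parse time instead of producing a half-populated object.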

Why domain-native names over HTTP status integers:

Wire stays honest. HTTP codes describe transport; per-item outcomes describe domain state. Keeping them separate means caches, retries, tracing, and monitoring don't have to peek inside the body to know what actually happened.

Presence rules become schema-enforced. oneOf lets each variant require its own fields instead of relying on prose in the description.

Stronger generated clients. result-type generates into sealed interfaces / discriminated unions / sum types, so the compiler catches missed cases. An integer status generates into a plain int that callers switch on by hand.

Extensible without fake codes. Adding outcomes like skipped or stale-etag-mismatch later is a new variant — no need to overload 429 or invent a meaning for an HTTP code that doesn't fit.

Internally consistent with this PR. LoadRelationResult already uses object-type as a string discriminator; reusing the same style for per-item outcomes keeps the reader's mental model uniform.

stevenzwu · 2026-04-01T18:24:43Z

+      type: string
+      description: |
+        The type of a catalog object.
+      enum:


We may add values such as materialized-view or function in the future.

Materialized view is just a type of view, so I'm not sure we need to distinguish.

The response object is different for an MV: it contains both the view metadata and the storage table metadata. If we want to load an MV in one round trip, a specific MV type is needed so that the client knows how to parse the response.

In the Java library, we may need to define a MaterializedView type, which could be mostly just a container class with a View field and a Table field.
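The container idea can be sketched as follows. This is an illustration in Python rather than Java, and everything in it is an assumption: the class names, the `table` payload key, and the `materialized-view` object type are hypothetical, with `storage-table` borrowed from the example JSON earlier in this thread.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Table:
    metadata: dict[str, Any]


@dataclass(frozen=True)
class View:
    metadata: dict[str, Any]


@dataclass(frozen=True)
class MaterializedView:
    """Container pairing a view with its storage table."""

    view: View
    storage_table: Table


def parse_relation(raw: dict[str, Any]) -> Union[Table, View, MaterializedView]:
    """Pick the payload branch based on the `object-type` discriminator."""
    object_type = raw["object-type"]
    if object_type == "table":
        return Table(raw["table"])
    if object_type == "view":
        return View(raw["view"])
    if object_type == "materialized-view":
        return MaterializedView(View(raw["view"]), Table(raw["storage-table"]))
    raise ValueError(f"unknown object-type: {object_type!r}")
```

Without a distinct discriminator value, the client would have to infer the shape from which keys happen to be present, which is the parsing concern raised above.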

ldsantos0911 · 2026-04-17T05:26:33Z

+          items:
+            $ref: '#/components/schemas/BatchLoadRelationRequestItem'
+
+    BatchLoadRelationRequestItem:


At either this layer or the URL layer, would it make sense to honor the referenced-by parameter available to loadTable and loadView?

I think it would have to be at the pre-request level since you need to distinguish different loads (and they may be treated differently for authorization)

danielcweeks · 2026-04-21T18:40:35Z

-                  "GET /v1/{prefix}/namespaces/{namespace}/views/{view}"
+                  "GET /v1/{prefix}/namespaces/{namespace}/views/{view}",
+                  "GET /v1/{prefix}/namespaces/{namespace}/relations/{relation}",
+                  "POST /v1/{prefix}/relations/batch-load"


This is a little pedantic, but I'm not entirely sold on the resource path here. I looked across many different implementations and found lots of approaches. However, I don't think we need to put the /batch-load at the end. We can just leave it as POST /v1/{prefix}/relations. I could see that you might say "what if we need to create", but we already have a transactions endpoint that is for that operation.

I added batch-load to the path because the batch load uses POST, so that the list of identifiers can be carried in the request payload.

If we remove it, it is a bit odd to have POST /v1/{prefix}/relations serve as a batch get.

danielcweeks · 2026-04-21T18:47:42Z

+        Servers MAY cap the amount of computation or response payload size per request and return
+        `unprocessed-identifiers` for items they did not process. Clients SHOULD retry unprocessed
+        identifiers in a subsequent request.


Do we want to describe how to handle the case where too many items are requested? I understand the server can process a subset and return the unprocessed list, but what if someone lists an entire catalog and then asks for 100K resources? What error code can/should the server return if it considers the request unreasonable? (400 might fit that.)

Good catch, thanks. Updated the spec.

unprocessed-identifiers was only meant as a cooperative soft cap — the server accepted the request, then stopped partway due to its own cost budget. It doesn't cover the "please reject this upfront" case you described.

Added a typed hard cap to CatalogConfig:

    relations-batch-load-max-items:
      type: integer
      minimum: 1

Advertising the limit lets well-behaved clients chunk proactively, and servers that receive an oversized request still have a spec-supported way to reject with 400.
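Proactive chunking on the client side could look like the sketch below. It assumes the client has already read the advertised limit from the catalog config and passes it in as `max_items`; the function name `chunk_items` is hypothetical.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunk_items(items: Sequence[T], max_items: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `max_items` long."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    for start in range(0, len(items), max_items):
        yield items[start:start + max_items]
```

Each yielded chunk would then be sent as its own request, so a well-behaved client never triggers the 400 rejection.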

The batchLoadRelations description and unprocessed-identifiers field now spell out the two mechanisms as distinct:

relations-batch-load-max-items — governs whether the request is accepted at all (400 if exceeded).

unprocessed-identifiers — reports partial progress within an accepted request.
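The distinction between the two mechanisms can be sketched from the server's side. This is a simplified Python illustration, not the reference implementation: `cost_budget` stands in for whatever internal budget a server applies, and the `results` key and error body shape are assumptions made for the example.

```python
from typing import Any


def handle_batch_load(
    identifiers: list[Any], max_items: int, cost_budget: int
) -> tuple[int, dict[str, Any]]:
    """Return (HTTP status, body) for a batch load request."""
    # Hard cap: the request is rejected as a whole before any work is done.
    if len(identifiers) > max_items:
        message = f"request has {len(identifiers)} items; limit is {max_items}"
        return 400, {"error": {"message": message, "code": 400}}

    # Soft cap: the request is accepted, but only part of it may be processed.
    processed = identifiers[:cost_budget]
    unprocessed = identifiers[cost_budget:]
    body: dict[str, Any] = {"results": [{"identifier": i} for i in processed]}
    if unprocessed:
        body["unprocessed-identifiers"] = unprocessed
    return 200, body
```

An oversized request never reaches the partial-progress path, and an accepted request always returns 200 even when some identifiers are left for a later retry.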

Went with 400 rather than 413 since the limit is on identifier count (a request property), not body size — happy to switch if you disagree.

Add the CatalogObjectType string enum (table, view) as defined in apache#15830 (universal relation load). It serves as the companion discriminator for CatalogObjectIdentifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Introduces two related REST schemas:

- CatalogObjectIdentifier: a bare array of hierarchical levels that references a catalog object (table, view, or namespace). The object kind is determined by context (e.g. the endpoint or a companion CatalogObjectType discriminator), not by the identifier structure alone.
- CatalogObjectType: an enum of "table", "view", and "namespace" intended to be used as a discriminator alongside CatalogObjectIdentifier.

Also regenerates rest-catalog-open-api.py to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a single POST /v1/{prefix}/resolve endpoint that resolves one or more catalog objects to their current state in one request. Stacks on top of the CatalogObjectIdentifier + CatalogObjectType schemas introduced in the previous commit.

Request body carries one or more typed arrays of items (currently only `relations`; designed to be extended with sibling arrays such as `functions` in the future). Each relation item carries a CatalogObjectIdentifier plus optional per-item hints (`etag`, `snapshots`) that apply when the resolved relation is a table.

Response body carries parallel typed arrays of per-item results. For `relations`, each result is a ResolveRelationResult whose `status` field discriminates between three outcomes:

- `loaded`: the relation exists; the typed payload is returned in `result` as a LoadRelationResult (object-type + table/view branch). For tables, the current `etag` MAY also be included.
- `not-modified`: the relation is a table whose ETag matches the caller's provided `etag`; no payload is returned, only the current `etag`.
- `not-found`: no table or view exists for the identifier; the item MAY include a structured error.

Partial progress: servers MAY return a subset of items under `unprocessed` (a typed object mirroring the request shape, e.g. `unprocessed.relations`) when a request exceeds internal cost/payload budgets. Each unprocessed entry carries the identifier plus optional `code` and `reason`.

CatalogConfig gains a `resolve-max-items` field advertising the maximum total items the server will accept in one request. Exceeding the limit MUST cause the server to reject the whole request with 400; the `unprocessed` mechanism is distinct and reports partial progress within an accepted request.

Authorization failures for any requested item SHOULD fail the entire request with 403; the error body carries `forbidden-identifiers` listing the offending identifiers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
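A client-side view of the request and response shapes described in this commit message might look like the following Python sketch. The `relations` array, the `etag` and `snapshots` hints, and the `status` field come from the text above; the `identifier` key name, the `RelationItem` class, and both helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RelationItem:
    identifier: list[str]             # bare array of hierarchical levels
    etag: Optional[str] = None        # optional hint, applies to tables
    snapshots: Optional[str] = None   # optional hint, applies to tables


def build_resolve_request(items: list[RelationItem]) -> dict[str, Any]:
    """Build a request body with a typed `relations` array."""
    relations = []
    for item in items:
        entry: dict[str, Any] = {"identifier": item.identifier}
        # Hints are omitted entirely when the caller did not set them.
        if item.etag is not None:
            entry["etag"] = item.etag
        if item.snapshots is not None:
            entry["snapshots"] = item.snapshots
        relations.append(entry)
    return {"relations": relations}


def count_by_status(response: dict[str, Any]) -> dict[str, int]:
    """Tally per-item outcomes from the `relations` array of a response."""
    counts: dict[str, int] = {}
    for result in response.get("relations", []):
        counts[result["status"]] = counts.get(result["status"], 0) + 1
    return counts
```

Keeping `relations` as a named array inside the body is what leaves room for sibling arrays such as `functions` later without changing the existing shape.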

Drop the separate UnprocessedRelation schema and its diagnostic fields (`code`, `reason`). Instead, `UnprocessedItem.relations` now echoes the original `ResolveRelationItem` entries the server didn't process, so clients retry by re-submitting exactly those items without having to reconstruct them. This keeps unprocessed entries in lockstep with the request shape (including per-item `etag` and `snapshots` hints) as the schema grows with future typed arrays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
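The retry behavior this enables can be sketched as a simple loop that feeds `unprocessed.relations` straight back into the next request. This is a hedged Python illustration: `resolve_all`, the `send` callable, and `make_capped_server` (a fake server that processes at most `cap` items per call) are all hypothetical and exist only to make the example self-contained.

```python
from typing import Any, Callable

Body = dict[str, Any]


def resolve_all(
    items: list[Body], send: Callable[[Body], Body], max_rounds: int = 10
) -> list[Body]:
    """Resolve all items, re-submitting echoed unprocessed entries as-is."""
    results: list[Body] = []
    pending = list(items)
    for _ in range(max_rounds):
        if not pending:
            break
        response = send({"relations": pending})
        results.extend(response.get("relations", []))
        # Unprocessed entries mirror the request items, so no rebuilding is needed.
        pending = response.get("unprocessed", {}).get("relations", [])
    if pending:
        raise RuntimeError(f"{len(pending)} items still unprocessed")
    return results


def make_capped_server(cap: int) -> tuple[Callable[[Body], Body], list[int]]:
    """Fake server for the example: processes at most `cap` items per call."""
    calls: list[int] = []

    def send(body: Body) -> Body:
        items = body["relations"]
        calls.append(len(items))
        done, rest = items[:cap], items[cap:]
        response: Body = {"relations": [{"status": "not-found"} for _ in done]}
        if rest:
            response["unprocessed"] = {"relations": rest}
        return response

    return send, calls
```

The `max_rounds` bound is a design choice for this sketch so that a server that never makes progress cannot keep the client looping forever.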

github-actions Bot added the OPENAPI label Mar 30, 2026

stevenzwu mentioned this pull request Mar 30, 2026

REST Spec: add batch load endpoints for tables and views #15528

Closed

stevenzwu commented Mar 30, 2026

stevenzwu force-pushed the rest-spec-universal-load branch from fd3e2d9 to 9a1edfd on March 30, 2026 20:40

danielcweeks reviewed Apr 1, 2026

danielcweeks reviewed Apr 1, 2026

stevenzwu commented Apr 1, 2026

stevenzwu mentioned this pull request Apr 2, 2026

Core: Add Java reference implementation for relation load endpoints #15831

Draft

stevenzwu force-pushed the rest-spec-universal-load branch from cebd6ab to b093e29 on April 2, 2026 19:40

ldsantos0911 reviewed Apr 17, 2026

danielcweeks reviewed Apr 21, 2026

stevenzwu mentioned this pull request Apr 28, 2026

OpenAPI: Add CatalogObjectIdentifier schema #16144

Open

3 tasks

stevenzwu changed the title ~~REST Spec: Add single and batch endpoints for loading relational objects (table, view, and future MV)~~ REST Spec: Add resolve endpoint for catalog objects Apr 29, 2026

stevenzwu and others added 2 commits April 29, 2026 15:33

stevenzwu force-pushed the rest-spec-universal-load branch from 20207d4 to f1ff6e7 on April 30, 2026 20:26