OpenAPI: Add CatalogObjectIdentifier schema by stevenzwu · Pull Request #16144 · apache/iceberg

stevenzwu · 2026-04-28T17:57:22Z

Summary

Adds the CatalogObjectIdentifier schema to the REST catalog spec — an ordered list of hierarchical levels (["accounting", "tax", "paid"]) that works uniformly for tables, views, materialized views, and namespaces. The kind of object an identifier refers to is determined by context (the endpoint, or a companion type discriminator defined by that endpoint), not by the identifier structure itself.

Structurally the same as Namespace (a bare array of strings); the distinct name signals "any catalog object" rather than specifically a namespace path.
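A minimal sketch of the wire format, assuming hypothetical helpers `to_json` and `from_json` (neither is part of the spec or the generated `rest-catalog-open-api.py`). It shows that an identifier serializes exactly like a Namespace, as a bare JSON array, and that nothing in the payload says which kind of object it names:

```python
import json
from typing import List

# Hypothetical helpers: the spec only defines the wire shape (a JSON array of strings).
def to_json(levels: List[str]) -> str:
    """Serialize an identifier as a bare JSON array of levels."""
    return json.dumps(levels)

def from_json(payload: str) -> List[str]:
    """Parse a bare JSON array of levels, rejecting any other JSON shape."""
    value = json.loads(payload)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("identifier must be a JSON array of strings")
    return value

# The same payload could name a namespace, a table, or a view; the endpoint decides which.
payload = to_json(["accounting", "tax", "paid"])
```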

Motivation

Multiple concurrent efforts need a generic catalog-object identifier and would otherwise each introduce their own:

Events endpoint (REST Spec: Events endpoint #12584)
Resolve endpoint for relational objects (REST Spec: Add resolve endpoint for catalog objects #15830)
Functions endpoint (REST spec: add list/load function endpoints to OpenAPI spec #15180)

Introducing one shared schema avoids identifier proliferation as new object types (functions, materialized views) are added to the spec.

Scope

Intentionally minimal:

Add CatalogObjectIdentifier alongside the existing TableIdentifier and Namespace.
No changes to existing endpoints. All current references to TableIdentifier and Namespace are preserved — no breaking changes.
No discriminator enum is included. The originally-proposed shared CatalogObjectType was dropped during review; the resolve and events endpoints will each define their own narrower closed enums in their respective PRs, since each has a different forward-compat profile.

Test plan

make -C open-api lint passes (openapi-spec-validator + yamllint --strict)
make -C open-api generate regenerates rest-catalog-open-api.py cleanly
python3 -m py_compile open-api/rest-catalog-open-api.py succeeds

🤖 Generated with Claude Code

Introduces two related REST schemas:
- CatalogObjectIdentifier: a bare array of hierarchical levels that references a catalog object (table, view, or namespace). The object kind is determined by context (e.g. the endpoint or a companion CatalogObjectType discriminator), not by the identifier structure alone.
- CatalogObjectType: an enum of "table", "view", and "namespace" intended to be used as a discriminator alongside CatalogObjectIdentifier.

Also regenerates rest-catalog-open-api.py to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds Java reference implementations for the REST schemas introduced in apache#16144.
- api/org.apache.iceberg.catalog.CatalogObjectIdentifier: hand-written POJO mirroring Namespace: static of(String...) factory, null/null-byte validation, levels()/level(i)/length() accessors, dotted toString.
- api/org.apache.iceberg.catalog.CatalogObjectType: enum of TABLE, VIEW, NAMESPACE with lowercase wire strings and a fromName factory, mirroring PlanStatus.
- Registers a bare-array serializer and deserializer for CatalogObjectIdentifier in RESTSerializers, matching the way Namespace is wired.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
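For readers outside Java, the identifier contract described in this commit can be mirrored in Python roughly as below. This is an illustrative sketch only: the real class is Java, and the dataclass representation and error messages here are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class CatalogObjectIdentifier:
    """Illustrative Python mirror of the Java POJO described in the commit message."""
    _levels: Tuple[str, ...]

    @staticmethod
    def of(*levels: str) -> "CatalogObjectIdentifier":
        for level in levels:
            # Mirrors the described null / null-byte validation (messages are assumptions).
            if level is None:
                raise ValueError("Cannot create an identifier with a null level")
            if "\x00" in level:
                raise ValueError("Cannot create an identifier with the null-byte character")
        return CatalogObjectIdentifier(tuple(levels))

    def levels(self) -> Tuple[str, ...]:
        return self._levels

    def level(self, pos: int) -> str:
        return self._levels[pos]

    def length(self) -> int:
        return len(self._levels)

    def __str__(self) -> str:
        # Dotted form for display only; the wire format is still the bare array.
        return ".".join(self._levels)
```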

danielcweeks · 2026-04-30T18:02:48Z

+    CatalogObjectType:
+      type: string
+      description: |
+        The type of a catalog object.
+      enum:
+        - table
+        - view
+        - namespace


I think there was an open question about how and where this would actually be used. Introducing a type without context leaves me unsure whether it's in line with how we want to reference types.

For example, you could have the resolve endpoint return:

[ [identifier, type, metadata], [identifier, type, metadata], ... ]

or:

tables: <identifier, metadata>
views: <identifier, metadata>
namespaces: <identifier, metadata>

The first approach requires type, but might impact backward compatibility. If we introduce a new type (e.g. function) then clients would break if they don't understand the type. The second approach allows you to extend the response object without modifying the original fields.

I'd like to see how it would be referenced in context.

I updated the design doc for the resolve endpoint and included example request and response JSON payloads:
https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0#heading=h.z0wh4486aab5

Here is the usage from the events endpoint spec PR. It is used as an event filter in the request body.
https://github.com/apache/iceberg/pull/12584/changes#r3170935023

In the events endpoint, I believe this explicit type filter is valuable, which is why we introduced the type there.
Shouldn't clients be tolerant in both cases - for unknown enum variants as well as unknown fields?

I don't think it's as easy as saying the clients should tolerate it. If you add new types, generated parsers will break when they encounter something that wasn't originally enumerated. That's why the type approach feels brittle.

@rdblue might have an opinion here since he originally brought it up in the discussion.
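To make the concern concrete, here is a sketch of how a client that maps the wire string onto a closed enum at parse time fails as soon as the server sends a value its schema predates. Python's `enum.Enum` stands in for generated code; `ObjectType` and `parse_entry` are hypothetical names.

```python
from enum import Enum

class ObjectType(Enum):
    # Hypothetical closed enum, as an older client would have generated it.
    TABLE = "table"
    VIEW = "view"
    NAMESPACE = "namespace"

def parse_entry(entry: dict) -> ObjectType:
    # Lookup by value raises ValueError for anything not known at codegen time.
    return ObjectType(entry["type"])
```

`parse_entry({"type": "view"})` succeeds, while `parse_entry({"type": "function"})` raises, which is the breakage described for Shape A below.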

To clarify Dan's framing, here are the two response shapes he sketched:

Shape A — flat array of typed records, each carrying its own discriminator:

[
  { "identifier": [...], "type": "table", "metadata": {...} },
  { "identifier": [...], "type": "view", "metadata": {...} }
]

Shape B — separate bucket per kind:

{
  "tables": [ { "identifier": [...], "metadata": {...} }, ... ],
  "views": [ ... ],
  "namespaces": [ ... ]
}

Dan's critique: Shape A's type is a closed enum, so adding a new value (e.g. function) breaks generated parsers on old clients.

The Shape B counter has its own forward-compat hole — and a worse one. An old client that doesn't know the materialized_views key never iterates that field, so the identifier just vanishes from the resolved set; the client can't even distinguish "didn't resolve" from "resolved to a kind I don't recognize." A parse error at least tells you something is wrong.
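The silent-loss case can be sketched with a hypothetical old client that only iterates the buckets it knew about when it was written (`resolved_identifiers`, `KNOWN_BUCKETS`, and the sample entries are illustrative):

```python
from typing import Dict, List

# Buckets an older client knows about; written before materialized views existed.
KNOWN_BUCKETS = ("tables", "views", "namespaces")

def resolved_identifiers(response: Dict[str, List[dict]]) -> List[List[str]]:
    """Collect identifiers from a Shape B response the way an old client would."""
    found = []
    for bucket in KNOWN_BUCKETS:
        for item in response.get(bucket, []):
            found.append(item["identifier"])
    # Buckets outside KNOWN_BUCKETS are never visited, so their entries vanish silently.
    return found

response = {
    "tables": [{"identifier": ["db", "orders"], "metadata": {}}],
    "materialized_views": [{"identifier": ["db", "daily_sales_mv"], "metadata": {}}],
}
```

No error is raised: the materialized view simply does not appear in the result.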

The design doc actually already proposes a hybrid that handles the top-level concern: per-category typed arrays (relations: [...], future functions: [...]) bucket the major categories Shape-B style, while inside each bucket the result is a flat record that carries an object-type discriminator Shape-A style:

{
  "relations": [
    {
      "identifier": ["analytics", "daily_sales_view"],
      "status": "loaded",
      "result": {
        "object-type": "view",
        "view": { /* LoadViewResult */ }
      }
    }
  ]
}

Adding a new top-level category (functions) is safe: generators silently ignore unknown top-level fields. The forward-compat risk only lives on result.object-type — that's where adding materialized-view later could break old codegen if it's a closed enum.
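A hand-written reader for the hybrid shape might look like the sketch below. It assumes the field names from the example above (`relations`, `status`, `result`, `object-type`); the `functions` entry and the `relation_results` helper are hypothetical. A top-level category the reader does not know about is never consulted, so it cannot cause a failure:

```python
import json
from typing import List, Tuple

def relation_results(body: str) -> List[Tuple[List[str], str, str]]:
    """Return (identifier, status, object-type) for each entry in the relations bucket."""
    doc = json.loads(body)
    # Only the categories this client knows are read; unknown top-level keys are skipped.
    return [
        (rel["identifier"], rel["status"], rel["result"]["object-type"])
        for rel in doc.get("relations", [])
    ]

body = json.dumps({
    "relations": [{
        "identifier": ["analytics", "daily_sales_view"],
        "status": "loaded",
        "result": {"object-type": "view", "view": {}},
    }],
    # Hypothetical category added after this client was written.
    "functions": [{"identifier": ["analytics", "fx_rate"]}],
})
```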

Proposal: spec result.object-type (and CatalogObjectType itself) as type: string with documented values, not a closed enum.

ResolveResult:
  type: object
  required: [object-type]
  properties:
    object-type:
      type: string
      description: |
        Object kind. Currently one of: table, view, materialized-view.
        Clients should fall through to a default handler for unrecognized values.
    table:
      $ref: '#/components/schemas/LoadTableResult'
    view:
      $ref: '#/components/schemas/LoadViewResult'

What this buys:

No codegen breakage. datamodel-codegen emits object_type: str, not Literal[...]. Pydantic, Jackson with default config, Go's encoding/json, etc. all accept arbitrary strings.

No silent data loss. Every resolved identifier still has a row with identifier, status, and result. A hand-written client switches on known object-type values and falls through unknown ones to a "skip / report unsupported" branch with full visibility.

Schema still documents the valid set. The description carries the enumeration; humans and IDE tooltips see it. We can validate server-side conformance in our own CI without imposing closed-enum behavior on every generated client.

Filter input keeps the closed enum. CatalogObjectType as enum on request bodies (e.g. the events filter) is fine — the client picks values from its own schema; the server tolerates older filter sets.
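A sketch of the fall-through pattern from "No silent data loss", assuming `object-type` arrives as a plain string. The handler names and the "unsupported" message are hypothetical:

```python
from typing import Callable, Dict, List

def load_table(result: dict) -> str:
    return "loaded table"

def load_view(result: dict) -> str:
    return "loaded view"

# Handlers for the object-type values this client knows today.
HANDLERS: Dict[str, Callable[[dict], str]] = {"table": load_table, "view": load_view}

def handle_relations(relations: List[dict]) -> List[str]:
    outcomes = []
    for rel in relations:
        object_type = rel["result"]["object-type"]  # kept as a str, never coerced to an enum
        handler = HANDLERS.get(object_type)
        if handler is None:
            # Default branch: the row stays visible and is reported as unsupported.
            name = ".".join(rel["identifier"])
            outcomes.append(f"unsupported {object_type}: {name}")
        else:
            outcomes.append(handler(rel["result"]))
    return outcomes
```

A `materialized-view` row reaching this client produces an "unsupported" outcome instead of a parse error or a dropped row.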

Prior art for this pattern:

Iceberg REST already uses it internally. The most-evolved discriminator fields in this spec are declared as plain type: string, not closed enums, with valid values listed via discriminator.mapping:

MetadataUpdate.action (BaseUpdate) — the set has grown over time (add-encryption-key, remove-encryption-key, remove-schemas, remove-partition-specs, enable-row-lineage, set-partition-statistics, remove-partition-statistics, …) without breaking clients with stale schemas.

TableRequirement.type — same pattern.

ViewRequirement.type — same pattern.

Closed enums in this spec are reserved for stable sets (FileFormat, SortDirection, NullOrder, SnapshotRefType) that aren't expected to grow casually. CatalogObjectType clearly belongs in the first group: materialized-view and function are already foreseen.

Updated proposal: drop the shared CatalogObjectType and split it into two narrower closed enums, one per use case.

Resolve response: closed enum [table, view, materialized-view]. The relations universe is small enough to enumerate, and pre-listing materialized-view even before MV is implemented means clients ship with the value already in their schema — no breakage when a server starts returning it. Trade-off is a maintenance commitment: adding a 4th relational type later (e.g. external-table, streaming-view) becomes a coordinated spec change rather than a transparent extension.

Events filter: closed enum on the request body. Client picks values from its own schema; server rejects unknowns with a clear error. No codegen-breakage concern, since the field never carries server-originated values back to old clients.

Events response: no separate object-type discriminator needed. The events spec (#12584) already defines an operation-type enum (create-table, drop-view, etc.) on each event, which implicitly encodes the kind of object the event is about.

Naming: two independent named schemas — RelationType near the resolve endpoint, EventObjectType (or similar) near events. Decoupling is the point: each enum evolves on its own schedule, which is the property the shared CatalogObjectType couldn't deliver.
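The two independent enums could be sketched as separate Python classes. The `RelationType` values come from the proposal above; the `EventObjectType` members are placeholders, since the comment leaves that set to #12584. The helper names `parse_relation_type` and `validate_event_filter` are hypothetical.

```python
from enum import Enum
from typing import List

class RelationType(Enum):
    # Values proposed for the resolve response; materialized-view is pre-listed.
    TABLE = "table"
    VIEW = "view"
    MATERIALIZED_VIEW = "materialized-view"

class EventObjectType(Enum):
    # Placeholder values: the events PR (#12584) owns the real set.
    TABLE = "table"
    VIEW = "view"
    NAMESPACE = "namespace"

def parse_relation_type(value: str) -> RelationType:
    """Client side: a value outside the enum raises ValueError, as generated code would."""
    return RelationType(value)

def validate_event_filter(values: List[str]) -> List[EventObjectType]:
    """Server side: reject unknown filter values with a descriptive error."""
    allowed = sorted(t.value for t in EventObjectType)
    unknown = [v for v in values if v not in allowed]
    if unknown:
        raise ValueError(f"unsupported object types {unknown}; expected one of {allowed}")
    return [EventObjectType(v) for v in values]
```

Because the two classes share nothing, adding a member to one leaves the other untouched, which is the decoupling argued for above. Mapping the filter error to an HTTP status is left out of the sketch.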

CatalogObjectType was introduced as a single shared discriminator for two prospective consumers (the resolve endpoint response and the events endpoint filter), but those use cases have different forward-compat profiles and will define their own narrower closed enums (RelationType near resolve, EventObjectType near events). Removing the shared schema avoids cross-coupling those evolution schedules and lets each endpoint commit only to the values it needs. Also rewords the CatalogObjectIdentifier description to no longer reference CatalogObjectType by name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors the spec change in apache#16144: the shared CatalogObjectType schema is being removed since the resolve and events endpoints will each define their own narrower closed enums in their respective PRs. Also rewords the CatalogObjectIdentifier Javadoc to no longer reference CatalogObjectType by name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

flyrain · 2026-05-12T21:33:53Z

+        companion type discriminator), not by the identifier structure alone.
+      type: array
+      items:
+        type: string


Do we apply any constraints on the table/view/namespace name? For example, that no slash (/) is allowed. Since we didn't specify any constraint on the table identifier, it's not a blocker for this PR; we can work on that as a follow-up. We can also discuss whether we should avoid any constraints in IRC and rely on the implementations (catalogs, engines) to apply their own rules.

We have previously let folks do whatever they want in string fields and left it up to the implementation to decide whether or not that string is invalid.
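Since the spec leaves level strings unconstrained, a restriction such as the no-slash example would live in a catalog implementation. A hypothetical sketch of such an implementation-side check; the forbidden-character set is an assumption, not something the spec or this PR defines:

```python
from typing import List

# Hypothetical catalog-side policy; the REST spec places no constraint on level strings.
FORBIDDEN_CHARS = ("/",)

def check_levels(levels: List[str]) -> None:
    """Raise ValueError if any level contains a character this catalog forbids."""
    for level in levels:
        for ch in FORBIDDEN_CHARS:
            if ch in level:
                raise ValueError(f"level {level!r} contains forbidden character {ch!r}")
```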

github-actions Bot added the OPENAPI label Apr 28, 2026

stevenzwu changed the title from "OpenAPI: Add CatalogObjectIdentifier schema" to "OpenAPI: Add CatalogObjectIdentifier and CatalogObjectType schemas" Apr 28, 2026

stevenzwu mentioned this pull request Apr 29, 2026: API, Core: Add CatalogObjectIdentifier #16160 (open, 3 tasks)

stevenzwu force-pushed the rest-catalog-object-identifier branch from 0b17740 to e6a0323 April 29, 2026 22:33

danielcweeks reviewed Apr 30, 2026

stevenzwu mentioned this pull request May 5, 2026: REST Spec: Events endpoint #12584 (open)

stevenzwu changed the title from "OpenAPI: Add CatalogObjectIdentifier and CatalogObjectType schemas" to "OpenAPI: Add CatalogObjectIdentifier schema" May 11, 2026

danielcweeks approved these changes May 12, 2026

flyrain reviewed May 12, 2026

flyrain approved these changes May 12, 2026

RussellSpitzer approved these changes May 12, 2026

huaxingao approved these changes May 12, 2026

stevenzwu wants to merge 2 commits into apache:main from stevenzwu:rest-catalog-object-identifier

stevenzwu commented Apr 28, 2026 (edited)
danielcweeks Apr 30, 2026
stevenzwu Apr 30, 2026 (edited)
c-thiel May 4, 2026
danielcweeks May 7, 2026
stevenzwu May 7, 2026 (edited)
stevenzwu May 8, 2026
flyrain May 12, 2026
RussellSpitzer May 12, 2026 (edited)

6 participants
