OpenAPI: Add CatalogObjectIdentifier schema#16144
Conversation
Introduces two related REST schemas:
- CatalogObjectIdentifier: a bare array of hierarchical levels that references a catalog object (table, view, or namespace). The object kind is determined by context (e.g. the endpoint or a companion CatalogObjectType discriminator), not by the identifier structure alone.
- CatalogObjectType: an enum of "table", "view", and "namespace" intended to be used as a discriminator alongside CatalogObjectIdentifier.

Also regenerates rest-catalog-open-api.py to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Java reference implementations for the REST schemas introduced in apache#16144.
- api/org.apache.iceberg.catalog.CatalogObjectIdentifier: hand-written POJO mirroring Namespace — static of(String...) factory, null/null-byte validation, levels()/level(i)/length() accessors, dotted toString.
- api/org.apache.iceberg.catalog.CatalogObjectType: enum of TABLE, VIEW, NAMESPACE with lowercase wire strings and a fromName factory, mirroring PlanStatus.
- Registers a bare-array serializer and deserializer for CatalogObjectIdentifier in RESTSerializers, matching the way Namespace is wired.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```yaml
CatalogObjectType:
  type: string
  description: |
    The type of a catalog object.
  enum:
    - table
    - view
    - namespace
```
I think there was an open question about how and where this would actually be used. Introducing the type without context leaves me unsure whether it's in line with how we want to reference types.
For example, you could have the resolve endpoint return:
```
[
  [ identifier, type, metadata ]
  [ identifier, type, metadata ]
  ...
]
```

or:

```
tables: <identifier, metadata>
views: <identifier, metadata>
namespaces: <identifier, metadata>
```

The first approach requires type, but might impact backward compatibility. If we introduce a new type (e.g. function) then clients would break if they don't understand the type. The second approach allows you to extend the response object without modifying the original fields.
I'd like to see how it would be referenced in context.
I updated the design doc for the resolve endpoint and included example request and response payload json
https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0#heading=h.z0wh4486aab5
Here is the usage from the events endpoint spec PR. It is used as an event filter in the request body.
https://github.com/apache/iceberg/pull/12584/changes#r3170935023
In the Events Endpoint having this explicit type filter is valuable I believe which is why we introduced this type there.
Shouldn't clients be tolerant in both cases - for unknown enum variants as well as unknown fields?
I don't think it's as easy as saying the clients should tolerate it. If you add new types, generated parsers will break when they encounter something that wasn't originally enumerated. That's why the type approach feels brittle.
@rdblue might have an opinion here since he originally brought it up in the discussion.
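The breakage is easy to reproduce with a stdlib sketch of what generated parsing code typically does. This is an illustration, not the generated module itself: the class and field names are stand-ins.

```python
from enum import Enum

# Stand-in for a generated parser with a closed enum of known types.
class CatalogObjectType(Enum):
    TABLE = "table"
    VIEW = "view"
    NAMESPACE = "namespace"

def parse_record(record: dict):
    # Generated code typically coerces the wire string straight into the
    # enum; there is no "unknown" branch to fall through to.
    return record["identifier"], CatalogObjectType(record["type"])

# A known type parses fine.
ident, kind = parse_record({"identifier": ["db", "t1"], "type": "table"})

# A record whose type postdates the client's schema aborts the whole parse.
try:
    parse_record({"identifier": ["db", "f"], "type": "function"})
except ValueError as err:
    print(err)  # 'function' is not a valid CatalogObjectType
```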
To clarify Dan's framing, here are the two response shapes he sketched:
Shape A — flat array of typed records, each carrying its own discriminator:
```
[
  { "identifier": [...], "type": "table", "metadata": {...} },
  { "identifier": [...], "type": "view", "metadata": {...} }
]
```

Shape B — separate bucket per kind:

```
{
  "tables": [ { "identifier": [...], "metadata": {...} }, ... ],
  "views": [ ... ],
  "namespaces": [ ... ]
}
```

Dan's critique: Shape A's `type` is a closed enum, so adding a new value (e.g. `function`) breaks generated parsers on old clients.
The Shape B counter has its own forward-compat hole — and a worse one. An old client that doesn't know the materialized_views key never iterates that field, so the identifier just vanishes from the resolved set; the client can't even distinguish "didn't resolve" from "resolved to a kind I don't recognize." A parse error at least tells you something is wrong.
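The silent-drop failure mode is easy to demonstrate. The response below is hypothetical (a `materialized_views` bucket is assumed as the future extension):

```python
# Hypothetical Shape B response from a newer server; "materialized_views"
# is a bucket this client's code predates.
response = {
    "tables": [{"identifier": ["db", "t1"], "metadata": {}}],
    "views": [{"identifier": ["db", "v1"], "metadata": {}}],
    "materialized_views": [{"identifier": ["db", "mv1"], "metadata": {}}],
}

# An old client iterates only the buckets it knows about.
KNOWN_BUCKETS = ("tables", "views", "namespaces")

resolved = [
    entry["identifier"]
    for bucket in KNOWN_BUCKETS
    for entry in response.get(bucket, [])
]

# db.mv1 silently vanishes: no error, and no trace that it ever resolved.
print(resolved)  # [['db', 't1'], ['db', 'v1']]
```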
The design doc actually already proposes a hybrid that handles the top-level concern: per-category typed arrays (relations: [...], future functions: [...]) bucket the major categories Shape-B style, while inside each bucket the result is a flat record that carries an object-type discriminator Shape-A style:
```
{
  "relations": [
    {
      "identifier": ["analytics", "daily_sales_view"],
      "status": "loaded",
      "result": { "object-type": "view", "view": { /* LoadViewResult */ } }
    }
  ]
}
```

Adding a new top-level category (`functions`) is safe: generators silently ignore unknown top-level fields. The forward-compat risk only lives on `result.object-type` — that's where adding `materialized-view` later could break old codegen if it's a closed enum.
Proposal: spec result.object-type (and CatalogObjectType itself) as type: string with documented values, not a closed enum.
```yaml
ResolveResult:
  type: object
  required: [object-type]
  properties:
    object-type:
      type: string
      description: |
        Object kind. Currently one of: table, view, materialized-view.
        Clients should fall through to a default handler for unrecognized values.
    table:
      $ref: '#/components/schemas/LoadTableResult'
    view:
      $ref: '#/components/schemas/LoadViewResult'
```

What this buys:
- No codegen breakage. `datamodel-codegen` emits `object_type: str`, not `Literal[...]`. Pydantic, Jackson with default config, Go's `encoding/json`, etc. all accept arbitrary strings.
- No silent data loss. Every resolved identifier still has a row with `identifier`, `status`, and `result`. A hand-written client switches on known `object-type` values and falls through unknown ones to a "skip / report unsupported" branch with full visibility.
- Schema still documents the valid set. The description carries the enumeration; humans and IDE tooltips see it. We can validate server-side conformance in our own CI without imposing closed-enum behavior on every generated client.
- Filter input keeps the closed enum. `CatalogObjectType` as `enum` on request bodies (e.g. the events filter) is fine — the client picks values from its own schema; the server tolerates older filter sets.
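As a sketch of the "falls through unknown values" client behavior, assuming the hypothetical ResolveResult shape proposed above (entry and field names are illustrative):

```python
# Hand-written client dispatch over an open object-type string.
def handle_result(entry: dict) -> str:
    dotted = ".".join(entry["identifier"])
    object_type = entry["result"]["object-type"]
    if object_type == "table":
        return f"loaded table {dotted}"
    elif object_type == "view":
        return f"loaded view {dotted}"
    else:
        # Default branch: unknown kinds are reported, not crashed on.
        return f"unsupported object-type {object_type!r} for {dotted}"

entry = {
    "identifier": ["analytics", "daily_sales_view"],
    "status": "loaded",
    "result": {"object-type": "view", "view": {}},
}
print(handle_result(entry))  # loaded view analytics.daily_sales_view
```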
Prior art for this pattern:
- Iceberg REST already uses it internally. The most-evolved discriminator fields in this spec are declared as plain `type: string`, not closed enums, with valid values listed via `discriminator.mapping`:
  - `MetadataUpdate.action` (`BaseUpdate`) — the set has grown over time (`add-encryption-key`, `remove-encryption-key`, `remove-schemas`, `remove-partition-specs`, `enable-row-lineage`, `set-partition-statistics`, `remove-partition-statistics`, …) without breaking clients with stale schemas.
  - `TableRequirement.type` — same pattern.
  - `ViewRequirement.type` — same pattern.
- Closed `enum` in this spec is reserved for stable sets (`FileFormat`, `SortDirection`, `NullOrder`, `SnapshotRefType`) that aren't expected to grow casually. `CatalogObjectType` is clearly in the first group — `materialized-view` and `function` are already foreseen.
Updated proposal: drop the shared CatalogObjectType and split it into two narrower closed enums, one per use case.
Resolve response: closed enum [table, view, materialized-view]. The relations universe is small enough to enumerate, and pre-listing materialized-view even before MV is implemented means clients ship with the value already in their schema — no breakage when a server starts returning it. Trade-off is a maintenance commitment: adding a 4th relational type later (e.g. external-table, streaming-view) becomes a coordinated spec change rather than a transparent extension.
Events filter: closed enum on the request body. Client picks values from its own schema; server rejects unknowns with a clear error. No codegen-breakage concern, since the field never carries server-originated values back to old clients.
Events response: no separate object-type discriminator needed. The events spec (#12584) already defines an operation-type enum (create-table, drop-view, etc.) on each event, which implicitly encodes the kind of object the event is about.
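To illustrate how an operation-type string can implicitly encode the object kind, here is a small sketch. The suffix convention and the helper are assumptions for illustration only, not part of the events spec:

```python
# Illustrative only: operation-type values like "create-table" or "drop-view"
# carry the object kind as a suffix, so a separate object-type discriminator
# on the event would be redundant.
KNOWN_KINDS = {"table", "view", "namespace"}

def object_kind(operation_type):
    # e.g. "create-table" -> "table"; None for unrecognized suffixes.
    suffix = operation_type.rsplit("-", 1)[-1]
    return suffix if suffix in KNOWN_KINDS else None

print(object_kind("create-table"))  # table
print(object_kind("drop-view"))     # view
```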
Naming: two independent named schemas — RelationType near the resolve endpoint, EventObjectType (or similar) near events. Decoupling is the point: each enum evolves on its own schedule, which is the property the shared CatalogObjectType couldn't deliver.
CatalogObjectType was introduced as a single shared discriminator for two prospective consumers — the resolve endpoint response and the events endpoint filter — but those use cases have different forward-compat profiles and will define their own narrower closed enums (RelationType near resolve, EventObjectType near events). Removing the shared schema avoids cross-coupling those evolution schedules and lets each endpoint commit only to the values it needs. Also rewords the CatalogObjectIdentifier description to no longer reference CatalogObjectType by name. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the spec change in apache#16144: the shared CatalogObjectType schema is being removed since the resolve and events endpoints will each define their own narrower closed enums in their respective PRs. Also rewords the CatalogObjectIdentifier Javadoc to no longer reference CatalogObjectType by name. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```yaml
      companion type discriminator), not by the identifier structure alone.
    type: array
    items:
      type: string
```
Do we apply any constraints on the table/view/namespace name? For example, disallowing the slash (/). Given we didn't specify any constraint on the table identifier, it's not a blocker for this PR; we can work on that as a follow-up. We can also discuss whether we could avoid any constraints in the IRC spec and rely on the implementations (catalogs, engines) to make their own choices.
We have previously let folks do whatever they want in string fields and left it up to the implementation to decide whether or not that string is invalid.
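For concreteness, an implementation-side check might look like the following. This is hypothetical (the spec imposes no constraints); it mirrors the null/null-byte validation described for the Java CatalogObjectIdentifier POJO in this PR:

```python
# Hypothetical catalog-side validation; the spec itself leaves name
# constraints to implementations.
def validate_levels(levels):
    for level in levels:
        if level is None:
            raise ValueError("identifier level cannot be null")
        if "\x00" in level:
            raise ValueError("identifier level cannot contain the null byte")
    return levels

print(validate_levels(["accounting", "tax", "paid"]))
```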
Summary
Adds the `CatalogObjectIdentifier` schema to the REST catalog spec — an ordered list of hierarchical levels (`["accounting", "tax", "paid"]`) that works uniformly for tables, views, materialized views, and namespaces. The kind of object an identifier refers to is determined by context (the endpoint, or a companion type discriminator defined by that endpoint), not by the identifier structure itself.

Structurally the same as `Namespace` (a bare array of strings); the distinct name signals "any catalog object" rather than specifically a namespace path.

Motivation
Multiple concurrent efforts need a generic catalog-object identifier and would otherwise each introduce their own:
Introducing one shared schema avoids identifier proliferation as new object types (functions, materialized views) are added to the spec.
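The shared identifier's wire shape can be sketched with the standard json module. The dotted display form below follows the toString behavior described for the hand-written Java POJO in this PR:

```python
import json

# The identifier is a bare JSON array of levels on the wire (no wrapper
# object), matching how Namespace is serialized per the RESTSerializers
# note above.
levels = ["accounting", "tax", "paid"]
wire = json.dumps(levels)
print(wire)              # ["accounting", "tax", "paid"]
assert json.loads(wire) == levels

# Dotted display form, as in the POJO's toString.
print(".".join(levels))  # accounting.tax.paid
```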
Scope
Intentionally minimal:
- Adds `CatalogObjectIdentifier` alongside the existing `TableIdentifier` and `Namespace`.
- `TableIdentifier` and `Namespace` are preserved — no breaking changes.
- `CatalogObjectType` was dropped during review; the resolve and events endpoints will each define their own narrower closed enums in their respective PRs, since each has a different forward-compat profile.

Test plan
- `make -C open-api lint` passes (openapi-spec-validator + yamllint --strict)
- `make -C open-api generate` regenerates `rest-catalog-open-api.py` cleanly
- `python3 -m py_compile open-api/rest-catalog-open-api.py` succeeds

🤖 Generated with Claude Code