REST Spec: Events endpoint by c-thiel · Pull Request #12584 · apache/iceberg

c-thiel · 2025-03-20T09:32:57Z

Proposal: https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?usp=sharing

c-thiel · 2025-04-16T12:49:47Z

There seems to be a bug in the updated datamodel-code-generator==0.28.5 that is used on main, which is why the pipeline fails.
datamodel-code-generator==0.28.4 generates the discriminator correctly as discriminator='reference-type' (with -) while the updated version on main uses _ which I believe is wrong.

github-actions · 2025-07-01T00:20:51Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

rdblue · 2025-07-10T20:11:22Z

+            List of catalog objects (namespaces, tables, views) to get events for.
+            If not provided, events for all objects will be returned subject to other filters.
+            For specified namespaces, events for the namespaces and all containing objects
+            (namespaces, tables, views) will be returned.


Minor: is "will" correct if this is defining what services should or must do? I would expect this to be "must be returned" if we want to specify the service behavior.

In addition, is the requirement to return changes for child objects recursive or for one layer only?

Changed to "must". Added hint for recursive.

The main motivation for this being recursive is that a namespace on its own has almost no value, as it only has the update-namespace-properties event attached to it. What most folks are probably interested in is everything that happened in a namespace.
Thinking about it again, though, it's not nice that some filters behave differently than others. Happy to discuss this again.

After discussing this we ended up keeping it recursive for the following reasons:

Otherwise not possible to sub-select an area of the Catalog without previously crawling it

Otherwise not possible to efficiently listen to Create Events, as specific identifiers are not yet known

Consistent with the behaviour on Warehouse level which is recursive.
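
The recursive behaviour described above can be sketched as a prefix match on identifier paths. This is only an illustration: the helper name `in_namespace` and the representation of event targets as tuples of path parts are assumptions, not part of the spec.

```python
from typing import Sequence


def in_namespace(filter_ns: Sequence[str], object_path: Sequence[str]) -> bool:
    """Return True if object_path is the namespace itself or anything nested below it."""
    prefix = tuple(object_path[: len(filter_ns)])
    return len(object_path) >= len(filter_ns) and prefix == tuple(filter_ns)


# Hypothetical event targets, written as full identifier paths.
targets = [
    ("accounting",),               # the filtered namespace itself
    ("accounting", "tax"),         # a child namespace
    ("accounting", "tax", "paid"), # a table inside the child namespace
    ("marketing", "campaigns"),    # unrelated object
]

# A filter on the "accounting" namespace matches every level below it.
matched = [path for path in targets if in_namespace(("accounting",), path)]
```

With this shape a client can subscribe to an area of the catalog without crawling it first, and it also sees create events for objects whose identifiers did not exist when the filter was written.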

rdblue · 2025-07-10T20:15:17Z

+            If not provided, all types are returned.
+        catalog-objects:
+          type: array
+          discriminator:


This is probably my lack of experience with Open API, but what is this doing? Is the array of a single type or multiple types?

My expectation is that items would be a type that has a discriminator so you could request changes for a view and a table -- for instance to see changes to a view that references a table. But with the discriminator nested in the array I'm wondering if this is a single type for all items. And if so, what does that look like in JSON given that the array itself must contain objects and can't have a reference-type property.

Looks like reference-type is included in each type definition so I'm guessing this is behaving as I would expect. It just seems like a confusing way to embed this since the array can't have a property. Refactoring this may make it more clear.

Your understanding of the behavior is correct. I tested it with two code generators and both worked OK. Nonetheless it is a quite exotic construct. I used it to avoid introducing another type and save some code.

As clarity is more important, I added a new type CatalogObjectReference now.
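
To make the wire format concrete, here is a minimal sketch of a mixed array in which every item carries its own reference-type, dispatched on the client side. Only the reference-type property name comes from the thread; the discriminator values (`namespace`, `table`, `view`), the other property names, and the class names are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class NamespaceRef:
    namespace: List[str]


@dataclass
class TableRef:
    namespace: List[str]
    name: str


@dataclass
class ViewRef:
    namespace: List[str]
    name: str


CatalogObjectRef = Union[NamespaceRef, TableRef, ViewRef]


def parse_reference(item: dict) -> CatalogObjectRef:
    """Dispatch on the per-item discriminator property."""
    kind = item["reference-type"]
    if kind == "namespace":
        return NamespaceRef(namespace=item["namespace"])
    if kind == "table":
        return TableRef(namespace=item["namespace"], name=item["name"])
    if kind == "view":
        return ViewRef(namespace=item["namespace"], name=item["name"])
    raise ValueError(f"unknown reference-type: {kind!r}")


# The array has no properties of its own; each element is self-describing.
payload = json.loads("""
[
  {"reference-type": "table", "namespace": ["accounting"], "name": "paid"},
  {"reference-type": "view", "namespace": ["accounting"], "name": "paid_summary"}
]
""")
refs = [parse_reference(item) for item in payload]
```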

rdblue · 2025-07-10T20:17:31Z

+            If not provided, events for all objects will be returned subject to other filters.
+            For specified namespaces, events for the namespaces and all containing objects
+            (namespaces, tables, views) will be returned.
+        custom-filters:


Why have a sub-object of custom filters rather than setting additionalProperties: true on the request object?

I think this also allows passing any JSON structure. Do we want to limit that as we do elsewhere?

additionalProperties: type: string

I prefer to separate custom filters to avoid potential collisions with filters that we standardize later.

I added type: string

Do we already have custom filter use cases in mind?

A filter on actor for example
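
A small sketch of why the separate sub-object helps: implementation-specific keys never share a level with standardized ones, so a filter standardized later cannot collide with them. The `build_request` helper and the contents of `STANDARD_FILTERS` are assumptions for illustration, not the full list from the spec.

```python
import json

# Illustrative subset of standardized top-level filters (assumed, not exhaustive).
STANDARD_FILTERS = {"catalog-objects", "object-types"}


def build_request(standard: dict, custom: dict) -> dict:
    """Keep standardized filters at the top level and nest the rest under custom-filters."""
    unknown = set(standard) - STANDARD_FILTERS
    if unknown:
        raise ValueError(f"not a standard filter: {sorted(unknown)}")
    body = dict(standard)
    if custom:
        body["custom-filters"] = dict(custom)
    return body


# "actor" is the custom filter example mentioned in the thread; the value is made up.
body = build_request({"object-types": ["table"]}, {"actor": "alice"})
print(json.dumps(body, sort_keys=True))
```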

rdblue · 2025-07-10T20:22:50Z

+            Implementation-specific filter extensions. Implementations may define custom filter 
+            properties beyond the standard ones defined in this specification.
+
+    NamespaceReference:


Are these references necessary? Namespaces, views, and tables share a common namespace so any identifier can identify only one. Could this be simpler by using a single more generic identifier like ObjectIdentifier?

I'm not sure what the value of the separation between TableIdentifier and Namespace is, so I want to avoid unnecessary complexity based on it.

I thought about this too, but couldn't find another good way that:

Differentiates between Views, Tables and Namespaces, so that individual objects can be targeted. TableIdentifier requires name, so we can't use it to express "Give me everything in Namespace X"

Keeps uuids. I prefer to work with UUIDs instead of names. Keeping UUIDs here was also a wish from the community.

If you have another idea how we can make this more compact, I am happy to change!
Using inline schemas led to almost the same amount of code, but was much less readable.

rdblue · 2025-07-10T20:26:14Z

+      properties:
+        next-page-token:
+          $ref: "#/components/schemas/PageToken"
+        highest-processed-timestamp-ms:


Thinking through how catalogs will evolve to support transactions, I think it is likely that some of them will have a catalog-level sequence number for changes. This schema doesn't prevent us from either allowing that in a response or adding it in the future, right? I don't think so but I want to raise it for everyone to think about.

It does not prevent us from adding it in the future.
Returning it now would be a minor violation as we don't set additionalProperties, although clients should be prepared to discard unknown fields anyway.

We have the request-id also as part of the Event itself, which could be filled with the transaction id of a catalog if the catalog supports this.

Did I misunderstand the purpose of next-page-token? Is it only for pagination of one query checkpoint?

Is the timestamp meant for the query checkpoint? Some catalogs may implement a global catalog sequence number to order all catalog changes, which would be a great fit for the query checkpoint for continuation.

If we use a timestamp as the continuation point, it would require the server to implement a monotonically increasing clock for this to work correctly.

Only the next-page-token can and should be used for a query checkpoint and it is only valid for the same filter combination.

We had some discussion about using a global sequence number instead, but concluded that not all catalogs offer one, so we cannot rely on it for pagination. highest-processed-timestamp-ms is only informational, so that clients get an idea of which events have been included in the current batch.

If we introduce the concept of a global sequence number elsewhere (presumably optional?), we should introduce it as a new field in the request and response objects of the Events endpoint as well.

Only the next-page-token can and should be used for a query checkpoint

thanks for confirming that.

If we introduce the concept of a global sequence number elsewhere (presumably optional?), we should introduce it as a new field in the request and response objects of the Events endpoint as well.

If catalog has a global sequence number, the PageToken can be set to the sequence number for checkpoint/continuation point.

does this require timestamp to be monotonic? if not, what's the value of this field? We already have the continuation-token for the next query resume point.
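
As a rough illustration of the two points made in this thread (a catalog with a global sequence number can carry it inside the token, and a token is only valid for the same filter combination), a server could build its token along these lines. The token layout, the helper names, and the fingerprinting scheme are all assumptions; the spec only requires that clients treat the token as opaque.

```python
import base64
import hashlib
import json


def filter_fingerprint(filters: dict) -> str:
    """Stable short hash of the filter combination the token was issued for."""
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def encode_token(sequence_number: int, filters: dict) -> str:
    """Pack the server-side resume point into an opaque string."""
    state = {"seq": sequence_number, "filters": filter_fingerprint(filters)}
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: str, filters: dict) -> int:
    """Recover the resume point, rejecting tokens issued for other filters."""
    state = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    if state["filters"] != filter_fingerprint(filters):
        raise ValueError("token was issued for a different filter combination")
    return state["seq"]


token = encode_token(42, {"object-types": ["table"]})
resume_at = decode_token(token, {"object-types": ["table"]})
```

A catalog without a global sequence number could store any other resume state in the same envelope, which is one reason to keep the token opaque to clients.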

c-thiel · 2025-12-06T15:36:47Z

Summary of changes since early December 2025. The changes are motivated by discussions in this thread or by learnings from the Rust request and response object implementations in apache/iceberg-rust#1907:

Allow JSON values for custom filters (change additionalProperties: type: string to additionalProperties: true) in 7f17a42
Make Actor an object instead of a string in 3eeb95e. In most real-world scenarios an Actor will have more than one field. The new design is more extensible: we don't have to squeeze a JSON object into a string type, but can instead pass an extensible object as the actor.
Add missing updates field to CreateViewOperation
Rename EventsResponse to QueryEventsResponse
Rename CatalogObject to CatalogObjectIdentifier
Rename event-count in Event to request-event-count to be in line with request-id and to hint more clearly that this counts the events in a request, not the total events in the response
Rename page-token to continuation-token and make it required. For the reasoning, see the discussion in 178ed3e
Add missing type: string to the OperationType anyOf enum field in c73e200
Fix duplicate namespace field in CreateNamespaceOperation (via allOf and explicit) in 2f9b4db

Ping @stevenzwu @aheev

aheev · 2025-12-06T16:22:04Z

Thanks a bunch 🙌 I was waiting for this. I will go ahead and apply in the other PR

aheev · 2025-12-08T11:08:22Z

Make Actor an Object instead of a string in 3eeb95e. In most real-world scenarios an Actor will have more than one field, and even if not, the Catalog can simply return {"id": "..."}, which IMO would still be clearer and more extensible than a plain string.

@c-thiel do you want to allow JSON values for actor as well?

c-thiel · 2025-12-09T08:10:17Z

Yes - I don't think we should restrict this further.

stevenzwu · 2025-12-09T15:23:03Z

+        and basic audit capabilities.
+
+        The server encodes all necessary state in the token to ensure
+        consistent filtering across pages. Clients should use the returned `continuation-token` for


The server encodes all necessary state in the token to ensure consistent filtering across pages

Maybe further remove the consistent filtering part, e.g. something like:

The server encodes all necessary state within the continuation-token. The client should treat this token as a required opaque value and pass it unchanged in subsequent requests to resume the changelog consumption.
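
From the client's side, the proposed wording amounts to a loop like the following sketch. `fake_query_events` is a hypothetical in-memory stand-in for the endpoint (no HTTP involved), and `consume` is an illustrative helper; only the continuation-token and events field names are taken from the thread.

```python
from typing import Callable, List, Optional, Tuple


def fake_query_events(body: dict) -> dict:
    """Hypothetical stand-in for the events endpoint: pages through a fixed log."""
    log = [{"event-id": f"e{i}"} for i in range(5)]
    start = int(body.get("continuation-token") or 0)  # server-internal meaning
    page = log[start : start + 2]
    return {"events": page, "continuation-token": str(start + len(page))}


def consume(
    query: Callable[[dict], dict], filters: dict, max_requests: int
) -> Tuple[List[dict], Optional[str]]:
    """Poll with the same filters, echoing the token back unchanged each time."""
    token: Optional[str] = None
    received: List[dict] = []
    for _ in range(max_requests):
        body = dict(filters)
        if token is not None:
            body["continuation-token"] = token
        response = query(body)
        received.extend(response["events"])
        token = response["continuation-token"]  # never parsed by the client
        if not response["events"]:
            break  # caught up; a real consumer would persist the token and poll later
    return received, token


events, last_token = consume(fake_query_events, {"object-types": ["table"]}, max_requests=10)
```

The client never inspects the token, so the server remains free to change what it encodes.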

Updates the Observability REST API proposal to align with the emerging Iceberg Events API specification (apache/iceberg#12584) per review feedback.

Key changes:
- Changed Events endpoint from GET to POST with request body
- Moved Events API to Iceberg REST Catalog path (/api/catalog/v1/{prefix}/events)
- Adopted Iceberg event structure (event-id, request-id, timestamp-ms, operation)
- Added standard operation types (create-table, update-table, drop-table, etc.)
- Added Polaris custom operation types with x-polaris-* prefix convention
- Updated OpenAPI schemas for Iceberg compatibility
- Added Section 8 documenting Iceberg alignment rationale
- Added mapping table from Polaris internal events to Iceberg operations

References:
- Iceberg Events API PR: apache/iceberg#12584
- Iceberg Events API Doc: https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8

stevenzwu · 2026-04-01T18:53:33Z

+          $ref: '#/components/schemas/CustomOperationType'
+        # Common optional properties
+        identifier:
+          $ref: "#/components/schemas/TableIdentifier"


this is where a generic CatalogObjectIdentifier is probably a better fit

stevenzwu · 2026-04-01T18:56:28Z

+        - $ref: '#/components/schemas/CatalogObjectIdentifier'
+      example: [ "accounting", "tax" ]
+
+    CatalogObjectIdentifier:


My proposal differs in the structure: https://docs.google.com/document/d/1NTQhgNbP2dkIMuXUMA5JdwliVQKCp1TU_ux5J_AaPiw/edit?tab=t.0#heading=h.lrmay9r6i8ai

I suggest we maintain the same structure as TableIdentifier, which would also make the migration easier.

stevenzwu · 2026-04-14T21:15:38Z

+            If not provided, events for all objects must be returned subject to other filters.
+            For specified namespaces, events for the namespaces and all containing objects
+            (namespaces, tables, views) must be returned (recursively).
+        catalog-objects-by-id:


id is a bit vague in this context. if we want uuid here, let's name this as catalog-objects-by-uuid

stevenzwu · 2026-04-14T21:25:30Z

+        catalog-objects-by-id:
+          type: array
+          items:
+            $ref: "#/components/schemas/CatalogObjectUuid"


During implementation, I found this struct a bit unnecessary. It contains two fields: (1) uuid (2) object types. I am wondering if this can be simplified as just a list of UUID strings. There is already a separate filter dimension of object-types. Are we concerned about the UUID collisions across object types (like table and function)?

This is also inconsistent with catalog-objects-by-name. From the dev thread, we are already aligned that different object types (like table and function) can have the same name/identifier. If we are concerned about UUID conflicts across object types, do we also need a per-item object type for the catalog-objects-by-name array, since name collisions are much more likely to happen?

It is probably simpler to just keep those filter dimensions totally independent.
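
Keeping the dimensions independent would mean each filter narrows the result on its own and the final result is their intersection. A minimal sketch, assuming a plain list of UUID strings and hypothetical `uuid` and `object-type` fields on each event:

```python
from typing import List, Optional, Set


def matches(event: dict, uuids: Optional[Set[str]], object_types: Optional[Set[str]]) -> bool:
    """Each filter dimension is checked on its own; an absent filter matches everything."""
    if uuids is not None and event["uuid"] not in uuids:
        return False
    if object_types is not None and event["object-type"] not in object_types:
        return False
    return True


# Hypothetical events; two different object types deliberately share one UUID.
sample: List[dict] = [
    {"event-id": "e1", "uuid": "u-1", "object-type": "table"},
    {"event-id": "e2", "uuid": "u-1", "object-type": "function"},
    {"event-id": "e3", "uuid": "u-2", "object-type": "table"},
]

by_uuid = [e["event-id"] for e in sample if matches(e, {"u-1"}, None)]
by_both = [e["event-id"] for e in sample if matches(e, {"u-1"}, {"table"})]
```

If a UUID did collide across object types, a client could still disambiguate by combining the UUID filter with the object-types filter, without a per-item type in the UUID list.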

stevenzwu · 2026-04-14T21:40:07Z

+            If not provided, the server may choose a default page size.
+            Servers may return less results than requested for various reasons, such as
+            server side limits, payload size or processing time.
+        after-timestamp-ms:


Is this a client timestamp? If yes, I assume it is for rough time-range queries? Then do we need a before-timestamp-ms field too, to define the time range?

I assume this is not the event timestamp (highest from the last response), as we already have the continuation-token for the query resume point.

I am wondering if we should add this filter right now. Do we have specific use cases in mind?

stevenzwu · 2026-04-14T21:45:01Z

+      properties:
+        next-page-token:
+          $ref: "#/components/schemas/PageToken"
+        highest-processed-timestamp-ms:


does this require timestamp to be monotonic? if not, what's the value of this field? We already have the continuation-token for the next query resume point.

stevenzwu · 2026-04-14T21:45:31Z

+        - highest-processed-timestamp-ms
+        - events
+      properties:
+        next-page-token:


should this response field be called continuation-token as well?

stevenzwu · 2026-04-14T22:59:43Z

+
+        Consumers should be prepared to handle 410 Gone responses when requested sequences are
+        outside the server's retention window. Consumers should also de-duplicate received events based
+        on the event's `event-id`. Consistency guarantees vary between server implementations.


nit: remove event's
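
A consumer-side sketch of the de-duplication mentioned in the description, keyed on event-id. The `Deduplicator` class and its bounded-memory eviction policy are assumptions for illustration; the spec text only asks consumers to de-duplicate.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers the most recently seen event-ids, up to a fixed capacity."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._capacity = capacity

    def is_new(self, event: dict) -> bool:
        event_id = event["event-id"]
        if event_id in self._seen:
            self._seen.move_to_end(event_id)  # refresh recency of a repeated id
            return False
        self._seen[event_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the least recently seen id
        return True


dedup = Deduplicator(capacity=2)
batch = [{"event-id": "a"}, {"event-id": "a"}, {"event-id": "b"}]
unique = [e["event-id"] for e in batch if dedup.is_new(e)]
```

The capacity bounds memory use, at the cost that an id evicted from the window would be accepted again if it reappeared much later.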

flyrain · 2026-04-29T01:18:58Z

+      example: [ "accounting", "tax" ]
+
+    CatalogObjectIdentifier:
+      describe: Reference to a named object in the catalog, such as namespace, table, or view.


FYI, it would be a UDF.

stevenzwu · 2026-04-30T21:25:14Z

+            $ref: "#/components/schemas/CatalogObjectUuid"
+          description: >
+            Filter events by the list of catalog objects referenced by UUID (tables, views).
+        object-types:


just a note for the reference link

Follow up on this discussion #16144 (comment)

@c-thiel object identifier is not unique across whole catalog. e.g. a table and a function can have the same identifier. Currently, it seems that the operation-type (like create-table) covers the object type along with the operation type. Maybe they can be decoupled: object-type would be table and operation-type would be create?

Never mind. Splitting into object and operation types would complicate the discriminator field. OpenAPI only allows a single string for the discriminator, so we can't natively discriminate on a tuple, and nested discrimination seems more complex.

But we can still add the object-type: table (etc.) as a const field on each subtype. Wire-level symmetry with the request filter is achieved, and CustomOperation becomes self-describing because it carries an explicit object-type. Cost: the value is technically derivable from operation-type for the standard ops, so it's "redundant" — but for custom ops it's load-bearing, and for clients it's a stable field they don't have to parse.
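
The trade-off can be sketched from a client's point of view: for standard operations the object type is derivable from operation-type, while a custom operation is only classifiable if it carries the field explicitly. The mapping contents, the helper name, and the custom operation name below are assumptions for illustration.

```python
from typing import Optional

# Assumed mapping for a few standard operations; the authoritative list is the spec.
STANDARD_OBJECT_TYPES = {
    "create-table": "table",
    "update-table": "table",
    "drop-table": "table",
}


def object_type_of(operation: dict) -> Optional[str]:
    """Prefer the explicit field; fall back to deriving it for standard operations."""
    explicit = operation.get("object-type")
    if explicit is not None:
        return explicit
    return STANDARD_OBJECT_TYPES.get(operation["operation-type"])


standard_op = {"operation-type": "drop-table"}
custom_with_type = {"operation-type": "x-example-refresh", "object-type": "function"}
custom_without_type = {"operation-type": "x-example-refresh"}

derived = object_type_of(standard_op)
explicit = object_type_of(custom_with_type)
unknown = object_type_of(custom_without_type)
```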

c-thiel marked this pull request as draft March 20, 2025 09:33

github-actions Bot added the OPENAPI label Mar 20, 2025

nastra reviewed Mar 26, 2025

Comment thread open-api/rest-catalog-open-api.yaml Outdated

c-thiel added 4 commits May 7, 2025 11:06

Events endpoint

5fa2b5a

Address comments

9f9898c

revert format changes

f1e25fe

rename transaction to request-id, Custom Operations, Actors

4d67051

c-thiel force-pushed the ct/irc-events-endpoint branch from c972354 to 4d67051 Compare May 7, 2025 09:07

c-thiel added 3 commits May 27, 2025 09:48

remove "assumed-by" recursion in favor of "actor-chain"

1c36850

fix indentation

5cfce4a

fix remove copy

86a0f5a

nastra reviewed May 27, 2025

Comment thread open-api/rest-catalog-open-api.yaml Outdated

snazy reviewed May 27, 2025

Comment thread open-api/rest-catalog-open-api.yaml Outdated

Comment thread open-api/rest-catalog-open-api.yaml

Comment thread open-api/rest-catalog-open-api.yaml Outdated

adnanhemani reviewed May 27, 2025

Comment thread open-api/rest-catalog-open-api.py Outdated

address comments

3de9a7c

c-thiel force-pushed the ct/irc-events-endpoint branch from 14b249e to 3de9a7c Compare May 28, 2025 17:21

github-actions Bot added the stale label Jul 1, 2025

Fokko added not-stale and removed stale labels Jul 1, 2025

Address comments

c54125d

wgtmac reviewed Jul 2, 2025

Comment thread open-api/rest-catalog-open-api.yaml Outdated

fix: remove obsolete metadata from required

0e8ff9b

rdblue reviewed Jul 10, 2025

Comment thread open-api/rest-catalog-open-api.yaml Outdated

rdblue reviewed Jul 10, 2025

gaborkaszab reviewed Dec 4, 2025

Comment thread open-api/rest-catalog-open-api.yaml Outdated

c-thiel added 9 commits December 4, 2025 13:14

address comments

df4a4fd

page-token -> continuation-token

178ed3e

Merge branch 'main' into ct/irc-events-endpoint

72318fd

make linter happier

492d555

fix: OperationType enum type

c73e200

make linter happy

c9fa451

fix: QueryEventsRequest allow JSON values for custom filters

7f17a42

fix: request-event-count ename, fix duplicate namespace in CreateName…

2f9b4db

…spaceResponse

make actor type object

3eeb95e

c-thiel mentioned this pull request Dec 6, 2025

feat: Event API Types apache/iceberg-rust#1907

Closed

address comments

58f1dfc

stevenzwu reviewed Dec 9, 2025

Jayesh45-master mentioned this pull request Dec 26, 2025

Implement in-memory test harness for IRC Events REST endpoint #14929

Closed

adnanhemani mentioned this pull request Mar 3, 2026

(Proposal) REST API for Events apache/polaris#3924

Closed

6 tasks

stevenzwu reviewed Apr 1, 2026

stevenzwu reviewed Apr 14, 2026

stevenzwu changed the title ~~Proposal: IRC Events endpoint~~ REST Spec: Events endpoint Apr 14, 2026

stevenzwu reviewed Apr 14, 2026

stevenzwu mentioned this pull request Apr 15, 2026

API, Core: Implement events endpoint #15990

Draft

stevenzwu mentioned this pull request Apr 28, 2026

OpenAPI: Add CatalogObjectIdentifier schema #16144

Open

3 tasks

flyrain reviewed Apr 29, 2026

stevenzwu reviewed Apr 30, 2026

visit2rahul mentioned this pull request May 12, 2026

Core: Implement IRC Events endpoint request and response objects #16296

Open

7 tasks
