Skip to content

[History Sever] Use compression library in collector and historyserver#4899

Open
chiayi wants to merge 2 commits into
ray-project:masterfrom
chiayi:history-server-compress
Open

[History Sever] Use compression library in collector and historyserver#4899
chiayi wants to merge 2 commits into
ray-project:masterfrom
chiayi:history-server-compress

Conversation

@chiayi

@chiayi chiayi commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Why are these changes needed?

This PR is for utilizing the compression library from #4828 in the collector and historyserver components

Related issue number

#4804

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@chiayi chiayi force-pushed the history-server-compress branch from 6eade58 to cf8f7a8 Compare June 2, 2026 19:41
@chiayi chiayi changed the title [WIP][History Sever] Use compression library in collector and historyserver [History Sever] Use compression library in collector and historyserver Jun 2, 2026
@chiayi chiayi marked this pull request as ready for review June 2, 2026 19:42
Comment thread historyserver/pkg/collector/eventcollector/eventcollector.go

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf8f7a8aca

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

sessionPath := path.Clean(path.Join(ec.root, utils.AppendRayClusterNameNamespace(ec.clusterName, ec.clusterNamespace), sessionNameToUse))

basePath := path.Join(sessionPath, "node_events", fmt.Sprintf("%s-%s", nodeIDToUse, hourKey))
basePath := path.Join(sessionPath, "node_events", fmt.Sprintf("%s-%s.gz", nodeIDToUse, hourKey))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve node_ip resolution for gzipped node events

When the collector starts writing node_events with a .gz suffix here, the dead-cluster log path that accepts node_ip can no longer resolve the node ID: ServerHandler.ipToNodeId still lists these files and searchNodeIDHexInEventFile reads them via GetContent and passes the compressed bytes directly to json.Unmarshal (historyserver/pkg/historyserver/reader.go:522-537). In environments where users call /logs with node_ip instead of node_id after collection, this will skip every newly written node event file and return node_id not found; that reader needs the same gzip-aware path as ProcessSingleSession.

Useful? React with 👍 / 👎.

@chiayi chiayi force-pushed the history-server-compress branch from cf8f7a8 to e29adaa Compare June 3, 2026 17:49

@andrewsykim andrewsykim left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rueian for second pass

Comment thread historyserver/pkg/collector/eventcollector/eventcollector.go Outdated
Comment thread historyserver/pkg/collector/eventcollector/eventcollector.go Outdated
Comment on lines +1416 to +1420
} else {
// GetContent and io.ReadAll failures are treated as transient storage errors:
// skip this file and continue.
eventioReader = h.reader.GetContent(clusterNameNamespace, eventFile)
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this branch, given that we changed eventFilePattern to match only the .gz suffix?

@chiayi chiayi Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally kept this branch for backwards compatibility, but I'm guessing we might not have to worry about that given the where the project is at currently, will instead just add a warning that file is not supported.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, PTAL!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, will change it so that there is a path to support uncompressed event files as well.

@chiayi chiayi force-pushed the history-server-compress branch from e29adaa to 81e61f9 Compare June 8, 2026 21:08

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 81e61f9. Configure here.

Comment thread historyserver/pkg/eventserver/eventserver_test.go
Comment thread historyserver/pkg/eventserver/eventserver.go Outdated
@chiayi chiayi force-pushed the history-server-compress branch 2 times, most recently from f4e4062 to 5660bc5 Compare June 10, 2026 00:22
@chiayi chiayi force-pushed the history-server-compress branch from 5660bc5 to 895293c Compare June 11, 2026 00:36
@chiayi chiayi requested a review from rueian June 11, 2026 00:37
@rueian

rueian commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

Test History Server E2E keeps failing. Is it flaky?

@chiayi

chiayi commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@rueian Looked into the test failures. One of them is due to the test not being wrapped in the compression library reader. Specifically the collector_test.go and azureblob_collector_test.go and I have made the fixes.
The second one seems to be due to resource issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants