docs(benchmarking): Record GLM 5.2 corpus run by dcramer · Pull Request #423 · getsentry/warden

dcramer · 2026-07-02T22:14:29Z

Record the OpenRouter GLM 5.2 high-effort Sentry corpus benchmark result in the docs benchmark data and add the matching readout. The row captures the validated high-effort run after the failed shards were repaired, including the lower recall and the run cleanup needed to make the artifacts comparable.

Benchmark Result

Add the traced GLM 5.2 result JSON with 15 of 86 known corpus findings found, 18 emitted findings, and the validated token, cost, and timing summaries.
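For reference, the recall implied by those counts can be worked out directly. This is a minimal sketch using only the numbers quoted above; the `recall` helper and the constant names are hypothetical and are not taken from the warden benchmark code or the result JSON schema.

```python
def recall(found: int, known: int) -> float:
    """Fraction of known corpus findings that the run found."""
    if known <= 0:
        raise ValueError("known must be positive")
    return found / known


# Counts quoted in this PR description (names are illustrative only).
KNOWN_FINDINGS = 86
FOUND_FINDINGS = 15

glm_recall = recall(FOUND_FINDINGS, KNOWN_FINDINGS)
print(f"recall: {glm_recall:.1%}")  # recall: 17.4%
```

The 18 emitted findings are not used here, because the description does not say how emitted findings map onto the 15 matched corpus findings.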

Run Notes

Document the GLM 5.2 no-finding parser issue, the combined-clean shard artifacts, and the seer_rpc.py lower-parallelism repair so the result is interpreted as benchmark data with an operational caveat.

Add the OpenRouter GLM 5.2 high Sentry corpus benchmark result and the matching docs readout. Note the no-finding JSON parser issue and the repaired shard handling so reviewers can distinguish benchmark performance from run cleanup.

Co-Authored-By: GPT-5 Codex <noreply@anthropic.com>

dcramer force-pushed the docs/glm-52-benchmark branch from 7fa7449 to 2a6d84d Compare July 2, 2026 22:17

vercel Bot deployed to Preview July 2, 2026 22:18 View deployment

dcramer marked this pull request as ready for review July 2, 2026 22:19

dcramer merged commit c16f264 into main Jul 2, 2026
22 checks passed

dcramer deleted the docs/glm-52-benchmark branch July 2, 2026 22:20

dcramer merged 1 commit into main from docs/glm-52-benchmark
