Fix deploy pipeline: retry artifact downloads, strict 4/4 filtering#66
Conversation
- deploy-dashboard.yml: Add 3-attempt retry with backoff on gh run download to handle transient GitHub API 502 errors during artifact pagination - csv-to-dashboard-json.cjs: Remove placeholder JSON generation for missing benchmarks — teams with incomplete traces are now excluded from rankings instead of getting artificially low scores - update-history.cjs: Include per-benchmark scores and metrics in history snapshots for per-trace trend analysis - .gitignore: Ignore artifacts/ and fuzz-perf/ directories Co-authored-by: Cursor <cursoragent@cursor.com>
📝 WalkthroughWalkthroughThe PR hardens the dashboard deployment pipeline with retry logic for artifact downloads, validates benchmark completeness across teams, and enriches history snapshots with per-benchmark score data. Incomplete teams are logged and excluded from rankings rather than filled with placeholders. ChangesDashboard Build and Artifact Processing
Possibly related PRs
Poem
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/csv-to-dashboard-json.cjs (1)
69-90: ⚡ Quick winConsider removing unused placeholder function.
The
makePlaceholderJsonfunction appears to be unused after removing the placeholder generation behavior. Removing it would improve code clarity.♻️ Proposed cleanup
-function makePlaceholderJson(targetName) { - return { - info: { - name: targetName, - app_version: { major: 0, minor: 0, patch: 0 }, - jam_version: { major: 0, minor: 7, patch: 2 }, - }, - stats: { - steps: 0, - imported: 0, - import_max_step: 0, - import_min: MISSING_BENCHMARK_MS, - import_max: MISSING_BENCHMARK_MS, - import_mean: MISSING_BENCHMARK_MS, - import_p50: MISSING_BENCHMARK_MS, - import_p75: MISSING_BENCHMARK_MS, - import_p90: MISSING_BENCHMARK_MS, - import_p99: MISSING_BENCHMARK_MS, - import_std_dev: 0, - }, - }; -} -Additionally,
MISSING_BENCHMARK_MS(line 37) could also be removed if no longer needed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/csv-to-dashboard-json.cjs` around lines 69 - 90, Remove the unused placeholder generator to clean up dead code: delete the makePlaceholderJson function and, if no other references exist, also remove the MISSING_BENCHMARK_MS constant declared earlier; search for usages of makePlaceholderJson and MISSING_BENCHMARK_MS in the file to confirm they are unused before deleting to avoid breaking behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/update-history.cjs`:
- Around line 65-73: The benchmarkScores serialization uses inconsistent
null-coalescing: update the metrics assignment inside the loop that builds
entry.benchmarkScores so it uses nullish coalescing like score does (replace
info.metrics || null with info.metrics ?? null) to preserve legitimate falsy
values such as 0; locate the loop where t.benchmarkScores is iterated and adjust
the metrics expression accordingly.
---
Nitpick comments:
In `@scripts/csv-to-dashboard-json.cjs`:
- Around line 69-90: Remove the unused placeholder generator to clean up dead
code: delete the makePlaceholderJson function and, if no other references exist,
also remove the MISSING_BENCHMARK_MS constant declared earlier; search for
usages of makePlaceholderJson and MISSING_BENCHMARK_MS in the file to confirm
they are unused before deleting to avoid breaking behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9e304f09-f11f-4afe-943b-b686e2713d6c
📒 Files selected for processing (4)
.github/workflows/deploy-dashboard.yml.gitignorescripts/csv-to-dashboard-json.cjsscripts/update-history.cjs
Summary
gh run downloadto handle transient GitHub API 502 errors during artifact pagination. With 3700+ artifacts in the repo, the API intermittently returns 502 during pagination — currently silently swallowed by2>/dev/null, causing valid artifacts to be skipped.benchmarkScores(score + metrics per trace) in history snapshots, enabling per-trace trend analysis.Problem
The current
gh run download --name <artifact> 2>/dev/null || echo "(not found, skipping)"pattern treats transient 502 errors identically to genuinely missing artifacts. This causes:csv-to-dashboard-json.cjscreates placeholder JSONs with 0 values for the "missing" artifactsgenerate-aggregated-data.jscalculates scores usinggeometricMeanwhich filters out zeros, giving teams a score based on fewer benchmarks (e.g., 3/4 instead of 4/4)Verified locally: 5 teams currently affected by false positives (artifacts exist on GitHub but were missed by the pipeline). With the retry fix, all 5 are recovered — 4 artifacts needed a second attempt.
Test plan
deploy-dashboardworkflow run after merge to verify corrected scoresMade with Cursor
Summary by CodeRabbit
Bug Fixes
Chores