This repository was archived by the owner on Jun 7, 2026. It is now read-only.

Fix deploy pipeline: retry artifact downloads, strict 4/4 filtering#66

Merged
tomusdrw merged 1 commit into FluffyLabs:main from polykrate:dev on May 10, 2026

Conversation

@polykrate polykrate commented May 10, 2026

Contributor

Summary

  • deploy-dashboard.yml: Add 3-attempt retry with progressive backoff (1s, 2s, 3s) on gh run download to handle transient GitHub API 502 errors during artifact pagination. With 3700+ artifacts in the repo, the API intermittently returns 502 during pagination — currently silently swallowed by 2>/dev/null, causing valid artifacts to be skipped.
  • csv-to-dashboard-json.cjs: Remove placeholder JSON generation (zero-value files) for missing benchmarks. Teams with incomplete traces are now excluded from rankings entirely, instead of getting artificially low scores calculated from 3/4 benchmarks.
  • update-history.cjs: Include per-benchmark benchmarkScores (score + metrics per trace) in history snapshots, enabling per-trace trend analysis.
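The retry behavior described in the first bullet can be sketched in JavaScript as follows. This is a minimal illustration, not the workflow's actual shell step: `downloadWithRetry`, `run`, and `sleep` are hypothetical names, and the sketch assumes the wait happens only between attempts (the source does not show whether a wait follows the final failure).

```javascript
// Hypothetical helper, not taken from the PR.
// `run(name)` performs one download attempt and returns true on success.
// `sleep(ms)` blocks for the given number of milliseconds.
// Both are injected so the sketch does not depend on any real CLI or API.
function downloadWithRetry(artifactName, run, sleep, maxAttempts = 3, baseDelayMs = 1000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (run(artifactName)) {
      return { ok: true, attempts: attempt };
    }
    // Progressive backoff: wait 1x, then 2x, ... the base delay before retrying.
    if (attempt < maxAttempts) {
      sleep(attempt * baseDelayMs);
    }
  }
  console.log(`${artifactName}: not found after ${maxAttempts} attempts, skipping`);
  return { ok: false, attempts: maxAttempts };
}

// Usage: simulate a transient failure that succeeds on the second attempt.
let calls = 0;
const delays = [];
const retried = downloadWithRetry(
  "example-artifact",
  () => ++calls >= 2,
  (ms) => delays.push(ms)
);
console.log(retried, delays);

// Usage: an artifact that never downloads is reported once, after all attempts.
const exhausted = downloadWithRetry("missing-artifact", () => false, () => {});
console.log(exhausted);
```

Injecting `run` and `sleep` keeps the retry policy separate from the download mechanism, which also makes the policy easy to exercise without network access.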

Problem

The current gh run download --name <artifact> 2>/dev/null || echo "(not found, skipping)" pattern treats transient 502 errors identically to genuinely missing artifacts. This causes:

  1. csv-to-dashboard-json.cjs creates placeholder JSONs with 0 values for the "missing" artifacts
  2. generate-aggregated-data.js calculates scores using geometricMean which filters out zeros, giving teams a score based on fewer benchmarks (e.g., 3/4 instead of 4/4)
  3. Affected teams get artificially better/worse rankings depending on which trace is missing
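To see why a zero placeholder skews the result in either direction, here is a small sketch of a geometric mean that drops zeros. The real `geometricMean` in generate-aggregated-data.js is not shown in this PR, so the implementation and the input values below are assumptions chosen only for illustration.

```javascript
// Assumed behavior: non-positive values are filtered out before averaging.
// This is a sketch, not the function from generate-aggregated-data.js.
function geometricMean(values) {
  const positive = values.filter((v) => v > 0);
  if (positive.length === 0) return 0;
  const logSum = positive.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / positive.length);
}

// Illustrative per-trace values for one team (not real data).
const allFour = geometricMean([1, 4, 16, 64]);       // approximately 8
const largestMissing = geometricMean([1, 4, 16, 0]); // approximately 4
const smallestMissing = geometricMean([0, 4, 16, 64]); // approximately 16

console.log({ allFour, largestMissing, smallestMissing });
```

With the same team and the same three surviving traces, the aggregate halves or doubles depending on which trace was replaced by a zero, which matches the "better/worse depending on which trace is missing" effect described above.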

Verified locally: 5 teams currently affected by false positives (artifacts exist on GitHub but were missed by the pipeline). With the retry fix, all 5 are recovered — 4 artifacts needed a second attempt.

Test plan

  • Local simulation of full artifact download with retry: 84/92 success, 0 false positives, 4 saved by retry
  • Trigger manual deploy-dashboard workflow run after merge to verify corrected scores
  • Verify history.json contains benchmarkScores in the new snapshot

Made with Cursor

Summary by CodeRabbit

  • Bug Fixes

    • Improved deployment reliability with automatic retry logic for artifact downloads.
    • Teams with incomplete benchmark data are now properly excluded from rankings rather than filled with placeholder values.
  • Chores

    • Updated build configuration to ignore artifact and performance testing directories.
    • Enhanced historical data storage to capture per-team benchmark scores.

- deploy-dashboard.yml: Add 3-attempt retry with backoff on gh run download
  to handle transient GitHub API 502 errors during artifact pagination
- csv-to-dashboard-json.cjs: Remove placeholder JSON generation for missing
  benchmarks — teams with incomplete traces are now excluded from rankings
  instead of getting artificially low scores
- update-history.cjs: Include per-benchmark scores and metrics in history
  snapshots for per-trace trend analysis
- .gitignore: Ignore artifacts/ and fuzz-perf/ directories

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai Bot commented May 10, 2026

Walkthrough

The PR hardens the dashboard deployment pipeline with retry logic for artifact downloads, validates benchmark completeness across teams, and enriches history snapshots with per-benchmark score data. Incomplete teams are logged and excluded from rankings rather than filled with placeholders.

Changes

Dashboard Build and Artifact Processing

| Layer / File(s) | Summary |
| --- | --- |
| Gitignore Configuration (.gitignore) | Adds ignore patterns for /artifacts/ and /fuzz-perf/ directories to keep artifacts from being committed. |
| Artifact Download with Retries (.github/workflows/deploy-dashboard.yml) | gh run download step now retries up to 3 times per artifact with increasing wait periods and logs explicit "not found after 3 attempts, skipping" messages on exhausted retries. |
| Benchmark Validation and Filtering (scripts/csv-to-dashboard-json.cjs) | After CSV-to-JSON conversion, the script validates the presence of all REQUIRED_BENCHMARKS per team, logs missing benchmarks per team, excludes incomplete teams from rankings, and reports only the count of converted benchmark files. |
| Snapshot Enrichment with Benchmark Scores (scripts/update-history.cjs) | Snapshot construction now conditionally includes a benchmarkScores object keyed by benchmark name, with each entry containing score and metrics fields (both defaulting to null when missing). |
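
The completeness check summarized for scripts/csv-to-dashboard-json.cjs might look roughly like the sketch below. Only the name `REQUIRED_BENCHMARKS` comes from the walkthrough; the benchmark names, the `partitionTeams` helper, and the input shape are hypothetical.

```javascript
// Hypothetical benchmark names; the real list is not shown in this PR.
const REQUIRED_BENCHMARKS = ["trace-a", "trace-b", "trace-c", "trace-d"];

// Splits teams into those with every required benchmark and those missing any.
// `benchmarksByTeam` maps a team name to the benchmark names found for it.
function partitionTeams(benchmarksByTeam) {
  const complete = [];
  const incomplete = [];
  for (const [team, present] of Object.entries(benchmarksByTeam)) {
    const have = new Set(present);
    const missing = REQUIRED_BENCHMARKS.filter((b) => !have.has(b));
    if (missing.length === 0) {
      complete.push(team);
    } else {
      console.log(`${team}: missing ${missing.join(", ")}, excluded from rankings`);
      incomplete.push({ team, missing });
    }
  }
  return { complete, incomplete };
}

// Usage with made-up team names.
const partition = partitionTeams({
  "team-one": ["trace-a", "trace-b", "trace-c", "trace-d"],
  "team-two": ["trace-a", "trace-b", "trace-d"],
});
console.log(partition.complete);
```

Only teams in `complete` would proceed to ranking; nothing is synthesized for the others.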

Poem

🐰 Artifacts now retry with patient hops,
Benchmarks validated—incomplete ones drop,
Scores enriched in history's careful hand,
A resilient pipeline, steadily planned! ✨

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main focus of the PR: adding retries to artifact downloads and implementing strict benchmark filtering (excluding teams with missing benchmarks rather than filling placeholders). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/csv-to-dashboard-json.cjs (1)

69-90: ⚡ Quick win

Consider removing unused placeholder function.

The makePlaceholderJson function appears to be unused after removing the placeholder generation behavior. Removing it would improve code clarity.

♻️ Proposed cleanup
-function makePlaceholderJson(targetName) {
-  return {
-    info: {
-      name: targetName,
-      app_version: { major: 0, minor: 0, patch: 0 },
-      jam_version: { major: 0, minor: 7, patch: 2 },
-    },
-    stats: {
-      steps: 0,
-      imported: 0,
-      import_max_step: 0,
-      import_min: MISSING_BENCHMARK_MS,
-      import_max: MISSING_BENCHMARK_MS,
-      import_mean: MISSING_BENCHMARK_MS,
-      import_p50: MISSING_BENCHMARK_MS,
-      import_p75: MISSING_BENCHMARK_MS,
-      import_p90: MISSING_BENCHMARK_MS,
-      import_p99: MISSING_BENCHMARK_MS,
-      import_std_dev: 0,
-    },
-  };
-}
-

Additionally, MISSING_BENCHMARK_MS (line 37) could also be removed if no longer needed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/csv-to-dashboard-json.cjs` around lines 69 - 90, Remove the unused
placeholder generator to clean up dead code: delete the makePlaceholderJson
function and, if no other references exist, also remove the MISSING_BENCHMARK_MS
constant declared earlier; search for usages of makePlaceholderJson and
MISSING_BENCHMARK_MS in the file to confirm they are unused before deleting to
avoid breaking behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/update-history.cjs`:
- Around line 65-73: The benchmarkScores serialization uses inconsistent
null-coalescing: update the metrics assignment inside the loop that builds
entry.benchmarkScores so it uses nullish coalescing like score does (replace
info.metrics || null with info.metrics ?? null) to preserve legitimate falsy
values such as 0; locate the loop where t.benchmarkScores is iterated and adjust
the metrics expression accordingly.
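
The difference between `||` and `??` raised in this inline comment can be shown with a short sketch. `buildBenchmarkScores` and the sample data are hypothetical; only the field names `benchmarkScores`, `score`, and `metrics` come from the review text.

```javascript
// Hypothetical helper mirroring the suggested fix: use ?? for both fields so
// that only null and undefined fall back to null.
function buildBenchmarkScores(benchmarkScores) {
  const out = {};
  for (const [name, info] of Object.entries(benchmarkScores)) {
    out[name] = {
      score: info.score ?? null,
      metrics: info.metrics ?? null,
    };
  }
  return out;
}

// A legitimate score of 0 survives with ??, but would be lost with ||.
const keptZero = 0 ?? null; // 0
const lostZero = 0 || null; // null

// Usage with made-up benchmark names and values.
const snapshotScores = buildBenchmarkScores({
  "trace-a": { score: 0, metrics: { import_mean: 12.5 } },
  "trace-b": {},
});
console.log(keptZero, lostZero, snapshotScores);
```

Because `||` treats every falsy value (0, "", false) as missing, it is only equivalent to `??` when the field can never legitimately hold such a value.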

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9e304f09-f11f-4afe-943b-b686e2713d6c

📥 Commits

Reviewing files that changed from the base of the PR and between 10337a2 and 9f2d260.

📒 Files selected for processing (4)
  • .github/workflows/deploy-dashboard.yml
  • .gitignore
  • scripts/csv-to-dashboard-json.cjs
  • scripts/update-history.cjs

Comment thread scripts/update-history.cjs

@tomusdrw tomusdrw left a comment

Member

thanks!

@tomusdrw tomusdrw merged commit 033a192 into FluffyLabs:main May 10, 2026
12 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request May 27, 2026