Fix deploy pipeline: retry artifact downloads, strict 4/4 filtering by polykrate · Pull Request #66 · FluffyLabs/jam-testing

polykrate · 2026-05-10T15:03:31Z

Summary

- deploy-dashboard.yml: Add 3-attempt retry with progressive backoff (1s, 2s, 3s) on `gh run download` to handle transient GitHub API 502 errors during artifact pagination. With 3700+ artifacts in the repo, the API intermittently returns 502 during pagination. These errors are currently swallowed silently by `2>/dev/null`, causing valid artifacts to be skipped.
- csv-to-dashboard-json.cjs: Remove placeholder JSON generation (zero-value files) for missing benchmarks. Teams with incomplete traces are now excluded from rankings entirely, instead of getting artificially low scores calculated from 3/4 benchmarks.
- update-history.cjs: Include per-benchmark benchmarkScores (score + metrics per trace) in history snapshots, enabling per-trace trend analysis.
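
The retry behaviour described for deploy-dashboard.yml lives in the workflow's shell step, which is not reproduced on this page. As a rough illustration only, the same idea can be sketched in Node.js. The names `downloadWithRetry` and `sleepSync` are hypothetical. This sketch waits only between attempts (1s, then 2s); whether the workflow also waits after the final attempt is not shown here.

```javascript
// Illustrative sketch only: the PR implements its retry in the workflow's
// shell step. All names below are hypothetical.

// Blocks the current thread for `ms` milliseconds (Node.js).
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Calls `download()` up to `maxAttempts` times. After a failed attempt n
// (other than the last) it waits n * baseDelayMs before trying again.
// Returns { ok, attempts, error } instead of throwing, so the caller can
// log the failure and decide whether to skip the artifact.
function downloadWithRetry(
  download,
  { maxAttempts = 3, baseDelayMs = 1000, sleep = sleepSync } = {}
) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      download();
      return { ok: true, attempts: attempt, error: null };
    } catch (err) {
      lastError = err;
      // Log the failure rather than discarding it.
      console.error(`attempt ${attempt}/${maxAttempts} failed: ${err.message}`);
      if (attempt < maxAttempts) sleep(attempt * baseDelayMs);
    }
  }
  return { ok: false, attempts: maxAttempts, error: lastError };
}
```

In a real script, `download` might wrap `child_process.execFileSync("gh", ["run", "download", "--name", artifactName])`. Unlike the `2>/dev/null` pattern, every failed attempt is logged, so a transient 502 is visible in the output before the artifact is finally reported as not found.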

Problem

The current `gh run download --name <artifact> 2>/dev/null || echo "(not found, skipping)"` pattern treats transient 502 errors identically to genuinely missing artifacts. This causes:

- csv-to-dashboard-json.cjs creates placeholder JSONs with 0 values for the "missing" artifacts
- generate-aggregated-data.js calculates scores using geometricMean which filters out zeros, giving teams a score based on fewer benchmarks (e.g., 3/4 instead of 4/4)
- Affected teams get artificially better/worse rankings depending on which trace is missing

Verified locally: 5 teams currently affected by false positives (artifacts exist on GitHub but were missed by the pipeline). With the retry fix, all 5 are recovered — 4 artifacts needed a second attempt.
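
To make the scoring distortion concrete, here is a minimal sketch of a geometric mean that drops zeros, as the description says geometricMean does. The actual function in generate-aggregated-data.js is not shown on this page, so this implementation and the sample values are assumptions chosen only to illustrate the effect.

```javascript
// Illustrative sketch: a geometric mean that ignores non-positive values.
// This is an assumption about the behaviour described in the PR, not a copy
// of the repository's geometricMean.
function geometricMean(values) {
  const positive = values.filter((v) => v > 0);
  if (positive.length === 0) return 0;
  const logSum = positive.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / positive.length);
}

// Hypothetical per-benchmark values for one team.
const complete = geometricMean([2, 2, 2, 16]);          // all 4 benchmarks: about 3.36
const missingLargest = geometricMean([2, 2, 2, 0]);     // placeholder replaces 16: about 2
const missingSmallest = geometricMean([0, 2, 2, 16]);   // placeholder replaces a 2: about 4

console.log({ complete, missingLargest, missingSmallest });
```

The same team lands above or below its complete 4/4 value depending on which benchmark was replaced by a zero placeholder, which is why a 3/4 score cannot be compared fairly with a 4/4 score.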

Test plan

- Local simulation of full artifact download with retry: 84/92 success, 0 false positives, 4 saved by retry
- Trigger manual deploy-dashboard workflow run after merge to verify corrected scores
- Verify history.json contains benchmarkScores in the new snapshot

Made with Cursor

Summary by CodeRabbit

Bug Fixes
- Improved deployment reliability with automatic retry logic for artifact downloads.
- Teams with incomplete benchmark data are now properly excluded from rankings rather than filled with placeholder values.
Chores
- Updated build configuration to ignore artifact and performance testing directories.
- Enhanced historical data storage to capture per-team benchmark scores.

- deploy-dashboard.yml: Add 3-attempt retry with backoff on gh run download to handle transient GitHub API 502 errors during artifact pagination
- csv-to-dashboard-json.cjs: Remove placeholder JSON generation for missing benchmarks; teams with incomplete traces are now excluded from rankings instead of getting artificially low scores
- update-history.cjs: Include per-benchmark scores and metrics in history snapshots for per-trace trend analysis
- .gitignore: Ignore artifacts/ and fuzz-perf/ directories

Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-05-10T15:03:42Z

Walkthrough

The PR hardens the dashboard deployment pipeline with retry logic for artifact downloads, validates benchmark completeness across teams, and enriches history snapshots with per-benchmark score data. Incomplete teams are logged and excluded from rankings rather than filled with placeholders.

Changes

Dashboard Build and Artifact Processing

Layer / File(s)	Summary
Gitignore Configuration `.gitignore`	Adds ignore patterns for the `/artifacts/` and `/fuzz-perf/` directories so that downloaded artifacts are not committed.
Artifact Download with Retries `.github/workflows/deploy-dashboard.yml`	`gh run download` step now retries up to 3 times per artifact with increasing wait periods and logs explicit "not found after 3 attempts, skipping" messages on exhausted retries.
Benchmark Validation and Filtering `scripts/csv-to-dashboard-json.cjs`	After CSV-to-JSON conversion, script validates presence of all `REQUIRED_BENCHMARKS` per team, logs missing benchmarks per team, excludes incomplete teams from rankings, and reports only the count of converted benchmark files.
Snapshot Enrichment with Benchmark Scores `scripts/update-history.cjs`	Snapshot construction now conditionally includes a `benchmarkScores` object keyed by benchmark name, with each entry containing `score` and `metrics` fields (both defaulting to `null` when missing).
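
The strict 4/4 filtering summarized in the table can be sketched as follows. This is a minimal illustration, not the code from scripts/csv-to-dashboard-json.cjs: only the name `REQUIRED_BENCHMARKS` comes from the review summary, while its contents, the `partitionTeams` helper, and the input shape are placeholders.

```javascript
// Illustrative sketch only. The benchmark names are placeholders; the real
// REQUIRED_BENCHMARKS list is not shown on this page.
const REQUIRED_BENCHMARKS = ["trace-a", "trace-b", "trace-c", "trace-d"];

// `benchmarksByTeam` maps a team name to the benchmark names that were
// successfully converted for it (hypothetical input shape).
function partitionTeams(benchmarksByTeam) {
  const complete = [];
  const incomplete = {};
  for (const [team, present] of Object.entries(benchmarksByTeam)) {
    const have = new Set(present);
    const missing = REQUIRED_BENCHMARKS.filter((name) => !have.has(name));
    if (missing.length === 0) {
      complete.push(team);
    } else {
      // Log and exclude the team instead of writing a zero-value placeholder.
      incomplete[team] = missing;
      console.warn(`excluding ${team}: missing ${missing.join(", ")}`);
    }
  }
  return { complete, incomplete };
}

const result = partitionTeams({
  alpha: ["trace-a", "trace-b", "trace-c", "trace-d"],
  beta: ["trace-a", "trace-b", "trace-c"],
});
console.log(result.complete);
```

Only teams in `complete` would be passed on to ranking, so every ranked score is computed from the same set of benchmarks.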

Possibly related PRs

FluffyLabs/jam-testing#38: Updates agents and team discovery to align with dashboard workflow team-handling changes.

Poem

🐰 Artifacts now retry with patient hops,
Benchmarks validated—incomplete ones drop,
Scores enriched in history's careful hand,
A resilient pipeline, steadily planned! ✨

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main focus of the PR: adding retries to artifact downloads and implementing strict benchmark filtering (excluding teams with missing benchmarks rather than filling placeholders).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

scripts/csv-to-dashboard-json.cjs (1)

69-90: ⚡ Quick win

Consider removing unused placeholder function.

The makePlaceholderJson function appears to be unused after removing the placeholder generation behavior. Removing it would improve code clarity.

♻️ Proposed cleanup

-function makePlaceholderJson(targetName) {
-  return {
-    info: {
-      name: targetName,
-      app_version: { major: 0, minor: 0, patch: 0 },
-      jam_version: { major: 0, minor: 7, patch: 2 },
-    },
-    stats: {
-      steps: 0,
-      imported: 0,
-      import_max_step: 0,
-      import_min: MISSING_BENCHMARK_MS,
-      import_max: MISSING_BENCHMARK_MS,
-      import_mean: MISSING_BENCHMARK_MS,
-      import_p50: MISSING_BENCHMARK_MS,
-      import_p75: MISSING_BENCHMARK_MS,
-      import_p90: MISSING_BENCHMARK_MS,
-      import_p99: MISSING_BENCHMARK_MS,
-      import_std_dev: 0,
-    },
-  };
-}
-

Additionally, MISSING_BENCHMARK_MS (line 37) could also be removed if no longer needed.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/csv-to-dashboard-json.cjs` around lines 69 - 90, Remove the unused
placeholder generator to clean up dead code: delete the makePlaceholderJson
function and, if no other references exist, also remove the MISSING_BENCHMARK_MS
constant declared earlier; search for usages of makePlaceholderJson and
MISSING_BENCHMARK_MS in the file to confirm they are unused before deleting to
avoid breaking behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/update-history.cjs`:
- Around line 65-73: The benchmarkScores serialization uses inconsistent
null-coalescing: update the metrics assignment inside the loop that builds
entry.benchmarkScores so it uses nullish coalescing like score does (replace
info.metrics || null with info.metrics ?? null) to preserve legitimate falsy
values such as 0; locate the loop where t.benchmarkScores is iterated and adjust
the metrics expression accordingly.
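
The difference between the two operators in this finding is easy to demonstrate. The sketch below assumes each `benchmarkScores` entry has `score` and `metrics` fields, as the walkthrough describes. The function name and sample data are hypothetical, and the actual loop in scripts/update-history.cjs is not shown on this page.

```javascript
// Illustrative sketch only: builds a benchmarkScores object keyed by
// benchmark name, defaulting missing fields to null.
function buildBenchmarkScores(benchmarkScores) {
  const out = {};
  for (const [name, info] of Object.entries(benchmarkScores)) {
    out[name] = {
      // `??` falls back only for null/undefined, so a legitimate 0 survives.
      score: info.score ?? null,
      metrics: info.metrics ?? null,
    };
  }
  return out;
}

// Hypothetical input: one entry with falsy-but-valid values, one with gaps.
const snapshot = buildBenchmarkScores({
  "trace-a": { score: 0, metrics: 0 },
  "trace-b": {},
});

// For comparison: `||` would turn the valid 0 into null.
const withOr = 0 || null;      // null
const withNullish = 0 ?? null; // 0
console.log(snapshot, withOr, withNullish);
```

If `metrics` is always an object or missing, both operators behave the same; the change mainly keeps the two fields consistent and guards against falsy primitives.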


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9e304f09-f11f-4afe-943b-b686e2713d6c

📥 Commits

Reviewing files that changed from the base of the PR and between 10337a2 and 9f2d260.

📒 Files selected for processing (4)

.github/workflows/deploy-dashboard.yml
.gitignore
scripts/csv-to-dashboard-json.cjs
scripts/update-history.cjs

tomusdrw

thanks!

coderabbitai Bot suggested changes May 10, 2026

Comment thread scripts/update-history.cjs

coderabbitai Bot approved these changes May 10, 2026

tomusdrw approved these changes May 10, 2026

tomusdrw merged commit 033a192 into FluffyLabs:main May 10, 2026
12 checks passed

polykrate mentioned this pull request May 11, 2026

Dashboard: dynamic baseline & speed threshold indicators #70

Closed

4 tasks

coderabbitai Bot mentioned this pull request May 27, 2026

Fix dashboard deploy #96

Merged

tomusdrw merged 1 commit into FluffyLabs:main from polykrate:dev