fix(bench): grid compressor — non-thinking model, degenerate-output guard, prompts in artifact#246

Merged: drewstone merged 1 commit into main from fix/grid-compressor on Jun 10, 2026

@drewstone

Grid run #1's integrity flags, fixed: the thinking worker model burned the compressor's 1024-token cap on reasoning and emitted a ~1-word prompt — silently substituting a different treatment (prompt REMOVAL) for the requested 50% compression. The cells still PROMOTED (the domain was saturated, so the score leg was vacuous), which is exactly why this had to fail loud.

  • Compressor = a fixed non-thinking model (COMPRESSOR_MODEL, default deepseek-v4-flash), maxTokens 2048.
  • Degenerate output fails loud: <20% of the target word count throws with the output in the message — a wrong treatment can never silently enter a cell.
  • The artifact now persists the actual per-cell prompts, so the treatment is inspectable post-hoc (run #1's couldn't be).
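
A minimal sketch of how the first two points could fit together, not the code in this PR. Only `COMPRESSOR_MODEL`, its default `deepseek-v4-flash`, the 2048-token cap, and the 20% threshold come from the description above. Reading the model name from an environment variable, counting words by whitespace, and every other name here (`check_compressed`, `DegenerateCompressionError`, `DEGENERATE_FRACTION`) are assumptions made for illustration.

```python
import os

# From the PR text: a fixed compressor model with this default and a 2048 cap.
# Assumption: the override is read from an environment variable.
COMPRESSOR_MODEL = os.environ.get("COMPRESSOR_MODEL", "deepseek-v4-flash")
COMPRESSOR_MAX_TOKENS = 2048

# From the PR text: output under 20% of the target word count is degenerate.
DEGENERATE_FRACTION = 0.20


class DegenerateCompressionError(ValueError):
    """Hypothetical error type: the compressor returned far too little text."""


def check_compressed(original: str, compressed: str, ratio: float) -> str:
    """Return `compressed` unchanged, or raise if it is degenerate.

    Assumption: the target word count is the original word count times the
    requested ratio, with words split on whitespace.
    """
    target_words = round(len(original.split()) * ratio)
    got_words = len(compressed.split())
    if got_words < DEGENERATE_FRACTION * target_words:
        # Put the offending output in the message so the failure is inspectable.
        raise DegenerateCompressionError(
            f"compressor output has {got_words} words, under 20% of the "
            f"{target_words}-word target; output was: {compressed!r}"
        )
    return compressed
```

With a 100-word prompt and a requested ratio of 0.5, the target is 50 words, so any output shorter than 10 words raises instead of entering a cell as if it were a 50% compression.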

Run #1's honest residue (recorded in the ledger): machinery validated end-to-end with the program's first PROMOTED verdicts; the science needs a non-saturated domain — rerun queued with a weaker worker. The steering cost-mechanics observation (refine early-stops on resolve → half sample's cost at equal score) stands.

…uard, prompts in artifact

Grid run #1 silently degenerated: the thinking worker model burned the
compressor's token cap on reasoning and emitted a ~1-word prompt — a
different treatment (prompt REMOVAL) than the requested 50% ratio. The
compressor is now a fixed non-thinking model (COMPRESSOR_MODEL, default
deepseek-v4-flash), a degenerate output (<20% of target words) fails loud,
and the artifact persists the actual cell prompts so the treatment is
inspectable.
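
One way the persisted cell prompts could be laid out in the artifact, sketched under assumptions. The PR states only that the artifact now stores the actual per-cell prompts; the JSON layout, the `CellRecord` fields, and `build_artifact` are hypothetical names chosen for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CellRecord:
    """Hypothetical per-cell record; the field names are illustrative."""
    cell_id: str
    requested_ratio: float  # the compression ratio the cell asked for
    prompt: str             # the exact prompt text the cell actually ran with
    verdict: str


def build_artifact(cells: list) -> str:
    """Serialize every cell, including its real prompt, to a JSON string."""
    return json.dumps({"cells": [asdict(c) for c in cells]}, indent=2)


def prompt_word_counts(artifact_json: str) -> dict:
    """Post-hoc check: word count of the prompt each cell actually used."""
    data = json.loads(artifact_json)
    return {c["cell_id"]: len(c["prompt"].split()) for c in data["cells"]}
```

Storing the prompt text itself, rather than only the requested ratio, is what allows a reviewer to confirm after the run that the treatment applied matches the treatment requested.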

@tangletools tangletools left a comment

✅ Auto-approved PR — 146c7ed1

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-10T22:07:56Z

@drewstone drewstone merged commit c94d289 into main Jun 10, 2026
1 check passed
@drewstone drewstone deleted the fix/grid-compressor branch June 10, 2026 22:09