
[litellm-agent] Staging → litellm_internal_staging (5/17/2026)#28108

Closed
oss-pr-review-agent-shin[bot] wants to merge 3 commits into litellm_internal_staging from shin_agent_oss_staging_05_17_2026

Conversation

@oss-pr-review-agent-shin

Contributor

Automated staging PR created by litellm-agent.

This branch collects PRs approved by the agent on 5/17/2026.

⚠️ Human review required before CI. Convert from draft to ready when you've reviewed the diff.

shin-berri and others added 3 commits May 13, 2026 22:37
[Infra] Promote internal staging to main
[Infra] Promote internal staging to main
…nt_tokens (closes #28084) (#28107)

Squash-merged by litellm-agent from voidborne-d's PR.
@oss-pr-review-agent-shin

Contributor Author

@greptile please review

@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 3 committers have signed the CLA.

✅ yuneng-berri
❌ voidborne-d
❌ shin-berri

@greptile-apps

greptile-apps Bot commented May 17, 2026

Contributor

Greptile Summary

This PR fixes issue #28084 by removing a spurious `import vertexai` gate from `VertexAIPartnerModels.count_tokens`. Partner model token counting (Claude, Mistral, Llama on Vertex AI) routes through `VertexAIPartnerModelsTokenCounter`, which talks to the `:rawPredict` endpoint over plain `httpx` and has no dependency on the Gemini SDK. The gate was an accidental hard dependency that broke `/v1/messages/count_tokens` for any LiteLLM installation without `google-cloud-aiplatform>=1.38`.

  • main.py: Deletes ~12 lines of vertexai import/version-check boilerplate and replaces them with a clear explanatory comment; the rest of count_tokens is unchanged.
  • New test: Two fully-mocked regression tests pin the absence of the gate — one simulates vertexai being unimportable and verifies normal execution continues, the other asserts the handler module itself never pulls the Gemini SDK into sys.modules.
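
The first regression test described above depends on making `vertexai` unimportable. A minimal sketch of one way to do that, assuming only the standard behaviour that a `None` entry in `sys.modules` makes the import raise `ImportError`. The helper names `module_unimportable` and `can_import` are hypothetical and are not taken from the PR's test file:

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def module_unimportable(name):
    """Temporarily make `import <name>` raise ImportError (hypothetical helper)."""
    missing = object()
    previous = sys.modules.get(name, missing)
    sys.modules[name] = None  # a None entry makes the import system raise ImportError
    try:
        yield
    finally:
        # Restore whatever was there before, or remove our placeholder.
        if previous is missing:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = previous


def can_import(name):
    """Return True if the module can be imported right now."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False
```

Inside `with module_unimportable("vertexai"):` a test could then call the code under test and check that it still runs, which is the property the summary says the new tests pin.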

Confidence Score: 5/5

Safe to merge — the change removes dead code that was incorrectly gating a network-only path, and the new tests mock all I/O correctly with no real network calls.

The diff is small and surgical: it deletes an import check that was never needed by the actual token-counting implementation and adds two well-structured regression tests that fully stub the auth and HTTP layers. The handler itself is unchanged, the rest of count_tokens is unchanged, and all tests respect the no-real-network-calls constraint for this test folder.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/llms/vertex_ai/vertex_ai_partner_models/main.py | Removes the now-unnecessary vertexai (Gemini SDK) import gate from count_tokens; replaces it with an explanatory comment. The actual token-counting path uses VertexAIPartnerModelsTokenCounter over plain httpx and never needed that import. |
| tests/test_litellm/llms/vertex_ai/vertex_ai_partner_models/count_tokens/test_count_tokens_no_vertexai_sdk.py | New regression test file with two fully-mocked tests: one verifies count_tokens proceeds past the old import gate when vertexai is unimportable; the other asserts the handler module itself never loads the Gemini SDK. All network calls are stubbed. |

Reviews (1): Last reviewed commit: "fix(vertex_ai/partner_models): drop unus..." | Re-trigger Greptile

@codecov

codecov Bot commented May 17, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


cursor Bot pushed a commit that referenced this pull request May 17, 2026
Adds a new triage flow that evaluates external pull requests and issues
against the project's contribution rubric and, when configured to do so,
auto-closes non-conforming ones with an explanatory comment. Contributors
can update + reopen to be re-evaluated.

Scope:
- Internal BerriAI contributors (author_association OWNER/MEMBER/COLLABORATOR)
  and bot accounts are skipped entirely.
- 'Fixes #1234' / 'Resolves https://github.com/.../issues/N' in the PR body
  short-circuits to PASS without burning LLM tokens.
- LLM judge returns structured JSON (verdict, missing[], explanation);
  parser tolerates markdown fences and embedded JSON.
- LLM errors NEVER close PRs/issues — failure surfaces as 'skip-llm-error'.
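
A rough sketch of a parser that tolerates markdown fences and embedded JSON, as the scope list describes. The function name `parse_verdict` and the fallback order are assumptions for illustration; only the field names `verdict`, `missing`, and `explanation` come from the commit message:

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fence
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_verdict(raw):
    """Return the first dict containing 'verdict' found in an LLM reply, else None."""
    candidates = [raw]                      # 1) the reply is already bare JSON
    candidates += FENCE_RE.findall(raw)     # 2) JSON wrapped in a markdown fence
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start : end + 1])  # 3) JSON embedded in prose
    for text in candidates:
        try:
            data = json.loads(text.strip())
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and "verdict" in data:
            return data
    return None  # caller should treat this as an LLM error, never as a fail verdict
```

Returning `None` rather than a default verdict keeps the parse failure distinguishable, which fits the stated rule that LLM errors never close PRs or issues.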

Safety:
- pull_request_target / issues triggers are FORCED dry-run in the workflow;
  only manual workflow_dispatch with close=true (and AGENT_SHIN_ENABLED=true)
  takes destructive action.
- Default mode writes verdicts to GITHUB_STEP_SUMMARY only — no public
  comments until the team flips the AGENT_SHIN_ENABLED repo variable.
- LLM uses an OpenAI-compatible endpoint (model and base URL configurable
  via repo variables; key via OPENAI_API_KEY secret).

Files:
- .github/scripts/triage_with_llm.py   - judge orchestrator + CLI
- .github/workflows/triage_pr_with_llm.yml
- .github/workflows/triage_issue_with_llm.yml
- tests/test_litellm/test_github_triage_with_llm.py - 33 unit tests

End-to-end validated against four real PRs (#28117 internal collaborator,
#28108 bot, #28129 'Fixes #28128', #28116 no linked issue) and issue
#28132 with a stubbed LLM judge: each path produces the expected action.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
@Sameerlite Sameerlite closed this May 22, 2026
@Sameerlite Sameerlite deleted the shin_agent_oss_staging_05_17_2026 branch May 22, 2026 12:07
mateo-berri added a commit that referenced this pull request Jun 18, 2026
…and review-gate label lifecycle (#30433)

* feat(triage): auto-close stale PRs with Greptile score <4/5

Adds .github/scripts/close_low_quality_prs.py and a daily workflow that
closes PRs which:
  - are open for at least 7 days, and
  - carry a most-recent greptile-apps review with Confidence Score <4/5,
  - and are not drafts or opt-out-labeled ('do not close', 'wip', etc.).

Each closure posts an explanatory comment telling the contributor how to
bring the PR back (rebase, re-request greptile, reopen at 4+/5). The
4/5 bar is already documented in the PR template
(.github/pull_request_template.md), so this just enforces it.

Tested with a dry run against the live BerriAI/litellm backlog of 1000
open PRs: 100 candidates identified, 598 PRs pass the bar (4+/5), 186
are too young, 97 are drafts, 19 lack any Greptile review and are left
alone.

Workflow defaults to closing 25 PRs/run as a safety net and supports
workflow_dispatch with overrides (close=false for a dry run, custom
min_age_days/min_score/limit).

18 unit tests cover score extraction (HTML/markdown/plain text, login
variants, multi-review picks latest) and per-PR evaluation (drafts,
opt-out labels, age, missing/passing/failing scores).
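
As an illustration of the score extraction those tests cover, here is a hedged sketch. The regex, the login set, and the flat `entries` shape (`login`, `body`, `created_at`) are assumptions made for this example and may differ from `close_low_quality_prs.py`:

```python
import re

# Tolerates markdown emphasis or HTML tags between the label and the digit (assumed pattern).
SCORE_RE = re.compile(r"Confidence\s+Score:?(?:\s|\*|<[^>]+>)*(\d)\s*/\s*5", re.IGNORECASE)
GREPTILE_LOGINS = {"greptile-apps", "greptile-apps[bot]"}  # assumed login variants


def latest_greptile_score(entries):
    """Return the score from the most recent Greptile entry that has one, else None."""
    scored = []
    for entry in entries:
        if entry["login"].lower() not in GREPTILE_LOGINS:
            continue
        match = SCORE_RE.search(entry["body"])
        if match:
            # ISO 8601 timestamps in the same format sort correctly as strings.
            scored.append((entry["created_at"], int(match.group(1))))
    return max(scored)[1] if scored else None
```

A PR with no Greptile entry yields `None`, which matches the commit's statement that PRs lacking any Greptile review are left alone.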

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* docs(templates): require expected/actual + QA proof for external contributions

PR template:
- Make the rubric explicit at the top: link an issue, OR provide a clear
  problem description + expected vs. actual + visual QA proof.
- Add dedicated sections for each piece so the bot has a deterministic
  shape to read.
- Keep the existing 'Linear ticket' section for internal contributors
  (they're exempt from the auto-triage rubric).

Bug report template:
- Split 'What happened?' into 'Actual behavior' + 'Expected behavior'.
- Make logs/screenshot a required textarea.
- Warning banner at the top tells external contributors that incomplete
  reports will be auto-closed (with re-evaluation on reopen).

Feature request template:
- Require a concrete use case + example in the motivation field, not just
  a one-liner pitch.
- Same auto-triage warning banner.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): Agent Shin LLM-as-judge for external PRs and issues

Adds a new triage flow that evaluates external pull requests and issues
against the project's contribution rubric and, when configured to do so,
auto-closes non-conforming ones with an explanatory comment. Contributors
can update + reopen to be re-evaluated.

Scope:
- Internal BerriAI contributors (author_association OWNER/MEMBER/COLLABORATOR)
  and bot accounts are skipped entirely.
- 'Fixes #1234' / 'Resolves https://github.com/.../issues/N' in the PR body
  short-circuits to PASS without burning LLM tokens.
- LLM judge returns structured JSON (verdict, missing[], explanation);
  parser tolerates markdown fences and embedded JSON.
- LLM errors NEVER close PRs/issues — failure surfaces as 'skip-llm-error'.

Safety:
- pull_request_target / issues triggers are FORCED dry-run in the workflow;
  only manual workflow_dispatch with close=true (and AGENT_SHIN_ENABLED=true)
  takes destructive action.
- Default mode writes verdicts to GITHUB_STEP_SUMMARY only — no public
  comments until the team flips the AGENT_SHIN_ENABLED repo variable.
- LLM uses an OpenAI-compatible endpoint (model and base URL configurable
  via repo variables; key via OPENAI_API_KEY secret).

Files:
- .github/scripts/triage_with_llm.py   - judge orchestrator + CLI
- .github/workflows/triage_pr_with_llm.yml
- .github/workflows/triage_issue_with_llm.yml
- tests/test_litellm/test_github_triage_with_llm.py - 33 unit tests

End-to-end validated against four real PRs (#28117 internal collaborator,
#28108 bot, #28129 'Fixes #28128', #28116 no linked issue) and issue
#28132 with a stubbed LLM judge: each path produces the expected action.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): scope Greptile auto-closer to external contributors + dry-run by default

- close_low_quality_prs.py now filters by GitHub author_association via
  the REST API: PRs from OWNER / MEMBER / COLLABORATOR (and bot accounts)
  are skipped with a new 'skip-internal' summary bucket.
- close_low_quality_prs.yml now defaults workflow_dispatch close=false,
  and ignores 'close=true' unless the new repo variable
  AGENT_SHIN_ENABLED is set to 'true'. Scheduled runs are dry-run only
  until the team flips that switch.
- Updated unit tests: one new test asserting internal authors are
  skipped, and an autouse fixture treats unspecified test PRs as
  external so the rest of the suite still exercises the close path.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(workflows): scheduled cron closes PRs; safe --close strip in triage

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(triage): scheduled cron stays dry-run; dedent prompts before interpolation

- close_low_quality_prs.yml: only workflow_dispatch with close=true (and
  AGENT_SHIN_ENABLED=true) actually closes PRs. Scheduled runs are always
  dry-run, matching the safety invariant documented for triage_pr/issue.
- triage_with_llm.py: textwrap.dedent on an f-string with multi-line
  interpolated bodies fails because the body's 2nd+ lines start at column 0,
  making the common-indent zero. Dedent the static template first, then
  .format() the title/body in.
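
The dedent problem is easy to reproduce. The sketch below uses a simplified two-field template and hypothetical function names; it is not the actual prompt from `triage_with_llm.py`:

```python
import textwrap


def build_prompt_buggy(title, body):
    # Interpolation happens before dedent. Later lines of a multi-line body
    # start at column 0, so the common indent is zero and nothing is stripped.
    return textwrap.dedent(
        f"""\
        Title: {title}
        Body:
        {body}
        """
    )


# Dedent the static template once, then interpolate with str.format().
PROMPT_TEMPLATE = textwrap.dedent(
    """\
    Title: {title}
    Body:
    {body}
    """
)


def build_prompt_fixed(title, body):
    return PROMPT_TEMPLATE.format(title=title, body=body)
```

With a single-line body both functions produce the same text; the difference only shows up once the body spans several lines, which is the common case for PR descriptions.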

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* Fix bugs in auto-close PR triage scripts

- close_low_quality_prs.py: Treat author_association API lookup failures
  as internal (fail-safe) so transient errors don't cause internal
  contributors' PRs to be auto-closed.
- triage_with_llm.py: Update summary heading from 'Would post comment:'
  to 'Posted comment:' since this branch only runs after the comment
  has already been posted.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* feat(triage): default Agent Shin to gpt-5.4-mini with reasoning_effort=none

- Bump DEFAULT_MODEL from gpt-4o-mini to gpt-5.4-mini (more modern;
  4M total context window per OpenAI catalog, JSON-schema response
  format, function calling all supported).
- For gpt-5.x family models, pass reasoning_effort="none" via
  extra_body. gpt-5.x rejects temperature != 1 unless reasoning_effort
  is explicitly "none"; setting it lets us keep temperature=0 for
  deterministic JSON rubric judgments. extra_body works across openai
  SDK versions regardless of whether they natively type the kwarg.
- For non-gpt5 overrides (TRIAGE_MODEL=gpt-4o-mini etc.), reasoning_effort
  is not sent.
- 4 new unit tests cover: gpt-5.4-mini -> reasoning_effort=none,
  capitalized/dated gpt-5 variants -> reasoning_effort=none,
  gpt-4o-mini -> no extra_body, base_url passthrough.
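
A minimal sketch of the model-dependent request arguments described above. The helper name `build_completion_kwargs` and the prefix check are assumptions; the sketch only builds the keyword dictionary and does not call any API:

```python
def build_completion_kwargs(model, messages):
    """Build chat-completion kwargs; gpt-5.x models get reasoning_effort='none'."""
    kwargs = {"model": model, "messages": messages, "temperature": 0}
    # Case-insensitive prefix match so capitalized or dated variants are covered.
    if model.lower().startswith("gpt-5"):
        # Sent through extra_body so it does not depend on the SDK typing the kwarg.
        kwargs["extra_body"] = {"reasoning_effort": "none"}
    return kwargs
```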

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(triage): bugbot — drop dead gh_json and fix --optout-label append-with-default

- Removed the unused gh_json helper (bugbot low-severity dead code).
- Replaced argparse `action="append", default=[...]` with default=None
  + DEFAULT_OPTOUT_LABELS fallback. The mutable-default + append combo
  silently APPENDS to the canonical defaults instead of replacing them,
  so --optout-label could not actually scope the opt-out list.
- Added tests covering both the canonical default and the
  flag-replaces-defaults behavior.
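
The append-with-default behaviour can be shown in a few lines. The label values below are illustrative and the function names are hypothetical; only the `--optout-label` flag and `DEFAULT_OPTOUT_LABELS` name come from the commit message:

```python
import argparse

DEFAULT_OPTOUT_LABELS = ["do not close", "wip"]  # illustrative values


def parse_buggy(argv):
    parser = argparse.ArgumentParser()
    # With action="append", user values are added to a copy of the default list.
    parser.add_argument(
        "--optout-label", action="append", default=list(DEFAULT_OPTOUT_LABELS)
    )
    return parser.parse_args(argv).optout_label


def parse_fixed(argv):
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from "flag given".
    parser.add_argument("--optout-label", action="append", default=None)
    labels = parser.parse_args(argv).optout_label
    return labels if labels is not None else list(DEFAULT_OPTOUT_LABELS)
```

With `["--optout-label", "keep"]`, `parse_buggy` returns the defaults plus `"keep"`, while `parse_fixed` returns only `["keep"]`.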

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(triage): bugbot — tighten linked-issue regex, fail-safe author_association, fix empty TRIAGE_MODEL

Three independent bugbot findings against triage_with_llm.py:

1. LINKED_ISSUE_PATTERN included weak keywords (`see`, `ref`,
   `addresses`) so casual mentions like "See #1234 for context" were
   short-circuited to pass-linked-issue without ever calling the LLM —
   contradicting the prompt's own "a bare issue number without a closing
   keyword counts only if it's clearly the related issue (not a passing
   mention)" rubric. Limit the regex to GitHub's documented PR-closing
   keywords (fixes/fix/fixed/closes/close/closed/resolves/resolve/resolved).

2. is_internal_contributor() treated an empty/missing author_association
   as external (eligible for the destructive close path), while the sibling
   is_external_pr_author() in close_low_quality_prs.py fail-safes the same
   case as internal. Align the two so a partial/unknown GitHub response can
   never make a PR eligible for auto-close.

3. argparse `default=os.environ.get("TRIAGE_MODEL", DEFAULT_MODEL)` returns
   the empty string when GitHub Actions exposes an unset repo variable as
   an empty-string env var (the optional vars.TRIAGE_MODEL case in the
   workflow). Use `os.environ.get(...) or DEFAULT_MODEL` so empty -> default,
   matching the existing OPENAI_BASE_URL pattern.
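
For finding 1, a sketch of a regex limited to the closing keywords listed above. The exact pattern in `triage_with_llm.py` is not shown in this PR, so the expression and the name `has_linked_issue` are assumptions:

```python
import re

# Only closing keywords (fix/fixes/fixed, close/closes/closed, resolve/resolves/resolved);
# weak keywords such as "see", "ref", "addresses" are deliberately absent.
LINKED_ISSUE_RE = re.compile(
    r"\b(?:fix(?:e[sd])?|close[sd]?|resolve[sd]?)\b[\s:]*"
    r"(?:#\d+|https://github\.com/[\w.-]+/[\w.-]+/issues/\d+)",
    re.IGNORECASE,
)


def has_linked_issue(body):
    """True if the PR body links an issue with a closing keyword."""
    return LINKED_ISSUE_RE.search(body or "") is not None
```

A body such as "See #1234 for context" no longer matches, so it would fall through to the LLM judge instead of short-circuiting to a pass.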

Tests:
- Casual mentions now must fall through to the LLM (parametrized);
  added an orchestration test ensuring "See #1234" reaches the judge.
- Empty/missing author_association now fails safe (parametrized).
- Empty TRIAGE_MODEL env var falls back to DEFAULT_MODEL; explicit
  TRIAGE_MODEL is still honored.
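
The empty-variable case from finding 3 can be illustrated as follows. The helper names are hypothetical, and `DEFAULT_MODEL` is set to the value named in the earlier commit:

```python
import os

DEFAULT_MODEL = "gpt-5.4-mini"


def resolve_model_buggy(environ=os.environ):
    # An env var that exists but is empty wins over the default here.
    return environ.get("TRIAGE_MODEL", DEFAULT_MODEL)


def resolve_model_fixed(environ=os.environ):
    # `or` treats both a missing and an empty value as "use the default".
    return environ.get("TRIAGE_MODEL") or DEFAULT_MODEL
```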

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(workflows): bugbot — gate Agent Shin --close on '= true' not '!= false'

The PR and issue Agent Shin workflows gated the destructive --close
flag with [ "${DISPATCH_CLOSE:-false}" != "false" ]. That pattern
treats anything other than the literal string "false" as enabling
closure — "True", "yes", "1", typos, accidental whitespace, etc.
The workflow_dispatch input UI is a 'true'/'false' choice dropdown so
the form is constrained, but the API (`gh workflow run -f close=...`)
accepts any string, and a CI cron / external invoker passing a
non-canonical truthy value would have silently enabled real
contributor PR closures.

Mirror the sibling Greptile closer's [ "${CLOSE_FLAG}" = "true" ]
pattern: only the EXACT string "true" enables --close; every other
value (including the unset/empty default) resolves to dry-run. This is
the fail-safe philosophy applied everywhere else in this PR.
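
The actual gate is a shell test inside the workflow. The following is a Python analogue of the two comparisons, written only to show which inputs each form lets through; the function names are hypothetical:

```python
def close_enabled_fail_open(dispatch_close):
    # Analogue of [ "${DISPATCH_CLOSE:-false}" != "false" ]:
    # unset or empty falls back to "false", everything else enables closing.
    return (dispatch_close or "false") != "false"


def close_enabled_fail_safe(close_flag, agent_shin_enabled):
    # Analogue of the '= "true"' form: only the exact string "true" counts,
    # and the AGENT_SHIN_ENABLED kill switch must also be exactly "true".
    return close_flag == "true" and agent_shin_enabled == "true"
```

Values such as `"True"`, `"yes"`, or `" false"` enable the fail-open gate but leave the fail-safe gate in dry-run.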

Added tests/test_litellm/test_github_triage_workflows.py with two
parametrized invariants:
  1. The destructive gate uses '= "true"' for its env-var
     comparison (either bare '${ENV}' or '${ENV:-false}' form
     accepted), and never the fail-open '!= "false"' pattern.
  2. Every destructive gate is also gated on AGENT_SHIN_ENABLED being
     "true" — either by entering the close branch on '=' or by
     bailing out early on '!=' — so flipping the repo variable off is
     a true kill switch regardless of per-run inputs.

Manually verified the test fails on the buggy '!= "false"' pattern and
passes on the fix, so it would have caught the regression at PR time.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): close any PR (incl. drafts, any age); add @agent-shin reconsider flow

Follow-up to PR #28117. Three behavior changes + one new workflow,
addressing the team's concerns on the original review:

1) Apply auto-close to ALL open PRs, not just those over a week old.

   - close_low_quality_prs.py: --min-age-days default flipped from 7 to
     0. The flag is preserved as an opt-in safety net for one-off
     backfill runs that want to spare very-young PRs, but the daily
     scheduled sweep now closes external-author PRs as soon as Greptile
     scores them <4/5.
   - close_low_quality_prs.yml: workflow_dispatch input default also
     flipped to 0; doc comments updated.

2) Apply auto-close to draft PRs too.

   - close_low_quality_prs.py: removed the skip-draft branch in
     evaluate_pr. Drafts are NOT a free pass — the team's intent is
     'open PR count == PRs internal collaborators need to action on',
     so a draft Greptile scored 2/5 still belongs in the closed bucket.
     Authors who genuinely need a long-lived draft can attach the 'wip'
     opt-out label, which is unchanged.
   - The 'skip-draft' action is gone; the 'wip' label still skips.

3) Address the 'OSS contributors cannot reopen a bot-closed PR' wrinkle.

   GitHub does NOT let an external (non-write-access) contributor
   reopen a PR that was closed by a bot or maintainer (long-standing
   limitation). The original PR's close-comments told contributors to
   'Reopen the PR — I'll re-evaluate automatically', which is broken
   for the very audience this triage targets. Two changes:

   a) Reword every close-comment (Greptile sweep + Agent Shin PR
      close + Agent Shin issue close + PR template) to recommend:
        - Open a new PR with the updated branch (primary path).
        - Or comment '@agent-shin reconsider' on the closed PR for a
          re-evaluation that, on pass, reopens the PR via the bot's
          GH_TOKEN write access.

   b) Add the @agent-shin reconsider workflow:
        - .github/workflows/triage_reconsider.yml: new
          'issue_comment'-triggered workflow. Authorizes only the
          PR/issue author or an internal collaborator
          (OWNER/MEMBER/COLLABORATOR), gated via a step output so
          unauthorized commenters never reach the destructive steps.
          Globally gated on AGENT_SHIN_ENABLED='true' (positive form,
          matching the test_github_triage_workflows guardrail
          patterns).
        - triage_with_llm.py: --reconsider mode. On a closed PR/issue,
          re-runs the LLM judge (or linked-issue regex short-circuit)
          and:
            - on pass: reopens via reopen_pr/reopen_issue + posts a
              'Re-evaluated and reopened' comment.
            - on fail: leaves closed and posts a 'still missing X'
              comment so the contributor can iterate again.
          Reconsider-on-open is a no-op ('skip-not-closed').
          Internal-author + bot-account skips still take priority over
          reconsider.

4) Greptile-on-closed-PRs question: the team asked whether Greptile can
   re-review a closed PR. Greptile's docs don't address this and we
   shouldn't promise behavior we can't verify, so the new close-comment
   wording does NOT instruct contributors to 're-request greptile on
   the closed PR'. Instead it points them at the new-PR path (which
   Greptile definitely reviews) or the @agent-shin reconsider trigger
   (which re-runs the LiteLLM-side rubric judge, not Greptile).

Tests: 93 passing (was 59).

  - test_github_close_low_quality_prs.py: replaced 'skip drafts' test
    with 'closes drafts when score is low' + 'closes brand-new PR when
    min_age=0' + 'no skip when min_age=0'. The 'skip too young'
    assertion is preserved as opt-in.
  - test_github_triage_with_llm.py: 6 new TestTriageOrchestration cases
    for reconsider mode (skip-not-closed on open, reopen on pass,
    still-failing comment on fail, linked-issue short-circuit reopen,
    skip internal author in reconsider, reopen-issue on pass) + a new
    TestCloseCommentText class that pins the user-facing 'open a new
    PR' + '@agent-shin reconsider' wording.
  - test_github_triage_workflows.py: added triage_reconsider.yml to
    the destructive-gate guardrail table; AGENT_SHIN_ENABLED is its
    own destructive gate (no separate per-run flag needed).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(triage): pin safe behavior for curly braces in PR/issue title+body

Adds regression tests covering the bugbot high-severity finding that
str.format() would crash on user-supplied content containing { or }.
Empirically str.format() does NOT re-parse interpolated values — only
the template literal is scanned for replacement fields — so the bug
does not exist in the current code, but pinning the safe behavior
prevents a future templating change from silently reintroducing it.
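
The claim about `str.format()` can be checked directly. The template and function names below are simplified stand-ins, not the real prompt builders:

```python
PROMPT_TEMPLATE = "Title: {title}\nBody:\n{body}\n"


def render(title, body):
    # Only PROMPT_TEMPLATE is scanned for replacement fields; braces inside
    # the substituted values are copied through untouched.
    return PROMPT_TEMPLATE.format(title=title, body=body)


def template_parses(template):
    """Return False if the template literal itself has unbalanced braces."""
    try:
        template.format(title="", body="")
        return True
    except (ValueError, KeyError, IndexError):
        return False
```

A stray brace is only a problem when it appears in the template literal, which is why the risk sits with future template edits and not with user-supplied titles or bodies.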

Also pins the dedented prompt shape (no leading 8-space indentation on
template lines) so a future change to the build_*_prompt functions can't
silently regress the LLM judge prompt format on multi-line bodies.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(triage): bugbot — reconsider dry-run + bot-closed guard + rate limit

Address three Greptile/veria-ai concerns on the @agent-shin reconsider
flow:

1. **Reconsider had no dry-run path.** The previous reconsider mode
   ignored `--close` and always posted comments + reopened on a pass.
   A local operator running
   `python triage_with_llm.py --reconsider --pr N` would silently
   take destructive GitHub actions with no way to preview. Reconsider
   now honors `close=False` the same way regular triage does and
   returns `would-reopen` / `would-reconsider-still-failing` for
   step-summary rendering.

2. **Reconsider could reopen maintainer-closed PRs/issues** (Medium
   security finding from veria-ai). The workflow only checked that the
   commenter was authorized — it did NOT check that the most recent
   close was performed by Agent Shin. A contributor could comment
   `@agent-shin reconsider` on a PR a maintainer closed for non-rubric
   reasons (duplicate, security report, design rejection) and have the
   bot reopen it. Add `was_closed_by_agent_shin()` which inspects the
   issue events API for the most recent `closed` actor and only
   permits reopen when that actor matches the configured bot login
   (default `github-actions[bot]`, overridable via env). Fail-closed
   on missing events.

3. **No rate-limiting on the reconsider trigger.** Every
   `@agent-shin reconsider` comment burns CI minutes + an OpenAI API
   call. Add a 10-minute cooldown via
   `seconds_since_last_reconsider_verdict()` which greps the issue's
   comment list for the bot's own verdict marker
   (`<!-- agent-shin:reconsider-verdict -->`). Inside the window the
   triage returns `skip-rate-limited` and the LLM never runs.
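
A hedged sketch of the two guards from items 2 and 3, written as pure functions over already-fetched lists. The function names, signatures, and the assumption that events and comments are dicts shaped like GitHub REST API responses (`event`, `actor.login`, `user.login`, `body`, `created_at`) are for illustration and may not match the script:

```python
from datetime import datetime, timezone

BOT_LOGIN = "github-actions[bot]"
VERDICT_MARKER = "<!-- agent-shin:reconsider-verdict -->"
COOLDOWN_SECONDS = 10 * 60


def parse_ts(value):
    """Parse a GitHub-style ISO 8601 timestamp such as 2026-06-01T12:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def closed_by_bot(events, bot_login=BOT_LOGIN):
    """True only if the most recent 'closed' event was performed by the bot."""
    closes = [e for e in events if e.get("event") == "closed"]
    if not closes:
        return False  # fail closed when there is no close event to inspect
    latest = max(closes, key=lambda e: parse_ts(e["created_at"]))
    return (latest.get("actor") or {}).get("login") == bot_login


def seconds_since_last_verdict(comments, now, bot_login=BOT_LOGIN):
    """Seconds since the bot's latest verdict comment, or None if there is none."""
    stamps = [
        parse_ts(c["created_at"])
        for c in comments
        if (c.get("user") or {}).get("login") == bot_login
        and VERDICT_MARKER in (c.get("body") or "")
    ]
    return (now - max(stamps)).total_seconds() if stamps else None


def in_cooldown(comments, now, bot_login=BOT_LOGIN):
    elapsed = seconds_since_last_verdict(comments, now, bot_login)
    return elapsed is not None and elapsed < COOLDOWN_SECONDS
```

Requiring both the bot login and the marker means a contributor who pastes the marker into their own comment cannot reset or trigger the cooldown.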

Workflow update:
- `triage_reconsider.yml` now passes `--close` only when
  `AGENT_SHIN_ENABLED=true`, matching the pattern of
  `triage_pr_with_llm.yml`. The script runs in both states so the
  verdict still appears in the step summary for QA.

Tests:
- Add 5 reconsider safety tests: dry-run for pass / fail / linked-issue
  short-circuit, bot-closed-guard refusal on maintainer close,
  rate-limit refusal inside the cooldown window, and cooldown-elapsed
  acceptance.
- Add unit tests for `was_closed_by_agent_shin` (bot / maintainer /
  missing actor / env-override) and
  `seconds_since_last_reconsider_verdict` (no marker / multiple
  markers / non-bot comment with marker / bot comment without marker).
- Pin the `<!-- agent-shin:reconsider-verdict -->` marker in both
  reopen and still-failing comments — dropping it would silently
  break the cooldown.

Existing reconsider tests updated to pass `close=True` (the
production path now) + stub the new guards via
`_stub_reconsider_guards`. 112 tests pass (was 93).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): 1-day grace period before close + SwiftWinds immediate-close bypass

- Add a 24-hour grace window between the first low-quality detection
  and the actual auto-close. The first detection posts a warning
  comment that explicitly says "You have 1 day to address this before
  this PR is auto-closed" and points the contributor at:
    * `@agent-shin reconsider` to request another look (and re-open)
    * `@greptileai` to request a fresh Greptile review — works
      even after the PR is closed
- Both `triage_with_llm.py` (LLM judge) and `close_low_quality_prs.py`
  (Greptile-score closer) share the same `<!-- agent-shin:grace-warning -->`
  HTML marker so a warning posted by either path is recognized by both.
- Add IMMEDIATE_CLOSE_LOGINS = {swiftwinds} to bypass BOTH the grace
  period AND the dry-run / AGENT_SHIN_ENABLED gating. SwiftWinds is the
  user's personal account (no push permissions to litellm) used to
  dogfood the bot; user explicitly asked: "For SwiftWinds, just close
  immediately. Faster iteration that way."
- Update the standard close comments to mention that `@greptileai`
  works even after the PR is closed.
- Add 23 new tests covering: warn-grace on first detection, skip during
  grace window, close after grace expires, SwiftWinds bypass (case
  insensitive, with close=False, no random-login false positives), the
  grace-warning text invariants, and the SwiftWinds entry in the
  IMMEDIATE_CLOSE_LOGINS constant.
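
The warn, wait, then close flow described above can be sketched as a small decision function. The action strings and the name `grace_action` are hypothetical; the constants reuse names that appear in these commit messages:

```python
GRACE_COMMENT_MARKER = "<!-- agent-shin:grace-warning -->"
GRACE_PERIOD_SECONDS = 24 * 60 * 60
IMMEDIATE_CLOSE_LOGINS = {"swiftwinds"}


def grace_action(author_login, warning_age_seconds):
    """Decide what to do with a low-quality PR.

    warning_age_seconds is None when no comment containing
    GRACE_COMMENT_MARKER has been posted yet.
    """
    if author_login.lower() in IMMEDIATE_CLOSE_LOGINS:
        return "close"          # bypasses the grace period entirely
    if warning_age_seconds is None:
        return "warn-grace"     # first detection: post the warning
    if warning_age_seconds < GRACE_PERIOD_SECONDS:
        return "skip-in-grace"  # warning already posted, still waiting
    return "close"              # grace window has expired
```

Because the decision depends only on the presence and age of a marked comment, either script can post the warning and the other can still recognize it.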

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix: skip grace-period text in close comment for IMMEDIATE_CLOSE_LOGINS

For PRs from IMMEDIATE_CLOSE_LOGINS (e.g. swiftwinds), evaluate_pr
returns 'close' immediately without ever posting a grace warning, so
the close comment should not reference a 1-day grace period.

Make close_pr take a grace_period_elapsed flag, default True, and
pass False from the main loop when the close path was the
immediate-close branch.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(close-low-quality-prs): report actual closes in dry-run summary

IMMEDIATE_CLOSE_LOGINS PRs are closed even when the global --close flag is
not set, but the summary used the global dry-run flag to choose between
'would close' and 'closed'. Split the count so operators can see both
actual closures and dry-run would-be closures.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* chore(triage): vendor Agent Shin (#28117) onto demo branch

Brings the Agent Shin OSS-triage scripts, workflows, issue/PR templates, and
tests from PR #28117 onto this branch so the new review-gate feature and its
end-to-end demo are self-contained and runnable in CI.

https://claude.ai/code/session_01XyyWa8t2VYmoGd6mKMEqkZ

* feat(triage): add "ready for review" label lifecycle to Agent Shin

Adds review_gate(), a state machine that keeps a `ready for review` label in
sync with whether an external PR clears BOTH gates — the LLM rubric and
Greptile's most recent confidence score:

- pass (untagged)            -> add label + "ready for review" / "all clear" comment
- pass (already tagged)      -> no-op (idempotent across re-runs)
- regress (Greptile < 4/5 or QA proof removed) -> remove label + "what's missing"
  comment, PR stays open
- recover after a regression -> "all clear again" comment + re-add the label
- fail & untagged, < 24h old -> one-time "what's missing" notice (grace window)
- fail & untagged, > 24h old -> close + comment (reopen via @agent-shin reconsider)

The label itself is the persisted state, so comments fire only on transitions
(never on every scheduled run). All side effects are gated behind --close, so
the dry-run contract matches the existing triage flow. Lifecycle comments use
hidden HTML markers and deliberately avoid the auto-close marker so they never
trip the reconsider provenance check.
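
The transitions listed above can be approximated by a pure decision function. This is a sketch under assumptions: the names and action strings are hypothetical, a missing Greptile score is treated as not passing, and it leaves out the one-time de-duplication of notices as well as any special handling of a previously regressed PR on later failing runs, both of which may differ in `review_gate()`:

```python
def clears_both_gates(rubric_ok, greptile_score, min_score=4):
    """Both gates: the LLM rubric and the latest Greptile score (assumed None = fail)."""
    return bool(rubric_ok) and greptile_score is not None and greptile_score >= min_score


def review_gate_action(passes, has_label, previously_regressed, age_hours, grace_hours=24):
    """Map the current PR state to one lifecycle action."""
    if passes:
        if has_label:
            return "noop"                 # already tagged: idempotent
        if previously_regressed:
            return "all-clear-relabel"    # recovered after a regression
        return "label-ready"              # first pass: add label + comment
    if has_label:
        return "regress-unlabel"          # remove label, PR stays open
    if age_hours < grace_hours:
        return "within-grace-notice"      # young PR: notice only
    return "close"                        # failing, untagged, past grace
```

Using the label as the persisted state is what lets the function stay free of any stored history beyond the regression flag.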

Relocates the shared Greptile helpers (extract_greptile_score, SCORE_PATTERN,
GREPTILE_BOT_LOGINS, parse_iso8601) into triage_with_llm.py so the daily sweep
and the review gate read the score through one implementation, and adds the
review_gate.yml workflow (dry-run unless AGENT_SHIN_ENABLED=true) plus 18 unit
tests covering every branch and a full pass->regress->recover cycle.

https://claude.ai/code/session_01XyyWa8t2VYmoGd6mKMEqkZ

* Port review-gate feature from #28758 onto #28147 triage scripts

Adds the "ready for review" label lifecycle (originally PR #28758) on top
of #28147's refactored triage_with_llm.py. The original commit was
authored against an older snapshot of #28117 and could not be applied
cleanly, so the additions were re-applied surgically:

- New constants: READY_FOR_REVIEW_LABEL, DEFAULT_GRACE_DAYS,
  DEFAULT_MIN_GREPTILE_SCORE, READY/REGRESSED/WITHIN_GRACE markers,
  GREPTILE_BOT_LOGINS, SCORE_PATTERN, AGENT_SHIN_AUTO_CLOSE_MARKER.
- New helpers: add_label, remove_label, extract_greptile_score,
  parse_iso8601 (the latter two mirrored from close_low_quality_prs.py
  so the daily sweep and the review gate read the score through the
  same logic).
- New comment formatters: format_ready_for_review_comment,
  format_all_clear_comment, format_regression_comment,
  format_within_grace_comment.
- New entry point: review_gate() implementing the pass/regress/recover
  state machine, with the label itself acting as persisted state so
  transition comments fire only on actual transitions.
- main() learns --review-gate, --grace-days, --min-greptile-score and
  dispatches to review_gate() when the flag is set.

Verified via tests/test_litellm/test_github_review_gate.py (18 tests)
and the existing triage suites (144 more) — all 162 pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* agent_shin: extract shared constants/helpers; cover review_gate.yml in guardrail tests

Bug 1: `triage_with_llm.py` and `close_low_quality_prs.py` each defined
their own copies of `extract_greptile_score`, `parse_iso8601`,
`GREPTILE_BOT_LOGINS`, `SCORE_PATTERN`, `GRACE_COMMENT_MARKER`,
`GRACE_PERIOD_SECONDS`, `IMMEDIATE_CLOSE_LOGINS`, and
`AGENT_SHIN_DEFAULT_BOT_LOGIN`. The comments explicitly said the two
copies had to stay in sync, but nothing enforced it. A future change to
one (e.g. extending `SCORE_PATTERN` for a new Greptile output format)
would silently diverge from the other and the daily sweep and the LLM
judge would disagree on which PRs have low scores.

Extract these to `.github/scripts/agent_shin_shared.py` and re-export
them from each script so the existing test attribute access
(`triage_module.GRACE_COMMENT_MARKER`, etc.) keeps working without
any test changes.

Bug 2: `review_gate.yml` is a destructive workflow (close PRs, add/remove
labels, post comments) with the same gating philosophy as the others
(`AGENT_SHIN_ENABLED = "true"` + a per-run `CLOSE_FLAG = "true"`),
but it was missing from `DESTRUCTIVE_GATE_ENV` in the guardrail tests.
Add it so a future regression (e.g. flipping to `!= "false"`) is
caught by the same parameterized invariants as every other workflow.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* agent_shin: fix bug bundle (gated LLM key, author-filtered marker dedup, dedup gh/grace helpers)

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* agent_shin: fix review_gate close-after-regression and case-insensitive label match

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* feat(triage): add one-shot 7-day heads-up sweep for Agent Shin rollout

Adds a rollout-day workflow that comments on every open external PR/issue
that the new triage bot WOULD auto-close, giving contributors 7 days to
fix their description before any destructive action runs.

Why now: merging this PR enables Agent Shin in dry-run. The follow-up
"enact" PR (next Monday) flips the destructive paths on. Without this
heads-up, contributors would get a close-comment on day 8 with no prior
warning. The heads-up names the cutoff date, lists the rubric, calls out
each PR/issue's specific missing pieces, and explains the recovery paths
(@agent-shin reconsider for PRs, edit + reopen for issues).

Files
- .github/scripts/_agent_shin_actions.py — thin maybe_post_comment /
  maybe_close_* / maybe_add_label / etc. wrappers. Each is a single
  `if dry_run: log; return; else: call_through()` so a dry-run preview
  differs from the real run in exactly one call site per mutation. The
  call-through goes via `triage_with_llm.<name>` (module-qualified) so
  monkeypatching the underlying function in tests is reflected here.
- .github/scripts/triage_rollout_heads_up.py — the sweep. Iterates every
  open PR + issue via `gh pr list` / `gh issue list`, runs the future
  rubric (review_gate for PRs, triage(kind="issue") for issues), and
  posts the heads-up on any item that would be auto-closed. Idempotent
  via a `<!-- agent-shin:rollout-heads-up -->` marker. Defaults to dry-
  run; --close opts in to real posts. --close-on overrides the cutoff
  date (defaults to today + 7 days).
- .github/workflows/triage_rollout_heads_up.yml — one-shot workflow.
  Triggers on push to litellm_internal_staging filtered to the script
  path (fires on rollout merge) plus workflow_dispatch with a dry_run
  input that defaults to "true" for safe manual re-runs.
- tests/test_litellm/test_triage_rollout_heads_up.py — 28 unit tests
  covering: the dry-run wrappers (each maybe_* gates correctly), the
  _would_be_closed predicate for PR vs. issue results, the comment
  formatter (cutoff/rubric/marker/recovery wording), per-item dispatch
  (skip-not-open, skip-internal-author, skip-already-notified,
  skip-passing, would-post/posted), and the sweep loop end-to-end.
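
The dry-run wrapper pattern described for _agent_shin_actions.py can be
sketched as below. This is an illustration, not the file's contents: the
real wrapper calls through `triage_with_llm.<name>`, while this sketch
takes the underlying function as a `post_comment` argument (an assumption)
so it stays self-contained.

```python
import logging
from typing import Callable

logger = logging.getLogger("agent_shin")


def maybe_post_comment(
    post_comment: Callable[[str, int, str], None],
    repo: str,
    number: int,
    body: str,
    *,
    dry_run: bool,
) -> bool:
    """Post a comment unless dry_run is set. Returns True on a real post."""
    if dry_run:
        # Dry run: log what would happen and make no GitHub mutation.
        logger.info("[dry-run] would comment on %s#%d", repo, number)
        return False
    post_comment(repo, number, body)
    return True
```

The dry-run and real paths diverge at exactly one branch, which is what
makes a preview trustworthy.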

Local preview (no GitHub mutations):
    python3 .github/scripts/triage_rollout_heads_up.py --repo BerriAI/litellm

Real run (what the workflow does):
    python3 .github/scripts/triage_rollout_heads_up.py --repo BerriAI/litellm --close

TODO: replace the placeholder ROLLOUT_BLOG_URL with the canonical
docs URL once the litellm-docs PR ships.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: gate reconsider workflow OPENAI_API_KEY + remove dead actions wrappers

- Mirror sibling Agent Shin workflows by only exposing OPENAI_API_KEY in
  triage_reconsider.yml when vars.AGENT_SHIN_ENABLED == 'true'. Previously
  the secret was unconditionally exposed, so any PR/issue author could
  trigger paid LLM calls by commenting '@agent-shin reconsider' even while
  the bot was supposed to be in dry-run.
- Remove the six unused dry-run wrappers (maybe_close_pr, maybe_close_issue,
  maybe_reopen_pr, maybe_reopen_issue, maybe_add_label, maybe_remove_label)
  from _agent_shin_actions.py — only maybe_post_comment is used by rollout
  scripts. Drop the associated tests that exercised the now-removed
  functions.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: address triage script edge cases

- triage_rollout_heads_up.py: replace the %-d strftime specifier (a platform
  extension that Windows does not support) with portable day formatting so
  the script doesn't crash there.
- close_low_quality_prs.py: skip malformed JSON lines in fetch_pr_comments
  instead of letting one bad line abort the daily sweep, matching the
  pattern in triage_with_llm._iter_paginated_json.
- triage_with_llm.py: move has_linked_issue short-circuit before
  build_pr_prompt to avoid unnecessary prompt construction on PRs that
  link an issue.
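
The first two fixes can be sketched roughly as follows. The helper names
`format_cutoff` and `iter_json_lines` are hypothetical, and the date layout
is an assumption; only the techniques (building the day from `date.day`,
skipping lines that fail to parse) come from the notes above.

```python
import json
from datetime import date
from typing import Any, Iterator


def format_cutoff(d: date) -> str:
    """Format a date such as 'May 7, 2026' without the %-d specifier."""
    # d.day is already an int with no zero padding, on every platform.
    return f"{d.strftime('%B')} {d.day}, {d.year}"


def iter_json_lines(text: str) -> Iterator[Any]:
    """Yield one parsed object per non-empty line, skipping malformed lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # one bad line should not abort the whole sweep
```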

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(scripts): per-PR error isolation and limit grace warnings in close_low_quality_prs

- Wrap per-PR processing in try/except so a transient GitHub API failure
  on one PR no longer aborts the entire daily sweep (mirrors the pattern
  already used in triage_rollout_heads_up.py).
- Have --limit bound *all* destructive write actions (closures and grace
  warnings combined), not just closures. Prevents a backlog of newly
  failing PRs from flooding contributors with comments in a single run.
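
A rough sketch of both behaviors together, assuming a hypothetical
`process_one` callback that returns "closed", "warned", or "skipped" for a
PR number. The function name `sweep` and the outcome strings are
illustrative only.

```python
from typing import Callable, Iterable, List, Tuple


def sweep(
    prs: Iterable[int],
    process_one: Callable[[int], str],
    limit: int,
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Process PRs until `limit` write actions (closes + warnings) are taken."""
    results: List[Tuple[int, str]] = []
    errors: List[int] = []
    writes = 0
    for number in prs:
        if writes >= limit:
            break
        try:
            outcome = process_one(number)
        except Exception:
            # Isolate the failure to this PR and keep sweeping.
            errors.append(number)
            continue
        results.append((number, outcome))
        if outcome in ("closed", "warned"):
            writes += 1  # both kinds of write count against the limit
    return results, errors
```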

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(agent-shin): remove 1000-PR cap on bulk sweeps; sweep entire backlog

Both bulk-sweep scripts hardcoded `gh {pr,issue} list --limit 1000`, and gh
lists newest-first — so the OLDEST ~900 PRs and ~380 issues were silently
dropped. That's exactly the stale backlog the daily closer and one-shot
rollout heads-up exist to catch.

Extract a single `list_open_items(kind, *, repo, fields)` helper into
`agent_shin_shared.py` with `GH_LIST_ALL_LIMIT = 100_000` — a ceiling far
above any realistic open backlog so gh paginates until the queue is
exhausted. `fetch_open_prs` and `_list_open_numbers` both delegate to it,
so the limit lives in exactly one place going forward.
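
As an illustration, the command such a helper would run might be built
like this. `build_list_command` is a hypothetical name and the
`--state open` filter is an assumption; the sketch only constructs the
argv and does not invoke gh.

```python
from typing import List, Sequence

GH_LIST_ALL_LIMIT = 100_000  # ceiling far above any realistic open backlog


def build_list_command(kind: str, *, repo: str, fields: Sequence[str]) -> List[str]:
    """Build the `gh pr list` / `gh issue list` argv for an exhaustive sweep."""
    if kind not in ("pr", "issue"):
        raise ValueError(f"kind must be 'pr' or 'issue', got {kind!r}")
    return [
        "gh", kind, "list",
        "--repo", repo,
        "--state", "open",
        "--limit", str(GH_LIST_ALL_LIMIT),
        "--json", ",".join(fields),
    ]
```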

Verified live against BerriAI/litellm:
- `fetch_open_prs` -> 1981 PRs (was 1000)
- `_list_open_numbers(issue)` -> 1382 issues (was 1000)
- `_list_open_numbers(pr)` -> 1981 PRs (was 1000)

Adds 7 regression tests asserting the new limit is passed, the dedicated
`gh {pr,issue} list` command + fields are used per kind, bad kind raises
ValueError, and both callers delegate to the shared helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(agent-shin): require non-mocked end-to-end QA proof for PR pass

The PR rubric previously passed any PR with a linked issue, regardless
of whether it showed the fix actually working. A spot-check found
21/25 recent external PRs passing, including ones that linked an issue
but provided zero QA evidence.

Tighten the rubric so a pass now requires BOTH:

  (1) CONTEXT — a linked issue OR a clear problem description with
      expected-vs-actual behavior.
  (2) END-TO-END QA PROOF — at least one of:
      (a) screenshot(s) of the fix working,
      (b) screen recording / video,
      (c) specific commands actually run, paired with their real
          output, against the real system.

Mocked unit tests, generic 'I tested it' claims, 'all tests pass'
without output, and the linked issue itself are explicitly excluded
from QA proof.

Also add 'qa_proof_type' to the JSON schema so the per-PR report
surfaces which kind of proof (or 'none') the judge saw.
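
Read as a predicate over the judge's JSON verdict, the tightened rule looks
roughly like this. Only `qa_proof_type` is named above; the other field
names and the accepted proof labels are assumptions made for the sketch.

```python
from typing import Any, Mapping

# Hypothetical labels for the three accepted kinds of proof.
ACCEPTED_QA_PROOF = {"screenshot", "video", "commands_with_output"}


def pr_passes(verdict: Mapping[str, Any]) -> bool:
    """A PR passes only with BOTH context and end-to-end QA proof."""
    has_context = bool(verdict.get("has_linked_issue")) or bool(
        verdict.get("has_problem_description")
    )
    # A missing qa_proof_type is treated like 'none'.
    has_proof = verdict.get("qa_proof_type", "none") in ACCEPTED_QA_PROOF
    return has_context and has_proof
```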

Re-sampling the same 25 recent external PRs shifts the verdict
distribution from 21 pass / 4 fail to 4 pass / 21 fail, with zero
prior fails now passing. The stricter rule catches PRs that ship
only with unit-test claims and no real integration evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-shin): link blog explainer from every action-required bot comment

Adds "What's this and why am I getting it?" links to
docs.litellm.ai/blog/agent-shin-triage from the four comments contributors
actually read when something went wrong: PR close, PR grace warning, issue
close, issue grace warning. PR comments also link the rubric section
directly from the QA-proof bullet so contributors can self-serve "what
counts as proof" without pinging a maintainer.

Pins the new guarantees in tests: blog link must appear in all four
comments, and the PR close comment must continue to flag mocked-dependency
unit tests as insufficient proof.

The linked blog post is in BerriAI/litellm-docs PR #240; the URL will 404
until that lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review_gate): raise sweep limit from 1000 to 100000 to match GH_LIST_ALL_LIMIT

gh lists newest-first, so capping at 1000 silently drops the oldest open
PRs — exactly the stale ones the daily sweep is meant to reconcile. Use
the same ceiling as agent_shin_shared.GH_LIST_ALL_LIMIT so the workflow
sees the entire backlog.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* Fix three Agent Shin triage edge cases

- review_gate: expire the regression-marker short-circuit after grace_days
  so PRs that were regressed and then abandoned can eventually be closed.
- review_gate: when the rubric short-circuits to pass via the linked-issue
  regex but Greptile drags the PR below the bar, replace the synthetic
  'LLM was not called' explanation with the real Greptile shortfall so
  regression / close comments are not misleading.
- triage_rollout_heads_up._comments_have_marker: drop the unused 'kind'
  parameter and filter by bot author so a contributor quoting the
  heads-up via 'Quote reply' cannot trick the idempotency check, matching
  the pattern in triage_with_llm._has_marker.
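
The author-filtered idempotency check can be sketched as below. The
comment shape (`author.login`, `body`) and the `bot_logins` parameter are
assumptions for illustration; the marker string is the one quoted earlier
for the heads-up sweep.

```python
from typing import Any, Iterable, Mapping, Set

HEADS_UP_MARKER = "<!-- agent-shin:rollout-heads-up -->"


def comments_have_marker(
    comments: Iterable[Mapping[str, Any]],
    bot_logins: Set[str],
    marker: str = HEADS_UP_MARKER,
) -> bool:
    """True only if a comment written by the bot itself carries the marker."""
    for comment in comments:
        login = (comment.get("author") or {}).get("login", "")
        # A contributor quoting the bot's comment copies the marker text,
        # so the body alone is not enough; the author must match too.
        if login in bot_logins and marker in (comment.get("body") or ""):
            return True
    return False
```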

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: pass min_greptile_score through to ready-for-review comment text

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* feat(agent-shin): warmer triage comments — bullet-train emoji, 'what you got right' section, softer 'park this for later' framing

User feedback on the auto-triage comments contributors will see:

1. Tone — the previous 'You have 1 day to address this before this PR is
   auto-closed' framing reads as an ultimatum. Replace with: 'If the
   description isn't updated in the next 1 day, I'll auto-close this PR.
   That's not us saying we don't care about the change — we want the
   open-PR list to mirror what a maintainer can act on right now, so
   contributors don't get lost in a backlog. A closed PR is a soft "park
   this for later," not a rejection. Take your time.'

2. Positive feedback — the previous comments only listed what was missing.
   Now every close + grace-warning comment opens with a 'What you got
   right:' section rendered from the judge's per-field flags. Contributors
   see a checkmark for everything they got right (linked issue, problem
   description, expected/actual, QA proof for PRs; runnable repro,
   screenshot/log, expected/actual, motivation+example for issues) before
   the gaps. The block is omitted entirely when nothing is present so
   we never render 'What you got right: (nothing).'

3. Reconsider trigger — the previous grace warning told contributors to
   comment '@agent-shin reconsider' during the grace window. They don't
   need to — the bot re-checks on every sweep. The new copy says 'just
   update the description, no need to ping me' for the grace path, and
   reserves '@agent-shin reconsider' for the post-close recovery path.

4. Bullet-train emoji — replace 👋 with 🚄 (Shinkansen, the symbol of
   Agent Shin) across every action-required comment: PR close, PR grace
   warning, issue close, issue grace warning, within-grace, Greptile-
   closer grace warning, rollout heads-up. Pinned in tests so a future
   refactor can't silently revert.

5. Greptile-post-close — the @greptileai bullet now explicitly says 'a
   low Greptile score isn't a blocker either,' since the previous copy
   buried the fact that @greptileai works after auto-close.

Comment templates updated: format_pr_close_comment,
format_issue_close_comment, format_grace_warning_pr_comment,
format_grace_warning_issue_comment, format_within_grace_comment
(triage_with_llm.py); format_grace_warning_comment
(close_low_quality_prs.py); format_heads_up_comment header
(triage_rollout_heads_up.py).

New helpers: _format_present_for_pr / _format_present_for_issue /
_format_present_block, driven off the existing per-field flags the
LLM judge already emits — no prompt change needed.

New tests pin: bullet-train emoji in every action-required comment;
'What you got right' appears with ✅ bullets when fields are present;
the block is omitted when no fields are present; 'park this for
later' / 'not a rejection' softer framing; grace warnings tell the
contributor 'no need to ping' during the grace window (reconsider is
the post-close path only).
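
A minimal sketch of the "omit the block when nothing is present" behavior.
The function signature, flag keys, and exact header text are hypothetical;
the real _format_present_block may render differently.

```python
from typing import Mapping, Sequence, Tuple


def format_present_block(
    flags: Mapping[str, bool],
    labels: Sequence[Tuple[str, str]],
) -> str:
    """Render a checkmark bullet for each flag that is set, or '' if none."""
    bullets = [f"- ✅ {text}" for key, text in labels if flags.get(key)]
    if not bullets:
        return ""  # never render an empty 'What you got right' section
    return "\n".join(["What you got right:", *bullets])
```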

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-shin): gate triage on a dogfood allowlist

Add ALLOWLIST_LOGINS to agent_shin_shared so Agent Shin only acts on the
named accounts while the set is non-empty. mateo-berri and SwiftWinds are
allowlisted for the dogfood rollout; everyone else is skipped with
skip-not-allowlisted across all four entrypoints (triage, review gate, the
daily low-quality sweep, and the rollout heads-up).

For an allowlisted author the usual internal/external classification is
bypassed, so a maintainer's own org account still gets triaged during
testing. Emptying the set lifts the restriction and restores full triage
for the public rollout. The gate is dependency-injected via an `allowlist`
parameter defaulting to the constant, so the internal/external-skip paths
stay testable.
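
The gate's decision order can be sketched as follows. `should_triage` and
its return shape are hypothetical, and exact-case login matching is an
assumption; the skip reasons reuse the strings mentioned in these notes.

```python
from typing import AbstractSet, Tuple

ALLOWLIST_LOGINS = frozenset({"mateo-berri", "SwiftWinds"})


def should_triage(
    author: str,
    is_internal: bool,
    allowlist: AbstractSet[str] = ALLOWLIST_LOGINS,
) -> Tuple[bool, str]:
    """Decide whether to triage an item, returning (act, reason)."""
    if allowlist:
        # Dogfood mode: only allowlisted authors, internal or not.
        if author in allowlist:
            return True, "triage"
        return False, "skip-not-allowlisted"
    # Empty allowlist: fall back to the usual internal/external split.
    if is_internal:
        return False, "skip-internal-author"
    return True, "triage"
```

Passing `allowlist=frozenset()` in a test exercises the public-rollout
path without touching the module constant.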

* feat(agent-shin): tighten QA-proof and issue rubrics, ack reconsider with reactions

Reorder the end-to-end QA proof options to video, then screenshots, then
exact commands with their real output across the PR template, the LLM judge
prompts, and every contributor-facing comment, and spell out that mocked or
stubbed runs (including pytest on the repo's own unit tests, which mock the
provider, DB, and network) never count as proof. QA proof is now required of
all contributors, not just external ones.

Tighten the issue bug-report rubric to require end-to-end evidence of the bug
(the "before" half: a video, screenshot, or command paired with real output)
plus expected vs. actual behavior, drop the bias toward PASS, and collapse the
separate has_repro/has_proof flags into a single has_repro signal.

Standardize the bullet-train emoji and strip em dashes from the bot's
public-facing messages, and route issue recovery through @agent-shin
reconsider since GitHub doesn't let OSS authors reopen an issue a bot closed.

Acknowledge an @agent-shin reconsider the moment it's accepted with an eyes
reaction and a thumbs-up once the run finishes, both gated on
AGENT_SHIN_ENABLED so dry-run leaves no trace.

* fix(agent-shin): shorten auto-close grace to 2 hours and drop the instant-close bypass

Two dogfooding changes to the Agent Shin grace window. First, the warn-then-close
grace (GRACE_PERIOD_SECONDS) drops from a day to 2 hours so the "fix it before it
closes" loop can be exercised in one sitting; the constant carries a note to bump
it back up for the public rollout.

Second, remove IMMEDIATE_CLOSE_LOGINS entirely. SwiftWinds (the external dogfood
account) used to skip the grace window and close on first detection, which also
meant closing real PRs even during a scheduled dry run because the per-PR
override flipped dry_run off. It now follows the same warn-then-close path as
every other author, so a low-quality PR is warned first and only closed once the
2-hour window elapses. This also closes the Greptile finding that the sweep could
mutate real PRs while AGENT_SHIN_ENABLED was still off.

The review gate's separate age-based grace (DEFAULT_GRACE_DAYS) is left unchanged.

Regression tests pin that SwiftWinds now warns-grace instead of closing instantly,
and that a dry-run sweep over a closeable PR reports "would close" without making
any GitHub mutation.
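
The warn-then-close path reduces to a small decision on the warning
timestamp. In this sketch `grace_action` and its outcome strings are
hypothetical, and treating the boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD_SECONDS = 2 * 60 * 60  # 2 hours for dogfooding


def grace_action(now: datetime, warned_at: Optional[datetime]) -> str:
    """Every author follows the same path: warn first, close after grace."""
    if warned_at is None:
        return "warn"
    if (now - warned_at).total_seconds() >= GRACE_PERIOD_SECONDS:
        return "close"
    return "within-grace"
```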

* fix(agent-shin): gate reconsider reopen on an Agent Shin close marker

was_closed_by_agent_shin only checked that the most recent close actor was
the bot identity. That identity defaults to github-actions[bot], which is
shared by every workflow in the repo (stale/duplicate sweeps included), so a
contributor could @agent-shin reconsider an item another workflow closed and,
if the description passed the rubric, get it reopened even though Agent Shin
was never the closer.

Require a second, Agent-Shin-specific signal alongside the actor check: an
auto-close comment stamped with a hidden AGENT_SHIN_CLOSE_MARKER. Both close
paths (the grace-period close and the review-gate close) flow through
format_pr_close_comment / format_issue_close_comment, so stamping the marker
there covers every real close while leaving the grace warnings unmarked. The
guard stays fail-closed: no marker, no reopen.

This also replaces the unused AGENT_SHIN_AUTO_CLOSE_MARKER constant (a visible
phrase the guard never consulted) with the hidden marker the guard now relies
on.

* fix(agent-shin): stamp close marker on sweep closes and disclose regression deadline

The daily Greptile sweep's close comment advertised `@agent-shin reconsider`
but never stamped AGENT_SHIN_CLOSE_MARKER, so the reconsider reopen guard
(was_closed_by_agent_shin), which now also requires that marker, silently
rejected every sweep-closed PR with `skip-not-bot-closed`. Move the marker into
agent_shin_shared so both close paths share one source of truth, extract
format_close_comment so the sweep close comment is unit-testable, and stamp the
marker there.

Also disclose the grace_days deadline in the review-gate regression comment; it
promised "the PR stays open" without mentioning that a still-failing PR is
auto-closed grace_days after the notice, which would surprise contributors with
a close they were never warned about.

* fix(triage): tighten Agent Shin reconsider reopen guards

The bot-closed guard accepted any historical Agent Shin marker comment
on the thread as proof that Agent Shin owned the latest close, so a
post-reopen close by another workflow under the shared
`github-actions[bot]` identity could still satisfy the gate and let
`@agent-shin reconsider` reopen a PR that Agent Shin did not close
this cycle. `fetch_last_close_event` now also returns the latest
`closed` event timestamp, and `was_closed_by_agent_shin` requires
the most recent Agent Shin marker comment to sit at (or just before)
that timestamp, with a small skew window for clock drift between the
events and comments APIs.

In the same path the LLM verdict check used `decision != "fail"` to
choose the reopen branch, which treated a missing, empty, or typo
verdict as a pass. Reopen is destructive, so the check now requires an
explicit `decision == "pass"` and ambiguous verdicts fall through
to the "still failing" branch instead.
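
Both guards can be sketched together. The marker value, the 60-second
skew, the symmetric window, and the comment dict shape are all assumptions
for illustration; only the logic (actor check, marker near the latest
close, explicit "pass") follows the description above.

```python
from datetime import datetime, timedelta
from typing import Any, Mapping, Optional, Sequence

AGENT_SHIN_CLOSE_MARKER = "<!-- agent-shin:auto-close -->"  # hypothetical value
SKEW = timedelta(seconds=60)  # hypothetical tolerance for clock drift


def was_closed_by_agent_shin(
    close_actor: str,
    closed_at: Optional[datetime],
    comments: Sequence[Mapping[str, Any]],
    bot_login: str = "github-actions[bot]",
) -> bool:
    """Fail closed: require the bot actor AND a marker near the latest close."""
    if close_actor != bot_login or closed_at is None:
        return False
    marker_times = [
        c["created_at"]
        for c in comments
        if c["author"] == bot_login and AGENT_SHIN_CLOSE_MARKER in c["body"]
    ]
    if not marker_times:
        return False  # no marker, no reopen
    latest = max(marker_times)
    return closed_at - SKEW <= latest <= closed_at + SKEW


def should_reopen(decision: Optional[str]) -> bool:
    """Reopen only on an explicit pass; anything else counts as failing."""
    return decision == "pass"
```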

* style(agent-shin): black-format reconsider guard hardening

* docs(agent-shin): scope dry-run wrapper docstring to the single existing helper

The module docstring claimed it wrapped every Agent Shin mutation and
referenced post_comment/close_pr/etc., but only maybe_post_comment exists.
Describe the single helper accurately while keeping the dry-run pattern
guidance for any future wrapper.

* chore(agent-shin): defer issue/PR template changes to the rollout PR

The triage and review-gate automation is gated to the allowlisted authors
(mateo-berri, SwiftWinds) and AGENT_SHIN_ENABLED, so during this rollout it
only acts on internal PRs/issues. The issue and PR templates have no such
gate; they change for every contributor on merge and advertise that an LLM
bot auto-closes external submissions, which won't happen while the allowlist
is the sole author gate. Revert bug_report.yml, feature_request.yml, and
pull_request_template.md to base so the public-facing messaging lands with
the rollout flip instead of ahead of it. The scripts embed their own rubric
and never read these files, so triage behavior is unchanged.

* ci(agent-shin): hash-pin the openai install in privileged triage workflows

The triage workflows install the OpenAI client with `pip install
"openai>=1.40.0"`, a floating lower bound that resolves openai and its
whole transitive tree to whatever PyPI serves at run time. These jobs run
under pull_request_target with a write-scoped GITHUB_TOKEN, and the
install plus the triage run happen on every PR open regardless of the
AGENT_SHIN_ENABLED dry-run gate (that gate only withholds the LLM key and
the destructive --close path), so a compromised release would execute
during install or import while the token is in scope.

Install instead from a new .github/scripts/triage-requirements.txt that
pins openai==2.33.0 and every transitive dependency to an exact version
with sha256 hashes, via pip --require-hashes. The workflows already
sparse-checkout .github/scripts from the base repo (never fork code), so
the pinned file is trusted. Add static guardrails to
test_github_triage_workflows.py that fail if any installer workflow
reverts to a floating openai install or if the requirements file loses
its exact pins or hashes.
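
A static guardrail of this kind might look like the sketch below. The
function name `unpinned_requirements` and the regex are assumptions; the
actual checks in test_github_triage_workflows.py may be stricter.

```python
import re
from typing import List

# A requirement must start with `name==version`.
PIN_RE = re.compile(r"^[A-Za-z0-9._-]+==[^\s\\]+")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines lacking an exact pin or a sha256 hash."""
    # Join backslash continuations so each requirement is one logical line.
    logical = re.sub(r"\\\n\s*", " ", text)
    bad: List[str] = []
    for line in logical.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not PIN_RE.match(line) or "--hash=sha256:" not in line:
            bad.append(line)
    return bad
```

An empty result means every entry is exact-pinned and hashed, which is
what `pip install --require-hashes` needs in order to succeed.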

* ci(agent-shin): gate rollout heads-up real run behind manual dispatch

The rollout heads-up workflow fired its real `--close` sweep on every push
to litellm_internal_staging that touched the script, and exposed
OPENAI_API_KEY unconditionally, unlike every sibling triage workflow which
only exposes the key on an enabled or dispatched run. As a result, merging
the script posted real heads-up comments (bounded only by the dogfood
allowlist), which contradicts the inert-by-default safety invariant. Once
the allowlist is cleared for the public rollout, any later edit to the file
would sweep the whole open backlog with real writes.

The heads-up cannot be gated on AGENT_SHIN_ENABLED: its whole job is to warn
contributors before that flag flips on, so it has to run while the flag is
still off. Instead the automatic push trigger now stays dry-run, and the
real one-shot sweep is a deliberate manual workflow_dispatch with
dry_run=false, the sole path that adds `--close`. OPENAI_API_KEY is exposed
only on that dispatch, matching the sibling workflows.

Add static guardrails that fail if the push path regains a `--close`, if the
dispatch gate stops fail-closing on the exact string "false", or if the key
is exposed unconditionally again.
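
The fail-closed gate lives in the workflow YAML, but its logic can be
expressed as a small Python sketch. `heads_up_args` is a hypothetical
helper, not part of the workflow; it only shows that `--close` is added
for a manual dispatch whose input is exactly the string "false".

```python
from typing import List


def heads_up_args(event_name: str, dry_run_input: str, repo: str) -> List[str]:
    """Build script arguments; anything but an exact 'false' stays dry-run."""
    args = ["--repo", repo]
    if event_name == "workflow_dispatch" and dry_run_input == "false":
        args.append("--close")  # the sole path to real writes
    return args
```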

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
