Combine #28758 + #28117 + #28147 — Agent Shin auto-close + review-gate label lifecycle by mateo-berri · Pull Request #28759 · BerriAI/litellm

mateo-berri · 2026-05-25T01:01:36Z

Summary

Combines three open PRs into a single branch targeting litellm_internal_staging:

#28117 — Agent Shin: LLM-judge + Greptile auto-close (any age, any draft state) + @agent-shin reconsider flow.
#28147 — Safety fixes on top of feat(triage): Agent Shin — LLM-judge + Greptile auto-close (any age, any draft state) + @agent-shin reconsider flow #28117: reconsider provenance check, 1-day grace period before close, @greptileai post-close hint, SwiftWinds dogfood bypass.
#28758 — ready for review label lifecycle (review-gate state machine) + the proxy/UI bugfixes that rode along on the demo branch (team-cache refresh, vertex-anthropic model filter, spend-logs route permissions).

How it was assembled

New branch off litellm_internal_staging (the base of feat(triage): Agent Shin — LLM-judge + Greptile auto-close (any age, any draft state) + @agent-shin reconsider flow #28117).
Merged litellm_fix-agent-shin-reconsider-safety-1af4 (fix(triage): bugbot on #28117 — reconsider safety + 1-day grace + @greptileai post-close + SwiftWinds dogfood #28147) — this brings feat(triage): Agent Shin — LLM-judge + Greptile auto-close (any age, any draft state) + @agent-shin reconsider flow #28117 along since fix(triage): bugbot on #28117 — reconsider safety + 1-day grace + @greptileai post-close + SwiftWinds dogfood #28147 is stacked on it.
Merged claude/agent-shin-pr-demo-UhrOx (Agent Shin: "ready for review" label lifecycle + end-to-end demo #28758). Eight files conflicted: Agent Shin: "ready for review" label lifecycle + end-to-end demo #28758 was built on an older snapshot of feat(triage): Agent Shin — LLM-judge + Greptile auto-close (any age, any draft state) + @agent-shin reconsider flow #28117 (its 0ad6836e31 commit literally vendored an old version), so the constants/imports in triage_with_llm.py and close_low_quality_prs.py diverged. Resolved by taking fix(triage): bugbot on #28117 — reconsider safety + 1-day grace + @greptileai post-close + SwiftWinds dogfood #28147's (newer, more complete) versions of the duplicated files and keeping Agent Shin: "ready for review" label lifecycle + end-to-end demo #28758's unique additions (review_gate.yml, test_github_review_gate.py, the proxy/UI changes, issue templates).
Manually ported the actual review-gate feature commit (b4d970371e) onto fix(triage): bugbot on #28117 — reconsider safety + 1-day grace + @greptileai post-close + SwiftWinds dogfood #28147's structure since a cherry-pick conflicted: added the review_gate() state machine, the --review-gate / --grace-days / --min-greptile-score CLI flags, new constants (READY_FOR_REVIEW_LABEL, READY/REGRESSED/WITHIN_GRACE markers, AGENT_SHIN_AUTO_CLOSE_MARKER, GREPTILE_BOT_LOGINS, SCORE_PATTERN), and new helpers (add_label, remove_label, extract_greptile_score, parse_iso8601, format_*_comment).

Test plan

tests/test_litellm/test_github_review_gate.py — 18 tests, all pass
tests/test_litellm/test_github_triage_with_llm.py — passes
tests/test_litellm/test_github_close_low_quality_prs.py — passes
tests/test_litellm/test_github_triage_workflows.py — passes
python3 .github/scripts/triage_with_llm.py --help shows --review-gate, --grace-days, --min-greptile-score
uv run black --check clean on edited files

162 GitHub-triage tests pass total.

Reviewer notes

The three source PRs should stay open until this combined PR merges; closing them prematurely would lose review history.
.github/workflows/review_gate.yml runs in dry-run mode unless the repo variable AGENT_SHIN_ENABLED is set to true.

🤖 Generated with Claude Code

Note

Medium Risk
Automated closing, labeling, and commenting on external PRs/issues can misfire or upset contributors; mitigated by dry-run defaults, grace periods, internal exemptions, and explicit enablement.

Overview
Introduces Agent Shin, an OSS triage stack under .github/scripts/ and matching workflows, gated by repo variable AGENT_SHIN_ENABLED (dry-run until set to true).

Contributor-facing templates now spell out auto-triage rules: bug/feature issue forms require repro, expected vs actual, and proof; the PR template adds problem/expected-actual/QA sections and documents Greptile ≥4/5 plus @agent-shin reconsider.

Core automation: triage_with_llm.py LLM-judges external PRs/issues (linked issue or rubric + QA proof), 24h grace before close, @agent-shin reconsider with bot-only reopen guards and rate limits. close_low_quality_prs.py sweeps open external PRs below Greptile 4/5 (drafts included). review_gate() adds a ready for review label lifecycle (LLM + Greptile, regression handling, auto-close after grace). triage_rollout_heads_up.py posts a one-shot 7-day rollout warning. Shared helpers live in agent_shin_shared.py; _agent_shin_actions.py centralizes dry-run vs real GitHub writes.

Workflows: PR/issue triage, reconsider on comment, review gate (PR events + daily sweep), daily low-quality closer, rollout heads-up on staging merge. Tests cover review gate, Greptile closer, and related triage logic under tests/test_litellm/.

^{Reviewed by Cursor Bugbot for commit dd02b76. Bugbot is set up for automated code reviews on this repo. Configure here.}

Adds .github/scripts/close_low_quality_prs.py and a daily workflow that closes PRs which: - are open for at least 7 days, and - carry a most-recent greptile-apps review with Confidence Score <4/5, - and are not drafts or opt-out-labeled ('do not close', 'wip', etc.). Each closure posts an explanatory comment telling the contributor how to bring the PR back (rebase, re-request greptile, reopen at 4+/5). The 4/5 bar is already documented in the PR template (.github/pull_request_template.md), so this just enforces it. Tested with a dry run against the live BerriAI/litellm backlog of 1000 open PRs: 100 candidates identified, 598 PRs pass the bar (4+/5), 186 are too young, 97 are drafts, 19 lack any Greptile review and are left alone. Workflow defaults to closing 25 PRs/run as a safety net and supports workflow_dispatch with overrides (close=false for a dry run, custom min_age_days/min_score/limit). 18 unit tests cover score extraction (HTML/markdown/plain text, login variants, multi-review picks latest) and per-PR evaluation (drafts, opt-out labels, age, missing/passing/failing scores). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

…ributions PR template: - Make the rubric explicit at the top: link an issue, OR provide a clear problem description + expected vs. actual + visual QA proof. - Add dedicated sections for each piece so the bot has a deterministic shape to read. - Keep the existing 'Linear ticket' section for internal contributors (they're exempt from the auto-triage rubric). Bug report template: - Split 'What happened?' into 'Actual behavior' + 'Expected behavior'. - Make logs/screenshot a required textarea. - Warning banner at the top tells external contributors that incomplete reports will be auto-closed (with re-evaluation on reopen). Feature request template: - Require a concrete use case + example in the motivation field, not just a one-liner pitch. - Same auto-triage warning banner. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

Adds a new triage flow that evaluates external pull requests and issues against the project's contribution rubric and, when configured to do so, auto-closes non-conforming ones with an explanatory comment. Contributors can update + reopen to be re-evaluated. Scope: - Internal BerriAI contributors (author_association OWNER/MEMBER/COLLABORATOR) and bot accounts are skipped entirely. - 'Fixes #1234' / 'Resolves https://github.com/.../issues/N' in the PR body short-circuits to PASS without burning LLM tokens. - LLM judge returns structured JSON (verdict, missing[], explanation); parser tolerates markdown fences and embedded JSON. - LLM errors NEVER close PRs/issues — failure surfaces as 'skip-llm-error'. Safety: - pull_request_target / issues triggers are FORCED dry-run in the workflow; only manual workflow_dispatch with close=true (and AGENT_SHIN_ENABLED=true) takes destructive action. - Default mode writes verdicts to GITHUB_STEP_SUMMARY only — no public comments until the team flips the AGENT_SHIN_ENABLED repo variable. - LLM uses an OpenAI-compatible endpoint (model and base URL configurable via repo variables; key via OPENAI_API_KEY secret). Files: - .github/scripts/triage_with_llm.py - judge orchestrator + CLI - .github/workflows/triage_pr_with_llm.yml - .github/workflows/triage_issue_with_llm.yml - tests/test_litellm/test_github_triage_with_llm.py - 33 unit tests End-to-end validated against four real PRs (#28117 internal collaborator, #28108 bot, #28129 'Fixes #28128', #28116 no linked issue) and issue #28132 with a stubbed LLM judge: each path produces the expected action. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

…ry-run by default - close_low_quality_prs.py now filters by GitHub author_association via the REST API: PRs from OWNER / MEMBER / COLLABORATOR (and bot accounts) are skipped with a new 'skip-internal' summary bucket. - close_low_quality_prs.yml now defaults workflow_dispatch close=false, and ignores 'close=true' unless the new repo variable AGENT_SHIN_ENABLED is set to 'true'. Scheduled runs are dry-run only until the team flips that switch. - Updated unit tests: one new test asserting internal authors are skipped, and an autouse fixture treats unspecified test PRs as external so the rest of the suite still exercises the close path. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

Co-authored-by: Yassin Kortam <yassin@berri.ai>

…rpolation - close_low_quality_prs.yml: only workflow_dispatch with close=true (and AGENT_SHIN_ENABLED=true) actually closes PRs. Scheduled runs are always dry-run, matching the safety invariant documented for triage_pr/issue. - triage_with_llm.py: textwrap.dedent on an f-string with multi-line interpolated bodies fails because the body's 2nd+ lines start at column 0, making the common-indent zero. Dedent the static template first, then .format() the title/body in. Co-authored-by: Yassin Kortam <yassin@berri.ai>

- close_low_quality_prs.py: Treat author_association API lookup failures as internal (fail-safe) so transient errors don't cause internal contributors' PRs to be auto-closed. - triage_with_llm.py: Update summary heading from 'Would post comment:' to 'Posted comment:' since this branch only runs after the comment has already been posted. Co-authored-by: Yassin Kortam <yassin@berri.ai>

…t=none - Bump DEFAULT_MODEL from gpt-4o-mini to gpt-5.4-mini (more modern; 4M total context window per OpenAI catalog, JSON-schema response format, function calling all supported). - For gpt-5.x family models, pass reasoning_effort="none" via extra_body. gpt-5.x rejects temperature != 1 unless reasoning_effort is explicitly "none"; setting it lets us keep temperature=0 for deterministic JSON rubric judgments. extra_body works across openai SDK versions regardless of whether they natively type the kwarg. - For non-gpt5 overrides (TRIAGE_MODEL=gpt-4o-mini etc.), reasoning_effort is not sent. - 4 new unit tests cover: gpt-5.4-mini -> reasoning_effort=none, capitalized/dated gpt-5 variants -> reasoning_effort=none, gpt-4o-mini -> no extra_body, base_url passthrough. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

…-with-default - Removed the unused gh_json helper (bugbot low-severity dead code). - Replaced argparse `action="append", default=[...]` with default=None + DEFAULT_OPTOUT_LABELS fallback. The mutable-default + append combo silently APPENDS to the canonical defaults instead of replacing them, so --optout-label could not actually scope the opt-out list. - Added tests covering both the canonical default and the flag-replaces-defaults behavior. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

…sociation, fix empty TRIAGE_MODEL Three independent bugbot findings against triage_with_llm.py: 1. LINKED_ISSUE_PATTERN included weak keywords (`see`, `ref`, `addresses`) so casual mentions like "See #1234 for context" were short-circuited to pass-linked-issue without ever calling the LLM — contradicting the prompt's own "a bare issue number without a closing keyword counts only if it's clearly the related issue (not a passing mention)" rubric. Limit the regex to GitHub's documented PR-closing keywords (fixes/fix/fixed/closes/close/closed/resolves/resolve/resolved). 2. is_internal_contributor() treated an empty/missing author_association as external (eligible for the destructive close path), while the sibling is_external_pr_author() in close_low_quality_prs.py fail-safes the same case as internal. Align the two so a partial/unknown GitHub response can never make a PR eligible for auto-close. 3. argparse `default=os.environ.get("TRIAGE_MODEL", DEFAULT_MODEL)` returns the empty string when GitHub Actions exposes an unset repo variable as an empty-string env var (the optional vars.TRIAGE_MODEL case in the workflow). Use `os.environ.get(...) or DEFAULT_MODEL` so empty -> default, matching the existing OPENAI_BASE_URL pattern. Tests: - Casual mentions now must fall through to the LLM (parametrized); added an orchestration test ensuring "See #1234" reaches the judge. - Empty/missing author_association now fails safe (parametrized). - Empty TRIAGE_MODEL env var falls back to DEFAULT_MODEL; explicit TRIAGE_MODEL is still honored. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

…false' The PR and issue Agent Shin workflows gated the destructive --close flag with [ "${DISPATCH_CLOSE:-false}" != "false" ]. That pattern treats anything other than the literal string "false" as enabling closure — "True", "yes", "1", typos, accidental whitespace, etc. The workflow_dispatch input UI is a 'true'/'false' choice dropdown so the form is constrained, but the API (`gh workflow run -f close=...`) accepts any string, and a CI cron / external invoker passing a non-canonical truthy value would have silently enabled real contributor PR closures. Mirror the sibling Greptile closer's [ "${CLOSE_FLAG}" = "true" ] pattern: only the EXACT string "true" enables --close; every other value (including the unset/empty default) resolves to dry-run. This is the fail-safe philosophy applied everywhere else in this PR. Added tests/test_litellm/test_github_triage_workflows.py with two parametrized invariants: 1. The destructive gate uses '= "true"' for its env-var comparison (either bare '${ENV}' or '${ENV:-false}' form accepted), and never the fail-open '!= "false"' pattern. 2. Every destructive gate is also gated on AGENT_SHIN_ENABLED being "true" — either by entering the close branch on '=' or by bailing out early on '!=' — so flipping the repo variable off is a true kill switch regardless of per-run inputs. Manually verified the test fails on the buggy '!= "false"' pattern and passes on the fix, so it would have caught the regression at PR time. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

@agent-shin

…econsider flow Follow-up to PR #28117. Three behavior changes + one new workflow, addressing the team's concerns on the original review: 1) Apply auto-close to ALL open PRs, not just those over a week old. - close_low_quality_prs.py: --min-age-days default flipped from 7 to 0. The flag is preserved as an opt-in safety net for one-off backfill runs that want to spare very-young PRs, but the daily scheduled sweep now closes external-author PRs as soon as Greptile scores them <4/5. - close_low_quality_prs.yml: workflow_dispatch input default also flipped to 0; doc comments updated. 2) Apply auto-close to draft PRs too. - close_low_quality_prs.py: removed the skip-draft branch in evaluate_pr. Drafts are NOT a free pass — the team's intent is 'open PR count == PRs internal collaborators need to action on', so a draft Greptile scored 2/5 still belongs in the closed bucket. Authors who genuinely need a long-lived draft can attach the 'wip' opt-out label, which is unchanged. - The 'skip-draft' action is gone; the 'wip' label still skips. 3) Address the 'OSS contributors cannot reopen a bot-closed PR' wrinkle. GitHub does NOT let an external (non-write-access) contributor reopen a PR that was closed by a bot or maintainer (long-standing limitation). The original PR's close-comments told contributors to 'Reopen the PR — I'll re-evaluate automatically', which is broken for the very audience this triage targets. Two changes: a) Reword every close-comment (Greptile sweep + Agent Shin PR close + Agent Shin issue close + PR template) to recommend: - Open a new PR with the updated branch (primary path). - Or comment '@agent-shin reconsider' on the closed PR for a re-evaluation that, on pass, reopens the PR via the bot's GH_TOKEN write access. b) Add the @agent-shin reconsider workflow: - .github/workflows/triage_reconsider.yml: new 'issue_comment'-triggered workflow. Authorizes only the PR/issue author or an internal collaborator (OWNER/MEMBER/COLLABORATOR), gated via a step output so unauthorized commenters never reach the destructive steps. Globally gated on AGENT_SHIN_ENABLED='true' (positive form, matching the test_github_triage_workflows guardrail patterns). - triage_with_llm.py: --reconsider mode. On a closed PR/issue, re-runs the LLM judge (or linked-issue regex short-circuit) and: - on pass: reopens via reopen_pr/reopen_issue + posts a 'Re-evaluated and reopened' comment. - on fail: leaves closed and posts a 'still missing X' comment so the contributor can iterate again. Reconsider-on-open is a no-op ('skip-not-closed'). Internal-author + bot-account skips still take priority over reconsider. 4) Greptile-on-closed-PRs question: the team asked whether Greptile can re-review a closed PR. Greptile's docs don't address this and we shouldn't promise behavior we can't verify, so the new close-comment wording does NOT instruct contributors to 're-request greptile on the closed PR'. Instead it points them at the new-PR path (which Greptile definitely reviews) or the @agent-shin reconsider trigger (which re-runs the LiteLLM-side rubric judge, not Greptile). Tests: 93 passing (was 59). - test_github_close_low_quality_prs.py: replaced 'skip drafts' test with 'closes drafts when score is low' + 'closes brand-new PR when min_age=0' + 'no skip when min_age=0'. The 'skip too young' assertion is preserved as opt-in. - test_github_triage_with_llm.py: 6 new TestTriageOrchestration cases for reconsider mode (skip-not-closed on open, reopen on pass, still-failing comment on fail, linked-issue short-circuit reopen, skip internal author in reconsider, reopen-issue on pass) + a new TestCloseCommentText class that pins the user-facing 'open a new PR' + '@agent-shin reconsider' wording. - test_github_triage_workflows.py: added triage_reconsider.yml to the destructive-gate guardrail table; AGENT_SHIN_ENABLED is its own destructive gate (no separate per-run flag needed). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

Adds regression tests covering the bugbot high-severity finding that str.format() would crash on user-supplied content containing { or }. Empirically str.format() does NOT re-parse interpolated values — only the template literal is scanned for replacement fields — so the bug does not exist in the current code, but pinning the safe behavior prevents a future templating change from silently reintroducing it. Also pins the dedented prompt shape (no leading 8-space indentation on template lines) so a future change to the build_*_prompt functions can't silently regress the LLM judge prompt format on multi-line bodies. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

@agent-shin

Address three Greptile/veria-ai concerns on the @agent-shin reconsider flow: 1. **Reconsider had no dry-run path.** The previous reconsider mode ignored `--close` and always posted comments + reopened on a pass. A local operator running `python triage_with_llm.py --reconsider --pr N` would silently take destructive GitHub actions with no way to preview. Reconsider now honors `close=False` the same way regular triage does and returns `would-reopen` / `would-reconsider-still-failing` for step-summary rendering. 2. **Reconsider could reopen maintainer-closed PRs/issues** (Medium security finding from veria-ai). The workflow only checked that the commenter was authorized — it did NOT check that the most recent close was performed by Agent Shin. A contributor could comment `@agent-shin reconsider` on a PR a maintainer closed for non-rubric reasons (duplicate, security report, design rejection) and have the bot reopen it. Add `was_closed_by_agent_shin()` which inspects the issue events API for the most recent `closed` actor and only permits reopen when that actor matches the configured bot login (default `github-actions[bot]`, overridable via env). Fail-closed on missing events. 3. **No rate-limiting on the reconsider trigger.** Every `@agent-shin reconsider` comment burns CI minutes + an OpenAI API call. Add a 10-minute cooldown via `seconds_since_last_reconsider_verdict()` which greps the issue's comment list for the bot's own verdict marker (``). Inside the window the triage returns `skip-rate-limited` and the LLM never runs. Workflow update: - `triage_reconsider.yml` now passes `--close` only when `AGENT_SHIN_ENABLED=true`, matching the pattern of `triage_pr_with_llm.yml`. The script runs in both states so the verdict still appears in the step summary for QA. Tests: - Add 5 reconsider safety tests: dry-run for pass / fail / linked-issue short-circuit, bot-closed-guard refusal on maintainer close, rate-limit refusal inside the cooldown window, and cooldown-elapsed acceptance. - Add unit tests for `was_closed_by_agent_shin` (bot / maintainer / missing actor / env-override) and `seconds_since_last_reconsider_verdict` (no marker / multiple markers / non-bot comment with marker / bot comment without marker). - Pin the `` marker in both reopen and still-failing comments — dropping it would silently break the cooldown. Existing reconsider tests updated to pass `close=True` (the production path now) + stub the new guards via `_stub_reconsider_guards`. 112 tests pass (was 93). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

…close bypass - Add a 24-hour grace window between the first low-quality detection and the actual auto-close. The first detection posts a warning comment that explicitly says "You have 1 day to address this before this PR is auto-closed" and points the contributor at: * `@agent-shin reconsider` to request another look (and re-open) * `@greptileai` to request a fresh Greptile review — works even after the PR is closed - Both `triage_with_llm.py` (LLM judge) and `close_low_quality_prs.py` (Greptile-score closer) share the same `` HTML marker so a warning posted by either path is recognized by both. - Add IMMEDIATE_CLOSE_LOGINS = {swiftwinds} to bypass BOTH the grace period AND the dry-run / AGENT_SHIN_ENABLED gating. SwiftWinds is the user's personal account (no push permissions to litellm) used to dogfood the bot; user explicitly asked: "For SwiftWinds, just close immediately. Faster iteration that way." - Update the standard close comments to mention that `@greptileai` works even after the PR is closed. - Add 23 new tests covering: warn-grace on first detection, skip during grace window, close after grace expires, SwiftWinds bypass (case insensitive, with close=False, no random-login false positives), the grace-warning text invariants, and the SwiftWinds entry in the IMMEDIATE_CLOSE_LOGINS constant. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

For PRs from IMMEDIATE_CLOSE_LOGINS (e.g. swiftwinds), evaluate_pr returns 'close' immediately without ever posting a grace warning, so the close comment should not reference a 1-day grace period. Make close_pr take a grace_period_elapsed flag, default True, and pass False from the main loop when the close path was the immediate-close branch. Co-authored-by: Yassin Kortam <yassin@berri.ai>

IMMEDIATE_CLOSE_LOGINS PRs are closed even when the global --close flag is not set, but the summary used the global dry-run flag to choose between 'would close' and 'closed'. Split the count so operators can see both actual closures and dry-run would-be closures. Co-authored-by: Yassin Kortam <yassin@berri.ai>

Brings the Agent Shin OSS-triage scripts, workflows, issue/PR templates, and tests from PR #28117 onto this branch so the new review-gate feature and its end-to-end demo are self-contained and runnable in CI. https://claude.ai/code/session_01XyyWa8t2VYmoGd6mKMEqkZ

@agent-shin

Adds review_gate(), a state machine that keeps a `ready for review` label in sync with whether an external PR clears BOTH gates — the LLM rubric and Greptile's most recent confidence score: - pass (untagged) -> add label + "ready for review" / "all clear" comment - pass (already tagged) -> no-op (idempotent across re-runs) - regress (Greptile < 4/5 or QA proof removed) -> remove label + "what's missing" comment, PR stays open - recover after a regression -> "all clear again" comment + re-add the label - fail & untagged, < 24h old -> one-time "what's missing" notice (grace window) - fail & untagged, > 24h old -> close + comment (reopen via @agent-shin reconsider) The label itself is the persisted state, so comments fire only on transitions (never on every scheduled run). All side effects are gated behind --close, so the dry-run contract matches the existing triage flow. Lifecycle comments use hidden HTML markers and deliberately avoid the auto-close marker so they never trip the reconsider provenance check. Relocates the shared Greptile helpers (extract_greptile_score, SCORE_PATTERN, GREPTILE_BOT_LOGINS, parse_iso8601) into triage_with_llm.py so the daily sweep and the review gate read the score through one implementation, and adds the review_gate.yml workflow (dry-run unless AGENT_SHIN_ENABLED=true) plus 18 unit tests covering every branch and a full pass->regress->recover cycle. https://claude.ai/code/session_01XyyWa8t2VYmoGd6mKMEqkZ

…eview_gate feature applied in next commit

Adds the "ready for review" label lifecycle (originally PR #28758) on top of #28147's refactored triage_with_llm.py. The original commit was authored against an older snapshot of #28117 and could not be applied cleanly, so the additions were re-applied surgically: - New constants: READY_FOR_REVIEW_LABEL, DEFAULT_GRACE_DAYS, DEFAULT_MIN_GREPTILE_SCORE, READY/REGRESSED/WITHIN_GRACE markers, GREPTILE_BOT_LOGINS, SCORE_PATTERN, AGENT_SHIN_AUTO_CLOSE_MARKER. - New helpers: add_label, remove_label, extract_greptile_score, parse_iso8601 (the latter two mirrored from close_low_quality_prs.py so the daily sweep and the review gate read the score through the same logic). - New comment formatters: format_ready_for_review_comment, format_all_clear_comment, format_regression_comment, format_within_grace_comment. - New entry point: review_gate() implementing the pass/regress/recover state machine, with the label itself acting as persisted state so transition comments fire only on actual transitions. - main() learns --review-gate, --grace-days, --min-greptile-score and dispatches to review_gate() when the flag is set. Verified via tests/test_litellm/test_github_review_gate.py (18 tests) and the existing triage suites (144 more) — all 162 pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

CLAassistant · 2026-05-25T01:01:45Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 4 committers have signed the CLA.

✅ mateo-berri
❌ claude
❌ cursoragent
❌ Mateo

Mateo seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

codecov · 2026-05-25T01:04:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

greptile-apps · 2026-05-25T01:09:24Z

Greptile Summary

This PR combines three branches into a single Agent Shin automation: an LLM-as-judge triage bot that auto-closes low-quality external PRs, a @agent-shin reconsider comment-trigger reopen flow, and a ready for review label state machine (review_gate) that tags passing PRs and removes the tag on regression.

triage_with_llm.py is the core orchestration script (~1 500 lines) implementing triage(), review_gate(), reconsider logic, grace periods, and all GitHub write helpers; most new workflows invoke it with different flag combinations.
close_low_quality_prs.py is a separate daily cron script that sweeps all open PRs and closes those where Greptile scored below the threshold, with its own grace-warning flow.
Five new workflows wire up the triggers (pull_request_target, issue_comment, daily cron) with explicit AGENT_SHIN_ENABLED dry-run gating; 162 mock-only tests are added in tests/test_litellm/.

Confidence Score: 3/5

The workflows contain two issues that can produce real GitHub write operations (PR closures, comments) before the team intends to enable the bot.

In triage_pr_with_llm.yml, OPENAI_API_KEY is always injected into the step regardless of AGENT_SHIN_ENABLED, causing live LLM calls for every inbound external PR the moment the secret is configured. More critically, the IMMEDIATE_CLOSE_LOGINS bypass in triage() ignores the --close flag entirely — the workflow strips --close for pull_request_target events and prints 'No PRs will be closed', but PRs from swiftwinds are still commented on and closed even when the bot is nominally disabled. The review_gate.yml handles the key guard correctly and is otherwise sound. The test suite is thorough and all mock-based.

.github/workflows/triage_pr_with_llm.yml (unconditional API key) and .github/scripts/triage_with_llm.py lines 1383–1406 (IMMEDIATE_CLOSE_LOGINS bypass ignores --close).

Important Files Changed

Filename	Overview
.github/scripts/triage_with_llm.py	New 1545-line Agent Shin orchestration script: LLM-judge triage, reconsider flow, and review_gate() state machine. Contains the IMMEDIATE_CLOSE_LOGINS bypass that ignores --close for swiftwinds, a dead AGENT_SHIN_AUTO_CLOSE_MARKER constant, unfiltered _has_marker() author checks, and a grace-period approach in review_gate() that differs from triage() in ways that close PRs without prior warning on first deployment.
.github/workflows/triage_pr_with_llm.yml	New PR triage workflow using pull_request_target. Exposes OPENAI_API_KEY unconditionally (unlike review_gate.yml which guards the key behind AGENT_SHIN_ENABLED), causing LLM calls even when the bot is supposed to be in dry-run mode.
.github/workflows/review_gate.yml	New review gate workflow (pull_request_target + daily cron). Correctly guards OPENAI_API_KEY behind AGENT_SHIN_ENABLED. DO_CLOSE logic is sound. No significant issues.
.github/scripts/close_low_quality_prs.py	New 645-line daily cron closer. Uses time-since-warning approach for grace period (always warns before closing). IMMEDIATE_CLOSE_LOGINS bypass is in evaluate_pr() but does not bypass dry_run flag at the caller level, so behavior is consistent.
.github/workflows/triage_reconsider.yml	New reconsider workflow triggered by issue_comment. Authorization check (commenter is author or internal collaborator) is correctly gated before any destructive steps.
tests/test_litellm/test_github_review_gate.py	18 new unit tests for review_gate() state machine. All use mocks; no real network calls. Coverage is thorough for labeled/unlabeled/within-grace/past-grace paths.
tests/test_litellm/test_github_triage_with_llm.py	Large test file for triage(), reconsider, grace period, and IMMEDIATE_CLOSE_LOGINS behavior. All mock-based. Explicitly validates swiftwinds bypass as intentional design.
tests/test_litellm/test_github_close_low_quality_prs.py	New mock-only tests for the daily closer. No real network calls. Coverage looks solid for grace period, immediate close, and internal author exemption paths.

_{Reviews (1): Last reviewed commit: "Port review-gate feature from #28758 ont..." | Re-trigger Greptile}

greptile-apps · 2026-05-25T01:09:27Z

+    # accounts (`is_immediate`) ignore `--close` so a workflow that
+    # forces dry-run for the global population can still take real
+    # action on those test accounts.
+    if not close and not is_immediate:
+        return {**base_result, "action": "would-close", "verdict": verdict}
+


IMMEDIATE_CLOSE_LOGINS bypass fires even when AGENT_SHIN_ENABLED is not set

triage() closes and comments on swiftwinds PRs regardless of whether --close was passed (if not close and not is_immediate: falls through when is_immediate=True). The triage_pr_with_llm.yml workflow always strips --close for pull_request_target events and prints ::notice:: No comments will be posted; no PRs will be closed — but that guarantee is violated for any login in IMMEDIATE_CLOSE_LOGINS. An operator who has not yet set AGENT_SHIN_ENABLED=true and believes the bot is fully inert will still see live PR closures and comment posts for those accounts.

greptile-apps · 2026-05-25T01:09:28Z

+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          OPENAI_BASE_URL: ${{ vars.OPENAI_BASE_URL }}


OPENAI_API_KEY exposed unconditionally, burning API budget in dry-run mode

OPENAI_API_KEY is always injected into the step, so the LLM judge is called for every external PR that lacks a linked-issue short-circuit, even when AGENT_SHIN_ENABLED is not set. The sister workflow review_gate.yml guards this correctly with (vars.AGENT_SHIN_ENABLED == 'true' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '', passing an empty string when the bot is disabled. Without a similar guard here, a high-volume repo in dry-run mode will silently consume LLM quota on every opened/reopened PR.

Suggested change

OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

OPENAI_BASE_URL: ${{ vars.OPENAI_BASE_URL }}

GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED == 'true' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}

greptile-apps · 2026-05-25T01:09:29Z

+
+# Marker phrase Agent Shin always includes in its auto-close comments
+# (see `format_pr_close_comment` / `format_issue_close_comment`). The
+# review-gate close path stamps the same marker so reconsider can recognize
+# the closure as a bot close. Keep the marker in sync with the literal text
+# in those formatter functions.
+AGENT_SHIN_AUTO_CLOSE_MARKER = "I'm **Agent Shin**"
+
+# Model families that require `reasoning_effort` to be set, and that reject


AGENT_SHIN_AUTO_CLOSE_MARKER is defined but never used in code logic

The constant AGENT_SHIN_AUTO_CLOSE_MARKER = "I'm **Agent Shin**" is defined with a comment saying "reconsider can recognize the closure as a bot close". However, was_closed_by_agent_shin() uses fetch_last_close_actor() (GitHub close-event actor) to detect bot closures — it never inspects comment text for this marker. The format functions also hardcode the string literal instead of referencing the constant. This creates a false sense that keeping the string in sync matters for correctness, and the constant will diverge silently from the formatter functions over time.

greptile-apps · 2026-05-25T01:09:30Z

+def _has_marker(comments: Iterable[dict], marker: str) -> bool:
+    return any(marker in (comment.get("body") or "") for comment in comments)


_has_marker does not filter by comment author — user-injected HTML can suppress bot notifications

_has_marker(comments, marker) iterates all comments regardless of who wrote them. A contributor can paste  or  in their own PR description or a comment, causing review_gate() to either skip the within-grace notification entirely (so the user misses the courtesy warning before closure) or always post "all clear again" instead of "ready for review" on the first pass. Both effects are benign in practice, but deduplicating on bot-authored comments would make the state machine robust to accidental or deliberate injection.

greptile-apps · 2026-05-25T01:09:31Z

+    # Not passing and not tagged: close if past the grace window, else notify once.
+    if age_days is not None and age_days >= grace_days:
+        comment = format_pr_close_comment({**verdict, "missing": missing})
+        if not close:
+            return {**base_result, "action": "would-close", "comment": comment}
+        post_comment(repo, number, comment)
+        close_pr(repo, number)
+        return {**base_result, "action": "closed", "comment": comment}


review_gate() closes PRs older than grace_days without any prior warning on first encounter

Unlike triage(), which always posts a GRACE_COMMENT_MARKER warning on the first low-quality detection and only closes on a subsequent run after GRACE_PERIOD_SECONDS has elapsed, review_gate() closes any PR whose creation age exceeds grace_days (1 day) immediately — regardless of whether a warning was ever posted. On initial workflow deployment, the daily cron sweep will encounter all existing open external PRs that are older than 1 day and close them without any advance notice. close_low_quality_prs.py uses the time-since-warning approach and avoids this problem.

veria-ai · 2026-05-25T01:15:52Z

PR overview

The PR has worked down several findings, with 3 issues already addressed, but a few low-impact workflow abuse paths remain open. The remaining risks are operational rather than data-compromise risks: external users may be able to trigger unnecessary LLM/API spend, and a reconsider command may reopen items closed by other GitHub Actions because closures share the same bot actor. Overall security posture is improving, but these workflow guards should be tightened before merging.

Open issues (3)

.github/scripts/triage_with_llm.py:326 — Low: Reconsider can reopen items closed by other GitHub Actions An external issue/PR author can comment @agent-shin reconsider and reo
.github/workflows/review_gate.yml:20 — Low: Repeated LLM spend from fork PR updates When AGENT_SHIN_ENABLED is true, every synchronize event on a fork PR gets `OPENAI_API_
.github/workflows/triage_issue_with_llm.yml:58 — Low: LLM spend on automatic dry-run events When AGENT_SHIN_ENABLED is true, automatic issues events receive OPENAI_API_KEY, then l

Fixed/addressed: 3 · PR risk: 5/10

…ve label match Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor · 2026-05-25T02:25:24Z

+
+on:
+  pull_request_target:
+    types: [opened, reopened]


PR triage workflow duplicates review gate LLM calls

Low Severity

Both triage_pr_with_llm.yml and review_gate.yml fire on the same pull_request_target events (opened, reopened). When AGENT_SHIN_ENABLED=true, both workflows call the LLM judge for the same PR on the same event. The triage workflow always strips --close on pull_request_target, making its automatic trigger purely observational — its entire function is already performed by the review gate. This results in paying for two independent LLM evaluations of every opened/reopened PR while only one produces any action.

Additional Locations (1)

.github/workflows/review_gate.yml#L23-L25

^{Reviewed by Cursor Bugbot for commit 19dfdd4. Configure here.}

cursor · 2026-05-25T02:25:24Z

+            return {**base_result, "action": "would-close", "comment": comment}
+        post_comment(repo, number, comment)
+        close_pr(repo, number)
+        return {**base_result, "action": "closed", "comment": comment}


Review gate omits IMMEDIATE_CLOSE_LOGINS grace bypass

Low Severity

The review_gate() function's close path uses only PR age (age_days >= grace_days) to decide whether to close, without checking IMMEDIATE_CLOSE_LOGINS. In contrast, triage() explicitly checks login.lower() in IMMEDIATE_CLOSE_LOGINS to bypass grace entirely. The PR documentation states these logins "bypass BOTH the grace period AND the dry-run gating," but the review gate still subjects them to the full age-based grace window. A dogfood account opening a fresh PR would receive a "within-grace" notice from the review gate instead of being closed immediately.

Additional Locations (1)

.github/scripts/triage_with_llm.py#L1331-L1334

^{Reviewed by Cursor Bugbot for commit 19dfdd4. Configure here.}

cursor · 2026-05-25T02:25:24Z

+# the closure as a bot close. Keep the marker in sync with the literal text
+# in those formatter functions.
+AGENT_SHIN_AUTO_CLOSE_MARKER = "I'm **Agent Shin**"
+


Unused constant misleadingly documented for reconsider detection

Low Severity

AGENT_SHIN_AUTO_CLOSE_MARKER is documented as enabling reconsider to "recognize the closure as a bot close," but was_closed_by_agent_shin() actually uses GitHub event-actor identity checking (not comment-body matching) for that purpose. The constant is never used for operational detection anywhere — only in a test assertion to pin comment text. The misleading documentation could cause a future developer to rely on comment-body matching that doesn't exist.

^{Reviewed by Cursor Bugbot for commit 19dfdd4. Configure here.}

@agent-shin

Adds a rollout-day workflow that comments on every open external PR/issue that the new triage bot WOULD auto-close, giving contributors 7 days to fix their description before any destructive action runs. Why now: merging this PR enables Agent Shin in dry-run. The follow-up "enact" PR (next Monday) flips the destructive paths on. Without this heads-up, contributors would get a close-comment on day 8 with no prior warning. The heads-up names the cutoff date, lists the rubric, calls out each PR/issue's specific missing pieces, and explains the recovery paths (@agent-shin reconsider for PRs, edit + reopen for issues). Files - .github/scripts/_agent_shin_actions.py — thin maybe_post_comment / maybe_close_* / maybe_add_label / etc. wrappers. Each is a single `if dry_run: log; return; else: call_through()` so a dry-run preview differs from the real run in exactly one call site per mutation. The call-through goes via `triage_with_llm.<name>` (module-qualified) so monkeypatching the underlying function in tests is reflected here. - .github/scripts/triage_rollout_heads_up.py — the sweep. Iterates every open PR + issue via `gh pr list` / `gh issue list`, runs the future rubric (review_gate for PRs, triage(kind="issue") for issues), and posts the heads-up on any item that would be auto-closed. Idempotent via a `` marker. Defaults to dry- run; --close opts in to real posts. --close-on overrides the cutoff date (defaults to today + 7 days). - .github/workflows/triage_rollout_heads_up.yml — one-shot workflow. Triggers on push to litellm_internal_staging filtered to the script path (fires on rollout merge) plus workflow_dispatch with a dry_run input that defaults to "true" for safe manual re-runs. - tests/test_litellm/test_triage_rollout_heads_up.py — 28 unit tests covering: the dry-run wrappers (each maybe_* gates correctly), the _would_be_closed predicate for PR vs. issue results, the comment formatter (cutoff/rubric/marker/recovery wording), per-item dispatch (skip-not-open, skip-internal-author, skip-already-notified, skip-passing, would-post/posted), and the sweep loop end-to-end. Local preview (no GitHub mutations): python3 .github/scripts/triage_rollout_heads_up.py --repo BerriAI/litellm Real run (what the workflow does): python3 .github/scripts/triage_rollout_heads_up.py --repo BerriAI/litellm --close TODO: replace the placeholder ROLLOUT_BLOG_URL with the canonical docs URL once the litellm-docs PR ships. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@agent-shin

…appers - Mirror sibling Agent Shin workflows by only exposing OPENAI_API_KEY in triage_reconsider.yml when vars.AGENT_SHIN_ENABLED == 'true'. Previously the secret was unconditionally exposed, so any PR/issue author could trigger paid LLM calls by commenting '@agent-shin reconsider' even while the bot was supposed to be in dry-run. - Remove the six unused dry-run wrappers (maybe_close_pr, maybe_close_issue, maybe_reopen_pr, maybe_reopen_issue, maybe_add_label, maybe_remove_label) from _agent_shin_actions.py — only maybe_post_comment is used by rollout scripts. Drop the associated tests that exercised the now-removed functions. Co-authored-by: Yassin Kortam <yassin@berri.ai>

veria-ai · 2026-05-26T04:03:32Z

+    actor = fetch_last_close_actor(repo, number)
+    if not actor:
+        return False
+    return actor.lower() == expected


Low: Reconsider can reopen items closed by other GitHub Actions

An external issue/PR author can comment @agent-shin reconsider and reopen an item closed by another workflow, such as stale or duplicate handling, because closures performed with GITHUB_TOKEN all have the same github-actions[bot] actor. Stamp Agent Shin close comments with a dedicated hidden marker and require the latest close to be accompanied by that marker before allowing the reopen path.

- triage_rollout_heads_up.py: replace %-d strftime specifier (GNU-only) with portable day formatting so the script doesn't crash on Windows. - close_low_quality_prs.py: skip malformed JSON lines in fetch_pr_comments instead of letting one bad line abort the daily sweep, matching the pattern in triage_with_llm._iter_paginated_json. - triage_with_llm.py: move has_linked_issue short-circuit before build_pr_prompt to avoid unnecessary prompt construction on PRs that link an issue. Co-authored-by: Yassin Kortam <yassin@berri.ai>

…e_low_quality_prs - Wrap per-PR processing in try/except so a transient GitHub API failure on one PR no longer aborts the entire daily sweep (mirrors the pattern already used in triage_rollout_heads_up.py). - Have --limit bound *all* destructive write actions (closures and grace warnings combined), not just closures. Prevents a backlog of newly failing PRs from flooding contributors with comments in a single run. Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor · 2026-05-26T04:32:51Z

+        passing = result.get("passing")
+        if passing is None:
+            return False  # skipped — nothing for the heads-up to warn about
+        return passing is False


Heads-up script misidentifies regressed PRs as closable

Low Severity

_would_be_closed returns True for any PR result where passing is False, including the regressed-already-notified and would-remove-label actions from review_gate. Those PRs are explicitly promised to stay open (the regression comment says "The PR stays open"), yet the heads-up script would post a warning claiming they'll be auto-closed. The function conflates "not currently passing" with "will be auto-closed."

Additional Locations (1)

.github/scripts/triage_with_llm.py#L1106-L1108

^{Reviewed by Cursor Bugbot for commit 09720a7. Configure here.}

Both bulk-sweep scripts hardcoded `gh {pr,issue} list --limit 1000`, and gh lists newest-first — so the OLDEST ~900 PRs and ~380 issues were silently dropped. That's exactly the stale backlog the daily closer and one-shot rollout heads-up exist to catch. Extract a single `list_open_items(kind, *, repo, fields)` helper into `agent_shin_shared.py` with `GH_LIST_ALL_LIMIT = 100_000` — a ceiling far above any realistic open backlog so gh paginates until the queue is exhausted. `fetch_open_prs` and `_list_open_numbers` both delegate to it, so the limit lives in exactly one place going forward. Verified live against BerriAI/litellm: - `fetch_open_prs` -> 1981 PRs (was 1000) - `_list_open_numbers(issue)` -> 1382 issues (was 1000) - `_list_open_numbers(pr)` -> 1981 PRs (was 1000) Adds 7 regression tests asserting the new limit is passed, the dedicated `gh {pr,issue} list` command + fields are used per kind, bad kind raises ValueError, and both callers delegate to the shared helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The PR rubric previously passed any PR with a linked issue, regardless of whether it showed the fix actually working. Sample spot-check found 21/25 recent external PRs passing, including ones that linked an issue but provided zero QA evidence. Tighten the rubric so a pass now requires BOTH: (1) CONTEXT — a linked issue OR a clear problem description with expected-vs-actual behavior. (2) END-TO-END QA PROOF — at least one of: (a) screenshot(s) of the fix working, (b) screen recording / video, (c) specific commands actually run, paired with their real output, against the real system. Mocked unit tests, generic 'I tested it' claims, 'all tests pass' without output, and the linked issue itself are explicitly excluded from QA proof. Also add 'qa_proof_type' to the JSON schema so the per-PR report surfaces which kind of proof (or 'none') the judge saw. Re-sample on the same 25 recent external PRs shifts the verdict distribution from 21 pass / 4 fail to 4 pass / 21 fail, with zero prior-fails now passing — the stricter rule catches PRs that ship only with unit-test claims and no real integration evidence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…comment Adds "What's this and why am I getting it?" links to docs.litellm.ai/blog/ agent-shin-triage from the four comments contributors actually read when something went wrong: PR close, PR grace warning, issue close, issue grace warning. PR comments also link the rubric section directly from the QA-proof bullet so contributors can self-serve "what counts as proof" without pinging a maintainer. Pins the new guarantees in tests: blog link must appear in all four comments, and the PR close comment must continue to flag mocked-dependency unit tests as insufficient proof. The linked blog post is in BerriAI/litellm-docs PR #240; the URL will 404 until that lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…IST_ALL_LIMIT gh lists newest-first, so capping at 1000 silently drops the oldest open PRs — exactly the stale ones the daily sweep is meant to reconcile. Use the same ceiling as agent_shin_shared.GH_LIST_ALL_LIMIT so the workflow sees the entire backlog. Co-authored-by: Yassin Kortam <yassin@berri.ai>

- review_gate: expire the regression-marker short-circuit after grace_days so PRs that were regressed and then abandoned can eventually be closed. - review_gate: when the rubric short-circuits to pass via the linked-issue regex but Greptile drags the PR below the bar, replace the synthetic 'LLM was not called' explanation with the real Greptile shortfall so regression / close comments are not misleading. - triage_rollout_heads_up._comments_have_marker: drop the unused 'kind' parameter and filter by bot author so a contributor quoting the heads-up via 'Quote reply' cannot trick the idempotency check, matching the pattern in triage_with_llm._has_marker. Co-authored-by: Yassin Kortam <yassin@berri.ai>

veria-ai · 2026-05-28T04:29:49Z

+# Manual dry-run:     gh workflow run "Agent Shin — review gate" -f close=false
+#
+# We use `pull_request_target` so the workflow can read repo secrets and run
+# against fork PRs. Fork code is never checked out — only PR metadata is read


Low: Repeated LLM spend from fork PR updates

When AGENT_SHIN_ENABLED is true, every synchronize event on a fork PR gets OPENAI_API_KEY and runs the review gate, so an external PR author can burn LLM/API budget by repeatedly pushing commits. Consider removing synchronize here and relying on the scheduled/manual runs for re-checks, or add a per-PR cooldown before exposing the LLM key.

Suggested change

# against fork PRs. Fork code is never checked out — only PR metadata is read

types: [opened, reopened, ready_for_review]

Co-authored-by: Yassin Kortam <yassin@berri.ai>

veria-ai · 2026-05-28T04:41:57Z

+          # The Python script calls the LLM whenever this var is set
+          # (regardless of `--close`); stripping `--close` doesn't suppress
+          # the API call, only the destructive side effects.
+          OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED == 'true' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}


Low: LLM spend on automatic dry-run events

When AGENT_SHIN_ENABLED is true, automatic issues events receive OPENAI_API_KEY, then lines 84-96 force dry-run but still execute the Python script; that script calls the LLM whenever the key is set. An external GitHub user can open or reopen issues to trigger paid LLM calls without any bot action. The same pattern exists in .github/workflows/triage_pr_with_llm.yml; keep the key empty for automatic dry-run triggers, or only call the LLM on workflows that can actually act.

Suggested change

OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED == 'true' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}

OPENAI_API_KEY: ${{ github.event_name == 'workflow_dispatch' && secrets.OPENAI_API_KEY || '' }}

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

There are 5 total unresolved issues (including 4 from previous reviews).

^{Reviewed by Cursor Bugbot for commit dd02b76. Configure here.}

cursor · 2026-05-28T04:47:00Z

+                    "action": "reopened",
+                    "comment": reopen_body,
+                }
+            return base


Reconsider reopens PRs the review gate immediately re-closes

Medium Severity

The triage function's linked-issue short-circuit in reconsider mode (has_linked_issue → pass → reopen) never checks the Greptile confidence score, while review_gate requires both a rubric pass and Greptile ≥ 4/5. When the review gate closes a PR because Greptile scored it below the threshold (even though it has a linked issue), the reconsider flow reopens it based solely on the linked issue. The reopened event then re-triggers the review gate, which immediately closes the PR again — producing contradictory comments and a frustrating reopen-close loop bounded only by the 10-minute rate limiter.

Additional Locations (1)

.github/scripts/triage_with_llm.py#L1104-L1110

^{Reviewed by Cursor Bugbot for commit dd02b76. Configure here.}

@agent-shin

…you got right' section, softer 'park this for later' framing User feedback on the auto-triage comments contributors will see: 1. Tone — the previous 'You have 1 day to address this before this PR is auto-closed' framing reads as an ultimatum. Replace with: 'If the description isn't updated in the next 1 day, I'll auto-close this PR. That's not us saying we don't care about the change — we want the open-PR list to mirror what a maintainer can act on right now, so contributors don't get lost in a backlog. A closed PR is a soft "park this for later," not a rejection. Take your time.' 2. Positive feedback — the previous comments only listed what was missing. Now every close + grace-warning comment opens with a 'What you got right:' section rendered from the judge's per-field flags. Contributors see a checkmark for everything they got right (linked issue, problem description, expected/actual, QA proof for PRs; runnable repro, screenshot/log, expected/actual, motivation+example for issues) before the gaps. The block is omitted entirely when nothing is present so we never render 'What you got right: (nothing).' 3. Reconsider trigger — the previous grace warning told contributors to comment '@agent-shin reconsider' during the grace window. They don't need to — the bot re-checks on every sweep. The new copy says 'just update the description, no need to ping me' for the grace path, and reserves '@agent-shin reconsider' for the post-close recovery path. 4. Bullet-train emoji — replace 👋 with 🚄 (Shinkansen, the symbol of Agent Shin) across every action-required comment: PR close, PR grace warning, issue close, issue grace warning, within-grace, Greptile- closer grace warning, rollout heads-up. Pinned in tests so a future refactor can't silently revert. 5. Greptile-post-close — the @greptileai bullet now explicitly says 'a low Greptile score isn't a blocker either,' since the previous copy buried the fact that @greptileai works after auto-close. Comment templates updated: format_pr_close_comment, format_issue_close_comment, format_grace_warning_pr_comment, format_grace_warning_issue_comment, format_within_grace_comment (triage_with_llm.py); format_grace_warning_comment (close_low_quality_prs.py); format_heads_up_comment header (triage_rollout_heads_up.py). New helpers: _format_present_for_pr / _format_present_for_issue / _format_present_block, driven off the existing per-field flags the LLM judge already emits — no prompt change needed. New tests pin: bullet-train emoji in every action-required comment; 'What you got right' appears with ✅ bullets when fields are present; the block is omitted when no fields are present; 'park this for later' / 'not a rejection' softer framing; grace warnings tell the contributor 'no need to ping' during the grace window (reconsider is the post-close path only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mateo-berri · 2026-06-14T21:52:20Z

Superseded by #30433 — recreated on the litellm_-prefixed branch litellm_agent_shin_review_gate with a clearer title and description. Same content, same base (litellm_internal_staging).

cursoragent and others added 22 commits May 17, 2026 09:24

fix(workflows): scheduled cron closes PRs; safe --close strip in triage

4961bf4

Co-authored-by: Yassin Kortam <yassin@berri.ai>

Merge PR #28117 + #28147 (Agent Shin auto-close + reconsider safety)

7fb1ec6

Merge PR #28758 — keep #28147 versions of overlapping triage files; r…

b7ed45e

…eview_gate feature applied in next commit

mateo-berri requested a review from a team May 25, 2026 01:01

greptile-apps Bot reviewed May 25, 2026

View reviewed changes

cursor Bot reviewed May 25, 2026

View reviewed changes

Comment thread .github/scripts/triage_with_llm.py Outdated

Comment thread tests/test_litellm/test_github_triage_workflows.py

veria-ai Bot reviewed May 25, 2026

View reviewed changes

Comment thread .github/workflows/triage_pr_with_llm.yml Outdated

cursor Bot reviewed May 25, 2026

View reviewed changes

Comment thread .github/scripts/triage_with_llm.py

Comment thread .github/scripts/triage_with_llm.py Outdated

agent_shin: fix review_gate close-after-regression and case-insensiti…

19dfdd4

…ve label match Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor Bot reviewed May 25, 2026

View reviewed changes

yuneng-berri approved these changes May 26, 2026

View reviewed changes

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread .github/scripts/_agent_shin_actions.py Outdated

Comment thread .github/workflows/triage_reconsider.yml Outdated

mateo-berri mentioned this pull request May 26, 2026

Agent Shin rollout: day-7 enactment sweep + flip AGENT_SHIN_ENABLED to live-by-default #28834

Open

4 tasks

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread .github/scripts/triage_rollout_heads_up.py Outdated

Comment thread .github/scripts/triage_with_llm.py

Comment thread .github/scripts/close_low_quality_prs.py Outdated

veria-ai Bot reviewed May 26, 2026

View reviewed changes

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread .github/scripts/close_low_quality_prs.py Outdated

Comment thread .github/scripts/close_low_quality_prs.py Outdated

cursor Bot reviewed May 26, 2026

View reviewed changes

mateo-berri and others added 3 commits May 28, 2026 03:04

cursor Bot reviewed May 28, 2026

View reviewed changes

Comment thread .github/workflows/review_gate.yml Outdated

cursor Bot reviewed May 28, 2026

View reviewed changes

Comment thread .github/scripts/triage_with_llm.py Outdated

Comment thread .github/scripts/triage_with_llm.py

Comment thread .github/scripts/triage_rollout_heads_up.py

veria-ai Bot reviewed May 28, 2026

View reviewed changes

cursor Bot reviewed May 28, 2026

View reviewed changes

Comment thread .github/scripts/triage_with_llm.py Outdated

fix: pass min_greptile_score through to ready-for-review comment text

dd02b76

Co-authored-by: Yassin Kortam <yassin@berri.ai>

veria-ai Bot reviewed May 28, 2026

View reviewed changes

cursor Bot reviewed May 28, 2026

View reviewed changes

mateo-berri mentioned this pull request Jun 14, 2026

feat(agent-shin): automated PR/issue triage, low-quality auto-close, and review-gate label lifecycle #30433

Merged

6 tasks

mateo-berri closed this Jun 14, 2026

		OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
		OPENAI_BASE_URL: ${{ vars.OPENAI_BASE_URL }}

		def _has_marker(comments: Iterable[dict], marker: str) -> bool:
		return any(marker in (comment.get("body") or "") for comment in comments)

	# against fork PRs. Fork code is never checked out — only PR metadata is read
	types: [opened, reopened, ready_for_review]

	OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED == 'true' \|\| github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY \|\| '' }}
	OPENAI_API_KEY: ${{ github.event_name == 'workflow_dispatch' && secrets.OPENAI_API_KEY \|\| '' }}

Uh oh!

Conversation

mateo-berri commented May 25, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How it was assembled

Test plan

Reviewer notes

Uh oh!

CLAassistant commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented May 25, 2026

Codecov Report

Uh oh!

greptile-apps Bot commented May 25, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

veria-ai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR overview

Open issues (3)

Uh oh!

Uh oh!

Uh oh!

cursor Bot May 25, 2026

Choose a reason for hiding this comment

PR triage workflow duplicates review gate LLM calls

Uh oh!

Uh oh!

cursor Bot May 25, 2026

Choose a reason for hiding this comment

Review gate omits IMMEDIATE_CLOSE_LOGINS grace bypass

Uh oh!

cursor Bot May 25, 2026

Choose a reason for hiding this comment

Unused constant misleadingly documented for reconsider detection

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

veria-ai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cursor Bot May 26, 2026

Choose a reason for hiding this comment

Heads-up script misidentifies regressed PRs as closable

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

veria-ai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mateo-berri commented May 25, 2026 •

edited by cursor Bot

Loading

CLAassistant commented May 25, 2026 •

edited

Loading

veria-ai Bot commented May 25, 2026 •

edited

Loading