fix(triage): bugbot on #28117 — reconsider safety + 1-day grace + @greptileai post-close + SwiftWinds dogfood by mateo-berri · Pull Request #28147 · BerriAI/litellm

mateo-berri · 2026-05-18T05:23:14Z

Fix-on-top for #28117. Combines two layers of changes:

Reconsider safety (original scope) — addresses three concerns Greptile + veria-ai flagged on the @agent-shin reconsider flow.
1-day grace period before close + SwiftWinds dogfood (new scope) — the user explicitly asked for both in chat:

"I want to give contributors a 1 day grace (specify in the comment) to fix their pr before it closes. They can still @ the bot the request another look (and thus a possible re-opening). It should state in the message all this and that also even after the pr is closed @'ing greptileai works fine. Also, turn on full (non dry run) mode with SwiftWinds. Thats my personal account I want to test this PR with for this week. It has no push permissions to litellm so its a perfect example."
"For SwiftWinds, just close immediately. Faster iteration that way."

Part 1 — Reconsider safety (Greptile + veria-ai feedback on #28117)

Greptile review on #28117 (Confidence Score 3/5): "The reconsider mode in triage_with_llm.py ignores --close and makes real write calls unconditionally; an operator running --reconsider locally without the workflow AGENT_SHIN_ENABLED guard would post comments and reopen PRs. The reconsider workflow also has no check that the closure came from the bot, so it can reopen items closed by maintainers for non-quality reasons."

veria-ai review on #28117 (Risk 5/10): "This PR adds GitHub automation that can close and reopen issues/PRs from a write-token workflow. The reconsider path authorizes the original external author but does not verify that the item was previously closed by Agent Shin, so a contributor can make the bot reopen a PR a maintainer closed for another reason."

1a. Reconsider had no dry-run path (Greptile)

Before: --reconsider ignored --close and always posted comments + reopened on pass. After: reconsider honors close=False the same way regular triage does. Returns would-reopen / would-reconsider-still-failing (with the previewed comment body) for step-summary rendering. The workflow only passes --close when AGENT_SHIN_ENABLED=true.
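A minimal sketch of the dry-run contract described above, not the code from triage_with_llm.py. The function name `reconsider_action`, the `write` callback, and the real-run action names `reopened` / `reconsider-still-failing` are hypothetical. Only `would-reopen` and `would-reconsider-still-failing` come from this PR.

```python
from typing import Callable, List


def reconsider_action(passed: bool, close: bool, write: Callable[[str], None]) -> str:
    """Return the action name; call `write` only when `close` is True."""
    if not close:
        # Dry run: report what would happen and touch nothing.
        return "would-reopen" if passed else "would-reconsider-still-failing"
    # Real run: perform the write, then report it.
    # The names below are placeholders, not the script's real action names.
    write("reopen" if passed else "comment-still-failing")
    return "reopened" if passed else "reconsider-still-failing"


calls: List[str] = []
print(reconsider_action(passed=True, close=False, write=calls.append))  # would-reopen
print(calls)  # [] because no write happened in dry-run mode
```

The point of routing every write through one guarded branch is that a local run without `--close` cannot reach GitHub at all.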

1b. Reconsider could reopen maintainer-closed PRs/issues (Greptile + veria-ai security)

Before: triage_reconsider.yml only checked the commenter — it did NOT check that the most recent close was performed by Agent Shin. A contributor could @agent-shin reconsider on a PR a maintainer closed for non-rubric reasons (duplicate, security report, design rejection) and have the bot reopen it. After: was_closed_by_agent_shin() inspects the issue events API for the most recent closed event's actor and only permits reopen when the actor matches the configured bot login (default github-actions[bot], overridable via AGENT_SHIN_BOT_LOGIN). Fail-closed on missing events. Returns skip-not-bot-closed; the LLM never runs.
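The guard can be sketched as a pure function over an already-fetched events list. This is an illustration under assumptions: `last_close_actor` and `closed_by_bot` are hypothetical names, the events are assumed to arrive oldest-first, and the case-insensitive login comparison is a guess modelled on the grace-warning helper rather than something this description confirms.

```python
from typing import Iterable, Optional

DEFAULT_BOT_LOGIN = "github-actions[bot]"


def last_close_actor(events: Iterable[dict]) -> Optional[str]:
    """Login of whoever performed the most recent `closed` event, if any."""
    last = None
    for event in events:  # assumed oldest-first, so the last match wins
        if event.get("event") == "closed":
            last = (event.get("actor") or {}).get("login")
    return last


def closed_by_bot(events: Iterable[dict], bot_login: str = DEFAULT_BOT_LOGIN) -> bool:
    """Fail closed: no `closed` event or no actor means 'not the bot'."""
    actor = last_close_actor(events)
    if not actor:
        return False
    return actor.lower() == bot_login.lower()


events = [
    {"event": "closed", "actor": {"login": "github-actions[bot]"}},
    {"event": "reopened", "actor": {"login": "alice"}},
    {"event": "closed", "actor": {"login": "some-maintainer"}},
]
print(closed_by_bot(events))      # False: a maintainer closed it last
print(closed_by_bot(events[:2]))  # True: the only close was the bot's
print(closed_by_bot([]))          # False: missing events refuse the reopen
```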

1c. No rate-limiting on the reconsider trigger (Greptile)

Before: every @agent-shin reconsider comment burned CI minutes + an OpenAI API call. After: a 10-minute cooldown via seconds_since_last_reconsider_verdict(), which detects the bot's own verdict marker (`<!-- agent-shin:reconsider-verdict -->`). Inside the cooldown window, triage returns skip-rate-limited and the LLM never runs.
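A rough sketch of the cooldown check, assuming the comments have already been fetched as dicts shaped like the GitHub comments API response. `seconds_since_marker` and `is_rate_limited` are hypothetical names; the marker and the 600-second limit are the values from the diff.

```python
import datetime as dt
from typing import Iterable, Optional

RECONSIDER_COMMENT_MARKER = "<!-- agent-shin:reconsider-verdict -->"
RECONSIDER_RATE_LIMIT_SECONDS = 600


def seconds_since_marker(
    comments: Iterable[dict], bot_login: str, now: dt.datetime
) -> Optional[float]:
    """Age of the bot's newest verdict comment, or None if there is none."""
    latest = None
    for comment in comments:
        author = ((comment.get("user") or {}).get("login") or "").lower()
        if author != bot_login.lower():
            continue  # a contributor quoting the marker does not count
        if RECONSIDER_COMMENT_MARKER not in (comment.get("body") or ""):
            continue
        created = dt.datetime.fromisoformat(
            comment["created_at"].replace("Z", "+00:00")
        )
        if latest is None or created > latest:
            latest = created
    return None if latest is None else (now - latest).total_seconds()


def is_rate_limited(comments: Iterable[dict], bot_login: str, now: dt.datetime) -> bool:
    age = seconds_since_marker(comments, bot_login, now)
    return age is not None and age < RECONSIDER_RATE_LIMIT_SECONDS
```

Passing `now` in explicitly keeps the check deterministic in tests, the same choice the Greptile closer makes for its grace helper.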

Part 2 — 1-day grace period before close

Behavior

When Agent Shin (LLM judge) or the Greptile-score closer flags an external PR/issue as low-quality, the bot now does NOT close immediately. Instead it posts a grace-period warning comment that explicitly states:

"You have 1 day to address this before this PR is auto-closed."
During the grace window: update the description and comment @agent-shin reconsider to skip the auto-close.
After auto-close: @agent-shin reconsider still re-runs triage and reopens; @greptileai also works even after the PR is closed for a fresh re-review.

The warning comment carries an HTML marker (`<!-- agent-shin:grace-warning -->`) that both scripts use to coordinate:

Both .github/scripts/triage_with_llm.py (LLM judge) and .github/scripts/close_low_quality_prs.py (Greptile-score closer) detect the marker.
A warning posted by one path is recognized by the other on the next run.
GRACE_PERIOD_SECONDS = 86400 (24h) — defined in both files; tests pin both.

Flow on a real low-quality PR (24h cadence is the daily Greptile cron)

Day	Closer evaluates	Grace marker	Action
1	Greptile 3/5, no warning yet	absent	`warn-grace` → post warning, do NOT close
1.5	Greptile 3/5, warning 12h old	< 24h	`skip-in-grace-period` → no-op
2	Greptile 3/5, warning 25h old	≥ 24h	`close` → post close comment + close PR
2 (alt)	Greptile bumped to 5/5 after fix	≥ 24h	`skip-score-ok` → PR stays open
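The table above reduces to a small decision function over the Greptile score and the age of the last warning. The sketch below is illustrative: `grace_action` is a hypothetical name, the SwiftWinds bypass is left out, and the threshold of 4 is an assumption taken from the "<4/5" wording in the evaluate_pr docstring.

```python
from typing import Optional

GRACE_PERIOD_SECONDS = 86400  # 24h, as pinned in both scripts


def grace_action(score: int, min_score: int, grace_age: Optional[float]) -> str:
    """Map (score, seconds since last warning) to an action name."""
    if score >= min_score:
        return "skip-score-ok"
    if grace_age is None:
        return "warn-grace"  # first failing detection: warn, do not close
    if grace_age < GRACE_PERIOD_SECONDS:
        return "skip-in-grace-period"
    return "close"


HOUR = 3600
print(grace_action(3, 4, None))       # warn-grace
print(grace_action(3, 4, 12 * HOUR))  # skip-in-grace-period
print(grace_action(3, 4, 25 * HOUR))  # close
print(grace_action(5, 4, 25 * HOUR))  # skip-score-ok
```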

`evaluate_pr` action surface (close_low_quality_prs.py)

New: warn-grace, skip-in-grace-period. Existing actions (close, skip-too-young, skip-optout-label, skip-internal, skip-no-greptile-score, skip-score-ok) are unchanged.

`triage()` action surface (triage_with_llm.py)

New: warned-grace, would-warn-grace (dry-run preview), skip-in-grace-period. Existing closed / would-close now only fire after the grace window.

Part 3 — SwiftWinds dogfood bypass (`IMMEDIATE_CLOSE_LOGINS`)

The user explicitly named SwiftWinds as their personal account for testing this PR for the week ahead, and explicitly said "for SwiftWinds, just close immediately. Faster iteration that way."

IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"}) (case-insensitive match) lives in both scripts and bypasses both:

The 1-day grace period — close fires on the first detection.
The dry-run / AGENT_SHIN_ENABLED workflow gating — the script writes (post comment + close PR) for these logins regardless of whether the workflow passed --close.

This is intentional: SwiftWinds has no push permissions to litellm, which makes it a clean dogfood account, and waiting 24h per iteration would kill the testing feedback loop.

The bypass is centralized in the Python scripts — workflows don't need conditional logic. Static workflow tests (test_github_triage_workflows.py) still pin the AGENT_SHIN_ENABLED = "true" gate for the global population.
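As a sketch of the per-PR override, assuming a PR dict shaped like the `gh` JSON output with an `author.login` field (`effective_dry_run` is a hypothetical helper name; the diff inlines the same expression in main()):

```python
IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})


def effective_dry_run(pr: dict, global_dry_run: bool) -> bool:
    """Dry-run stays on unless the author is an immediate-close login."""
    login = ((pr.get("author") or {}).get("login") or "").lower()
    return global_dry_run and login not in IMMEDIATE_CLOSE_LOGINS


print(effective_dry_run({"author": {"login": "SwiftWinds"}}, True))   # False: real writes
print(effective_dry_run({"author": {"login": "someone"}}, True))      # True: still dry-run
print(effective_dry_run({"author": None}, True))                      # True: unknown author
```

Lower-casing the login before the lookup is what makes `SwiftWinds` and `swiftwinds` equivalent; a missing author falls back to the safe dry-run default.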

Files changed

.github/scripts/triage_with_llm.py
- Reconsider safety: was_closed_by_agent_shin(), seconds_since_last_reconsider_verdict(), fetch_last_close_actor(), _iter_paginated_json(); bot-closed + rate-limit guards before the LLM call; honors close=False in reconsider mode; RECONSIDER_COMMENT_MARKER on reopen / still-failing comments.
- Grace period: GRACE_COMMENT_MARKER, GRACE_PERIOD_SECONDS, IMMEDIATE_CLOSE_LOGINS; seconds_since_last_grace_warning(); format_grace_warning_pr_comment() and format_grace_warning_issue_comment(); triage() now posts a warning on the first failing detection and only closes after the window elapses (or for IMMEDIATE_CLOSE_LOGINS); format_pr_close_comment() updated to mention @greptileai works post-close.
- Refactor: shared _seconds_since_latest_marker_comment() helper underneath both reconsider and grace helpers so the marker-iteration logic lives in one place.
.github/scripts/triage_reconsider.yml
- Passes --close only when AGENT_SHIN_ENABLED=true (matches the other Agent Shin workflows); always runs the script so the dry-run verdict appears in the step summary. (Part of original PR scope.)
.github/scripts/close_low_quality_prs.py
- Grace period: GRACE_COMMENT_MARKER, GRACE_PERIOD_SECONDS, IMMEDIATE_CLOSE_LOGINS (mirroring the LLM-judge constants); seconds_since_last_grace_warning() (operates on already-fetched comments, with injectable now for tests); format_grace_warning_comment(); evaluate_pr returns warn-grace / skip-in-grace-period / close based on prior warning age; main() posts the warning via new post_grace_warning() and dispatches per-PR pr_dry_run = dry_run and not is_immediate so SwiftWinds bypasses the global --close gate.
- The standard close comment now mentions @greptileai works even after the PR is closed.
tests/test_litellm/test_github_triage_with_llm.py
- Updates existing close-path tests to use the new _stub_grace_aged_out / _stub_grace_no_warning helpers.
- New tests (15): TestTriageOrchestration::{test_should_post_grace_warning_on_first_failing_run_in_close_mode, test_should_skip_close_inside_grace_window, test_should_dry_run_grace_warning_when_close_false, test_should_skip_grace_for_swiftwinds_login, test_should_close_swiftwinds_even_when_close_flag_false, test_should_match_immediate_close_login_case_insensitively, test_should_not_treat_random_external_login_as_immediate_close}; TestImmediateCloseLoginsConstant; TestGraceWarningCommentText (5 invariants on the user-facing language: 1-day grace, @agent-shin reconsider, @greptileai, "even after the PR is closed", marker presence); TestSecondsSinceLastGraceWarning (3 cases).
tests/test_litellm/test_github_close_low_quality_prs.py
- Updates existing close-path tests: first detection now produces warn-grace, not close.
- New tests (12): grace flow for drafts / brand-new PRs / no-prior-warning, post-grace close after window expires, skip inside window, SwiftWinds bypass + case-insensitivity, TestSecondsSinceLastGraceWarning, TestImmediateCloseLoginsConstant, TestGraceWarningCommentText (1-day, reconsider, @greptileai, marker, post-close mentions).

Test results

$ uv run pytest tests/test_litellm/test_github_triage_with_llm.py \
                tests/test_litellm/test_github_close_low_quality_prs.py \
                tests/test_litellm/test_github_triage_workflows.py
============================= 144 passed in 2.36s =============================

(Was 112 on the previous revision; +32 new tests for the grace path, the SwiftWinds bypass, the new comment language, and the new helper.)

Out of scope

The grace window length (GRACE_PERIOD_SECONDS = 86400) is a constant. If the team wants per-repo tuning we can wire it through --grace-seconds in a follow-up.
IMMEDIATE_CLOSE_LOGINS is hardcoded to {"swiftwinds"} (the user's named test account). Adding more dogfood accounts is a one-line change.
No removal of debug code — none was added; all changes are production-quality.

Type

🐛 Bug Fix
🛡️ Security
✨ Feature

Slack Thread

@agent-shin

Address three Greptile/veria-ai concerns on the @agent-shin reconsider flow:

1. **Reconsider had no dry-run path.** The previous reconsider mode ignored `--close` and always posted comments + reopened on a pass. A local operator running `python triage_with_llm.py --reconsider --pr N` would silently take destructive GitHub actions with no way to preview. Reconsider now honors `close=False` the same way regular triage does and returns `would-reopen` / `would-reconsider-still-failing` for step-summary rendering.
2. **Reconsider could reopen maintainer-closed PRs/issues** (Medium security finding from veria-ai). The workflow only checked that the commenter was authorized; it did NOT check that the most recent close was performed by Agent Shin. A contributor could comment `@agent-shin reconsider` on a PR a maintainer closed for non-rubric reasons (duplicate, security report, design rejection) and have the bot reopen it. Add `was_closed_by_agent_shin()`, which inspects the issue events API for the most recent `closed` actor and only permits reopen when that actor matches the configured bot login (default `github-actions[bot]`, overridable via env). Fail-closed on missing events.
3. **No rate-limiting on the reconsider trigger.** Every `@agent-shin reconsider` comment burns CI minutes + an OpenAI API call. Add a 10-minute cooldown via `seconds_since_last_reconsider_verdict()`, which greps the issue's comment list for the bot's own verdict marker (`<!-- agent-shin:reconsider-verdict -->`). Inside the window the triage returns `skip-rate-limited` and the LLM never runs.

Workflow update:
- `triage_reconsider.yml` now passes `--close` only when `AGENT_SHIN_ENABLED=true`, matching the pattern of `triage_pr_with_llm.yml`. The script runs in both states so the verdict still appears in the step summary for QA.

Tests:
- Add 5 reconsider safety tests: dry-run for pass / fail / linked-issue short-circuit, bot-closed-guard refusal on maintainer close, rate-limit refusal inside the cooldown window, and cooldown-elapsed acceptance.
- Add unit tests for `was_closed_by_agent_shin` (bot / maintainer / missing actor / env-override) and `seconds_since_last_reconsider_verdict` (no marker / multiple markers / non-bot comment with marker / bot comment without marker).
- Pin the `<!-- agent-shin:reconsider-verdict -->` marker in both reopen and still-failing comments; dropping it would silently break the cooldown.

Existing reconsider tests updated to pass `close=True` (the production path now) + stub the new guards via `_stub_reconsider_guards`. 112 tests pass (was 93).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

CLAassistant · 2026-05-18T05:23:21Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

codecov · 2026-05-18T05:26:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


…close bypass

- Add a 24-hour grace window between the first low-quality detection and the actual auto-close. The first detection posts a warning comment that explicitly says "You have 1 day to address this before this PR is auto-closed" and points the contributor at:
  * `@agent-shin reconsider` to request another look (and re-open)
  * `@greptileai` to request a fresh Greptile review, which works even after the PR is closed
- Both `triage_with_llm.py` (LLM judge) and `close_low_quality_prs.py` (Greptile-score closer) share the same `<!-- agent-shin:grace-warning -->` HTML marker so a warning posted by either path is recognized by both.
- Add IMMEDIATE_CLOSE_LOGINS = {swiftwinds} to bypass BOTH the grace period AND the dry-run / AGENT_SHIN_ENABLED gating. SwiftWinds is the user's personal account (no push permissions to litellm) used to dogfood the bot; user explicitly asked: "For SwiftWinds, just close immediately. Faster iteration that way."
- Update the standard close comments to mention that `@greptileai` works even after the PR is closed.
- Add 23 new tests covering: warn-grace on first detection, skip during grace window, close after grace expires, SwiftWinds bypass (case insensitive, with close=False, no random-login false positives), the grace-warning text invariants, and the SwiftWinds entry in the IMMEDIATE_CLOSE_LOGINS constant.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

For PRs from IMMEDIATE_CLOSE_LOGINS (e.g. swiftwinds), evaluate_pr returns 'close' immediately without ever posting a grace warning, so the close comment should not reference a 1-day grace period. Make close_pr take a grace_period_elapsed flag, default True, and pass False from the main loop when the close path was the immediate-close branch. Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor

Cursor Bugbot has reviewed your changes using high mode and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Summary misreports "would close" when immediate-close PRs were actually closed
- Split the dry-run summary to report both actual closures (from the closed counter) and the dry-run "would close" count, so immediate-close logins are no longer misreported.

Preview (b0fc8c224f)

diff --git a/.github/scripts/close_low_quality_prs.py b/.github/scripts/close_low_quality_prs.py
--- a/.github/scripts/close_low_quality_prs.py
+++ b/.github/scripts/close_low_quality_prs.py
@@ -38,6 +38,7 @@
 import argparse
 import datetime as dt
 import json
+import os
 import re
 import subprocess
 import sys
@@ -66,7 +67,34 @@
 # `default=[...]` combination silently mutates the shared default list.
 DEFAULT_OPTOUT_LABELS = ("do not close", "keep open", "wip")
 
+# HTML marker appended to grace-period warning comments. Shared with the
+# Agent Shin LLM-judge script (`triage_with_llm.py`) so a warning posted
+# by either path is recognized by both: the LLM judge can see "Greptile
+# already warned this contributor 12 hours ago" and skip re-warning, and
+# the Greptile closer can see "Agent Shin already warned" and close on
+# the next run if Greptile still has a low score.
+GRACE_COMMENT_MARKER = "<!-- agent-shin:grace-warning -->"
 
+# Length of the grace period between the warning comment and the actual
+# auto-close. Set to 24 hours so the contributor has at least one full
+# working day across any time zone to push fixes or comment
+# `@agent-shin reconsider`. Mirrors the constant of the same name in
+# `triage_with_llm.py` — keep them in sync if either changes.
+GRACE_PERIOD_SECONDS = 86400
+
+# Default login of the GitHub identity that performs Agent Shin's writes;
+# used for matching the author of a grace-warning comment so we don't
+# count somebody quoting the marker. The env override
+# `AGENT_SHIN_BOT_LOGIN` mirrors `triage_with_llm.py`.
+AGENT_SHIN_DEFAULT_BOT_LOGIN = "github-actions[bot]"
+
+# Logins (case-insensitive) that bypass BOTH the 1-day grace period AND
+# the dry-run gating. Mirrors `IMMEDIATE_CLOSE_LOGINS` in
+# `triage_with_llm.py`. Used for dogfooding the bot from external test
+# accounts that have no push permissions to the repo.
+IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})
+
+
 def gh(*args: str) -> str:
     """Run a `gh` CLI command and return stdout. Raises on non-zero exit."""
     result = subprocess.run(
@@ -196,6 +224,124 @@
     return bool(labels & {lbl.lower() for lbl in optout_labels})
 
 
+def seconds_since_last_grace_warning(
+    comments: Iterable[dict],
+    *,
+    bot_login: str | None = None,
+    now: dt.datetime | None = None,
+) -> float | None:
+    """Return seconds since the bot's most recent grace-period warning, or
+    None if no such warning has ever been posted on this PR.
+
+    Detects warnings by matching `GRACE_COMMENT_MARKER` in comments
+    authored by the bot identity. Operates on an already-fetched
+    comments list (avoids a second `gh api` call when the caller has
+    already pulled the page for Greptile-score extraction).
+
+    `now` is injectable so callers (and tests) can pin the reference
+    time. The closer runs everything against a single `now` snapshot
+    captured at the top of `main()` so age calculations stay consistent
+    across many PRs in a single run.
+    """
+    expected_login = (
+        bot_login
+        or os.environ.get("AGENT_SHIN_BOT_LOGIN")
+        or AGENT_SHIN_DEFAULT_BOT_LOGIN
+    ).lower()
+    latest: dt.datetime | None = None
+    for comment in comments:
+        author = ((comment.get("user") or {}).get("login") or "").lower()
+        if author != expected_login:
+            continue
+        body = comment.get("body") or ""
+        if GRACE_COMMENT_MARKER not in body:
+            continue
+        created = comment.get("created_at")
+        if not created:
+            continue
+        try:
+            ts = parse_iso8601(created)
+        except ValueError:
+            continue
+        if latest is None or ts > latest:
+            latest = ts
+    if latest is None:
+        return None
+    reference = now if now is not None else dt.datetime.now(dt.timezone.utc)
+    return (reference - latest).total_seconds()
+
+
+def format_grace_warning_comment(score: int, threshold: int) -> str:
+    """Comment posted on the FIRST low-Greptile-score detection — gives
+    the contributor a 1-day grace window before the auto-close fires on
+    the next daily cron run.
+
+    Mirrors `format_grace_warning_pr_comment` in
+    `triage_with_llm.py` in spirit (1-day grace + escape hatches), but
+    framed around Greptile's confidence score instead of the LLM judge's
+    rubric since the close trigger here is the Greptile signal.
+    """
+    return (
+        "👋 Hi, thanks for the PR! I'm **Agent Shin**, the automated triage bot for this repository.\n"
+        "\n"
+        "Heads up — Greptile's most recent review scored this PR "
+        f"**{score}/5**, below our merge bar of **{threshold}/5**.\n"
+        "\n"
+        "⏳ **You have 1 day to address Greptile's feedback before this PR is auto-closed.** "
+        "We close low-confidence PRs aggressively to keep the review queue manageable for "
+        "maintainers and contributors alike. **This isn't a rejection of the idea.**\n"
+        "\n"
+        "During the grace period:\n"
+        "\n"
+        "1. Push fixes that address Greptile's feedback (continue using your existing branch is fine).\n"
+        "2. Either:\n"
+        "   - Comment `@greptileai` to request a fresh Greptile review. If the new score is "
+        f"**{threshold}/5 or higher**, the PR stays open.\n"
+        "   - Or comment `@agent-shin reconsider` to have Agent Shin re-evaluate the PR description.\n"
+        "\n"
+        "If this PR is auto-closed in 24 hours, you'll still have options:\n"
+        "\n"
+        "- Comment `@agent-shin reconsider` after pushing fixes — Agent Shin will re-run triage "
+        "and reopen the PR if it now meets the bar.\n"
+        "- Comment `@greptileai` to request a re-review — that works **even after the PR is closed**.\n"
+        "\n"
+        "Thanks for contributing to LiteLLM. We know auto-closures can sting; the goal is to keep "
+        "the project healthy, not to dismiss your work.\n"
+        "\n"
+        f"{GRACE_COMMENT_MARKER}"
+    )
+
+
+def post_grace_warning(
+    pr: dict,
+    score: int,
+    threshold: int,
+    repo: str | None,
+    dry_run: bool,
+) -> None:
+    """Post the 1-day grace-period warning comment on `pr`.
+
+    The warning carries `GRACE_COMMENT_MARKER` so subsequent runs can
+    detect that the contributor has already been told about the
+    pending close. Does NOT close the PR — the close happens on the
+    next eligible run after `GRACE_PERIOD_SECONDS` elapses (handled
+    by `close_pr`).
+    """
+    pr_number = pr["number"]
+    repo_args = ["--repo", repo] if repo else []
+
+    if dry_run:
+        print(
+            f"  [DRY RUN] Would post grace warning to PR #{pr_number} "
+            f"(greptile={score}/5): {pr['title']}"
+        )
+        return
+
+    comment_body = format_grace_warning_comment(score, threshold)
+    gh("pr", "comment", str(pr_number), "--body", comment_body, *repo_args)
+    print(f"  Posted grace warning on PR #{pr_number} (greptile={score}/5)")
+
+
 def close_pr(
     pr: dict,
     score: int,
@@ -204,6 +350,7 @@
     repo: str | None,
     dry_run: bool,
     label: str | None,
+    grace_period_elapsed: bool = True,
 ) -> None:
     """Post the explanatory comment and close the PR."""
     pr_number = pr["number"]
@@ -216,10 +363,19 @@
         )
         return
 
+    score_sentence = (
+        f"Greptile's most recent review scored this PR **{score}/5**, below "
+        f"our merge bar of **{threshold}/5**, and the 1-day grace period since "
+        "the warning has elapsed.\n\n"
+        if grace_period_elapsed
+        else (
+            f"Greptile's most recent review scored this PR **{score}/5**, "
+            f"below our merge bar of **{threshold}/5**.\n\n"
+        )
+    )
     comment_body = (
         f"Closing as part of automated PR triage.\n\n"
-        f"Greptile's most recent review scored this PR **{score}/5**, below "
-        f"our merge bar of **{threshold}/5**.\n\n"
+        f"{score_sentence}"
         "We close low-confidence PRs aggressively to keep the review queue "
         "manageable for maintainers and contributors alike. **This is not a "
         "rejection of the idea** — to bring this back:\n\n"
@@ -233,7 +389,9 @@
         "maintainer, so a fresh PR is the most reliable path forward. If you "
         "would prefer this exact PR re-evaluated, comment "
         "`@agent-shin reconsider` once you've pushed the fixes — Agent Shin "
-        "will re-run triage and reopen this PR if it now meets the bar.\n\n"
+        "will re-run triage and reopen this PR if it now meets the bar. "
+        "You can also comment `@greptileai` to request a fresh Greptile "
+        "review — that works **even after the PR is closed**.\n\n"
         "Thanks for contributing to LiteLLM. We know auto-closures can sting; "
         "the goal is to keep the project healthy, not to dismiss your work."
     )
@@ -258,16 +416,28 @@
     repo: str | None,
     optout_labels: set[str],
 ) -> tuple[str, int | None, int | None]:
-    """Decide whether to close `pr`.
+    """Decide what to do with `pr` on this triage run.
 
     Returns (action, score_or_none, age_days_or_none) where action is one of:
         "skip-too-young", "skip-optout-label", "skip-internal",
-        "skip-no-greptile-score", "skip-score-ok", or "close".
+        "skip-no-greptile-score", "skip-score-ok",
+        "warn-grace", "skip-in-grace-period", or "close".
 
     Drafts are NOT skipped — the goal is "open PR count == PRs internal
     collaborators need to action on", and a draft that Greptile scored <4/5
     is still in that queue. Authors can opt out via the `wip` label (see
     `DEFAULT_OPTOUT_LABELS`) if they need to keep a long-lived draft open.
+
+    Grace-period semantics: the first time a PR fails the rubric, the
+    action is `warn-grace` — the caller should post a warning comment but
+    NOT close the PR. On a subsequent run, if the warning is still less
+    than `GRACE_PERIOD_SECONDS` old AND the PR still fails, the action is
+    `skip-in-grace-period`. Once the warning ages out and the rubric is
+    still failing, the action is `close`.
+
+    Grace is bypassed for `IMMEDIATE_CLOSE_LOGINS` (test/dogfood
+    accounts), which always go straight to `close` on the first failing
+    run so the bot is dogfoodable end-to-end without a 24h delay.
     """
     if has_optout_label(pr, optout_labels):
         return ("skip-optout-label", None, None)
@@ -294,6 +464,16 @@
     if score >= min_score:
         return ("skip-score-ok", score, age_days)
 
+    login = ((pr.get("author") or {}).get("login") or "").lower()
+    if login in IMMEDIATE_CLOSE_LOGINS:
+        return ("close", score, age_days)
+
+    grace_age = seconds_since_last_grace_warning(comments, now=now)
+    if grace_age is None:
+        return ("warn-grace", score, age_days)
+    if grace_age < GRACE_PERIOD_SECONDS:
+        return ("skip-in-grace-period", score, age_days)
+
     return ("close", score, age_days)
 
 
@@ -370,6 +550,8 @@
     closed = 0
     summary = {
         "close": 0,
+        "warn-grace": 0,
+        "skip-in-grace-period": 0,
         "skip-too-young": 0,
         "skip-optout-label": 0,
         "skip-internal": 0,
@@ -388,6 +570,30 @@
         )
         summary[action] = summary.get(action, 0) + 1
 
+        # Per-PR dry-run override: `IMMEDIATE_CLOSE_LOGINS` accounts (e.g.
+        # SwiftWinds) always run in real-close mode regardless of the
+        # global `--close` flag. Lets a maintainer dogfood the bot from
+        # an external account while the rest of the open-PR queue stays
+        # on the safe dry-run default.
+        author_login = ((pr.get("author") or {}).get("login") or "").lower()
+        is_immediate = author_login in IMMEDIATE_CLOSE_LOGINS
+        pr_dry_run = dry_run and not is_immediate
+
+        if action == "warn-grace":
+            assert score is not None
+            print(
+                f"#{pr['number']}: \"{pr['title']}\" "
+                f"(age={age_days}d, greptile={score}/5) -> warn-grace"
+            )
+            post_grace_warning(
+                pr,
+                score=score,
+                threshold=args.min_score,
+                repo=args.repo,
+                dry_run=pr_dry_run,
+            )
+            continue
+
         if action != "close":
             continue
 
@@ -395,6 +601,7 @@
         print(
             f"#{pr['number']}: \"{pr['title']}\" "
             f"(age={age_days}d, greptile={score}/5) -> close"
+            + (" [immediate-close login]" if is_immediate else "")
         )
         close_pr(
             pr,
@@ -402,11 +609,12 @@
             threshold=args.min_score,
             age_days=age_days,
             repo=args.repo,
-            dry_run=dry_run,
+            dry_run=pr_dry_run,
             label=args.close_label,
+            grace_period_elapsed=not is_immediate,
         )
 
-        if not dry_run:
+        if not pr_dry_run:
             closed += 1
             if args.limit is not None and closed >= args.limit:
                 print(f"\nReached --limit={args.limit}; stopping.")
@@ -415,7 +623,21 @@
     print("\n=== Summary ===")
     for key, value in summary.items():
         print(f"  {key:28s} {value}")
-    print(f"\nTotal {'would close' if dry_run else 'closed'}: {summary['close']}")
+    # `IMMEDIATE_CLOSE_LOGINS` PRs are closed even in global dry-run mode, so
+    # report actual closures alongside the dry-run "would close" count to avoid
+    # misleading operators into thinking no writes occurred.
+    would_close = summary["close"] - closed
+    if dry_run:
+        if closed:
+            print(f"\nTotal closed: {closed}; would close: {would_close}")
+        else:
+            print(f"\nTotal would close: {would_close}")
+    else:
+        print(f"\nTotal closed: {closed}")
+    print(
+        f"Total {'would warn (grace)' if dry_run else 'warned (grace)'}: "
+        f"{summary['warn-grace']}"
+    )
     return 0
 
 

diff --git a/.github/scripts/triage_with_llm.py b/.github/scripts/triage_with_llm.py
--- a/.github/scripts/triage_with_llm.py
+++ b/.github/scripts/triage_with_llm.py
@@ -30,6 +30,7 @@
 from __future__ import annotations
 
 import argparse
+import datetime as dt
 import json
 import os
 import re
@@ -42,6 +43,46 @@
 
 INTERNAL_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})
 
+# Login of the account that performs Agent Shin's GitHub writes. When the
+# workflow uses `secrets.GITHUB_TOKEN` (our default), the closure / reopen
+# event's `actor.login` is `github-actions[bot]`. The env override exists
+# for local debugging and for repos that wire Agent Shin to a PAT.
+AGENT_SHIN_DEFAULT_BOT_LOGIN = "github-actions[bot]"
+
+# HTML marker appended to every reconsider verdict comment. We grep for this
+# on subsequent reconsider triggers to enforce a short cooldown so that
+# repeated `@agent-shin reconsider` comments don't burn CI/LLM budget.
+# Using a unique HTML comment keeps the marker invisible to humans while
+# being trivially greppable from a comments-list API response.
+RECONSIDER_COMMENT_MARKER = "<!-- agent-shin:reconsider-verdict -->"
+
+# Minimum gap between two reconsider verdicts on the same PR/issue. Set to
+# 10 minutes — long enough that a contributor can't trivially spam the
+# trigger, short enough that a genuine "I just pushed a fix and reupdated
+# the body" iteration loop isn't punished.
+RECONSIDER_RATE_LIMIT_SECONDS = 600
+
+# HTML marker appended to the grace-period warning comment posted on the
+# first low-quality detection. We grep for this on subsequent triage runs
+# to (a) detect that a warning was already posted (so we don't spam the
+# contributor with duplicate warnings) and (b) measure how long ago it
+# was posted so we know when the grace period has elapsed.
+GRACE_COMMENT_MARKER = "<!-- agent-shin:grace-warning -->"
+
+# Length of the grace period between the warning comment and the actual
+# auto-close. Set to 24 hours so the contributor has at least one full
+# working day across any time zone to push fixes or comment
+# `@agent-shin reconsider`.
+GRACE_PERIOD_SECONDS = 86400
+
+# Logins (case-insensitive) that bypass BOTH the 1-day grace period AND
+# the dry-run / `AGENT_SHIN_ENABLED` workflow gating — every Agent Shin
+# verdict against a PR/issue from one of these accounts is treated as a
+# real run with immediate close on fail. Useful for dogfooding the bot
+# from an external account that has no push permissions to the repo.
+# Listed lower-case so callers compare via `login.lower() in ...`.
+IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})
+
 # Model families that require `reasoning_effort` to be set, and that reject
 # `temperature != 1` unless `reasoning_effort` is "none". For these models we
 # pass `reasoning_effort="none"` so a `temperature=0` deterministic judgment
@@ -160,6 +201,139 @@
     )
 
 
+def _iter_paginated_json(*api_args: str) -> Any:
+    """Yield JSON objects from `gh api --paginate ... -q '.[]'`.
+
+    `gh api --paginate` on a JSON-array endpoint concatenates pages into
+    one stream; `-q '.[]'` flattens that stream into newline-delimited
+    objects (jq-style). This keeps memory bounded for chatty endpoints
+    like issue events/comments on long-lived PRs.
+    """
+    raw = gh("api", "--paginate", *api_args, "-q", ".[]")
+    for line in raw.splitlines():
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            yield json.loads(line)
+        except json.JSONDecodeError:
+            # A malformed line should not blow up the whole guard. Skip and
+            # carry on — at worst the guard fail-closes (returns False /
+            # None) and the caller treats it as "unknown".
+            continue
+
+
+def fetch_last_close_actor(repo: str, number: int) -> str | None:
+    """Return the login of the actor who most recently closed this PR/issue.
+
+    Returns None if no `closed` event is found (unusual for a closed item,
+    but possible if the events API returns nothing — in which case the
+    bot-closed guard should fail-closed, i.e. refuse to reopen).
+    """
+    last: str | None = None
+    for event in _iter_paginated_json(f"repos/{repo}/issues/{number}/events"):
+        if event.get("event") == "closed":
+            last = (event.get("actor") or {}).get("login")
+    return last
+
+
+def was_closed_by_agent_shin(
+    repo: str, number: int, *, bot_login: str | None = None
+) -> bool:
+    """Return True iff the PR/issue was most-recently closed by Agent Shin.
+
+    This is the guard that stops `@agent-shin reconsider` from being used
+    to override a maintainer's closure for non-rubric reasons (security,
+    duplicate, design rejection, etc.). The check is intentionally
+    fail-closed: any uncertainty about who closed the item must be
+    treated as "not the bot" so the destructive reopen path stays gated.
+    """
+    expected = (
+        bot_login
+        or os.environ.get("AGENT_SHIN_BOT_LOGIN")
+        or AGENT_SHIN_DEFAULT_BOT_LOGIN
+    ).lower()
+    actor = fetch_last_close_actor(repo, number)
+    if not actor:
+        return False
+    return actor.lower() == expected
+
+
+def _seconds_since_latest_marker_comment(
+    repo: str,
+    number: int,
+    *,
+    marker: str,
+    bot_login: str | None = None,
+) -> float | None:
+    """Shared helper: return seconds since the bot's most recent comment
+    that contains the given HTML marker, or None if no such comment exists.
+
+    Used by both the reconsider-verdict cooldown and the grace-period
+    warning detection — keeping the iteration logic centralized stops the
+    two paths from drifting (e.g. one fixing a tz parsing bug and the
+    other forgetting to mirror it).
+    """
+    expected_login = (
+        bot_login
+        or os.environ.get("AGENT_SHIN_BOT_LOGIN")
+        or AGENT_SHIN_DEFAULT_BOT_LOGIN
+    ).lower()
+    latest: dt.datetime | None = None
+    for comment in _iter_paginated_json(f"repos/{repo}/issues/{number}/comments"):
+        author = ((comment.get("user") or {}).get("login") or "").lower()
+        if author != expected_login:
+            continue
+        body = comment.get("body") or ""
+        if marker not in body:
+            continue
+        created = comment.get("created_at")
+        if not created:
+            continue
+        try:
+            ts = dt.datetime.fromisoformat(created.replace("Z", "+00:00"))
+        except ValueError:
+            continue
+        if latest is None or ts > latest:
+            latest = ts
+    if latest is None:
+        return None
+    return (dt.datetime.now(dt.timezone.utc) - latest).total_seconds()
+
+
+def seconds_since_last_reconsider_verdict(
+    repo: str, number: int, *, bot_login: str | None = None
+) -> float | None:
+    """Return seconds since the bot's most recent reconsider verdict comment.
+
+    Detects comments by matching the HTML marker `RECONSIDER_COMMENT_MARKER`
+    appended by `format_reopen_comment` and
+    `format_reconsider_still_failing_comment`. Returns None when the bot
+    has never posted a reconsider verdict on this PR/issue (or when the
+    only matching comments are missing a `created_at` timestamp, which
+    shouldn't happen on a real GitHub response).
+    """
+    return _seconds_since_latest_marker_comment(
+        repo, number, marker=RECONSIDER_COMMENT_MARKER, bot_login=bot_login
+    )
+
+
+def seconds_since_last_grace_warning(
+    repo: str, number: int, *, bot_login: str | None = None
+) -> float | None:
+    """Return seconds since the bot's most recent grace-period warning.
+
+    Detects warning comments by matching the HTML marker
+    `GRACE_COMMENT_MARKER` appended by `format_grace_warning_pr_comment`
+    and `format_grace_warning_issue_comment`. Returns None when no
+    grace warning has ever been posted on this PR/issue — that's the
+    "first low-quality detection" signal that drives the warning path.
+    """
+    return _seconds_since_latest_marker_comment(
+        repo, number, marker=GRACE_COMMENT_MARKER, bot_login=bot_login
+    )
+
+
 # ---------------------------------------------------------------------------
 # Author classification
 
@@ -393,6 +567,9 @@
         "to get back into the review queue.\n"
         "   - **Or** comment `@agent-shin reconsider` on this closed PR after updating the description. "
         "I'll re-run the triage; if it now passes, I'll reopen this PR automatically.\n"
+        "   - You can also comment `@greptileai` on this PR to request a fresh Greptile review — that "
+        "still works **even after the PR is closed**, and a higher score is one of the signals that "
+        "lifts the PR back into the queue.\n"
         "\n"
         "Internal BerriAI contributors: this rubric doesn't apply to you — ping a maintainer.\n"
         "\n"
@@ -433,6 +610,90 @@
     )
 
 
+def format_grace_warning_pr_comment(verdict: dict) -> str:
+    """Comment posted on the FIRST low-quality detection — gives the
+    contributor a 1-day grace window to fix the PR before the next
+    triage run actually closes it.
+
+    This is the "before-close" warning. On the second triage run, if the
+    grace marker is older than `GRACE_PERIOD_SECONDS` AND the PR still
+    fails the rubric, the close path runs (which posts
+    `format_pr_close_comment` and closes the PR).
+    """
+    missing_lines = _format_missing(verdict.get("missing") or [])
+    explanation = verdict.get("explanation") or ""
+    return (
+        "👋 Hi, thanks for the PR! I'm **Agent Shin**, the automated triage bot for this repository.\n"
+        "\n"
+        "Heads up — this PR does not yet meet the bar described in our "
+        "[pull-request template](https://github.com/BerriAI/litellm/blob/main/.github/pull_request_template.md). "
+        "Specifically, I couldn't find:\n"
+        "\n"
+        f"{missing_lines}\n"
+        "\n"
+        f"> {explanation}\n"
+        "\n"
+        "⏳ **You have 1 day to address this before this PR is auto-closed.** "
+        "During the grace period:\n"
+        "\n"
+        "1. Update the PR description to either:\n"
+        "   - Link a related GitHub issue (e.g. `Fixes #1234`), OR\n"
+        "   - Add a clear **problem description**, **expected vs. actual behavior**, and **visual QA proof** "
+        "(before/after screenshots, a short screen recording, or terminal/log output).\n"
+        "2. Comment `@agent-shin reconsider` on this PR after updating it. If your update meets the "
+        "bar, I'll skip the auto-close and a maintainer will take another look.\n"
+        "\n"
+        "If this PR is auto-closed in 24 hours, you'll still have options:\n"
+        "\n"
+        "- Comment `@agent-shin reconsider` to have me re-evaluate (and reopen the PR if it now meets the bar).\n"
+        "- Comment `@greptileai` to request a fresh Greptile review — that works **even after the PR is closed**.\n"
+        "\n"
+        "Internal BerriAI contributors: this rubric doesn't apply to you — ping a maintainer.\n"
+        "\n"
+        "_(I'm an LLM, so I'm not infallible. If you think I got this wrong, comment "
+        "`@agent-shin reconsider` or ping a maintainer — they'll override me.)_\n"
+        "\n"
+        f"{GRACE_COMMENT_MARKER}"
+    )
+
+
+def format_grace_warning_issue_comment(verdict: dict) -> str:
+    """Issue analogue of `format_grace_warning_pr_comment`."""
+    missing_lines = _format_missing(verdict.get("missing") or [])
+    explanation = verdict.get("explanation") or ""
+    return (
+        "👋 Hi, thanks for filing this! I'm **Agent Shin**, the automated triage bot for this repository.\n"
+        "\n"
+        "Heads up — this issue doesn't yet have enough detail for a maintainer to act on. "
+        "Specifically, I couldn't find:\n"
+        "\n"
+        f"{missing_lines}\n"
+        "\n"
+        f"> {explanation}\n"
+        "\n"
+        "⏳ **You have 1 day to address this before this issue is auto-closed.** "
+        "During the grace period:\n"
+        "\n"
+        "1. Edit the issue to add the missing pieces:\n"
+        "   - For **bug reports**: a runnable reproduction (code / curl / config), expected vs. actual behavior, "
+        "and a screenshot / traceback / log showing the bug.\n"
+        "   - For **feature requests**: a concrete description of what should change, plus a use case and example "
+        "(config / API call / UI flow).\n"
+        "2. Comment `@agent-shin reconsider` on this issue after updating it. If your update meets the bar, "
+        "I'll skip the auto-close and a maintainer will take another look.\n"
+        "\n"
+        "If this issue is auto-closed in 24 hours, you can still comment `@agent-shin reconsider` to have "
+        "me re-evaluate (and reopen the issue if it now meets the bar).\n"
+        "\n"
+        "Internal BerriAI contributors: this rubric doesn't apply to you — ping a maintainer.\n"
+        "\n"
+        "_(I'm an LLM, so I'm not infallible. If you think I got this wrong, comment "
+        "`@agent-shin reconsider` or ping a maintainer — they'll override me.)_\n"
+        "\n"
+        f"{GRACE_COMMENT_MARKER}"
+    )
+
+
 # ---------------------------------------------------------------------------
 # Step-summary helpers
 
@@ -458,6 +719,9 @@
 def format_reopen_comment(kind: str) -> str:
     """Comment posted when Agent Shin reopens after a successful reconsider."""
     noun = "PR" if kind == "pr" else "issue"
+    # The trailing HTML marker is used by `seconds_since_last_reconsider_verdict`
+    # to enforce a cooldown between repeated `@agent-shin reconsider` triggers.
+    # Keep the marker on its own line so it doesn't disturb the rendered text.
     return (
         f"♻️ **Re-evaluated and reopened.** Thanks for updating the {noun}!\n"
         "\n"
@@ -467,7 +731,9 @@
         "\n"
         "_(If a maintainer ends up closing this for non-rubric reasons, that "
         "decision stands; comment `@agent-shin reconsider` again only if you "
-        "have substantively new information.)_"
+        "have substantively new information.)_\n"
+        "\n"
+        f"{RECONSIDER_COMMENT_MARKER}"
     )
 
 
@@ -476,6 +742,8 @@
     missing_lines = _format_missing(verdict.get("missing") or [])
     explanation = verdict.get("explanation") or ""
     noun = "PR" if kind == "pr" else "issue"
+    # The trailing HTML marker is used by `seconds_since_last_reconsider_verdict`
+    # to enforce a cooldown between repeated `@agent-shin reconsider` triggers.
     return (
         f"⏸️ **Re-evaluated; this {noun} still doesn't meet the rubric.**\n"
         "\n"
@@ -490,7 +758,9 @@
         "`@agent-shin reconsider` again, or ping a maintainer if you think "
         "I got this wrong.\n"
         "\n"
-        "_(I'm an LLM and I'm not infallible.)_"
+        "_(I'm an LLM and I'm not infallible.)_\n"
+        "\n"
+        f"{RECONSIDER_COMMENT_MARKER}"
     )
 
 
@@ -514,10 +784,22 @@
     fail-but-no-comment is replaced with a "still failing" comment + leave
     closed; a pass triggers `reopen_pr`/`reopen_issue` plus a reopen comment.
     Reconsider mode is intended for the `@agent-shin reconsider` comment
-    trigger. `close` is forced True implicitly when `reconsider` is set
-    because the bot has already decided this is a real (non-dry-run)
-    invocation; it's the caller's responsibility to gate on
-    AGENT_SHIN_ENABLED before calling reconsider mode.
+    trigger. Like regular triage, `close=False` keeps reconsider in dry-run
+    (returns `would-reopen` / `would-reconsider-still-failing` so a local
+    operator can preview without write side effects); the workflow only
+    passes `--close` when `AGENT_SHIN_ENABLED=true`.
+
+    Reconsider mode adds two extra safety guards on top of the regular
+    triage skip-internal-author check:
+
+      1. **Bot-closed guard.** Only reopens if the most recent close was
+         performed by the bot identity (default `github-actions[bot]`).
+         This stops a contributor from using `@agent-shin reconsider` to
+         override a maintainer's close for non-rubric reasons.
+      2. **Rate-limit guard.** If the bot has already posted a reconsider
+         verdict on this PR/issue within `RECONSIDER_RATE_LIMIT_SECONDS`,
+         skip — repeated triggers from the same contributor shouldn't burn
+         CI minutes or LLM budget.
     """
     fetcher = {"pr": fetch_pr, "issue": fetch_issue}[kind]
     item = fetcher(repo, number)
@@ -551,6 +833,20 @@
     if is_internal_contributor(item):
         return {**base_result, "action": "skip-internal-author"}
 
+    # Reconsider-only guards — these run BEFORE the LLM call so a
+    # maintainer-closed PR / rate-limited trigger never spends LLM budget.
+    if reconsider:
+        if not was_closed_by_agent_shin(repo, number):
+            return {**base_result, "action": "skip-not-bot-closed"}
+        age = seconds_since_last_reconsider_verdict(repo, number)
+        if age is not None and age < RECONSIDER_RATE_LIMIT_SECONDS:
+            return {
+                **base_result,
+                "action": "skip-rate-limited",
+                "rate_limit_age_seconds": age,
+                "rate_limit_window_seconds": RECONSIDER_RATE_LIMIT_SECONDS,
+            }
+
     if kind == "pr":
         prompt = build_pr_prompt(title=title, body=body)
         # Short-circuit: if body very clearly links a related issue, just pass.
@@ -567,6 +863,12 @@
             if reconsider:
                 # Pass-on-reconsider -> reopen the PR with a friendly comment.
                 reopen_body = format_reopen_comment(kind)
+                if not close:
+                    return {
+                        **base,
+                        "action": "would-reopen",
+                        "comment": reopen_body,
+                    }
                 post_comment(repo, number, reopen_body)
                 reopen_pr(repo, number)
                 return {
@@ -607,8 +909,20 @@
         # Reconsider: pass -> reopen + post reopen comment;
         # fail -> leave closed + post a "still failing" comment so the
         # contributor can iterate again.
+        # In dry-run (`close=False`) we return `would-*` actions instead
+        # of touching GitHub state, mirroring the regular triage flow's
+        # `would-close`. This lets a local operator preview the outcome
+        # of `python triage_with_llm.py --reconsider --pr N` without
+        # risking accidental comments or reopens.
         if decision != "fail":
             reopen_body = format_reopen_comment(kind)
+            if not close:
+                return {
+                    **base_result,
+                    "action": "would-reopen",
+                    "verdict": verdict,
+                    "comment": reopen_body,
+                }
             post_comment(repo, number, reopen_body)
             if kind == "pr":
                 reopen_pr(repo, number)
@@ -621,6 +935,13 @@
                 "comment": reopen_body,
             }
         still_failing = format_reconsider_still_failing_comment(kind, verdict)
+        if not close:
+            return {
+                **base_result,
+                "action": "would-reconsider-still-failing",
+                "verdict": verdict,
+                "comment": still_failing,
+            }
         post_comment(repo, number, still_failing)
         return {
             **base_result,
@@ -632,7 +953,58 @@
     if decision != "fail":
         return {**base_result, "action": "pass-llm", "verdict": verdict}
 
-    if not close:
+    # Grace-period flow: on the first low-quality detection, post a warning
+    # comment instead of closing immediately. On a subsequent triage run
+    # (manual re-trigger, or the daily `close_low_quality_prs.py` cron
+    # finding the same PR in its own pass), if `GRACE_PERIOD_SECONDS` has
+    # elapsed since the warning AND the PR still fails the rubric, close.
+    #
+    # `IMMEDIATE_CLOSE_LOGINS` (e.g. test/dogfood accounts like SwiftWinds)
+    # bypass the grace period entirely — every fail is treated as a real
+    # close run. This is intentional: those accounts exist specifically to
+    # exercise the bot end-to-end, and waiting a day per iteration kills
+    # the feedback loop.
+    is_immediate = login.lower() in IMMEDIATE_CLOSE_LOGINS
... diff truncated: showing 800 of 2283 lines
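The grace-period flow that the truncated hunk begins to implement can be sketched as a pure decision function. This is a minimal illustration, not the PR's actual code: `decide_fail_action` and the action strings `warn`, `wait`, and `close` are hypothetical, while `GRACE_PERIOD_SECONDS`, `IMMEDIATE_CLOSE_LOGINS`, and the `would-close` dry-run action come from the diff. Where the dry-run check sits relative to the grace check is also an assumption.

```python
from typing import Optional

GRACE_PERIOD_SECONDS = 86400
IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})


def decide_fail_action(login: str, warning_age: Optional[float], close: bool) -> str:
    """Pick what to do with an item that just failed the rubric.

    `warning_age` is the number of seconds since the grace warning was
    posted, or None if no warning has been posted yet.
    (Hypothetical helper; the action names are illustrative.)
    """
    # Dogfood accounts bypass both the dry-run gate and the grace period.
    if login.lower() in IMMEDIATE_CLOSE_LOGINS:
        return "close"
    # Assumption: dry-run short-circuits before any grace bookkeeping.
    if not close:
        return "would-close"
    # First low-quality detection: warn instead of closing.
    if warning_age is None:
        return "warn"
    # Warning exists but the grace window has not elapsed yet.
    if warning_age < GRACE_PERIOD_SECONDS:
        return "wait"
    return "close"
```

Keeping the decision free of GitHub calls makes each branch easy to unit-test; the caller would supply `warning_age` from `seconds_since_last_grace_warning`.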

Reviewed by Cursor Bugbot for commit e0eeb73.

IMMEDIATE_CLOSE_LOGINS PRs are closed even when the global --close flag is not set, but the summary used the global dry-run flag to choose between 'would close' and 'closed'. Split the count so operators can see both actual closures and dry-run would-be closures.

Co-authored-by: Yassin Kortam <yassin@berri.ai>
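A rough sketch of the split count described in this commit, assuming each triage result is a dict with an `action` key. The helper name `summarize_closures` and the `closed` action string are hypothetical; only `would-close` appears in the diff.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_closures(results: Iterable[Mapping[str, str]]) -> str:
    """Count real closures and dry-run closures separately.

    Deriving both numbers from the per-item action, rather than from one
    global dry-run flag, keeps the summary correct when some items (e.g.
    IMMEDIATE_CLOSE_LOGINS authors) are closed during a dry run.
    """
    counts = Counter(result.get("action", "") for result in results)
    return f"closed: {counts['closed']}, would close (dry run): {counts['would-close']}"
```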

@agent-shin

…and review-gate label lifecycle (#30433)

* feat(triage): auto-close stale PRs with Greptile score <4/5

Adds .github/scripts/close_low_quality_prs.py and a daily workflow that closes PRs which:
- are open for at least 7 days, and
- carry a most-recent greptile-apps review with Confidence Score <4/5,
- and are not drafts or opt-out-labeled ('do not close', 'wip', etc.).

Each closure posts an explanatory comment telling the contributor how to bring the PR back (rebase, re-request greptile, reopen at 4+/5).

The 4/5 bar is already documented in the PR template (.github/pull_request_template.md), so this just enforces it.

Tested with a dry run against the live BerriAI/litellm backlog of 1000 open PRs: 100 candidates identified, 598 PRs pass the bar (4+/5), 186 are too young, 97 are drafts, 19 lack any Greptile review and are left alone.

Workflow defaults to closing 25 PRs/run as a safety net and supports workflow_dispatch with overrides (close=false for a dry run, custom min_age_days/min_score/limit).

18 unit tests cover score extraction (HTML/markdown/plain text, login variants, multi-review picks latest) and per-PR evaluation (drafts, opt-out labels, age, missing/passing/failing scores).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* docs(templates): require expected/actual + QA proof for external contributions

PR template:
- Make the rubric explicit at the top: link an issue, OR provide a clear problem description + expected vs. actual + visual QA proof.
- Add dedicated sections for each piece so the bot has a deterministic shape to read.
- Keep the existing 'Linear ticket' section for internal contributors (they're exempt from the auto-triage rubric).

Bug report template:
- Split 'What happened?' into 'Actual behavior' + 'Expected behavior'.
- Make logs/screenshot a required textarea.
- Warning banner at the top tells external contributors that incomplete reports will be auto-closed (with re-evaluation on reopen).
Feature request template:
- Require a concrete use case + example in the motivation field, not just a one-liner pitch.
- Same auto-triage warning banner.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): Agent Shin LLM-as-judge for external PRs and issues

Adds a new triage flow that evaluates external pull requests and issues against the project's contribution rubric and, when configured to do so, auto-closes non-conforming ones with an explanatory comment. Contributors can update + reopen to be re-evaluated.

Scope:
- Internal BerriAI contributors (author_association OWNER/MEMBER/COLLABORATOR) and bot accounts are skipped entirely.
- 'Fixes #1234' / 'Resolves https://github.com/.../issues/N' in the PR body short-circuits to PASS without burning LLM tokens.
- LLM judge returns structured JSON (verdict, missing[], explanation); parser tolerates markdown fences and embedded JSON.
- LLM errors NEVER close PRs/issues — failure surfaces as 'skip-llm-error'.

Safety:
- pull_request_target / issues triggers are FORCED dry-run in the workflow; only manual workflow_dispatch with close=true (and AGENT_SHIN_ENABLED=true) takes destructive action.
- Default mode writes verdicts to GITHUB_STEP_SUMMARY only — no public comments until the team flips the AGENT_SHIN_ENABLED repo variable.
- LLM uses an OpenAI-compatible endpoint (model and base URL configurable via repo variables; key via OPENAI_API_KEY secret).

Files:
- .github/scripts/triage_with_llm.py - judge orchestrator + CLI
- .github/workflows/triage_pr_with_llm.yml
- .github/workflows/triage_issue_with_llm.yml
- tests/test_litellm/test_github_triage_with_llm.py - 33 unit tests

End-to-end validated against four real PRs (#28117 internal collaborator, #28108 bot, #28129 'Fixes #28128', #28116 no linked issue) and issue #28132 with a stubbed LLM judge: each path produces the expected action.
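The "parser tolerates markdown fences and embedded JSON" behavior mentioned in the Scope list could look roughly like the sketch below. `parse_verdict` is a hypothetical name and this is one plausible approach, not the parser shipped in `triage_with_llm.py`.

```python
import json
import re
from typing import Optional

# Matches a fenced block, with or without a "json" language tag.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def parse_verdict(raw: str) -> Optional[dict]:
    """Best-effort extraction of a JSON object from an LLM reply."""
    candidates = [raw.strip()]
    fenced = _FENCE_RE.search(raw)
    if fenced:
        candidates.append(fenced.group(1).strip())
    # Fall back to the outermost {...} span for JSON embedded in prose.
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        candidates.append(raw[start : end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    # The caller should treat None as an LLM error and never close on it.
    return None
```

Returning `None` instead of raising fits the stated rule that LLM errors surface as `skip-llm-error` rather than closing anything.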
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): scope Greptile auto-closer to external contributors + dry-run by default

- close_low_quality_prs.py now filters by GitHub author_association via the REST API: PRs from OWNER / MEMBER / COLLABORATOR (and bot accounts) are skipped with a new 'skip-internal' summary bucket.
- close_low_quality_prs.yml now defaults workflow_dispatch close=false, and ignores 'close=true' unless the new repo variable AGENT_SHIN_ENABLED is set to 'true'. Scheduled runs are dry-run only until the team flips that switch.
- Updated unit tests: one new test asserting internal authors are skipped, and an autouse fixture treats unspecified test PRs as external so the rest of the suite still exercises the close path.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(workflows): scheduled cron closes PRs; safe --close strip in triage

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(triage): scheduled cron stays dry-run; dedent prompts before interpolation

- close_low_quality_prs.yml: only workflow_dispatch with close=true (and AGENT_SHIN_ENABLED=true) actually closes PRs. Scheduled runs are always dry-run, matching the safety invariant documented for triage_pr/issue.
- triage_with_llm.py: textwrap.dedent on an f-string with multi-line interpolated bodies fails because the body's 2nd+ lines start at column 0, making the common-indent zero. Dedent the static template first, then .format() the title/body in.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* Fix bugs in auto-close PR triage scripts

- close_low_quality_prs.py: Treat author_association API lookup failures as internal (fail-safe) so transient errors don't cause internal contributors' PRs to be auto-closed.
- triage_with_llm.py: Update summary heading from 'Would post comment:' to 'Posted comment:' since this branch only runs after the comment has already been posted.
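The fail-safe classification in the last commit above can be illustrated as follows. Both function names are hypothetical, and the injected `lookup` callable stands in for whatever REST call the script actually makes.

```python
from typing import Callable, Optional

INTERNAL_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def is_internal_association(author_association: Optional[str]) -> bool:
    """Treat unknown or empty associations as internal (never auto-close)."""
    value = (author_association or "").strip().upper()
    if not value:
        return True
    return value in INTERNAL_ASSOCIATIONS


def is_internal_author(lookup: Callable[[], Optional[str]]) -> bool:
    """Resolve the association via `lookup`; any failure counts as internal."""
    try:
        association = lookup()
    except Exception:
        # A transient API error must not make a PR eligible for closing.
        return True
    return is_internal_association(association)
```

The asymmetry is deliberate: wrongly skipping an external PR costs one missed triage, while wrongly closing an internal PR is disruptive.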
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* feat(triage): default Agent Shin to gpt-5.4-mini with reasoning_effort=none

- Bump DEFAULT_MODEL from gpt-4o-mini to gpt-5.4-mini (more modern; 4M total context window per OpenAI catalog, JSON-schema response format, function calling all supported).
- For gpt-5.x family models, pass reasoning_effort="none" via extra_body. gpt-5.x rejects temperature != 1 unless reasoning_effort is explicitly "none"; setting it lets us keep temperature=0 for deterministic JSON rubric judgments. extra_body works across openai SDK versions regardless of whether they natively type the kwarg.
- For non-gpt5 overrides (TRIAGE_MODEL=gpt-4o-mini etc.), reasoning_effort is not sent.
- 4 new unit tests cover: gpt-5.4-mini -> reasoning_effort=none, capitalized/dated gpt-5 variants -> reasoning_effort=none, gpt-4o-mini -> no extra_body, base_url passthrough.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(triage): bugbot — drop dead gh_json and fix --optout-label append-with-default

- Removed the unused gh_json helper (bugbot low-severity dead code).
- Replaced argparse `action="append", default=[...]` with default=None + DEFAULT_OPTOUT_LABELS fallback. The mutable-default + append combo silently APPENDS to the canonical defaults instead of replacing them, so --optout-label could not actually scope the opt-out list.
- Added tests covering both the canonical default and the flag-replaces-defaults behavior.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(triage): bugbot — tighten linked-issue regex, fail-safe author_association, fix empty TRIAGE_MODEL

Three independent bugbot findings against triage_with_llm.py:

1. LINKED_ISSUE_PATTERN included weak keywords (`see`, `ref`, `addresses`) so casual mentions like "See #1234 for context" were short-circuited to pass-linked-issue without ever calling the LLM — contradicting the prompt's own "a bare issue number without a closing keyword counts only if it's clearly the related issue (not a passing mention)" rubric. Limit the regex to GitHub's documented PR-closing keywords (fixes/fix/fixed/closes/close/closed/resolves/resolve/resolved).
2. is_internal_contributor() treated an empty/missing author_association as external (eligible for the destructive close path), while the sibling is_external_pr_author() in close_low_quality_prs.py fail-safes the same case as internal. Align the two so a partial/unknown GitHub response can never make a PR eligible for auto-close.
3. argparse `default=os.environ.get("TRIAGE_MODEL", DEFAULT_MODEL)` returns the empty string when GitHub Actions exposes an unset repo variable as an empty-string env var (the optional vars.TRIAGE_MODEL case in the workflow). Use `os.environ.get(...) or DEFAULT_MODEL` so empty -> default, matching the existing OPENAI_BASE_URL pattern.

Tests:
- Casual mentions now must fall through to the LLM (parametrized); added an orchestration test ensuring "See #1234" reaches the judge.
- Empty/missing author_association now fails safe (parametrized).
- Empty TRIAGE_MODEL env var falls back to DEFAULT_MODEL; explicit TRIAGE_MODEL is still honored.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(workflows): bugbot — gate Agent Shin --close on '= true' not '!= false'

The PR and issue Agent Shin workflows gated the destructive --close flag with [ "${DISPATCH_CLOSE:-false}" != "false" ]. That pattern treats anything other than the literal string "false" as enabling closure — "True", "yes", "1", typos, accidental whitespace, etc.
The workflow_dispatch input UI is a 'true'/'false' choice dropdown so the form is constrained, but the API (`gh workflow run -f close=...`) accepts any string, and a CI cron / external invoker passing a non-canonical truthy value would have silently enabled real contributor PR closures.

Mirror the sibling Greptile closer's [ "${CLOSE_FLAG}" = "true" ] pattern: only the EXACT string "true" enables --close; every other value (including the unset/empty default) resolves to dry-run. This is the fail-safe philosophy applied everywhere else in this PR.

Added tests/test_litellm/test_github_triage_workflows.py with two parametrized invariants:

1. The destructive gate uses '= "true"' for its env-var comparison (either bare '${ENV}' or '${ENV:-false}' form accepted), and never the fail-open '!= "false"' pattern.
2. Every destructive gate is also gated on AGENT_SHIN_ENABLED being "true" — either by entering the close branch on '=' or by bailing out early on '!=' — so flipping the repo variable off is a true kill switch regardless of per-run inputs.

Manually verified the test fails on the buggy '!= "false"' pattern and passes on the fix, so it would have caught the regression at PR time.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): close any PR (incl. drafts, any age); add @agent-shin reconsider flow

Follow-up to PR #28117. Three behavior changes + one new workflow, addressing the team's concerns on the original review:

1) Apply auto-close to ALL open PRs, not just those over a week old.
   - close_low_quality_prs.py: --min-age-days default flipped from 7 to 0. The flag is preserved as an opt-in safety net for one-off backfill runs that want to spare very-young PRs, but the daily scheduled sweep now closes external-author PRs as soon as Greptile scores them <4/5.
   - close_low_quality_prs.yml: workflow_dispatch input default also flipped to 0; doc comments updated.

2) Apply auto-close to draft PRs too.
   - close_low_quality_prs.py: removed the skip-draft branch in evaluate_pr. Drafts are NOT a free pass — the team's intent is 'open PR count == PRs internal collaborators need to action on', so a draft Greptile scored 2/5 still belongs in the closed bucket. Authors who genuinely need a long-lived draft can attach the 'wip' opt-out label, which is unchanged.
   - The 'skip-draft' action is gone; the 'wip' label still skips.

3) Address the 'OSS contributors cannot reopen a bot-closed PR' wrinkle.
   GitHub does NOT let an external (non-write-access) contributor reopen a PR that was closed by a bot or maintainer (long-standing limitation). The original PR's close-comments told contributors to 'Reopen the PR — I'll re-evaluate automatically', which is broken for the very audience this triage targets. Two changes:

   a) Reword every close-comment (Greptile sweep + Agent Shin PR close + Agent Shin issue close + PR template) to recommend:
      - Open a new PR with the updated branch (primary path).
      - Or comment '@agent-shin reconsider' on the closed PR for a re-evaluation that, on pass, reopens the PR via the bot's GH_TOKEN write access.

   b) Add the @agent-shin reconsider workflow:
      - .github/workflows/triage_reconsider.yml: new 'issue_comment'-triggered workflow. Authorizes only the PR/issue author or an internal collaborator (OWNER/MEMBER/COLLABORATOR), gated via a step output so unauthorized commenters never reach the destructive steps. Globally gated on AGENT_SHIN_ENABLED='true' (positive form, matching the test_github_triage_workflows guardrail patterns).
      - triage_with_llm.py: --reconsider mode. On a closed PR/issue, re-runs the LLM judge (or linked-issue regex short-circuit) and:
        - on pass: reopens via reopen_pr/reopen_issue + posts a 'Re-evaluated and reopened' comment.
        - on fail: leaves closed and posts a 'still missing X' comment so the contributor can iterate again.
        Reconsider-on-open is a no-op ('skip-not-closed').
        Internal-author + bot-account skips still take priority over reconsider.

4) Greptile-on-closed-PRs question: the team asked whether Greptile can re-review a closed PR. Greptile's docs don't address this and we shouldn't promise behavior we can't verify, so the new close-comment wording does NOT instruct contributors to 're-request greptile on the closed PR'. Instead it points them at the new-PR path (which Greptile definitely reviews) or the @agent-shin reconsider trigger (which re-runs the LiteLLM-side rubric judge, not Greptile).

Tests: 93 passing (was 59).
- test_github_close_low_quality_prs.py: replaced 'skip drafts' test with 'closes drafts when score is low' + 'closes brand-new PR when min_age=0' + 'no skip when min_age=0'. The 'skip too young' assertion is preserved as opt-in.
- test_github_triage_with_llm.py: 6 new TestTriageOrchestration cases for reconsider mode (skip-not-closed on open, reopen on pass, still-failing comment on fail, linked-issue short-circuit reopen, skip internal author in reconsider, reopen-issue on pass) + a new TestCloseCommentText class that pins the user-facing 'open a new PR' + '@agent-shin reconsider' wording.
- test_github_triage_workflows.py: added triage_reconsider.yml to the destructive-gate guardrail table; AGENT_SHIN_ENABLED is its own destructive gate (no separate per-run flag needed).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(triage): pin safe behavior for curly braces in PR/issue title+body

Adds regression tests covering the bugbot high-severity finding that str.format() would crash on user-supplied content containing { or }. Empirically str.format() does NOT re-parse interpolated values — only the template literal is scanned for replacement fields — so the bug does not exist in the current code, but pinning the safe behavior prevents a future templating change from silently reintroducing it.
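The two prompt-building properties discussed in this log (dedent the static template before interpolating, and `str.format()` leaving braces in substituted values alone) can be checked with a small sketch. The template text and function names here are illustrative assumptions, not the real `build_pr_prompt`.

```python
import textwrap

# Dedent the static template once; the placeholders are still literal here.
PR_PROMPT_TEMPLATE = textwrap.dedent(
    """\
    Title: {title}

    Body:
    {body}
    """
)


def build_prompt(title: str, body: str) -> str:
    """Interpolate after dedenting; braces inside values are not re-parsed."""
    return PR_PROMPT_TEMPLATE.format(title=title, body=body)


def build_prompt_fstring(title: str, body: str) -> str:
    """Pitfall: interpolation happens first, so a multi-line body puts its
    later lines at column 0 and dedent finds no common indentation."""
    return textwrap.dedent(
        f"""\
        Title: {title}

        Body:
        {body}
        """
    )
```

With a two-line body, `build_prompt_fstring` keeps the eight leading spaces on the template lines, whereas `build_prompt` yields a flush-left prompt even when the title or body contains `{` or `}`.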
Also pins the dedented prompt shape (no leading 8-space indentation on template lines) so a future change to the build_*_prompt functions can't silently regress the LLM judge prompt format on multi-line bodies. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(triage): bugbot — reconsider dry-run + bot-closed guard + rate limit Address three Greptile/veria-ai concerns on the @agent-shin reconsider flow: 1. **Reconsider had no dry-run path.** The previous reconsider mode ignored `--close` and always posted comments + reopened on a pass. A local operator running `python triage_with_llm.py --reconsider --pr N` would silently take destructive GitHub actions with no way to preview. Reconsider now honors `close=False` the same way regular triage does and returns `would-reopen` / `would-reconsider-still-failing` for step-summary rendering. 2. **Reconsider could reopen maintainer-closed PRs/issues** (Medium security finding from veria-ai). The workflow only checked that the commenter was authorized — it did NOT check that the most recent close was performed by Agent Shin. A contributor could comment `@agent-shin reconsider` on a PR a maintainer closed for non-rubric reasons (duplicate, security report, design rejection) and have the bot reopen it. Add `was_closed_by_agent_shin()` which inspects the issue events API for the most recent `closed` actor and only permits reopen when that actor matches the configured bot login (default `github-actions[bot]`, overridable via env). Fail-closed on missing events. 3. **No rate-limiting on the reconsider trigger.** Every `@agent-shin reconsider` comment burns CI minutes + an OpenAI API call. Add a 10-minute cooldown via `seconds_since_last_reconsider_verdict()` which greps the issue's comment list for the bot's own verdict marker (``). Inside the window the triage returns `skip-rate-limited` and the LLM never runs. 
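The cooldown described in item 3 can be sketched as follows. This is an illustration under stated assumptions, not the code in `triage_with_llm.py`: the marker string, the function names, and the constants are placeholders, because the real hidden marker is not shown in this message. The comment shape (`user.login`, `body`, `created_at`) follows the GitHub REST API for issue comments.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Optional

# Placeholders (assumptions): the real marker is a hidden HTML comment.
VERDICT_MARKER = "<!-- hypothetical-reconsider-verdict -->"
BOT_LOGIN = "github-actions[bot]"
COOLDOWN_SECONDS = 10 * 60


def seconds_since_last_verdict(comments: list[dict], now: datetime) -> Optional[float]:
    """Seconds since the newest bot-authored comment carrying the marker, else None."""
    stamps = [
        datetime.fromisoformat(c["created_at"].replace("Z", "+00:00"))
        for c in comments
        # Filter by author so a contributor quoting the marker is ignored.
        if c.get("user", {}).get("login") == BOT_LOGIN
        and VERDICT_MARKER in (c.get("body") or "")
    ]
    if not stamps:
        return None
    return (now - max(stamps)).total_seconds()


def is_rate_limited(comments: list[dict], now: datetime) -> bool:
    elapsed = seconds_since_last_verdict(comments, now)
    return elapsed is not None and elapsed < COOLDOWN_SECONDS


comments = [
    {"user": {"login": BOT_LOGIN}, "body": f"Still failing. {VERDICT_MARKER}",
     "created_at": "2026-05-18T11:55:00Z"},
    {"user": {"login": "contributor"}, "body": f"> {VERDICT_MARKER}",
     "created_at": "2026-05-18T11:59:00Z"},
]
inside = datetime(2026, 5, 18, 12, 0, tzinfo=timezone.utc)
outside = datetime(2026, 5, 18, 12, 6, tzinfo=timezone.utc)
print(is_rate_limited(comments, inside), is_rate_limited(comments, outside))
```

Checking the cooldown before building the prompt is what keeps the LLM call from running inside the window.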
Workflow update: - `triage_reconsider.yml` now passes `--close` only when `AGENT_SHIN_ENABLED=true`, matching the pattern of `triage_pr_with_llm.yml`. The script runs in both states so the verdict still appears in the step summary for QA. Tests: - Add 5 reconsider safety tests: dry-run for pass / fail / linked-issue short-circuit, bot-closed-guard refusal on maintainer close, rate-limit refusal inside the cooldown window, and cooldown-elapsed acceptance. - Add unit tests for `was_closed_by_agent_shin` (bot / maintainer / missing actor / env-override) and `seconds_since_last_reconsider_verdict` (no marker / multiple markers / non-bot comment with marker / bot comment without marker). - Pin the `` marker in both reopen and still-failing comments — dropping it would silently break the cooldown. Existing reconsider tests updated to pass `close=True` (the production path now) + stub the new guards via `_stub_reconsider_guards`. 112 tests pass (was 93). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * feat(triage): 1-day grace period before close + SwiftWinds immediate-close bypass - Add a 24-hour grace window between the first low-quality detection and the actual auto-close. The first detection posts a warning comment that explicitly says "You have 1 day to address this before this PR is auto-closed" and points the contributor at: * `@agent-shin reconsider` to request another look (and re-open) * `@greptileai` to request a fresh Greptile review — works even after the PR is closed - Both `triage_with_llm.py` (LLM judge) and `close_low_quality_prs.py` (Greptile-score closer) share the same `` HTML marker so a warning posted by either path is recognized by both. - Add IMMEDIATE_CLOSE_LOGINS = {swiftwinds} to bypass BOTH the grace period AND the dry-run / AGENT_SHIN_ENABLED gating. SwiftWinds is the user's personal account (no push permissions to litellm) used to dogfood the bot; user explicitly asked: "For SwiftWinds, just close immediately. 
Faster iteration that way." - Update the standard close comments to mention that `@greptileai` works even after the PR is closed. - Add 23 new tests covering: warn-grace on first detection, skip during grace window, close after grace expires, SwiftWinds bypass (case insensitive, with close=False, no random-login false positives), the grace-warning text invariants, and the SwiftWinds entry in the IMMEDIATE_CLOSE_LOGINS constant. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix: skip grace-period text in close comment for IMMEDIATE_CLOSE_LOGINS For PRs from IMMEDIATE_CLOSE_LOGINS (e.g. swiftwinds), evaluate_pr returns 'close' immediately without ever posting a grace warning, so the close comment should not reference a 1-day grace period. Make close_pr take a grace_period_elapsed flag, default True, and pass False from the main loop when the close path was the immediate-close branch. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(close-low-quality-prs): report actual closes in dry-run summary IMMEDIATE_CLOSE_LOGINS PRs are closed even when the global --close flag is not set, but the summary used the global dry-run flag to choose between 'would close' and 'closed'. Split the count so operators can see both actual closures and dry-run would-be closures. Co-authored-by: Yassin Kortam <yassin@berri.ai> * chore(triage): vendor Agent Shin (#28117) onto demo branch Brings the Agent Shin OSS-triage scripts, workflows, issue/PR templates, and tests from PR #28117 onto this branch so the new review-gate feature and its end-to-end demo are self-contained and runnable in CI. 
https://claude.ai/code/session_01XyyWa8t2VYmoGd6mKMEqkZ * feat(triage): add "ready for review" label lifecycle to Agent Shin Adds review_gate(), a state machine that keeps a `ready for review` label in sync with whether an external PR clears BOTH gates — the LLM rubric and Greptile's most recent confidence score: - pass (untagged) -> add label + "ready for review" / "all clear" comment - pass (already tagged) -> no-op (idempotent across re-runs) - regress (Greptile < 4/5 or QA proof removed) -> remove label + "what's missing" comment, PR stays open - recover after a regression -> "all clear again" comment + re-add the label - fail & untagged, < 24h old -> one-time "what's missing" notice (grace window) - fail & untagged, > 24h old -> close + comment (reopen via @agent-shin reconsider) The label itself is the persisted state, so comments fire only on transitions (never on every scheduled run). All side effects are gated behind --close, so the dry-run contract matches the existing triage flow. Lifecycle comments use hidden HTML markers and deliberately avoid the auto-close marker so they never trip the reconsider provenance check. Relocates the shared Greptile helpers (extract_greptile_score, SCORE_PATTERN, GREPTILE_BOT_LOGINS, parse_iso8601) into triage_with_llm.py so the daily sweep and the review gate read the score through one implementation, and adds the review_gate.yml workflow (dry-run unless AGENT_SHIN_ENABLED=true) plus 18 unit tests covering every branch and a full pass->regress->recover cycle. https://claude.ai/code/session_01XyyWa8t2VYmoGd6mKMEqkZ * Port review-gate feature from #28758 onto #28147 triage scripts Adds the "ready for review" label lifecycle (originally PR #28758) on top of #28147's refactored triage_with_llm.py. 
The original commit was authored against an older snapshot of #28117 and could not be applied cleanly, so the additions were re-applied surgically: - New constants: READY_FOR_REVIEW_LABEL, DEFAULT_GRACE_DAYS, DEFAULT_MIN_GREPTILE_SCORE, READY/REGRESSED/WITHIN_GRACE markers, GREPTILE_BOT_LOGINS, SCORE_PATTERN, AGENT_SHIN_AUTO_CLOSE_MARKER. - New helpers: add_label, remove_label, extract_greptile_score, parse_iso8601 (the latter two mirrored from close_low_quality_prs.py so the daily sweep and the review gate read the score through the same logic). - New comment formatters: format_ready_for_review_comment, format_all_clear_comment, format_regression_comment, format_within_grace_comment. - New entry point: review_gate() implementing the pass/regress/recover state machine, with the label itself acting as persisted state so transition comments fire only on actual transitions. - main() learns --review-gate, --grace-days, --min-greptile-score and dispatches to review_gate() when the flag is set. Verified via tests/test_litellm/test_github_review_gate.py (18 tests) and the existing triage suites (144 more) — all 162 pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * agent_shin: extract shared constants/helpers; cover review_gate.yml in guardrail tests Bug 1: `triage_with_llm.py` and `close_low_quality_prs.py` each defined their own copies of `extract_greptile_score`, `parse_iso8601`, `GREPTILE_BOT_LOGINS`, `SCORE_PATTERN`, `GRACE_COMMENT_MARKER`, `GRACE_PERIOD_SECONDS`, `IMMEDIATE_CLOSE_LOGINS`, and `AGENT_SHIN_DEFAULT_BOT_LOGIN`. The comments explicitly said the two copies had to stay in sync, but nothing enforced it. A future change to one (e.g. extending `SCORE_PATTERN` for a new Greptile output format) would silently diverge from the other and the daily sweep and the LLM judge would disagree on which PRs have low scores. 
Extract these to `.github/scripts/agent_shin_shared.py` and re-export them from each script so the existing test attribute access (`triage_module.GRACE_COMMENT_MARKER`, etc.) keeps working without any test changes. Bug 2: `review_gate.yml` is a destructive workflow (close PRs, add/remove labels, post comments) with the same gating philosophy as the others (`AGENT_SHIN_ENABLED = "true"` + a per-run `CLOSE_FLAG = "true"`), but it was missing from `DESTRUCTIVE_GATE_ENV` in the guardrail tests. Add it so a future regression (e.g. flipping to `!= "false"`) is caught by the same parameterized invariants as every other workflow. Co-authored-by: Yassin Kortam <yassin@berri.ai> * agent_shin: fix bug bundle (gated LLM key, author-filtered marker dedup, dedup gh/grace helpers) Co-authored-by: Yassin Kortam <yassin@berri.ai> * agent_shin: fix review_gate close-after-regression and case-insensitive label match Co-authored-by: Yassin Kortam <yassin@berri.ai> * feat(triage): add one-shot 7-day heads-up sweep for Agent Shin rollout Adds a rollout-day workflow that comments on every open external PR/issue that the new triage bot WOULD auto-close, giving contributors 7 days to fix their description before any destructive action runs. Why now: merging this PR enables Agent Shin in dry-run. The follow-up "enact" PR (next Monday) flips the destructive paths on. Without this heads-up, contributors would get a close-comment on day 8 with no prior warning. The heads-up names the cutoff date, lists the rubric, calls out each PR/issue's specific missing pieces, and explains the recovery paths (@agent-shin reconsider for PRs, edit + reopen for issues). Files - .github/scripts/_agent_shin_actions.py — thin maybe_post_comment / maybe_close_* / maybe_add_label / etc. wrappers. Each is a single `if dry_run: log; return; else: call_through()` so a dry-run preview differs from the real run in exactly one call site per mutation. 
The call-through goes via `triage_with_llm.<name>` (module-qualified) so monkeypatching the underlying function in tests is reflected here.

- .github/scripts/triage_rollout_heads_up.py — the sweep. Iterates every open PR + issue via `gh pr list` / `gh issue list`, runs the future rubric (review_gate for PRs, triage(kind="issue") for issues), and posts the heads-up on any item that would be auto-closed. Idempotent via a `` marker. Defaults to dry-run; --close opts in to real posts. --close-on overrides the cutoff date (defaults to today + 7 days).
- .github/workflows/triage_rollout_heads_up.yml — one-shot workflow. Triggers on push to litellm_internal_staging filtered to the script path (fires on rollout merge) plus workflow_dispatch with a dry_run input that defaults to "true" for safe manual re-runs.
- tests/test_litellm/test_triage_rollout_heads_up.py — 28 unit tests covering: the dry-run wrappers (each maybe_* gates correctly), the _would_be_closed predicate for PR vs. issue results, the comment formatter (cutoff/rubric/marker/recovery wording), per-item dispatch (skip-not-open, skip-internal-author, skip-already-notified, skip-passing, would-post/posted), and the sweep loop end-to-end.

Local preview (no GitHub mutations):

    python3 .github/scripts/triage_rollout_heads_up.py --repo BerriAI/litellm

Real run (what the workflow does):

    python3 .github/scripts/triage_rollout_heads_up.py --repo BerriAI/litellm --close

TODO: replace the placeholder ROLLOUT_BLOG_URL with the canonical docs URL once the litellm-docs PR ships.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: gate reconsider workflow OPENAI_API_KEY + remove dead actions wrappers

- Mirror sibling Agent Shin workflows by only exposing OPENAI_API_KEY in triage_reconsider.yml when vars.AGENT_SHIN_ENABLED == 'true'.
Previously the secret was unconditionally exposed, so any PR/issue author could trigger paid LLM calls by commenting '@agent-shin reconsider' even while the bot was supposed to be in dry-run. - Remove the six unused dry-run wrappers (maybe_close_pr, maybe_close_issue, maybe_reopen_pr, maybe_reopen_issue, maybe_add_label, maybe_remove_label) from _agent_shin_actions.py — only maybe_post_comment is used by rollout scripts. Drop the associated tests that exercised the now-removed functions. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: address triage script edge cases - triage_rollout_heads_up.py: replace %-d strftime specifier (GNU-only) with portable day formatting so the script doesn't crash on Windows. - close_low_quality_prs.py: skip malformed JSON lines in fetch_pr_comments instead of letting one bad line abort the daily sweep, matching the pattern in triage_with_llm._iter_paginated_json. - triage_with_llm.py: move has_linked_issue short-circuit before build_pr_prompt to avoid unnecessary prompt construction on PRs that link an issue. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(scripts): per-PR error isolation and limit grace warnings in close_low_quality_prs - Wrap per-PR processing in try/except so a transient GitHub API failure on one PR no longer aborts the entire daily sweep (mirrors the pattern already used in triage_rollout_heads_up.py). - Have --limit bound *all* destructive write actions (closures and grace warnings combined), not just closures. Prevents a backlog of newly failing PRs from flooding contributors with comments in a single run. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(agent-shin): remove 1000-PR cap on bulk sweeps; sweep entire backlog Both bulk-sweep scripts hardcoded `gh {pr,issue} list --limit 1000`, and gh lists newest-first — so the OLDEST ~900 PRs and ~380 issues were silently dropped. That's exactly the stale backlog the daily closer and one-shot rollout heads-up exist to catch. 
Extract a single `list_open_items(kind, *, repo, fields)` helper into `agent_shin_shared.py` with `GH_LIST_ALL_LIMIT = 100_000` — a ceiling far above any realistic open backlog so gh paginates until the queue is exhausted. `fetch_open_prs` and `_list_open_numbers` both delegate to it, so the limit lives in exactly one place going forward. Verified live against BerriAI/litellm: - `fetch_open_prs` -> 1981 PRs (was 1000) - `_list_open_numbers(issue)` -> 1382 issues (was 1000) - `_list_open_numbers(pr)` -> 1981 PRs (was 1000) Adds 7 regression tests asserting the new limit is passed, the dedicated `gh {pr,issue} list` command + fields are used per kind, bad kind raises ValueError, and both callers delegate to the shared helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(agent-shin): require non-mocked end-to-end QA proof for PR pass The PR rubric previously passed any PR with a linked issue, regardless of whether it showed the fix actually working. Sample spot-check found 21/25 recent external PRs passing, including ones that linked an issue but provided zero QA evidence. Tighten the rubric so a pass now requires BOTH: (1) CONTEXT — a linked issue OR a clear problem description with expected-vs-actual behavior. (2) END-TO-END QA PROOF — at least one of: (a) screenshot(s) of the fix working, (b) screen recording / video, (c) specific commands actually run, paired with their real output, against the real system. Mocked unit tests, generic 'I tested it' claims, 'all tests pass' without output, and the linked issue itself are explicitly excluded from QA proof. Also add 'qa_proof_type' to the JSON schema so the per-PR report surfaces which kind of proof (or 'none') the judge saw. Re-sample on the same 25 recent external PRs shifts the verdict distribution from 21 pass / 4 fail to 4 pass / 21 fail, with zero prior-fails now passing — the stricter rule catches PRs that ship only with unit-test claims and no real integration evidence. 
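The shared listing helper can be sketched as a pure command builder. This is an illustration under assumptions: `build_list_command` is a hypothetical name, and it only assembles the argv rather than invoking `gh` as the real `list_open_items` does. The `--repo`, `--state`, `--limit`, and `--json` flags are standard options of `gh pr list` and `gh issue list`.

```python
from __future__ import annotations

# Ceiling far above any realistic open backlog, as described above.
GH_LIST_ALL_LIMIT = 100_000

_LIST_COMMANDS = {
    "pr": ["gh", "pr", "list"],
    "issue": ["gh", "issue", "list"],
}


def build_list_command(kind: str, *, repo: str, fields: list[str]) -> list[str]:
    """Assemble the gh argv for listing every open item of the given kind."""
    if kind not in _LIST_COMMANDS:
        raise ValueError(f"kind must be 'pr' or 'issue', got {kind!r}")
    return [
        *_LIST_COMMANDS[kind],
        "--repo", repo,
        "--state", "open",
        # One limit in one place, so no caller can reintroduce a 1000 cap.
        "--limit", str(GH_LIST_ALL_LIMIT),
        "--json", ",".join(fields),
    ]


cmd = build_list_command("pr", repo="BerriAI/litellm", fields=["number", "author"])
print(" ".join(cmd))
```

Because `gh` lists newest-first, any limit below the backlog size drops the oldest items, so a single shared constant is the simplest way to keep both callers consistent.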
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-shin): link blog explainer from every action-required bot comment

Adds "What's this and why am I getting it?" links to docs.litellm.ai/blog/agent-shin-triage from the four comments contributors actually read when something went wrong: PR close, PR grace warning, issue close, issue grace warning. PR comments also link the rubric section directly from the QA-proof bullet so contributors can self-serve "what counts as proof" without pinging a maintainer.

Pins the new guarantees in tests: blog link must appear in all four comments, and the PR close comment must continue to flag mocked-dependency unit tests as insufficient proof.

The linked blog post is in BerriAI/litellm-docs PR #240; the URL will 404 until that lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review_gate): raise sweep limit from 1000 to 100000 to match GH_LIST_ALL_LIMIT

gh lists newest-first, so capping at 1000 silently drops the oldest open PRs — exactly the stale ones the daily sweep is meant to reconcile. Use the same ceiling as agent_shin_shared.GH_LIST_ALL_LIMIT so the workflow sees the entire backlog.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* Fix three Agent Shin triage edge cases

- review_gate: expire the regression-marker short-circuit after grace_days so PRs that were regressed and then abandoned can eventually be closed.
- review_gate: when the rubric short-circuits to pass via the linked-issue regex but Greptile drags the PR below the bar, replace the synthetic 'LLM was not called' explanation with the real Greptile shortfall so regression / close comments are not misleading.
- triage_rollout_heads_up._comments_have_marker: drop the unused 'kind' parameter and filter by bot author so a contributor quoting the heads-up via 'Quote reply' cannot trick the idempotency check, matching the pattern in triage_with_llm._has_marker.
Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: pass min_greptile_score through to ready-for-review comment text

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* feat(agent-shin): warmer triage comments — bullet-train emoji, 'what you got right' section, softer 'park this for later' framing

User feedback on the auto-triage comments contributors will see:

1. Tone — the previous 'You have 1 day to address this before this PR is auto-closed' framing reads as an ultimatum. Replace with: 'If the description isn't updated in the next 1 day, I'll auto-close this PR. That's not us saying we don't care about the change — we want the open-PR list to mirror what a maintainer can act on right now, so contributors don't get lost in a backlog. A closed PR is a soft "park this for later," not a rejection. Take your time.'

2. Positive feedback — the previous comments only listed what was missing. Now every close + grace-warning comment opens with a 'What you got right:' section rendered from the judge's per-field flags. Contributors see a checkmark for everything they got right (linked issue, problem description, expected/actual, QA proof for PRs; runnable repro, screenshot/log, expected/actual, motivation+example for issues) before the gaps. The block is omitted entirely when nothing is present so we never render 'What you got right: (nothing).'

3. Reconsider trigger — the previous grace warning told contributors to comment '@agent-shin reconsider' during the grace window. They don't need to — the bot re-checks on every sweep. The new copy says 'just update the description, no need to ping me' for the grace path, and reserves '@agent-shin reconsider' for the post-close recovery path.

4. Bullet-train emoji — replace 👋 with 🚄 (Shinkansen, the symbol of Agent Shin) across every action-required comment: PR close, PR grace warning, issue close, issue grace warning, within-grace, Greptile-closer grace warning, rollout heads-up.
Pinned in tests so a future refactor can't silently revert. 5. Greptile-post-close — the @greptileai bullet now explicitly says 'a low Greptile score isn't a blocker either,' since the previous copy buried the fact that @greptileai works after auto-close. Comment templates updated: format_pr_close_comment, format_issue_close_comment, format_grace_warning_pr_comment, format_grace_warning_issue_comment, format_within_grace_comment (triage_with_llm.py); format_grace_warning_comment (close_low_quality_prs.py); format_heads_up_comment header (triage_rollout_heads_up.py). New helpers: _format_present_for_pr / _format_present_for_issue / _format_present_block, driven off the existing per-field flags the LLM judge already emits — no prompt change needed. New tests pin: bullet-train emoji in every action-required comment; 'What you got right' appears with ✅ bullets when fields are present; the block is omitted when no fields are present; 'park this for later' / 'not a rejection' softer framing; grace warnings tell the contributor 'no need to ping' during the grace window (reconsider is the post-close path only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(agent-shin): gate triage on a dogfood allowlist Add ALLOWLIST_LOGINS to agent_shin_shared so Agent Shin only acts on the named accounts while the set is non-empty. mateo-berri and SwiftWinds are allowlisted for the dogfood rollout; everyone else is skipped with skip-not-allowlisted across all four entrypoints (triage, review gate, the daily low-quality sweep, and the rollout heads-up). For an allowlisted author the usual internal/external classification is bypassed, so a maintainer's own org account still gets triaged during testing. Emptying the set lifts the restriction and restores full triage for the public rollout. The gate is dependency-injected via an `allowlist` parameter defaulting to the constant, so the internal/external-skip paths stay testable. 
* feat(agent-shin): tighten QA-proof and issue rubrics, ack reconsider with reactions Reorder the end-to-end QA proof options to video, then screenshots, then exact commands with their real output across the PR template, the LLM judge prompts, and every contributor-facing comment, and spell out that mocked or stubbed runs (including pytest on the repo's own unit tests, which mock the provider, DB, and network) never count as proof. QA proof is now required of all contributors, not just external ones. Tighten the issue bug-report rubric to require end-to-end evidence of the bug (the "before" half: a video, screenshot, or command paired with real output) plus expected vs. actual behavior, drop the bias toward PASS, and collapse the separate has_repro/has_proof flags into a single has_repro signal. Standardize the bullet-train emoji and strip em dashes from the bot's public-facing messages, and route issue recovery through @agent-shin reconsider since GitHub doesn't let OSS authors reopen an issue a bot closed. Acknowledge an @agent-shin reconsider the moment it's accepted with an eyes reaction and a thumbs-up once the run finishes, both gated on AGENT_SHIN_ENABLED so dry-run leaves no trace. * fix(agent-shin): shorten auto-close grace to 2 hours and drop the instant-close bypass Two dogfooding changes to the Agent Shin grace window. First, the warn-then-close grace (GRACE_PERIOD_SECONDS) drops from a day to 2 hours so the "fix it before it closes" loop can be exercised in one sitting; the constant carries a note to bump it back up for the public rollout. Second, remove IMMEDIATE_CLOSE_LOGINS entirely. SwiftWinds (the external dogfood account) used to skip the grace window and close on first detection, which also meant closing real PRs even during a scheduled dry run because the per-PR override flipped dry_run off. It now follows the same warn-then-close path as every other author, so a low-quality PR is warned first and only closed once the 2-hour window elapses. 
This also closes the Greptile finding that the sweep could mutate real PRs while AGENT_SHIN_ENABLED was still off. The review gate's separate age-based grace (DEFAULT_GRACE_DAYS) is left unchanged. Regression tests pin that SwiftWinds now warns-grace instead of closing instantly, and that a dry-run sweep over a closeable PR reports "would close" without making any GitHub mutation. * fix(agent-shin): gate reconsider reopen on an Agent Shin close marker was_closed_by_agent_shin only checked that the most recent close actor was the bot identity. That identity defaults to github-actions[bot], which is shared by every workflow in the repo (stale/duplicate sweeps included), so a contributor could @agent-shin reconsider an item another workflow closed and, if the description passed the rubric, get it reopened even though Agent Shin was never the closer. Require a second, Agent-Shin-specific signal alongside the actor check: an auto-close comment stamped with a hidden AGENT_SHIN_CLOSE_MARKER. Both close paths (the grace-period close and the review-gate close) flow through format_pr_close_comment / format_issue_close_comment, so stamping the marker there covers every real close while leaving the grace warnings unmarked. The guard stays fail-closed: no marker, no reopen. This also replaces the unused AGENT_SHIN_AUTO_CLOSE_MARKER constant (a visible phrase the guard never consulted) with the hidden marker the guard now relies on. * fix(agent-shin): stamp close marker on sweep closes and disclose regression deadline The daily Greptile sweep's close comment advertised `@agent-shin reconsider` but never stamped AGENT_SHIN_CLOSE_MARKER, so the reconsider reopen guard (was_closed_by_agent_shin), which now also requires that marker, silently rejected every sweep-closed PR with `skip-not-bot-closed`. 
Move the marker into agent_shin_shared so both close paths share one source of truth, extract format_close_comment so the sweep close comment is unit-testable, and stamp the marker there. Also disclose the grace_days deadline in the review-gate regression comment; it promised "the PR stays open" without mentioning that a still-failing PR is auto-closed grace_days after the notice, which would surprise contributors with a close they were never warned about. * fix(triage): tighten Agent Shin reconsider reopen guards The bot-closed guard accepted any historical Agent Shin marker comment on the thread as proof that Agent Shin owned the latest close, so a post-reopen close by another workflow under the shared `github-actions[bot]` identity could still satisfy the gate and let `@agent-shin reconsider` reopen a PR that Agent Shin did not close this cycle. `fetch_last_close_event` now also returns the latest `closed` event timestamp, and `was_closed_by_agent_shin` requires the most recent Agent Shin marker comment to sit at (or just before) that timestamp, with a small skew window for clock drift between the events and comments APIs. In the same path the LLM verdict check used `decision != "fail"` to choose the reopen branch, which treated a missing, empty, or typo verdict as a pass. Reopen is destructive, so the check now requires an explicit `decision == "pass"` and ambiguous verdicts fall through to the "still failing" branch instead. * style(agent-shin): black-format reconsider guard hardening * docs(agent-shin): scope dry-run wrapper docstring to the single existing helper The module docstring claimed it wrapped every Agent Shin mutation and referenced post_comment/close_pr/etc., but only maybe_post_comment exists. Describe the single helper accurately while keeping the dry-run pattern guidance for any future wrapper. 
* chore(agent-shin): defer issue/PR template changes to the rollout PR The triage and review-gate automation is gated to the allowlisted authors (mateo-berri, SwiftWinds) and AGENT_SHIN_ENABLED, so during this rollout it only acts on internal PRs/issues. The issue and PR templates have no such gate; they change for every contributor on merge and advertise that an LLM bot auto-closes external submissions, which won't happen while the allowlist is the sole author gate. Revert bug_report.yml, feature_request.yml, and pull_request_template.md to base so the public-facing messaging lands with the rollout flip instead of ahead of it. The scripts embed their own rubric and never read these files, so triage behavior is unchanged. * ci(agent-shin): hash-pin the openai install in privileged triage workflows The triage workflows install the OpenAI client with `pip install "openai>=1.40.0"`, a floating lower bound that resolves openai and its whole transitive tree to whatever PyPI serves at run time. These jobs run under pull_request_target with a write-scoped GITHUB_TOKEN, and the install plus the triage run happen on every PR open regardless of the AGENT_SHIN_ENABLED dry-run gate (that gate only withholds the LLM key and the destructive --close path), so a compromised release would execute during install or import while the token is in scope. Install instead from a new .github/scripts/triage-requirements.txt that pins openai==2.33.0 and every transitive dependency to an exact version with sha256 hashes, via pip --require-hashes. The workflows already sparse-checkout .github/scripts from the base repo (never fork code), so the pinned file is trusted. Add static guardrails to test_github_triage_workflows.py that fail if any installer workflow reverts to a floating openai install or if the requirements file loses its exact pins or hashes. 
* ci(agent-shin): gate rollout heads-up real run behind manual dispatch The rollout heads-up workflow fired its real `--close` sweep on every push to litellm_internal_staging that touched the script, and exposed OPENAI_API_KEY unconditionally, unlike every sibling triage workflow which only exposes the key on an enabled or dispatched run. That made merging the script post real heads-up comments (bounded only by the dogfood allowlist), which contradicts the inert-by-default safety invariant; once the allowlist is cleared for the public rollout, any later edit to the file would sweep the whole open backlog with real writes. The heads-up cannot be gated on AGENT_SHIN_ENABLED: its whole job is to warn contributors before that flag flips on, so it has to run while the flag is still off. Instead the automatic push trigger now stays dry-run, and the real one-shot sweep is a deliberate manual workflow_dispatch with dry_run=false, the sole path that adds `--close`. OPENAI_API_KEY is exposed only on that dispatch, matching the sibling workflows. Add static guardrails that fail if the push path regains a `--close`, if the dispatch gate stops fail-closing on the exact string "false", or if the key is exposed unconditionally again. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
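The reconsider reopen guard that the later commits in this message converge on (bot actor, Agent Shin close marker near the latest close event, explicit pass verdict) can be sketched as below. This is a hedged illustration, not the repository code: the function names, the two-minute skew value, and the symmetric treatment of the skew window are all assumptions.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional

BOT_LOGIN = "github-actions[bot]"      # default bot identity named above
SKEW = timedelta(minutes=2)            # assumed size of the clock-drift window


def closed_by_agent_shin(
    close_actor: Optional[str],
    closed_at: Optional[datetime],
    marker_comment_times: list[datetime],
) -> bool:
    """Fail closed unless the latest close is the bot's AND carries a fresh marker."""
    if close_actor != BOT_LOGIN or closed_at is None or not marker_comment_times:
        return False
    latest_marker = max(marker_comment_times)
    # A marker from an earlier close cycle is far from the latest close event,
    # so a later close by another workflow under the same login is rejected.
    return abs(closed_at - latest_marker) <= SKEW


def should_reopen(decision: Optional[str], guard_ok: bool) -> bool:
    # Reopen is destructive: require an explicit "pass", so a missing, empty,
    # or misspelled verdict falls through to the still-failing branch.
    return guard_ok and decision == "pass"


t0 = datetime(2026, 5, 18, 12, 0, tzinfo=timezone.utc)
same_cycle = closed_by_agent_shin(BOT_LOGIN, t0, [t0 - timedelta(seconds=1)])
stale_marker = closed_by_agent_shin(BOT_LOGIN, t0 + timedelta(days=1), [t0])
print(same_cycle, stale_marker, should_reopen("", same_cycle))
```

Comparing timestamps rather than only checking that a marker exists somewhere on the thread is what ties the marker to the current close cycle.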

cursor Bot changed the title ~~fix(triage): bugbot on #28117 — reconsider dry-run + bot-closed guard + rate limit~~ fix(triage): bugbot on #28117 — reconsider safety + 1-day grace + @greptileai post-close + SwiftWinds dogfood May 19, 2026

cursor Bot reviewed May 19, 2026

Comment thread .github/scripts/close_low_quality_prs.py

cursor Bot reviewed May 19, 2026

Comment thread .github/scripts/close_low_quality_prs.py

mateo-berri mentioned this pull request May 25, 2026

Combine #28758 + #28117 + #28147 — Agent Shin auto-close + review-gate label lifecycle #28759

Closed

6 tasks

mateo-berri mentioned this pull request Jun 14, 2026

feat(agent-shin): automated PR/issue triage, low-quality auto-close, and review-gate label lifecycle #30433

Merged

6 tasks

mateo-berri wants to merge 4 commits into litellm_auto-close-low-quality-prs-1f26 from litellm_fix-agent-shin-reconsider-safety-1af4

3 participants

Part 1 — Reconsider safety (Greptile + veria-ai feedback on #28117)

1a. Reconsider had no dry-run path (Greptile)

1b. Reconsider could reopen maintainer-closed PRs/issues (Greptile + veria-ai security)

1c. No rate-limiting on the reconsider trigger (Greptile)

Part 2 — 1-day grace period before close

Behavior

Flow on a real low-quality PR (24h cadence is the daily Greptile cron)

evaluate_pr action surface (close_low_quality_prs.py)

triage() action surface (triage_with_llm.py)

Part 3 — SwiftWinds dogfood bypass (IMMEDIATE_CLOSE_LOGINS)

Files changed

Test results

Out of scope

Type
