
fix(triage): bugbot on #28117 — reconsider safety + 1-day grace + @greptileai post-close + SwiftWinds dogfood #28147

Draft
mateo-berri wants to merge 4 commits into litellm_auto-close-low-quality-prs-1f26 from litellm_fix-agent-shin-reconsider-safety-1af4

mateo-berri (Collaborator) commented May 18, 2026

Fix-on-top for #28117. Combines two layers of changes:

  1. Reconsider safety (original scope) — addresses three concerns Greptile + veria-ai flagged on the @agent-shin reconsider flow.
  2. 1-day grace period before close + SwiftWinds dogfood (new scope) — the user explicitly asked for both in chat:

    "I want to give contributors a 1 day grace (specify in the comment) to fix their pr before it closes. They can still @ the bot the request another look (and thus a possible re-opening). It should state in the message all this and that also even after the pr is closed @'ing greptileai works fine. Also, turn on full (non dry run) mode with SwiftWinds. Thats my personal account I want to test this PR with for this week. It has no push permissions to litellm so its a perfect example."
    "For SwiftWinds, just close immediately. Faster iteration that way."


Part 1 — Reconsider safety (Greptile + veria-ai feedback on #28117)

Greptile review on #28117 (Confidence Score 3/5): "The reconsider mode in triage_with_llm.py ignores --close and makes real write calls unconditionally; an operator running --reconsider locally without the workflow AGENT_SHIN_ENABLED guard would post comments and reopen PRs. The reconsider workflow also has no check that the closure came from the bot, so it can reopen items closed by maintainers for non-quality reasons."

veria-ai review on #28117 (Risk 5/10): "This PR adds GitHub automation that can close and reopen issues/PRs from a write-token workflow. The reconsider path authorizes the original external author but does not verify that the item was previously closed by Agent Shin, so a contributor can make the bot reopen a PR a maintainer closed for another reason."

1a. Reconsider had no dry-run path (Greptile)

Before: --reconsider ignored --close and always posted comments + reopened on pass. After: reconsider honors close=False the same way regular triage does. Returns would-reopen / would-reconsider-still-failing (with the previewed comment body) for step-summary rendering. The workflow only passes --close when AGENT_SHIN_ENABLED=true.
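A minimal sketch of the dry-run gating described above, assuming the verdict reduces to a simple pass/fail. `reconsider_action` and `reconsider_argv` are hypothetical helpers for illustration, not functions from `triage_with_llm.py`. Only the `would-reopen` / `would-reconsider-still-failing` strings, the `--reconsider --pr N` invocation, and the `AGENT_SHIN_ENABLED=true` gate come from this PR; the write-mode labels are placeholders.

```python
from __future__ import annotations


def reconsider_action(passes: bool, close: bool) -> str:
    """Map a reconsider verdict to an action label (hypothetical helper).

    With close=False nothing is written; the caller only renders the
    previewed verdict (and comment body) in the step summary.
    """
    if passes:
        # "reopened" is a placeholder label for the write path.
        return "reopened" if close else "would-reopen"
    # "reconsider-still-failing" is a placeholder label for the write path.
    return "reconsider-still-failing" if close else "would-reconsider-still-failing"


def reconsider_argv(pr_number: int, agent_shin_enabled: str | None) -> list[str]:
    """Build the command the workflow would run (hypothetical).

    The script always runs so the verdict reaches the step summary, but
    `--close` (write mode) is appended only when AGENT_SHIN_ENABLED is "true".
    """
    argv = ["python", "triage_with_llm.py", "--reconsider", "--pr", str(pr_number)]
    if agent_shin_enabled == "true":
        argv.append("--close")
    return argv
```

A local operator who runs the script without `--close` therefore only ever sees a `would-*` label; no comment is posted and nothing is reopened.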

1b. Reconsider could reopen maintainer-closed PRs/issues (Greptile + veria-ai security)

Before: triage_reconsider.yml only checked the commenter — it did NOT check that the most recent close was performed by Agent Shin. A contributor could @agent-shin reconsider on a PR a maintainer closed for non-rubric reasons (duplicate, security report, design rejection) and have the bot reopen it. After: was_closed_by_agent_shin() inspects the issue events API for the most recent closed event's actor and only permits reopen when the actor matches the configured bot login (default github-actions[bot], overridable via AGENT_SHIN_BOT_LOGIN). Fail-closed on missing events. Returns skip-not-bot-closed; the LLM never runs.
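The bot-closed guard can be sketched as a pure function over already-fetched issue events. This assumes each event dict carries an `event` field and an `actor.login`, and that events arrive oldest-first. `closed_by_bot` is a hypothetical stand-in for `was_closed_by_agent_shin()`; only the default login and the `AGENT_SHIN_BOT_LOGIN` override come from this PR.

```python
from __future__ import annotations

import os
from typing import Iterable

AGENT_SHIN_DEFAULT_BOT_LOGIN = "github-actions[bot]"


def closed_by_bot(events: Iterable[dict], bot_login: str | None = None) -> bool:
    """Return True only if the most recent `closed` event was performed by the bot.

    Assumes `events` is ordered oldest-first. Fails closed: no close event,
    or a latest close event with no actor login, returns False.
    """
    expected = (
        bot_login or os.environ.get("AGENT_SHIN_BOT_LOGIN") or AGENT_SHIN_DEFAULT_BOT_LOGIN
    ).lower()
    last_actor: str | None = None
    for event in events:
        if event.get("event") == "closed":
            # Later close events overwrite earlier ones, so the latest close wins.
            last_actor = ((event.get("actor") or {}).get("login") or "").lower() or None
    return last_actor is not None and last_actor == expected
```

The "latest close wins" rule is what blocks the maintainer-close bypass: a bot close followed by a reopen and a maintainer close is rejected.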

1c. No rate-limiting on the reconsider trigger (Greptile)

Before: every @agent-shin reconsider comment burned CI minutes + an OpenAI API call. After: a 10-minute cooldown via seconds_since_last_reconsider_verdict(), which detects the bot's own verdict marker <!-- agent-shin:reconsider-verdict -->. Inside the cooldown window, triage returns skip-rate-limited and the LLM never runs.
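The cooldown reduces to "how old is the bot's newest marker comment?". Below is a minimal sketch of that check, assuming comment dicts shaped like GitHub's issue-comments API (`user.login`, `body`, `created_at` ending in `Z`). `seconds_since_marker` and `is_rate_limited` are hypothetical names. The same marker-age idea underlies the shared helper the PR uses for both the reconsider verdict and the grace warning.

```python
from __future__ import annotations

import datetime as dt
from typing import Iterable

RECONSIDER_COMMENT_MARKER = "<!-- agent-shin:reconsider-verdict -->"
RECONSIDER_RATE_LIMIT_SECONDS = 600


def seconds_since_marker(
    comments: Iterable[dict], marker: str, bot_login: str, now: dt.datetime
) -> float | None:
    """Seconds since the newest bot-authored comment containing `marker`, or None."""
    latest: dt.datetime | None = None
    for comment in comments:
        author = ((comment.get("user") or {}).get("login") or "").lower()
        # Ignore humans quoting the marker; only the bot's own comments count.
        if author != bot_login.lower() or marker not in (comment.get("body") or ""):
            continue
        created = comment.get("created_at")
        if not created:
            continue
        # GitHub timestamps end in "Z"; normalize so datetime.fromisoformat accepts them.
        ts = dt.datetime.fromisoformat(created.replace("Z", "+00:00"))
        if latest is None or ts > latest:
            latest = ts
    return None if latest is None else (now - latest).total_seconds()


def is_rate_limited(comments: list[dict], bot_login: str, now: dt.datetime) -> bool:
    """True when the last reconsider verdict is younger than the cooldown."""
    age = seconds_since_marker(comments, RECONSIDER_COMMENT_MARKER, bot_login, now)
    return age is not None and age < RECONSIDER_RATE_LIMIT_SECONDS
```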


Part 2 — 1-day grace period before close

Behavior

When Agent Shin (LLM judge) or the Greptile-score closer flags an external PR/issue as low-quality, the bot now does NOT close immediately. Instead it posts a grace-period warning comment that explicitly states:

  • "You have 1 day to address this before this PR is auto-closed."
  • During the grace window: update the description and comment @agent-shin reconsider to skip the auto-close.
  • After auto-close: @agent-shin reconsider still re-runs triage and reopens; @greptileai also works even after the PR is closed for a fresh re-review.

The warning comment carries an HTML marker <!-- agent-shin:grace-warning --> that both scripts use to coordinate:

  • Both .github/scripts/triage_with_llm.py (LLM judge) and .github/scripts/close_low_quality_prs.py (Greptile-score closer) detect the marker.
  • A warning posted by one path is recognized by the other on the next run.
  • GRACE_PERIOD_SECONDS = 86400 (24h) — defined in both files; tests pin both.

Flow on a real low-quality PR (24h cadence is the daily Greptile cron)

| Day | Closer evaluates | Grace marker | Action |
|---|---|---|---|
| 1 | Greptile 3/5, no warning yet | absent | warn-grace → post warning, do NOT close |
| 1.5 | Greptile 3/5, warning 12h old | < 24h | skip-in-grace-period → no-op |
| 2 | Greptile 3/5, warning 25h old | ≥ 24h | close → post close comment + close PR |
| 2 (alt) | Greptile bumped to 5/5 after fix | ≥ 24h | skip-score-ok → PR stays open |
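The table can be read as a small decision function. The sketch below condenses the tail of `evaluate_pr` into a hypothetical `grace_action` helper, assuming a merge bar of 4/5 and ignoring the earlier skip checks (age, opt-out label, internal author, missing score).

```python
from __future__ import annotations

GRACE_PERIOD_SECONDS = 86400
IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})


def grace_action(score: int, min_score: int, login: str, grace_age: float | None) -> str:
    """Return the action for a scored PR (hypothetical condensed helper).

    `grace_age` is the age in seconds of the bot's latest grace warning,
    or None if no warning has been posted yet.
    """
    if score >= min_score:
        return "skip-score-ok"
    if login.lower() in IMMEDIATE_CLOSE_LOGINS:
        # Dogfood accounts skip the grace window entirely.
        return "close"
    if grace_age is None:
        return "warn-grace"
    if grace_age < GRACE_PERIOD_SECONDS:
        return "skip-in-grace-period"
    return "close"
```

Each row of the table maps to one call, for example day 1.5 is `grace_action(3, 4, "alice", 12 * 3600)`.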

evaluate_pr action surface (close_low_quality_prs.py)

New: warn-grace, skip-in-grace-period. Existing actions (close, skip-too-young, skip-optout-label, skip-internal, skip-no-greptile-score, skip-score-ok) are unchanged.

triage() action surface (triage_with_llm.py)

New: warned-grace, would-warn-grace (dry-run preview), skip-in-grace-period. Existing closed / would-close now only fire after the grace window.


Part 3 — SwiftWinds dogfood bypass (IMMEDIATE_CLOSE_LOGINS)

The user explicitly named SwiftWinds as their personal account for testing this PR for the week ahead, and explicitly said "for SwiftWinds, just close immediately. Faster iteration that way."

IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"}) (case-insensitive match) lives in both scripts and bypasses both:

  1. The 1-day grace period — close fires on the first detection.
  2. The dry-run / AGENT_SHIN_ENABLED workflow gating — the script writes (post comment + close PR) for these logins regardless of whether the workflow passed --close.

This is intentional: SwiftWinds has no push permissions to litellm, which makes it a clean dogfood account, and waiting 24h per iteration would kill the testing feedback loop.

The bypass is centralized in the Python scripts — workflows don't need conditional logic. Static workflow tests (test_github_triage_workflows.py) still pin the AGENT_SHIN_ENABLED = "true" gate for the global population.
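The per-PR override boils down to one boolean expression (the diff below computes `pr_dry_run = dry_run and not is_immediate` inside `main()`). This sketch isolates it into hypothetical helpers so the case-insensitive match and the truth table are easy to check.

```python
from __future__ import annotations

IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})


def is_immediate_close(author_login: str | None) -> bool:
    """Case-insensitive membership check against the dogfood allowlist."""
    return (author_login or "").lower() in IMMEDIATE_CLOSE_LOGINS


def effective_dry_run(global_dry_run: bool, author_login: str | None) -> bool:
    """Per-PR dry-run flag: allowlisted authors always get real writes."""
    return global_dry_run and not is_immediate_close(author_login)
```

Because the override only ever turns dry-run *off* for allowlisted authors, the global `--close` gate still governs every other contributor.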


Files changed

  • .github/scripts/triage_with_llm.py

    • Reconsider safety: was_closed_by_agent_shin(), seconds_since_last_reconsider_verdict(), fetch_last_close_actor(), _iter_paginated_json(); bot-closed + rate-limit guards before the LLM call; honors close=False in reconsider mode; RECONSIDER_COMMENT_MARKER on reopen / still-failing comments.
    • Grace period: GRACE_COMMENT_MARKER, GRACE_PERIOD_SECONDS, IMMEDIATE_CLOSE_LOGINS; seconds_since_last_grace_warning(); format_grace_warning_pr_comment() and format_grace_warning_issue_comment(); triage() now posts a warning on the first failing detection and only closes after the window elapses (or for IMMEDIATE_CLOSE_LOGINS); format_pr_close_comment() updated to mention @greptileai works post-close.
    • Refactor: shared _seconds_since_latest_marker_comment() helper underneath both reconsider and grace helpers so the marker-iteration logic lives in one place.
  • .github/scripts/triage_reconsider.yml

    • Passes --close only when AGENT_SHIN_ENABLED=true (matches the other Agent Shin workflows); always runs the script so the dry-run verdict appears in the step summary. (Part of original PR scope.)
  • .github/scripts/close_low_quality_prs.py

    • Grace period: GRACE_COMMENT_MARKER, GRACE_PERIOD_SECONDS, IMMEDIATE_CLOSE_LOGINS (mirroring the LLM-judge constants); seconds_since_last_grace_warning() (operates on already-fetched comments, with injectable now for tests); format_grace_warning_comment(); evaluate_pr returns warn-grace / skip-in-grace-period / close based on prior warning age; main() posts the warning via new post_grace_warning() and dispatches per-PR pr_dry_run = dry_run and not is_immediate so SwiftWinds bypasses the global --close gate.
    • The standard close comment now mentions @greptileai works even after the PR is closed.
  • tests/test_litellm/test_github_triage_with_llm.py

    • Updates existing close-path tests to use the new _stub_grace_aged_out / _stub_grace_no_warning helpers.
    • New tests (15): TestTriageOrchestration::{test_should_post_grace_warning_on_first_failing_run_in_close_mode, test_should_skip_close_inside_grace_window, test_should_dry_run_grace_warning_when_close_false, test_should_skip_grace_for_swiftwinds_login, test_should_close_swiftwinds_even_when_close_flag_false, test_should_match_immediate_close_login_case_insensitively, test_should_not_treat_random_external_login_as_immediate_close}; TestImmediateCloseLoginsConstant; TestGraceWarningCommentText (5 invariants on the user-facing language: 1-day grace, @agent-shin reconsider, @greptileai, "even after the PR is closed", marker presence); TestSecondsSinceLastGraceWarning (3 cases).
  • tests/test_litellm/test_github_close_low_quality_prs.py

    • Updates existing close-path tests: first detection now produces warn-grace, not close.
    • New tests (12): grace flow for drafts / brand-new PRs / no-prior-warning, post-grace close after window expires, skip inside window, SwiftWinds bypass + case-insensitivity, TestSecondsSinceLastGraceWarning, TestImmediateCloseLoginsConstant, TestGraceWarningCommentText (1-day, reconsider, @greptileai, marker, post-close mentions).

Test results

$ uv run pytest tests/test_litellm/test_github_triage_with_llm.py \
                tests/test_litellm/test_github_close_low_quality_prs.py \
                tests/test_litellm/test_github_triage_workflows.py
============================= 144 passed in 2.36s =============================

(Was 112 on the previous revision; +32 new tests for the grace path, the SwiftWinds bypass, the new comment language, and the new helper.)

Out of scope

  • The grace window length (GRACE_PERIOD_SECONDS = 86400) is a constant. If the team wants per-repo tuning we can wire it through --grace-seconds in a follow-up.
  • IMMEDIATE_CLOSE_LOGINS is hardcoded to {"swiftwinds"} (the user's named test account). Adding more dogfood accounts is a one-line change.
  • No debug code was added, so there is none to remove; all changes are production-quality.
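If the grace window is later made configurable, the follow-up could look roughly like this. The `--grace-seconds` flag and the `build_parser` / `non_negative_int` helpers are hypothetical; nothing here exists in the scripts today. The default stays at the current `GRACE_PERIOD_SECONDS` constant so behavior is unchanged when the flag is omitted.

```python
from __future__ import annotations

import argparse

GRACE_PERIOD_SECONDS = 86400


def non_negative_int(value: str) -> int:
    """argparse type that rejects negative grace windows."""
    parsed = int(value)
    if parsed < 0:
        raise argparse.ArgumentTypeError("--grace-seconds must be >= 0")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Greptile-score closer (sketch)")
    parser.add_argument(
        "--grace-seconds",
        type=non_negative_int,
        default=GRACE_PERIOD_SECONDS,
        help="Seconds between the grace warning and the auto-close (hypothetical flag).",
    )
    return parser
```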

Type

🐛 Bug Fix
🛡️ Security
✨ Feature


Address three Greptile/veria-ai concerns on the @agent-shin reconsider
flow:

1. **Reconsider had no dry-run path.** The previous reconsider mode
   ignored `--close` and always posted comments + reopened on a pass.
   A local operator running
   `python triage_with_llm.py --reconsider --pr N` would silently
   take destructive GitHub actions with no way to preview. Reconsider
   now honors `close=False` the same way regular triage does and
   returns `would-reopen` / `would-reconsider-still-failing` for
   step-summary rendering.

2. **Reconsider could reopen maintainer-closed PRs/issues** (Medium
   security finding from veria-ai). The workflow only checked that the
   commenter was authorized — it did NOT check that the most recent
   close was performed by Agent Shin. A contributor could comment
   `@agent-shin reconsider` on a PR a maintainer closed for non-rubric
   reasons (duplicate, security report, design rejection) and have the
   bot reopen it. Add `was_closed_by_agent_shin()` which inspects the
   issue events API for the most recent `closed` actor and only
   permits reopen when that actor matches the configured bot login
   (default `github-actions[bot]`, overridable via env). Fail-closed
   on missing events.

3. **No rate-limiting on the reconsider trigger.** Every
   `@agent-shin reconsider` comment burns CI minutes + an OpenAI API
   call. Add a 10-minute cooldown via
   `seconds_since_last_reconsider_verdict()` which greps the issue's
   comment list for the bot's own verdict marker
   (`<!-- agent-shin:reconsider-verdict -->`). Inside the window the
   triage returns `skip-rate-limited` and the LLM never runs.

Workflow update:
- `triage_reconsider.yml` now passes `--close` only when
  `AGENT_SHIN_ENABLED=true`, matching the pattern of
  `triage_pr_with_llm.yml`. The script runs in both states so the
  verdict still appears in the step summary for QA.

Tests:
- Add 5 reconsider safety tests: dry-run for pass / fail / linked-issue
  short-circuit, bot-closed-guard refusal on maintainer close,
  rate-limit refusal inside the cooldown window, and cooldown-elapsed
  acceptance.
- Add unit tests for `was_closed_by_agent_shin` (bot / maintainer /
  missing actor / env-override) and
  `seconds_since_last_reconsider_verdict` (no marker / multiple
  markers / non-bot comment with marker / bot comment without marker).
- Pin the `<!-- agent-shin:reconsider-verdict -->` marker in both
  reopen and still-failing comments — dropping it would silently
  break the cooldown.

Existing reconsider tests updated to pass `close=True` (the
production path now) + stub the new guards via
`_stub_reconsider_guards`. 112 tests pass (was 93).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@codecov

codecov Bot commented May 18, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


…close bypass

- Add a 24-hour grace window between the first low-quality detection
  and the actual auto-close. The first detection posts a warning
  comment that explicitly says "You have 1 day to address this before
  this PR is auto-closed" and points the contributor at:
    * `@agent-shin reconsider` to request another look (and re-open)
    * `@greptileai` to request a fresh Greptile review — works
      even after the PR is closed
- Both `triage_with_llm.py` (LLM judge) and `close_low_quality_prs.py`
  (Greptile-score closer) share the same `<!-- agent-shin:grace-warning -->`
  HTML marker so a warning posted by either path is recognized by both.
- Add IMMEDIATE_CLOSE_LOGINS = {swiftwinds} to bypass BOTH the grace
  period AND the dry-run / AGENT_SHIN_ENABLED gating. SwiftWinds is the
  user's personal account (no push permissions to litellm) used to
  dogfood the bot; user explicitly asked: "For SwiftWinds, just close
  immediately. Faster iteration that way."
- Update the standard close comments to mention that `@greptileai`
  works even after the PR is closed.
- Add 23 new tests covering: warn-grace on first detection, skip during
  grace window, close after grace expires, SwiftWinds bypass (case
  insensitive, with close=False, no random-login false positives), the
  grace-warning text invariants, and the SwiftWinds entry in the
  IMMEDIATE_CLOSE_LOGINS constant.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
cursor Bot changed the title from "fix(triage): bugbot on #28117 — reconsider dry-run + bot-closed guard + rate limit" to "fix(triage): bugbot on #28117 — reconsider safety + 1-day grace + @greptileai post-close + SwiftWinds dogfood" on May 19, 2026.
Comment thread .github/scripts/close_low_quality_prs.py
For PRs from IMMEDIATE_CLOSE_LOGINS (e.g. swiftwinds), evaluate_pr
returns 'close' immediately without ever posting a grace warning, so
the close comment should not reference a 1-day grace period.

Make close_pr take a grace_period_elapsed flag, default True, and
pass False from the main loop when the close path was the
immediate-close branch.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

@cursor cursor Bot left a comment



Cursor Bugbot has reviewed your changes using high mode and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Summary misreports "would close" when immediate-close PRs were actually closed
    • Split the dry-run summary to report both actual closures (from the closed counter) and the dry-run "would close" count, so immediate-close logins are no longer misreported.
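The fixed summary logic, pulled out of `main()` into a hypothetical pure function so its output can be checked directly. It assumes `closed` only counts real writes (in global dry-run, that means only `IMMEDIATE_CLOSE_LOGINS` PRs), which is how the diff below increments it.

```python
from __future__ import annotations


def summary_lines(dry_run: bool, close_count: int, closed: int, warn_count: int) -> list[str]:
    """Render the end-of-run totals (hypothetical extraction of main()'s tail)."""
    lines: list[str] = []
    # PRs that hit the close action but were not actually written.
    would_close = close_count - closed
    if dry_run:
        if closed:
            # Immediate-close logins were really closed even in dry-run mode.
            lines.append(f"Total closed: {closed}; would close: {would_close}")
        else:
            lines.append(f"Total would close: {would_close}")
    else:
        lines.append(f"Total closed: {closed}")
    lines.append(f"Total {'would warn (grace)' if dry_run else 'warned (grace)'}: {warn_count}")
    return lines
```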
Preview (b0fc8c224f)
diff --git a/.github/scripts/close_low_quality_prs.py b/.github/scripts/close_low_quality_prs.py
--- a/.github/scripts/close_low_quality_prs.py
+++ b/.github/scripts/close_low_quality_prs.py
@@ -38,6 +38,7 @@
 import argparse
 import datetime as dt
 import json
+import os
 import re
 import subprocess
 import sys
@@ -66,7 +67,34 @@
 # `default=[...]` combination silently mutates the shared default list.
 DEFAULT_OPTOUT_LABELS = ("do not close", "keep open", "wip")
 
+# HTML marker appended to grace-period warning comments. Shared with the
+# Agent Shin LLM-judge script (`triage_with_llm.py`) so a warning posted
+# by either path is recognized by both: the LLM judge can see "Greptile
+# already warned this contributor 12 hours ago" and skip re-warning, and
+# the Greptile closer can see "Agent Shin already warned" and close on
+# the next run if Greptile still has a low score.
+GRACE_COMMENT_MARKER = "<!-- agent-shin:grace-warning -->"
 
+# Length of the grace period between the warning comment and the actual
+# auto-close. Set to 24 hours so the contributor has at least one full
+# working day across any time zone to push fixes or comment
+# `@agent-shin reconsider`. Mirrors the constant of the same name in
+# `triage_with_llm.py` — keep them in sync if either changes.
+GRACE_PERIOD_SECONDS = 86400
+
+# Default login of the GitHub identity that performs Agent Shin's writes;
+# used for matching the author of a grace-warning comment so we don't
+# count somebody quoting the marker. The env override
+# `AGENT_SHIN_BOT_LOGIN` mirrors `triage_with_llm.py`.
+AGENT_SHIN_DEFAULT_BOT_LOGIN = "github-actions[bot]"
+
+# Logins (case-insensitive) that bypass BOTH the 1-day grace period AND
+# the dry-run gating. Mirrors `IMMEDIATE_CLOSE_LOGINS` in
+# `triage_with_llm.py`. Used for dogfooding the bot from external test
+# accounts that have no push permissions to the repo.
+IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})
+
+
 def gh(*args: str) -> str:
     """Run a `gh` CLI command and return stdout. Raises on non-zero exit."""
     result = subprocess.run(
@@ -196,6 +224,124 @@
     return bool(labels & {lbl.lower() for lbl in optout_labels})
 
 
+def seconds_since_last_grace_warning(
+    comments: Iterable[dict],
+    *,
+    bot_login: str | None = None,
+    now: dt.datetime | None = None,
+) -> float | None:
+    """Return seconds since the bot's most recent grace-period warning, or
+    None if no such warning has ever been posted on this PR.
+
+    Detects warnings by matching `GRACE_COMMENT_MARKER` in comments
+    authored by the bot identity. Operates on an already-fetched
+    comments list (avoids a second `gh api` call when the caller has
+    already pulled the page for Greptile-score extraction).
+
+    `now` is injectable so callers (and tests) can pin the reference
+    time. The closer runs everything against a single `now` snapshot
+    captured at the top of `main()` so age calculations stay consistent
+    across many PRs in a single run.
+    """
+    expected_login = (
+        bot_login
+        or os.environ.get("AGENT_SHIN_BOT_LOGIN")
+        or AGENT_SHIN_DEFAULT_BOT_LOGIN
+    ).lower()
+    latest: dt.datetime | None = None
+    for comment in comments:
+        author = ((comment.get("user") or {}).get("login") or "").lower()
+        if author != expected_login:
+            continue
+        body = comment.get("body") or ""
+        if GRACE_COMMENT_MARKER not in body:
+            continue
+        created = comment.get("created_at")
+        if not created:
+            continue
+        try:
+            ts = parse_iso8601(created)
+        except ValueError:
+            continue
+        if latest is None or ts > latest:
+            latest = ts
+    if latest is None:
+        return None
+    reference = now if now is not None else dt.datetime.now(dt.timezone.utc)
+    return (reference - latest).total_seconds()
+
+
+def format_grace_warning_comment(score: int, threshold: int) -> str:
+    """Comment posted on the FIRST low-Greptile-score detection — gives
+    the contributor a 1-day grace window before the auto-close fires on
+    the next daily cron run.
+
+    Mirrors `format_grace_warning_pr_comment` in
+    `triage_with_llm.py` in spirit (1-day grace + escape hatches), but
+    framed around Greptile's confidence score instead of the LLM judge's
+    rubric since the close trigger here is the Greptile signal.
+    """
+    return (
+        "👋 Hi, thanks for the PR! I'm **Agent Shin**, the automated triage bot for this repository.\n"
+        "\n"
+        "Heads up — Greptile's most recent review scored this PR "
+        f"**{score}/5**, below our merge bar of **{threshold}/5**.\n"
+        "\n"
+        "⏳ **You have 1 day to address Greptile's feedback before this PR is auto-closed.** "
+        "We close low-confidence PRs aggressively to keep the review queue manageable for "
+        "maintainers and contributors alike. **This isn't a rejection of the idea.**\n"
+        "\n"
+        "During the grace period:\n"
+        "\n"
+        "1. Push fixes that address Greptile's feedback (pushing to your existing branch is fine).\n"
+        "2. Either:\n"
+        "   - Comment `@greptileai` to request a fresh Greptile review. If the new score is "
+        f"**{threshold}/5 or higher**, the PR stays open.\n"
+        "   - Or comment `@agent-shin reconsider` to have Agent Shin re-evaluate the PR description.\n"
+        "\n"
+        "If this PR is auto-closed in 24 hours, you'll still have options:\n"
+        "\n"
+        "- Comment `@agent-shin reconsider` after pushing fixes — Agent Shin will re-run triage "
+        "and reopen the PR if it now meets the bar.\n"
+        "- Comment `@greptileai` to request a re-review — that works **even after the PR is closed**.\n"
+        "\n"
+        "Thanks for contributing to LiteLLM. We know auto-closures can sting; the goal is to keep "
+        "the project healthy, not to dismiss your work.\n"
+        "\n"
+        f"{GRACE_COMMENT_MARKER}"
+    )
+
+
+def post_grace_warning(
+    pr: dict,
+    score: int,
+    threshold: int,
+    repo: str | None,
+    dry_run: bool,
+) -> None:
+    """Post the 1-day grace-period warning comment on `pr`.
+
+    The warning carries `GRACE_COMMENT_MARKER` so subsequent runs can
+    detect that the contributor has already been told about the
+    pending close. Does NOT close the PR — the close happens on the
+    next eligible run after `GRACE_PERIOD_SECONDS` elapses (handled
+    by `close_pr`).
+    """
+    pr_number = pr["number"]
+    repo_args = ["--repo", repo] if repo else []
+
+    if dry_run:
+        print(
+            f"  [DRY RUN] Would post grace warning to PR #{pr_number} "
+            f"(greptile={score}/5): {pr['title']}"
+        )
+        return
+
+    comment_body = format_grace_warning_comment(score, threshold)
+    gh("pr", "comment", str(pr_number), "--body", comment_body, *repo_args)
+    print(f"  Posted grace warning on PR #{pr_number} (greptile={score}/5)")
+
+
 def close_pr(
     pr: dict,
     score: int,
@@ -204,6 +350,7 @@
     repo: str | None,
     dry_run: bool,
     label: str | None,
+    grace_period_elapsed: bool = True,
 ) -> None:
     """Post the explanatory comment and close the PR."""
     pr_number = pr["number"]
@@ -216,10 +363,19 @@
         )
         return
 
+    score_sentence = (
+        f"Greptile's most recent review scored this PR **{score}/5**, below "
+        f"our merge bar of **{threshold}/5**, and the 1-day grace period since "
+        "the warning has elapsed.\n\n"
+        if grace_period_elapsed
+        else (
+            f"Greptile's most recent review scored this PR **{score}/5**, "
+            f"below our merge bar of **{threshold}/5**.\n\n"
+        )
+    )
     comment_body = (
         f"Closing as part of automated PR triage.\n\n"
-        f"Greptile's most recent review scored this PR **{score}/5**, below "
-        f"our merge bar of **{threshold}/5**.\n\n"
+        f"{score_sentence}"
         "We close low-confidence PRs aggressively to keep the review queue "
         "manageable for maintainers and contributors alike. **This is not a "
         "rejection of the idea** — to bring this back:\n\n"
@@ -233,7 +389,9 @@
         "maintainer, so a fresh PR is the most reliable path forward. If you "
         "would prefer this exact PR re-evaluated, comment "
         "`@agent-shin reconsider` once you've pushed the fixes — Agent Shin "
-        "will re-run triage and reopen this PR if it now meets the bar.\n\n"
+        "will re-run triage and reopen this PR if it now meets the bar. "
+        "You can also comment `@greptileai` to request a fresh Greptile "
+        "review — that works **even after the PR is closed**.\n\n"
         "Thanks for contributing to LiteLLM. We know auto-closures can sting; "
         "the goal is to keep the project healthy, not to dismiss your work."
     )
@@ -258,16 +416,28 @@
     repo: str | None,
     optout_labels: set[str],
 ) -> tuple[str, int | None, int | None]:
-    """Decide whether to close `pr`.
+    """Decide what to do with `pr` on this triage run.
 
     Returns (action, score_or_none, age_days_or_none) where action is one of:
         "skip-too-young", "skip-optout-label", "skip-internal",
-        "skip-no-greptile-score", "skip-score-ok", or "close".
+        "skip-no-greptile-score", "skip-score-ok",
+        "warn-grace", "skip-in-grace-period", or "close".
 
     Drafts are NOT skipped — the goal is "open PR count == PRs internal
     collaborators need to action on", and a draft that Greptile scored <4/5
     is still in that queue. Authors can opt out via the `wip` label (see
     `DEFAULT_OPTOUT_LABELS`) if they need to keep a long-lived draft open.
+
+    Grace-period semantics: the first time a PR fails the rubric, the
+    action is `warn-grace` — the caller should post a warning comment but
+    NOT close the PR. On a subsequent run, if the warning is still less
+    than `GRACE_PERIOD_SECONDS` old AND the PR still fails, the action is
+    `skip-in-grace-period`. Once the warning ages out and the rubric is
+    still failing, the action is `close`.
+
+    Grace is bypassed for `IMMEDIATE_CLOSE_LOGINS` (test/dogfood
+    accounts), which always go straight to `close` on the first failing
+    run so the bot is dogfoodable end-to-end without a 24h delay.
     """
     if has_optout_label(pr, optout_labels):
         return ("skip-optout-label", None, None)
@@ -294,6 +464,16 @@
     if score >= min_score:
         return ("skip-score-ok", score, age_days)
 
+    login = ((pr.get("author") or {}).get("login") or "").lower()
+    if login in IMMEDIATE_CLOSE_LOGINS:
+        return ("close", score, age_days)
+
+    grace_age = seconds_since_last_grace_warning(comments, now=now)
+    if grace_age is None:
+        return ("warn-grace", score, age_days)
+    if grace_age < GRACE_PERIOD_SECONDS:
+        return ("skip-in-grace-period", score, age_days)
+
     return ("close", score, age_days)
 
 
@@ -370,6 +550,8 @@
     closed = 0
     summary = {
         "close": 0,
+        "warn-grace": 0,
+        "skip-in-grace-period": 0,
         "skip-too-young": 0,
         "skip-optout-label": 0,
         "skip-internal": 0,
@@ -388,6 +570,30 @@
         )
         summary[action] = summary.get(action, 0) + 1
 
+        # Per-PR dry-run override: `IMMEDIATE_CLOSE_LOGINS` accounts (e.g.
+        # SwiftWinds) always run in real-close mode regardless of the
+        # global `--close` flag. Lets a maintainer dogfood the bot from
+        # an external account while the rest of the open-PR queue stays
+        # on the safe dry-run default.
+        author_login = ((pr.get("author") or {}).get("login") or "").lower()
+        is_immediate = author_login in IMMEDIATE_CLOSE_LOGINS
+        pr_dry_run = dry_run and not is_immediate
+
+        if action == "warn-grace":
+            assert score is not None
+            print(
+                f"#{pr['number']}: \"{pr['title']}\" "
+                f"(age={age_days}d, greptile={score}/5) -> warn-grace"
+            )
+            post_grace_warning(
+                pr,
+                score=score,
+                threshold=args.min_score,
+                repo=args.repo,
+                dry_run=pr_dry_run,
+            )
+            continue
+
         if action != "close":
             continue
 
@@ -395,6 +601,7 @@
         print(
             f"#{pr['number']}: \"{pr['title']}\" "
             f"(age={age_days}d, greptile={score}/5) -> close"
+            + (" [immediate-close login]" if is_immediate else "")
         )
         close_pr(
             pr,
@@ -402,11 +609,12 @@
             threshold=args.min_score,
             age_days=age_days,
             repo=args.repo,
-            dry_run=dry_run,
+            dry_run=pr_dry_run,
             label=args.close_label,
+            grace_period_elapsed=not is_immediate,
         )
 
-        if not dry_run:
+        if not pr_dry_run:
             closed += 1
             if args.limit is not None and closed >= args.limit:
                 print(f"\nReached --limit={args.limit}; stopping.")
@@ -415,7 +623,21 @@
     print("\n=== Summary ===")
     for key, value in summary.items():
         print(f"  {key:28s} {value}")
-    print(f"\nTotal {'would close' if dry_run else 'closed'}: {summary['close']}")
+    # `IMMEDIATE_CLOSE_LOGINS` PRs are closed even in global dry-run mode, so
+    # report actual closures alongside the dry-run "would close" count to avoid
+    # misleading operators into thinking no writes occurred.
+    would_close = summary["close"] - closed
+    if dry_run:
+        if closed:
+            print(f"\nTotal closed: {closed}; would close: {would_close}")
+        else:
+            print(f"\nTotal would close: {would_close}")
+    else:
+        print(f"\nTotal closed: {closed}")
+    print(
+        f"Total {'would warn (grace)' if dry_run else 'warned (grace)'}: "
+        f"{summary['warn-grace']}"
+    )
     return 0
 
 

diff --git a/.github/scripts/triage_with_llm.py b/.github/scripts/triage_with_llm.py
--- a/.github/scripts/triage_with_llm.py
+++ b/.github/scripts/triage_with_llm.py
@@ -30,6 +30,7 @@
 from __future__ import annotations
 
 import argparse
+import datetime as dt
 import json
 import os
 import re
@@ -42,6 +43,46 @@
 
 INTERNAL_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})
 
+# Login of the account that performs Agent Shin's GitHub writes. When the
+# workflow uses `secrets.GITHUB_TOKEN` (our default), the closure / reopen
+# event's `actor.login` is `github-actions[bot]`. The env override exists
+# for local debugging and for repos that wire Agent Shin to a PAT.
+AGENT_SHIN_DEFAULT_BOT_LOGIN = "github-actions[bot]"
+
+# HTML marker appended to every reconsider verdict comment. We grep for this
+# on subsequent reconsider triggers to enforce a short cooldown so that
+# repeated `@agent-shin reconsider` comments don't burn CI/LLM budget.
+# Using a unique HTML comment keeps the marker invisible to humans while
+# being trivially greppable from a comments-list API response.
+RECONSIDER_COMMENT_MARKER = "<!-- agent-shin:reconsider-verdict -->"
+
+# Minimum gap between two reconsider verdicts on the same PR/issue. Set to
+# 10 minutes — long enough that a contributor can't trivially spam the
+# trigger, short enough that a genuine "I just pushed a fix and updated
+# the body" iteration loop isn't punished.
+RECONSIDER_RATE_LIMIT_SECONDS = 600
+
+# HTML marker appended to the grace-period warning comment posted on the
+# first low-quality detection. We grep for this on subsequent triage runs
+# to (a) detect that a warning was already posted (so we don't spam the
+# contributor with duplicate warnings) and (b) measure how long ago it
+# was posted so we know when the grace period has elapsed.
+GRACE_COMMENT_MARKER = "<!-- agent-shin:grace-warning -->"
+
+# Length of the grace period between the warning comment and the actual
+# auto-close. Set to 24 hours so the contributor has at least one full
+# working day across any time zone to push fixes or comment
+# `@agent-shin reconsider`.
+GRACE_PERIOD_SECONDS = 86400
+
+# Logins (case-insensitive) that bypass BOTH the 1-day grace period AND
+# the dry-run / `AGENT_SHIN_ENABLED` workflow gating — every Agent Shin
+# verdict against a PR/issue from one of these accounts is treated as a
+# real run with immediate close on fail. Useful for dogfooding the bot
+# from an external account that has no push permissions to the repo.
+# Listed lower-case so callers compare via `login.lower() in ...`.
+IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})
+
 # Model families that require `reasoning_effort` to be set, and that reject
 # `temperature != 1` unless `reasoning_effort` is "none". For these models we
 # pass `reasoning_effort="none"` so a `temperature=0` deterministic judgment
@@ -160,6 +201,139 @@
     )
 
 
+def _iter_paginated_json(*api_args: str) -> Any:
+    """Yield JSON objects from `gh api --paginate ... -q '.[]'`.
+
+    `gh api --paginate` on a JSON-array endpoint concatenates pages into
+    one stream; `-q '.[]'` flattens that stream into newline-delimited
+    objects (jq-style) so each is parsed on its own and a malformed line is
+    skipped; `gh` still buffers the full response, so memory is not bounded.
+    """
+    raw = gh("api", "--paginate", *api_args, "-q", ".[]")
+    for line in raw.splitlines():
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            yield json.loads(line)
+        except json.JSONDecodeError:
+            # A malformed line should not blow up the whole guard. Skip and
+            # carry on — at worst the guard fail-closes (returns False /
+            # None) and the caller treats it as "unknown".
+            continue
+
+
+def fetch_last_close_actor(repo: str, number: int) -> str | None:
+    """Return the login of the actor who most recently closed this PR/issue.
+
+    Returns None if no `closed` event is found (unusual for a closed item,
+    but possible if the events API returns nothing — in which case the
+    bot-closed guard should fail-closed, i.e. refuse to reopen).
+    """
+    last: str | None = None
+    for event in _iter_paginated_json(f"repos/{repo}/issues/{number}/events"):
+        if event.get("event") == "closed":
+            last = (event.get("actor") or {}).get("login")
+    return last
+
+
+def was_closed_by_agent_shin(
+    repo: str, number: int, *, bot_login: str | None = None
+) -> bool:
+    """Return True iff the PR/issue was most-recently closed by Agent Shin.
+
+    This is the guard that stops `@agent-shin reconsider` from being used
+    to override a maintainer's closure for non-rubric reasons (security,
+    duplicate, design rejection, etc.). The check is intentionally
+    fail-closed: any uncertainty about who closed the item must be
+    treated as "not the bot" so the destructive reopen path stays gated.
+    """
+    expected = (
+        bot_login
+        or os.environ.get("AGENT_SHIN_BOT_LOGIN")
+        or AGENT_SHIN_DEFAULT_BOT_LOGIN
+    ).lower()
+    actor = fetch_last_close_actor(repo, number)
+    if not actor:
+        return False
+    return actor.lower() == expected
+
+
+def _seconds_since_latest_marker_comment(
+    repo: str,
+    number: int,
+    *,
+    marker: str,
+    bot_login: str | None = None,
+) -> float | None:
+    """Shared helper: return seconds since the bot's most recent comment
+    that contains the given HTML marker, or None if no such comment exists.
+
+    Used by both the reconsider-verdict cooldown and the grace-period
+    warning detection — keeping the iteration logic centralized stops the
+    two paths from drifting (e.g. one fixing a tz parsing bug and the
+    other forgetting to mirror it).
+    """
+    expected_login = (
+        bot_login
+        or os.environ.get("AGENT_SHIN_BOT_LOGIN")
+        or AGENT_SHIN_DEFAULT_BOT_LOGIN
+    ).lower()
+    latest: dt.datetime | None = None
+    for comment in _iter_paginated_json(f"repos/{repo}/issues/{number}/comments"):
+        author = ((comment.get("user") or {}).get("login") or "").lower()
+        if author != expected_login:
+            continue
+        body = comment.get("body") or ""
+        if marker not in body:
+            continue
+        created = comment.get("created_at")
+        if not created:
+            continue
+        try:
+            ts = dt.datetime.fromisoformat(created.replace("Z", "+00:00"))
+        except ValueError:
+            continue
+        if latest is None or ts > latest:
+            latest = ts
+    if latest is None:
+        return None
+    return (dt.datetime.now(dt.timezone.utc) - latest).total_seconds()
+
+
+def seconds_since_last_reconsider_verdict(
+    repo: str, number: int, *, bot_login: str | None = None
+) -> float | None:
+    """Return seconds since the bot's most recent reconsider verdict comment.
+
+    Detects comments by matching the HTML marker `RECONSIDER_COMMENT_MARKER`
+    appended by `format_reopen_comment` and
+    `format_reconsider_still_failing_comment`. Returns None when the bot
+    has never posted a reconsider verdict on this PR/issue (or when the
+    only matching comments are missing a `created_at` timestamp, which
+    shouldn't happen on a real GitHub response).
+    """
+    return _seconds_since_latest_marker_comment(
+        repo, number, marker=RECONSIDER_COMMENT_MARKER, bot_login=bot_login
+    )
+
+
+def seconds_since_last_grace_warning(
+    repo: str, number: int, *, bot_login: str | None = None
+) -> float | None:
+    """Return seconds since the bot's most recent grace-period warning.
+
+    Detects warning comments by matching the HTML marker
+    `GRACE_COMMENT_MARKER` appended by `format_grace_warning_pr_comment`
+    and `format_grace_warning_issue_comment`. Returns None when no
+    grace warning has ever been posted on this PR/issue — that's the
+    "first low-quality detection" signal that drives the warning path.
+    """
+    return _seconds_since_latest_marker_comment(
+        repo, number, marker=GRACE_COMMENT_MARKER, bot_login=bot_login
+    )
+
+
 # ---------------------------------------------------------------------------
 # Author classification
 
@@ -393,6 +567,9 @@
         "to get back into the review queue.\n"
         "   - **Or** comment `@agent-shin reconsider` on this closed PR after updating the description. "
         "I'll re-run the triage; if it now passes, I'll reopen this PR automatically.\n"
+        "   - You can also comment `@greptileai` on this PR to request a fresh Greptile review — that "
+        "still works **even after the PR is closed**, and a higher score is one of the signals that "
+        "lifts the PR back into the queue.\n"
         "\n"
         "Internal BerriAI contributors: this rubric doesn't apply to you — ping a maintainer.\n"
         "\n"
@@ -433,6 +610,90 @@
     )
 
 
+def format_grace_warning_pr_comment(verdict: dict) -> str:
+    """Comment posted on the FIRST low-quality detection — gives the
+    contributor a 1-day grace window to fix the PR before the next
+    triage run actually closes it.
+
+    This is the "before-close" warning. On the second triage run, if the
+    grace marker is older than `GRACE_PERIOD_SECONDS` AND the PR still
+    fails the rubric, the close path runs (which posts
+    `format_pr_close_comment` and closes the PR).
+    """
+    missing_lines = _format_missing(verdict.get("missing") or [])
+    explanation = verdict.get("explanation") or ""
+    return (
+        "👋 Hi, thanks for the PR! I'm **Agent Shin**, the automated triage bot for this repository.\n"
+        "\n"
+        "Heads up — this PR does not yet meet the bar described in our "
+        "[pull-request template](https://github.com/BerriAI/litellm/blob/main/.github/pull_request_template.md). "
+        "Specifically, I couldn't find:\n"
+        "\n"
+        f"{missing_lines}\n"
+        "\n"
+        f"> {explanation}\n"
+        "\n"
+        "⏳ **You have 1 day to address this before this PR is auto-closed.** "
+        "During the grace period:\n"
+        "\n"
+        "1. Update the PR description to either:\n"
+        "   - Link a related GitHub issue (e.g. `Fixes #1234`), OR\n"
+        "   - Add a clear **problem description**, **expected vs. actual behavior**, and **visual QA proof** "
+        "(before/after screenshots, a short screen recording, or terminal/log output).\n"
+        "2. Comment `@agent-shin reconsider` on this PR after updating it. If your update meets the "
+        "bar, I'll skip the auto-close and a maintainer will take another look.\n"
+        "\n"
+        "If this PR is auto-closed in 24 hours, you'll still have options:\n"
+        "\n"
+        "- Comment `@agent-shin reconsider` to have me re-evaluate (and reopen the PR if it now meets the bar).\n"
+        "- Comment `@greptileai` to request a fresh Greptile review — that works **even after the PR is closed**.\n"
+        "\n"
+        "Internal BerriAI contributors: this rubric doesn't apply to you — ping a maintainer.\n"
+        "\n"
+        "_(I'm an LLM, so I'm not infallible. If you think I got this wrong, comment "
+        "`@agent-shin reconsider` or ping a maintainer — they'll override me.)_\n"
+        "\n"
+        f"{GRACE_COMMENT_MARKER}"
+    )
+
+
+def format_grace_warning_issue_comment(verdict: dict) -> str:
+    """Issue analogue of `format_grace_warning_pr_comment`."""
+    missing_lines = _format_missing(verdict.get("missing") or [])
+    explanation = verdict.get("explanation") or ""
+    return (
+        "👋 Hi, thanks for filing this! I'm **Agent Shin**, the automated triage bot for this repository.\n"
+        "\n"
+        "Heads up — this issue doesn't yet have enough detail for a maintainer to act on. "
+        "Specifically, I couldn't find:\n"
+        "\n"
+        f"{missing_lines}\n"
+        "\n"
+        f"> {explanation}\n"
+        "\n"
+        "⏳ **You have 1 day to address this before this issue is auto-closed.** "
+        "During the grace period:\n"
+        "\n"
+        "1. Edit the issue to add the missing pieces:\n"
+        "   - For **bug reports**: a runnable reproduction (code / curl / config), expected vs. actual behavior, "
+        "and a screenshot / traceback / log showing the bug.\n"
+        "   - For **feature requests**: a concrete description of what should change, plus a use case and example "
+        "(config / API call / UI flow).\n"
+        "2. Comment `@agent-shin reconsider` on this issue after updating it. If your update meets the bar, "
+        "I'll skip the auto-close and a maintainer will take another look.\n"
+        "\n"
+        "If this issue is auto-closed in 24 hours, you can still comment `@agent-shin reconsider` to have "
+        "me re-evaluate (and reopen the issue if it now meets the bar).\n"
+        "\n"
+        "Internal BerriAI contributors: this rubric doesn't apply to you — ping a maintainer.\n"
+        "\n"
+        "_(I'm an LLM, so I'm not infallible. If you think I got this wrong, comment "
+        "`@agent-shin reconsider` or ping a maintainer — they'll override me.)_\n"
+        "\n"
+        f"{GRACE_COMMENT_MARKER}"
+    )
+
+
 # ---------------------------------------------------------------------------
 # Step-summary helpers
 
@@ -458,6 +719,9 @@
 def format_reopen_comment(kind: str) -> str:
     """Comment posted when Agent Shin reopens after a successful reconsider."""
     noun = "PR" if kind == "pr" else "issue"
+    # The trailing HTML marker is used by `seconds_since_last_reconsider_verdict`
+    # to enforce a cooldown between repeated `@agent-shin reconsider` triggers.
+    # Keep the marker on its own line so it doesn't disturb the rendered text.
     return (
         f"♻️ **Re-evaluated and reopened.** Thanks for updating the {noun}!\n"
         "\n"
@@ -467,7 +731,9 @@
         "\n"
         "_(If a maintainer ends up closing this for non-rubric reasons, that "
         "decision stands; comment `@agent-shin reconsider` again only if you "
-        "have substantively new information.)_"
+        "have substantively new information.)_\n"
+        "\n"
+        f"{RECONSIDER_COMMENT_MARKER}"
     )
 
 
@@ -476,6 +742,8 @@
     missing_lines = _format_missing(verdict.get("missing") or [])
     explanation = verdict.get("explanation") or ""
     noun = "PR" if kind == "pr" else "issue"
+    # The trailing HTML marker is used by `seconds_since_last_reconsider_verdict`
+    # to enforce a cooldown between repeated `@agent-shin reconsider` triggers.
     return (
         f"⏸️ **Re-evaluated; this {noun} still doesn't meet the rubric.**\n"
         "\n"
@@ -490,7 +758,9 @@
         "`@agent-shin reconsider` again, or ping a maintainer if you think "
         "I got this wrong.\n"
         "\n"
-        "_(I'm an LLM and I'm not infallible.)_"
+        "_(I'm an LLM and I'm not infallible.)_\n"
+        "\n"
+        f"{RECONSIDER_COMMENT_MARKER}"
     )
 
 
@@ -514,10 +784,22 @@
     fail-but-no-comment is replaced with a "still failing" comment + leave
     closed; a pass triggers `reopen_pr`/`reopen_issue` plus a reopen comment.
     Reconsider mode is intended for the `@agent-shin reconsider` comment
-    trigger. `close` is forced True implicitly when `reconsider` is set
-    because the bot has already decided this is a real (non-dry-run)
-    invocation; it's the caller's responsibility to gate on
-    AGENT_SHIN_ENABLED before calling reconsider mode.
+    trigger. Like regular triage, `close=False` keeps reconsider in dry-run
+    (returns `would-reopen` / `would-reconsider-still-failing` so a local
+    operator can preview without write side effects); the workflow only
+    passes `--close` when `AGENT_SHIN_ENABLED=true`.
+
+    Reconsider mode adds two extra safety guards on top of the regular
+    triage skip-internal-author check:
+
+      1. **Bot-closed guard.** Only reopens if the most recent close was
+         performed by the bot identity (default `github-actions[bot]`).
+         This stops a contributor from using `@agent-shin reconsider` to
+         override a maintainer's close for non-rubric reasons.
+      2. **Rate-limit guard.** If the bot has already posted a reconsider
+         verdict on this PR/issue within `RECONSIDER_RATE_LIMIT_SECONDS`,
+         skip — repeated triggers from the same contributor shouldn't burn
+         CI minutes or LLM budget.
     """
     fetcher = {"pr": fetch_pr, "issue": fetch_issue}[kind]
     item = fetcher(repo, number)
@@ -551,6 +833,20 @@
     if is_internal_contributor(item):
         return {**base_result, "action": "skip-internal-author"}
 
+    # Reconsider-only guards — these run BEFORE the LLM call so a
+    # maintainer-closed PR / rate-limited trigger never spends LLM budget.
+    if reconsider:
+        if not was_closed_by_agent_shin(repo, number):
+            return {**base_result, "action": "skip-not-bot-closed"}
+        age = seconds_since_last_reconsider_verdict(repo, number)
+        if age is not None and age < RECONSIDER_RATE_LIMIT_SECONDS:
+            return {
+                **base_result,
+                "action": "skip-rate-limited",
+                "rate_limit_age_seconds": age,
+                "rate_limit_window_seconds": RECONSIDER_RATE_LIMIT_SECONDS,
+            }
+
     if kind == "pr":
         prompt = build_pr_prompt(title=title, body=body)
         # Short-circuit: if body very clearly links a related issue, just pass.
@@ -567,6 +863,12 @@
             if reconsider:
                 # Pass-on-reconsider -> reopen the PR with a friendly comment.
                 reopen_body = format_reopen_comment(kind)
+                if not close:
+                    return {
+                        **base,
+                        "action": "would-reopen",
+                        "comment": reopen_body,
+                    }
                 post_comment(repo, number, reopen_body)
                 reopen_pr(repo, number)
                 return {
@@ -607,8 +909,20 @@
         # Reconsider: pass -> reopen + post reopen comment;
         # fail -> leave closed + post a "still failing" comment so the
         # contributor can iterate again.
+        # In dry-run (`close=False`) we return `would-*` actions instead
+        # of touching GitHub state, mirroring the regular triage flow's
+        # `would-close`. This lets a local operator preview the outcome
+        # of `python triage_with_llm.py --reconsider --pr N` without
+        # risking accidental comments or reopens.
         if decision != "fail":
             reopen_body = format_reopen_comment(kind)
+            if not close:
+                return {
+                    **base_result,
+                    "action": "would-reopen",
+                    "verdict": verdict,
+                    "comment": reopen_body,
+                }
             post_comment(repo, number, reopen_body)
             if kind == "pr":
                 reopen_pr(repo, number)
@@ -621,6 +935,13 @@
                 "comment": reopen_body,
             }
         still_failing = format_reconsider_still_failing_comment(kind, verdict)
+        if not close:
+            return {
+                **base_result,
+                "action": "would-reconsider-still-failing",
+                "verdict": verdict,
+                "comment": still_failing,
+            }
         post_comment(repo, number, still_failing)
         return {
             **base_result,
@@ -632,7 +953,58 @@
     if decision != "fail":
         return {**base_result, "action": "pass-llm", "verdict": verdict}
 
-    if not close:
+    # Grace-period flow: on the first low-quality detection, post a warning
+    # comment instead of closing immediately. On a subsequent triage run
+    # (manual re-trigger, or the daily `close_low_quality_prs.py` cron
+    # finding the same PR in its own pass), if `GRACE_PERIOD_SECONDS` has
+    # elapsed since the warning AND the PR still fails the rubric, close.
+    #
+    # `IMMEDIATE_CLOSE_LOGINS` (e.g. test/dogfood accounts like SwiftWinds)
+    # bypass the grace period entirely — every fail is treated as a real
+    # close run. This is intentional: those accounts exist specifically to
+    # exercise the bot end-to-end, and waiting a day per iteration kills
+    # the feedback loop.
+    is_immediate = login.lower() in IMMEDIATE_CLOSE_LOGINS
... diff truncated: showing 800 of 2283 lines
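The truncation cuts off the grace-period branch, so here is a minimal sketch of the decision its comments describe, assuming the behavior stated there (warn on first failure, close once `GRACE_PERIOD_SECONDS` has elapsed, close immediately for `IMMEDIATE_CLOSE_LOGINS`). `seconds_since`, `decide_fail_action`, and the `wait-grace` action name are hypothetical; only the two constants and the `warn-grace` summary bucket come from the diff.

```python
from __future__ import annotations

import datetime as dt

# Values copied from the diff; everything else here is a hypothetical sketch.
GRACE_PERIOD_SECONDS = 86400
IMMEDIATE_CLOSE_LOGINS = frozenset({"swiftwinds"})


def seconds_since(created_at: str, now: dt.datetime) -> float:
    """Age in seconds of a GitHub timestamp such as '2026-05-18T12:00:00Z'."""
    # Normalize the trailing 'Z' so fromisoformat() works before Python 3.11.
    ts = dt.datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return (now - ts).total_seconds()


def decide_fail_action(login: str, warning_age: float | None) -> str:
    """Choose the next step for an item that just failed the rubric.

    `warning_age` is the age of the newest grace-warning comment, or None
    if no warning has been posted yet (hypothetical action names).
    """
    if login.lower() in IMMEDIATE_CLOSE_LOGINS:
        return "close"  # dogfood accounts skip the grace period entirely
    if warning_age is None:
        return "warn-grace"  # first detection: post the warning comment
    if warning_age >= GRACE_PERIOD_SECONDS:
        return "close"  # grace period elapsed and the item still fails
    return "wait-grace"  # warning already posted; grace period still running


now = dt.datetime(2026, 5, 19, 13, 0, tzinfo=dt.timezone.utc)
age = seconds_since("2026-05-18T12:00:00Z", now)  # 25 hours
print(decide_fail_action("contributor", age))
```

The `Z` replacement mirrors `_seconds_since_latest_marker_comment`: `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward, so normalizing it keeps older runners working.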


Reviewed by Cursor Bugbot for commit e0eeb73.

Comment thread .github/scripts/close_low_quality_prs.py
IMMEDIATE_CLOSE_LOGINS PRs are closed even when the global --close flag is
not set, but the summary used the global dry-run flag to choose between
'would close' and 'closed'. Split the count so operators can see both
actual closures and dry-run would-be closures.

Co-authored-by: Yassin Kortam <yassin@berri.ai>
mateo-berri added a commit that referenced this pull request Jun 18, 2026
…and review-gate label lifecycle (#30433)

* feat(triage): auto-close stale PRs with Greptile score <4/5

Adds .github/scripts/close_low_quality_prs.py and a daily workflow that
closes PRs which:
  - are open for at least 7 days,
  - carry a most-recent greptile-apps review with Confidence Score <4/5, and
  - are not drafts or opt-out-labeled ('do not close', 'wip', etc.).

Each closure posts an explanatory comment telling the contributor how to
bring the PR back (rebase, re-request greptile, reopen at 4+/5). The
4/5 bar is already documented in the PR template
(.github/pull_request_template.md), so this just enforces it.

Tested with a dry run against the live BerriAI/litellm backlog of 1000
open PRs: 100 candidates identified, 598 PRs pass the bar (4+/5), 186
are too young, 97 are drafts, 19 lack any Greptile review and are left
alone.

Workflow defaults to closing 25 PRs/run as a safety net and supports
workflow_dispatch with overrides (close=false for a dry run, custom
min_age_days/min_score/limit).

18 unit tests cover score extraction (HTML/markdown/plain text, login
variants, multi-review picks latest) and per-PR evaluation (drafts,
opt-out labels, age, missing/passing/failing scores).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
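A minimal sketch of the selection rules this commit describes. The score regex assumes Greptile writes something like `Confidence Score: 3/5` (possibly wrapped in HTML or markdown bold), the Greptile login check is a prefix match, and the opt-out set holds only the two labels named in the commit; the real script's patterns and label list may differ. A later commit in this squash drops the draft skip and sets the default minimum age to 0.

```python
from __future__ import annotations

import re

# Assumed review-body shape: "Confidence Score", then up to 20 non-digit
# characters (":", "**", "</h3>", spaces), then "N/5".
SCORE_RE = re.compile(r"Confidence\s+Score\D{0,20}?(\d)\s*/\s*5", re.IGNORECASE)
OPTOUT_LABELS = frozenset({"do not close", "wip"})  # partial list (assumption)


def extract_score(body: str) -> int | None:
    match = SCORE_RE.search(body)
    return int(match.group(1)) if match else None


def latest_greptile_score(reviews: list[dict]) -> int | None:
    """Score from the newest scored review whose author login starts with 'greptile'."""
    greptile = [
        r for r in reviews
        if ((r.get("user") or {}).get("login") or "").lower().startswith("greptile")
    ]
    # ISO-8601 UTC timestamps from the GitHub API sort correctly as strings.
    greptile.sort(key=lambda r: r.get("submitted_at") or "")
    for review in reversed(greptile):
        score = extract_score(review.get("body") or "")
        if score is not None:
            return score
    return None


def should_close(*, age_days: float, is_draft: bool, labels: set[str],
                 score: int | None, min_age_days: float = 7,
                 min_score: int = 4) -> bool:
    if is_draft or {label.lower() for label in labels} & OPTOUT_LABELS:
        return False
    if age_days < min_age_days:
        return False
    # No Greptile review at all: leave the PR alone rather than close it.
    return score is not None and score < min_score
```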

* docs(templates): require expected/actual + QA proof for external contributions

PR template:
- Make the rubric explicit at the top: link an issue, OR provide a clear
  problem description + expected vs. actual + visual QA proof.
- Add dedicated sections for each piece so the bot has a deterministic
  shape to read.
- Keep the existing 'Linear ticket' section for internal contributors
  (they're exempt from the auto-triage rubric).

Bug report template:
- Split 'What happened?' into 'Actual behavior' + 'Expected behavior'.
- Make logs/screenshot a required textarea.
- Warning banner at the top tells external contributors that incomplete
  reports will be auto-closed (with re-evaluation on reopen).

Feature request template:
- Require a concrete use case + example in the motivation field, not just
  a one-liner pitch.
- Same auto-triage warning banner.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): Agent Shin LLM-as-judge for external PRs and issues

Adds a new triage flow that evaluates external pull requests and issues
against the project's contribution rubric and, when configured to do so,
auto-closes non-conforming ones with an explanatory comment. Contributors
can update + reopen to be re-evaluated.

Scope:
- Internal BerriAI contributors (author_association OWNER/MEMBER/COLLABORATOR)
  and bot accounts are skipped entirely.
- 'Fixes #1234' / 'Resolves https://github.com/.../issues/N' in the PR body
  short-circuits to PASS without burning LLM tokens.
- LLM judge returns structured JSON (verdict, missing[], explanation);
  parser tolerates markdown fences and embedded JSON.
- LLM errors NEVER close PRs/issues — failure surfaces as 'skip-llm-error'.

Safety:
- pull_request_target / issues triggers are FORCED dry-run in the workflow;
  only manual workflow_dispatch with close=true (and AGENT_SHIN_ENABLED=true)
  takes destructive action.
- Default mode writes verdicts to GITHUB_STEP_SUMMARY only — no public
  comments until the team flips the AGENT_SHIN_ENABLED repo variable.
- LLM uses an OpenAI-compatible endpoint (model and base URL configurable
  via repo variables; key via OPENAI_API_KEY secret).

Files:
- .github/scripts/triage_with_llm.py   - judge orchestrator + CLI
- .github/workflows/triage_pr_with_llm.yml
- .github/workflows/triage_issue_with_llm.yml
- tests/test_litellm/test_github_triage_with_llm.py - 33 unit tests

End-to-end validated against four real PRs (#28117 internal collaborator,
#28108 bot, #28129 'Fixes #28128', #28116 no linked issue) and issue
#28132 with a stubbed LLM judge: each path produces the expected action.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
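A minimal sketch of a verdict parser that tolerates markdown fences and JSON embedded in prose, as the commit describes. The `verdict` key comes from the commit text; the candidate order and the None-on-failure contract (so a caller can map it to `skip-llm-error` instead of closing anything) are assumptions about how the real parser behaves.

```python
from __future__ import annotations

import json
import re

FENCE = "`" * 3  # built at runtime so this snippet's own fence stays intact
_FENCED_RE = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE), re.DOTALL
)


def parse_verdict(text: str) -> dict | None:
    """Return the first JSON object with a 'verdict' key, or None."""
    candidates = [text.strip()]
    fenced = _FENCED_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1))
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start : end + 1])  # JSON embedded in prose
    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and "verdict" in data:
            return data
    return None  # caller treats this as an LLM error and never closes
```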

* feat(triage): scope Greptile auto-closer to external contributors + dry-run by default

- close_low_quality_prs.py now filters by GitHub author_association via
  the REST API: PRs from OWNER / MEMBER / COLLABORATOR (and bot accounts)
  are skipped with a new 'skip-internal' summary bucket.
- close_low_quality_prs.yml now defaults workflow_dispatch close=false,
  and ignores 'close=true' unless the new repo variable
  AGENT_SHIN_ENABLED is set to 'true'. Scheduled runs are dry-run only
  until the team flips that switch.
- Updated unit tests: one new test asserting internal authors are
  skipped, and an autouse fixture treats unspecified test PRs as
  external so the rest of the suite still exercises the close path.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(workflows): scheduled cron closes PRs; safe --close strip in triage

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(triage): scheduled cron stays dry-run; dedent prompts before interpolation

- close_low_quality_prs.yml: only workflow_dispatch with close=true (and
  AGENT_SHIN_ENABLED=true) actually closes PRs. Scheduled runs are always
  dry-run, matching the safety invariant documented for triage_pr/issue.
- triage_with_llm.py: textwrap.dedent on an f-string with a multi-line
  interpolated body is a silent no-op: the body's 2nd+ lines start at column
  0, so the common indent is zero. Dedent the static template first, then
  .format() the title/body in.

Co-authored-by: Yassin Kortam <yassin@berri.ai>
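The dedent pitfall reproduces with the standard library alone. This sketch uses a made-up two-field template, not the script's real prompt.

```python
import textwrap


def build_prompt_buggy(title: str, body: str) -> str:
    # Interpolating first: a multi-line body puts text at column 0, so the
    # common indent is empty and dedent() leaves every line indented.
    return textwrap.dedent(f"""\
        Title: {title}
        Body:
        {body}
        """)


# Dedent the static template once, then interpolate.
PROMPT_TEMPLATE = textwrap.dedent("""\
    Title: {title}
    Body:
    {body}
    """)


def build_prompt(title: str, body: str) -> str:
    return PROMPT_TEMPLATE.format(title=title, body=body)


body = "line one\nline two"
print(repr(build_prompt_buggy("T", body)))
print(repr(build_prompt("T", body)))
```

One catch with `.format()`: literal braces in the static template (for example a JSON schema shown to the model) must be doubled as `{{` and `}}`, while braces inside the interpolated title or body need no escaping.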

* Fix bugs in auto-close PR triage scripts

- close_low_quality_prs.py: Treat author_association API lookup failures
  as internal (fail-safe) so transient errors don't cause internal
  contributors' PRs to be auto-closed.
- triage_with_llm.py: Update summary heading from 'Would post comment:'
  to 'Posted comment:' since this branch only runs after the comment
  has already been posted.

Co-authored-by: Yassin Kortam <yassin@berri.ai>
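A sketch of the fail-safe author classification that both scripts converge on. `INTERNAL_ASSOCIATIONS` matches the diff; the `[bot]` suffix check and the injectable `lookup` callable are assumptions standing in for the real bot detection and REST call.

```python
from __future__ import annotations

from typing import Callable

INTERNAL_ASSOCIATIONS = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def is_internal(association: str | None, login: str | None = None) -> bool:
    """True if the author must never be auto-closed."""
    if not association:
        return True  # unknown/missing: fail safe
    if login and login.lower().endswith("[bot]"):
        return True  # bot accounts are skipped (assumed detection rule)
    return association.upper() in INTERNAL_ASSOCIATIONS


def is_internal_with_lookup(lookup: Callable[[int], str | None], number: int) -> bool:
    """Hypothetical wrapper: any lookup failure counts as internal."""
    try:
        association = lookup(number)
    except Exception:
        return True  # API failure: treat as internal so nothing is closed
    return is_internal(association)
```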

* feat(triage): default Agent Shin to gpt-5.4-mini with reasoning_effort=none

- Bump DEFAULT_MODEL from gpt-4o-mini to gpt-5.4-mini (more modern;
  4M total context window per OpenAI catalog, JSON-schema response
  format, function calling all supported).
- For gpt-5.x family models, pass reasoning_effort="none" via
  extra_body. gpt-5.x rejects temperature != 1 unless reasoning_effort
  is explicitly "none"; setting it lets us keep temperature=0 for
  deterministic JSON rubric judgments. extra_body works across openai
  SDK versions regardless of whether they natively type the kwarg.
- For non-gpt5 overrides (TRIAGE_MODEL=gpt-4o-mini etc.), reasoning_effort
  is not sent.
- 4 new unit tests cover: gpt-5.4-mini -> reasoning_effort=none,
  capitalized/dated gpt-5 variants -> reasoning_effort=none,
  gpt-4o-mini -> no extra_body, base_url passthrough.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
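A sketch of the per-model request kwargs this commit describes, assuming the family check is a case-insensitive `gpt-5` prefix match. The helper name is hypothetical; `extra_body` is the openai-python v1 escape hatch for fields the SDK may not type, and the resulting dict would be splatted into `client.chat.completions.create(messages=..., **kwargs)`.

```python
from __future__ import annotations

from typing import Any


def completion_kwargs(model: str) -> dict[str, Any]:
    """Hypothetical helper: request kwargs for a deterministic rubric judgment."""
    kwargs: dict[str, Any] = {"model": model, "temperature": 0}
    if model.strip().lower().startswith("gpt-5"):
        # Per the commit: gpt-5.x rejects temperature != 1 unless
        # reasoning_effort is "none", so send it via extra_body.
        kwargs["extra_body"] = {"reasoning_effort": "none"}
    return kwargs
```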

* fix(triage): bugbot — drop dead gh_json and fix --optout-label append-with-default

- Removed the unused gh_json helper (bugbot low-severity dead code).
- Replaced argparse `action="append", default=[...]` with default=None
  + DEFAULT_OPTOUT_LABELS fallback. The mutable-default + append combo
  silently APPENDS to the canonical defaults instead of replacing them,
  so --optout-label could not actually scope the opt-out list.
- Added tests covering both the canonical default and the
  flag-replaces-defaults behavior.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
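The append-with-default pitfall reproduces with plain `argparse`; Python's documentation notes that a non-empty default for an `append` option keeps the default elements and appends command-line values after them. The two labels below are taken from the first commit's examples and stand in for the real `DEFAULT_OPTOUT_LABELS`.

```python
from __future__ import annotations

import argparse

# Stand-in defaults (the first commit names 'do not close' and 'wip').
DEFAULT_OPTOUT_LABELS = ("do not close", "wip")


def parse_buggy(argv: list[str]) -> list[str]:
    parser = argparse.ArgumentParser()
    # A non-empty default plus action="append": CLI values are appended
    # AFTER the defaults instead of replacing them.
    parser.add_argument(
        "--optout-label", action="append", default=list(DEFAULT_OPTOUT_LABELS)
    )
    return parser.parse_args(argv).optout_label


def parse_fixed(argv: list[str]) -> list[str]:
    parser = argparse.ArgumentParser()
    parser.add_argument("--optout-label", action="append", default=None)
    labels = parser.parse_args(argv).optout_label
    # None means "flag never given": fall back to the canonical defaults.
    return labels if labels is not None else list(DEFAULT_OPTOUT_LABELS)


print(parse_buggy(["--optout-label", "security"]))
print(parse_fixed(["--optout-label", "security"]))
```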

* fix(triage): bugbot — tighten linked-issue regex, fail-safe author_association, fix empty TRIAGE_MODEL

Three independent bugbot findings against triage_with_llm.py:

1. LINKED_ISSUE_PATTERN included weak keywords (`see`, `ref`,
   `addresses`) so casual mentions like "See #1234 for context" were
   short-circuited to pass-linked-issue without ever calling the LLM —
   contradicting the prompt's own "a bare issue number without a closing
   keyword counts only if it's clearly the related issue (not a passing
   mention)" rubric. Limit the regex to GitHub's documented PR-closing
   keywords (fixes/fix/fixed/closes/close/closed/resolves/resolve/resolved).

2. is_internal_contributor() treated an empty/missing author_association
   as external (eligible for the destructive close path), while the sibling
   is_external_pr_author() in close_low_quality_prs.py fail-safes the same
   case as internal. Align the two so a partial/unknown GitHub response can
   never make a PR eligible for auto-close.

3. argparse `default=os.environ.get("TRIAGE_MODEL", DEFAULT_MODEL)` returns
   the empty string when GitHub Actions exposes an unset repo variable as
   an empty-string env var (the optional vars.TRIAGE_MODEL case in the
   workflow). Use `os.environ.get(...) or DEFAULT_MODEL` so empty -> default,
   matching the existing OPENAI_BASE_URL pattern.

Tests:
- Casual mentions now must fall through to the LLM (parametrized);
  added an orchestration test ensuring "See #1234" reaches the judge.
- Empty/missing author_association now fails safe (parametrized).
- Empty TRIAGE_MODEL env var falls back to DEFAULT_MODEL; explicit
  TRIAGE_MODEL is still honored.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
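A sketch of a linked-issue pattern limited to GitHub's closing keywords, plus the `or` fallback for an empty `TRIAGE_MODEL`. The real `LINKED_ISSUE_PATTERN` isn't shown in this page, so treat this regex as illustrative; it accepts only `#N` and full `https://github.com/<owner>/<repo>/issues/N` references.

```python
from __future__ import annotations

import re
from collections.abc import Mapping

# fix/fixes/fixed, close/closes/closed, resolve/resolves/resolved only.
LINKED_ISSUE_PATTERN = re.compile(
    r"\b(?:fix(?:e[sd])?|close[sd]?|resolve[sd]?)\s+"
    r"(?:#\d+|https://github\.com/[\w.-]+/[\w.-]+/issues/\d+)\b",
    re.IGNORECASE,
)
DEFAULT_MODEL = "gpt-5.4-mini"


def resolve_model(env: Mapping[str, str]) -> str:
    # An unset Actions variable can arrive as "", which .get(key, default)
    # would return as-is; `or` maps it to the default as well.
    return env.get("TRIAGE_MODEL") or DEFAULT_MODEL
```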

* fix(workflows): bugbot — gate Agent Shin --close on '= true' not '!= false'

The PR and issue Agent Shin workflows gated the destructive --close
flag with [ "${DISPATCH_CLOSE:-false}" != "false" ]. That pattern
treats anything other than the literal string "false" as enabling
closure — "True", "yes", "1", typos, accidental whitespace, etc.
The workflow_dispatch input UI is a 'true'/'false' choice dropdown so
the form is constrained, but the API (`gh workflow run -f close=...`)
accepts any string, and a CI cron / external invoker passing a
non-canonical truthy value would have silently enabled real
contributor PR closures.

Mirror the sibling Greptile closer's [ "${CLOSE_FLAG}" = "true" ]
pattern: only the EXACT string "true" enables --close; every other
value (including the unset/empty default) resolves to dry-run. This is
the fail-safe philosophy applied everywhere else in this PR.

Added tests/test_litellm/test_github_triage_workflows.py with two
parametrized invariants:
  1. The destructive gate uses '= "true"' for its env-var
     comparison (either bare '${ENV}' or '${ENV:-false}' form
     accepted), and never the fail-open '!= "false"' pattern.
  2. Every destructive gate is also gated on AGENT_SHIN_ENABLED being
     "true" — either by entering the close branch on '=' or by
     bailing out early on '!=' — so flipping the repo variable off is
     a true kill switch regardless of per-run inputs.

Manually verified the test fails on the buggy '!= "false"' pattern and
passes on the fix, so it would have caught the regression at PR time.

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
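A Python model of the two shell gates, plus a lint-style check in the spirit of the new workflow test. The function names and the regex are hypothetical (the real test file isn't shown here); `${VAR:-false}` is modeled as `value or "false"` because the shell default applies when the variable is unset or empty.

```python
from __future__ import annotations

import re


def gate_fail_open(value: str | None) -> bool:
    """Models [ "${DISPATCH_CLOSE:-false}" != "false" ] (the buggy gate)."""
    return (value or "false") != "false"


def gate_fail_safe(value: str | None) -> bool:
    """Models [ "${DISPATCH_CLOSE:-false}" = "true" ] (the fixed gate)."""
    return (value or "false") == "true"


# Flags any [ "${VAR}" != "false" ] or [ "${VAR:-false}" != "false" ] test.
FAIL_OPEN_RE = re.compile(r'\[\s*"\$\{[A-Z_]+(?::-false)?\}"\s*!=\s*"false"\s*\]')


for raw in ["true", "True", "1", "yes", ""]:
    print(repr(raw), gate_fail_open(raw), gate_fail_safe(raw))
```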

* feat(triage): close any PR (incl. drafts, any age); add @agent-shin reconsider flow

Follow-up to PR #28117. Three behavior changes + one new workflow,
addressing the team's concerns on the original review:

1) Apply auto-close to ALL open PRs, not just those over a week old.

   - close_low_quality_prs.py: --min-age-days default flipped from 7 to
     0. The flag is preserved as an opt-in safety net for one-off
     backfill runs that want to spare very-young PRs, but the daily
     scheduled sweep now closes external-author PRs as soon as Greptile
     scores them <4/5.
   - close_low_quality_prs.yml: workflow_dispatch input default also
     flipped to 0; doc comments updated.

2) Apply auto-close to draft PRs too.

   - close_low_quality_prs.py: removed the skip-draft branch in
     evaluate_pr. Drafts are NOT a free pass — the team's intent is
     'open PR count == PRs internal collaborators need to action on',
     so a draft Greptile scored 2/5 still belongs in the closed bucket.
     Authors who genuinely need a long-lived draft can attach the 'wip'
     opt-out label, which is unchanged.
   - The 'skip-draft' action is gone; the 'wip' label still skips.

3) Address the 'OSS contributors cannot reopen a bot-closed PR' wrinkle.

   GitHub does NOT let an external (non-write-access) contributor
   reopen a PR that was closed by a bot or maintainer (long-standing
   limitation). The original PR's close-comments told contributors to
   'Reopen the PR — I'll re-evaluate automatically', which is broken
   for the very audience this triage targets. Two changes:

   a) Reword every close-comment (Greptile sweep + Agent Shin PR
      close + Agent Shin issue close + PR template) to recommend:
        - Open a new PR with the updated branch (primary path).
        - Or comment '@agent-shin reconsider' on the closed PR for a
          re-evaluation that, on pass, reopens the PR via the bot's
          GH_TOKEN write access.

   b) Add the @agent-shin reconsider workflow:
        - .github/workflows/triage_reconsider.yml: new
          'issue_comment'-triggered workflow. Authorizes only the
          PR/issue author or an internal collaborator
          (OWNER/MEMBER/COLLABORATOR), gated via a step output so
          unauthorized commenters never reach the destructive steps.
          Globally gated on AGENT_SHIN_ENABLED='true' (positive form,
          matching the test_github_triage_workflows guardrail
          patterns).
        - triage_with_llm.py: --reconsider mode. On a closed PR/issue,
          re-runs the LLM judge (or linked-issue regex short-circuit)
          and:
            - on pass: reopens via reopen_pr/reopen_issue + posts a
              'Re-evaluated and reopened' comment.
            - on fail: leaves closed and posts a 'still missing X'
              comment so the contributor can iterate again.
          Reconsider-on-open is a no-op ('skip-not-closed').
          Internal-author + bot-account skips still take priority over
          reconsider.

4) Greptile-on-closed-PRs question: the team asked whether Greptile can
   re-review a closed PR. Greptile's docs don't address this and we
   shouldn't promise behavior we can't verify, so the new close-comment
   wording does NOT instruct contributors to 're-request greptile on
   the closed PR'. Instead it points them at the new-PR path (which
   Greptile definitely reviews) or the @agent-shin reconsider trigger
   (which re-runs the LiteLLM-side rubric judge, not Greptile).

Tests: 93 passing (was 59).

  - test_github_close_low_quality_prs.py: replaced 'skip drafts' test
    with 'closes drafts when score is low' + 'closes brand-new PR when
    min_age=0' + 'no skip when min_age=0'. The 'skip too young'
    assertion is preserved as opt-in.
  - test_github_triage_with_llm.py: 6 new TestTriageOrchestration cases
    for reconsider mode (skip-not-closed on open, reopen on pass,
    still-failing comment on fail, linked-issue short-circuit reopen,
    skip internal author in reconsider, reopen-issue on pass) + a new
    TestCloseCommentText class that pins the user-facing 'open a new
    PR' + '@agent-shin reconsider' wording.
  - test_github_triage_workflows.py: added triage_reconsider.yml to
    the destructive-gate guardrail table; AGENT_SHIN_ENABLED is its
    own destructive gate (no separate per-run flag needed).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* test(triage): pin safe behavior for curly braces in PR/issue title+body

Adds regression tests covering the bugbot high-severity finding that
str.format() would crash on user-supplied content containing { or }.
Empirically str.format() does NOT re-parse interpolated values — only
the template literal is scanned for replacement fields — so the bug
does not exist in the current code, but pinning the safe behavior
prevents a future templating change from silently reintroducing it.
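A small demonstration of the behavior these tests pin. The template and function names are hypothetical stand-ins for the real `build_*_prompt` functions; the point is that braces in substituted values are inserted verbatim, while concatenating user text into the template before calling `format()` does crash.

```python
# Hypothetical template; the real prompt templates differ.
TEMPLATE = "Title: {title}\nBody:\n{body}\n"


def build_prompt(title: str, body: str) -> str:
    # Only TEMPLATE is scanned for replacement fields; braces inside
    # title/body are copied through untouched.
    return TEMPLATE.format(title=title, body=body)


def unsafe_build(title: str) -> str:
    # Anti-pattern for contrast: user text becomes part of the template,
    # so "{provider}" is parsed as a field and raises KeyError.
    return ("Title: " + title).format()


print(build_prompt("Fix {provider} routing", '{"a": {}}'))
```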

Also pins the dedented prompt shape (no leading 8-space indentation on
template lines) so a future change to the build_*_prompt functions can't
silently regress the LLM judge prompt format on multi-line bodies.
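One way to get the dedented shape is to call `textwrap.dedent` on the template before substituting values. This is a sketch with a hypothetical template, not the actual `build_pr_prompt`. Dedenting first matters: if a multi-line body were substituted before dedenting, its less-indented lines would shrink the common prefix and leave the template lines indented.

```python
import textwrap


def build_pr_prompt(title: str, body: str) -> str:
    # Hypothetical stand-in for the real build_pr_prompt.
    # dedent() strips the 8 spaces of source indentation from the template
    # *before* format(), so the user's body is inserted exactly as written.
    template = textwrap.dedent(
        """\
        You are judging a pull request.
        Title: {title}
        Body:
        {body}
        """
    )
    return template.format(title=title, body=body)
```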

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix(triage): bugbot — reconsider dry-run + bot-closed guard + rate limit

Address three Greptile/veria-ai concerns on the @agent-shin reconsider
flow:

1. **Reconsider had no dry-run path.** The previous reconsider mode
   ignored `--close` and always posted comments + reopened on a pass.
   A local operator running
   `python triage_with_llm.py --reconsider --pr N` would silently
   take destructive GitHub actions with no way to preview. Reconsider
   now honors `close=False` the same way regular triage does and
   returns `would-reopen` / `would-reconsider-still-failing` for
   step-summary rendering.

2. **Reconsider could reopen maintainer-closed PRs/issues** (Medium
   security finding from veria-ai). The workflow only checked that the
   commenter was authorized — it did NOT check that the most recent
   close was performed by Agent Shin. A contributor could comment
   `@agent-shin reconsider` on a PR a maintainer closed for non-rubric
   reasons (duplicate, security report, design rejection) and have the
   bot reopen it. Add `was_closed_by_agent_shin()` which inspects the
   issue events API for the most recent `closed` actor and only
   permits reopen when that actor matches the configured bot login
   (default `github-actions[bot]`, overridable via env). Fail-closed
   on missing events.

3. **No rate-limiting on the reconsider trigger.** Every
   `@agent-shin reconsider` comment burns CI minutes + an OpenAI API
   call. Add a 10-minute cooldown via
   `seconds_since_last_reconsider_verdict()` which greps the issue's
   comment list for the bot's own verdict marker
   (`<!-- agent-shin:reconsider-verdict -->`). Inside the window the
   triage returns `skip-rate-limited` and the LLM never runs.
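The two guards in points 2 and 3 can be sketched against the shapes returned by the GitHub REST issue-events and issue-comments endpoints (`event`, `actor.login`, `user.login`, `body`, `created_at`). The function names here are hypothetical simplifications of `was_closed_by_agent_shin()` and `seconds_since_last_reconsider_verdict()`; this sketch covers only the actor check described in this commit, not the close-marker requirement a later commit adds.

```python
from datetime import datetime, timezone
from typing import Optional

DEFAULT_BOT_LOGIN = "github-actions[bot]"
VERDICT_MARKER = "<!-- agent-shin:reconsider-verdict -->"
COOLDOWN_SECONDS = 10 * 60


def _parse_ts(ts: str) -> datetime:
    # GitHub REST timestamps look like "2026-05-18T12:00:00Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def was_closed_by_bot(events: list, bot_login: str = DEFAULT_BOT_LOGIN) -> bool:
    """True only if the most recent `closed` event was performed by bot_login."""
    closes = [e for e in events if e.get("event") == "closed"]
    if not closes:
        return False  # fail closed: no close event means no reopen
    last = max(closes, key=lambda e: _parse_ts(e["created_at"]))
    return (last.get("actor") or {}).get("login") == bot_login


def seconds_since_last_verdict(comments: list, bot_login: str, now: datetime) -> Optional[float]:
    """Seconds since the bot's latest verdict comment, or None if there is none."""
    stamps = [
        _parse_ts(c["created_at"])
        for c in comments
        # Only the bot's own comments count, so quoting the marker does nothing.
        if (c.get("user") or {}).get("login") == bot_login
        and VERDICT_MARKER in (c.get("body") or "")
    ]
    if not stamps:
        return None
    return (now - max(stamps)).total_seconds()


def is_rate_limited(comments: list, bot_login: str, now: datetime) -> bool:
    elapsed = seconds_since_last_verdict(comments, bot_login, now)
    return elapsed is not None and elapsed < COOLDOWN_SECONDS
```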

Workflow update:
- `triage_reconsider.yml` now passes `--close` only when
  `AGENT_SHIN_ENABLED=true`, matching the pattern of
  `triage_pr_with_llm.yml`. The script runs in both states so the
  verdict still appears in the step summary for QA.
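A sketch of how reconsider can honor `close=False`: side effects are injected as callables and only run on the live path, while the dry-run path returns the `would-*` label for the step summary. The function name and the live labels `reopened` / `reconsider-still-failing` are hypothetical; `would-reopen` and `would-reconsider-still-failing` come from the description above.

```python
from typing import Callable


def reconsider_outcome(
    passed: bool,
    close: bool,
    reopen: Callable[[], None],
    comment: Callable[[str], None],
) -> str:
    """Hypothetical sketch: map a reconsider verdict to an action label."""
    if passed:
        if not close:
            return "would-reopen"  # dry-run: no GitHub mutation
        reopen()
        comment("Re-evaluated and reopened")
        return "reopened"
    if not close:
        return "would-reconsider-still-failing"
    comment("Still missing required items")
    return "reconsider-still-failing"
```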

Tests:
- Add 5 reconsider safety tests: dry-run for pass / fail / linked-issue
  short-circuit, bot-closed-guard refusal on maintainer close,
  rate-limit refusal inside the cooldown window, and cooldown-elapsed
  acceptance.
- Add unit tests for `was_closed_by_agent_shin` (bot / maintainer /
  missing actor / env-override) and
  `seconds_since_last_reconsider_verdict` (no marker / multiple
  markers / non-bot comment with marker / bot comment without marker).
- Pin the `<!-- agent-shin:reconsider-verdict -->` marker in both
  reopen and still-failing comments — dropping it would silently
  break the cooldown.

Existing reconsider tests updated to pass `close=True` (the
production path now) + stub the new guards via
`_stub_reconsider_guards`. 112 tests pass (was 93).

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* feat(triage): 1-day grace period before close + SwiftWinds immediate-close bypass

- Add a 24-hour grace window between the first low-quality detection
  and the actual auto-close. The first detection posts a warning
  comment that explicitly says "You have 1 day to address this before
  this PR is auto-closed" and points the contributor at:
    * `@agent-shin reconsider` to request another look (and re-open)
    * `@greptileai` to request a fresh Greptile review — works
      even after the PR is closed
- Both `triage_with_llm.py` (LLM judge) and `close_low_quality_prs.py`
  (Greptile-score closer) share the same `<!-- agent-shin:grace-warning -->`
  HTML marker so a warning posted by either path is recognized by both.
- Add IMMEDIATE_CLOSE_LOGINS = {swiftwinds} to bypass BOTH the grace
  period AND the dry-run / AGENT_SHIN_ENABLED gating. SwiftWinds is the
  user's personal account (no push permissions to litellm) used to
  dogfood the bot; user explicitly asked: "For SwiftWinds, just close
  immediately. Faster iteration that way."
- Update the standard close comments to mention that `@greptileai`
  works even after the PR is closed.
- Add 23 new tests covering: warn-grace on first detection, skip during
  grace window, close after grace expires, SwiftWinds bypass (case
  insensitive, with close=False, no random-login false positives), the
  grace-warning text invariants, and the SwiftWinds entry in the
  IMMEDIATE_CLOSE_LOGINS constant.
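The grace window above reduces to a three-way decision keyed on when the marked warning was posted. This is a minimal sketch, assuming the caller has already looked up the timestamp of the bot's comment carrying `<!-- agent-shin:grace-warning -->`; `warn-grace` and `close` follow the description, while `skip-passing` and `skip-within-grace` are hypothetical labels. The SwiftWinds bypass is omitted (a later commit removes it).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD_SECONDS = 24 * 60 * 60
GRACE_COMMENT_MARKER = "<!-- agent-shin:grace-warning -->"


def grace_action(low_quality: bool, warned_at: Optional[datetime], now: datetime) -> str:
    """Hypothetical sketch of the warn-then-close decision for one PR."""
    if not low_quality:
        return "skip-passing"
    if warned_at is None:
        # First detection: post the warning comment stamped with GRACE_COMMENT_MARKER.
        return "warn-grace"
    if (now - warned_at).total_seconds() < GRACE_PERIOD_SECONDS:
        return "skip-within-grace"
    return "close"
```

Because both scripts share the marker, whichever path posted the warning starts the same clock.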

Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>

* fix: skip grace-period text in close comment for IMMEDIATE_CLOSE_LOGINS

For PRs from IMMEDIATE_CLOSE_LOGINS (e.g. swiftwinds), evaluate_pr
returns 'close' immediately without ever posting a grace warning, so
the close comment should not reference a 1-day grace period.

Make close_pr take a grace_period_elapsed flag, default True, and
pass False from the main loop when the close path was the
immediate-close branch.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(close-low-quality-prs): report actual closes in dry-run summary

IMMEDIATE_CLOSE_LOGINS PRs are closed even when the global --close flag is
not set, but the summary used the global dry-run flag to choose between
'would close' and 'closed'. Split the count so operators can see both
actual closures and dry-run would-be closures.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* chore(triage): vendor Agent Shin (#28117) onto demo branch

Brings the Agent Shin OSS-triage scripts, workflows, issue/PR templates, and
tests from PR #28117 onto this branch so the new review-gate feature and its
end-to-end demo are self-contained and runnable in CI.

https://claude.ai/code/session_01XyyWa8t2VYmoGd6mKMEqkZ

* feat(triage): add "ready for review" label lifecycle to Agent Shin

Adds review_gate(), a state machine that keeps a `ready for review` label in
sync with whether an external PR clears BOTH gates — the LLM rubric and
Greptile's most recent confidence score:

- pass (untagged)            -> add label + "ready for review" / "all clear" comment
- pass (already tagged)      -> no-op (idempotent across re-runs)
- regress (Greptile < 4/5 or QA proof removed) -> remove label + "what's missing"
  comment, PR stays open
- recover after a regression -> "all clear again" comment + re-add the label
- fail & untagged, < 24h old -> one-time "what's missing" notice (grace window)
- fail & untagged, > 24h old -> close + comment (reopen via @agent-shin reconsider)

The label itself is the persisted state, so comments fire only on transitions
(never on every scheduled run). All side effects are gated behind --close, so
the dry-run contract matches the existing triage flow. Lifecycle comments use
hidden HTML markers and deliberately avoid the auto-close marker so they never
trip the reconsider provenance check.
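The transition table above can be written as a pure function, with the label's presence as the persisted state. This is a sketch with hypothetical names and action labels, not the real `review_gate()`; it deliberately omits the post-regression hold that later commits add, so here a regressed, failing PR older than the grace window would close.

```python
def review_gate_action(
    passes: bool,
    labeled: bool,
    regressed_before: bool,
    age_hours: float,
    noticed: bool,
    grace_hours: float = 24.0,
) -> str:
    """Hypothetical sketch: which transition review_gate would take for one PR."""
    if passes:
        if labeled:
            return "noop"  # already tagged: idempotent across re-runs
        if regressed_before:
            return "recover"  # "all clear again" comment + re-add label
        return "mark-ready"  # add label + "ready for review" comment
    if labeled:
        return "regress"  # remove label + "what's missing" comment, stay open
    if age_hours < grace_hours:
        return "noop" if noticed else "notice"  # one-time grace notice
    return "close"  # close + comment pointing at @agent-shin reconsider
```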

Relocates the shared Greptile helpers (extract_greptile_score, SCORE_PATTERN,
GREPTILE_BOT_LOGINS, parse_iso8601) into triage_with_llm.py so the daily sweep
and the review gate read the score through one implementation, and adds the
review_gate.yml workflow (dry-run unless AGENT_SHIN_ENABLED=true) plus 18 unit
tests covering every branch and a full pass->regress->recover cycle.

https://claude.ai/code/session_01XyyWa8t2VYmoGd6mKMEqkZ

* Port review-gate feature from #28758 onto #28147 triage scripts

Adds the "ready for review" label lifecycle (originally PR #28758) on top
of #28147's refactored triage_with_llm.py. The original commit was
authored against an older snapshot of #28117 and could not be applied
cleanly, so the additions were re-applied surgically:

- New constants: READY_FOR_REVIEW_LABEL, DEFAULT_GRACE_DAYS,
  DEFAULT_MIN_GREPTILE_SCORE, READY/REGRESSED/WITHIN_GRACE markers,
  GREPTILE_BOT_LOGINS, SCORE_PATTERN, AGENT_SHIN_AUTO_CLOSE_MARKER.
- New helpers: add_label, remove_label, extract_greptile_score,
  parse_iso8601 (the latter two mirrored from close_low_quality_prs.py
  so the daily sweep and the review gate read the score through the
  same logic).
- New comment formatters: format_ready_for_review_comment,
  format_all_clear_comment, format_regression_comment,
  format_within_grace_comment.
- New entry point: review_gate() implementing the pass/regress/recover
  state machine, with the label itself acting as persisted state so
  transition comments fire only on actual transitions.
- main() learns --review-gate, --grace-days, --min-greptile-score and
  dispatches to review_gate() when the flag is set.

Verified via tests/test_litellm/test_github_review_gate.py (18 tests)
and the existing triage suites (144 more) — all 162 pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* agent_shin: extract shared constants/helpers; cover review_gate.yml in guardrail tests

Bug 1: `triage_with_llm.py` and `close_low_quality_prs.py` each defined
their own copies of `extract_greptile_score`, `parse_iso8601`,
`GREPTILE_BOT_LOGINS`, `SCORE_PATTERN`, `GRACE_COMMENT_MARKER`,
`GRACE_PERIOD_SECONDS`, `IMMEDIATE_CLOSE_LOGINS`, and
`AGENT_SHIN_DEFAULT_BOT_LOGIN`. The comments explicitly said the two
copies had to stay in sync, but nothing enforced it. A future change to
one (e.g. extending `SCORE_PATTERN` for a new Greptile output format)
would silently diverge from the other and the daily sweep and the LLM
judge would disagree on which PRs have low scores.

Extract these to `.github/scripts/agent_shin_shared.py` and re-export
them from each script so the existing test attribute access
(`triage_module.GRACE_COMMENT_MARKER`, etc.) keeps working without
any test changes.

Bug 2: `review_gate.yml` is a destructive workflow (close PRs, add/remove
labels, post comments) with the same gating philosophy as the others
(`AGENT_SHIN_ENABLED = "true"` + a per-run `CLOSE_FLAG = "true"`),
but it was missing from `DESTRUCTIVE_GATE_ENV` in the guardrail tests.
Add it so a future regression (e.g. flipping to `!= "false"`) is
caught by the same parameterized invariants as every other workflow.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* agent_shin: fix bug bundle (gated LLM key, author-filtered marker dedup, dedup gh/grace helpers)

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* agent_shin: fix review_gate close-after-regression and case-insensitive label match

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* feat(triage): add one-shot 7-day heads-up sweep for Agent Shin rollout

Adds a rollout-day workflow that comments on every open external PR/issue
that the new triage bot WOULD auto-close, giving contributors 7 days to
fix their description before any destructive action runs.

Why now: merging this PR enables Agent Shin in dry-run. The follow-up
"enact" PR (next Monday) flips the destructive paths on. Without this
heads-up, contributors would get a close-comment on day 8 with no prior
warning. The heads-up names the cutoff date, lists the rubric, calls out
each PR/issue's specific missing pieces, and explains the recovery paths
(@agent-shin reconsider for PRs, edit + reopen for issues).

Files
- .github/scripts/_agent_shin_actions.py — thin maybe_post_comment /
  maybe_close_* / maybe_add_label / etc. wrappers. Each is a single
  `if dry_run: log; return; else: call_through()` so a dry-run preview
  differs from the real run in exactly one call site per mutation. The
  call-through goes via `triage_with_llm.<name>` (module-qualified) so
  monkeypatching the underlying function in tests is reflected here.
- .github/scripts/triage_rollout_heads_up.py — the sweep. Iterates every
  open PR + issue via `gh pr list` / `gh issue list`, runs the future
  rubric (review_gate for PRs, triage(kind="issue") for issues), and
  posts the heads-up on any item that would be auto-closed. Idempotent
  via a `<!-- agent-shin:rollout-heads-up -->` marker. Defaults to dry-
  run; --close opts in to real posts. --close-on overrides the cutoff
  date (defaults to today + 7 days).
- .github/workflows/triage_rollout_heads_up.yml — one-shot workflow.
  Triggers on push to litellm_internal_staging filtered to the script
  path (fires on rollout merge) plus workflow_dispatch with a dry_run
  input that defaults to "true" for safe manual re-runs.
- tests/test_litellm/test_triage_rollout_heads_up.py — 28 unit tests
  covering: the dry-run wrappers (each maybe_* gates correctly), the
  _would_be_closed predicate for PR vs. issue results, the comment
  formatter (cutoff/rubric/marker/recovery wording), per-item dispatch
  (skip-not-open, skip-internal-author, skip-already-notified,
  skip-passing, would-post/posted), and the sweep loop end-to-end.
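Why the call-through is module-qualified: `from triage_with_llm import post_comment` binds the function once at import time, so a test that later monkeypatches `triage_with_llm.post_comment` would not affect the wrapper. A minimal sketch, using a stand-in module object and a boolean return value that are both hypothetical:

```python
import logging
import types

log = logging.getLogger("agent_shin")

# Stand-in for the real triage_with_llm module (hypothetical).
triage_with_llm = types.ModuleType("triage_with_llm")


def _real_post_comment(repo: str, number: int, body: str) -> None:
    raise RuntimeError("would hit the GitHub API")


triage_with_llm.post_comment = _real_post_comment


def maybe_post_comment(repo: str, number: int, body: str, *, dry_run: bool) -> bool:
    """Post a comment unless dry_run; return True if a mutation happened."""
    if dry_run:
        log.info("[dry-run] would comment on %s#%d", repo, number)
        return False
    # Attribute lookup happens at call time, so monkeypatching is honored.
    triage_with_llm.post_comment(repo, number, body)
    return True
```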

Local preview (no GitHub mutations):
    python3 .github/scripts/triage_rollout_heads_up.py --repo BerriAI/litellm

Real run (what the workflow does):
    python3 .github/scripts/triage_rollout_heads_up.py --repo BerriAI/litellm --close

TODO: replace the placeholder ROLLOUT_BLOG_URL with the canonical
docs URL once the litellm-docs PR ships.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: gate reconsider workflow OPENAI_API_KEY + remove dead actions wrappers

- Mirror sibling Agent Shin workflows by only exposing OPENAI_API_KEY in
  triage_reconsider.yml when vars.AGENT_SHIN_ENABLED == 'true'. Previously
  the secret was unconditionally exposed, so any PR/issue author could
  trigger paid LLM calls by commenting '@agent-shin reconsider' even while
  the bot was supposed to be in dry-run.
- Remove the six unused dry-run wrappers (maybe_close_pr, maybe_close_issue,
  maybe_reopen_pr, maybe_reopen_issue, maybe_add_label, maybe_remove_label)
  from _agent_shin_actions.py — only maybe_post_comment is used by rollout
  scripts. Drop the associated tests that exercised the now-removed
  functions.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: address triage script edge cases

- triage_rollout_heads_up.py: replace the %-d strftime specifier (a glibc/BSD
  extension that Windows rejects) with portable day formatting so the script
  doesn't crash on Windows.
- close_low_quality_prs.py: skip malformed JSON lines in fetch_pr_comments
  instead of letting one bad line abort the daily sweep, matching the
  pattern in triage_with_llm._iter_paginated_json.
- triage_with_llm.py: move has_linked_issue short-circuit before
  build_pr_prompt to avoid unnecessary prompt construction on PRs that
  link an issue.
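The first two fixes can be sketched as two small helpers; the names are hypothetical, not the scripts' actual functions. `d.day` is a plain int, so it has no leading zero and needs no platform-specific specifier, and a JSON-lines reader can skip a bad line instead of raising.

```python
import json
from datetime import date


def format_cutoff(d: date) -> str:
    # Portable replacement for "%B %-d, %Y": d.day never has a leading zero.
    return f"{d:%B} {d.day}, {d.year}"


def iter_json_lines(text: str):
    """Yield one parsed object per line, skipping blank or malformed lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # one bad line should not abort the whole sweep
```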

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(scripts): per-PR error isolation and limit grace warnings in close_low_quality_prs

- Wrap per-PR processing in try/except so a transient GitHub API failure
  on one PR no longer aborts the entire daily sweep (mirrors the pattern
  already used in triage_rollout_heads_up.py).
- Have --limit bound *all* destructive write actions (closures and grace
  warnings combined), not just closures. Prevents a backlog of newly
  failing PRs from flooding contributors with comments in a single run.
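Both changes fit in one loop: a per-item `try/except` and a single write budget shared by closures and grace warnings. This is a sketch with a hypothetical `process_pr` callback and summary shape, not the real main loop.

```python
import logging

log = logging.getLogger("close_low_quality_prs")

WRITE_ACTIONS = ("close", "warn-grace")


def sweep(pr_numbers, process_pr, limit: int) -> dict:
    """Hypothetical sketch: process_pr(n) performs the action and returns its label."""
    summary = {"writes": 0, "errors": 0, "skipped_over_limit": 0}
    for number in pr_numbers:
        if summary["writes"] >= limit:
            # --limit bounds closures AND grace warnings together.
            summary["skipped_over_limit"] += 1
            continue
        try:
            action = process_pr(number)
        except Exception:
            # One transient API failure should not abort the whole sweep.
            log.exception("PR #%d failed; continuing", number)
            summary["errors"] += 1
            continue
        if action in WRITE_ACTIONS:
            summary["writes"] += 1
    return summary
```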

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(agent-shin): remove 1000-PR cap on bulk sweeps; sweep entire backlog

Both bulk-sweep scripts hardcoded `gh {pr,issue} list --limit 1000`, and gh
lists newest-first, so the OLDEST ~980 PRs and ~380 issues were silently
dropped. That's exactly the stale backlog the daily closer and one-shot
rollout heads-up exist to catch.

Extract a single `list_open_items(kind, *, repo, fields)` helper into
`agent_shin_shared.py` with `GH_LIST_ALL_LIMIT = 100_000` — a ceiling far
above any realistic open backlog so gh paginates until the queue is
exhausted. `fetch_open_prs` and `_list_open_numbers` both delegate to it,
so the limit lives in exactly one place going forward.
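A sketch of the command such a helper might build. `--repo`, `--state`, `--limit`, and `--json` are real `gh pr list` / `gh issue list` flags; the function name `build_list_open_command` is hypothetical and only builds the argv, leaving execution to the caller.

```python
GH_LIST_ALL_LIMIT = 100_000


def build_list_open_command(kind: str, *, repo: str, fields: list) -> list:
    """Hypothetical sketch: argv for listing every open PR or issue."""
    if kind not in ("pr", "issue"):
        raise ValueError(f"kind must be 'pr' or 'issue', got {kind!r}")
    return [
        "gh", kind, "list",
        "--repo", repo,
        "--state", "open",
        # Far above any realistic backlog, so gh paginates to the end.
        "--limit", str(GH_LIST_ALL_LIMIT),
        "--json", ",".join(fields),
    ]
```

A caller would then run it with `subprocess.run(cmd, capture_output=True, text=True, check=True)` and `json.loads` the stdout.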

Verified live against BerriAI/litellm:
- `fetch_open_prs` -> 1981 PRs (was 1000)
- `_list_open_numbers(issue)` -> 1382 issues (was 1000)
- `_list_open_numbers(pr)` -> 1981 PRs (was 1000)

Adds 7 regression tests asserting the new limit is passed, the dedicated
`gh {pr,issue} list` command + fields are used per kind, bad kind raises
ValueError, and both callers delegate to the shared helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(agent-shin): require non-mocked end-to-end QA proof for PR pass

The PR rubric previously passed any PR with a linked issue, regardless
of whether it showed the fix actually working. Sample spot-check found
21/25 recent external PRs passing, including ones that linked an issue
but provided zero QA evidence.

Tighten the rubric so a pass now requires BOTH:

  (1) CONTEXT — a linked issue OR a clear problem description with
      expected-vs-actual behavior.
  (2) END-TO-END QA PROOF — at least one of:
      (a) screenshot(s) of the fix working,
      (b) screen recording / video,
      (c) specific commands actually run, paired with their real
          output, against the real system.

Mocked unit tests, generic 'I tested it' claims, 'all tests pass'
without output, and the linked issue itself are explicitly excluded
from QA proof.

Also add 'qa_proof_type' to the JSON schema so the per-PR report
surfaces which kind of proof (or 'none') the judge saw.

Re-sample on the same 25 recent external PRs shifts the verdict
distribution from 21 pass / 4 fail to 4 pass / 21 fail, with zero
prior-fails now passing — the stricter rule catches PRs that ship
only with unit-test claims and no real integration evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-shin): link blog explainer from every action-required bot comment

Adds "What's this and why am I getting it?" links to docs.litellm.ai/blog/
agent-shin-triage from the four comments contributors actually read when
something went wrong: PR close, PR grace warning, issue close, issue grace
warning. PR comments also link the rubric section directly from the
QA-proof bullet so contributors can self-serve "what counts as proof"
without pinging a maintainer.

Pins the new guarantees in tests: blog link must appear in all four
comments, and the PR close comment must continue to flag mocked-dependency
unit tests as insufficient proof.

The linked blog post is in BerriAI/litellm-docs PR #240; the URL will 404
until that lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review_gate): raise sweep limit from 1000 to 100000 to match GH_LIST_ALL_LIMIT

gh lists newest-first, so capping at 1000 silently drops the oldest open
PRs — exactly the stale ones the daily sweep is meant to reconcile. Use
the same ceiling as agent_shin_shared.GH_LIST_ALL_LIMIT so the workflow
sees the entire backlog.

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* Fix three Agent Shin triage edge cases

- review_gate: expire the regression-marker short-circuit after grace_days
  so PRs that were regressed and then abandoned can eventually be closed.
- review_gate: when the rubric short-circuits to pass via the linked-issue
  regex but Greptile drags the PR below the bar, replace the synthetic
  'LLM was not called' explanation with the real Greptile shortfall so
  regression / close comments are not misleading.
- triage_rollout_heads_up._comments_have_marker: drop the unused 'kind'
  parameter and filter by bot author so a contributor quoting the
  heads-up via 'Quote reply' cannot trick the idempotency check, matching
  the pattern in triage_with_llm._has_marker.
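The marker expiry and the author filter combine naturally: find the latest marker comment written by the bot, then treat the regression hold as active only while it is younger than `grace_days`. This is a sketch; the helper names and the marker string `<!-- agent-shin:regressed -->` are hypothetical (the real REGRESSED marker value is not shown here).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def latest_bot_marker_time(comments: list, marker: str, bot_login: str) -> Optional[datetime]:
    """Timestamp of the newest bot-authored comment containing `marker`, if any."""
    times = [
        datetime.fromisoformat(c["created_at"].replace("Z", "+00:00"))
        for c in comments
        # A contributor quoting the marker is ignored: only the bot's own comments count.
        if (c.get("user") or {}).get("login") == bot_login
        and marker in (c.get("body") or "")
    ]
    return max(times, default=None)


def regression_hold_active(comments: list, marker: str, bot_login: str,
                           now: datetime, grace_days: float) -> bool:
    """The regression short-circuit applies only until grace_days after the notice."""
    posted = latest_bot_marker_time(comments, marker, bot_login)
    return posted is not None and now - posted < timedelta(days=grace_days)
```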

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: pass min_greptile_score through to ready-for-review comment text

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* feat(agent-shin): warmer triage comments — bullet-train emoji, 'what you got right' section, softer 'park this for later' framing

User feedback on the auto-triage comments contributors will see:

1. Tone — the previous 'You have 1 day to address this before this PR is
   auto-closed' framing reads as an ultimatum. Replace with: 'If the
   description isn't updated in the next 1 day, I'll auto-close this PR.
   That's not us saying we don't care about the change — we want the
   open-PR list to mirror what a maintainer can act on right now, so
   contributors don't get lost in a backlog. A closed PR is a soft "park
   this for later," not a rejection. Take your time.'

2. Positive feedback — the previous comments only listed what was missing.
   Now every close + grace-warning comment opens with a 'What you got
   right:' section rendered from the judge's per-field flags. Contributors
   see a checkmark for everything they got right (linked issue, problem
   description, expected/actual, QA proof for PRs; runnable repro,
   screenshot/log, expected/actual, motivation+example for issues) before
   the gaps. The block is omitted entirely when nothing is present so
   we never render 'What you got right: (nothing).'

3. Reconsider trigger — the previous grace warning told contributors to
   comment '@agent-shin reconsider' during the grace window. They don't
   need to — the bot re-checks on every sweep. The new copy says 'just
   update the description, no need to ping me' for the grace path, and
   reserves '@agent-shin reconsider' for the post-close recovery path.

4. Bullet-train emoji — replace 👋 with 🚄 (Shinkansen, the symbol of
   Agent Shin) across every action-required comment: PR close, PR grace
   warning, issue close, issue grace warning, within-grace, Greptile-
   closer grace warning, rollout heads-up. Pinned in tests so a future
   refactor can't silently revert.

5. Greptile-post-close — the @greptileai bullet now explicitly says 'a
   low Greptile score isn't a blocker either,' since the previous copy
   buried the fact that @greptileai works after auto-close.

Comment templates updated: format_pr_close_comment,
format_issue_close_comment, format_grace_warning_pr_comment,
format_grace_warning_issue_comment, format_within_grace_comment
(triage_with_llm.py); format_grace_warning_comment
(close_low_quality_prs.py); format_heads_up_comment header
(triage_rollout_heads_up.py).

New helpers: _format_present_for_pr / _format_present_for_issue /
_format_present_block, driven off the existing per-field flags the
LLM judge already emits — no prompt change needed.
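A sketch of the "What you got right" block. The flag keys and labels below are hypothetical; the real judge's JSON keys may differ. The key behavior is that the helper returns an empty string when no flags are set, so the comment never renders an empty section.

```python
# Hypothetical flag names and labels.
PR_FIELD_LABELS = {
    "has_linked_issue": "Linked issue",
    "has_problem_description": "Clear problem description",
    "has_expected_actual": "Expected vs. actual behavior",
    "has_qa_proof": "End-to-end QA proof",
}


def format_present_block(flags: dict, labels: dict = PR_FIELD_LABELS) -> str:
    """Render a checkmark per present field, or "" when nothing is present."""
    present = [label for key, label in labels.items() if flags.get(key)]
    if not present:
        return ""  # omit the block entirely instead of rendering "(nothing)"
    lines = ["What you got right:"] + [f"- ✅ {label}" for label in present]
    return "\n".join(lines) + "\n"
```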

New tests pin: bullet-train emoji in every action-required comment;
'What you got right' appears with ✅ bullets when fields are present;
the block is omitted when no fields are present; 'park this for
later' / 'not a rejection' softer framing; grace warnings tell the
contributor 'no need to ping' during the grace window (reconsider is
the post-close path only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-shin): gate triage on a dogfood allowlist

Add ALLOWLIST_LOGINS to agent_shin_shared so Agent Shin only acts on the
named accounts while the set is non-empty. mateo-berri and SwiftWinds are
allowlisted for the dogfood rollout; everyone else is skipped with
skip-not-allowlisted across all four entrypoints (triage, review gate, the
daily low-quality sweep, and the rollout heads-up).

For an allowlisted author the usual internal/external classification is
bypassed, so a maintainer's own org account still gets triaged during
testing. Emptying the set lifts the restriction and restores full triage
for the public rollout. The gate is dependency-injected via an `allowlist`
parameter defaulting to the constant, so the internal/external-skip paths
stay testable.
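The allowlist gate with its injected default can be sketched as follows. `skip-not-allowlisted` and `skip-internal-author` come from the commit log; the `triage` label, the function name, and the case-insensitive comparison are assumptions made for illustration.

```python
ALLOWLIST_LOGINS = frozenset({"mateo-berri", "SwiftWinds"})


def author_gate(author: str, is_internal: bool, allowlist=ALLOWLIST_LOGINS) -> str:
    """Hypothetical sketch: decide whether Agent Shin acts on this author."""
    if allowlist:
        # Assumption: GitHub logins are compared case-insensitively.
        if author.lower() not in {a.lower() for a in allowlist}:
            return "skip-not-allowlisted"
        return "triage"  # allowlisted: internal/external split is bypassed
    # Empty allowlist: normal public-rollout behavior.
    if is_internal:
        return "skip-internal-author"
    return "triage"
```

Passing `allowlist=frozenset()` in tests exercises the internal/external paths without editing the constant.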

* feat(agent-shin): tighten QA-proof and issue rubrics, ack reconsider with reactions

Reorder the end-to-end QA proof options to video, then screenshots, then
exact commands with their real output across the PR template, the LLM judge
prompts, and every contributor-facing comment, and spell out that mocked or
stubbed runs (including pytest on the repo's own unit tests, which mock the
provider, DB, and network) never count as proof. QA proof is now required of
all contributors, not just external ones.

Tighten the issue bug-report rubric to require end-to-end evidence of the bug
(the "before" half: a video, screenshot, or command paired with real output)
plus expected vs. actual behavior, drop the bias toward PASS, and collapse the
separate has_repro/has_proof flags into a single has_repro signal.

Standardize the bullet-train emoji and strip em dashes from the bot's
public-facing messages, and route issue recovery through @agent-shin
reconsider since GitHub doesn't let OSS authors reopen an issue a bot closed.

Acknowledge an @agent-shin reconsider the moment it's accepted with an eyes
reaction and a thumbs-up once the run finishes, both gated on
AGENT_SHIN_ENABLED so dry-run leaves no trace.

* fix(agent-shin): shorten auto-close grace to 2 hours and drop the instant-close bypass

Two dogfooding changes to the Agent Shin grace window. First, the warn-then-close
grace (GRACE_PERIOD_SECONDS) drops from a day to 2 hours so the "fix it before it
closes" loop can be exercised in one sitting; the constant carries a note to bump
it back up for the public rollout.

Second, remove IMMEDIATE_CLOSE_LOGINS entirely. SwiftWinds (the external dogfood
account) used to skip the grace window and close on first detection, which also
meant closing real PRs even during a scheduled dry run because the per-PR
override flipped dry_run off. It now follows the same warn-then-close path as
every other author, so a low-quality PR is warned first and only closed once the
2-hour window elapses. This also closes the Greptile finding that the sweep could
mutate real PRs while AGENT_SHIN_ENABLED was still off.

The review gate's separate age-based grace (DEFAULT_GRACE_DAYS) is left unchanged.

Regression tests pin that SwiftWinds now warns-grace instead of closing instantly,
and that a dry-run sweep over a closeable PR reports "would close" without making
any GitHub mutation.

* fix(agent-shin): gate reconsider reopen on an Agent Shin close marker

was_closed_by_agent_shin only checked that the most recent close actor was
the bot identity. That identity defaults to github-actions[bot], which is
shared by every workflow in the repo (stale/duplicate sweeps included), so a
contributor could @agent-shin reconsider an item another workflow closed and,
if the description passed the rubric, get it reopened even though Agent Shin
was never the closer.

Require a second, Agent-Shin-specific signal alongside the actor check: an
auto-close comment stamped with a hidden AGENT_SHIN_CLOSE_MARKER. Both close
paths (the grace-period close and the review-gate close) flow through
format_pr_close_comment / format_issue_close_comment, so stamping the marker
there covers every real close while leaving the grace warnings unmarked. The
guard stays fail-closed: no marker, no reopen.

This also replaces the unused AGENT_SHIN_AUTO_CLOSE_MARKER constant (a visible
phrase the guard never consulted) with the hidden marker the guard now relies
on.

* fix(agent-shin): stamp close marker on sweep closes and disclose regression deadline

The daily Greptile sweep's close comment advertised `@agent-shin reconsider`
but never stamped AGENT_SHIN_CLOSE_MARKER, so the reconsider reopen guard
(was_closed_by_agent_shin), which now also requires that marker, silently
rejected every sweep-closed PR with `skip-not-bot-closed`. Move the marker into
agent_shin_shared so both close paths share one source of truth, extract
format_close_comment so the sweep close comment is unit-testable, and stamp the
marker there.

Also disclose the grace_days deadline in the review-gate regression comment; it
promised "the PR stays open" without mentioning that a still-failing PR is
auto-closed grace_days after the notice, which would surprise contributors with
a close they were never warned about.

* fix(triage): tighten Agent Shin reconsider reopen guards

The bot-closed guard accepted any historical Agent Shin marker comment
on the thread as proof that Agent Shin owned the latest close, so a
post-reopen close by another workflow under the shared
`github-actions[bot]` identity could still satisfy the gate and let
`@agent-shin reconsider` reopen a PR that Agent Shin did not close
this cycle. `fetch_last_close_event` now also returns the latest
`closed` event timestamp, and `was_closed_by_agent_shin` requires
the most recent Agent Shin marker comment to sit at (or just before)
that timestamp, with a small skew window for clock drift between the
events and comments APIs.
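
The tightened guard can be sketched as a timestamp comparison. This is illustrative only: the real signature of was_closed_by_agent_shin and the size of its skew window are not given here, so the parameters and the two-minute CLOCK_SKEW below are assumptions. It assumes the caller has already extracted the latest `closed` event time and the times of Agent Shin marker comments.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Assumed tolerance for drift between the events and comments APIs.
CLOCK_SKEW = timedelta(minutes=2)


def was_closed_by_agent_shin(
    last_closed_at: Optional[datetime],
    marker_comment_times: Sequence[datetime],
) -> bool:
    """True only if the newest marker comment lines up with the newest close."""
    if last_closed_at is None or not marker_comment_times:
        return False  # fail closed: no close event or no marker means no reopen
    latest_marker = max(marker_comment_times)
    # A marker from an earlier close cycle is far older than the latest close,
    # so it falls outside the window. The window is symmetric to absorb drift
    # in either direction between the two APIs.
    return abs(last_closed_at - latest_marker) <= CLOCK_SKEW
```

A historical marker alone no longer suffices: after a reopen and a later close by another `github-actions[bot]` workflow, the newest marker is days older than the newest close and the guard rejects it.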

In the same path the LLM verdict check used `decision != "fail"` to
choose the reopen branch, which treated a missing, empty, or misspelled
verdict as a pass. Reopen is destructive, so the check now requires an
explicit `decision == "pass"`, and ambiguous verdicts fall through
to the "still failing" branch instead.
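
The difference between the two checks is easy to see side by side. Both helper names are hypothetical, and the verdict is assumed to be a parsed dict with a "decision" key.

```python
def old_should_reopen(verdict: dict) -> bool:
    """Previous behaviour: anything other than "fail" reopened the PR."""
    return verdict.get("decision") != "fail"


def should_reopen(verdict: dict) -> bool:
    """Hardened behaviour: only an explicit "pass" reopens the PR."""
    return verdict.get("decision") == "pass"
```

With an empty or malformed LLM response, `old_should_reopen({})` returns True while `should_reopen({})` returns False, which is the fail-closed direction for a destructive action.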

* style(agent-shin): black-format reconsider guard hardening

* docs(agent-shin): scope dry-run wrapper docstring to the single existing helper

The module docstring claimed it wrapped every Agent Shin mutation and
referenced post_comment/close_pr/etc., but only maybe_post_comment exists.
Describe the single helper accurately while keeping the dry-run pattern
guidance for any future wrapper.
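
A sketch of the dry-run wrapper pattern the docstring describes. The real maybe_post_comment signature is not shown here, so this version, which takes the posting function as a parameter, is an assumption chosen to keep the example self-contained.

```python
import logging
from typing import Callable

log = logging.getLogger("agent_shin")


def maybe_post_comment(
    post: Callable[[int, str], None], number: int, body: str, *, dry_run: bool
) -> bool:
    """Post a comment unless dry_run is set; return True if a write happened."""
    if dry_run:
        # Log what would have been written instead of calling the API.
        log.info("[dry-run] would comment on #%d: %s", number, body[:80])
        return False
    post(number, body)
    return True
```

Any future wrapper (close, reopen) can follow the same shape: a keyword-only `dry_run` flag and a single write call behind it.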

* chore(agent-shin): defer issue/PR template changes to the rollout PR

The triage and review-gate automation is gated to the allowlisted authors
(mateo-berri, SwiftWinds) and AGENT_SHIN_ENABLED, so during this rollout it
only acts on internal PRs/issues. The issue and PR templates have no such
gate; they change for every contributor on merge and advertise that an LLM
bot auto-closes external submissions, which won't happen while the allowlist
is the sole author gate. Revert bug_report.yml, feature_request.yml, and
pull_request_template.md to base so the public-facing messaging lands with
the rollout flip instead of ahead of it. The scripts embed their own rubric
and never read these files, so triage behavior is unchanged.

* ci(agent-shin): hash-pin the openai install in privileged triage workflows

The triage workflows install the OpenAI client with `pip install
"openai>=1.40.0"`, a floating lower bound that resolves openai and its
whole transitive tree to whatever PyPI serves at run time. These jobs run
under pull_request_target with a write-scoped GITHUB_TOKEN, and the
install plus the triage run happen on every PR open regardless of the
AGENT_SHIN_ENABLED dry-run gate (that gate only withholds the LLM key and
the destructive --close path), so a compromised release would execute
during install or import while the token is in scope.

Install instead from a new .github/scripts/triage-requirements.txt that
pins openai==2.33.0 and every transitive dependency to an exact version
with sha256 hashes, via pip --require-hashes. The workflows already
sparse-checkout .github/scripts from the base repo (never fork code), so
the pinned file is trusted. Add static guardrails to
test_github_triage_workflows.py that fail if any installer workflow
reverts to a floating openai install or if the requirements file loses
its exact pins or hashes.
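
The requirements-file half of those guardrails could look roughly like the following. It is a hypothetical check, not the actual test in test_github_triage_workflows.py; it assumes the file contains only `name==version` lines with `--hash=sha256:` options, possibly continued with trailing backslashes.

```python
import re
from typing import List

# Exact pin: name, optional extras, "==", then a version with no wildcard.
PIN_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[^\s;*]+(?:\s|;|$)")
HASH_RE = re.compile(r"--hash=sha256:[0-9a-f]{64}")


def check_requirements(text: str) -> List[str]:
    """Return problems; an empty list means every line is pinned and hashed."""
    problems = []
    # Join backslash-continued lines into one logical requirement each.
    logical = text.replace("\\\n", " ")
    for line in logical.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.split()[0]
        if not PIN_RE.match(line):
            problems.append(f"not exactly pinned: {name}")
        if not HASH_RE.search(line):
            problems.append(f"missing sha256 hash: {name}")
    return problems
```

A floating spec such as `openai>=1.40.0` fails both checks, and a bare `openai==2.33.0` fails the hash check, which mirrors what `pip --require-hashes` would refuse at install time.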

* ci(agent-shin): gate rollout heads-up real run behind manual dispatch

The rollout heads-up workflow fired its real `--close` sweep on every push
to litellm_internal_staging that touched the script, and exposed
OPENAI_API_KEY unconditionally, unlike every sibling triage workflow which
only exposes the key on an enabled or dispatched run. That made merging the
script post real heads-up comments (bounded only by the dogfood allowlist),
which contradicts the inert-by-default safety invariant; once the allowlist
is cleared for the public rollout, any later edit to the file would sweep
the whole open backlog with real writes.

The heads-up cannot be gated on AGENT_SHIN_ENABLED: its whole job is to warn
contributors before that flag flips on, so it has to run while the flag is
still off. Instead the automatic push trigger now stays dry-run, and the
real one-shot sweep is a deliberate manual workflow_dispatch with
dry_run=false, the sole path that adds `--close`. OPENAI_API_KEY is exposed
only on that dispatch, matching the sibling workflows.
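
The workflow itself expresses this gate in YAML `if:` expressions; the Python below only models the decision table so the fail-closed rule is explicit. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def is_real_run(event_name: str, dry_run_input: Optional[str]) -> bool:
    # Fail closed: only a manual dispatch whose dry_run input is exactly the
    # string "false" is a real run. "False", "no", "", or a missing input
    # all stay dry-run.
    return event_name == "workflow_dispatch" and dry_run_input == "false"


def extra_flags(event_name: str, dry_run_input: Optional[str]) -> List[str]:
    """The destructive --close flag is added only on a real run."""
    return ["--close"] if is_real_run(event_name, dry_run_input) else []


def exposed_env(event_name: str, dry_run_input: Optional[str], api_key: str) -> Dict[str, str]:
    """The OpenAI key is exposed only on a real run."""
    return {"OPENAI_API_KEY": api_key} if is_real_run(event_name, dry_run_input) else {}
```

Routing both the flag and the secret through one predicate keeps them from drifting apart: a push can never get `--close` or the key.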

Add static guardrails that fail if the push path regains a `--close`, if the
dispatch gate stops fail-closing on the exact string "false", or if the key
is exposed unconditionally again.

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>