Agent Shin rollout: day-7 enactment sweep + flip AGENT_SHIN_ENABLED to live-by-default #28834
mateo-berri wants to merge 4 commits into
…-by-default

This is the second-step PR of the Agent Shin rollout: it merges 7 days after the heads-up (#28759 + this branch) and turns the bot on for real. Two things happen on merge:

1. A one-shot enactment sweep runs over every open external PR/issue, driving each through the steady-state triage logic with --close=true. PRs that fixed their description in the grace week get tagged `ready for review`; PRs/issues still failing the rubric get the standard 24h grace warning (or are closed, if they already had one and 24h have elapsed). The sweep uses the same `_agent_shin_actions` dry-run wrappers as the heads-up, so a single boolean toggles real vs. log.

2. The five existing triage workflows (triage_pr_with_llm, triage_issue_with_llm, close_low_quality_prs, review_gate, triage_reconsider) flip from "dry-run unless AGENT_SHIN_ENABLED=true" to "live unless AGENT_SHIN_ENABLED=false". The variable becomes a kill switch instead of an opt-in. Default semantics: unset means live.

Files
-----

.github/scripts/triage_rollout_enact.py: the enactment sweep.
  * Calls review_gate(close=False, now=current_time) for PRs and triage(close=False) (wrapped in `_fake_now(current_time)`) for issues, then routes the verdict through the matching maybe_* wrapper. Two single-page dispatch tables (_apply_pr_result, _apply_issue_result) make it easy to audit which actions map to which mutations.
  * Time-travel dry-run: --simulate-future-hours N (default 24h+1s when --close is not set) shifts the clock forward N hours so you can preview what the next daily cron will do. Implemented in exactly two places: review_gate's `now=` parameter for PRs, and a `_fake_now` context manager that patches `agent_shin_shared.dt.datetime.now` for issues. The context manager restores the original module on exit (and on exception).
  * --simulate-now ISO_TS pins the clock to a specific timestamp.
  * --close forbids the simulate flags so a real run is always at wall-clock time.

.github/workflows/triage_rollout_enact.yml: thin wrapper. Fires --close on push to litellm_internal_staging (the enactment merge) and offers a workflow_dispatch with dry_run + simulate_future_hours inputs for safe re-runs.

.github/workflows/{triage_pr_with_llm,triage_issue_with_llm,close_low_quality_prs,review_gate,triage_reconsider}.yml: inverted.
  * `${AGENT_SHIN_ENABLED:-true}` (default live)
  * Conditional: `= "false"` enters kill-switch branch
  * OPENAI_API_KEY env: exposed unless `vars.AGENT_SHIN_ENABLED == 'false'`
  Comments updated to call out the new kill-switch semantics.

tests/test_litellm/test_github_triage_workflows.py: split the destructive-gate constant into PER_RUN_GATE_ENV (per-input gates that must still match `= "true"`) and KILL_SWITCH_WORKFLOWS (all five workflows, must match the inverted `= "false"` / `!= "false"` pattern). Updates the kill-switch test wording to describe the new semantics.

tests/test_litellm/test_triage_rollout_enact.py: 22 tests covering:
  * _fake_now patches and restores (including on exception)
  * Each branch of _apply_pr_result / _apply_issue_result with a recorder that captures the maybe_* call sequence and dry_run flag
  * _process_one skip-not-open / skip-internal-author
  * run() end-to-end: dry-run threads dry_run=True through every wrapper, real run threads False, current_time threads through to the per-item evaluators, --kind / --only-numbers restrict scope.

Local preview commands
----------------------

    # What this script would do right now (no GitHub writes):
    python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm \
        --simulate-future-hours 0

    # What the next daily cron will do (24h ahead; omit the flag for the
    # default 24h+1s offset):
    python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm \
        --simulate-future-hours 24

    # Pin to a specific moment:
    python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm \
        --simulate-now '2026-06-02T09:00:00Z'

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
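The inverted gate described in item 2 can be sketched in Python. This is only an illustration of the semantics, not code from the PR: `is_live` and `should_close` are hypothetical names, and the empty-string handling is an assumption that mirrors the shell idiom `${AGENT_SHIN_ENABLED:-true}`, which falls back to the default when the variable is unset or empty.

```python
from __future__ import annotations


def is_live(agent_shin_enabled: str | None) -> bool:
    """Kill-switch semantics: only the exact string "false" disables the bot.

    Unset or empty falls back to "true", and any other value (including
    typos such as "False") leaves the bot live.
    """
    value = agent_shin_enabled if agent_shin_enabled else "true"
    return value != "false"


def should_close(agent_shin_enabled: str | None, dispatch_close: str | None) -> bool:
    """Per-run gate layered on top of the kill switch (hypothetical helper).

    The per-input gate still has to be the exact string "true".
    """
    return is_live(agent_shin_enabled) and (dispatch_close or "false") == "true"
```

Under these assumptions a misspelled value such as `"False"` leaves the bot live, which is the trade-off of a kill switch compared with the old opt-in gate.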
Greptile Summary

This PR completes the Agent Shin rollout by adding a one-shot day-7 enactment sweep script/workflow and inverting

Confidence Score: 3/5

The enactment sweep can re-fire as a real run on any future push to litellm_internal_staging that touches the script, leaving the one-shot sweep able to re-close PRs/issues unexpectedly.

The core logic, dry-run wrappers, and kill-switch inversion are correct and well-tested. The push trigger in triage_rollout_enact.yml fires with --close on any future push touching the script, not only the enactment merge; a subsequent bug fix would silently re-run the full sweep. Stale comments in the two triage workflow files are a secondary readability issue with no runtime impact.

.github/workflows/triage_rollout_enact.yml needs a second look on its push trigger semantics; .github/workflows/triage_issue_with_llm.yml and .github/workflows/triage_pr_with_llm.yml have stale comment blocks that should be removed.

| Filename | Overview |
|---|---|
| .github/scripts/triage_rollout_enact.py | New one-shot enactment sweep script; logic and dry-run wrappers are well-structured, but contains dead code (_FakeDt class) and an unused constant (_DEFAULT_FUTURE_SECONDS) |
| .github/workflows/triage_rollout_enact.yml | New one-shot workflow with a push trigger that fires a real enactment sweep on any future push to litellm_internal_staging touching the script, not just the enactment merge |
| .github/workflows/triage_issue_with_llm.yml | Kill-switch inversion applied correctly, but the old opt-in comment block (lines 69-75) was not removed and directly contradicts the new != 'false' pattern |
| .github/workflows/triage_pr_with_llm.yml | Same stale comment issue as triage_issue_with_llm.yml; kill-switch logic itself is correct |
| tests/test_litellm/test_triage_rollout_enact.py | 22 new unit tests with full stubbing; covers _fake_now, dispatch tables, skip cases, and end-to-end sweep loop - no real network calls |
| tests/test_litellm/test_github_triage_workflows.py | Correctly splits DESTRUCTIVE_GATE_ENV into PER_RUN_GATE_ENV and KILL_SWITCH_WORKFLOWS; updated accepted_patterns match the new kill-switch shell idioms |
| .github/workflows/triage_reconsider.yml | Kill-switch inversion applied cleanly; no stale comments |
| .github/workflows/review_gate.yml | Kill-switch inversion applied cleanly |
| .github/workflows/close_low_quality_prs.yml | Kill-switch inversion applied cleanly |
Comments Outside Diff (2)

- .github/workflows/triage_issue_with_llm.yml, line 69-78: The first comment block (lines 69–75) is stale: it was written to justify the old `= "true"` opt-in gate and explicitly warns that `!= "false"` would enable closures on typos. The very next line now uses `!= "false"`, so the surviving comment directly contradicts the implementation and would mislead anyone auditing the security model. The same stale block is present in `triage_pr_with_llm.yml` lines 82–88.
- .github/workflows/triage_pr_with_llm.yml, line 82-91: Same stale comment as in `triage_issue_with_llm.yml`: the block originally justified the old `= "true"` opt-in gate and explicitly warns that `!= "false"` would enable closures on typos, but line 92 now uses `!= "false"`. The old warning survives and directly contradicts the new implementation.
Reviews (1): Last reviewed commit: "feat(triage): day-7 enactment sweep + fl..."
    on:
      push:
        branches:
          - litellm_internal_staging
        paths:
          - ".github/scripts/triage_rollout_enact.py"
Re-fire risk on future script edits
The push trigger fires — with --close in the real-run branch — on any push to litellm_internal_staging that touches triage_rollout_enact.py, not just the one-shot enactment merge. If a subsequent bug fix or refactor to the script is ever pushed to that branch, a second real enactment sweep will run and close PRs/issues that have already been processed. Consider removing the push trigger entirely after the enactment merge, or adding a sentinel file (e.g., a .enacted marker) so the workflow is a no-op on re-trigger.
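One way to picture the sentinel idea from this comment is the sketch below. It is an assumption-laden illustration rather than part of the PR: `run_once`, `sweep_already_enacted`, and the marker path are hypothetical, and because GitHub-hosted runner filesystems are ephemeral, the marker would have to be committed to the repo or persisted somewhere durable for the guard to work in CI.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def sweep_already_enacted(marker: Path) -> bool:
    """Return True when a previous real run left its marker behind."""
    return marker.is_file()


def run_once(marker: Path, sweep: Callable[[], None], *, commit_sha: str) -> str:
    """Run `sweep()` at most once per marker; later triggers are no-ops."""
    if sweep_already_enacted(marker):
        return "skipped-already-enacted"
    sweep()
    # Record which commit enacted the sweep so the skip is auditable.
    marker.write_text(commit_sha + "\n", encoding="utf-8")
    return "enacted"
```

Writing the marker only after `sweep()` returns means a sweep that crashes can be retried; whether a partial sweep should be retried is a separate design question this sketch does not settle.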
    class _FakeDt:
        """Drop-in replacement for ``datetime`` with a frozen ``now``."""

        timezone = real_dt.timezone
        datetime = real_dt.datetime

        @staticmethod
        def datetime_now(tz: dt.tzinfo | None = None) -> dt.datetime:
            return when if tz is None else when.astimezone(tz)

    # We only need to override `dt.datetime.now`. Easiest path: install a
The `_FakeDt` class is dead code: it is defined inside `_fake_now` but never assigned or used; `_DtShim` is what actually gets installed on `agent_shin_shared.dt`. Leaving it in place misleads readers into thinking it plays a role in the time-travel patch.
Suggested change:

    - class _FakeDt:
    -     """Drop-in replacement for ``datetime`` with a frozen ``now``."""
    -     timezone = real_dt.timezone
    -     datetime = real_dt.datetime
    -     @staticmethod
    -     def datetime_now(tz: dt.tzinfo | None = None) -> dt.datetime:
    -         return when if tz is None else when.astimezone(tz)
    - # We only need to override `dt.datetime.now`. Easiest path: install a
    + # We only need to override `dt.datetime.now`. Easiest path: install a
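For readers who want to see what the surviving shim approach amounts to, here is a self-contained sketch of a frozen-clock context manager. It is not the PR's `_fake_now`: `fake_now`, `FrozenDatetime`, and the in-memory `demo` module are hypothetical stand-ins, and a `SimpleNamespace` replaces the `_DtShim` class for brevity.

```python
import contextlib
import datetime as real_dt
import types
from typing import Iterator


@contextlib.contextmanager
def fake_now(module: types.ModuleType, when: real_dt.datetime) -> Iterator[None]:
    """Freeze `module.dt.datetime.now()` at `when` inside the with-block."""

    class FrozenDatetime(real_dt.datetime):
        @classmethod
        def now(cls, tz=None):
            return when if tz is None else when.astimezone(tz)

    # Stand-in for the `datetime` module: only the names the demo needs.
    shim = types.SimpleNamespace(
        datetime=FrozenDatetime,
        timezone=real_dt.timezone,
        timedelta=real_dt.timedelta,
    )
    original = module.dt
    module.dt = shim
    try:
        yield
    finally:
        # Restore the real alias on normal exit and on exception.
        module.dt = original


# Hypothetical module whose function reads the clock through its `dt` alias.
demo = types.ModuleType("demo_triage")
demo.dt = real_dt
exec(
    "def age_seconds(ts):\n"
    "    return (dt.datetime.now(dt.timezone.utc) - ts).total_seconds()\n",
    demo.__dict__,
)
```

Inside `with fake_now(demo, when):`, a call to `demo.age_seconds(ts)` measures age against `when`; once the block exits, `demo.dt` is the real `datetime` module again.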
    DEFAULT_SIMULATE_HOURS = 24
    _DEFAULT_FUTURE_SECONDS = DEFAULT_SIMULATE_HOURS * 3600 + 1
_DEFAULT_FUTURE_SECONDS is defined but never referenced anywhere in the module — main() computes the offset inline as DEFAULT_SIMULATE_HOURS + 1 / 3600. The named constant adds no value and its "SECONDS" suffix misleads readers given the surrounding code works in hours.
Suggested change:

    - DEFAULT_SIMULATE_HOURS = 24
    - _DEFAULT_FUTURE_SECONDS = DEFAULT_SIMULATE_HOURS * 3600 + 1
    + DEFAULT_SIMULATE_HOURS = 24
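The inline offset this comment refers to can be read as the following clock-resolution sketch. It paraphrases the logic in `main()` under stated assumptions: `resolve_clock` is a hypothetical helper name, the wall clock is passed in explicitly so the function is testable, and validation errors are raised as `ValueError` instead of going through `parser.error`.

```python
from __future__ import annotations

import datetime as dt

DEFAULT_SIMULATE_HOURS = 24


def resolve_clock(
    *,
    close: bool,
    simulate_future_hours: float | None,
    simulate_now: str | None,
    wall_clock: dt.datetime,
) -> dt.datetime:
    """Pick the timestamp the sweep should treat as "now"."""
    if close and (simulate_future_hours is not None or simulate_now is not None):
        raise ValueError("simulate flags are only valid in dry-run")
    if simulate_now is not None:
        pinned = dt.datetime.fromisoformat(simulate_now.replace("Z", "+00:00"))
        if pinned.tzinfo is None:
            raise ValueError("--simulate-now must include a timezone offset")
        return pinned
    if simulate_future_hours is not None:
        offset_hours = simulate_future_hours
    else:
        # Real runs use the wall clock; dry runs default to 24h + 1s ahead.
        offset_hours = 0 if close else DEFAULT_SIMULATE_HOURS + 1 / 3600
    return wall_clock + dt.timedelta(hours=offset_hours)
```

Expressing the extra second as `1 / 3600` of an hour keeps everything in hours, which is why a seconds-based constant has nothing to feed into.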
- _fake_now now patches triage_with_llm.dt (the alias actually used by the issue grace check in _seconds_since_latest_marker_comment) instead of agent_shin_shared.dt, which was a no-op for issues.
- Drop the dead _FakeDt inner class that was never installed.
- Drop unused _DEFAULT_FUTURE_SECONDS constant.
- Update tests to assert against triage_with_llm.dt.

Co-authored-by: Yassin Kortam <yassin@berri.ai>
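Why patching the other module's alias was a no-op can be shown with two in-memory modules. This is a hedged illustration: `shared_demo` and `triage_demo` are hypothetical stand-ins for `agent_shin_shared` and `triage_with_llm`, and it assumes both bind the clock with the equivalent of `import datetime as dt`.

```python
import datetime as real_dt
import types


def build_module(name: str, source: str) -> types.ModuleType:
    """Create an in-memory module that starts with `dt` bound to datetime."""
    module = types.ModuleType(name)
    module.dt = real_dt
    exec(source, module.__dict__)
    return module


shared_demo = build_module("shared_demo", "")
triage_demo = build_module(
    "triage_demo",
    "def clock_module():\n"
    "    return dt  # resolved in triage_demo's globals at call time\n",
)

fake_clock = object()

# Rebinding the alias on the other module leaves triage_demo untouched.
shared_demo.dt = fake_clock
patched_wrong_module = triage_demo.clock_module() is fake_clock

# Rebinding triage_demo's own alias is what its functions actually see.
triage_demo.dt = fake_clock
patched_right_module = triage_demo.clock_module() is fake_clock
```

Each module holds its own name binding for `dt`, so a patch only takes effect in the module whose functions look the name up.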
…Error Co-authored-by: Yassin Kortam <yassin@berri.ai>
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Unused imports of PR-specific comment formatters
  - Removed the unused `format_grace_warning_pr_comment` and `format_pr_close_comment` imports from the `triage_with_llm` import block.
Preview (5497394a19)
diff --git a/.github/scripts/triage_rollout_enact.py b/.github/scripts/triage_rollout_enact.py
new file mode 100644
--- /dev/null
+++ b/.github/scripts/triage_rollout_enact.py
@@ -1,0 +1,506 @@
+#!/usr/bin/env python3
+"""One-shot day-7 enactment sweep for the Agent Shin rollout.
+
+This runs once, on the merge commit of the enactment PR, exactly 7 days after
+the heads-up sweep. It walks every open external PR/issue and lets Agent
+Shin's steady-state logic decide what to do for each:
+
+  * PR passing the rubric -> tag `ready for review` (via `review_gate`)
+  * PR failing the rubric, has the heads-up marker, past 24h since the warning
+      -> close + post the standard close comment
+  * PR failing the rubric, no heads-up marker yet (created this week)
+      -> post the 24h grace warning (steady-state path)
+  * Issue passing -> no-op
+  * Issue failing past grace -> close + comment
+  * Issue failing in grace -> warn (or already-warned skip)
+  * Internal-author / closed -> skip
+
+The script doesn't replicate the rubric logic — it calls into the existing
+``review_gate`` and ``triage`` paths in dry-run mode, then routes their
+verdicts through the ``maybe_*`` wrappers so a single ``dry_run`` boolean
+toggles between logging and real GitHub mutations.
+
+Time-travel dry-run
+-------------------
+``--simulate-future-hours N`` (default 24h+1s when ``--close`` is not set and
+no explicit value is given) shifts the script's notion of "now" forward by N
+hours. This lets you preview what the *next* scheduled run will do: any PR
+currently in the grace window will tip into "past grace" after 24h, and the
+preview shows those would-close decisions before they actually fire.
+
+Time-travel has a single source of truth: ``main()`` computes one
+``current_time`` value and passes it to ``run()``, which threads it through
+``review_gate(now=)`` for PRs. For issues (whose grace check goes through
+``_seconds_since_latest_marker_comment``), we patch ``triage_with_llm``'s
+``dt.datetime.now`` under a context manager for the duration of each
+``triage()`` call: one small surface, easy to audit.
+
+CLI examples
+------------
+
+::
+
+    # Preview at the current clock (no GitHub writes):
+    python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm --simulate-future-hours 0
+
+    # Preview 24h ahead (omit the flag for the default 24h+1s offset):
+    python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm \\
+        --simulate-future-hours 24
+
+    # Real run (what the workflow does on merge):
+    python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm --close
+"""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import datetime as dt
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Any, Iterator
+
+_SCRIPTS_DIR = Path(__file__).resolve().parent
+if str(_SCRIPTS_DIR) not in sys.path:
+    sys.path.insert(0, str(_SCRIPTS_DIR))
+
+import triage_with_llm  # noqa: E402
+from _agent_shin_actions import (  # noqa: E402
+    maybe_add_label,
+    maybe_close_issue,
+    maybe_close_pr,
+    maybe_post_comment,
+    maybe_remove_label,
+)
+from triage_with_llm import (  # noqa: E402
+    DEFAULT_MODEL,
+    READY_FOR_REVIEW_LABEL,
+    fetch_issue,
+    fetch_pr,
+    format_grace_warning_issue_comment,
+    format_issue_close_comment,
+    gh,
+    is_internal_contributor,
+    review_gate,
+    triage,
+)
+
+# Default time-travel offset: the next daily cron runs 24h after this script's
+# real run, so 24h+1s gives a "what fires tomorrow" preview without any edge
+# cases at the boundary itself.
+DEFAULT_SIMULATE_HOURS = 24
+
+
+@contextlib.contextmanager
+def _fake_now(when: dt.datetime) -> Iterator[None]:
+    """Patch ``dt.datetime.now`` in triage_with_llm so issue triage's grace
+    check resolves against ``when`` rather than wall-clock time.
+
+    Patches ``triage_with_llm.dt`` (the module's local alias) rather than
+    the global ``datetime`` so we don't leak the override into unrelated code.
+    The issue grace check reads the wall clock inside
+    ``_seconds_since_latest_marker_comment`` via this module's ``dt`` alias,
+    so this is the only surface that needs to be frozen. The patch is scoped
+    to the ``with`` block: once it exits, the original ``dt`` module is
+    restored, so a per-item call to ``triage()`` is the only code that ever
+    sees the fake clock.
+    """
+    real_dt = triage_with_llm.dt
+
+    # We only need to override `dt.datetime.now`. Easiest path: install a
+    # tiny shim that proxies to the real `datetime` module for everything
+    # except `.now()`.
+    class _DtShim:
+        timezone = real_dt.timezone
+
+        class datetime(real_dt.datetime):  # noqa: N801 - mirror stdlib name
+            @classmethod
+            def now(cls, tz: dt.tzinfo | None = None) -> dt.datetime:
+                return when if tz is None else when.astimezone(tz)
+
+        # Pass everything else through to the real module.
+        def __getattr__(self, name: str) -> Any:  # pragma: no cover - shim
+            return getattr(real_dt, name)
+
+    triage_with_llm.dt = _DtShim()
+    try:
+        yield
+    finally:
+        triage_with_llm.dt = real_dt
+
+
+def _list_open_numbers(repo: str, kind: str) -> list[int]:
+    cmd = "pr" if kind == "pr" else "issue"
+    raw = gh(
+        cmd,
+        "list",
+        "--repo",
+        repo,
+        "--state",
+        "open",
+        "--limit",
+        "1000",
+        "--json",
+        "number",
+    )
+    return [item["number"] for item in json.loads(raw)]
+
+
+# ---------------------------------------------------------------------------
+# Per-PR / per-issue dispatch.
+#
+# Each helper takes the verdict-shaped result from review_gate / triage, then
+# routes it to the matching maybe_* wrapper. The wrappers each carry the
+# `dry_run` boolean, so a dry-run preview hits exactly the same code path as
+# the real run except for the final GitHub API call.
+
+
+def _apply_pr_result(*, repo: str, number: int, result: dict, dry_run: bool) -> dict:
+    """Translate a ``review_gate`` result into the matching GitHub mutation
+    (or a dry-run log line). Returns the augmented result with the action
+    taken (``"applied"``, ``"would-apply"``, or ``"noop"``)."""
+    action = result.get("action") or "unknown"
+    comment = result.get("comment")
+    base = {"kind": "pr", "number": number, "review_gate_action": action}
+
+    # `review_gate(close=False)` returns `would-*` strings for every
+    # transition; `review_gate(close=True)` returns the already-applied
+    # counterparts. The dispatcher below treats both forms identically so the
+    # enactment script can be driven in either mode (we always run it in
+    # close=False mode to capture the would-* preview, then re-apply the
+    # mutations through the dry-run wrappers).
+    if action in ("noop-passing", "skip-not-open", "skip-internal-author"):
+        return {**base, "result": "noop"}
+    if action in ("skip-no-llm-key", "skip-llm-error"):
+        return {
+            **base,
+            "result": "noop-llm-unavailable",
+            "error": result.get("error"),
+        }
+    if action in ("would-label-ready", "labeled-ready"):
+        assert comment, "review_gate must supply a comment for label-ready"
+        maybe_post_comment(repo, number, comment, dry_run=dry_run)
+        maybe_add_label(repo, number, READY_FOR_REVIEW_LABEL, dry_run=dry_run)
+        return {**base, "result": "labeled-ready"}
+    if action in ("would-remove-label", "label-removed-regressed"):
+        assert comment
+        maybe_remove_label(repo, number, READY_FOR_REVIEW_LABEL, dry_run=dry_run)
+        maybe_post_comment(repo, number, comment, dry_run=dry_run)
+        return {**base, "result": "label-removed-regressed"}
+    if action in ("would-close", "closed"):
+        assert comment
+        maybe_post_comment(repo, number, comment, dry_run=dry_run)
+        maybe_close_pr(repo, number, dry_run=dry_run)
+        return {**base, "result": "closed"}
+    if action in ("would-notify-within-grace", "within-grace-notified"):
+        assert comment
+        maybe_post_comment(repo, number, comment, dry_run=dry_run)
+        return {**base, "result": "warned-within-grace"}
+    if action in ("within-grace-already-notified", "regressed-already-notified"):
+        return {**base, "result": "noop-already-notified"}
+    # Anything else falls through as a no-op so an unexpected verdict from
+    # review_gate (e.g. a future action string) doesn't cause a partial write.
+    return {**base, "result": "noop-unknown-action"}
+
+
+def _apply_issue_result(*, repo: str, number: int, result: dict, dry_run: bool) -> dict:
+    """Translate a ``triage`` (kind='issue') result into the matching
+    mutation. Mirrors `_apply_pr_result` for the issue half of the flow."""
+    action = result.get("action") or "unknown"
+    verdict = result.get("verdict") or {}
+    base = {"kind": "issue", "number": number, "triage_action": action}
+
+    if action in (
+        "pass-llm",
+        "pass-linked-issue",
+        "skip-not-open",
+        "skip-internal-author",
+    ):
+        return {**base, "result": "noop"}
+    if action in ("skip-no-llm-key", "skip-llm-error"):
+        return {
+            **base,
+            "result": "noop-llm-unavailable",
+            "error": result.get("error"),
+        }
+    if action in ("would-warn-grace", "warned-grace"):
+        body = format_grace_warning_issue_comment(verdict)
+        maybe_post_comment(repo, number, body, dry_run=dry_run)
+        return {**base, "result": "warned-within-grace"}
+    if action in ("skip-in-grace-period",):
+        return {**base, "result": "noop-already-warned"}
+    if action in ("would-close", "closed"):
+        body = format_issue_close_comment(verdict)
+        maybe_post_comment(repo, number, body, dry_run=dry_run)
+        maybe_close_issue(repo, number, dry_run=dry_run)
+        return {**base, "result": "closed"}
+    return {**base, "result": "noop-unknown-action"}
+
+
+def _evaluate_pr(
+    *,
+    repo: str,
+    number: int,
+    model: str,
+    current_time: dt.datetime,
+    judge: Any = None,
+) -> dict:
+    """Run ``review_gate`` in preview mode against ``current_time``.
+
+    Always uses ``close=False`` so the underlying review_gate never mutates
+    GitHub directly; the enactment script is the single source of mutations
+    and routes everything through the dry-run wrappers.
+    """
+    return review_gate(
+        repo=repo,
+        number=number,
+        close=False,
+        model=model,
+        judge=judge,
+        now=current_time,
+    )
+
+
+def _evaluate_issue(
+    *,
+    repo: str,
+    number: int,
+    model: str,
+    current_time: dt.datetime,
+    judge: Any = None,
+) -> dict:
+    """Run ``triage(kind='issue')`` in preview mode against ``current_time``.
+
+    ``triage`` doesn't accept a ``now`` parameter, so the time-travel patch is
+    applied here (the only place issues touch the wall clock is the
+    grace-warning age check inside ``_seconds_since_latest_marker_comment``).
+    """
+    with _fake_now(current_time):
+        return triage(
+            repo=repo,
+            kind="issue",
+            number=number,
+            close=False,
+            model=model,
+            judge=judge,
+        )
+
+
+def _process_one(
+    *,
+    repo: str,
+    kind: str,
+    number: int,
+    model: str,
+    dry_run: bool,
+    current_time: dt.datetime,
+    judge: Any = None,
+) -> dict:
+    """Evaluate one PR/issue and apply the resulting mutation via the
+    maybe_* wrappers. Skip-cases (not-open, internal author, no key) short-
+    circuit before any LLM call."""
+    fetcher = fetch_pr if kind == "pr" else fetch_issue
+    item = fetcher(repo, number)
+
+    if (item.get("state") or "") != "open":
+        return {"kind": kind, "number": number, "result": "skip-not-open"}
+    if is_internal_contributor(item):
+        return {"kind": kind, "number": number, "result": "skip-internal-author"}
+
+    if kind == "pr":
+        result = _evaluate_pr(
+            repo=repo,
+            number=number,
+            model=model,
+            current_time=current_time,
+            judge=judge,
+        )
+        return _apply_pr_result(
+            repo=repo, number=number, result=result, dry_run=dry_run
+        )
+    result = _evaluate_issue(
+        repo=repo,
+        number=number,
+        model=model,
+        current_time=current_time,
+        judge=judge,
+    )
+    return _apply_issue_result(repo=repo, number=number, result=result, dry_run=dry_run)
+
+
+def _print_summary(results: list[dict], *, current_time: dt.datetime) -> None:
+    counts: dict[str, int] = {}
+    for r in results:
+        counts[r.get("result") or "unknown"] = (
+            counts.get(r.get("result") or "unknown", 0) + 1
+        )
+    print(f"\n=== enactment summary (clock={current_time.isoformat()}) ===")
+    for action in sorted(counts):
+        print(f"  {action:35s} {counts[action]}")
+    print(f"  total {len(results)}")
+
+
+def run(
+    *,
+    repo: str,
+    close: bool,
+    model: str,
+    current_time: dt.datetime,
+    kinds: tuple[str, ...] = ("pr", "issue"),
+    judge: Any = None,
+    only_numbers: dict[str, list[int]] | None = None,
+) -> list[dict]:
+    """Sweep ``repo`` and apply the enactment verdicts. Returns per-item results."""
+    dry_run = not close
+    mode_label = "DRY RUN" if dry_run else "REAL RUN"
+    print(
+        f"[{mode_label}] enactment sweep over {repo} at clock={current_time.isoformat()}"
+    )
+
+    results: list[dict] = []
+    for kind in kinds:
+        numbers = list((only_numbers or {}).get(kind, [])) or _list_open_numbers(
+            repo, kind
+        )
+        print(f"\n--- {kind}s: {len(numbers)} open ---")
+        for n in numbers:
+            try:
+                result = _process_one(
+                    repo=repo,
+                    kind=kind,
+                    number=n,
+                    model=model,
+                    dry_run=dry_run,
+                    current_time=current_time,
+                    judge=judge,
+                )
+            except Exception as exc:  # noqa: BLE001 - per-item errors don't abort
+                result = {
+                    "kind": kind,
+                    "number": n,
+                    "result": "error",
+                    "error": str(exc),
+                }
+                print(f"!! {kind}#{n}: {exc}", file=sys.stderr)
+            print(f"  {kind}#{n}: {result.get('result')}")
+            results.append(result)
+    _print_summary(results, current_time=current_time)
+    return results
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--repo", required=True, help="owner/repo")
+    parser.add_argument(
+        "--close",
+        action="store_true",
+        help=(
+            "Actually post comments and close PRs/issues. Without this flag "
+            "the script runs in dry-run mode and only logs what it would do."
+        ),
+    )
+    parser.add_argument(
+        "--simulate-future-hours",
+        type=float,
+        default=None,
+        help=(
+            "Dry-run only: pretend the wall clock is N hours in the future "
+            f"(default when --close is NOT set: {DEFAULT_SIMULATE_HOURS}h+1s, "
+            "so you preview exactly what the next daily run will do). Set to "
+            "0 to preview at the current clock instead."
+        ),
+    )
+    parser.add_argument(
+        "--simulate-now",
+        type=str,
+        default=None,
+        help=(
+            "Dry-run only: pin the wall clock to this ISO-8601 timestamp "
+            "(e.g. '2026-06-02T09:00:00Z'). Overrides --simulate-future-hours."
+        ),
+    )
+    parser.add_argument(
+        "--model",
+        default=os.environ.get("TRIAGE_MODEL") or DEFAULT_MODEL,
+        help=f"Model for the rubric LLM judge (default: {DEFAULT_MODEL}).",
+    )
+    parser.add_argument(
+        "--kind",
+        choices=("pr", "issue", "both"),
+        default="both",
+        help="Restrict the sweep to PRs or issues only (default: both).",
+    )
+    parser.add_argument(
+        "--only-pr",
+        type=int,
+        action="append",
+        default=[],
+        help="Limit the PR sweep to these PR numbers (repeat for several).",
+    )
+    parser.add_argument(
+        "--only-issue",
+        type=int,
+        action="append",
+        default=[],
+        help="Limit the issue sweep to these issue numbers (repeat for several).",
+    )
+    args = parser.parse_args()
+
+    if args.close and (
+        args.simulate_future_hours is not None or args.simulate_now is not None
+    ):
+        parser.error(
+            "--simulate-future-hours / --simulate-now are only valid in dry-run "
+            "(omit --close to preview a future clock)."
+        )
+
+    if args.close and not os.environ.get("OPENAI_API_KEY"):
+        parser.error("OPENAI_API_KEY must be set for --close (real-run) mode.")
+
+    # Resolve the script's notion of "now".
+    if args.simulate_now is not None:
+        current_time = dt.datetime.fromisoformat(
+            args.simulate_now.replace("Z", "+00:00")
+        )
+        if current_time.tzinfo is None:
+            parser.error(
+                "--simulate-now must include a timezone offset "
+                "(e.g. '2026-06-02T09:00:00Z' or '2026-06-02T09:00:00+00:00')."
+            )
+    else:
+        offset = (
+            args.simulate_future_hours
+            if args.simulate_future_hours is not None
+            else (0 if args.close else DEFAULT_SIMULATE_HOURS + 1 / 3600)
+        )
+        current_time = dt.datetime.now(dt.timezone.utc) + dt.timedelta(hours=offset)
+
+    kinds: tuple[str, ...]
+    if args.kind == "pr":
+        kinds = ("pr",)
+    elif args.kind == "issue":
+        kinds = ("issue",)
+    else:
+        kinds = ("pr", "issue")
+
+    only: dict[str, list[int]] = {}
+    if args.only_pr:
+        only["pr"] = args.only_pr
+    if args.only_issue:
+        only["issue"] = args.only_issue
+
+    run(
+        repo=args.repo,
+        close=args.close,
+        model=args.model,
+        current_time=current_time,
+        kinds=kinds,
+        only_numbers=only or None,
+    )
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/.github/workflows/close_low_quality_prs.yml b/.github/workflows/close_low_quality_prs.yml
--- a/.github/workflows/close_low_quality_prs.yml
+++ b/.github/workflows/close_low_quality_prs.yml
@@ -64,10 +64,10 @@
- name: Run low-quality PR closer
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- # Scheduled runs are ALWAYS dry-run, even when AGENT_SHIN_ENABLED is
- # "true", so the team can QA the closer's verdicts in step summaries
- # before any contributor sees a PR closed. Real closures only happen
- # on manual workflow_dispatch with close=true (and the variable set).
+ # Scheduled runs are ALWAYS dry-run, even after the rollout, so the
+ # team can QA the closer's verdicts in step summaries before any
+ # contributor sees a PR closed. Real closures only happen on manual
+ # workflow_dispatch with close=true (and the kill switch off).
CLOSE_FLAG: ${{ github.event.inputs.close || 'false' }}
AGENT_SHIN_ENABLED: ${{ vars.AGENT_SHIN_ENABLED }}
MIN_AGE_DAYS: ${{ github.event.inputs.min_age_days || '0' }}
@@ -81,12 +81,15 @@
--min-score "${MIN_SCORE}"
--limit "${LIMIT}"
)
- if [ "${AGENT_SHIN_ENABLED:-false}" != "true" ]; then
- echo "::notice::AGENT_SHIN_ENABLED is not 'true' -> forcing dry-run regardless of close input."
+ # Kill switch: AGENT_SHIN_ENABLED="false" forces dry-run. Default
+ # (unset / any other value) is "live", matching the post-enactment
+ # rollout state.
+ if [ "${AGENT_SHIN_ENABLED:-true}" = "false" ]; then
+ echo "::notice::Agent Shin kill switch is ON (AGENT_SHIN_ENABLED='false'). Forcing dry-run."
elif [ "${GITHUB_EVENT_NAME:-}" = "workflow_dispatch" ] && [ "${CLOSE_FLAG}" = "true" ]; then
ARGS+=(--close)
echo "::notice::Running in close-on-fail mode."
else
- echo "::notice::AGENT_SHIN_ENABLED is true but this trigger is dry-run (scheduled event or close=false)."
+ echo "::notice::Kill switch off but this trigger is dry-run (scheduled event or close=false)."
fi
python3 .github/scripts/close_low_quality_prs.py "${ARGS[@]}"
diff --git a/.github/workflows/review_gate.yml b/.github/workflows/review_gate.yml
--- a/.github/workflows/review_gate.yml
+++ b/.github/workflows/review_gate.yml
@@ -79,7 +79,9 @@
# enabled or a collaborator triggers it manually, so an external user
# can't force paid LLM calls by churning a fork PR while the bot is
# still in dry-run.
- OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED == 'true' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}
+ # Kill-switch semantics: only suppress the LLM key when the variable
+ # is literally "false". Unset / any other value -> bot is live.
+ OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED != 'false' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}
OPENAI_BASE_URL: ${{ vars.OPENAI_BASE_URL }}
TRIAGE_MODEL: ${{ vars.TRIAGE_MODEL }}
AGENT_SHIN_ENABLED: ${{ vars.AGENT_SHIN_ENABLED }}
@@ -92,20 +94,20 @@
set -euo pipefail
COMMON=(--review-gate --grace-days "${GRACE_DAYS}" --min-greptile-score "${MIN_GREPTILE_SCORE}")
- # Fail-safe gating, identical philosophy to the Greptile closer:
- # - AGENT_SHIN_ENABLED must be the EXACT string "true" to act at all.
- # - A manual dispatch can still preview with close=false.
- # - Automatic triggers (PR events, schedule) act once enabled — that
- # is the whole point of the gate (re-tag / un-tag automatically).
+ # Kill-switch gating:
+ # - AGENT_SHIN_ENABLED="false" (the exact string) forces dry-run.
+ # - Any other value, including unset, leaves the bot live.
+ # - A manual dispatch can still preview with close=false even
+ # when the bot is live.
DO_CLOSE="false"
- if [ "${AGENT_SHIN_ENABLED:-false}" != "true" ]; then
- echo "::notice::AGENT_SHIN_ENABLED is not 'true' -> dry-run (no labels/comments/closes)."
+ if [ "${AGENT_SHIN_ENABLED:-true}" = "false" ]; then
+ echo "::notice::Agent Shin kill switch is ON (AGENT_SHIN_ENABLED='false'). Forcing dry-run."
elif [ "${GITHUB_EVENT_NAME:-}" = "workflow_dispatch" ] && [ "${CLOSE_FLAG:-false}" = "true" ]; then
DO_CLOSE="true"
echo "::notice::Manual run -> acting for real."
elif [ "${GITHUB_EVENT_NAME:-}" != "workflow_dispatch" ]; then
DO_CLOSE="true"
- echo "::notice::Enabled automatic trigger (${GITHUB_EVENT_NAME:-}) -> acting for real."
+ echo "::notice::Automatic trigger (${GITHUB_EVENT_NAME:-}) -> acting for real (kill switch off)."
else
echo "::notice::Manual dispatch with close=false -> dry-run."
fi
diff --git a/.github/workflows/triage_issue_with_llm.yml b/.github/workflows/triage_issue_with_llm.yml
--- a/.github/workflows/triage_issue_with_llm.yml
+++ b/.github/workflows/triage_issue_with_llm.yml
@@ -2,9 +2,9 @@
# LLM-as-judge triage for external GitHub issues.
#
-# DRY-RUN BY DEFAULT. See .github/workflows/triage_pr_with_llm.yml for the
-# enablement procedure — same repo variable (`AGENT_SHIN_ENABLED=true`)
-# unlocks the PR and issue triage flows together.
+# LIVE BY DEFAULT. See .github/workflows/triage_pr_with_llm.yml for the
+# kill-switch procedure — same repo variable (`AGENT_SHIN_ENABLED="false"`)
+# forces the PR and issue triage flows back into dry-run together.
on:
issues:
@@ -15,7 +15,7 @@
description: "Issue number to triage manually."
required: true
close:
- description: "If true and AGENT_SHIN_ENABLED=true, actually close on fail."
+ description: "If true (and AGENT_SHIN_ENABLED != 'false'), actually close on fail."
required: false
default: "false"
type: choice
@@ -55,7 +55,9 @@
# The Python script calls the LLM whenever this var is set
# (regardless of `--close`); stripping `--close` doesn't suppress
# the API call, only the destructive side effects.
- OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED == 'true' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}
+ # Kill-switch semantics: only suppress the LLM key when the variable
+ # is literally "false". Unset / any other value -> bot is live.
+ OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED != 'false' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}
OPENAI_BASE_URL: ${{ vars.OPENAI_BASE_URL }}
TRIAGE_MODEL: ${{ vars.TRIAGE_MODEL }}
AGENT_SHIN_ENABLED: ${{ vars.AGENT_SHIN_ENABLED }}
@@ -71,13 +73,16 @@
# string, and a `!= "false"` check would treat "True", "yes",
# "1", "TRUE", typos, and accidental whitespace as enabling
# closure. Mirror the Greptile closer's `= "true"` pattern.
- if [ "${AGENT_SHIN_ENABLED:-false}" = "true" ] && [ "${DISPATCH_CLOSE:-false}" = "true" ]; then
+ # Kill switch: AGENT_SHIN_ENABLED="false" forces dry-run even when
+ # the dispatch input asks for close. Default (unset / any other
+ # value) is "live", matching the post-enactment rollout state.
+ if [ "${AGENT_SHIN_ENABLED:-true}" != "false" ] && [ "${DISPATCH_CLOSE:-false}" = "true" ]; then
ARGS+=(--close)
- echo "::notice::Agent Shin is ENABLED and running in close-on-fail mode."
- elif [ "${AGENT_SHIN_ENABLED:-false}" = "true" ]; then
- echo "::notice::Agent Shin is ENABLED but this trigger is dry-run (workflow_dispatch close != 'true')."
+ echo "::notice::Agent Shin is LIVE — running in close-on-fail mode."
+ elif [ "${AGENT_SHIN_ENABLED:-true}" != "false" ]; then
+ echo "::notice::Agent Shin is LIVE but this trigger is dry-run (workflow_dispatch close != 'true')."
else
- echo "::notice::Agent Shin is in DRY-RUN mode (AGENT_SHIN_ENABLED is not 'true'). No comments will be posted; no issues will be closed."
+ echo "::notice::Agent Shin kill switch is ON (AGENT_SHIN_ENABLED='false'). Forcing dry-run."
fi
# Automatic `issues` events stay dry-run regardless until the team
# explicitly invokes workflow_dispatch with close=true.
diff --git a/.github/workflows/triage_pr_with_llm.yml b/.github/workflows/triage_pr_with_llm.yml
--- a/.github/workflows/triage_pr_with_llm.yml
+++ b/.github/workflows/triage_pr_with_llm.yml
@@ -2,15 +2,15 @@
# LLM-as-judge triage for external pull requests.
#
-# DRY-RUN BY DEFAULT. Closures and public comments are gated on the repo
-# variable `AGENT_SHIN_ENABLED` being set to the string `"true"`. Until then,
-# every run only writes its verdict to the workflow step summary so the team
-# can QA the judge's decisions before flipping it on.
+# LIVE BY DEFAULT. Closures and public comments fire whenever the workflow is
+# triggered and the dispatch input asks for `close=true`. The repo variable
+# `AGENT_SHIN_ENABLED` acts as a kill switch — set it to the exact string
+# `"false"` (Settings > Secrets and variables > Actions > Variables) to force
+# every run back to dry-run. Any other value, including unset, leaves the bot
+# enabled.
#
-# To enable for real:
-# 1. Add a repo secret `OPENAI_API_KEY` (or compatible).
-# 2. Set repo variable `AGENT_SHIN_ENABLED` to `true`
-# (Settings > Secrets and variables > Actions > Variables).
+# Required setup:
+# - Repo secret `OPENAI_API_KEY` (or compatible) for the LLM judge.
#
# We use `pull_request_target` so the workflow has access to repo secrets
# and runs against PRs from forks. We never check out fork code — only read
@@ -25,7 +25,7 @@
description: "PR number to triage manually."
required: true
close:
- description: "If true and AGENT_SHIN_ENABLED=true, actually close on fail."
+ description: "If true (and AGENT_SHIN_ENABLED != 'false'), actually close on fail."
required: false
default: "false"
type: choice
@@ -66,7 +66,11 @@
# The Python script calls the LLM whenever this var is set
# (regardless of `--close`); stripping `--close` doesn't suppress
# the API call, only the destructive side effects.
- OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED == 'true' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}
+ # Kill-switch semantics: only suppress the LLM key when the variable
+ # is literally "false". Unset / any other value -> bot is live, key
+ # is exposed. Manual dispatch always gets the key so a collaborator
+ # can force-run even with the kill switch on.
+ OPENAI_API_KEY: ${{ (vars.AGENT_SHIN_ENABLED != 'false' || github.event_name == 'workflow_dispatch') && secrets.OPENAI_API_KEY || '' }}
OPENAI_BASE_URL: ${{ vars.OPENAI_BASE_URL }}
TRIAGE_MODEL: ${{ vars.TRIAGE_MODEL }}
AGENT_SHIN_ENABLED: ${{ vars.AGENT_SHIN_ENABLED }}
@@ -82,13 +86,16 @@
# string, and a `!= "false"` check would treat "True", "yes",
# "1", "TRUE", typos, and accidental whitespace as enabling
# closure. Mirror the Greptile closer's `= "true"` pattern.
- if [ "${AGENT_SHIN_ENABLED:-false}" = "true" ] && [ "${DISPATCH_CLOSE:-false}" = "true" ]; then
+ # Kill switch: AGENT_SHIN_ENABLED="false" forces dry-run even when
+ # the dispatch input asks for close. The default (unset / any other
+ # value) is "live", matching the post-enactment rollout state.
+ if [ "${AGENT_SHIN_ENABLED:-true}" != "false" ] && [ "${DISPATCH_CLOSE:-false}" = "true" ]; then
ARGS+=(--close)
- echo "::notice::Agent Shin is ENABLED and running in close-on-fail mode."
- elif [ "${AGENT_SHIN_ENABLED:-false}" = "true" ]; then
- echo "::notice::Agent Shin is ENABLED but this trigger is dry-run (workflow_dispatch close != 'true' or scheduled event)."
+ echo "::notice::Agent Shin is LIVE — running in close-on-fail mode."
+ elif [ "${AGENT_SHIN_ENABLED:-true}" != "false" ]; then
+ echo "::notice::Agent Shin is LIVE but this trigger is dry-run (workflow_dispatch close != 'true' or scheduled event)."
else
- echo "::notice::Agent Shin is in DRY-RUN mode (AGENT_SHIN_ENABLED is not 'true'). No comments will be posted; no PRs will be closed."
+ echo "::notice::Agent Shin kill switch is ON (AGENT_SHIN_ENABLED='false'). Forcing dry-run."
fi
# On the scheduled/automatic pull_request_target trigger we default to
# dry-run regardless, so the team can review verdicts in the step
diff --git a/.github/workflows/triage_reconsider.yml b/.github/workflows/triage_reconsider.yml
--- a/.github/workflows/triage_reconsider.yml
+++ b/.github/workflows/triage_reconsider.yml
@@ -15,11 +15,11 @@
# (which loses the original PR's history). The bot, on the other hand,
# has write access via GH_TOKEN and can reopen on their behalf.
#
-# DRY-RUN BY DEFAULT — gated on `vars.AGENT_SHIN_ENABLED == 'true'` just
-# like the other Agent Shin workflows. The workflow also gates on the
-# commenter being either the PR/issue author or an internal collaborator
-# (OWNER/MEMBER/COLLABORATOR) so random commenters cannot DOS the LLM
-# judge or force a reopen.
+# LIVE BY DEFAULT — disabled only when `vars.AGENT_SHIN_ENABLED == 'false'`
+# (the kill switch shared with the other Agent Shin workflows). The workflow
+# also gates on the commenter being either the PR/issue author or an internal
+# collaborator (OWNER/MEMBER/COLLABORATOR) so random commenters cannot DOS the
+# LLM judge or force a reopen.
on:
issue_comment:
@@ -108,23 +108,20 @@
ARGS=(--repo "${{ github.repository }}" --issue "${NUMBER}" --reconsider)
fi
# Reconsider's destructive actions (post comment + reopen) are
- # gated on `--close`, mirroring the regular triage workflows.
- # When AGENT_SHIN_ENABLED is not the EXACT string "true", we
- # still run the script so its verdict + would-X action lands in
- # the step summary for QA — but without `--close`, the script
- # returns `would-reopen` / `would-reconsider-still-failing`
- # instead of touching GitHub state.
+ # gated on `--close`. The kill switch is the shared
+ # AGENT_SHIN_ENABLED variable: setting it to the literal string
+ # "false" forces reconsider back to dry-run (the script still
+ # runs so the would-X verdict lands in the step summary for QA,
+ # but without `--close` no GitHub state changes).
#
- # Use the positive `= "true"` gate (not `!= "true" -> exit`) so
- # the workflow guardrails in
- # tests/test_litellm/test_github_triage_workflows.py see the
- # canonical fail-safe enable pattern. Unknown values like
- # "True", "yes", "1", or typos fall through to the dry-run
- # branch, which is the safe default.
- if [ "${AGENT_SHIN_ENABLED:-false}" = "true" ]; then
+ # Negative `!= "false"` against the `:-true` default keeps the
+ # kill-switch semantics symmetric with the other triage
+ # workflows — unknown values (typos, "True", "1") leave the bot
+ # live, matching the post-enactment live-by-default policy.
+ if [ "${AGENT_SHIN_ENABLED:-true}" != "false" ]; then
ARGS+=(--close)
- echo "::notice::Agent Shin reconsider ENABLED — running real triage (close=true)."
+ echo "::notice::Agent Shin reconsider is LIVE — running real triage (close=true)."
else
- echo "::notice::AGENT_SHIN_ENABLED is not 'true' -> reconsider stays in dry-run (no comment, no reopen)."
+ echo "::notice::Agent Shin kill switch is ON (AGENT_SHIN_ENABLED='false'). Reconsider stays in dry-run."
fi
python3 .github/scripts/triage_with_llm.py "${ARGS[@]}"
diff --git a/.github/workflows/triage_rollout_enact.yml b/.github/workflows/triage_rollout_enact.yml
new file mode 100644
--- /dev/null
+++ b/.github/workflows/triage_rollout_enact.yml
@@ -1,0 +1,84 @@
+name: Agent Shin — rollout enactment (one-shot)
+
+# Fires the day-7 enactment sweep: closes any open external PR/issue that's
+# still failing the rubric 7 days after the heads-up (because the contributor
+# didn't update the description), and applies steady-state actions
+# (ready-for-review tag, in-grace warnings) to everything else. From the
+# merge of this workflow onward, the existing daily/cron triage workflows
+# go live for real (the AGENT_SHIN_ENABLED gates are removed by the same PR).
+#
+# Thin shell over `.github/scripts/triage_rollout_enact.py`. The dry-run
+# preview path is the SAME code with --close stripped, so a local preview
+# (with --simulate-future-hours 24 to peek at tomorrow's cron run) is a
+# high-fidelity preview of what the workflow will actually do.
... diff truncated: showing 800 of 1362 lines
Reviewed by Cursor Bugbot for commit 164dbb7.
Co-authored-by: Yassin Kortam <yassin@berri.ai>

Merge this 7 days after #28759. This is the second step of the Agent Shin rollout: it turns the bot on for real.
What happens on merge
What this PR contains
Time-travel dry-run
This is the part the brief specifically asked for. The script's notion of "now" is a single `current_time` variable computed at the top of `run()`. In real mode it's wall-clock; in dry-run it defaults to `now + 24h + 1s` so you preview exactly what the next daily cron will do.
```bash
# What this script would do right now (no GitHub writes):
python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm
# What the next daily cron will do (24h+1s in the future):
python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm \
  --simulate-future-hours 24
# Pin to a specific moment for reproducibility:
python3 .github/scripts/triage_rollout_enact.py --repo BerriAI/litellm \
  --simulate-now '2026-06-02T09:00:00Z'
```
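How the flags above could resolve into the single `current_time` value can be sketched as follows. This is a minimal illustration, not the script's code: the helper name `resolve_current_time`, its parameters, and the precedence between the two simulate flags are assumptions, and the sketch treats `--simulate-future-hours N` as exactly N hours.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_current_time(
    *,
    close: bool,
    simulate_future_hours: Optional[float] = None,
    simulate_now: Optional[str] = None,
    wall_clock: Optional[datetime] = None,
) -> datetime:
    """Return the one 'now' the sweep should use (hypothetical helper)."""
    real_now = wall_clock or datetime.now(timezone.utc)
    if close:
        # A real run must happen at wall-clock time, so reject simulate flags.
        if simulate_future_hours is not None or simulate_now is not None:
            raise ValueError("--close cannot be combined with simulate flags")
        return real_now
    if simulate_now is not None:
        # Accept a trailing 'Z'; fromisoformat rejects it before Python 3.11.
        return datetime.fromisoformat(simulate_now.replace("Z", "+00:00"))
    if simulate_future_hours is not None:
        return real_now + timedelta(hours=simulate_future_hours)
    # Dry-run default: just past the next daily cron.
    return real_now + timedelta(hours=24, seconds=1)
```

Keeping the decision in one function means every later grace-period comparison reads the same timestamp, whichever flag produced it.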
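Time-travel is implemented in exactly two places (so it's easy to audit): `review_gate`'s `now=` parameter for PRs, and a `_fake_now` context manager that patches `agent_shin_shared.dt.datetime.now` for issues and restores the original on exit (and on exception).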
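The context-manager half of that mechanism might look roughly like the sketch below. It is an assumption-laden stand-in: `shared` plays the role of `agent_shin_shared`, and `hours_since` and `fake_now` are illustrative names. Because `datetime.datetime` is an immutable built-in type, the sketch swaps the module-level `dt` reference rather than assigning to `datetime.now` directly.

```python
import contextlib
import datetime as dt
import types

# Stand-in for the shared module, which reaches the clock through its own
# `dt` attribute (assumption based on the patched path named in the PR).
shared = types.SimpleNamespace(dt=dt)


def hours_since(ts: dt.datetime) -> float:
    """Example clock consumer that resolves `now` through `shared.dt`."""
    now = shared.dt.datetime.now(dt.timezone.utc)
    return (now - ts).total_seconds() / 3600


@contextlib.contextmanager
def fake_now(frozen: dt.datetime):
    """Make shared.dt.datetime.now() return `frozen` inside the block."""

    class _FrozenDateTime(dt.datetime):
        @classmethod
        def now(cls, tz=None):
            return frozen if tz is None else frozen.astimezone(tz)

    original = shared.dt
    # Replace the module reference with a namespace exposing the frozen class.
    shared.dt = types.SimpleNamespace(
        datetime=_FrozenDateTime, timedelta=dt.timedelta, timezone=dt.timezone
    )
    try:
        yield
    finally:
        shared.dt = original  # restored on normal exit and on exception
```

Scoping the patch to a `with` block keeps the simulated clock from leaking into code that runs after the issue sweep.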
Dry-run vs. real run: where the if/else lives
Every GitHub mutation goes through _agent_shin_actions (shipped in #28759). Each `maybe_*` helper is:
```python
def maybe_close_pr(repo, number, *, dry_run: bool) -> None:
    if dry_run:
        _log(f"[DRY RUN] close PR {repo}#{number}")
        return
    triage_with_llm.close_pr(repo, number)
```
So a dry-run preview differs from the real run in exactly one line per side effect. The workflow drops `--close` for dry-run and adds it for real; nothing else changes.
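To make the "one line per side effect" point concrete, here is a hedged sketch of how a verdict could be routed through such wrappers by a small dispatch table. The action names (`"ready"`, `"close"`), the `LOG` list, and `apply_pr_result` are hypothetical; the wrappers here only record what they would do instead of calling GitHub.

```python
from typing import Callable, Dict, List

LOG: List[str] = []  # stands in for the step-summary log


def maybe_add_label(repo: str, number: int, label: str, *, dry_run: bool) -> None:
    if dry_run:
        LOG.append(f"[DRY RUN] label {repo}#{number} with {label!r}")
        return
    LOG.append(f"labelled {repo}#{number} with {label!r}")  # real code would call GitHub


def maybe_close_pr(repo: str, number: int, *, dry_run: bool) -> None:
    if dry_run:
        LOG.append(f"[DRY RUN] close PR {repo}#{number}")
        return
    LOG.append(f"closed PR {repo}#{number}")  # real code would call GitHub


# Hypothetical verdict names; each maps to exactly one wrapper call.
PR_ACTIONS: Dict[str, Callable[[str, int, bool], None]] = {
    "ready": lambda repo, n, dry: maybe_add_label(repo, n, "ready for review", dry_run=dry),
    "close": lambda repo, n, dry: maybe_close_pr(repo, n, dry_run=dry),
}


def apply_pr_result(action: str, repo: str, number: int, *, close: bool) -> None:
    handler = PR_ACTIONS.get(action)
    if handler is None:
        raise ValueError(f"unknown action: {action}")
    handler(repo, number, not close)  # --close is the only real/dry-run switch
```

With the table on one screen, a reviewer can check each verdict against the mutation it triggers without tracing control flow.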
Test plan
Reviewer notes / kill-switch usage
If after merge you need to roll back: set `AGENT_SHIN_ENABLED` to the exact string `"false"` in repo Settings > Secrets and variables > Actions > Variables. The scheduled workflows (and any manual dispatch) will fall back to dry-run on the next run.
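The exact-string requirement matters because near-misses do not trip the switch. The sketch below mirrors the shell test `[ "${AGENT_SHIN_ENABLED:-true}" != "false" ]` from the workflow diffs in Python; the function name `is_live` is an assumption made for illustration.

```python
from typing import Optional


def is_live(agent_shin_enabled: Optional[str]) -> bool:
    """Mirror of `[ "${AGENT_SHIN_ENABLED:-true}" != "false" ]`.

    The shell `:-` expansion substitutes the default for both unset and
    empty values, so None and "" are treated as "true" here.
    """
    return (agent_shin_enabled or "true") != "false"


if __name__ == "__main__":
    for value in (None, "", "true", "false", "False", "false ", "0"):
        state = "live" if is_live(value) else "dry-run"
        print(f"{value!r:>10} -> {state}")
```

Only the literal `"false"` yields dry-run; `"False"`, `"0"`, or a value with trailing whitespace leave the bot live, so it is worth checking the variable's value character for character when rolling back.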
🤖 Generated with Claude Code
Note
High Risk
Merge enables real auto-comments, label changes, and PR/issue closures across open external items; incorrect gating or enactment logic could mass-close contributor work, though dry-run preview and `AGENT_SHIN_ENABLED=false` provide rollback paths.
Overview
This PR completes the Agent Shin rollout by running a one-shot day-7 enactment sweep and flipping automation from opt-in to live-by-default.
A new script `triage_rollout_enact.py` (plus `triage_rollout_enact.yml`) walks open external PRs and issues, evaluates them via existing `review_gate`/`triage` (preview only), then applies comments, labels, warnings, and closures through `maybe_*` helpers. `--close` performs real GitHub writes; dry-run defaults to simulating +24h so you can preview the next cron. Issue grace timing uses a scoped `_fake_now` patch; PRs pass `now=` into `review_gate`.
`AGENT_SHIN_ENABLED` is inverted everywhere it gates behavior: unset or any value except the literal `"false"` means the bot is live (including `review_gate` automatic triggers and reconsider). Setting `"false"` is the kill switch back to dry-run. `OPENAI_API_KEY` exposure in workflows follows the same rule. Per-run `close` inputs still require `= "true"` where applicable.
Workflow guardrail tests are updated (`PER_RUN_GATE_ENV` vs `KILL_SWITCH_WORKFLOWS`) and `test_triage_rollout_enact.py` covers dispatch, time travel, and sweep behavior.
Reviewed by Cursor Bugbot for commit 5497394.