feat(daemon): gate direct session shell behind explicit opt-in #5031
E2E / Verification Report
Local verification was run on macOS for the direct session shell permission-policy follow-up. Commands run:
- `cd packages/acp-bridge && npx vitest run src/bridge.test.ts`
- `cd packages/cli && npx vitest run src/serve/server.test.ts src/serve/acpHttp/transport.test.ts src/commands/serve.test.ts`
- `npm run build`
- `npm run typecheck`
- `git diff --check`

Results:
Manual audit coverage:
wenshao
left a comment
All three R1 findings addressed in this push:
- Critical, `server.test.ts` TS2367: `session_shell_command` added to `EXPECTED_REGISTERED_FEATURES` with full conditional predicate coverage. Fixed.
- Suggestion, `dispatch.ts` error code: `toRpcError` now maps shell policy errors to `RPC.INVALID_PARAMS` (-32602), consistent with JSON-RPC 2.0 semantics. All test assertions updated. Fixed.
- Suggestion, `dispatch.ts:1131` abort signal: `executeShellCommand` now forwards `binding.abort.signal` from the owned session binding. Test asserts the signal is wired and not pre-aborted. Fixed.
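To make the second finding concrete, here is a minimal sketch of mapping shell policy errors to the JSON-RPC 2.0 "invalid params" code. It is not the PR's actual `toRpcError`: the tagged `ShellPolicyError` shape, the `isShellPolicyError` guard, and the placement of `errorKind` under `data` are all assumptions made for illustration.

```typescript
// JSON-RPC 2.0 reserved error codes.
const RPC = { INVALID_PARAMS: -32602, INTERNAL_ERROR: -32603 } as const;

type ShellPolicyKind = "session_shell_disabled" | "client_id_required";

// Hypothetical tagged error shape; the PR's real error classes may differ.
interface ShellPolicyError {
  kind: "shell_policy";
  errorKind: ShellPolicyKind;
  message: string;
}

interface RpcError {
  code: number;
  message: string;
  data?: { errorKind: string }; // assumption: errorKind travels in `data`
}

function isShellPolicyError(err: unknown): err is ShellPolicyError {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { kind?: unknown }).kind === "shell_policy"
  );
}

// Policy rejections become INVALID_PARAMS; anything else is an internal error.
function toRpcError(err: unknown): RpcError {
  if (isShellPolicyError(err)) {
    return {
      code: RPC.INVALID_PARAMS,
      message: err.message,
      data: { errorKind: err.errorKind },
    };
  }
  const message = err instanceof Error ? err.message : String(err);
  return { code: RPC.INTERNAL_ERROR, message };
}
```

Keeping the machine-readable `errorKind` next to the numeric code lets a client tell a disabled capability apart from a missing client id without parsing the message text.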
Deterministic analysis: tsc 0 errors (after build), eslint 0 errors. Tests: 512/512 passed (transport.test.ts + server.test.ts).
Downgraded from Approve to Comment: CI still running.
— qwen3.7-max via Qwen Code /review
Local runtime verification on Linux (real daemon over HTTP): PASS ✅
I built this branch (…
Setup: Linux, fresh …
Policy matrix (all observed at runtime)
| State | `capabilities.features` includes `session_shell_command` | ACP `initialize` advertises `_qwen/session/shell` | `POST /session/:id/shell` |
|---|---|---|---|
| A: no token, no flag | ❌ (59 entries) | ❌ (40 methods) | `401 token_required` (strict gate fires first) |
| B: token, no flag | ❌ (59) | ❌ (40) | `403 session_shell_disabled`, never enters the bridge |
| C: token + `--enable-session-shell` | ✅ (60) | ✅ (41) | see gate chain below → executes |
| D: flag but no token | ❌ (59) | ❌ (40) | `401 token_required` (opt-in has no effect without auth) |
| Baseline `main`: no flag | ❌ | n/a | `200`, executes with no gating ← the gap this PR closes |
State A: default tokenless loopback
features includes session_shell_command: False | 59 total
ACP _qwen/session/shell advertised: False | 40 methods total
POST /session/<id>/shell (no token) → 401 {"code":"token_required"}
State B: authenticated, opt-in NOT set
features includes session_shell_command: False; ACP _qwen/session/shell: not advertised
POST shell (token, no client-id) → 403 {"code":"session_shell_disabled","errorKind":"session_shell_disabled"}
POST shell (token, with a bound client-id) → 403 session_shell_disabled ← disabled takes precedence over ownership (correct order)
Daemon log: 2 × `status=403 request completed`, 0 × `shell command completed` ← bridge entry never reached
State C: authenticated + --enable-session-shell (ownership chain)
features includes session_shell_command: True (60); ACP _qwen/session/shell: advertised (41 methods)
POST shell, no X-Qwen-Client-Id → 403 {"code":"client_id_required"}
POST shell, malformed client-id "bad id…" → 400 {"code":"invalid_client_id"} (format gate)
POST shell, well-formed but unbound client-id → 400 invalid_client_id "…is not registered for session f20fd677…"
POST shell, real client-id issued by another daemon → 400 invalid_client_id ← cross-session isolation
POST shell, bound client-id, empty command → 400 "`command` is required and must be a non-empty string"
POST shell, bound client-id, real command → 200 {"exitCode":0,"output":"gate-open\n","aborted":false}
Daemon log: `… clientId=client_6a95f11f… exitCode=0 shell command completed` ← bridge executed
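🔍 The ownership check is demonstrably session-scoped, not "anyone holding the token": `resolveTrustedClientId` runs `entry.clientIds.has(clientId)` against the session being addressed. To prove this at runtime (a single `--workspace` daemon attaches every `POST /session` to the same shared session, so two sessions cannot be assembled in one registry), I started a second daemon with the capability enabled in a second workspace, took a client-id it actually issued, and used it against the first daemon's session → `400 invalid_client_id`, while the id bound by the first daemon itself returned 200. A valid, server-issued client-id that belongs elsewhere cannot unlock a session it is not bound to.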
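The session-scoped check described above can be sketched as follows. Only the function name `resolveTrustedClientId`, the `entry.clientIds.has(clientId)` test, and the rejection message wording come from the report; the signature, the `SessionEntry` shape, and the result type are assumptions, and the real bridge raises `InvalidClientIdError` rather than returning a value.

```typescript
// Assumed registry entry: each session tracks the client ids bound to it.
interface SessionEntry {
  clientIds: Set<string>;
}

type Resolution =
  | { ok: true; clientId: string }
  | { ok: false; code: "session_not_found" | "invalid_client_id"; message: string };

// Hypothetical signature: look up the addressed session, then require that
// the presented client id is bound to that specific session.
function resolveTrustedClientId(
  sessions: Map<string, SessionEntry>,
  sessionId: string,
  clientId: string,
): Resolution {
  const entry = sessions.get(sessionId);
  if (entry === undefined) {
    return { ok: false, code: "session_not_found", message: `Unknown session ${sessionId}` };
  }
  if (!entry.clientIds.has(clientId)) {
    return {
      ok: false,
      code: "invalid_client_id",
      message: `Client id "${clientId}" is not registered for session ${sessionId}`,
    };
  }
  return { ok: true, clientId };
}
```

Because the lookup is keyed by the addressed session, a client id that is perfectly valid on another daemon or session fails the `has` test here, which is the isolation property the two-daemon experiment demonstrated.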
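Unplanned probes
- 🔍 Wrong bearer token against the enabled shell route → `401 Unauthorized` (the auth gate precedes the shell policy: defense in depth).
- 🔍 `--enable-session-shell` without `--token` (state D) → the capability stays hidden and REST still returns `401 token_required`. This confirms the effective-policy fold `enableSessionShell === true && token !== undefined`: opt-in alone has no effect.
- 🔍 Baseline `main` rejects the flag outright: `Unknown arguments: enable-session-shell, enableSessionShell` (yargs strict mode), so the flag is indeed new in this PR.
- 🔍 Malformed and unbound are two distinct paths: a format violation → REST `parseClientIdHeader` 400; well-formed but unregistered → bridge `InvalidClientIdError` 400 that names the session. Both surface as `invalid_client_id`.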
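Observations
- ✅ All three defense layers work as designed. I fully drove the REST route (every error branch plus the success path) and the ACP `initialize` advertisement in all four states. I did not drive the ACP dispatch path end to end (`_qwen/session/shell` JSON-RPC call → `session_shell_disabled` / `client_id_required`); that requires a full ACP connection plus the ownership SSE handshake. It shares the same bridge entry point I drove via REST and is covered by this PR's transport unit tests. Recorded for transparency, not a gap.
- The baseline comparison makes the motivation concrete: on `main`, a tokenless loopback daemon executes `POST /session/<id>/shell` with only a session id and no client id (`200 … "MAIN-EXECUTES-UNGATED\n"`). That is exactly the excess authority this PR removes.
- Environment note (not a bug): a single `--workspace` daemon attaches every `POST /session` (including `{"forceNew":true}`) to the same shared session, minting only a new client-id each time, all bound to that one session. A "two sessions on one daemon" cross-ownership test is therefore unreachable from a single daemon; the second-daemon approach above is the right way to verify.
- The error contract matches the design doc exactly: REST `401 token_required` / `403 session_shell_disabled` / `403 client_id_required` / `400 invalid_client_id`, with both `code` and `errorKind` populated for the two new 403s.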
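🧪 Local runtime verification (built `qwen serve` daemon in tmux + curl, real REST + ACP): ✅ PASS (full three-layer gating confirmed, including one adversarial bypass attempt)
Built the PR head (0daa475, 3 commits, about 38 behind `main`) and drove four real daemon configurations in tmux via `dist/cli.js`, exercising every surface the policy touches: REST, ACP JSON-RPC (over a real SSE connection), the capability registry, and the bridge ownership check. Every row below is a real request against a running daemon, not a unit test.
Setup: 4601 tokenless loopback · 4602 `--token` only (no flag) · 4603 `--token --enable-session-shell` · 4604 `--enable-session-shell` without a token. Temporary workspace, isolated tmux socket.
Advertise layer
| Surface | 4601 no-token | 4602 token, no flag | 4603 token + flag |
|---|---|---|---|
| `/capabilities.features` has `session_shell_command` | ❌ absent | ❌ absent | ✅ present |
| ACP `initialize` methods has `_qwen/session/shell` | ❌ (40 methods) | ❌ (40 methods) | ✅ (41 methods) |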
Both gated strictly on token-configured AND flag-set — exactly enableSessionShell === true && tokenConfigured.
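The advertise rule stated above can be illustrated with a small sketch. The predicate mirrors the quoted condition `enableSessionShell === true && tokenConfigured`; the `DaemonOptions` shape and the `advertise` helper are hypothetical names introduced here, not identifiers from the PR.

```typescript
// Assumed minimal view of the daemon configuration.
interface DaemonOptions {
  tokenConfigured: boolean;
  enableSessionShell: boolean;
}

// Effective policy: the opt-in only counts when a bearer token is configured.
function isSessionShellEnabled(opts: DaemonOptions): boolean {
  return opts.enableSessionShell === true && opts.tokenConfigured;
}

// Hypothetical helper: append the gated entry to a base list only when the
// effective policy is on. The same rule would apply to both the capability
// features list and the ACP method list.
function advertise(base: readonly string[], gated: string, opts: DaemonOptions): string[] {
  return isSessionShellEnabled(opts) ? [...base, gated] : [...base];
}
```

Deriving both advertised lists from one predicate keeps the capability registry and the ACP `initialize` response from drifting apart, which matches the 40 → 41 method change observed only on the token + flag daemon.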
Enforcement layer (the part that actually matters)
- ✅ No-token daemon → `401 token_required`. `POST /4601/session/<id>/shell` → `401 {code:"token_required"}`. 🔍 Even with a bogus `Authorization: Bearer anything` it's still 401: the strict mutation gate keys on the daemon having a token configured, not on the request carrying one. (This route went from `mutate()` to `mutate({strict:true})`, so no-token loopback shell is now refused. That is the intended hardening, and a real behavior change for that setup.)
- ✅ Token, no flag → `403 session_shell_disabled`, before the bridge. `POST /4602/...shell` with a valid bearer → `403 {code:"session_shell_disabled"}`. I sent a non-existent session id and got `session_shell_disabled`, not `session_not_found`, proving the disabled check fires before session lookup / bridge dispatch.
- ✅ Token + flag, no `X-Qwen-Client-Id` → `403 client_id_required`.
- ✅ Token + flag, session-bound client id → executes. Created a session (daemon-assigned clientId), then `POST /4603/.../shell` with that id → `200 {"exitCode":0,"output":"GATE-EXEC-OK-7777\n"}`. Real command, real bridge.
- ✅ Token + flag, unbound client id → `400 invalid_client_id`. `X-Qwen-Client-Id: cid.unbound999` → `Client id "cid.unbound999" is not registered for session <id>`: the bridge's ownership proof (`InvalidClientIdError`) rejects a caller who authenticated to the daemon but never owned the session. This is the core of the PR's "session ownership proof."
- ✅ ACP execution path works. Over a live ACP connection on 4603 (init → SSE → `session/new` → `_qwen/session/shell`) → `{"exitCode":0,"output":"ACP-EXEC-OK-5555\n"}`.
- ✅ 🔍 Adversarial: unadvertised method on the disabled daemon is still rejected. A client that ignores the advertise list and calls `_qwen/session/shell` on 4602 (where it's hidden) gets `{code:-32602, errorKind:"session_shell_disabled"}`. Advertise-hiding is defense-in-depth; the real gate is in the handler. This is the most important result: the security doesn't depend on clients respecting the capability list.
- ✅ Boot warning. Launching `--enable-session-shell` without a token (4604) emitted exactly: `qwen serve: --enable-session-shell ignored because no bearer token is configured. Set QWEN_SERVER_TOKEN or pass --token…` and the capability stayed off.
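The ordering of the REST checks observed above can be summarized in a sketch. The status codes and `code` strings are the ones reported in this thread; the `Daemon`, `ShellCall`, and `Outcome` types and the `checkShellGates` function are hypothetical, and the real daemon spreads these checks across middleware, the route handler, and the bridge.

```typescript
// Assumed minimal model of a daemon and an incoming shell request.
interface Daemon {
  token?: string;               // undefined means tokenless
  enableSessionShell: boolean;
  boundClientIds: Set<string>;  // client ids bound to the addressed session
}

interface ShellCall {
  bearer?: string;
  clientId?: string;
}

interface Outcome {
  status: number;
  code?: string;
}

// Gates run in this order; the first failing gate decides the response.
function checkShellGates(daemon: Daemon, call: ShellCall): Outcome {
  // 1. Strict mutation gate: keyed on the daemon's config, not the request.
  if (daemon.token === undefined) return { status: 401, code: "token_required" };
  // 2. Bearer auth (a real daemon would use a constant-time comparison).
  if (call.bearer !== daemon.token) return { status: 401 };
  // 3. Opt-in policy, checked before any session lookup.
  if (!daemon.enableSessionShell) return { status: 403, code: "session_shell_disabled" };
  // 4. Ownership proof: a client id must be present and bound to the session.
  if (call.clientId === undefined) return { status: 403, code: "client_id_required" };
  if (!daemon.boundClientIds.has(call.clientId)) return { status: 400, code: "invalid_client_id" };
  return { status: 200 };
}
```

Putting the policy check ahead of the ownership check is what makes a disabled daemon answer `session_shell_disabled` even when the caller presents a bound client id or a non-existent session.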
Probes (held)
- 🔍 Empty command (`" "`) with a bound client → `400` "`command` is required and must be a non-empty string".
- 🔍 Malformed client id (`bad id!!#`) → `400` (fails the `^[A-Za-z0-9._:-]+$` validation before any dispatch).
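The two input checks probed above can be sketched as plain validators. The regular expression is the one quoted in the probe; the helper names are hypothetical, and treating a whitespace-only command as empty is an assumption inferred from the `" "` probe being rejected.

```typescript
// Pattern quoted in the probe above.
const CLIENT_ID_PATTERN = /^[A-Za-z0-9._:-]+$/;

// Hypothetical helper: format check only, says nothing about session binding.
function isWellFormedClientId(value: string): boolean {
  return CLIENT_ID_PATTERN.test(value);
}

// Hypothetical helper: a command must be a string with non-whitespace content
// (assumption: the daemon trims before checking for emptiness).
function isNonEmptyCommand(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}
```

Note that a well-formed id such as `cid.unbound999` passes the format check and is only rejected later by the ownership check, which is why the two failures take different code paths while sharing the `invalid_client_id` code.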
Findings
- ⚠️ The session-bound client id is daemon-assigned, not caller-chosen. I sent `X-Qwen-Client-Id: cid.alpha` on session-create and the response came back bound to a generated `client_b24c…` instead. So a raw REST caller must read `clientId` from the create/load response and echo it back on the shell call; they cannot predict or pin it. The SDK's `DaemonSessionClient.shellCommand()` forwards it automatically (per the PR), so SDK users are fine; this is only a sharp edge for someone wiring the REST route by hand, and it'd be worth one line in the SDK docs' REST example.
- The "without reaching the bridge" claim (scenario B) is solidly true and I verified it behaviorally (disabled-error for a non-existent session, not not-found) rather than relying on logs. Note that the bridge's `executeShellCommand` debug line is debug-level and wouldn't appear on stdout anyway, so don't use log-absence as the proof.
- Env/scope: branch is ~38 commits behind `main`; a rebase before merge is advisable. ACP responses ride the connection SSE stream, and repeated bare `POST /acp` inits accumulate connections. That is a harness detail (I restarted daemons between ACP captures), not a product issue.
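For someone wiring the REST route by hand, the first finding boils down to echoing back the daemon-assigned id. The sketch below only builds the request and sends nothing. The route, the `X-Qwen-Client-Id` header, and the `clientId` response field come from this thread; the `sessionId` field name, the JSON body shape `{ command }`, the `Content-Type` header, and the base URL are assumptions.

```typescript
// Assumed shape of the relevant part of a session create/load response.
interface CreatedSession {
  sessionId: string; // assumption: actual field name may differ
  clientId: string;  // daemon-assigned; must be echoed back verbatim
}

interface PlannedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the shell call from the create response instead of a caller-chosen id.
function buildShellRequest(
  baseUrl: string,
  token: string,
  session: CreatedSession,
  command: string,
): PlannedRequest {
  return {
    url: `${baseUrl}/session/${encodeURIComponent(session.sessionId)}/shell`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
      "X-Qwen-Client-Id": session.clientId,
    },
    body: JSON.stringify({ command }),
  };
}
```

The returned object can be handed to any HTTP client; the point is that `X-Qwen-Client-Id` is taken from the daemon's response and never invented by the caller.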
Verdict (merge reference)
PASS. Every layer the PR claims to enforce holds at runtime: the capability and ACP method are advertised only under token+flag; REST returns token_required / session_shell_disabled / client_id_required in the right order and before the bridge; bound client ids execute and unbound ids are rejected by the bridge ownership check; the ACP path enforces identically; and — the part I most wanted to break — a client bypassing the advertise list still hits a handler-level session_shell_disabled. The boot warning fires. Nothing here blocks merging; the daemon-assigned-clientId note is a docs nicety, not a defect.
* feat(cli): gate direct session shell execution
* fix(cli): address session shell review feedback
* codex: address PR review feedback (#5031)
What this PR does
This PR turns direct session shell execution into an explicit daemon opt-in. `qwen serve --enable-session-shell` is now required, and the effective policy is only enabled when a bearer token is configured. REST, ACP `_qwen/session/shell`, and the bridge execution sink all enforce the same policy.
It also requires direct shell callers to provide a client id already bound to the target session. ACP calls use the bridge-stamped client id from the owned session binding, and SDK docs now call out the opt-in, bearer auth, and session-bound client id requirements.
For direct `createServeApp()` embedders, `token: ''` is treated as tokenless rather than configured. Strict mutation routes return `token_required`, and the session shell capability stays hidden until a non-empty token is supplied.
Passing `--enable-session-shell` without a token now emits a boot warning and leaves direct session shell disabled.
Why it's needed
`POST /session/:id/shell` bypasses the normal agent tool approval flow and executes directly through the daemon. Leaving that surface reachable by default, or reachable with only a daemon token plus a session id, gives too much authority to a high-risk endpoint. This PR makes the capability disabled by default and adds a session ownership proof before any command can execute.
Reviewer Test Plan
How to verify
- Start a default loopback daemon without a bearer token and confirm `/capabilities.features` does not include `session_shell_command`, ACP initialize does not advertise `_qwen/session/shell`, and REST shell returns `401 token_required`.
- Start an authenticated daemon without `--enable-session-shell` and confirm authenticated direct shell calls return `session_shell_disabled` without reaching the bridge.
- Start an authenticated daemon with `--enable-session-shell` and confirm shell calls without `X-Qwen-Client-Id` return `client_id_required`, while calls with a client id bound to the session reach the bridge and execute normally.
- For direct embedders, call `createServeApp({ token: '' })` and confirm it behaves as tokenless: strict shell mutation returns `token_required`, and `session_shell_command` is not advertised.
Evidence (Before & After)
N/A for UI. Local automated verification covered the REST, ACP, bridge, CLI argument, build, and typecheck paths.
Tested on
Environment (optional)
Local Node/npm workspace in the repository worktree.
Risk & Scope
`DaemonClient.shellCommand(sessionId, command)` calls must pass `opts.clientId` when the daemon has direct session shell enabled; `DaemonSessionClient.shellCommand()` continues to forward the session-bound client id automatically. Direct embedders that pass an empty string token should pass a non-empty bearer token or omit `token`; an empty string is not treated as a configured token.
Linked Issues
References #4490.
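The empty-token rule for embedders described in this PR can be sketched as a normalization step. This is an illustration under stated assumptions, not the code in `createServeApp()`: the `ServeAppOptions` shape and all three helper names are hypothetical.

```typescript
// Assumed subset of the options an embedder might pass.
interface ServeAppOptions {
  token?: string;
  enableSessionShell?: boolean;
}

// An empty string is treated the same as no token at all.
function configuredToken(opts: ServeAppOptions): string | undefined {
  return opts.token !== undefined && opts.token.length > 0 ? opts.token : undefined;
}

// Strict mutation routes refuse to run on a tokenless app.
function strictMutationError(opts: ServeAppOptions): "token_required" | undefined {
  return configuredToken(opts) === undefined ? "token_required" : undefined;
}

// The capability is advertised only with the opt-in and a non-empty token.
function advertisesSessionShell(opts: ServeAppOptions): boolean {
  return opts.enableSessionShell === true && configuredToken(opts) !== undefined;
}
```

Normalizing once and deriving both decisions from the result avoids the case where `token: ''` counts as configured for one check and tokenless for another.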