Been messing with a fresh clone on a Pixel 6 (Android 14) via Tailscale. The core logic is solid, but I found a few spots where the agent just crashes or hangs in prod.
Quick braindump of what needs hardening (all low LoC, just missing safety):
Zod the envs: Right now a missing GROQ_API_KEY just throws a generic TypeError in llm-providers.ts. We already have zod as a dep, so we should just parse process.env in config.ts. That way it fails fast at startup with a readable message instead of us debugging silent failures later.
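Rough sketch of what the startup check could look like. Only GROQ_API_KEY comes from the actual code; `EnvSchema`, `loadEnv`, and the `OLLAMA_BASE_URL` field with its default are made-up names for illustration, so adjust to whatever config.ts really needs.

```typescript
import { z } from "zod";

// Hypothetical schema: GROQ_API_KEY is the only variable named in this issue.
// OLLAMA_BASE_URL and its default are illustrative assumptions.
const EnvSchema = z.object({
  GROQ_API_KEY: z.string().min(1),
  OLLAMA_BASE_URL: z.string().url().default("http://localhost:11434"),
});

export type Env = z.infer<typeof EnvSchema>;

// Validate an env-like object and throw one readable error listing every bad key.
export function loadEnv(source: Record<string, string | undefined>): Env {
  const result = EnvSchema.safeParse(source);
  if (!result.success) {
    const details = result.error.issues
      .map((issue) => `${issue.path.join(".")}: ${issue.message}`)
      .join("; ");
    throw new Error(`Invalid environment: ${details}`);
  }
  return result.data;
}
```

config.ts could then export `loadEnv(process.env)` once, so a missing key stops the process before the agent loop starts.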
ADB/LLM flakes: kernel.ts assumes the loop is always happy. If ADB jitters or Groq hits a 429, the whole thing stalls. Need to wrap execAdb in a basic try/catch with exponential backoff. If it hits STUCK_THRESHOLD, we should just retry the last action.
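Something like this would cover the retry part. It's a minimal sketch: `withBackoff` and `RetryOptions` are hypothetical names, and I'm not assuming anything about execAdb's real signature beyond "it returns a promise". The delay uses random jitter under an exponentially growing cap, which is one common variant of exponential backoff.

```typescript
// Hypothetical helper, not part of the repo.
export interface RetryOptions {
  retries: number; // extra attempts after the first one
  baseMs: number; // cap for the first delay
  maxMs: number; // upper bound for any delay
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

export async function withBackoff<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === opts.retries) break; // out of attempts
      // The cap doubles each attempt; the actual wait is random below the cap.
      const cap = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
  throw lastError;
}
```

In kernel.ts the ADB call would be wrapped as `withBackoff(() => execAdb(...), { retries: 4, baseMs: 250, maxMs: 4000 })`. Those numbers are guesses, not tuned values.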
WS is wide open: The Hono server has no auth on the /ws endpoint. Pretty easy to inject goals or DoS it. Need a quick JWT/API key middleware to lock that down.
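For the API key flavor, the check itself can be tiny. Sketch below, assuming a single shared key sent as `Authorization: Bearer <key>`; `isAuthorized` is a hypothetical name, and how the key is stored and loaded is left open. Both sides are hashed first so the constant-time compare always gets equal-length buffers.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

function sha256(value: string): Buffer {
  return createHash("sha256").update(value).digest();
}

// Hypothetical helper: returns true only for "Bearer <expectedKey>".
export function isAuthorized(
  authHeader: string | undefined,
  expectedKey: string,
): boolean {
  if (!authHeader || expectedKey.length === 0) return false;
  const token = /^Bearer (.+)$/.exec(authHeader)?.[1];
  if (!token) return false;
  // Compare digests in constant time to avoid leaking how much of the key matched.
  return timingSafeEqual(sha256(token), sha256(expectedKey));
}
```

A Hono middleware on /ws could call this with `c.req.header("Authorization")` and return a 401 before the upgrade when it comes back false.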
Provider fallback: If Groq is down, the vision fallback to Ollama currently stalls. Adding a quick healthCheck() on init to cycle Groq > Ollama > OpenRouter would save a lot of "goal_failed" logs.
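The init-time cycle could be as simple as the sketch below. The `Provider` shape and `pickProvider` are assumptions on my side (the real interface in llm-providers.ts may look different); the point is just that every health check gets a timeout, so a hung provider counts as down instead of stalling the loop.

```typescript
// Hypothetical provider shape for illustration only.
export interface Provider {
  name: string;
  healthCheck: () => Promise<boolean>;
}

// Treat a throw or a timeout as "unhealthy" rather than letting it propagate.
async function isHealthy(p: Provider, timeoutMs: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    return await Promise.race([p.healthCheck(), timeout]);
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}

// Walk the list in priority order and return the first provider that responds.
export async function pickProvider(
  providers: Provider[],
  timeoutMs = 2000,
): Promise<Provider> {
  for (const p of providers) {
    if (await isHealthy(p, timeoutMs)) return p;
  }
  const names = providers.map((p) => p.name).join(", ");
  throw new Error(`No healthy provider among: ${names}`);
}
```

Passing the list as Groq, Ollama, OpenRouter gives the priority order from above. Checking sequentially keeps the order strict at the cost of a slower startup when the first entries time out.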
Concurrency: Need p-limit in actions.ts to cap parallel ADB calls at ~5. Easy to overload the device in multi-step workflows.
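To show the idea without pulling in the package, here is a dependency-free stand-in for p-limit. It is not p-limit's implementation; `createLimiter` is a hypothetical name that just follows the same `limit(() => task())` usage pattern, so swapping in the real package later should be a small change.

```typescript
// Minimal concurrency limiter: at most `max` tasks run at once, the rest queue up.
export function createLimiter(max: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  // Start the next queued task if there is a free slot.
  const next = (): void => {
    if (active >= max) return;
    const run = queue.shift();
    if (run) run();
  };

  return function limit<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const run = (): void => {
        active++;
        // Promise.resolve().then(...) also catches synchronous throws from fn.
        Promise.resolve()
          .then(() => fn())
          .then(resolve, reject)
          .finally(() => {
            active--;
            next();
          });
      };
      queue.push(run);
      next();
    });
  };
}
```

In actions.ts this would be one shared `const limit = createLimiter(5)` with each ADB call wrapped as `limit(() => execAdb(...))`. The ~5 is the guess from above and probably needs tuning per device.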
Repro: Running "scroll Twitter" on Bun 1.1.3. Stalls hard if the LLM rate limits or the Tailscale handshake is slow.