Add env validation, error retries, and WS security for better reliability #4

Been messing with a fresh clone on a Pixel 6 (Android 14) via Tailscale. The core logic is solid, but I found a few spots where the agent just bricks or hangs in prod.

Quick braindump of what needs hardening (all low LoC, just missing safety checks):

Zod the envs: Right now, a missing GROQ_API_KEY just throws a generic TypeError in llm-providers.ts. We already have zod as a dep, so we should just parse process.env in config.ts and fail fast at startup instead of debugging silent failures later.
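Rough sketch of what config.ts could look like, assuming the standard zod API (`z.object`, `safeParse`, `z.coerce`). Only GROQ_API_KEY comes from this issue; OLLAMA_URL, PORT, and `loadEnv` are hypothetical placeholders to swap for whatever the config actually reads:

```typescript
// config.ts (sketch): validate the environment once, at startup, with zod.
import { z } from "zod";

// Hypothetical schema: GROQ_API_KEY is from the issue; OLLAMA_URL and PORT
// are placeholders showing defaults and string-to-number coercion.
const EnvSchema = z.object({
  GROQ_API_KEY: z.string().min(1, "GROQ_API_KEY is required"),
  OLLAMA_URL: z.string().url().default("http://localhost:11434"),
  PORT: z.coerce.number().int().positive().default(3000),
});

export type Env = z.infer<typeof EnvSchema>;

// Parse an env-like object. On failure, throw a single error that lists
// every invalid or missing variable instead of a TypeError deep in a provider.
export function loadEnv(source: Record<string, string | undefined>): Env {
  const result = EnvSchema.safeParse(source);
  if (!result.success) {
    const details = result.error.issues
      .map((issue) => `  ${issue.path.join(".")}: ${issue.message}`)
      .join("\n");
    throw new Error(`Invalid environment:\n${details}`);
  }
  return result.data;
}
```

In config.ts this would be called once, e.g. `export const env = loadEnv(process.env);`, and llm-providers.ts would import `env` instead of reading `process.env` directly.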

ADB/LLM flakes: kernel.ts assumes the loop always takes the happy path. If ADB jitters or Groq returns a 429 (rate limit), the whole thing stalls. We need to wrap execAdb in a basic try/catch with exponential backoff, and if the agent hits STUCK_THRESHOLD, just retry the last action.
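A minimal backoff helper to show the shape of the fix. `withRetry`, its options, and any delay numbers are hypothetical, not tuned values. It uses exponential backoff with "full jitter" (each delay is drawn at random between 0 and base * 2^attempt, capped) so parallel ADB calls don't all retry in lockstep:

```typescript
// Hypothetical helper: retry an async operation with capped exponential backoff.
export interface RetryOptions {
  retries: number; // extra attempts after the first failure
  baseDelayMs: number; // upper bound of the delay before the first retry
  maxDelayMs: number; // upper bound for any single delay
  shouldRetry?: (err: unknown) => boolean; // e.g. only 429s or transient ADB errors
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  { retries, baseDelayMs, maxDelayMs, shouldRetry = () => true }: RetryOptions,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Give up when out of attempts or when the error isn't worth retrying.
      if (attempt >= retries || !shouldRetry(err)) throw err;
      // Exponential growth, capped, then a random delay in [0, cap).
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
}
```

At a call site this might look like `withRetry(() => execAdb(cmd), { retries: 4, baseDelayMs: 250, maxDelayMs: 4000 })`; execAdb's real signature isn't shown in this issue, so treat that line as illustrative. For Groq, `shouldRetry` could check for a 429 status. Backoff alone doesn't cover the STUCK_THRESHOLD case, which still needs its own retry-last-action path in kernel.ts.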

WS is wide open: The Hono server has no auth on the /ws endpoint, so it's pretty easy to inject goals or DoS it. Need a quick JWT/API key middleware to lock that down.
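Sketch of a shared-key check for /ws, written against the standard `Request` type so it's easy to test outside Hono. `isAuthorized`, `WS_API_KEY`, and the `?token=` fallback are assumptions; the query param is there because browser WebSocket clients can't set custom headers on the handshake. A JWT check could slot into the same spot later:

```typescript
// Compare without an early exit so response timing doesn't reveal
// how many leading characters of the key were correct.
function constantTimeEqual(a: string, b: string): boolean {
  const ea = new TextEncoder().encode(a);
  const eb = new TextEncoder().encode(b);
  let diff = ea.length ^ eb.length; // non-zero if lengths differ
  const len = Math.max(ea.length, eb.length);
  for (let i = 0; i < len; i++) {
    diff |= (ea[i] ?? 0) ^ (eb[i] ?? 0);
  }
  return diff === 0;
}

// Accepts `Authorization: Bearer <key>` or `?token=<key>` (hypothetical scheme).
export function isAuthorized(req: Request, expectedKey: string): boolean {
  if (!expectedKey) return false; // fail closed if no key is configured
  const header = req.headers.get("authorization") ?? "";
  const bearer = header.startsWith("Bearer ") ? header.slice(7) : "";
  const token = bearer || new URL(req.url).searchParams.get("token") || "";
  return token !== "" && constantTimeEqual(token, expectedKey);
}

// Possible Hono wiring, registered before the /ws route
// (`app` is the existing Hono instance, `env.WS_API_KEY` is hypothetical):
// app.use("/ws", async (c, next) => {
//   if (!isAuthorized(c.req.raw, env.WS_API_KEY)) return c.text("Unauthorized", 401);
//   await next();
// });
```

Query-string tokens can end up in proxy or access logs, so a short-lived token (or the JWT route) is probably the better long-term option.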

Provider fallback: If Groq is down, the vision fallback to Ollama currently stalls. Adding a quick healthCheck() on init that cycles through Groq > Ollama > OpenRouter would save a lot of "goal_failed" logs.
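Possible shape for the init-time fallback. The `LlmProvider` interface and the `healthCheck()` contract are assumptions about llm-providers.ts, not its actual exports, and `pickProvider` is a hypothetical name. Each check gets a timeout so a hung provider can't stall startup the way the current Ollama fallback does:

```typescript
// Assumed provider shape; adapt to whatever llm-providers.ts really exports.
export interface LlmProvider {
  name: string;
  healthCheck(): Promise<boolean>;
}

// Resolve to false if the check throws, rejects, or exceeds timeoutMs.
async function checkWithTimeout(p: LlmProvider, timeoutMs: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  // Promise.resolve().then(...) also catches a synchronous throw.
  const check = Promise.resolve()
    .then(() => p.healthCheck())
    .catch(() => false);
  try {
    return await Promise.race([check, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Try providers in priority order, e.g. [groq, ollama, openRouter],
// and return the first one that reports healthy.
export async function pickProvider(
  providers: LlmProvider[],
  timeoutMs = 3000,
): Promise<LlmProvider> {
  for (const p of providers) {
    if (await checkWithTimeout(p, timeoutMs)) return p;
    console.warn(`[llm] ${p.name} failed health check, trying next provider`);
  }
  const tried = providers.map((p) => p.name).join(", ");
  throw new Error(`No healthy LLM provider (tried: ${tried})`);
}
```

This would run once on init, e.g. `const llm = await pickProvider([groq, ollama, openRouter]);`. It only chooses at startup; a provider that dies mid-run would still need the retry path from the ADB/LLM item.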

Concurrency: Need p-limit in actions.ts to cap parallel ADB calls at ~5. It's easy to overload the device in multi-step workflows.
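p-limit itself is used as `const limit = pLimit(5)` and then `limit(() => somePromiseFn())`. To keep the example dependency-free, here is a tiny hand-rolled limiter with the same call shape. It's a stand-in to show the technique (at most N tasks in flight, the rest queued FIFO), not p-limit's actual implementation, and `createLimit` is a hypothetical name:

```typescript
// Minimal concurrency limiter with a p-limit-style call shape.
export function createLimit(concurrency: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  // Start the next queued task if a slot is free.
  const next = () => {
    if (active >= concurrency) return;
    const run = queue.shift();
    if (run) run();
  };

  return function limit<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      queue.push(() => {
        active++;
        // Promise.resolve().then(task) also turns a synchronous throw into a rejection.
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .finally(() => {
            active--;
            next();
          });
      });
      next();
    });
  };
}
```

In actions.ts the call sites would become something like `await Promise.all(steps.map((s) => limit(() => execAdb(s))))` (again assuming execAdb's signature), and they'd look the same whether `limit` comes from this sketch or from `pLimit(5)`.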

Repro: Running "scroll Twitter" on Bun 1.1.3. It stalls hard if the LLM gets rate-limited or the Tailscale handshake is slow.
