Add env validation, error retries, and WS security for better reliability #4

Description

@thenotespublisher

Been messing with a fresh clone on a Pixel 6 (A14) via Tailscale. The core logic is solid but found a few spots where the agent just bricks or hangs in prod.

Quick braindump of what needs hardening (all low LoC, just missing safety checks):

Zod the envs: Right now a missing GROQ_API_KEY just throws a generic TypeError in llm-providers.ts. We already have zod as a dep, so we should parse process.env in config.ts and fail fast at startup instead of debugging silent failures later.
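
Rough sketch of what the config.ts parse could look like. Only GROQ_API_KEY comes from the actual code; `OLLAMA_BASE_URL`, `loadEnv`, and `EnvSchema` are names I made up for illustration, so swap in whatever the config really reads.

```typescript
import { z } from "zod";

// Hypothetical schema: GROQ_API_KEY is the only variable named in this issue.
// OLLAMA_BASE_URL is a placeholder for any other setting config.ts needs.
const EnvSchema = z.object({
  GROQ_API_KEY: z.string().min(1, "GROQ_API_KEY must not be empty"),
  OLLAMA_BASE_URL: z.string().url().default("http://localhost:11434"),
});

export type Env = z.infer<typeof EnvSchema>;

// Validates an env-like record and throws a single readable error that
// lists every failing variable by name.
export function loadEnv(source: Record<string, string | undefined>): Env {
  const result = EnvSchema.safeParse(source);
  if (!result.success) {
    const details = result.error.issues
      .map((issue) => `  ${issue.path.join(".")}: ${issue.message}`)
      .join("\n");
    throw new Error(`Invalid environment:\n${details}`);
  }
  return result.data;
}
```

Calling `loadEnv(process.env)` once at the top of config.ts would surface a missing key at boot with the variable name in the message, rather than as a TypeError deep in the provider code.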

ADB/LLM flakes: kernel.ts assumes the loop is always happy. If ADB jitters or Groq hits a 429, the whole thing stalls. Need to wrap execAdb in a basic try/catch with exponential backoff. If it hits STUCK_THRESHOLD, we should just retry the last action.
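
Something like this generic wrapper would cover both the ADB jitter and the 429 case. It's a sketch: `withBackoff`, `RetryOptions`, and the jitter strategy are my own picks, and I'm assuming execAdb returns a promise that rejects on failure.

```typescript
// Options for the retry helper; names and values are illustrative.
interface RetryOptions {
  retries: number; // retries allowed after the first attempt
  baseDelayMs: number; // upper bound of the delay before the first retry
  maxDelayMs: number; // cap on any single delay
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs fn, retrying on rejection. The delay ceiling doubles on each attempt
// and the actual wait is a random value below it ("full jitter"), so several
// stalled loops do not all retry at the same instant.
export async function withBackoff<T>(
  fn: () => Promise<T>,
  { retries, baseDelayMs, maxDelayMs }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of retries, give up
      const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
  throw lastError;
}
```

In kernel.ts the call site would then be roughly `await withBackoff(() => execAdb(cmd), { retries: 4, baseDelayMs: 250, maxDelayMs: 4000 })`. Those numbers are guesses and would need tuning against a real device.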

WS is wide open: The Hono server has no auth on the /ws endpoint. Pretty easy to inject goals or DoS it. Need a quick JWT/API key middleware to lock that down.
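
For the API key variant, the check itself can be a small pure function over the standard `Request` object, which keeps it testable without spinning up the server. `isAuthorized`, `safeEqual`, and the `token` query parameter are all hypothetical names, and a JWT version would replace the comparison with signature verification.

```typescript
// Constant-time comparison so response timing does not reveal how much of
// the key matched.
function safeEqual(a: string, b: string): boolean {
  const x = new TextEncoder().encode(a);
  const y = new TextEncoder().encode(b);
  let diff = x.length ^ y.length;
  const n = Math.max(x.length, y.length);
  for (let i = 0; i < n; i++) {
    diff |= (x[i] ?? 0) ^ (y[i] ?? 0);
  }
  return diff === 0;
}

// Accepts the key either as "Authorization: Bearer <key>" or as a `token`
// query parameter, since browser WebSocket clients cannot set custom headers.
export function isAuthorized(req: Request, expectedKey: string): boolean {
  if (expectedKey.length === 0) return false; // an unset key must never pass
  const header = req.headers.get("authorization") ?? "";
  const fromHeader = header.startsWith("Bearer ") ? header.slice(7) : "";
  const fromQuery = new URL(req.url).searchParams.get("token") ?? "";
  const presented = fromHeader || fromQuery;
  return presented.length > 0 && safeEqual(presented, expectedKey);
}

// Assumed wiring in the Hono app (not checked against the actual server code):
//   app.use("/ws", async (c, next) =>
//     isAuthorized(c.req.raw, apiKey) ? next() : c.text("Unauthorized", 401));
```

One tradeoff to be aware of: query-string tokens tend to end up in access logs, so the header path is preferable for any client that can set it.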

Provider fallback: If Groq is down, the vision fallback to Ollama currently stalls. Adding a quick healthCheck() on init to cycle Groq > Ollama > OpenRouter would save a lot of "goal_failed" logs.
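
The init-time selection could be as simple as walking an ordered list and taking the first provider that answers its health check in time. The `Provider` interface, `pickProvider`, and the 2 s default timeout below are assumptions; I haven't checked what llm-providers.ts actually exposes.

```typescript
// Hypothetical minimal provider shape; the real one will have more on it.
interface Provider {
  name: string;
  healthCheck(): Promise<boolean>;
}

// Treats a rejected, hung, or slow health check the same as an unhealthy one.
function checkWithTimeout(
  check: () => Promise<boolean>,
  ms: number,
): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    Promise.resolve()
      .then(check)
      .then(
        (ok) => {
          clearTimeout(timer);
          resolve(ok);
        },
        () => {
          clearTimeout(timer);
          resolve(false);
        },
      );
  });
}

// Returns the first healthy provider in priority order, or throws if none
// respond, so startup fails loudly instead of stalling on the first goal.
export async function pickProvider(
  providers: Provider[],
  timeoutMs = 2000,
): Promise<Provider> {
  for (const provider of providers) {
    if (await checkWithTimeout(() => provider.healthCheck(), timeoutMs)) {
      return provider;
    }
  }
  const names = providers.map((p) => p.name).join(", ");
  throw new Error(`No healthy LLM provider among: ${names}`);
}
```

Passing `[groq, ollama, openRouter]` gives the Groq > Ollama > OpenRouter order. The timeout is the important part, since a hung check is exactly the stall described here.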

Concurrency: Need p-limit in actions.ts to cap parallel ADB calls at ~5. Easy to overload the device in multi-step workflows.
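
To show the idea without pulling in the package, here is a dependency-free stand-in that is used the same way as p-limit (create a limiter, wrap each call). `createLimiter` is my own name, and in the repo p-limit itself would be the simpler choice.

```typescript
// Returns a function that runs async tasks with at most `maxConcurrent`
// in flight. Extra tasks queue up and start in FIFO order.
export function createLimiter(maxConcurrent: number) {
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new RangeError("maxConcurrent must be a positive integer");
  }
  let active = 0;
  const waiting: Array<() => void> = [];

  return async function limit<T>(task: () => Promise<T>): Promise<T> {
    if (active < maxConcurrent) {
      active++;
    } else {
      // Park until a finishing task hands its slot over.
      await new Promise<void>((resolve) => waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) {
        next(); // slot passes straight to the next waiter; count unchanged
      } else {
        active--;
      }
    }
  };
}
```

In actions.ts that would look like `const limit = createLimiter(5)` once at module level, then `limit(() => execAdb(cmd))` per call. Handing the slot directly to the next waiter, rather than decrementing and re-incrementing, keeps a newly arriving task from slipping in and pushing the count past the cap.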

Repro: Running "scroll Twitter" on Bun 1.1.3. Stalls hard if the LLM rate limits or the Tailscale handshake is slow.
