| 1 | +# Kernel Python Sample App - Moondream Computer Use |
| 2 | + |
| 3 | +This Kernel app runs a lightweight computer-use agent that pairs Moondream vision models with a fast Groq-hosted LLM for orchestration. |
| 4 | + |
| 5 | +## Setup |
| 6 | + |
| 7 | +1. Get your API keys: |
| 8 | + - **Moondream**: [moondream.ai](https://moondream.ai) |
| 9 | + - **Groq**: [console.groq.com](https://console.groq.com) |
| 10 | + |
| 11 | +2. Deploy the app: |
| 12 | +```bash |
| 13 | +kernel login |
| 14 | +cp .env.example .env # Add your MOONDREAM_API_KEY and GROQ_API_KEY |
| 15 | +kernel deploy main.py --env-file .env |
| 16 | +``` |
| 17 | + |
| 18 | +## Usage |
| 19 | + |
| 20 | +Natural-language query (Groq LLM orchestrates Moondream + Kernel): |
| 21 | +```bash |
| 22 | +kernel invoke python-moondream-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}' |
| 23 | +``` |
| 24 | + |
| 25 | +Structured steps (optional fallback for deterministic automation): |
| 26 | +```bash |
| 27 | +kernel invoke python-moondream-cua cua-task --payload '{ |
| 28 | + "steps": [ |
| 29 | + {"action": "navigate", "url": "https://example.com"}, |
| 30 | + {"action": "caption"}, |
| 31 | + {"action": "click", "target": "More information link", "retries": 4}, |
| 32 | + {"action": "type", "target": "Search input", "text": "kernel", "press_enter": true} |
| 33 | + ] |
| 34 | +}' |
| 35 | +``` |
| 36 | + |
| 37 | +## Step Actions |
| 38 | + |
| 39 | +Each step is a JSON object with an `action` field. Supported actions: |
| 40 | + |
| 41 | +- `navigate`: `{ "url": "https://..." }` |
| 42 | +- `click`: `{ "target": "Button label or description" }` |
| 43 | +- `type`: `{ "target": "Input field description", "text": "...", "press_enter": false }` |
| 44 | +- `scroll`: `{ "direction": "down" }` or `{ "x": 0.5, "y": 0.5, "direction": "down" }` |
| 45 | +- `query`: `{ "question": "Is there a login button?" }` |
| 46 | +- `caption`: `{ "length": "short" | "normal" | "long" }` |
| 47 | +- `wait`: `{ "seconds": 2.5 }` |
| 48 | +- `key`: `{ "keys": "ctrl+l" }` |
| 49 | +- `go_back`, `go_forward`, `search`, `open_web_browser` |
| 50 | + |
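The action list above can be checked before a payload is sent. The following is a minimal sketch, not the app's own validation: the `REQUIRED_FIELDS` table and the helper names `check_step` and `build_payload` are hypothetical, and the required fields are only inferred from the examples in this README.

```python
import json

# Required fields per action, inferred from the list above.
# Assumption: this is NOT the app's actual validation logic.
REQUIRED_FIELDS = {
    "navigate": ["url"],
    "click": ["target"],
    "type": ["target", "text"],
    "scroll": ["direction"],
    "query": ["question"],
    "caption": [],
    "wait": ["seconds"],
    "key": ["keys"],
    "go_back": [],
    "go_forward": [],
    "search": [],
    "open_web_browser": [],
}


def check_step(step):
    """Return a list of problems found in one step dict (empty if none)."""
    action = step.get("action")
    if action not in REQUIRED_FIELDS:
        return [f"unknown action: {action!r}"]
    # Assumption: explicit x/y coordinates can stand in for a `target`.
    has_coords = "x" in step and "y" in step
    problems = []
    for field in REQUIRED_FIELDS[action]:
        if field == "target" and has_coords:
            continue
        if field not in step:
            problems.append(f"{action}: missing {field!r}")
    return problems


def build_payload(steps):
    """Validate all steps, then serialize them as a `--payload` JSON string."""
    problems = [p for step in steps for p in check_step(step)]
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps({"steps": steps})
```

The string returned by `build_payload` could then be passed to `kernel invoke ... --payload`, which avoids hand-escaping JSON in the shell.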
| 51 | +Optional step fields: |
| 52 | +- `retries`: override retry attempts for point/click/type |
| 53 | +- `retry_delay_ms`: wait between retries |
| 54 | +- `x`, `y`: explicit coordinates that bypass Moondream pointing, given either as normalized values (0-1) or as pixels; pixel coordinates are interpreted against the detected screenshot size |
| 55 | + |
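To illustrate how normalized and pixel coordinates might be reconciled, here is a small sketch. The rule used to tell the two apart (both values within 0-1 means normalized) is an assumption, as is the helper name `to_pixels`; the app may decide this differently.

```python
def to_pixels(x, y, width, height):
    """Map step coordinates to integer pixel positions on a screenshot.

    Assumption: when both values lie in [0, 1] they are treated as
    normalized fractions of the screenshot size; otherwise as pixels.
    """
    if 0 <= x <= 1 and 0 <= y <= 1:
        px, py = x * width, y * height
    else:
        px, py = x, y
    # Clamp so the result always lands inside the screenshot.
    px = min(max(int(round(px)), 0), width - 1)
    py = min(max(int(round(py)), 0), height - 1)
    return px, py


print(to_pixels(0.5, 0.5, 1280, 720))  # (640, 360)
print(to_pixels(300, 200, 1280, 720))  # (300, 200)
```

Under this rule a pixel position of exactly `(1, 1)` is indistinguishable from the normalized bottom-right corner, so a real implementation may want an explicit flag rather than a range check.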
| 56 | +## Replay Recording |
| 57 | + |
| 58 | +Add `"record_replay": true` to the payload to capture a video replay (paid Kernel plans only). |
| 59 | + |
| 60 | +## Notes |
| 61 | + |
| 62 | +- The agent uses Moondream for visual reasoning and pointing. |
| 63 | +- Kernel screenshots are PNG; Moondream queries are sent as base64 data URLs. |
| 64 | +- The Groq LLM must output JSON actions; the agent repairs malformed output and parses it with `json-repair`. |
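
The PNG-to-data-URL step mentioned in the notes can be sketched with the standard library alone. The function name `png_to_data_url` is hypothetical, and the signature check is an added safeguard rather than something this README says the app performs.

```python
import base64

# First eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_to_data_url(png_bytes):
    """Encode raw PNG bytes as a base64 `data:` URL."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("input does not look like a PNG")
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"
```

The resulting string can be placed wherever an image URL is expected in a request body, which avoids uploading the screenshot separately.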