Commit 5779762

template : groq llm orchestration + moondream vision
1 parent 4d9565b commit 5779762

26 files changed, +4758 -0 lines changed


README.md

Lines changed: 4 additions & 0 deletions
````diff
@@ -148,6 +148,7 @@ Commands with JSON output support:
 - `anthropic-computer-use` - Anthropic Computer Use prompt loop
 - `openai-computer-use` - OpenAI Computer Use Agent sample
 - `gemini-computer-use` - Implements a Gemini computer use agent (TypeScript only)
+- `moondream-groq-computer-use` - Moondream + Groq computer use agent (TypeScript + Python)
 - `openagi-computer-use` - OpenAGI Lux computer-use models (Python only)
 - `magnitude` - Magnitude framework sample (TypeScript only)
 - `claude-agent-sdk` - Claude Agent SDK browser automation agent
@@ -517,6 +518,9 @@ kernel create --name my-agent --language ts --template stagehand
 # Create a Python Computer Use app
 kernel create --name my-cu-app --language py --template anthropic-computer-use
 
+# Create a Moondream + Groq Computer Use app (TypeScript or Python)
+kernel create --name my-moondream-cu --language ts --template moondream-groq-computer-use
+
 # Create a Claude Agent SDK app (TypeScript or Python)
 kernel create --name my-claude-agent --language ts --template claude-agent-sdk
 ```
````

Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+MOONDREAM_API_KEY=
+GROQ_API_KEY=
```
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@

# Kernel Python Sample App - Moondream Computer Use

This Kernel app runs a lightweight computer-use agent powered by Moondream vision models and fast LLM orchestration on Groq.

## Setup

1. Get your API keys:
   - **Moondream**: [moondream.ai](https://moondream.ai)
   - **Groq**: [console.groq.com](https://console.groq.com)

2. Deploy the app:
   ```bash
   kernel login
   cp .env.example .env  # Add your MOONDREAM_API_KEY and GROQ_API_KEY
   kernel deploy main.py --env-file .env
   ```

## Usage

Natural-language query (Groq LLM orchestrates Moondream + Kernel):
```bash
kernel invoke python-moondream-cua cua-task --payload '{"query": "Navigate to https://example.com and describe the page"}'
```

Structured steps (optional fallback for deterministic automation):
```bash
kernel invoke python-moondream-cua cua-task --payload '{
  "steps": [
    {"action": "navigate", "url": "https://example.com"},
    {"action": "caption"},
    {"action": "click", "target": "More information link", "retries": 4},
    {"action": "type", "target": "Search input", "text": "kernel", "press_enter": true}
  ]
}'
```

## Step Actions

Each step is a JSON object with an `action` field. Supported actions:

- `navigate`: `{ "url": "https://..." }`
- `click`: `{ "target": "Button label or description" }`
- `type`: `{ "target": "Input field description", "text": "...", "press_enter": false }`
- `scroll`: `{ "direction": "down" }` or `{ "x": 0.5, "y": 0.5, "direction": "down" }`
- `query`: `{ "question": "Is there a login button?" }`
- `caption`: `{ "length": "short" | "normal" | "long" }`
- `wait`: `{ "seconds": 2.5 }`
- `key`: `{ "keys": "ctrl+l" }`
- `go_back`, `go_forward`, `search`, `open_web_browser`
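
To make the step schema above concrete, here is a minimal Python sketch that checks a `steps` payload before it is sent with `kernel invoke`. It is not the template's own validation: `validate_steps` and `REQUIRED_FIELDS` are hypothetical names, the required fields are inferred from the action list, and actions listed without fields are assumed to need nothing beyond `action`.

```python
import json
from typing import Any

# Hypothetical per-action required fields, inferred from the "Step Actions" list.
# Actions listed without fields are assumed to need only "action".
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "navigate": ("url",),
    "click": (),              # plus "target" or explicit "x"/"y" (checked below)
    "type": ("text",),        # plus "target" or explicit "x"/"y" (checked below)
    "scroll": ("direction",),
    "query": ("question",),
    "caption": (),            # "length" is optional
    "wait": ("seconds",),
    "key": ("keys",),
    "go_back": (),
    "go_forward": (),
    "search": (),
    "open_web_browser": (),
}
POINTING_ACTIONS = {"click", "type"}
CAPTION_LENGTHS = {"short", "normal", "long"}


def validate_steps(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a {"steps": [...]} payload (empty if none)."""
    steps = payload.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["payload must contain a non-empty 'steps' list"]
    problems: list[str] = []
    for i, step in enumerate(steps):
        action = step.get("action") if isinstance(step, dict) else None
        if not isinstance(action, str) or action not in REQUIRED_FIELDS:
            problems.append(f"step {i}: unknown action {action!r}")
            continue
        for field in REQUIRED_FIELDS[action]:
            if field not in step:
                problems.append(f"step {i} ({action}): missing '{field}'")
        # click/type need something to aim at: a description for Moondream
        # to point at, or explicit coordinates that bypass pointing.
        has_coords = "x" in step and "y" in step
        if action in POINTING_ACTIONS and "target" not in step and not has_coords:
            problems.append(f"step {i} ({action}): needs 'target' or both 'x' and 'y'")
        if action == "caption" and step.get("length", "normal") not in CAPTION_LENGTHS:
            problems.append(f"step {i} (caption): invalid length {step['length']!r}")
    return problems


if __name__ == "__main__":
    payload = json.loads(
        '{"steps": [{"action": "navigate", "url": "https://example.com"}, {"action": "click"}]}'
    )
    print(validate_steps(payload))
```

Because `x` and `y` bypass Moondream pointing, the sketch accepts a `click` or `type` step that carries coordinates but no `target`.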

Optional step fields:
- `retries`: override the number of retry attempts for point/click/type
- `retry_delay_ms`: delay between retries, in milliseconds
- `x`, `y`: normalized (0-1) or pixel coordinates that bypass Moondream pointing (pixel coordinates are interpreted using the detected screenshot size)
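
The optional fields can be illustrated with a short Python sketch. It assumes that a value in the range 0-1 is a normalized fraction of the screenshot and anything else is a pixel coordinate, and that `retries` counts total attempts; both are one reading of the descriptions above, not confirmed template behavior. The helpers `resolve_point` and `with_retries`, and their defaults of 3 attempts and 500 ms, are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def resolve_point(x: float, y: float, width: int, height: int) -> Tuple[int, int]:
    """Map step coordinates onto a width x height screenshot, in whole pixels.

    Assumption: a value inside [0, 1] is a normalized fraction of the screenshot
    size; any other value is already an absolute pixel coordinate.
    """

    def to_pixels(value: float, size: int) -> int:
        if 0 <= value <= 1:
            value = value * size
        # Clamp so the point always falls inside the screenshot.
        return max(0, min(size - 1, round(value)))

    return to_pixels(x, width), to_pixels(y, height)


def with_retries(
    attempt: Callable[[], Optional[T]],
    retries: int = 3,
    retry_delay_ms: int = 500,
) -> Optional[T]:
    """Call `attempt` up to `retries` times in total, pausing between failures.

    An attempt counts as failed when it returns None (for example, when the
    target could not be located on screen).
    """
    attempts = max(1, retries)
    for i in range(attempts):
        result = attempt()
        if result is not None:
            return result
        if i < attempts - 1:
            time.sleep(retry_delay_ms / 1000)
    return None


if __name__ == "__main__":
    print(resolve_point(0.5, 0.5, 1280, 800))  # (640, 400)
    print(resolve_point(300, 200, 1280, 800))  # (300, 200)
```

Under this rule a pixel value of exactly `1` would be read as normalized; clamping keeps slightly out-of-range coordinates on screen instead of producing an invalid click.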

## Replay Recording

Add `"record_replay": true` to the payload to capture a video replay (paid Kernel plans only).

## Notes

- The agent uses Moondream for visual reasoning and pointing.
- Kernel screenshots are PNG; they are passed to Moondream as base64 data URLs.
- The Groq LLM must output its actions as JSON; the agent uses json-repair to repair and parse that output.
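
The last two notes can also be sketched in Python. `png_to_data_url` shows the standard `data:image/png;base64,...` form of a data URL. `parse_llm_action` is a simplified, standard-library stand-in for the json-repair step, not the template's code: it only strips Markdown fences and surrounding prose and cannot fix malformed JSON the way json-repair can. Both function names are hypothetical.

```python
import base64
import json
import re
from typing import Any


def png_to_data_url(png_bytes: bytes) -> str:
    """Encode raw PNG bytes (e.g. a screenshot) as a base64 data URL."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


def parse_llm_action(text: str) -> Any:
    """Parse a JSON action from LLM output, tolerating common wrapping.

    Simplified stand-in for json-repair: strips a Markdown code fence and falls
    back to the outermost {...} span, but does not fix broken JSON such as
    missing quotes or trailing commas.
    """
    cleaned = text.strip()
    fenced = re.search(r"```(?:json)?\s*(.*?)```", cleaned, re.DOTALL)
    if fenced:
        cleaned = fenced.group(1).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start == -1 or end <= start:
            raise  # nothing that looks like a JSON object; re-raise the original error
        return json.loads(cleaned[start : end + 1])


if __name__ == "__main__":
    reply = 'Next step:\n```json\n{"action": "navigate", "url": "https://example.com"}\n```'
    print(parse_llm_action(reply))
```

In the real agent, the json-repair package handles the harder cases this sketch rejects.
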
Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+.venv/
+__pycache__/
+*.pyc
+.env
+.env.local
+uv.lock
```
