Add unified CUA template with multi-provider fallback #143
masnwilliams wants to merge 8 commits into main from
Conversation
Consolidates the separate anthropic-computer-use, openai-computer-use, and gemini-computer-use templates into a single "cua" template that supports all three providers with automatic fallback.

- TypeScript and Python templates with identical structure
- Provider selection via CUA_PROVIDER env var
- Optional fallback chain via CUA_FALLBACK_PROVIDERS
- Shared browser session lifecycle with replay support
- Each provider adapter is self-contained and customizable
- Registered as "cua" template in templates.go

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
99891de to 73255f9
Provider resolution at module load crashes during Hypeman's build/discovery phase when env vars aren't available. Use lazy initialization so providers are resolved on first invocation instead. Also fix TS type errors: narrow candidate.content in Gemini provider, cast input items in OpenAI provider, simplify computer_call_output construction. Made-with: Cursor
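The lazy-initialization fix can be sketched as follows. This is a minimal illustration, not the template's actual code: the identifier names are assumptions, but the pattern matches the commit message — env vars are read on first invocation rather than at module load, so build/discovery never touches them.

```typescript
// Illustrative sketch of lazy provider resolution. Reading env vars inside
// the getter (instead of at module top level) avoids crashing during the
// build/discovery phase, when CUA_PROVIDER may not be set yet.
interface ResolvedProvider {
  name: string;
  fallbacks: string[];
}

let cached: ResolvedProvider | null = null;

function getProvider(): ResolvedProvider {
  if (cached === null) {
    // Resolved on first call from the action handler, where env vars exist.
    cached = {
      name: process.env.CUA_PROVIDER ?? "anthropic",
      fallbacks: (process.env.CUA_FALLBACK_PROVIDERS ?? "")
        .split(",")
        .map((s) => s.trim())
        .filter(Boolean),
    };
  }
  return cached;
}
```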
…odel inputs

- Bump all TS and Python deps to latest versions
- Fix Anthropic computer use: use computer_20251124 with computer-use-2025-11-24 beta flag (claude-sonnet-4-6 requires the newer tool version)
- Fix OpenAI: add missing screenshot action handler
- Fix Python: correct SDK API (kernel.App), fix session.delete call, add missing openai dependency
- Restore provider and model as per-request payload overrides (were dropped in rewrite). Provider uses a typed enum (anthropic | openai | gemini).

Made-with: Cursor
… API, session delete

- Add missing screenshot action handler in Python OpenAI provider
- Use Part.from_function_response() instead of FunctionResponsePart() in Python Gemini provider (pydantic extra_forbidden in google-genai >=1.71)
- Fix session cleanup: use delete_by_id() instead of delete()

Made-with: Cursor
…n Gemini

- TS Anthropic: move SYSTEM_PROMPT to getSystemPrompt() function so the date is computed per-request instead of freezing at module load
- Python Gemini: include screenshot data as inline_data Part alongside function responses so the model can see action results
- Remove unused PREDEFINED_ACTIONS list from Python Gemini

Made-with: Cursor
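The per-request prompt fix is simple to sketch. The prompt wording below is illustrative, not the template's actual text; the point is that the date is computed each call rather than frozen in a module-level constant.

```typescript
// Sketch of the getSystemPrompt() pattern: computing the date inside the
// function keeps it current per request instead of freezing the value that
// happened to be current when the module was first loaded.
function getSystemPrompt(): string {
  const today = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  return `You are a computer-use agent. Today's date is ${today}.`;
}
```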
…eout)

Add optional `browser` field to CUA payload for per-request browser session configuration. Supports proxy_id, profile (id/name/save_changes), extensions, and timeout_seconds. Viewport and stealth remain deploy-time defaults since CUA providers depend on consistent viewport dimensions.

Made-with: Cursor
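An invoke payload using the optional `browser` field might look like the following. The field names follow the commit message; all values are illustrative placeholders.

```typescript
// Example per-request browser configuration (values are made up).
const payload = {
  query: "Open the docs site and summarize the changelog",
  browser: {
    proxy_id: "proxy_123",                            // route through a saved proxy
    profile: { name: "default", save_changes: true }, // reuse + persist a profile
    extensions: ["ext_abc"],                          // pre-installed extension ids
    timeout_seconds: 600,                             // per-request session timeout
  },
};
```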
When session_id is provided in the payload, the CUA task uses that existing browser session directly instead of creating a new one. The caller is responsible for the session lifecycle. This lets users pre-configure browsers with any settings and reuse sessions across tasks. Made-with: Cursor
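The ownership rule described above can be captured in one predicate. Names here are assumptions for illustration; the behavior matches the commit message — caller-provided sessions are never created or destroyed by the task.

```typescript
// Sketch of session ownership: the task creates and deletes a browser only
// when the payload did not supply an existing session_id.
interface CuaPayload {
  query: string;
  session_id?: string;
}

function taskOwnsSession(payload: CuaPayload): boolean {
  // Caller-provided sessions stay alive for reuse across tasks.
  return payload.session_id === undefined;
}
```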
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 4258604.
```ts
this._replayViewUrl = null;

return info;
}
```
Session state not reset when replay stop throws
Medium Severity
In stop(), the state cleanup (_sessionId = null, etc.) is placed after the try/finally block. If stopReplay() throws, the finally clause deletes the browser, but the exception then propagates past the cleanup lines without executing them. When the caller's error handler (in both index.ts and main.py) calls stop() a second time, _sessionId is still set, so it attempts to delete the already-destroyed browser — likely throwing a new exception that masks the original error.
Additional Locations (1)
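The suggested fix is to move the state reset inside the `finally` block so it runs even when stopping the replay throws. The sketch below uses illustrative names (`SessionLifecycle`, `stopReplay`, `deleteBrowser` are assumptions, not the template's actual API) and simulates the failure mode Bugbot describes.

```typescript
// Illustrative fix: clear session state in `finally` so a second stop()
// call from an error handler is a harmless no-op instead of a double delete.
class SessionLifecycle {
  sessionId: string | null = "sess-123";
  replayId: string | null = "rep-456";
  deleted: string[] = [];

  async stopReplay(): Promise<void> {
    throw new Error("replay stop failed"); // simulate the failing call
  }

  async deleteBrowser(id: string): Promise<void> {
    this.deleted.push(id);
  }

  async stop(): Promise<void> {
    try {
      if (this.replayId) await this.stopReplay();
    } finally {
      // Runs even when stopReplay() throws: clear state first, then delete,
      // so the original error propagates without a masking second failure.
      const id = this.sessionId;
      this.sessionId = null;
      this.replayId = null;
      if (id) await this.deleteBrowser(id);
    }
  }
}
```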
When an external session_id is provided, retrieve the browser's real viewport dimensions via browsers.retrieve() instead of hardcoding 1280x800. This ensures coordinate mapping is correct regardless of how the browser was created. Made-with: Cursor
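A sketch of that viewport resolution, per the commit message: query the browsers API for real dimensions when reusing an external session, and fall back to the deploy-time default otherwise. The client interface shape below is an assumption for illustration.

```typescript
// Sketch: real viewport for external sessions, default for ones we create.
interface Viewport { width: number; height: number }
interface BrowsersClient {
  retrieve(id: string): Promise<{ viewport: Viewport }>;
}

async function resolveViewport(
  browsers: BrowsersClient,
  externalSessionId?: string
): Promise<Viewport> {
  if (externalSessionId) {
    const info = await browsers.retrieve(externalSessionId);
    return info.viewport; // real dimensions keep coordinate mapping correct
  }
  return { width: 1280, height: 800 }; // deploy-time default for new sessions
}
```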
```ts
// Shared interface every provider adapter must implement.
export interface TaskOptions {
  query: string;
  model?: string;
```
Each provider hardcodes model-specific API features that will break if you swap in a different model. Sharing Cursor's analysis, which is pretty aligned with what I experienced building other templates. I think it would be smart to at least document in the template which models from which providers are compatible. I don't think we need to lock it down to specific models, particularly if we have defaults.
Cursor Reco
The model field should either be removed from the public payload (keep it as an env var only) or the template should validate model compatibility per provider before calling the API. At minimum, the README and input description should warn that only specific computer-use-capable models work. Right now a user seeing that free-text field in the dashboard will absolutely try claude-haiku-3-5 or gpt-4o and get a confusing 400 error.
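One way to do the validation Cursor recommends is a per-provider allow-list checked before the API call. The patterns below are illustrative placeholders, not an authoritative compatibility matrix — the real list should come from each provider's docs.

```typescript
// Sketch of per-provider model validation so a non-computer-use model
// fails fast with a clear message instead of a confusing 400 from the API.
// The regexes are examples only.
const COMPUTER_USE_MODELS: Record<string, RegExp> = {
  anthropic: /^claude-(sonnet|opus)/,
  openai: /computer-use/,
  gemini: /^gemini-/,
};

function validateModel(provider: string, model: string): void {
  const pattern = COMPUTER_USE_MODELS[provider];
  if (pattern && !pattern.test(model)) {
    throw new Error(
      `"${model}" is not a known computer-use-capable model for ${provider}; ` +
        `see the README for supported models`
    );
  }
}
```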
```ts
  break;
}
case 'scroll_document':
case 'scroll_at': {
```
The Gemini provider in both ts-cua and python-cua likely isn't handling scroll behavior correctly. In the standalone templates, we have logic in place (though not perfect) to handle the fact that Gemini reports back a magnitude as a value in pixels.
Cursor summary of issue:
It's missing the magnitude ÷ 60 pixel-to-notch conversion and the max(1, min(17, ...)) clamp. When Gemini asks to scroll with its default magnitude (~400 pixels), the unified template will fire 400 wheel notches instead of 7. Both TS and Python have the same bug — they're consistent with each other, but both wrong compared to the standalone template.
Getting API exhaustion errors with Gemini right now, but wanted to surface this. I put the ÷ 60 and clamp logic in the standalone templates. It likely could be improved.
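The conversion described above is small enough to sketch directly. This follows the numbers in the comment (roughly 60 px per wheel notch, clamped to 1..17); the exact constants in the standalone templates may differ.

```typescript
// Sketch of the pixel-to-notch conversion: Gemini reports scroll magnitude
// in pixels (~400 by default), but wheel actions are dispatched in notches,
// so divide by ~60 px/notch and clamp to a sane range.
const PIXELS_PER_NOTCH = 60;

function magnitudeToNotches(magnitudePx: number): number {
  const notches = Math.round(magnitudePx / PIXELS_PER_NOTCH);
  return Math.max(1, Math.min(17, notches)); // never 0, never a huge burst
}
```

With this in place, Gemini's default ~400 px magnitude produces 7 notches instead of firing 400 wheel events.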
The behavior in anthropic and openai providers matches the standalone templates right now.
dprevoznik
left a comment
Left some comments. All three providers are working on both TS and Python versions, though a Gemini API exhaustion error meant I couldn't test the scroll logic I commented on.


Summary
- New `cua` template (TypeScript + Python) that consolidates the separate `anthropic-computer-use`, `openai-computer-use`, and `gemini-computer-use` templates into a single multi-provider template
- Provider selection via the `CUA_PROVIDER` env var, with automatic fallback via `CUA_FALLBACK_PROVIDERS`
- Registered in `templates.go` for both TypeScript and Python

Structure
Test plan
- `go build ./...` passes
- `go test ./pkg/create/...` passes
- `kernel create` shows "Unified CUA" template for both TS and Python

🤖 Generated with Claude Code
Note
Medium Risk
Adds large new TypeScript/Python template code that orchestrates browser sessions and calls multiple external LLM provider APIs, so integration and runtime behavior may be flaky despite being mostly isolated to templates.
Overview
Adds a new Unified CUA template (`cua`) for both TypeScript and Python, exposing a single `cua-task` action that can run computer-use workflows against Anthropic/OpenAI/Gemini with ordered fallback via `CUA_PROVIDER`/`CUA_FALLBACK_PROVIDERS`.

Registers the new template in `pkg/create/templates.go` (template metadata plus deploy/invoke samples), and includes shared browser session helpers with optional replay recording plus per-provider adapters implementing each provider's agent loop and tool/action translation.

Reviewed by Cursor Bugbot for commit 0620134. Bugbot is set up for automated code reviews on this repo.