[Enhancement] - Adding new environments - OpenApps Env from FAIR#276

Merged
Darktex merged 45 commits into huggingface:main from AlirezaShamsoshoara:ali/feature/openapp_env on Jan 16, 2026

@AlirezaShamsoshoara

Contributor

Add OpenApp Environment to OpenEnv

[Image: OpenApps_OpenEnv_RL]

Overview

This PR adds the OpenApp Environment to OpenEnv, integrating the OpenApps framework for UI agent training with web applications.

OpenApps Resources:

What is OpenApp Environment?

OpenApp Environment provides a simulated web application ecosystem where agents can interact with various apps (calendar, todo, messenger, maps) using browser-based actions. It wraps the OpenApps framework and BrowserGym to create a standardized OpenEnv-compatible environment for:

  • Training and evaluating UI agents
  • Testing web automation strategies
  • Researching human-computer interaction
  • Developing multimodal agents

Key Features

  • Multiple Web Apps: Calendar, todo list, messenger, and map applications
  • Browser-Based Actions: Click, fill forms, navigate, scroll, and more
  • Task-Based Evaluation: Optional task goals with automatic reward calculation
  • Docker Support: Fully self-contained Docker image with both OpenApps and environment server
  • BrowserGym Integration: Built on top of BrowserGym for robust browser interaction
  • HTTP Client Interface: Compatible with OpenEnv's standard client API
  • Web Interface: Interactive UI for manual testing and visualization

Changes Made

New Files Added

envs/openapp_env/
├── __init__.py                   # Package exports
├── client.py                     # HTTP client for connecting to OpenApp
├── models.py                     # Data models for actions and observations
├── pyproject.toml                # Package dependencies and configuration
├── openenv.yaml                  # OpenEnv environment configuration
├── test_openapp_env.py           # Unit tests
├── README.md                     # Documentation
├── IMPLEMENTATION.md             # Implementation details and design decisions
├── assets/                       # Images and media
│   ├── OpenApps_OpenEnv_RL.png
│   └── openapps-demo.gif
└── server/                       # Server-side implementation
    ├── __init__.py
    ├── app.py                    # FastAPI server application
    ├── openapp_environment.py    # Core environment logic
    ├── Dockerfile                # Docker image definition
    └── start.sh                  # Container startup script

Updated Files

  • .github/workflows/docker-build.yml: Added openapp-env to CI/CD build matrix
  • docs/environments.md: Added OpenApp environment card to documentation
  • examples/openapp_example.py: Example demonstrating both Docker and local modes
  • examples/openapp_recording_demo.py: Demo for recording videos of agent interactions

Architecture

The OpenApp environment uses a dual-server architecture:

  1. OpenApps Server (port 5001): Provides the web applications (calendar, todo, messenger, maps)
  2. FastAPI Server (port 8000): Exposes the OpenEnv HTTP API

In Docker mode, both servers run inside the container automatically. In local mode, users must start the OpenApps server separately.
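
Because local mode depends on a separately started OpenApps server, it can help to confirm that the server answers before creating the environment. The following is a minimal sketch using only the standard library; the helper names `is_http_ready` and `wait_until_ready`, the retry count, and the delay are assumptions made for illustration and are not part of this PR.

```python
import time
import urllib.error
import urllib.request


def is_http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers an HTTP request with any status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 404), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


def wait_until_ready(url: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_http_ready(url):
            return True
        time.sleep(delay)
    return False
```

In local mode, a call such as `wait_until_ready("http://localhost:5001")` could run before the environment is created; port 5001 is the OpenApps port named above.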

Usage Examples

Docker Mode (Recommended)

from openapp_env import OpenAppAction, OpenAppEnv

# Create environment from Docker image
env = OpenAppEnv.from_docker_image("openapp-env:latest")

# Reset to initial state
result = env.reset()

# Navigate to calendar app
result = env.step(OpenAppAction(
    action_type="goto",
    url="http://localhost:5001/calendar"
))

# Cleanup
env.close()

Building the Docker Image

docker build -t openapp-env:latest -f envs/openapp_env/server/Dockerfile .

Running the Example

# Docker mode (recommended)
python examples/openapp_example.py --mode docker --num-steps 20

# Local mode (requires OpenApps server running separately)
export OPENAPPS_URL=http://localhost:5001
python examples/openapp_example.py --mode local

Action Types Supported

  • click: Click on an element (requires bid)
  • fill: Fill a text input field (requires bid, text)
  • select_option: Select from dropdown (requires bid, value)
  • goto: Navigate to a URL (requires url)
  • scroll: Scroll the page (requires direction)
  • send_keys: Send keyboard input (requires text)
  • noop: No operation
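
The required fields listed above can be expressed as a small lookup table. The sketch below uses a hypothetical `SketchAction` dataclass as a stand-in; it is not the real `OpenAppAction`, whose validation logic may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

# Required fields per action type, copied from the list above.
REQUIRED_FIELDS = {
    "click": ("bid",),
    "fill": ("bid", "text"),
    "select_option": ("bid", "value"),
    "goto": ("url",),
    "scroll": ("direction",),
    "send_keys": ("text",),
    "noop": (),
}


@dataclass
class SketchAction:
    """Hypothetical stand-in for OpenAppAction; not the real class."""

    action_type: str
    bid: Optional[str] = None
    text: Optional[str] = None
    value: Optional[str] = None
    url: Optional[str] = None
    direction: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the required fields that are still unset for this action type."""
        if self.action_type not in REQUIRED_FIELDS:
            raise ValueError(f"unknown action_type: {self.action_type!r}")
        required = REQUIRED_FIELDS[self.action_type]
        return [name for name in required if getattr(self, name) is None]
```

For example, `SketchAction("fill", bid="12").missing_fields()` reports that `text` is still needed, while a `noop` action has nothing missing.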

Observations

Each observation includes:

  • html: Current page HTML content
  • url: Current page URL
  • open_pages_urls: List of all open page URLs
  • active_page_index: Index of currently active page
  • screenshot: Base64-encoded screenshot (optional)
  • axtree_txt: Accessibility tree for element interaction
  • app_state: Current state of all apps (events, todos, messages, etc.)
  • task_info: Information about current task (if using tasks)
  • last_action_error: Error message if last action failed
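
For logging or debugging, the fields above can be condensed into a short summary. This sketch assumes the observation has already been converted to a plain dict keyed by the field names listed above; the real observation is a model object, and `summarize_observation` is a hypothetical helper.

```python
import base64
from typing import Optional


def summarize_observation(obs: dict) -> dict:
    """Condense an observation-like dict into a few fields useful for logging."""
    screenshot_b64: Optional[str] = obs.get("screenshot")
    # The screenshot is optional, so treat a missing value as zero bytes.
    screenshot_bytes = base64.b64decode(screenshot_b64) if screenshot_b64 else b""
    return {
        "url": obs.get("url", ""),
        "num_open_pages": len(obs.get("open_pages_urls", [])),
        "html_chars": len(obs.get("html", "")),
        "screenshot_bytes": len(screenshot_bytes),
        "failed": bool(obs.get("last_action_error")),
    }
```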

Dependencies

The environment requires:

  • Core: openenv-core>=0.1.1,<0.2.0 (pinned due to openai dependency conflict)
  • Web Framework: FastAPI, Uvicorn, Pydantic
  • Browser Automation: BrowserGym, Playwright
  • OpenApps: Installed from GitHub (includes AgentLab dependency)

Note: Using openenv-core>=0.1.1,<0.2.0 to avoid openai version conflict. OpenApps requires openai<2 via agentlab, while openenv-core==0.2.0 requires openai>=2.7.2.
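
The conflict can be checked by hand: no openai version satisfies both `<2` and `>=2.7.2`. The sketch below is a deliberately simplified specifier check written for illustration; it handles only `>=`, `<`, and `==` on plain dotted integers, and real tooling would more likely rely on the `packaging` library. The candidate version list is made up for the example.

```python
def parse(version: str) -> tuple:
    """Turn '2.7.2' into (2, 7, 2), padding short versions such as '2' to (2, 0, 0)."""
    parts = [int(part) for part in version.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated spec using only >=, <, and ==."""
    v = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            ok = v >= parse(clause[2:])
        elif clause.startswith("=="):
            ok = v == parse(clause[2:])
        elif clause.startswith("<"):
            ok = v < parse(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not ok:
            return False
    return True


# Illustrative openai versions; none of them satisfies both constraints at once.
candidates = ["1.99.0", "2.0.0", "2.7.2", "2.8.0"]
compatible = [v for v in candidates if satisfies(v, "<2") and satisfies(v, ">=2.7.2")]
```

Here `compatible` ends up empty, which is why the PR pins `openenv-core>=0.1.1,<0.2.0` instead.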

Testing

# Install the environment
cd envs/openapp_env
pip install -e .

# Run unit tests
python test_openapp_env.py

# Run example (from the repository root)
cd ../..
python examples/openapp_example.py --mode docker

Docker Build Details

  • Base image: python:3.11-slim
  • Size: ~5.7GB (includes Chromium browser and dependencies)
  • Ports: 8000 (FastAPI), 5001 (OpenApps)
  • Multi-platform: Supports linux/amd64 and linux/arm64
  • Health check: Automatic readiness checks on port 8000

CI/CD Integration

The environment is integrated into the GitHub Actions workflow:

  • Automatically builds on pushes to main
  • Published to GitHub Container Registry as ghcr.io/openenv/openenv-openapp-env:latest
  • Uses cached layers for faster builds
  • Supports multi-platform builds (amd64/arm64)

Implementation Notes

Dual Import Pattern

The code supports both in-repo development and standalone/Docker deployment using a try/except import pattern:

try:
    from core.client_types import StepResult  # In-repo mode
    from .models import OpenAppAction  # Relative imports
except ImportError:
    from openenv_core.client_types import StepResult  # Standalone mode
    from openapp_env.models import OpenAppAction  # Absolute imports

This ensures the environment works both during development and when installed as a package.
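
The same fallback idea can be written generically with `importlib`. This is a sketch of the technique rather than the code in this PR: `import_first` is a hypothetical helper, and it only covers absolute module names, so the relative import in the snippet above would still need the explicit try/except form.

```python
import importlib
from types import ModuleType
from typing import Iterable


def import_first(candidates: Iterable[str]) -> ModuleType:
    """Import and return the first module in `candidates` that is importable."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))
```

A call such as `import_first(["core.client_types", "openenv_core.client_types"])` would mirror the in-repo and standalone cases described above.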

Docker Context

The Dockerfile uses the project root (.) as build context and copies the environment directory:

COPY envs/openapp_env/ .
COPY envs/openapp_env/server/start.sh /app/start.sh

This allows the environment to access the openenv-core package during build.

Related Work

This environment builds upon:

  • OpenApps (GitHub): Web application simulation framework
  • BrowserGym (GitHub): Browser automation environment
  • AgentLab (GitHub): Agent evaluation framework

Future Enhancements

Potential improvements for future PRs:

  • Add support for custom task definitions
  • Implement VNC visualization for Docker mode
  • Add more example agents and evaluation scripts
  • Support for additional OpenApps configurations
  • Upgrade to openenv-core 0.2.0 when OpenApps updates openai dependency

Citation

If you use this environment in your research, please cite both OpenEnv and OpenApps:

@article{ullrich2025openapps0,
  title   = {OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability},
  author  = {Karen Ullrich and Jingtong Su and Claudia Shi and Arjun Subramonian and Amir Bar and Ivan Evtimov and Nikolaos Tsilivis and Randall Balestriero and Julia Kempe and Mark Ibrahim},
  year    = {2025},
  journal = {arXiv preprint arXiv:2511.20766}
}

Checklist

  • Added new environment in envs/openapp_env/
  • Created Dockerfile with full setup
  • Added startup script for Docker container
  • Implemented HTTP client interface
  • Added example scripts demonstrating usage
  • Updated CI/CD workflow to build Docker image
  • Added documentation in README.md and IMPLEMENTATION.md
  • Updated docs/environments.md with environment card
  • Tested Docker build and execution
  • Tested local mode execution
  • Added unit tests

@AlirezaShamsoshoara AlirezaShamsoshoara self-assigned this Dec 31, 2025
@AlirezaShamsoshoara AlirezaShamsoshoara added the enhancement New feature or request label Dec 31, 2025
@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 31, 2025
@AlirezaShamsoshoara

Contributor Author

Pushed env to HF link:
https://huggingface.co/spaces/Crashbandicoote2/openapp_env

Darktex previously approved these changes Jan 13, 2026

@Darktex Darktex left a comment

Collaborator

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.



Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

@Darktex Darktex left a comment

Collaborator

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.


Alignment Review Report

I've completed a comprehensive review of PR #276 adding the OpenApp Environment. This is a substantial addition (4,125 additions across 26 files) that integrates the OpenApps framework for UI agent training.

Automated Checks

  • Lint: ✅ PASS - No formatting issues detected
  • Debug code: ✅ CLEAN - No debug markers (breakpoint, TODO, FIXME) found in production code
  • Print statements: ⚠️ Found in server code (see Tier 1)
  • Test coverage: ✅ Basic structure tests included

Tier 1: Fixes Required

Critical Issues

  • envs/openapp_env/server/app.py:42-46 - CRITICAL BUG: Wrong create_app pattern

    # Current (WRONG - breaks WebSocket support):
    env = OpenAppEnvironment(openapps_url=openapps_url) if openapps_url else OpenAppEnvironment()
    app = create_app(env, OpenAppAction, OpenAppObservation, env_name="openapp_env")
    
    # Should be (pass class, not instance):
    app = create_app(OpenAppEnvironment, OpenAppAction, OpenAppObservation, env_name="openapp_env")

    Impact: Passing an instance instead of a class breaks WebSocket session support. Each client needs its own environment instance, but with a single shared instance, all clients would share the same browser session and step counter.

    Reference: See envs/echo_env/server/app.py:38 which correctly passes the class with comment: "Pass the class (factory) instead of an instance for WebSocket session support"
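
    To illustrate why the factory matters, here is a toy sketch that does not use OpenEnv's `create_app` at all. `CounterEnv` and `SessionManager` are hypothetical stand-ins; the real session handling inside OpenEnv may look quite different.

```python
from typing import Callable, Dict


class CounterEnv:
    """Toy environment with per-instance state (a step counter)."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self) -> int:
        self.steps += 1
        return self.steps


class SessionManager:
    """Hypothetical session registry: builds one environment per session id."""

    def __init__(self, env_factory: Callable[[], CounterEnv]) -> None:
        self._factory = env_factory
        self._sessions: Dict[str, CounterEnv] = {}

    def env_for(self, session_id: str) -> CounterEnv:
        if session_id not in self._sessions:
            self._sessions[session_id] = self._factory()
        return self._sessions[session_id]


# Passing the class: each session gets a fresh environment.
isolated = SessionManager(CounterEnv)
isolated.env_for("a").step()
isolated.env_for("a").step()
steps_b_isolated = isolated.env_for("b").step()  # session "b" starts from zero

# Handing out one pre-built instance mimics the bug: sessions share state.
shared_env = CounterEnv()
shared = SessionManager(lambda: shared_env)
shared.env_for("a").step()
shared.env_for("a").step()
steps_b_shared = shared.env_for("b").step()  # session "b" sees "a"'s steps
```

    With the class as factory, session "b" reports step 1; with the shared instance it reports step 3, because it continues from the two steps taken by session "a".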

High Priority Issues

  • envs/openapp_env/server/openapp_environment.py:218-245 - Print statements in production code

    • Line 218: print(f"Using existing OpenApps server at {self.openapps_url}")
    • Line 308: print(f"Warning: Failed to reset browser environment: {e}")

    Fix: Use proper logging instead of print statements. Example:

    import logging
    logger = logging.getLogger(__name__)
    # Pass values as arguments so formatting is skipped when the level is disabled
    logger.info("Using existing OpenApps server at %s", self.openapps_url)
    logger.warning("Failed to reset browser environment: %s", e)
  • envs/openapp_env/server/start.sh:26-40 - Brittle port check logic
    Uses raw Python socket connection for health check. Consider using curl or wget which are already installed in the Dockerfile.

    Suggested fix:

    if curl -f http://localhost:${OPENAPPS_PORT:-5001} >/dev/null 2>&1; then
        echo "OpenApps server is ready!"
        break
    fi

Medium Priority Issues

  • envs/openapp_env/pyproject.toml:20 - Version constraint comment needs update
    The comment says "Using <0.2.0 to avoid openai>=2.7.2 dependency conflict" but should note this is temporary and link to upstream issue.

    Suggested addition:

    # TODO: Upgrade to openenv-core>=0.2.0 when OpenApps updates openai dependency
    # See: https://github.com/facebookresearch/OpenApps/issues/XXX
    "openenv-core>=0.1.1,<0.2.0",
  • examples/openapp_example.py - Client imports from server
    Lines 209-210 and 226-228 show examples importing from openapp_env.server.*:

    from openapp_env.server.openapp_environment import OpenAppEnvironment

    While this is in example code (not production), it teaches the wrong pattern. Examples should use the client API, not import server internals. Consider adding a note that this is only for local development/testing.

Low Priority Issues

  • envs/openapp_env/client.py:30 - HTTPEnvClient vs WebSocket
    This environment uses HTTPEnvClient which is being deprecated (per INVARIANTS.md note about PR #252). However, since several other environments still use HTTP (browsergym_env, snake_env), this is acceptable for now but should be flagged for future migration.

Tier 2: Alignment Discussion Points

ALIGNMENT FLAG 1: HTTP Client Instead of WebSocket

  • Principle at stake: Communication patterns (INVARIANTS.md line 69-73)
  • The concern: This PR uses HTTPEnvClient for the client implementation, while the project is transitioning to WebSocket-only. The INVARIANTS.md document states "We are in the process of deprecating HTTP (see PR #252)". However, several existing environments (browsergym_env, snake_env, coding_env) still use HTTP.
  • Resolution path: This appears acceptable as part of the transition period, but should be documented as technical debt. The environment should eventually migrate to WebSocket.
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 2: Example Code Imports Server Modules

  • Principle at stake: Client-server separation (INVARIANTS.md line 59-62)

  • The concern: Multiple example files show direct imports from openapp_env.server.openapp_environment:

    • examples/openapp_example.py:209
    • examples/openapp_example.py:226
    • examples/openapp_recording_demo.py (implied from PR description)
    • envs/openapp_env/example_usage.py (legacy)

    While these are examples/demos rather than production code, they teach users to violate the client-server boundary. The INVARIANTS state: "Clients must never import from server/ directory."

  • The trade-off: The examples demonstrate "local mode" where users run the environment directly (useful for development), vs "client mode" where they connect over HTTP/WebSocket. Local mode is convenient for debugging but breaks the boundary.

  • Suggested approach:

    1. Keep the local mode examples but add prominent warnings about the pattern
    2. Make Docker mode the primary recommended approach in docs
    3. Consider creating a separate "Development Guide" that explains when server imports are acceptable
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 3: Test Files Import Server Directly

  • Principle at stake: Client-server separation (INVARIANTS.md line 59-62)
  • The concern: envs/openapp_env/test_openapp_env.py:29 imports from server:
    from openapp_env.server.openapp_environment import OpenAppEnvironment
    This is a test file, but it's testing the environment structure rather than the client API. Should tests respect the boundary or are they allowed to cross it?
  • Question for human review: Is it acceptable for test files to import from server/ for testing purposes, or should all tests go through the client API?
  • Suggested reviewer: @Darktex

ALIGNMENT FLAG 4: Print Statements vs Logging

  • Principle at stake: Production-readiness (PRINCIPLES.md line 17)
  • The concern: The server implementation uses print() for status messages instead of proper logging. While functional, this doesn't follow production-ready patterns and makes it harder to control output verbosity.
  • Trade-off: Print statements are simpler and acceptable for demos/examples, but this is server-side production code.
  • Suggested reviewer: @Darktex
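
As a small illustration of the verbosity point in Flag 4, the sketch below shows a logger level suppressing an informational message while keeping a warning, which `print()` cannot do. The logger name and the message arguments are placeholders chosen for the example.

```python
import io
import logging

logger = logging.getLogger("openapp_env.sketch")  # hypothetical logger name
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)
logger.propagate = False  # keep the example's output out of the root logger

# Only WARNING and above are emitted at this level.
logger.setLevel(logging.WARNING)
logger.info("Using existing OpenApps server at %s", "http://localhost:5001")
logger.warning("Failed to reset browser environment: %s", "timeout")

output = stream.getvalue()
```

After this runs, `output` contains only the warning line; lowering the level to `logging.INFO` would include both messages.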

Positive Observations

  • Well-structured implementation: Clear separation of models, client, and server
  • Comprehensive documentation: Excellent README with installation, usage, and troubleshooting
  • Docker integration: Proper CI/CD setup with multi-platform builds
  • Dual import pattern: Correctly handles both in-repo and standalone usage
  • Type safety: Uses dataclasses with validation for actions and observations
  • No security issues: No credential exposure or security vulnerabilities detected
  • Follows Gymnasium API: Implements standard reset(), step(), state interface
  • Good error handling: Validates action parameters and provides helpful error messages


Summary

Mechanical Issues: 1 critical, 2 high priority, 2 medium priority, 1 low priority (6 total)
Alignment Points: 4 items for human review

Critical Path: The create_app() pattern bug (Tier 1, first item) must be fixed before merge - it breaks WebSocket support which is the project's direction per PR #252.

The alignment flags are primarily about documentation and teaching patterns rather than hard violations. The code is functionally sound but the examples/tests blur the client-server boundary in ways that might confuse users about the intended architecture.

Recommendation: Fix the critical create_app() bug and the print statements, then this is ready for merge with the alignment points noted for future improvement.


Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

@Darktex Darktex dismissed their stale review January 13, 2026 05:51

Dismissing automated approval due to bug in review bot. The original review either had blank content or approved despite finding blocking issues. Please disregard this approval.

@AlirezaShamsoshoara

Contributor Author

@Darktex Thanks for reviewing, I just addressed the issues in some new commits (pushed already here).

@Darktex Darktex merged commit 37158b7 into huggingface:main Jan 16, 2026
4 checks passed
Labels: CLA Signed, enhancement, New Environment
