[Enhancement] - Adding new environments - OpenApps Env from FAIR by AlirezaShamsoshoara · Pull Request #276 · huggingface/OpenEnv

AlirezaShamsoshoara · 2025-12-31T06:46:37Z

Add OpenApp Environment to OpenEnv

Overview

This PR adds the OpenApp Environment to OpenEnv, integrating the OpenApps framework for UI agent training with web applications.

OpenApps Resources:

Paper: OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability
GitHub: facebookresearch/OpenApps
Demo: OpenApps Demo Page

What is OpenApp Environment?

OpenApp Environment provides a simulated web application ecosystem where agents can interact with various apps (calendar, todo, messenger, maps) using browser-based actions. It wraps the OpenApps framework and BrowserGym to create a standardized OpenEnv-compatible environment for:

Training and evaluating UI agents
Testing web automation strategies
Researching human-computer interaction
Developing multimodal agents

Key Features

Multiple Web Apps: Calendar, todo list, messenger, and map applications
Browser-Based Actions: Click, fill forms, navigate, scroll, and more
Task-Based Evaluation: Optional task goals with automatic reward calculation
Docker Support: Fully self-contained Docker image with both OpenApps and environment server
BrowserGym Integration: Built on top of BrowserGym for robust browser interaction
HTTP Client Interface: Compatible with OpenEnv's standard client API
Web Interface: Interactive UI for manual testing and visualization

Changes Made

New Files Added

```
envs/openapp_env/
├── __init__.py                   # Package exports
├── client.py                     # HTTP client for connecting to OpenApp
├── models.py                     # Data models for actions and observations
├── pyproject.toml                # Package dependencies and configuration
├── openenv.yaml                  # OpenEnv environment configuration
├── test_openapp_env.py           # Unit tests
├── README.md                     # Documentation
├── IMPLEMENTATION.md             # Implementation details and design decisions
├── assets/                       # Images and media
│   ├── OpenApps_OpenEnv_RL.png
│   └── openapps-demo.gif
└── server/                       # Server-side implementation
    ├── __init__.py
    ├── app.py                    # FastAPI server application
    ├── openapp_environment.py    # Core environment logic
    ├── Dockerfile                # Docker image definition
    └── start.sh                  # Container startup script
```

Updated Files

.github/workflows/docker-build.yml: Added openapp-env to CI/CD build matrix
docs/environments.md: Added OpenApp environment card to documentation
examples/openapp_example.py: Example demonstrating both Docker and local modes
examples/openapp_recording_demo.py: Demo for recording videos of agent interactions

Architecture

The OpenApp environment uses a dual-server architecture:

OpenApps Server (port 5001): Provides the web applications (calendar, todo, messenger, maps)
FastAPI Server (port 8000): Exposes the OpenEnv HTTP API

In Docker mode, both servers run inside the container automatically. In local mode, users must start the OpenApps server separately.
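
In local mode, a client usually has to wait until both servers answer before calling reset(). The following is a minimal sketch of such a readiness poll, not code from this PR; the helper names `http_probe` and `wait_until_ready` are hypothetical, and the only details taken from the description are the two ports.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def wait_until_ready(
    probe: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (not executed here): wait for both servers described above.
# apps_ok = wait_until_ready(lambda: http_probe("http://localhost:5001"))
# api_ok = wait_until_ready(lambda: http_probe("http://localhost:8000"))
```

Taking the probe as a callable keeps the polling loop independent of the transport, so the same loop can be exercised with a fake probe in tests.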

Usage Examples

Docker Mode (Recommended)

```
from openapp_env import OpenAppAction, OpenAppEnv

# Create environment from Docker image
env = OpenAppEnv.from_docker_image("openapp-env:latest")

# Reset to initial state
result = env.reset()

# Navigate to calendar app
result = env.step(OpenAppAction(
    action_type="goto",
    url="http://localhost:5001/calendar"
))

# Cleanup
env.close()
```

Building the Docker Image

```
docker build -t openapp-env:latest -f envs/openapp_env/server/Dockerfile .
```

Running the Example

```
# Docker mode (recommended)
python examples/openapp_example.py --mode docker --num-steps 20

# Local mode (requires OpenApps server running separately)
export OPENAPPS_URL=http://localhost:5001
python examples/openapp_example.py --mode local
```

Action Types Supported

click: Click on an element (requires bid)
fill: Fill a text input field (requires bid, text)
select_option: Select from dropdown (requires bid, value)
goto: Navigate to a URL (requires url)
scroll: Scroll the page (requires direction)
send_keys: Send keyboard input (requires text)
noop: No operation
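
The required fields listed above can be checked before an action is sent. This is a rough sketch only: `SketchAction` and `REQUIRED_FIELDS` are hypothetical names for illustration and are not the PR's `OpenAppAction` model, whose actual validation may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

# Required fields per action type, copied from the list above.
REQUIRED_FIELDS = {
    "click": ("bid",),
    "fill": ("bid", "text"),
    "select_option": ("bid", "value"),
    "goto": ("url",),
    "scroll": ("direction",),
    "send_keys": ("text",),
    "noop": (),
}


@dataclass
class SketchAction:
    """Hypothetical stand-in for OpenAppAction, for illustration only."""

    action_type: str
    bid: Optional[str] = None
    text: Optional[str] = None
    value: Optional[str] = None
    url: Optional[str] = None
    direction: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the required fields that are still unset for this action."""
        if self.action_type not in REQUIRED_FIELDS:
            raise ValueError(f"Unknown action_type: {self.action_type!r}")
        return [
            name
            for name in REQUIRED_FIELDS[self.action_type]
            if getattr(self, name) is None
        ]
```

For instance, `SketchAction("fill", bid="12").missing_fields()` reports that `text` is still missing, while a `noop` action needs nothing.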

Observations

Each observation includes:

html: Current page HTML content
url: Current page URL
open_pages_urls: List of all open page URLs
active_page_index: Index of currently active page
screenshot: Base64-encoded screenshot (optional)
axtree_txt: Accessibility tree for element interaction
app_state: Current state of all apps (events, todos, messages, etc.)
task_info: Information about current task (if using tasks)
last_action_error: Error message if last action failed
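
As an illustration of consuming these fields, the sketch below decodes the optional screenshot and checks for an action error. It assumes the observation is available as a plain mapping keyed by the field names above; the real observation object in this PR may expose them as attributes instead, and both helper names are hypothetical.

```python
import base64
from typing import Any, Mapping, Optional


def decode_screenshot(observation: Mapping[str, Any]) -> Optional[bytes]:
    """Return raw image bytes, or None when no screenshot was included."""
    encoded = observation.get("screenshot")
    if not encoded:
        return None
    return base64.b64decode(encoded)


def action_failed(observation: Mapping[str, Any]) -> bool:
    """True when the last action reported a non-empty error message."""
    return bool(observation.get("last_action_error"))
```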

Dependencies

The environment requires:

Core: openenv-core>=0.1.1,<0.2.0 (pinned due to openai dependency conflict)
Web Framework: FastAPI, Uvicorn, Pydantic
Browser Automation: BrowserGym, Playwright
OpenApps: Installed from GitHub (includes AgentLab dependency)

Note: Using openenv-core>=0.1.1,<0.2.0 to avoid openai version conflict. OpenApps requires openai<2 via agentlab, while openenv-core==0.2.0 requires openai>=2.7.2.
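
The conflict can be seen by checking that no openai version satisfies both constraints. The sketch below uses simple tuple comparison as a simplified stand-in for full version-specifier matching, and the candidate version strings are illustrative values, not a list of real releases.

```python
def parse(version: str) -> tuple:
    """Turn '2.7.2' into (2, 7, 2) for simple ordering comparisons."""
    return tuple(int(part) for part in version.split("."))


def allowed_by_openapps(version: str) -> bool:
    # OpenApps (via agentlab) requires openai<2
    return parse(version) < parse("2")


def allowed_by_openenv_core_020(version: str) -> bool:
    # openenv-core==0.2.0 requires openai>=2.7.2
    return parse(version) >= parse("2.7.2")


# Illustrative candidates only; not a list of actual openai releases.
candidates = ["1.99.0", "2.0.0", "2.7.1", "2.7.2", "3.0.0"]
compatible = [
    v for v in candidates
    if allowed_by_openapps(v) and allowed_by_openenv_core_020(v)
]
```

Because the two ranges do not overlap, `compatible` stays empty for any candidate list, which is why the PR pins openenv-core below 0.2.0.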

Testing

```
# Install the environment
cd envs/openapp_env
pip install -e .

# Run unit tests
python test_openapp_env.py

# Run example (from the repository root)
cd ../..
python examples/openapp_example.py --mode docker
```

Docker Build Details

Base image: python:3.11-slim
Size: ~5.7GB (includes Chromium browser and dependencies)
Ports: 8000 (FastAPI), 5001 (OpenApps)
Multi-platform: Supports linux/amd64 and linux/arm64
Health check: Automatic readiness checks on port 8000

CI/CD Integration

The environment is integrated into the GitHub Actions workflow:

Automatically builds on pushes to main
Published to GitHub Container Registry as ghcr.io/openenv/openenv-openapp-env:latest
Uses cached layers for faster builds
Supports multi-platform builds (amd64/arm64)

Implementation Notes

Dual Import Pattern

The code supports both in-repo development and standalone/Docker deployment using a try/except import pattern:

```
try:
    from core.client_types import StepResult  # In-repo mode
    from .models import OpenAppAction  # Relative imports
except ImportError:
    from openenv_core.client_types import StepResult  # Standalone mode
    from openapp_env.models import OpenAppAction  # Absolute imports
```

This ensures the environment works both during development and when installed as a package.
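
The same fallback idea can be written as a small helper that tries candidate module names in order. This is a sketch of the general technique, not code from the PR: `import_first` is a hypothetical name, and unlike the try/except above it resolves one module at a time rather than switching a whole group of imports together.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first(candidates: Sequence[str]) -> ModuleType:
    """Import and return the first module in `candidates` that is importable."""
    last_error = None
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember the failure and fall through to the next candidate.
            last_error = exc
    raise ImportError(f"None of {list(candidates)} could be imported") from last_error


# Example (not executed here), using the module names from the snippet above:
# client_types = import_first(["core.client_types", "openenv_core.client_types"])
```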

Docker Context

The Dockerfile uses the project root (.) as build context and copies the environment directory:

```
COPY envs/openapp_env/ .
COPY envs/openapp_env/server/start.sh /app/start.sh
```

This allows the environment to access the openenv-core package during build.

Related Work

This environment builds upon:

OpenApps (GitHub): Web application simulation framework
BrowserGym (GitHub): Browser automation environment
AgentLab (GitHub): Agent evaluation framework

Future Enhancements

Potential improvements for future PRs:

Add support for custom task definitions
Implement VNC visualization for Docker mode
Add more example agents and evaluation scripts
Support for additional OpenApps configurations
Upgrade to openenv-core 0.2.0 when OpenApps updates openai dependency

Citation

If you use this environment in your research, please cite both OpenEnv and OpenApps:

```
@article{ullrich2025openapps0,
  title   = {OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability},
  author  = {Karen Ullrich and Jingtong Su and Claudia Shi and Arjun Subramonian and Amir Bar and Ivan Evtimov and Nikolaos Tsilivis and Randall Balestriero and Julia Kempe and Mark Ibrahim},
  year    = {2025},
  journal = {arXiv preprint arXiv:2511.20766}
}
```

Checklist

AlirezaShamsoshoara · 2025-12-31T21:25:05Z

Pushed env to HF link:
https://huggingface.co/spaces/Crashbandicoote2/openapp_env

Darktex

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.

Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

Darktex

Note: This is an automated review by Claude Code (alignment-reviewer agent), not a human review. The account posting this is shared with the human maintainer.

Alignment Review Report

I've completed a comprehensive review of PR #276 adding the OpenApp Environment. This is a substantial addition (4,125 additions across 26 files) that integrates the OpenApps framework for UI agent training.

Automated Checks

Lint: ✅ PASS - No formatting issues detected
Debug code: ✅ CLEAN - No debug markers (breakpoint, TODO, FIXME) found in production code
Print statements: ⚠️ Found in server code (see Tier 1)
Test coverage: ✅ Basic structure tests included

Tier 1: Fixes Required

Critical Issues

envs/openapp_env/server/app.py:42-46 - CRITICAL BUG: Wrong create_app pattern
```
# Current (WRONG - breaks WebSocket support):
env = OpenAppEnvironment(openapps_url=openapps_url) if openapps_url else OpenAppEnvironment()
app = create_app(env, OpenAppAction, OpenAppObservation, env_name="openapp_env")

# Should be (pass class, not instance):
app = create_app(OpenAppEnvironment, OpenAppAction, OpenAppObservation, env_name="openapp_env")
```
Impact: Passing an instance instead of a class breaks WebSocket session support. Each client needs its own environment instance, but with a single shared instance, all clients would share the same browser session and step counter.

Reference: See envs/echo_env/server/app.py:38 which correctly passes the class with comment: "Pass the class (factory) instead of an instance for WebSocket session support"
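
To illustrate why a class (factory) and an instance behave differently, here is a toy sketch. It does not reproduce how `create_app` is implemented; `CountingEnv` and `SessionRegistry` are hypothetical stand-ins that only model "one environment per session" versus "one environment for everyone".

```python
class CountingEnv:
    """Toy environment; hypothetical stand-in for OpenAppEnvironment."""

    def __init__(self):
        self.step_count = 0

    def step(self):
        self.step_count += 1
        return self.step_count


class SessionRegistry:
    """Hypothetical server-side registry that builds one env per session."""

    def __init__(self, env_factory):
        self._env_factory = env_factory
        self._sessions = {}

    def env_for(self, session_id):
        # Call the factory the first time a session is seen.
        if session_id not in self._sessions:
            self._sessions[session_id] = self._env_factory()
        return self._sessions[session_id]


# Passing the class: every session gets a fresh environment.
isolated = SessionRegistry(CountingEnv)
isolated.env_for("a").step()
isolated.env_for("a").step()
isolated_b = isolated.env_for("b").step()  # session "b" starts from zero

# Emulating a shared instance: every session sees the same step counter.
shared_env = CountingEnv()
shared = SessionRegistry(lambda: shared_env)
shared.env_for("a").step()
shared.env_for("a").step()
shared_b = shared.env_for("b").step()  # session "b" inherits "a"'s steps
```

In the shared case, session "b" observes the two steps already taken by session "a", which mirrors the shared browser session and step counter described in the impact note.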

High Priority Issues

envs/openapp_env/server/openapp_environment.py:218, 308 - Print statements in production code
- Line 218: print(f"Using existing OpenApps server at {self.openapps_url}")
- Line 308: print(f"Warning: Failed to reset browser environment: {e}")
Fix: Use proper logging instead of print statements. Example:
```
import logging
logger = logging.getLogger(__name__)
logger.info(f"Using existing OpenApps server at {self.openapps_url}")
logger.warning(f"Failed to reset browser environment: {e}")
```
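
A self-contained variant of the suggestion above might look like the following sketch. The logger name and both function names are hypothetical, and it passes values as lazy %-style arguments so the message is only formatted when the record is actually emitted.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("openapp_env.sketch")


def report_server(openapps_url: str) -> None:
    """Log which OpenApps server is in use instead of printing it."""
    logger.info("Using existing OpenApps server at %s", openapps_url)


def reset_browser(reset_fn) -> bool:
    """Call `reset_fn`; log a warning instead of printing when it fails."""
    try:
        reset_fn()
        return True
    except Exception as exc:
        logger.warning("Failed to reset browser environment: %s", exc)
        return False
```

With this in place, verbosity can be controlled by the application, for example with `logging.basicConfig(level=logging.WARNING)` to silence the informational message.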
envs/openapp_env/server/start.sh:26-40 - Brittle port check logic
Uses raw Python socket connection for health check. Consider using curl or wget which are already installed in the Dockerfile.

Suggested fix:
```
if curl -f http://localhost:${OPENAPPS_PORT:-5001} >/dev/null 2>&1; then
    echo "OpenApps server is ready!"
    break
fi
```

Medium Priority Issues

envs/openapp_env/pyproject.toml:20 - Version constraint comment needs update
The comment says "Using <0.2.0 to avoid openai>=2.7.2 dependency conflict" but should note this is temporary and link to upstream issue.

Suggested addition:
```
# TODO: Upgrade to openenv-core>=0.2.0 when OpenApps updates openai dependency
# See: https://github.com/facebookresearch/OpenApps/issues/XXX
"openenv-core>=0.1.1,<0.2.0",
```
examples/openapp_example.py - Client imports from server
Lines 209-210 and 226-228 show examples importing from openapp_env.server.*:
```
from openapp_env.server.openapp_environment import OpenAppEnvironment
```
While this is in example code (not production), it teaches the wrong pattern. Examples should use the client API, not import server internals. Consider adding a note that this is only for local development/testing.

Low Priority Issues

envs/openapp_env/client.py:30 - HTTPEnvClient vs WebSocket
This environment uses HTTPEnvClient which is being deprecated (per INVARIANTS.md note about PR #252). However, since several other environments still use HTTP (browsergym_env, snake_env), this is acceptable for now but should be flagged for future migration.

Tier 2: Alignment Discussion Points

ALIGNMENT FLAG 1: HTTP Client Instead of WebSocket

Principle at stake: Communication patterns (INVARIANTS.md line 69-73)
The concern: This PR uses HTTPEnvClient for the client implementation, while the project is transitioning to WebSocket-only. The INVARIANTS.md document states "We are in the process of deprecating HTTP (see PR #252)". However, several existing environments (browsergym_env, snake_env, coding_env) still use HTTP.
Resolution path: This appears acceptable as part of the transition period, but should be documented as technical debt. The environment should eventually migrate to WebSocket.
Suggested reviewer: @Darktex

ALIGNMENT FLAG 2: Example Code Imports Server Modules

Principle at stake: Client-server separation (INVARIANTS.md line 59-62)
The concern: Multiple example files show direct imports from openapp_env.server.openapp_environment:
- examples/openapp_example.py:209
- examples/openapp_example.py:226
- examples/openapp_recording_demo.py (implied from PR description)
- envs/openapp_env/example_usage.py (legacy)
While these are examples/demos rather than production code, they teach users to violate the client-server boundary. The INVARIANTS state: "Clients must never import from server/ directory."
The trade-off: The examples demonstrate "local mode" where users run the environment directly (useful for development), vs "client mode" where they connect over HTTP/WebSocket. Local mode is convenient for debugging but breaks the boundary.
Suggested approach:
1. Keep the local mode examples but add prominent warnings about the pattern
2. Make Docker mode the primary recommended approach in docs
3. Consider creating a separate "Development Guide" that explains when server imports are acceptable
Suggested reviewer: @Darktex
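
One way to keep this boundary visible is a small check that scans client or example code for imports from the server package. The sketch below is an assumption-laden illustration, not an existing OpenEnv tool: `find_server_imports` is a hypothetical helper, and it only detects absolute imports of `openapp_env.server` (not relative imports or `from openapp_env import server`).

```python
import ast
from typing import List


def find_server_imports(source: str, package: str = "openapp_env") -> List[str]:
    """Return dotted module names imported from `<package>.server` in `source`."""
    prefix = f"{package}.server"
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # from openapp_env.server.x import Y
            names = [node.module]
        elif isinstance(node, ast.Import):
            # import openapp_env.server.x
            names = [alias.name for alias in node.names]
        else:
            continue
        found.extend(
            name for name in names
            if name == prefix or name.startswith(prefix + ".")
        )
    return found
```

Run over the example files, a non-empty result would point at the lines that need either a warning comment or a switch to the client API.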

ALIGNMENT FLAG 3: Test Files Import Server Directly

Principle at stake: Client-server separation (INVARIANTS.md line 59-62)
The concern: envs/openapp_env/test_openapp_env.py:29 imports from server:
```
from openapp_env.server.openapp_environment import OpenAppEnvironment
```
This is a test file, but it's testing the environment structure rather than the client API. Should tests respect the boundary or are they allowed to cross it?
Question for human review: Is it acceptable for test files to import from server/ for testing purposes, or should all tests go through the client API?
Suggested reviewer: @Darktex

ALIGNMENT FLAG 4: Print Statements vs Logging

Principle at stake: Production-readiness (PRINCIPLES.md line 17)
The concern: The server implementation uses print() for status messages instead of proper logging. While functional, this doesn't follow production-ready patterns and makes it harder to control output verbosity.
Trade-off: Print statements are simpler and acceptable for demos/examples, but this is server-side production code.
Suggested reviewer: @Darktex

Positive Observations

✅ Well-structured implementation: Clear separation of models, client, and server
✅ Comprehensive documentation: Excellent README with installation, usage, and troubleshooting
✅ Docker integration: Proper CI/CD setup with multi-platform builds
✅ Dual import pattern: Correctly handles both in-repo and standalone usage
✅ Type safety: Uses dataclasses with validation for actions and observations
✅ No security issues: No credential exposure or security vulnerabilities detected
✅ Follows Gymnasium API: Implements standard reset(), step(), state interface
✅ Good error handling: Validates action parameters and provides helpful error messages

Summary

Mechanical Issues: 1 critical, 2 high priority, 2 medium priority, 1 low priority (6 total)
Alignment Points: 4 items for human review

Critical Path: The create_app() pattern bug (Tier 1, first item) must be fixed before merge - it breaks WebSocket support which is the project's direction per PR #252.

The alignment flags are primarily about documentation and teaching patterns rather than hard violations. The code is functionally sound but the examples/tests blur the client-server boundary in ways that might confuse users about the intended architecture.

Recommendation: Fix the critical create_app() bug and the print statements, then this is ready for merge with the alignment points noted for future improvement.

Automated review by Claude Code | Learn more about OpenEnv's agentic workflow

Dismissing automated approval due to bug in review bot. The original review either had blank content or approved despite finding blocking issues. Please disregard this approval.

AlirezaShamsoshoara · 2026-01-15T17:54:52Z

@Darktex Thanks for reviewing. I addressed the issues in new commits, which are already pushed to this PR.

AlirezaShamsoshoara requested review from Darktex, burtenshaw, init27 and pankit-eng December 31, 2025 06:46

AlirezaShamsoshoara self-assigned this Dec 31, 2025

AlirezaShamsoshoara added the enhancement New feature or request label Dec 31, 2025

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 31, 2025

AlirezaShamsoshoara added the New Environment label Jan 12, 2026

Darktex previously approved these changes Jan 13, 2026

Darktex reviewed Jan 13, 2026

AlirezaShamsoshoara force-pushed the ali/feature/openapp_env branch from 35a2e90 to e575845 Compare January 15, 2026 18:28

AlirezaShamsoshoara added 16 commits January 15, 2026 12:04

update the gitignore to not collect logs

c235b06

add the openApps example

cc84fdd

add the assets for the readme

4423af0

add the Server folder for the OpenApps

e7b8d85

add the init for openApps

b35b794

add the client file

a316bef

add the models file for the openApp

7ceeb80

add the yaml file for openapp-openenv

dff37fa

add the pyproject

fa9d6fe

update the README file

eda43f5

add the dir structure to the README file

f00b521

update the gitignore to avoid the session key

3615129

update the openapp example to include interactive visualization

c23756c

update README regarding the visualization

41f67ef

add the test case

6a59eaa

add the example usage

54adab8

AlirezaShamsoshoara added 28 commits January 15, 2026 12:04

add the start bash script for docker

6e685f2

update the readme based on new changes in the docker and how to start the server

9bc2b75

update the pyproject to fix the server issue

a7f2a58

fix the Docker issue to start the OpenApp server

fad70e9

add the OPENAPPS_URL to handle run from docker

0000739

make the giff size bigger

7f10d12

update openapp_env to have click ability

8ec4b92

add the demo file for recording

e420acd

add the demo showcase for the hackathon submission

695bf9f

update the HTML demo page for videos

06103df

add the videos to the assets

a844215

add the .gitignore for the sub project of OpenApps - openEnv

52b2fdc

update to openenv v0.2.0

237756e

update the docker build

f42dc31

update the env md for the openapp

73519ad

update the readme for the new docker install

d728a11

revert the start.sh

03a2bf1

update the docker file on how to build the image

9efb4e5

update the docker build workflow for openapps

ccfa04b

update the README

2b26082

update the DockerFile to be compatible with HF and local/github actions

64e2315

addressing the PR review

9d1cf50

update the client to address the pr review

c58dd18

address the PR review re model

49399aa

address the PR review for pyproject.toml file

78ff4bd

fix docker issue to have latest on openenv-core

7af2cc8

update the create_app function in app for 0.2.0

104cb95

address the PR review

cee4d06

AlirezaShamsoshoara force-pushed the ali/feature/openapp_env branch from e575845 to cee4d06 Compare January 16, 2026 00:12

Darktex merged commit 37158b7 into huggingface:main Jan 16, 2026
4 checks passed

Darktex merged 45 commits into huggingface:main from AlirezaShamsoshoara:ali/feature/openapp_env
