src/label/prompts/default.txt (18 changes: 15 additions, 3 deletions)
@@ -42,15 +42,23 @@ Here are the user input events:
 - Identify folders, files, UI elements, spreadsheet cells (with values and labels), browser fields, etc.
 - Include **visible labels or contents** of buttons, menu items, folders, etc.

-5. **Ignore technical rendering details**
+5. **Quote exact text for copy/cut/paste/select/delete/find-replace actions**
+
+- Always include the exact text content in quotes and the precise location (filename, line number, cell, field name, URL).
+
+6. **Name the application and location**
+
+- Always name the app (e.g. VS Code, Chrome, Terminal). Include filename + line number for editors, site name for browsers, working directory for terminals.
+
+7. **Ignore technical rendering details**

 - Do not mention coordinates, cursor paths, or raw keycodes.

-6. **Favor screenshot over input events**
+8. **Favor screenshot over input events**

 - In cases where input logs and screenshots conflict, or logs are harder to understand, prioritize the **visual evidence** from screenshots.

-6. IMPORTANT: **Merge repeated identical actions**
+9. IMPORTANT: **Merge repeated identical actions**

 - If the same action is done repeatedly with no change or intermediate action, **merge them into one action** with a wider start–end interval. For example, instead of multiple "Ran the command \"ls\" in the terminal," generate it ONLY once.
 - If the user repeatedly clicks / switches between applications without performing any intermediate action, merge them into a single combined action.
@@ -68,6 +76,10 @@ Generated captions must be in past tense, and at the level of detail as the exam
 - Ran "cd /home/user/projects/gs-utils" in the terminal.
 - Deleted the text "hyundai i30" from cell I2.
 - Clicked the "Downloads" folder in the sidebar.
+- Copied "export default App" from line 24 of App.tsx in VS Code.
+- Pasted "border-radius: 8px;" into the .card class in styles.css at line 31.
+- Selected "return None" on line 15 of utils.py in VS Code.
+- Replaced "http" with "https" using Find and Replace in config.yaml.

 You MUST quote specific things from the screen so it's easy to reproduce your steps.

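Rule 9 asks the captioning model to collapse repeated identical actions into one action with a wider start–end interval. The same idea can be expressed as a post-processing pass over captions. This is a minimal sketch of the first bullet only, assuming each caption carries start and end timestamps; `Caption` and `merge_repeated` are hypothetical names and are not part of this repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    # Hypothetical caption record: start/end timestamps (seconds) plus text.
    start: float
    end: float
    text: str


def merge_repeated(captions: List[Caption]) -> List[Caption]:
    """Collapse runs of consecutive identical captions into one caption
    whose interval spans the whole run."""
    merged: List[Caption] = []
    for cap in sorted(captions, key=lambda c: c.start):
        if merged and merged[-1].text == cap.text:
            # Same action repeated with nothing in between: widen the interval.
            merged[-1].end = max(merged[-1].end, cap.end)
        else:
            # Copy so the caller's Caption objects are never mutated.
            merged.append(Caption(cap.start, cap.end, cap.text))
    return merged


caps = [
    Caption(0.0, 1.0, 'Ran the command "ls" in the terminal.'),
    Caption(1.5, 2.0, 'Ran the command "ls" in the terminal.'),
    Caption(2.5, 3.0, 'Clicked the "Downloads" folder in the sidebar.'),
]
for c in merge_repeated(caps):
    print(c.start, c.end, c.text)
```

Only strictly consecutive duplicates are merged, so an intervening different action keeps both occurrences, which matches the "no change or intermediate action" condition in the prompt.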
src/label/prompts/screenshots_only.txt (18 changes: 15 additions, 3 deletions)
@@ -29,15 +29,23 @@ Your job is to generate **fully detailed captions** describing **exactly what th
 - Identify folders, files, UI elements, spreadsheet cells (with values and labels), browser fields, etc.
 - Include **visible labels or contents** of buttons, menu items, folders, etc.

-5. **Ignore technical rendering details**
+5. **Quote exact text for copy/cut/paste/select/delete/find-replace actions**
+
+- Always include the exact text content in quotes and the precise location (filename, line number, cell, field name, URL).
+
+6. **Name the application and location**
+
+- Always name the app (e.g. VS Code, Chrome, Terminal). Include filename + line number for editors, site name for browsers, working directory for terminals.
+
+7. **Ignore technical rendering details**

 - Do not mention coordinates, cursor paths, or raw keycodes.

-6. **Favor screenshot over input events**
+8. **Favor screenshot over input events**

 - In cases where input logs and screenshots conflict, or logs are harder to understand, prioritize the **visual evidence** from screenshots.

-6. IMPORTANT: **Merge repeated identical actions**
+9. IMPORTANT: **Merge repeated identical actions**

 - If the same action is done repeatedly with no change or intermediate action, **merge them into one action** with a wider start–end interval. For example, instead of multiple "Ran the command \"ls\" in the terminal," generate it ONLY once.
 - If the user repeatedly clicks / switches between applications without performing any intermediate action, merge them into a single combined action.
@@ -57,6 +65,10 @@ Generated captions must be in past tense, and at the level of detail as the exam
 - Ran "cd /home/user/projects/gs-utils" in the terminal.
 - Deleted the text "hyundai i30" from cell I2.
 - Clicked the "Downloads" folder in the sidebar.
+- Copied "export default App" from line 24 of App.tsx in VS Code.
+- Pasted "border-radius: 8px;" into the .card class in styles.css at line 31.
+- Selected "return None" on line 15 of utils.py in VS Code.
+- Replaced "http" with "https" using Find and Replace in config.yaml.

 You MUST quote specific things from the screen so it's easy to reproduce your steps.

src/record/handlers/input_event.py (10 changes: 10 additions, 0 deletions)
@@ -33,6 +33,8 @@ def __init__(
         """
         self.event_queue = event_queue
         self._monitors = list(get_monitors())
+        self._monitors_last_refresh = time.time()
+        self._monitors_refresh_interval = 5.0
Comment on lines 35 to +37
Copilot AI Feb 6, 2026

get_monitors() is called unguarded during initialization, but refreshes are wrapped in a try/except. If screeninfo fails or returns an empty list at startup, the handler can crash before it ever reaches the safer refresh logic. Consider wrapping the initial get_monitors() call similarly and ensuring _monitors is non-empty (fallback to a single synthetic monitor or a safe default).

         self.accessibility_enabled = accessibility
         self.accessibility_handler = None

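Following the review comment on lines 35 to +37, one way to guard the constructor is to route the initial call through a helper that never yields an empty list. This is a minimal sketch, assuming screeninfo's `get_monitors` is passed in as a callable so the example runs without a display. `Monitor` is a stand-in carrying only `x`, `y`, `width`, and `height`; `load_monitors` and `FALLBACK_MONITOR` (including its 1920x1080 size) are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, List

log = logging.getLogger(__name__)


@dataclass
class Monitor:
    # Stand-in for screeninfo.Monitor with only the fields the handler reads.
    x: int
    y: int
    width: int
    height: int


# Hypothetical synthetic monitor used when nothing can be detected.
FALLBACK_MONITOR = Monitor(x=0, y=0, width=1920, height=1080)


def load_monitors(get_monitors: Callable[[], Iterable[Monitor]]) -> List[Monitor]:
    """Call get_monitors() defensively; the result is never empty."""
    try:
        monitors = list(get_monitors())
    except Exception as exc:  # e.g. no usable display backend at startup
        log.warning("Could not enumerate monitors: %s", exc)
        monitors = []
    if not monitors:
        monitors = [FALLBACK_MONITOR]
    return monitors
```

In `__init__` the unguarded call would then become something like `self._monitors = load_monitors(get_monitors)`, so later indexing such as `self._monitors[0]` always has an element to read.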
@@ -68,6 +70,14 @@ def _get_monitor(self, x: int, y: int) -> int:
         Returns:
             Monitor index (0-based)
         """
+        now = time.time()
+        if now - self._monitors_last_refresh > self._monitors_refresh_interval:
Comment on lines 70 to +74
Copilot AI Feb 6, 2026

The _get_monitor docstring/type annotation says it returns a single int monitor index, but the implementation returns a tuple (idx, monitor_dict). Update the return type/docstring to match the actual return value to avoid misleading callers and type checkers.

+            try:
+                self._monitors = list(get_monitors())
+            except Exception:
+                pass
Comment on lines +77 to +78
Copilot AI Feb 6, 2026

The 'except' clause does nothing but pass, and there is no explanatory comment.

Suggested change
-            except Exception:
-                pass
+            except Exception as exc:
+                # Keep using previously cached monitor information if refresh fails.
+                print(f"Warning: Failed to refresh monitor information: {exc}")

+            self._monitors_last_refresh = now
+
Copilot AI Feb 6, 2026

The monitor refresh can leave self._monitors empty (e.g., get_monitors() returns []). The rest of _get_monitor assumes at least one monitor exists (uses self._monitors[0] later), which can raise IndexError. Add a guard after refresh (and before the loop) to handle the empty-list case safely.

Suggested change
+        # If no monitors are available (e.g., get_monitors() returned an empty list),
+        # return a safe default monitor description to avoid IndexError.
+        if not self._monitors:
+            return 0, {"left": 0, "top": 0, "width": 0, "height": 0}

         def to_monitor_dict(monitor):
             return {
                 "left": monitor.x, "top": monitor.y, "width": monitor.width, "height": monitor.height
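The three review comments on `_get_monitor` (the `int` return annotation, the silent `except`, and the empty-list case) can be addressed together. The sketch below is hypothetical: it pulls the throttled refresh into a `MonitorCache` class, injects `get_monitors` and the clock so it can be tested without a display, logs refresh failures with `logging` instead of `print`, and annotates the actual `(index, monitor_dict)` return value. The point-in-rectangle test and the fall-back to the first monitor are assumptions, since the hunk does not show the rest of `_get_monitor`.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

log = logging.getLogger(__name__)


@dataclass
class Monitor:
    # Stand-in for screeninfo.Monitor with only the fields used here.
    x: int
    y: int
    width: int
    height: int


MonitorDict = Dict[str, int]


def to_monitor_dict(m: Monitor) -> MonitorDict:
    return {"left": m.x, "top": m.y, "width": m.width, "height": m.height}


class MonitorCache:
    """Hypothetical extraction of the throttled lookup in _get_monitor."""

    def __init__(
        self,
        get_monitors: Callable[[], Iterable[Monitor]],
        refresh_interval: float = 5.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._get_monitors = get_monitors
        self._refresh_interval = refresh_interval
        self._clock = clock
        self._monitors: List[Monitor] = []
        self._last_refresh = float("-inf")  # forces a refresh on the first lookup

    def _maybe_refresh(self) -> None:
        now = self._clock()
        if now - self._last_refresh <= self._refresh_interval:
            return  # cached monitor list is still fresh
        try:
            self._monitors = list(self._get_monitors())
        except Exception as exc:
            # Keep the previously cached monitors if the refresh fails.
            log.warning("Failed to refresh monitor information: %s", exc)
        self._last_refresh = now

    def get_monitor(self, x: int, y: int) -> Tuple[int, MonitorDict]:
        """Return (0-based index, monitor dict) for the monitor containing (x, y)."""
        self._maybe_refresh()
        if not self._monitors:
            # Safe default from the review suggestion; avoids IndexError below.
            return 0, {"left": 0, "top": 0, "width": 0, "height": 0}
        for idx, m in enumerate(self._monitors):
            if m.x <= x < m.x + m.width and m.y <= y < m.y + m.height:
                return idx, to_monitor_dict(m)
        # Point lies outside every monitor: fall back to the first one.
        return 0, to_monitor_dict(self._monitors[0])
```

Injecting the clock is a design choice for testability; in the handler itself `time.time()` and screeninfo's `get_monitors` would be used directly, as in the diff.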
src/record/models/event_queue.py (30 changes: 21 additions, 9 deletions)
@@ -96,7 +96,7 @@ def enqueue(self, event: InputEvent) -> None:

         queue = self.aggregations[agg_type]
         config = self.configs[agg_type]
-        screenshots = self._collect_screenshots(event.timestamp)
+        screenshots = self._collect_screenshots(event.timestamp, event.monitor_index)

         last_event, last_screenshots = queue[-1] if queue else (None, None)
         first_event, first_screenshots = queue[0] if queue else (None, None)
@@ -171,7 +171,7 @@ def _end_burst(self, agg_type: str, event: InputEvent, screenshot: Any) -> None:
         current_burst_id = self._get_burst_id_for_type(agg_type)

         # Get screenshot with padding after
-        end_screenshot = self._collect_end_screenshot(event.timestamp)
+        end_screenshot = self._collect_end_screenshot(event.timestamp, event.monitor_index)

         request = self._create_request(
             event=event,
@@ -255,21 +255,33 @@ def _get_burst_id_for_type(self, agg_type: str) -> int:
             self.next_burst_id
         )

-    def _collect_screenshots(self, timestamp: float) -> Any:
-        """Get screenshot before timestamp."""
+    def _collect_screenshots(self, timestamp: float, monitor_index: int = None) -> Any:
+        """Get screenshot before timestamp, preferring the given monitor."""
         constants = constants_manager.get()
Comment on lines +258 to 260
Copilot AI Feb 6, 2026

Type hint mismatch: monitor_index defaults to None but is annotated as int. Use Optional[int] (and propagate the same change to _collect_end_screenshot) to reflect actual usage and avoid confusing static analysis.

         start_candidates = self.image_queue.get_entries_before(
             timestamp, milliseconds=constants.PADDING_BEFORE
         )
-        return start_candidates[-1] if start_candidates else None
-
-    def _collect_end_screenshot(self, timestamp: float) -> Any:
-        """Get screenshot after timestamp with padding."""
+        if not start_candidates:
+            return None
+        if monitor_index is not None:
+            matching = [s for s in start_candidates if s.monitor_index == monitor_index]
+            if matching:
+                return matching[-1]
+        return start_candidates[-1]
Comment on lines +266 to +270
Copilot AI Feb 6, 2026

This list comprehension allocates a new list of all matching screenshots just to pick the last one. You can avoid the extra allocation by iterating start_candidates in reverse and returning the first match, which is both faster and uses less memory.

+
+    def _collect_end_screenshot(self, timestamp: float, monitor_index: int = None) -> Any:
+        """Get screenshot after timestamp with padding, preferring the given monitor."""
         constants = constants_manager.get()
         exact_candidates = self.image_queue.get_entries_after(
             timestamp, milliseconds=constants.PADDING_AFTER
         )
-        return exact_candidates[-1] if exact_candidates else None
+        if not exact_candidates:
+            return None
+        if monitor_index is not None:
+            matching = [s for s in exact_candidates if s.monitor_index == monitor_index]
+            if matching:
+                return matching[-1]
+        return exact_candidates[-1]
Comment on lines +280 to +284
Copilot AI Feb 6, 2026

Same as above: building matching allocates a list when you only need a single element. Iterate exact_candidates in reverse and return the first matching monitor entry to reduce allocations.


     def _save_event_to_jsonl(self, event: InputEvent) -> None:
         if self.session_dir:
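The two `_collect_*` review comments (the `Optional[int]` hint and the extra list allocation) could both be handled by one shared helper. This is a sketch, assuming image-queue entries expose `monitor_index` and that `get_entries_before`/`get_entries_after` return sequences ordered oldest to newest, which the original `[-1]` indexing implies; `Screenshot` and `pick_latest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Screenshot:
    # Stand-in for an image-queue entry; only monitor_index matters here.
    timestamp: float
    monitor_index: int


def pick_latest(
    candidates: Sequence[Screenshot], monitor_index: Optional[int] = None
) -> Optional[Screenshot]:
    """Return the newest candidate, preferring one from monitor_index.

    Scans from the end with a generator, so no intermediate list is built.
    """
    if not candidates:
        return None
    if monitor_index is not None:
        match = next(
            (s for s in reversed(candidates) if s.monitor_index == monitor_index),
            None,
        )
        if match is not None:
            return match
    # No monitor preference, or no screenshot from that monitor: newest overall.
    return candidates[-1]
```

Each method would then end with `return pick_latest(start_candidates, monitor_index)` (or `exact_candidates`), with its signature changed to `monitor_index: Optional[int] = None`. The behavior matches the diff, including the fall-back to the newest screenshot from any monitor.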