
feat(iii-aura): add on-device multimodal voice + vision worker #19

Open
rohitg00 wants to merge 5 commits into main from feat/iii-aura

Conversation

@rohitg00 (Contributor) commented Apr 8, 2026

Summary

  • Adds iii-aura, a real-time on-device multimodal AI assistant (voice + vision) wired entirely through iii primitives
  • Python worker runs Gemma 4 E2B for speech/vision understanding and Kokoro TTS for voice output, orchestrated through channels, state, and triggers
  • Browser connects as a full iii worker via iii-browser-sdk — registers functions, creates channels, handles playback

What's included

| Component | Description |
| --- | --- |
| python-worker/ | Gemma 4 E2B inference + Kokoro TTS, registers aura::session::open, aura::ingest::turn, aura::interrupt |
| browser/ | Vite app with VAD, camera capture, audio playback — registers ui::aura::transcript, ui::aura::playback |
| iii-config.example.yaml | Engine config with dual worker-managers (internal + RBAC), state, HTTP, observability |
| README.md | Architecture diagram, quick start, env vars, extending guide |

iii primitives used

  • iii-browser-sdk — browser acts as a full worker
  • Channels — binary audio streaming (browser ↔ worker)
  • State — session metadata persistence
  • Triggers (Void) — fire-and-forget push from worker → browser
  • iii-worker-manager (2 ports) — internal for Python, RBAC-filtered for browser

Test plan

  • `uv sync && uv run iii-aura` starts and connects to the engine
  • Browser at localhost:5180 connects, opens session, captures voice + camera
  • LLM inference returns transcription + response via ui::aura::transcript
  • TTS audio streams back via channel and plays in browser
  • Barge-in (speaking during playback) interrupts and restarts listening

Summary by CodeRabbit

  • New Features

    • Introduced iii-aura: on-device multimodal voice+vision app with browser UI, camera + waveform visualization, VAD-based capture, barge-in support, real-time transcription, AI responses, and sentence-by-sentence streamed TTS playback.
  • Documentation

    • Added a comprehensive README with architecture, quick-start commands, env config, session/queue guidance, and extension tips.
  • Chores

    • Added packaging, example runtime config, browser build tooling, Python worker packaging, and a .gitignore to exclude local/generated artifacts.

coderabbitai (bot) commented Apr 8, 2026

📝 Walkthrough

Adds Aura: an on‑device multimodal voice+vision system with a browser UI, a Python worker for LLM inference and TTS, example iii engine config, and supporting build/package files to enable audio/image capture, channel-based ingestion, and streamed sentence-by-sentence TTS playback.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Project metadata & docs<br>iii-aura/.gitignore, iii-aura/README.md, iii-aura/iii-config.example.yaml | Adds gitignore, README documenting architecture, flows, env vars, quickstart, and an example iii engine YAML (workers, state, HTTP, RBAC, observability, optional queue). |
| Browser app & build<br>iii-aura/browser/index.html, iii-aura/browser/package.json, iii-aura/browser/tsconfig.json, iii-aura/browser/vite.config.ts | New Vite/ESM browser project with TypeScript config, dev server port/open, package deps, and a dark, mobile-friendly HTML UI scaffold (video, waveform, messages, status controls). |
| Browser runtime<br>iii-aura/browser/src/aura.ts | Adds `export async function init()` that binds DOM, starts camera/mic capture, AnalyserNode waveform drawing, Silero VAD, connects via iii-browser-sdk, registers ui::aura::transcript/ui::aura::playback, opens sessions, sends channels for ingestion, and handles barge-in/interrupt and streamed playback. |
| Python packaging<br>iii-aura/python-worker/pyproject.toml, iii-aura/python-worker/src/iii_aura/\_\_init\_\_.py | Adds Python package metadata (hatchling), dependencies, optional platform groups, console entry `iii-aura`, and package docstring. |
| Inference module<br>iii-aura/python-worker/src/iii_aura/inference.py | Adds model path resolution (env or HF), Gemma-4 engine lifecycle (load/unload), global engine and tool_result, and `respond_to_user()` to record tool outputs. |
| TTS module<br>iii-aura/python-worker/src/iii_aura/tts.py | Introduces `TTSBackend` interface plus `MLXBackend` (Apple Silicon) and `ONNXBackend` (fallback), platform detection, HF/MLX model loading, and `load()` to select backend and expose sample rate. |
| Python worker core<br>iii-aura/python-worker/src/iii_aura/worker.py | Adds worker main registering aura::session::open, aura::ingest::turn, aura::interrupt, aura::session::close; manages per-session interrupt events, custom executor, multimodal prompt assembly, LLM inference usage, emits ui::aura::transcript, streams sentence-by-sentence TTS over playback channels, and installs an HTTP trigger. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Browser as Browser Client
    participant VAD as VAD Detector
    participant Engine as iii Engine
    participant Worker as Python Worker
    participant LLM as LLM Inference
    participant TTS as TTS Generator

    Browser->>VAD: Capture mic audio stream
    VAD-->>Browser: Speech start/end events
    Browser->>Browser: Convert to WAV, capture optional camera frame
    Browser->>Engine: Create channel, publish audio + image metadata
    Browser->>Engine: Trigger aura::ingest::turn
    Engine->>Worker: Deliver turn data (channel reader)
    Worker->>LLM: Run multimodal inference (transcription + response)
    LLM-->>Worker: Return text response
    Worker->>Engine: Emit ui::aura::transcript
    Worker->>TTS: Generate audio per sentence
    TTS-->>Worker: Stream PCM chunks
    Worker->>Engine: Stream playback channel audio chunks
    Engine-->>Browser: Browser receives audio chunks
    Browser->>Browser: Play audio via WebAudio
    Browser->>Engine: Trigger aura::interrupt (barge-in)
    Engine->>Worker: Signal cancellation for in-flight inference/TTS
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

"I hop and hum where audio meets sight,
Channels flow softly through day and night.
From browser burrow to worker lair,
Multimodal magic fills the air. 🐇✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.10% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding an on-device multimodal voice and vision worker to the iii framework. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai (bot) left a comment

Actionable comments posted: 8

🧹 Nitpick comments (5)
iii-aura/README.md (1)

7-19: Add language identifier to the fenced code block.

The architecture diagram code block is missing a language specifier. While this is an ASCII diagram rather than code, adding a language identifier (or text/plaintext) satisfies the markdown linter and improves rendering consistency.

Suggested fix

~~~diff
-```
+```text
 Browser (iii-browser-sdk)          iii Engine               Python Worker (iii-sdk)
~~~
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iii-aura/README.md` around lines 7 - 19, The fenced ASCII diagram block in
the README (the triple-backtick block containing "Browser (iii-browser-sdk)     
iii Engine               Python Worker (iii-sdk)" and the diagram lines) lacks a
language identifier; update that opening fence to include a plain text language
(e.g., "text" or "plaintext") so the markdown linter accepts it and rendering
stays consistent—locate the triple-backtick block in README.md and change ``` to
```text (or ```plaintext).
iii-aura/python-worker/src/iii_aura/worker.py (3)

90-91: Document why the localhost→127.0.0.1 URL replacement is needed.

This URL replacement appears in two places (lines 90-91 and 177-180). Consider extracting this to a helper function and adding a comment explaining why this workaround is necessary (e.g., DNS resolution issues in certain container environments).

Suggested refactor

```diff
+def _fix_localhost_url(obj: Any, attr: str) -> None:
+    """Replace 'localhost' with '127.0.0.1' in channel URLs.
+
+    Workaround for DNS resolution issues in certain container/network configurations
+    where 'localhost' may not resolve correctly.
+    """
+    if hasattr(obj, attr):
+        url = getattr(obj, attr)
+        if '://localhost' in url:
+            setattr(obj, attr, url.replace('://localhost', '://127.0.0.1'))
+

 async def _ingest_turn(data: dict[str, Any]) -> dict[str, Any]:
     # ...
-    if hasattr(reader, '_url'):
-        reader._url = reader._url.replace('://localhost', '://127.0.0.1')
+    _fix_localhost_url(reader, '_url')
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iii-aura/python-worker/src/iii_aura/worker.py` around lines 90 - 91, Extract
the repeated localhost→127.0.0.1 replacement into a small helper (e.g.,
normalize_localhost_url or replace_localhost_with_loopback) that accepts the
reader object, checks hasattr(reader, '_url'), and updates reader._url =
reader._url.replace('://localhost', '://127.0.0.1'); replace the two inline
blocks with calls to that helper and add a concise comment on the helper
explaining the rationale (workaround for DNS/localhost resolution issues in some
container/network environments where localhost does not resolve or bypasses
network stack), so both occurrences (the current inline replacement around
reader._url at the two locations) are consolidated and documented.

146-149: Replace lambda assignment with a named function.

Per Ruff E731, lambda expressions should not be assigned to variables. Use a def statement instead for better readability and debugging.

Suggested fix

```diff
+def _strip_markers(s: str) -> str:
+    return s.replace('<|"|>', "").strip()
+
     response = await loop.run_in_executor(_executor, _infer)
     llm_time = time.time() - t0

-    strip = lambda s: s.replace('<|"|>', "").strip()
     if inference.tool_result:
-        transcription = strip(inference.tool_result.get("transcription", ""))
-        text_response = strip(inference.tool_result.get("response", ""))
+        transcription = _strip_markers(inference.tool_result.get("transcription", ""))
+        text_response = _strip_markers(inference.tool_result.get("response", ""))
     else:
         transcription = None
         text_response = response["content"][0]["text"]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iii-aura/python-worker/src/iii_aura/worker.py` around lines 146 - 149, The
code assigns a lambda to the name `strip` which violates Ruff E731; replace the
lambda assignment with a proper function definition (e.g., define def strip(s):
return s.replace('<|"|>', "").strip()) and keep the subsequent usage unchanged
where `transcription = strip(inference.tool_result.get("transcription", ""))`
and `text_response = strip(inference.tool_result.get("response", ""))`; ensure
the new `strip` function is defined in the same scope so calls from the
`inference.tool_result` handling still work.

127-127: Use asyncio.get_running_loop() instead of deprecated get_event_loop().

asyncio.get_event_loop() is deprecated since Python 3.10 when called from a coroutine. Use asyncio.get_running_loop() which is the recommended approach in async contexts.

Suggested fix

```diff
-    loop = asyncio.get_event_loop()
+    loop = asyncio.get_running_loop()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iii-aura/python-worker/src/iii_aura/worker.py` at line 127, Replace the
deprecated asyncio.get_event_loop() call used to assign loop (the line creating
the variable named loop) with asyncio.get_running_loop() so the code uses the
running event loop in async contexts; ensure this change is made where loop is
set (in worker.py) and only called from within a coroutine or async function (or
adjust the caller to obtain the loop in an appropriate async context) to avoid
RuntimeError.
iii-aura/python-worker/src/iii_aura/tts.py (1)

42-47: Consider using the sample rate returned by the model instead of hardcoding.

ONNXBackend hardcodes sample_rate = 24000 but the create() method returns the actual sample rate as _sr which is discarded. If the model version changes, this could cause audio playback issues.

Suggested fix

```diff
 class ONNXBackend(TTSBackend):
     def __init__(self):
         import kokoro_onnx  # type: ignore[import-not-found]
         from huggingface_hub import hf_hub_download

         model_path = hf_hub_download("fastrtc/kokoro-onnx", "kokoro-v1.0.onnx")
         voices_path = hf_hub_download("fastrtc/kokoro-onnx", "voices-v1.0.bin")

         self._model = kokoro_onnx.Kokoro(model_path, voices_path)
-        self.sample_rate = 24000
+        # Initialize with expected rate; will be updated on first generate() if different
+        self.sample_rate = 24000

     def generate(self, text: str, voice: str = "af_heart", speed: float = 1.1) -> np.ndarray:
         pcm, _sr = self._model.create(text, voice=voice, speed=speed)
+        if _sr != self.sample_rate:
+            self.sample_rate = _sr
         return pcm
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iii-aura/python-worker/src/iii_aura/tts.py` around lines 42 - 47, The class
currently hardcodes self.sample_rate = 24000 but the model returns the actual
rate from _model.create(text, ...). Change the code so the sample rate comes
from the model: use the _sr returned by kokoro_onnx.Kokoro.create (called in
generate) to set self.sample_rate (or initialize it from the model if
kokoro_onnx exposes a sample rate) instead of keeping the fixed 24000; update
the generate method (and/or __init__) to assign self.sample_rate = _sr after
calling _model.create so playback uses the model-provided rate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@iii-aura/browser/index.html`:
- Around line 348-349: Remove the two CDN script tags that load onnxruntime-web
and `@ricky0123/vad-web` from iii-aura/browser/index.html and instead add them as
npm dependencies (onnxruntime-web and `@ricky0123/vad-web`), import the modules
from those package names in your front-end entry module (e.g., main.ts or
main.js) so Vite will bundle them, and update any global references to use the
imported symbols (e.g., replace window.ort or window.VAD usage with the local
imports) ensuring Vite config does not exclude these packages from the build.

In `@iii-aura/browser/src/aura.ts`:
- Around line 371-377: The addMessage function currently assigns untrusted
`text` and `meta` into `div.innerHTML`, creating an XSS risk; change it to
create and append child nodes instead: set `div.className` as-is, create a
content node and assign `text` to its `textContent` (or use a separate
`renderHtml` boolean parameter to opt into sanitized HTML rendering), create a
separate `metaEl` element and set its `textContent` to `meta` before appending,
then append both to `messagesEl` and update scroll; ensure any caller that
intentionally passes HTML (e.g., the loading-dots usage) either passes
`renderHtml: true` and the content is sanitized or is refactored to use a
dedicated HTML-only helper so untrusted LLM responses always use `textContent`.
- Around line 69-76: The code assigns untrusted data.transcription directly to
lastUserMsg.innerHTML causing an XSS risk; change this to set the transcription
as plain text (use lastUserMsg.textContent or createTextNode) and preserve the
existing meta element by locating meta (from lastUserMsg.querySelector('.meta'))
and appending meta.cloneNode(true) after the text node (or re-attach the
original meta element) instead of concatenating strings into innerHTML; update
the block that references messagesEl, lastUserMsg, meta and innerHTML to use
textContent/createTextNode + appendChild(cloneNode) so transcription is not
parsed as HTML.
- Around line 100-114: The audio playback relies only on receiving the
'audio_end' message via reader.onMessage (inside the audio_end branch), so if
that message is lost the UI can remain in 'speaking' state; add a timeout-based
fallback: when you start playback (where ignoreIncomingAudio is set/stopPlayback
is called), start or reset a playback timeout (e.g., playbackTimeout) and clear
it when reader.onMessage sees parsed.type === 'audio_end' (and when stopPlayback
is called), and on timeout expiry call stopPlayback() and setState('listening')
to recover; ensure the timeout is canceled/cleared whenever ignoreIncomingAudio
is toggled or normal audio_end processing runs to avoid double-handling.

In `@iii-aura/iii-config.example.yaml`:
- Around line 36-44: The CORS default is too permissive: when host is set to
0.0.0.0 and the config key allowed_origins contains "*" any website can call the
HTTP API from browsers; change the example in iii-config.example.yaml so
allowed_origins no longer uses a wildcard and instead restricts to safe defaults
(e.g., localhost entries such as "http://localhost:3111" and
"http://127.0.0.1:3111" or empty list), leaving allowed_methods (GET, POST,
OPTIONS) unchanged; update the allowed_origins value and add a short comment
next to host/port to indicate these are localhost-only defaults for development.

In `@iii-aura/python-worker/pyproject.toml`:
- Around line 10-24: The project is missing the huggingface-hub dependency
required by iii_aura/inference.py and iii_aura/tts.py which import
hf_hub_download; add "huggingface-hub>=0.16.0" to the main dependencies list in
pyproject.toml (alongside "iii-sdk", "litert-lm>=0.10.1", "numpy>=2.0") so fresh
installs won't raise ModuleNotFoundError, using the permissive lower bound
suggested instead of the overly restrictive >=0.23.0.

In `@iii-aura/python-worker/src/iii_aura/inference.py`:
- Around line 44-56: The load() function calls engine.__enter__() on a
litert_lm.Engine instance (created via litert_lm.Engine(...)) but never releases
it; add a corresponding cleanup path by implementing an unload() that calls
engine.__exit__(None, None, None) (and sets engine = None) and register it with
atexit.register(unload) after engine is created, or alternatively ensure
whatever lifecycle manager calls resolve_model_path()/load() will invoke
engine.__exit__(); reference the load(), engine, engine.__enter__(),
engine.__exit__(None, None, None), and atexit to locate the change.

In `@iii-aura/python-worker/src/iii_aura/worker.py`:
- Line 33: The _interrupts dict currently stores asyncio.Event per session and
is never cleaned up; add explicit cleanup to remove the session key when a
session or turn completes (e.g., after handle_turn or at session close) or
implement a session close function that deletes _interrupts[session_id] (and
cancels/sets the Event if needed) to avoid memory leaks; update functions that
create entries (where _interrupts[session_id] = asyncio.Event()) to ensure
corresponding removal and defensively check presence before use.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b4445f09-3423-4303-ba5b-2cd7067d4e36

📥 Commits

Reviewing files that changed from the base of the PR and between b0f6adc and 7aed39e.

📒 Files selected for processing (13)
  • iii-aura/.gitignore
  • iii-aura/README.md
  • iii-aura/browser/index.html
  • iii-aura/browser/package.json
  • iii-aura/browser/src/aura.ts
  • iii-aura/browser/tsconfig.json
  • iii-aura/browser/vite.config.ts
  • iii-aura/iii-config.example.yaml
  • iii-aura/python-worker/pyproject.toml
  • iii-aura/python-worker/src/iii_aura/__init__.py
  • iii-aura/python-worker/src/iii_aura/inference.py
  • iii-aura/python-worker/src/iii_aura/tts.py
  • iii-aura/python-worker/src/iii_aura/worker.py

Real-time voice and vision AI assistant powered by Gemma 4 E2B and
Kokoro TTS, orchestrated entirely through iii primitives. Browser
connects via iii-browser-sdk, Python worker handles LLM inference
and streaming TTS over channels.
coderabbitai (bot) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
iii-aura/python-worker/src/iii_aura/worker.py (1)

47-59: Fragile workaround accessing CPython internals.

The _SafeExecutor workaround for litert_lm poisoning concurrent.futures.thread._shutdown is well-documented but relies on CPython implementation details. This could break with Python version updates. Consider adding a comment noting which Python versions this has been tested with, and monitor for breakage when upgrading.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@iii-aura/python-worker/src/iii_aura/worker.py` around lines 47 - 59, Add a
short, clear comment above the _SafeExecutor class (and near the _cft._shutdown
write in submit) documenting which CPython versions this workaround was tested
on and warning that it relies on CPython internals
(concurrent.futures.thread._shutdown) and may break on upgrades; also add a TODO
to monitor or remove this workaround when upgrading Python and reference the
symbols _SafeExecutor, submit, _cft._shutdown, and the module-level _executor so
reviewers can find the implementation easily.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@iii-aura/python-worker/src/iii_aura/worker.py`:
- Around line 158-161: The else branch assumes response["content"][0]["text"]
exists, which can raise KeyError/IndexError when inference.tool_result is falsy
or response has no content; update the logic in the worker processing where
transcription and text_response are set (variables: transcription,
text_response, response, inference.tool_result) to defensively validate response
is a dict with a non-empty "content" list and that the first item has a "text"
key before accessing it, and if validation fails set text_response to a safe
default (e.g., empty string or an error message) and log a warning/error so the
turn fails gracefully instead of raising an exception.
- Around line 282-287: The main() function currently registers handlers (e.g.,
via iii.on_functions_available) then returns, allowing the process to exit while
async handlers still expect a running event loop; add a blocking mechanism in
main() using a threading.Event (e.g., stop = threading.Event()), register signal
handlers for SIGTERM and SIGINT that call stop.set(), call stop.wait() to block
the main thread, and on unblock call iii.shutdown() to cleanly stop the worker
so async handlers can run without the process exiting prematurely.
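The defensive read of `response["content"][0]["text"]` suggested above might look like this sketch; `extract_text` is a hypothetical helper name, not code from the PR:

```python
import logging

logger = logging.getLogger("iii_aura.worker")


def extract_text(response) -> str:
    """Defensively pull response['content'][0]['text'], falling back to ''.

    Anything that does not match the expected dict/list shape is logged
    and mapped to a safe default so the turn completes instead of raising
    KeyError/IndexError/TypeError.
    """
    if (
        isinstance(response, dict)
        and isinstance(response.get("content"), list)
        and response["content"]
        and isinstance(response["content"][0], dict)
        and "text" in response["content"][0]
    ):
        return response["content"][0]["text"]
    logger.warning("Unexpected LLM response shape: %r", response)
    return ""
```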
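The blocking pattern recommended for `main()` can be sketched like this; `shutdown_cb` stands in for `iii.shutdown()`, whose actual signature is not shown in this PR excerpt:

```python
import signal
import threading


def run_until_signal(shutdown_cb) -> None:
    """Block the main thread until SIGTERM/SIGINT, then run shutdown_cb.

    Keeps the process (and its async handlers) alive instead of letting
    main() return immediately after handler registration.
    """
    stop = threading.Event()

    def _handle(signum, frame):
        stop.set()

    signal.signal(signal.SIGTERM, _handle)
    signal.signal(signal.SIGINT, _handle)
    stop.wait()          # parks the main thread until a signal arrives
    shutdown_cb()        # clean teardown, e.g. iii.shutdown()
```

`threading.Event.wait()` in the main thread is interruptible by signals on CPython, so Ctrl-C and orchestrator-sent SIGTERM both unblock it and route through the same shutdown path.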


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c66c2808-36eb-4360-9899-36a6cf217aeb

📥 Commits

Reviewing files that changed from the base of the PR and between 7aed39e and 7f3f8b1.

📒 Files selected for processing (13)
  • iii-aura/.gitignore
  • iii-aura/README.md
  • iii-aura/browser/index.html
  • iii-aura/browser/package.json
  • iii-aura/browser/src/aura.ts
  • iii-aura/browser/tsconfig.json
  • iii-aura/browser/vite.config.ts
  • iii-aura/iii-config.example.yaml
  • iii-aura/python-worker/pyproject.toml
  • iii-aura/python-worker/src/iii_aura/__init__.py
  • iii-aura/python-worker/src/iii_aura/inference.py
  • iii-aura/python-worker/src/iii_aura/tts.py
  • iii-aura/python-worker/src/iii_aura/worker.py
✅ Files skipped from review due to trivial changes (8)
  • iii-aura/python-worker/src/iii_aura/\_\_init\_\_.py
  • iii-aura/browser/vite.config.ts
  • iii-aura/browser/tsconfig.json
  • iii-aura/browser/package.json
  • iii-aura/iii-config.example.yaml
  • iii-aura/python-worker/pyproject.toml
  • iii-aura/README.md
  • iii-aura/.gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
  • iii-aura/python-worker/src/iii_aura/inference.py
  • iii-aura/python-worker/src/iii_aura/tts.py

@rohitg00 (Contributor, Author) commented:

Heads up — main is being bumped to iii-sdk ==0.11.3 in #33. Python pin is currently iii-sdk>=0.11.0.dev9. Please rebase and bump to iii-sdk==0.11.3 so the worker locks to a stable release.

Release notes: https://github.com/iii-hq/iii/releases/tag/iii/v0.11.3

…rocess lifecycle

Browser:
- Move onnxruntime-web and @ricky0123/vad-web from CDN <script> tags to
  npm deps (bundled by Vite). Removes the third-party-CDN runtime
  dependency, which also strengthens the 'on-device' claim.
- addMessage: drop the opt-in 'html' flag; always use textContent for the
  LLM-generated text and meta. Replace the one caller that relied on html
  with a dedicated addLoadingMessage() that builds the dots via DOM nodes
  — closes the innerHTML XSS vector when an LLM returns HTML.
- ui::aura::playback: add a 60s playback timeout that fires stopPlayback +
  setState('listening') if audio_end never arrives (dropped control msg).

Python worker:
- Defensive read of response['content'][0]['text'] — empty string +
  warning log when the LLM response shape doesn't match, so the turn
  completes instead of raising KeyError/IndexError.
- main(): block on threading.Event + SIGTERM/SIGINT handlers instead of
  returning after register_function. Previously the process exited right
  after registration and async handlers never ran. Shutdown path calls
  iii.shutdown() on signal.
coderabbitai (bot) left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@iii-aura/browser/src/aura.ts`:
- Around line 197-211: The await call to iii.trigger for function_id
'aura::ingest::turn' can resolve without starting ui::aura::playback, leaving
the UI stuck in 'processing'; after awaiting iii.trigger(...) inspect the
returned result (or check appState) and if it indicates no playback started
(e.g., result.interrupted or result.error or appState still === 'processing')
call setState('listening') and setStatus('connected','Connected') so the UI
transitions back; update the code around the iii.trigger call in aura.ts to
handle both rejected and resolved non-playback outcomes by performing this check
and state reset.
- Around line 85-139: The playback handler can leave the module stuck with
ignoreIncomingAudio=true; to fix it always register the reader listeners inside
the ui::aura::playback handler (register reader.onMessage and reader.onBinary
regardless of the initial ignoreIncomingAudio) so the audio_end message can be
consumed and clear ignoreIncomingAudio, and modify the playback timeout callback
(and clearPlaybackTimeout path) to also set ignoreIncomingAudio = false before
calling stopPlayback() and setState('listening'); ensure the early-return only
prevents queueAudioChunk in reader.onBinary (i.e., check ignoreIncomingAudio
inside that callback) rather than returning from the whole ui::aura::playback
function.

In `@iii-aura/python-worker/src/iii_aura/worker.py`:
- Around line 110-114: The code calls json.loads(messages[0]) and immediately
.get("image"), which will raise AttributeError if the JSON parses to a non-dict
(e.g., string, number, null, list); update the logic around
messages[0]/json.loads to validate the parsed value is a dict before calling
.get (or use isinstance(parsed, dict) and only then set image_b64 =
parsed.get("image")), and ensure image_b64 is safely defaulted/handled when the
payload is not a dict so the surrounding function (in worker.py where
messages[0], json.loads, and image_b64 are used) does not abort the turn.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 43983ec7-8446-475b-af91-98d125eae830

📥 Commits

Reviewing files that changed from the base of the PR and between 7f3f8b1 and 1444c78.

📒 Files selected for processing (5)
  • iii-aura/browser/index.html
  • iii-aura/browser/package.json
  • iii-aura/browser/src/aura.ts
  • iii-aura/python-worker/pyproject.toml
  • iii-aura/python-worker/src/iii_aura/worker.py
✅ Files skipped from review due to trivial changes (1)
  • iii-aura/python-worker/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • iii-aura/browser/package.json

Comment on lines +85 to +139
iii.registerFunction(
  'ui::aura::playback',
  async (data: { reader: ChannelReader; sample_rate: number; sentence_count: number }) => {
    if (ignoreIncomingAudio) return null

    streamSampleRate = data.sample_rate || 24000
    startStreamPlayback()

    const reader = data.reader

    // Fallback in case the `audio_end` control message is dropped on the
    // channel — without it the UI would be stuck in 'speaking'. Timeout
    // forces the same cleanup path that audio_end would have taken.
    const PLAYBACK_TIMEOUT_MS = 60_000
    let playbackTimeout: ReturnType<typeof setTimeout> | null = setTimeout(() => {
      console.warn('ui::aura::playback: no audio_end within timeout, resetting')
      stopPlayback()
      setState('listening')
      playbackTimeout = null
    }, PLAYBACK_TIMEOUT_MS)
    const clearPlaybackTimeout = () => {
      if (playbackTimeout) {
        clearTimeout(playbackTimeout)
        playbackTimeout = null
      }
    }

    reader.onBinary((pcmBytes: Uint8Array) => {
      if (ignoreIncomingAudio) return
      const int16 = new Int16Array(pcmBytes.buffer, pcmBytes.byteOffset, pcmBytes.byteLength / 2)
      const float32 = new Float32Array(int16.length)
      for (let i = 0; i < int16.length; i++) float32[i] = int16[i] / 32768
      queueAudioChunk(float32)
    })

    reader.onMessage((msg: string) => {
      try {
        const parsed = JSON.parse(msg)
        if (parsed.type === 'audio_end') {
          clearPlaybackTimeout()
          if (ignoreIncomingAudio) {
            ignoreIncomingAudio = false
            stopPlayback()
            setState('listening')
            return
          }
          const meta = messagesEl.querySelector('.msg.assistant:last-child .meta')
          if (meta) meta.textContent += ` · TTS ${parsed.tts_time}s`
        }
      } catch { /* ignore */ }
    })

    return null
  },
)

⚠️ Potential issue | 🟠 Major

ignoreIncomingAudio can get permanently stuck, silently dropping all future playback.

The flag is only cleared inside the audio_end branch of onMessage (line 126), but two paths bypass that:

  1. Early return at line 88: if ignoreIncomingAudio is already true when ui::aura::playback is invoked (barge-in fired before playback trigger arrived), the handler returns immediately without ever registering onBinary/onMessage. The channel's audio_end message is never consumed, so the flag stays true, and every subsequent TTS response is silently dropped.
  2. Timeout at lines 99–104: if audio_end is lost, the timeout resets state to listening but does not clear ignoreIncomingAudio. If the timeout happens after a barge-in where audio_end truly went missing, the flag is stuck true indefinitely.
🛠️ Proposed fix
   iii.registerFunction(
     'ui::aura::playback',
     async (data: { reader: ChannelReader; sample_rate: number; sentence_count: number }) => {
-      if (ignoreIncomingAudio) return null
+      if (ignoreIncomingAudio) {
+        // Drain/close the reader and reset the flag so future turns work.
+        ignoreIncomingAudio = false
+        try { data.reader.onBinary(() => {}) } catch { /* ignore */ }
+        return null
+      }

       streamSampleRate = data.sample_rate || 24000
       startStreamPlayback()

       const reader = data.reader

       // Fallback in case the `audio_end` control message is dropped on the
       // channel — without it the UI would be stuck in 'speaking'. Timeout
       // forces the same cleanup path that audio_end would have taken.
       const PLAYBACK_TIMEOUT_MS = 60_000
       let playbackTimeout: ReturnType<typeof setTimeout> | null = setTimeout(() => {
         console.warn('ui::aura::playback: no audio_end within timeout, resetting')
+        ignoreIncomingAudio = false
         stopPlayback()
         setState('listening')
         playbackTimeout = null
       }, PLAYBACK_TIMEOUT_MS)

Comment on lines +197 to +211
  try {
    await iii.trigger({
      function_id: 'aura::ingest::turn',
      payload: {
        session_id: sessionId,
        reader: channel.readerRef,
        has_image: hasImage,
      },
    })
  } catch (err) {
    console.error('Ingest failed', err)
    setState('listening')
    setStatus('connected', 'Connected')
  }
}

⚠️ Potential issue | 🟡 Minor

UI can stall in processing when the backend returns without streaming playback.

If aura::ingest::turn resolves successfully with a result that doesn't trigger ui::aura::playback (e.g., the worker returns {"interrupted": true} at line 181/195 or {"error": "no_input"} at line 135 in worker.py), no state transition runs. The UI stays at processing forever since only the catch branch transitions to listening.

Consider resetting state to listening when the trigger resolves without starting playback (e.g., inspect the result, or switch to listening if appState is still processing after the await).


Comment on lines +110 to +114
if messages:
    try:
        image_b64 = json.loads(messages[0]).get("image")
    except json.JSONDecodeError:
        pass

⚠️ Potential issue | 🟡 Minor

messages[0] could parse to a non-dict and raise AttributeError.

json.loads(messages[0]) only has JSONDecodeError handled, but a legitimately-parseable payload like "foo", null, [], or 42 will succeed and then .get("image") will raise AttributeError, aborting the whole turn.

🛠️ Proposed fix
     image_b64: str | None = None
     if messages:
         try:
-            image_b64 = json.loads(messages[0]).get("image")
-        except json.JSONDecodeError:
-            pass
+            parsed = json.loads(messages[0])
+            if isinstance(parsed, dict):
+                img = parsed.get("image")
+                if isinstance(img, str):
+                    image_b64 = img
+        except json.JSONDecodeError:
+            pass
