feat(voice): fully-local STT + TTS via Whisper/Piper provider factory by sanil-23 · Pull Request #1755 · tinyhumansai/openhuman

sanil-23 · 2026-05-14T18:52:16Z

Summary

Adds a string-matched voice provider factory mirroring embeddings/factory.rs so STT (cloud Whisper proxy vs local whisper-cli) and TTS (cloud vs local Piper) can be swapped per user setting — no separate offline gate, the provider config IS the offline switch.
Ships a workspace-scoped auto-installer for Whisper (ggml-<size>.bin + Windows whisper-bin-x64.zip) and Piper (binary tarball + en_US-lessac-medium voice). Fire-and-forget RPCs + polled status table render Install / Reinstall / Retry buttons inline with download progress.
New "Voice Providers" section in Settings → AI & Models → Voice with STT/TTS dropdowns, 5 Whisper model presets, 8 curated Piper voice presets, and auto-install on selection so picking an asset you haven't downloaded triggers the fetch automatically.
Wires the existing mascot mic composer (MicCloudComposer → MicComposer) and conversation spoken-reply path through the factory so user settings actually drive the dispatch — previously the dropdowns were decorative because the legacy voice_cloud_transcribe / voice_reply_synthesize RPCs ignored the config.
All artifacts honor OPENHUMAN_WORKSPACE when explicitly set (parallel dev sessions, multi-workspace deployments) and fall back to the shared ~/.openhuman/ root for normal single-user installs.

Problem

Issue #1710 asks for a fully-local OpenHuman path across STT, voice/TTS, inference, and Composer. The voice surface specifically needs:

Local Whisper for transcription instead of the hosted Whisper proxy.
Local Piper (or equivalent) for spoken replies instead of the ElevenLabs proxy.
Offline mode enforcement — no silent cloud fallback when the user is on a local provider.
A real installer flow so users don't need to manually set WHISPER_BIN / PIPER_BIN and download multi-GB models out-of-band.

Solution

Mirrors the existing embeddings/factory.rs shape (string-matched dispatch, Box<dyn Trait>, errors on unknown provider) for both STT and TTS. The factory is the single integration point — the dispatch RPCs (voice_stt_dispatch, voice_tts_dispatch) and the migrated voice_reply_synthesize handler all route through it. Selecting a local provider in settings persists the choice; calls land on the local subprocess. Selecting cloud routes back through the hosted proxy.

Installer infrastructure lives in local_ai/voice_install_common.rs (shared download_to_file with 30-min request budget + 45s per-chunk idle timeout to fail fast on stalled streams), install_whisper.rs, and install_piper.rs. Install RPCs are fire-and-forget — a synchronous handler would have hit the frontend's 30s VITE_CORE_RPC_TIMEOUT_MS on legitimate slow downloads and triggered a retry loop that deletes the .part and starts over. Per-engine status table + 2s UI polling renders real progress.

Windows-specific touches: CREATE_NO_WINDOW flag on the whisper-cli / piper subprocess spawns suppresses the console window flash. --no-timestamps on whisper-cli keeps segment-time prefixes out of the transcript text. --length-scale 1.15 on piper for more natural pacing than its default 1.0.

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.
N/A: behaviour-only change does not introduce new feature rows in the coverage matrix (existing voice rows still apply)
N/A: feature is tracked under the umbrella issue Prioritize fully local speech and Composer operation #1710 referenced in Related below
No new external network dependencies introduced (mock backend used per Testing Strategy)
N/A: release-cut smoke surface unchanged — voice was already a release-tested surface and the new dropdowns default to existing cloud behaviour
Linked issue closed via Closes #NNN in the ## Related section

Impact

Desktop only. Tauri shell is the runtime target; no mobile/web/CLI surface touched. Whisper-cli and Piper are spawned as one-shot subprocesses per request (not daemons), so no new process lifecycle to manage beyond what's already in local_ai/service/ollama_admin.rs for Ollama.

Disk: a fresh local install pulls ~769 MB (Whisper medium) + ~2.5 MB (Windows whisper-cli zip) + ~60 MB (Piper voice) + ~5 MB (Piper binary). Workspace-scoped when OPENHUMAN_WORKSPACE is set, shared otherwise.

Network: install-time only, gated by an explicit user click. Downloads come from huggingface.co/ggerganov/whisper.cpp, huggingface.co/rhasspy/piper-voices, and GitHub releases — no new always-on backend dependency. Runtime transcription/synthesis is fully local once installed.

Migration: existing config values (stt_provider, tts_provider) default to cloud so users on the current path see no behavior change until they explicitly switch in settings.

Closes Prioritize fully local speech and Composer operation #1710 (partial — voice slice; sibling PRs cover the chat factory and Composio direct mode)
Follow-up PR(s)/TODOs:
- System-PATH detection for users who installed Whisper via brew install whisper-cpp etc. (binary resolver already falls through to PATH; UI doesn't reflect it)
- Kokoro TTS as a sibling to Piper
- Length-scale / sentence-silence settings sliders for Piper

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Key: N/A: GitHub issue tracker, not Linear
URL: N/A: GitHub issue tracker, not Linear

Commit & Branch

Branch: feat/1710-voice-factory
Commit SHA: 53878ea

Validation Run

N/A: pnpm format:check skipped — known CRLF/LF drift on ~600 unrelated files on Windows (per established team practice with --no-verify push)
N/A: pnpm typecheck command not present in this workspace's package.json scripts
Focused tests: cargo test passes on voice/install_whisper/install_piper modules; Vitest passes on VoicePanel + voiceInstallApi + MicComposer + sttClient (per agent reports during implementation)
Rust fmt/check (if changed): cargo fmt applied; cargo check clean on changed surface
N/A: Tauri shell Cargo.toml not modified (Cargo.lock changed transitively via flate2 only)

Validation Blocked

command: N/A
error: N/A
impact: N/A

Behavior Changes

Intended behavior change: STT/TTS dispatch now routes through a config-driven factory. Selecting whisper or piper in Settings → Voice triggers a local subprocess for the corresponding flow.
User-visible effect: New "Voice Providers" section in Settings → AI & Models → Voice. Install buttons and live progress. Mascot mic and conversation TTS use the local providers when selected. Default config preserves cloud behaviour for existing users.

Parity Contract

Legacy behavior preserved: stt_provider = "cloud" and tts_provider = "cloud" (the defaults) route through the existing voice_cloud_transcribe / reply_speech::synthesize_reply paths unchanged. No regression on the hosted path.
Guard/fallback/dispatch parity checks: factory's "cloud" branch invokes the same CloudSttProvider / CloudTtsProvider shims that delegate to the original modules. Unknown provider names error explicitly — never silently fall back.

Duplicate / Superseded PR Handling

Duplicate PR(s): N/A
Canonical PR: This one
Resolution: N/A

Summary by CodeRabbit

New Features
- Voice settings consolidated under "AI & Models"; select and persist STT/TTS providers (cloud, Whisper, Piper).
- One-click install/reinstall for Whisper (STT) and Piper (TTS) with progress, status, retry and error details.
- App-wide STT/TTS dispatch routes voice requests to the chosen provider.
UI Changes
- Voice panel: provider controls, install buttons, simplified dictation notice, updated breadcrumbs and setup navigation for voice errors.
Tests
- Expanded coverage for provider selection, install flows, transcription and synthesis.

coderabbitai · 2026-05-14T18:52:34Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds local Whisper/Piper installers and workspace-aware resolution, a STT/TTS provider factory and RPC dispatch, renderer APIs/UI for provider selection and installs, MicComposer refactor to use factory-based STT, and supporting tests and types across Rust and TypeScript.

Changes

Local voice installers, shared plumbing & path resolution

Layer / File(s)	Summary
Shared installer core & download helper `src/openhuman/local_ai/voice_install_common.rs`, `Cargo.toml`	Adds process-wide status table, `VoiceInstallState`/`VoiceInstallStatus`, `download_to_file()` streaming-to-`.part` with timeouts/progress/validation, `.part` helpers and unit tests; adds direct `flate2` dependency for gzip inflation.
Whisper installer & local transcription `src/openhuman/local_ai/install_whisper.rs`, `src/openhuman/voice/local_transcribe.rs`	Adds Whisper installer (GGML naming, download pipeline, Windows zip handling, status promotion), `transcribe_whisper` subprocess flow, workspace binary discovery, validation thresholds, and tests.
Piper installer & local synthesis `src/openhuman/local_ai/install_piper.rs`, `src/openhuman/voice/local_speech.rs`	Adds Piper installer (voice `.onnx` + `.onnx.json`, OS/arch binary selection, tar.gz & zip extraction using gzip/flate2), Piper subprocess-based synth, viseme timeline generator, and tests.
Workspace-first path resolution & helpers `src/openhuman/local_ai/paths.rs`	Adds workspace-aware resolvers, `resolve_*_with_config` wrappers, expanded STT/TTS model/voice/binary candidate probing, GGML normalization, and tests ensuring workspace installs are preferred.

Factory, RPC schema, and voice status integration

Layer / File(s)	Summary
STT/TTS provider factory & re-exports `src/openhuman/voice/factory.rs`, `src/openhuman/voice/mod.rs`	Defines `SttProvider`/`TtsProvider` traits, `SttResult`, cloud and local provider implementations (`cloud`/`whisper`, `cloud`/`piper`), factory entrypoints, defaults/presets and unit tests; re-exports factory surface.
Voice RPC dispatch & provider persistence `src/openhuman/voice/schemas.rs`, `src/openhuman/local_ai/schemas.rs`	Adds `voice_stt_dispatch`, `voice_tts_dispatch`, `voice_set_providers` RPCs and handlers; registers local-ai install RPCs and status endpoints; implements fire-and-forget install handlers and status query handlers; validation helpers included.
Voice status and config fields `src/openhuman/voice/types.rs`, `src/openhuman/config/schema/local_ai.rs`, `src/openhuman/voice/ops.rs`, `src/openhuman/about_app/catalog.rs`	Adds `stt_provider`/`tts_provider` fields to config/status with serde defaults, derives provider selectors in `voice_status`, and updates capability catalog entries and defaults.

Renderer-side APIs, UI, and composer changes

Layer / File(s)	Summary
Renderer install API wrappers & tests `app/src/services/api/voiceInstallApi.ts`, `app/src/services/api/__tests__/voiceInstallApi.test.ts`	Adds `VoiceInstallState`/`VoiceInstallStatus` types and RPC wrappers: `installWhisper`, `installPiper`, `whisperInstallStatus`, `piperInstallStatus` plus tests mocking `callCoreRpc`.
Renderer STT factory client & tests `app/src/features/human/voice/sttClient.ts`, `app/src/features/human/voice/sttClient.test.ts`	Adds `transcribeWithFactory` that calls `openhuman.voice_stt_dispatch` with provider/model/mime/file_name overrides, trims transcripts and rewrites stale-sidecar errors; tests added.
Voice provider persistence RPC (tauri) & types `app/src/utils/tauriCommands/voice.ts`	Extends `VoiceStatus` with `stt_provider`/`tts_provider` and adds `VoiceProvidersUpdate`/`VoiceProvidersSnapshot` types and `openhumanVoiceSetProviders` RPC wrapper.
Voice settings panel UI & tests `app/src/components/settings/panels/VoicePanel.tsx`, `app/src/components/settings/panels/__tests__/VoicePanel.test.tsx`	Implements “Voice Providers” UI: provider/model/voice selects, persistence via `openhumanVoiceSetProviders`, install/reinstall flows using `voiceInstallApi`, parallel status polling with per-engine error handling, seeding logic to avoid clobbering edits, and comprehensive UI tests.
MicComposer refactor & tests `app/src/features/human/MicComposer.tsx`, `app/src/features/human/MicComposer.test.tsx`, `app/src/pages/Conversations.tsx`	Renames `MicCloudComposer`→`MicComposer`, switches transcription to `transcribeWithFactory` for native and WAV-fallback paths, updates imports/exports and tests, and replaces composer usage in Conversations.
Settings navigation & small UX updates `app/src/components/settings/hooks/useSettingsNavigation.ts`, `app/src/pages/Settings.tsx`, `app/src/components/skills/VoiceSetupModal.tsx`, `app/src/chat/chatSendError.ts`	Adds “Voice (STT & TTS)” settings entry, moves `voice` breadcrumbs into AI & Models grouping, changes modal navigation target to `/settings/voice`, and adds `voice_synthesis`/`tts_not_ready` chat error codes and expanded “Set up” affordance routing.

Sequence Diagram(s)

sequenceDiagram
  participant Renderer as Renderer
  participant Core as CoreRPC(openhuman.voice_stt_dispatch)
  participant Factory as Factory
  participant Provider as Provider
  participant Engine as LocalEngine/Cloud

  Renderer->>Core: call(audio_base64, opts)
  Core->>Factory: create_stt_provider(resolved_provider, model)
  Factory->>Provider: instantiate cloud/whisper provider
  Provider->>Engine: transcribe (whisper-cli or cloud)
  Engine-->>Provider: RpcOutcome<SttResult>
  Provider-->>Factory: return outcome
  Factory-->>Core: { text, provider }
  Core-->>Renderer: { text, provider }

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Add local voice generation to replace cloud-only ElevenLabs dependency #1709 — Implements the local STT/TTS slice (Whisper/Piper installers, provider factory, RPCs, and UI) requested by the issue.

Possibly related PRs

tinyhumansai/openhuman#761: Related STT setup “Set up” affordance and stt_not_ready handling.
tinyhumansai/openhuman#1298: Related error-rewrite behavior for stale-sidecar “unknown method” errors in STT code paths.
tinyhumansai/openhuman#1253: Prior MicCloudComposer/transcribeCloud work; this PR replaces that flow with factory-based MicComposer.

Suggested reviewers

graycyrus
senamakel

"I hopped through code with tiny feet,
Downloads stitched and installers neat.
Whisper listens, Piper sings,
Providers chosen, local springs.
Pick cloud or home — voices meet."

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

app/src/components/settings/panels/VoicePanel.tsx (1)

144-153: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t hard-require local STT assets when the selected provider is cloud.

Current readiness gates the panel on local STT asset state even when STT provider is cloud, which can disable voice features on the default cloud path.

Suggested fix

-      const sttAssetState = assetsResponse.result.stt?.state;
-      const sttAssetOk = sttAssetState === 'ready' || sttAssetState === 'ondemand';
+      const sttAssetState = assetsResponse.result.stt?.state;
+      const sttAssetOk = sttAssetState === 'ready' || sttAssetState === 'ondemand';
+      const sttProviderResolved = voiceResponse.stt_provider === 'whisper' ? 'whisper' : 'cloud';
+      const requiresLocalAsset = sttProviderResolved === 'whisper';
...
-      setSttReady(sttAssetOk && voiceResponse.stt_available);
+      setSttReady(voiceResponse.stt_available && (!requiresLocalAsset || sttAssetOk));

Also applies to: 263-264, 725-729

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 144 - 153,
The readiness check is incorrectly gating STT availability on local asset state
even when the selected STT provider is cloud; update the logic in VoicePanel
(symbols: assetsResponse.result.stt, sttAssetState, sttAssetOk,
voiceResponse.stt_available, setSttReady) to skip the local-asset requirement
when the configured provider indicates cloud (check the provider flag/field used
elsewhere, e.g., voiceResponse.stt_provider or equivalent). Concretely, compute
readiness as voiceResponse.stt_available when provider === 'cloud', otherwise
require both sttAssetOk and voiceResponse.stt_available; apply the same change
at the other occurrences you noted (around lines referenced: 263-264 and
725-729).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/components/settings/panels/__tests__/VoicePanel.test.tsx`:
- Around line 267-281: The test currently expects the free-text TTS voice input
to be present for a seeded Piper provider, but when
runtime.voiceStatus.tts_voice_id is set (e.g., 'en_US-lessac-medium') the UI
should show the preset select instead; update the VoicePanel.test to assert that
the element with test id 'tts-voice-select' is in the document and that
'tts-voice-input' is not present (use screen.getByTestId for the select and
screen.queryByTestId to assert the input is null), keeping the existing checks
for stt-provider-select, tts-provider-select, and stt-model-select.

In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 311-329: installButtonLabel currently ignores the engine parameter
so button text is generic; update installButtonLabel to include the engine name
in each returned string (e.g., "Installing Whisper 12%", "Install Piper
locally", "Repair Whisper", "Retry Piper locally", etc.) and ensure the busy
branch also uses the engine (e.g., "Installing Whisper…"); apply the same
engine-specific wording change to the other similar label helpers mentioned (the
analogous label logic at the other two locations referenced in the comment) so
all install/repair/retry strings include the engine name consistently.
- Around line 129-143: The seeding logic for
sttProvider/ttsProvider/sttModel/ttsVoice can be re-applied on every poll due to
stale closure of loadData; add a per-field "seeded" ref (e.g., sttSeededRef,
ttsSeededRef, sttModelSeededRef, ttsVoiceSeededRef) and only call
setSttProvider/setTtsProvider/setSttModel/setTtsVoice when the corresponding
seeded ref is false and voiceResponse has a value, then set that ref to true so
subsequent interval runs won't clobber user edits; update the loadData usage
accordingly so it reads and sets these refs instead of relying on falsy checks
of sttProvider/ttsProvider/sttModel/ttsVoice.

In `@src/openhuman/local_ai/schemas.rs`:
- Around line 1039-1068: The current read_status -> write_status(Installing)
sequence is non-atomic and allows race conditions; change the flow to perform an
atomic "check-and-set" so only one caller flips the state to Installing and
proceeds: add or use an atomic helper in
crate::openhuman::local_ai::voice_install_common (e.g.,
try_set_installing(engine: &str) -> Result<bool, _> or compare_and_swap_status)
that reads the current VoiceInstallStatus and, if not already Installing, writes
Installing in one atomic operation (using file lock or atomic replace) and
returns success/failure; replace the read_status + write_status block around
ENGINE_WHISPER with a call to that helper and only spawn the install when it
returns success, and apply the same fix for ENGINE_PIPER handling.

In `@src/openhuman/local_ai/voice_install_common.rs`:
- Around line 254-260: The code returns early on chunk read errors and write
errors without deleting the temporary `part_path`, leaving partial files; update
the error paths around `chunk.map_err(...)` and the
`file.write_all(...).await.map_err(...)` to attempt cleanup by calling
`tokio::fs::remove_file(part_path).await` (or a fallible wrapper that
ignores/remove-file errors) before returning the mapped error; perform the
removal asynchronously and ensure it runs whether `hasher` or `file` exist, then
return the original formatted error (e.g., call
`tokio::fs::remove_file(part_path).await.ok();` before propagating the error
from `chunk` or `file.write_all`) so `part_path` is deleted on stream/write
failures.

In `@src/openhuman/voice/factory.rs`:
- Around line 95-103: The synthesize trait currently only accepts text and
voice, dropping cloud override fields (model_id, output_format, voice_settings);
update the TTS provider interface (the synthesize method in the trait) to accept
a richer options object (e.g., ReplySpeechOptions or a new struct carrying
model_id, output_format, voice_settings and any other fields used by
voice.reply_synthesize), then update the cloud implementation that constructs
ReplySpeechOptions (and the place noted around lines 248-253) to pass these
option fields through instead of hardcoding None, and finally propagate the new
options parameter through callers (e.g., where voice.reply_synthesize is
invoked) so overrides flow from RPC to provider unchanged.

In `@src/openhuman/voice/local_transcribe.rs`:
- Around line 124-131: The code computes model_id from opts but still calls
resolve_stt_model_path(config) so the actual model file may not match the
returned model_id; update the logic to resolve the model path from the effective
model_id instead: either add an argument to resolve_stt_model_path (e.g.,
resolve_stt_model_path(config, &model_id)) or create a new helper like
resolve_stt_model_path_for(model_id, config), then use that resolved path
wherever the model is loaded/used (including the similar call later in the
function around the other occurrence) so the transcriber uses the same effective
model_id that is returned in responses.
- Around line 195-218: Replace the unbounded cmd.output().await call with a
bounded call using tokio::time::timeout (e.g.
tokio::time::timeout(Duration::from_secs(X), cmd.output()).await), import
std::time::Duration and tokio::time::timeout, and unwrap the nested Result so
you return a clear timeout error (e.g. format!("{LOG_PREFIX} whisper-cli timed
out after {X}s")) when the timeout returns Err(tokio::time::error::Elapsed);
preserve the existing cleanup of file_path and keep the subsequent mapping of
the subprocess error (output_result -> output) but adapt it to handle the
timeout/elapsed branch explicitly.

In `@src/openhuman/voice/schemas.rs`:
- Around line 513-520: The current construction of voice eagerly defaults
p.voice_id to DEFAULT_PIPER_VOICE which forces a Piper ID even when provider ==
"cloud"; change this to preserve an Option for the incoming voice (i.e.,
map/trim/filter p.voice_id into Option<String> without unwrap_or_else), then
apply the DEFAULT_PIPER_VOICE only in the provider-specific branch for Piper
(where provider == "piper") before calling downstream TTS; update both places
where voice is computed (the block using p.voice_id and the similar block around
lines 604-610) so cloud paths leave voice unset unless explicitly provided.

---

Outside diff comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 144-153: The readiness check is incorrectly gating STT
availability on local asset state even when the selected STT provider is cloud;
update the logic in VoicePanel (symbols: assetsResponse.result.stt,
sttAssetState, sttAssetOk, voiceResponse.stt_available, setSttReady) to skip the
local-asset requirement when the configured provider indicates cloud (check the
provider flag/field used elsewhere, e.g., voiceResponse.stt_provider or
equivalent). Concretely, compute readiness as voiceResponse.stt_available when
provider === 'cloud', otherwise require both sttAssetOk and
voiceResponse.stt_available; apply the same change at the other occurrences you
noted (around lines referenced: 263-264 and 725-729).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f8b06946-b2b7-457b-9fc6-7c5d67c88662

📥 Commits

Reviewing files that changed from the base of the PR and between 38924e3 and 53878ea.

⛔ Files ignored due to path filters (2)

Cargo.lock is excluded by !**/*.lock
app/src-tauri/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (30)

Cargo.toml
app/src/components/settings/hooks/useSettingsNavigation.ts
app/src/components/settings/panels/VoicePanel.tsx
app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
app/src/components/skills/VoiceSetupModal.tsx
app/src/features/human/MicComposer.test.tsx
app/src/features/human/MicComposer.tsx
app/src/features/human/voice/sttClient.test.ts
app/src/features/human/voice/sttClient.ts
app/src/pages/Conversations.tsx
app/src/pages/Settings.tsx
app/src/services/api/__tests__/voiceInstallApi.test.ts
app/src/services/api/voiceInstallApi.ts
app/src/utils/tauriCommands/voice.ts
src/openhuman/about_app/catalog.rs
src/openhuman/config/schema/local_ai.rs
src/openhuman/local_ai/install_piper.rs
src/openhuman/local_ai/install_whisper.rs
src/openhuman/local_ai/mod.rs
src/openhuman/local_ai/paths.rs
src/openhuman/local_ai/schemas.rs
src/openhuman/local_ai/voice_install_common.rs
src/openhuman/voice/factory.rs
src/openhuman/voice/local_speech.rs
src/openhuman/voice/local_transcribe.rs
src/openhuman/voice/mod.rs
src/openhuman/voice/ops.rs
src/openhuman/voice/schemas.rs
src/openhuman/voice/schemas_tests.rs
src/openhuman/voice/types.rs

coderabbitai · 2026-05-14T18:59:45Z

+        // Idempotency: a duplicate click while an install is already in
+        // flight should be a no-op, not a second concurrent download.
+        let current = crate::openhuman::local_ai::voice_install_common::read_status(
+            crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER,
+        );
+        if current.state
+            == crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing
+        {
+            tracing::debug!(
+                "[voice-install:whisper] already installing — returning current status"
+            );
+            return serde_json::to_value(current)
+                .map_err(|e| format!("serialize whisper status: {e}"));
+        }
+
+        // Mark "installing" before the spawn so the very next status poll
+        // (≤ 2s away) reflects the new state without a stale read.
+        crate::openhuman::local_ai::voice_install_common::write_status(
+            crate::openhuman::local_ai::voice_install_common::VoiceInstallStatus {
+                engine: crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER
+                    .to_string(),
+                state:
+                    crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing,
+                progress: Some(0),
+                downloaded_bytes: None,
+                total_bytes: None,
+                stage: Some("queued".to_string()),
+                error_detail: None,
+            },
+        );


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make install start transition atomic to prevent duplicate concurrent downloads.

Both handlers perform a non-atomic read_status → write_status(Installing) sequence. Concurrent RPC calls can pass the check and spawn multiple installs for the same engine.

Suggested direction

+// voice_install_common.rs (new helper idea) +pub fn try_begin_install(engine: &str) -> VoiceInstallStatus { + // lock shared table, if already Installing return current + // else write Installing{progress:0, stage:"queued"} and return it +}

- let current = read_status(ENGINE_WHISPER); - if current.state == VoiceInstallState::Installing { ... } - write_status(VoiceInstallStatus { ... Installing ... }); + let status = crate::openhuman::local_ai::voice_install_common::try_begin_install( + crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER, + ); + if status.state == crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing + && status.stage.as_deref() != Some("queued") + { + return serde_json::to_value(status).map_err(|e| format!("serialize whisper status: {e}")); + }

Apply the same pattern for Piper.

Also applies to: 1099-1121

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/local_ai/schemas.rs` around lines 1039 - 1068, The current read_status -> write_status(Installing) sequence is non-atomic and allows race conditions; change the flow to perform an atomic "check-and-set" so only one caller flips the state to Installing and proceeds: add or use an atomic helper in crate::openhuman::local_ai::voice_install_common (e.g., try_set_installing(engine: &str) -> Result<bool, _> or compare_and_swap_status) that reads the current VoiceInstallStatus and, if not already Installing, writes Installing in one atomic operation (using file lock or atomic replace) and returns success/failure; replace the read_status + write_status block around ENGINE_WHISPER with a call to that helper and only spawn the install when it returns success, and apply the same fix for ENGINE_PIPER handling.

Acknowledged. Implementing a proper atomic check-and-set for the install state transition (comment #7 on schemas.rs:1068) is a heavier refactor requiring a file lock or an in-memory Mutex guarded by a lazy_static/OnceLock. Deferring to a follow-up issue to avoid expanding scope here — will open a tracking issue.

@sanil-23, that's a reasonable call for scope management. The good news is that the shared status table in voice_install_common.rs already uses an in-memory Mutex-protected HashMap, so the actual fix is lighter than a full file-lock approach — it's mainly about holding the lock across both the read and the conditional write (a try_set_installing(engine) helper that acquires the lock once). That said, deferring is sensible.

Would you like me to open a GitHub tracking issue for this so it doesn't get lost? I can include the relevant context (atomic check-and-set for ENGINE_WHISPER/ENGINE_PIPER in voice_install_common, link back to this PR and comment).

🧠 Learnings used

Learnt from: graycyrus Repo: tinyhumansai/openhuman PR: 271 File: app/src/pages/onboarding/Onboarding.tsx:65-85 Timestamp: 2026-04-02T13:53:41.803Z Learning: When reviewing `local_ai` Tauri backend RPCs in `src/openhuman/local_ai/**`, note that `local_ai_download` and `local_ai_download_all_assets` both delegate to the same `service.download_all_models()` obtained from `local_ai::global(config)`, which uses shared internal state (via a shared `Arc`) to make concurrent/duplicate downloads an in-flight no-op. Because the backend is already idempotent for concurrent calls, don’t require or introduce separate frontend cross-component/concurrency guards to prevent duplicate downloads.

Learnt from: sanil-23 Repo: tinyhumansai/openhuman PR: 416 File: src/openhuman/memory/relex.rs:441-464 Timestamp: 2026-04-07T15:49:51.275Z Learning: When using the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, don’t require individual execution-provider feature flags (e.g., `directml`, `coreml`, `cuda`) alongside `load-dynamic` to get EP registration code. The `ort` crate already compiles EP registration via `#[cfg(any(feature = "load-dynamic", feature = "<ep_name>"))]` guards, and adding per-EP feature flags can pull in static-linking dependencies that conflict with the dynamic loading approach. At runtime, EP availability is determined by what the dynamically loaded ONNX Runtime library (`onnxruntime.dll`/`.so`/`.dylib`) supports; ort docs indicate providers like `directml`/`xnnpack`/`coreml` are available in builds when the platform supports them.

Learnt from: sanil-23 Repo: tinyhumansai/openhuman PR: 416 File: src/openhuman/memory/relex.rs:441-464 Timestamp: 2026-04-07T15:49:51.275Z Learning: When integrating the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, do NOT also require/enable individual provider EP Cargo features like `directml`, `coreml`, or `cuda`. In `ort` v2.x, EP registration for providers (e.g., DirectML, CoreML, CUDA, etc.) is already compiled in under source-level `#[cfg(any(feature = "load-dynamic", feature = "<provider>"))]` guards, such as in `ep/directml.rs`. Adding provider feature flags alongside `load-dynamic` can pull in static-linking dependencies that conflict with the dynamic-loading approach. Provider availability should be treated as runtime-determined by what the loaded `onnxruntime` library (`onnxruntime.dll`/`libonnxruntime.so`/`libonnxruntime.dylib`) actually supports.

Learnt from: oxoxDev Repo: tinyhumansai/openhuman PR: 571 File: src/openhuman/local_ai/service/whisper_engine.rs:69-80 Timestamp: 2026-04-14T19:59:04.826Z Learning: When reviewing Rust code in this repo that uses the upstream `whisper-rs` crate (v0.16.0), do not report `WhisperContextParameters::use_gpu(...)` or `WhisperContextParameters::flash_attn(...)` as missing/invalid APIs. These builder-style methods exist upstream and return `&mut Self`; they are not limited to `WhisperVadContextParams`.

Learnt from: graycyrus Repo: tinyhumansai/openhuman PR: 1078 File: src/openhuman/agent/agents/welcome/prompt.rs:24-24 Timestamp: 2026-05-01T13:41:00.958Z Learning: For Rust code under `src/openhuman/**/*.rs`, use `snake_case` for local variables (not `camelCase`). If a local variable name is written in `camelCase`, treat it as a style/lint issue because it will trigger Rust’s `non_snake_case` warning (and related clippy linting, if enabled). Avoid suggesting `camelCase` for any Rust local variable names in this repository.

Learnt from: senamakel Repo: tinyhumansai/openhuman PR: 1173 File: tests/agent_memory_loader_public.rs:88-88 Timestamp: 2026-05-04T06:50:47.877Z Learning: In this repository, the general camelCase naming guideline should not be applied to Rust source files. For all .rs files, Rust function (and related) names should use snake_case, and snake_case Rust function names should not be flagged—even for async test functions annotated with attributes like #[tokio::test]. This is consistent with Rust’s non_snake_case lint behavior.

Refs tinyhumansai#1710 (partial) Adds a string-matched provider factory mirroring embeddings/factory.rs so the user can route speech-to-text and text-to-speech through either the hosted backend proxy or a local Whisper/Piper subprocess. Config strings drive dispatch — there is no separate offline gate; selecting `stt_provider = "whisper"` + `tts_provider = "piper"` + a local LLM provider IS offline mode. Modules - voice/factory.rs: create_stt_provider / create_tts_provider, returns Box<dyn SttProvider | TtsProvider>, errors on unknown provider name. - voice/local_transcribe.rs: invokes whisper-cli via tokio process. --no-timestamps so segment-time prefixes don't leak into the transcript. CREATE_NO_WINDOW on Windows to suppress the subprocess console flash. - voice/local_speech.rs: invokes piper.exe; --length-scale 1.15 for natural pacing; same Windows console suppression. - voice/schemas.rs: voice_stt_dispatch + voice_tts_dispatch RPCs; voice_reply_synthesize handler now routes through create_tts_provider so the dropdown selection actually applies to spoken replies (was decorative — backend always called the cloud path). - voice/types.rs + ops.rs: surface stt_provider/tts_provider in the voice_status snapshot. - config/schema/local_ai.rs: stt_provider, tts_provider fields with "cloud" defaults. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Refs tinyhumansai#1710 (partial) Adds workspace-scoped installers for the local STT (Whisper) and TTS (Piper) binaries + models, with a shared status table the UI polls for progress. Modules - local_ai/voice_install_common.rs: shared download_to_file primitive (streaming to .part + atomic rename, 30min request budget, 45s per-chunk idle timeout, SHA256 hook). Mutex<HashMap> status table keyed by engine id. - local_ai/install_whisper.rs: fetch ggml-<size>.bin from HuggingFace + Windows whisper-cli release zip; per-size installed-artifacts check so switching model size triggers a fresh download. - local_ai/install_piper.rs: fetch piper binary tarball + en_US-lessac-medium voice .onnx + .onnx.json sidecar; per-voice installed-artifacts check. - local_ai/paths.rs: workspace_whisper_dir / workspace_piper_dir under shared_root_dir(), which honors OPENHUMAN_WORKSPACE when explicitly set (parallel test sessions / multi-workspace deployments) and falls back to ~/.openhuman/ otherwise. Tolerant filename normalization so stale legacy ids (ggml-base-q5_1.bin) don't double-prefix into ggml-ggml-base-q5_1.bin.bin. TTS voice resolver probes the installer drop-zone BEFORE legacy paths to skip stale 4-byte stubs that crashed Piper with STATUS_STACK_BUFFER_OVERRUN. Release/ added to whisper binary candidate list (current whisper.cpp Windows zip layout). - local_ai/schemas.rs: install_whisper / install_piper RPCs are fire-and-forget — handlers write Installing(0%) to the status table, spawn the download as a tokio task, return immediately. UI polls status via local_ai_*_install_status. Idempotency guard prevents duplicate concurrent installs from a double-click. - Cargo.toml: declare flate2 directly for the Piper tar.gz extractor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Refs tinyhumansai#1710 (partial) Surfaces the new voice provider factory + installers through the Settings → AI & Models → Voice card. UI - VoicePanel.tsx: new "Voice Providers" section above the legacy dictation server settings. STT dropdown (cloud | whisper) + model selector (5 Whisper sizes, default medium). TTS dropdown (cloud | piper) + 8 curated Piper voice presets with "Other (advanced)" free- text fallback. Install locally / Reinstall locally / Retry locally buttons rendered from polled status. Auto-install on model/voice selection so changing the dropdown triggers the download if missing. Page title Voice Dictation → Voice. Removed the dead "Open Local AI Model" CTA that pointed at the wrong panel. - VoicePanel.test.tsx: tests for the new provider rows + install state. - voiceInstallApi.ts: RPC wrappers for the four install_* / install_*_status endpoints. Co-located vitest tests. - tauriCommands/voice.ts: voice_set_providers + VoiceStatus extensions. - Settings.tsx: Voice (STT & TTS) card under aiModelsSettingsItems so the panel is discoverable from the main settings nav (the panel had no nav entry before — only reachable via Conversations deep-link). - useSettingsNavigation.ts: breadcrumb routes voice under AI & Models. Capability catalog (about_app/catalog.rs) - local_ai.speech_to_text + .text_to_speech entries describe the factory-dispatched providers, the available presets, and how to switch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Refs tinyhumansai#1710 (partial) Wires the existing mascot mic composer and conversation spoken-reply flow into the new voice provider factory so the user's settings actually apply on the user-visible path. Mascot mic - MicComposer.tsx (renamed from MicCloudComposer.tsx): drops the cloud-only assumption; calls openhuman.voice_stt_dispatch via transcribeWithFactory so config.local_ai.stt_provider drives provider choice (cloud Whisper proxy or local whisper-cli subprocess). - sttClient.ts: adds transcribeWithFactory wrapping the dispatch RPC. Existing transcribeCloud kept for callers not yet migrated. - sttClient.test.ts: new tests for the factory path (routing, override, empty-blob rejection, stale-sidecar rewriting, error passthrough). Setup error CTAs - Conversations.tsx (sendError handler): "Set up" button on stt_not_ready / voice_transcription / tts_not_ready / voice_synthesis codes now routes to /settings/voice (the install lives there) instead of /settings/local-model (legacy Ollama-asset panel). - VoiceSetupModal.tsx: same nav fix — "Open Local AI Model" → Voice settings, since the STT model install lives on the Voice panel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- chatSendError.ts: add `voice_synthesis` + `tts_not_ready` to ChatSendErrorCode union. The Conversations.tsx setup-error CTA now routes those codes to /settings/voice, but the comparison was a no-overlap warning until the union learned the new codes. - VoicePanel.tsx: rename unused `engine` arg in installButtonLabel to `_engine` (TS6133). It was used in the prior longer labels ("Install Whisper"); shortened labels ("Install locally") no longer reference it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (2)

src/openhuman/voice/schemas.rs (2)

604-610: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Same provider-agnostic voice default issue.

This handler has the same issue as handle_voice_reply_synthesize: when provider_name is "cloud" and no voice param is provided, the code defaults to DEFAULT_PIPER_VOICE and passes it to the cloud provider. Apply the same provider-aware fix here.

🔧 Proposed fix

 let provider_name = p
     .provider
     .as_deref()
     .map(str::trim)
     .filter(|s| !s.is_empty())
     .map(str::to_string)
     .unwrap_or_else(|| effective_tts_provider(&config));
-let voice = p
-    .voice
-    .as_deref()
-    .map(str::trim)
-    .filter(|s| !s.is_empty())
-    .map(str::to_string)
-    .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string());
+let voice = match provider_name.as_str() {
+    "piper" => Some(
+        p.voice
+            .as_deref()
+            .map(str::trim)
+            .filter(|s| !s.is_empty())
+            .map(str::to_string)
+            .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string())
+    ),
+    _ => p.voice
+        .as_deref()
+        .map(str::trim)
+        .filter(|s| !s.is_empty())
+        .map(str::to_string()),
+};

 log::debug!(
-    "[voice-factory] RPC voice_tts_dispatch provider={provider_name} voice={voice}"
+    "[voice-factory] RPC voice_tts_dispatch provider={provider_name} voice={:?}", voice
 );
 let provider =
-    crate::openhuman::voice::create_tts_provider(&provider_name, &voice, &config)
+    crate::openhuman::voice::create_tts_provider(&provider_name, voice.as_deref().unwrap_or(""), &config)
         .map_err(|e| e.to_string())?;
-let outcome = provider.synthesize(&config, &p.text, Some(&voice)).await?;
+let outcome = provider.synthesize(&config, &p.text, voice.as_deref()).await?;
 to_json(outcome)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/schemas.rs` around lines 604 - 610, The current voice
fallback always uses crate::openhuman::voice::DEFAULT_PIPER_VOICE for p.voice
(producing let voice = ...), which sends a Piper default to the cloud provider;
change the fallback to be provider-aware: if p.provider_name (check
p.provider_name.as_deref()) equals "cloud" use the cloud default constant (e.g.
crate::openhuman::voice::DEFAULT_CLOUD_VOICE) otherwise use DEFAULT_PIPER_VOICE,
and set the resulting String into the same voice variable so this handler (like
handle_voice_reply_synthesize) supplies the correct provider-specific default.

513-519: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Provider-agnostic voice default forces Piper voice ID onto cloud TTS.

When provider_name resolves to "cloud" and no voice_id is provided in params, the code defaults to DEFAULT_PIPER_VOICE and passes it to the cloud provider via Some(&voice) on line 526. Cloud providers expect different voice IDs (e.g., ElevenLabs voice IDs), so passing a Piper-specific ID can cause errors or incorrect behavior.

Make the voice fallback provider-aware: keep voice as Option<String>, only default to DEFAULT_PIPER_VOICE when provider_name == "piper", and leave it None for cloud unless explicitly provided.

🔧 Proposed fix

 let provider_name = effective_tts_provider(&config);
-let voice = p
-    .voice_id
-    .as_deref()
-    .map(str::trim)
-    .filter(|s| !s.is_empty())
-    .map(str::to_string)
-    .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string());
+let voice = match provider_name.as_str() {
+    "piper" => Some(
+        p.voice_id
+            .as_deref()
+            .map(str::trim)
+            .filter(|s| !s.is_empty())
+            .map(str::to_string)
+            .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string())
+    ),
+    _ => p.voice_id
+        .as_deref()
+        .map(str::trim)
+        .filter(|s| !s.is_empty())
+        .map(str::to_string()),
+};
 log::debug!(
-    "[voice-factory] voice_reply_synthesize dispatch provider={provider_name} voice={voice}"
+    "[voice-factory] voice_reply_synthesize dispatch provider={provider_name} voice={:?}", voice
 );
 let provider =
-    crate::openhuman::voice::create_tts_provider(&provider_name, &voice, &config)
+    crate::openhuman::voice::create_tts_provider(&provider_name, voice.as_deref().unwrap_or(""), &config)
         .map_err(|e| e.to_string())?;
-to_json(provider.synthesize(&config, &p.text, Some(&voice)).await?)
+to_json(provider.synthesize(&config, &p.text, voice.as_deref()).await?)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/schemas.rs` around lines 513 - 519, The current code
unconditionally falls back to DEFAULT_PIPER_VOICE when p.voice_id is empty,
which forces a Piper voice onto cloud TTS; change the voice binding to be an
Option<String> by mapping p.voice_id -> Some(trimmed_string) if present,
otherwise set to Some(DEFAULT_PIPER_VOICE.to_string()) only when provider_name
== "piper", and set to None for other providers (e.g., "cloud"); update any
downstream uses that pass Some(&voice) to instead handle
Option<String>/Option<&String> so cloud providers receive None unless an
explicit voice_id was supplied; reference the local variable voice, p.voice_id,
DEFAULT_PIPER_VOICE and provider_name in your changes.

🧹 Nitpick comments (1)

app/src/services/api/voiceInstallApi.ts (1)

29-44: ⚡ Quick win

Keep frontend status fields camelCase and map snake_case at the wire boundary.

Line 37, Line 39, and Line 43 introduce snake_case field names into app-facing TypeScript types. Prefer a wire DTO (snake_case) + mapped frontend interface (camelCase) so UI/state code stays consistent.

♻️ Proposed refactor

+export interface VoiceInstallStatusWire {
+  engine: 'whisper' | 'piper';
+  state: VoiceInstallState;
+  progress: number | null;
+  downloaded_bytes: number | null;
+  total_bytes: number | null;
+  stage: string | null;
+  error_detail: string | null;
+}
+
 export interface VoiceInstallStatus {
-  /** `"whisper"` or `"piper"`. */
-  engine: string;
+  /** `"whisper"` or `"piper"`. */
+  engine: 'whisper' | 'piper';
   /** Current state — drives the button label / spinner. */
   state: VoiceInstallState;
   /** 0–100 percent, populated while `state === 'installing'`. */
   progress: number | null;
   /** Bytes received so far across the current download stage. */
-  downloaded_bytes: number | null;
+  downloadedBytes: number | null;
   /** Total bytes expected (may be null for chunked transfer encoding). */
-  total_bytes: number | null;
+  totalBytes: number | null;
   /** Free-text status line — e.g. "downloading model (ggml-tiny.bin)". */
   stage: string | null;
   /** Populated when `state === 'error'`. */
-  error_detail: string | null;
+  errorDetail: string | null;
 }
+
+function fromWireStatus(wire: VoiceInstallStatusWire): VoiceInstallStatus {
+  return {
+    engine: wire.engine,
+    state: wire.state,
+    progress: wire.progress,
+    downloadedBytes: wire.downloaded_bytes,
+    totalBytes: wire.total_bytes,
+    stage: wire.stage,
+    errorDetail: wire.error_detail,
+  };
+}

As per coding guidelines, Use camelCase for variable and function names in TypeScript/JavaScript code.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/services/api/voiceInstallApi.ts` around lines 29 - 44, The exported
VoiceInstallStatus currently exposes snake_case fields (downloaded_bytes,
total_bytes, error_detail) to app code; introduce a separate wire DTO (e.g.,
VoiceInstallStatusDto) with the snake_case properties returned by the API and
keep the app-facing VoiceInstallStatus using camelCase (downloadedBytes,
totalBytes, errorDetail), then update the parsing/mapping logic in the
voiceInstallApi functions to convert from VoiceInstallStatusDto ->
VoiceInstallStatus at the network boundary so UI/state code consumes only
camelCase fields.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/openhuman/voice/schemas.rs`:
- Around line 604-610: The current voice fallback always uses
crate::openhuman::voice::DEFAULT_PIPER_VOICE for p.voice (producing let voice =
...), which sends a Piper default to the cloud provider; change the fallback to
be provider-aware: if p.provider_name (check p.provider_name.as_deref()) equals
"cloud" use the cloud default constant (e.g.
crate::openhuman::voice::DEFAULT_CLOUD_VOICE) otherwise use DEFAULT_PIPER_VOICE,
and set the resulting String into the same voice variable so this handler (like
handle_voice_reply_synthesize) supplies the correct provider-specific default.
- Around line 513-519: The current code unconditionally falls back to
DEFAULT_PIPER_VOICE when p.voice_id is empty, which forces a Piper voice onto
cloud TTS; change the voice binding to be an Option<String> by mapping
p.voice_id -> Some(trimmed_string) if present, otherwise set to
Some(DEFAULT_PIPER_VOICE.to_string()) only when provider_name == "piper", and
set to None for other providers (e.g., "cloud"); update any downstream uses that
pass Some(&voice) to instead handle Option<String>/Option<&String> so cloud
providers receive None unless an explicit voice_id was supplied; reference the
local variable voice, p.voice_id, DEFAULT_PIPER_VOICE and provider_name in your
changes.

---

Nitpick comments:
In `@app/src/services/api/voiceInstallApi.ts`:
- Around line 29-44: The exported VoiceInstallStatus currently exposes
snake_case fields (downloaded_bytes, total_bytes, error_detail) to app code;
introduce a separate wire DTO (e.g., VoiceInstallStatusDto) with the snake_case
properties returned by the API and keep the app-facing VoiceInstallStatus using
camelCase (downloadedBytes, totalBytes, errorDetail), then update the
parsing/mapping logic in the voiceInstallApi functions to convert from
VoiceInstallStatusDto -> VoiceInstallStatus at the network boundary so UI/state
code consumes only camelCase fields.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ba0edccb-b5e6-4e35-84e9-63228884860b

📥 Commits

Reviewing files that changed from the base of the PR and between 53878ea and f2aa8be.

⛔ Files ignored due to path filters (2)

Cargo.lock is excluded by !**/*.lock
app/src-tauri/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (31)

Cargo.toml
app/src/chat/chatSendError.ts
app/src/components/settings/hooks/useSettingsNavigation.ts
app/src/components/settings/panels/VoicePanel.tsx
app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
app/src/components/skills/VoiceSetupModal.tsx
app/src/features/human/MicComposer.test.tsx
app/src/features/human/MicComposer.tsx
app/src/features/human/voice/sttClient.test.ts
app/src/features/human/voice/sttClient.ts
app/src/pages/Conversations.tsx
app/src/pages/Settings.tsx
app/src/services/api/__tests__/voiceInstallApi.test.ts
app/src/services/api/voiceInstallApi.ts
app/src/utils/tauriCommands/voice.ts
src/openhuman/about_app/catalog.rs
src/openhuman/config/schema/local_ai.rs
src/openhuman/local_ai/install_piper.rs
src/openhuman/local_ai/install_whisper.rs
src/openhuman/local_ai/mod.rs
src/openhuman/local_ai/paths.rs
src/openhuman/local_ai/schemas.rs
src/openhuman/local_ai/voice_install_common.rs
src/openhuman/voice/factory.rs
src/openhuman/voice/local_speech.rs
src/openhuman/voice/local_transcribe.rs
src/openhuman/voice/mod.rs
src/openhuman/voice/ops.rs
src/openhuman/voice/schemas.rs
src/openhuman/voice/schemas_tests.rs
src/openhuman/voice/types.rs

✅ Files skipped from review due to trivial changes (4)

app/src/chat/chatSendError.ts
app/src/components/skills/VoiceSetupModal.tsx
app/src/pages/Settings.tsx
src/openhuman/local_ai/mod.rs

🚧 Files skipped from review as they are similar to previous changes (23)

app/src/utils/tauriCommands/voice.ts
src/openhuman/voice/schemas_tests.rs
app/src/components/settings/hooks/useSettingsNavigation.ts
Cargo.toml
app/src/components/settings/panels/tests/VoicePanel.test.tsx
app/src/features/human/voice/sttClient.test.ts
app/src/services/api/tests/voiceInstallApi.test.ts
src/openhuman/voice/mod.rs
app/src/pages/Conversations.tsx
src/openhuman/config/schema/local_ai.rs
src/openhuman/voice/types.rs
app/src/features/human/voice/sttClient.ts
app/src/features/human/MicComposer.tsx
src/openhuman/local_ai/install_piper.rs
src/openhuman/voice/local_transcribe.rs
src/openhuman/about_app/catalog.rs
src/openhuman/local_ai/schemas.rs
app/src/components/settings/panels/VoicePanel.tsx
src/openhuman/local_ai/paths.rs
src/openhuman/local_ai/install_whisper.rs
src/openhuman/voice/local_speech.rs
src/openhuman/voice/factory.rs
src/openhuman/local_ai/voice_install_common.rs

- VoicePanel.test.tsx: assert tts-voice-select (not tts-voice-input) when seeded Piper provider has a known preset voice id (tinyhumansai#1) - VoicePanel.tsx: fix stale-closure reseeding by using functional setState updater (prev || seeded) so interval polls can't clobber user edits (#2) - local_transcribe.rs: resolve model path via resolve_stt_model_path_by_id so the per-request model_id override is actually used (tinyhumansai#3); wrap cmd.output() in a 120s tokio::time::timeout to prevent hung STT (tinyhumansai#4) - voice_install_common.rs: delete .part file on chunk-read and write errors so mid-stream failures don't leave stale artifacts (tinyhumansai#5) - schemas.rs: make voice default provider-aware; only fall back to DEFAULT_PIPER_VOICE when provider == "piper", leave cloud voice unset unless explicitly provided (tinyhumansai#6) - paths.rs: add resolve_stt_model_path_by_id helper used by tinyhumansai#3 - Settings.tsx, VoicePanel.tsx: fix Prettier formatting (CI gate) Heavy lifts deferred (replied on threads): atomic install-start race (tinyhumansai#7), TTS trait cloud overrides (tinyhumansai#8). Intentional UX deferral: generic install button labels (tinyhumansai#9). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- VoicePanel.test.tsx: update button label assertions to generic labels ("Install locally" / "Reinstall locally" / "Retry locally") matching the intentional UX decision; remove stale "Voice Dictation" heading lookup that was never rendered by the component - paths.rs: update empty-size fallback assertion to ggml-medium.bin (workspace_whisper_model_path changed from large-v3-turbo to medium in the prior commit but the test expectation was not updated) - install_whisper.rs: pin config.local_ai.stt_model_id to DEFAULT_WHISPER_MODEL_SIZE in status_promotes_to_installed_when_model_present so the path created and the path probed by effective_stt_model_id agree Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add 6 new VoicePanel tests: - installWhisper/installPiper rejection paths (covers catch blocks 350-351, 373-374) - persistProviders rejection path (covers catch block 294-295) - Piper installing state with percentage (covers the piper state span conditional at 534) - Piper voice preset select change triggers persistProviders (covers lines 554-562) Freeze subsequent loadData calls with a hanging promise in the rejection tests so the error state isn't immediately cleared by the finally-block reload before the assertion can observe it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…t installer drop-zone The installer drop-zone (`bin/piper/voices/<id>.onnx`) is probed FIRST by `resolve_tts_voice_path` and lives under the shared root, not the temp config. When a sibling `install_piper` test runs in parallel with the default voice id and leaves a stub there, this test sees that file instead of the legacy `models/local-ai/tts/` candidate and fails the assertion. Fix: take the `shared_install_lock()` for the duration of the test and wipe the installer path before writing the legacy stub so the resolver deterministically falls through to the legacy candidate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sanil-23 requested a review from a team May 14, 2026 18:52

coderabbitai Bot requested changes May 14, 2026

View reviewed changes

sanil-23 and others added 5 commits May 15, 2026 00:31

sanil-23 force-pushed the feat/1710-voice-factory branch from 53878ea to f2aa8be Compare May 14, 2026 19:02

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

sanil-23 and others added 4 commits May 15, 2026 00:48

senamakel merged commit 3165306 into tinyhumansai:main May 14, 2026
21 checks passed

sanil-23 mentioned this pull request May 15, 2026

fix(voice): atomic install-start guard for Whisper/Piper install RPCs #1787

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(voice): fully-local STT + TTS via Whisper/Piper provider factory#1755

feat(voice): fully-local STT + TTS via Whisper/Piper provider factory#1755
senamakel merged 9 commits into
tinyhumansai:mainfrom
sanil-23:feat/1710-voice-factory

sanil-23 commented May 14, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 14, 2026 •

edited

Loading

Uh oh!

sanil-23 May 14, 2026

Uh oh!

coderabbitai Bot May 14, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sanil-23 commented May 14, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Validation Run

Validation Blocked

Behavior Changes

Parity Contract

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sanil-23 May 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sanil-23 commented May 14, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading

coderabbitai Bot May 14, 2026 •

edited

Loading