feat(voice): fully-local STT + TTS via Whisper/Piper provider factory#1755
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds local Whisper/Piper installers and workspace-aware resolution, a STT/TTS provider factory and RPC dispatch, renderer APIs/UI for provider selection and installs, MicComposer refactor to use factory-based STT, and supporting tests and types across Rust and TypeScript. ChangesLocal voice installers, shared plumbing & path resolution
Factory, RPC schema, and voice status integration
Renderer-side APIs, UI, and composer changes
Sequence Diagram(s)sequenceDiagram
participant Renderer as Renderer
participant Core as CoreRPC(openhuman.voice_stt_dispatch)
participant Factory as Factory
participant Provider as Provider
participant Engine as LocalEngine/Cloud
Renderer->>Core: call(audio_base64, opts)
Core->>Factory: create_stt_provider(resolved_provider, model)
Factory->>Provider: instantiate cloud/whisper provider
Provider->>Engine: transcribe (whisper-cli or cloud)
Engine-->>Provider: RpcOutcome<SttResult>
Provider-->>Factory: return outcome
Factory-->>Core: { text, provider }
Core-->>Renderer: { text, provider }
Estimated code review effort 🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
|
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/src/components/settings/panels/VoicePanel.tsx (1)
144-153:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t hard-require local STT assets when the selected provider is cloud.
Current readiness gates the panel on local STT asset state even when STT provider is
cloud, which can disable voice features on the default cloud path.Suggested fix
- const sttAssetState = assetsResponse.result.stt?.state; - const sttAssetOk = sttAssetState === 'ready' || sttAssetState === 'ondemand'; + const sttAssetState = assetsResponse.result.stt?.state; + const sttAssetOk = sttAssetState === 'ready' || sttAssetState === 'ondemand'; + const sttProviderResolved = voiceResponse.stt_provider === 'whisper' ? 'whisper' : 'cloud'; + const requiresLocalAsset = sttProviderResolved === 'whisper'; ... - setSttReady(sttAssetOk && voiceResponse.stt_available); + setSttReady(voiceResponse.stt_available && (!requiresLocalAsset || sttAssetOk));Also applies to: 263-264, 725-729
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 144 - 153, The readiness check is incorrectly gating STT availability on local asset state even when the selected STT provider is cloud; update the logic in VoicePanel (symbols: assetsResponse.result.stt, sttAssetState, sttAssetOk, voiceResponse.stt_available, setSttReady) to skip the local-asset requirement when the configured provider indicates cloud (check the provider flag/field used elsewhere, e.g., voiceResponse.stt_provider or equivalent). Concretely, compute readiness as voiceResponse.stt_available when provider === 'cloud', otherwise require both sttAssetOk and voiceResponse.stt_available; apply the same change at the other occurrences you noted (around lines referenced: 263-264 and 725-729).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/settings/panels/__tests__/VoicePanel.test.tsx`:
- Around line 267-281: The test currently expects the free-text TTS voice input
to be present for a seeded Piper provider, but when
runtime.voiceStatus.tts_voice_id is set (e.g., 'en_US-lessac-medium') the UI
should show the preset select instead; update the VoicePanel.test to assert that
the element with test id 'tts-voice-select' is in the document and that
'tts-voice-input' is not present (use screen.getByTestId for the select and
screen.queryByTestId to assert the input is null), keeping the existing checks
for stt-provider-select, tts-provider-select, and stt-model-select.
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 311-329: installButtonLabel currently ignores the engine parameter
so button text is generic; update installButtonLabel to include the engine name
in each returned string (e.g., "Installing Whisper 12%", "Install Piper
locally", "Repair Whisper", "Retry Piper locally", etc.) and ensure the busy
branch also uses the engine (e.g., "Installing Whisper…"); apply the same
engine-specific wording change to the other similar label helpers mentioned (the
analogous label logic at the other two locations referenced in the comment) so
all install/repair/retry strings include the engine name consistently.
- Around line 129-143: The seeding logic for
sttProvider/ttsProvider/sttModel/ttsVoice can be re-applied on every poll due to
stale closure of loadData; add a per-field "seeded" ref (e.g., sttSeededRef,
ttsSeededRef, sttModelSeededRef, ttsVoiceSeededRef) and only call
setSttProvider/setTtsProvider/setSttModel/setTtsVoice when the corresponding
seeded ref is false and voiceResponse has a value, then set that ref to true so
subsequent interval runs won't clobber user edits; update the loadData usage
accordingly so it reads and sets these refs instead of relying on falsy checks
of sttProvider/ttsProvider/sttModel/ttsVoice.
In `@src/openhuman/local_ai/schemas.rs`:
- Around line 1039-1068: The current read_status -> write_status(Installing)
sequence is non-atomic and allows race conditions; change the flow to perform an
atomic "check-and-set" so only one caller flips the state to Installing and
proceeds: add or use an atomic helper in
crate::openhuman::local_ai::voice_install_common (e.g.,
try_set_installing(engine: &str) -> Result<bool, _> or compare_and_swap_status)
that reads the current VoiceInstallStatus and, if not already Installing, writes
Installing in one atomic operation (using file lock or atomic replace) and
returns success/failure; replace the read_status + write_status block around
ENGINE_WHISPER with a call to that helper and only spawn the install when it
returns success, and apply the same fix for ENGINE_PIPER handling.
In `@src/openhuman/local_ai/voice_install_common.rs`:
- Around line 254-260: The code returns early on chunk read errors and write
errors without deleting the temporary `part_path`, leaving partial files; update
the error paths around `chunk.map_err(...)` and the
`file.write_all(...).await.map_err(...)` to attempt cleanup by calling
`tokio::fs::remove_file(part_path).await` (or a fallible wrapper that
ignores/remove-file errors) before returning the mapped error; perform the
removal asynchronously and ensure it runs whether `hasher` or `file` exist, then
return the original formatted error (e.g., call
`tokio::fs::remove_file(part_path).await.ok();` before propagating the error
from `chunk` or `file.write_all`) so `part_path` is deleted on stream/write
failures.
In `@src/openhuman/voice/factory.rs`:
- Around line 95-103: The synthesize trait currently only accepts text and
voice, dropping cloud override fields (model_id, output_format, voice_settings);
update the TTS provider interface (the synthesize method in the trait) to accept
a richer options object (e.g., ReplySpeechOptions or a new struct carrying
model_id, output_format, voice_settings and any other fields used by
voice.reply_synthesize), then update the cloud implementation that constructs
ReplySpeechOptions (and the place noted around lines 248-253) to pass these
option fields through instead of hardcoding None, and finally propagate the new
options parameter through callers (e.g., where voice.reply_synthesize is
invoked) so overrides flow from RPC to provider unchanged.
In `@src/openhuman/voice/local_transcribe.rs`:
- Around line 124-131: The code computes model_id from opts but still calls
resolve_stt_model_path(config) so the actual model file may not match the
returned model_id; update the logic to resolve the model path from the effective
model_id instead: either add an argument to resolve_stt_model_path (e.g.,
resolve_stt_model_path(config, &model_id)) or create a new helper like
resolve_stt_model_path_for(model_id, config), then use that resolved path
wherever the model is loaded/used (including the similar call later in the
function around the other occurrence) so the transcriber uses the same effective
model_id that is returned in responses.
- Around line 195-218: Replace the unbounded cmd.output().await call with a
bounded call using tokio::time::timeout (e.g.
tokio::time::timeout(Duration::from_secs(X), cmd.output()).await), import
std::time::Duration and tokio::time::timeout, and unwrap the nested Result so
you return a clear timeout error (e.g. format!("{LOG_PREFIX} whisper-cli timed
out after {X}s")) when the timeout returns Err(tokio::time::error::Elapsed);
preserve the existing cleanup of file_path and keep the subsequent mapping of
the subprocess error (output_result -> output) but adapt it to handle the
timeout/elapsed branch explicitly.
In `@src/openhuman/voice/schemas.rs`:
- Around line 513-520: The current construction of voice eagerly defaults
p.voice_id to DEFAULT_PIPER_VOICE which forces a Piper ID even when provider ==
"cloud"; change this to preserve an Option for the incoming voice (i.e.,
map/trim/filter p.voice_id into Option<String> without unwrap_or_else), then
apply the DEFAULT_PIPER_VOICE only in the provider-specific branch for Piper
(where provider == "piper") before calling downstream TTS; update both places
where voice is computed (the block using p.voice_id and the similar block around
lines 604-610) so cloud paths leave voice unset unless explicitly provided.
---
Outside diff comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 144-153: The readiness check is incorrectly gating STT
availability on local asset state even when the selected STT provider is cloud;
update the logic in VoicePanel (symbols: assetsResponse.result.stt,
sttAssetState, sttAssetOk, voiceResponse.stt_available, setSttReady) to skip the
local-asset requirement when the configured provider indicates cloud (check the
provider flag/field used elsewhere, e.g., voiceResponse.stt_provider or
equivalent). Concretely, compute readiness as voiceResponse.stt_available when
provider === 'cloud', otherwise require both sttAssetOk and
voiceResponse.stt_available; apply the same change at the other occurrences you
noted (around lines referenced: 263-264 and 725-729).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f8b06946-b2b7-457b-9fc6-7c5d67c88662
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.lockapp/src-tauri/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (30)
Cargo.tomlapp/src/components/settings/hooks/useSettingsNavigation.tsapp/src/components/settings/panels/VoicePanel.tsxapp/src/components/settings/panels/__tests__/VoicePanel.test.tsxapp/src/components/skills/VoiceSetupModal.tsxapp/src/features/human/MicComposer.test.tsxapp/src/features/human/MicComposer.tsxapp/src/features/human/voice/sttClient.test.tsapp/src/features/human/voice/sttClient.tsapp/src/pages/Conversations.tsxapp/src/pages/Settings.tsxapp/src/services/api/__tests__/voiceInstallApi.test.tsapp/src/services/api/voiceInstallApi.tsapp/src/utils/tauriCommands/voice.tssrc/openhuman/about_app/catalog.rssrc/openhuman/config/schema/local_ai.rssrc/openhuman/local_ai/install_piper.rssrc/openhuman/local_ai/install_whisper.rssrc/openhuman/local_ai/mod.rssrc/openhuman/local_ai/paths.rssrc/openhuman/local_ai/schemas.rssrc/openhuman/local_ai/voice_install_common.rssrc/openhuman/voice/factory.rssrc/openhuman/voice/local_speech.rssrc/openhuman/voice/local_transcribe.rssrc/openhuman/voice/mod.rssrc/openhuman/voice/ops.rssrc/openhuman/voice/schemas.rssrc/openhuman/voice/schemas_tests.rssrc/openhuman/voice/types.rs
| // Idempotency: a duplicate click while an install is already in | ||
| // flight should be a no-op, not a second concurrent download. | ||
| let current = crate::openhuman::local_ai::voice_install_common::read_status( | ||
| crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER, | ||
| ); | ||
| if current.state | ||
| == crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing | ||
| { | ||
| tracing::debug!( | ||
| "[voice-install:whisper] already installing — returning current status" | ||
| ); | ||
| return serde_json::to_value(current) | ||
| .map_err(|e| format!("serialize whisper status: {e}")); | ||
| } | ||
|
|
||
| // Mark "installing" before the spawn so the very next status poll | ||
| // (≤ 2s away) reflects the new state without a stale read. | ||
| crate::openhuman::local_ai::voice_install_common::write_status( | ||
| crate::openhuman::local_ai::voice_install_common::VoiceInstallStatus { | ||
| engine: crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER | ||
| .to_string(), | ||
| state: | ||
| crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing, | ||
| progress: Some(0), | ||
| downloaded_bytes: None, | ||
| total_bytes: None, | ||
| stage: Some("queued".to_string()), | ||
| error_detail: None, | ||
| }, | ||
| ); |
There was a problem hiding this comment.
Make install start transition atomic to prevent duplicate concurrent downloads.
Both handlers perform a non-atomic read_status → write_status(Installing) sequence. Concurrent RPC calls can pass the check and spawn multiple installs for the same engine.
Suggested direction
+// voice_install_common.rs (new helper idea)
+pub fn try_begin_install(engine: &str) -> VoiceInstallStatus {
+ // lock shared table, if already Installing return current
+ // else write Installing{progress:0, stage:"queued"} and return it
+}- let current = read_status(ENGINE_WHISPER);
- if current.state == VoiceInstallState::Installing { ... }
- write_status(VoiceInstallStatus { ... Installing ... });
+ let status = crate::openhuman::local_ai::voice_install_common::try_begin_install(
+ crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER,
+ );
+ if status.state == crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing
+ && status.stage.as_deref() != Some("queued")
+ {
+ return serde_json::to_value(status).map_err(|e| format!("serialize whisper status: {e}"));
+ }Apply the same pattern for Piper.
Also applies to: 1099-1121
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/local_ai/schemas.rs` around lines 1039 - 1068, The current
read_status -> write_status(Installing) sequence is non-atomic and allows race
conditions; change the flow to perform an atomic "check-and-set" so only one
caller flips the state to Installing and proceeds: add or use an atomic helper
in crate::openhuman::local_ai::voice_install_common (e.g.,
try_set_installing(engine: &str) -> Result<bool, _> or compare_and_swap_status)
that reads the current VoiceInstallStatus and, if not already Installing, writes
Installing in one atomic operation (using file lock or atomic replace) and
returns success/failure; replace the read_status + write_status block around
ENGINE_WHISPER with a call to that helper and only spawn the install when it
returns success, and apply the same fix for ENGINE_PIPER handling.
There was a problem hiding this comment.
Acknowledged. Implementing a proper atomic check-and-set for the install state transition (comment #7 on schemas.rs:1068) is a heavier refactor requiring a file lock or an in-memory Mutex guarded by a lazy_static/OnceLock. Deferring to a follow-up issue to avoid expanding scope here — will open a tracking issue.
There was a problem hiding this comment.
@sanil-23, that's a reasonable call for scope management. The good news is that the shared status table in voice_install_common.rs already uses an in-memory Mutex-protected HashMap, so the actual fix is lighter than a full file-lock approach — it's mainly about holding the lock across both the read and the conditional write (a try_set_installing(engine) helper that acquires the lock once). That said, deferring is sensible.
Would you like me to open a GitHub tracking issue for this so it doesn't get lost? I can include the relevant context (atomic check-and-set for ENGINE_WHISPER/ENGINE_PIPER in voice_install_common, link back to this PR and comment).
🧠 Learnings used
Learnt from: graycyrus
Repo: tinyhumansai/openhuman PR: 271
File: app/src/pages/onboarding/Onboarding.tsx:65-85
Timestamp: 2026-04-02T13:53:41.803Z
Learning: When reviewing `local_ai` Tauri backend RPCs in `src/openhuman/local_ai/**`, note that `local_ai_download` and `local_ai_download_all_assets` both delegate to the same `service.download_all_models()` obtained from `local_ai::global(config)`, which uses shared internal state (via a shared `Arc`) to make concurrent/duplicate downloads an in-flight no-op. Because the backend is already idempotent for concurrent calls, don’t require or introduce separate frontend cross-component/concurrency guards to prevent duplicate downloads.
Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 416
File: src/openhuman/memory/relex.rs:441-464
Timestamp: 2026-04-07T15:49:51.275Z
Learning: When using the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, don’t require individual execution-provider feature flags (e.g., `directml`, `coreml`, `cuda`) alongside `load-dynamic` to get EP registration code. The `ort` crate already compiles EP registration via `#[cfg(any(feature = "load-dynamic", feature = "<ep_name>"))]` guards, and adding per-EP feature flags can pull in static-linking dependencies that conflict with the dynamic loading approach. At runtime, EP availability is determined by what the dynamically loaded ONNX Runtime library (`onnxruntime.dll`/`.so`/`.dylib`) supports; ort docs indicate providers like `directml`/`xnnpack`/`coreml` are available in builds when the platform supports them.
Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 416
File: src/openhuman/memory/relex.rs:441-464
Timestamp: 2026-04-07T15:49:51.275Z
Learning: When integrating the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, do NOT also require/enable individual provider EP Cargo features like `directml`, `coreml`, or `cuda`. In `ort` v2.x, EP registration for providers (e.g., DirectML, CoreML, CUDA, etc.) is already compiled in under source-level `#[cfg(any(feature = "load-dynamic", feature = "<provider>"))]` guards, such as in `ep/directml.rs`. Adding provider feature flags alongside `load-dynamic` can pull in static-linking dependencies that conflict with the dynamic-loading approach. Provider availability should be treated as runtime-determined by what the loaded `onnxruntime` library (`onnxruntime.dll`/`libonnxruntime.so`/`libonnxruntime.dylib`) actually supports.
Learnt from: oxoxDev
Repo: tinyhumansai/openhuman PR: 571
File: src/openhuman/local_ai/service/whisper_engine.rs:69-80
Timestamp: 2026-04-14T19:59:04.826Z
Learning: When reviewing Rust code in this repo that uses the upstream `whisper-rs` crate (v0.16.0), do not report `WhisperContextParameters::use_gpu(...)` or `WhisperContextParameters::flash_attn(...)` as missing/invalid APIs. These builder-style methods exist upstream and return `&mut Self`; they are not limited to `WhisperVadContextParams`.
Learnt from: graycyrus
Repo: tinyhumansai/openhuman PR: 1078
File: src/openhuman/agent/agents/welcome/prompt.rs:24-24
Timestamp: 2026-05-01T13:41:00.958Z
Learning: For Rust code under `src/openhuman/**/*.rs`, use `snake_case` for local variables (not `camelCase`). If a local variable name is written in `camelCase`, treat it as a style/lint issue because it will trigger Rust’s `non_snake_case` warning (and related clippy linting, if enabled). Avoid suggesting `camelCase` for any Rust local variable names in this repository.
Learnt from: senamakel
Repo: tinyhumansai/openhuman PR: 1173
File: tests/agent_memory_loader_public.rs:88-88
Timestamp: 2026-05-04T06:50:47.877Z
Learning: In this repository, the general camelCase naming guideline should not be applied to Rust source files. For all .rs files, Rust function (and related) names should use snake_case, and snake_case Rust function names should not be flagged—even for async test functions annotated with attributes like #[tokio::test]. This is consistent with Rust’s non_snake_case lint behavior.
Refs tinyhumansai#1710 (partial) Adds a string-matched provider factory mirroring embeddings/factory.rs so the user can route speech-to-text and text-to-speech through either the hosted backend proxy or a local Whisper/Piper subprocess. Config strings drive dispatch — there is no separate offline gate; selecting `stt_provider = "whisper"` + `tts_provider = "piper"` + a local LLM provider IS offline mode. Modules - voice/factory.rs: create_stt_provider / create_tts_provider, returns Box<dyn SttProvider | TtsProvider>, errors on unknown provider name. - voice/local_transcribe.rs: invokes whisper-cli via tokio process. --no-timestamps so segment-time prefixes don't leak into the transcript. CREATE_NO_WINDOW on Windows to suppress the subprocess console flash. - voice/local_speech.rs: invokes piper.exe; --length-scale 1.15 for natural pacing; same Windows console suppression. - voice/schemas.rs: voice_stt_dispatch + voice_tts_dispatch RPCs; voice_reply_synthesize handler now routes through create_tts_provider so the dropdown selection actually applies to spoken replies (was decorative — backend always called the cloud path). - voice/types.rs + ops.rs: surface stt_provider/tts_provider in the voice_status snapshot. - config/schema/local_ai.rs: stt_provider, tts_provider fields with "cloud" defaults. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs tinyhumansai#1710 (partial) Adds workspace-scoped installers for the local STT (Whisper) and TTS (Piper) binaries + models, with a shared status table the UI polls for progress. Modules - local_ai/voice_install_common.rs: shared download_to_file primitive (streaming to .part + atomic rename, 30min request budget, 45s per-chunk idle timeout, SHA256 hook). Mutex<HashMap> status table keyed by engine id. - local_ai/install_whisper.rs: fetch ggml-<size>.bin from HuggingFace + Windows whisper-cli release zip; per-size installed-artifacts check so switching model size triggers a fresh download. - local_ai/install_piper.rs: fetch piper binary tarball + en_US-lessac-medium voice .onnx + .onnx.json sidecar; per-voice installed-artifacts check. - local_ai/paths.rs: workspace_whisper_dir / workspace_piper_dir under shared_root_dir(), which honors OPENHUMAN_WORKSPACE when explicitly set (parallel test sessions / multi-workspace deployments) and falls back to ~/.openhuman/ otherwise. Tolerant filename normalization so stale legacy ids (ggml-base-q5_1.bin) don't double-prefix into ggml-ggml-base-q5_1.bin.bin. TTS voice resolver probes the installer drop-zone BEFORE legacy paths to skip stale 4-byte stubs that crashed Piper with STATUS_STACK_BUFFER_OVERRUN. Release/ added to whisper binary candidate list (current whisper.cpp Windows zip layout). - local_ai/schemas.rs: install_whisper / install_piper RPCs are fire-and-forget — handlers write Installing(0%) to the status table, spawn the download as a tokio task, return immediately. UI polls status via local_ai_*_install_status. Idempotency guard prevents duplicate concurrent installs from a double-click. - Cargo.toml: declare flate2 directly for the Piper tar.gz extractor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs tinyhumansai#1710 (partial) Surfaces the new voice provider factory + installers through the Settings → AI & Models → Voice card. UI - VoicePanel.tsx: new "Voice Providers" section above the legacy dictation server settings. STT dropdown (cloud | whisper) + model selector (5 Whisper sizes, default medium). TTS dropdown (cloud | piper) + 8 curated Piper voice presets with "Other (advanced)" free- text fallback. Install locally / Reinstall locally / Retry locally buttons rendered from polled status. Auto-install on model/voice selection so changing the dropdown triggers the download if missing. Page title Voice Dictation → Voice. Removed the dead "Open Local AI Model" CTA that pointed at the wrong panel. - VoicePanel.test.tsx: tests for the new provider rows + install state. - voiceInstallApi.ts: RPC wrappers for the four install_* / install_*_status endpoints. Co-located vitest tests. - tauriCommands/voice.ts: voice_set_providers + VoiceStatus extensions. - Settings.tsx: Voice (STT & TTS) card under aiModelsSettingsItems so the panel is discoverable from the main settings nav (the panel had no nav entry before — only reachable via Conversations deep-link). - useSettingsNavigation.ts: breadcrumb routes voice under AI & Models. Capability catalog (about_app/catalog.rs) - local_ai.speech_to_text + .text_to_speech entries describe the factory-dispatched providers, the available presets, and how to switch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs tinyhumansai#1710 (partial) Wires the existing mascot mic composer and conversation spoken-reply flow into the new voice provider factory so the user's settings actually apply on the user-visible path. Mascot mic - MicComposer.tsx (renamed from MicCloudComposer.tsx): drops the cloud-only assumption; calls openhuman.voice_stt_dispatch via transcribeWithFactory so config.local_ai.stt_provider drives provider choice (cloud Whisper proxy or local whisper-cli subprocess). - sttClient.ts: adds transcribeWithFactory wrapping the dispatch RPC. Existing transcribeCloud kept for callers not yet migrated. - sttClient.test.ts: new tests for the factory path (routing, override, empty-blob rejection, stale-sidecar rewriting, error passthrough). Setup error CTAs - Conversations.tsx (sendError handler): "Set up" button on stt_not_ready / voice_transcription / tts_not_ready / voice_synthesis codes now routes to /settings/voice (the install lives there) instead of /settings/local-model (legacy Ollama-asset panel). - VoiceSetupModal.tsx: same nav fix — "Open Local AI Model" → Voice settings, since the STT model install lives on the Voice panel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- chatSendError.ts: add `voice_synthesis` + `tts_not_ready` to
ChatSendErrorCode union. The Conversations.tsx setup-error CTA now
routes those codes to /settings/voice, but the comparison was a
no-overlap warning until the union learned the new codes.
- VoicePanel.tsx: rename unused `engine` arg in installButtonLabel to
`_engine` (TS6133). It was used in the prior longer labels ("Install
Whisper"); shortened labels ("Install locally") no longer reference it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
53878ea to
f2aa8be
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
src/openhuman/voice/schemas.rs (2)
604-610:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftSame provider-agnostic voice default issue.
This handler has the same issue as
handle_voice_reply_synthesize: whenprovider_nameis"cloud"and novoiceparam is provided, the code defaults toDEFAULT_PIPER_VOICEand passes it to the cloud provider. Apply the same provider-aware fix here.🔧 Proposed fix
let provider_name = p .provider .as_deref() .map(str::trim) .filter(|s| !s.is_empty()) .map(str::to_string) .unwrap_or_else(|| effective_tts_provider(&config)); -let voice = p - .voice - .as_deref() - .map(str::trim) - .filter(|s| !s.is_empty()) - .map(str::to_string) - .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string()); +let voice = match provider_name.as_str() { + "piper" => Some( + p.voice + .as_deref() + .map(str::trim) + .filter(|s| !s.is_empty()) + .map(str::to_string) + .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string()) + ), + _ => p.voice + .as_deref() + .map(str::trim) + .filter(|s| !s.is_empty()) + .map(str::to_string()), +}; log::debug!( - "[voice-factory] RPC voice_tts_dispatch provider={provider_name} voice={voice}" + "[voice-factory] RPC voice_tts_dispatch provider={provider_name} voice={:?}", voice ); let provider = - crate::openhuman::voice::create_tts_provider(&provider_name, &voice, &config) + crate::openhuman::voice::create_tts_provider(&provider_name, voice.as_deref().unwrap_or(""), &config) .map_err(|e| e.to_string())?; -let outcome = provider.synthesize(&config, &p.text, Some(&voice)).await?; +let outcome = provider.synthesize(&config, &p.text, voice.as_deref()).await?; to_json(outcome)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/schemas.rs` around lines 604 - 610, The current voice fallback always uses crate::openhuman::voice::DEFAULT_PIPER_VOICE for p.voice (producing let voice = ...), which sends a Piper default to the cloud provider; change the fallback to be provider-aware: if p.provider_name (check p.provider_name.as_deref()) equals "cloud" use the cloud default constant (e.g. crate::openhuman::voice::DEFAULT_CLOUD_VOICE) otherwise use DEFAULT_PIPER_VOICE, and set the resulting String into the same voice variable so this handler (like handle_voice_reply_synthesize) supplies the correct provider-specific default.
513-519:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftProvider-agnostic voice default forces Piper voice ID onto cloud TTS.
When
provider_nameresolves to"cloud"and novoice_idis provided in params, the code defaults toDEFAULT_PIPER_VOICEand passes it to the cloud provider viaSome(&voice)on line 526. Cloud providers expect different voice IDs (e.g., ElevenLabs voice IDs), so passing a Piper-specific ID can cause errors or incorrect behavior.Make the voice fallback provider-aware: keep
voiceasOption<String>, only default toDEFAULT_PIPER_VOICEwhenprovider_name == "piper", and leave itNonefor cloud unless explicitly provided.🔧 Proposed fix
let provider_name = effective_tts_provider(&config); -let voice = p - .voice_id - .as_deref() - .map(str::trim) - .filter(|s| !s.is_empty()) - .map(str::to_string) - .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string()); +let voice = match provider_name.as_str() { + "piper" => Some( + p.voice_id + .as_deref() + .map(str::trim) + .filter(|s| !s.is_empty()) + .map(str::to_string) + .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string()) + ), + _ => p.voice_id + .as_deref() + .map(str::trim) + .filter(|s| !s.is_empty()) + .map(str::to_string()), +}; log::debug!( - "[voice-factory] voice_reply_synthesize dispatch provider={provider_name} voice={voice}" + "[voice-factory] voice_reply_synthesize dispatch provider={provider_name} voice={:?}", voice ); let provider = - crate::openhuman::voice::create_tts_provider(&provider_name, &voice, &config) + crate::openhuman::voice::create_tts_provider(&provider_name, voice.as_deref().unwrap_or(""), &config) .map_err(|e| e.to_string())?; -to_json(provider.synthesize(&config, &p.text, Some(&voice)).await?) +to_json(provider.synthesize(&config, &p.text, voice.as_deref()).await?)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/voice/schemas.rs` around lines 513 - 519, The current code unconditionally falls back to DEFAULT_PIPER_VOICE when p.voice_id is empty, which forces a Piper voice onto cloud TTS; change the voice binding to be an Option<String> by mapping p.voice_id -> Some(trimmed_string) if present, otherwise set to Some(DEFAULT_PIPER_VOICE.to_string()) only when provider_name == "piper", and set to None for other providers (e.g., "cloud"); update any downstream uses that pass Some(&voice) to instead handle Option<String>/Option<&String> so cloud providers receive None unless an explicit voice_id was supplied; reference the local variable voice, p.voice_id, DEFAULT_PIPER_VOICE and provider_name in your changes.
🧹 Nitpick comments (1)
app/src/services/api/voiceInstallApi.ts (1)
29-44: ⚡ Quick winKeep frontend status fields camelCase and map snake_case at the wire boundary.
Line 37, Line 39, and Line 43 introduce snake_case field names into app-facing TypeScript types. Prefer a wire DTO (snake_case) + mapped frontend interface (camelCase) so UI/state code stays consistent.
♻️ Proposed refactor
+export interface VoiceInstallStatusWire { + engine: 'whisper' | 'piper'; + state: VoiceInstallState; + progress: number | null; + downloaded_bytes: number | null; + total_bytes: number | null; + stage: string | null; + error_detail: string | null; +} + export interface VoiceInstallStatus { - /** `"whisper"` or `"piper"`. */ - engine: string; + /** `"whisper"` or `"piper"`. */ + engine: 'whisper' | 'piper'; /** Current state — drives the button label / spinner. */ state: VoiceInstallState; /** 0–100 percent, populated while `state === 'installing'`. */ progress: number | null; /** Bytes received so far across the current download stage. */ - downloaded_bytes: number | null; + downloadedBytes: number | null; /** Total bytes expected (may be null for chunked transfer encoding). */ - total_bytes: number | null; + totalBytes: number | null; /** Free-text status line — e.g. "downloading model (ggml-tiny.bin)". */ stage: string | null; /** Populated when `state === 'error'`. */ - error_detail: string | null; + errorDetail: string | null; } + +function fromWireStatus(wire: VoiceInstallStatusWire): VoiceInstallStatus { + return { + engine: wire.engine, + state: wire.state, + progress: wire.progress, + downloadedBytes: wire.downloaded_bytes, + totalBytes: wire.total_bytes, + stage: wire.stage, + errorDetail: wire.error_detail, + }; +}As per coding guidelines,
Use camelCase for variable and function names in TypeScript/JavaScript code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/services/api/voiceInstallApi.ts` around lines 29 - 44, The exported VoiceInstallStatus currently exposes snake_case fields (downloaded_bytes, total_bytes, error_detail) to app code; introduce a separate wire DTO (e.g., VoiceInstallStatusDto) with the snake_case properties returned by the API and keep the app-facing VoiceInstallStatus using camelCase (downloadedBytes, totalBytes, errorDetail), then update the parsing/mapping logic in the voiceInstallApi functions to convert from VoiceInstallStatusDto -> VoiceInstallStatus at the network boundary so UI/state code consumes only camelCase fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/openhuman/voice/schemas.rs`:
- Around line 604-610: The current voice fallback always uses
crate::openhuman::voice::DEFAULT_PIPER_VOICE for p.voice (producing let voice =
...), which sends a Piper default to the cloud provider; change the fallback to
be provider-aware: if p.provider_name (check p.provider_name.as_deref()) equals
"cloud" use the cloud default constant (e.g.
crate::openhuman::voice::DEFAULT_CLOUD_VOICE) otherwise use DEFAULT_PIPER_VOICE,
and set the resulting String into the same voice variable so this handler (like
handle_voice_reply_synthesize) supplies the correct provider-specific default.
- Around line 513-519: The current code unconditionally falls back to
DEFAULT_PIPER_VOICE when p.voice_id is empty, which forces a Piper voice onto
cloud TTS; change the voice binding to be an Option<String> by mapping
p.voice_id -> Some(trimmed_string) if present, otherwise set to
Some(DEFAULT_PIPER_VOICE.to_string()) only when provider_name == "piper", and
set to None for other providers (e.g., "cloud"); update any downstream uses that
pass Some(&voice) to instead handle Option<String>/Option<&String> so cloud
providers receive None unless an explicit voice_id was supplied; reference the
local variable voice, p.voice_id, DEFAULT_PIPER_VOICE and provider_name in your
changes.
---
Nitpick comments:
In `@app/src/services/api/voiceInstallApi.ts`:
- Around line 29-44: The exported VoiceInstallStatus currently exposes
snake_case fields (downloaded_bytes, total_bytes, error_detail) to app code;
introduce a separate wire DTO (e.g., VoiceInstallStatusDto) with the snake_case
properties returned by the API and keep the app-facing VoiceInstallStatus using
camelCase (downloadedBytes, totalBytes, errorDetail), then update the
parsing/mapping logic in the voiceInstallApi functions to convert from
VoiceInstallStatusDto -> VoiceInstallStatus at the network boundary so UI/state
code consumes only camelCase fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ba0edccb-b5e6-4e35-84e9-63228884860b
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.lockapp/src-tauri/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (31)
Cargo.tomlapp/src/chat/chatSendError.tsapp/src/components/settings/hooks/useSettingsNavigation.tsapp/src/components/settings/panels/VoicePanel.tsxapp/src/components/settings/panels/__tests__/VoicePanel.test.tsxapp/src/components/skills/VoiceSetupModal.tsxapp/src/features/human/MicComposer.test.tsxapp/src/features/human/MicComposer.tsxapp/src/features/human/voice/sttClient.test.tsapp/src/features/human/voice/sttClient.tsapp/src/pages/Conversations.tsxapp/src/pages/Settings.tsxapp/src/services/api/__tests__/voiceInstallApi.test.tsapp/src/services/api/voiceInstallApi.tsapp/src/utils/tauriCommands/voice.tssrc/openhuman/about_app/catalog.rssrc/openhuman/config/schema/local_ai.rssrc/openhuman/local_ai/install_piper.rssrc/openhuman/local_ai/install_whisper.rssrc/openhuman/local_ai/mod.rssrc/openhuman/local_ai/paths.rssrc/openhuman/local_ai/schemas.rssrc/openhuman/local_ai/voice_install_common.rssrc/openhuman/voice/factory.rssrc/openhuman/voice/local_speech.rssrc/openhuman/voice/local_transcribe.rssrc/openhuman/voice/mod.rssrc/openhuman/voice/ops.rssrc/openhuman/voice/schemas.rssrc/openhuman/voice/schemas_tests.rssrc/openhuman/voice/types.rs
✅ Files skipped from review due to trivial changes (4)
- app/src/chat/chatSendError.ts
- app/src/components/skills/VoiceSetupModal.tsx
- app/src/pages/Settings.tsx
- src/openhuman/local_ai/mod.rs
🚧 Files skipped from review as they are similar to previous changes (23)
- app/src/utils/tauriCommands/voice.ts
- src/openhuman/voice/schemas_tests.rs
- app/src/components/settings/hooks/useSettingsNavigation.ts
- Cargo.toml
- app/src/components/settings/panels/tests/VoicePanel.test.tsx
- app/src/features/human/voice/sttClient.test.ts
- app/src/services/api/tests/voiceInstallApi.test.ts
- src/openhuman/voice/mod.rs
- app/src/pages/Conversations.tsx
- src/openhuman/config/schema/local_ai.rs
- src/openhuman/voice/types.rs
- app/src/features/human/voice/sttClient.ts
- app/src/features/human/MicComposer.tsx
- src/openhuman/local_ai/install_piper.rs
- src/openhuman/voice/local_transcribe.rs
- src/openhuman/about_app/catalog.rs
- src/openhuman/local_ai/schemas.rs
- app/src/components/settings/panels/VoicePanel.tsx
- src/openhuman/local_ai/paths.rs
- src/openhuman/local_ai/install_whisper.rs
- src/openhuman/voice/local_speech.rs
- src/openhuman/voice/factory.rs
- src/openhuman/local_ai/voice_install_common.rs
- VoicePanel.test.tsx: assert tts-voice-select (not tts-voice-input) when seeded Piper provider has a known preset voice id (tinyhumansai#1) - VoicePanel.tsx: fix stale-closure reseeding by using functional setState updater (prev || seeded) so interval polls can't clobber user edits (#2) - local_transcribe.rs: resolve model path via resolve_stt_model_path_by_id so the per-request model_id override is actually used (tinyhumansai#3); wrap cmd.output() in a 120s tokio::time::timeout to prevent hung STT (tinyhumansai#4) - voice_install_common.rs: delete .part file on chunk-read and write errors so mid-stream failures don't leave stale artifacts (tinyhumansai#5) - schemas.rs: make voice default provider-aware; only fall back to DEFAULT_PIPER_VOICE when provider == "piper", leave cloud voice unset unless explicitly provided (tinyhumansai#6) - paths.rs: add resolve_stt_model_path_by_id helper used by tinyhumansai#3 - Settings.tsx, VoicePanel.tsx: fix Prettier formatting (CI gate) Heavy lifts deferred (replied on threads): atomic install-start race (tinyhumansai#7), TTS trait cloud overrides (tinyhumansai#8). Intentional UX deferral: generic install button labels (tinyhumansai#9). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VoicePanel.test.tsx: update button label assertions to generic labels
("Install locally" / "Reinstall locally" / "Retry locally") matching the
intentional UX decision; remove stale "Voice Dictation" heading lookup
that was never rendered by the component
- paths.rs: update empty-size fallback assertion to ggml-medium.bin
(workspace_whisper_model_path changed from large-v3-turbo to medium
in the prior commit but the test expectation was not updated)
- install_whisper.rs: pin config.local_ai.stt_model_id to
DEFAULT_WHISPER_MODEL_SIZE in status_promotes_to_installed_when_model_present
so the path created and the path probed by effective_stt_model_id agree
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add 6 new VoicePanel tests: - installWhisper/installPiper rejection paths (covers catch blocks 350-351, 373-374) - persistProviders rejection path (covers catch block 294-295) - Piper installing state with percentage (covers the piper state span conditional at 534) - Piper voice preset select change triggers persistProviders (covers lines 554-562) Freeze subsequent loadData calls with a hanging promise in the rejection tests so the error state isn't immediately cleared by the finally-block reload before the assertion can observe it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t installer drop-zone The installer drop-zone (`bin/piper/voices/<id>.onnx`) is probed FIRST by `resolve_tts_voice_path` and lives under the shared root, not the temp config. When a sibling `install_piper` test runs in parallel with the default voice id and leaves a stub there, this test sees that file instead of the legacy `models/local-ai/tts/` candidate and fails the assertion. Fix: take the `shared_install_lock()` for the duration of the test and wipe the installer path before writing the legacy stub so the resolver deterministically falls through to the legacy candidate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
embeddings/factory.rsso STT (cloud Whisper proxy vs local whisper-cli) and TTS (cloud vs local Piper) can be swapped per user setting — no separate offline gate, the provider config IS the offline switch.ggml-<size>.bin+ Windowswhisper-bin-x64.zip) and Piper (binary tarball + en_US-lessac-medium voice). Fire-and-forget RPCs + polled status table render Install / Reinstall / Retry buttons inline with download progress.MicCloudComposer→MicComposer) and conversation spoken-reply path through the factory so user settings actually drive the dispatch — previously the dropdowns were decorative because the legacyvoice_cloud_transcribe/voice_reply_synthesizeRPCs ignored the config.OPENHUMAN_WORKSPACEwhen explicitly set (parallel dev sessions, multi-workspace deployments) and fall back to the shared~/.openhuman/root for normal single-user installs.Problem
Issue #1710 asks for a fully-local OpenHuman path across STT, voice/TTS, inference, and Composer. The voice surface specifically needs:
WHISPER_BIN/PIPER_BINand download multi-GB models out-of-band.Solution
Mirrors the existing
embeddings/factory.rsshape (string-matched dispatch,Box<dyn Trait>, errors on unknown provider) for both STT and TTS. The factory is the single integration point — the dispatch RPCs (voice_stt_dispatch,voice_tts_dispatch) and the migratedvoice_reply_synthesizehandler all route through it. Selecting a local provider in settings persists the choice; calls land on the local subprocess. Selecting cloud routes back through the hosted proxy.Installer infrastructure lives in
local_ai/voice_install_common.rs(shareddownload_to_filewith 30-min request budget + 45s per-chunk idle timeout to fail fast on stalled streams),install_whisper.rs, andinstall_piper.rs. Install RPCs are fire-and-forget — a synchronous handler would have hit the frontend's 30sVITE_CORE_RPC_TIMEOUT_MSon legitimate slow downloads and triggered a retry loop that deletes the.partand starts over. Per-engine status table + 2s UI polling renders real progress.Windows-specific touches:
CREATE_NO_WINDOWflag on the whisper-cli / piper subprocess spawns suppresses the console window flash.--no-timestampson whisper-cli keeps segment-time prefixes out of the transcript text.--length-scale 1.15on piper for more natural pacing than its default 1.0.Submission Checklist
diff-cover) meet the gate enforced by.github/workflows/coverage.yml. Runpnpm test:coverageandpnpm test:rustlocally; PRs below 80% on changed lines will not merge.Closes #NNNin the## RelatedsectionImpact
Desktop only. Tauri shell is the runtime target; no mobile/web/CLI surface touched. Whisper-cli and Piper are spawned as one-shot subprocesses per request (not daemons), so no new process lifecycle to manage beyond what's already in
local_ai/service/ollama_admin.rsfor Ollama.Disk: a fresh local install pulls ~769 MB (Whisper medium) + ~2.5 MB (Windows whisper-cli zip) + ~60 MB (Piper voice) + ~5 MB (Piper binary). Workspace-scoped when
OPENHUMAN_WORKSPACEis set, shared otherwise.Network: install-time only, gated by an explicit user click. Downloads come from
huggingface.co/ggerganov/whisper.cpp,huggingface.co/rhasspy/piper-voices, and GitHub releases — no new always-on backend dependency. Runtime transcription/synthesis is fully local once installed.Migration: existing config values (
stt_provider,tts_provider) default tocloudso users on the current path see no behavior change until they explicitly switch in settings.Related
brew install whisper-cppetc. (binary resolver already falls through to PATH; UI doesn't reflect it)AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
Validation Blocked
Behavior Changes
whisperorpiperin Settings → Voice triggers a local subprocess for the corresponding flow.Parity Contract
stt_provider = "cloud"andtts_provider = "cloud"(the defaults) route through the existingvoice_cloud_transcribe/reply_speech::synthesize_replypaths unchanged. No regression on the hosted path."cloud"branch invokes the sameCloudSttProvider/CloudTtsProvidershims that delegate to the original modules. Unknown provider names error explicitly — never silently fall back.Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
UI Changes
Tests