Skip to content

feat(voice): fully-local STT + TTS via Whisper/Piper provider factory#1755

Merged
senamakel merged 9 commits into
tinyhumansai:mainfrom
sanil-23:feat/1710-voice-factory
May 14, 2026
Merged

feat(voice): fully-local STT + TTS via Whisper/Piper provider factory#1755
senamakel merged 9 commits into
tinyhumansai:mainfrom
sanil-23:feat/1710-voice-factory

Conversation

@sanil-23
Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 commented May 14, 2026

Summary

  • Adds a string-matched voice provider factory mirroring embeddings/factory.rs so STT (cloud Whisper proxy vs local whisper-cli) and TTS (cloud vs local Piper) can be swapped per user setting — no separate offline gate, the provider config IS the offline switch.
  • Ships a workspace-scoped auto-installer for Whisper (ggml-<size>.bin + Windows whisper-bin-x64.zip) and Piper (binary tarball + en_US-lessac-medium voice). Fire-and-forget RPCs + polled status table render Install / Reinstall / Retry buttons inline with download progress.
  • New "Voice Providers" section in Settings → AI & Models → Voice with STT/TTS dropdowns, 5 Whisper model presets, 8 curated Piper voice presets, and auto-install on selection so picking an asset you haven't downloaded triggers the fetch automatically.
  • Wires the existing mascot mic composer (MicCloudComposerMicComposer) and conversation spoken-reply path through the factory so user settings actually drive the dispatch — previously the dropdowns were decorative because the legacy voice_cloud_transcribe / voice_reply_synthesize RPCs ignored the config.
  • All artifacts honor OPENHUMAN_WORKSPACE when explicitly set (parallel dev sessions, multi-workspace deployments) and fall back to the shared ~/.openhuman/ root for normal single-user installs.

Problem

Issue #1710 asks for a fully-local OpenHuman path across STT, voice/TTS, inference, and Composer. The voice surface specifically needs:

  • Local Whisper for transcription instead of the hosted Whisper proxy.
  • Local Piper (or equivalent) for spoken replies instead of the ElevenLabs proxy.
  • Offline mode enforcement — no silent cloud fallback when the user is on a local provider.
  • A real installer flow so users don't need to manually set WHISPER_BIN / PIPER_BIN and download multi-GB models out-of-band.

Solution

Mirrors the existing embeddings/factory.rs shape (string-matched dispatch, Box<dyn Trait>, errors on unknown provider) for both STT and TTS. The factory is the single integration point — the dispatch RPCs (voice_stt_dispatch, voice_tts_dispatch) and the migrated voice_reply_synthesize handler all route through it. Selecting a local provider in settings persists the choice; calls land on the local subprocess. Selecting cloud routes back through the hosted proxy.

Installer infrastructure lives in local_ai/voice_install_common.rs (shared download_to_file with 30-min request budget + 45s per-chunk idle timeout to fail fast on stalled streams), install_whisper.rs, and install_piper.rs. Install RPCs are fire-and-forget — a synchronous handler would have hit the frontend's 30s VITE_CORE_RPC_TIMEOUT_MS on legitimate slow downloads and triggered a retry loop that deletes the .part and starts over. Per-engine status table + 2s UI polling renders real progress.

Windows-specific touches: CREATE_NO_WINDOW flag on the whisper-cli / piper subprocess spawns suppresses the console window flash. --no-timestamps on whisper-cli keeps segment-time prefixes out of the transcript text. --length-scale 1.15 on piper for more natural pacing than its default 1.0.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.
  • N/A: behaviour-only change does not introduce new feature rows in the coverage matrix (existing voice rows still apply)
  • N/A: feature is tracked under the umbrella issue Prioritize fully local speech and Composer operation #1710 referenced in Related below
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: release-cut smoke surface unchanged — voice was already a release-tested surface and the new dropdowns default to existing cloud behaviour
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

Desktop only. Tauri shell is the runtime target; no mobile/web/CLI surface touched. Whisper-cli and Piper are spawned as one-shot subprocesses per request (not daemons), so no new process lifecycle to manage beyond what's already in local_ai/service/ollama_admin.rs for Ollama.

Disk: a fresh local install pulls ~769 MB (Whisper medium) + ~2.5 MB (Windows whisper-cli zip) + ~60 MB (Piper voice) + ~5 MB (Piper binary). Workspace-scoped when OPENHUMAN_WORKSPACE is set, shared otherwise.

Network: install-time only, gated by an explicit user click. Downloads come from huggingface.co/ggerganov/whisper.cpp, huggingface.co/rhasspy/piper-voices, and GitHub releases — no new always-on backend dependency. Runtime transcription/synthesis is fully local once installed.

Migration: existing config values (stt_provider, tts_provider) default to cloud so users on the current path see no behavior change until they explicitly switch in settings.

Related

  • Closes Prioritize fully local speech and Composer operation #1710 (partial — voice slice; sibling PRs cover the chat factory and Composio direct mode)
  • Follow-up PR(s)/TODOs:
    • System-PATH detection for users who installed Whisper via brew install whisper-cpp etc. (binary resolver already falls through to PATH; UI doesn't reflect it)
    • Kokoro TTS as a sibling to Piper
    • Length-scale / sentence-silence settings sliders for Piper

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: N/A: GitHub issue tracker, not Linear
  • URL: N/A: GitHub issue tracker, not Linear

Commit & Branch

  • Branch: feat/1710-voice-factory
  • Commit SHA: 53878ea

Validation Run

  • N/A: pnpm format:check skipped — known CRLF/LF drift on ~600 unrelated files on Windows (per established team practice with --no-verify push)
  • N/A: pnpm typecheck command not present in this workspace's package.json scripts
  • Focused tests: cargo test passes on voice/install_whisper/install_piper modules; Vitest passes on VoicePanel + voiceInstallApi + MicComposer + sttClient (per agent reports during implementation)
  • Rust fmt/check (if changed): cargo fmt applied; cargo check clean on changed surface
  • N/A: Tauri shell Cargo.toml not modified (Cargo.lock changed transitively via flate2 only)

Validation Blocked

  • command: N/A
  • error: N/A
  • impact: N/A

Behavior Changes

  • Intended behavior change: STT/TTS dispatch now routes through a config-driven factory. Selecting whisper or piper in Settings → Voice triggers a local subprocess for the corresponding flow.
  • User-visible effect: New "Voice Providers" section in Settings → AI & Models → Voice. Install buttons and live progress. Mascot mic and conversation TTS use the local providers when selected. Default config preserves cloud behaviour for existing users.

Parity Contract

  • Legacy behavior preserved: stt_provider = "cloud" and tts_provider = "cloud" (the defaults) route through the existing voice_cloud_transcribe / reply_speech::synthesize_reply paths unchanged. No regression on the hosted path.
  • Guard/fallback/dispatch parity checks: factory's "cloud" branch invokes the same CloudSttProvider / CloudTtsProvider shims that delegate to the original modules. Unknown provider names error explicitly — never silently fall back.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: This one
  • Resolution: N/A

Summary by CodeRabbit

  • New Features

    • Voice settings consolidated under "AI & Models"; select and persist STT/TTS providers (cloud, Whisper, Piper).
    • One-click install/reinstall for Whisper (STT) and Piper (TTS) with progress, status, retry and error details.
    • App-wide STT/TTS dispatch routes voice requests to the chosen provider.
  • UI Changes

    • Voice panel: provider controls, install buttons, simplified dictation notice, updated breadcrumbs and setup navigation for voice errors.
  • Tests

    • Expanded coverage for provider selection, install flows, transcription and synthesis.

Review Change Stack

@sanil-23 sanil-23 requested a review from a team May 14, 2026 18:52
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds local Whisper/Piper installers and workspace-aware resolution, a STT/TTS provider factory and RPC dispatch, renderer APIs/UI for provider selection and installs, MicComposer refactor to use factory-based STT, and supporting tests and types across Rust and TypeScript.

Changes

Local voice installers, shared plumbing & path resolution

Layer / File(s) Summary
Shared installer core & download helper
src/openhuman/local_ai/voice_install_common.rs, Cargo.toml
Adds process-wide status table, VoiceInstallState/VoiceInstallStatus, download_to_file() streaming-to-.part with timeouts/progress/validation, .part helpers and unit tests; adds direct flate2 dependency for gzip inflation.
Whisper installer & local transcription
src/openhuman/local_ai/install_whisper.rs, src/openhuman/voice/local_transcribe.rs
Adds Whisper installer (GGML naming, download pipeline, Windows zip handling, status promotion), transcribe_whisper subprocess flow, workspace binary discovery, validation thresholds, and tests.
Piper installer & local synthesis
src/openhuman/local_ai/install_piper.rs, src/openhuman/voice/local_speech.rs
Adds Piper installer (voice .onnx + .onnx.json, OS/arch binary selection, tar.gz & zip extraction using gzip/flate2), Piper subprocess-based synth, viseme timeline generator, and tests.
Workspace-first path resolution & helpers
src/openhuman/local_ai/paths.rs
Adds workspace-aware resolvers, resolve_*_with_config wrappers, expanded STT/TTS model/voice/binary candidate probing, GGML normalization, and tests ensuring workspace installs are preferred.

Factory, RPC schema, and voice status integration

Layer / File(s) Summary
STT/TTS provider factory & re-exports
src/openhuman/voice/factory.rs, src/openhuman/voice/mod.rs
Defines SttProvider/TtsProvider traits, SttResult, cloud and local provider implementations (cloud/whisper, cloud/piper), factory entrypoints, defaults/presets and unit tests; re-exports factory surface.
Voice RPC dispatch & provider persistence
src/openhuman/voice/schemas.rs, src/openhuman/local_ai/schemas.rs
Adds voice_stt_dispatch, voice_tts_dispatch, voice_set_providers RPCs and handlers; registers local-ai install RPCs and status endpoints; implements fire-and-forget install handlers and status query handlers; validation helpers included.
Voice status and config fields
src/openhuman/voice/types.rs, src/openhuman/config/schema/local_ai.rs, src/openhuman/voice/ops.rs, src/openhuman/about_app/catalog.rs
Adds stt_provider/tts_provider fields to config/status with serde defaults, derives provider selectors in voice_status, and updates capability catalog entries and defaults.

Renderer-side APIs, UI, and composer changes

Layer / File(s) Summary
Renderer install API wrappers & tests
app/src/services/api/voiceInstallApi.ts, app/src/services/api/__tests__/voiceInstallApi.test.ts
Adds VoiceInstallState/VoiceInstallStatus types and RPC wrappers: installWhisper, installPiper, whisperInstallStatus, piperInstallStatus plus tests mocking callCoreRpc.
Renderer STT factory client & tests
app/src/features/human/voice/sttClient.ts, app/src/features/human/voice/sttClient.test.ts
Adds transcribeWithFactory that calls openhuman.voice_stt_dispatch with provider/model/mime/file_name overrides, trims transcripts and rewrites stale-sidecar errors; tests added.
Voice provider persistence RPC (tauri) & types
app/src/utils/tauriCommands/voice.ts
Extends VoiceStatus with stt_provider/tts_provider and adds VoiceProvidersUpdate/VoiceProvidersSnapshot types and openhumanVoiceSetProviders RPC wrapper.
Voice settings panel UI & tests
app/src/components/settings/panels/VoicePanel.tsx, app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
Implements “Voice Providers” UI: provider/model/voice selects, persistence via openhumanVoiceSetProviders, install/reinstall flows using voiceInstallApi, parallel status polling with per-engine error handling, seeding logic to avoid clobbering edits, and comprehensive UI tests.
MicComposer refactor & tests
app/src/features/human/MicComposer.tsx, app/src/features/human/MicComposer.test.tsx, app/src/pages/Conversations.tsx
Renames MicCloudComposerMicComposer, switches transcription to transcribeWithFactory for native and WAV-fallback paths, updates imports/exports and tests, and replaces composer usage in Conversations.
Settings navigation & small UX updates
app/src/components/settings/hooks/useSettingsNavigation.ts, app/src/pages/Settings.tsx, app/src/components/skills/VoiceSetupModal.tsx, app/src/chat/chatSendError.ts
Adds “Voice (STT & TTS)” settings entry, moves voice breadcrumbs into AI & Models grouping, changes modal navigation target to /settings/voice, and adds voice_synthesis/tts_not_ready chat error codes and expanded “Set up” affordance routing.

Sequence Diagram(s)

sequenceDiagram
  participant Renderer as Renderer
  participant Core as CoreRPC(openhuman.voice_stt_dispatch)
  participant Factory as Factory
  participant Provider as Provider
  participant Engine as LocalEngine/Cloud

  Renderer->>Core: call(audio_base64, opts)
  Core->>Factory: create_stt_provider(resolved_provider, model)
  Factory->>Provider: instantiate cloud/whisper provider
  Provider->>Engine: transcribe (whisper-cli or cloud)
  Engine-->>Provider: RpcOutcome<SttResult>
  Provider-->>Factory: return outcome
  Factory-->>Core: { text, provider }
  Core-->>Renderer: { text, provider }
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • graycyrus
  • senamakel

"I hopped through code with tiny feet,
Downloads stitched and installers neat.
Whisper listens, Piper sings,
Providers chosen, local springs.
Pick cloud or home — voices meet."

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/src/components/settings/panels/VoicePanel.tsx (1)

144-153: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t hard-require local STT assets when the selected provider is cloud.

Current readiness gates the panel on local STT asset state even when STT provider is cloud, which can disable voice features on the default cloud path.

Suggested fix
-      const sttAssetState = assetsResponse.result.stt?.state;
-      const sttAssetOk = sttAssetState === 'ready' || sttAssetState === 'ondemand';
+      const sttAssetState = assetsResponse.result.stt?.state;
+      const sttAssetOk = sttAssetState === 'ready' || sttAssetState === 'ondemand';
+      const sttProviderResolved = voiceResponse.stt_provider === 'whisper' ? 'whisper' : 'cloud';
+      const requiresLocalAsset = sttProviderResolved === 'whisper';
...
-      setSttReady(sttAssetOk && voiceResponse.stt_available);
+      setSttReady(voiceResponse.stt_available && (!requiresLocalAsset || sttAssetOk));

Also applies to: 263-264, 725-729

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/settings/panels/VoicePanel.tsx` around lines 144 - 153,
The readiness check is incorrectly gating STT availability on local asset state
even when the selected STT provider is cloud; update the logic in VoicePanel
(symbols: assetsResponse.result.stt, sttAssetState, sttAssetOk,
voiceResponse.stt_available, setSttReady) to skip the local-asset requirement
when the configured provider indicates cloud (check the provider flag/field used
elsewhere, e.g., voiceResponse.stt_provider or equivalent). Concretely, compute
readiness as voiceResponse.stt_available when provider === 'cloud', otherwise
require both sttAssetOk and voiceResponse.stt_available; apply the same change
at the other occurrences you noted (around lines referenced: 263-264 and
725-729).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/components/settings/panels/__tests__/VoicePanel.test.tsx`:
- Around line 267-281: The test currently expects the free-text TTS voice input
to be present for a seeded Piper provider, but when
runtime.voiceStatus.tts_voice_id is set (e.g., 'en_US-lessac-medium') the UI
should show the preset select instead; update the VoicePanel.test to assert that
the element with test id 'tts-voice-select' is in the document and that
'tts-voice-input' is not present (use screen.getByTestId for the select and
screen.queryByTestId to assert the input is null), keeping the existing checks
for stt-provider-select, tts-provider-select, and stt-model-select.

In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 311-329: installButtonLabel currently ignores the engine parameter
so button text is generic; update installButtonLabel to include the engine name
in each returned string (e.g., "Installing Whisper 12%", "Install Piper
locally", "Repair Whisper", "Retry Piper locally", etc.) and ensure the busy
branch also uses the engine (e.g., "Installing Whisper…"); apply the same
engine-specific wording change to the other similar label helpers mentioned (the
analogous label logic at the other two locations referenced in the comment) so
all install/repair/retry strings include the engine name consistently.
- Around line 129-143: The seeding logic for
sttProvider/ttsProvider/sttModel/ttsVoice can be re-applied on every poll due to
stale closure of loadData; add a per-field "seeded" ref (e.g., sttSeededRef,
ttsSeededRef, sttModelSeededRef, ttsVoiceSeededRef) and only call
setSttProvider/setTtsProvider/setSttModel/setTtsVoice when the corresponding
seeded ref is false and voiceResponse has a value, then set that ref to true so
subsequent interval runs won't clobber user edits; update the loadData usage
accordingly so it reads and sets these refs instead of relying on falsy checks
of sttProvider/ttsProvider/sttModel/ttsVoice.

In `@src/openhuman/local_ai/schemas.rs`:
- Around line 1039-1068: The current read_status -> write_status(Installing)
sequence is non-atomic and allows race conditions; change the flow to perform an
atomic "check-and-set" so only one caller flips the state to Installing and
proceeds: add or use an atomic helper in
crate::openhuman::local_ai::voice_install_common (e.g.,
try_set_installing(engine: &str) -> Result<bool, _> or compare_and_swap_status)
that reads the current VoiceInstallStatus and, if not already Installing, writes
Installing in one atomic operation (using file lock or atomic replace) and
returns success/failure; replace the read_status + write_status block around
ENGINE_WHISPER with a call to that helper and only spawn the install when it
returns success, and apply the same fix for ENGINE_PIPER handling.

In `@src/openhuman/local_ai/voice_install_common.rs`:
- Around line 254-260: The code returns early on chunk read errors and write
errors without deleting the temporary `part_path`, leaving partial files; update
the error paths around `chunk.map_err(...)` and the
`file.write_all(...).await.map_err(...)` to attempt cleanup by calling
`tokio::fs::remove_file(part_path).await` (or a fallible wrapper that
ignores/remove-file errors) before returning the mapped error; perform the
removal asynchronously and ensure it runs whether `hasher` or `file` exist, then
return the original formatted error (e.g., call
`tokio::fs::remove_file(part_path).await.ok();` before propagating the error
from `chunk` or `file.write_all`) so `part_path` is deleted on stream/write
failures.

In `@src/openhuman/voice/factory.rs`:
- Around line 95-103: The synthesize trait currently only accepts text and
voice, dropping cloud override fields (model_id, output_format, voice_settings);
update the TTS provider interface (the synthesize method in the trait) to accept
a richer options object (e.g., ReplySpeechOptions or a new struct carrying
model_id, output_format, voice_settings and any other fields used by
voice.reply_synthesize), then update the cloud implementation that constructs
ReplySpeechOptions (and the place noted around lines 248-253) to pass these
option fields through instead of hardcoding None, and finally propagate the new
options parameter through callers (e.g., where voice.reply_synthesize is
invoked) so overrides flow from RPC to provider unchanged.

In `@src/openhuman/voice/local_transcribe.rs`:
- Around line 124-131: The code computes model_id from opts but still calls
resolve_stt_model_path(config) so the actual model file may not match the
returned model_id; update the logic to resolve the model path from the effective
model_id instead: either add an argument to resolve_stt_model_path (e.g.,
resolve_stt_model_path(config, &model_id)) or create a new helper like
resolve_stt_model_path_for(model_id, config), then use that resolved path
wherever the model is loaded/used (including the similar call later in the
function around the other occurrence) so the transcriber uses the same effective
model_id that is returned in responses.
- Around line 195-218: Replace the unbounded cmd.output().await call with a
bounded call using tokio::time::timeout (e.g.
tokio::time::timeout(Duration::from_secs(X), cmd.output()).await), import
std::time::Duration and tokio::time::timeout, and unwrap the nested Result so
you return a clear timeout error (e.g. format!("{LOG_PREFIX} whisper-cli timed
out after {X}s")) when the timeout returns Err(tokio::time::error::Elapsed);
preserve the existing cleanup of file_path and keep the subsequent mapping of
the subprocess error (output_result -> output) but adapt it to handle the
timeout/elapsed branch explicitly.

In `@src/openhuman/voice/schemas.rs`:
- Around line 513-520: The current construction of voice eagerly defaults
p.voice_id to DEFAULT_PIPER_VOICE which forces a Piper ID even when provider ==
"cloud"; change this to preserve an Option for the incoming voice (i.e.,
map/trim/filter p.voice_id into Option<String> without unwrap_or_else), then
apply the DEFAULT_PIPER_VOICE only in the provider-specific branch for Piper
(where provider == "piper") before calling downstream TTS; update both places
where voice is computed (the block using p.voice_id and the similar block around
lines 604-610) so cloud paths leave voice unset unless explicitly provided.

---

Outside diff comments:
In `@app/src/components/settings/panels/VoicePanel.tsx`:
- Around line 144-153: The readiness check is incorrectly gating STT
availability on local asset state even when the selected STT provider is cloud;
update the logic in VoicePanel (symbols: assetsResponse.result.stt,
sttAssetState, sttAssetOk, voiceResponse.stt_available, setSttReady) to skip the
local-asset requirement when the configured provider indicates cloud (check the
provider flag/field used elsewhere, e.g., voiceResponse.stt_provider or
equivalent). Concretely, compute readiness as voiceResponse.stt_available when
provider === 'cloud', otherwise require both sttAssetOk and
voiceResponse.stt_available; apply the same change at the other occurrences you
noted (around lines referenced: 263-264 and 725-729).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f8b06946-b2b7-457b-9fc6-7c5d67c88662

📥 Commits

Reviewing files that changed from the base of the PR and between 38924e3 and 53878ea.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (30)
  • Cargo.toml
  • app/src/components/settings/hooks/useSettingsNavigation.ts
  • app/src/components/settings/panels/VoicePanel.tsx
  • app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
  • app/src/components/skills/VoiceSetupModal.tsx
  • app/src/features/human/MicComposer.test.tsx
  • app/src/features/human/MicComposer.tsx
  • app/src/features/human/voice/sttClient.test.ts
  • app/src/features/human/voice/sttClient.ts
  • app/src/pages/Conversations.tsx
  • app/src/pages/Settings.tsx
  • app/src/services/api/__tests__/voiceInstallApi.test.ts
  • app/src/services/api/voiceInstallApi.ts
  • app/src/utils/tauriCommands/voice.ts
  • src/openhuman/about_app/catalog.rs
  • src/openhuman/config/schema/local_ai.rs
  • src/openhuman/local_ai/install_piper.rs
  • src/openhuman/local_ai/install_whisper.rs
  • src/openhuman/local_ai/mod.rs
  • src/openhuman/local_ai/paths.rs
  • src/openhuman/local_ai/schemas.rs
  • src/openhuman/local_ai/voice_install_common.rs
  • src/openhuman/voice/factory.rs
  • src/openhuman/voice/local_speech.rs
  • src/openhuman/voice/local_transcribe.rs
  • src/openhuman/voice/mod.rs
  • src/openhuman/voice/ops.rs
  • src/openhuman/voice/schemas.rs
  • src/openhuman/voice/schemas_tests.rs
  • src/openhuman/voice/types.rs

Comment thread app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
Comment thread app/src/components/settings/panels/VoicePanel.tsx Outdated
Comment thread app/src/components/settings/panels/VoicePanel.tsx
Comment on lines +1039 to +1068
// Idempotency: a duplicate click while an install is already in
// flight should be a no-op, not a second concurrent download.
let current = crate::openhuman::local_ai::voice_install_common::read_status(
crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER,
);
if current.state
== crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing
{
tracing::debug!(
"[voice-install:whisper] already installing — returning current status"
);
return serde_json::to_value(current)
.map_err(|e| format!("serialize whisper status: {e}"));
}

// Mark "installing" before the spawn so the very next status poll
// (≤ 2s away) reflects the new state without a stale read.
crate::openhuman::local_ai::voice_install_common::write_status(
crate::openhuman::local_ai::voice_install_common::VoiceInstallStatus {
engine: crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER
.to_string(),
state:
crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing,
progress: Some(0),
downloaded_bytes: None,
total_bytes: None,
stage: Some("queued".to_string()),
error_detail: None,
},
);
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Make install start transition atomic to prevent duplicate concurrent downloads.

Both handlers perform a non-atomic read_statuswrite_status(Installing) sequence. Concurrent RPC calls can pass the check and spawn multiple installs for the same engine.

Suggested direction
+// voice_install_common.rs (new helper idea)
+pub fn try_begin_install(engine: &str) -> VoiceInstallStatus {
+    // lock shared table, if already Installing return current
+    // else write Installing{progress:0, stage:"queued"} and return it
+}
-        let current = read_status(ENGINE_WHISPER);
-        if current.state == VoiceInstallState::Installing { ... }
-        write_status(VoiceInstallStatus { ... Installing ... });
+        let status = crate::openhuman::local_ai::voice_install_common::try_begin_install(
+            crate::openhuman::local_ai::voice_install_common::ENGINE_WHISPER,
+        );
+        if status.state == crate::openhuman::local_ai::voice_install_common::VoiceInstallState::Installing
+            && status.stage.as_deref() != Some("queued")
+        {
+            return serde_json::to_value(status).map_err(|e| format!("serialize whisper status: {e}"));
+        }

Apply the same pattern for Piper.

Also applies to: 1099-1121

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/local_ai/schemas.rs` around lines 1039 - 1068, The current
read_status -> write_status(Installing) sequence is non-atomic and allows race
conditions; change the flow to perform an atomic "check-and-set" so only one
caller flips the state to Installing and proceeds: add or use an atomic helper
in crate::openhuman::local_ai::voice_install_common (e.g.,
try_set_installing(engine: &str) -> Result<bool, _> or compare_and_swap_status)
that reads the current VoiceInstallStatus and, if not already Installing, writes
Installing in one atomic operation (using file lock or atomic replace) and
returns success/failure; replace the read_status + write_status block around
ENGINE_WHISPER with a call to that helper and only spawn the install when it
returns success, and apply the same fix for ENGINE_PIPER handling.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged. Implementing a proper atomic check-and-set for the install state transition (comment #7 on schemas.rs:1068) is a heavier refactor requiring a file lock or an in-memory Mutex guarded by a lazy_static/OnceLock. Deferring to a follow-up issue to avoid expanding scope here — will open a tracking issue.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanil-23, that's a reasonable call for scope management. The good news is that the shared status table in voice_install_common.rs already uses an in-memory Mutex-protected HashMap, so the actual fix is lighter than a full file-lock approach — it's mainly about holding the lock across both the read and the conditional write (a try_set_installing(engine) helper that acquires the lock once). That said, deferring is sensible.

Would you like me to open a GitHub tracking issue for this so it doesn't get lost? I can include the relevant context (atomic check-and-set for ENGINE_WHISPER/ENGINE_PIPER in voice_install_common, link back to this PR and comment).


🧠 Learnings used
Learnt from: graycyrus
Repo: tinyhumansai/openhuman PR: 271
File: app/src/pages/onboarding/Onboarding.tsx:65-85
Timestamp: 2026-04-02T13:53:41.803Z
Learning: When reviewing `local_ai` Tauri backend RPCs in `src/openhuman/local_ai/**`, note that `local_ai_download` and `local_ai_download_all_assets` both delegate to the same `service.download_all_models()` obtained from `local_ai::global(config)`, which uses shared internal state (via a shared `Arc`) to make concurrent/duplicate downloads an in-flight no-op. Because the backend is already idempotent for concurrent calls, don’t require or introduce separate frontend cross-component/concurrency guards to prevent duplicate downloads.

Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 416
File: src/openhuman/memory/relex.rs:441-464
Timestamp: 2026-04-07T15:49:51.275Z
Learning: When using the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, don’t require individual execution-provider feature flags (e.g., `directml`, `coreml`, `cuda`) alongside `load-dynamic` to get EP registration code. The `ort` crate already compiles EP registration via `#[cfg(any(feature = "load-dynamic", feature = "<ep_name>"))]` guards, and adding per-EP feature flags can pull in static-linking dependencies that conflict with the dynamic loading approach. At runtime, EP availability is determined by what the dynamically loaded ONNX Runtime library (`onnxruntime.dll`/`.so`/`.dylib`) supports; ort docs indicate providers like `directml`/`xnnpack`/`coreml` are available in builds when the platform supports them.

Learnt from: sanil-23
Repo: tinyhumansai/openhuman PR: 416
File: src/openhuman/memory/relex.rs:441-464
Timestamp: 2026-04-07T15:49:51.275Z
Learning: When integrating the `ort` Rust crate v2.x with the `load-dynamic` feature enabled, do NOT also require/enable individual provider EP Cargo features like `directml`, `coreml`, or `cuda`. In `ort` v2.x, EP registration for providers (e.g., DirectML, CoreML, CUDA, etc.) is already compiled in under source-level `#[cfg(any(feature = "load-dynamic", feature = "<provider>"))]` guards, such as in `ep/directml.rs`. Adding provider feature flags alongside `load-dynamic` can pull in static-linking dependencies that conflict with the dynamic-loading approach. Provider availability should be treated as runtime-determined by what the loaded `onnxruntime` library (`onnxruntime.dll`/`libonnxruntime.so`/`libonnxruntime.dylib`) actually supports.

Learnt from: oxoxDev
Repo: tinyhumansai/openhuman PR: 571
File: src/openhuman/local_ai/service/whisper_engine.rs:69-80
Timestamp: 2026-04-14T19:59:04.826Z
Learning: When reviewing Rust code in this repo that uses the upstream `whisper-rs` crate (v0.16.0), do not report `WhisperContextParameters::use_gpu(...)` or `WhisperContextParameters::flash_attn(...)` as missing/invalid APIs. These builder-style methods exist upstream and return `&mut Self`; they are not limited to `WhisperVadContextParams`.

Learnt from: graycyrus
Repo: tinyhumansai/openhuman PR: 1078
File: src/openhuman/agent/agents/welcome/prompt.rs:24-24
Timestamp: 2026-05-01T13:41:00.958Z
Learning: For Rust code under `src/openhuman/**/*.rs`, use `snake_case` for local variables (not `camelCase`). If a local variable name is written in `camelCase`, treat it as a style/lint issue because it will trigger Rust’s `non_snake_case` warning (and related clippy linting, if enabled). Avoid suggesting `camelCase` for any Rust local variable names in this repository.

Learnt from: senamakel
Repo: tinyhumansai/openhuman PR: 1173
File: tests/agent_memory_loader_public.rs:88-88
Timestamp: 2026-05-04T06:50:47.877Z
Learning: In this repository, the general camelCase naming guideline should not be applied to Rust source files. For all .rs files, Rust function (and related) names should use snake_case, and snake_case Rust function names should not be flagged—even for async test functions annotated with attributes like #[tokio::test]. This is consistent with Rust’s non_snake_case lint behavior.

Comment thread src/openhuman/local_ai/voice_install_common.rs Outdated
Comment thread src/openhuman/voice/factory.rs
Comment thread src/openhuman/voice/local_transcribe.rs
Comment thread src/openhuman/voice/local_transcribe.rs
Comment thread src/openhuman/voice/schemas.rs
sanil-23 and others added 5 commits May 15, 2026 00:31
Refs tinyhumansai#1710 (partial)

Adds a string-matched provider factory mirroring embeddings/factory.rs so
the user can route speech-to-text and text-to-speech through either the
hosted backend proxy or a local Whisper/Piper subprocess. Config strings
drive dispatch — there is no separate offline gate; selecting
`stt_provider = "whisper"` + `tts_provider = "piper"` + a local LLM
provider IS offline mode.

Modules
- voice/factory.rs: create_stt_provider / create_tts_provider, returns
  Box<dyn SttProvider | TtsProvider>, errors on unknown provider name.
- voice/local_transcribe.rs: invokes whisper-cli via tokio process.
  --no-timestamps so segment-time prefixes don't leak into the transcript.
  CREATE_NO_WINDOW on Windows to suppress the subprocess console flash.
- voice/local_speech.rs: invokes piper.exe; --length-scale 1.15 for
  natural pacing; same Windows console suppression.
- voice/schemas.rs: voice_stt_dispatch + voice_tts_dispatch RPCs;
  voice_reply_synthesize handler now routes through create_tts_provider
  so the dropdown selection actually applies to spoken replies (was
  decorative — backend always called the cloud path).
- voice/types.rs + ops.rs: surface stt_provider/tts_provider in the
  voice_status snapshot.
- config/schema/local_ai.rs: stt_provider, tts_provider fields with
  "cloud" defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs tinyhumansai#1710 (partial)

Adds workspace-scoped installers for the local STT (Whisper) and TTS
(Piper) binaries + models, with a shared status table the UI polls for
progress.

Modules
- local_ai/voice_install_common.rs: shared download_to_file primitive
  (streaming to .part + atomic rename, 30min request budget, 45s
  per-chunk idle timeout, SHA256 hook). Mutex<HashMap> status table
  keyed by engine id.
- local_ai/install_whisper.rs: fetch ggml-<size>.bin from HuggingFace +
  Windows whisper-cli release zip; per-size installed-artifacts check
  so switching model size triggers a fresh download.
- local_ai/install_piper.rs: fetch piper binary tarball + en_US-lessac-medium
  voice .onnx + .onnx.json sidecar; per-voice installed-artifacts check.
- local_ai/paths.rs: workspace_whisper_dir / workspace_piper_dir under
  shared_root_dir(), which honors OPENHUMAN_WORKSPACE when explicitly
  set (parallel test sessions / multi-workspace deployments) and falls
  back to ~/.openhuman/ otherwise. Tolerant filename normalization so
  stale legacy ids (ggml-base-q5_1.bin) don't double-prefix into
  ggml-ggml-base-q5_1.bin.bin. TTS voice resolver probes the installer
  drop-zone BEFORE legacy paths to skip stale 4-byte stubs that crashed
  Piper with STATUS_STACK_BUFFER_OVERRUN. Release/ added to whisper
  binary candidate list (current whisper.cpp Windows zip layout).
- local_ai/schemas.rs: install_whisper / install_piper RPCs are
  fire-and-forget — handlers write Installing(0%) to the status table,
  spawn the download as a tokio task, return immediately. UI polls
  status via local_ai_*_install_status. Idempotency guard prevents
  duplicate concurrent installs from a double-click.
- Cargo.toml: declare flate2 directly for the Piper tar.gz extractor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs tinyhumansai#1710 (partial)

Surfaces the new voice provider factory + installers through the
Settings → AI & Models → Voice card.

UI
- VoicePanel.tsx: new "Voice Providers" section above the legacy
  dictation server settings. STT dropdown (cloud | whisper) + model
  selector (5 Whisper sizes, default medium). TTS dropdown (cloud |
  piper) + 8 curated Piper voice presets with "Other (advanced)" free-
  text fallback. Install locally / Reinstall locally / Retry locally
  buttons rendered from polled status. Auto-install on model/voice
  selection so changing the dropdown triggers the download if missing.
  Page title Voice Dictation → Voice. Removed the dead "Open Local AI
  Model" CTA that pointed at the wrong panel.
- VoicePanel.test.tsx: tests for the new provider rows + install state.
- voiceInstallApi.ts: RPC wrappers for the four install_* / install_*_status
  endpoints. Co-located vitest tests.
- tauriCommands/voice.ts: voice_set_providers + VoiceStatus extensions.
- Settings.tsx: Voice (STT & TTS) card under aiModelsSettingsItems so the
  panel is discoverable from the main settings nav (the panel had no nav
  entry before — only reachable via Conversations deep-link).
- useSettingsNavigation.ts: breadcrumb routes voice under AI & Models.

Capability catalog (about_app/catalog.rs)
- local_ai.speech_to_text + .text_to_speech entries describe the
  factory-dispatched providers, the available presets, and how to switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs tinyhumansai#1710 (partial)

Wires the existing mascot mic composer and conversation spoken-reply
flow into the new voice provider factory so the user's settings actually
apply on the user-visible path.

Mascot mic
- MicComposer.tsx (renamed from MicCloudComposer.tsx): drops the
  cloud-only assumption; calls openhuman.voice_stt_dispatch via
  transcribeWithFactory so config.local_ai.stt_provider drives provider
  choice (cloud Whisper proxy or local whisper-cli subprocess).
- sttClient.ts: adds transcribeWithFactory wrapping the dispatch RPC.
  Existing transcribeCloud kept for callers not yet migrated.
- sttClient.test.ts: new tests for the factory path (routing, override,
  empty-blob rejection, stale-sidecar rewriting, error passthrough).

Setup error CTAs
- Conversations.tsx (sendError handler): "Set up" button on
  stt_not_ready / voice_transcription / tts_not_ready / voice_synthesis
  codes now routes to /settings/voice (the install lives there) instead
  of /settings/local-model (legacy Ollama-asset panel).
- VoiceSetupModal.tsx: same nav fix — "Open Local AI Model" → Voice
  settings, since the STT model install lives on the Voice panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- chatSendError.ts: add `voice_synthesis` + `tts_not_ready` to
  ChatSendErrorCode union. The Conversations.tsx setup-error CTA now
  routes those codes to /settings/voice, but the comparison was a
  no-overlap warning until the union learned the new codes.
- VoicePanel.tsx: rename unused `engine` arg in installButtonLabel to
  `_engine` (TS6133). It was used in the prior longer labels ("Install
  Whisper"); shortened labels ("Install locally") no longer reference it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sanil-23 sanil-23 force-pushed the feat/1710-voice-factory branch from 53878ea to f2aa8be Compare May 14, 2026 19:02
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/openhuman/voice/schemas.rs (2)

604-610: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Same provider-agnostic voice default issue.

This handler has the same issue as handle_voice_reply_synthesize: when provider_name is "cloud" and no voice param is provided, the code defaults to DEFAULT_PIPER_VOICE and passes it to the cloud provider. Apply the same provider-aware fix here.

🔧 Proposed fix
 let provider_name = p
     .provider
     .as_deref()
     .map(str::trim)
     .filter(|s| !s.is_empty())
     .map(str::to_string)
     .unwrap_or_else(|| effective_tts_provider(&config));
-let voice = p
-    .voice
-    .as_deref()
-    .map(str::trim)
-    .filter(|s| !s.is_empty())
-    .map(str::to_string)
-    .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string());
+let voice = match provider_name.as_str() {
+    "piper" => Some(
+        p.voice
+            .as_deref()
+            .map(str::trim)
+            .filter(|s| !s.is_empty())
+            .map(str::to_string)
+            .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string())
+    ),
+    _ => p.voice
+        .as_deref()
+        .map(str::trim)
+        .filter(|s| !s.is_empty())
+        .map(str::to_string()),
+};

 log::debug!(
-    "[voice-factory] RPC voice_tts_dispatch provider={provider_name} voice={voice}"
+    "[voice-factory] RPC voice_tts_dispatch provider={provider_name} voice={:?}", voice
 );
 let provider =
-    crate::openhuman::voice::create_tts_provider(&provider_name, &voice, &config)
+    crate::openhuman::voice::create_tts_provider(&provider_name, voice.as_deref().unwrap_or(""), &config)
         .map_err(|e| e.to_string())?;
-let outcome = provider.synthesize(&config, &p.text, Some(&voice)).await?;
+let outcome = provider.synthesize(&config, &p.text, voice.as_deref()).await?;
 to_json(outcome)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/schemas.rs` around lines 604 - 610, The current voice
fallback always uses crate::openhuman::voice::DEFAULT_PIPER_VOICE for p.voice
(producing let voice = ...), which sends a Piper default to the cloud provider;
change the fallback to be provider-aware: if p.provider_name (check
p.provider_name.as_deref()) equals "cloud" use the cloud default constant (e.g.
crate::openhuman::voice::DEFAULT_CLOUD_VOICE) otherwise use DEFAULT_PIPER_VOICE,
and set the resulting String into the same voice variable so this handler (like
handle_voice_reply_synthesize) supplies the correct provider-specific default.

513-519: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Provider-agnostic voice default forces Piper voice ID onto cloud TTS.

When provider_name resolves to "cloud" and no voice_id is provided in params, the code defaults to DEFAULT_PIPER_VOICE and passes it to the cloud provider via Some(&voice) on line 526. Cloud providers expect different voice IDs (e.g., ElevenLabs voice IDs), so passing a Piper-specific ID can cause errors or incorrect behavior.

Make the voice fallback provider-aware: keep voice as Option<String>, only default to DEFAULT_PIPER_VOICE when provider_name == "piper", and leave it None for cloud unless explicitly provided.

🔧 Proposed fix
 let provider_name = effective_tts_provider(&config);
-let voice = p
-    .voice_id
-    .as_deref()
-    .map(str::trim)
-    .filter(|s| !s.is_empty())
-    .map(str::to_string)
-    .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string());
+let voice = match provider_name.as_str() {
+    "piper" => Some(
+        p.voice_id
+            .as_deref()
+            .map(str::trim)
+            .filter(|s| !s.is_empty())
+            .map(str::to_string)
+            .unwrap_or_else(|| crate::openhuman::voice::DEFAULT_PIPER_VOICE.to_string())
+    ),
+    _ => p.voice_id
+        .as_deref()
+        .map(str::trim)
+        .filter(|s| !s.is_empty())
+        .map(str::to_string()),
+};
 log::debug!(
-    "[voice-factory] voice_reply_synthesize dispatch provider={provider_name} voice={voice}"
+    "[voice-factory] voice_reply_synthesize dispatch provider={provider_name} voice={:?}", voice
 );
 let provider =
-    crate::openhuman::voice::create_tts_provider(&provider_name, &voice, &config)
+    crate::openhuman::voice::create_tts_provider(&provider_name, voice.as_deref().unwrap_or(""), &config)
         .map_err(|e| e.to_string())?;
-to_json(provider.synthesize(&config, &p.text, Some(&voice)).await?)
+to_json(provider.synthesize(&config, &p.text, voice.as_deref()).await?)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/voice/schemas.rs` around lines 513 - 519, The current code
unconditionally falls back to DEFAULT_PIPER_VOICE when p.voice_id is empty,
which forces a Piper voice onto cloud TTS; change the voice binding to be an
Option<String> by mapping p.voice_id -> Some(trimmed_string) if present,
otherwise set to Some(DEFAULT_PIPER_VOICE.to_string()) only when provider_name
== "piper", and set to None for other providers (e.g., "cloud"); update any
downstream uses that pass Some(&voice) to instead handle
Option<String>/Option<&String> so cloud providers receive None unless an
explicit voice_id was supplied; reference the local variable voice, p.voice_id,
DEFAULT_PIPER_VOICE and provider_name in your changes.
🧹 Nitpick comments (1)
app/src/services/api/voiceInstallApi.ts (1)

29-44: ⚡ Quick win

Keep frontend status fields camelCase and map snake_case at the wire boundary.

Line 37, Line 39, and Line 43 introduce snake_case field names into app-facing TypeScript types. Prefer a wire DTO (snake_case) + mapped frontend interface (camelCase) so UI/state code stays consistent.

♻️ Proposed refactor
+export interface VoiceInstallStatusWire {
+  engine: 'whisper' | 'piper';
+  state: VoiceInstallState;
+  progress: number | null;
+  downloaded_bytes: number | null;
+  total_bytes: number | null;
+  stage: string | null;
+  error_detail: string | null;
+}
+
 export interface VoiceInstallStatus {
-  /** `"whisper"` or `"piper"`. */
-  engine: string;
+  /** `"whisper"` or `"piper"`. */
+  engine: 'whisper' | 'piper';
   /** Current state — drives the button label / spinner. */
   state: VoiceInstallState;
   /** 0–100 percent, populated while `state === 'installing'`. */
   progress: number | null;
   /** Bytes received so far across the current download stage. */
-  downloaded_bytes: number | null;
+  downloadedBytes: number | null;
   /** Total bytes expected (may be null for chunked transfer encoding). */
-  total_bytes: number | null;
+  totalBytes: number | null;
   /** Free-text status line — e.g. "downloading model (ggml-tiny.bin)". */
   stage: string | null;
   /** Populated when `state === 'error'`. */
-  error_detail: string | null;
+  errorDetail: string | null;
 }
+
+function fromWireStatus(wire: VoiceInstallStatusWire): VoiceInstallStatus {
+  return {
+    engine: wire.engine,
+    state: wire.state,
+    progress: wire.progress,
+    downloadedBytes: wire.downloaded_bytes,
+    totalBytes: wire.total_bytes,
+    stage: wire.stage,
+    errorDetail: wire.error_detail,
+  };
+}

As per coding guidelines, Use camelCase for variable and function names in TypeScript/JavaScript code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/services/api/voiceInstallApi.ts` around lines 29 - 44, The exported
VoiceInstallStatus currently exposes snake_case fields (downloaded_bytes,
total_bytes, error_detail) to app code; introduce a separate wire DTO (e.g.,
VoiceInstallStatusDto) with the snake_case properties returned by the API and
keep the app-facing VoiceInstallStatus using camelCase (downloadedBytes,
totalBytes, errorDetail), then update the parsing/mapping logic in the
voiceInstallApi functions to convert from VoiceInstallStatusDto ->
VoiceInstallStatus at the network boundary so UI/state code consumes only
camelCase fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/openhuman/voice/schemas.rs`:
- Around line 604-610: The current voice fallback always uses
crate::openhuman::voice::DEFAULT_PIPER_VOICE for p.voice (producing let voice =
...), which sends a Piper default to the cloud provider; change the fallback to
be provider-aware: if p.provider_name (check p.provider_name.as_deref()) equals
"cloud" use the cloud default constant (e.g.
crate::openhuman::voice::DEFAULT_CLOUD_VOICE) otherwise use DEFAULT_PIPER_VOICE,
and set the resulting String into the same voice variable so this handler (like
handle_voice_reply_synthesize) supplies the correct provider-specific default.
- Around line 513-519: The current code unconditionally falls back to
DEFAULT_PIPER_VOICE when p.voice_id is empty, which forces a Piper voice onto
cloud TTS; change the voice binding to be an Option<String> by mapping
p.voice_id -> Some(trimmed_string) if present, otherwise set to
Some(DEFAULT_PIPER_VOICE.to_string()) only when provider_name == "piper", and
set to None for other providers (e.g., "cloud"); update any downstream uses that
pass Some(&voice) to instead handle Option<String>/Option<&String> so cloud
providers receive None unless an explicit voice_id was supplied; reference the
local variable voice, p.voice_id, DEFAULT_PIPER_VOICE and provider_name in your
changes.

---

Nitpick comments:
In `@app/src/services/api/voiceInstallApi.ts`:
- Around line 29-44: The exported VoiceInstallStatus currently exposes
snake_case fields (downloaded_bytes, total_bytes, error_detail) to app code;
introduce a separate wire DTO (e.g., VoiceInstallStatusDto) with the snake_case
properties returned by the API and keep the app-facing VoiceInstallStatus using
camelCase (downloadedBytes, totalBytes, errorDetail), then update the
parsing/mapping logic in the voiceInstallApi functions to convert from
VoiceInstallStatusDto -> VoiceInstallStatus at the network boundary so UI/state
code consumes only camelCase fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ba0edccb-b5e6-4e35-84e9-63228884860b

📥 Commits

Reviewing files that changed from the base of the PR and between 53878ea and f2aa8be.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (31)
  • Cargo.toml
  • app/src/chat/chatSendError.ts
  • app/src/components/settings/hooks/useSettingsNavigation.ts
  • app/src/components/settings/panels/VoicePanel.tsx
  • app/src/components/settings/panels/__tests__/VoicePanel.test.tsx
  • app/src/components/skills/VoiceSetupModal.tsx
  • app/src/features/human/MicComposer.test.tsx
  • app/src/features/human/MicComposer.tsx
  • app/src/features/human/voice/sttClient.test.ts
  • app/src/features/human/voice/sttClient.ts
  • app/src/pages/Conversations.tsx
  • app/src/pages/Settings.tsx
  • app/src/services/api/__tests__/voiceInstallApi.test.ts
  • app/src/services/api/voiceInstallApi.ts
  • app/src/utils/tauriCommands/voice.ts
  • src/openhuman/about_app/catalog.rs
  • src/openhuman/config/schema/local_ai.rs
  • src/openhuman/local_ai/install_piper.rs
  • src/openhuman/local_ai/install_whisper.rs
  • src/openhuman/local_ai/mod.rs
  • src/openhuman/local_ai/paths.rs
  • src/openhuman/local_ai/schemas.rs
  • src/openhuman/local_ai/voice_install_common.rs
  • src/openhuman/voice/factory.rs
  • src/openhuman/voice/local_speech.rs
  • src/openhuman/voice/local_transcribe.rs
  • src/openhuman/voice/mod.rs
  • src/openhuman/voice/ops.rs
  • src/openhuman/voice/schemas.rs
  • src/openhuman/voice/schemas_tests.rs
  • src/openhuman/voice/types.rs
✅ Files skipped from review due to trivial changes (4)
  • app/src/chat/chatSendError.ts
  • app/src/components/skills/VoiceSetupModal.tsx
  • app/src/pages/Settings.tsx
  • src/openhuman/local_ai/mod.rs
🚧 Files skipped from review as they are similar to previous changes (23)
  • app/src/utils/tauriCommands/voice.ts
  • src/openhuman/voice/schemas_tests.rs
  • app/src/components/settings/hooks/useSettingsNavigation.ts
  • Cargo.toml
  • app/src/components/settings/panels/tests/VoicePanel.test.tsx
  • app/src/features/human/voice/sttClient.test.ts
  • app/src/services/api/tests/voiceInstallApi.test.ts
  • src/openhuman/voice/mod.rs
  • app/src/pages/Conversations.tsx
  • src/openhuman/config/schema/local_ai.rs
  • src/openhuman/voice/types.rs
  • app/src/features/human/voice/sttClient.ts
  • app/src/features/human/MicComposer.tsx
  • src/openhuman/local_ai/install_piper.rs
  • src/openhuman/voice/local_transcribe.rs
  • src/openhuman/about_app/catalog.rs
  • src/openhuman/local_ai/schemas.rs
  • app/src/components/settings/panels/VoicePanel.tsx
  • src/openhuman/local_ai/paths.rs
  • src/openhuman/local_ai/install_whisper.rs
  • src/openhuman/voice/local_speech.rs
  • src/openhuman/voice/factory.rs
  • src/openhuman/local_ai/voice_install_common.rs

sanil-23 and others added 4 commits May 15, 2026 00:48
- VoicePanel.test.tsx: assert tts-voice-select (not tts-voice-input) when
  seeded Piper provider has a known preset voice id (tinyhumansai#1)
- VoicePanel.tsx: fix stale-closure reseeding by using functional setState
  updater (prev || seeded) so interval polls can't clobber user edits (#2)
- local_transcribe.rs: resolve model path via resolve_stt_model_path_by_id
  so the per-request model_id override is actually used (tinyhumansai#3); wrap
  cmd.output() in a 120s tokio::time::timeout to prevent hung STT (tinyhumansai#4)
- voice_install_common.rs: delete .part file on chunk-read and write errors
  so mid-stream failures don't leave stale artifacts (tinyhumansai#5)
- schemas.rs: make voice default provider-aware; only fall back to
  DEFAULT_PIPER_VOICE when provider == "piper", leave cloud voice unset
  unless explicitly provided (tinyhumansai#6)
- paths.rs: add resolve_stt_model_path_by_id helper used by tinyhumansai#3
- Settings.tsx, VoicePanel.tsx: fix Prettier formatting (CI gate)

Heavy lifts deferred (replied on threads): atomic install-start race (tinyhumansai#7),
TTS trait cloud overrides (tinyhumansai#8). Intentional UX deferral: generic install
button labels (tinyhumansai#9).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- VoicePanel.test.tsx: update button label assertions to generic labels
  ("Install locally" / "Reinstall locally" / "Retry locally") matching the
  intentional UX decision; remove stale "Voice Dictation" heading lookup
  that was never rendered by the component
- paths.rs: update empty-size fallback assertion to ggml-medium.bin
  (workspace_whisper_model_path changed from large-v3-turbo to medium
  in the prior commit but the test expectation was not updated)
- install_whisper.rs: pin config.local_ai.stt_model_id to
  DEFAULT_WHISPER_MODEL_SIZE in status_promotes_to_installed_when_model_present
  so the path created and the path probed by effective_stt_model_id agree

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add 6 new VoicePanel tests:
- installWhisper/installPiper rejection paths (covers catch blocks 350-351,
  373-374)
- persistProviders rejection path (covers catch block 294-295)
- Piper installing state with percentage (covers the piper state span
  conditional at 534)
- Piper voice preset select change triggers persistProviders (covers
  lines 554-562)

Freeze subsequent loadData calls with a hanging promise in the rejection
tests so the error state isn't immediately cleared by the finally-block
reload before the assertion can observe it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t installer drop-zone

The installer drop-zone (`bin/piper/voices/<id>.onnx`) is probed FIRST
by `resolve_tts_voice_path` and lives under the shared root, not the
temp config. When a sibling `install_piper` test runs in parallel with
the default voice id and leaves a stub there, this test sees that file
instead of the legacy `models/local-ai/tts/` candidate and fails the
assertion.

Fix: take the `shared_install_lock()` for the duration of the test and
wipe the installer path before writing the legacy stub so the resolver
deterministically falls through to the legacy candidate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@senamakel senamakel merged commit 3165306 into tinyhumansai:main May 14, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prioritize fully local speech and Composer operation

2 participants