Merged
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -123,7 +123,7 @@ add_subdirectory(asr)
add_subdirectory(translate)
add_subdirectory(tts)
add_subdirectory(pipeline)
-if(VOX_BUILD_APPS)
+if(VOX_BUILD_APPS OR VOX_BUILD_TESTS)
add_subdirectory(apps)
endif()

25 changes: 21 additions & 4 deletions README.md
@@ -57,6 +57,23 @@ cmake --build build --target vox -j

## Model

The `vox` CLI can list, download, verify, repair, and remove known local models:

```sh
./build/bin/vox model list
./build/bin/vox model download qwen3-asr-1.7b
./build/bin/vox model download kokoro-tts
./build/bin/vox model download qwen3-tts
./build/bin/vox model verify qwen3-asr-1.7b
./build/bin/vox model repair qwen3-asr-1.7b
```

Model verification checks that expected files exist, are non-empty, and do not
have leftover partial downloads. Checksums are reported when metadata is
available; the current bundled manifests rely on file presence and size. Common
aliases such as `kokoro`, `cosyvoice`, and `qwen3-tts` resolve to their
canonical model entries.
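
The checks described above can be sketched in a few lines of shell. This is only an illustration of the idea, not the code in `model_manager.cpp`: the function name `verify_model` and the `.part` suffix for interrupted downloads are assumptions made for the example.

```sh
#!/bin/sh
# Hypothetical sketch, not the vox implementation.
# verify_model DIR FILE... returns 0 when every FILE exists in DIR and is
# non-empty, and DIR contains no leftover "*.part" files (assumed suffix).
verify_model() {
    dir=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $name"
            status=1
        elif [ ! -s "$dir/$name" ]; then
            echo "empty: $name"
            status=1
        fi
    done
    # Treat any *.part file as an interrupted download.
    for partial in "$dir"/*.part; do
        if [ -e "$partial" ]; then
            echo "partial: ${partial##*/}"
            status=1
        fi
    done
    return "$status"
}
```

Called as `verify_model models/tts/qwen3-tts-0.6b-customvoice qwen3-tts-12hz-0.6b-customvoice-q8_0.gguf qwen3-tts-tokenizer-12hz.gguf`, the sketch prints one line per problem and returns non-zero if any check fails.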

### Whisper ASR

Download or place a local Whisper GGML model under `models/`. For multilingual recognition, use a non-`.en` model.
@@ -129,7 +146,7 @@ CosyVoice3 remains the default TTS engine.
Download the minimum baked-voice CosyVoice3 GGUF set:

```sh
-scripts/download-cosyvoice3-tts-gguf.sh
+./build/bin/vox model download cosyvoice3-tts
```

That creates:
@@ -146,7 +163,7 @@ Pass the LLM GGUF with `--tts-model`. The runtime auto-discovers sibling flow, H
Kokoro-82M is available with `--tts-engine kokoro`:

```sh
-scripts/download-kokoro-tts-gguf.sh
+./build/bin/vox model download kokoro-tts
```

On Windows PowerShell:
@@ -167,7 +184,7 @@ Pass the Kokoro model with `--tts-model`. The runtime auto-discovers `kokoro-voi
Qwen3-TTS 0.6B is available with `--tts-engine qwen3-tts`. The recommended quick-test path is CustomVoice Q8_0 because it has built-in speakers and does not need a reference WAV:

```sh
-scripts/download-qwen3-tts-gguf.sh
+./build/bin/vox model download qwen3-tts
```

On Windows PowerShell:
@@ -183,7 +200,7 @@ models/tts/qwen3-tts-0.6b-customvoice/qwen3-tts-12hz-0.6b-customvoice-q8_0.gguf
models/tts/qwen3-tts-0.6b-customvoice/qwen3-tts-tokenizer-12hz.gguf
```

-Pass the talker GGUF with `--tts-model`. The runtime auto-discovers `qwen3-tts-tokenizer-12hz.gguf` in the same directory, or use `--tts-codec-model PATH`. CustomVoice speakers include `aiden`, `dylan`, `eric`, `ono_anna`, `ryan`, `serena`, `sohee`, `uncle_fu`, and `vivian`; use `dylan` or `eric` for Chinese output tests. The Base variant can also be downloaded with `scripts/download-qwen3-tts-gguf.sh models/tts/qwen3-tts-0.6b-base base q8_0`; it requires `--tts-voice-model` pointing to a baked voice GGUF or a reference WAV plus `--tts-ref-text`.
+Pass the talker GGUF with `--tts-model`. The runtime auto-discovers `qwen3-tts-tokenizer-12hz.gguf` in the same directory, or use `--tts-codec-model PATH`. CustomVoice speakers include `aiden`, `dylan`, `eric`, `ono_anna`, `ryan`, `serena`, `sohee`, `uncle_fu`, and `vivian`; use `dylan` or `eric` for Chinese output tests. The Base variant can also be downloaded with `./build/bin/vox model download qwen3-tts-0.6b-base`; it requires `--tts-voice-model` pointing to a baked voice GGUF or a reference WAV plus `--tts-ref-text`.
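
The codec lookup described above (an explicit `--tts-codec-model PATH` wins, otherwise the tokenizer GGUF next to the talker model is used) might look roughly like the following. The function name `resolve_codec` is hypothetical, and the real runtime may apply additional rules not visible in this diff.

```sh
#!/bin/sh
# Hypothetical sketch of sibling-file discovery, not the vox runtime code.
# resolve_codec MODEL [OVERRIDE] prints the codec path and returns 0,
# or returns 1 when no override is given and no sibling file exists.
resolve_codec() {
    model=$1
    override=${2:-}
    # An explicit path (as with --tts-codec-model) takes precedence.
    if [ -n "$override" ]; then
        printf '%s\n' "$override"
        return 0
    fi
    # Otherwise look for the tokenizer GGUF beside the talker model.
    candidate=$(dirname "$model")/qwen3-tts-tokenizer-12hz.gguf
    if [ -f "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
    fi
    return 1
}
```

Checking the override first keeps the behavior predictable when both an explicit path and a sibling file are present.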

## Run

12 changes: 12 additions & 0 deletions apps/CMakeLists.txt
@@ -1,10 +1,21 @@
add_library(vox_model_manager STATIC
model_manager.cpp
)

target_include_directories(vox_model_manager
PUBLIC
"${CMAKE_CURRENT_SOURCE_DIR}"
)

if(VOX_BUILD_APPS)
add_executable(vox
vox.cpp
microphone_audio_source.cpp
)

target_link_libraries(vox
PRIVATE
vox_model_manager
vox_translation_pipeline
vox_sdl_audio
)
@@ -18,3 +29,4 @@ target_compile_definitions(vox
PRIVATE
VOX_PROJECT_ROOT="${CMAKE_SOURCE_DIR}"
)
endif()