Releases: lpalbou/AbstractVoice
Releases · lpalbou/AbstractVoice
AbstractVoice 0.7
[0.7.0] - 2026-04-06
Added
- ADR 0005: Torch device + dtype selection policy for torch-based TTS/cloning engines.
- Pluggable TTS adapter registry (keeps
autodeterministic; enables opt-in heavy engines). abstractvoice[audiodit]optional extra and LongCat-AudioDiT integration (TTS + prompt-audio cloning).abstractvoice[omnivoice]optional extra and OmniVoice integration (omnilingual TTS + prompt-audio cloning; supports voice design viainstruct).- Shared duration estimation helpers for engines that require explicit duration parameters.
- REPL:
/historycommand to inspect the in-memory LLM message list (what is sent to the provider). - OmniVoice (REPL/adapter):
seedparameter for reproducible voice-design sampling (stable “designed voice” across turns).
Fixed
- AudioDiT TTS now avoids “quiet noise” collapses by retrying with smaller text chunks when outputs are detected as weak.
- AudioDiT TTS now uses a larger default chunk size to reduce voice/pitch drift across multi-sentence utterances.
- AudioDiT TTS duration estimation now follows upstream per-character heuristics (stabilizes voice/pitch across text lengths).
- AudioDiT TTS now defaults to APG guidance (upstream-recommended) for more stable quality.
- AudioDiT TTS now reuses a short “session prompt” (generated prompt-audio + matching prompt-text prefix) to keep a stable speaker identity across multiple
/speakcalls. - AudioDiT explicitly rejects
/speedchanges (upstream doesn’t provide a speed API; attempting speed caused degraded audio). - Audio playback now falls back to stereo output streams when mono initialization fails (macOS AUHAL robustness).
- Suppressed the noisy PyTorch
weight_normdeprecation warning during AudioDiT model load. - Disabled Transformers progress bars (e.g. “Loading weights”) during AudioDiT model load for cleaner REPL UX.
- Offline-first: Faster-Whisper and AudioDiT now force local-only Hugging Face access when downloads are disabled (avoids HF Hub “unauthenticated requests” warnings).
- Suppressed benign Faster-Whisper mel-extraction
RuntimeWarnings (matmul overflow/divide-by-zero) for cleaner REPL output. - REPL: added
/debugand (when enabled) persist each synthesized utterance tountracked/generated_wavs/and print the path. - AudioDiT: expand English digits/years into words for more reliable pronunciation (e.g. “5”, “2025”).
- AudioDiT: load model weights in the resolved torch dtype (MPS default fp16; override via
ABSTRACTVOICE_TORCH_DTYPE) for better accelerator performance. - Added engine-agnostic TTS quality presets (
fast|balanced|high) viaVoiceManager.set_tts_quality_preset(...)(where supported by the active TTS adapter). - Cloning:
set_cloned_tts_quality(...)now persists the preset so it also applies to cloning engines that are loaded later (engines are lazy). - REPL: selecting an AudioDiT cloned voice now performs a small warm-up to pay the one-time load/compile cost up front (reduces first
/speaklatency). - REPL: discard
<think>...</think>blocks in LLM responses before printing/history/TTS. - REPL: LLM chat history is now committed atomically (failed LLM calls no longer leave orphaned user messages that can cause repetition).
- REPL: long pasted prompts no longer crash the clone shortcut path (
File name too long); debug WAV filenames are now length-safe. - Docs: clarify REPL as a demonstrator (minimal LLM client) and recommend AbstractCore for production agent/server integration; refreshed
llms.txt/llms-full.txt. - Docs: clarify AudioDiT language expectations (upstream EN/ZH focus; other languages not guaranteed).
- Docs: expand
docs/faq.mdwith OmniVoice voice design attributes, cross-machine “voice preset” guidance, and fine-tuning/data preparation pointers (plus engine-specific cloning reference recommendations). - OmniVoice cloning now loads reference audio via
soundfile(avoids torchaudiotorchcodecdependency for prompt loading).