Releases · lpalbou/AbstractVoice · GitHub

06 Apr 13:43

lpalbou

AbstractVoice 0.7 Latest

Latest

[0.7.0] - 2026-04-06

Added

ADR 0005: Torch device + dtype selection policy for torch-based TTS/cloning engines.
Pluggable TTS adapter registry (keeps auto deterministic; enables opt-in heavy engines).
abstractvoice[audiodit] optional extra and LongCat-AudioDiT integration (TTS + prompt-audio cloning).
abstractvoice[omnivoice] optional extra and OmniVoice integration (omnilingual TTS + prompt-audio cloning; supports voice design via instruct).
Shared duration estimation helpers for engines that require explicit duration parameters.
REPL: /history command to inspect the in-memory LLM message list (what is sent to the provider).
OmniVoice (REPL/adapter): seed parameter for reproducible voice-design sampling (stable “designed voice” across turns).

Fixed

AudioDiT TTS now avoids “quiet noise” collapses by retrying with smaller text chunks when outputs are detected as weak.
AudioDiT TTS now uses a larger default chunk size to reduce voice/pitch drift across multi-sentence utterances.
AudioDiT TTS duration estimation now follows upstream per-character heuristics (stabilizes voice/pitch across text lengths).
AudioDiT TTS now defaults to APG guidance (upstream-recommended) for more stable quality.
AudioDiT TTS now reuses a short “session prompt” (generated prompt-audio + matching prompt-text prefix) to keep a stable speaker identity across multiple /speak calls.
AudioDiT explicitly rejects /speed changes (upstream doesn’t provide a speed API; attempting speed caused degraded audio).
Audio playback now falls back to stereo output streams when mono initialization fails (macOS AUHAL robustness).
Suppressed the noisy PyTorch weight_norm deprecation warning during AudioDiT model load.
Disabled Transformers progress bars (e.g. “Loading weights”) during AudioDiT model load for cleaner REPL UX.
Offline-first: Faster-Whisper and AudioDiT now force local-only Hugging Face access when downloads are disabled (avoids HF Hub “unauthenticated requests” warnings).
Suppressed benign Faster-Whisper mel-extraction RuntimeWarnings (matmul overflow/divide-by-zero) for cleaner REPL output.
REPL: added /debug and (when enabled) persist each synthesized utterance to untracked/generated_wavs/ and print the path.
AudioDiT: expand English digits/years into words for more reliable pronunciation (e.g. “5”, “2025”).
AudioDiT: load model weights in the resolved torch dtype (MPS default fp16; override via ABSTRACTVOICE_TORCH_DTYPE) for better accelerator performance.
Added engine-agnostic TTS quality presets (fast|balanced|high) via VoiceManager.set_tts_quality_preset(...) (where supported by the active TTS adapter).
Cloning: set_cloned_tts_quality(...) now persists the preset so it also applies to cloning engines that are loaded later (engines are lazy).
REPL: selecting an AudioDiT cloned voice now performs a small warm-up to pay the one-time load/compile cost up front (reduces first /speak latency).
REPL: discard <think>...</think> blocks in LLM responses before printing/history/TTS.
REPL: LLM chat history is now committed atomically (failed LLM calls no longer leave orphaned user messages that can cause repetition).
REPL: long pasted prompts no longer crash the clone shortcut path (File name too long); debug WAV filenames are now length-safe.
Docs: clarify REPL as a demonstrator (minimal LLM client) and recommend AbstractCore for production agent/server integration; refreshed llms.txt / llms-full.txt.
Docs: clarify AudioDiT language expectations (upstream EN/ZH focus; other languages not guaranteed).
Docs: expand docs/faq.md with OmniVoice voice design attributes, cross-machine “voice preset” guidance, and fine-tuning/data preparation pointers (plus engine-specific cloning reference recommendations).
OmniVoice cloning now loads reference audio via soundfile (avoids torchaudio torchcodec dependency for prompt loading).

Assets 2