Skip to content

Releases: lpalbou/AbstractVoice

AbstractVoice 0.7

06 Apr 13:43

Choose a tag to compare

[0.7.0] - 2026-04-06

Added

  • ADR 0005: Torch device + dtype selection policy for torch-based TTS/cloning engines.
  • Pluggable TTS adapter registry (keeps auto deterministic; enables opt-in heavy engines).
  • abstractvoice[audiodit] optional extra and LongCat-AudioDiT integration (TTS + prompt-audio cloning).
  • abstractvoice[omnivoice] optional extra and OmniVoice integration (omnilingual TTS + prompt-audio cloning; supports voice design via instruct).
  • Shared duration estimation helpers for engines that require explicit duration parameters.
  • REPL: /history command to inspect the in-memory LLM message list (what is sent to the provider).
  • OmniVoice (REPL/adapter): seed parameter for reproducible voice-design sampling (stable “designed voice” across turns).

Fixed

  • AudioDiT TTS now avoids “quiet noise” collapses by retrying with smaller text chunks when outputs are detected as weak.
  • AudioDiT TTS now uses a larger default chunk size to reduce voice/pitch drift across multi-sentence utterances.
  • AudioDiT TTS duration estimation now follows upstream per-character heuristics (stabilizes voice/pitch across text lengths).
  • AudioDiT TTS now defaults to APG guidance (upstream-recommended) for more stable quality.
  • AudioDiT TTS now reuses a short “session prompt” (generated prompt-audio + matching prompt-text prefix) to keep a stable speaker identity across multiple /speak calls.
  • AudioDiT explicitly rejects /speed changes (upstream doesn’t provide a speed API; attempting speed caused degraded audio).
  • Audio playback now falls back to stereo output streams when mono initialization fails (macOS AUHAL robustness).
  • Suppressed the noisy PyTorch weight_norm deprecation warning during AudioDiT model load.
  • Disabled Transformers progress bars (e.g. “Loading weights”) during AudioDiT model load for cleaner REPL UX.
  • Offline-first: Faster-Whisper and AudioDiT now force local-only Hugging Face access when downloads are disabled (avoids HF Hub “unauthenticated requests” warnings).
  • Suppressed benign Faster-Whisper mel-extraction RuntimeWarnings (matmul overflow/divide-by-zero) for cleaner REPL output.
  • REPL: added /debug and (when enabled) persist each synthesized utterance to untracked/generated_wavs/ and print the path.
  • AudioDiT: expand English digits/years into words for more reliable pronunciation (e.g. “5”, “2025”).
  • AudioDiT: load model weights in the resolved torch dtype (MPS default fp16; override via ABSTRACTVOICE_TORCH_DTYPE) for better accelerator performance.
  • Added engine-agnostic TTS quality presets (fast|balanced|high) via VoiceManager.set_tts_quality_preset(...) (where supported by the active TTS adapter).
  • Cloning: set_cloned_tts_quality(...) now persists the preset so it also applies to cloning engines that are loaded later (engines are lazy).
  • REPL: selecting an AudioDiT cloned voice now performs a small warm-up to pay the one-time load/compile cost up front (reduces first /speak latency).
  • REPL: discard <think>...</think> blocks in LLM responses before printing/history/TTS.
  • REPL: LLM chat history is now committed atomically (failed LLM calls no longer leave orphaned user messages that can cause repetition).
  • REPL: long pasted prompts no longer crash the clone shortcut path (File name too long); debug WAV filenames are now length-safe.
  • Docs: clarify REPL as a demonstrator (minimal LLM client) and recommend AbstractCore for production agent/server integration; refreshed llms.txt / llms-full.txt.
  • Docs: clarify AudioDiT language expectations (upstream EN/ZH focus; other languages not guaranteed).
  • Docs: expand docs/faq.md with OmniVoice voice design attributes, cross-machine “voice preset” guidance, and fine-tuning/data preparation pointers (plus engine-specific cloning reference recommendations).
  • OmniVoice cloning now loads reference audio via soundfile (avoids torchaudio torchcodec dependency for prompt loading).