
feat(voice): add native macOS say TTS engine option #1315

Open
camerondgray wants to merge 1 commit into danielmiessler:main from camerondgray:feat/native-macos-say-tts-engine



@camerondgray


🗣️ feat(voice): add native macOS say TTS engine option

Summary

Adds an opt-in voice_engine setting to the Pulse voice module so PAI can speak through the native macOS say binary instead of the ElevenLabs cloud API. Setting voice_engine = "say" makes all voice notifications free, fully offline, and key-free. Default behavior is unchanged.

🎯 Motivation and Context

Problem:

  • ElevenLabs TTS is metered — every notification costs credits, and hitting the quota cap silently breaks voice (a 401 with no fallback).
  • Privacy: every notification's text is POSTed to api.elevenlabs.io. For a personal infrastructure tool, sending all assistant speech to a third party is undesirable for many users.
  • ElevenLabs is currently the only engine; there is no offline or local option.

Solution:
Introduce a voice_engine setting ("elevenlabs" | "say"). The "say" engine routes TTS through /usr/bin/say — no API key, no network, nothing leaves the machine. ElevenLabs stays the default, so existing installs are unaffected.

📋 Changes

Single file: Releases/v5.0.0/.claude/PAI/PULSE/VoiceServer/voice.ts

  • VoiceConfig gains voice_engine?: "elevenlabs" | "say" and default_say_voice?.
  • VoiceEntry gains sayVoice? so each daidentity.voices.<name> can map to a macOS voice.
  • New speakWithSay() + sayRateFromSpeed(); playAudio() refactored to share a playAudioFile() helper so the say path keeps the existing per-voice volume control.
  • sendNotification() branches on the engine. The ElevenLabs path and the /notify HTTP contract are unchanged, so existing callers need no edits.
  • startVoice() resolves the engine (config → PAI_VOICE_ENGINE env → default elevenlabs), warns on an unrecognized value, and only requires an API key for the ElevenLabs engine.
  • voiceHealth() reports the active engine.
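A minimal TypeScript sketch of the engine-resolution order described for startVoice() (config → PAI_VOICE_ENGINE → "elevenlabs", with a warning on unrecognized values). The names VoiceEngine, resolveVoiceEngine, and requiresApiKey are hypothetical, not taken from voice.ts, and treating an empty string as "unset" is an assumption.

```typescript
// Hypothetical sketch; not the PR's actual code.
type VoiceEngine = "elevenlabs" | "say";

const KNOWN_ENGINES: readonly string[] = ["elevenlabs", "say"];

function resolveVoiceEngine(
  configValue: string | undefined,
  env: Record<string, string | undefined>,
  warn: (msg: string) => void = console.warn,
): VoiceEngine {
  // First non-empty source wins: PULSE.toml value, then the environment.
  const raw = configValue || env.PAI_VOICE_ENGINE || "";
  if (raw === "") return "elevenlabs"; // unset: default behavior unchanged
  if (KNOWN_ENGINES.includes(raw)) return raw as VoiceEngine;
  // Unrecognized values are not fatal: log a warning and fall back.
  warn(`Unrecognized voice_engine "${raw}", falling back to "elevenlabs"`);
  return "elevenlabs";
}

// Only the ElevenLabs engine needs an API key at startup.
function requiresApiKey(engine: VoiceEngine): boolean {
  return engine === "elevenlabs";
}
```

In a server this would be called as resolveVoiceEngine(config.voice_engine, process.env); passing the environment in explicitly keeps the function easy to test.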

⚙️ Usage

# PULSE.toml
[voice]
enabled = true
voice_engine = "say"            # "elevenlabs" (default) | "say"
default_say_voice = "Samantha"  # optional; omit to use the system voice

Or via environment: PAI_VOICE_ENGINE=say (and optionally PAI_SAY_VOICE=Samantha).
Optional per-voice mapping in settings.json:

"daidentity": { "voices": { "main": { "voiceId": "", "sayVoice": "Samantha" } } }

Run say -v '?' to list installed voices; higher-quality voices can be added in System Settings → Accessibility → Spoken Content.
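A sketch of how the macOS voice for one notification could be chosen from the settings above. The precedence (per-voice sayVoice, then default_say_voice, then PAI_SAY_VOICE, then the system voice) and treating blank strings as unset are assumptions; resolveSayVoice and VoiceEntrySketch are hypothetical names.

```typescript
// Hypothetical shape of a daidentity.voices.<name> entry.
interface VoiceEntrySketch {
  voiceId?: string;
  sayVoice?: string;
}

// Returns a voice name for `say -v`, or undefined to use the system voice.
function resolveSayVoice(
  entry: VoiceEntrySketch | undefined,
  defaultSayVoice: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  // Assumed precedence: per-voice mapping, config default, env fallback.
  const candidates = [entry?.sayVoice, defaultSayVoice, env.PAI_SAY_VOICE];
  // Blank strings count as "not set" so they fall through to the next source.
  return candidates.find((v) => v !== undefined && v.trim() !== "");
}
```

When this returns undefined, the -v flag is simply omitted so say speaks with whatever voice is selected in System Settings.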

✅ Benefits

  • Free — no ElevenLabs usage/credits.
  • Private — notification text never leaves the machine.
  • Resilient — no API key or quota dependency, and no network round-trip (lower latency).
  • Non-breaking — default stays ElevenLabs; existing configs and all /notify callers are unaffected.

🧪 How Has This Been Tested?

  • voice_engine="say" speaks via /notify through handleVoiceRequest with no API key set (both the system voice and a named "Samantha" voice were verified audibly)
  • Default (unset) still resolves to elevenlabs — confirmed via voiceHealth()
  • PAI_VOICE_ENGINE=say environment selection works
  • Unrecognized engine value logs a warning and falls back to elevenlabs
  • Per-voice volume preserved (say → AIFF → afplay -v); pronunciation preprocessing applied on both engines

📊 Types of Changes

  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

✅ Checklist

  • My code follows the PAI code style
  • I have tested this change thoroughly
  • This change is backward compatible
  • No shell interpolation: say is invoked via spawn() with an argument array, and -- terminates option parsing so message text can never be treated as flags (consistent with the execSync→execFileSync hardening in #1046, "security: replace execSync with execFileSync in tab-setter.ts")
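For illustration, a sketch of the shell-free say → AIFF → afplay -v pipeline from the checklist and test notes, using Node's child_process.spawn with an argument array. buildSayArgs, run, and speakWithSaySketch are hypothetical names; the caller is assumed to supply a unique temporary .aiff path and the per-voice volume value.

```typescript
import { spawn } from "node:child_process";
import { rm } from "node:fs/promises";

// Each argument is its own array element (no shell), and the message goes
// last after "--", so text such as "-v Alex" is spoken rather than parsed.
function buildSayArgs(
  text: string,
  outFile: string,
  voice?: string,
  rate?: number,
): string[] {
  const args: string[] = [];
  if (voice) args.push("-v", voice); // omitted: macOS system voice
  if (rate !== undefined) args.push("-r", String(rate)); // words per minute
  args.push("-o", outFile); // render to a file instead of the speakers
  args.push("--", text); // end of options; the rest is message text
  return args;
}

// Run a binary directly and resolve only on a zero exit code.
function run(cmd: string, args: string[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: "ignore" });
    child.on("error", reject);
    child.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`${cmd} exited with code ${code}`));
    });
  });
}

// Render with say, then play with afplay so per-voice volume still applies.
async function speakWithSaySketch(
  text: string,
  outFile: string,
  volume: number,
  voice?: string,
  rate?: number,
): Promise<void> {
  try {
    await run("/usr/bin/say", buildSayArgs(text, outFile, voice, rate));
    await run("/usr/bin/afplay", ["-v", String(volume), outFile]);
  } finally {
    await rm(outFile, { force: true }); // remove the temp file even on failure
  }
}
```

Keeping argument construction in a pure function means the injection-safety property (message text always after "--") can be unit-tested without a Mac.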

🖥️ Platform

macOS only (say is a macOS binary). On non-macOS hosts the engine should be left at the elevenlabs default; selecting say there fails gracefully through the existing voice-error path.

… no API key)

Adds an opt-in `voice_engine` config ("elevenlabs" | "say") to the Pulse voice
module. Setting it to "say" routes all TTS through the native macOS `say` binary
instead of the ElevenLabs cloud API.

Why:
- Cost: ElevenLabs usage is metered; `say` is free.
- Privacy: notification text currently POSTs to api.elevenlabs.io. With the
  `say` engine nothing leaves the machine — fully offline.
- Resilience: no API key or quota dependency.

Behavior:
- Default is unchanged ("elevenlabs") — fully non-breaking.
- Selectable via PULSE.toml [voice] voice_engine, or the PAI_VOICE_ENGINE env var.
- Per-voice macOS voice via daidentity.voices.<name>.sayVoice, or a
  default_say_voice / PAI_SAY_VOICE fallback; omit for the system voice.
- ElevenLabs `speed` maps to a `say` words-per-minute rate (clamped 100-320).
- Pronunciation preprocessing and per-voice volume both still apply.
- The /notify HTTP contract is unchanged, so existing callers need no edits.
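A sketch of the speed-to-rate mapping, assuming the ElevenLabs speed value is a plain multiplier applied to a 175 wpm baseline. The baseline and the name speedToSayRate are assumptions; only the 100-320 clamp comes from this description.

```typescript
// Assumed baseline rate for speed = 1.0 (not taken from the PR).
const BASE_WPM = 175;
// Clamp bounds stated in the commit message.
const MIN_WPM = 100;
const MAX_WPM = 320;

// Hypothetical helper: ElevenLabs-style speed multiplier → `say -r` value.
function speedToSayRate(speed: number | undefined): number {
  // Missing, non-finite, or non-positive speeds fall back to the baseline.
  const s = speed !== undefined && Number.isFinite(speed) && speed > 0 ? speed : 1;
  const wpm = Math.round(BASE_WPM * s);
  return Math.min(MAX_WPM, Math.max(MIN_WPM, wpm));
}
```

Clamping keeps an extreme speed value from producing speech that is too slow to be useful or too fast to follow.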

Security: `say` is invoked via spawn() with an argument array (no shell), and
`--` terminates option parsing so message text can never be treated as flags.

Testing: drove /notify through handleVoiceRequest with voice_engine="say" and
confirmed audio plays with no API key; verified default stays "elevenlabs" and
an unrecognized engine value warns and falls back. macOS only.