What It Does
**sag** is a CLI tool that brings ElevenLabs text-to-speech to your terminal with a UX inspired by macOS's built-in `say` command. Powered by ElevenLabs' latest models, including the expressive `eleven_v3`, it supports local audio playback, multiple voices, emotion/delivery tags, and fine-grained pronunciation control. Install it once and generate high-quality spoken audio from any text, script, or AI agent response.
Key Features
- Multiple ElevenLabs Models — Choose between `eleven_v3` (expressive, default), `eleven_multilingual_v2` (stable, multilingual), and `eleven_flash_v2_5` (fast) to balance quality, speed, and language coverage for each use case.
- Expressive Audio Tags (v3) — Embed delivery cues directly in your text using tags like `[whispers]`, `[shouts]`, `[laughs]`, `[excited]`, `[sarcastic]`, and more. Pause control uses `[pause]`, `[short pause]`, and `[long pause]` instead of SSML.
- Voice Selection & Listing — Specify any ElevenLabs voice by name or ID with the `-v` flag, set a default via `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID`, and browse available voices with `sag voices`.
- Pronunciation & Normalization Controls — Fix mispronunciations by respelling words, using hyphens, or adjusting casing. The `--normalize auto|off` flag handles numbers, units, and URLs, while `--lang` guides language-specific normalization.
- Model-Specific Prompting Tips — Run `sag prompting` for guidance on how to phrase and format text to get the best results from the currently selected model.
- Output to File — Save generated audio directly to disk with the `-o` flag (e.g., `-o /tmp/reply.mp3`), making it easy to attach audio files to AI agent responses or downstream workflows.
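When calling `sag` from a script or agent, it can help to build the argument list in code rather than as a shell string, so that v3 tags like `[whispers]` and apostrophes in the text are never reinterpreted by the shell. The following is a minimal Python sketch under stated assumptions: `build_sag_command` is a hypothetical helper, it uses only the flags named above (`-v`, `-o`, `--normalize`, `--lang`), and placing options before the text argument is inferred from the example invocation in this README, not from `sag`'s own help output.

```python
import shlex

# Values documented for --normalize in this README.
NORMALIZE_MODES = {"auto", "off"}


def build_sag_command(text, voice=None, output=None, normalize=None, lang=None):
    """Return an argv list for one `sag` invocation (hypothetical helper).

    Only flags named in this README are used: -v, -o, --normalize, --lang.
    Putting options before the text is an assumption based on its examples.
    """
    if not text:
        raise ValueError("text must be non-empty")
    cmd = ["sag"]
    if voice:
        cmd += ["-v", voice]            # voice name or ID
    if output:
        cmd += ["-o", output]           # save audio to this path
    if normalize is not None:
        if normalize not in NORMALIZE_MODES:
            raise ValueError(f"normalize must be one of {sorted(NORMALIZE_MODES)}")
        cmd += ["--normalize", normalize]
    if lang:
        cmd += ["--lang", lang]         # language hint, e.g. "de"
    cmd.append(text)                    # v3 tags such as [whispers] stay inline
    return cmd


if __name__ == "__main__":
    cmd = build_sag_command(
        "[whispers] Keep this quiet. [short pause] Okay?",
        voice="Clawd",
        output="/tmp/reply.mp3",
        normalize="auto",
    )
    # Print a shell-safe rendering of the command for inspection.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would perform the actual call; building the list separately keeps the flag logic testable without an API key or network access.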
Requirements
- **ElevenLabs API Key** *(required)* — Powers all text-to-speech generation. Set as `ELEVENLABS_API_KEY` (preferred) or `SAG_API_KEY`.
- **Default Voice** *(optional)* — Set `ELEVENLABS_VOICE_ID` or `SAG_VOICE_ID` to avoid specifying `-v` on every call.
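Because `sag` needs an API key and can optionally take a default voice from the environment, a wrapper (for example, an agent tool) may want to check these variables up front and fail with a clear message. This is a hypothetical pre-flight check in Python, not part of `sag`: it follows the stated preference for `ELEVENLABS_API_KEY` over `SAG_API_KEY`, while checking `ELEVENLABS_VOICE_ID` before `SAG_VOICE_ID` is an assumption, since the README does not say which one wins.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_sag_env(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (api_key, default_voice_id) for a hypothetical sag wrapper.

    API key: ELEVENLABS_API_KEY is preferred, SAG_API_KEY is the fallback.
    Voice ID: ELEVENLABS_VOICE_ID before SAG_VOICE_ID (assumed order).
    """
    api_key = env.get("ELEVENLABS_API_KEY") or env.get("SAG_API_KEY")
    if not api_key:
        raise RuntimeError("Set ELEVENLABS_API_KEY (or SAG_API_KEY) before calling sag")
    voice_id = env.get("ELEVENLABS_VOICE_ID") or env.get("SAG_VOICE_ID")
    # voice_id may be None: the caller must then pass -v on every call.
    return api_key, voice_id
```

If `voice_id` comes back as `None`, the wrapper should pass `-v` explicitly rather than rely on a default.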
Use Cases
- AI agent voice replies — An AI agent generates a spoken response with a specific character — e.g., `sag -v Clawd -o /tmp/reply.mp3 "[excited] Here's what I found!"` — then includes the file path in its reply for immediate playback.
- Scripted TTS narration — Feed text or document content to `sag` in a shell script to produce narrated audio files in bulk, leveraging `--normalize auto` to handle numbers and URLs cleanly.
- Voice prototyping for content creators — Quickly audition different ElevenLabs voices and delivery styles (`[whispers]`, `[sarcastic]`, `[sings]`) before committing to a production voice-over, all from the terminal.
- Multilingual audio generation — Use `eleven_multilingual_v2` with `--lang de|fr|es|...` to generate correctly normalized TTS in languages other than English, suitable for localized content pipelines.
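The scripted-narration and multilingual use cases can be combined in a small batch driver. This Python sketch is illustrative only: `narrate_all` and its parameters are hypothetical, each call uses the documented `-v`, `-o`, `--normalize auto`, and `--lang` flags, and because the README does not name the flag that selects `eleven_multilingual_v2`, model selection is left out and must be configured however `sag` expects. With `dry_run=True` (the default) it only returns the commands, so output paths and flags can be reviewed before any API calls are made.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def narrate_all(scripts: Dict[str, str], out_dir: str, voice: str,
                lang: Optional[str] = None, dry_run: bool = True) -> List[List[str]]:
    """Build, and optionally run, one `sag` call per script (hypothetical helper).

    `scripts` maps an output name (without extension) to the text to speak.
    Model selection is not handled: this README does not name the flag for it.
    """
    out = Path(out_dir)
    commands: List[List[str]] = []
    for name, text in scripts.items():
        cmd = ["sag", "-v", voice, "-o", str(out / f"{name}.mp3"),
               "--normalize", "auto"]   # speak numbers, units, URLs cleanly
        if lang:
            cmd += ["--lang", lang]     # language hint for normalization
        cmd.append(text)
        commands.append(cmd)

    if not dry_run:
        if shutil.which("sag") is None:
            raise FileNotFoundError("sag is not on PATH")
        out.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises on the first failed call
    return commands
```

For example, `narrate_all({"intro": "Willkommen!", "outro": "Bis bald."}, "/tmp/narration", voice="Clawd", lang="de")` would return two commands writing `/tmp/narration/intro.mp3` and `/tmp/narration/outro.mp3`; calling it again with `dry_run=False` would run them in order.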