Voice Transcribe
Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).
clawhub install voice-transcribe
Text-to-Speech (TTS): convert text into natural-sounding speech.
# Install Skill
npx skills add noizai/skills@tts
# Claude Code will auto-detect and use it after installation
# Same install command; works with all SKILL.md-compatible AI coding tools
npx skills add noizai/skills@tts
speak is the default subcommand, so it can be omitted:
# Basic usage (speak is implicit)
python3 skills/tts/scripts/tts.py -t "Hello world" # add -o path to save
python3 skills/tts/scripts/tts.py -f article.txt -o out.mp3
# Voice cloning — local file path or URL
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio ./ref.wav
python3 skills/tts/scripts/tts.py -t "Hello" --ref-audio https://example.com/my_voice.wav -o clone.wav
# Voice message format
python3 skills/tts/scripts/tts.py -t "Hello" --format opus -o voice.opus
python3 skills/tts/scripts/tts.py -t "Hello" --format ogg -o voice.ogg
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt
python3 skills/tts/scripts/tts.py to-srt -i article.txt -o article.srt --cps 15 --gap 500
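To make the effect of the --cps and --gap flags in the to-srt commands concrete, here is a minimal Python sketch of one way cue timing could be derived from them. It is not the actual to-srt implementation: it assumes each cue lasts len(text) / cps seconds, that --gap is a pause in milliseconds between cues, and that each non-empty input line becomes one cue. The function names and the gap default of 0 are hypothetical.

```python
def format_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(lines: list[str], cps: float = 4.0, gap_ms: int = 0) -> str:
    """Give each non-empty line one cue whose length scales with its text.

    Assumption: duration = characters / cps, and gap_ms of silence follows
    every cue. The real tool may compute timing differently.
    """
    cues = []
    start = 0
    index = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines produce no cue
        index += 1
        duration_ms = round(len(text) / cps * 1000)
        end = start + duration_ms
        cues.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
        start = end + gap_ms
    return "\n".join(cues)
```

Under these assumptions, a higher --cps shortens every cue, which is why a larger value suits English text with many short characters per spoken second.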
--cps = characters per second (default 4, good for Chinese; ~15 for English). The agent can also write SRT manually.
segments keys support a single index "3" or a range "5-8".
{
"default": { "voice": "zf_xiaoni", "lang": "cmn" },
"segments": {
"1": { "voice": "zm_yunxi" },
"5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
}
}
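The segment keys in the voice map above can be read as follows. This is a minimal sketch, not the tool's own code: it assumes ranges are inclusive, that indices match SRT cue numbers, and that a matching segment entry is overlaid on top of the default entry. The helper name resolve_segment is hypothetical.

```python
import json


def resolve_segment(voice_map: dict, index: int) -> dict:
    """Return the default settings overlaid with any matching segment entry."""
    settings = dict(voice_map.get("default", {}))
    for key, override in voice_map.get("segments", {}).items():
        if "-" in key:
            # Range key such as "5-8" (assumed inclusive on both ends)
            first, last = (int(part) for part in key.split("-", 1))
        else:
            # Single-index key such as "3"
            first = last = int(key)
        if first <= index <= last:
            settings.update(override)
    return settings


voice_map = json.loads("""
{
  "default": { "voice": "zf_xiaoni", "lang": "cmn" },
  "segments": {
    "1": { "voice": "zm_yunxi" },
    "5-8": { "voice": "af_sarah", "lang": "en-us", "speed": 0.9 }
  }
}
""")

print(resolve_segment(voice_map, 1))  # {'voice': 'zm_yunxi', 'lang': 'cmn'}
print(resolve_segment(voice_map, 6))  # {'voice': 'af_sarah', 'lang': 'en-us', 'speed': 0.9}
print(resolve_segment(voice_map, 3))  # {'voice': 'zf_xiaoni', 'lang': 'cmn'}
```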
emo, reference_audio support). reference_audio can be a local path or a URL (user's own audio; Noiz only):
{
"default": { "voice_id": "voice_123", "target_lang": "zh" },
"segments": {
"1": { "voice_id": "voice_host", "emo": { "Joy": 0.6 } },
"2-4": { "reference_audio": "./refs/guest.wav" }
}
}
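Because reference_audio in the map above may be either a local path or a URL, it can help to check a voice map before rendering. The sketch below is one possible pre-flight check, assuming that only http and https references count as URLs and that everything else is a local file path. Both function names are hypothetical and are not part of tts.py.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_reference(reference: str) -> str:
    """Return "url" for http(s) references, otherwise "path"."""
    scheme = urlparse(reference).scheme.lower()
    return "url" if scheme in ("http", "https") else "path"


def missing_local_references(voice_map: dict) -> list[str]:
    """List local reference_audio files named in the map that do not exist."""
    missing = []
    entries = [voice_map.get("default", {}), *voice_map.get("segments", {}).values()]
    for entry in entries:
        reference = entry.get("reference_audio")
        if not reference:
            continue
        # URLs are skipped here; only local files can be checked on disk
        if classify_reference(reference) == "path" and not Path(reference).is_file():
            missing.append(reference)
    return missing
```

Running missing_local_references on a loaded voice map before a long render surfaces a mistyped path early instead of partway through synthesis.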
--ref-audio-track argument instead of setting reference_audio in the map:
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --ref-audio-track original_video.mp4 -o output.wav
examples/ for full samples.
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json -o output.wav
python3 skills/tts/scripts/tts.py render --srt input.srt --voice-map vm.json --backend noiz --auto-emotion -o output.wav
"v1:60,v2:40") | Kokoro |
| Voice cloning from reference audio | Noiz |
| Emotion control (emo param) | Noiz |
| Exact server-side duration per segment | Noiz |

When no API key is set, tts.py automatically falls back to guest mode, a limited Noiz endpoint that requires no authentication. Guest mode only supports --voice-id, --speed, and --format; voice cloning, emotion, duration, and timeline rendering are not available.
# Guest mode (auto-detected when no API key is set)
python3 skills/tts/scripts/tts.py -t "Hello" --voice-id 883b6b7c -o hello.wav
# Explicit backend override to use kokoro instead
python3 skills/tts/scripts/tts.py -t "Hello" --backend kokoro
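The guest-mode fallback can be pictured as a small key lookup. The sketch below is an illustration rather than the logic inside tts.py: it uses the NOIZ_API_KEY environment variable and the ~/.config/noiz/api_key file that this document names, but the order in which they are checked (environment first) and the function names are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_api_key(
    env: Optional[Mapping[str, str]] = None,
    key_file: Optional[Path] = None,
) -> Optional[str]:
    """Look for a Noiz API key in the environment, then in the key file."""
    env = os.environ if env is None else env
    if key_file is None:
        key_file = Path.home() / ".config" / "noiz" / "api_key"
    # Assumption: the environment variable takes precedence over the file
    key = env.get("NOIZ_API_KEY", "").strip()
    if key:
        return key
    if key_file.is_file():
        return key_file.read_text(encoding="utf-8").strip() or None
    return None


def pick_mode(api_key: Optional[str]) -> str:
    """Return "noiz" when a key is available, otherwise "guest"."""
    return "noiz" if api_key else "guest"
```

Passing env and key_file explicitly keeps the lookup easy to exercise without touching the real home directory.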
| 063a4491 | 販売員(なおみ) | ja | F | Joyful |
| 4252b9c8 | 落ち着いた女性 | ja | F | Calm |
| 578b4be2 | 熱血漢(たける) | ja | M | Angry |
| a9249ce7 | 安らぎ(みなと) | ja | M | Calm |
| f00e45a1 | 旅人(かいと) | ja | M | Calm |
| b4775100 | 悦悦|社交分享 | zh | F | Joyful |
| 77e15f2c | 婉青|情绪抚慰 | zh | F | Calm |
| ac09aeb4 | 阿豪|磁性主持 | zh | M | Calm |
| 87cb2405 | 建国|知识科普 | zh | M | Calm |
| 3b9f1e27 | 小明|科技达人 | zh | M | Joyful |
| 95814add | Science Narration | en | M | Calm |
| 883b6b7c | The Mentor (Alex) | en | M | Joyful |
| a845c7de | The Naturalist (Silas) | en | M | Calm |
| 5a68d66b | The Healer (Serena) | en | F | Calm |
| 0e4ab6ec | The Mentor (Maya) | en | F | Calm |

With config --set-api-key, the key is saved to ~/.config/noiz/api_key (permissions 0600). The NOIZ_API_KEY environment variable is also supported as an alternative.
If ~/.noiz_api_key exists and ~/.config/noiz/api_key does not, the key is copied (not deleted) to the new location. A message is printed; the old file is left untouched for you to remove manually.
https://noiz.ai/v1/ for synthesis. No data is sent unless you invoke a Noiz command.
When --ref-audio is a URL, the file is downloaded to a temp file, used for the API call, then deleted. If no voice-id or ref-audio is provided, a default reference audio is downloaded from storage.googleapis.com or noiz.ai.
render mode to assemble the final audio.
~/.config/noiz/ are modified. The Kokoro backend runs entirely offline with no network access.
ffmpeg in PATH (timeline mode only)
requests package: uv pip install requests (required for Noiz backend)
python3 skills/tts/scripts/tts.py config --set-api-key YOUR_KEY (guest mode works without a key but has limited features)
--backend kokoro to use the local backend
Authorization, with no prefix (e.g. no APIKEY or Bearer). Any prefix causes a 401.
# Advanced usage example: batch-convert multiple text files
```bash
# Quote "$file" so that filenames containing spaces are handled correctly
for file in *.txt; do
  npx skills run tts --text "$(cat "$file")" --language zh-CN --voice female --speed 1.2 --output "${file%.txt}.mp3"
done
```

Create chapters, highlights, and show notes from podcast audio or transcripts. Use when a user wants chapter markers, highlight clips, or show-note drafts without publishing or distribution actions.
clawhub install podcast-chaptering-highlights