Voice Transcribe
Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).
clawhub install voice-transcribe

Use Videoagent Audio Studio for audio editing and processing.
# Install Skill
npx skills add pexoai/pexo-skills@videoagent-audio-studio
# Claude Code will auto-detect and use it after installation

# Same install command; works with all SKILL.md-compatible AI coding tools
npx skills add pexoai/pexo-skills@videoagent-audio-studio
| Use case | Model | Typical time |
|---|---|---|
| Text-to-speech (narration) | elevenlabs-tts-v3 | ~3s |
| Low-latency TTS (real-time) | elevenlabs-tts-turbo | <1s |
| Background music | cassetteai-music | ~15s |
| Sound effect | elevenlabs-sfx | ~5s |
| Clone a voice from audio | elevenlabs-voice-clone | ~10s |

bash {baseDir}/tools/start_server.sh
Use MCP tool: text_to_speech
text: "<the text to narrate>"
voice_id: "JBFqnCBsd6RMkjVDRZzb" # Default: "George" (professional, neutral)
model_id: "eleven_multilingual_v2" # Use "eleven_turbo_v2_5" for low latency
Use MCP tool: text_to_sound_effects (via cassetteai-music on fal.ai)
prompt: "<music description, e.g. 'upbeat lo-fi hip hop, 90 seconds'>"
duration_seconds: <duration>
Use MCP tool: text_to_sound_effects
text: "<sound description>"
duration_seconds: <1-22>
Use MCP tool: voice_add
name: "<voice name>"
files: ["<audio_file_url>"]
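The tool calls above each consist of a tool name plus a small set of parameters. As a rough sketch (not the skill's actual client code), those parameter sets could be assembled and sanity-checked in Python before being passed to an MCP client. The helper names `tts_args` and `sfx_args` are hypothetical; the voice ID, model IDs, and the 1-22 second range are taken from the text above.

```python
DEFAULT_VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"    # "George", the default voice listed above
DEFAULT_MODEL_ID = "eleven_multilingual_v2"  # default model from the text
LOW_LATENCY_MODEL_ID = "eleven_turbo_v2_5"   # suggested above for low latency


def tts_args(text, low_latency=False, voice_id=DEFAULT_VOICE_ID):
    """Build parameters for the text_to_speech tool (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    model_id = LOW_LATENCY_MODEL_ID if low_latency else DEFAULT_MODEL_ID
    return {"text": text, "voice_id": voice_id, "model_id": model_id}


def sfx_args(text, duration_seconds):
    """Build parameters for text_to_sound_effects (hypothetical helper)."""
    # The document gives 1-22 seconds as the allowed duration range.
    if not 1 <= duration_seconds <= 22:
        raise ValueError("duration_seconds must be between 1 and 22")
    return {"text": text, "duration_seconds": duration_seconds}


print(tts_args("Welcome to our product launch"))
print(sfx_args("a hydraulic door hiss", 3))
```

Rejecting an out-of-range duration locally avoids a wasted round trip to the service; whether the server rejects or clamps such values is not stated in this document.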
→ Route to: text_to_speech
text: "Welcome to our product launch"
voice_id: "JBFqnCBsd6RMkjVDRZzb"
model_id: "eleven_multilingual_v2"
→ Route to: cassetteai-music (fal.ai)
prompt: "relaxing lo-fi background music for a podcast, gentle piano and soft beats, 60 seconds"
duration_seconds: 60
→ Route to: text_to_sound_effects
text: "a futuristic sci-fi door sliding open with a hydraulic hiss"
duration_seconds: 3
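The routing examples above amount to a lookup from the kind of request to a model. A minimal sketch of that lookup follows, assuming the request has already been classified. The category labels (`narration`, `realtime_tts`, and so on) and the `route` function are hypothetical; the model names and typical times come from the model table earlier in this document.

```python
# Category labels are hypothetical; models and timings are from the table above.
ROUTES = {
    "narration": ("elevenlabs-tts-v3", "~3s"),
    "realtime_tts": ("elevenlabs-tts-turbo", "<1s"),
    "music": ("cassetteai-music", "~15s"),
    "sound_effect": ("elevenlabs-sfx", "~5s"),
    "voice_clone": ("elevenlabs-voice-clone", "~10s"),
}


def route(category):
    """Return the model and typical time for a request category."""
    try:
        model, typical_time = ROUTES[category]
    except KeyError:
        raise ValueError(f"unknown request category: {category!r}") from None
    return {"model": model, "typical_time": typical_time}


print(route("music"))
```

Keeping the mapping in one table makes it easy to swap a model without touching the calling code. How the skill actually classifies a free-form request is not covered here.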
Set ELEVENLABS_API_KEY in ~/.openclaw/openclaw.json:
{
"skills": {
"entries": {
"videoagent-audio-studio": {
"enabled": true,
"env": {
"ELEVENLABS_API_KEY": "your_elevenlabs_key_here"
}
}
}
}
}

FAL_KEY is optional. To set it, add it to the same env block:
"FAL_KEY": "your_fal_key_here"
cli.js connects to a hosted proxy by default. If you want full control, or need to serve users in regions where vercel.app is blocked, you can deploy your own instance from the proxy/ directory.

cd proxy
npm install
vercel --prod
| Variable | Purpose | Source or value |
|---|---|---|
| ELEVENLABS_API_KEY | TTS, SFX, Voice Clone | [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys) |
| FAL_KEY | Music generation | [fal.ai/dashboard/keys](https://fal.ai/dashboard/keys) |
| VALID_PRO_KEYS | (Optional) Restrict access | Comma-separated list of allowed client keys |

export AUDIOMIND_PROXY_URL="https://your-domain.com/api/audio"
Alternatively, set AUDIOMIND_PROXY_URL in ~/.openclaw/openclaw.json:
{
"skills": {
"entries": {
"videoagent-audio-studio": {
"env": {
"AUDIOMIND_PROXY_URL": "https://your-domain.com/api/audio"
}
}
}
}
}
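The settings above live in one place in the config file: the `env` block under `skills.entries.videoagent-audio-studio`. The sketch below shows how a script might check that the required key is present and work out which proxy URL applies. Several things here are assumptions rather than documented behavior: the function names are hypothetical, the config is treated as already parsed plain JSON, and the shell variable is assumed to take precedence over the config file.

```python
import os

SKILL_NAME = "videoagent-audio-studio"


def skill_env(config):
    """Return the skill's env mapping from a parsed openclaw.json, or {}."""
    entries = config.get("skills", {}).get("entries", {})
    return entries.get(SKILL_NAME, {}).get("env", {})


def missing_keys(config, required=("ELEVENLABS_API_KEY",)):
    """List required keys that are absent or empty in the skill's env block."""
    env = skill_env(config)
    return [key for key in required if not env.get(key)]


def resolve_proxy_url(config, environ=None):
    """Pick the proxy URL: shell variable first (an assumption), then config.

    Returns None when neither is set, meaning the hosted default applies.
    """
    environ = os.environ if environ is None else environ
    url = environ.get("AUDIOMIND_PROXY_URL") or skill_env(config).get("AUDIOMIND_PROXY_URL")
    return url.rstrip("/") if url else None


config = {
    "skills": {"entries": {SKILL_NAME: {"env": {
        "ELEVENLABS_API_KEY": "your_elevenlabs_key_here",
        "AUDIOMIND_PROXY_URL": "https://your-domain.com/api/audio",
    }}}}
}
print(missing_keys(config))
print(resolve_proxy_url(config, environ={}))
```

To run this against a real file, load it with `json.loads(Path("~/.openclaw/openclaw.json").expanduser().read_text())`. If the file contains comments or trailing commas, a stricter JSON parser will reject it.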
| Model | Type | Provider | Notes |
|---|---|---|---|
| eleven_multilingual_v2 | TTS | ElevenLabs | Best quality, supports 29 languages |
| eleven_turbo_v2_5 | TTS | ElevenLabs | Ultra-low latency, ideal for real-time |
| eleven_monolingual_v1 | TTS | ElevenLabs | English only, fastest |
| cassetteai-music | Music | fal.ai | Reliable, fast music generation |
| elevenlabs-sfx | SFX | ElevenLabs | High-quality sound effects (up to 22s) |
| elevenlabs-voice-clone | Clone | ElevenLabs | Clone any voice from a short audio sample |

ELEVENLABS_API_KEY is all you need to get started. FAL_KEY is now optional.

Music generation uses cassetteai-music by default, which completes synchronously.

cassetteai-music is available as a stable alternative for music generation.

npx skills run videoagent-audio-studio --input input.mp3 --normalize --denoise --output output.mp3

Create chapters, highlights, and show notes from podcast audio or transcripts. Use when a user wants chapter markers, highlight clips, or show-note drafts without publishing or distribution actions.
clawhub install podcast-chaptering-highlights