OpenAI Whisper
OpenAI Whisper: use the Whisper model to recognize and transcribe speech in videos.
clawhub install openai-whisper
Videoagent Video Studio: professional video editing and processing.
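# Install the skill
npx skills add pexoai/pexo-skills@videoagent-video-studio
# Once installed, Claude Code detects and uses it automatically
# The same install command works with any AI coding tool that supports SKILL.md
npx skills add pexoai/pexo-skills@videoagent-video-studio
Requires a Skills.sh account; supports local, offline use.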
| Request | Mode | Duration |
|---|---|---|
| | text-to-video | 4–10 s |
| "Animate this image" / "Make this move" | image-to-video | 4–6 s |
| "Turn this into a video with..." | image-to-video | 4–6 s |
| Cinematic, story, ad | Prefer text-to-video with detailed prompt | 5–10 s |

Models (--model <id>):

| Model | Text-to-video | Image-to-video | Reference | Notes |
|---|---|---|---|---|
| minimax | ✅ | ✅ | ✅ | Subject reference image, character consistency |
| kling | ✅ | ✅ | ✅ | Multi-element / character / keyframe (O3) |
| veo | ✅ | ✅ | ✅ | Google Veo 3.1, multiple reference images |
| hunyuan | ✅ | — | ✅ | Video-to-video style transfer |
| pixverse | — | ✅ | — | Stylized image-to-video |
| grok | ✅ | ✅ | ✅ | Video editing via reference video |
| seedance | ✅ | ✅ | ✅ | Seedance 1.5 Pro, synchronized audio, 4–12 s |

node {baseDir}/tools/generate.js \
--mode text-to-video \
--prompt "<enhanced prompt>" \
--duration <seconds> \
--aspect-ratio <ratio>
node {baseDir}/tools/generate.js \
--mode image-to-video \
--prompt "<motion description>" \
--image-url "<public image URL>" \
--duration <seconds> \
--aspect-ratio <ratio>
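The request table and the model table above can be turned into a small pre-flight check before calling generate.js. This is a hypothetical Python sketch, not part of the skill: `choose_mode`, `check_model`, and `clamp_duration` are illustrative names, and it assumes the first two check columns of the model table mean text-to-video and image-to-video support.

```python
from typing import Optional

# Transcribed from the model table. ASSUMPTION: the first two check
# columns mean text-to-video and image-to-video support respectively.
MODEL_MODES = {
    "minimax": {"text-to-video", "image-to-video"},
    "kling": {"text-to-video", "image-to-video"},
    "veo": {"text-to-video", "image-to-video"},
    "hunyuan": {"text-to-video"},
    "pixverse": {"image-to-video"},
    "grok": {"text-to-video", "image-to-video"},
    "seedance": {"text-to-video", "image-to-video"},
}

# Typical lengths in seconds per mode, taken from the request table.
DURATION_RANGE = {"text-to-video": (4, 10), "image-to-video": (4, 6)}


def choose_mode(image_url: Optional[str]) -> str:
    """Hypothetical helper: a supplied image means image-to-video."""
    return "image-to-video" if image_url else "text-to-video"


def check_model(model: str, mode: str) -> None:
    """Raise ValueError if a model from the table lacks support for mode.

    "auto" and models missing from the table are passed through, because
    the proxy decides (--list-models shows what it actually offers).
    """
    if model == "auto" or model not in MODEL_MODES:
        return
    if mode not in MODEL_MODES[model]:
        raise ValueError(f"{model!r} is not marked for {mode}")


def clamp_duration(mode: str, seconds: int) -> int:
    """Pull a requested length into the typical range for the mode."""
    low, high = DURATION_RANGE[mode]
    return max(low, min(high, seconds))


if __name__ == "__main__":
    mode = choose_mode("https://example.com/still.png")
    check_model("pixverse", mode)  # pixverse is image-to-video only: OK
    print(mode, clamp_duration(mode, 10))  # image-to-video 6
```

Passing unknown models through, rather than rejecting them, keeps the check from going stale if the proxy adds models that the table does not list.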
| Option | Default | Description |
|---|---|---|
| --mode | text-to-video | text-to-video or image-to-video |
| --prompt | *(required)* | Scene or motion description |
| --image-url | — | Required for image-to-video; public image URL |
| --duration | 5 | Length in seconds (typically 4–10) |
| --aspect-ratio | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 |
| --model | auto | Model ID (e.g. kling, veo, grok, seedance); auto lets the proxy pick |

| Command | Description |
|---|---|
| node {baseDir}/tools/generate.js --list-models | List the models available from the proxy |
| node {baseDir}/tools/generate.js --status --job-id <id> | Check the status of an async job |

Example output:

{
"success": true,
"mode": "text-to-video",
"videoUrl": "https://...",
"duration": 5,
"aspectRatio": "16:9"
}
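A caller that captures the tool's stdout can pull out videoUrl from the JSON shape shown above. The Python sketch below is illustrative only: `video_url_from_output` is a hypothetical name, and it assumes a failed run reports "success": false, since the error format is not documented here.

```python
import json


def video_url_from_output(raw: str) -> str:
    """Extract videoUrl from the tool's JSON result (shape as shown above)."""
    result = json.loads(raw)
    if not result.get("success"):
        # ASSUMPTION: failures set "success": false; other fields unknown.
        raise RuntimeError(f"generation failed: {raw.strip()}")
    url = result.get("videoUrl")
    if not url:
        # ASSUMPTION: a missing URL may mean an async job; check --status.
        raise RuntimeError("no videoUrl in result")
    return url


if __name__ == "__main__":
    sample = """{
      "success": true,
      "mode": "text-to-video",
      "videoUrl": "https://example.com/out.mp4",
      "duration": 5,
      "aspectRatio": "16:9"
    }"""
    print(video_url_from_output(sample))  # https://example.com/out.mp4
```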
Share the videoUrl with the user.

Examples:

node {baseDir}/tools/generate.js \
--mode text-to-video \
--prompt "A cat walking through rain, wet streets, neon reflections, cinematic lighting, slow motion, 4K" \
--duration 5 \
--aspect-ratio 16:9
node {baseDir}/tools/generate.js \
--mode image-to-video \
--prompt "Gentle clouds moving across the sky, subtle grass movement, cinematic atmosphere" \
--image-url "https://..." \
--duration 5 \
--aspect-ratio 16:9
node {baseDir}/tools/generate.js \
--mode text-to-video \
--prompt "Close-up of coffee pouring into a white cup, slow motion, steam rising, soft lighting, product shot" \
--duration 10 \
--aspect-ratio 9:16
node {baseDir}/tools/generate.js \
--mode text-to-video \
--model veo \
--prompt "A dragon flying through cloudy skies, cinematic lighting, 8s" \
--duration 8 \
--aspect-ratio 16:9
node {baseDir}/tools/generate.js \
--mode image-to-video \
--model grok \
--prompt "Gentle smile, subtle head turn" \
--image-url "https://..." \
--duration 5
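The examples above all follow the option table, so the command line can be assembled programmatically with the same defaults (duration 5, aspect ratio 16:9, model auto). This Python sketch is hypothetical: `build_generate_args` and `status_args` are illustrative names, and `base_dir` stands in for {baseDir}. The returned list can be handed to `subprocess.run`.

```python
import shlex
from typing import List, Optional

# Allowed values from the --aspect-ratio row of the option table.
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def build_generate_args(
    base_dir: str,
    prompt: str,
    mode: str = "text-to-video",
    image_url: Optional[str] = None,
    duration: int = 5,
    aspect_ratio: str = "16:9",
    model: str = "auto",
) -> List[str]:
    """Assemble a generate.js argv using the documented flags and defaults."""
    if mode not in ("text-to-video", "image-to-video"):
        raise ValueError(f"unknown mode {mode!r}")
    if mode == "image-to-video" and not image_url:
        raise ValueError("--image-url is required for image-to-video")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio {aspect_ratio!r}")
    args = ["node", f"{base_dir}/tools/generate.js", "--mode", mode]
    if model != "auto":
        # Omitting --model keeps the default "auto", letting the proxy pick.
        args += ["--model", model]
    args += ["--prompt", prompt]
    if image_url:
        args += ["--image-url", image_url]
    args += ["--duration", str(duration), "--aspect-ratio", aspect_ratio]
    return args


def status_args(base_dir: str, job_id: str) -> List[str]:
    """argv for the documented --status --job-id check."""
    return ["node", f"{base_dir}/tools/generate.js", "--status", "--job-id", job_id]


if __name__ == "__main__":
    argv = build_generate_args(
        "/path/to/skill",  # hypothetical {baseDir}
        "Close-up of coffee pouring into a white cup, slow motion",
        duration=10,
        aspect_ratio="9:16",
    )
    print(shlex.join(argv))
```

Passing a list rather than a single shell string avoids quoting problems with prompts that contain commas, quotes, or spaces.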

Environment variables:

| Variable | Required | Description |
|---|---|---|
| VIDEO_STUDIO_PROXY_URL | No | Proxy base URL |
| VIDEO_STUDIO_TOKEN | No | Auth token, if the proxy requires one |

npx skills run videoagent-video-studio --input input.mp4 --effect fade --duration 2 --output output.mp4

Audio transcription
Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.
clawhub install transcribe
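The "use when receiving audio files" trigger above amounts to an extension check. A minimal Python sketch, assuming routing by file extension only; `should_transcribe` is a hypothetical name and not part of the Transcribe skill.

```python
from pathlib import Path

# Formats named in the Transcribe description; the list may not be
# exhaustive, so treat it as a starting point rather than a hard limit.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".ogg", ".wav", ".webm"}


def should_transcribe(filename: str) -> bool:
    """Return True when the file extension matches a listed audio format."""
    return Path(filename).suffix.lower() in AUDIO_EXTENSIONS


if __name__ == "__main__":
    for name in ["voice-message.ogg", "Memo.M4A", "slides.pdf"]:
        print(name, should_transcribe(name))
```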