A subtitle-focused AI agent skill that automates caption generation, translation, SRT export, and subtitle burning for any video. Powered by the NemoVideo API.
Description
Use NemoSubtitle when the user wants to
add subtitles,
generate captions,
transcribe video to text,
translate subtitles,
burn hardcoded captions, or
export SRT/VTT files from any video or audio. Powered by NemoVideo AI.
NemoSubtitle handles the complete subtitle and caption workflow end-to-end: auto-transcribe speech in any language, translate captions, export SRT / VTT / plain text, and burn styled subtitles directly into the video — no manual timecoding required.
Trigger phrases: add subtitles, generate captions, auto caption, transcribe video, video to text, subtitle translation, burn subtitles, hardcode captions, export SRT, SRT file, VTT file, bilingual subtitles, Chinese subtitles, subtitle generator, caption generator, video transcript, speech to text video, closed captions, open captions, subtitle burner, add captions to video, subtitle video, caption video
Primary use cases:
- Auto-generate subtitles from video/audio (any language, Whisper-powered ASR)
- Translate existing subtitles or auto-generated captions into any language
- Export subtitles as SRT, VTT, or plain text transcript
- Burn (hardcode) subtitles into video with full style control (font, size, position, color, background)
- Add Chinese subtitles to English videos (or vice versa) — bilingual caption overlay
- Clean up auto-generated captions (re-timing, punctuation fix, speaker diarization)
- Generate accessibility captions for social media (YouTube, TikTok, Instagram, Facebook)
Not for: generating new video from text, screen recording, or full video production workflows (see nemo-video skill for general editing).
---
Setup
Base URL: https://mega-api-prod.nemovideo.aiAll requests require:
Authorization: Bearer <NEMOVIDEO_API_KEY>
Content-Type: application/json
Set
NEMOVIDEO_API_KEY in your environment or OpenClaw secrets.
---
Workflow Overview
Upload video/audio → Start subtitle job → Poll status → Get SRT/VTT/transcript
→ Burn subtitles into video
---
API Reference
1. Upload Media
Upload a video or audio file before processing.
POST /v1/upload
Content-Type: multipart/form-data
file=<binary>
Response:
{
"file_id": "f_abc123",
"filename": "interview.mp4",
"duration_seconds": 182,
"size_bytes": 45000000
}
Use
file_id in subsequent requests.
---
2. Generate Subtitles (Transcription)
Transcribe speech and produce time-coded subtitles.
POST /v1/subtitles/generate
Content-Type: application/json
{
"file_id": "f_abc123",
"language": "en",
"output_formats": ["srt", "vtt", "txt"],
"options": {
"punctuate": true,
"max_chars_per_line": 42,
"max_lines": 2
}
}
Parameters:| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
|
file_id | string | ✅ | Uploaded file ID |
|
language | string | ✅ | Source language code (
en,
zh,
ja,
ko,
es,
fr,
de,
auto) — use
auto for auto-detect |
|
output_formats | array | ✅ | One or more of:
srt,
vtt,
txt,
json |
|
options.punctuate | bool | ❌ | Auto-add punctuation (default: true) |
|
options.max_chars_per_line | int | ❌ | Max characters per subtitle line (default: 42) |
|
options.max_lines | int | ❌ | Max lines per subtitle block (default: 2) |
Response:
{
"job_id": "job_gen_789",
"status": "queued",
"estimated_seconds": 45
}
---
3. Translate Subtitles
Translate existing subtitles (from SRT/VTT file or a prior generation job) into a target language.
POST /v1/subtitles/translate
Content-Type: application/json
{
"source": {
"type": "job_id",
"value": "job_gen_789"
},
"target_language": "zh",
"preserve_timing": true,
"output_formats": ["srt"]
}
Or translate from an uploaded SRT file:
{
"source": {
"type": "file_id",
"value": "f_srt_456"
},
"target_language": "zh",
"preserve_timing": true,
"output_formats": ["srt", "vtt"]
}
Parameters:| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
|
source.type | string | ✅ |
job_id or
file_id |
|
source.value | string | ✅ | Reference to subtitle source |
|
target_language | string | ✅ | Target language code |
|
preserve_timing | bool | ❌ | Keep original timestamps (default: true) |
|
output_formats | array | ✅ | Output format(s) |
Supported language codes: en,
zh,
zh-tw,
ja,
ko,
es,
fr,
de,
pt,
ar,
ru,
hi,
vi,
th,
idResponse:
{
"job_id": "job_translate_321",
"status": "queued",
"estimated_seconds": 30
}
---
4. Burn Subtitles Into Video
Hardcode (burn) subtitles permanently into the video frame.
POST /v1/subtitles/burn
Content-Type: application/json
{
"file_id": "f_abc123",
"subtitle_source": {
"type": "job_id",
"value": "job_gen_789"
},
"style": {
"font": "Arial",
"font_size": 28,
"color": "#FFFFFF",
"outline_color": "#000000",
"outline_width": 2,
"position": "bottom",
"margin_bottom": 40
},
"output_format": "mp4"
}
Or burn from an SRT file:
{
"file_id": "f_abc123",
"subtitle_source": {
"type": "file_id",
"value": "f_srt_456"
},
"style": { ... }
}
Style parameters:| Parameter | Default | Description |
|-----------|---------|-------------|
|
font |
"Arial" | Font family |
|
font_size |
28 | Font size in pixels |
|
color |
"#FFFFFF" | Text color (hex) |
|
outline_color |
"#000000" | Outline/shadow color |
|
outline_width |
2 | Outline thickness (0 = no outline) |
|
position |
"bottom" |
"top" or
"bottom" |
|
margin_bottom |
40 | Pixels from bottom edge |
|
background_box |
false | Add semi-transparent background box |
|
background_opacity |
0.5 | Box opacity (0.0–1.0) |
Response:
{
"job_id": "job_burn_654",
"status": "queued",
"estimated_seconds": 90
}
---
5. Bilingual Subtitles (Dual-Language Overlay)
Burn two subtitle tracks simultaneously (e.g., English on top, Chinese below).
POST /v1/subtitles/burn-bilingual
Content-Type: application/json
{
"file_id": "f_abc123",
"primary": {
"subtitle_source": { "type": "job_id", "value": "job_gen_789" },
"style": {
"font_size": 26,
"color": "#FFFF00",
"position": "bottom",
"margin_bottom": 70
}
},
"secondary": {
"subtitle_source": { "type": "job_id", "value": "job_translate_321" },
"style": {
"font_size": 22,
"color": "#FFFFFF",
"position": "bottom",
"margin_bottom": 30
}
},
"output_format": "mp4"
}
Response:
{
"job_id": "job_bilingual_987",
"status": "queued",
"estimated_seconds": 120
}
---
6. Poll Job Status
All subtitle jobs are async. Poll until
status is
completed or
failed.
GET /v1/jobs/{job_id}
Response (in progress):
{
"job_id": "job_gen_789",
"status": "processing",
"progress": 0.62
}
Response (completed):
{
"job_id": "job_gen_789",
"status": "completed",
"outputs": {
"srt": "https://cdn.nemovideo.ai/outputs/job_gen_789/subtitles.srt",
"vtt": "https://cdn.nemovideo.ai/outputs/job_gen_789/subtitles.vtt",
"txt": "https://cdn.nemovideo.ai/outputs/job_gen_789/transcript.txt"
}
}
Response (failed):
{
"job_id": "job_gen_789",
"status": "failed",
"error": "Unsupported audio codec in input file"
}
Polling strategy:
- Check every 3–5 seconds for short files (<60s audio)
- Check every 10–15 seconds for longer files
- Timeout after 10 minutes; surface error to user
---
7. Download Output File
Download SRT, VTT, transcript, or rendered video from the URL in the job outputs.
GET <output_url>
No auth required (CDN URLs are pre-signed, valid for 24 hours).
---
Common Workflows
Workflow A: Generate + Export SRT
1. Upload video → POST /v1/upload → file_id
- Transcribe → POST /v1/subtitles/generate (language: "auto", formats: ["srt"])
- Poll → GET /v1/jobs/{job_id} until completed
- Download SRT from outputs.srt URL
- Deliver SRT file to user
Workflow B: Translate Existing Subtitles
1. Upload SRT file → POST /v1/upload → file_id
- Translate → POST /v1/subtitles/translate (source: file_id, target_language: "zh")
- Poll until completed
- Download translated SRT
Workflow C: Full Pipeline (Video → Chinese Hardcoded Subtitles)
1. Upload video → file_id
- Generate subtitles (language: "en") → job_gen_id
- Translate to Chinese → job_translate_id
- Burn Chinese subtitles into video → job_burn_id
- Poll burn job
- Return final MP4 download URL
Workflow D: Bilingual Video (EN + ZH)
1. Upload video → file_id
- Generate EN subtitles → job_gen_id
- Translate to ZH → job_translate_id
- POST /v1/subtitles/burn-bilingual (primary: EN, secondary: ZH)
- Poll → return MP4
---
Error Handling
| HTTP Code | Meaning | Action |
|-----------|---------|--------|
| 400 | Bad request (missing params, invalid format) | Check request body; surface error to user |
| 401 | Invalid or missing API key | Prompt user to check
NEMOVIDEO_API_KEY |
| 413 | File too large | Advise user: compress video or use a shorter clip |
| 422 | Unsupported format or language | List supported formats/languages to user |
| 429 | Rate limited | Wait 10s; retry with exponential backoff |
| 500/503 | Server error | Retry after 30s; if persistent, report to user |
For
job.status === "failed", always surface
error message to user with a plain explanation.
---
Behavior Notes
- Language auto-detection: Use
language: "auto" when the source language is unknown. Detection adds ~5s to job time.
- Chinese variants: Use
zh for Simplified Chinese, zh-tw for Traditional Chinese.
- SRT vs VTT: SRT is more universally compatible; VTT is needed for web players (HTML5
<track>). When in doubt, offer both.
- Burn timing: Burning subtitles re-encodes the video, which takes longer than SRT export alone. Set user expectations accordingly.
- File limits: Typical limit is 2GB per upload, 4 hours max duration. Very long files should be split first.
- Output expiry: CDN output URLs expire after 24 hours. If the user needs permanent storage, advise downloading promptly.
---
Example Prompts → Actions
| User says | Action |
|-----------|--------|
| "Add subtitles to my video" | Upload → generate (language: auto) → burn → return MP4 |
| "Generate SRT for this video" | Upload → generate (formats: ["srt"]) → return SRT file |
| "Translate my subtitles to Spanish" | Upload SRT → translate (target: "es") → return SRT |
| "Add Chinese subtitles to this English video" | Upload → generate (en) → translate (zh) → burn → return MP4 |
| "Make bilingual English + Chinese subtitles" | Upload → generate (en) → translate (zh) → burn-bilingual → return MP4 |
| "Get the transcript from this podcast" | Upload → generate (formats: ["txt"]) → return transcript |
| "Export captions as VTT" | Upload → generate (formats: ["vtt"]) → return VTT |
| "Burn captions with white text, black outline" | Upload + provide SRT → burn (style: color #FFF, outline #000) → return MP4 |