Speech Translation

Overview

AudioPod AI’s Speech Translation API provides end-to-end speech-to-speech translation that preserves the original speaker’s voice characteristics. It translates spoken content from audio and video files into any of the 21 supported languages while maintaining speaker identity, timing, and natural pronunciation.

Key Features

21 Languages: Translate between major world languages
Voice Cloning: Preserve original speaker voice characteristics
Speaker Separation: Maintain distinct speakers in multi-speaker content
Video Support: Translate video files with audio replacement
Automatic Language Detection: Detect source language automatically
High-Quality Synthesis: Natural-sounding translated speech output

Authentication

All endpoints require an Authorization header that carries either an API key or a JWT token as a Bearer credential:

API Key: Authorization: Bearer your_api_key
JWT Token: Authorization: Bearer your_jwt_token
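
Both credential types use the same header shape, so a single helper can build it. The following is a minimal sketch; the helper name `auth_headers` and the environment variable `AUDIOPOD_API_KEY` are assumptions made for illustration, not names documented by AudioPod.

```python
import os
from typing import Dict, Optional


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build the Authorization header used for both API keys and JWT tokens."""
    # AUDIOPOD_API_KEY is a hypothetical variable name chosen for this sketch.
    token = token or os.environ.get("AUDIOPOD_API_KEY")
    if not token:
        raise ValueError("No API key or JWT token provided")
    return {"Authorization": f"Bearer {token}"}
```

Passing the token explicitly takes precedence over the environment, which keeps the helper easy to use in tests.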

Create Speech Translation Job

Python

from audiopod import Client

client = Client()

# Translate speech from file
translation = client.translation.translate_audio(
    audio_file="english_speech.wav",
    target_language="es",  # Spanish
    source_language="en",  # Optional: auto-detect if not provided
    wait_for_completion=True
)

print("Translation completed!")
print(f"Original language: {translation.source_language}")
print(f"Target language: {translation.target_language}")
print(f"Translated audio URL: {translation.translated_audio_url}")

# Translate video file (supports both audio and video input)
video_translation = client.translation.translate_audio(
    audio_file="presentation.mp4",  # Video files are supported
    target_language="fr",  # French
    source_language="en",
    wait_for_completion=True
)

print("Video translation completed!")
print(f"Translated video URL: {video_translation.video_output_url}")
print(f"Audio-only URL: {video_translation.translated_audio_url}")

# Note: Batch processing supports multiple files to one target language
# For multiple target languages, submit separate jobs for each language

# Translate from URL (YouTube, etc.)
url_translation = client.translation.translate_audio(
    url="https://youtube.com/watch?v=example123",
    target_language="ja",  # Japanese
    source_language="en",
    wait_for_completion=True
)

print("URL translation completed!")
print(f"Translated audio: {url_translation.translated_audio_url}")

Supported Languages (pass the codes exactly as listed; most are ISO 639-1):

en: English, es: Spanish, fr: French, de: German
it: Italian, pt: Portuguese, pl: Polish, tr: Turkish
ru: Russian, nl: Dutch, cs: Czech, ar: Arabic
zh-cn: Chinese (Simplified), ja: Japanese, hu: Hungarian, ko: Korean
hi: Hindi, ka: Kannada, te: Telugu, ml: Malayalam, ta: Tamil
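
Checking a language code locally before submitting a job avoids a round trip that would end in a 400 response. The sketch below copies the table above into a dictionary; the function name `normalize_language` is hypothetical, and lowercasing the input is an assumption about what the API accepts.

```python
# Codes and names copied from the supported-languages list above.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese (Simplified)", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi", "ka": "Kannada", "te": "Telugu",
    "ml": "Malayalam", "ta": "Tamil",
}


def normalize_language(code: str) -> str:
    """Return the lowercase code if it is in the list, else raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```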

Request Parameters:

file: Audio or video file (required if no URL)
url: Direct media URL (required if no file)
target_language: Target language code (required)
source_language: Source language code (optional - auto-detected if not provided)
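
The parameter rules above (exactly one of file or URL, a required target language, an optional source language) can be enforced client-side before any upload starts. This is a sketch under stated assumptions: the function name `build_translation_params` is hypothetical, and the returned dictionary only mirrors the documented parameter names; how the file itself is encoded on the wire is not specified here.

```python
from typing import Dict, Optional


def build_translation_params(
    target_language: str,
    file: Optional[str] = None,
    url: Optional[str] = None,
    source_language: Optional[str] = None,
) -> Dict[str, str]:
    """Validate the documented request parameters and return them as a dict."""
    if not target_language:
        raise ValueError("target_language is required")
    # Exactly one of file or url must be set.
    if (file is None) == (url is None):
        raise ValueError("Provide either file or url, not both")
    if url is not None and not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")

    params = {"target_language": target_language}
    if file is not None:
        params["file"] = file  # local path; upload encoding is left to the caller
    else:
        params["url"] = url
    if source_language:
        # Omitting source_language leaves detection to the service.
        params["source_language"] = source_language
    return params
```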

Response:

{
  "id": 123,
  "status": "PENDING",
  "source_language": "en",
  "target_language": "es", 
  "input_path": "translations/inputs/20241215_143022_uuid.wav",
  "input_source": "FILE",
  "created_at": "2024-12-15T14:30:22Z",
  "original_filename": "english_speech.wav",
  "task_id": "celery-task-uuid",
  "parameters": {
    "source_type": "FILE",
    "is_video": false,
    "duration": 45.6,
    "is_speech_translation": true
  }
}
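
For code that consumes this response directly, a small typed wrapper keeps field access explicit. The sketch below reads only fields shown in the sample above; the class name `CreatedJob` is hypothetical, and treating `duration` as seconds is an assumption, since the unit is not stated.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict


@dataclass
class CreatedJob:
    id: int
    status: str
    source_language: str
    target_language: str
    created_at: datetime
    is_video: bool
    duration: float  # assumed to be seconds; the unit is not documented


def parse_created_job(payload: Dict[str, Any]) -> CreatedJob:
    """Convert the job-creation response into a typed object."""
    params = payload.get("parameters", {})
    return CreatedJob(
        id=payload["id"],
        status=payload["status"],
        source_language=payload["source_language"],
        target_language=payload["target_language"],
        # Swap the trailing "Z" so fromisoformat also works before Python 3.11.
        created_at=datetime.fromisoformat(payload["created_at"].replace("Z", "+00:00")),
        is_video=bool(params.get("is_video", False)),
        duration=float(params.get("duration", 0.0)),
    )
```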

Job Management

Get Translation Status

GET /api/v1/translation/translations/{job_id}
Authorization: Bearer {api_key}

Response (Processing):

{
  "id": 123,
  "status": "PROCESSING",
  "source_language": "en",
  "target_language": "es",
  "input_path": "translations/inputs/20241215_143022_uuid.wav",
  "display_name": "english_speech.wav",
  "created_at": "2024-12-15T14:30:22Z",
  "parameters": {
    "is_speech_translation": true,
    "duration": 45.6,
    "is_video": false
  }
}

Response (Completed):

{
  "id": 123,
  "status": "COMPLETED",
  "source_language": "en",
  "target_language": "es",
  "display_name": "english_speech.wav",
  "audio_output_path": "outputs/translations/translated_20241215_143525_es.wav",
  "video_output_path": "outputs/translation/videos/20241215_143525_translated.mp4",
  "transcript_path": "transcripts/translations/2024/12/15/20241215_143525_uuid.json",
  "translated_audio_url": "https://presigned-url-to-audio.s3.amazonaws.com/...",
  "video_output_url": "https://presigned-url-to-video.s3.amazonaws.com/...",
  "transcript_urls": {
    "json": "https://presigned-url-to-transcript.s3.amazonaws.com/...",
    "source_audio": "https://presigned-url-to-source.s3.amazonaws.com/..."
  }
}
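
When a job is not created with wait_for_completion, the status endpoint can be polled until the job reaches COMPLETED or FAILED. The following is a minimal sketch of such a loop. The `fetch_job` callable is a hypothetical stand-in for whatever performs the GET request and returns the decoded JSON, and the default interval and timeout are arbitrary choices, not documented limits.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_job: Callable[[int], Dict[str, Any]],
    job_id: int,
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {job['status']} after {timeout}s")
        sleep(poll_interval)
```

Injecting `sleep` and `fetch_job` keeps the loop independent of any particular HTTP client and makes it testable without network access.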

List Translation Jobs

GET /api/v1/translation/translations?skip=0&limit=50
Authorization: Bearer {api_key}
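
The skip and limit query parameters allow paging through a long job history. The generator below sketches one way to do that. It assumes the endpoint returns a JSON array of jobs and that a page shorter than `limit` marks the end; neither is confirmed by the documentation above. `fetch_page` is a hypothetical callable that performs the request for a given skip and limit.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_jobs(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    limit: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield every job by advancing skip until a short page is returned."""
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        yield from page
        if len(page) < limit:
            return  # assumed end-of-list signal
        skip += limit
```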

Retry Failed Translation

POST /api/v1/translation/translations/{job_id}/retry
Authorization: Bearer {api_key}

Delete Translation Job

DELETE /api/v1/translation/translations/{job_id}
Authorization: Bearer {api_key}
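
The status, retry, and delete endpoints share one path pattern and differ only in HTTP method and an optional suffix. The sketch below builds the corresponding requests with the standard library without sending them. `BASE_URL` is a placeholder, since the API host is not given in this section, and the helper name `job_request` is hypothetical.

```python
from typing import Optional
from urllib.request import Request

BASE_URL = "https://api.example.com"  # placeholder host, replace with the real one


def job_request(
    method: str,
    job_id: int,
    token: str,
    action: Optional[str] = None,
    base_url: str = BASE_URL,
) -> Request:
    """Build (but do not send) a request for a single translation job."""
    path = f"/api/v1/translation/translations/{job_id}"
    if action:
        path += f"/{action}"
    return Request(
        base_url + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


status_request = job_request("GET", 123, "your_api_key")
retry_request = job_request("POST", 123, "your_api_key", action="retry")
delete_request = job_request("DELETE", 123, "your_api_key")
```

Each object can be passed to `urllib.request.urlopen` once `BASE_URL` points at the actual service.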

Error Handling

400 Bad Request - Invalid Input

Causes:

Missing required parameters (file or URL)
Invalid file format or extension
Unsupported language code
Invalid URL format

Solutions:

Provide either file or URL (not both)
Use supported audio/video formats
Use valid ISO 639-1 language codes
Ensure URLs start with http:// or https://

402 Payment Required - Insufficient Credits

Causes:

Not enough credits for processing duration
Account credit balance too low

Solutions:

Purchase additional credits
Check account balance before processing

404 Not Found

Causes:

Job ID not found
Access denied to job
User not found

Solutions:

Verify job ID is correct
Ensure you own the job
Check authentication

500 Internal Server Error

Causes:

Audio extraction failed
Processing pipeline error
Storage service unavailable

Solutions:

Retry the request
Use retry endpoint for failed jobs
Contact support if persistent
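
The causes and solutions above can be folded into a small lookup that client code consults after a failed request. This is a sketch: the names `ErrorAdvice` and `advise` are hypothetical, the action strings paraphrase the lists above, and the fallback for undocumented status codes is an assumption.

```python
from typing import NamedTuple


class ErrorAdvice(NamedTuple):
    retryable: bool
    action: str


_ADVICE = {
    400: ErrorAdvice(False, "Fix the request: check file or URL, format, and language code"),
    402: ErrorAdvice(False, "Purchase credits or check the account balance"),
    404: ErrorAdvice(False, "Verify the job ID, job ownership, and authentication"),
    500: ErrorAdvice(True, "Retry the request, or use the retry endpoint for failed jobs"),
}


def advise(status_code: int) -> ErrorAdvice:
    """Map a documented HTTP status code to a suggested next step."""
    # The fallback covers codes not listed above; it is an assumption.
    return _ADVICE.get(status_code, ErrorAdvice(False, "Undocumented status; contact support"))
```

Only 500 is marked retryable here, because it is the only error whose listed solutions include retrying.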

Status Values

PENDING: Job created and queued for processing
PROCESSING: Active speech translation in progress
COMPLETED: Translation finished successfully
FAILED: Processing failed (use retry endpoint)
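
Representing the four status values as an enumeration guards against typos in string comparisons. The sketch below adds two convenience properties; `is_terminal` and `can_retry` are hypothetical names that encode the descriptions above and are not part of the API.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

    @property
    def is_terminal(self) -> bool:
        """True once the job will no longer change state on its own."""
        return self in (JobStatus.COMPLETED, JobStatus.FAILED)

    @property
    def can_retry(self) -> bool:
        """Only failed jobs are candidates for the retry endpoint."""
        return self is JobStatus.FAILED
```

Because the class also derives from `str`, `JobStatus(response["status"])` converts the raw value from a response, and members still compare equal to plain strings.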

Pricing

Service	Cost	Description
Speech Translation	10+ credits/minute	Complete speech-to-speech translation with voice cloning
Video Translation	10+ credits/minute	Speech translation + video recombination
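
A rough lower bound on cost helps when checking the account balance before submitting a job. The sketch below assumes the listed 10 credits per minute is the minimum rate ("10+") and that usage is pro-rated by the second; actual billing may round up or apply a higher rate, so the result should be read as a floor, not a quote. The function name `minimum_credits` is hypothetical.

```python
BASE_CREDITS_PER_MINUTE = 10.0  # documented floor ("10+ credits/minute")


def minimum_credits(
    duration_seconds: float,
    credits_per_minute: float = BASE_CREDITS_PER_MINUTE,
) -> float:
    """Estimate the lowest possible credit cost for a given media duration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Pro-rating by the second is an assumption about how billing works.
    return duration_seconds / 60.0 * credits_per_minute
```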
