Overview

AudioPod AI’s Speech Translation API provides end-to-end speech-to-speech translation that preserves the original speaker’s voice characteristics. It translates spoken content from audio and video files into any of 21 supported languages while maintaining speaker identity, timing, and natural pronunciation.

Key Features

  • 21 Languages: Translate between major world languages
  • Voice Cloning: Preserve original speaker voice characteristics
  • Speaker Separation: Maintain distinct speakers in multi-speaker content
  • Video Support: Translate video files with audio replacement
  • Automatic Language Detection: Detect source language automatically
  • High-Quality Synthesis: Natural-sounding translated speech output

Authentication

All endpoints require authentication:
  • API Key: Authorization: Bearer your_api_key
  • JWT Token: Authorization: Bearer your_jwt_token
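
Both credential types use the same Bearer scheme, so one helper can build the header for either. The following is a minimal sketch; the function name auth_headers and the environment variable AUDIOPOD_API_KEY are assumptions made for illustration, not names confirmed by this documentation.

```python
import os


def auth_headers(token=None):
    """Return the Authorization header for an API key or a JWT token."""
    # AUDIOPOD_API_KEY is a hypothetical variable name used only in this sketch.
    token = token or os.environ.get("AUDIOPOD_API_KEY")
    if not token:
        raise ValueError("no API key or JWT token provided")
    return {"Authorization": f"Bearer {token}"}


print(auth_headers("your_api_key"))
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.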

Speech Translation

Create Speech Translation Job

from audiopod import Client

client = Client()

# Translate speech from file
translation = client.translation.translate_audio(
    audio_file="english_speech.wav",
    target_language="es",  # Spanish
    source_language="en",  # Optional: auto-detect if not provided
    wait_for_completion=True
)

print("Translation completed!")
print(f"Original language: {translation.source_language}")
print(f"Target language: {translation.target_language}")
print(f"Translated audio URL: {translation.translated_audio_url}")

# Translate video file (supports both audio and video input)
video_translation = client.translation.translate_audio(
    audio_file="presentation.mp4",  # Video files are supported
    target_language="fr",  # French
    source_language="en",
    wait_for_completion=True
)

print("Video translation completed!")
print(f"Translated video URL: {video_translation.video_output_url}")
print(f"Audio-only URL: {video_translation.translated_audio_url}")

# Note: Batch processing supports multiple files to one target language
# For multiple target languages, submit separate jobs for each language

# Translate from URL (YouTube, etc.)
url_translation = client.translation.translate_audio(
    url="https://youtube.com/watch?v=example123",
    target_language="ja",  # Japanese
    source_language="en",
    wait_for_completion=True
)

print("URL translation completed!")
print(f"Translated audio: {url_translation.translated_audio_url}")

Supported Languages (pass the codes exactly as listed; most follow ISO 639-1):
  • en: English, es: Spanish, fr: French, de: German
  • it: Italian, pt: Portuguese, pl: Polish, tr: Turkish
  • ru: Russian, nl: Dutch, cs: Czech, ar: Arabic
  • zh-cn: Chinese (Simplified), ja: Japanese, hu: Hungarian, ko: Korean
  • hi: Hindi, ka: Kannada, te: Telugu, ml: Malayalam, ta: Tamil
Request Parameters:
  • file: Audio or video file (required if no url is given; passed as audio_file in the Python client example above)
  • url: Direct media URL (required if no file is given)
  • target_language: Target language code (required)
  • source_language: Source language code (optional - auto-detected if not provided)
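
Checking these rules on the client side before submitting a job can save a round trip. The sketch below is one way to do that; the helper name build_translation_request is hypothetical, and the check only enforces what the parameter list states (at least one of file or url, a required target language, an optional source language).

```python
# Codes copied from the supported-language list above. "ka" is kept as
# documented, even though the ISO 639-1 code for Kannada is "kn".
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru", "nl", "cs", "ar",
    "zh-cn", "ja", "hu", "ko", "hi", "ka", "te", "ml", "ta",
}


def build_translation_request(target_language, file=None, url=None,
                              source_language=None):
    """Validate the documented parameters and return them as a dict."""
    if file is None and url is None:
        raise ValueError("provide either file or url")
    if target_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported target language: {target_language!r}")
    if source_language is not None and source_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported source language: {source_language!r}")

    params = {"target_language": target_language}
    if source_language is not None:
        # Omitting source_language leaves detection to the API.
        params["source_language"] = source_language
    if file is not None:
        params["file"] = file
    if url is not None:
        params["url"] = url
    return params


print(build_translation_request("es", file="english_speech.wav"))
```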
Response:
{
  "id": 123,
  "status": "PENDING",
  "source_language": "en",
  "target_language": "es",
  "input_path": "translations/inputs/20241215_143022_uuid.wav",
  "input_source": "FILE",
  "created_at": "2024-12-15T14:30:22Z",
  "original_filename": "english_speech.wav",
  "task_id": "celery-task-uuid",
  "parameters": {
    "source_type": "FILE",
    "is_video": false,
    "duration": 45.6,
    "is_speech_translation": true
  }
}
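
The id in this response is what the job management endpoints below expect as {job_id}. As an illustration, the fields most callers need can be pulled into a small typed object. The class name TranslationJob is an assumption for this sketch, and it reads only fields that appear in the sample response above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranslationJob:
    id: int
    status: str
    source_language: Optional[str]
    target_language: str
    is_video: bool
    duration: float

    @classmethod
    def from_response(cls, payload: dict) -> "TranslationJob":
        """Build a job record from a decoded JSON response body."""
        params = payload.get("parameters", {})
        return cls(
            id=payload["id"],
            status=payload["status"],
            source_language=payload.get("source_language"),
            target_language=payload["target_language"],
            is_video=bool(params.get("is_video", False)),
            duration=float(params.get("duration", 0.0)),
        )


# Abbreviated copy of the sample response shown above.
sample = json.loads(
    '{"id": 123, "status": "PENDING", "source_language": "en",'
    ' "target_language": "es",'
    ' "parameters": {"is_video": false, "duration": 45.6}}'
)
job = TranslationJob.from_response(sample)
print(job.id, job.status)
```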

Job Management

Get Translation Status

GET /api/v1/translation/translations/{job_id}
Authorization: Bearer {api_key}
Response (Processing):
{
  "id": 123,
  "status": "PROCESSING",
  "source_language": "en",
  "target_language": "es",
  "input_path": "translations/inputs/20241215_143022_uuid.wav",
  "display_name": "english_speech.wav",
  "created_at": "2024-12-15T14:30:22Z",
  "parameters": {
    "is_speech_translation": true,
    "duration": 45.6,
    "is_video": false
  }
}
Response (Completed):
{
  "id": 123,
  "status": "COMPLETED",
  "source_language": "en",
  "target_language": "es",
  "display_name": "english_speech.wav",
  "audio_output_path": "outputs/translations/translated_20241215_143525_es.wav",
  "video_output_path": "outputs/translation/videos/20241215_143525_translated.mp4",
  "transcript_path": "transcripts/translations/2024/12/15/20241215_143525_uuid.json",
  "translated_audio_url": "https://presigned-url-to-audio.s3.amazonaws.com/...",
  "video_output_url": "https://presigned-url-to-video.s3.amazonaws.com/...",
  "transcript_urls": {
    "json": "https://presigned-url-to-transcript.s3.amazonaws.com/...",
    "source_audio": "https://presigned-url-to-source.s3.amazonaws.com/..."
  }
}
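
When a job is submitted without wait_for_completion, the status endpoint can be polled until the job reaches COMPLETED or FAILED. The loop below is a sketch of that pattern, not the client library's implementation. The fetch_job argument is a hypothetical callable that performs the GET request and returns the decoded JSON body; the interval and timeout defaults are arbitrary choices.

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_translation(fetch_job, job_id, poll_interval=5.0, timeout=600.0,
                         sleep=time.sleep):
    """Poll fetch_job(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(poll_interval)


# Stand-in for the HTTP call so the sketch runs without network access.
_responses = iter(["PENDING", "PROCESSING", "COMPLETED"])


def fake_fetch(job_id):
    return {"id": job_id, "status": next(_responses)}


result = wait_for_translation(fake_fetch, 123, sleep=lambda seconds: None)
print(result["status"])
```

Passing sleep as a parameter keeps the loop easy to test; in real use the default time.sleep applies.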

List Translation Jobs

GET /api/v1/translation/translations?skip=0&limit=50
Authorization: Bearer {api_key}

Retry Failed Translation

POST /api/v1/translation/translations/{job_id}/retry
Authorization: Bearer {api_key}

Delete Translation Job

DELETE /api/v1/translation/translations/{job_id}
Authorization: Bearer {api_key}
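
The status, retry, and delete endpoints share one path prefix and differ only in HTTP method and suffix. The helper below collects them in one place as a sketch. BASE_URL is a placeholder, since the API host is not stated in this section, and the name job_request is hypothetical.

```python
BASE_URL = "https://api.example.com"  # placeholder: substitute the real API host

# HTTP method and path template for each per-job endpoint documented above.
JOB_ROUTES = {
    "status": ("GET", "/api/v1/translation/translations/{job_id}"),
    "retry": ("POST", "/api/v1/translation/translations/{job_id}/retry"),
    "delete": ("DELETE", "/api/v1/translation/translations/{job_id}"),
}


def job_request(action, job_id):
    """Return (method, url) for a per-job action."""
    if action not in JOB_ROUTES:
        raise ValueError(f"unknown action: {action!r}")
    method, template = JOB_ROUTES[action]
    return method, BASE_URL + template.format(job_id=job_id)


print(job_request("retry", 123))
```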

Error Handling

Status Values

  • PENDING: Job created and queued for processing
  • PROCESSING: Active speech translation in progress
  • COMPLETED: Translation finished successfully
  • FAILED: Processing failed (resubmit the job with the retry endpoint)
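
Client code usually reduces these four values to a decision about what to do next. One possible mapping is sketched here; the action names "wait", "download", and "retry" are labels invented for this example and are not part of the API.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


def next_step(status):
    """Map a job status string to the client-side action it calls for."""
    status = JobStatus(status)  # raises ValueError for an undocumented status
    if status is JobStatus.COMPLETED:
        return "download"
    if status is JobStatus.FAILED:
        return "retry"
    return "wait"  # PENDING and PROCESSING both mean the job is not done yet


for value in ("PENDING", "PROCESSING", "COMPLETED", "FAILED"):
    print(value, "->", next_step(value))
```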

Pricing

Service              Cost                 Description
Speech Translation   10+ credits/minute   Complete speech-to-speech translation with voice cloning
Video Translation    10+ credits/minute   Speech translation + video recombination
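
Because the listed rate is a floor ("10+"), only a lower bound on cost can be computed from the duration field that the API reports. The sketch below assumes pro-rata billing by the second, which this section does not confirm; actual charges may be higher or may round up to whole minutes.

```python
MIN_CREDITS_PER_MINUTE = 10  # the "10+" floor from the pricing table


def estimate_min_credits(duration_seconds):
    """Lower-bound credit estimate, assuming pro-rata billing by duration."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return MIN_CREDITS_PER_MINUTE * duration_seconds / 60


# The sample job above reports a duration of 45.6 seconds.
print(f"at least {estimate_min_credits(45.6):.1f} credits")
```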

Next Steps