Overview
AudioPod AI’s Text to Speech service provides unified text-to-speech capabilities for both standard pre-built voices and custom voice clones. Generate speech with any voice in 60+ supported languages using our advanced AI models.
Supported Voice Types
- Standard Voices: Pre-built professional voices with various characteristics
- Custom Voices: Your own voice clones created via Voice Management
- Unified API: The same endpoint serves both voice types and routes each request automatically
Key Features
- 50+ Premium Voices: Pre-built voices with unique characteristics
- Custom Voice Support: Use your own voice clones and collections
- Variable Speed Control: Adjust speech speed from 0.25x to 4.0x
- Multiple Formats: MP3, WAV, OGG audio formats
- Async Processing: Background job processing with real-time status tracking
- Credit Management: Automatic credit reservation and billing
Authentication
All endpoints require authentication using either:
- API Key:
  Authorization: Bearer your_api_key
- JWT Token:
  Authorization: Bearer your_jwt_token
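Both credential types travel in the same header, so a client can build it in one place. A minimal sketch; the helper name `auth_headers` is hypothetical and not part of any AudioPod SDK:

```python
from typing import Dict


def auth_headers(token: str) -> Dict[str, str]:
    """Build the Authorization header for an API key or a JWT token."""
    token = token.strip()
    if not token:
        raise ValueError("token must not be empty")
    # Both credential types use the Bearer scheme described above.
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed to whichever HTTP client you use.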
Generate Speech
Basic Text to Speech
Generate speech from text using any voice (standard or custom), identified by voice UUID, ID, or name. All generation is processed asynchronously with job tracking.
- voice_id (required): Voice UUID, ID, or name from your voice collection
- text (required): Text to convert to speech (max 5000 characters)
- audio_format (optional): Output format, one of mp3, wav, ogg (default: mp3)
- speed (optional): Speech speed, 0.25-4.0 (default: 1.0)
- language (optional): Language code; auto-detected if not provided
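The parameter limits above can be checked on the client before a request is sent, which avoids spending a round trip on a request the service would reject. The sketch below only builds and validates the request body. The function name `build_tts_payload` is hypothetical, and whether the endpoint expects JSON or form data is not stated here, so treat the returned dictionary as a plain container of the documented fields.

```python
from typing import Any, Dict, Optional

ALLOWED_FORMATS = {"mp3", "wav", "ogg"}
MAX_TEXT_CHARS = 5000
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_tts_payload(
    voice_id: str,
    text: str,
    audio_format: str = "mp3",
    speed: float = 1.0,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented parameters and return them as a dict."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"audio_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")

    payload: Dict[str, Any] = {
        "voice_id": voice_id,
        "text": text,
        "audio_format": audio_format,
        "speed": speed,
    }
    # Omit language so the service can auto-detect it.
    if language is not None:
        payload["language"] = language
    return payload
```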
Voice Identification Examples
Available Voices
AudioPod AI offers a diverse collection of voices including both standard pre-built voices and custom voice clones:
Voice Types
- Standard Voices: Professional pre-built voices with unique characteristics
- Custom Voices: Your own voice clones created from audio samples
- Public Voices: Community-shared voices available to all users
Popular Standard Voices
| Voice Name | Gender | Languages | Style | Description |
|---|---|---|---|---|
| aura | Female | 60+ | Bright | Luminous voice with crystal-clear delivery |
| jester | Male | 60+ | Playful | Upbeat voice with theatrical flair |
| sage | Male | 60+ | Wise | Authoritative voice perfect for narration |
| ava | Female | 60+ | Professional | Commanding voice for business content |
| surge | Male | 60+ | Energetic | High-energy voice for exciting content |
| willow | Female | 60+ | Gentle | Delicate, youthful voice with elegance |
Listing Available Voices
Use the voice profiles endpoint to discover all available voices.
Job Status Tracking
Check Job Status
Monitor your text-to-speech generation jobs with real-time status updates.
Job Status Response
Status Values
- pending: Job created and waiting for processing
- processing: Currently generating audio
- completed: Audio generation finished successfully
- failed: Generation failed with error
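Because jobs move through these four states, a client typically polls until it sees a terminal one. The sketch below assumes the status response is a dictionary with a `status` key, which is an assumption about the response shape. The status request itself is passed in as a callable, so no endpoint path is implied. `wait_for_job` and its parameters are hypothetical names.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}
ACTIVE_STATUSES = {"pending", "processing"}


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_status() until the job is completed or failed."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")  # assumed field name
        if status in TERMINAL_STATUSES:
            return job
        if status not in ACTIVE_STATUSES:
            raise ValueError(f"unexpected job status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

The caller still has to inspect the returned status, since `failed` is returned rather than raised.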
Configuration Options
Audio Format Options
| Parameter | Options | Description | Notes |
|---|---|---|---|
| audio_format | mp3, wav, ogg | Audio file format | MP3 recommended for most uses |
| speed | 0.25 to 4.0 | Speech speed | 1.0 = normal speed |
| language | Language codes | Target language | Auto-detected if not specified |
Basic Voice Customization
Language Detection
AudioPod AI automatically detects the language of your input text, but you can specify it explicitly for better results.
Multi-Language Support
Language Codes
AudioPod AI supports 60+ languages with automatic detection and consistent quality across all supported languages:
| Language | Code | Description |
|---|---|---|
| English (US) | en-US | American English |
| English (UK) | en-GB | British English |
| Spanish | es | Spanish (Spain) |
| Spanish (Mexico) | es-MX | Mexican Spanish |
| French | fr | French (France) |
| German | de | German |
| Chinese (Simplified) | zh-CN | Simplified Chinese (Mandarin) |
| Chinese (Traditional) | zh-TW | Traditional Chinese |
| Japanese | ja | Japanese |
| Korean | ko | Korean |
| Portuguese | pt | Portuguese (Portugal) |
| Portuguese (Brazil) | pt-BR | Brazilian Portuguese |
| Russian | ru | Russian |
| Arabic | ar | Arabic |
| Hindi | hi | Hindi |
| Italian | it | Italian |
| Dutch | nl | Dutch |
| Polish | pl | Polish |
| Turkish | tr | Turkish |
| Swedish | sv | Swedish |
Multi-Language Example
Get Supported Languages for a Voice
Use Cases & Examples
Audiobook Narration
Podcast Introduction
E-Learning Content
Interactive Voice Response (IVR)
Best Practices
Text Optimization
✅ Good Practices:
- Use proper punctuation for natural pauses
- Write numbers in word form for better pronunciation
- Include context for abbreviations
- Break long sentences into shorter ones
❌ Avoid:
- ALL CAPS TEXT (sounds like shouting)
- Missing punctuation (unnatural flow)
- Technical jargon without context
- Extremely long paragraphs
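Several of these guidelines can be checked mechanically before text is submitted. The sketch below flags runs of all-caps words, digits, paragraphs without closing punctuation, and long paragraphs. The function name, the warning codes, and the 600-character paragraph threshold are all illustrative choices, not values defined by AudioPod AI.

```python
import re
from typing import List, Tuple

# Two or more consecutive all-caps words; a lone acronym is not flagged.
ALL_CAPS_RUN = re.compile(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})+\b")
CLOSING_PUNCTUATION = (".", "!", "?", ":", ";", '"', "'", ")")


def lint_tts_text(text: str, max_paragraph_chars: int = 600) -> List[Tuple[int, str]]:
    """Return (paragraph_number, warning_code) pairs for common TTS text issues."""
    warnings: List[Tuple[int, str]] = []
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        if ALL_CAPS_RUN.search(paragraph):
            warnings.append((number, "all-caps"))
        if re.search(r"\d", paragraph):
            warnings.append((number, "digits"))
        if not paragraph.endswith(CLOSING_PUNCTUATION):
            warnings.append((number, "no-closing-punctuation"))
        if len(paragraph) > max_paragraph_chars:
            warnings.append((number, "long-paragraph"))
    return warnings
```

An empty result means none of these particular checks fired; it does not guarantee natural-sounding output.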
Cost Optimization
Caching Strategy
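Since credits are charged for generated audio, reusing the result of an identical request avoids paying for it twice. One possible approach is to key a cache on the request parameters. The sketch below is an in-memory illustration under that assumption; `tts_cache_key` and `AudioCache` are hypothetical names, and the `generate` callable stands in for whatever code actually calls the service.

```python
import hashlib
import json
from typing import Callable, Dict, Optional


def tts_cache_key(
    voice_id: str,
    text: str,
    audio_format: str = "mp3",
    speed: float = 1.0,
    language: Optional[str] = None,
) -> str:
    """Derive a stable key from the parameters that affect the audio."""
    canonical = json.dumps(
        {
            "voice_id": voice_id,
            "text": text,
            "audio_format": audio_format,
            "speed": float(speed),  # so 1 and 1.0 map to the same key
            "language": language,
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class AudioCache:
    """In-memory cache; a real deployment might persist audio to disk or object storage."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get_or_generate(self, generate: Callable[..., bytes], **params) -> bytes:
        key = tts_cache_key(**params)
        if key not in self._store:
            # Only a cache miss triggers a (billable) generation.
            self._store[key] = generate(**params)
        return self._store[key]
```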
Error Handling
Common Errors and Solutions
400 Bad Request - Invalid Text
Causes:
- Text too long (>5000 characters)
- Invalid characters or encoding
- Empty text field
Solutions:
- Split long text into chunks
- Check text encoding (UTF-8)
- Validate text is not empty
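Splitting long text into chunks works best at sentence boundaries, so that each chunk still reads naturally. The sketch below packs whole sentences into chunks of at most 5000 characters and hard-splits only when a single sentence exceeds the limit. The function name `split_text` and the simple sentence-boundary regex are illustrative choices.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 5000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single over-long sentence is cut at the limit as a last resort.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be submitted as its own generation job and the resulting audio concatenated in order.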
404 Not Found - Invalid Voice
Causes:
- Voice identifier doesn’t exist
- Voice not accessible by user
- Voice UUID format invalid
Solutions:
- Check available voices with voice profiles endpoint
- Verify voice UUID format
- Ensure voice is public or owned by user
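When a voice is referenced by UUID, its format can be verified locally with Python's standard `uuid` module. The sketch below accepts only the canonical hyphenated form. The helper name `is_canonical_uuid` is hypothetical, and since voices can also be referenced by ID or name, a failed check only means the value is not a UUID.

```python
import uuid


def is_canonical_uuid(value: str) -> bool:
    """Return True if value is a UUID in canonical 8-4-4-4-12 hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces and unhyphenated hex; require the canonical form.
    return str(parsed) == value.lower()
```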
402 Payment Required - Insufficient Credits
Causes:
- Not enough credits for audio generation
- Credit limit exceeded
Solutions:
- Check credit balance
- Purchase more credits
- Wait for credit reset
429 Too Many Requests
Causes:
- Rate limit exceeded
- Too many concurrent requests
Solutions:
- Implement exponential backoff
- Use request queuing
- Upgrade to higher rate limits
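Exponential backoff means doubling the wait after each rate-limited attempt, usually with a little random jitter so that many clients do not retry in lockstep. The sketch below wraps any callable. `RateLimitError` is a hypothetical exception that your own request code would raise on a 429 response, and the delay values are illustrative defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception raised by the caller's code on HTTP 429."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry request() on RateLimitError, doubling the delay each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter on top of the exponential delay.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```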
Robust Error Handling
Pricing
Text to Speech pricing is based on audio duration:
- 330 credits per minute of generated audio
- Pricing is calculated on the actual audio output duration
- All voice types (standard and custom) use the same rate
- Quality settings don’t affect pricing
Cost Examples
| Audio Duration | Credits Used | USD Cost |
|---|---|---|
| 30 seconds | 165 credits | $0.022 |
| 1 minute | 330 credits | $0.044 |
| 2 minutes | 660 credits | $0.088 |
| 5 minutes | 1,650 credits | $0.220 |
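The rows above follow directly from the per-minute rate, so a cost estimate for any duration is a single multiplication. In the sketch below, the USD rate is inferred from the table ($0.044 per 330 credits) rather than from a published per-credit price, and how fractional credits are rounded at billing time is not specified here. The function name `estimate_cost` is hypothetical.

```python
from typing import Tuple

CREDITS_PER_MINUTE = 330
# Inferred from the cost table: 330 credits correspond to $0.044.
USD_PER_CREDIT = 0.044 / 330


def estimate_cost(duration_seconds: float) -> Tuple[float, float]:
    """Return (credits, usd) for a given output audio duration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    credits = duration_seconds * CREDITS_PER_MINUTE / 60
    return credits, round(credits * USD_PER_CREDIT, 3)
```

For example, `estimate_cost(120)` yields 660 credits, matching the two-minute row.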
