Overview

AudioPod AI’s Text to Speech service provides unified text-to-speech capabilities for both standard pre-built voices and custom voice clones. Generate speech with any voice in 60+ supported languages using our advanced AI models.

Supported Voice Types

  • Standard Voices: Pre-built professional voices with various characteristics
  • Custom Voices: Your own voice clones created via Voice Management
  • Unified API: The same endpoint serves both voice types, with automatic routing

Key Features

  • 50+ Premium Voices: Pre-built voices with unique characteristics
  • Custom Voice Support: Use your own voice clones and collections
  • Variable Speed Control: Adjust speech speed from 0.25x to 4.0x
  • Multiple Formats: MP3, WAV, OGG audio formats
  • Async Processing: Background job processing with real-time status tracking
  • Credit Management: Automatic credit reservation and billing

Authentication

All endpoints require authentication using either:
  • API Key: Authorization: Bearer your_api_key
  • JWT Token: Authorization: Bearer your_jwt_token
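
Both credential types travel in the same Authorization header, so one helper can build it for either. The following is a minimal sketch: the helper name build_auth_headers and the AUDIOPOD_API_KEY environment variable are assumptions made for illustration, not names confirmed by this page.

```python
import os


def build_auth_headers(token=None):
    """Return the Authorization header for an API key or a JWT token."""
    # AUDIOPOD_API_KEY is an assumed variable name, used only as a fallback.
    token = token or os.environ.get("AUDIOPOD_API_KEY")
    if not token:
        raise ValueError("No API key or JWT token provided")
    return {"Authorization": f"Bearer {token.strip()}"}
```

The returned dictionary can be passed as the headers argument of any HTTP client when calling the API directly.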

Generate Speech

Basic Text to Speech

Generate speech from text using any voice (standard or custom) by voice UUID, ID, or name. All generation is processed asynchronously with job tracking.
from audiopod import Client
import requests
import time

# Initialize client
client = Client()

# Generate speech using voice UUID
job = client.voice.generate_speech(
    voice_id="550e8400-e29b-41d4-a716-446655440000",  # Voice UUID
    text="Hello! This is AudioPod AI generating natural speech.",
    audio_format="mp3",
    language="en",
    speed=1.0
)

print(f"Generation job created: {job.id}")
print(f"Status: {job.status}")

# Check job status until completion
while True:
    status = client.voice.get_job_status(job.id)
    print(f"Job status: {status.status}")
    
    if status.status == 'completed':
        # Get output URL from job result
        if status.result and 'output_url' in status.result:
            audio_url = status.result['output_url']
            print(f"Audio ready: {audio_url}")
            
            # Download the audio file
            audio_response = requests.get(audio_url)
            audio_response.raise_for_status()  # Stop here if the download failed
            with open("generated_speech.mp3", "wb") as f:
                f.write(audio_response.content)
            print("Audio saved as generated_speech.mp3")
        break
    elif status.status == 'failed':
        print(f"Job failed: {status.error_message}")
        break
        
    time.sleep(2)  # Wait 2 seconds before checking again

# Alternative: Generate using voice name (for standard voices)
job = client.voice.generate_speech(
    voice_id="aura",  # Voice name
    text="Hello! This uses a standard voice by name.",
    audio_format="mp3",
    speed=1.0
)
Parameters:
  • voice_id (required): Voice UUID, ID, or name from your voice collection
  • text (required): Text to convert to speech (max 5000 characters)
  • audio_format (optional): Output format - mp3, wav, ogg (default: mp3)
  • speed (optional): Speech speed 0.25-4.0 (default: 1.0)
  • language (optional): Language code - auto-detected if not provided
Response:
{
  "job_id": 12345,
  "status": "pending",
  "message": "Voice generation job created successfully",
  "voice_id": 123,
  "estimated_duration": 30,
  "credits_reserved": 25
}
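
The limits in the parameter list above (5000 characters, mp3/wav/ogg, speed 0.25-4.0) can be checked before a job is submitted, which avoids spending a request on input that is likely to be rejected. This is a client-side sketch only; validate_tts_params is a hypothetical helper, and the server may apply additional rules.

```python
ALLOWED_FORMATS = {"mp3", "wav", "ogg"}
MAX_TEXT_LENGTH = 5000
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_tts_params(text, audio_format="mp3", speed=1.0):
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if not text or not text.strip():
        problems.append("text is empty")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append(f"text is {len(text)} characters (max {MAX_TEXT_LENGTH})")
    if audio_format not in ALLOWED_FORMATS:
        problems.append(f"unsupported audio_format: {audio_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    return problems
```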

Voice Identification Examples

from audiopod import Client

client = Client()

# Using standard pre-built voice by name
job1 = client.voice.generate_speech(
    voice_identifier="aura",  # Standard voice name
    input_text="Hello! This uses a standard pre-built voice.",
    audio_format="mp3",
    speed=1.0
)
print(f"Standard voice job: {job1.job_id}")

# Using custom voice clone (by integer ID)
job2 = client.voice.generate_speech(
    voice_identifier=123,  # Your custom voice ID
    input_text="Hello! This uses my custom cloned voice.",
    audio_format="mp3",
    speed=1.0
)
print(f"Custom voice job: {job2.job_id}")

# Using voice via UUID (works for both standard and custom voices)
job3 = client.voice.generate_speech(
    voice_identifier="550e8400-e29b-41d4-a716-446655440000",  # Voice UUID
    input_text="Hello! This uses voice identification via UUID.",
    audio_format="mp3",
    speed=1.0
)
print(f"UUID voice job: {job3.job_id}")

# List available voices
voices = client.voice.list_voices(include_public=True)
print("Available voices:")
for voice in voices:
    print(f"- {voice.name} (UUID: {voice.uuid}): {voice.description}")
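
Because a voice can be referenced by integer ID, UUID, or name, it can help to log which form a given value takes. The sketch below is a client-side guess based only on the value's shape; classify_voice_identifier is a hypothetical helper, and the API performs its own routing regardless.

```python
import uuid


def classify_voice_identifier(identifier):
    """Return "id", "uuid", or "name" for a voice identifier."""
    if isinstance(identifier, int) and not isinstance(identifier, bool):
        return "id"
    text = str(identifier).strip()
    if text.isdigit():
        return "id"
    try:
        uuid.UUID(text)  # Raises ValueError for anything that is not a UUID
        return "uuid"
    except ValueError:
        return "name"
```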

Available Voices

AudioPod AI offers a diverse collection of voices including both standard pre-built voices and custom voice clones:

Voice Types

  • Standard Voices: Professional pre-built voices with unique characteristics
  • Custom Voices: Your own voice clones created from audio samples
  • Public Voices: Community-shared voices available to all users
| Voice Name | Gender | Languages | Style | Description |
|---|---|---|---|---|
| aura | Female | 60+ | Bright | Luminous voice with crystal-clear delivery |
| jester | Male | 60+ | Playful | Upbeat voice with theatrical flair |
| sage | Male | 60+ | Wise | Authoritative voice perfect for narration |
| ava | Female | 60+ | Professional | Commanding voice for business content |
| surge | Male | 60+ | Energetic | High-energy voice for exciting content |
| willow | Female | 60+ | Gentle | Delicate, youthful voice with elegance |

Listing Available Voices

Use the voice profiles endpoint to discover all available voices:
# List all voices (including your custom voices)
voices = client.voice.list_voices(include_public=True)

# Filter by voice type
standard_voices = [v for v in voices if v.voice_type == "standard"]
custom_voices = [v for v in voices if v.voice_type == "custom"]

print(f"Found {len(standard_voices)} standard voices")
print(f"Found {len(custom_voices)} custom voices")

Job Status Tracking

Check Job Status

Monitor your text-to-speech generation jobs with real-time status updates:
# Check job status (job_id is the value returned when the job was created)
job_id = 12345
status = client.voice.get_job_status(job_id)

print(f"Job {job_id} status: {status['status']}")
print(f"Progress: {status.get('progress', 'N/A')}%")

if status['status'] == 'completed':
    print(f"Audio ready: {status['output_url']}")
    print(f"Duration: {status.get('total_duration')} seconds")
elif status['status'] == 'failed':
    print(f"Error: {status.get('error_message')}")

Job Status Response

{
  "id": 12345,
  "job_type": "single",
  "user_id": "user-uuid",
  "status": "completed",
  "progress": 100,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:45Z",
  "completed_at": "2024-01-15T10:30:45Z",
  "output_path": "generated/12345.mp3",
  "output_url": "https://presigned-url-to-audio-file",
  "input_text": "Hello! This is AudioPod AI generating natural speech.",
  "target_language": "en",
  "voice_info": {
    "id": 123,
    "name": "aura",
    "display_name": "Aura"
  },
  "estimated_duration": 30,
  "credits_reserved": 25
}
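
The created_at and completed_at timestamps in this response make it possible to measure how long a job took. A minimal sketch, assuming the response has been parsed into a dictionary with the field names shown above; processing_seconds is a hypothetical helper.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as 2024-01-15T10:30:00Z."""
    # Older Python versions reject the trailing "Z", so map it to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_seconds(job):
    """Seconds between job creation and completion, or None if unfinished."""
    if not job.get("completed_at"):
        return None
    elapsed = parse_timestamp(job["completed_at"]) - parse_timestamp(job["created_at"])
    return elapsed.total_seconds()
```

For the sample response above, the two timestamps are 45 seconds apart.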

Status Values

  • pending: Job created and waiting for processing
  • processing: Currently generating audio
  • completed: Audio generation finished successfully
  • failed: Generation failed with error
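
Since completed and failed are the only terminal states, a polling loop can stop on either and should also give up after a deadline rather than wait forever. The sketch below is independent of the SDK: wait_for_job is a hypothetical helper that accepts any zero-argument callable returning a dictionary with a "status" key.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_status, timeout=300.0, interval=2.0, sleep=time.sleep):
    """Poll fetch_status() until the job reaches a terminal status.

    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status['status']} after {timeout}s")
        sleep(interval)
```

Assuming the status call returns a dictionary as in the example above, it could be wrapped as wait_for_job(lambda: client.voice.get_job_status(job_id)). The caller still needs to check whether the returned status is completed or failed.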

Configuration Options

Audio Format Options

| Parameter | Options | Description | Notes |
|---|---|---|---|
| audio_format | mp3, wav, ogg | Audio file format | MP3 recommended for most uses |
| speed | 0.25 to 4.0 | Speech speed | 1.0 = normal speed |
| language | Language codes | Target language | Auto-detected if not specified |

Basic Voice Customization

# Generate with custom settings
job = client.voice.generate_speech(
    voice_identifier="aura",
    input_text="Customized voice output with speed control",
    audio_format="wav",     # High quality WAV format
    speed=1.1,             # 10% faster than normal
    language="en"          # English language
)

Language Detection

AudioPod AI automatically detects the language of your input text, but you can specify it explicitly for better results:
# Automatic language detection
job1 = client.voice.generate_speech(
    voice_identifier="aura",
    input_text="Hello, how are you today?"
    # Language will be auto-detected as English
)

# Explicit language specification
job2 = client.voice.generate_speech(
    voice_identifier="aura", 
    input_text="Bonjour, comment allez-vous?",
    language="fr"  # Specify French
)

Multi-Language Support

Language Codes

AudioPod AI supports 60+ languages with automatic detection and consistent quality across them. Commonly used language codes:
| Language | Code | Description |
|---|---|---|
| English (US) | en-US | American English |
| English (UK) | en-GB | British English |
| Spanish | es | Spanish (Spain) |
| Spanish (Mexico) | es-MX | Mexican Spanish |
| French | fr | French (France) |
| German | de | German |
| Chinese (Simplified) | zh-CN | Simplified Chinese (Mandarin) |
| Chinese (Traditional) | zh-TW | Traditional Chinese |
| Japanese | ja | Japanese |
| Korean | ko | Korean |
| Portuguese | pt | Portuguese (Portugal) |
| Portuguese (Brazil) | pt-BR | Brazilian Portuguese |
| Russian | ru | Russian |
| Arabic | ar | Arabic |
| Hindi | hi | Hindi |
| Italian | it | Italian |
| Dutch | nl | Dutch |
| Polish | pl | Polish |
| Turkish | tr | Turkish |
| Swedish | sv | Swedish |

Multi-Language Example

import time
import requests

# Generate speech in multiple languages using the same voice
languages = [
    {"text": "Hello, welcome to AudioPod AI", "lang": "en-US"},
    {"text": "Hola, bienvenido a AudioPod AI", "lang": "es"},
    {"text": "Bonjour, bienvenue chez AudioPod AI", "lang": "fr"},
    {"text": "Hallo, willkommen bei AudioPod AI", "lang": "de"}
]

voice_uuid = "550e8400-e29b-41d4-a716-446655440000"  # Your multi-language voice

for item in languages:
    job = client.voice.generate_speech(
        voice_identifier=voice_uuid,
        input_text=item["text"],
        language=item["lang"],
        audio_format="mp3"
    )
    
    # Wait for completion and download
    while True:
        status = client.voice.get_job_status(job.job_id)
        if status['status'] == 'completed':
            # Download audio
            audio_response = requests.get(status['output_url'])
            filename = f"welcome_{item['lang']}.mp3"
            with open(filename, "wb") as f:
                f.write(audio_response.content)
            print(f"Generated {filename}")
            break
        elif status['status'] == 'failed':
            print(f"Failed to generate {item['lang']}: {status.get('error_message')}")
            break
        time.sleep(2)

Get Supported Languages for a Voice

# Get supported languages for a specific voice
supported_languages = client.voice.get_supported_languages(voice_identifier="aura")
print(f"Voice supports {len(supported_languages)} languages:")
for code, name in supported_languages.items():
    print(f"- {code}: {name}")

Use Cases & Examples

Audiobook Narration

from audiopod import Client
import re

client = Client()

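def split_text_by_sentences(text, max_length=2000):
    """Split text into chunks by sentences, respecting max length"""
    # Split after sentence-ending punctuation so each sentence keeps its own mark
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks = []
    current_chunk = ""

    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        # Start a new chunk when adding this sentence would exceed the limit
        if current_chunk and len(current_chunk) + 1 + len(sentence) > max_length:
            chunks.append(current_chunk)
            current_chunk = sentence
        else:
            current_chunk = f"{current_chunk} {sentence}".strip()

    if current_chunk:
        chunks.append(current_chunk)

    return chunks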

def create_audiobook_chapter(chapter_text, narrator_voice="sage"):
    """Create audiobook chapter with professional narration"""
    # Split long text into manageable chunks
    chunks = split_text_by_sentences(chapter_text, max_length=2000)
    audio_urls = []

    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}: {chunk[:50]}...")

        result = client.voice.generate_speech(
            voice_id=narrator_voice,
            text=chunk,
            audio_format="wav",  # High quality for audiobooks
            speed=0.95,  # Slightly slower for audiobooks
            wait_for_completion=True
        )

        audio_urls.append(result.output_url)
        print(f"Generated audio for chunk {i+1}")

    return audio_urls

# Example usage
chapter_text = """
Chapter 1: The Beginning

It was the best of times, it was the worst of times. The era was filled with
contradictions that would shape the destiny of nations. In this tumultuous
period, heroes would rise and fall, love would conquer fear, and the very
fabric of society would be tested by forces beyond imagination.

As dawn broke over the ancient city, our protagonist began a journey that
would change everything they thought they knew about the world.
"""

audio_urls = create_audiobook_chapter(chapter_text, "sage")
print(f"Audiobook chapter complete! Generated {len(audio_urls)} audio segments.")

Podcast Introduction

intro_script = '''
<speak>
    <p><emphasis level="strong">Welcome back to the Tech Talk Podcast!</emphasis></p>
    <break time="1s"/>
    <p>I'm your host, and today we're diving deep into
    <emphasis>artificial intelligence</emphasis> and its impact on content creation.</p>
    <break time="0.5s"/>
    <p>Let's get started!</p>
</speak>
'''

intro_audio = client.text_to_speech.create(
    text=intro_script,
    voice_id="ava",
    text_format="ssml",
    emotion="professional",
    quality="high"
)

E-Learning Content

def create_lesson_audio(lesson_content):
    # Add natural pauses for learning
    formatted_content = f'''
    <speak>
        <p>Lesson begins now.</p>
        <break time="1s"/>
        {lesson_content}
        <break time="2s"/>
        <p>That concludes this lesson. Take a moment to review what you've learned.</p>
    </speak>
    '''

    return client.text_to_speech.create(
        text=formatted_content,
        voice_id="sage",  # Educational voice
        text_format="ssml",
        speed=0.9,  # Slower for learning
        quality="high"
    )
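
When lesson content is plain text, characters such as & and < should be escaped before being placed inside the SSML wrapper, otherwise the markup is no longer well-formed XML. The sketch below uses the standard library's xml.sax.saxutils.escape. build_lesson_ssml is a hypothetical helper, and it assumes the lesson content carries no SSML tags of its own, since those would be escaped as well.

```python
from xml.sax.saxutils import escape


def build_lesson_ssml(lesson_text):
    """Wrap plain lesson text in the SSML frame used above, escaping markup characters."""
    safe_text = escape(lesson_text.strip())  # Escapes &, < and >
    return (
        "<speak>"
        "<p>Lesson begins now.</p>"
        '<break time="1s"/>'
        f"<p>{safe_text}</p>"
        '<break time="2s"/>'
        "<p>That concludes this lesson. Take a moment to review what you've learned.</p>"
        "</speak>"
    )
```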

Interactive Voice Response (IVR)

ivr_prompts = {
    "welcome": "Thank you for calling AudioPod AI. Your call is important to us.",
    "menu": "Press 1 for sales, 2 for support, or 3 for billing.",
    "hold": "Please hold while we connect you to the next available agent."
}

for prompt_name, text in ivr_prompts.items():
    audio = client.text_to_speech.create(
        text=text,
        voice_id="ava",  # Professional voice for business
        quality="standard",  # Lower quality for phone systems
        sample_rate=8000,   # Phone quality
        output_format="wav"
    )

    with open(f"ivr_{prompt_name}.wav", "wb") as f:
        f.write(audio.audio_data)

Best Practices

Text Optimization

✅ Good Practices:
  • Use proper punctuation for natural pauses
  • Write numbers in word form for better pronunciation
  • Include context for abbreviations
  • Break long sentences into shorter ones
❌ Common Issues:
  • ALL CAPS TEXT (sounds like shouting)
  • Missing punctuation (unnatural flow)
  • Technical jargon without context
  • Extremely long paragraphs
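
Several of the issues listed above can be flagged automatically before text is submitted. This is a rough sketch: lint_tts_text is a hypothetical helper, and the 40-word threshold for a long sentence is an arbitrary assumption rather than a documented limit.

```python
import re


def lint_tts_text(text, max_sentence_words=40):
    """Return warnings for text patterns that tend to sound unnatural."""
    warnings = []
    stripped = text.strip()
    letters = [c for c in stripped if c.isalpha()]
    if len(letters) >= 4 and all(c.isupper() for c in letters):
        warnings.append("all caps")
    if stripped and stripped[-1] not in ".!?":
        warnings.append("missing end punctuation")
    if re.search(r"\d", stripped):
        warnings.append("contains digits")
    # Split after sentence-ending punctuation and check each sentence's length
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    if any(len(s.split()) > max_sentence_words for s in sentences):
        warnings.append("long sentence")
    return warnings
```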

Cost Optimization

# Efficient: Single request for multiple sentences
long_text = "First sentence. Second sentence. Third sentence."
job = client.voice.generate_speech(
    voice_identifier="aura",
    input_text=long_text,
    audio_format="mp3"
)

# Inefficient: Multiple requests for short texts
# This creates multiple jobs and uses more credits due to per-job overhead
sentences = ["First sentence.", "Second sentence.", "Third sentence."]
for sentence in sentences:
    job = client.voice.generate_speech(
        voice_identifier="aura", 
        input_text=sentence,
        audio_format="mp3"
    )
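
When the input is already a list of short sentences, they can be merged into as few requests as the 5000-character text limit allows. A minimal sketch, with batch_sentences as a hypothetical helper name:

```python
def batch_sentences(sentences, max_chars=5000):
    """Group short sentences into as few requests as the length limit allows."""
    batches = []
    current = ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                batches.append(current)
            # A single sentence longer than max_chars is kept whole here;
            # it would need further splitting before being sent.
            current = sentence
    if current:
        batches.append(current)
    return batches
```

Each returned string can then be submitted as one generation job.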

Caching Strategy

import hashlib
import os
import requests
import time

def get_cached_audio(text, voice_identifier, cache_dir="audio_cache"):
    # Create cache key from text and voice identifier
    cache_key = hashlib.md5(f"{text}_{voice_identifier}".encode()).hexdigest()
    cache_file = os.path.join(cache_dir, f"{cache_key}.mp3")

    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return f.read()

    # Generate new audio using job-based API
    job = client.voice.generate_speech(
        voice_identifier=voice_identifier,
        input_text=text,
        audio_format="mp3"
    )
    
    # Wait for completion
    while True:
        status = client.voice.get_job_status(job.job_id)
        if status['status'] == 'completed':
            # Download audio
            audio_response = requests.get(status['output_url'])
            audio_data = audio_response.content
            
            # Cache the result
            os.makedirs(cache_dir, exist_ok=True)
            with open(cache_file, "wb") as f:
                f.write(audio_data)
            
            return audio_data
        elif status['status'] == 'failed':
            raise Exception(f"Audio generation failed: {status.get('error_message')}")
        
        time.sleep(2)
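
The cache key above covers the text and the voice only. If the cached helper is extended to pass speed, audio_format, or language, those values should be part of the key too, otherwise a request with different settings would return a stale file. The sketch below shows one way to do that; make_cache_key is a hypothetical helper, and SHA-256 is swapped in for MD5 as a general-purpose choice.

```python
import hashlib
import json


def make_cache_key(text, voice_identifier, audio_format="mp3", speed=1.0, language=None):
    """Build a stable cache key from every setting that changes the audio."""
    payload = json.dumps(
        {
            "text": text,
            "voice": str(voice_identifier),
            "format": audio_format,
            "speed": speed,
            "language": language,
        },
        sort_keys=True,       # Same settings always serialize the same way
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```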

Error Handling

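Common Errors and Solutions

  • 400 Bad Request: Invalid request parameters. Correct the request rather than retrying.
  • 402 Payment Required: Insufficient credits. Add credits before retrying.
  • 404 Not Found: The voice identifier does not match an available voice. Check the voice UUID, ID, or name.
  • 429 Too Many Requests: Rate limit reached. Retry with exponential backoff.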

Robust Error Handling

import time
import random
import requests
from requests.exceptions import HTTPError

def generate_speech_with_retry(text, voice_identifier, max_retries=3):
    for attempt in range(max_retries):
        try:
            # Create generation job
            job = client.voice.generate_speech(
                voice_identifier=voice_identifier,
                input_text=text,
                audio_format="mp3"
            )
            
            # Monitor job with retry logic for status checks
            while True:
                try:
                    status = client.voice.get_job_status(job.job_id)
                    
                    if status['status'] == 'completed':
                        # Download and return audio
                        audio_response = requests.get(status['output_url'])
                        audio_response.raise_for_status()
                        return audio_response.content
                    elif status['status'] == 'failed':
                        raise Exception(f"Job failed: {status.get('error_message')}")
                    
                    time.sleep(2)  # Wait before checking again
                    
                except HTTPError as e:
                    if e.response.status_code == 429:
                        # Rate limit on status check - wait longer
                        time.sleep(5)
                        continue
                    else:
                        raise

        except HTTPError as e:
            if e.response.status_code == 429:  # Rate limit
                # Exponential backoff with jitter
                delay = (2 ** attempt) + random.uniform(0, 1)
                print(f"Rate limited, retrying in {delay:.1f}s (attempt {attempt + 1})")
                time.sleep(delay)
                continue
            elif e.response.status_code == 400:
                # Bad request - don't retry
                print(f"Bad request: {e.response.text}")
                raise e
            elif e.response.status_code == 402:
                # Insufficient credits - don't retry
                print("Insufficient credits")
                raise e
            elif e.response.status_code == 404:
                # Voice not found - don't retry
                print(f"Voice not found: {voice_identifier}")
                raise e
            else:
                # Other errors - retry with delay
                print(f"Error {e.response.status_code}, retrying...")
                time.sleep(1)
                continue
        except Exception as e:
            print(f"Unexpected error: {e}")
            time.sleep(1)
            continue

    raise Exception("Max retries exceeded")

# Usage example
try:
    audio_data = generate_speech_with_retry(
        text="Hello, this is a test with retry logic.",
        voice_identifier="aura"
    )
    with open("output.mp3", "wb") as f:
        f.write(audio_data)
    print("Audio generated successfully with retry logic")
except Exception as e:
    print(f"Failed to generate audio: {e}")

Pricing

Text to Speech pricing is based on audio duration:
  • 330 credits per minute of generated audio
  • Pricing is calculated on the actual audio output duration
  • All voice types (standard and custom) use the same rate
  • Quality settings don’t affect pricing

Cost Examples

| Audio Duration | Credits Used | USD Cost |
|---|---|---|
| 30 seconds | 165 credits | $0.022 |
| 1 minute | 330 credits | $0.044 |
| 2 minutes | 660 credits | $0.088 |
| 5 minutes | 1,650 credits | $0.220 |
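
The figures above follow from the flat rate of 330 credits per minute. The sketch below reproduces them; estimate_cost is a hypothetical helper, the per-credit dollar value is derived from the table ($0.044 for 330 credits) rather than stated directly, and how partial credits are rounded in actual billing is not specified here.

```python
CREDITS_PER_MINUTE = 330
# Derived from the table above: 330 credits cost $0.044.
USD_PER_CREDIT = 0.044 / CREDITS_PER_MINUTE


def estimate_cost(duration_seconds):
    """Return (credits, usd) for a given audio duration in seconds."""
    credits = duration_seconds / 60 * CREDITS_PER_MINUTE
    # Rounding to whole credits is an assumption made for display purposes.
    return round(credits), round(credits * USD_PER_CREDIT, 3)
```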
