Text to Speech - AudioPod AI

Overview

AudioPod AI’s Text to Speech service provides unified text-to-speech capabilities for both standard pre-built voices and custom voice clones. Generate speech with any voice in 60+ supported languages using our advanced AI models.

Supported Voice Types

Standard Voices: Pre-built professional voices with various characteristics
Custom Voices: Your own voice clones created via Voice Management
Unified API: The same endpoint serves both voice types and routes each request automatically

Key Features

50+ Premium Voices: Pre-built voices with unique characteristics
Custom Voice Support: Use your own voice clones and collections
Variable Speed Control: Adjust speech speed from 0.25x to 4.0x
Multiple Formats: MP3, WAV, OGG audio formats
Async Processing: Background job processing with real-time status tracking
Credit Management: Automatic credit reservation and billing

Authentication

All endpoints require authentication using either:

API Key: Authorization: Bearer your_api_key
JWT Token: Authorization: Bearer your_jwt_token
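API keys and JWT tokens use the same `Bearer` scheme, so you can build the header once and reuse it. The sketch below is illustrative only. `auth_headers` is a hypothetical helper, not part of the AudioPod SDK. Reading the key from `AUDIOPOD_API_KEY` follows the cURL examples on this page.

```python
import os
from typing import Optional


def auth_headers(token: Optional[str] = None) -> dict:
    """Return the Authorization header expected by every AudioPod endpoint.

    API keys and JWT tokens share the Bearer scheme. If no token is passed,
    fall back to the AUDIOPOD_API_KEY environment variable.
    """
    token = token or os.environ.get("AUDIOPOD_API_KEY")
    if not token:
        raise RuntimeError("Pass a token or set AUDIOPOD_API_KEY")
    return {"Authorization": f"Bearer {token}"}


# Pass the result as `headers=` to requests.get / requests.post
headers = auth_headers("your_api_key")
```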

Generate Speech

Basic Text to Speech

Generate speech from text with any voice (standard or custom), identified by voice UUID, ID, or name. All generation runs asynchronously: each request creates a job, and you poll that job until it completes or fails.

Python
Node.js
Raw HTTP
cURL

from audiopod import Client
import requests
import time

# Initialize client
client = Client()

# Generate speech using voice UUID
job = client.voice.generate_speech(
    voice_id="550e8400-e29b-41d4-a716-446655440000",  # Voice UUID
    text="Hello! This is AudioPod AI generating natural speech.",
    audio_format="mp3",
    language="en",
    speed=1.0
)

print(f"Generation job created: {job.id}")
print(f"Status: {job.status}")

# Check job status until completion
while True:
    status = client.voice.get_job_status(job.id)
    print(f"Job status: {status.status}")
    
    if status.status == 'completed':
        # Get output URL from job result
        if status.result and 'output_url' in status.result:
            audio_url = status.result['output_url']
            print(f"Audio ready: {audio_url}")
            
            # Download the audio file
            audio_response = requests.get(audio_url)
            audio_response.raise_for_status()  # Stop on a failed download
            with open("generated_speech.mp3", "wb") as f:
                f.write(audio_response.content)
            print("Audio saved as generated_speech.mp3")
        break
    elif status.status == 'failed':
        print(f"Job failed: {status.error_message}")
        break
        
    time.sleep(2)  # Wait 2 seconds before checking again

# Alternative: Generate using voice name (for standard voices)
job = client.voice.generate_speech(
    voice_id="aura",  # Voice name
    text="Hello! This uses a standard voice by name.",
    audio_format="mp3",
    speed=1.0
)

const { AudioPodClient } = require('audiopod-js');
const fs = require('fs');

// Initialize client
const client = new AudioPodClient();

async function generateSpeech() {
  try {
    // Generate speech using voice UUID
    const job = await client.voice.generateSpeech(
      '550e8400-e29b-41d4-a716-446655440000',  // Voice ID/UUID
      'Hello! This is AudioPod AI generating natural speech.',  // Text
      {
        language: 'en',
        audioFormat: 'mp3',
        generationParams: {
          speed: 1.0
        }
      }
    );

    console.log(`Generation job created: ${job.id}`);
    console.log(`Status: ${job.status}`);

    // Check job status until completion
    while (true) {
      const status = await client.voice.getJobStatus(job.id);
      console.log(`Job status: ${status.status}`);
      
      if (status.status === 'completed') {
        const audioUrl = status.output_url;
        console.log(`Audio ready: ${audioUrl}`);
        
        // Download the audio file (uses the global fetch built into Node.js 18+)
        const audioResponse = await fetch(audioUrl);
        const buffer = Buffer.from(await audioResponse.arrayBuffer());
        fs.writeFileSync('generated_speech.mp3', buffer);
        console.log('Audio saved as generated_speech.mp3');
        break;
      } else if (status.status === 'failed') {
        console.log(`Job failed: ${status.error_message}`);
        break;
      }
      
      await new Promise(resolve => setTimeout(resolve, 2000)); // Wait 2 seconds
    }

  } catch (error) {
    console.error('Error:', error.message);
  }
}

generateSpeech();

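import os
import requests
import time

# API key from the environment (same variable as the cURL example)
api_key = os.environ["AUDIOPOD_API_KEY"]

# Create TTS job using voice UUID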
response = requests.post(
    "https://api.audiopod.ai/api/v1/voice/voices/550e8400-e29b-41d4-a716-446655440000/generate",
    headers={"Authorization": f"Bearer {api_key}"},
    data={
        "input_text": "Hello! This is AudioPod AI generating natural speech.",
        "audio_format": "mp3",
        "speed": 1.0,
        "language": "en"
    }
)

if response.status_code == 200:
    job_data = response.json()
    job_id = job_data["job_id"]
    print(f"Voice generation job created: {job_id}")
    print(f"Credits reserved: {job_data.get('credits_reserved')}")
    
    # Poll job status until completion
    while True:
        status_response = requests.get(
            f"https://api.audiopod.ai/api/v1/voice/tts-jobs/{job_id}/status",
            headers={"Authorization": f"Bearer {api_key}"}
        )
        
        if status_response.status_code == 200:
            status_data = status_response.json()
            print(f"Job status: {status_data['status']}")
            
            if status_data['status'] == 'completed':
                audio_url = status_data['output_url']
                print(f"Audio ready: {audio_url}")
                
                # Download the audio file
                audio_response = requests.get(audio_url)
                audio_response.raise_for_status()  # Stop on a failed download
                with open("generated_speech.mp3", "wb") as f:
                    f.write(audio_response.content)
                print("Audio saved as generated_speech.mp3")
                break
            elif status_data['status'] == 'failed':
                print(f"Job failed: {status_data.get('error_message')}")
                break
                
        time.sleep(2)  # Wait 2 seconds before checking again
else:
    print(f"Request failed ({response.status_code}): {response.text}")

# Create TTS job using voice UUID
JOB_RESPONSE=$(curl -s -X POST "https://api.audiopod.ai/api/v1/voice/voices/550e8400-e29b-41d4-a716-446655440000/generate" \
  -H "Authorization: Bearer $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode 'input_text=Hello! This is AudioPod AI generating natural speech.' \
  -d 'audio_format=mp3' \
  -d 'speed=1.0' \
  -d 'language=en')

# Extract job ID
JOB_ID=$(echo "$JOB_RESPONSE" | jq -r '.job_id')
echo "Job created: $JOB_ID"

# Poll job status
while true; do
  STATUS_RESPONSE=$(curl -s -X GET "https://api.audiopod.ai/api/v1/voice/tts-jobs/$JOB_ID/status" \
    -H "Authorization: Bearer $AUDIOPOD_API_KEY")
  
  STATUS=$(echo "$STATUS_RESPONSE" | jq -r '.status')
  echo "Job status: $STATUS"
  
  if [ "$STATUS" = "completed" ]; then
    AUDIO_URL=$(echo "$STATUS_RESPONSE" | jq -r '.output_url')
    echo "Audio ready: $AUDIO_URL"
    
    # Download the audio
    curl -o generated_speech.mp3 "$AUDIO_URL"
    echo "Audio saved as generated_speech.mp3"
    break
  elif [ "$STATUS" = "failed" ]; then
    echo "Job failed: $(echo "$STATUS_RESPONSE" | jq -r '.error_message')"
    break
  fi
  
  sleep 2
done

Parameters:

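voice_id (required): Voice UUID, ID, or name of a standard or custom voice (in raw HTTP requests, this goes in the URL path)
text (required): Text to convert to speech, up to 5,000 characters (sent as the input_text form field in raw HTTP requests)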
audio_format (optional): Output format - mp3, wav, ogg (default: mp3)
speed (optional): Speech speed 0.25-4.0 (default: 1.0)
language (optional): Language code - auto-detected if not provided
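If you check these limits on the client side, an avoidable 400 Bad Request becomes an immediate, readable error. This is a minimal sketch. It assumes the limits listed above are the only ones that apply, and that the 5,000-character limit counts Python string characters (code points). `validate_tts_params` is a hypothetical helper, and the server still has the final say.

```python
# Limits copied from the parameter list above
MAX_TEXT_CHARS = 5000
ALLOWED_FORMATS = {"mp3", "wav", "ogg"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_tts_params(text: str, audio_format: str = "mp3", speed: float = 1.0) -> list:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not text or not text.strip():
        problems.append("text must not be empty")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    if audio_format not in ALLOWED_FORMATS:
        problems.append(f"audio_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return problems


print(validate_tts_params("Hello!"))  # []
print(validate_tts_params("", audio_format="flac", speed=5))
```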

Response:

{
  "job_id": 12345,
  "status": "pending",
  "message": "Voice generation job created successfully",
  "voice_id": 123,
  "estimated_duration": 30,
  "credits_reserved": 25
}

Voice Identification Examples

Python
Node.js
Raw HTTP

from audiopod import Client

client = Client()

# Using standard pre-built voice by name
job1 = client.voice.generate_speech(
    voice_identifier="aura",  # Standard voice name
    input_text="Hello! This uses a standard pre-built voice.",
    audio_format="mp3",
    speed=1.0
)
print(f"Standard voice job: {job1.job_id}")

# Using custom voice clone (by integer ID)
job2 = client.voice.generate_speech(
    voice_identifier=123,  # Your custom voice ID
    input_text="Hello! This uses my custom cloned voice.",
    audio_format="mp3",
    speed=1.0
)
print(f"Custom voice job: {job2.job_id}")

# Using voice via UUID (works for both standard and custom voices)
job3 = client.voice.generate_speech(
    voice_identifier="550e8400-e29b-41d4-a716-446655440000",  # Voice UUID
    input_text="Hello! This uses voice identification via UUID.",
    audio_format="mp3",
    speed=1.0
)
print(f"UUID voice job: {job3.job_id}")

# List available voices
voices = client.voice.list_voices(include_public=True)
print("Available voices:")
for voice in voices:
    print(f"- {voice.name} (UUID: {voice.uuid}): {voice.description}")

const { AudioPodClient } = require('audiopod-js');
const client = new AudioPodClient();

async function demonstrateVoiceIdentification() {
  try {
    // Using standard pre-built voice by name
    const job1 = await client.voice.generateSpeech({
      voiceIdentifier: 'aura',  // Standard voice name
      inputText: 'Hello! This uses a standard pre-built voice.',
      audioFormat: 'mp3',
      speed: 1.0
    });
    console.log(`Standard voice job: ${job1.jobId}`);

    // Using custom voice clone (by integer ID)
    const job2 = await client.voice.generateSpeech({
      voiceIdentifier: 123,  // Your custom voice ID
      inputText: 'Hello! This uses my custom cloned voice.',
      audioFormat: 'mp3',
      speed: 1.0
    });
    console.log(`Custom voice job: ${job2.jobId}`);

    // Using voice via UUID (works for both standard and custom voices)
    const job3 = await client.voice.generateSpeech({
      voiceIdentifier: '550e8400-e29b-41d4-a716-446655440000',  // Voice UUID
      inputText: 'Hello! This uses voice identification via UUID.',
      audioFormat: 'mp3',
      speed: 1.0
    });
    console.log(`UUID voice job: ${job3.jobId}`);

    // List available voices
    const voices = await client.voice.listVoices({ includePublic: true });
    console.log('Available voices:');
    voices.forEach(voice => {
      console.log(`- ${voice.name} (UUID: ${voice.uuid}): ${voice.description}`);
    });

  } catch (error) {
    console.error('Error:', error.message);
  }
}

demonstrateVoiceIdentification();

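import os
import requests

# API key from the environment (same variable as the cURL example)
api_key = os.environ["AUDIOPOD_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}"}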

# Generate with a standard voice by name
response1 = requests.post(
    "https://api.audiopod.ai/api/v1/voice/voices/aura/generate",
    headers=headers,
    data={
        "input_text": "Hello! This uses a standard pre-built voice.",
        "audio_format": "mp3",
        "speed": 1.0
    }
)
print(f"Standard voice job: {response1.json()['job_id']}")

# Generate with your custom voice by ID
response2 = requests.post(
    "https://api.audiopod.ai/api/v1/voice/voices/123/generate",  # Your custom voice ID
    headers=headers,
    data={
        "input_text": "Hello! This uses my custom cloned voice.",
        "audio_format": "mp3",
        "speed": 1.0
    }
)
print(f"Custom voice job: {response2.json()['job_id']}")

# Generate with voice using UUID
response3 = requests.post(
    "https://api.audiopod.ai/api/v1/voice/voices/550e8400-e29b-41d4-a716-446655440000/generate",
    headers=headers,
    data={
        "input_text": "Hello! This uses voice identification via UUID.",
        "audio_format": "mp3",
        "speed": 1.0
    }
)
print(f"UUID voice job: {response3.json()['job_id']}")

# List available voices
voices_response = requests.get(
    "https://api.audiopod.ai/api/v1/voice/voice-profiles?include_public=true",
    headers=headers
)
voices = voices_response.json()
print("Available voices:")
for voice in voices:
    print(f"- {voice['name']} (UUID: {voice['uuid']}): {voice['description']}")
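The API works out the identifier type for you, so you don't need any client-side routing. A quick local check can still catch a malformed UUID, which is one cause of 404 Not Found, before you send the request. This sketch assumes voice names are never purely numeric and never UUID-shaped. `classify_voice_identifier` is a hypothetical helper.

```python
import uuid


def classify_voice_identifier(identifier) -> str:
    """Guess which kind of voice identifier was supplied: 'id', 'uuid', or 'name'."""
    if isinstance(identifier, int):
        return "id"
    text = str(identifier).strip()
    # Integer IDs may also arrive as strings, e.g. from a URL or config file
    if text.isascii() and text.isdigit():
        return "id"
    try:
        uuid.UUID(text)  # Raises ValueError for anything that is not a valid UUID
        return "uuid"
    except ValueError:
        return "name"


for ident in ["aura", 123, "550e8400-e29b-41d4-a716-446655440000"]:
    print(ident, "->", classify_voice_identifier(ident))
```

A truncated UUID such as `550e8400-e29b-41d4-a716-44665544000` falls through to `name`. If you expected a UUID, that result is a hint that the value was mistyped.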

Available Voices

AudioPod AI offers a diverse collection of voices: standard pre-built voices, your own custom voice clones, and community-shared public voices.

Voice Types

Standard Voices: Professional pre-built voices with unique characteristics
Custom Voices: Your own voice clones created from audio samples
Public Voices: Community-shared voices available to all users

Popular Standard Voices

Voice Name	Gender	Languages	Style	Description
`aura`	Female	60+	Bright	Luminous voice with crystal-clear delivery
`jester`	Male	60+	Playful	Upbeat voice with theatrical flair
`sage`	Male	60+	Wise	Authoritative voice perfect for narration
`ava`	Female	60+	Professional	Commanding voice for business content
`surge`	Male	60+	Energetic	High-energy voice for exciting content
`willow`	Female	60+	Gentle	Delicate, youthful voice with elegance

Listing Available Voices

Use the voice profiles endpoint to discover all available voices:

from audiopod import Client

client = Client()

# List all voices (your custom voices plus public ones)
voices = client.voice.list_voices(include_public=True)

# Filter by voice type
standard_voices = [v for v in voices if v.voice_type == "standard"]
custom_voices = [v for v in voices if v.voice_type == "custom"]

print(f"Found {len(standard_voices)} standard voices")
print(f"Found {len(custom_voices)} custom voices")

Job Status Tracking

Check Job Status

Monitor your text-to-speech generation jobs with real-time status updates:

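from audiopod import Client

client = Client()
job_id = 12345  # ID returned when the job was created

# Check job status
status = client.voice.get_job_status(job_id)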

print(f"Job {job_id} status: {status['status']}")
print(f"Progress: {status.get('progress', 'N/A')}%")

if status['status'] == 'completed':
    print(f"Audio ready: {status['output_url']}")
    print(f"Duration: {status.get('total_duration')} seconds")
elif status['status'] == 'failed':
    print(f"Error: {status.get('error_message')}")

Job Status Response

{
  "id": 12345,
  "job_type": "single",
  "user_id": "user-uuid",
  "status": "completed",
  "progress": 100,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:45Z",
  "completed_at": "2024-01-15T10:30:45Z",
  "output_path": "generated/12345.mp3",
  "output_url": "https://presigned-url-to-audio-file",
  "input_text": "Hello! This is AudioPod AI generating natural speech.",
  "target_language": "en",
  "voice_info": {
    "id": 123,
    "name": "aura",
    "display_name": "Aura"
  },
  "estimated_duration": 30,
  "credits_reserved": 25
}
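The timestamps in a status response are enough to measure how long a job took from start to finish. This small sketch uses the sample response above. It assumes `completed_at` is missing or null until the job finishes, which this page does not state explicitly. `job_wall_time_seconds` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-01-15T10:30:00Z'."""
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11, so normalize it
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_wall_time_seconds(job: dict) -> Optional[float]:
    """Seconds from job creation to completion, or None if the job is unfinished."""
    completed_at = job.get("completed_at")
    if not completed_at:
        return None
    return (parse_timestamp(completed_at) - parse_timestamp(job["created_at"])).total_seconds()


sample = {"created_at": "2024-01-15T10:30:00Z", "completed_at": "2024-01-15T10:30:45Z"}
print(job_wall_time_seconds(sample))  # 45.0
```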

Status Values

pending: Job created and waiting for processing
processing: Currently generating audio
completed: Audio generation finished successfully
failed: Generation failed with error
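Only `completed` and `failed` are terminal states, so a polling loop should stop on either one. The examples on this page poll every 2 seconds with no upper bound. The sketch below adds a timeout and a delay that grows between polls. `wait_for_job` is a hypothetical helper. It accepts any zero-argument function that returns a status dict, such as a wrapper around the `/api/v1/voice/tts-jobs/{job_id}/status` request from the Raw HTTP example.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout: float = 600.0,
    initial_delay: float = 2.0,
    max_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state or the timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        # Give up rather than sleep past the deadline
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still '{status.get('status')}' after {timeout}s")
        sleep(delay)
        delay = min(delay * 1.5, max_delay)  # Back off gradually between polls
```

With the raw HTTP API, `fetch_status` could be `lambda: requests.get(status_url, headers=headers).json()`. The caller still decides what to do with a `failed` result.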

Configuration Options

Audio Format Options

Parameter	Options	Description	Notes
`audio_format`	`mp3`, `wav`, `ogg`	Audio file format	MP3 recommended for most uses
`speed`	`0.25` to `4.0`	Speech speed	1.0 = normal speed
`language`	Language codes	Target language	Auto-detected if not specified

Basic Voice Customization

# Generate with custom settings
job = client.voice.generate_speech(
    voice_identifier="aura",
    input_text="Customized voice output with speed control",
    audio_format="wav",     # High quality WAV format
    speed=1.1,             # 10% faster than normal
    language="en"          # English language
)

Language Detection

AudioPod AI automatically detects the language of your input text. Setting it explicitly is more reliable, especially for short or mixed-language text:

# Automatic language detection
job1 = client.voice.generate_speech(
    voice_identifier="aura",
    input_text="Hello, how are you today?"
    # Language will be auto-detected as English
)

# Explicit language specification
job2 = client.voice.generate_speech(
    voice_identifier="aura", 
    input_text="Bonjour, comment allez-vous?",
    language="fr"  # Specify French
)

Multi-Language Support

Language Codes

AudioPod AI supports 60+ languages, with automatic detection and consistent quality across all of them. Common language codes:

Language	Code	Description
English (US)	`en-US`	American English
English (UK)	`en-GB`	British English
Spanish	`es`	Spanish (Spain)
Spanish (Mexico)	`es-MX`	Mexican Spanish
French	`fr`	French (France)
German	`de`	German
Chinese (Simplified)	`zh-CN`	Simplified Chinese (Mandarin)
Chinese (Traditional)	`zh-TW`	Traditional Chinese
Japanese	`ja`	Japanese
Korean	`ko`	Korean
Portuguese	`pt`	Portuguese (Portugal)
Portuguese (Brazil)	`pt-BR`	Brazilian Portuguese
Russian	`ru`	Russian
Arabic	`ar`	Arabic
Hindi	`hi`	Hindi
Italian	`it`	Italian
Dutch	`nl`	Dutch
Polish	`pl`	Polish
Turkish	`tr`	Turkish
Swedish	`sv`	Swedish

Multi-Language Example

import time

import requests
from audiopod import Client

client = Client()

# Generate speech in multiple languages using the same voice
languages = [
    {"text": "Hello, welcome to AudioPod AI", "lang": "en-US"},
    {"text": "Hola, bienvenido a AudioPod AI", "lang": "es"},
    {"text": "Bonjour, bienvenue chez AudioPod AI", "lang": "fr"},
    {"text": "Hallo, willkommen bei AudioPod AI", "lang": "de"}
]

voice_uuid = "550e8400-e29b-41d4-a716-446655440000"  # Your multi-language voice

for item in languages:
    job = client.voice.generate_speech(
        voice_identifier=voice_uuid,
        input_text=item["text"],
        language=item["lang"],
        audio_format="mp3"
    )
    
    # Wait for completion and download
    while True:
        status = client.voice.get_job_status(job.job_id)
        if status['status'] == 'completed':
            # Download audio
            audio_response = requests.get(status['output_url'])
            filename = f"welcome_{item['lang']}.mp3"
            with open(filename, "wb") as f:
                f.write(audio_response.content)
            print(f"Generated {filename}")
            break
        elif status['status'] == 'failed':
            print(f"Failed to generate {item['lang']}: {status.get('error_message')}")
            break
        time.sleep(2)

Get Supported Languages for a Voice

# Get supported languages for a specific voice
supported_languages = client.voice.get_supported_languages(voice_identifier="aura")
print(f"Voice supports {len(supported_languages)} languages:")
for code, name in supported_languages.items():
    print(f"- {code}: {name}")

Use Cases & Examples

Audiobook Narration

Python
Node.js
Batch Processing

from audiopod import Client
import re

client = Client()

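def split_text_by_sentences(text, max_length=2000):
    """Split text into chunks at sentence boundaries, each at most max_length characters.

    A single sentence longer than max_length is kept whole as its own chunk.
    """
    # Split after '.', '!' or '?' followed by whitespace, keeping the punctuation
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]
    chunks = []
    current_chunk = ""

    for sentence in sentences:
        candidate = f"{current_chunk} {sentence}" if current_chunk else sentence
        if len(candidate) > max_length and current_chunk:
            # Adding this sentence would overflow the chunk, so start a new one
            chunks.append(current_chunk)
            current_chunk = sentence
        else:
            current_chunk = candidate

    if current_chunk:
        chunks.append(current_chunk)

    return chunks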

def create_audiobook_chapter(chapter_text, narrator_voice="sage"):
    """Create audiobook chapter with professional narration"""
    # Split long text into manageable chunks
    chunks = split_text_by_sentences(chapter_text, max_length=2000)
    audio_urls = []

    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}: {chunk[:50]}...")

        result = client.voice.generate_speech(
            voice_id=narrator_voice,
            text=chunk,
            audio_format="wav",  # High quality for audiobooks
            speed=0.95,  # Slightly slower for audiobooks
            wait_for_completion=True
        )

        audio_urls.append(result.output_url)
        print(f"Generated audio for chunk {i+1}")

    return audio_urls

# Example usage
chapter_text = """
Chapter 1: The Beginning

It was the best of times, it was the worst of times. The era was filled with
contradictions that would shape the destiny of nations. In this tumultuous
period, heroes would rise and fall, love would conquer fear, and the very
fabric of society would be tested by forces beyond imagination.

As dawn broke over the ancient city, our protagonist began a journey that
would change everything they thought they knew about the world.
"""

audio_urls = create_audiobook_chapter(chapter_text, "sage")
print(f"Audiobook chapter complete! Generated {len(audio_urls)} audio segments.")

const { AudioPodClient } = require('audiopod-js');
const client = new AudioPodClient();

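function splitTextBySentences(text, maxLength = 2000) {
  // Split after '.', '!' or '?' followed by whitespace, keeping the punctuation
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(s => s.trim());
  const chunks = [];
  let currentChunk = '';

  for (const sentence of sentences) {
    const candidate = currentChunk ? `${currentChunk} ${sentence}` : sentence;
    if (candidate.length > maxLength && currentChunk) {
      // Adding this sentence would overflow the chunk, so start a new one
      chunks.push(currentChunk);
      currentChunk = sentence;
    } else {
      currentChunk = candidate;
    }
  }

  if (currentChunk) {
    chunks.push(currentChunk);
  }

  return chunks;
}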

async function createAudiobookChapter(chapterText, narratorVoice = 'sage') {
  try {
    // Split long text into manageable chunks
    const chunks = splitTextBySentences(chapterText, 2000);
    const audioUrls = [];

    for (let i = 0; i < chunks.length; i++) {
      const chunk = chunks[i];
      console.log(`Processing chunk ${i+1}/${chunks.length}: ${chunk.substring(0, 50)}...`);

      const result = await client.voice.generateSpeech({
        voiceId: narratorVoice,
        text: chunk,
        audioFormat: 'wav',  // High quality for audiobooks
        speed: 0.95,  // Slightly slower for audiobooks
        waitForCompletion: true
      });

      audioUrls.push(result.outputUrl);
      console.log(`Generated audio for chunk ${i+1}`);
    }

    return audioUrls;

  } catch (error) {
    console.error('Error creating audiobook:', error.message);
    throw error;
  }
}

// Example usage
const chapterText = `
Chapter 1: The Beginning

It was the best of times, it was the worst of times. The era was filled with
contradictions that would shape the destiny of nations. In this tumultuous
period, heroes would rise and fall, love would conquer fear, and the very
fabric of society would be tested by forces beyond imagination.

As dawn broke over the ancient city, our protagonist began a journey that
would change everything they thought they knew about the world.
`;

createAudiobookChapter(chapterText, 'sage')
  .then(audioUrls => {
    console.log(`Audiobook chapter complete! Generated ${audioUrls.length} audio segments.`);
    audioUrls.forEach((url, index) => {
      console.log(`Segment ${index + 1}: ${url}`);
    });
  })
  .catch(error => {
    console.error('Failed to create audiobook:', error);
  });

from audiopod import Client
import asyncio

# split_text_by_sentences() is defined in the Python example above

client = Client()

async def create_audiobook_chapter_parallel(chapter_text, narrator_voice="sage"):
    """Create audiobook with parallel processing for faster generation"""
    chunks = split_text_by_sentences(chapter_text, max_length=2000)
    
    # Submit every job up front so the server can process them concurrently
    jobs = []
    for chunk in chunks:
        job = client.voice.generate_speech(
            voice_id=narrator_voice,
            text=chunk,
            audio_format="wav",
            speed=0.95,
            wait_for_completion=False  # Return immediately instead of blocking on this job
        )
        jobs.append(job)
    
    print(f"Created {len(jobs)} generation jobs")
    
    # Wait for all jobs to complete
    audio_urls = []
    for i, job in enumerate(jobs):
        print(f"Waiting for job {i+1}/{len(jobs)}")
        
        # Poll job status
        while True:
            status = client.voice.get_job_status(job.id)
            if status.status == 'completed':
                audio_urls.append(status.output_url)
                break
            elif status.status == 'failed':
                print(f"Job {job.id} failed: {status.error_message}")
                break
            
            await asyncio.sleep(2)
    
    return audio_urls

# Example with error handling and progress tracking
def create_audiobook_with_progress(chapter_text, narrator_voice="sage"):
    """Create audiobook with detailed progress tracking"""
    chunks = split_text_by_sentences(chapter_text, max_length=2000)
    total_chunks = len(chunks)
    completed_chunks = 0
    audio_urls = []
    
    print(f"Starting audiobook generation: {total_chunks} chunks")
    
    for i, chunk in enumerate(chunks):
        try:
            print(f"[{i+1}/{total_chunks}] Processing: {chunk[:50]}...")
            
            result = client.voice.generate_speech(
                voice_id=narrator_voice,
                text=chunk,
                audio_format="wav",
                speed=0.95,
                wait_for_completion=True
            )
            
            audio_urls.append(result.output_url)
            completed_chunks += 1
            
            progress = (completed_chunks / total_chunks) * 100
            print(f"[{i+1}/{total_chunks}] ✅ Complete ({progress:.1f}%)")
            
        except Exception as e:
            print(f"[{i+1}/{total_chunks}] ❌ Failed: {e}")
            # You could implement retry logic here
            
    print(f"Audiobook generation complete! {completed_chunks}/{total_chunks} chunks successful")
    return audio_urls
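Each chunk above produces a separate WAV file. Once you have downloaded the segments (for example with `requests.get(url).content`), Python's standard `wave` module can join them into one chapter file. This is a minimal sketch. It assumes the API returns uncompressed PCM WAV and that every segment has the same channel count, sample width, and sample rate, which should hold when all chunks use the same voice and format. `concatenate_wavs` is a hypothetical helper.

```python
import wave
from typing import Iterable


def concatenate_wavs(segment_paths: Iterable[str], output_path: str) -> int:
    """Join PCM WAV files end to end and return the total number of frames written."""
    paths = list(segment_paths)
    if not paths:
        raise ValueError("no segments to concatenate")

    # The first segment defines the output format
    with wave.open(paths[0], "rb") as first:
        params = (first.getnchannels(), first.getsampwidth(), first.getframerate())

    total_frames = 0
    with wave.open(output_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        for path in paths:
            with wave.open(path, "rb") as seg:
                seg_params = (seg.getnchannels(), seg.getsampwidth(), seg.getframerate())
                if seg_params != params:
                    raise ValueError(f"{path} has format {seg_params}, expected {params}")
                n_frames = seg.getnframes()
                out.writeframes(seg.readframes(n_frames))
                total_frames += n_frames
    return total_frames


# Usage, after saving each segment as chunk_000.wav, chunk_001.wav, ...
# concatenate_wavs(sorted(glob.glob("chunk_*.wav")), "chapter_01.wav")
```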

Podcast Introduction

intro_script = '''
<speak>
    <p><emphasis level="strong">Welcome back to the Tech Talk Podcast!</emphasis></p>
    <break time="1s"/>
    <p>I'm your host, and today we're diving deep into
    <emphasis>artificial intelligence</emphasis> and its impact on content creation.</p>
    <break time="0.5s"/>
    <p>Let's get started!</p>
</speak>
'''

intro_audio = client.text_to_speech.create(
    text=intro_script,
    voice_id="ava",
    text_format="ssml",
    emotion="professional",
    quality="high"
)

E-Learning Content

def create_lesson_audio(lesson_content):
    # Add natural pauses for learning
    formatted_content = f'''
    <speak>
        <p>Lesson begins now.</p>
        <break time="1s"/>
        {lesson_content}
        <break time="2s"/>
        <p>That concludes this lesson. Take a moment to review what you've learned.</p>
    </speak>
    '''

    return client.text_to_speech.create(
        text=formatted_content,
        voice_id="sage",  # Educational voice
        text_format="ssml",
        speed=0.9,  # Slower for learning
        quality="high"
    )

Interactive Voice Response (IVR)

ivr_prompts = {
    "welcome": "Thank you for calling AudioPod AI. Your call is important to us.",
    "menu": "Press 1 for sales, 2 for support, or 3 for billing.",
    "hold": "Please hold while we connect you to the next available agent."
}

for prompt_name, text in ivr_prompts.items():
    audio = client.text_to_speech.create(
        text=text,
        voice_id="ava",  # Professional voice for business
        quality="standard",  # Lower quality for phone systems
        sample_rate=8000,   # Phone quality
        output_format="wav"
    )

    with open(f"ivr_{prompt_name}.wav", "wb") as f:
        f.write(audio.audio_data)

Best Practices

Text Optimization

✅ Good Practices:

Use proper punctuation for natural pauses
Write numbers in word form for better pronunciation
Include context for abbreviations
Break long sentences into shorter ones

❌ Common Issues:

ALL CAPS TEXT (sounds like shouting)
Missing punctuation (unnatural flow)
Technical jargon without context
Extremely long paragraphs
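You can apply some of these guidelines automatically before sending text. The sketch below is a heuristic, not an AudioPod feature. It does three things:

- collapses stray whitespace and line breaks
- converts ALL-CAPS words of four or more letters to capitalized form, leaving shorter tokens such as `AI` alone on the assumption that they are abbreviations
- makes sure the text ends with terminal punctuation

It does not convert numbers to words, since doing that well depends on language and context. `prepare_tts_text` is a hypothetical helper.

```python
import re


def prepare_tts_text(text: str) -> str:
    """Apply a few of the text-optimization tips before sending text to TTS."""
    # Collapse runs of whitespace (including line breaks) into single spaces
    text = re.sub(r"\s+", " ", text).strip()

    # Capitalize ALL-CAPS words of 4+ letters so they are not read as shouting;
    # shorter tokens such as "AI" or "API" are kept as likely abbreviations
    text = re.sub(r"\b[A-Z]{4,}\b", lambda m: m.group(0).capitalize(), text)

    # End with terminal punctuation so the final sentence closes naturally
    if text and text[-1] not in ".!?":
        text += "."
    return text


print(prepare_tts_text("WELCOME to the   show\nToday we test the API"))
# Welcome to the show Today we test the API.
```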

Cost Optimization

# Efficient: Single request for multiple sentences
long_text = "First sentence. Second sentence. Third sentence."
job = client.voice.generate_speech(
    voice_identifier="aura",
    input_text=long_text,
    audio_format="mp3"
)

# Inefficient: Multiple requests for short texts
# Each sentence becomes a separate job to submit, poll, and download, and the per-job overhead can use more credits
sentences = ["First sentence.", "Second sentence.", "Third sentence."]
for sentence in sentences:
    job = client.voice.generate_speech(
        voice_identifier="aura", 
        input_text=sentence,
        audio_format="mp3"
    )
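To apply this to a list of short texts, you can join them greedily into as few requests as possible while staying under the 5,000-character limit per request. This is a minimal sketch, and `pack_texts` is a hypothetical helper. The trade-off is that none of a larger job's audio is available until the whole job finishes.

```python
from typing import List

MAX_TEXT_CHARS = 5000  # Per-request limit from the parameter list


def pack_texts(texts: List[str], limit: int = MAX_TEXT_CHARS) -> List[str]:
    """Greedily join short texts with spaces into as few requests as possible."""
    batches: List[str] = []
    current = ""
    for piece in (t.strip() for t in texts):
        if not piece:
            continue
        if len(piece) > limit:
            raise ValueError(f"a single text has {len(piece)} characters; split it first")
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            # This piece does not fit, so close the current batch and start a new one
            batches.append(current)
            current = piece
    if current:
        batches.append(current)
    return batches


print(pack_texts(["First sentence.", "Second sentence.", "Third sentence."]))
# ['First sentence. Second sentence. Third sentence.']
```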

Caching Strategy

import hashlib
import json
import os
import time

import requests
from audiopod import Client

client = Client()

def get_cached_audio(text, voice_identifier, cache_dir="audio_cache"):
    # Build an unambiguous cache key from the text and voice identifier
    cache_key = hashlib.sha256(json.dumps([text, str(voice_identifier)]).encode()).hexdigest()
    cache_file = os.path.join(cache_dir, f"{cache_key}.mp3")

    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return f.read()

    # Generate new audio using job-based API
    job = client.voice.generate_speech(
        voice_identifier=voice_identifier,
        input_text=text,
        audio_format="mp3"
    )
    
    # Wait for completion
    while True:
        status = client.voice.get_job_status(job.job_id)
        if status['status'] == 'completed':
            # Download audio
            audio_response = requests.get(status['output_url'])
            audio_response.raise_for_status()  # Never cache a failed download
            audio_data = audio_response.content
            
            # Cache the result
            os.makedirs(cache_dir, exist_ok=True)
            with open(cache_file, "wb") as f:
                f.write(audio_data)
            
            return audio_data
        elif status['status'] == 'failed':
            raise Exception(f"Audio generation failed: {status.get('error_message')}")
        
        time.sleep(2)

Error Handling

Common Errors and Solutions

400 Bad Request - Invalid Text

Causes:

Text too long (more than 5,000 characters)
Invalid characters or encoding
Empty text field

Solutions:

Split long text into chunks
Check text encoding (UTF-8)
Validate that the text is not empty

404 Not Found - Invalid Voice

Causes:

Voice identifier doesn’t exist
Voice not accessible by user
Voice UUID format invalid

Solutions:

Check available voices with the voice profiles endpoint
Verify the voice UUID format
Ensure the voice is public or owned by the user

402 Payment Required - Insufficient Credits

Causes:

Not enough credits for audio generation
Credit limit exceeded

Solutions:

Check your credit balance
Purchase more credits
Wait for the credit reset

429 Too Many Requests

Causes:

Rate limit exceeded
Too many concurrent requests

Solutions:

Implement exponential backoff
Use request queuing
Upgrade to higher rate limits

Robust Error Handling

import time
import random
import requests
from requests.exceptions import HTTPError
from audiopod import Client

client = Client()

def generate_speech_with_retry(text, voice_identifier, max_retries=3):
    for attempt in range(max_retries):
        try:
            # Create generation job
            job = client.voice.generate_speech(
                voice_identifier=voice_identifier,
                input_text=text,
                audio_format="mp3"
            )
            
            # Monitor job with retry logic for status checks
            while True:
                try:
                    status = client.voice.get_job_status(job.job_id)
                    
                    if status['status'] == 'completed':
                        # Download and return audio
                        audio_response = requests.get(status['output_url'])
                        audio_response.raise_for_status()
                        return audio_response.content
                    elif status['status'] == 'failed':
                        raise Exception(f"Job failed: {status.get('error_message')}")
                    
                    time.sleep(2)  # Wait before checking again
                    
                except HTTPError as e:
                    if e.response.status_code == 429:
                        # Rate limit on status check - wait longer
                        time.sleep(5)
                        continue
                    else:
                        raise

        except HTTPError as e:
            if e.response.status_code == 429:  # Rate limit
                # Exponential backoff with jitter
                delay = (2 ** attempt) + random.uniform(0, 1)
                print(f"Rate limited, retrying in {delay:.1f}s (attempt {attempt + 1})")
                time.sleep(delay)
                continue
            elif e.response.status_code == 400:
                # Bad request - don't retry
                print(f"Bad request: {e.response.text}")
                raise e
            elif e.response.status_code == 402:
                # Insufficient credits - don't retry
                print("Insufficient credits")
                raise e
            elif e.response.status_code == 404:
                # Voice not found - don't retry
                print(f"Voice not found: {voice_identifier}")
                raise e
            else:
                # Other errors - retry with delay
                print(f"Error {e.response.status_code}, retrying...")
                time.sleep(1)
                continue
        except Exception as e:
            print(f"Unexpected error: {e}")
            time.sleep(1)
            continue

    raise Exception("Max retries exceeded")

# Usage example
try:
    audio_data = generate_speech_with_retry(
        text="Hello, this is a test with retry logic.",
        voice_identifier="aura"
    )
    with open("output.mp3", "wb") as f:
        f.write(audio_data)
    print("Audio generated successfully with retry logic")
except Exception as e:
    print(f"Failed to generate audio: {e}")

Pricing

Text to Speech pricing is based on audio duration:

330 credits per minute of generated audio
Pricing is calculated on the actual audio output duration
All voice types (standard and custom) use the same rate
Quality settings don’t affect pricing

Cost Examples

Audio Duration	Credits Used	USD Cost
30 seconds	165 credits	$0.022
1 minute	330 credits	$0.044
2 minutes	660 credits	$0.088
5 minutes	1,650 credits	$0.220
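The table follows directly from the per-minute rate: credits = duration in minutes × 330. The sketch below reproduces it. The dollar rate per credit (about $0.000133) is derived from the table's own figures, not taken from a published price. The sketch also assumes billing is strictly proportional to duration with no rounding, which this page does not specify. `estimate_cost` is a hypothetical helper.

```python
CREDITS_PER_MINUTE = 330
# Derived from the table: 330 credits correspond to $0.044
USD_PER_CREDIT = 0.044 / 330


def estimate_cost(duration_seconds: float) -> tuple:
    """Return (credits, usd) for a clip of the given duration in seconds."""
    credits = duration_seconds / 60 * CREDITS_PER_MINUTE
    return credits, credits * USD_PER_CREDIT


credits, usd = estimate_cost(90)
print(f"{credits:.0f} credits, ${usd:.3f}")  # 495 credits, $0.066
```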

Next Steps

Voice Management

Voice Examples