Overview

AudioPod AI’s Voice Management API provides comprehensive tools for managing custom and pre-built voices. Browse available voices, organize custom voice collections, preview voices, and manage voice metadata for text-to-speech and voice cloning workflows.

Key Features

  • Voice Browsing: Explore available pre-built voices by category
  • Custom Voice Creation: Create new voice models from audio samples
  • Custom Voice Management: Organize and manage cloned voices
  • Voice Collections: Group voices into organized collections
  • Voice Preview: Generate preview samples to test voices
  • Metadata Management: Update voice names, descriptions, and tags
  • Voice Analytics: Track usage and performance metrics
  • Batch Operations: Manage multiple voices efficiently

Authentication

All endpoints require authentication through the Authorization header, using either an API key or a JWT:
  • API Key: Authorization: Bearer your_api_key
  • JWT Token: Authorization: Bearer your_jwt_token
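
Both credential types use the same Bearer scheme, so a single helper can attach either one. The sketch below builds (but does not send) an authenticated request with Python's standard library. The helper name build_request is hypothetical, and the host is taken from the code examples later in this document.

```python
import urllib.request

BASE_URL = "https://api.audiopod.ai"  # host used by the examples in this document


def build_request(method, path, token, data=None):
    """Build (but do not send) an authenticated request.

    `token` may be an API key or a JWT; both use the same Bearer scheme.
    """
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


req = build_request("GET", "/api/v1/voice/voice-profiles?limit=50", "your_api_key")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the returned object to urllib.request.urlopen would send it; that step is omitted here so the example runs without network access.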

Voice Discovery

List Available Voices

Browse all available voices, both pre-built and custom.
GET /api/v1/voice/voice-profiles?limit=50
Authorization: Bearer {api_key}
Filter Parameters:
  • voice_type: CUSTOM, STANDARD
  • is_public: true, false
  • include_public: true, false
  • skip: Number to skip for pagination
  • limit: Number of results to return (max 50)
Response:
{
  "voices": [
    {
      "voice_id": "voice_abc123def456",
      "name": "Sarah Professional",
      "category": "professional",
      "language": "en",
      "gender": "female",
      "age_range": "adult",
      "style": "conversational",
      "description": "Clear, professional female voice perfect for business presentations",
      "preview_url": "https://api.audiopod.ai/voice-previews/sarah_professional.mp3",
      "is_custom": false,
      "created_by": null,
      "tags": ["business", "clear", "professional"],
      "usage_stats": {
        "total_generations": 15420,
        "avg_rating": 4.7
      },
      "supported_features": {
        "speed_control": true,
        "pitch_control": true
      }
    },
    {
      "voice_id": "voice_xyz789ghi012",
      "name": "My Custom Voice",
      "category": "custom",
      "language": "en",
      "gender": "male",
      "age_range": "adult",
      "style": "conversational",
      "description": "Custom voice cloned from my recordings",
      "preview_url": "https://api.audiopod.ai/voice-previews/custom_xyz789.mp3",
      "is_custom": true,
      "created_by": "550e8400-e29b-41d4-a716-446655440000",
      "created_at": "2024-01-10T15:30:00Z",
      "tags": ["personal", "english"],
      "clone_job_id": 456,
      "training_status": "completed",
      "supported_features": {
        "speed_control": true,
        "pitch_control": true
      }
    }
  ],
  "total": 127,
  "hasMore": true,
  "filters_applied": {
    "category": "professional",
    "language": "en"
  }
}
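
Because limit is capped at 50, listing a larger library means paging with skip until hasMore is false. The following is a minimal sketch of that loop, assuming skip is an item offset and that the response carries the voices and hasMore fields shown above. The names list_voices_path, iter_voices, and the fetch_page callable are hypothetical; fetch_page stands in for whatever HTTP client you use and is expected to return the decoded JSON body.

```python
from urllib.parse import urlencode

LIST_PATH = "/api/v1/voice/voice-profiles"
MAX_LIMIT = 50  # documented maximum page size


def list_voices_path(skip=0, limit=MAX_LIMIT, **filters):
    """Return the path and query string for one page of results."""
    params = {"skip": skip, "limit": min(limit, MAX_LIMIT)}
    for key, value in filters.items():
        if value is None:
            continue  # filters left as None are omitted from the query
        # Booleans are sent as the lowercase strings listed in the filter table
        params[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{LIST_PATH}?{urlencode(params)}"


def iter_voices(fetch_page, limit=MAX_LIMIT, **filters):
    """Yield every voice, requesting pages until `hasMore` is false.

    `fetch_page` is a hypothetical callable: it takes a path and returns
    the decoded JSON body of the response.
    """
    skip = 0
    while True:
        page = fetch_page(list_voices_path(skip, limit, **filters))
        voices = page.get("voices", [])
        yield from voices
        # Stop on the last page, or on an empty page to avoid looping forever
        if not page.get("hasMore") or not voices:
            break
        skip += len(voices)
```

For example, list(iter_voices(fetch_page, voice_type="CUSTOM")) would collect all custom voices across pages.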

Get Voice Details

Retrieve detailed information about a specific voice.
GET /api/v1/voice/voices/{voice_id}/status
Authorization: Bearer {api_key}

Create Custom Voices

Upload Single Audio Sample

Create a custom voice model from a single audio file.
POST /api/v1/voice/voice-profiles
Authorization: Bearer {api_key}
Content-Type: multipart/form-data

name: My Custom Voice
description: Professional voice for presentations
file: (audio file)
denoise: true
Response:
{
  "id": 123,
  "uuid": "550e8400-e29b-41d4-a716-446655440000",
  "name": "My Custom Voice",
  "description": "Professional voice for presentations",
  "voice_type": "CUSTOM",
  "status": "PROCESSING",
  "created_at": "2024-01-15T10:30:00Z",
  "processed_at": null,
  "is_public": false,
  "file_path": "/voices/custom/voice_123.wav"
}
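
The response above reports status PROCESSING, so a client usually has to wait before using the new voice. The sketch below polls until the status changes. It assumes the status endpoint returns a body with a status field like the one above, and it treats any value other than PROCESSING as final, since the full set of status values is not listed here. The get_status callable is hypothetical and stands in for your HTTP call; which identifier (id or uuid) the status endpoint expects is also left to the caller.

```python
import time


def wait_until_processed(get_status, voice_id, interval=5.0, timeout=600.0,
                         sleep=time.sleep):
    """Poll until the voice leaves the PROCESSING state or the timeout elapses.

    `get_status` is a hypothetical callable that takes a voice identifier
    and returns the decoded status body as a dict.
    """
    waited = 0.0
    while True:
        body = get_status(voice_id)
        # Assumption: any status other than PROCESSING is a final state
        if body.get("status") != "PROCESSING":
            return body
        if waited >= timeout:
            raise TimeoutError(
                f"voice {voice_id} still processing after {timeout:.0f}s"
            )
        sleep(interval)
        waited += interval
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays.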

Upload Multiple Audio Samples

Create a higher-quality voice model using multiple audio samples.
POST /api/v1/voice/voice-profiles/multi-sample
Authorization: Bearer {api_key}
Content-Type: multipart/form-data

name: High Quality Voice
description: Multi-sample voice for better quality
is_public: false
voice_files: (multiple audio files)

Audio Quality Requirements

For best voice creation results:
  • Duration: 10-60 seconds of clean audio per sample
  • Format: WAV, MP3, or M4A
  • Sample Rate: 22kHz or higher
  • Quality: Clear speech without background noise
  • Content: Natural speech with varied intonation
  • Multiple Samples: 2-5 samples recommended for higher quality
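
The duration and sample-rate requirements can be checked locally before uploading. This is a rough sketch for WAV files only, using Python's standard wave module; MP3 and M4A would need a third-party decoder. The function name check_wav_sample is hypothetical, and "22kHz" is read here as 22,000 Hz, which is an assumption.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 10, 60  # documented per-sample duration range
MIN_SAMPLE_RATE = 22_000           # "22kHz or higher", read here as 22,000 Hz


def check_wav_sample(source):
    """Return a list of problems found in a WAV sample (empty means it passes).

    `source` is a file path string or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    problems = []
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(
            f"duration {seconds:.1f}s is outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    return problems
```

This only covers the measurable requirements; background noise and intonation still need a listening check.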

Voice Preview

Generate Voice Preview

Create a preview sample to test how a voice sounds.
GET /api/v1/voice/voice-profiles/{voice_identifier}/preview?audio_format=mp3
Authorization: Bearer {api_key}

Batch Preview Generation

Generate previews for multiple voices with the same text.
POST /api/v1/voice/preview/batch
Authorization: Bearer {api_key}
Content-Type: application/json

{
  "voice_ids": ["voice_abc123", "voice_def456", "voice_ghi789"],
  "text": "Compare how different voices sound with this text.",
  "speed": 1.0
}
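
A small helper can assemble this request body and catch obvious mistakes before the call is made. The sketch below is illustrative: the function name batch_preview_body is hypothetical, and the 50-word cap is taken from the Pricing section of this page. Whether the API itself rejects longer text is an assumption.

```python
import json

MAX_PREVIEW_WORDS = 50  # from the Pricing section; server-side enforcement is assumed


def batch_preview_body(voice_ids, text, speed=1.0):
    """Return the JSON body for a batch preview request, encoded as bytes."""
    if not voice_ids:
        raise ValueError("at least one voice_id is required")
    word_count = len(text.split())
    if word_count > MAX_PREVIEW_WORDS:
        raise ValueError(
            f"preview text has {word_count} words; limit is {MAX_PREVIEW_WORDS}"
        )
    # dict.fromkeys drops duplicate IDs while keeping their original order
    unique_ids = list(dict.fromkeys(voice_ids))
    payload = {"voice_ids": unique_ids, "text": text, "speed": speed}
    return json.dumps(payload).encode("utf-8")


body = batch_preview_body(
    ["voice_abc123", "voice_def456", "voice_abc123"],
    "Compare how different voices sound with this text.",
)
print(body.decode("utf-8"))
```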

Custom Voice Management

List Custom Voices

Get all custom voices created by the authenticated user.
GET /api/v1/voice/voice-profiles?voice_type=CUSTOM&limit=25
Authorization: Bearer {api_key}

Update Custom Voice

Update metadata and settings for a custom voice.
PATCH /api/v1/voice/voices/{voice_id}
Authorization: Bearer {api_key}
Content-Type: application/json

{
  "name": "Updated Voice Name",
  "description": "My updated custom voice description",
  "is_public": false
}
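
As a sketch of how a client might send only the fields that changed, the helper below builds a PATCH request with the standard library. It assumes the endpoint accepts partial bodies, which is the usual PATCH convention but is not confirmed here, and it limits fields to the three shown in the example above. The name build_update_request is hypothetical.

```python
import json
import urllib.request

# Fields shown in the example body; the API may accept others
EDITABLE_FIELDS = {"name", "description", "is_public"}


def build_update_request(voice_id, token, **changes):
    """Build (but do not send) a PATCH request containing only `changes`."""
    if not changes:
        raise ValueError("no fields to update")
    unknown = set(changes) - EDITABLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return urllib.request.Request(
        f"https://api.audiopod.ai/api/v1/voice/voices/{voice_id}",
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


update = build_update_request("voice_xyz789ghi012", "your_api_key",
                              name="Updated Voice Name")
print(update.get_method(), update.full_url)
```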

Delete Custom Voice

Remove a custom voice and all associated data.
DELETE /api/v1/voice/voices/{voice_id}
Authorization: Bearer {api_key}

Voice Collections

Create Voice Collection

Organize voices into collections for better management.
POST /api/v1/voice/collections
Authorization: Bearer {api_key}
Content-Type: application/json

{
  "name": "My Podcast Voices",
  "description": "Voices for podcast production",
  "voice_ids": ["voice_abc123", "voice_def456"],
  "tags": ["podcast", "production"]
}

List Voice Collections

Get all voice collections for the authenticated user.
GET /api/v1/voice/collections?limit=25
Authorization: Bearer {api_key}

Add Voices to Collection

Add voices to an existing collection.
POST /api/v1/voice/collections/{collection_id}/voices
Authorization: Bearer {api_key}
Content-Type: application/json

{
  "voice_ids": ["voice_ghi789", "voice_jkl012"]
}
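
How the API treats voice IDs that are already in the collection is not stated here, so a cautious client can filter them out first. The helper below is a hypothetical sketch of that step; it keeps the candidates in their original order and removes repeats.

```python
def voices_to_add(existing_ids, candidate_ids):
    """Return candidates not already in the collection, in order, without repeats."""
    seen = set(existing_ids)
    new_ids = []
    for voice_id in candidate_ids:
        if voice_id not in seen:
            seen.add(voice_id)
            new_ids.append(voice_id)
    return new_ids


current = ["voice_abc123", "voice_def456"]
requested = ["voice_ghi789", "voice_abc123", "voice_jkl012", "voice_ghi789"]
payload = {"voice_ids": voices_to_add(current, requested)}
print(payload)
```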

Voice Analytics

Get Voice Usage Statistics

Retrieve usage statistics for voices.
GET /api/v1/voice/analytics/usage?voice_id=voice_abc123&period=30d
Authorization: Bearer {api_key}

Error Handling

Use Cases & Examples

Voice Selection Assistant

import requests


def find_best_voice_for_content(content_type, language, gender_preference, api_key):
    """Help users find the best voice for their content"""
    
    # Define voice selection criteria based on content type
    criteria_map = {
        "podcast": {"category": "professional", "style": "conversational"},
        "audiobook": {"category": "professional", "style": "narrative"},
        "assistant": {"category": "professional", "style": "assistant"},
        "character": {"category": "character", "style": "character"},
        "casual": {"category": "casual", "style": "conversational"}
    }
    
    criteria = criteria_map.get(content_type, {"category": "professional"})
    
    # Search for matching voices
    params = {
        "language": language,
        "limit": 10
    }
    
    if gender_preference:
        params["gender"] = gender_preference
        
    if "category" in criteria:
        params["category"] = criteria["category"]
        
    if "style" in criteria:
        params["style"] = criteria["style"]
    
    response = requests.get(
        "https://api.audiopod.ai/api/v1/voice/voices",
        headers={"Authorization": f"Bearer {api_key}"},
        params=params
    )
    
    if response.status_code != 200:
        return {"error": "Failed to fetch voices"}
    
    voices_data = response.json()
    voices = voices_data["voices"]
    
    # Rank voices by suitability
    ranked_voices = []
    
    for voice in voices:
        score = 0
        
        # Prefer voices with higher ratings
        if voice.get("usage_stats", {}).get("avg_rating"):
            score += voice["usage_stats"]["avg_rating"] * 10
        
        # Prefer voices with proven usage
        if voice.get("usage_stats", {}).get("total_generations", 0) > 1000:
            score += 20
        
        # Prefer exact style matches
        if voice.get("style") == criteria.get("style"):
            score += 30
        
        ranked_voices.append({
            "voice": voice,
            "suitability_score": score
        })
    
    # Sort by score
    ranked_voices.sort(key=lambda x: x["suitability_score"], reverse=True)
    
    return {
        "content_type": content_type,
        "language": language,
        "criteria": criteria,
        "recommended_voices": ranked_voices[:5],  # Top 5 recommendations
        "total_matches": len(voices)
    }

# Usage
recommendations = find_best_voice_for_content(
    content_type="podcast",
    language="en", 
    gender_preference="female",
    api_key="your_api_key"
)

# The function returns {"error": ...} when the request fails, so check before reading results
if "error" in recommendations:
    print(f"Could not fetch recommendations: {recommendations['error']}")
else:
    print(f"Top recommendations for {recommendations['content_type']} in {recommendations['language']}:")
    for i, rec in enumerate(recommendations["recommended_voices"], 1):
        voice = rec["voice"]
        print(f"{i}. {voice['name']} (Score: {rec['suitability_score']:.1f})")
        print(f"   Category: {voice['category']}, Style: {voice.get('style', 'N/A')}")
        print(f"   Rating: {voice.get('usage_stats', {}).get('avg_rating', 'N/A')}")
        print()

Voice Collection Manager

import requests


def organize_voices_by_project(voices_to_organize, api_key):
    """Organize voices into project-based collections"""
    
    # Group voices by characteristics for project organization
    voice_groups = {
        "narrative_voices": [],
        "character_voices": [],
        "professional_voices": [],
        "multilingual_voices": []
    }
    
    for voice_id in voices_to_organize:
        # Get voice details
        response = requests.get(
            f"https://api.audiopod.ai/api/v1/voice/voices/{voice_id}",
            headers={"Authorization": f"Bearer {api_key}"}
        )
        
        if response.status_code == 200:
            voice = response.json()
            
            # Categorize voice
            style = voice.get("style", "")
            category = voice.get("category", "")
            
            if style == "narrative" or "audiobook" in voice.get("tags", []):
                voice_groups["narrative_voices"].append(voice_id)
            elif category == "character" or style == "character":
                voice_groups["character_voices"].append(voice_id)
            elif category == "professional":
                voice_groups["professional_voices"].append(voice_id)
            
            # Check if multilingual
            if voice.get("language") != "en" or "multilingual" in voice.get("tags", []):
                voice_groups["multilingual_voices"].append(voice_id)
    
    # Create collections for non-empty groups
    created_collections = []
    
    collection_configs = {
        "narrative_voices": {
            "name": "Narrative & Audiobook Voices",
            "description": "Voices optimized for storytelling and long-form content"
        },
        "character_voices": {
            "name": "Character & Creative Voices", 
            "description": "Unique voices for character work and creative projects"
        },
        "professional_voices": {
            "name": "Professional & Business Voices",
            "description": "Clear, professional voices for business content"
        },
        "multilingual_voices": {
            "name": "International & Multilingual Voices",
            "description": "Voices for international content and multiple languages"
        }
    }
    
    for group_name, voice_ids in voice_groups.items():
        if voice_ids:  # Only create collection if there are voices
            config = collection_configs[group_name]
            
            collection_data = {
                "name": config["name"],
                "description": config["description"],
                "voice_ids": voice_ids,
                "tags": [group_name.replace("_", "-"), "auto-organized"]
            }
            
            response = requests.post(
                "https://api.audiopod.ai/api/v1/voice/collections",
                headers={
                    "Authorization": f"Bearer {api_key}",
                    "Content-Type": "application/json"
                },
                json=collection_data
            )
            
            if response.status_code == 201:
                collection = response.json()
                created_collections.append({
                    "collection_id": collection["collection_id"],
                    "name": collection["name"],
                    "voice_count": len(voice_ids),
                    "voices": voice_ids
                })
    
    return {
        "organized": True,
        "collections_created": len(created_collections),
        "collections": created_collections,
        # Count each voice once, even if it landed in more than one group
        "total_voices_organized": len(set().union(*voice_groups.values()))
    }

# Usage
voice_list = ["voice_abc123", "voice_def456", "voice_ghi789", "voice_jkl012"]
organization_result = organize_voices_by_project(voice_list, "your_api_key")

print(f"Organized {organization_result['total_voices_organized']} voices into {organization_result['collections_created']} collections:")
for collection in organization_result["collections"]:
    print(f"  {collection['name']}: {collection['voice_count']} voices")

Best Practices

Voice Selection Guidelines

# Guidelines for choosing the right voice for different use cases
voice_selection_guide = {
    "content_types": {
        "podcast": {
            "recommended_categories": ["professional", "casual"],
            "preferred_styles": ["conversational"],
            "features_needed": ["speed_control"],
            "avoid": ["character", "synthetic"]
        },
        "audiobook": {
            "recommended_categories": ["professional"],
            "preferred_styles": ["narrative"],
            "features_needed": ["pitch_control", "speed_control"],
            "duration_considerations": "Choose voices with consistent quality for long content"
        },
        "e_learning": {
            "recommended_categories": ["professional"],
            "preferred_styles": ["assistant", "educational"],
            "features_needed": ["speed_control"],
            "clarity_priority": "High clarity and pronunciation accuracy required"
        },
        "marketing": {
            "recommended_categories": ["professional", "character"],
            "preferred_styles": ["promotional", "conversational"],
            "brand_alignment": "Choose voice that matches brand personality"
        }
    },
    "technical_considerations": {
        "processing_speed": "Pre-built voices are faster than custom voices",
        "customization": "Custom voices offer more personalization but require training time",
        "quality": "Professional category voices generally have higher quality",
        "cost": "Custom voices have training costs, pre-built voices are pay-per-use"
    }
}
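
One way to apply a guide like this is to check a voice record against the rules for a content type. The sketch below copies the podcast entry and is an illustration only: the function name check_voice is hypothetical, and it assumes the avoid list can match either the category or the style of a voice, which the guide does not specify.

```python
PODCAST_RULES = {
    "recommended_categories": ["professional", "casual"],
    "preferred_styles": ["conversational"],
    "features_needed": ["speed_control"],
    "avoid": ["character", "synthetic"],
}


def check_voice(voice, rules):
    """Return a list of reasons the voice does not fit (empty means it fits)."""
    problems = []
    category = voice.get("category")
    style = voice.get("style")

    # Assumption: "avoid" entries may name either a category or a style
    avoided = set(rules.get("avoid", []))
    if category in avoided or style in avoided:
        problems.append("category or style is on the avoid list")

    recommended = rules.get("recommended_categories")
    if recommended and category not in recommended:
        problems.append(f"category {category!r} is not recommended")

    preferred = rules.get("preferred_styles")
    if preferred and style not in preferred:
        problems.append(f"style {style!r} is not preferred")

    features = voice.get("supported_features", {})
    for feature in rules.get("features_needed", []):
        if not features.get(feature, False):
            problems.append(f"missing feature {feature!r}")
    return problems


sarah = {
    "category": "professional",
    "style": "conversational",
    "supported_features": {"speed_control": True, "pitch_control": True},
}
print(check_voice(sarah, PODCAST_RULES))
```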

Pricing

Pricing for voice management operations is listed below; every operation in this table is free:
Operation               | Cost | Description
Custom Voice Creation   | Free | Create new voice model from audio samples
Voice Preview           | Free | Generate preview samples (up to 50 words)
Voice Listing           | Free | Browse available voices
Collection Management   | Free | Create and manage voice collections
Custom Voice Updates    | Free | Update custom voice metadata

Next Steps