
Overview

AudioPod AI’s Speaker Extraction API automatically separates multiple speakers in audio recordings into individual speaker-specific audio files. The service identifies who speaks when and creates clean, separate audio tracks for each speaker while preserving original audio quality.

Key Features

  • Speaker Separation: Generate separate audio files for each detected speaker
  • Timeline Generation: Get detailed RTTM files with speaker timestamps
  • Speaker Analytics: Duration and quality statistics for each speaker
  • Multi-Format Support: Process audio and video files (WAV, MP3, M4A, MP4, etc.)
  • URL Processing: Extract speakers from YouTube and other video platforms
  • Smart Detection: Detect the number of speakers automatically, or specify the expected number yourself
  • Quality Preservation: Maintains original audio quality in extracted files
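
The RTTM timeline listed among the features can be turned into per-speaker speaking time with a few lines of Python. This is a minimal sketch, assuming the generated files follow the standard space-separated RTTM layout (the documentation does not spell out the exact fields); `speaking_time` and the sample data are hypothetical.

```python
from collections import defaultdict


def speaking_time(rttm_text):
    """Sum turn durations (in seconds) per speaker from RTTM text.

    Assumes the standard RTTM field order:
    SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and any record that is not a speaker turn.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])
        speaker = fields[7]
        totals[speaker] += duration
    return dict(totals)


# Hypothetical RTTM content for illustration only.
sample = """\
SPEAKER episode 1 0.00 4.50 <NA> <NA> SPEAKER_0 <NA> <NA>
SPEAKER episode 1 4.50 2.25 <NA> <NA> SPEAKER_1 <NA> <NA>
SPEAKER episode 1 6.75 1.50 <NA> <NA> SPEAKER_0 <NA> <NA>
"""
print(speaking_time(sample))
```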

Authentication

All endpoints require authentication:
  • API Key: Authorization: Bearer your_api_key
  • JWT Token: Authorization: Bearer your_jwt_token

Speaker Extraction

Extract from File Upload

Upload an audio or video file to extract individual speaker tracks.
POST /api/v1/speaker/extract
Authorization: Bearer {api_key}
Content-Type: multipart/form-data

file: (audio/video file)
num_speakers: 4
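
The upload request above could be assembled in Python roughly as follows. This is a sketch using only the standard library; the host in `BASE_URL` is a placeholder because the documentation does not state it, and `build_upload_request` is a hypothetical helper. The function only builds the request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import mimetypes
import urllib.request
import uuid

# Assumption: the real API host is not given in this document.
BASE_URL = "https://api.example.com"


def build_upload_request(api_key, filename, file_bytes, num_speakers=None):
    """Build (but do not send) a multipart POST to /api/v1/speaker/extract."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    if num_speakers is not None:
        # Optional form field with the expected speaker count.
        parts.append((
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="num_speakers"\r\n\r\n'
            f"{num_speakers}\r\n"
        ).encode())
    # The file part: headers, a blank line, then the raw bytes.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode())
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/speaker/extract",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_upload_request("your_api_key", "meeting.wav", b"RIFF", num_speakers=4)
print(request.get_method(), request.full_url)
```

Omitting `num_speakers` leaves the field out of the form entirely, which matches the automatic-detection mode described under Key Features.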

Extract from URL

Extract speakers from audio/video URLs (YouTube, Vimeo, etc.).
POST /api/v1/speaker/extract
Authorization: Bearer {api_key}
Content-Type: application/x-www-form-urlencoded

url=https://youtube.com/watch?v=example123&num_speakers=3
Response:
{
  "id": 123,
  "job_type": "extraction",
  "status": "PENDING",
  "created_at": "2024-01-15T10:30:00Z",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "celery_task_uuid_here"
}
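
As a sketch of the URL variant, the form body can be built with `urllib.parse.urlencode`, which percent-encodes the media URL so that its own `?` and `=` characters cannot be confused with form separators. `BASE_URL`, `build_url_extract_request`, and `job_summary` are illustrative assumptions, not part of the documented API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the real API host is not given in this document.
BASE_URL = "https://api.example.com"


def build_url_extract_request(api_key, media_url, num_speakers=None):
    """Build (but do not send) a form-encoded POST to /api/v1/speaker/extract."""
    fields = {"url": media_url}
    if num_speakers is not None:
        fields["num_speakers"] = num_speakers
    body = urllib.parse.urlencode(fields).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/speaker/extract",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def job_summary(raw_json):
    """Pull the job id and status out of the creation response shown above."""
    job = json.loads(raw_json)
    return job["id"], job["status"]


request = build_url_extract_request(
    "your_api_key", "https://youtube.com/watch?v=example123", num_speakers=3
)
print(request.data.decode())
print(job_summary('{"id": 123, "job_type": "extraction", "status": "PENDING"}'))
```

Keep the returned `id`: it is the `job_id` used by the job management endpoints.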

Job Management

Get Job Status

Monitor the progress of speaker extraction jobs.
GET /api/v1/speaker/jobs/{job_id}
Authorization: Bearer {api_key}
Response (Completed Extraction):
{
  "id": 123,
  "job_type": "extraction",
  "status": "COMPLETED",
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:35:30Z",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "celery_task_uuid_here",
  "result": {
    "speakers": [
      {
        "id": 0,
        "label": "SPEAKER_0",
        "audio_path": "processed/123/speaker_0.wav",
        "download_url": "https://s3.amazonaws.com/...",
        "audio_stats": {
          "rms_db": -12.3,
          "peak": 0.85
        }
      },
      {
        "id": 1,
        "label": "SPEAKER_1",
        "audio_path": "processed/123/speaker_1.wav",
        "download_url": "https://s3.amazonaws.com/...",
        "audio_stats": {
          "rms_db": -15.7,
          "peak": 0.72
        }
      }
    ],
    "files": [
      {
        "type": "audio",
        "speaker": "SPEAKER_0",
        "path": "processed/123/speaker_0.wav",
        "download_url": "https://s3.amazonaws.com/..."
      },
      {
        "type": "audio",
        "speaker": "SPEAKER_1",
        "path": "processed/123/speaker_1.wav",
        "download_url": "https://s3.amazonaws.com/..."
      },
      {
        "type": "rttm",
        "path": "processed/123/extraction.rttm",
        "download_url": "https://s3.amazonaws.com/..."
      }
    ],
    "rttm_path": "processed/123/extraction.rttm"
  }
}
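
Because extraction runs asynchronously, a client typically polls this endpoint until the job finishes. The sketch below shows one way to do that and then collect the per-speaker download links from the completed response. It makes two assumptions: `fetch_job` is a caller-supplied function that performs the GET and returns the decoded JSON, and `"FAILED"` is the status of an unsuccessful job (this document shows only `PENDING`, `PROCESSING`, and `COMPLETED`).

```python
import time

# "COMPLETED" appears in the docs; "FAILED" is an assumed name for the failure state.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_job, job_id, interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) until the job reaches a terminal status."""
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)


def speaker_downloads(job):
    """Map each speaker label to its download URL in a completed job."""
    return {s["label"]: s["download_url"] for s in job["result"]["speakers"]}


# Demo with canned responses instead of real HTTP calls.
canned = iter([
    {"id": 123, "status": "PENDING"},
    {"id": 123, "status": "PROCESSING"},
    {"id": 123, "status": "COMPLETED", "result": {"speakers": [
        {"label": "SPEAKER_0", "download_url": "https://example.com/speaker_0.wav"},
        {"label": "SPEAKER_1", "download_url": "https://example.com/speaker_1.wav"},
    ]}},
])
done = wait_for_job(lambda job_id: next(canned), 123, sleep=lambda s: None)
print(speaker_downloads(done))
```

Injecting `sleep` and `clock` keeps the loop testable without waiting; in real use the defaults apply.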

List Extraction Jobs

Get all speaker extraction jobs for the authenticated user.
GET /api/v1/speaker/jobs?job_type=extraction&status=COMPLETED&limit=50
Authorization: Bearer {api_key}
Response:
{
  "items": [
    {
      "id": 123,
      "job_type": "extraction",
      "status": "COMPLETED",
      "created_at": "2024-01-15T10:30:00Z",
      "completed_at": "2024-01-15T10:35:30Z",
      "user_id": "550e8400-e29b-41d4-a716-446655440000",
      "task_id": "celery_task_uuid_here",
      "filename": "podcast_episode.mp3",
      "display_name": "podcast_episode.mp3",
      "outputFiles": [
        {
          "type": "audio",
          "speaker": "SPEAKER_0",
          "path": "processed/123/speaker_0.wav"
        },
        {
          "type": "audio", 
          "speaker": "SPEAKER_1",
          "path": "processed/123/speaker_1.wav"
        }
      ]
    }
  ],
  "hasMore": false,
  "total": 1
}
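
The query string and the listing response above can be handled with small helpers like these. This is an illustrative sketch: `jobs_list_path` and `output_paths_by_job` are hypothetical names, and only the three query parameters shown in the example request are used.

```python
import urllib.parse


def jobs_list_path(job_type=None, status=None, limit=None):
    """Build the request path for the job listing, skipping unset filters."""
    candidates = (("job_type", job_type), ("status", status), ("limit", limit))
    params = {name: value for name, value in candidates if value is not None}
    query = urllib.parse.urlencode(params)
    return "/api/v1/speaker/jobs" + (f"?{query}" if query else "")


def output_paths_by_job(listing):
    """Map each job id in a listing response to its output file paths."""
    return {
        job["id"]: [f["path"] for f in job.get("outputFiles", [])]
        for job in listing["items"]
    }


print(jobs_list_path(job_type="extraction", status="COMPLETED", limit=50))

listing = {
    "items": [{
        "id": 123,
        "outputFiles": [
            {"type": "audio", "speaker": "SPEAKER_0", "path": "processed/123/speaker_0.wav"},
            {"type": "audio", "speaker": "SPEAKER_1", "path": "processed/123/speaker_1.wav"},
        ],
    }],
    "hasMore": False,
    "total": 1,
}
print(output_paths_by_job(listing))
```

When `hasMore` is true there are further results, but the paging parameter is not documented here, so the sketch does not attempt to follow it.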

Retry Failed Job

Retry a failed speaker extraction job.
POST /api/v1/speaker/jobs/{job_id}/retry
Authorization: Bearer {api_key}
Response:
{
  "id": 123,
  "job_type": "extraction",
  "status": "PROCESSING",
  "created_at": "2024-01-15T10:30:00Z",
  "task_id": "new_celery_task_uuid_here",
  "user_id": "550e8400-e29b-41d4-a716-446655440000"
}

Delete Job

Remove a speaker extraction job and its associated files.
DELETE /api/v1/speaker/jobs/{job_id}
Authorization: Bearer {api_key}
Response: 204 No Content on successful deletion
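
The retry and delete calls differ only in method and path, so one helper can build both. This is a sketch with the standard library; `BASE_URL` is a placeholder host and `job_request` is a hypothetical helper. It constructs the request without sending it.

```python
import urllib.request

# Assumption: the real API host is not given in this document.
BASE_URL = "https://api.example.com"


def job_request(api_key, job_id, action):
    """Build a retry (POST) or delete (DELETE) request for one job."""
    routes = {
        "retry": ("POST", f"/api/v1/speaker/jobs/{job_id}/retry"),
        "delete": ("DELETE", f"/api/v1/speaker/jobs/{job_id}"),
    }
    if action not in routes:
        raise ValueError(f"unknown action: {action!r}")
    method, path = routes[action]
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def was_deleted(status_code):
    """A delete succeeded if the server answered 204 No Content."""
    return status_code == 204


for action in ("retry", "delete"):
    req = job_request("your_api_key", 123, action)
    print(req.get_method(), req.full_url)
```

A successful delete has no response body, so check the status code rather than trying to parse JSON.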

Supported Formats

Audio Formats:
  • WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, WebM
  • WMA, Speex, and other common formats
Video Formats:
  • MP4, AVI, MOV, MKV, WebM
  • Audio will be extracted automatically from video files
URL Sources:
  • YouTube, Vimeo, and other video platforms
  • Direct audio/video file URLs

Error Handling

{
  "error_code": "INVALID_AUDIO_FORMAT",
  "message": "Invalid file type. Must be audio or video file.",
  "details": {
    "content_type": "text/plain",
    "extension": ".txt",
    "supported_formats": ["audio/wav", "audio/mp3", "audio/m4a", "video/mp4"],
    "supported_extensions": [".wav", ".mp3", ".m4a", ".mp4", ".avi", ".mov"]
  }
}
Causes: Invalid file format, missing file/URL, or both file and URL provided
Solutions: Use supported audio/video formats, provide either file OR URL (not both)
{
  "detail": "Insufficient credits for processing. Required: 8250, Available: 1000"
}
Causes: Not enough credits for the audio duration
Solutions: Purchase additional credits or process shorter audio files
{
  "error_code": "PROCESSING_FAILED", 
  "message": "Failed to extract speakers from audio",
  "details": {
    "reason": "Audio quality too poor or no distinguishable speakers found"
  }
}
Causes: Poor audio quality, no speech content, or indistinguishable speakers
Solutions: Ensure clear speech content, try noise reduction first, or verify multiple speakers exist
{
  "detail": "Job not found or access denied"
}
Causes: Invalid job ID or trying to access another user’s job
Solutions: Verify job ID and ensure you own the job
{
  "detail": "Rate limit exceeded. Try again later."
}
Causes: Exceeded 100 requests per minute limit
Solutions: Wait before making additional requests or implement request throttling
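
The error bodies above come in two shapes: a structured one with `error_code`, `message`, and `details`, and a simpler one with a single `detail` string. A client can normalize both before deciding what to do. The sketch below is illustrative; `normalize_error` and `credit_shortfall` are hypothetical helpers, and parsing the credit numbers out of the message relies on the exact wording shown in the example, which may change.

```python
import re


def normalize_error(body):
    """Return (code, message, details) for either error shape."""
    if "error_code" in body:
        return body["error_code"], body.get("message", ""), body.get("details", {})
    # Simple shape: only a human-readable "detail" string, no machine code.
    return None, body.get("detail", ""), {}


_CREDITS = re.compile(r"Required: (\d+), Available: (\d+)")


def credit_shortfall(message):
    """Credits still needed, or None if the message is not a credit error."""
    match = _CREDITS.search(message)
    if match is None:
        return None
    required, available = map(int, match.groups())
    return required - available


structured = {
    "error_code": "PROCESSING_FAILED",
    "message": "Failed to extract speakers from audio",
    "details": {"reason": "Audio quality too poor or no distinguishable speakers found"},
}
simple = {"detail": "Insufficient credits for processing. Required: 8250, Available: 1000"}

print(normalize_error(structured)[0])
code, message, _ = normalize_error(simple)
print(code, credit_shortfall(message))
```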

Pricing

Speaker extraction costs are based on audio duration:
| Service | Cost | Description |
| --- | --- | --- |
| Speaker Extraction | 1650 credits/minute | Generate separate audio files for each speaker |
Note: Credits are charged per second of audio (27.5 credits/second)

Cost Examples

| Duration | Service | Credits | USD Cost* |
| --- | --- | --- | --- |
| 5 minutes | Extraction | 8,250 | ~$1.10 |
| 15 minutes | Extraction | 24,750 | ~$3.30 |
| 30 minutes | Extraction | 49,500 | ~$6.60 |
| 1 hour | Extraction | 99,000 | ~$13.20 |
*USD cost estimates based on standard credit pricing. Actual costs may vary based on subscription plan.
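
The figures in the table follow directly from the per-second rate, and a small calculator makes it easy to estimate other durations. In this sketch the USD conversion is derived from the first table row (8,250 credits ≈ $1.10) and is only an estimate; how fractional seconds are rounded is not documented, so the function charges exactly for the duration it is given.

```python
from decimal import Decimal

CREDITS_PER_SECOND = Decimal("27.5")  # 1650 credits/minute divided by 60

# Reference point taken from the cost table: 8,250 credits is roughly $1.10.
REFERENCE_CREDITS = Decimal("8250")
REFERENCE_USD = Decimal("1.10")


def extraction_credits(seconds):
    """Credits charged for the given audio duration in seconds."""
    return CREDITS_PER_SECOND * Decimal(seconds)


def estimated_usd(credits):
    """Approximate USD cost, scaled from the table's reference row."""
    return (credits * REFERENCE_USD / REFERENCE_CREDITS).quantize(Decimal("0.01"))


for minutes in (5, 15, 30, 60):
    credits = extraction_credits(minutes * 60)
    print(minutes, "min:", credits, "credits, about $", estimated_usd(credits))
```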

Rate Limits

  • 100 requests per minute per API key
  • Rate limits apply per endpoint
  • Exceeding limits returns 429 Too Many Requests
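
To stay under the limit rather than react to 429 responses, a client can throttle itself. The following is a minimal sliding-window sketch, not an official client feature; `RateLimiter` is a hypothetical class. Since limits apply per endpoint, one option is to keep a separate limiter instance for each endpoint you call.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any sliding window of period seconds."""

    def __init__(self, max_calls=100, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._stamps and now - self._stamps[0] >= self._period:
                self._stamps.popleft()
            if len(self._stamps) < self._max_calls:
                self._stamps.append(now)
                return
            # Window is full: wait until the oldest call expires, then re-check.
            self._sleep(self._period - (now - self._stamps[0]))


# Demo with a fake clock so nothing actually waits.
fake_now = [0.0]
waits = []


def fake_sleep(seconds):
    waits.append(seconds)
    fake_now[0] += seconds


limiter = RateLimiter(max_calls=2, period=60.0,
                      clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    limiter.acquire()
print(waits)
```

Call `acquire()` immediately before each request; with the defaults it enforces 100 calls per 60 seconds.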
