Speaker Extraction

Overview

AudioPod AI’s Speaker Extraction API automatically separates multiple speakers in audio recordings into individual speaker-specific audio files. The service identifies who speaks when and creates clean, separate audio tracks for each speaker while preserving original audio quality.

Key Features

Speaker Separation: Generate separate audio files for each detected speaker
Timeline Generation: Get detailed RTTM files with speaker timestamps
Speaker Analytics: Duration and quality statistics for each speaker
Multi-Format Support: Process audio and video files (WAV, MP3, M4A, MP4, etc.)
URL Processing: Extract speakers from YouTube and other video platforms
Smart Detection: Detect speakers automatically or specify the expected number
Quality Preservation: Maintains original audio quality in extracted files
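
The timeline file is delivered as RTTM, but this page does not show a sample. Assuming the file follows the conventional ten-field RTTM layout (SPEAKER rows with onset and duration in seconds), total talk time per speaker could be summed with a sketch like the one below. The function name `speaker_durations` and the sample rows are illustrative, not part of the API.

```python
from collections import defaultdict


def speaker_durations(rttm_text):
    """Sum speaking time (seconds) per speaker label from RTTM text."""
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Assumed row layout:
        # SPEAKER <file> <chan> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # Skip blank lines and non-speaker rows
        totals[fields[7]] += float(fields[4])
    return dict(totals)


# Hypothetical sample rows for illustration only
sample = """SPEAKER podcast 1 0.00 4.50 <NA> <NA> SPEAKER_0 <NA> <NA>
SPEAKER podcast 1 4.50 2.25 <NA> <NA> SPEAKER_1 <NA> <NA>
SPEAKER podcast 1 6.75 1.50 <NA> <NA> SPEAKER_0 <NA> <NA>"""

for label, seconds in speaker_durations(sample).items():
    print(f"{label}: {seconds:.2f}s")
```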

Authentication

All endpoints require authentication:

API Key: Authorization: Bearer your_api_key
JWT Token: Authorization: Bearer your_jwt_token

Extract from File Upload

Upload an audio or video file to extract individual speaker tracks.

POST
Python
cURL

POST /api/v1/speaker/extract
Authorization: Bearer {api_key}
Content-Type: multipart/form-data

file: (audio/video file)
num_speakers: 4

import requests

api_key = "your_api_key"  # Replace with your API key

with open("podcast_episode.mp3", "rb") as audio_file:
    response = requests.post(
        "https://api.audiopod.ai/api/v1/speaker/extract",
        headers={"Authorization": f"Bearer {api_key}"},
        data={"num_speakers": 4},  # Optional: specify expected speakers
        files={"file": audio_file}
    )

if response.status_code == 200:
    extraction_job = response.json()
    job_id = extraction_job["id"]
    print(f"Speaker extraction job created: {job_id}")

curl -X POST "https://api.audiopod.ai/api/v1/speaker/extract" \
  -H "Authorization: Bearer your_api_key" \
  -F "file=@podcast_episode.mp3" \
  -F "num_speakers=4"

Extract from URL

Extract speakers from audio/video URLs (YouTube, Vimeo, etc.).

POST
Python
cURL

POST /api/v1/speaker/extract
Authorization: Bearer {api_key}
Content-Type: application/x-www-form-urlencoded

url=https%3A%2F%2Fyoutube.com%2Fwatch%3Fv%3Dexample123&num_speakers=3

response = requests.post(
    "https://api.audiopod.ai/api/v1/speaker/extract",
    headers={"Authorization": f"Bearer {api_key}"},
    data={
        "url": "https://youtube.com/watch?v=example123",
        "num_speakers": 3  # Optional: specify expected speakers
    }
)

if response.status_code == 200:
    job_data = response.json()
    print(f"URL extraction started: {job_data['id']}")

curl -X POST "https://api.audiopod.ai/api/v1/speaker/extract" \
  -H "Authorization: Bearer your_api_key" \
  --data-urlencode "url=https://youtube.com/watch?v=example123" \
  -d "num_speakers=3"

Response:

{
  "id": 123,
  "job_type": "extraction",
  "status": "PENDING",
  "created_at": "2024-01-15T10:30:00Z",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "celery_task_uuid_here"
}

Job Management

Get Job Status

Monitor the progress of speaker extraction jobs.

GET
Python

GET /api/v1/speaker/jobs/{job_id}
Authorization: Bearer {api_key}

response = requests.get(
    f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}",
    headers={"Authorization": f"Bearer {api_key}"}
)

if response.status_code == 200:
    job_status = response.json()
    print(f"Status: {job_status['status']}")
    
    if job_status["status"] == "COMPLETED":
        print("Extraction complete!")
        result = job_status.get("result")  # May be absent or empty
        if result:
            print(f"Extracted {len(result['speakers'])} speakers")
            for speaker in result['speakers']:
                print(f"- {speaker['label']}: {speaker.get('download_url', 'Processing...')}")
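
Because extraction runs asynchronously, a client usually has to poll this endpoint until the job finishes. The sketch below is one way to do that. It treats any status other than PENDING or PROCESSING as final, since this page does not list every possible status value. `wait_for_job` and `fetch_status` are hypothetical names; `fetch_status` stands for any callable that performs the GET request above and returns the parsed JSON.

```python
import time


def wait_for_job(fetch_status, job_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job leaves PENDING/PROCESSING or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        # Anything other than the two in-progress states is treated as final
        if job["status"] not in ("PENDING", "PROCESSING"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)


# Example wiring with requests (api_key assumed to be defined elsewhere):
# def fetch_status(job_id):
#     r = requests.get(
#         f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}",
#         headers={"Authorization": f"Bearer {api_key}"},
#     )
#     r.raise_for_status()
#     return r.json()
# job = wait_for_job(fetch_status, 123)
```

The polling interval and timeout are arbitrary defaults; choose values that keep you well inside the rate limit.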

Response (Completed Extraction):

{
  "id": 123,
  "job_type": "extraction",
  "status": "COMPLETED",
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:35:30Z",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "celery_task_uuid_here",
  "result": {
    "speakers": [
      {
        "id": 0,
        "label": "SPEAKER_0",
        "audio_path": "processed/123/speaker_0.wav",
        "download_url": "https://s3.amazonaws.com/...",
        "audio_stats": {
          "rms_db": -12.3,
          "peak": 0.85
        }
      },
      {
        "id": 1,
        "label": "SPEAKER_1",
        "audio_path": "processed/123/speaker_1.wav",
        "download_url": "https://s3.amazonaws.com/...",
        "audio_stats": {
          "rms_db": -15.7,
          "peak": 0.72
        }
      }
    ],
    "files": [
      {
        "type": "audio",
        "speaker": "SPEAKER_0",
        "path": "processed/123/speaker_0.wav",
        "download_url": "https://s3.amazonaws.com/..."
      },
      {
        "type": "audio",
        "speaker": "SPEAKER_1",
        "path": "processed/123/speaker_1.wav",
        "download_url": "https://s3.amazonaws.com/..."
      },
      {
        "type": "rttm",
        "path": "processed/123/extraction.rttm",
        "download_url": "https://s3.amazonaws.com/..."
      }
    ],
    "rttm_path": "processed/123/extraction.rttm"
  }
}
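
The `files` array mixes per-speaker audio entries with the RTTM timeline. A small helper can split them apart before downloading. This is a sketch based only on the field names in the sample response above; `index_output_files` is a hypothetical name, and the placeholder URLs stand in for the elided ones.

```python
def index_output_files(result):
    """Return ({speaker_label: download_url}, rttm_download_url_or_None)."""
    audio_by_speaker = {}
    rttm_url = None
    for entry in result.get("files", []):
        if entry.get("type") == "audio":
            audio_by_speaker[entry["speaker"]] = entry.get("download_url")
        elif entry.get("type") == "rttm":
            rttm_url = entry.get("download_url")
    return audio_by_speaker, rttm_url


# Trimmed-down copy of the sample result, with placeholder URLs
example_result = {
    "files": [
        {"type": "audio", "speaker": "SPEAKER_0", "download_url": "https://example.invalid/0.wav"},
        {"type": "audio", "speaker": "SPEAKER_1", "download_url": "https://example.invalid/1.wav"},
        {"type": "rttm", "download_url": "https://example.invalid/extraction.rttm"},
    ]
}

audio, rttm = index_output_files(example_result)
print(sorted(audio), rttm)
```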

List Extraction Jobs

Get all speaker extraction jobs for the authenticated user.

GET
Python

GET /api/v1/speaker/jobs?job_type=extraction&status=COMPLETED&limit=50
Authorization: Bearer {api_key}

response = requests.get(
    "https://api.audiopod.ai/api/v1/speaker/jobs",
    headers={"Authorization": f"Bearer {api_key}"},
    params={
        "job_type": "extraction",
        "status": "COMPLETED",     # Optional filter
        "skip": 0,
        "limit": 50
    }
)

if response.status_code == 200:
    jobs_data = response.json()
    print(f"Total jobs: {jobs_data['total']}")
    print(f"Has more: {jobs_data['hasMore']}")
    
    for job in jobs_data["items"]:
        print(f"Job {job['id']}: {job['status']} - {job.get('filename', 'N/A')}")
        if job['status'] == 'COMPLETED' and job.get('outputFiles'):
            print(f"  Output files: {len(job['outputFiles'])}")

Response:

{
  "items": [
    {
      "id": 123,
      "job_type": "extraction",
      "status": "COMPLETED",
      "created_at": "2024-01-15T10:30:00Z",
      "completed_at": "2024-01-15T10:35:30Z",
      "user_id": "550e8400-e29b-41d4-a716-446655440000",
      "task_id": "celery_task_uuid_here",
      "filename": "podcast_episode.mp3",
      "display_name": "podcast_episode.mp3",
      "outputFiles": [
        {
          "type": "audio",
          "speaker": "SPEAKER_0",
          "path": "processed/123/speaker_0.wav"
        },
        {
          "type": "audio", 
          "speaker": "SPEAKER_1",
          "path": "processed/123/speaker_1.wav"
        }
      ]
    }
  ],
  "hasMore": false,
  "total": 1
}
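
When there are more jobs than fit in one response, the `skip`, `limit`, and `hasMore` fields suggest offset-based paging. The generator below sketches that loop under the assumption that `skip` counts items rather than pages, which this page does not state. `iter_jobs` and `fetch_page` are hypothetical names; `fetch_page` stands for any callable that issues the GET request with the given `skip` and `limit` and returns the parsed JSON.

```python
def iter_jobs(fetch_page, page_size=50):
    """Yield every job, advancing `skip` until the API reports no more pages."""
    skip = 0
    while True:
        page = fetch_page(skip=skip, limit=page_size)
        items = page.get("items", [])
        yield from items
        # Stop on the last page, or if the server returns an empty page
        if not page.get("hasMore") or not items:
            break
        skip += len(items)


# Stand-in for the real endpoint, used only to demonstrate the loop
def fake_fetch_page(skip, limit):
    data = [{"id": i} for i in range(5)]
    chunk = data[skip:skip + limit]
    return {"items": chunk, "hasMore": skip + limit < len(data), "total": len(data)}


print([job["id"] for job in iter_jobs(fake_fetch_page, page_size=2)])
```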

Retry Failed Job

Retry a failed speaker extraction job.

POST
Python
cURL

POST /api/v1/speaker/jobs/{job_id}/retry
Authorization: Bearer {api_key}

response = requests.post(
    f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}/retry",
    headers={"Authorization": f"Bearer {api_key}"}
)

if response.status_code == 200:
    retried_job = response.json()
    print(f"Extraction job {retried_job['id']} retried successfully")
    print(f"New task ID: {retried_job['task_id']}")

curl -X POST "https://api.audiopod.ai/api/v1/speaker/jobs/123/retry" \
  -H "Authorization: Bearer your_api_key"

Response:

{
  "id": 123,
  "job_type": "extraction",
  "status": "PROCESSING",
  "created_at": "2024-01-15T10:30:00Z",
  "task_id": "new_celery_task_uuid_here",
  "user_id": "550e8400-e29b-41d4-a716-446655440000"
}

Delete Job

Remove a speaker extraction job and its associated files.

DELETE
Python
cURL

DELETE /api/v1/speaker/jobs/{job_id}
Authorization: Bearer {api_key}

response = requests.delete(
    f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}",
    headers={"Authorization": f"Bearer {api_key}"}
)

if response.status_code == 204:
    print("Extraction job and files deleted successfully")
elif response.status_code == 404:
    print("Job not found or access denied")

curl -X DELETE "https://api.audiopod.ai/api/v1/speaker/jobs/123" \
  -H "Authorization: Bearer your_api_key"

Response: 204 No Content on successful deletion

Supported Formats

Audio Formats:

WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, WebM
WMA, Speex, and other common formats

Video Formats:

MP4, AVI, MOV, MKV, WebM
Audio will be extracted automatically from video files

URL Sources:

YouTube, Vimeo, and other video platforms
Direct audio/video file URLs

Error Handling

400 Bad Request - Invalid Input

{
  "error_code": "INVALID_AUDIO_FORMAT",
  "message": "Invalid file type. Must be audio or video file.",
  "details": {
    "content_type": "text/plain",
    "extension": ".txt",
    "supported_formats": ["audio/wav", "audio/mp3", "audio/m4a", "video/mp4"],
    "supported_extensions": [".wav", ".mp3", ".m4a", ".mp4", ".avi", ".mov"]
  }
}

Causes: Invalid file format, missing file/URL, or both file and URL provided
Solutions: Use supported audio/video formats, provide either file OR URL (not both)
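
Most 400 responses of this kind can be caught before the upload. The following is a minimal client-side check, assuming the extensions named in the Supported Formats section. That list ends with "other common formats", so the set here is not exhaustive and should be treated as a starting point. `check_extract_input` is a hypothetical helper name.

```python
import os

# Drawn from the Supported Formats section; the server may accept more
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg", ".opus", ".webm", ".wma",
    ".mp4", ".avi", ".mov", ".mkv",
}


def check_extract_input(file_path=None, url=None):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if file_path and url:
        problems.append("Provide either a file or a URL, not both.")
    elif not file_path and not url:
        problems.append("Provide a file or a URL.")
    if file_path:
        ext = os.path.splitext(file_path)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"Unsupported extension: {ext or '(none)'}")
    return problems


print(check_extract_input(file_path="notes.txt"))
```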

402 Payment Required - Insufficient Credits

{
  "detail": "Insufficient credits for processing. Required: 8250, Available: 1000"
}

Causes: Not enough credits for the audio duration
Solutions: Purchase additional credits or process shorter audio files

422 Unprocessable Entity - Extraction Failed

{
  "error_code": "PROCESSING_FAILED", 
  "message": "Failed to extract speakers from audio",
  "details": {
    "reason": "Audio quality too poor or no distinguishable speakers found"
  }
}

Causes: Poor audio quality, no speech content, or indistinguishable speakers
Solutions: Ensure clear speech content, try noise reduction first, or verify multiple speakers exist

404 Not Found - Job Not Found

{
  "detail": "Job not found or access denied"
}

Causes: Invalid job ID or trying to access another user’s job
Solutions: Verify job ID and ensure you own the job

429 Too Many Requests - Rate Limit

{
  "detail": "Rate limit exceeded. Try again later."
}

Causes: Exceeded 100 requests per minute limit
Solutions: Wait before making additional requests or implement request throttling
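
The sample errors above come in two shapes: some carry `error_code`, `message`, and `details`, while others carry only `detail`. Client code may want a single way to read both. The helper below is a sketch built from those samples alone; `describe_error` is a hypothetical name, and the `HTTP_<status>` fallback code is an invention of this example rather than something the API returns.

```python
def describe_error(status_code, body):
    """Return (code, message) from either error body shape shown above."""
    if isinstance(body, dict):
        if "error_code" in body:
            return body["error_code"], body.get("message", "")
        if "detail" in body:
            # No machine-readable code in this shape, so derive one from the status
            return f"HTTP_{status_code}", str(body["detail"])
    return f"HTTP_{status_code}", "Unrecognized error response"


print(describe_error(404, {"detail": "Job not found or access denied"}))
print(describe_error(422, {"error_code": "PROCESSING_FAILED",
                           "message": "Failed to extract speakers from audio"}))
```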

Pricing

Speaker extraction costs are based on audio duration:

Service	Cost	Description
Speaker Extraction	1650 credits/minute	Generate separate audio files for each speaker

Note: Credits are charged per second of audio (27.5 credits/second)

Cost Examples

Duration	Service	Credits	USD Cost*
5 minutes	Extraction	8,250	~$1.10
15 minutes	Extraction	24,750	~$3.30
30 minutes	Extraction	49,500	~$6.60
1 hour	Extraction	99,000	~$13.20

*USD cost estimates based on standard credit pricing. Actual costs may vary based on subscription plan.
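
The credit figures in the table follow directly from the per-second rate, so a job's cost can be estimated before upload and compared with the available balance. The sketch below rounds partial credits up, which is an assumption; this page does not say how fractional credits are billed. `estimate_credits` is a hypothetical helper name.

```python
import math

CREDITS_PER_SECOND = 27.5  # 1650 credits/minute, from the pricing table


def estimate_credits(duration_seconds):
    """Estimate credits for a clip; rounding up is an assumption, not documented."""
    return math.ceil(duration_seconds * CREDITS_PER_SECOND)


for minutes in (5, 15, 30, 60):
    print(f"{minutes} min: {estimate_credits(minutes * 60):,} credits")
```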

Rate Limits

100 requests per minute per API key
Rate limits apply per endpoint
Exceeding limits returns 429 Too Many Requests
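
One common way to stay within these limits is to retry 429 responses with exponential backoff. The sketch below shows the idea. This page does not say whether the API sends a Retry-After header, so the delays here are fixed guesses. `call_with_backoff` and `send` are hypothetical names; `send` stands for any zero-argument callable that performs the request and returns a `(status_code, body)` tuple.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 1s, 2s, 4s, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
    return status, body  # Still rate limited after the final attempt


# Example wiring with requests (api_key and job_id assumed to be defined):
# def send():
#     r = requests.get(
#         f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}",
#         headers={"Authorization": f"Bearer {api_key}"},
#     )
#     return r.status_code, r.json()
# status, body = call_with_backoff(send)
```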

Next Steps

Speech-to-Text

Transcribe individual speaker tracks with improved accuracy.

Noise Reduction

Clean up audio before speaker extraction for better results.
