# Speaker Extraction

> Extract individual speakers from multi-speaker audio recordings into separate, clean audio files.

## Overview

AudioPod AI's Speaker Extraction API automatically separates multiple speakers in audio recordings into individual speaker-specific audio files. The service identifies who speaks when and creates clean, separate audio tracks for each speaker while preserving original audio quality.

### Key Features

* **Speaker Separation**: Generate separate audio files for each detected speaker
* **Timeline Generation**: Get detailed RTTM files with speaker timestamps
* **Speaker Analytics**: Duration and quality statistics for each speaker
* **Multi-Format Support**: Process audio and video files (WAV, MP3, M4A, MP4, etc.)
* **URL Processing**: Extract speakers from YouTube and other video platforms
* **Smart Detection**: Detect the number of speakers automatically, or specify the expected count
* **Quality Preservation**: Maintains original audio quality in extracted files

## Authentication

All endpoints require authentication. Use one of these methods:

* **API Key (Recommended)**: `X-API-Key: your_api_key` header
* **JWT Token**: `Authorization: Bearer your_jwt_token` (for session-based auth)
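
As a minimal sketch of the two options above, the helper below builds the header dictionary for either method. The function name `auth_headers` and its parameters are illustrative and not part of the AudioPod API; only the header names come from this page.

```python
def auth_headers(api_key=None, jwt_token=None):
    """Return request headers for one of the two documented auth methods.

    The function name and parameters are hypothetical; only the header
    names (X-API-Key, Authorization: Bearer ...) come from the docs.
    """
    if api_key:
        # Recommended method: API key header.
        return {"X-API-Key": api_key}
    if jwt_token:
        # Session-based method: bearer token.
        return {"Authorization": f"Bearer {jwt_token}"}
    raise ValueError("Provide either api_key or jwt_token")
```

The returned dictionary can be passed as the `headers` argument of any `requests` call shown on this page.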

## Speaker Extraction

### Extract from File Upload

Upload an audio or video file to extract individual speaker tracks.

<Tabs>
  <Tab title="POST">
    ```http theme={null}
    POST /api/v1/speaker/extract
    X-API-Key: {api_key}
    Content-Type: multipart/form-data

    file: (audio/video file)
    num_speakers: 4
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    api_key = "your_api_key"  # Replace with your AudioPod API key

    with open("podcast_episode.mp3", "rb") as audio_file:
        response = requests.post(
            "https://api.audiopod.ai/api/v1/speaker/extract",
            headers={"X-API-Key": api_key},
            data={"num_speakers": 4},  # Optional: specify expected speakers
            files={"file": audio_file}
        )

    if response.status_code == 200:
        extraction_job = response.json()
        job_id = extraction_job["id"]
        print(f"Speaker extraction job created: {job_id}")
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X POST "https://api.audiopod.ai/api/v1/speaker/extract" \
      -H "X-API-Key: your_api_key" \
      -F "file=@podcast_episode.mp3" \
      -F "num_speakers=4"
    ```
  </Tab>
</Tabs>

### Extract from URL

Extract speakers from audio/video URLs (YouTube, Vimeo, etc.).

<Tabs>
  <Tab title="POST">
    ```http theme={null}
    POST /api/v1/speaker/extract
    X-API-Key: {api_key}
    Content-Type: application/x-www-form-urlencoded

    url=https://youtube.com/watch?v=example123&num_speakers=3
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    api_key = "your_api_key"  # Replace with your AudioPod API key

    response = requests.post(
        "https://api.audiopod.ai/api/v1/speaker/extract",
        headers={"X-API-Key": api_key},
        data={
            "url": "https://youtube.com/watch?v=example123",
            "num_speakers": 3  # Optional: specify expected speakers
        }
    )

    if response.status_code == 200:
        job_data = response.json()
        print(f"URL extraction started: {job_data['id']}")
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X POST "https://api.audiopod.ai/api/v1/speaker/extract" \
      -H "X-API-Key: your_api_key" \
      -d "url=https://youtube.com/watch?v=example123" \
      -d "num_speakers=3"
    ```
  </Tab>
</Tabs>

**Response:**

```json theme={null}
{
  "id": 123,
  "job_type": "extraction",
  "status": "PENDING",
  "created_at": "2024-01-15T10:30:00Z",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "celery_task_uuid_here"
}
```
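
Both variants post to the same endpoint, and the error reference on this page notes that a request must carry either a file or a URL, not both. A small client-side check, sketched below, can catch that mistake before anything is sent. `build_extract_fields` is a hypothetical helper, and the rule that `num_speakers` must be at least 1 is an assumption rather than a documented constraint.

```python
def build_extract_fields(file_path=None, url=None, num_speakers=None):
    """Validate inputs and return the form fields for /speaker/extract.

    Hypothetical helper: the file itself is still sent separately through
    the `files` argument of requests.post, as in the upload example.
    """
    # Exactly one source is allowed: a local file or a URL.
    if (file_path is None) == (url is None):
        raise ValueError("Provide exactly one of file_path or url")

    fields = {}
    if url is not None:
        fields["url"] = url
    if num_speakers is not None:
        # Assumption: a speaker count below 1 is never meaningful.
        if int(num_speakers) < 1:
            raise ValueError("num_speakers must be a positive integer")
        fields["num_speakers"] = int(num_speakers)
    return fields
```

The result is meant to be passed as `data=` to `requests.post`; omitting `num_speakers` leaves speaker detection automatic.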

## Job Management

### Get Job Status

Monitor the progress of speaker extraction jobs.

<Tabs>
  <Tab title="GET">
    ```http theme={null}
    GET /api/v1/speaker/jobs/{job_id}
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    response = requests.get(
        f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}",
        headers={"X-API-Key": api_key}
    )

    if response.status_code == 200:
        job_status = response.json()
        print(f"Status: {job_status['status']}")
        
        if job_status["status"] == "COMPLETED":
            print("Extraction complete!")
            result = job_status.get("result")  # May be absent or null
            if result:
                print(f"Extracted {len(result['speakers'])} speakers")
                for speaker in result["speakers"]:
                    print(f"- {speaker['label']}: {speaker.get('download_url', 'Processing...')}")
    ```
  </Tab>
</Tabs>
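
Because a job starts as `PENDING` and completes asynchronously, clients typically poll this endpoint until the status settles. The sketch below shows one way to do that with a timeout. `wait_for_job` and its `fetch_job` callback are hypothetical names, and `FAILED` is an assumed status string: this page only shows `PENDING`, `PROCESSING`, and `COMPLETED`.

```python
import time


def wait_for_job(fetch_job, job_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_job(job_id) until the job reaches a terminal status.

    fetch_job is any callable returning the job dict from
    GET /api/v1/speaker/jobs/{job_id}. "FAILED" is an assumed status name.
    """
    terminal = {"COMPLETED", "FAILED"}
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")
        # Wait between polls to stay well under the rate limit.
        sleep(interval)
```

With `requests`, `fetch_job` could be a small function that calls the status endpoint and returns `response.json()`. Injecting it as a parameter keeps the loop easy to test without network access.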

**Response (Completed Extraction):**

```json theme={null}
{
  "id": 123,
  "job_type": "extraction",
  "status": "COMPLETED",
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:35:30Z",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "celery_task_uuid_here",
  "result": {
    "speakers": [
      {
        "id": 0,
        "label": "SPEAKER_0",
        "audio_path": "processed/123/speaker_0.wav",
        "download_url": "https://s3.amazonaws.com/...",
        "audio_stats": {
          "rms_db": -12.3,
          "peak": 0.85
        }
      },
      {
        "id": 1,
        "label": "SPEAKER_1",
        "audio_path": "processed/123/speaker_1.wav",
        "download_url": "https://s3.amazonaws.com/...",
        "audio_stats": {
          "rms_db": -15.7,
          "peak": 0.72
        }
      }
    ],
    "files": [
      {
        "type": "audio",
        "speaker": "SPEAKER_0",
        "path": "processed/123/speaker_0.wav",
        "download_url": "https://s3.amazonaws.com/..."
      },
      {
        "type": "audio",
        "speaker": "SPEAKER_1",
        "path": "processed/123/speaker_1.wav",
        "download_url": "https://s3.amazonaws.com/..."
      },
      {
        "type": "rttm",
        "path": "processed/123/extraction.rttm",
        "download_url": "https://s3.amazonaws.com/..."
      }
    ],
    "rttm_path": "processed/123/extraction.rttm"
  }
}
```
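
The `rttm` entry in `files` points to the speaker timeline. Assuming the file follows the conventional RTTM layout (space-separated `SPEAKER` records with the duration in the fifth field and the speaker name in the eighth), which this page does not confirm, per-speaker talk time can be totalled as sketched below. Both helper names are hypothetical.

```python
from collections import defaultdict


def rttm_download_url(job):
    """Return the download URL of the RTTM file in a completed job, if any."""
    result = job.get("result") or {}
    for entry in result.get("files", []):
        if entry.get("type") == "rttm":
            return entry.get("download_url")
    return None


def speaking_time(rttm_text):
    """Sum segment durations (seconds) per speaker from RTTM text.

    Assumes standard RTTM records:
    SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <name> <NA> <NA>
    """
    totals = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and any non-SPEAKER record types.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        totals[fields[7]] += float(fields[4])
    return dict(totals)
```

The text passed to `speaking_time` would come from downloading the URL returned by `rttm_download_url`, for example with `requests.get(url).text`.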

### List Extraction Jobs

Get all speaker extraction jobs for the authenticated user.

<Tabs>
  <Tab title="GET">
    ```http theme={null}
    GET /api/v1/speaker/jobs?job_type=extraction&status=COMPLETED&skip=0&limit=50
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    response = requests.get(
        "https://api.audiopod.ai/api/v1/speaker/jobs",
        headers={"X-API-Key": api_key},
        params={
            "job_type": "extraction",
            "status": "COMPLETED",     # Optional filter
            "skip": 0,
            "limit": 50
        }
    )

    if response.status_code == 200:
        jobs_data = response.json()
        print(f"Total jobs: {jobs_data['total']}")
        print(f"Has more: {jobs_data['hasMore']}")
        
        for job in jobs_data["items"]:
            print(f"Job {job['id']}: {job['status']} - {job.get('filename', 'N/A')}")
            if job['status'] == 'COMPLETED' and job.get('outputFiles'):
                print(f"  Output files: {len(job['outputFiles'])}")
    ```
  </Tab>
</Tabs>

**Response:**

```json theme={null}
{
  "items": [
    {
      "id": 123,
      "job_type": "extraction",
      "status": "COMPLETED",
      "created_at": "2024-01-15T10:30:00Z",
      "completed_at": "2024-01-15T10:35:30Z",
      "user_id": "550e8400-e29b-41d4-a716-446655440000",
      "task_id": "celery_task_uuid_here",
      "filename": "podcast_episode.mp3",
      "display_name": "podcast_episode.mp3",
      "outputFiles": [
        {
          "type": "audio",
          "speaker": "SPEAKER_0",
          "path": "processed/123/speaker_0.wav"
        },
        {
          "type": "audio",
          "speaker": "SPEAKER_1",
          "path": "processed/123/speaker_1.wav"
        }
      ]
    }
  ],
  "hasMore": false,
  "total": 1
}
```
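
When there are more jobs than fit in one page, the `skip`, `limit`, and `hasMore` fields can drive a simple pagination loop. The sketch below assumes that `skip` is an item offset, which the Python example suggests but this page does not state outright. `iter_jobs` and the `fetch_page` callback are hypothetical names.

```python
def iter_jobs(fetch_page, page_size=50):
    """Yield every job by walking pages until hasMore is false.

    fetch_page(skip=..., limit=...) is any callable returning the parsed
    JSON of GET /api/v1/speaker/jobs. Treating `skip` as an item offset
    is an assumption.
    """
    skip = 0
    while True:
        page = fetch_page(skip=skip, limit=page_size)
        items = page.get("items", [])
        yield from items
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("hasMore") or not items:
            break
        skip += len(items)
```

A `fetch_page` built on `requests` would pass `skip` and `limit` through `params=` together with any `job_type` or `status` filters.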

### Retry Failed Job

Retry a failed speaker extraction job.

<Tabs>
  <Tab title="POST">
    ```http theme={null}
    POST /api/v1/speaker/jobs/{job_id}/retry
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    response = requests.post(
        f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}/retry",
        headers={"X-API-Key": api_key}
    )

    if response.status_code == 200:
        retried_job = response.json()
        print(f"Extraction job {retried_job['id']} retried successfully")
        print(f"New task ID: {retried_job['task_id']}")
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X POST "https://api.audiopod.ai/api/v1/speaker/jobs/123/retry" \
      -H "X-API-Key: your_api_key"
    ```
  </Tab>
</Tabs>

**Response:**

```json theme={null}
{
  "id": 123,
  "job_type": "extraction",
  "status": "PROCESSING",
  "created_at": "2024-01-15T10:30:00Z",
  "task_id": "new_celery_task_uuid_here",
  "user_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

### Delete Job

Remove a speaker extraction job and its associated files.

<Tabs>
  <Tab title="DELETE">
    ```http theme={null}
    DELETE /api/v1/speaker/jobs/{job_id}
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    response = requests.delete(
        f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}",
        headers={"X-API-Key": api_key}
    )

    if response.status_code == 204:
        print("Extraction job and files deleted successfully")
    elif response.status_code == 404:
        print("Job not found or access denied")
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    curl -X DELETE "https://api.audiopod.ai/api/v1/speaker/jobs/123" \
      -H "X-API-Key: your_api_key"
    ```
  </Tab>
</Tabs>

**Response:** `204 No Content` on successful deletion

## Supported Formats

**Audio Formats:**

* WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, WebM
* WMA, Speex, and other common formats

**Video Formats:**

* MP4, AVI, MOV, MKV, WebM
* Audio will be extracted automatically from video files

**URL Sources:**

* YouTube, Vimeo, and other video platforms
* Direct audio/video file URLs
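
A quick extension check before uploading can avoid a round trip that ends in `INVALID_AUDIO_FORMAT`. The sketch below is only a client-side convenience: the mapping from the format names above to file extensions is an assumption, the list is deliberately not exhaustive ("other common formats" are also accepted), and the server remains the authority on what it will process.

```python
from pathlib import Path

# Assumed extensions for the formats named above; not an official list.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg", ".opus", ".wma",  # audio
    ".mp4", ".avi", ".mov", ".mkv", ".webm",                           # video / shared
}


def looks_supported(filename):
    """Return True if the file extension matches a known supported format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A `False` result is best treated as a warning rather than a hard failure, since the API may accept formats that are missing from this set.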

## Error Handling

<AccordionGroup>
  <Accordion title="400 Bad Request - Invalid Input">
    ```json theme={null}
    {
      "error_code": "INVALID_AUDIO_FORMAT",
      "message": "Invalid file type. Must be audio or video file.",
      "details": {
        "content_type": "text/plain",
        "extension": ".txt",
        "supported_formats": ["audio/wav", "audio/mp3", "audio/m4a", "video/mp4"],
        "supported_extensions": [".wav", ".mp3", ".m4a", ".mp4", ".avi", ".mov"]
      }
    }
    ```

    **Causes:** Invalid file format, missing file/URL, or both file and URL provided

    **Solutions:** Use supported audio/video formats, provide either file OR URL (not both)
  </Accordion>

  <Accordion title="402 Payment Required - Insufficient Credits">
    ```json theme={null}
    {
      "detail": "Insufficient credits for processing. Required: 8250, Available: 1000"
    }
    ```

    **Causes:** Not enough credits for the audio duration

    **Solutions:** Purchase additional credits or process shorter audio files
  </Accordion>

  <Accordion title="422 Unprocessable Entity - Extraction Failed">
    ```json theme={null}
    {
      "error_code": "PROCESSING_FAILED",
      "message": "Failed to extract speakers from audio",
      "details": {
        "reason": "Audio quality too poor or no distinguishable speakers found"
      }
    }
    ```

    **Causes:** Poor audio quality, no speech content, or indistinguishable speakers

    **Solutions:** Ensure clear speech content, try noise reduction first, or verify multiple speakers exist
  </Accordion>

  <Accordion title="404 Not Found - Job Not Found">
    ```json theme={null}
    {
      "detail": "Job not found or access denied"
    }
    ```

    **Causes:** Invalid job ID or trying to access another user's job

    **Solutions:** Verify job ID and ensure you own the job
  </Accordion>

  <Accordion title="429 Too Many Requests - Rate Limit">
    ```json theme={null}
    {
      "detail": "Rate limit exceeded. Try again later."
    }
    ```

    **Causes:** Exceeded the limit of 100 requests per minute

    **Solutions:** Wait before making additional requests or implement request throttling
  </Accordion>
</AccordionGroup>
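
The examples above use two body shapes: a structured one with `error_code`, `message`, and `details`, and a simpler one with a single `detail` string. A client that wants uniform error messages could normalize both, as in this sketch. `describe_error` is a hypothetical helper, and the fallback branches cover body shapes this page does not document.

```python
def describe_error(status_code, body):
    """Turn either documented error body shape into a one-line message."""
    if not isinstance(body, dict):
        # Undocumented case: non-JSON or unexpected body.
        return f"HTTP {status_code}: {body}"
    if "error_code" in body:
        # Structured shape: error_code / message / details.
        text = f"HTTP {status_code} [{body['error_code']}]: {body.get('message', '')}"
        reason = (body.get("details") or {}).get("reason")
        if reason:
            text += f" ({reason})"
        return text
    if "detail" in body:
        # Simple shape: a single detail string.
        return f"HTTP {status_code}: {body['detail']}"
    return f"HTTP {status_code}: unrecognized error body"
```

For example, it could be called as `describe_error(response.status_code, response.json())` whenever the status code is 400 or higher.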

## Pricing

Speaker extraction costs are based on audio duration:

| Service            | Cost               | Description                                    |
| ------------------ | ------------------ | ---------------------------------------------- |
| Speaker Extraction | 330 credits/minute | Generate separate audio files for each speaker |

*Note: Credits are charged per second of audio (5.5 credits/second)*

### Cost Examples

| Duration   | Service    | Credits | USD Cost\* |
| ---------- | ---------- | ------- | ---------- |
| 5 minutes  | Extraction | 1,650   | \~\$0.22   |
| 15 minutes | Extraction | 4,950   | \~\$0.66   |
| 30 minutes | Extraction | 9,900   | \~\$1.32   |
| 1 hour     | Extraction | 19,800  | \~\$2.64   |

\*USD cost estimates based on standard credit pricing. Actual costs may vary based on subscription plan.
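
The per-second rate makes it straightforward to estimate a job's credit cost before submitting it. The sketch below reproduces the credit column of the table. Rounding partial seconds up is an assumption, since this page does not say how fractional seconds are billed, and `estimate_credits` is a hypothetical helper name.

```python
import math

CREDITS_PER_SECOND = 5.5  # 330 credits/minute, as stated above


def estimate_credits(duration_seconds):
    """Estimate the credits needed for a recording of the given length.

    Assumption: partial seconds are rounded up to the next whole second.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds) * CREDITS_PER_SECOND
```

Comparing the estimate with the account's available credits before uploading helps avoid a `402 Payment Required` response.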

## Rate Limits

* **100 requests per minute** per API key
* Rate limits apply per endpoint
* Exceeding limits returns `429 Too Many Requests`
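
One common way to cope with `429` responses is to retry with exponential backoff. The sketch below illustrates that pattern under stated assumptions: the delays are arbitrary starting points, not values recommended by AudioPod, and `call_with_backoff` and its `send` callback are hypothetical names.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is any callable returning (status_code, body). The delay doubles
    after each 429: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Back off before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

With `requests`, `send` could wrap a single call and return `(response.status_code, response.json())`. If every attempt is rate limited, the last `429` result is returned to the caller.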

## Next Steps

<Columns cols={2}>
  <Card title="Speech-to-Text" icon="waveform" href="/api-reference/speech-to-text">
    Transcribe individual speaker tracks with improved accuracy.
  </Card>

  <Card title="Noise Reduction" icon="volume-xmark" href="/api-reference/noise-reduction">
    Clean up audio before speaker extraction for better results.
  </Card>
</Columns>
