Overview
AudioPod AI’s Speaker Extraction API separates the speakers in a multi-speaker recording into individual audio files, one per speaker. The service determines who speaks when (speaker diarization) and produces a clean, separate track for each speaker while preserving the original audio quality.
Key Features
- Speaker Separation: Generate separate audio files for each detected speaker
- Timeline Generation: Get detailed RTTM files with speaker timestamps
- Speaker Analytics: Duration and quality statistics for each speaker
- Multi-Format Support: Process audio and video files (WAV, MP3, M4A, MP4, etc.)
- URL Processing: Extract speakers from YouTube and other video platforms
- Smart Detection: Detect the number of speakers automatically, or specify the expected number
- Quality Preservation: Maintains original audio quality in extracted files
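The timeline and analytics features can be illustrated with a short parser. The sketch below assumes the returned RTTM files follow the standard NIST RTTM layout: one SPEAKER line per segment, with the duration in field 5 and the speaker label in field 8. The speaker labels in the test data (spk_0, spk_1) are hypothetical. The parser sums speaking time per speaker, which is one kind of duration statistic a client might derive from the timeline.

```python
from collections import defaultdict
from typing import Dict


def speaker_durations(rttm_text: str) -> Dict[str, float]:
    """Sum speaking time (seconds) per speaker from RTTM SPEAKER lines.

    Assumes the standard RTTM field order:
    type file channel onset duration ortho stype speaker conf slat
    """
    totals: Dict[str, float] = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()
        # Skip blank lines and non-SPEAKER records.
        if not fields or fields[0] != "SPEAKER":
            continue
        duration, speaker = float(fields[4]), fields[7]
        totals[speaker] += duration
    return dict(totals)
```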
Authentication
All endpoints require authentication. Send one of the following in the Authorization header:
- API Key: Authorization: Bearer your_api_key
- JWT Token: Authorization: Bearer your_jwt_token
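Both credential types use the same Bearer header, so a client needs only one helper. The following is a minimal standard-library sketch. The URL is a hypothetical placeholder, and constructing a urllib.request.Request object does not send anything over the network.

```python
import urllib.request
from typing import Dict


def auth_headers(token: str) -> Dict[str, str]:
    """Bearer Authorization header; the same form works for an API key or a JWT."""
    return {"Authorization": f"Bearer {token}"}


# Hypothetical placeholder URL: substitute a real endpoint from the API reference.
req = urllib.request.Request(
    "https://api.example.com/speaker/jobs",
    headers=auth_headers("your_api_key"),
)
```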
Speaker Extraction
Extract from File Upload
Upload an audio or video file to extract individual speaker tracks.
HTTP method: POST
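This section does not give the upload endpoint's path or form field names. The sketch below therefore uses hypothetical placeholders: BASE_URL, EXTRACT_FILE_PATH, and the form fields "file" and "num_speakers". It shows how to assemble a multipart/form-data POST with the Python standard library, including the optional expected-speaker-count hint described under Key Features.

```python
import urllib.request
import uuid
from typing import Optional

BASE_URL = "https://api.example.com"   # hypothetical placeholder base URL
EXTRACT_FILE_PATH = "/speaker/extract"  # hypothetical endpoint path


def build_upload_request(token: str, filename: str, data: bytes,
                         num_speakers: Optional[int] = None) -> urllib.request.Request:
    """Assemble (but do not send) a multipart/form-data POST for speaker extraction.

    The form field names "file" and "num_speakers" are assumptions, not documented names.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if num_speakers is not None:
        # Optional expected-speaker-count hint (hypothetical field name).
        parts.append(
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="num_speakers"\r\n\r\n'
            f"{num_speakers}\r\n".encode()
        )
    # The media file itself, sent as raw bytes.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + data + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        BASE_URL + EXTRACT_FILE_PATH,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass it to urllib.request.urlopen. The response format is not described in this section, so inspect it before relying on any job ID field.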
Extract from URL
Extract speakers from audio or video URLs (YouTube, Vimeo, etc.).
HTTP method: POST
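For URL-based extraction, the client sends a link rather than file bytes. This sketch assumes a JSON request body. The path EXTRACT_URL_PATH and the fields "url" and "num_speakers" are hypothetical placeholders, not documented names.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"       # hypothetical placeholder base URL
EXTRACT_URL_PATH = "/speaker/extract-url"  # hypothetical endpoint path


def build_url_request(token: str, media_url: str,
                      num_speakers: Optional[int] = None) -> urllib.request.Request:
    """Assemble (but do not send) a JSON POST asking the service to fetch and process media_url."""
    payload = {"url": media_url}  # field name "url" is an assumption
    if num_speakers is not None:
        payload["num_speakers"] = num_speakers  # hypothetical optional hint
    return urllib.request.Request(
        BASE_URL + EXTRACT_URL_PATH,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
```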
Job Management
Get Job Status
Check the status and progress of a speaker extraction job.
HTTP method: GET
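Because extraction is tracked as a job, a client typically polls the status endpoint until the job finishes. In this sketch, fetch_status stands in for whatever performs the GET and parses the JSON response. The status strings in TERMINAL_STATUSES are hypothetical. The default 5-second interval stays well under the documented 100 requests per minute.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal status values; check a real job-status response for the actual strings.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status: Callable[[], Dict], interval: float = 5.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call fetch_status() until the job reaches a terminal status.

    Raises TimeoutError once the accumulated sleep time reaches `timeout` seconds.
    """
    waited = 0.0
    while True:
        job = fetch_status()  # e.g. a GET to the job-status endpoint, parsed from JSON
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Injecting sleep keeps the function testable without real delays.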
List Extraction Jobs
List all speaker extraction jobs for the authenticated user.
HTTP method: GET
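The job list combines naturally with the retry endpoint: list the jobs, then pick out the failed ones. The path JOBS_PATH, the job keys "id" and "status", and the status value "FAILED" are all hypothetical assumptions about the response shape.

```python
import urllib.request
from typing import Dict, Iterable, List

BASE_URL = "https://api.example.com"  # hypothetical placeholder base URL
JOBS_PATH = "/speaker/jobs"           # hypothetical endpoint path


def build_list_jobs_request(token: str) -> urllib.request.Request:
    """Assemble (but do not send) the GET request that lists the caller's extraction jobs."""
    return urllib.request.Request(BASE_URL + JOBS_PATH, method="GET",
                                  headers={"Authorization": f"Bearer {token}"})


def failed_job_ids(jobs: Iterable[Dict]) -> List[str]:
    """IDs of failed jobs (the "id"/"status" keys and "FAILED" value are assumptions)."""
    return [job["id"] for job in jobs if job.get("status") == "FAILED"]
```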
Retry Failed Job
Retry a failed speaker extraction job.
HTTP method: POST
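A retry is a bodiless POST that targets one job. The path template below is a hypothetical placeholder. The job ID is percent-encoded so that unusual characters cannot break the URL path.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"         # hypothetical placeholder base URL
RETRY_PATH = "/speaker/jobs/{job_id}/retry"  # hypothetical endpoint path template


def build_retry_request(token: str, job_id: str) -> urllib.request.Request:
    """Assemble (but do not send) a POST that asks the service to retry a failed job."""
    # Percent-encode the ID so characters like "/" or spaces stay inside one path segment.
    path = RETRY_PATH.format(job_id=urllib.parse.quote(str(job_id), safe=""))
    return urllib.request.Request(BASE_URL + path, data=b"", method="POST",
                                  headers={"Authorization": f"Bearer {token}"})
```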
Delete Job
Delete a speaker extraction job and its associated files.
HTTP method: DELETE
A successful deletion returns 204 No Content.
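Because a successful deletion returns 204 No Content, a client should check the status code rather than parse a response body. The path template is a hypothetical placeholder. With urllib, urlopen returns normally for a 204 response and raises urllib.error.HTTPError (carrying a .code attribute) for 4xx statuses such as 404.

```python
import urllib.parse
import urllib.request
from http import HTTPStatus

BASE_URL = "https://api.example.com"  # hypothetical placeholder base URL
JOB_PATH = "/speaker/jobs/{job_id}"   # hypothetical endpoint path template


def build_delete_request(token: str, job_id: str) -> urllib.request.Request:
    """Assemble (but do not send) a DELETE for one extraction job."""
    path = JOB_PATH.format(job_id=urllib.parse.quote(str(job_id), safe=""))
    return urllib.request.Request(BASE_URL + path, method="DELETE",
                                  headers={"Authorization": f"Bearer {token}"})


def deletion_succeeded(status_code: int) -> bool:
    """True only for the documented success status, 204 No Content (empty body)."""
    return status_code == HTTPStatus.NO_CONTENT
```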
Supported Formats
Audio Formats:
- WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, WebM
- WMA, Speex, and other common formats
Video Formats:
- MP4, AVI, MOV, MKV, WebM
- Audio is extracted automatically from video files
URL Sources:
- YouTube, Vimeo, and other video platforms
- Direct audio/video file URLs
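A client can pre-check inputs against the supported-format lists before uploading. The extension sets below come from those lists. Mapping Speex to ".spx" is an assumption based on its usual file extension. The service also accepts "other common formats", so an unlisted extension is reported as "unverified" rather than rejected. WebM appears in both lists and is classified as audio here.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions from the supported-format lists; ".spx" for Speex is an assumption.
AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg", ".opus", ".webm", ".wma", ".spx"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def classify_input(source: str) -> str:
    """Return "url", "audio", "video", or "unverified" for a local path or link."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"  # send through the URL extraction endpoint instead of uploading
    ext = Path(source).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"  # the service extracts the audio track itself
    return "unverified"  # may still be one of the "other common formats"
```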
Error Handling
400 Bad Request - Invalid Input
402 Payment Required - Insufficient Credits
422 Processing Error - Extraction Failed
404 Not Found - Job Not Found
429 Too Many Requests - Rate Limit
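A client can map these statuses to readable messages and decide which ones to retry. The retry policy below is a heuristic, not documented behavior. Only 429 is retried unchanged, because 400, 402, and 404 require a corrected request, more credits, or a valid job ID. A 422 extraction failure is better handled with the Retry Failed Job endpoint. With urllib, the status is available as err.code on urllib.error.HTTPError.

```python
from typing import Dict

# Error statuses and meanings as listed in the Error Handling section.
API_ERRORS: Dict[int, str] = {
    400: "Bad Request: invalid input",
    402: "Payment Required: insufficient credits",
    404: "Not Found: job not found",
    422: "Processing Error: extraction failed",
    429: "Too Many Requests: rate limit exceeded",
}


def describe_error(status: int) -> str:
    """Human-readable label for an error status, falling back to the bare code."""
    return API_ERRORS.get(status, f"HTTP {status}")


def should_retry(status: int) -> bool:
    """Heuristic: only rate-limit errors are worth resending unchanged after a pause."""
    return status == 429
```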
Pricing
Speaker extraction costs are based on audio duration:

| Service | Cost | Description |
|---|---|---|
| Speaker Extraction | 1,650 credits/minute | Generate separate audio files for each speaker |
Cost Examples
| Duration | Service | Credits | USD Cost* |
|---|---|---|---|
| 5 minutes | Extraction | 8,250 | ~$1.10 |
| 15 minutes | Extraction | 24,750 | ~$3.30 |
| 30 minutes | Extraction | 49,500 | ~$6.60 |
| 1 hour | Extraction | 99,000 | ~$13.20 |
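The cost table follows directly from the per-minute rate: credits = minutes × 1,650. The USD column implies about 7,500 credits per dollar (8,250 credits ≈ $1.10), so that conversion is an assumption read off the table. Prorating partial minutes per second and rounding up is also an assumption, since the billing granularity is not stated.

```python
CREDITS_PER_MINUTE = 1650  # documented speaker-extraction rate
CREDITS_PER_USD = 7500     # implied by the cost table (8,250 credits ~ $1.10); an assumption


def estimate_credits(duration_seconds: int) -> int:
    """Credits for a recording of the given length, prorated per second and rounded up.

    Per-second proration is an assumption; the service may bill partial minutes differently.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-duration_seconds * CREDITS_PER_MINUTE // 60)


def estimate_usd(credits: int) -> float:
    """Approximate USD cost, rounded to cents, using the rate implied by the table."""
    return round(credits / CREDITS_PER_USD, 2)
```

For the table's whole-minute rows, estimate_credits(300) gives 8,250 and estimate_credits(3600) gives 99,000, which match the listed values.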
Rate Limits
- 100 requests per minute per API key
- Rate limits apply per endpoint
- Exceeding a limit returns 429 Too Many Requests
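To avoid 429 responses, a client can throttle itself before each request. This sketch uses a sliding window: it allows at most `limit` calls in any `window`-second span and defaults to the documented 100 requests per minute. Because the limits apply per endpoint, one option is to keep a separate limiter instance for each endpoint under a given API key. That reading of "per endpoint" is an assumption.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `limit` calls in any `window`-second span."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            sleep(delay)
        self.calls.append(self.clock())
```

Call limiter.acquire() immediately before each request. If a 429 still arrives, for example because other clients share the same API key, back off before retrying.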
