Overview
AudioPod AI’s Speaker Extraction API automatically separates multiple speakers in audio recordings into individual speaker-specific audio files. The service identifies who speaks when and creates clean, separate audio tracks for each speaker while preserving original audio quality.
Key Features
Speaker Separation: Generate separate audio files for each detected speaker
Timeline Generation: Get detailed RTTM files with speaker timestamps
Speaker Analytics: Duration and quality statistics for each speaker
Multi-Format Support: Process audio and video files (WAV, MP3, M4A, MP4, etc.)
URL Processing: Extract speakers from YouTube and other video platforms
Smart Detection: Detect the number of speakers automatically, or specify the expected number
Quality Preservation: Maintains original audio quality in extracted files
Authentication
All endpoints require authentication. Use one of these methods:
API Key (Recommended): X-API-Key: your_api_key header
JWT Token: Authorization: Bearer your_jwt_token header (for session-based auth)
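The two authentication methods above can be wrapped in a small helper. This is a minimal sketch, not part of any AudioPod SDK: the function name `build_auth_headers` is hypothetical, and reading the key from an `AUDIOPOD_API_KEY` environment variable is an assumption borrowed from the retry curl example later on this page.

```python
import os


def build_auth_headers(api_key=None, jwt_token=None):
    """Return request headers for one of the two documented auth methods."""
    # Assumption: fall back to the AUDIOPOD_API_KEY environment variable.
    api_key = api_key or os.environ.get("AUDIOPOD_API_KEY")
    if api_key:
        return {"X-API-Key": api_key}  # recommended method
    if jwt_token:
        return {"Authorization": f"Bearer {jwt_token}"}  # session-based auth
    raise ValueError("Provide an API key or a JWT token")
```

The returned dictionary can be passed as `headers=` in any of the `requests` calls on this page.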
Upload an audio or video file to extract individual speaker tracks.
POST /api/v1/speaker/extract
X-API-Key: {api_key}
Content-Type: multipart/form-data
file: (audio/video file)
num_speakers: 4

import requests

api_key = "your_api_key"

with open("podcast_episode.mp3", "rb") as audio_file:
    response = requests.post(
        "https://api.audiopod.ai/api/v1/speaker/extract",
        headers={"X-API-Key": api_key},
        data={"num_speakers": 4},  # Optional: specify expected speakers
        files={"file": audio_file},
    )

if response.status_code == 200:
    extraction_job = response.json()
    job_id = extraction_job["id"]
    print(f"Speaker extraction job created: {job_id}")

curl -X POST "https://api.audiopod.ai/api/v1/speaker/extract" \
-H "X-API-Key: your_api_key" \
-F "file=@podcast_episode.mp3" \
-F "num_speakers=4"
Extract speakers from audio/video URLs (YouTube, Vimeo, etc.).
POST /api/v1/speaker/extract
X-API-Key: {api_key}
Content-Type: application/x-www-form-urlencoded
url=https://youtube.com/watch?v=example123&num_speakers=3

response = requests.post(
    "https://api.audiopod.ai/api/v1/speaker/extract",
    headers={"X-API-Key": api_key},
    data={
        "url": "https://youtube.com/watch?v=example123",
        "num_speakers": 3,  # Optional: specify expected speakers
    },
)

if response.status_code == 200:
    job_data = response.json()
    print(f"URL extraction started: {job_data['id']}")

curl -X POST "https://api.audiopod.ai/api/v1/speaker/extract" \
-H "X-API-Key: your_api_key" \
-d "url=https://youtube.com/watch?v=example123" \
-d "num_speakers=3"
Response:
{
  "id": 123,
  "job_type": "extraction",
  "status": "PENDING",
  "created_at": "2024-01-15T10:30:00Z",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "celery_task_uuid_here"
}
Job Management
Get Job Status
Monitor the progress of speaker extraction jobs.
GET /api/v1/speaker/jobs/{job_id}
X-API-Key: {api_key}

response = requests.get(
    f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}",
    headers={"X-API-Key": api_key},
)

if response.status_code == 200:
    job_status = response.json()
    print(f"Status: {job_status['status']}")
    if job_status["status"] == "COMPLETED":
        print("Extraction complete!")
        result = job_status.get("result")  # guard against a missing result
        if result:
            print(f"Extracted {len(result['speakers'])} speakers")
            for speaker in result["speakers"]:
                print(f"- {speaker['label']}: {speaker.get('download_url', 'Processing...')}")

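Because extraction runs asynchronously, a client usually repeats the status request until the job finishes. The sketch below shows one way to do that under stated assumptions: `wait_for_job` is a hypothetical helper, the `FAILED` status name is assumed (only `PENDING`, `PROCESSING` and `COMPLETED` appear in the responses on this page), and the 5-second interval and 30-minute timeout are arbitrary defaults. The status request itself is passed in as a callable so the loop stays independent of the HTTP client.

```python
import time

# Assumption: "FAILED" is the terminal status for unsuccessful jobs.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status, interval=5.0, timeout=1800.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job is terminal or the timeout expires.

    fetch_status must return the job JSON as a dict, for example:
        lambda: requests.get(job_url, headers=headers).json()
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"Job still {job['status']} after {timeout} seconds")
        sleep(interval)
```

Polling every few seconds keeps the request count well below the documented rate limit for a single job.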
Response (Completed Extraction):
{
  "id": 123,
  "job_type": "extraction",
  "status": "COMPLETED",
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:35:30Z",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "celery_task_uuid_here",
  "result": {
    "speakers": [
      {
        "id": 0,
        "label": "SPEAKER_0",
        "audio_path": "processed/123/speaker_0.wav",
        "download_url": "https://s3.amazonaws.com/...",
        "audio_stats": {
          "rms_db": -12.3,
          "peak": 0.85
        }
      },
      {
        "id": 1,
        "label": "SPEAKER_1",
        "audio_path": "processed/123/speaker_1.wav",
        "download_url": "https://s3.amazonaws.com/...",
        "audio_stats": {
          "rms_db": -15.7,
          "peak": 0.72
        }
      }
    ],
    "files": [
      {
        "type": "audio",
        "speaker": "SPEAKER_0",
        "path": "processed/123/speaker_0.wav",
        "download_url": "https://s3.amazonaws.com/..."
      },
      {
        "type": "audio",
        "speaker": "SPEAKER_1",
        "path": "processed/123/speaker_1.wav",
        "download_url": "https://s3.amazonaws.com/..."
      },
      {
        "type": "rttm",
        "path": "processed/123/extraction.rttm",
        "download_url": "https://s3.amazonaws.com/..."
      }
    ],
    "rttm_path": "processed/123/extraction.rttm"
  }
}
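A completed job lists its outputs under `result.files`, so a client typically needs to pick out the per-speaker audio URLs and the RTTM timeline. The following sketch illustrates that step. Both function names are hypothetical, and the RTTM parser assumes the conventional ten-field `SPEAKER` record layout; this page does not show the actual file contents, so verify against a real output before relying on it.

```python
def index_result_files(job):
    """Map each speaker label to its audio download URL and find the RTTM URL."""
    result = job.get("result") or {}
    speaker_urls = {}
    rttm_url = None
    for entry in result.get("files", []):
        if entry.get("type") == "audio":
            speaker_urls[entry["speaker"]] = entry.get("download_url")
        elif entry.get("type") == "rttm":
            rttm_url = entry.get("download_url")
    return speaker_urls, rttm_url


def parse_rttm(text):
    """Parse RTTM text into (speaker, start_seconds, end_seconds) tuples.

    Assumed record layout:
    SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        onset, duration = float(fields[3]), float(fields[4])
        segments.append((fields[7], onset, onset + duration))
    return segments
```

The segment tuples answer "who speaks when", while the URL map points at the separated tracks for download.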
Get all speaker extraction jobs for the authenticated user.
GET /api/v1/speaker/jobs?job_type=extraction&status=COMPLETED&limit=50
X-API-Key: {api_key}

response = requests.get(
    "https://api.audiopod.ai/api/v1/speaker/jobs",
    headers={"X-API-Key": api_key},
    params={
        "job_type": "extraction",
        "status": "COMPLETED",  # Optional filter
        "skip": 0,
        "limit": 50,
    },
)

if response.status_code == 200:
    jobs_data = response.json()
    print(f"Total jobs: {jobs_data['total']}")
    print(f"Has more: {jobs_data['hasMore']}")
    for job in jobs_data["items"]:
        print(f"Job {job['id']}: {job['status']} - {job.get('filename', 'N/A')}")
        if job["status"] == "COMPLETED" and job.get("outputFiles"):
            print(f"  Output files: {len(job['outputFiles'])}")

Response:
{
  "items": [
    {
      "id": 123,
      "job_type": "extraction",
      "status": "COMPLETED",
      "created_at": "2024-01-15T10:30:00Z",
      "completed_at": "2024-01-15T10:35:30Z",
      "user_id": "550e8400-e29b-41d4-a716-446655440000",
      "task_id": "celery_task_uuid_here",
      "filename": "podcast_episode.mp3",
      "display_name": "podcast_episode.mp3",
      "outputFiles": [
        {
          "type": "audio",
          "speaker": "SPEAKER_0",
          "path": "processed/123/speaker_0.wav"
        },
        {
          "type": "audio",
          "speaker": "SPEAKER_1",
          "path": "processed/123/speaker_1.wav"
        }
      ]
    }
  ],
  "hasMore": false,
  "total": 1
}
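The `skip`, `limit` and `hasMore` fields suggest offset-based pagination. A minimal sketch of walking every page follows, assuming `skip` counts items rather than pages; `iter_jobs` and the `fetch_page` callable are hypothetical names, and the HTTP request is injected so the loop can be exercised without network access.

```python
def iter_jobs(fetch_page, page_size=50):
    """Yield every job by advancing `skip` until `hasMore` is false.

    fetch_page(skip, limit) must return the list response as a dict, e.g.:
        lambda skip, limit: requests.get(
            jobs_url, headers=headers,
            params={"job_type": "extraction", "skip": skip, "limit": limit},
        ).json()
    """
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        items = page.get("items", [])
        yield from items
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("hasMore") or not items:
            return
        skip += len(items)
```

Because it is a generator, callers can stop early (for example after finding a specific job) without fetching the remaining pages.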
Retry Failed Job
Retry a failed speaker extraction job.
POST /api/v1/speaker/jobs/{job_id}/retry
X-API-Key: {api_key}

response = requests.post(
    f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}/retry",
    headers={"X-API-Key": api_key},
)

if response.status_code == 200:
    retried_job = response.json()
    print(f"Extraction job {retried_job['id']} retried successfully")
    print(f"New task ID: {retried_job['task_id']}")

curl -X POST "https://api.audiopod.ai/api/v1/speaker/jobs/123/retry" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
Response:
{
  "id": 123,
  "job_type": "extraction",
  "status": "PROCESSING",
  "created_at": "2024-01-15T10:30:00Z",
  "task_id": "new_celery_task_uuid_here",
  "user_id": "550e8400-e29b-41d4-a716-446655440000"
}
Delete Job
Remove a speaker extraction job and its associated files.
DELETE /api/v1/speaker/jobs/{job_id}
X-API-Key: {api_key}

response = requests.delete(
    f"https://api.audiopod.ai/api/v1/speaker/jobs/{job_id}",
    headers={"X-API-Key": api_key},
)

if response.status_code == 204:
    print("Extraction job and files deleted successfully")
elif response.status_code == 404:
    print("Job not found or access denied")

curl -X DELETE "https://api.audiopod.ai/api/v1/speaker/jobs/123" \
-H "X-API-Key: your_api_key"
Response: 204 No Content on successful deletion
Audio Formats:
WAV, MP3, M4A, AAC, FLAC, OGG, OPUS, WebM
WMA, Speex, and other common formats
Video Formats:
MP4, AVI, MOV, MKV, WebM
Audio will be extracted automatically from video files
URL Sources:
YouTube, Vimeo, and other video platforms
Direct audio/video file URLs
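A client can run a quick local check against the formats named above before uploading. This is only a sketch: the mapping from format names to file extensions is an assumption, Speex is omitted because its extension is not stated here, and since the list ends with "other common formats", a `None` result means "not explicitly listed" rather than "unsupported". The name `classify_media` is hypothetical.

```python
from pathlib import Path

# Assumed extensions for the formats named in the list above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg", ".opus", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}
# WebM appears under both audio and video, so it is tracked separately.
EITHER_EXTENSIONS = {".webm"}


def classify_media(path):
    """Return "audio", "video", "audio_or_video", or None if not explicitly listed."""
    suffix = Path(path).suffix.lower()
    if suffix in EITHER_EXTENSIONS:
        return "audio_or_video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

The check inspects only the file name, not the file contents, so the server remains the final authority on what it accepts.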
Error Handling
400 Bad Request - Invalid Input
402 Payment Required - Insufficient Credits
{
  "detail": "Insufficient credits for processing. Required: 8250, Available: 1000"
}
Causes: Not enough credits for the audio duration
Solutions: Purchase additional credits or process shorter audio files
422 Unprocessable Entity - Extraction Failed
404 Not Found - Job Not Found
{
  "detail": "Job not found or access denied"
}
Causes: Invalid job ID or trying to access another user’s job
Solutions: Verify job ID and ensure you own the job
429 Too Many Requests - Rate Limit
{
  "detail": "Rate limit exceeded. Try again later."
}
Causes: Exceeded the limit of 100 requests per minute
Solutions: Wait before making additional requests or implement request throttling
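The error responses above share a `detail` field, which makes them easy to funnel through one check. The sketch below is illustrative only: `AudioPodAPIError` and `check_response` are hypothetical names, the hints paraphrase the documented solutions, and the 400 and 422 hints are generic assumptions because no causes are listed for them here.

```python
# Hints paraphrased from the documented solutions; 400 and 422 are assumptions.
ERROR_HINTS = {
    400: "Check the request parameters and the uploaded file.",
    402: "Purchase additional credits or process shorter audio.",
    404: "Verify the job ID and that the job belongs to you.",
    422: "Inspect the detail message; the extraction itself failed.",
    429: "Wait, then retry with request throttling.",
}


class AudioPodAPIError(Exception):
    """Hypothetical exception carrying the status code and `detail` message."""

    def __init__(self, status_code, detail, hint):
        super().__init__(f"HTTP {status_code}: {detail} ({hint})")
        self.status_code = status_code
        self.detail = detail
        self.hint = hint
        # Only rate limiting is worth retrying with an unchanged request.
        self.retryable = status_code == 429


def check_response(status_code, body):
    """Return `body` on success, otherwise raise AudioPodAPIError."""
    if status_code < 400:
        return body
    detail = body.get("detail", "") if isinstance(body, dict) else str(body)
    hint = ERROR_HINTS.get(status_code, "Unexpected error; see the detail message.")
    raise AudioPodAPIError(status_code, detail, hint)
```

With `requests`, a 204 response has no JSON body, so one option is `check_response(response.status_code, response.json() if response.content else None)`.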
Pricing
Speaker extraction costs are based on audio duration:
Service | Cost | Description
Speaker Extraction | 1650 credits/minute | Generate separate audio files for each speaker
Note: Credits are charged per second of audio (27.5 credits/second)
Cost Examples
Duration | Service | Credits | USD Cost*
5 minutes | Extraction | 8,250 | ~$1.10
15 minutes | Extraction | 24,750 | ~$3.30
30 minutes | Extraction | 49,500 | ~$6.60
1 hour | Extraction | 99,000 | ~$13.20
*USD cost estimates based on standard credit pricing. Actual costs may vary based on subscription plan.
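The per-second rate makes it straightforward to estimate a job's cost before submitting it. Here is a minimal sketch of that arithmetic. The function name is hypothetical, rounding fractional credits up is an assumption, and the USD rate is derived from the table (8,250 credits at about $1.10), so it carries the same caveat as the footnote.

```python
import math

CREDITS_PER_MINUTE = 1650
CREDITS_PER_SECOND = CREDITS_PER_MINUTE / 60  # 27.5, as stated in the note
# Derived from the cost table; actual pricing may vary by subscription plan.
USD_PER_CREDIT = 1.10 / 8250


def estimate_cost(duration_seconds):
    """Return (credits, approximate_usd) for an audio duration in seconds."""
    # Assumption: fractional credits are rounded up.
    credits = math.ceil(duration_seconds * CREDITS_PER_SECOND)
    return credits, round(credits * USD_PER_CREDIT, 2)
```

For a 5-minute file, `estimate_cost(300)` yields 8,250 credits, matching the first row of the table; comparing that figure with the available balance can avoid a 402 response.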
Rate Limits
100 requests per minute per API key
Rate limits apply per endpoint
Exceeding limits returns 429 Too Many Requests
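One simple way to stay under the limit is to space requests evenly on the client side. The sketch below assumes a fixed minimum interval of 60 / 100 = 0.6 seconds between requests; `MinIntervalThrottle` is a hypothetical helper, and since limits apply per endpoint, using one instance per endpoint is one possible reading of that rule.

```python
import time


class MinIntervalThrottle:
    """Space calls so the average rate stays at or below `max_per_minute`."""

    def __init__(self, max_per_minute=100, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request may be sent, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Calling `throttle.wait()` immediately before each request is enough; the first call returns without sleeping. Even spacing is simpler than a sliding-window counter, at the cost of not allowing short bursts.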
Next Steps
Speech-to-Text: Transcribe individual speaker tracks with improved accuracy.
Noise Reduction: Clean up audio before speaker extraction for better results.