Overview
AudioPod AI’s Speech-to-Text API converts audio and video content into accurate text transcriptions using advanced AI models including WhisperX and Faster-Whisper. Get detailed transcriptions with speaker diarization, word-level timestamps, and confidence scores.
Key Features
Multi-Model Support: WhisperX, Whisper-Timestamped, Faster-Whisper
Speaker Diarization: Automatic speaker identification and separation
Word-Level Timestamps: Precise timing for each word
Confidence Scores: Quality metrics for transcription accuracy
50+ Languages: Automatic language detection or manual specification
Large File Support: Handle videos up to 15 hours with chunking
Multiple Sources: Upload files or provide YouTube/video URLs
Editable Transcripts: Edit and refine transcription results
Authentication
All endpoints require authentication. Use one of these methods:
API Key (Recommended): send the header X-API-Key: your_api_key
JWT Token: send the header Authorization: Bearer your_jwt_token (for session-based auth)
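As a minimal sketch of the two header formats above, the helper below builds request headers for either method. The function name `build_auth_headers` and the `AUDIOPOD_API_KEY` environment variable are illustrative assumptions, not part of the AudioPod SDK.

```python
import os


def build_auth_headers(api_key=None, jwt_token=None):
    """Return HTTP headers for API-key or JWT authentication.

    Hypothetical helper: prefers an explicit API key, then a JWT token,
    then falls back to the AUDIOPOD_API_KEY environment variable.
    """
    if api_key:
        return {"X-API-Key": api_key}
    if jwt_token:
        return {"Authorization": f"Bearer {jwt_token}"}
    env_key = os.environ.get("AUDIOPOD_API_KEY")
    if env_key:
        return {"X-API-Key": env_key}
    raise ValueError("Provide api_key, jwt_token, or set AUDIOPOD_API_KEY")


headers = build_auth_headers(api_key="your_api_key")
print(headers)
```

The resulting dictionary can be passed as the `headers` argument of any `requests` call in this guide.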
Transcribe from URLs
Transcribe YouTube Videos
Transcribe audio from YouTube or other video platforms.
Python
Node.js
Raw HTTP
cURL
from audiopod import Client

client = Client()

# Simple YouTube transcription
transcription = client.transcription.transcribe_from_url(
    url="https://youtube.com/watch?v=example123",
    wait_for_completion=True  # Wait for result
)
print("Transcription completed!")
print(f"Duration: {transcription.duration}s")
print(f"Language: {transcription.language}")
print(f"Full text: {transcription.text}")

# Advanced transcription with speaker diarization
advanced_transcription = client.transcription.transcribe_from_url(
    url="https://youtube.com/watch?v=example123",
    language="en",  # Optional: auto-detect if not specified
    model_type="whisperx",  # whisperx, whisper_timestamped, faster_whisper
    enable_speaker_diarization=True,
    min_speakers=2,
    max_speakers=5,
    enable_word_timestamps=True,
    enable_confidence_scores=True,
    chunk_duration=1800,  # 30 minutes per chunk
    wait_for_completion=True
)

# Access speaker-separated text
for segment in advanced_transcription.segments:
    speaker = segment.get('speaker', 'Unknown')
    text = segment['text']
    start_time = segment['start']
    end_time = segment['end']
    confidence = segment.get('confidence', 0.0)
    print(f"[{start_time:.2f}s - {end_time:.2f}s] {speaker}: {text} (confidence: {confidence:.2f})")

# Batch processing multiple URLs
urls = [
    "https://youtube.com/watch?v=video1",
    "https://youtube.com/watch?v=video2",
    "https://vimeo.com/123456789"
]
batch_results = client.transcription.transcribe_batch_from_urls(
    urls=urls,
    enable_speaker_diarization=True,
    model_type="whisperx",
    wait_for_completion=True
)
for i, result in enumerate(batch_results):
    print(f"\nVideo {i + 1}: {urls[i]}")
    print(f"Status: {result.status}")
    if result.status == "completed":
        print(f"Text preview: {result.text[:100]}...")
const { AudioPodClient } = require('audiopod-js');

const client = new AudioPodClient();

async function transcribeYouTubeVideo() {
  try {
    // Simple transcription
    const transcription = await client.transcription.transcribeFromUrl({
      url: "https://youtube.com/watch?v=example123",
      waitForCompletion: true
    });
    console.log(`Transcription completed!`);
    console.log(`Duration: ${transcription.duration}s`);
    console.log(`Language: ${transcription.language}`);
    console.log(`Full text: ${transcription.text}`);

    // Advanced transcription with speaker diarization
    const advancedTranscription = await client.transcription.transcribeFromUrl({
      url: "https://youtube.com/watch?v=example123",
      language: "en",
      modelType: "whisperx",
      enableSpeakerDiarization: true,
      minSpeakers: 2,
      maxSpeakers: 5,
      enableWordTimestamps: true,
      enableConfidenceScores: true,
      chunkDuration: 1800,
      waitForCompletion: true
    });

    // Process speaker-separated segments
    advancedTranscription.segments.forEach(segment => {
      const speaker = segment.speaker || 'Unknown';
      const text = segment.text;
      const startTime = segment.start;
      const endTime = segment.end;
      const confidence = segment.confidence || 0.0;
      console.log(`[${startTime.toFixed(2)}s - ${endTime.toFixed(2)}s] ${speaker}: ${text} (confidence: ${confidence.toFixed(2)})`);
    });

    // Batch processing
    const urls = [
      "https://youtube.com/watch?v=video1",
      "https://youtube.com/watch?v=video2",
      "https://vimeo.com/123456789"
    ];
    const batchResults = await client.transcription.transcribeBatchFromUrls({
      urls: urls,
      enableSpeakerDiarization: true,
      modelType: "whisperx",
      waitForCompletion: true
    });
    batchResults.forEach((result, index) => {
      console.log(`\nVideo ${index + 1}: ${urls[index]}`);
      console.log(`Status: ${result.status}`);
      if (result.status === "completed") {
        console.log(`Text preview: ${result.text.substring(0, 100)}...`);
      }
    });
  } catch (error) {
    console.error('Transcription error:', error.message);
  }
}

transcribeYouTubeVideo();
import requests
import time

api_key = "your_api_key"  # Replace with your AudioPod API key

# Start transcription job
response = requests.post(
    "https://api.audiopod.ai/api/v1/transcription/transcribe",
    headers={"X-API-Key": api_key},
    json={
        "source_urls": [
            "https://youtube.com/watch?v=example123",
            "https://vimeo.com/123456789"
        ],
        "language": "en",  # Optional: auto-detect if not specified
        "model_type": "whisperx",
        "enable_speaker_diarization": True,
        "min_speakers": 2,
        "max_speakers": 5,
        "enable_word_timestamps": True,
        "enable_confidence_scores": True,
        "chunk_duration": 1800  # 30 minutes
    }
)

if response.status_code == 200:
    job_data = response.json()
    job_id = job_data["job_id"]
    print(f"Transcription job created: {job_id}")

    # Poll for completion
    while True:
        status_response = requests.get(
            f"https://api.audiopod.ai/api/v1/transcription/status/{job_id}",
            headers={"X-API-Key": api_key}
        )
        if status_response.status_code == 200:
            status_data = status_response.json()
            print(f"Status: {status_data['status']}")
            # Responses in this guide show upper-case statuses such as "PENDING",
            # so normalize the case before comparing.
            status = status_data['status'].upper()

            if status == 'COMPLETED':
                # Get final result
                result_response = requests.get(
                    f"https://api.audiopod.ai/api/v1/transcription/result/{job_id}",
                    headers={"X-API-Key": api_key}
                )
                if result_response.status_code == 200:
                    result = result_response.json()
                    print(f"Transcription text: {result['transcription']}")
                    # Print speaker segments if diarization was enabled
                    if 'segments' in result:
                        for segment in result['segments']:
                            speaker = segment.get('speaker', 'Unknown')
                            text = segment['text']
                            start = segment['start']
                            end = segment['end']
                            print(f"[{start:.2f}s - {end:.2f}s] {speaker}: {text}")
                break
            elif status == 'FAILED':
                print(f"Transcription failed: {status_data.get('error', 'Unknown error')}")
                break

        time.sleep(10)  # Wait 10 seconds before checking again
# Start transcription job
curl -X POST "https://api.audiopod.ai/api/v1/transcription/transcribe" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source_urls": ["https://youtube.com/watch?v=example123"],
    "language": "en",
    "model_type": "whisperx",
    "enable_speaker_diarization": true,
    "enable_word_timestamps": true,
    "enable_confidence_scores": true
  }'

# Check job status (replace JOB_ID with actual job ID)
curl -X GET "https://api.audiopod.ai/api/v1/transcription/status/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Get transcription result when completed
curl -X GET "https://api.audiopod.ai/api/v1/transcription/result/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
Response:
{
"job_id" : 123 ,
"task_id" : "celery_task_uuid_here" ,
"status" : "PENDING" ,
"message" : "Transcription job created successfully" ,
"estimated_credits" : 150 ,
"estimated_duration" : 1800.0 ,
"source_urls" : [
"https://youtube.com/watch?v=example123"
]
}
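Because the job is created in a pending state, a client has to poll until it reaches a terminal status. The sketch below is one way to structure that loop with a gradual back-off and a timeout. `wait_for_job` and `fetch_status` are hypothetical names, the `PROCESSING` status in the demo is an assumed intermediate value, and statuses are compared case-insensitively because this guide shows both upper-case and lower-case values.

```python
import time


def wait_for_job(fetch_status, job_id, interval=10.0, max_interval=60.0,
                 timeout=3600.0, sleep=time.sleep):
    """Poll fetch_status(job_id) until the job completes, fails, or times out.

    fetch_status is any callable returning a dict with a "status" key,
    for example a thin wrapper around a status request.
    """
    waited = 0.0
    while True:
        status_data = fetch_status(job_id)
        status = str(status_data.get("status", "")).upper()
        if status in ("COMPLETED", "FAILED"):
            return status_data
        if waited >= timeout:
            raise TimeoutError(f"Job {job_id} not finished after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Back off gradually so long jobs do not hammer the API.
        interval = min(interval * 1.5, max_interval)


# Demo with a fake status source instead of a real HTTP call.
_fake_statuses = iter(["PENDING", "PROCESSING", "COMPLETED"])
result = wait_for_job(lambda job_id: {"status": next(_fake_statuses)},
                      job_id=123, sleep=lambda seconds: None)
print(result["status"])
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access; in real use `fetch_status` would issue the status request and return `response.json()`.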
Transcribe from Files
Upload Audio/Video Files
Transcribe from uploaded audio or video files.
Python
Node.js
Raw HTTP
cURL
from audiopod import Client
import os

client = Client()

# Single file transcription
transcription = client.transcription.transcribe_from_file(
    audio_file="meeting_recording.mp3",
    language="en",
    model_type="whisperx",
    enable_speaker_diarization=True,
    min_speakers=2,
    max_speakers=8,
    enable_word_timestamps=True,
    enable_confidence_scores=True,
    wait_for_completion=True
)
print("Transcription completed!")
print(f"File: {transcription.source_file}")
print(f"Duration: {transcription.duration}s")
print(f"Language: {transcription.language}")
print(f"Full text: {transcription.text}")

# Access detailed segments with speakers
for segment in transcription.segments:
    speaker = segment.get('speaker', 'Unknown')
    text = segment['text']
    start_time = segment['start']
    end_time = segment['end']
    confidence = segment.get('confidence', 0.0)
    print(f"[{start_time:.2f}s - {end_time:.2f}s] {speaker}: {text}")

# Batch file transcription
audio_files = [
    "meeting_recording.mp3",
    "interview.wav",
    "presentation.mp4",
    "podcast_episode.m4a"
]
batch_results = client.transcription.transcribe_batch_from_files(
    audio_files=audio_files,
    enable_speaker_diarization=True,
    model_type="whisperx",
    enable_word_timestamps=True,
    chunk_duration=1800,  # 30 minutes per chunk
    wait_for_completion=True
)

# Process results
for i, result in enumerate(batch_results):
    print(f"\nFile {i + 1}: {audio_files[i]}")
    print(f"Status: {result.status}")
    if result.status == "completed":
        print(f"Duration: {result.duration}s")
        print(f"Language: {result.language}")
        print(f"Text preview: {result.text[:100]}...")
        # Save transcription to file
        output_file = f"transcript_{i + 1}.txt"
        with open(output_file, 'w', encoding='utf-8') as f:
            f.write(result.text)
        print(f"Saved to: {output_file}")

# Advanced file processing with custom settings
def process_interview_file(file_path):
    """Process interview with optimized settings"""
    transcription = client.transcription.transcribe_from_file(
        audio_file=file_path,
        model_type="whisperx",  # Best for accuracy
        enable_speaker_diarization=True,
        min_speakers=2,  # Interviewer + interviewee
        max_speakers=4,  # Allow for additional participants
        enable_word_timestamps=True,
        enable_confidence_scores=True,
        language="auto",  # Auto-detect language
        wait_for_completion=True
    )

    # Generate formatted transcript
    formatted_output = []
    current_speaker = None
    for segment in transcription.segments:
        speaker = segment.get('speaker', 'Unknown')
        text = segment['text'].strip()
        if speaker != current_speaker:
            formatted_output.append(f"\n{speaker}:")
            current_speaker = speaker
        formatted_output.append(f" {text}")
    return ''.join(formatted_output)

# Process interview
interview_transcript = process_interview_file("important_interview.wav")
print("\nFormatted Interview Transcript:")
print(interview_transcript)
const { AudioPodClient } = require('audiopod-js');
const fs = require('fs');
const path = require('path');

const client = new AudioPodClient();

async function transcribeAudioFiles() {
  try {
    // Single file transcription
    const transcription = await client.transcription.transcribeFromFile({
      audioFile: fs.createReadStream('meeting_recording.mp3'),
      language: "en",
      modelType: "whisperx",
      enableSpeakerDiarization: true,
      minSpeakers: 2,
      maxSpeakers: 8,
      enableWordTimestamps: true,
      enableConfidenceScores: true,
      waitForCompletion: true
    });
    console.log(`Transcription completed!`);
    console.log(`File: ${transcription.sourceFile}`);
    console.log(`Duration: ${transcription.duration}s`);
    console.log(`Language: ${transcription.language}`);
    console.log(`Full text: ${transcription.text}`);

    // Process segments with speakers
    transcription.segments.forEach(segment => {
      const speaker = segment.speaker || 'Unknown';
      const text = segment.text;
      const startTime = segment.start;
      const endTime = segment.end;
      const confidence = segment.confidence || 0.0;
      console.log(`[${startTime.toFixed(2)}s - ${endTime.toFixed(2)}s] ${speaker}: ${text}`);
    });

    // Batch processing
    const audioFiles = [
      'meeting_recording.mp3',
      'interview.wav',
      'presentation.mp4',
      'podcast_episode.m4a'
    ];
    const fileStreams = audioFiles.map(file => ({
      file: fs.createReadStream(file),
      name: path.basename(file)
    }));
    const batchResults = await client.transcription.transcribeBatchFromFiles({
      audioFiles: fileStreams,
      enableSpeakerDiarization: true,
      modelType: "whisperx",
      enableWordTimestamps: true,
      chunkDuration: 1800,
      waitForCompletion: true
    });

    // Process batch results
    batchResults.forEach((result, index) => {
      console.log(`\nFile ${index + 1}: ${audioFiles[index]}`);
      console.log(`Status: ${result.status}`);
      if (result.status === "completed") {
        console.log(`Duration: ${result.duration}s`);
        console.log(`Language: ${result.language}`);
        console.log(`Text preview: ${result.text.substring(0, 100)}...`);
        // Save transcription to file
        const outputFile = `transcript_${index + 1}.txt`;
        fs.writeFileSync(outputFile, result.text, 'utf8');
        console.log(`Saved to: ${outputFile}`);
      }
    });

    // Advanced processing function for interviews
    async function processInterviewFile(filePath) {
      const transcription = await client.transcription.transcribeFromFile({
        audioFile: fs.createReadStream(filePath),
        modelType: "whisperx",
        enableSpeakerDiarization: true,
        minSpeakers: 2,
        maxSpeakers: 4,
        enableWordTimestamps: true,
        enableConfidenceScores: true,
        language: "auto",
        waitForCompletion: true
      });

      // Format transcript with speaker labels
      const formattedOutput = [];
      let currentSpeaker = null;
      transcription.segments.forEach(segment => {
        const speaker = segment.speaker || 'Unknown';
        const text = segment.text.trim();
        if (speaker !== currentSpeaker) {
          formattedOutput.push(`\n${speaker}:`);
          currentSpeaker = speaker;
        }
        formattedOutput.push(` ${text}`);
      });
      return formattedOutput.join('');
    }

    // Process an interview
    const interviewTranscript = await processInterviewFile('important_interview.wav');
    console.log('\nFormatted Interview Transcript:');
    console.log(interviewTranscript);
  } catch (error) {
    console.error('Transcription error:', error.message);
  }
}

transcribeAudioFiles();
import os
import time

import requests

api_key = "your_api_key"  # Replace with your AudioPod API key

# Prepare files for upload
files_to_transcribe = [
    "meeting_recording.mp3",
    "interview.wav",
    "presentation.mp4"
]
files = []
for file_path in files_to_transcribe:
    if os.path.exists(file_path):
        files.append(('files', open(file_path, 'rb')))

# Upload and transcribe
response = requests.post(
    "https://api.audiopod.ai/api/v1/transcription/transcribe-upload",
    headers={"X-API-Key": api_key},
    data={
        "language": "en",
        "model_type": "whisperx",
        "enable_speaker_diarization": True,
        "min_speakers": 1,
        "max_speakers": 10,
        "enable_word_timestamps": True,
        "enable_confidence_scores": True,
        "chunk_duration": 1800
    },
    files=files
)

# Close file handles
for _, file_obj in files:
    file_obj.close()

if response.status_code == 200:
    job_data = response.json()
    job_id = job_data['job_id']
    print(f"Upload transcription job: {job_id}")

    # Poll for completion
    while True:
        status_response = requests.get(
            f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}",
            headers={"X-API-Key": api_key}
        )
        if status_response.status_code == 200:
            status_data = status_response.json()
            print(f"Status: {status_data['status']}")

            if status_data['status'] == 'COMPLETED':
                # Get transcription result
                result_response = requests.get(
                    f"https://api.audiopod.ai/api/v1/transcription/transcript/{job_id}",
                    headers={"X-API-Key": api_key}
                )
                if result_response.status_code == 200:
                    transcript_data = result_response.json()
                    print("Transcription completed!")
                    print(f"Full text: {transcript_data['transcript']}")
                    # Save to file
                    with open(f"transcript_{job_id}.txt", 'w', encoding='utf-8') as f:
                        f.write(transcript_data['transcript'])
                    print(f"Saved to transcript_{job_id}.txt")
                break
            elif status_data['status'] == 'FAILED':
                print(f"Transcription failed: {status_data.get('error', 'Unknown error')}")
                break

        time.sleep(15)  # Check every 15 seconds
else:
    print(f"Upload failed: {response.status_code}")
    print(response.text)
# Upload files for transcription
curl -X POST "https://api.audiopod.ai/api/v1/transcription/transcribe-upload" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "files=@meeting_recording.mp3" \
  -F "files=@interview.wav" \
  -F "language=en" \
  -F "model_type=whisperx" \
  -F "enable_speaker_diarization=true" \
  -F "enable_word_timestamps=true"

# Check job status (replace JOB_ID with actual job ID)
curl -X GET "https://api.audiopod.ai/api/v1/transcription/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Get completed transcription
curl -X GET "https://api.audiopod.ai/api/v1/transcription/transcript/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
Job Management
Get Transcription Status
Check the progress and status of transcription jobs.
GET /api/v1/transcription/jobs/{job_id}
X-API-Key: {api_key}

response = requests.get(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}",
    headers={"X-API-Key": api_key}
)
job_status = response.json()
print(f"Status: {job_status['status']}")
print(f"Progress: {job_status['progress']}%")
if job_status["status"] == "COMPLETED":
    print(f"Transcript ready! Duration: {job_status['total_duration']} seconds")
    print(f"Detected language: {job_status['detected_language']}")
    print(f"Confidence: {job_status['confidence_score']}")
Response (Completed):
{
"id" : 123 ,
"user_id" : "550e8400-e29b-41d4-a716-446655440000" ,
"source_urls" : [ "https://youtube.com/watch?v=example123" ],
"language" : "en" ,
"model_type" : "whisperx" ,
"enable_speaker_diarization" : true ,
"min_speakers" : 2 ,
"max_speakers" : 5 ,
"status" : "COMPLETED" ,
"progress" : 100 ,
"transcript_path" : "/transcripts/job_123.json" ,
"total_duration" : 1847.5 ,
"detected_language" : "en" ,
"confidence_score" : 0.92 ,
"created_at" : "2024-01-15T10:30:00Z" ,
"completed_at" : "2024-01-15T10:45:30Z" ,
"estimated_credits" : 150 ,
"display_name" : "YouTube Video Transcription"
}
List Transcription Jobs
Get all transcription jobs for the authenticated user.
GET /api/v1/transcription/jobs?status=COMPLETED&limit=50&offset=0
X-API-Key: {api_key}

response = requests.get(
    "https://api.audiopod.ai/api/v1/transcription/jobs",
    headers={"X-API-Key": api_key},
    params={
        "status": "COMPLETED",  # Optional filter
        "limit": 50,
        "offset": 0
    }
)
jobs = response.json()
for job in jobs:
    print(f"Job {job['id']}: {job['status']} - {job['total_duration']}s")
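When an account has more jobs than fit in one page, the `limit` and `offset` parameters above can be walked in a loop. The generator below is a sketch of that pattern. `iter_jobs` and `fetch_page` are hypothetical names, and it assumes the endpoint returns a plain JSON list and that a page shorter than `limit` marks the end of the results.

```python
def iter_jobs(fetch_page, status=None, page_size=50):
    """Yield every job by walking limit/offset pages until one comes back short."""
    offset = 0
    while True:
        params = {"limit": page_size, "offset": offset}
        if status:
            params["status"] = status
        page = fetch_page(params)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Demo against an in-memory list standing in for the API.
_all_jobs = [{"id": i, "status": "COMPLETED"} for i in range(1, 121)]


def _fake_fetch(params):
    start = params["offset"]
    return _all_jobs[start:start + params["limit"]]


jobs = list(iter_jobs(_fake_fetch, status="COMPLETED"))
print(len(jobs))
```

In real use, `fetch_page` would call `requests.get` on the jobs endpoint with the given `params` and return `response.json()`.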
Download Transcripts
Download transcripts in various formats including JSON, TXT, PDF, SRT, VTT, DOCX, and HTML.
GET /api/v1/transcription/jobs/{job_id}/transcript?format=json
X-API-Key: {api_key}

# Download as JSON with full details
response = requests.get(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/transcript",
    headers={"X-API-Key": api_key},
    params={"format": "json"}
)
transcript_data = response.json()

# Download as SRT subtitle file
response = requests.get(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/transcript",
    headers={"X-API-Key": api_key},
    params={"format": "srt"}
)
# Write with an explicit encoding so non-ASCII subtitles survive on every platform
with open("transcript.srt", "w", encoding="utf-8") as f:
    f.write(response.text)

# Download as PDF document (binary content)
response = requests.get(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/transcript",
    headers={"X-API-Key": api_key},
    params={"format": "pdf"}
)
with open("transcript.pdf", "wb") as f:
    f.write(response.content)
JSON Response Format:
{
"job_id" : 123 ,
"detected_language" : "en" ,
"confidence_score" : 0.92 ,
"total_duration" : 1847.5 ,
"segments" : [
{
"id" : 1 ,
"start" : 0.0 ,
"end" : 4.5 ,
"text" : "Welcome to our podcast about artificial intelligence." ,
"confidence" : 0.95 ,
"speaker_id" : 0 ,
"speaker_label" : "SPEAKER_00" ,
"words" : [
{
"word" : "Welcome" ,
"start" : 0.0 ,
"end" : 0.8 ,
"probability" : 0.98
},
{
"word" : "to" ,
"start" : 0.8 ,
"end" : 1.0 ,
"probability" : 0.99
}
]
},
{
"id" : 2 ,
"start" : 5.0 ,
"end" : 8.2 ,
"text" : "Thank you for having me on the show." ,
"confidence" : 0.89 ,
"speaker_id" : 1 ,
"speaker_label" : "SPEAKER_01" ,
"words" : [ ... ]
}
],
"speakers" : [
{
"id" : 0 ,
"label" : "SPEAKER_00" ,
"total_speaking_time" : 920.3
},
{
"id" : 1 ,
"label" : "SPEAKER_01" ,
"total_speaking_time" : 827.2
}
],
"video_metadata" : [
{
"video_id" : "example123" ,
"title" : "AI Technology Discussion" ,
"description" : "A deep dive into AI technology trends" ,
"duration" : 1847.5 ,
"uploader" : "Tech Channel" ,
"upload_date" : "20240115"
}
]
}
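The segment fields above are enough to derive simple statistics on the client side. The following sketch sums speaking time per `speaker_label` and lists segments whose `confidence` falls below a threshold. Both function names and the 0.9 threshold are illustrative assumptions, and the sample data is trimmed from the response shown above.

```python
from collections import defaultdict


def speaking_time_by_speaker(segments):
    """Sum segment durations (end - start) per speaker_label."""
    totals = defaultdict(float)
    for segment in segments:
        label = segment.get("speaker_label", "Unknown")
        totals[label] += segment["end"] - segment["start"]
    return dict(totals)


def low_confidence_segments(segments, threshold=0.9):
    """Return the segments whose confidence is below the threshold."""
    return [s for s in segments if s.get("confidence", 0.0) < threshold]


segments = [
    {"id": 1, "start": 0.0, "end": 4.5, "confidence": 0.95,
     "speaker_label": "SPEAKER_00",
     "text": "Welcome to our podcast about artificial intelligence."},
    {"id": 2, "start": 5.0, "end": 8.2, "confidence": 0.89,
     "speaker_label": "SPEAKER_01",
     "text": "Thank you for having me on the show."},
]

print(speaking_time_by_speaker(segments))
print([s["id"] for s in low_confidence_segments(segments)])
```

Segments flagged this way are natural candidates for manual review before a transcript is published.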
Edit Transcripts
Get Editable Transcript
Retrieve transcript in editable format for corrections.
GET /api/v1/transcription/jobs/{job_id}/edit
X-API-Key: {api_key}

response = requests.get(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/edit",
    headers={"X-API-Key": api_key}
)
editable_transcript = response.json()
print(f"Found {len(editable_transcript['segments'])} segments to edit")
Update Transcript
Submit edited transcript with corrections.
PUT /api/v1/transcription/jobs/{job_id}/edit
X-API-Key: {api_key}
Content-Type: application/json
{
"segments" : [
{
"id" : 1 ,
"start" : 0.0 ,
"end" : 4.5 ,
"text" : "Welcome to our podcast about artificial intelligence." ,
"speaker_label" : "SPEAKER_00" ,
"confidence" : 0.95
}
],
"edit_notes" : "Corrected technical terms and speaker labels"
}
# Get current transcript
response = requests.get(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/edit",
    headers={"X-API-Key": api_key}
)
transcript = response.json()
segments = transcript["segments"]

# Make edits
segments[0]["text"] = "Welcome to our podcast about artificial intelligence."
segments[0]["speaker_label"] = "Host"

# Submit updates
response = requests.put(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/edit",
    headers={"X-API-Key": api_key},
    json={
        "segments": segments,
        "edit_notes": "Corrected speaker labels and technical terms"
    }
)
if response.status_code == 200:
    update_info = response.json()
    print(f"Updated {update_info['changes_count']} segments")
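A common edit is renaming generic diarization labels across the whole transcript rather than one segment at a time. The sketch below builds the request body for that case without touching the network. `relabel_speakers` and `build_edit_payload` are hypothetical helpers; only the `segments` and `edit_notes` fields come from the request format documented above.

```python
import copy


def relabel_speakers(segments, label_map):
    """Return a copy of segments with speaker_label values replaced via label_map."""
    updated = copy.deepcopy(segments)
    for segment in updated:
        old = segment.get("speaker_label")
        if old in label_map:
            segment["speaker_label"] = label_map[old]
    return updated


def build_edit_payload(segments, edit_notes):
    """Assemble the JSON body for the PUT .../edit request."""
    return {"segments": segments, "edit_notes": edit_notes}


segments = [
    {"id": 1, "start": 0.0, "end": 4.5, "speaker_label": "SPEAKER_00",
     "text": "Welcome to our podcast about artificial intelligence."},
    {"id": 2, "start": 5.0, "end": 8.2, "speaker_label": "SPEAKER_01",
     "text": "Thank you for having me on the show."},
]

payload = build_edit_payload(
    relabel_speakers(segments, {"SPEAKER_00": "Host", "SPEAKER_01": "Guest"}),
    "Renamed speakers",
)
print([s["speaker_label"] for s in payload["segments"]])
```

Working on a deep copy keeps the originally fetched segments intact, which makes it easy to compare before and after, or to retry with a different mapping.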
Get Transcript Versions
View edit history and versions of transcripts.
GET /api/v1/transcription/jobs/{job_id}/versions
X-API-Key: {api_key}

response = requests.get(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/versions",
    headers={"X-API-Key": api_key}
)
versions = response.json()
for version in versions["versions"]:
    print(f"Version {version['version']}: {version['edit_notes']}")
    print(f"  Updated: {version['updated_at']}")
    print(f"  Changes: {version['changes_count']}")
Get clean audio files extracted from videos during transcription.
GET /api/v1/transcription/jobs/{job_id}/audio/{audio_index}
X-API-Key: {api_key}

# Download first audio file (index 0)
response = requests.get(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/audio/0",
    headers={"X-API-Key": api_key}
)
if response.status_code == 200:
    with open("extracted_audio.wav", "wb") as f:
        f.write(response.content)
    print("Audio file downloaded")
Delete Jobs
Delete Transcription Job
Remove transcription jobs and associated data.
DELETE /api/v1/transcription/jobs/{job_id}
X-API-Key: {api_key}

response = requests.delete(
    f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}",
    headers={"X-API-Key": api_key}
)
if response.status_code == 204:
    print("Transcription job deleted successfully")
Supported Languages
AudioPod AI supports automatic language detection or manual specification for 50+ languages:
| Language | Code | Quality | Notes |
|---|---|---|---|
| English | en | Excellent | Best supported language |
| Spanish | es | Excellent | High accuracy |
| French | fr | Excellent | Good speaker diarization |
| German | de | Excellent | Technical content support |
| Portuguese | pt | Very Good | Brazilian and European |
| Italian | it | Very Good | Good word timestamps |
| Russian | ru | Very Good | Cyrillic text support |
| Japanese | ja | Good | Hiragana/Katakana/Kanji |
| Chinese | zh | Good | Simplified and Traditional |
| Arabic | ar | Good | RTL text support |
| Hindi | hi | Good | Devanagari script |
| Korean | ko | Good | Hangul script |
Model Comparison
Choose the best model for your use case:
| Model | Speed | Accuracy | Speaker Diarization | Best For |
|---|---|---|---|---|
| whisperx | Medium | Highest | Excellent | Production transcription |
| faster-whisper | Fastest | High | Good | Real-time applications |
| whisper-timestamped | Slow | High | Good | Detailed analysis |
Best Practices
Audio Quality Guidelines
For best transcription results:
# Recommended audio specifications
audio_requirements = {
    "sample_rate": "16kHz or higher",
    "format": "WAV, MP3, M4A, or video formats",
    "duration": "Up to 15 hours supported",
    "background_noise": "Minimize for better accuracy",
    "speech_clarity": "Clear articulation preferred",
    "multiple_speakers": "Distinct voices work best"
}

# Chunking for long content
chunking_strategy = {
    "chunk_duration": 1800,  # 30 minutes per chunk
    "overlap": 30,  # 30 seconds overlap
    "boundary_detection": "sentence_level"  # Smart chunk boundaries
}
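To make the chunk and overlap numbers concrete, the sketch below lays out overlapping time windows for a recording of a given length. It is a client-side illustration only: `plan_chunks` is a hypothetical helper, and how the service actually splits audio (including its sentence-level boundary detection) is not specified in this guide.

```python
def plan_chunks(total_duration, chunk_duration=1800.0, overlap=30.0):
    """Return (start, end) windows covering total_duration with overlapping edges."""
    if chunk_duration <= overlap:
        raise ValueError("chunk_duration must be larger than overlap")
    chunks = []
    start = 0.0
    step = chunk_duration - overlap  # each new window starts `overlap` seconds early
    while start < total_duration:
        end = min(start + chunk_duration, total_duration)
        chunks.append((start, end))
        if end >= total_duration:
            break
        start += step
    return chunks


# A 4000-second recording split with the settings shown above.
for start, end in plan_chunks(4000.0):
    print(f"{start:7.1f}s -> {end:7.1f}s")
```

With a 30-second overlap, each window repeats the tail of the previous one, so words that straddle a boundary appear whole in at least one chunk.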
Cost Optimization
# Efficient transcription workflow
# Note: group_by_language_and_type, create_transcription_job, and
# monitor_job_progress are placeholders for helpers you implement yourself.
def transcribe_efficiently(audio_files, language="auto"):
    # Use appropriate model based on needs
    # (values match the model_type identifiers accepted by the API)
    model_choice = {
        "speed_priority": "faster_whisper",
        "accuracy_priority": "whisperx",
        "analysis_priority": "whisper_timestamped"
    }

    # Batch similar files together
    batch_files = group_by_language_and_type(audio_files)
    for batch in batch_files:
        job = create_transcription_job(
            files=batch,
            language=language,
            model_type=model_choice["accuracy_priority"],
            enable_speaker_diarization=True,
            chunk_duration=1800  # Optimal chunk size
        )
        monitor_job_progress(job["job_id"])
Error Handling
400 Bad Request - Invalid Audio
Causes:
- Unsupported audio format
- Corrupted audio file
- Audio too short
Solutions:
- Use supported formats (WAV, MP3, M4A, MP4)
- Verify file integrity
- Ensure the audio is at least 10 seconds long

Causes:
- File size exceeds limits
- Too many files in a single request
Solutions:
- Split large files into smaller chunks
- Reduce the number of files per request
- Use URL transcription for large videos

Causes:
- Audio has no speech content
- Extremely poor audio quality
Solutions:
- Verify the audio contains speech
- Improve audio quality
- Try a different transcription model
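Several of these errors can be caught before a file is uploaded. The sketch below checks the extension and duration against the limits mentioned in this guide (WAV, MP3, M4A, MP4; at least 10 seconds; at most 15 hours). `preflight_check` is a hypothetical helper, the caller is assumed to already know the duration (for example from a media probing tool), and the server may accept formats beyond the four listed here.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".mp4"}  # formats named in this guide
MIN_DURATION_SECONDS = 10.0
MAX_DURATION_SECONDS = 15 * 3600.0


def preflight_check(file_name, duration_seconds):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ext = Path(file_name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if duration_seconds < MIN_DURATION_SECONDS:
        problems.append(f"audio too short: {duration_seconds:.1f}s")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"audio too long: {duration_seconds / 3600:.1f}h")
    return problems


print(preflight_check("meeting_recording.mp3", 1847.5))
print(preflight_check("notes.txt", 5.0))
```

Returning a list rather than raising on the first issue lets a batch uploader report every problem with a file in one pass.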
Pricing
Transcription pricing is based on audio duration:
| Service | Cost | Description |
|---|---|---|
| Basic Transcription | 660 credits/minute | Text-only transcription |
| With Speaker Diarization | 660 credits/minute | Speaker identification included |
| With Word Timestamps | 660 credits/minute | Word-level timing data |
| Transcript Editing | Free | No additional cost for edits |
Cost Examples
| Duration | Features | Credits | USD Cost |
|---|---|---|---|
| 10 minutes | Basic transcription | 6600 | $0.88 |
| 30 minutes | With speakers + timestamps | 19800 | $2.64 |
| 1 hour | Full features | 39600 | $5.28 |
| 2 hours | Full features | 79200 | $10.56 |
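The figures above follow from a flat rate of 660 credits per minute. The sketch below reproduces them. The conversion of 7,500 credits per US dollar is not stated anywhere in this guide; it is inferred from the examples (6,600 credits for $0.88), and billing fractional minutes pro rata is also an assumption.

```python
CREDITS_PER_MINUTE = 660  # rate from the pricing table
CREDITS_PER_USD = 7500    # inferred from the examples: 6600 credits / $0.88


def estimate_cost(duration_minutes):
    """Return (credits, usd) for a recording of the given length in minutes."""
    credits = round(duration_minutes * CREDITS_PER_MINUTE)
    usd = round(credits / CREDITS_PER_USD, 2)
    return credits, usd


for minutes in (10, 30, 60, 120):
    credits, usd = estimate_cost(minutes)
    print(f"{minutes:>3} min: {credits} credits, ${usd:.2f}")
```

Treat the result as an estimate only and check current pricing before budgeting for large batches.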
Next Steps
Speaker Separation: Identify and separate individual speakers from audio.
Noise Reduction: Clean up audio for better transcription accuracy.