# Speech-to-Text

> Convert audio and video to accurate text transcriptions with speaker identification, timestamps, and multi-language support.

## Overview

AudioPod AI's Speech-to-Text API transcribes audio and video content. Transcriptions can include speaker diarization, word-level timestamps, and confidence scores.

### Key Features

* **Multiple Accuracy/Speed Tiers**: Choose the engine variant that matches your latency and quality needs
* **Speaker Diarization**: Automatic speaker identification and separation
* **Word-Level Timestamps**: Precise timing for each word
* **Confidence Scores**: Quality metrics for transcription accuracy
* **50+ Languages**: Automatic language detection or manual specification
* **Large File Support**: Handle videos up to 15 hours with chunking
* **Multiple Sources**: Upload files or provide YouTube/video URLs
* **Editable Transcripts**: Edit and refine transcription results

## Authentication

All endpoints require authentication. Use one of these methods:

* **API Key (Recommended)**: `X-API-Key: your_api_key` header
* **JWT Token**: `Authorization: Bearer your_jwt_token` (for session-based auth)
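
As a minimal sketch of the two methods above, the helper below builds the request headers for whichever credential is available. The function name `build_auth_headers` is hypothetical and not part of any AudioPod SDK; only the header names and formats come from the list above.

```python
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None,
                       jwt_token: Optional[str] = None) -> Dict[str, str]:
    """Return headers for one auth method, preferring the API key (hypothetical helper)."""
    if api_key:
        return {"X-API-Key": api_key}
    if jwt_token:
        return {"Authorization": f"Bearer {jwt_token}"}
    raise ValueError("Provide either api_key or jwt_token")


print(build_auth_headers(api_key="your_api_key"))
print(build_auth_headers(jwt_token="your_jwt_token"))
```

The returned dictionary can be passed as the `headers` argument of any HTTP client call.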

## Transcribe from URLs

### Transcribe YouTube Videos

Transcribe audio from YouTube or other video platforms.

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from audiopod import Client

    client = Client()

    # Simple YouTube transcription
    transcription = client.transcription.transcribe_from_url(
        url="https://youtube.com/watch?v=example123",
        wait_for_completion=True  # Wait for result
    )

    print("Transcription completed!")
    print(f"Duration: {transcription.duration}s")
    print(f"Language: {transcription.language}")
    print(f"Full text: {transcription.text}")

    # Advanced transcription with speaker diarization
    advanced_transcription = client.transcription.transcribe_from_url(
        url="https://youtube.com/watch?v=example123",
        language="en",  # Optional: auto-detect if not specified
        model_type="premium",  # "premium" / "premium-diarized"; omit for Standard accuracy
        enable_speaker_diarization=True,
        min_speakers=2,
        max_speakers=5,
        enable_word_timestamps=True,
        enable_confidence_scores=True,
        chunk_duration=1800,  # 30 minutes per chunk
        wait_for_completion=True
    )

    # Access speaker-separated text
    for segment in advanced_transcription.segments:
        speaker = segment.get('speaker', 'Unknown')
        text = segment['text']
        start_time = segment['start']
        end_time = segment['end']
        confidence = segment.get('confidence', 0.0)
        
        print(f"[{start_time:.2f}s - {end_time:.2f}s] {speaker}: {text} (confidence: {confidence:.2f})")

    # Batch processing multiple URLs
    urls = [
        "https://youtube.com/watch?v=video1",
        "https://youtube.com/watch?v=video2",
        "https://vimeo.com/123456789"
    ]

    batch_results = client.transcription.transcribe_batch_from_urls(
        urls=urls,
        enable_speaker_diarization=True,
        model_type="premium",
        wait_for_completion=True
    )

    for i, result in enumerate(batch_results):
        print(f"\nVideo {i+1}: {urls[i]}")
        print(f"Status: {result.status}")
        if result.status == "completed":
            print(f"Text preview: {result.text[:100]}...")
    ```
  </Tab>

  <Tab title="Node.js">
    ```javascript theme={null}
    const { AudioPodClient } = require('audiopod-js');

    const client = new AudioPodClient();

    async function transcribeYouTubeVideo() {
      try {
        // Simple transcription
        const transcription = await client.transcription.transcribeFromUrl({
          url: "https://youtube.com/watch?v=example123",
          waitForCompletion: true
        });

        console.log(`Transcription completed!`);
        console.log(`Duration: ${transcription.duration}s`);
        console.log(`Language: ${transcription.language}`);
        console.log(`Full text: ${transcription.text}`);

        // Advanced transcription with speaker diarization
        const advancedTranscription = await client.transcription.transcribeFromUrl({
          url: "https://youtube.com/watch?v=example123",
          language: "en",
          modelType: "premium",
          enableSpeakerDiarization: true,
          minSpeakers: 2,
          maxSpeakers: 5,
          enableWordTimestamps: true,
          enableConfidenceScores: true,
          chunkDuration: 1800,
          waitForCompletion: true
        });

        // Process speaker-separated segments
        advancedTranscription.segments.forEach(segment => {
          const speaker = segment.speaker || 'Unknown';
          const text = segment.text;
          const startTime = segment.start;
          const endTime = segment.end;
          const confidence = segment.confidence || 0.0;
          
          console.log(`[${startTime.toFixed(2)}s - ${endTime.toFixed(2)}s] ${speaker}: ${text} (confidence: ${confidence.toFixed(2)})`);
        });

        // Batch processing
        const urls = [
          "https://youtube.com/watch?v=video1",
          "https://youtube.com/watch?v=video2",
          "https://vimeo.com/123456789"
        ];

        const batchResults = await client.transcription.transcribeBatchFromUrls({
          urls: urls,
          enableSpeakerDiarization: true,
          modelType: "premium",
          waitForCompletion: true
        });

        batchResults.forEach((result, index) => {
          console.log(`\nVideo ${index + 1}: ${urls[index]}`);
          console.log(`Status: ${result.status}`);
          if (result.status === "completed") {
            console.log(`Text preview: ${result.text.substring(0, 100)}...`);
          }
        });

      } catch (error) {
        console.error('Transcription error:', error.message);
      }
    }

    transcribeYouTubeVideo();
    ```
  </Tab>

  <Tab title="Raw HTTP">
    ```python theme={null}
    import os
    import time

    import requests

    # Read the API key from the environment (same variable as the cURL examples)
    api_key = os.environ["AUDIOPOD_API_KEY"]
    base_url = "https://api.audiopod.ai/api/v1/transcription"
    headers = {"X-API-Key": api_key}

    # Start transcription job
    response = requests.post(
        f"{base_url}/transcribe",
        headers=headers,
        json={
            "source_urls": [
                "https://youtube.com/watch?v=example123",
                "https://vimeo.com/123456789"
            ],
            "language": "en",  # Optional: auto-detect if not specified
            "model_type": "premium",
            "enable_speaker_diarization": True,
            "min_speakers": 2,
            "max_speakers": 5,
            "enable_word_timestamps": True,
            "enable_confidence_scores": True,
            "chunk_duration": 1800  # 30 minutes
        }
    )

    if response.status_code == 200:
        job_id = response.json()["job_id"]
        print(f"Transcription job created: {job_id}")

        # Poll the job endpoint (see Job Management) until the job finishes
        while True:
            status_response = requests.get(f"{base_url}/jobs/{job_id}", headers=headers)

            if status_response.status_code == 200:
                status_data = status_response.json()
                print(f"Status: {status_data['status']}")

                if status_data["status"] == "COMPLETED":
                    # Fetch the final transcript as JSON (see Download Transcripts)
                    result_response = requests.get(
                        f"{base_url}/jobs/{job_id}/transcript",
                        headers=headers,
                        params={"format": "json"}
                    )

                    if result_response.status_code == 200:
                        result = result_response.json()

                        # Print each segment; speaker_label is present when diarization is enabled
                        for segment in result.get("segments", []):
                            speaker = segment.get("speaker_label", "Unknown")
                            start = segment["start"]
                            end = segment["end"]
                            print(f"[{start:.2f}s - {end:.2f}s] {speaker}: {segment['text']}")
                    break
                elif status_data["status"] == "FAILED":
                    print(f"Transcription failed: {status_data.get('error', 'Unknown error')}")
                    break

            time.sleep(10)  # Wait 10 seconds before checking again
    else:
        print(f"Request failed: {response.status_code}")
        print(response.text)
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    # Start transcription job
    curl -X POST "https://api.audiopod.ai/api/v1/transcription/transcribe" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "source_urls": ["https://youtube.com/watch?v=example123"],
        "language": "en",
        "model_type": "premium",
        "enable_speaker_diarization": true,
        "enable_word_timestamps": true,
        "enable_confidence_scores": true
      }'

    # Check job status (replace JOB_ID with actual job ID)
    curl -X GET "https://api.audiopod.ai/api/v1/transcription/jobs/JOB_ID" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    # Get transcription result when completed
    curl -X GET "https://api.audiopod.ai/api/v1/transcription/jobs/JOB_ID/transcript?format=json" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    ```
  </Tab>
</Tabs>

**Response:**

```json theme={null}
{
  "job_id": 123,
  "task_id": "celery_task_uuid_here",
  "status": "PENDING",
  "message": "Transcription job created successfully",
  "estimated_credits": 150,
  "estimated_duration": 1800.0,
  "source_urls": [
    "https://youtube.com/watch?v=example123"
  ]
}
```
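
A job created this way starts as `PENDING` and has to be polled until it finishes. The sketch below shows one way to wrap that loop with a timeout. The helper `wait_for_job` is hypothetical (not an SDK function), and it takes a `fetch_status` callable so the polling logic can be exercised without network access. It assumes `COMPLETED` and `FAILED` are the terminal statuses, as in the examples on this page.

```python
import time
from typing import Callable, Dict

# Assumed terminal statuses, taken from the examples on this page
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status: Callable[[], Dict],
                 interval: float = 10.0,
                 timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call fetch_status() until the job is terminal or `timeout` seconds of waiting have passed."""
    waited = 0.0
    while True:
        job = fetch_status()
        # Compare case-insensitively in case status casing differs between clients
        if str(job.get("status", "")).upper() in TERMINAL_STATUSES:
            return job
        if waited >= timeout:
            raise TimeoutError(f"Job still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


# Demo with canned responses standing in for real status requests
responses = iter([
    {"status": "PENDING"},
    {"status": "PENDING"},
    {"status": "COMPLETED", "progress": 100},
])
final = wait_for_job(lambda: next(responses), interval=0.0)
print(final["status"])
```

In real use, `fetch_status` would be a small function that requests the job from the API and returns the parsed JSON body.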

## Transcribe from Files

### Upload Audio/Video Files

Transcribe from uploaded audio or video files.

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    from audiopod import Client

    client = Client()

    # Single file transcription
    transcription = client.transcription.transcribe_from_file(
        audio_file="meeting_recording.mp3",
        language="en",
        model_type="premium",
        enable_speaker_diarization=True,
        min_speakers=2,
        max_speakers=8,
        enable_word_timestamps=True,
        enable_confidence_scores=True,
        wait_for_completion=True
    )

    print("Transcription completed!")
    print(f"File: {transcription.source_file}")
    print(f"Duration: {transcription.duration}s")
    print(f"Language: {transcription.language}")
    print(f"Full text: {transcription.text}")

    # Access detailed segments with speakers
    for segment in transcription.segments:
        speaker = segment.get('speaker', 'Unknown')
        text = segment['text']
        start_time = segment['start']
        end_time = segment['end']
        confidence = segment.get('confidence', 0.0)
        
        print(f"[{start_time:.2f}s - {end_time:.2f}s] {speaker}: {text} (confidence: {confidence:.2f})")

    # Batch file transcription
    audio_files = [
        "meeting_recording.mp3",
        "interview.wav", 
        "presentation.mp4",
        "podcast_episode.m4a"
    ]

    batch_results = client.transcription.transcribe_batch_from_files(
        audio_files=audio_files,
        enable_speaker_diarization=True,
        model_type="premium",
        enable_word_timestamps=True,
        chunk_duration=1800,  # 30 minutes per chunk
        wait_for_completion=True
    )

    # Process results
    for i, result in enumerate(batch_results):
        print(f"\nFile {i+1}: {audio_files[i]}")
        print(f"Status: {result.status}")
        if result.status == "completed":
            print(f"Duration: {result.duration}s")
            print(f"Language: {result.language}")
            print(f"Text preview: {result.text[:100]}...")
            
            # Save transcription to file
            output_file = f"transcript_{i+1}.txt"
            with open(output_file, 'w', encoding='utf-8') as f:
                f.write(result.text)
            print(f"Saved to: {output_file}")

    # Advanced file processing with custom settings
    def process_interview_file(file_path):
        """Process interview with optimized settings"""
        
        transcription = client.transcription.transcribe_from_file(
            audio_file=file_path,
            model_type="premium",  # Premium accuracy
            enable_speaker_diarization=True,
            min_speakers=2,  # Interviewer + interviewee
            max_speakers=4,  # Allow for additional participants
            enable_word_timestamps=True,
            enable_confidence_scores=True,
            language="auto",  # Auto-detect language
            wait_for_completion=True
        )
        
        # Generate formatted transcript
        formatted_output = []
        current_speaker = None
        
        for segment in transcription.segments:
            speaker = segment.get('speaker', 'Unknown')
            text = segment['text'].strip()
            
            if speaker != current_speaker:
                formatted_output.append(f"\n{speaker}:")
                current_speaker = speaker
                
            formatted_output.append(f" {text}")
        
        return ''.join(formatted_output)

    # Process interview
    interview_transcript = process_interview_file("important_interview.wav")
    print("\nFormatted Interview Transcript:")
    print(interview_transcript)
    ```
  </Tab>

  <Tab title="Node.js">
    ```javascript theme={null}
    const { AudioPodClient } = require('audiopod-js');
    const fs = require('fs');
    const path = require('path');

    const client = new AudioPodClient();

    async function transcribeAudioFiles() {
      try {
        // Single file transcription
        const transcription = await client.transcription.transcribeFromFile({
          audioFile: fs.createReadStream('meeting_recording.mp3'),
          language: "en",
          modelType: "premium",
          enableSpeakerDiarization: true,
          minSpeakers: 2,
          maxSpeakers: 8,
          enableWordTimestamps: true,
          enableConfidenceScores: true,
          waitForCompletion: true
        });

        console.log(`Transcription completed!`);
        console.log(`File: ${transcription.sourceFile}`);
        console.log(`Duration: ${transcription.duration}s`);
        console.log(`Language: ${transcription.language}`);
        console.log(`Full text: ${transcription.text}`);

        // Process segments with speakers
        transcription.segments.forEach(segment => {
          const speaker = segment.speaker || 'Unknown';
          const text = segment.text;
          const startTime = segment.start;
          const endTime = segment.end;
          const confidence = segment.confidence || 0.0;
          
          console.log(`[${startTime.toFixed(2)}s - ${endTime.toFixed(2)}s] ${speaker}: ${text} (confidence: ${confidence.toFixed(2)})`);
        });

        // Batch processing
        const audioFiles = [
          'meeting_recording.mp3',
          'interview.wav', 
          'presentation.mp4',
          'podcast_episode.m4a'
        ];

        const fileStreams = audioFiles.map(file => ({
          file: fs.createReadStream(file),
          name: path.basename(file)
        }));

        const batchResults = await client.transcription.transcribeBatchFromFiles({
          audioFiles: fileStreams,
          enableSpeakerDiarization: true,
          modelType: "premium",
          enableWordTimestamps: true,
          chunkDuration: 1800,
          waitForCompletion: true
        });

        // Process batch results
        batchResults.forEach((result, index) => {
          console.log(`\nFile ${index + 1}: ${audioFiles[index]}`);
          console.log(`Status: ${result.status}`);
          if (result.status === "completed") {
            console.log(`Duration: ${result.duration}s`);
            console.log(`Language: ${result.language}`);
            console.log(`Text preview: ${result.text.substring(0, 100)}...`);
            
            // Save transcription to file
            const outputFile = `transcript_${index + 1}.txt`;
            fs.writeFileSync(outputFile, result.text, 'utf8');
            console.log(`Saved to: ${outputFile}`);
          }
        });

        // Advanced processing function for interviews
        async function processInterviewFile(filePath) {
          const transcription = await client.transcription.transcribeFromFile({
            audioFile: fs.createReadStream(filePath),
            modelType: "premium",
            enableSpeakerDiarization: true,
            minSpeakers: 2,
            maxSpeakers: 4,
            enableWordTimestamps: true,
            enableConfidenceScores: true,
            language: "auto",
            waitForCompletion: true
          });
          
          // Format transcript with speaker labels
          const formattedOutput = [];
          let currentSpeaker = null;
          
          transcription.segments.forEach(segment => {
            const speaker = segment.speaker || 'Unknown';
            const text = segment.text.trim();
            
            if (speaker !== currentSpeaker) {
              formattedOutput.push(`\n${speaker}:`);
              currentSpeaker = speaker;
            }
            
            formattedOutput.push(` ${text}`);
          });
          
          return formattedOutput.join('');
        }

        // Process an interview
        const interviewTranscript = await processInterviewFile('important_interview.wav');
        console.log('\nFormatted Interview Transcript:');
        console.log(interviewTranscript);

      } catch (error) {
        console.error('Transcription error:', error.message);
      }
    }

    transcribeAudioFiles();
    ```
  </Tab>

  <Tab title="Raw HTTP">
    ```python theme={null}
    import os
    import time

    import requests

    # Read the API key from the environment (same variable as the cURL examples)
    api_key = os.environ["AUDIOPOD_API_KEY"]
    base_url = "https://api.audiopod.ai/api/v1/transcription"
    headers = {"X-API-Key": api_key}

    # Prepare files for upload
    files_to_transcribe = [
        "meeting_recording.mp3",
        "interview.wav",
        "presentation.mp4"
    ]

    files = []
    for file_path in files_to_transcribe:
        if os.path.exists(file_path):
            files.append(('files', open(file_path, 'rb')))

    # Upload and transcribe; close the file handles even if the request fails
    try:
        response = requests.post(
            f"{base_url}/transcribe-upload",
            headers=headers,
            data={
                # Form fields are sent as strings, matching the cURL example
                "language": "en",
                "model_type": "premium",
                "enable_speaker_diarization": "true",
                "min_speakers": 1,
                "max_speakers": 10,
                "enable_word_timestamps": "true",
                "enable_confidence_scores": "true",
                "chunk_duration": 1800
            },
            files=files
        )
    finally:
        for _, file_obj in files:
            file_obj.close()

    if response.status_code == 200:
        job_id = response.json()['job_id']
        print(f"Upload transcription job: {job_id}")

        # Poll for completion
        while True:
            status_response = requests.get(f"{base_url}/jobs/{job_id}", headers=headers)

            if status_response.status_code == 200:
                status_data = status_response.json()
                print(f"Status: {status_data['status']}")

                if status_data['status'] == 'COMPLETED':
                    # Fetch the transcript as JSON (see Download Transcripts)
                    result_response = requests.get(
                        f"{base_url}/jobs/{job_id}/transcript",
                        headers=headers,
                        params={"format": "json"}
                    )

                    if result_response.status_code == 200:
                        transcript_data = result_response.json()
                        # Join the segment texts into one plain-text transcript
                        full_text = " ".join(
                            segment["text"].strip()
                            for segment in transcript_data.get("segments", [])
                        )
                        print("Transcription completed!")
                        print(f"Full text: {full_text}")

                        # Save to file
                        with open(f"transcript_{job_id}.txt", 'w', encoding='utf-8') as f:
                            f.write(full_text)
                        print(f"Saved to transcript_{job_id}.txt")
                    break
                elif status_data['status'] == 'FAILED':
                    print(f"Transcription failed: {status_data.get('error', 'Unknown error')}")
                    break

            time.sleep(15)  # Check every 15 seconds
    else:
        print(f"Upload failed: {response.status_code}")
        print(response.text)
    ```
  </Tab>

  <Tab title="cURL">
    ```bash theme={null}
    # Upload files for transcription
    curl -X POST "https://api.audiopod.ai/api/v1/transcription/transcribe-upload" \
      -H "X-API-Key: $AUDIOPOD_API_KEY" \
      -F "files=@meeting_recording.mp3" \
      -F "files=@interview.wav" \
      -F "language=en" \
      -F "model_type=premium" \
      -F "enable_speaker_diarization=true" \
      -F "enable_word_timestamps=true"

    # Check job status (replace JOB_ID with actual job ID)
    curl -X GET "https://api.audiopod.ai/api/v1/transcription/jobs/JOB_ID" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"

    # Get completed transcription
    curl -X GET "https://api.audiopod.ai/api/v1/transcription/jobs/JOB_ID/transcript?format=json" \
      -H "X-API-Key: $AUDIOPOD_API_KEY"
    ```
  </Tab>
</Tabs>

## Job Management

### Get Transcription Status

Check the progress and status of transcription jobs.

<Tabs>
  <Tab title="GET">
    ```http theme={null}
    GET /api/v1/transcription/jobs/{job_id}
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    # api_key and job_id are defined as in the earlier examples
    response = requests.get(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}",
        headers={"X-API-Key": api_key}
    )

    job_status = response.json()
    print(f"Status: {job_status['status']}")
    print(f"Progress: {job_status['progress']}%")

    if job_status["status"] == "COMPLETED":
        print(f"Transcript ready! Duration: {job_status['total_duration']} seconds")
        print(f"Detected language: {job_status['detected_language']}")
        print(f"Confidence: {job_status['confidence_score']}")
    ```
  </Tab>
</Tabs>

**Response (Completed):**

```json theme={null}
{
  "id": 123,
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "source_urls": ["https://youtube.com/watch?v=example123"],
  "language": "en",
  "model_type": "premium",
  "enable_speaker_diarization": true,
  "min_speakers": 2,
  "max_speakers": 5,
  "status": "COMPLETED",
  "progress": 100,
  "transcript_path": "/transcripts/job_123.json",
  "total_duration": 1847.5,
  "detected_language": "en",
  "confidence_score": 0.92,
  "created_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:45:30Z",
  "estimated_credits": 150,
  "display_name": "YouTube Video Transcription"
}
```
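
The timestamps in a completed job response can be used to estimate turnaround. The sketch below is illustrative only: the helper names are hypothetical, and it assumes `created_at` and `completed_at` are ISO 8601 UTC strings with a trailing `Z`, as in the sample response. The elapsed time covers everything between job creation and completion, including any time spent queued.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, mapping a trailing 'Z' to an explicit UTC offset."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def processing_summary(job: dict) -> dict:
    """Return elapsed wall-clock seconds and audio seconds handled per elapsed second."""
    elapsed = (parse_utc(job["completed_at"]) - parse_utc(job["created_at"])).total_seconds()
    return {
        "elapsed_seconds": elapsed,
        "speed_factor": job["total_duration"] / elapsed if elapsed > 0 else None,
    }


# Values copied from the sample response above
job = {
    "created_at": "2024-01-15T10:30:00Z",
    "completed_at": "2024-01-15T10:45:30Z",
    "total_duration": 1847.5,
}
summary = processing_summary(job)
print(summary["elapsed_seconds"])  # 930.0
```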

### List Transcription Jobs

Get all transcription jobs for the authenticated user.

<Tabs>
  <Tab title="GET">
    ```http theme={null}
    GET /api/v1/transcription/jobs?status=COMPLETED&limit=50&offset=0
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    response = requests.get(
        "https://api.audiopod.ai/api/v1/transcription/jobs",
        headers={"X-API-Key": api_key},
        params={
            "status": "COMPLETED",  # Optional filter
            "limit": 50,
            "offset": 0
        }
    )

    jobs = response.json()
    for job in jobs:
        print(f"Job {job['id']}: {job['status']} - {job['total_duration']}s")
    ```
  </Tab>
</Tabs>
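
When an account has more jobs than fit in one page, the `limit` and `offset` parameters can be advanced in a loop. The sketch below shows that pattern under two assumptions: the endpoint returns a plain list, and a page shorter than `limit` means there are no more results. `iter_jobs` and `fetch_page` are hypothetical names, and `fetch_page` is injected so the loop can be run here against in-memory data.

```python
from typing import Callable, Dict, Iterator, List


def iter_jobs(fetch_page: Callable[[int, int], List[Dict]],
              page_size: int = 50) -> Iterator[Dict]:
    """Yield jobs page by page, stopping at the first page shorter than page_size."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


# Demo: 120 fake jobs served from a list instead of the API
all_jobs = [{"id": i, "status": "COMPLETED"} for i in range(120)]


def fake_fetch_page(limit: int, offset: int) -> List[Dict]:
    return all_jobs[offset:offset + limit]


jobs = list(iter_jobs(fake_fetch_page, page_size=50))
print(len(jobs))  # 120
```

A real `fetch_page` would issue the `GET /api/v1/transcription/jobs` request with `limit` and `offset` as query parameters and return the parsed JSON list.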

## Download Transcripts

### Get Transcript in Multiple Formats

Download transcripts in various formats including JSON, TXT, PDF, SRT, VTT, DOCX, and HTML.

<Tabs>
  <Tab title="GET">
    ```http theme={null}
    GET /api/v1/transcription/jobs/{job_id}/transcript?format=json
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    # Download as JSON with full details (api_key and job_id as in the earlier examples)
    response = requests.get(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/transcript",
        headers={"X-API-Key": api_key},
        params={"format": "json"}
    )

    transcript_data = response.json()

    # Download as SRT subtitle file
    response = requests.get(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/transcript",
        headers={"X-API-Key": api_key},
        params={"format": "srt"}
    )

    with open("transcript.srt", "w", encoding="utf-8") as f:
        f.write(response.text)

    # Download as PDF document
    response = requests.get(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/transcript",
        headers={"X-API-Key": api_key},
        params={"format": "pdf"}
    )

    with open("transcript.pdf", "wb") as f:
        f.write(response.content)
    ```
  </Tab>
</Tabs>
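
Text formats and binary formats need different file modes when saved. The sketch below picks a file name and mode per format. It assumes the `format` query values are the lowercase names of the formats listed above (only `json`, `srt`, and `pdf` appear in the examples; the others are inferred), and `output_spec` is a hypothetical helper.

```python
from typing import Tuple

# Assumed lowercase format names; PDF and DOCX are treated as binary
BINARY_FORMATS = {"pdf", "docx"}
TEXT_FORMATS = {"json", "txt", "srt", "vtt", "html"}


def output_spec(fmt: str, job_id: int) -> Tuple[str, str]:
    """Return (file_name, open_mode) for saving a transcript in the given format."""
    fmt = fmt.lower()
    if fmt not in BINARY_FORMATS | TEXT_FORMATS:
        raise ValueError(f"Unsupported format: {fmt}")
    mode = "wb" if fmt in BINARY_FORMATS else "w"
    return f"transcript_{job_id}.{fmt}", mode


print(output_spec("srt", 123))  # ('transcript_123.srt', 'w')
print(output_spec("PDF", 123))  # ('transcript_123.pdf', 'wb')
```

With mode `"wb"`, write `response.content`; with mode `"w"`, write `response.text` and open the file with `encoding="utf-8"`.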

**JSON Response Format:**

```json theme={null}
{
  "job_id": 123,
  "detected_language": "en",
  "confidence_score": 0.92,
  "total_duration": 1847.5,
  "segments": [
    {
      "id": 1,
      "start": 0.0,
      "end": 4.5,
      "text": "Welcome to our podcast about artificial intelligence.",
      "confidence": 0.95,
      "speaker_id": 0,
      "speaker_label": "SPEAKER_00",
      "words": [
        {
          "word": "Welcome",
          "start": 0.0,
          "end": 0.8,
          "probability": 0.98
        },
        {
          "word": "to",
          "start": 0.8,
          "end": 1.0,
          "probability": 0.99
        }
      ]
    },
    {
      "id": 2,
      "start": 5.0,
      "end": 8.2,
      "text": "Thank you for having me on the show.",
      "confidence": 0.89,
      "speaker_id": 1,
      "speaker_label": "SPEAKER_01",
      "words": [...]
    }
  ],
  "speakers": [
    {
      "id": 0,
      "label": "SPEAKER_00",
      "total_speaking_time": 920.3
    },
    {
      "id": 1,
      "label": "SPEAKER_01", 
      "total_speaking_time": 827.2
    }
  ],
  "video_metadata": [
    {
      "video_id": "example123",
      "title": "AI Technology Discussion",
      "description": "A deep dive into AI technology trends",
      "duration": 1847.5,
      "uploader": "Tech Channel",
      "upload_date": "20240115"
    }
  ]
}
```
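
A response in this shape can be post-processed locally. The sketch below renders a speaker-labelled transcript and lists words whose `probability` falls below a threshold, which can be useful for targeted review. Both function names are hypothetical, the 0.9 threshold is an arbitrary choice, and the low-probability word in the sample data is invented for illustration.

```python
from typing import Dict, List, Tuple


def render_dialogue(segments: List[Dict]) -> str:
    """Format each segment as '[start] SPEAKER: text', one per line."""
    lines = []
    for segment in segments:
        speaker = segment.get("speaker_label", "Unknown")
        lines.append(f"[{segment['start']:.1f}s] {speaker}: {segment['text']}")
    return "\n".join(lines)


def low_confidence_words(segments: List[Dict],
                         threshold: float = 0.9) -> List[Tuple[str, float, float]]:
    """Return (word, start, probability) for every word scoring below threshold."""
    flagged = []
    for segment in segments:
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append((word["word"], word["start"], word["probability"]))
    return flagged


# Sample data modelled on the response above; the 0.72 probability is illustrative
segments = [
    {"start": 0.0, "end": 4.5, "speaker_label": "SPEAKER_00",
     "text": "Welcome to our podcast about artificial intelligence.",
     "words": [{"word": "Welcome", "start": 0.0, "end": 0.8, "probability": 0.98},
               {"word": "to", "start": 0.8, "end": 1.0, "probability": 0.99}]},
    {"start": 5.0, "end": 8.2, "speaker_label": "SPEAKER_01",
     "text": "Thank you for having me on the show.",
     "words": [{"word": "Thank", "start": 5.0, "end": 5.3, "probability": 0.72}]},
]

print(render_dialogue(segments))
print(low_confidence_words(segments))  # [('Thank', 5.0, 0.72)]
```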

## Edit Transcripts

### Get Editable Transcript

Retrieve the transcript in an editable format so you can make corrections.

<Tabs>
  <Tab title="GET">
    ```http theme={null}
    GET /api/v1/transcription/jobs/{job_id}/edit
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    response = requests.get(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/edit",
        headers={"X-API-Key": api_key}
    )

    editable_transcript = response.json()
    print(f"Found {len(editable_transcript['segments'])} segments to edit")
    ```
  </Tab>
</Tabs>

### Update Transcript

Submit the edited transcript with your corrections.

<Tabs>
  <Tab title="PUT">
    ```http theme={null}
    PUT /api/v1/transcription/jobs/{job_id}/edit
    X-API-Key: {api_key}
    Content-Type: application/json

    {
      "segments": [
        {
          "id": 1,
          "start": 0.0,
          "end": 4.5,
          "text": "Welcome to our podcast about artificial intelligence.",
          "speaker_label": "SPEAKER_00",
          "confidence": 0.95
        }
      ],
      "edit_notes": "Corrected technical terms and speaker labels"
    }
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    # Get current transcript (api_key and job_id as in the earlier examples)
    response = requests.get(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/edit",
        headers={"X-API-Key": api_key}
    )

    transcript = response.json()
    segments = transcript["segments"]

    # Make edits
    segments[0]["text"] = "Welcome to our podcast about artificial intelligence."
    segments[0]["speaker_label"] = "Host"

    # Submit updates
    response = requests.put(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/edit",
        headers={"X-API-Key": api_key},
        json={
            "segments": segments,
            "edit_notes": "Corrected speaker labels and technical terms"
        }
    )

    if response.status_code == 200:
        update_info = response.json()
        print(f"Updated {update_info['changes_count']} segments")
    ```
  </Tab>
</Tabs>
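
A common edit is replacing generic labels such as `SPEAKER_00` with real names across every segment before submitting the update. The sketch below does that on a copy of the segment list. `rename_speakers` is a hypothetical helper, and it assumes segments carry a `speaker_label` field as in the request body above.

```python
import copy
from typing import Dict, List


def rename_speakers(segments: List[Dict], names: Dict[str, str]) -> List[Dict]:
    """Return a copy of segments with speaker_label values replaced according to names."""
    updated = copy.deepcopy(segments)  # leave the caller's data untouched
    for segment in updated:
        label = segment.get("speaker_label")
        if label in names:
            segment["speaker_label"] = names[label]
    return updated


segments = [
    {"id": 1, "text": "Welcome to our podcast.", "speaker_label": "SPEAKER_00"},
    {"id": 2, "text": "Thank you for having me.", "speaker_label": "SPEAKER_01"},
    {"id": 3, "text": "Let's get started.", "speaker_label": "SPEAKER_00"},
]
edited = rename_speakers(segments, {"SPEAKER_00": "Host", "SPEAKER_01": "Guest"})
print([s["speaker_label"] for s in edited])  # ['Host', 'Guest', 'Host']
```

The returned list can then be sent as the `segments` field of the `PUT` request.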

### Get Transcript Versions

View edit history and versions of transcripts.

<Tabs>
  <Tab title="GET">
    ```http theme={null}
    GET /api/v1/transcription/jobs/{job_id}/versions
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    response = requests.get(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/versions",
        headers={"X-API-Key": api_key}
    )

    versions = response.json()
    for version in versions["versions"]:
        print(f"Version {version['version']}: {version['edit_notes']}")
        print(f"  Updated: {version['updated_at']}")
        print(f"  Changes: {version['changes_count']}")
    ```
  </Tab>
</Tabs>

## Extract Audio

### Download Extracted Audio

Get clean audio files extracted from videos during transcription.

<Tabs>
  <Tab title="GET">
    ```http theme={null}
    GET /api/v1/transcription/jobs/{job_id}/audio/{audio_index}
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    # Download first audio file (index 0)
    response = requests.get(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}/audio/0",
        headers={"X-API-Key": api_key}
    )

    if response.status_code == 200:
        with open("extracted_audio.wav", "wb") as f:
            f.write(response.content)
        print("Audio file downloaded")
    ```
  </Tab>
</Tabs>

## Delete Jobs

### Delete Transcription Job

Remove transcription jobs and associated data.

<Tabs>
  <Tab title="DELETE">
    ```http theme={null}
    DELETE /api/v1/transcription/jobs/{job_id}
    X-API-Key: {api_key}
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    import requests

    response = requests.delete(
        f"https://api.audiopod.ai/api/v1/transcription/jobs/{job_id}",
        headers={"X-API-Key": api_key}
    )

    if response.status_code == 204:
        print("Transcription job deleted successfully")
    ```
  </Tab>
</Tabs>

## Supported Languages

AudioPod AI supports 50+ languages, with automatic language detection or manual specification. A selection is listed below:

| Language   | Code | Quality   | Notes                      |
| ---------- | ---- | --------- | -------------------------- |
| English    | `en` | Excellent | Best supported language    |
| Spanish    | `es` | Excellent | High accuracy              |
| French     | `fr` | Excellent | Good speaker diarization   |
| German     | `de` | Excellent | Technical content support  |
| Portuguese | `pt` | Very Good | Brazilian and European     |
| Italian    | `it` | Very Good | Good word timestamps       |
| Russian    | `ru` | Very Good | Cyrillic text support      |
| Japanese   | `ja` | Good      | Hiragana/Katakana/Kanji    |
| Chinese    | `zh` | Good      | Simplified and Traditional |
| Arabic     | `ar` | Good      | RTL text support           |
| Hindi      | `hi` | Good      | Devanagari script          |
| Korean     | `ko` | Good      | Hangul script              |

## Model Comparison

Choose an accuracy tier for your use case. **Standard** is the default: just
omit `model_type`. **Premium** selects the highest-accuracy engine, which adds
punctuation, per-word confidence, language detection, and native speaker labels.

| Accuracy tier                | `model_type` value   | Speaker labels                         | Best for                                  |
| ---------------------------- | -------------------- | -------------------------------------- | ----------------------------------------- |
| **Standard** (default)       | *omit `model_type`*  | set `enable_speaker_diarization: true` | Fast, low-cost transcription at scale     |
| **Premium**                  | `"premium"`          | set `enable_speaker_diarization: true` | Highest accuracy, punctuation, confidence |
| **Premium + speaker labels** | `"premium-diarized"` | built in                               | Interviews, meetings, podcasts            |

<Note>
  Premium accuracy is a paid feature (see [Pricing](#pricing)). Standard accuracy
  is available on every plan, including the free tier.
</Note>
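
The tier table can be expressed as a small helper that builds request options. This is a minimal sketch: `build_transcription_options` is a hypothetical name, not part of the AudioPod SDK, and it only uses the `model_type` and `enable_speaker_diarization` fields shown in the table above.

```python
def build_transcription_options(tier="standard", speaker_labels=False):
    """Return request options for an accuracy tier (hypothetical helper)."""
    if tier not in ("standard", "premium"):
        raise ValueError(f"Unknown tier: {tier!r}")

    options = {}
    if tier == "standard":
        # Standard is the default: leave model_type out entirely.
        if speaker_labels:
            options["enable_speaker_diarization"] = True
    elif speaker_labels:
        # Premium with built-in speaker labels.
        options["model_type"] = "premium-diarized"
    else:
        options["model_type"] = "premium"
    return options


print(build_transcription_options())                 # {}
print(build_transcription_options("premium", True))  # {'model_type': 'premium-diarized'}
```

For Premium with speaker labels, this sketch picks `"premium-diarized"`. Per the table, sending `"premium"` together with `enable_speaker_diarization: true` is also valid.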

## Best Practices

### Audio Quality Guidelines

For best transcription results:

```python theme={null}
# Recommended audio specifications
audio_requirements = {
    "sample_rate": "16kHz or higher",
    "format": "WAV, MP3, M4A, or video formats",
    "duration": "Up to 15 hours supported",
    "background_noise": "Minimize for better accuracy",
    "speech_clarity": "Clear articulation preferred",
    "multiple_speakers": "Distinct voices work best"
}

# Chunking for long content
chunking_strategy = {
    "chunk_duration": 1800,  # 30 minutes per chunk
    "overlap": 30,           # 30 seconds overlap
    "boundary_detection": "sentence_level"  # Smart chunk boundaries
}
```
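
To see what the chunking values above mean in practice, the following sketch computes fixed-length chunk boundaries with overlap. It only illustrates the arithmetic: `chunk_boundaries` is a hypothetical helper, and the service's sentence-level boundary detection is not modeled here.

```python
def chunk_boundaries(total_seconds, chunk_duration=1800, overlap=30):
    """Return (start, end) pairs in seconds that cover the whole recording."""
    if chunk_duration <= overlap:
        raise ValueError("chunk_duration must be larger than overlap")

    step = chunk_duration - overlap  # each chunk starts `overlap` seconds early
    boundaries = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_duration, total_seconds)
        boundaries.append((start, end))
        if end == total_seconds:
            break
        start += step
    return boundaries


# A 1-hour recording splits into three overlapping chunks.
print(chunk_boundaries(3600))  # [(0, 1800), (1770, 3570), (3540, 3600)]
```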

### Cost Optimization

```python theme={null}
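# Efficient transcription workflow.
# group_by_language_and_type, create_transcription_job, and monitor_job_progress
# are placeholder helpers that you would implement around the API calls.
def transcribe_efficiently(audio_files, language="auto", priority="cost_priority"):
    # Map each priority to a model_type value
    model_choice = {
        "cost_priority": None,            # omit model_type -> Standard (lowest cost)
        "accuracy_priority": "premium",
        "speaker_priority": "premium-diarized",
    }
    model_type = model_choice[priority]

    # Only send model_type when a Premium tier is requested
    options = {"model_type": model_type} if model_type else {}
    # premium-diarized has speaker labels built in; other tiers need the flag
    if model_type != "premium-diarized":
        options["enable_speaker_diarization"] = True

    # Batch similar files together
    for batch in group_by_language_and_type(audio_files):
        job = create_transcription_job(
            files=batch,
            language=language,
            chunk_duration=1800,  # 30-minute chunks
            **options,
        )
        monitor_job_progress(job["job_id"])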
```

## Error Handling

<AccordionGroup>
  <Accordion title="400 Bad Request - Invalid Audio">
    **Causes:**

    * Unsupported audio format
    * Corrupted audio file
    * Audio too short

    **Solutions:**

    * Use supported formats (WAV, MP3, M4A, MP4)
    * Verify file integrity
    * Ensure the audio is at least 10 seconds long
  </Accordion>

  <Accordion title="413 Payload Too Large">
    **Causes:**

    * File size exceeds limits
    * Too many files in a single request

    **Solutions:**

    * Split large files into smaller chunks
    * Reduce the number of files per request
    * Use URL transcription for large videos
  </Accordion>

  <Accordion title="422 Processing Error">
    **Causes:**

    * Audio has no speech content
    * Extremely poor audio quality

    **Solutions:**

    * Verify the audio contains speech
    * Improve audio quality
    * Try a different transcription model
  </Accordion>
</AccordionGroup>
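
Client code can map these status codes to actionable hints. The sketch below is one possible approach: `ERROR_HINTS` and `describe_error` are hypothetical names, and the messages simply restate the causes and solutions listed above.

```python
# Hints summarizing the causes and solutions listed above.
ERROR_HINTS = {
    400: "Invalid audio: use WAV, MP3, M4A, or MP4, verify file integrity, "
         "and provide at least 10 seconds of audio.",
    413: "Payload too large: split the file into smaller chunks, send fewer "
         "files per request, or use URL transcription.",
    422: "Processing error: confirm the audio contains speech, improve its "
         "quality, or try a different transcription model.",
}


def describe_error(status_code):
    """Return a troubleshooting hint for a failed request (hypothetical helper)."""
    return ERROR_HINTS.get(
        status_code,
        f"Unexpected status {status_code}: inspect the response body for details.",
    )


print(describe_error(413))
```

You could call `describe_error(response.status_code)` whenever a request returns a non-success status.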

## Pricing

Transcription pricing is based on audio duration:

| Service                  | Cost               | Description                     |
| ------------------------ | ------------------ | ------------------------------- |
| Basic Transcription      | 220 credits/minute | Text-only transcription         |
| With Speaker Diarization | 220 credits/minute | Speaker identification included |
| With Word Timestamps     | 220 credits/minute | Word-level timing data          |
| Transcript Editing       | Free               | No additional cost for edits    |

### Cost Examples

| Duration   | Features                   | Credits | USD Cost |
| ---------- | -------------------------- | ------- | -------- |
| 10 minutes | Basic transcription        | 2,200   | \$0.29   |
| 30 minutes | With speakers + timestamps | 6,600   | \$0.88   |
| 1 hour     | Full features              | 13,200  | \$1.76   |
| 2 hours    | Full features              | 26,400  | \$3.52   |
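
The figures in this table can be reproduced with a short calculation. This is a rough sketch under two assumptions: the USD rate is inferred from the 1-hour row (13,200 credits for \$1.76) rather than taken from an official price list, and billing is treated as exactly proportional to duration. `estimate_cost` is a hypothetical helper.

```python
CREDITS_PER_MINUTE = 220
# Assumption: rate inferred from the "1 hour" row of the cost examples table.
USD_PER_CREDIT = 1.76 / 13_200


def estimate_cost(duration_minutes):
    """Return (credits, approximate USD) for a given audio duration."""
    credits = duration_minutes * CREDITS_PER_MINUTE
    usd = round(credits * USD_PER_CREDIT, 2)
    return credits, usd


for minutes in (10, 30, 60, 120):
    credits, usd = estimate_cost(minutes)
    print(f"{minutes} min -> {credits:,} credits (~${usd:.2f})")
```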

## Next Steps

<Columns cols={2}>
  <Card title="Speaker Separation" icon="users" href="/api-reference/speaker-separation">
    Identify and separate individual speakers from audio.
  </Card>

  <Card title="Noise Reduction" icon="volume-xmark" href="/api-reference/noise-reduction">
    Clean up audio for better transcription accuracy.
  </Card>
</Columns>
