Overview

AudioPod AI’s Stem Separation API uses state-of-the-art AI models to separate mixed audio recordings into individual components (stems). Extract vocals, drums, bass, and other instruments from songs, or separate speech from background music in recordings.

Key Features

  • Multi-Stem Extraction: Separate vocals, drums, bass, and other instruments
  • AI-Powered Models: Advanced neural networks for high-quality separation
  • Multiple Output Formats: Get individual stems or specific combinations
  • Background Music Removal: Extract clean speech from music backgrounds
  • Karaoke Track Creation: Generate instrumental versions by removing vocals
  • Music Production Ready: Professional quality for remixing and production
  • Batch Processing: Handle multiple files efficiently

Authentication

All endpoints require authentication:
  • API Key: Authorization: Bearer your_api_key
  • JWT Token: Authorization: Bearer your_jwt_token
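
Both credential types travel in the same Authorization header, so one helper can serve either. The following is a minimal sketch; the function name `auth_headers` is illustrative and is not part of the AudioPod SDK.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build the Authorization header for an API key or a JWT token."""
    token = token.strip()
    if not token:
        raise ValueError("an API key or JWT token is required")
    return {"Authorization": f"Bearer {token}"}


headers = auth_headers("your_api_key")
print(headers)
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client you use.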

Music Stem Separation

Extract All Stems

Separate a full song into individual instrument and vocal tracks.
from audiopod import Client

client = Client()

# Extract all stems from a song
separation = client.stem_extraction.extract_all_stems(
    audio_file="song.mp3",
    stem_types=["vocals", "drums", "bass", "other"],  # Model auto-selected based on stems
    wait_for_completion=True
)

print("Stem separation completed!")
print(f"Input file: {separation.input_file}")
print(f"Model used: {separation.model}")
print(f"Available stems: {separation.available_stems}")

# Access individual stems
for stem_name, stem_url in separation.stems.items():
    print(f"\n{stem_name}: {stem_url}")
    
    # Download each stem
    stem_audio = client.download_file(stem_url)
    output_filename = f"{stem_name}_isolated.wav"
    with open(output_filename, "wb") as f:
        f.write(stem_audio)
    print(f"Downloaded: {output_filename}")

# Process multiple songs individually
songs = [
    "album_track_1.mp3",
    "album_track_2.mp3", 
    "album_track_3.mp3"
]

for i, song_file in enumerate(songs):
    print(f"\nProcessing Song {i+1}: {song_file}")
    
    separation = client.stem_extraction.extract_all_stems(
        audio_file=song_file,
        stem_types=["vocals", "drums", "bass", "other", "piano", "guitar"],
        wait_for_completion=True
    )
    
    song_name = song_file.rsplit('.', 1)[0]  # Strip only the final extension
    print(f"Status: {separation.status}")
    
    if separation.status == "completed":
        print(f"Extracted stems: {list(separation.stems.keys())}")
        
        # Download all stems for this song
        for stem_name, stem_url in separation.stems.items():
            stem_audio = client.download_file(stem_url)
            output_filename = f"{song_name}_{stem_name}.wav"
            with open(output_filename, "wb") as f:
                f.write(stem_audio)
            print(f"  Saved: {output_filename}")

# Example 1: Auto-select model based on stems (piano/guitar → 6-stem model)
piano_separation = client.stem_extraction.extract_all_stems(
    audio_file="piano_song.wav",
    stem_types=["vocals", "piano", "other"],  
    wait_for_completion=True
)

# Example 2: Standard 4-stem separation
standard_separation = client.stem_extraction.extract_all_stems(
    audio_file="rock_song.wav",
    stem_types=["vocals", "drums", "bass", "other"],  
    wait_for_completion=True
)

print("Piano separation completed!")
print(f"Quality: {piano_separation.quality_level}")
print(f"Processing time: {piano_separation.processing_time}s")

# Download individual stems for manual combination
stems = standard_separation.stems

# Download all stems except vocals to create instrumental mix manually  
instrumental_stems = {k: v for k, v in stems.items() if k != 'vocals'}

print("Downloading stems for manual combination:")
for stem_name, stem_url in instrumental_stems.items():
    stem_audio = client.download_file(stem_url)
    with open(f"instrumental_{stem_name}.wav", "wb") as f:
        f.write(stem_audio)
    print(f"  Saved: instrumental_{stem_name}.wav")

Extract Specific Stems

Extract only specific components from audio.
POST /api/v1/stem-extraction/extract
Authorization: Bearer {api_key}
Content-Type: multipart/form-data

file: (audio file)
stem_types: ["vocals"]

Extract from URL

Separate stems from online audio/video content.
POST /api/v1/stem-extraction/extract
Authorization: Bearer {api_key}
Content-Type: application/x-www-form-urlencoded

url=https://youtube.com/watch?v=example123&stem_types=["vocals","other"]
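
The request body above is shown unencoded for readability. In an application/x-www-form-urlencoded body, the JSON array in stem_types and the query string inside the source URL both need percent-encoding. The following sketch uses only the Python standard library; the helper name `build_url_extract_body` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_url_extract_body(source_url: str, stem_types: list[str]) -> str:
    """Encode the form fields for an extract-from-URL request."""
    fields = {
        "url": source_url,
        # stem_types is sent as a JSON array serialized to a string
        "stem_types": json.dumps(stem_types),
    }
    return urlencode(fields)


body = build_url_extract_body(
    "https://youtube.com/watch?v=example123", ["vocals", "other"]
)
print(body)

# Decoding the body recovers the original field values
decoded = parse_qs(body)
print(decoded["stem_types"][0])
```

HTTP clients that accept a dictionary of form fields usually perform this encoding themselves; the sketch only makes the wire format explicit.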
Automatic Model Selection: The API automatically selects the most appropriate model based on your requested stem_types:
  • Requests for piano or guitar stems → automatically uses the 6-stem model for highest quality
  • All other stem combinations → automatically uses the 4-stem model for balanced performance
Stem Types:
  • vocals: Extract vocal tracks
  • drums: Extract drum tracks
  • bass: Extract bass tracks
  • other: Extract other instruments (available in both 4-stem and 6-stem modes)
  • piano: Extract piano tracks (available in 6-stem mode only)
  • guitar: Extract guitar tracks (available in 6-stem mode only)
Two-Stems Mode:
  • vocals: Extract vocals vs everything else
  • drums: Extract drums vs everything else
  • bass: Extract bass vs everything else
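
The automatic model selection rule described above can be mirrored on the client to predict which model a request will use before submitting it. This is a sketch of the documented rule only; the function name and the "4-stem" / "6-stem" labels are illustrative, and the API makes the actual decision.

```python
SIX_STEM_ONLY = {"piano", "guitar"}
SUPPORTED_STEMS = {"vocals", "drums", "bass", "other"} | SIX_STEM_ONLY


def expected_model(stem_types: list[str]) -> str:
    """Predict which model the API will pick for the requested stems."""
    unknown = set(stem_types) - SUPPORTED_STEMS
    if unknown:
        raise ValueError(f"unsupported stem types: {sorted(unknown)}")
    # Any piano or guitar request needs the 6-stem model
    return "6-stem" if SIX_STEM_ONLY & set(stem_types) else "4-stem"


print(expected_model(["vocals", "piano", "other"]))
print(expected_model(["vocals", "drums", "bass", "other"]))
```

Validating stem names locally also catches typos before an upload is started.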
Response:
{
  "id": 123,
  "status": "PROCESSING",
  "input_path": "inputs/stem_extraction/job123/song.mp3",
  "source_type": "FILE",
  "stem_paths": null,
  "task_id": "celery_task_uuid_here",
  "error_message": null,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": null,
  "completed_at": null,
  "quality_scores": null,
  "download_urls": null
}

Job Management

Get Job Status

Monitor the progress of stem separation jobs.
GET /api/v1/stem-extraction/status/{job_id}
Authorization: Bearer {api_key}
Response (Completed):
{
  "id": 123,
  "status": "COMPLETED",
  "input_path": "inputs/stem_extraction/job123/song.mp3",
  "source_type": "FILE",
  "stem_paths": {
    "vocals": "stems/job123/vocals.wav",
    "drums": "stems/job123/drums.wav",
    "bass": "stems/job123/bass.wav",
    "other": "stems/job123/other.wav"
  },
  "task_id": "celery_task_uuid_here",
  "error_message": null,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:35:45Z",
  "completed_at": "2024-01-15T10:35:45Z",
  "quality_scores": {
    "vocals": 0.85,
    "drums": 0.92,
    "bass": 0.78,
    "other": 0.83
  },
  "download_urls": {
    "vocals": "https://s3.amazonaws.com/bucket/stems/job123/vocals.wav?presigned_params",
    "drums": "https://s3.amazonaws.com/bucket/stems/job123/drums.wav?presigned_params",
    "bass": "https://s3.amazonaws.com/bucket/stems/job123/bass.wav?presigned_params",
    "other": "https://s3.amazonaws.com/bucket/stems/job123/other.wav?presigned_params"
  }
}
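
Because a job moves from PROCESSING to a final state asynchronously, clients typically poll this endpoint. The following sketch shows a polling loop with a timeout. It assumes that COMPLETED and FAILED are the only terminal statuses (FAILED is taken from the workflow example on this page, not from a documented status list), and the interval and timeout defaults are arbitrary. The status lookup is passed in as a callable so the loop stays independent of any HTTP client.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}  # assumed terminal states


def wait_for_job(
    get_status: Callable[[], dict],
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job.get('id')} is still {job['status']}")
        sleep(interval)


# Simulated responses stand in for real GET /status/{job_id} calls
responses = iter([
    {"id": 123, "status": "PROCESSING"},
    {"id": 123, "status": "COMPLETED"},
])
job = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(job["status"])
```

In real use, `get_status` would issue the GET request shown above and return the parsed JSON body.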

List Extraction Jobs

Get all stem separation jobs for the authenticated user.
GET /api/v1/stem-extraction/jobs?skip=0&limit=50
Authorization: Bearer {api_key}
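
The skip and limit parameters page through the job list. The sketch below walks all pages, assuming the endpoint returns a JSON array of jobs and that a page shorter than limit marks the end of the list; neither detail is confirmed here. The `fetch_page` callable is a hypothetical stand-in for the actual HTTP request.

```python
from typing import Callable, Iterator


def iter_jobs(
    fetch_page: Callable[[int, int], list[dict]], limit: int = 50
) -> Iterator[dict]:
    """Yield every job by advancing `skip` until a short page is returned."""
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        yield from page
        if len(page) < limit:
            return
        skip += limit


# In-memory data stands in for GET /api/v1/stem-extraction/jobs
fake_jobs = [{"id": n} for n in range(120)]


def fetch_page(skip: int, limit: int) -> list[dict]:
    return fake_jobs[skip:skip + limit]


jobs = list(iter_jobs(fetch_page))
print(len(jobs))
```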

Download Individual Stems

Use the presigned URLs from the status endpoint response to download individual stems.
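
As a sketch of that download step, the helper below streams every entry of a download_urls mapping to disk using only the Python standard library. The function names are illustrative. Presigned URLs are usually time-limited, so it is safest to fetch a fresh status response shortly before downloading.

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def local_stem_path(stem_name: str, url: str, out_dir: Path) -> Path:
    """Name the local file after the stem, keeping the URL's file extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix or ".wav"
    return out_dir / f"{stem_name}{suffix}"


def download_stems(download_urls: dict[str, str], out_dir: Path) -> dict[str, Path]:
    """Stream each presigned URL to disk and return the saved paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = {}
    for stem_name, url in download_urls.items():
        target = local_stem_path(stem_name, url, out_dir)
        with urlopen(url) as response, open(target, "wb") as f:
            shutil.copyfileobj(response, f)  # copy in chunks, not all in memory
        saved[stem_name] = target
    return saved
```

Calling `download_stems(job["download_urls"], Path("stems"))` on a completed job would write one file per stem, for example stems/vocals.wav.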

Job Control

Job retries are handled automatically by the system.

Delete Job

Remove a stem extraction job and its results.
DELETE /api/v1/stem-extraction/jobs/{job_id}
Authorization: Bearer {api_key}

Use Cases & Examples

Music Production Workflow

import time

import requests

def extract_stems_for_remix(song_file, api_key):
    """Extract all stems from a song for remixing"""
    headers = {"Authorization": f"Bearer {api_key}"}
    
    print("Starting stem extraction for remix...")
    
    with open(song_file, "rb") as audio_file:
        response = requests.post(
            "https://api.audiopod.ai/api/v1/stem-extraction/extract",
            headers=headers,
            data={
                "stem_types": '["vocals", "drums", "bass", "other"]'  # Model auto-selected for production
            },
            files={"file": audio_file}
        )
    
    if not response.ok:
        return {"error": "Failed to start extraction"}
    
    job_id = response.json()["id"]
    
    # Poll the status endpoint until the job finishes
    while True:
        status_response = requests.get(
            f"https://api.audiopod.ai/api/v1/stem-extraction/status/{job_id}",
            headers=headers
        )
        
        job_status = status_response.json()
        print(f"Status: {job_status['status']}")
        
        if job_status["status"] == "COMPLETED":
            break
        elif job_status["status"] == "FAILED":
            return {"error": job_status.get("error_message") or "Job failed"}
        
        time.sleep(10)
    
    # Download each stem from its presigned URL (fields match the status response)
    download_urls = job_status.get("download_urls") or {}
    quality_scores = job_status.get("quality_scores") or {}
    downloaded_stems = []
    
    for stem_name, stem_url in download_urls.items():
        stem_response = requests.get(stem_url)
        filename = f"remix_{stem_name}.wav"
        
        with open(filename, "wb") as f:
            f.write(stem_response.content)
        
        downloaded_stems.append({
            "name": stem_name,
            "filename": filename,
            "quality_score": quality_scores.get(stem_name)
        })
        
        print(f"Downloaded {stem_name}: {filename}")
    
    return {
        "success": True,
        "job_id": job_id,
        "stems": downloaded_stems,
        "total_stems": len(downloaded_stems)
    }

# Usage
result = extract_stems_for_remix("song.mp3", "your_api_key")
if result.get("success"):
    print(f"Successfully extracted {result['total_stems']} stems")
    for stem in result["stems"]:
        print(f"  {stem['name']}: {stem['filename']}")

Karaoke Track Generation

import os
import time

import requests

def create_karaoke_track(song_file, api_key):
    """Create karaoke version by removing vocals"""
    headers = {"Authorization": f"Bearer {api_key}"}
    
    with open(song_file, "rb") as audio_file:
        response = requests.post(
            "https://api.audiopod.ai/api/v1/stem-extraction/extract",
            headers=headers,
            data={
                "two_stems_mode": "vocals"    # Separate vocals from everything else, model auto-selected
            },
            files={"file": audio_file}
        )
    
    if not response.ok:
        return {"error": "Failed to start extraction"}
    
    job_id = response.json()["id"]
    
    # Monitor job (simplified)
    while True:
        status_response = requests.get(
            f"https://api.audiopod.ai/api/v1/stem-extraction/status/{job_id}",
            headers=headers
        )
        
        job_status = status_response.json()
        if job_status["status"] == "COMPLETED":
            break
        elif job_status["status"] == "FAILED":
            # Stop polling instead of looping forever on a failed job
            return {"error": job_status.get("error_message") or "Job failed"}
        time.sleep(5)
    
    # Download instrumental track (non-vocals)
    download_urls = job_status.get("download_urls") or {}
    other_url = download_urls.get("other")  # This contains the instrumental mix
    
    if other_url:
        karaoke_response = requests.get(other_url)
        # Name the output after the song, without its directory or original extension
        base_name = os.path.splitext(os.path.basename(song_file))[0]
        karaoke_filename = f"karaoke_{base_name}.wav"
        
        with open(karaoke_filename, "wb") as f:
            f.write(karaoke_response.content)
        
        return {
            "karaoke_file": karaoke_filename,
            "job_id": job_id
        }
    
    return {"error": "No instrumental track found"}

For speech enhancement, extract the vocals stem from a podcast recording to isolate the speech from background music.

Error Handling

Best Practices

Stem Selection Guide

The API automatically selects the optimal model based on your stem requirements. Here’s how to choose the right stems for your use case:
def choose_stems_for_use_case(use_case):
    """Choose optimal stem types based on your use case"""
    
    use_cases = {
        "karaoke": ["vocals", "other"],  # Separate vocals from instrumental
        "vocal_removal": ["vocals"],     # Extract just vocals
        "music_production": ["vocals", "drums", "bass", "other"],  # Standard 4-stem
        "piano_music": ["vocals", "piano", "other"],  # Uses 6-stem model
        "guitar_music": ["vocals", "guitar", "other"], # Uses 6-stem model
        "professional_production": ["vocals", "drums", "bass", "other", "piano", "guitar"], # Full 6-stem
        "podcast_cleanup": ["vocals"],   # Extract speech/vocals only
        "drum_isolation": ["drums"],     # Extract drums only
        "bass_isolation": ["bass"]       # Extract bass only
    }
    
    return use_cases.get(use_case, ["vocals", "drums", "bass", "other"])  # Default to 4-stem

# Usage examples:
# For karaoke: stem_types = ["vocals", "other"]
# For piano separation: stem_types = ["vocals", "piano", "other"]  → uses 6-stem model
# For full production: stem_types = ["vocals", "drums", "bass", "other"] → uses 4-stem model

Quality Assessment

from datetime import datetime

import requests

def assess_separation_quality(job_id, api_key):
    """Assess the quality of stem separation results"""
    
    response = requests.get(
        f"https://api.audiopod.ai/api/v1/stem-extraction/status/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"}
    )
    
    job_data = response.json()
    # quality_scores is null until the job has completed
    scores = job_data.get("quality_scores") or {}
    
    if not scores:
        return {"assessment": "no_results"}
    
    values = list(scores.values())
    
    # Derive processing time from the job timestamps
    processing_time = 0
    if job_data.get("created_at") and job_data.get("completed_at"):
        started = datetime.fromisoformat(job_data["created_at"].replace("Z", "+00:00"))
        finished = datetime.fromisoformat(job_data["completed_at"].replace("Z", "+00:00"))
        processing_time = (finished - started).total_seconds()
    
    assessment = {
        "num_stems": len(scores),
        "score_analysis": {
            "avg_score": sum(values) / len(values),
            "min_score": min(values),
            "score_spread": max(values) - min(values),
            "weakest_stem": min(scores, key=scores.get)
        },
        "processing_time": processing_time,
        "quality_indicators": {}
    }
    
    # Quality indicators. The thresholds are illustrative, not official, and
    # assume scores lie between 0 and 1 with higher meaning better separation.
    assessment["quality_indicators"] = {
        "good_average": assessment["score_analysis"]["avg_score"] >= 0.8,
        "no_weak_stem": assessment["score_analysis"]["min_score"] >= 0.7,
        "consistent_scores": assessment["score_analysis"]["score_spread"] < 0.2,
        "all_stems_present": len(scores) >= 2
    }
    
    # Overall score: number of indicators that passed
    quality_score = sum(assessment["quality_indicators"].values())
    assessment["overall_rating"] = {
        4: "excellent",
        3: "good",
        2: "acceptable",
        1: "poor",
        0: "failed"
    }.get(quality_score, "unknown")
    
    return assessment

Pricing

Stem separation pricing is based on audio duration. The API automatically selects the appropriate model based on your requested stems:
Model Type   Cost                 Description
4-stem       990 credits/minute   Standard separation (vocals, drums, bass, other)
6-stem       990 credits/minute   Advanced separation (vocals, drums, bass, other, piano, guitar)

Cost Examples

Duration     Stem Types                     Model Used   Credits   USD Cost
4 minutes    vocals, drums, bass, other     4-stem       3960      $0.53
6 minutes    vocals, piano, guitar, other   6-stem       5940      $0.79
5 minutes    vocals, other                  4-stem       4950      $0.66
10 minutes   all 6 stems                    6-stem       9900      $1.32
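
The figures above follow from multiplying the duration by 990 credits per minute. The sketch below reproduces them. The USD rate is inferred from the last example row (9900 credits for $1.32) and is not an official price, and the billing granularity for partial minutes is not stated here, so the sketch simply pro-rates linearly.

```python
CREDITS_PER_MINUTE = 990  # same rate for the 4-stem and 6-stem models
# Inferred from the cost examples (9900 credits = $1.32); not an official rate
USD_PER_CREDIT = 1.32 / 9900


def estimate_cost(duration_minutes: float) -> tuple[int, float]:
    """Return (credits, approximate USD) for a given audio duration."""
    total_credits = round(duration_minutes * CREDITS_PER_MINUTE)
    return total_credits, round(total_credits * USD_PER_CREDIT, 2)


for minutes in (4, 6, 5, 10):
    total_credits, usd = estimate_cost(minutes)
    print(f"{minutes} min -> {total_credits} credits (~${usd:.2f})")
```

Because both models share one rate, requesting piano or guitar stems does not change the estimate.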

Next Steps