Overview
AudioPod AI’s AudioMusic V2 is a state-of-the-art music generation engine that transforms text descriptions into original musical compositions. Powered by a Transformer Model with Chain-of-Thought reasoning, AudioMusic V2 delivers studio-quality results.
Key Capabilities
Text-to-Music: Generate complete songs with vocals from prompts and lyrics
Simple Mode: Describe what you want in natural language and AudioMusic V2 handles everything
Cover / Style Transfer: Transform existing audio into different styles and genres
Reference Audio: Guide generation using a reference track’s timbre and mixing style
Audio Analysis: Extract metadata (BPM, key, lyrics, caption) from any audio
Stem Extraction: Isolate specific instruments (vocals, drums, bass, guitar, and more)
Repaint / Edit / Extend: Modify specific sections, edit with new prompts, or extend tracks
Batch Generation: Generate 1-8 variations in a single request
LRC Timestamps: Synchronized lyric timestamps for karaoke-style playback
Quality Scoring: Automatic quality assessment for every generation
50+ Languages: Generate music with vocals in over 50 languages
Multiple Formats: Output in WAV, MP3, FLAC, or OGG
Royalty-Free: All generated music is 100% royalty-free for commercial use
Authentication
All music endpoints require authentication. Use one of these methods:
API Key (recommended): send the header X-API-Key: your_api_key
JWT Token: send the header Authorization: Bearer your_jwt_token (for session-based auth)
# Example with API Key
curl -X POST "https://api.audiopod.ai/api/v1/music/simple" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "upbeat pop song with catchy melody"}'
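The two authentication methods above can be sketched as a small header builder. This is only an illustration: the helper name `build_auth_headers` is hypothetical and not part of the AudioPod SDK, and only the header names are taken from this page.

```python
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None,
                       jwt_token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for one of the two documented auth methods.

    Hypothetical helper (not part of the AudioPod SDK). The API key wins
    when both credentials are supplied, because it is the recommended method.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Strip stray whitespace so the header value is sent cleanly.
        headers["X-API-Key"] = api_key.strip()
    elif jwt_token:
        headers["Authorization"] = f"Bearer {jwt_token.strip()}"
    else:
        raise ValueError("provide either api_key or jwt_token")
    return headers


print(build_auth_headers(api_key="your_api_key"))
```

The resulting dictionary can be passed to any HTTP client as the request headers.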
Quick Start — Simple Mode
The fastest way to generate music. Just describe what you want:
Python:
from audiopod import AudioPod

client = AudioPod(api_key="ap_your_api_key")

# Simple mode: AudioMusic V2 auto-generates everything
job = client.music.simple(
    query="a soft Bengali love song for a quiet evening",
    wait_for_completion=True,
)
print(f"Music URL: {job['output_url']}")
JavaScript:
import AudioPod from 'audiopod';

const client = new AudioPod({ apiKey: 'ap_your_api_key' });

// Simple mode: AudioMusic V2 auto-generates everything
const job = await client.music.simple({
  query: 'a soft Bengali love song for a quiet evening',
});
const completed = await client.music.waitForCompletion(job.id);
console.log(`Music URL: ${completed.output_url}`);
cURL:
# Create job
JOB=$(curl -s -X POST "https://api.audiopod.ai/api/v1/music/simple" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "a soft Bengali love song for a quiet evening"}')
JOB_ID=$(echo "$JOB" | jq -r '.job.id')
echo "Job created: $JOB_ID"

# Poll for completion
while true; do
  STATUS=$(curl -s "https://api.audiopod.ai/api/v1/music/jobs/$JOB_ID/status" \
    -H "X-API-Key: $AUDIOPOD_API_KEY")
  STATE=$(echo "$STATUS" | jq -r '.status')
  echo "Status: $STATE"
  [ "$STATE" = "COMPLETED" ] && echo "URL: $(echo "$STATUS" | jq -r '.output_url')" && break
  [ "$STATE" = "FAILED" ] && echo "Failed" && break
  sleep 5
done
Complete Parameters Reference
All music generation endpoints inherit from a common base. Here are all available parameters:
Core Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| duration | float | -1.0 | Audio duration in seconds. Use -1 for automatic duration based on lyrics length. Max: 600 |
| format | string | "flac" | Output audio format: wav, mp3, flac, ogg |
| model_version | string | "audiomusic-v2" | Model version: audiomusic-v2 (recommended) or audiomusic-v1.5 |
| batch_size | int | 1 | Number of variations to generate (1-8) |
| seed | int | -1 | Random seed for reproducibility. Use -1 for random |
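A minimal sketch of client-side validation for the core parameters, using the limits listed above. The helper `build_core_params` is hypothetical, and rejecting non-positive durations other than -1 is an assumption, since only the 600-second maximum is documented.

```python
ALLOWED_FORMATS = {"wav", "mp3", "flac", "ogg"}
ALLOWED_MODELS = {"audiomusic-v2", "audiomusic-v1.5"}


def build_core_params(duration=-1.0, audio_format="flac",
                      model_version="audiomusic-v2", batch_size=1, seed=-1):
    """Validate the core parameters and return them as a request fragment.

    Hypothetical helper; defaults and limits are copied from the table above.
    """
    # -1 means automatic duration. Treating any other non-positive value as
    # invalid is an assumption; only the maximum of 600 is documented.
    if duration != -1 and not 0 < duration <= 600:
        raise ValueError("duration must be -1 or in (0, 600] seconds")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if model_version not in ALLOWED_MODELS:
        raise ValueError(f"model_version must be one of {sorted(ALLOWED_MODELS)}")
    if not 1 <= batch_size <= 8:
        raise ValueError("batch_size must be between 1 and 8")
    return {
        "duration": float(duration),
        "format": audio_format,
        "model_version": model_version,
        "batch_size": batch_size,
        "seed": seed,
    }


print(build_core_params(duration=90, audio_format="mp3", batch_size=4, seed=1234))
```

Fixing `seed` to a non-negative value while keeping the other parameters unchanged is the documented way to aim for reproducible output.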
Diffusion Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| inference_steps | int | 64 | Number of diffusion steps. Higher = better quality, slower. Range: 32-100 |
| guidance_scale | float | 7.0 | Classifier-free guidance scale. Higher = more prompt adherence. Range: 1.0-15.0 |
| shift | float | 3.0 | Timestep shift factor. Range: 1.0-5.0 |
| infer_method | string | "ode" | Inference method: ode (Euler, faster) or sde (stochastic, more diverse) |
| use_adg | bool | false | Enable Adaptive Dual Guidance for enhanced control |
| cfg_interval_start | float | 0.0 | CFG interval start ratio (0.0-1.0) |
| cfg_interval_end | float | 1.0 | CFG interval end ratio (0.0-1.0) |
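The documented ranges can be checked before a request is sent. The sketch below is an assumption-laden illustration: `check_diffusion_params` is a hypothetical helper, and the rule that `cfg_interval_start` must not exceed `cfg_interval_end` is inferred rather than stated on this page.

```python
# Ranges copied from the diffusion parameter table (inclusive bounds assumed).
DIFFUSION_RANGES = {
    "inference_steps": (32, 100),
    "guidance_scale": (1.0, 15.0),
    "shift": (1.0, 5.0),
    "cfg_interval_start": (0.0, 1.0),
    "cfg_interval_end": (0.0, 1.0),
}


def check_diffusion_params(params: dict) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for name, (low, high) in DIFFUSION_RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} outside [{low}, {high}]")
    if params.get("infer_method", "ode") not in ("ode", "sde"):
        problems.append("infer_method must be 'ode' or 'sde'")
    # Assumption: the CFG interval should not be inverted.
    if params.get("cfg_interval_start", 0.0) > params.get("cfg_interval_end", 1.0):
        problems.append("cfg_interval_start exceeds cfg_interval_end")
    return problems


print(check_diffusion_params({"inference_steps": 120, "infer_method": "sde"}))
```

Returning a list instead of raising on the first error lets a caller report every out-of-range value at once.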
Language Model Parameters
AudioMusic V2 includes a language model that can auto-generate and refine metadata, captions, lyrics, and more.
| Parameter | Type | Default | Description |
|---|---|---|---|
| thinking | bool | true | Enable Chain-of-Thought reasoning for higher quality |
| lm_temperature | float | 0.85 | Sampling temperature (0.0-2.0). Lower = more deterministic |
| lm_cfg_scale | float | 2.0 | LM classifier-free guidance (0.0-10.0) |
| lm_top_k | int | 0 | Top-k sampling (0-100, 0 disables) |
| lm_top_p | float | 0.9 | Nucleus sampling threshold (0.0-1.0) |
| lm_negative_prompt | string | "NO USER INPUT" | Negative prompt for LM guidance |
| use_cot_metas | bool | true | Auto-generate BPM, key, time signature via Chain-of-Thought |
| use_cot_caption | bool | true | Refine caption/tags via Chain-of-Thought |
| use_cot_lyrics | bool | false | Generate or refine lyrics via Chain-of-Thought |
| use_cot_language | bool | true | Auto-detect vocal language via Chain-of-Thought |
| use_constrained_decoding | bool | false | Enable structured LM output |
The following music metadata parameters are auto-detected when thinking: true, but can be specified manually.
| Parameter | Type | Default | Description |
|---|---|---|---|
| bpm | int | null | Beats per minute (30-300). Auto-detected if null |
| keyscale | string | null | Musical key, e.g. "C Major", "Am". Auto-detected if null |
| timesignature | string | null | Time signature: "2/4", "3/4", "4/4", "6/8". Auto-detected if null |
| vocal_language | string | "unknown" | ISO 639-1 language code (e.g. "en", "zh", "ja"). Use "unknown" for auto-detection |
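To pin some metadata manually while leaving the rest to auto-detection, a request fragment can carry only the fields that were set. This is a sketch: `metadata_overrides` is a hypothetical helper, and omitting unset fields (rather than sending null) is an assumption about how the API treats missing keys.

```python
from typing import Optional

ALLOWED_TIME_SIGNATURES = {"2/4", "3/4", "4/4", "6/8"}


def metadata_overrides(bpm: Optional[int] = None,
                       keyscale: Optional[str] = None,
                       timesignature: Optional[str] = None,
                       vocal_language: str = "unknown") -> dict:
    """Return only the metadata fields the caller chose to set."""
    if bpm is not None and not 30 <= bpm <= 300:
        raise ValueError("bpm must be between 30 and 300")
    if timesignature is not None and timesignature not in ALLOWED_TIME_SIGNATURES:
        raise ValueError(f"timesignature must be one of {sorted(ALLOWED_TIME_SIGNATURES)}")
    fields = {
        "bpm": bpm,
        "keyscale": keyscale,
        "timesignature": timesignature,
        "vocal_language": vocal_language,
    }
    # Assumption: leaving a key out is equivalent to sending null,
    # so the service auto-detects it.
    return {name: value for name, value in fields.items() if value is not None}


print(metadata_overrides(bpm=120, keyscale="Am"))
```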
Output & Quality Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| generate_lrc | bool | false | Generate synchronized LRC lyric timestamps |
| calculate_quality | bool | true | Calculate quality score (0.0-1.0) for the output |
| thumbnail_url | string | null | URL for custom track thumbnail image |
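When generate_lrc is enabled, the timestamps can drive karaoke-style playback once parsed. The sketch below assumes the output follows the common LRC text convention of `[mm:ss.xx]lyric` lines; this page does not show the exact payload, so treat `parse_lrc` as a hypothetical illustration.

```python
import re
from typing import List, Tuple

# Matches lines such as "[01:02.50]Some lyric"; metadata tags like
# "[ar:Artist]" do not match because the first field must be numeric.
LRC_LINE = re.compile(r"\[(\d+):(\d{2}(?:\.\d+)?)\](.*)")


def parse_lrc(text: str) -> List[Tuple[float, str]]:
    """Convert LRC text into (seconds, lyric) pairs sorted by time."""
    entries = []
    for line in text.splitlines():
        match = LRC_LINE.match(line.strip())
        if match:
            minutes, seconds, lyric = match.groups()
            entries.append((int(minutes) * 60 + float(seconds), lyric.strip()))
    return sorted(entries)


sample = "[ar:Example]\n[00:12.50]First line\n[01:02.00]Second line"
for start, lyric in parse_lrc(sample):
    print(f"{start:7.2f}s  {lyric}")
```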
LoRA Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| lora_name_or_path | string | null | Path to LoRA adapter for fine-tuned styles |
| lora_weight | float | 1.0 | LoRA adapter weight (-3.0 to 3.0) |
Genre Presets
Pre-configured genre settings for quick generation:
| Preset | Description |
|---|---|
| Modern Pop | Contemporary pop with electronic elements |
| Rock | Guitar-driven rock music |
| Hip Hop | Modern hip hop beats and production |
| Jazz | Jazz compositions with improvisation |
| Classical | Orchestral and classical arrangements |
| Electronic | EDM, synth-driven electronic music |
| Country | Country music with acoustic instruments |
| Folk | Folk music with traditional elements |
| Blues | Blues with soulful expression |
| Reggae | Reggae with Caribbean rhythms |
| Latin | Latin music styles |
| R&B | Rhythm and blues |
| Metal | Heavy metal and rock subgenres |
| Custom | Custom genre (specify in caption/prompt) |
Output Formats
| Format | Extension | Quality | Use Case |
|---|---|---|---|
| FLAC | .flac | Lossless | Default. Best quality, larger files |
| WAV | .wav | Lossless | Uncompressed, maximum compatibility |
| MP3 | .mp3 | Lossy | Smaller files, streaming |
| OGG | .ogg | Lossy | Open format, good compression |
Job Lifecycle
All music generation is asynchronous:
Submit a generation request → receive a job with status: "PENDING"
Poll the job status endpoint → "PROCESSING" while generating
Complete → status: "COMPLETED" with output_url and output_urls (multiple formats)
Failed → status: "FAILED" with error_message (use retry endpoint to re-queue)
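The lifecycle above amounts to polling until a terminal status appears. A minimal sketch, assuming the status payload is a dictionary with a "status" field as in the cURL example; `wait_for_job` and its `fetch_status` callback are hypothetical names, and the timeout is an added safeguard not described on this page.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status: Callable[[], Dict], interval: float = 5.0,
                 timeout: float = 600.0, sleep=time.sleep) -> Dict:
    """Call fetch_status until the job is COMPLETED or FAILED.

    fetch_status is any callable returning the job status payload, for
    example a function wrapping a GET to the job status endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal status in time")
        sleep(interval)


# Demonstration with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "PENDING"},
    {"status": "PROCESSING"},
    {"status": "COMPLETED", "output_url": "https://example.com/track.flac"},
])
result = wait_for_job(lambda: next(responses), interval=0, sleep=lambda _: None)
print(result["status"], result.get("output_url"))
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and makes it testable without network access.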
Pricing
Music generation is billed per minute of audio generated. Two billing paths
apply depending on how you authenticate:
| Path | Rate |
|---|---|
| Account credits (subscription / JWT) | 990 credits/minute |
| API wallet (API key, USD-denominated) | $0.04/minute |
| Duration | Credits | API wallet (USD) |
|---|---|---|
| 30 seconds | 495 | $0.02 |
| 1 minute | 990 | $0.04 |
| 2 minutes | 1,980 | $0.08 |
| 5 minutes | 4,950 | $0.20 |
| 10 minutes | 9,900 | $0.40 |
AudioMusic Premium (dit_variant: "xl") is a higher-fidelity variant that
bills at 2× the base rate — 1,980 credits/minute or $0.08/minute. Requires
a Pro-tier or higher subscription; calls from Free/Basic tiers return
402 PREMIUM_TIER_REQUIRED.
Credits are reserved when the job is created and refunded if generation fails.
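The rates above translate into a simple estimate. This sketch assumes billing is pro-rated by the second, which matches the 30-second row of the table but is not stated explicitly, and it ignores batch size; `estimate_cost` is a hypothetical helper.

```python
CREDITS_PER_MINUTE = 990      # account credits path
USD_PER_MINUTE = 0.04         # API wallet path
PREMIUM_MULTIPLIER = 2        # AudioMusic Premium bills at 2x the base rate


def estimate_cost(duration_seconds: float, premium: bool = False) -> dict:
    """Estimate the cost of one generated track on both billing paths."""
    minutes = duration_seconds / 60
    multiplier = PREMIUM_MULTIPLIER if premium else 1
    return {
        "credits": round(minutes * CREDITS_PER_MINUTE * multiplier),
        "usd": round(minutes * USD_PER_MINUTE * multiplier, 4),
    }


print(estimate_cost(30))                  # 30-second track at the base rate
print(estimate_cost(120, premium=True))   # 2-minute track on the premium variant
```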
Best Practices
Use Simple Mode for quick results — describe what you want in natural language and let AudioMusic V2 handle the details
Enable thinking: true (default) for best quality — Chain-of-Thought reasoning significantly improves output
Use caption over prompt — AudioMusic V2 uses caption internally; prompt is a legacy alias
Set inference_steps: 64 for production quality. Use 32 for fast drafts, 8 for turbo previews
Default guidance_scale: 7.0 works well for most cases. Increase to 10-15 for stronger prompt adherence
Use structure tags in lyrics — [verse], [chorus], [bridge], [intro], [outro], [Instrumental]
Use batch_size > 1 to generate multiple variations and pick the best one
Enable generate_lrc: true when you need synchronized lyric timestamps
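As an illustration of the structure-tag advice, lyrics can be scanned for tags outside the list given above before they are submitted. The helper `unknown_structure_tags` is hypothetical, and matching tags case-insensitively is an assumption; the service may accept other tags that this page does not list.

```python
import re

# Tags listed in the best practices above, compared in lower case.
KNOWN_TAGS = {"verse", "chorus", "bridge", "intro", "outro", "instrumental"}
TAG_LINE = re.compile(r"^\[([^\]]+)\]\s*$")


def unknown_structure_tags(lyrics: str) -> list:
    """Return tags that appear on their own line but are not in KNOWN_TAGS."""
    unknown = []
    for line in lyrics.splitlines():
        match = TAG_LINE.match(line.strip())
        if match and match.group(1).strip().lower() not in KNOWN_TAGS:
            unknown.append(match.group(1))
    return unknown


lyrics = "[intro]\n[verse]\nQuiet evening, city lights\n[hook]\nStay with me\n[outro]"
print(unknown_structure_tags(lyrics))
```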
Error Handling
| Status Code | Description |
|---|---|
| 200 | Success |
| 400 | Invalid parameters (check error message for details) |
| 401 | Authentication required |
| 402 | Insufficient credits |
| 404 | Job or resource not found |
| 429 | Rate limit exceeded (wait and retry) |
| 500 | Internal server error |
“Insufficient credits”: Your wallet balance is too low. Top up at Dashboard.
“Job must be completed”: You tried to retake/extend/edit a job that hasn’t finished yet. Wait for completion first.
“Original music job not found”: The source job ID doesn’t exist or doesn’t belong to your account.
“Rate limit exceeded”: You’ve exceeded the rate limit. Wait a moment and try again.
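The "wait and retry" guidance for 429 responses can be sketched as exponential backoff. The delays and attempt count here are illustrative assumptions, not documented limits, and `call_with_retry` is a hypothetical helper. Other status codes are returned immediately, because retrying a 400, 401, 402, or 404 unchanged would not help.

```python
import time
from typing import Callable, Tuple


def call_with_retry(send: Callable[[], Tuple[int, dict]], max_attempts: int = 4,
                    base_delay: float = 1.0, sleep=time.sleep) -> Tuple[int, dict]:
    """Retry send() on HTTP 429 with exponentially growing delays.

    send is any callable returning (status_code, parsed_body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status != 429 or last_attempt:
            return status, body
        # Wait 1x, 2x, 4x ... base_delay before the next attempt.
        sleep(base_delay * 2 ** attempt)


# Demonstration with canned replies instead of real HTTP calls.
replies = iter([(429, {}), (429, {}), (200, {"job": {"id": "example"}})])
print(call_with_retry(lambda: next(replies), sleep=lambda _: None))
```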
Next Steps
Music Generation: Text-to-Music, Instrumental, Rap, Vocals, Samples, Simple Mode
Audio Tools: Cover, Reference Audio, Analysis, Stem Extraction, Edit, Extend
Job Management: Job status, listing, filtering, retry, presets