Overview

AudioMusic V2 is AudioPod AI’s music generation engine. It turns text descriptions into original musical compositions using a transformer model with Chain-of-Thought reasoning, and it targets studio-quality output.

Key Capabilities

  • Text-to-Music: Generate complete songs with vocals from prompts and lyrics
  • Simple Mode: Describe what you want in natural language — AudioMusic V2 handles everything
  • Cover / Style Transfer: Transform existing audio into different styles and genres
  • Reference Audio: Guide generation using a reference track’s timbre and mixing style
  • Audio Analysis: Extract metadata (BPM, key, lyrics, caption) from any audio
  • Stem Extraction: Isolate specific instruments (vocals, drums, bass, guitar, and more)
  • Repaint / Edit / Extend: Modify specific sections, edit with new prompts, or extend tracks
  • Batch Generation: Generate 1-8 variations in a single request
  • LRC Timestamps: Synchronized lyric timestamps for karaoke-style playback
  • Quality Scoring: Automatic quality assessment for every generation
  • 50+ Languages: Generate music with vocals in over 50 languages
  • Multiple Formats: Output in WAV, MP3, FLAC, or OGG
  • Royalty-Free: All generated music is 100% royalty-free for commercial use

Authentication

All music endpoints require authentication. Use one of these methods:
  • API Key (Recommended): X-API-Key: your_api_key header
  • JWT Token: Authorization: Bearer your_jwt_token (for session-based auth)
# Example with API Key
curl -X POST "https://api.audiopod.ai/api/v1/music/simple" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "upbeat pop song with catchy melody"}'
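
The two header styles above can also be built in Python. This is a minimal sketch: `build_auth_headers` is a hypothetical helper, not part of the AudioPod SDK. Only the header names (`X-API-Key`, `Authorization: Bearer ...`) come from this page.

```python
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None,
                       jwt_token: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for one of the two documented auth methods.

    Hypothetical helper: the header names come from this page, the
    function itself is illustrative only.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Recommended method: API key header.
        headers["X-API-Key"] = api_key
    elif jwt_token:
        # Session-based auth: bearer token.
        headers["Authorization"] = f"Bearer {jwt_token}"
    else:
        raise ValueError("Provide either api_key or jwt_token")
    return headers
```

If both values are passed, this sketch prefers the API key, matching the recommendation above.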

Quick Start — Simple Mode

The fastest way to generate music. Just describe what you want:
from audiopod import AudioPod

client = AudioPod(api_key="ap_your_api_key")

# Simple mode — AudioMusic V2 auto-generates everything
job = client.music.simple(
    query="a soft Bengali love song for a quiet evening",
    wait_for_completion=True
)
print(f"Music URL: {job['output_url']}")

Complete Parameters Reference

All music generation endpoints share a common set of base parameters, listed below by group:

Core Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| duration | float | -1.0 | Audio duration in seconds. Use -1 for automatic duration based on lyrics length. Max: 600 |
| format | string | "flac" | Output audio format: wav, mp3, flac, ogg |
| model_version | string | "audiomusic-v2" | Model version: audiomusic-v2 (recommended) or audiomusic-v1.5 |
| batch_size | int | 1 | Number of variations to generate (1-8) |
| seed | int | -1 | Random seed for reproducibility. Use -1 for random |

Diffusion Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| inference_steps | int | 64 | Number of diffusion steps. Higher = better quality, slower. Range: 32-100 |
| guidance_scale | float | 7.0 | Classifier-free guidance scale. Higher = more prompt adherence. Range: 1.0-15.0 |
| shift | float | 3.0 | Timestep shift factor. Range: 1.0-5.0 |
| infer_method | string | "ode" | Inference method: ode (Euler, faster) or sde (stochastic, more diverse) |
| use_adg | bool | false | Enable Adaptive Dual Guidance for enhanced control |
| cfg_interval_start | float | 0.0 | CFG interval start ratio (0.0-1.0) |
| cfg_interval_end | float | 1.0 | CFG interval end ratio (0.0-1.0) |
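
Checking values against the documented ranges before submitting can catch a likely 400 response early. The following is a minimal sketch: `check_diffusion_params` is a hypothetical client-side helper, and the rule that the CFG interval start should not exceed its end is an assumption, not something this page states.

```python
from typing import Any, Dict, List, Tuple

# Ranges copied from the Diffusion Parameters table.
DIFFUSION_RANGES: Dict[str, Tuple[float, float]] = {
    "inference_steps": (32, 100),
    "guidance_scale": (1.0, 15.0),
    "shift": (1.0, 5.0),
    "cfg_interval_start": (0.0, 1.0),
    "cfg_interval_end": (0.0, 1.0),
}


def check_diffusion_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for name, (low, high) in DIFFUSION_RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} is outside {low}-{high}")
    if params.get("infer_method", "ode") not in ("ode", "sde"):
        problems.append("infer_method must be 'ode' or 'sde'")
    # Assumption: the interval start should not exceed its end.
    start = params.get("cfg_interval_start", 0.0)
    end = params.get("cfg_interval_end", 1.0)
    if start > end:
        problems.append("cfg_interval_start is greater than cfg_interval_end")
    return problems


print(check_diffusion_params({"inference_steps": 64, "guidance_scale": 7.0}))
```

Returning a list rather than raising on the first problem lets a caller report every out-of-range value at once.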

Language Model Parameters

AudioMusic V2 includes a language model that can auto-generate and refine metadata, captions, lyrics, and more.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| thinking | bool | true | Enable Chain-of-Thought reasoning for higher quality |
| lm_temperature | float | 0.85 | Sampling temperature (0.0-2.0). Lower = more deterministic |
| lm_cfg_scale | float | 2.0 | LM classifier-free guidance (0.0-10.0) |
| lm_top_k | int | 0 | Top-k sampling (0-100, 0 disables) |
| lm_top_p | float | 0.9 | Nucleus sampling threshold (0.0-1.0) |
| lm_negative_prompt | string | "NO USER INPUT" | Negative prompt for LM guidance |
| use_cot_metas | bool | true | Auto-generate BPM, key, time signature via Chain-of-Thought |
| use_cot_caption | bool | true | Refine caption/tags via Chain-of-Thought |
| use_cot_lyrics | bool | false | Generate or refine lyrics via Chain-of-Thought |
| use_cot_language | bool | true | Auto-detect vocal language via Chain-of-Thought |
| use_constrained_decoding | bool | false | Enable structured LM output |

Music Metadata Parameters

These are auto-detected when thinking: true, but can be specified manually.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| bpm | int | null | Beats per minute (30-300). Auto-detected if null |
| keyscale | string | null | Musical key, e.g. "C Major", "Am". Auto-detected if null |
| timesignature | string | null | Time signature: "2/4", "3/4", "4/4", "6/8". Auto-detected if null |
| vocal_language | string | "unknown" | ISO 639-1 language code (e.g. "en", "zh", "ja"). Use "unknown" for auto-detection |
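
When only some metadata should be pinned, the remaining fields can stay on auto-detection. The sketch below builds such a partial override. `metadata_overrides` is a hypothetical helper, and it assumes that omitting a field has the same effect as sending null, which this page does not confirm.

```python
from typing import Any, Dict, Optional

# Time signatures listed in the table above.
TIME_SIGNATURES = {"2/4", "3/4", "4/4", "6/8"}


def metadata_overrides(bpm: Optional[int] = None,
                       keyscale: Optional[str] = None,
                       timesignature: Optional[str] = None,
                       vocal_language: str = "unknown") -> Dict[str, Any]:
    """Build metadata fields, leaving out anything that should be auto-detected."""
    if bpm is not None and not 30 <= bpm <= 300:
        raise ValueError("bpm must be between 30 and 300")
    if timesignature is not None and timesignature not in TIME_SIGNATURES:
        raise ValueError(f"timesignature must be one of {sorted(TIME_SIGNATURES)}")
    fields = {"bpm": bpm, "keyscale": keyscale, "timesignature": timesignature}
    # Assumption: dropping a None field is equivalent to sending null.
    result = {name: value for name, value in fields.items() if value is not None}
    result["vocal_language"] = vocal_language
    return result


print(metadata_overrides(bpm=120, keyscale="Am"))
```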

Output & Quality Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| generate_lrc | bool | false | Generate synchronized LRC lyric timestamps |
| calculate_quality | bool | true | Calculate quality score (0.0-1.0) for the output |
| thumbnail_url | string | null | URL for custom track thumbnail image |
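
For karaoke-style playback, LRC text has to be turned into timed entries. The sketch below parses the common `[mm:ss.xx]lyric` line format. It assumes the API returns LRC in that standard form; the response field that carries it is not documented on this page, so the sample input is a literal string.

```python
import re
from typing import List, Tuple

# Matches a standard LRC time tag such as [01:23.45] or [01:23.456].
_LRC_LINE = re.compile(r"^\[(\d+):(\d{2})(?:\.(\d{1,3}))?\](.*)$")


def parse_lrc(text: str) -> List[Tuple[float, str]]:
    """Parse LRC text into (seconds, lyric) pairs, skipping untimed lines.

    Handles one time tag per line; lines with several tags are not split.
    """
    entries = []
    for line in text.splitlines():
        match = _LRC_LINE.match(line.strip())
        if not match:
            continue  # blank lines or ID tags such as [ar:Artist]
        minutes, seconds, fraction, lyric = match.groups()
        total = float(int(minutes) * 60 + int(seconds))
        if fraction:
            total += int(fraction) / (10 ** len(fraction))
        entries.append((total, lyric.strip()))
    return entries


sample = "[00:12.50]First line\n[00:17.00]Second line"
print(parse_lrc(sample))  # [(12.5, 'First line'), (17.0, 'Second line')]
```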

LoRA Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| lora_name_or_path | string | null | Path to LoRA adapter for fine-tuned styles |
| lora_weight | float | 1.0 | LoRA adapter weight (-3.0 to 3.0) |

Genre Presets

Pre-configured genre settings for quick generation:

| Preset | Description |
| --- | --- |
| Modern Pop | Contemporary pop with electronic elements |
| Rock | Guitar-driven rock music |
| Hip Hop | Modern hip hop beats and production |
| Jazz | Jazz compositions with improvisation |
| Classical | Orchestral and classical arrangements |
| Electronic | EDM, synth-driven electronic music |
| Country | Country music with acoustic instruments |
| Folk | Folk music with traditional elements |
| Blues | Blues with soulful expression |
| Reggae | Reggae with Caribbean rhythms |
| Latin | Latin music styles |
| R&B | Rhythm and blues |
| Metal | Heavy metal and rock subgenres |
| Custom | Custom genre (specify in caption/prompt) |

Output Formats

| Format | Extension | Quality | Use Case |
| --- | --- | --- | --- |
| FLAC | .flac | Lossless | Default. Best quality, larger files |
| WAV | .wav | Lossless | Uncompressed, maximum compatibility |
| MP3 | .mp3 | Lossy | Smaller files, streaming |
| OGG | .ogg | Lossy | Open format, good compression |

Job Lifecycle

All music generation is asynchronous:
  1. Submit a generation request → receive a job with status: "PENDING"
  2. Poll the job status endpoint → "PROCESSING" while generating
  3. Complete → status: "COMPLETED" with output_url and output_urls (multiple formats)
  4. Failed → status: "FAILED" with error_message (use the retry endpoint to re-queue)
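
The lifecycle above maps onto a simple polling loop. This is a sketch only: `fetch_status` is a caller-supplied function (hypothetical here) that returns the current job as a dict, since the status endpoint's path is not shown on this page. The 5-second interval and 10-minute timeout are assumptions, not documented limits.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(fetch_status: Callable[[], Dict[str, Any]],
                 poll_interval: float = 5.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the job is COMPLETED or FAILED, or the timeout expires.

    `fetch_status` must return a dict with at least a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            raise RuntimeError(job.get("error_message", "generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} s")
        # PENDING or PROCESSING: wait before asking again.
        sleep(poll_interval)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. The Quick Start call with `wait_for_completion=True` covers the same need when the SDK is used directly.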

Pricing

Music generation is billed in credits, tiered by the duration of the generated audio:

| Duration | Credits |
| --- | --- |
| Up to 30s | 5 credits |
| 30s - 60s | 10 credits |
| 60s - 120s | 15 credits |
| 120s - 300s | 25 credits |
| 300s - 600s | 40 credits |

Credits are reserved when the job is created and refunded if generation fails.
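
The pricing tiers can be looked up with a short function, which is handy for estimating cost before submitting. This sketch assumes each tier's upper bound is inclusive (exactly 60 s costs 10 credits), because the table does not say which tier a boundary value falls in. It also does not model batch_size, whose effect on price is not stated here.

```python
# (upper bound in seconds, credits), copied from the pricing table.
PRICING_TIERS = [(30, 5), (60, 10), (120, 15), (300, 25), (600, 40)]


def credits_for_duration(seconds: float) -> int:
    """Return the credits for one track of the given duration.

    Assumption: tier upper bounds are inclusive.
    """
    if seconds <= 0 or seconds > 600:
        raise ValueError("duration must be in (0, 600] seconds")
    for upper_bound, credits in PRICING_TIERS:
        if seconds <= upper_bound:
            return credits
    raise AssertionError("unreachable: 600 s is covered by the last tier")


print(credits_for_duration(45))   # 10
print(credits_for_duration(180))  # 25
```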

Best Practices

  1. Use Simple Mode for quick results — describe what you want in natural language and let AudioMusic V2 handle the details
  2. Enable thinking: true (default) for best quality — Chain-of-Thought reasoning significantly improves output
  3. Use caption over prompt — AudioMusic V2 uses caption internally; prompt is a legacy alias
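  4. Set inference_steps: 64 for production quality. Use 32 for fast drafts, or 8 for turbo previews (8 is below the 32-100 range listed under Diffusion Parameters, so confirm it is accepted for your model version)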
  5. Default guidance_scale: 7.0 works well for most cases. Increase to 10-15 for stronger prompt adherence
  6. Use structure tags in lyrics: [verse], [chorus], [bridge], [intro], [outro], [Instrumental]
  7. Use batch_size > 1 to generate multiple variations and pick the best one
  8. Enable generate_lrc: true when you need synchronized lyric timestamps
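
The structure tags from item 6 can be assembled programmatically. This is a minimal sketch: `build_lyrics` is a hypothetical helper, and the layout it produces (each tag on its own line, a blank line between sections) is an assumption, since this page lists the tags but not the exact formatting the model expects.

```python
from typing import List, Tuple

# Tags listed in the best practices above (lower-cased for comparison).
KNOWN_TAGS = {"verse", "chorus", "bridge", "intro", "outro", "instrumental"}


def build_lyrics(sections: List[Tuple[str, str]]) -> str:
    """Join (tag, text) pairs into one lyrics string with structure tags."""
    blocks = []
    for tag, text in sections:
        if tag.lower() not in KNOWN_TAGS:
            raise ValueError(f"unknown structure tag: {tag}")
        # Sections without text (e.g. an intro) keep only the tag line.
        blocks.append(f"[{tag}]\n{text.strip()}".strip())
    return "\n\n".join(blocks)


lyrics = build_lyrics([
    ("intro", ""),
    ("verse", "Quiet streets and evening light"),
    ("chorus", "Stay with me tonight"),
])
print(lyrics)
```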

Error Handling

| Status Code | Description |
| --- | --- |
| 200 | Success |
| 400 | Invalid parameters (check error message for details) |
| 401 | Authentication required |
| 402 | Insufficient credits |
| 404 | Job or resource not found |
| 429 | Rate limit exceeded (wait and retry) |
| 500 | Internal server error |

Common error messages:
  • “Insufficient credits”: Your wallet balance is too low. Top up at Dashboard.
  • “Job must be completed”: You tried to retake/extend/edit a job that hasn’t finished yet. Wait for completion first.
  • “Original music job not found”: The source job ID doesn’t exist or doesn’t belong to your account.
  • “Rate limit exceeded”: You’ve exceeded the rate limit. Wait a moment and try again.
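
The "wait and retry" advice for 429 responses can be sketched as exponential backoff. `send` is a caller-supplied function (hypothetical here) that returns a `(status_code, parsed_body)` pair. The attempt count and delays are assumptions, as this page does not publish rate-limit windows. Other error codes are raised immediately because retrying them unchanged is unlikely to help.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {429}  # rate limit: the table above says to wait and retry


def send_with_retry(send: Callable[[], Tuple[int, dict]],
                    max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `send`, retrying on 429 with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status in RETRYABLE and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
            continue
        raise RuntimeError(f"request failed with status {status}: {body}")
    raise AssertionError("unreachable: the loop always returns or raises")
```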

Next Steps