
Overview

AudioPod exposes OpenAI-compatible audio endpoints, so code written for the OpenAI audio API runs against AudioPod after changing only two things: the base URL and the API key. Use the official OpenAI SDKs; no AudioPod-specific client is required.
Endpoint                        Purpose
POST /v1/audio/speech           Text to speech; returns audio bytes
POST /v1/audio/transcriptions   Speech to text (source language)
POST /v1/audio/translations     Speech to text, translated to English
These three are the only audio shapes the OpenAI API defines. AudioPod’s other capabilities — stem separation, music generation, speaker separation, voice cloning, noise reduction, media conversion — are available through the native REST API, SDKs and MCP server.

Configuration

Set the base URL to https://api.audiopod.ai/api/v1 and use your AudioPod API key (starts with ap_). Create one in your dashboard.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.audiopod.ai/api/v1",
    api_key="ap_your_api_key",
)
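Hardcoding the key works for a quick test, but a key is more commonly read from the environment. The following is a minimal sketch of that, not an AudioPod-provided utility: the variable name AUDIOPOD_API_KEY and the helper load_audiopod_settings are assumptions made for illustration. The ap_ prefix check relies only on the key format described above.

```python
import os

AUDIOPOD_BASE_URL = "https://api.audiopod.ai/api/v1"


def load_audiopod_settings(env=None):
    """Return keyword arguments for OpenAI(...) built from the environment.

    AUDIOPOD_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("AUDIOPOD_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("AUDIOPOD_API_KEY is not set")
    if not api_key.startswith("ap_"):
        # AudioPod keys start with ap_; anything else (for example an OpenAI
        # sk-... key) probably means the wrong credential was carried over.
        raise RuntimeError("AUDIOPOD_API_KEY does not look like an AudioPod key")
    return {"base_url": AUDIOPOD_BASE_URL, "api_key": api_key}
```

With the variable exported, the client can then be created as OpenAI(**load_audiopod_settings()).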
Calls are billed to your API wallet (prepaid, in USD). Nothing is charged for a request that fails — you’re only billed once audio or a transcript is produced.

Text to speech

POST /v1/audio/speech synthesizes input with the requested voice and returns the audio bytes.
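# Stream the synthesized audio to disk instead of buffering it all in memory.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input="Hello from AudioPod.",
    response_format="mp3",  # mp3 | opus | aac | flac | wav | pcm
) as response:
    response.stream_to_file("speech.mp3")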
Field             Notes
input             The text to synthesize (required).
voice             A voice name (e.g. nova, onyx) or a voice ID. List options at GET /api/v1/voice/voice-profiles.
response_format   mp3 (default), opus, aac, flac, wav, pcm.
speed             0.25 to 4.0 (default 1.0).
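The field constraints above can be checked on the client before a request is sent. The sketch below shows one way to do so; build_speech_request is a hypothetical helper, not part of the OpenAI SDK or of AudioPod, and it assumes the speed range is 0.25 to 4.0 inclusive.

```python
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_request(text, voice="nova", response_format="mp3", speed=1.0):
    """Validate the text-to-speech fields and return kwargs for
    client.audio.speech.create(). Hypothetical helper for illustration."""
    if not text:
        raise ValueError("input is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        # Assumes both ends of the documented range are allowed.
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": "tts-1",
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dictionary can be unpacked into the call, for example client.audio.speech.create(**build_speech_request("Hello", speed=1.25)).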

Transcription

POST /v1/audio/transcriptions converts speech to text in the source language.
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        response_format="json",  # json | text | verbose_json | srt | vtt
    )
print(transcript.text)
Field             Notes
file              The audio file to transcribe (required).
language          Optional ISO-639-1 hint (e.g. en); auto-detected when omitted.
prompt            Optional text to guide the transcription.
response_format   json (default), text, verbose_json, srt, vtt.
verbose_json includes the detected language, duration, and segment-level timestamps.
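As a rough illustration of working with verbose_json, the sketch below prints one line per segment. It assumes the payload uses the OpenAI field names (segments, each with start, end and text); this page only states that segment-level timestamps are included, so treat the exact keys as an assumption. The sample payload is made up for the example and is not real API output.

```python
def format_segments(payload):
    """Turn a verbose_json payload (as a dict) into '[start - end] text' lines."""
    lines = []
    for segment in payload.get("segments", []):
        start = float(segment["start"])
        end = float(segment["end"])
        lines.append(f"[{start:.2f}s - {end:.2f}s] {segment['text'].strip()}")
    return lines


# Made-up payload for illustration only.
sample = {
    "language": "english",
    "duration": 4.2,
    "segments": [
        {"start": 0.0, "end": 2.1, "text": " Hello from AudioPod."},
        {"start": 2.1, "end": 4.2, "text": " This is a test."},
    ],
}

for line in format_segments(sample):
    print(line)
```

With the OpenAI Python SDK, calling model_dump() on the object returned for response_format="verbose_json" should give a dictionary that can be passed to this helper.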

Translation

POST /v1/audio/translations converts speech in any language to English text. The request shape matches transcription (the language field is not used — output is always English).
with open("spanish.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model="whisper-1",
        file=f,
    )
print(translation.text)  # English
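Because the two request shapes match, a single helper can cover both endpoints. The sketch below is one possible way to wrap them; speech_to_text and its to_english flag are hypothetical names introduced here, and client is assumed to be an OpenAI client configured as shown in the Configuration section.

```python
def speech_to_text(client, path, to_english=False):
    """Transcribe an audio file, or translate it to English when to_english is True."""
    # Both endpoints accept the same model and file arguments.
    endpoint = client.audio.translations if to_english else client.audio.transcriptions
    with open(path, "rb") as f:
        result = endpoint.create(model="whisper-1", file=f)
    return result.text
```

For example, speech_to_text(client, "spanish.mp3") would return Spanish text, while speech_to_text(client, "spanish.mp3", to_english=True) would return English.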

Notes & differences

  • Auth: pass your AudioPod API key as a Bearer token (the scheme the OpenAI SDKs use). X-API-Key is also accepted.
  • Models: the model field is accepted for compatibility. AudioPod selects the appropriate engine for each request; you don’t need to change model strings when migrating.
  • Voices: common OpenAI voice names map to AudioPod voices. Browse the full catalog — including custom voice clones — at GET /api/v1/voice/voice-profiles and pass any voice name or ID as voice.
  • Errors follow the OpenAI shape ({ "error": { "message", "type", "param" } }), so existing OpenAI error handling works unchanged.
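To make the error shape concrete, the sketch below summarizes an error body of the form { "error": { "message", "type", "param" } }. describe_error is a hypothetical helper, and the sample body (including its type value) is invented for illustration rather than copied from a real AudioPod response.

```python
import json


def describe_error(body):
    """Summarize an OpenAI-shaped error body as 'type: message (param: name)'."""
    try:
        payload = json.loads(body)
    except ValueError:
        return "unparseable error body"
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return "unexpected error shape"
    summary = f"{error.get('type') or 'error'}: {error.get('message') or '(no message)'}"
    if error.get("param"):
        summary += f" (param: {error['param']})"
    return summary


# Invented example body for illustration only.
sample_body = '{"error": {"message": "Unknown voice", "type": "invalid_request_error", "param": "voice"}}'
print(describe_error(sample_body))
```

When using the OpenAI Python SDK, HTTP errors are raised as openai.APIStatusError subclasses, so an existing except block should keep working; the exception's response.text could be passed to a helper like this one for logging.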

Beyond OpenAI

Already migrated? AudioPod does more than the OpenAI audio API. Explore stem separation, music generation, speaker separation, and voice cloning through the native API and SDKs.