Overview
AudioPod exposes OpenAI-compatible audio endpoints, so code written for the OpenAI audio API runs against AudioPod with only two changes: the base URL and the API key. Use the official OpenAI SDKs; no AudioPod-specific client is required.

| Endpoint | Purpose |
|---|---|
| POST /v1/audio/speech | Text to speech; returns audio bytes |
| POST /v1/audio/transcriptions | Speech to text (source language) |
| POST /v1/audio/translations | Speech to text, translated to English |

These three are the only audio shapes the OpenAI API defines. AudioPod’s other
capabilities (stem separation, music generation, speaker separation, voice
cloning, noise reduction, media conversion) are available through the native
REST API, SDKs, and MCP server.
Configuration
Set the base URL to https://api.audiopod.ai/api/v1 and use your AudioPod API
key (starts with ap_). Create one in your dashboard.
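A minimal sketch of this configuration in Python, assuming the official `openai` package (v1 or later) is installed. The helper names and the `AUDIOPOD_API_KEY` environment variable are illustrative choices, not AudioPod conventions.

```python
import os

# Base URL from the Configuration section above.
AUDIOPOD_BASE_URL = "https://api.audiopod.ai/api/v1"


def audiopod_client_kwargs(api_key: str) -> dict:
    """Return keyword arguments for the OpenAI SDK client constructor."""
    # AudioPod keys start with "ap_"; fail early on an obviously wrong key.
    if not api_key.startswith("ap_"):
        raise ValueError("expected an AudioPod API key starting with 'ap_'")
    return {"base_url": AUDIOPOD_BASE_URL, "api_key": api_key}


def make_client():
    """Build an OpenAI SDK client that sends requests to AudioPod."""
    # Imported lazily so the helper above works without the SDK installed.
    from openai import OpenAI  # pip install openai

    # AUDIOPOD_API_KEY is a hypothetical variable name chosen for this sketch.
    return OpenAI(**audiopod_client_kwargs(os.environ["AUDIOPOD_API_KEY"]))
```

Keeping the two settings in one helper means existing OpenAI code only needs its client construction swapped; the calls made on the client stay the same.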
Text to speech
POST /v1/audio/speech synthesizes input with the requested voice and
returns the audio bytes.
| Field | Notes |
|---|---|
| input | The text to synthesize (required). |
| voice | A voice name (e.g. nova, onyx) or a voice ID. List options at GET /api/v1/voice/voice-profiles. |
| response_format | mp3 (default), opus, aac, flac, wav, pcm. |
| speed | 0.25–4.0 (default 1.0). |

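The fields above can be sketched as a small Python helper that validates the documented ranges before calling the OpenAI SDK. This is an illustration only: `speech_params` and `synthesize_to_file` are hypothetical names, `client` is assumed to be an OpenAI SDK client already configured for AudioPod, and `"tts-1"` is a placeholder passed because the SDK requires a `model` argument.

```python
# Formats and speed range taken from the field table above.
SPEECH_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def speech_params(text, voice="nova", response_format="mp3", speed=1.0):
    """Validate and assemble the arguments for a speech request."""
    if not text:
        raise ValueError("input text is required")
    if response_format not in SPEECH_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": "tts-1",  # placeholder; the SDK requires a model argument
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


def synthesize_to_file(client, text, path, **options):
    """Request speech through an OpenAI SDK client and save the audio bytes."""
    response = client.audio.speech.create(**speech_params(text, **options))
    with open(path, "wb") as out:
        out.write(response.content)  # raw audio bytes in the requested format
```

A call such as `synthesize_to_file(client, "Hello from AudioPod", "hello.wav", voice="onyx", response_format="wav")` would then write the returned audio to disk.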
Transcription
POST /v1/audio/transcriptions converts speech to text in the source language.
| Field | Notes |
|---|---|
| file | The audio file to transcribe (required). |
| language | Optional ISO-639-1 hint (e.g. en); auto-detected when omitted. |
| prompt | Optional text to guide the transcription. |
| response_format | json (default), text, verbose_json, srt, vtt. |

verbose_json includes the detected language, duration, and segment-level
timestamps.
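As a rough sketch of working with verbose_json output in Python, the helper below turns segment timestamps into readable lines. It assumes the response uses the OpenAI field names (`segments`, each with `start`, `end`, and `text`); this page does not list them, so treat them as an assumption. The function names are hypothetical, and `"whisper-1"` is a placeholder passed because the SDK requires a `model` argument.

```python
def format_segments(verbose: dict) -> list[str]:
    """Render each segment as '[start-end] text', with times in seconds."""
    lines = []
    for seg in verbose.get("segments", []):
        lines.append(f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text'].strip()}")
    return lines


def transcribe_verbose(client, path, language=None):
    """Transcribe a file with an OpenAI SDK client and return a plain dict."""
    kwargs = {"model": "whisper-1", "response_format": "verbose_json"}
    if language:
        kwargs["language"] = language  # optional ISO-639-1 hint, e.g. "en"
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(file=audio, **kwargs)
    return result.model_dump()  # SDK response objects are pydantic models
```

For example, `format_segments(transcribe_verbose(client, "meeting.mp3"))` would yield one timestamped line per segment, where `meeting.mp3` is a made-up file name.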
Translation
POST /v1/audio/translations converts speech in any language to English
text. The request shape matches transcription (the language field is not used
— output is always English).
Notes & differences
- Auth: pass your AudioPod API key as a Bearer token (the scheme the OpenAI SDKs use). X-API-Key is also accepted.
- Models: the model field is accepted for compatibility. AudioPod selects the appropriate engine for each request; you don’t need to change model strings when migrating.
- Voices: common OpenAI voice names map to AudioPod voices. Browse the full catalog, including custom voice clones, at GET /api/v1/voice/voice-profiles and pass any voice name or ID as voice.
- Errors follow the OpenAI shape ({ "error": { "message", "type", "param" } }), so existing OpenAI error handling works unchanged.
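To illustrate the error shape, here is a minimal Python sketch that summarizes an error response body using only the standard library. The function name is hypothetical, and the sample values in the usage note are invented for illustration; only the three keys `message`, `type`, and `param` come from the description above.

```python
import json


def describe_error(body: str) -> str:
    """Summarize an OpenAI-shaped error body as a single line."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unparseable error body"
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        return "unexpected error shape"
    summary = f"{err.get('type', 'error')}: {err.get('message', 'no message')}"
    # param names the offending request field, when one is reported.
    if err.get("param"):
        summary += f" (param: {err['param']})"
    return summary
```

Given a body such as `{"error": {"message": "Unknown voice", "type": "invalid_request_error", "param": "voice"}}`, the helper returns `invalid_request_error: Unknown voice (param: voice)`. When using the OpenAI SDKs, the same information surfaces through the SDK's own exception classes.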
