Generating Spoken Audio from GPT Responses

In this lesson, you'll learn how to generate both text and audio output using OpenAI’s gpt-4o-audio-preview model, and save the audio as a .wav file.

Prerequisites

Install the OpenAI Python SDK (by default the client reads your API key from the OPENAI_API_KEY environment variable):

pip install openai

The Code

import base64
from openai import OpenAI

# Create the API client.
client = OpenAI()

# Request both a text and a spoken (WAV) response.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

# Inspect the first choice returned by the API.
print(completion.choices[0])

# The audio arrives Base64-encoded; decode it and write the raw bytes to disk.
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)

Explanation

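  • model="gpt-4o-audio-preview": Selects a model that can return audio as well as text.
  • modalities=["text", "audio"]: Requests both output types in the response.
  • audio.voice: Chooses the voice for the output (e.g., alloy).
  • audio.format: Sets the encoding of the returned audio (wav in this example).
  • completion.choices[0].message.audio.data: Contains the audio as a Base64 string, which must be decoded before it is written to a file.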
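
The decoding step can be tried without calling the API. The sketch below is only an illustration under stated assumptions: save_audio_reply and make_silent_wav are hypothetical helpers (not part of the OpenAI SDK), and fake_message is a stand-in object that mimics the shape of completion.choices[0].message with a short silent WAV as its payload.

```python
import base64
import io
import wave
from types import SimpleNamespace


def save_audio_reply(message, path):
    """Decode message.audio.data (Base64) and write the bytes to path.

    Hypothetical helper, not an SDK function. Returns the number of bytes
    written and raises ValueError when the message carries no audio.
    """
    audio = getattr(message, "audio", None)
    if audio is None or not audio.data:
        raise ValueError("message has no audio payload")
    wav_bytes = base64.b64decode(audio.data)
    with open(path, "wb") as f:
        f.write(wav_bytes)
    return len(wav_bytes)


def make_silent_wav(num_frames=800, rate=8000):
    """Build a small mono 16-bit WAV in memory to use as sample data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


# Stand-in for completion.choices[0].message; no API request is made here.
fake_message = SimpleNamespace(
    audio=SimpleNamespace(
        data=base64.b64encode(make_silent_wav()).decode("ascii")
    )
)
```

Calling save_audio_reply(fake_message, "sample.wav") writes a file that the standard-library wave module can open. With a real response, the same helper could be passed completion.choices[0].message instead of the stand-in.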

Use Case

Best for:

  • Text-to-speech assistants
  • Conversational AI with audio responses
  • Voice-enabled apps and bots