Generating Spoken Audio from GPT Responses
In this lesson, you'll learn how to generate both text and audio output using OpenAI’s gpt-4o-audio-preview model, and save the audio as a .wav file.
Prerequisites
Install the OpenAI SDK:
pip install openai
The Code
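import base64
from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Request both a text and an audio response in a single call.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

# Prints the full choice, including the long Base64 audio string.
print(completion.choices[0])

# The audio arrives Base64-encoded; decode it before writing it to disk.
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)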
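Printing the whole choice dumps the entire Base64 string to the terminal. A minimal sketch of a quieter summary follows. It assumes the audio object also exposes a transcript field alongside data. The helper name describe_audio_reply and the stand-in message built with SimpleNamespace are hypothetical, used only so the example runs without an API call.

```python
from types import SimpleNamespace


def describe_audio_reply(message):
    """Return (transcript, base64_length) for a message that may carry audio."""
    audio = getattr(message, "audio", None)
    if audio is None:
        # The model returned no audio part for this message.
        return None, 0
    # Assumption: the audio object has a `transcript` attribute next to `data`.
    return getattr(audio, "transcript", None), len(audio.data)


# Hypothetical stand-in for completion.choices[0].message (no network needed).
fake_message = SimpleNamespace(
    audio=SimpleNamespace(transcript="Yes, they usually are.", data="UklGRg==")
)

transcript, b64_length = describe_audio_reply(fake_message)
print(transcript)
print(b64_length)
```

With a real response, the same helper could be called as describe_audio_reply(completion.choices[0].message).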
Explanation
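- model="gpt-4o-audio-preview": Selects a model that can return both text and audio.
- modalities=["text", "audio"]: Specifies the desired output types.
- audio.voice: Chooses the voice for the output (e.g., alloy).
- audio.format: Sets the encoding of the returned audio (here, wav).
- completion.choices[0].message.audio.data: Contains the Base64-encoded audio, which must be decoded before it is written to a file.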
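The decode-and-save step can be checked without calling the API. The sketch below is illustrative only: save_b64_wav and make_silent_wav_b64 are hypothetical helper names, and the silent clip (mono, 16-bit, 24000 Hz) is a stand-in for real model output, not a statement about what the API returns. It decodes a Base64 string, writes the file, then reopens it with Python's standard wave module to confirm it is a readable WAV.

```python
import base64
import io
import os
import tempfile
import wave


def save_b64_wav(b64_data, path):
    """Decode Base64 WAV data, write it to path, and return
    (channels, sample_rate, frame_count) read back from the file."""
    wav_bytes = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(wav_bytes)
    # Reopening with the wave module raises wave.Error if the header is invalid.
    with wave.open(path, "rb") as w:
        return w.getnchannels(), w.getframerate(), w.getnframes()


def make_silent_wav_b64(frames=2400, rate=24000):
    """Build a short silent WAV in memory and return it Base64-encoded.
    Stand-in for message.audio.data; the parameters are arbitrary."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * frames)
    return base64.b64encode(buf.getvalue()).decode("ascii")


out_path = os.path.join(tempfile.gettempdir(), "sample.wav")
info = save_b64_wav(make_silent_wav_b64(), out_path)
print(info)
```

With a real response, the first argument would be completion.choices[0].message.audio.data.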
Use Case
Best for:
- Text-to-speech assistants
- Conversational AI with audio responses
- Voice-enabled apps and bots