Generating Spoken Audio from GPT Responses

In this lesson, you'll learn how to generate both text and audio output using OpenAI’s gpt-4o-audio-preview model, and save the audio as a .wav file.

Prerequisites

Install the OpenAI SDK:

pip install openai
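
The OpenAI client reads your API key from the OPENAI_API_KEY environment variable by default, so it helps to confirm the variable is set before running the lesson code. The check below is a minimal sketch; the helper name api_key_configured is hypothetical and not part of the SDK.

```python
import os


def api_key_configured(env=None) -> bool:
    """Return True if OPENAI_API_KEY is present and non-empty.

    `env` is any mapping of environment variables; it defaults to os.environ.
    This helper is illustrative only and is not part of the OpenAI SDK.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


# Report the result without ever printing the key itself.
print("API key configured:", api_key_configured())
```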

The Code

import base64
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

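# Print the first choice. Note that this includes the full Base64 audio payload.
print(completion.choices[0])

# Decode the Base64-encoded audio and write it to disk as a WAV file.
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)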

Explanation

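model="gpt-4o-audio-preview": Selects a model that can return audio as well as text.
modalities=["text", "audio"]: Requests both output types in the response.
audio.voice: Chooses the voice for the spoken output (e.g., alloy).
audio.format: Sets the encoding of the returned audio (wav in this example).
completion.choices[0].message.audio.data: Contains the audio as a Base64-encoded string, which must be decoded before it is written to a file.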
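
The decode-and-save step can be tried without calling the API. The sketch below wraps it in a helper and checks the result with Python's standard wave module. All function names here are hypothetical, the Base64 string is a generated clip of silence standing in for message.audio.data, and the duration check assumes the file is a standard PCM WAV with an accurate header.

```python
import base64
import io
import os
import tempfile
import wave


def save_base64_wav(b64_audio: str, path: str) -> int:
    """Decode Base64 audio, write it to `path`, and return the byte count."""
    wav_bytes = base64.b64decode(b64_audio)
    with open(path, "wb") as f:
        f.write(wav_bytes)
    return len(wav_bytes)


def wav_duration_seconds(path: str) -> float:
    """Read the WAV header and return the clip length in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def make_silent_wav_b64(seconds: float = 1.0, sample_rate: int = 8000) -> str:
    """Build a mono 16-bit WAV of silence and return it Base64-encoded.

    This stands in for the API's audio payload so the example runs offline.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample (16-bit)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return base64.b64encode(buffer.getvalue()).decode("ascii")


# Stand-in for completion.choices[0].message.audio.data
fake_audio_data = make_silent_wav_b64(seconds=1.0)

out_path = os.path.join(tempfile.mkdtemp(), "sample.wav")
size = save_base64_wav(fake_audio_data, out_path)
print(f"Wrote {size} bytes, duration {wav_duration_seconds(out_path):.2f} s")
```

With a real response, you would pass completion.choices[0].message.audio.data to save_base64_wav in place of the generated string.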

Use Case

Best for:

Text-to-speech assistants
Conversational AI with audio responses
Voice-enabled apps and bots
