Analyzing and Responding to Audio Input

In this lesson, you'll learn how to send an audio file to OpenAI’s gpt-4o-audio-preview model for transcription and analysis, and receive a reply that contains both text and audio.

Prerequisites

Install dependencies:

pip install openai requests

The Code

import base64
import requests
from openai import OpenAI

client = OpenAI()

# Fetch the audio file and convert it to a base64 encoded string
url = "https://cdn.openai.com/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content
encoded_string = base64.b64encode(wav_data).decode('utf-8')

# Ask the model to describe the recording, requesting both text and audio output
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},  # voice and format for the audio reply
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this recording?"
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)

print(completion.choices[0].message)
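
Because the request asks for both modalities, the model's spoken reply is attached to message.audio rather than message.content: audio.transcript holds the text and audio.data holds Base64-encoded WAV bytes. A short follow-up to decode and save the reply (the output filename is arbitrary):

# The spoken reply arrives Base64-encoded on message.audio
audio_reply = completion.choices[0].message.audio
print(audio_reply.transcript)  # text transcript of the audio reply

# Decode and save the WAV bytes ("reply.wav" is an arbitrary filename)
with open("reply.wav", "wb") as f:
    f.write(base64.b64decode(audio_reply.data))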

Explanation

  • Downloads a sample .wav file and encodes it as a Base64 string.
  • Sends the audio alongside a text prompt asking the model to describe it.
  • input_audio: accepts Base64-encoded audio content directly in the request (a local file can be encoded the same way; see the sketch after this list).
  • Returns a structured message; since both modalities were requested, the reply carries a transcript plus Base64 audio data, as shown in the decoding snippet above.
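
If your audio is a local file rather than a URL, the same encoding applies. A minimal sketch, assuming a hypothetical local file named meeting.wav:

import base64

# Read and Base64-encode a local WAV file ("meeting.wav" is a hypothetical path)
with open("meeting.wav", "rb") as f:
    encoded_string = base64.b64encode(f.read()).decode("utf-8")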

Use Case

Perfect for:

  • Transcribing meetings, voice notes, or lectures
  • Audio-based customer support
  • Building multimodal assistants that understand voice input