Analyzing and Responding to Audio Input

In this lesson, you'll learn how to send an audio file to OpenAI’s gpt-4o-audio-preview model for transcription and analysis, and receive a reply that contains both text and audio.

Prerequisites

Install dependencies:

pip install openai requests

The Code

import base64
import requests
from openai import OpenAI

client = OpenAI()

# Fetch the audio file and convert it to a base64 encoded string
url = "https://cdn.openai.com/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content
encoded_string = base64.b64encode(wav_data).decode('utf-8')

# Ask the model to describe the recording, requesting both text and audio output
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},  # voice and format for the audio reply
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this recording?"
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)

print(completion.choices[0].message)
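
Because the request asks for both modalities, the model's spoken reply is attached to message.audio rather than message.content: audio.transcript holds the text and audio.data holds Base64-encoded WAV bytes. A short follow-up to decode and save the reply (the output filename is arbitrary):

# The spoken reply arrives Base64-encoded on message.audio
audio_reply = completion.choices[0].message.audio
print(audio_reply.transcript)  # text transcript of the audio reply

# Decode and save the WAV bytes ("reply.wav" is an arbitrary filename)
with open("reply.wav", "wb") as f:
    f.write(base64.b64decode(audio_reply.data))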

Explanation

  • Downloads a sample .wav file and encodes it as a Base64 string.
  • Sends the audio alongside a text prompt asking the model to describe it.
  • input_audio: accepts Base64-encoded audio content directly in the request (a local file can be encoded the same way; see the sketch after this list).
  • Returns a structured message; since both modalities were requested, the reply carries a transcript plus Base64 audio data, as shown in the decoding snippet above.
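
If your audio is a local file rather than a URL, the same encoding applies. A minimal sketch, assuming a hypothetical local file named meeting.wav:

import base64

# Read and Base64-encode a local WAV file ("meeting.wav" is a hypothetical path)
with open("meeting.wav", "rb") as f:
    encoded_string = base64.b64encode(f.read()).decode("utf-8")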

Use Case

Perfect for:

  • Transcribing meetings, voice notes, or lectures
  • Audio-based customer support
  • Building multimodal assistants that understand voice input