Analyzing and Responding to Audio Input
In this lesson, you'll learn how to send an audio file to OpenAI’s gpt-4o-audio-preview model for transcription and analysis, and receive both textual and audio responses.
Prerequisites
Install dependencies:
pip install openai requests
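You'll also need an OpenAI API key available as the OPENAI_API_KEY environment variable; the OpenAI() client reads it automatically.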
The Code
import base64
import requests
from openai import OpenAI
client = OpenAI()
# Fetch the audio file and convert it to a base64 encoded string
url = "https://cdn.openai.com/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content
encoded_string = base64.b64encode(wav_data).decode('utf-8')
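# Ask for both a text and an audio reply; the spoken reply uses the "alloy" voice in WAV format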
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this recording?"
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)
print(completion.choices[0].message)
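When audio output is requested, the reply arrives in the message's audio field rather than in content: audio.transcript holds the text of the reply, and audio.data holds the Base64-encoded WAV bytes. A minimal sketch for reading both (the output filename response.wav is just an example):

message = completion.choices[0].message
print(message.audio.transcript)  # the spoken reply as text

# Decode the Base64 audio payload and save it as a playable WAV file
with open("response.wav", "wb") as f:
    f.write(base64.b64decode(message.audio.data))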
Explanation
- Downloads a .wav file and encodes it into Base64.
- Sends the audio along with a text prompt for transcription or analysis.
- input_audio: accepts Base64 audio content directly in the request (a local-file variant is sketched below).
- Returns a structured message that includes the transcribed text and, optionally, audio.
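If your recording lives on disk instead of at a URL, the same encoding step works on a local file. A minimal sketch, assuming a local file named voice_note.wav (the filename is hypothetical):

with open("voice_note.wav", "rb") as f:  # hypothetical local recording
    encoded_string = base64.b64encode(f.read()).decode("utf-8")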
Use Cases
Perfect for:
- Transcribing meetings, voice notes, or lectures (see the text-only sketch after this list)
- Audio-based customer support
- Building multimodal assistants that understand voice input
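For transcription-focused use cases where you only need text back, one variant (reusing encoded_string from earlier) is to request only the text modality, which makes the audio parameter unnecessary. A sketch under those assumptions, with an example prompt:

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text"],  # text-only reply; no audio parameter needed
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this recording verbatim."},
                {
                    "type": "input_audio",
                    "input_audio": {"data": encoded_string, "format": "wav"}
                }
            ]
        }
    ]
)
print(completion.choices[0].message.content)  # plain text, since no audio output was requested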