Vision Input with GPT-4o
In this lesson, you'll learn how to use OpenAI’s GPT-4o model to interpret and respond to image inputs. This is useful for building AI tools that understand visual content.
Prerequisites
Ensure you have:
- Python 3.8 or later
- The OpenAI Python SDK installed:

```bash
pip install openai
```
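By default, the client reads your credentials from the `OPENAI_API_KEY` environment variable. If you prefer to pass the key explicitly, a minimal sketch (assuming the variable is already set):

```python
import os

from openai import OpenAI

# The SDK picks up OPENAI_API_KEY from the environment automatically;
# passing api_key explicitly, as here, is equivalent.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```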
The Code
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        # A plain-text message with the question.
        {"role": "user", "content": "What's in the image?"},
        # A second message whose content is a list of typed parts;
        # here, a single image referenced by URL.
        {
            "role": "user",
            "content": [
                {
                    "type": "input_image",
                    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
                }
            ],
        },
    ],
)

# The model's text answer, aggregated across output items.
print(response.output_text)
```
Explanation
- `input=[...]`: A list containing a text prompt and an image reference.
- `"type": "input_image"`: Tells the API this content part is an image.
- `"image_url"`: A publicly accessible image URL to analyze.
- `response.output_text`: The model's interpretation of the image.
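The question and the image can also travel in a single user message by pairing an `input_text` part with the `input_image` part, which keeps the prompt and the image it refers to together. A minimal sketch of that variant:

```python
from openai import OpenAI

client = OpenAI()

# One user message carrying both a text part and an image part.
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What's in the image?"},
                {
                    "type": "input_image",
                    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
                },
            ],
        }
    ],
)
print(response.output_text)
```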
Note
- When passed as a URL, the image must be hosted online and accessible via HTTPS. For local files, a base64-encoded data URL works instead; see the sketch after this list.
- This feature is only supported by models with vision capability, such as gpt-4o and gpt-4o-mini, not legacy GPT-3.5/GPT-4 models.
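If your image is a local file rather than a hosted one, you can embed it as a base64-encoded data URL in the same `image_url` field. A minimal sketch, assuming a JPEG named `photo.jpg` (an illustrative placeholder path) in the working directory:

```python
import base64

from openai import OpenAI

client = OpenAI()

# Read the local file and encode it as a base64 data URL.
# "photo.jpg" is a placeholder path for illustration.
with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What's in the image?"},
                {"type": "input_image", "image_url": f"data:image/jpeg;base64,{b64}"},
            ],
        }
    ],
)
print(response.output_text)
```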
Use Case Ideas
- Visual content captioning (see the sketch after this list)
- Product cataloging
- Accessibility for the visually impaired
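As a starting point for the captioning idea, here is a minimal sketch that wraps the call in a reusable function. `caption_image` is an illustrative name, not part of the SDK:

```python
from openai import OpenAI

client = OpenAI()

def caption_image(image_url: str) -> str:
    """Return a one-sentence caption for a publicly accessible image.

    An illustrative helper, not an SDK function.
    """
    response = client.responses.create(
        model="gpt-4o-mini",
        input=[
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Describe this image in one sentence."},
                    {"type": "input_image", "image_url": image_url},
                ],
            }
        ],
    )
    return response.output_text

print(caption_image("https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg"))
```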