Vision Input with GPT-4o

In this lesson, you'll learn how to use OpenAI’s GPT-4o model to interpret and respond to image inputs. This is useful for building AI tools that understand visual content.

Prerequisites

Ensure you have:

  • Python 3.8 or later
  • The OpenAI Python SDK installed:
pip install openai
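Before running any code, the SDK also needs an API key. A minimal setup sketch, assuming your key is stored in the OPENAI_API_KEY environment variable (the client reads this variable by default, so the explicit api_key argument below is optional):

import os

from openai import OpenAI

# OpenAI() reads OPENAI_API_KEY from the environment by default.
# Passing api_key explicitly is equivalent; never hard-code real keys in source.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])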

The Code

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                # The text prompt and the image travel together as
                # parts of a single user message.
                {"type": "input_text", "text": "What's in the image?"},
                {
                    "type": "input_image",
                    "image_url": "https://images.pexels.com/photos/1108099/pexels-photo-1108099.jpeg",
                },
            ],
        }
    ],
)

print(response.output_text)

Explanation

  • input=[...]: A list of messages; here, a single user message whose content combines a text part and an image part.
  • "type": "input_text": The text prompt asking about the image.
  • "type": "input_image": Tells the API this content part is an image.
  • "image_url": A publicly accessible HTTPS URL of the image to analyze.
  • response.output_text: The model's text answer describing the image.
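The content list is not limited to a single image. As a sketch (both URLs below are placeholders), you can include several input_image parts alongside one input_text part to ask about multiple images in a single request:

from openai import OpenAI

client = OpenAI()

# Hedged sketch: comparing two images in one request.
# Both URLs are placeholders; substitute any HTTPS-accessible images.
response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What do these two images have in common?"},
                {"type": "input_image", "image_url": "https://example.com/photo-1.jpg"},
                {"type": "input_image", "image_url": "https://example.com/photo-2.jpg"},
            ],
        }
    ],
)

print(response.output_text)

Because all the parts arrive in the same message, the model can compare or relate the images in one answer.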

Note

  • The image must be reachable over a public HTTPS URL, or sent inline as a base64 data URL (see the sketch below).
  • Vision input is only supported by vision-capable models such as gpt-4o and gpt-4o-mini, not legacy GPT-3.5/GPT-4 text models.
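If an image lives on your disk rather than on the web, it can be sent inline as a base64-encoded data URL instead of a hosted link. A sketch under that assumption; cat.jpg is a placeholder filename:

import base64

from openai import OpenAI

client = OpenAI()

# Hedged sketch: send a local file inline as a base64 data URL.
# "cat.jpg" is a placeholder path; adjust the MIME type to match your file.
with open("cat.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What's in the image?"},
                {"type": "input_image", "image_url": f"data:image/jpeg;base64,{b64}"},
            ],
        }
    ],
)

print(response.output_text)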

Use Case Ideas

  • Visual content captioning (see the sketch after this list)
  • Product cataloging
  • Accessibility for the visually impaired
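To make the first idea concrete, here is a minimal captioning sketch. The helper name caption, the prompt wording, and the URLs are illustrative assumptions, not part of the API:

from openai import OpenAI

client = OpenAI()

# Placeholder URLs; any HTTPS-accessible images work here.
image_urls = [
    "https://example.com/product-1.jpg",
    "https://example.com/product-2.jpg",
]

def caption(url: str) -> str:
    """Ask the model for a one-sentence caption of the image at `url`."""
    response = client.responses.create(
        model="gpt-4o-mini",
        input=[
            {
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Describe this image in one sentence."},
                    {"type": "input_image", "image_url": url},
                ],
            }
        ],
    )
    return response.output_text

for url in image_urls:
    print(f"{url} -> {caption(url)}")

The same loop structure works for product cataloging or alt-text generation; only the prompt changes.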