
Streaming Responses from OpenAI

In this lesson, you'll learn how to receive responses in real time using OpenAI's streaming API, allowing your app to react as soon as the model begins generating output instead of waiting for the full response.

Prerequisites

Make sure to:

  • Use a model that supports streaming (like gpt-4.1)
  • Install the OpenAI Python SDK:
pip install openai

The Code

from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-4.1",
    input=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
    stream=True,
)

# Only print the text deltas as they arrive
for event in stream:
    if hasattr(event, "delta"):
        print(event.delta, end="", flush=True)

Explanation

  • stream=True: Tells the API to return an iterable of events as the model generates output, rather than a single response object at the end.
  • event.delta: Carries a partial chunk of the model's text output. Not every event in the stream is a text chunk (some are lifecycle events, such as the response starting or finishing), which is why the code checks for the delta attribute before printing.
  • flush=True: Forces each chunk to appear on screen immediately instead of sitting in Python's stdout buffer.
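Printing deltas is often not enough; a chat app usually also needs the full reply once the stream ends. The helper below sketches that pattern: it prints each chunk as it arrives and joins the chunks afterward. The simulated event list (built with SimpleNamespace, including a hypothetical non-delta lifecycle event) is a stand-in for the real stream object returned by client.responses.create, so the sketch runs without an API key.

```python
from types import SimpleNamespace

def collect_stream(events):
    """Print each text delta as it arrives and return the full text.

    `events` can be any iterable of stream events; only events that
    carry a `delta` attribute (the partial text chunks) are printed.
    """
    parts = []
    for event in events:
        if hasattr(event, "delta"):
            print(event.delta, end="", flush=True)
            parts.append(event.delta)
    print()  # final newline once the stream is exhausted
    return "".join(parts)

# Simulated stream: a stand-in for the events yielded by the API call above.
fake_stream = [
    SimpleNamespace(type="response.output_text.delta", delta="double "),
    SimpleNamespace(type="response.created"),  # lifecycle event, no delta: skipped
    SimpleNamespace(type="response.output_text.delta", delta="bubble "),
    SimpleNamespace(type="response.output_text.delta", delta="bath"),
]
full_text = collect_stream(fake_stream)
```

In a real app you would pass the stream object from client.responses.create straight into collect_stream; the loop body is identical to the one in the lesson code.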

Use Case

Perfect for:

  • Live chat experiences
  • Streaming assistants or CLI bots
  • Apps needing instant model feedback
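For a CLI bot, the streaming loop typically sits inside a function that handles one user turn. The sketch below shows that shape; EchoClient is a hypothetical stand-in for the real OpenAI() client (it just streams the prompt back word by word) so the loop can run without network access. In a real app you would pass OpenAI() instead.

```python
from types import SimpleNamespace

class EchoClient:
    """Hypothetical stand-in for OpenAI(): streams the prompt back word by word."""
    class responses:
        @staticmethod
        def create(model, input, stream):
            text = input[-1]["content"]
            for word in text.split():
                yield SimpleNamespace(delta=word + " ")

def run_turn(client, prompt, model="gpt-4.1"):
    """Send one user turn, print deltas as they arrive, and return the full reply."""
    stream = client.responses.create(
        model=model,
        input=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for event in stream:
        if hasattr(event, "delta"):
            print(event.delta, end="", flush=True)
            parts.append(event.delta)
    print()
    return "".join(parts)

reply = run_turn(EchoClient(), "streaming keeps chat apps responsive")
```

Wrapping run_turn in a while loop that reads input() gives a minimal interactive CLI bot; keeping the returned reply lets you append it to the conversation history for the next turn.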