Automating UI Tasks with Computer Use Preview

In this lesson, you'll learn how to write a loop that lets the computer_use_preview tool drive browser-based actions step by step, feeding a screenshot back to the model after each action so it can decide what to do next.

Prerequisites

Install OpenAI SDK:

pip install openai

You will also need two helper functions, defined elsewhere:

  • handle_model_action(instance, action): Executes a computer action.
  • get_screenshot(instance): Captures a screenshot from the environment.
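The exact implementation of these helpers depends on your browser environment. Below is a minimal sketch assuming `instance` exposes `click`, `type_text`, `press`, `scroll`, and `screenshot` methods (these names are assumptions, not part of any SDK); adapt the dispatch to whatever automation layer you use:

```python
import time

def handle_model_action(instance, action):
    """Dispatch a model-issued computer action to the environment.

    Action types follow the computer-use-preview schema
    (click, type, keypress, scroll, wait, screenshot).
    The `instance` methods used here are illustrative assumptions.
    """
    action_type = action.type
    if action_type == "click":
        instance.click(action.x, action.y, button=action.button)
    elif action_type == "type":
        instance.type_text(action.text)
    elif action_type == "keypress":
        for key in action.keys:
            instance.press(key)
    elif action_type == "scroll":
        instance.scroll(action.x, action.y, action.scroll_x, action.scroll_y)
    elif action_type == "wait":
        time.sleep(2)  # give the page time to settle
    elif action_type == "screenshot":
        pass  # a fresh screenshot is captured after every step anyway
    else:
        print(f"Unrecognized action: {action_type}")

def get_screenshot(instance):
    """Return the current screen as raw PNG bytes."""
    return instance.screenshot()
```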

The Code

import time
import base64
from openai import OpenAI

client = OpenAI()

def computer_use_loop(instance, response):
    """
    Run the loop that executes computer actions until no 'computer_call' is found.
    """
    while True:
        computer_calls = [item for item in response.output if item.type == "computer_call"]
        if not computer_calls:
            print("No computer call found. Output from model:")
            for item in response.output:
                print(item)
            break  # the model is done requesting actions

        computer_call = computer_calls[0]  # Expect at most one per step
        last_call_id = computer_call.call_id
        action = computer_call.action

        # Execute the action, then give the page a moment to settle
        handle_model_action(instance, action)
        time.sleep(1)

        # Capture the updated screen and encode it as a base64 data URL
        screenshot_bytes = get_screenshot(instance)
        screenshot_base64 = base64.b64encode(screenshot_bytes).decode("utf-8")

        # Send the screenshot back so the model can decide its next action
        response = client.responses.create(
            model="computer-use-preview",
            previous_response_id=response.id,
            tools=[
                {
                    "type": "computer_use_preview",
                    "display_width": 1024,
                    "display_height": 768,
                    "environment": "browser"
                }
            ],
            input=[
                {
                    "call_id": last_call_id,
                    "type": "computer_call_output",
                    "output": {
                        "type": "input_image",
                        "image_url": f"data:image/png;base64,{screenshot_base64}"
                    }
                }
            ],
            truncation="auto"
        )

    return response

Explanation

  • Loops until the model no longer returns a computer_call.
  • Sends input_image screenshots after each step to provide visual feedback.
  • Enables an agent to “see” and interact with a web interface like a human.
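To kick the loop off, you need an initial response that includes the tool definition and the user's task. A minimal sketch, reusing the same 1024×768 browser tool configuration as the loop above (`start_session` is an illustrative helper of our own, not an SDK function):

```python
# Tool definition must match the one used inside computer_use_loop,
# so the model keeps a consistent view of the display.
TOOLS = [
    {
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser"
    }
]

def start_session(client, task):
    """Send the initial user task; the returned response seeds the loop."""
    return client.responses.create(
        model="computer-use-preview",
        tools=TOOLS,
        input=[{"role": "user", "content": task}],
        truncation="auto",
    )

# Usage (requires OPENAI_API_KEY and a running browser `instance`):
#   client = OpenAI()
#   response = start_session(client, "Open example.com and read the headline.")
#   final = computer_use_loop(instance, response)
```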

Use Case

Best for:

  • Automating browser-based workflows
  • Simulating visual agents that respond to screen updates
  • Building AI agents that interact with real UIs