Automating UI Tasks with Computer Use Preview
In this lesson, you'll build a loop that lets the computer-use-preview model perform browser-based actions iteratively, feeding a screenshot back to the model after each step so it can decide what to do next.
Prerequisites
Install the OpenAI Python SDK:
pip install openai
You'll also need two helper functions, defined beforehand:
- handle_model_action(instance, action): executes a computer action in your environment.
- get_screenshot(instance): captures a screenshot from the environment.
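Here is a hedged sketch of what those two helpers might look like. The real implementations depend entirely on your environment (a Playwright page, a cloud VM SDK, etc.); the `instance` methods referenced in the comments are assumptions, not a real API.

```python
def handle_model_action(instance, action):
    """Dispatch a model-issued action (click, type, scroll, ...) to the environment."""
    action_type = action.type
    if action_type == "click":
        print(f"click at ({action.x}, {action.y}) with {action.button} button")
        # e.g. instance.click(action.x, action.y, button=action.button)
    elif action_type == "type":
        print(f"type text: {action.text!r}")
        # e.g. instance.type(action.text)
    elif action_type == "scroll":
        print(f"scroll by ({action.scroll_x}, {action.scroll_y})")
        # e.g. instance.scroll(action.scroll_x, action.scroll_y)
    elif action_type == "screenshot":
        pass  # the loop captures a screenshot after every step anyway
    else:
        print(f"unhandled action type: {action_type}")


def get_screenshot(instance):
    """Return the current screen as raw PNG bytes."""
    return instance.screenshot()  # assumed method on your environment object
```

Unhandled action types are logged rather than raised, so a new action type from the model doesn't crash the loop.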
The Code
import time
import base64
from openai import OpenAI

client = OpenAI()

def computer_use_loop(instance, response):
    """
    Run the loop that executes computer actions until no 'computer_call' is found.
    """
    while True:
        computer_calls = [item for item in response.output if item.type == "computer_call"]
        if not computer_calls:
            print("No computer call found. Output from model:")
            for item in response.output:
                print(item)
            break  # no more actions to execute; exit the loop

        computer_call = computer_calls[0]  # expect at most one computer_call per step
        last_call_id = computer_call.call_id
        action = computer_call.action

        # Execute the action in the environment, then give the UI a moment to settle.
        handle_model_action(instance, action)
        time.sleep(1)

        # Capture the updated screen and send it back as the action's output.
        screenshot_bytes = get_screenshot(instance)
        screenshot_base64 = base64.b64encode(screenshot_bytes).decode("utf-8")

        response = client.responses.create(
            model="computer-use-preview",
            previous_response_id=response.id,
            tools=[
                {
                    "type": "computer_use_preview",
                    "display_width": 1024,
                    "display_height": 768,
                    "environment": "browser"
                }
            ],
            input=[
                {
                    "call_id": last_call_id,
                    "type": "computer_call_output",
                    "output": {
                        "type": "input_image",
                        "image_url": f"data:image/png;base64,{screenshot_base64}"
                    }
                }
            ],
            truncation="auto"
        )

    return response
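The loop needs an initial response containing the first computer_call to get started. A minimal kick-off might look like the sketch below; it reuses the same tool definition as the loop, and the commented-out `create_browser_instance()` call is a hypothetical placeholder for however you provision your browser environment.

```python
# Shared tool definition, kept identical to the one used inside the loop.
COMPUTER_TOOL = {
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser",
}

def start_task(client, task):
    """Send the initial request whose response seeds computer_use_loop()."""
    return client.responses.create(
        model="computer-use-preview",
        tools=[COMPUTER_TOOL],
        input=[{"role": "user", "content": task}],
        truncation="auto",
    )

# Example wiring (assumes a client and an environment instance):
# instance = create_browser_instance()   # hypothetical environment setup
# response = start_task(client, "Open bing.com and search for OpenAI news.")
# final = computer_use_loop(instance, response)
```

Keeping the tool definition in one place ensures the first request and every follow-up request inside the loop advertise the same display size and environment.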
Explanation
- Loops until the model no longer returns a computer_call.
- Sends input_image screenshots after each step to provide visual feedback.
- Enables an agent to "see" and interact with a web interface like a human.
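The screenshot hand-back can be factored into a small helper; this is just a restatement of the computer_call_output payload the loop already builds, useful if you want to unit-test the encoding separately:

```python
import base64

def screenshot_to_call_output(call_id, screenshot_bytes):
    """Package raw PNG bytes as the computer_call_output item sent back to the model."""
    b64 = base64.b64encode(screenshot_bytes).decode("utf-8")
    return {
        "call_id": call_id,
        "type": "computer_call_output",
        "output": {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{b64}",
        },
    }
```

The call_id ties the screenshot to the specific computer_call it answers, which is how the model correlates each action with its visual result.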
Use Case
Best for:
- Automating browser-based workflows
- Simulating visual agents that respond to screen updates
- Building AI agents that interact with real UIs