Automating UI Tasks with Computer Use Preview
In this lesson, you'll build a loop that lets the computer-use-preview model perform browser-based actions iteratively, feeding a screenshot back to the model after each step so it can decide what to do next.
Prerequisites
Install the OpenAI Python SDK:
pip install openai
You'll also need two helper functions, defined beforehand:
- handle_model_action(instance, action): executes a computer action in your environment.
- get_screenshot(instance): captures a screenshot from the environment.
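Here is a hedged sketch of what those two helpers might look like. The real implementations depend entirely on your environment (a Playwright page, a cloud VM SDK, etc.); the `instance` methods referenced in the comments are assumptions, not a real API.

```python
def handle_model_action(instance, action):
    """Dispatch a model-issued action (click, type, scroll, ...) to the environment."""
    action_type = action.type
    if action_type == "click":
        print(f"click at ({action.x}, {action.y}) with {action.button} button")
        # e.g. instance.click(action.x, action.y, button=action.button)
    elif action_type == "type":
        print(f"type text: {action.text!r}")
        # e.g. instance.type(action.text)
    elif action_type == "scroll":
        print(f"scroll by ({action.scroll_x}, {action.scroll_y})")
        # e.g. instance.scroll(action.scroll_x, action.scroll_y)
    elif action_type == "screenshot":
        pass  # the loop captures a screenshot after every step anyway
    else:
        print(f"unhandled action type: {action_type}")


def get_screenshot(instance):
    """Return the current screen as raw PNG bytes."""
    return instance.screenshot()  # assumed method on your environment object
```

Unhandled action types are logged rather than raised, so a new action type from the model doesn't crash the loop.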
The Code
import time
import base64
from openai import OpenAI

client = OpenAI()

def computer_use_loop(instance, response):
    """
    Run the loop that executes computer actions until no 'computer_call' is found.
    """
    while True:
        computer_calls = [item for item in response.output if item.type == "computer_call"]
        if not computer_calls:
            print("No computer call found. Output from model:")
            for item in response.output:
                print(item)
            break  # no more actions to execute; exit the loop

        computer_call = computer_calls[0]  # expect at most one computer_call per step
        last_call_id = computer_call.call_id
        action = computer_call.action

        # Execute the action in the environment, then give the UI a moment to settle.
        handle_model_action(instance, action)
        time.sleep(1)

        # Capture the updated screen and send it back as the action's output.
        screenshot_bytes = get_screenshot(instance)
        screenshot_base64 = base64.b64encode(screenshot_bytes).decode("utf-8")

        response = client.responses.create(
            model="computer-use-preview",
            previous_response_id=response.id,
            tools=[
                {
                    "type": "computer_use_preview",
                    "display_width": 1024,
                    "display_height": 768,
                    "environment": "browser"
                }
            ],
            input=[
                {
                    "call_id": last_call_id,
                    "type": "computer_call_output",
                    "output": {
                        "type": "input_image",
                        "image_url": f"data:image/png;base64,{screenshot_base64}"
                    }
                }
            ],
            truncation="auto"
        )

    return response
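The loop needs an initial response containing the first computer_call to get started. A minimal kick-off might look like the sketch below; it reuses the same tool definition as the loop, and the commented-out `create_browser_instance()` call is a hypothetical placeholder for however you provision your browser environment.

```python
# Shared tool definition, kept identical to the one used inside the loop.
COMPUTER_TOOL = {
    "type": "computer_use_preview",
    "display_width": 1024,
    "display_height": 768,
    "environment": "browser",
}

def start_task(client, task):
    """Send the initial request whose response seeds computer_use_loop()."""
    return client.responses.create(
        model="computer-use-preview",
        tools=[COMPUTER_TOOL],
        input=[{"role": "user", "content": task}],
        truncation="auto",
    )

# Example wiring (assumes a client and an environment instance):
# instance = create_browser_instance()   # hypothetical environment setup
# response = start_task(client, "Open bing.com and search for OpenAI news.")
# final = computer_use_loop(instance, response)
```

Keeping the tool definition in one place ensures the first request and every follow-up request inside the loop advertise the same display size and environment.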
Explanation
- Loops until the model no longer returns a computer_call.
- Sends input_image screenshots after each step to provide visual feedback.
- Enables an agent to "see" and interact with a web interface like a human.
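The screenshot hand-back can be factored into a small helper; this is just a restatement of the computer_call_output payload the loop already builds, useful if you want to unit-test the encoding separately:

```python
import base64

def screenshot_to_call_output(call_id, screenshot_bytes):
    """Package raw PNG bytes as the computer_call_output item sent back to the model."""
    b64 = base64.b64encode(screenshot_bytes).decode("utf-8")
    return {
        "call_id": call_id,
        "type": "computer_call_output",
        "output": {
            "type": "input_image",
            "image_url": f"data:image/png;base64,{b64}",
        },
    }
```

The call_id ties the screenshot to the specific computer_call it answers, which is how the model correlates each action with its visual result.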
Use Case
Best for:
- Automating browser-based workflows
- Simulating visual agents that respond to screen updates
- Building AI agents that interact with real UIs