Takes a screenshot and a natural language description of an element and returns the element’s bounding box. Coordinates are in pixels, with the origin at the top-left corner of the image.

Bearer token authentication is required.

Request

element_description
string
required

Natural language description of the element to find (e.g. “login button”, “email input field”)

image_data
file
required

Screenshot image file in PNG or JPEG format

Response

x1
int
required

X coordinate of the top-left corner of the element.

y1
int
required

Y coordinate of the top-left corner of the element.

x2
int
required

X coordinate of the bottom-right corner of the element.

y2
int
required

Y coordinate of the bottom-right corner of the element.

Example Request

curl -X POST https://api.simplex.sh/find-element \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "element_description=login button" \
  -F "image_data=@screenshot.png"
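
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_find_element_request` and `find_element` are hypothetical, and it assumes the bearer token is sent in the standard `Authorization: Bearer <token>` header and that the image is a PNG unless another content type is passed.

```python
import json
import uuid
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://api.simplex.sh/find-element"


def build_find_element_request(element_description, image_bytes, token,
                               filename="screenshot.png",
                               content_type="image/png"):
    """Build a multipart/form-data POST with the two documented fields.

    Assumption: the token is passed as "Authorization: Bearer <token>".
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        # Text field: element_description
        delimiter,
        b'Content-Disposition: form-data; name="element_description"',
        b"",
        element_description.encode("utf-8"),
        # File field: image_data
        delimiter,
        ('Content-Disposition: form-data; name="image_data"; '
         'filename="%s"' % filename).encode("utf-8"),
        ("Content-Type: %s" % content_type).encode("ascii"),
        b"",
        image_bytes,
        # Closing delimiter
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(parts)
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "multipart/form-data; boundary=" + boundary,
    }
    return urllib.request.Request(API_URL, data=body, headers=headers,
                                  method="POST")


def find_element(element_description, image_path, token):
    """Send the request and return the parsed JSON bounding box."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    request = build_find_element_request(element_description, image_bytes, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `find_element("login button", "screenshot.png", token)` would then return a dictionary with the `x1`, `y1`, `x2`, and `y2` keys described under Response. For a JPEG screenshot, pass `content_type="image/jpeg"` to `build_find_element_request`.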

Example Response

{
  "x1": 107,
  "y1": 264,
  "x2": 231,
  "y2": 290
}
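
A common next step is to turn the bounding box into a click target or a size. The sketch below shows one way to do that with the example response above; the helper names `box_center` and `box_size` are hypothetical and not part of the API.

```python
def box_center(box):
    """Return the (x, y) pixel at the center of the bounding box.

    Uses integer division, so the result is rounded down to a whole pixel.
    """
    return ((box["x1"] + box["x2"]) // 2, (box["y1"] + box["y2"]) // 2)


def box_size(box):
    """Return the (width, height) of the bounding box in pixels."""
    return (box["x2"] - box["x1"], box["y2"] - box["y1"])


# Values copied from the example response.
example = {"x1": 107, "y1": 264, "x2": 231, "y2": 290}

print(box_center(example))  # (169, 277)
print(box_size(example))    # (124, 26)
```

Because the origin is the top-left corner of the image, the center computed here is also measured from the top-left, with y increasing downward.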