API Reference
/find-element
Locate an element based on a natural language description
Takes in a screenshot and a natural language description of an element and returns the element’s bounding box coordinates. Coordinates are in pixels. The origin is the top-left corner of the image.
Bearer token authentication is required.
Request
Natural language description of the element to find (e.g. “login button”, “email input field”)
Screenshot image file in PNG or JPEG format
Response
X coordinate of the top-left corner of the element.
Y coordinate of the top-left corner of the element.
X coordinate of the bottom-right corner of the element.
Y coordinate of the bottom-right corner of the element.