
The machine learning model was built and trained using TensorFlow.
List of elements it can distinguish: paragraph, label, header, button, checkbox, radiobutton, rating, toggle, dropdown, listbox, textarea, textinput, datepicker, stepperinput, slider, progressbar, image, video.
The API is currently in closed alpha, but feel free to contact us if you want early access.
Send all requests to the API endpoint: https://api.vision.teleporthq.io/detection
Make sure to add a Content-Type header with the value application/json and a Teleport-Token header with the token provided by us.
The body of the request is a JSON object with two keys: image and threshold.
image is a required string parameter that denotes the direct URL to a publicly available JPG or PNG image. threshold is an optional parameter with a default value of 0.1. The detection model outputs a confidence score between 0 and 1 for each detection and won't include detections with a confidence lower than this threshold in the response.
Request body example:
{
  "image": "https://i.imgur.com/eF9KN8U.jpg",
  "threshold": 0.5
}
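The same request sent with curl, where 123 stands in for your Teleport-Token: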
curl -d '{"image": "https://i.imgur.com/eF9KN8U.jpg", "threshold": 0.5}' \
-H "Teleport-Token: 123" -H "Content-Type: application/json" \
-X POST https://api.vision.teleporthq.io/detection
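For reference, here is the same request as a minimal Python sketch using the requests library (detect_elements is our own helper name, and 123 again stands in for a real token):

import requests

API_URL = "https://api.vision.teleporthq.io/detection"

def detect_elements(image_url, token, threshold=0.1):
    """POST an image URL to the detection endpoint and return the parsed detections."""
    response = requests.post(
        API_URL,
        json={"image": image_url, "threshold": threshold},  # json= also sets Content-Type
        headers={"Teleport-Token": token},
    )
    response.raise_for_status()  # surface HTTP errors early
    return response.json()

detections = detect_elements("https://i.imgur.com/eF9KN8U.jpg", token="123", threshold=0.5)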
If your request is valid, you will receive back a JSON array with the following structure:
[
  {
    "box": [y, x, height, width],
    "detectionClass": numeric_label,
    "detectionString": string_label,
    "score": confidence_rating
  },
  ...
]
The JSON contains a list of objects, each corresponding to an atomic UI element detected in the image sent in the request. All of the keys appear in every object of the response array.
box contains the coordinates of the bounding box surrounding the detected element. x and y are the coordinates of the top left corner of the box, and width and height are self-explanatory. All coordinates are normalized to [0, 1], where (0, 0) is the top left corner of your image and (1, 1) is the bottom right corner. In other words, to get pixel coordinates you have to multiply x and width by the width of your image, and y and height by the height of your image.

detectionClass is the numeric class of the detection.

detectionString is the human-readable label of the detection.

score represents how confident the algorithm is that the predicted object is a correct detection. It takes values in [0, 1], where 1 represents 100% confidence.
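Since all coordinates are normalized, converting a detection back to pixel space is a single multiplication per value. A small Python helper for illustration (the function name and the 1920x1080 image size are our own, purely for the example):

def box_to_pixels(box, image_width, image_height):
    """Convert a normalized [y, x, height, width] box to pixel coordinates."""
    y, x, height, width = box
    return {
        "x": round(x * image_width),    # x and width scale with the image width
        "y": round(y * image_height),   # y and height scale with the image height
        "width": round(width * image_width),
        "height": round(height * image_height),
    }

box_to_pixels([0.144408, 0.521686, 0.548181, 0.276308], 1920, 1080)
# -> {'x': 1002, 'y': 156, 'width': 531, 'height': 592}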
The detectionClass to detectionString mapping is done according to this dictionary:
{
  1: "paragraph",
  2: "label",
  3: "header",
  4: "button",
  5: "checkbox",
  6: "radiobutton",
  7: "rating",
  8: "toggle",
  9: "dropdown",
  10: "listbox",
  11: "textarea",
  12: "textinput",
  13: "datepicker",
  14: "stepperinput",
  15: "slider",
  16: "progressbar",
  17: "image",
  18: "video"
}
Full response for the example request:
[
  {
    "box": [
      0.144408,
      0.521686,
      0.548181,
      0.276308
    ],
    "detectionClass": 17,
    "detectionString": "image",
    "score": 0.999999
  },
  {
    "box": [
      0.886546,
      0.333103,
      0.06273400000000007,
      0.11624700000000004
    ],
    "detectionClass": 4,
    "detectionString": "button",
    "score": 0.989777
  },
  {
    "box": [
      0.252631,
      0.126722,
      0.04488399999999998,
      0.066244
    ],
    "detectionClass": 2,
    "detectionString": "label",
    "score": 0.98929
  }
]
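Putting it together, here is a hedged sketch of consuming such a response, reusing the detect_elements and box_to_pixels helpers sketched above (the 0.9 cutoff and the 1920x1080 image size are arbitrary choices for the example):

for detection in detect_elements("https://i.imgur.com/eF9KN8U.jpg", token="123", threshold=0.5):
    if detection["score"] >= 0.9:  # keep only high-confidence detections
        pixels = box_to_pixels(detection["box"], image_width=1920, image_height=1080)
        print(f'{detection["detectionString"]} ({detection["score"]:.2f}): {pixels}')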
If you are interested in using this API, feel free to get in touch with us via email, Twitter, or LinkedIn.
