Forum Posts

Omar Faruk
Jun 29, 2022
In General Discussions
In one of the early columns, I introduced the idea of automating the Chrome browser so that it can perform repetitive tasks in the absence of an API. Our idea is to take a screenshot of a web page and apply object detection to the image to get structured data, so browser automation can handle part of the process. For example, if you have a list of product URLs, you can run a puppeteer script (or pyppeteer in Python) to open all the web pages and save the screenshots to disk.

Screenshots alone are not enough, though. You also need a label and a bounding box for each structured data type. There are at least two approaches to doing this.

The first, obvious way is to have a data entry person or team label the screenshots manually. This is the approach I took when I first tried it a few years ago. Using an open source tool called LabelImg, the data entry person manually drew a box over each object of interest. Labeling results are saved in a standard format. In this example, we used the DetectNet format. This format creates a txt file for each image, with one line for each object. Each line contains a label and coordinates.

Another, more scalable approach I've been using these days is to take a screenshot of a page that already contains structured data and use the structured data to calculate the label and the coordinates of the box. We still combine it with human supervision, mainly to ensure that the process produces quality training data.

Google AutoML for vision object detection

When we first tried this idea a few years ago, we had to do it the hard way! I touched on it briefly in my 2018 TechSEO Boost talk. I learned about object detection in my deep learning specialization at Coursera and quickly made the connection.
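To make the label files described above more concrete, here is a minimal Python sketch that writes one txt file per screenshot with one line per object. It assumes the KITTI-style layout that DetectNet is commonly used with (15 space-separated fields, of which only the label and the four box coordinates matter here, the rest filled with zeros). The function names `detectnet_line` and `write_labels` and the sample labels are made up for illustration, and the exact fields your training tool expects may differ.

```python
from pathlib import Path


def detectnet_line(label, left, top, right, bottom):
    """Format one object as a KITTI-style label line (assumed layout)."""
    if " " in label:
        raise ValueError("label must not contain spaces")
    if right <= left or bottom <= top:
        raise ValueError("box must have positive width and height")
    # Field order: type, truncated, occluded, alpha, bbox (left, top,
    # right, bottom), then 3D dimensions, location and rotation, which
    # are unused for 2D screenshots and left at zero.
    fields = [
        label, "0.0", "0", "0.0",
        f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}",
        "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0",
    ]
    return " ".join(fields)


def write_labels(image_path, objects, out_dir):
    """Write <image name>.txt with one line per labeled object."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    label_path = out_dir / (Path(image_path).stem + ".txt")
    lines = [detectnet_line(**obj) for obj in objects]
    label_path.write_text("\n".join(lines) + "\n")
    return label_path


if __name__ == "__main__":
    # Hypothetical boxes in pixel coordinates for one product screenshot.
    boxes = [
        {"label": "name", "left": 5, "top": 5, "right": 300, "bottom": 18},
        {"label": "price", "left": 10, "top": 20, "right": 110, "bottom": 60},
    ]
    print(write_labels("screenshots/product-1.png", boxes, "labels"))
```

The same writer works for both labeling approaches: the boxes can come from a person drawing them in a tool or from coordinates computed from a page's existing structured data.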
After learning how to use YOLO, I found that Faster R-CNN was actually the best-performing approach at the time. The TensorFlow Object Detection API makes things a little easier, but not as simple as with