In this chapter, we will learn in detail about data scraping and screen scraping in UiPath.
Data scraping is a technique for extracting structured data from a web page or any other application and saving it to a database, a spreadsheet or a .CSV file. UiPath Studio supports data scraping through its Data Scraping wizard, which you can find under the Design tab.
The following screenshot shows where to find it −
To use the UiPath Data Scraping wizard, follow these steps −
Step 1 − First, open the web page or application from which you want to extract data. In this example, we extract data from our Google Contacts.
Step 2 − Next, click the Data Scraping button under the Design tab. The following message box appears −
Step 3 − Click the 'Next' button. You are prompted to select the first element on the web page from which you want to extract data. In this example, you select it on the Google Contacts page.
Step 4 − Once you have selected the first element, a dialog box prompts you to select the second element, as follows −
Step 5 − Once you click Next and select the second element, another dialog box appears in which you can customize the column headers and choose whether or not to extract URLs.
You can rename the Text column as required. We have renamed Column1 to 'Name'.
Step 6 − Next, UiPath Studio opens the Extract Wizard so that you can preview the data. Here you can either choose Extract Correlated Data or click Finish to end the extraction. If you choose Extract Correlated Data, the wizard takes you back to the web page so that you can select additional data to extract.
Step 7 − Once you finish the extraction, the wizard asks, "Is data spanning multiple pages?" If you are extracting data from multiple pages, click Yes; otherwise, click No. We clicked No because the data here is extracted from a single page only.
Step 8 − Finally, UiPath Studio creates the activity sequence in the Designer panel, as follows −
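Conceptually, what the wizard produces is a table: each row is one scraped item, each column is one field you selected (renamed as in Step 5), correlated fields from Step 6 become extra columns, and answering Yes in Step 7 appends rows from further pages. The short Python sketch below illustrates only that final shape and the save-to-CSV step mentioned at the start of the chapter. It is not UiPath code; the helper names, the `Email` column and the sample values are all hypothetical.

```python
import csv
import io


def build_table(names, emails):
    """Pair the first scraped column with a correlated column, row by row.

    Hypothetical helper: mirrors the idea behind 'Extract Correlated Data',
    where every extra column must line up with the rows of the first one.
    """
    if len(names) != len(emails):
        raise ValueError("correlated columns must have the same number of rows")
    return [{"Name": n, "Email": e} for n, e in zip(names, emails)]


def merge_pages(pages):
    """Concatenate the rows scraped from several pages into one table."""
    rows = []
    for page_rows in pages:
        rows.extend(page_rows)
    return rows


def to_csv(rows, columns):
    """Serialize the table to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data standing in for two scraped pages.
page1 = build_table(["Alice", "Bob"], ["alice@example.com", "bob@example.com"])
page2 = build_table(["Carol"], ["carol@example.com"])
table = merge_pages([page1, page2])
print(to_csv(table, ["Name", "Email"]))
```

In a real workflow, the generated sequence holds the result in a data table variable, and a separate activity writes it out; the sketch only makes the row and column structure concrete.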
UiPath Studio provides methods to extract data from a specified UI element or document. These methods are called screen scraping or output methods. The Screen Scraping wizard can also be found under the Design tab.
The UiPath Studio Screen Scraping wizard offers three methods for scraping data from a specified UI element. UiPath Studio chooses the method automatically and displays it at the top of the Screen Scraping window.
If the method is selected automatically, can you change it as required? Yes: select a different method in the Scraping Method field of the Options panel, then click the Refresh button.
After you click the Refresh button, the preview is updated using the newly selected method. To copy the scraped text to the Clipboard, click the Copy to Clipboard button; to save the scraping activities in the Designer panel, click the Finish button.
As with desktop recording, screen scraping generates a container that holds the activities, with partial selectors for each activity. Refer to the following screenshot −
Each of the three methods offers different features. The three screen scraping methods and their features are explained below −
Native − If you choose the Native screen scraping method, you get the following features −
No Formatting − As the name suggests, this option does not extract formatting information from the text.
Get Words Info − This option will extract the screen coordinates of each word.
Custom Separators − This field lets you specify the characters used as separators. If you leave this field empty, all known text separators are used.
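To make the Custom Separators rule concrete, here is a minimal Python sketch of splitting scraped text into words, falling back to a default separator set when the custom field is empty. It is an illustration only, not UiPath's implementation: `DEFAULT_SEPARATORS` is a hypothetical stand-in for UiPath's list of known separators, and character offsets stand in for the screen coordinates that Get Words Info actually returns.

```python
import re

# Hypothetical default set; UiPath's actual list of "known text separators"
# is not reproduced here.
DEFAULT_SEPARATORS = " \t\n,;"


def split_words(text, custom_separators=""):
    """Split text into (word, start_offset) pairs.

    An empty custom_separators string falls back to DEFAULT_SEPARATORS,
    mirroring the rule that an empty Custom Separators field means
    'use all known separators'. The offset is a character position in
    the text, a stand-in for the on-screen position of each word.
    """
    separators = custom_separators or DEFAULT_SEPARATORS
    # A word is any maximal run of characters that are not separators.
    pattern = "[^" + re.escape(separators) + "]+"
    return [(m.group(), m.start()) for m in re.finditer(pattern, text)]


print(split_words("Hello, world"))            # default separators
print(split_words("Name: Alice;Bob", ";"))    # only ';' separates words
```

With `;` as the only custom separator, the space and colon in "Name: Alice" no longer split it, which is exactly why restricting the separators can change the extracted words.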
Full Text − If you choose the Full Text screen scraping method, you get the following feature −
Ignore Hidden − As the name suggests, if you select this option, hidden text in the selected UI element is not copied.
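The effect of Ignore Hidden can be pictured as a walk over a tree of UI elements that skips hidden ones. The Python sketch below is a hypothetical model, not how UiPath reads UI elements: the `Node` class and `full_text` function are invented for illustration, and it assumes that a hidden element also hides everything nested inside it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a UI element: its text, a hidden flag and children."""
    text: str = ""
    hidden: bool = False
    children: List["Node"] = field(default_factory=list)


def full_text(node, ignore_hidden=False):
    """Collect the text of node and its descendants, depth-first.

    With ignore_hidden=True, a hidden node and everything under it is skipped
    (an assumption of this sketch).
    """
    if ignore_hidden and node.hidden:
        return []
    parts = [node.text] if node.text else []
    for child in node.children:
        parts.extend(full_text(child, ignore_hidden))
    return parts


# Hypothetical document with one hidden element.
screen = Node(children=[
    Node("Invoice 42"),
    Node("internal note", hidden=True),
    Node("Total: 10"),
])
print(" ".join(full_text(screen)))
print(" ".join(full_text(screen, ignore_hidden=True)))
```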
Google OCR − If you choose the Google OCR screen scraping method, you get the following features −
OCR Engine − By default, Google OCR is selected.
Languages − By default, English is selected.
Characters − This option lets you choose which types of characters to extract. The available choices are Any character, Number only, Letters, Uppercase, Lowercase, Phone Numbers, Currency, Date and Custom.
Invert − This option inverts the colors of the UI element before scraping, which is useful when the background is darker than the text.
Scale − As the name suggests, this option scales the selected UI element or image before scraping. It is recommended for small images: the higher the scaling factor, the more the image is enlarged.
Get Words Info − This option enables us to get the on-screen position of every scraped word.
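The Invert, Scale and Characters options are easier to picture with a toy example. The Python sketch below applies the same ideas to an 8-bit grayscale image stored as a list of rows and to a recognized string. It only illustrates the concepts; it is not what Google OCR or UiPath does internally. The function names are hypothetical, nearest-neighbour enlargement is one simple scaling technique chosen for the sketch, and the character filter is applied after recognition, which is a simplification.

```python
def invert(image):
    """Invert an 8-bit grayscale image (list of rows): dark becomes light."""
    return [[255 - p for p in row] for row in image]


def scale(image, factor):
    """Enlarge an image by a positive integer factor (nearest-neighbour repetition)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in image:
        # Repeat each pixel horizontally, then repeat the widened row vertically.
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


# Hypothetical character classes, loosely modelled on the Characters option.
CHARACTER_FILTERS = {
    "Any character": lambda c: True,
    "Number only": str.isdigit,
    "Letters": str.isalpha,
    "Uppercase": str.isupper,
    "Lowercase": str.islower,
}


def filter_characters(text, mode):
    """Keep only the characters of text that belong to the chosen class."""
    keep = CHARACTER_FILTERS[mode]
    return "".join(c for c in text if keep(c))


print(invert([[0, 200], [255, 10]]))
print(scale([[1, 2]], 2))
print(filter_characters("Order 66B", "Number only"))
```

Inverting turns light text on a dark background into dark text on a light background, and enlarging gives the recognizer more pixels per character, which is why both steps help with small or dark images.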
To use the UiPath Screen Scraping wizard, follow these steps −
Step 1 − First, open the UI element (a PDF file, a Word file or any other document) from which you want to extract data. Here, we use a PDF file.
Step 2 − Now, click the Screen Scraping option under the Design tab.
Step 3 − Next, click the UI element from which you want to extract information. In our example, we click the PDF document.
Step 4 − The following screen appears −
UiPath Studio selects a screen scraping method by default, but you can change it as required, as discussed earlier.
Step 5 − Finally, click the Finish button; if you changed the scraping method, click Refresh first to update the preview. We clicked Finish, and the scraping activities were saved in the Designer panel.
As discussed, screen scraping generates a container that holds the activities, with partial selectors for each activity.
We can see the output in the following screenshot −