The first chapter introduced what web scraping is. In this chapter, let us see how to implement web scraping using Python.
Python is a popular tool for implementing web scraping. The language is also widely used in related fields such as cyber security, penetration testing and digital forensics. Basic web scraping can be performed with Python's standard library alone, without any third-party tool.
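As a minimal sketch of scraping with only the standard library, the example below pulls link targets out of an HTML string using the `html.parser` module. The `LinkCollector` class, the `extract_links` helper and the sample markup are illustrative names that do not come from this chapter. In a real scraper the HTML would typically be fetched first, for example with `urllib.request.urlopen`, rather than written as a literal string.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all href values found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Sample markup, assumed here purely for illustration.
SAMPLE_HTML = """
<html><body>
  <a href="https://www.python.org/downloads/">Downloads</a>
  <p>No link here</p>
  <a href="/about">About</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

Keeping the parsing step separate from the fetching step makes it possible to try the parser on saved pages before sending any requests to a live site.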
Python continues to grow in popularity. The reasons that make it a good fit for web scraping projects are as follows −
Python has a simple, readable syntax compared with many other programming languages. This makes code easier to test and lets a developer focus on the problem rather than on the language.
Another reason for using Python for web scraping is its useful built-in and external libraries. Between the standard library and third-party packages, most web scraping tasks, such as fetching pages and parsing HTML, can be handled in Python.
Python is an open source programming language with a large and active community behind it.
Python can be used for various programming tasks ranging from small shell scripts to enterprise web applications.
Python distributions are available for platforms such as Windows, macOS and Unix/Linux. To install Python, we only need to download the binary applicable to our platform. If no binary is available for our platform, we need a C compiler so that the source code can be compiled manually.
We can install Python on various platforms as follows −
Follow the steps given below to install Python on Unix/Linux machines −
Step 1 − Go to the link https://www.python.org/downloads/
Step 2 − Download the zipped source code available for Unix/Linux from the above link.
Step 3 − Extract the files onto your computer.
Step 4 − Run the following commands from the extracted directory to complete the installation −
$ ./configure
$ make
$ make install
You can find installed Python at the standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX, where XX is the version of Python.
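Once the installation finishes, one way to confirm which interpreter is actually being used is to ask Python itself. The sketch below is an illustration under the assumption that the newly installed interpreter is the one running the code; the `describe_interpreter` helper is a hypothetical name, while `sys.executable`, `sys.version_info` and `sysconfig.get_paths()` are standard library features.

```python
import sys
import sysconfig


def describe_interpreter():
    """Return the running interpreter's path, version string, and stdlib directory."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {
        "executable": sys.executable,
        "version": version,
        "stdlib": sysconfig.get_paths()["stdlib"],
    }


if __name__ == "__main__":
    # The exact values printed depend on the machine and the install location.
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

For a source install as described above, the reported paths would be expected to fall under /usr/local, although the exact values depend on the options passed to the configure script.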
Follow the steps given below to install Python on Windows machines −
Step 1 − Go to the link https://www.python.org/downloads/
Step 2 − Download the Windows installer python-XYZ.msi file, where XYZ is the version we need to install.
Step 3 − Save the installer file to your local machine.
Step 4 − Finally, run the downloaded file to bring up the Python install wizard and follow its prompts.
A convenient way to install Python 3 on macOS is Homebrew, a package manager that is easy to install and use.
Homebrew can be installed by using the following command −
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
For updating the package manager, we can use the following command −
$ brew update
With the help of the following command, we can install Python 3 on our Mac −
$ brew install python3
You can use the following instructions to set up the path on various environments −
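Use the following commands to set the path in various command shells −
For the csh shell − type setenv PATH "$PATH:/usr/local/bin" and press Enter.
For the bash shell (Linux) − type export PATH="$PATH:/usr/local/bin" and press Enter.
For the sh or ksh shell − type PATH="$PATH:/usr/local/bin" and press Enter.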
To set the path on Windows, type path %path%;C:\Python at the command prompt and then press Enter.
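To check whether the path change took effect, the search path can be inspected from Python itself. The following is a small sketch: `path_entries` and `find_python` are hypothetical helper names, and the command names `python3` and `python` are assumptions about how the interpreter is named on a given system. `shutil.which` and `os.pathsep` are standard library features.

```python
import os
import shutil


def path_entries(path_value=None):
    """Split a PATH-style string into its non-empty directory entries."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [entry for entry in path_value.split(os.pathsep) if entry]


def find_python(candidates=("python3", "python")):
    """Return the first interpreter found on PATH, or None if there is none."""
    for name in candidates:
        location = shutil.which(name)
        if location:
            return location
    return None


if __name__ == "__main__":
    print("PATH entries:", path_entries())
    print("Interpreter found at:", find_python())
```

If `find_python` returns None, the directory containing the interpreter is probably missing from the path, and the shell commands above are worth rechecking.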
We can start Python using any of the following three ways −
Python can be started from any operating system that provides a command-line interpreter or shell, such as UNIX or DOS.
We can start coding in the interactive interpreter as follows −
Step 1 − Enter python at the command line.
Step 2 − Then, we can start coding right away in the interactive interpreter.
$python # Unix/Linux
or
python% # Unix/Linux
or
C:>python # Windows/DOS
We can execute a Python script at the command line by invoking the interpreter with the script's file name, as follows −
$python script.py # Unix/Linux
or
python% script.py # Unix/Linux
or
C:>python script.py # Windows/DOS
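For illustration, a script.py suitable for the commands above might look like the sketch below. Its contents are assumed, not taken from this chapter: the `greet` and `main` functions are hypothetical, and the optional command-line argument is added only to show how the interpreter passes arguments to a script through `sys.argv`.

```python
import sys


def greet(name):
    """Build the greeting that the script prints."""
    return f"Hello, {name}!"


def main(argv):
    # argv[0] is the script name; an optional first argument overrides the default.
    name = argv[1] if len(argv) > 1 else "world"
    print(greet(name))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Running python script.py prints Hello, world!, while python script.py Ada prints Hello, Ada!. The `if __name__ == "__main__"` guard keeps `main` from running when the file is imported as a module instead of executed as a script.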
We can also run Python from a graphical user interface (GUI) environment if the system has a GUI application that supports Python. Some IDEs that support Python on various platforms are given below −
IDE for UNIX − IDLE is the Python IDE available on UNIX.
IDE for Windows − Windows has the PythonWin IDE, which also provides a GUI.
IDE for Macintosh − Macintosh has the IDLE IDE, which is downloadable as either MacBinary or BinHex'd files from the main website.