The requests module can help us build URLs and manipulate URL values dynamically. Any sub-directory of a URL can be extracted programmatically, and parts of it can then be replaced with new values to build new URLs.
The example below uses urljoin to derive the different parent folders in the URL path. The urljoin function, imported here from requests.compat, resolves a relative reference against the base URL to produce a new URL.
from requests.compat import urljoin

base = 'https://stackoverflow.com/questions/3764291'

# Relative references are resolved against the base URL
print(urljoin(base, '.'))
print(urljoin(base, '..'))
print(urljoin(base, '...'))
print(urljoin(base, '/3764299/'))

# Append a query string, then a fragment
url_query = urljoin(base, '?vers=1.0')
print(url_query)
url_sec = urljoin(url_query, '#section-5.4')
print(url_sec)
When we run the above program, we get the following output −
https://stackoverflow.com/questions/
https://stackoverflow.com/
https://stackoverflow.com/questions/...
https://stackoverflow.com/3764299/
https://stackoverflow.com/questions/3764291?vers=1.0
https://stackoverflow.com/questions/3764291?vers=1.0#section-5.4
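The output above depends on whether the base URL ends with a slash, which is easy to overlook. The following is a minimal sketch of that behaviour, assuming Python 3 and importing urljoin directly from the standard library's urllib.parse (the same function that requests.compat re-exports on Python 3). The 'answers' segment and the example.com address are hypothetical values chosen only for illustration.

```python
from urllib.parse import urljoin

# Same base URL, without and with a trailing slash
base_file = 'https://stackoverflow.com/questions/3764291'
base_dir = 'https://stackoverflow.com/questions/3764291/'

# Without a trailing slash, the last path segment is replaced
joined_file = urljoin(base_file, 'answers')
# With a trailing slash, the new segment is appended below it
joined_dir = urljoin(base_dir, 'answers')
# An absolute URL as the second argument overrides the base entirely
joined_abs = urljoin(base_file, 'https://example.com/other')

print(joined_file)  # https://stackoverflow.com/questions/answers
print(joined_dir)   # https://stackoverflow.com/questions/3764291/answers
print(joined_abs)   # https://example.com/other
```

If the intent is to add a segment beneath the last folder, it is usually safest to make sure the base ends with a slash before calling urljoin.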
URLs can also be split into their component parts beyond the main address. The query parameters and the fragment (tag) attached to a URL are separated out by the urlparse function, as shown below.
from requests.compat import urlparse

url1 = 'https://docs.python.org/2/py-modindex.html#cap-f'
url2 = 'https://docs.python.org/2/search.html?q=urlparse'

# Each call returns a ParseResult with six named components
print(urlparse(url1))
print(urlparse(url2))
When we run the above program, we get the following output −
ParseResult(scheme='https', netloc='docs.python.org', path='/2/py-modindex.html', params='', query='', fragment='cap-f')
ParseResult(scheme='https', netloc='docs.python.org', path='/2/search.html', params='', query='q=urlparse', fragment='')
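Once a URL has been parsed, its parts can be read by name and swapped out to build a new URL, which is the substitution described at the start of this section. The sketch below is one possible way to do this, assuming Python 3 and using the standard library's urllib.parse directly; parse_qs is taken from there because it is not among the names requests.compat re-exports. The replacement search term 'urljoin' is an arbitrary value chosen for illustration.

```python
from urllib.parse import urlparse, urlunparse, urlencode, parse_qs

url = 'https://docs.python.org/2/search.html?q=urlparse'
parts = urlparse(url)

# Components are available as named attributes
host = parts.netloc
path = parts.path

# Decode the query string into a dict of lists
query = parse_qs(parts.query)

# Substitute a new value and rebuild the URL from its parts
query['q'] = ['urljoin']
new_parts = parts._replace(query=urlencode(query, doseq=True))
new_url = urlunparse(new_parts)

print(host)     # docs.python.org
print(path)     # /2/search.html
print(new_url)  # https://docs.python.org/2/search.html?q=urljoin
```

ParseResult is a named tuple, so _replace returns a modified copy and leaves the original parse result unchanged.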