Python - Building URLs


Advertisements

The requests module can help us build the URLS and manipulate the URL value dynamically. Any sub-directory of the URL can be fetched programmatically and then some part of it can be substituted with new values to build new URLs.

Build_URL

The below example uses urljoin to fetch the different subfolders in the URL path. The urljoin method is used to add new values to the base URL.

from requests.compat import urljoin
base='https://stackoverflow.com/questions/3764291'
print urljoin(base,'.')
print urljoin(base,'..')
print urljoin(base,'...')
print urljoin(base,'/3764299/')
url_query = urljoin(base,'?vers=1.0')
print url_query
url_sec = urljoin(url_query,'#section-5.4')
print url_sec

When we run the above program, we get the following output −

https://stackoverflow.com/questions/
https://stackoverflow.com/
https://stackoverflow.com/questions/...
https://stackoverflow.com/3764299/
https://stackoverflow.com/questions/3764291?vers=1.0
https://stackoverflow.com/questions/3764291?vers=1.0#section-5.4

Split the URLS

The URLs can also be split into many parts beyond the main address. The additional parameters which are used for a specific query or tags attached to the URL are separated by using the urlparse method as shown below.

from requests.compat import urlparse
url1 = 'https://docs.python.org/2/py-modindex.html#cap-f'
url2='https://docs.python.org/2/search.html?q=urlparse'
print urlparse(url1)
print urlparse(url2)

When we run the above program, we get the following output −

ParseResult(scheme='https', netloc='docs.python.org', path='/2/py-modindex.html', params='', query='', fragment='cap-f')
ParseResult(scheme='https', netloc='docs.python.org', path='/2/search.html', params='', query='q=urlparse', fragment='')
Advertisements