There are multiple situations where you want to extract specific types of information (only <a> tags) using Beautifulsoup4. The SoupStrainer class in Beautifulsoup allows you to parse only specific part of an incoming document.
One way is to create a SoupStrainer and pass it on to the Beautifulsoup4 constructor as the parse_only argument.
A SoupStrainer tells BeautifulSoup what parts extract, and the parse tree consists of only these elements. If you narrow down your required information to a specific portion of the HTML, this will speed up your search result.
product = SoupStrainer('div',{'id': 'products_list'}) soup = BeautifulSoup(html,parse_only=product)
Above lines of code will parse only the titles from a product site, which might be inside a tag field.
Similarly, like above we can use other soupStrainer objects, to parse specific information from an HTML tag. Below are some of the examples −
from bs4 import BeautifulSoup, SoupStrainer #Only "a" tags only_a_tags = SoupStrainer("a") #Will parse only the below mentioned "ids". parse_only = SoupStrainer(id=["first", "third", "my_unique_id"]) soup = BeautifulSoup(my_document, "html.parser", parse_only=parse_only) #parse only where string length is less than 10 def is_short_string(string): return len(string) < 10 only_short_strings =SoupStrainer(string=is_short_string)