9. Simple Web Crawler
The following code crawls the first five pages of the BBC News search results (search term "hong kong") and prints the text of each headline link it finds:
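import requests
from bs4 import BeautifulSoup

# range(1, 6) covers pages 1 to 5
for i in range(1, 6):
    url = 'https://www.bbc.co.uk/search/more?page=' + str(i) + '&q=hong+kong'
    html_text = requests.get(url).text
    html_data = BeautifulSoup(html_text, "html.parser")
    # Each headline is assumed to be an <h1> that wraps an <a> link
    headline_list = html_data.find_all('h1')
    for headline in headline_list:
        link = headline.find('a')
        if link is not None:  # skip any <h1> that contains no link
            print(link.get_text())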
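The search URL above is built by string concatenation, with the space in "hong kong" written by hand as "+". As a minimal sketch, the same URLs could be generated with the standard library's `urllib.parse.urlencode`, which escapes the query values for you. The names `BASE_URL`, `build_search_url` and `search_urls` are hypothetical helpers introduced here for illustration; they are not part of the original example.

```python
from urllib.parse import urlencode

# Base URL taken from the crawler example (assumed to be unchanged)
BASE_URL = 'https://www.bbc.co.uk/search/more'


def build_search_url(term, page):
    """Hypothetical helper: return the search URL for one results page."""
    # urlencode escapes the values and turns the space in "hong kong" into "+"
    return BASE_URL + '?' + urlencode({'page': page, 'q': term})


def search_urls(term, pages=5):
    """Hypothetical helper: return the URLs for pages 1 to `pages`."""
    return [build_search_url(term, i) for i in range(1, pages + 1)]


if __name__ == '__main__':
    for url in search_urls('hong kong'):
        print(url)
```

Each URL returned by `search_urls` can then be passed to `requests.get` as in the loop above. Letting `urlencode` build the query string means a search term with spaces or other special characters should not need manual escaping.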