9. Simple Web Crawler

The following code crawls the first five pages of the BBC News search results for the search term "hong kong" and prints each headline it finds.

import requests
from bs4 import BeautifulSoup

# Pages 1 to 5 (the end of range() is exclusive)
for i in range(1, 6):
    url = 'https://www.bbc.co.uk/search/more?page=' + str(i) + '&q=hong+kong'

    html_text = requests.get(url).text
    html_data = BeautifulSoup(html_text, "html.parser")

    headline_list = html_data.find_all('h1')

    for headline in headline_list:
        # Skip any <h1> that does not contain a link
        link = headline.find('a')
        if link is not None:
            print(link.get_text())
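
The loop above assembles each URL by string concatenation, which works for this fixed query but would produce a malformed URL for a search term containing characters such as & or #. As a minimal sketch, assuming the same endpoint and the same page and q parameters as above, the query string could instead be built with the standard library. The names SEARCH_ENDPOINT and build_search_url are hypothetical and introduced only for illustration.

```python
from urllib.parse import urlencode

# Endpoint copied from the example above; whether it still serves
# results in this form is an assumption and is not checked here.
SEARCH_ENDPOINT = 'https://www.bbc.co.uk/search/more'


def build_search_url(page, query):
    """Return the search URL for one result page (hypothetical helper)."""
    # urlencode escapes reserved characters and turns spaces into '+'
    return SEARCH_ENDPOINT + '?' + urlencode({'page': page, 'q': query})


if __name__ == '__main__':
    # Print the URLs for pages 1 to 5 without sending any request
    for page in range(1, 6):
        print(build_search_url(page, 'hong kong'))
```

For page 1 and the term "hong kong" this yields the same URL as the concatenation in the original loop, so the helper could replace that line without changing which pages are fetched.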
