Introduction to Python

9. Simple Web Crawler

The following code crawls the first five pages of the BBC News search results for the search term "hong kong" and prints the text of each headline link.

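import requests
from bs4 import BeautifulSoup

# range(1, 6) yields pages 1 to 5; the end value is excluded
for i in range(1, 6):
    url = 'https://www.bbc.co.uk/search/more?page=' + str(i) + '&q=hong+kong'

    # Download the page and parse its HTML
    html_text = requests.get(url).text
    html_data = BeautifulSoup(html_text, "html.parser")

    # Collect every <h1> element on the page
    headline_list = html_data.find_all('h1')

    for headline in headline_list:
        link = headline.find('a')
        # Skip any <h1> that has no link inside it
        if link is not None:
            print(link.get_text())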
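
The page URLs in the crawler are built by joining strings by hand. As a minimal sketch of an alternative, the standard library's urlencode can build the same query string and escape the search term automatically. The names BASE_URL, build_search_url and page_urls are hypothetical helpers introduced for illustration only; they are not part of requests or BeautifulSoup.

```python
from urllib.parse import urlencode

# Same search endpoint as the crawler; the constant name is an assumption
BASE_URL = 'https://www.bbc.co.uk/search/more'


def build_search_url(page, query):
    """Return the search URL for one results page (hypothetical helper)."""
    # urlencode escapes the query, so 'hong kong' becomes 'hong+kong'
    return BASE_URL + '?' + urlencode({'page': page, 'q': query})


def page_urls(query, first_page=1, last_page=5):
    """Return the URLs for first_page to last_page, inclusive."""
    # range() excludes its end value, hence the + 1
    return [build_search_url(page, query)
            for page in range(first_page, last_page + 1)]


for url in page_urls('hong kong'):
    print(url)
```

Each URL in the returned list could then be passed to requests.get() in place of the hand-built string. Making the last page an explicit, inclusive parameter also makes it harder to request one page fewer than intended.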

Last updated 5 years ago
