Gathering data from a web page is known as web scraping. It is typically performed either by fetching a web page via its URL and reading the data directly online, or by reading the data from a saved HTML file. Web scraping is a crucial skill for anyone interested in data science, and for anyone who simply wants to obtain information from web pages.
This course covers:
- Downloading and installing the Python library BeautifulSoup
- Inspecting a web page to identify the relevant data
- Scraping and parsing the data using BeautifulSoup (formatting it into arrays and variables)
- Sanitizing the data and storing it in a correctly formatted CSV file
- Reading from local HTML files instead of URLs
- Reading non-table data
…and more!
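
To give a rough feel for the workflow listed above, here is a minimal sketch that parses a small HTML snippet with BeautifulSoup, pulls the rows of a table into Python lists, reads one non-table element, and formats the result as CSV. The sample HTML, the `prices` table id, the `title` class, and the helper names (`scrape_table`, `scrape_title`, `rows_to_csv`) are all made up for illustration and are not taken from the course material. It assumes the `beautifulsoup4` package is installed.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample page. In practice this text would come from
# a fetched URL or from a saved HTML file.
SAMPLE_HTML = """
<html><body>
  <h1 class="title"> Fruit prices </h1>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td> Apple </td><td>1.20</td></tr>
    <tr><td>Pear, green</td><td> 0.95 </td></tr>
  </table>
</body></html>
"""


def scrape_table(html, table_id):
    """Return the table's rows as a list of lists of stripped cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    rows = []
    for tr in table.find_all("tr"):
        # Header and data cells are treated alike; whitespace is trimmed.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def scrape_title(html):
    """Read a non-table element: the text of the first <h1 class="title">."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1", class_="title")
    return heading.get_text(strip=True) if heading else None


def rows_to_csv(rows):
    """Format rows as CSV text; the csv module quotes cells containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


rows = scrape_table(SAMPLE_HTML, "prices")
title = scrape_title(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(title)
print(csv_text)
```

To work from a saved page instead, the same functions could be given the contents of a local file, for example `open("page.html", encoding="utf-8").read()`, where `page.html` is a placeholder file name.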