In this repository, I scrape data from a dummy website, since many real websites restrict scraping. The scraping is done with Requests and BeautifulSoup.
- Check a website's Terms and Conditions before scraping it, and read its statements about legal use of the data.
- Do not request data from the website too aggressively, and ensure that your program behaves in a reasonable manner.
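One simple way to keep requests reasonable is to throttle them so that consecutive calls are at least a fixed interval apart. The sketch below is illustrative, and the delay value is a hypothetical choice, not a rule from any particular site:

```python
import time

MIN_DELAY = 0.2  # seconds between requests; a hypothetical, site-friendly value

_last_request = 0.0

def throttle():
    """Sleep just long enough so consecutive calls are at least MIN_DELAY apart."""
    global _last_request
    wait = MIN_DELAY - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
```

You would then call `throttle()` immediately before each `requests.get(...)` in your scraping loop.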
- You can download and open the Python file in your preferred editor.
- You can download and open the notebook in Jupyter Notebook or Google Colab.
- Inspect the page
- Obtain the HTML
- Choose a parser (`lxml`, `html5lib`, `html.parser`)
- Create a BeautifulSoup object
- Extract the tags we need
- Store the data in lists
- Build a DataFrame
- Export a CSV file containing all the scraped data
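The steps above can be sketched end to end as follows. This is a minimal example, not the repository's actual scraper: the HTML is inlined so it runs without a network connection, and the tag names and class names are hypothetical. For a live page you would fetch the HTML with `requests.get(base_site).text` instead:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Inline HTML standing in for a downloaded page (hypothetical content).
html = """
<html><body>
  <div class="product"><h2>Widget</h2><span class="price">9.99</span></div>
  <div class="product"><h2>Gadget</h2><span class="price">19.99</span></div>
</body></html>
"""

# Create a BeautifulSoup object, using the stdlib parser (no extra install needed).
soup = BeautifulSoup(html, "html.parser")

# Extract the tags we need and store the data in lists.
names, prices = [], []
for card in soup.find_all("div", class_="product"):
    names.append(card.h2.get_text(strip=True))
    prices.append(float(card.find("span", class_="price").get_text()))

# Build a DataFrame and export it as a CSV file.
df = pd.DataFrame({"name": names, "price": prices})
df.to_csv("scraped_data.csv", index=False)
```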
Scrapers differ from one site to another. To reuse these scrapers, change the value of `base_site` to the desired URL and identify the tags to extract.
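In practice, adapting the scraper to a new site usually means changing only two pieces of configuration. The values below are purely hypothetical placeholders:

```python
# Hypothetical values: replace with the site and tags you are targeting.
base_site = "https://example.com/products"

# The tag name and attributes that wrap each record on that site.
record_tag = "div"
record_attrs = {"class": "product"}
```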
- `from bs4 import BeautifulSoup`
- `import requests`
- `import pandas as pd`