Premise
Today I’ll show you how I scrape websites and use the data to build useful databases. I know both Python and JavaScript and use whichever a project requires, but in this post we’ll set up a Python environment. (Partly because I use Cheerio for DOM traversal in JavaScript, and that library lacks a decent beginner-friendly documentation site.)
I’m a Bleach fan. Hence, we’ll scrape data from the fandom website about the protagonist of Bleach, Ichigo Kurosaki, and store it as a JSON file.
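As a quick sketch of the end product, here is how a scraped record could be written to a JSON file with Python’s built-in json module. The fields shown are hypothetical placeholders, not the actual data we’ll scrape:

```python
import json

# Hypothetical record; the real fields will come from the wiki page we scrape.
ichigo = {
    "name": "Ichigo Kurosaki",
    "series": "Bleach",
}

# Write the record to a JSON file with readable indentation.
with open("ichigo.json", "w", encoding="utf-8") as f:
    json.dump(ichigo, f, indent=2, ensure_ascii=False)
```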
Step 1: Set up an environment
Start by creating a folder anywhere on your local system:
mkdir kurosaki
Initialize a virtual environment inside the folder. (We need a virtual environment for two reasons: first, to replicate an isolated environment like the one running on a server; second, to avoid dependency clashes with the rest of your local system.)
python -m venv venv
Activate the virtual environment every time you start coding. (The venv module creates a folder named venv, which in turn contains a bin folder holding the scripts needed to activate and deactivate the virtual environment.)
source venv/bin/activate
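Putting the steps above together, a full setup session looks roughly like this. (Paths assume a Unix-like shell; on Windows the activate script lives under Scripts instead of bin, and `python` may be `python3` on your system.)

```shell
# Create the project folder and enter it.
mkdir kurosaki
cd kurosaki

# Create a virtual environment in a folder named venv.
python -m venv venv

# Activate it; your shell prompt should now show (venv).
source venv/bin/activate

# Confirm the isolated interpreter inside venv is in use.
which python

# Leave the environment when you're done.
deactivate
```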